Computer Vision
CS-E4850, 5 study credits
Lecturer: Juho Kannala
Lecture 5: Image alignment and
object instance recognition
• Given two images of a planar scene (or from a rotating camera),
find the parameters of a global geometric transformation that
accounts for most true point correspondences between the images
• Given a query image and a database of object images, detect
whether the objects are visible in the query image
• Reading:
– Szeliski’s book, Sections 8.1.1 - 8.1.4 in 2nd edition
– Chapter 4 in Hartley & Zisserman
– Lowe’s SIFT paper: http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf
Acknowledgement: many slides from Svetlana Lazebnik, Derek Hoiem, Kristen
Grauman, and others (detailed credits on individual slides)
Source: S. Lazebnik
Image alignment
Image from http://graphics.cs.cmu.edu/courses/15-463/2010_fall/
Source: S. Lazebnik
Alignment applications
• A look into the past
Source: S. Lazebnik
Alignment applications
Panorama stitching
Source: S. Lazebnik
Alignment applications
Recognition
of object
instances
Source: S. Lazebnik
Alignment
Alignment: find parameters of model that maps one set of
points to another
Typically want to solve for a global transformation that
accounts for most true correspondences
Difficulties
• Noise (typically 1-3 pixels)
• Outliers (often 30-50%)
• Many-to-one matches or multiple objects
Source: S. Lazebnik
Alignment challenges
• Small degree of overlap
• Intensity changes
• Occlusion, clutter
Source: S. Lazebnik
Feature-based alignment
• Search for sets of feature matches that agree in terms of:
a) Local appearance
b) Geometric configuration
Source: S. Lazebnik
Feature-based alignment: Overview
• Alignment as fitting
• Affine transformations
• Homographies
• Robust alignment
• Descriptor-based feature matching
• RANSAC
Source: S. Lazebnik
Alignment as fitting
• Previous lectures: fitting a model to features in one image
  Find the model $M$ that minimizes $\sum_i \mathrm{residual}(\mathbf{x}_i, M)$
• Alignment: fitting a model to a transformation between
  pairs of features (matches) in two images
  Find the transformation $T$ that minimizes $\sum_i \mathrm{residual}(T(\mathbf{x}_i), \mathbf{x}'_i)$
Source: S. Lazebnik
2D transformation models
• Similarity (translation, scale, rotation)
• Affine
• Projective (homography)
Source: S. Lazebnik
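All three transformation models can be written as 3×3 matrices acting on points in homogeneous coordinates. A minimal NumPy sketch (the helper name `apply_h` and the example values are just for illustration):

```python
import numpy as np

def apply_h(T, pts):
    """Apply a 3x3 transformation T to an Nx2 array of points
    using homogeneous coordinates."""
    ph = np.hstack([pts, np.ones((len(pts), 1))])  # append 1: (x, y) -> (x, y, 1)
    q = ph @ T.T
    return q[:, :2] / q[:, 2:3]                    # divide by the last component

# Similarity transform: scale s, rotation th, translation (tx, ty)
s, th, tx, ty = 2.0, np.pi / 2, 1.0, 0.0
S = np.array([[s * np.cos(th), -s * np.sin(th), tx],
              [s * np.sin(th),  s * np.cos(th), ty],
              [0.0,             0.0,            1.0]])
print(apply_h(S, np.array([[1.0, 0.0]])))  # approximately [[1., 2.]]
```

For similarity and affine transformations the last matrix row is (0, 0, 1), so the final division is a no-op; for a homography the last row is arbitrary, which is exactly where the projective behaviour comes from.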
Let’s start with affine transformations
• Simple fitting procedure (linear least squares)
• Approximates viewpoint changes for roughly planar
objects and roughly orthographic cameras
• Can be used to initialize fitting for more complex
models
Source: S. Lazebnik
Fitting an affine transformation
• Assume we know the correspondences $(x_i, y_i) \leftrightarrow (x'_i, y'_i)$; how do
  we get the transformation?
  $\mathbf{x}'_i = M \mathbf{x}_i + \mathbf{t}$
  $\begin{bmatrix} x'_i \\ y'_i \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} + \begin{bmatrix} t_1 \\ t_2 \end{bmatrix}$
  Want to find $M$, $\mathbf{t}$ to minimize $\sum_{i=1}^{n} \| \mathbf{x}'_i - M\mathbf{x}_i - \mathbf{t} \|^2$
• Rewriting, each match contributes two rows to a linear system in the
  six unknowns:
  $\begin{bmatrix} x_i & y_i & 0 & 0 & 1 & 0 \\ 0 & 0 & x_i & y_i & 0 & 1 \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_1 \\ t_2 \end{bmatrix} = \begin{bmatrix} x'_i \\ y'_i \end{bmatrix}$
• Linear system with six unknowns
• Each match gives us two linearly independent equations: need at
  least three matches to solve for the transformation parameters
Source: S. Lazebnik
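The resulting linear least-squares problem is a few lines of NumPy. A sketch under the slide's parameterization ($m_1, \dots, m_4, t_1, t_2$); `fit_affine` is a hypothetical helper name:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of x' = M x + t from n >= 3 correspondences.
    src, dst: Nx2 arrays. Returns the 2x2 matrix M and translation t."""
    n = len(src)
    A = np.zeros((2 * n, 6))             # unknowns: m1, m2, m3, m4, t1, t2
    A[0::2, 0:2] = src                   # rows [x_i, y_i, 0, 0, 1, 0]
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = src                   # rows [0, 0, x_i, y_i, 0, 1]
    A[1::2, 5] = 1.0
    b = dst.reshape(-1)                  # [x'_1, y'_1, x'_2, y'_2, ...]
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p[:4].reshape(2, 2), p[4:]
```

With exactly three non-collinear matches the system is square and the fit is exact; with more matches `lstsq` minimizes the sum of squared residuals from the slide.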
Fitting a plane projective transformation
• Homography: plane projective transformation
(transformation taking a quad to another arbitrary
quad)
Source: S. Lazebnik
Homography
• The transformation between two views of a planar
surface
• The transformation between images from two
cameras that share the same center
Application: Panorama stitching
Source: Hartley & Zisserman
Source: S. Lazebnik
Fitting a homography
• Recall homogeneous coordinates: convert to homogeneous image
  coordinates by appending a 1, $(x, y) \mapsto (x, y, 1)$; convert from
  homogeneous image coordinates by dividing by the last component,
  $(x, y, w) \mapsto (x/w, y/w)$
• Equation for homography:
  $\lambda \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$
Source: S. Lazebnik
Fitting a homography
• Equation for homography: $\lambda \mathbf{x}'_i = H \mathbf{x}_i$, i.e.
  $\lambda \begin{bmatrix} x'_i \\ y'_i \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}$
• Since $\mathbf{x}'_i$ and $H\mathbf{x}_i$ are parallel, their cross product vanishes,
  $\mathbf{x}'_i \times H \mathbf{x}_i = \mathbf{0}$:
  $\begin{bmatrix} x'_i \\ y'_i \\ 1 \end{bmatrix} \times \begin{bmatrix} \mathbf{h}_1^T \mathbf{x}_i \\ \mathbf{h}_2^T \mathbf{x}_i \\ \mathbf{h}_3^T \mathbf{x}_i \end{bmatrix} = \begin{bmatrix} y'_i \mathbf{h}_3^T \mathbf{x}_i - \mathbf{h}_2^T \mathbf{x}_i \\ \mathbf{h}_1^T \mathbf{x}_i - x'_i \mathbf{h}_3^T \mathbf{x}_i \\ x'_i \mathbf{h}_2^T \mathbf{x}_i - y'_i \mathbf{h}_1^T \mathbf{x}_i \end{bmatrix}$
• In matrix form:
  $\begin{bmatrix} \mathbf{0}^T & -\mathbf{x}_i^T & y'_i \mathbf{x}_i^T \\ \mathbf{x}_i^T & \mathbf{0}^T & -x'_i \mathbf{x}_i^T \\ -y'_i \mathbf{x}_i^T & x'_i \mathbf{x}_i^T & \mathbf{0}^T \end{bmatrix} \begin{pmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \\ \mathbf{h}_3 \end{pmatrix} = \mathbf{0}$
  3 equations, only 2 linearly independent
Source: S. Lazebnik
Fitting a homography: DLT algorithm
Stack the two independent equations from each of the $n$ matches:
$\begin{bmatrix} \mathbf{0}^T & -\mathbf{x}_1^T & y'_1 \mathbf{x}_1^T \\ \mathbf{x}_1^T & \mathbf{0}^T & -x'_1 \mathbf{x}_1^T \\ \vdots & \vdots & \vdots \\ \mathbf{0}^T & -\mathbf{x}_n^T & y'_n \mathbf{x}_n^T \\ \mathbf{x}_n^T & \mathbf{0}^T & -x'_n \mathbf{x}_n^T \end{bmatrix} \begin{pmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \\ \mathbf{h}_3 \end{pmatrix} = \mathbf{0} \qquad A\mathbf{h} = \mathbf{0}$
• H has 8 degrees of freedom (9 parameters, but scale is
arbitrary)
• One match gives us two linearly independent equations
• Homogeneous least squares: find h minimizing ||Ah||2
• Eigenvector of ATA corresponding to smallest eigenvalue
• Four matches needed for a minimal solution
• For more info, see Sec. 4.1 in (Hartley & Zisserman)
Source: S. Lazebnik
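The DLT algorithm above can be sketched in a few lines of NumPy, taking **h** as the right singular vector of A with the smallest singular value (`fit_homography` is a hypothetical name; the point-normalization step that Hartley & Zisserman recommend is omitted):

```python
import numpy as np

def fit_homography(src, dst):
    """DLT: estimate H (up to scale) from n >= 4 correspondences.
    src, dst: Nx2 arrays of matched points. Normalization omitted."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        p = [x, y, 1.0]
        rows.append([0.0] * 3 + [-c for c in p] + [yp * c for c in p])  # 0^T, -x^T, y' x^T
        rows.append(p + [0.0] * 3 + [-xp * c for c in p])               # x^T, 0^T, -x' x^T
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)    # right singular vector of smallest singular value
    return H / H[2, 2]          # fix the arbitrary scale (assumes H[2,2] != 0)
```

The SVD step is the homogeneous least squares from the slide: the last row of `Vt` is the unit vector minimizing ||Ah||².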
Robust feature-based alignment
• So far, we’ve assumed that we are given a set of
“ground-truth” correspondences between the two
images we want to align
• What if we don’t know the correspondences?
Source: S. Lazebnik
Robust feature-based alignment
• Extract features
• Compute putative matches
• Loop:
• Hypothesize transformation T
• Verify transformation (search for other matches consistent
with T)
Source: S. Lazebnik
Generating putative correspondences
• Need to compare feature descriptors of the local patches
  surrounding interest points
Source: S. Lazebnik
Feature descriptors
• Recall: feature detection and description
Source: S. Lazebnik
Feature descriptors
• Simplest descriptor: vector of raw intensity values
• How to compare two such vectors?
• Sum of squared differences (SSD):
  $\mathrm{SSD}(\mathbf{u}, \mathbf{v}) = \sum_i (u_i - v_i)^2$
  – Not invariant to intensity change
• Normalized correlation:
  $\rho(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} - \bar{u}}{\|\mathbf{u} - \bar{u}\|} \cdot \frac{\mathbf{v} - \bar{v}}{\|\mathbf{v} - \bar{v}\|} = \frac{\sum_i (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\left(\sum_j (u_j - \bar{u})^2\right)\left(\sum_j (v_j - \bar{v})^2\right)}}$
  – Invariant to affine intensity change
Source: S. Lazebnik
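Both measures are one-liners in NumPy, and a small example illustrates the invariance claim (the function names are just for illustration):

```python
import numpy as np

def ssd(u, v):
    """Sum of squared differences between two descriptor vectors."""
    return float(np.sum((u - v) ** 2))

def ncc(u, v):
    """Normalized correlation of two descriptor vectors."""
    un, vn = u - u.mean(), v - v.mean()
    return float(np.dot(un, vn) / (np.linalg.norm(un) * np.linalg.norm(vn)))

u = np.array([10.0, 20.0, 30.0, 40.0])
v = 2.0 * u + 5.0                 # an affine intensity change
print(ssd(u, v))                  # 4100.0: large, SSD is not invariant
print(ncc(u, v))                  # 1.0 (up to rounding): NCC is invariant
```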
Disadvantage of intensity vectors as descriptors
• Small deformations can affect the matching
score a lot
Source: S. Lazebnik
Feature descriptors: SIFT
• Descriptor computation:
• Divide patch into 4x4 sub-patches
• Compute histogram of gradient orientations (8 reference
angles) inside each sub-patch
• Resulting descriptor: 4x4x8 = 128 dimensions
• Advantage over raw vectors of pixel values
• Gradients less sensitive to illumination change
• Pooling of gradients over the sub-patches achieves
robustness to small shifts, but still preserves some spatial
information
David G. Lowe. "Distinctive image features from scale-invariant keypoints.” IJCV 60
(2), pp. 91-110, 2004.
Source: S. Lazebnik
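A rough sketch of the 4x4x8 pooling idea on a 16x16 patch. This is only an illustration of the binning structure: the real SIFT descriptor additionally applies Gaussian weighting, trilinear interpolation between bins, rotation to the dominant orientation, and clipping with renormalization.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Pooling sketch: 16x16 patch -> 4x4 sub-patches x 8 orientation
    bins = 128-D vector (no Gaussian weighting or interpolation)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                                   # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)              # orientation in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * 8).astype(int), 7)
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):                                  # accumulate gradient
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]    # magnitude per bin
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)             # unit-normalize

d = sift_like_descriptor(np.random.rand(16, 16))
print(d.shape)  # (128,)
```

Because gradients are pooled within each sub-patch, a pixel can shift anywhere inside its 4x4 cell without changing the descriptor, which is the robustness-to-small-shifts property mentioned above.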
Feature matching
• Generating putative matches: for each patch in one
image, find a short list of patches in the other image
that could match it based solely on appearance
Source: S. Lazebnik
Problem: Ambiguous putative matches
Source: Y. Furukawa
Source: S. Lazebnik
Rejection of unreliable matches
• How can we tell which putative matches are more reliable?
• Heuristic: compare distance of nearest neighbor to that of
second nearest neighbor
• Ratio of closest distance to second-closest distance will be high
for features that are not distinctive
– A threshold of 0.8 on the ratio provides good separation
David G. Lowe. "Distinctive image features from scale-invariant keypoints.” IJCV 60
(2), pp. 91-110, 2004.
Source: S. Lazebnik
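The ratio test can be sketched as follows (`match_ratio_test` is a hypothetical helper; a real implementation would use a k-d tree or approximate nearest-neighbour search instead of brute force):

```python
import numpy as np

def match_ratio_test(desc1, desc2, ratio=0.8):
    """For each descriptor in desc1, accept its nearest neighbour in
    desc2 only if d1/d2 < ratio (Lowe's second-nearest-neighbour test)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)   # distances to all of desc2
        j1, j2 = np.argsort(dists)[:2]              # nearest and second nearest
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))                 # distinctive enough: keep
    return matches
```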
RANSAC
• The set of putative matches contains a very high
percentage of outliers
RANSAC loop:
1. Randomly select a seed group of matches
2. Compute transformation from seed group
3. Find inliers to this transformation
4. If the number of inliers is sufficiently large, re-compute
least-squares estimate of transformation on all of the
inliers
Keep the transformation with the largest number of inliers
Source: S. Lazebnik
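The RANSAC loop above, specialized to the translation example that follows, where a single match suffices as the seed group (`ransac_translation` is a hypothetical helper):

```python
import numpy as np

def ransac_translation(src, dst, n_iters=100, tol=3.0, seed=0):
    """RANSAC for a pure 2D translation: one match hypothesizes t.
    src, dst: Nx2 arrays of putative matches (may contain many outliers)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        i = rng.integers(len(src))           # 1. randomly select a seed match
        t = dst[i] - src[i]                  # 2. compute transformation from seed
        inliers = np.linalg.norm(src + t - dst, axis=1) < tol   # 3. find inliers
        if inliers.sum() > best.sum():
            best = inliers                   # keep hypothesis with most inliers
    # 4. re-estimate on all inliers of the best hypothesis
    # (least squares for a translation = mean displacement)
    return (dst[best] - src[best]).mean(axis=0), best
```

For affine or homography models, step 2 would draw 3 or 4 matches and call the corresponding fitting routine instead of a single subtraction.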
RANSAC example: Translation
Putative matches
Source: S. Lazebnik
RANSAC example: Translation
Select one match, count inliers
Source: S. Lazebnik
RANSAC example: Translation
Select translation with the most inliers
Object Instance Recognition
Slide credit: David Lowe
Source: D. Hoiem
Object Instance Recognition
1. Match keypoints to the object model
2. Solve for affine transformation parameters
3. Score by number of inliers; choose the hypothesis with the
   maximum score above a threshold
Source: D. Hoiem
Overview of Keypoint Matching
1. Find a set of distinctive keypoints
2. Define a region around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor from the normalized region
5. Match local descriptors, e.g. accept a match if $d(f_A, f_B) < T$
K. Grauman, B. Leibe
Source: D. Hoiem
Finding the objects (overview)
(Figure: input image matched against a stored database image)
1. Match interest points from input image to database image
2. Matched points vote for rough position/orientation/scale of
object
3. Find position/orientation/scales that have at least three
votes
4. Compute affine registration and matches using iterative
least squares with outlier check
5. Report object if there are at least T matched points
Source: D. Hoiem
Matching Keypoints
Want to match keypoints between:
1. Query image
2. Stored image containing the object
Given descriptor x0, find two nearest neighbors x1, x2 with
distances d1, d2
x1 matches x0 if d1/d2 < 0.8
• In Lowe's study, this eliminates about 90% of false matches
  while discarding only 5% of true matches
Source: D. Hoiem
Affine Object Model
Accounts for 3D rotation of a surface under orthographic
projection
Source: D. Hoiem
Affine Object Model
$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$
Stacking the equations from all matched keypoints:
$\begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_1 & y_1 & 1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 \\ \vdots & & & & & \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix} = \begin{bmatrix} x'_1 \\ y'_1 \\ x'_2 \\ \vdots \end{bmatrix}$
Source: D. Hoiem
Finding the objects (in detail)
1. Match interest points from input image to database image
2. Get location/scale/orientation using Hough voting
• In training, each point has known position/scale/orientation wrt
whole object
• Matched points vote for the position, scale, and orientation of the
entire object
• Bins for x, y, scale, orientation
– Wide bins (0.25 object length in position, 2x scale, 30 degrees
orientation)
– Vote for two closest bin centers in each direction (16 votes
total)
3. Geometric verification
• For each bin with at least 3 keypoints
• Iterate between least squares fit and checking for inliers and
outliers
4. Report object if > T inliers (T is typically 3, can be computed to match
some probabilistic threshold)
Examples of recognized objects
Slide credit: David Lowe
Location Recognition
Training
[Lowe04]
Slide credit: David Lowe
Key concepts
Alignment as robust fitting
• Affine transformations
• Homographies
• Descriptor-based feature
matching
• RANSAC
Object instance recognition
• Find keypoints, compute
descriptors
• Match descriptors
• Vote for / fit affine parameters
• Return object if # inliers > T
Adapted from D. Hoiem