Local features: main components
1) Detection: Identify the interest points
2) Description: Extract a vector feature descriptor surrounding each interest point: x1 = [x1^(1), …, xd^(1)]
3) Matching: Determine correspondence between descriptors in two views: x2 = [x1^(2), …, xd^(2)]
Image transformations
• Geometric
  – Rotation
  – Scale
• Photometric
  – Intensity change
Invariance and equivariance
• We want corner locations to be invariant to photometric transformations and
equivariant to geometric transformations
– Invariance: image is transformed and corner locations do not change
– Equivariance: if we have two transformed versions of the same image,
features should be detected in corresponding locations
– (Sometimes “invariant” and “equivariant” are both referred to as “invariant”)
– (Sometimes “equivariant” is called “covariant”)
Harris detector: Invariance properties
-- Image translation
• Derivatives and window function are equivariant
Corner location is equivariant w.r.t. translation
Harris detector: Invariance properties
-- Image rotation
Second moment ellipse rotates but its shape (i.e.
eigenvalues) remains the same
Corner location is equivariant w.r.t. image rotation
Harris detector: Invariance properties
-- Affine intensity change: I → a I + b
• Only derivatives are used =>
  invariance to intensity shift I → I + b
• Intensity scaling: I → a I
(figure: Harris response R vs. x (image coordinate) before and after scaling; a fixed threshold can pass different points once responses are rescaled)
Partially invariant to affine intensity change
Harris Detector: Invariance Properties
• Scaling
(figure: a corner, zoomed in)
All points will be classified as edges
Neither invariant nor equivariant to scaling
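The Harris response behind these invariance properties can be sketched in a few lines. This is a hedged illustration rather than the lecture's reference implementation: it assumes NumPy, central-difference gradients, and a simple box window in place of the usual Gaussian weighting.

```python
import numpy as np

def harris_response(img, k=0.05, win=2):
    # Image gradients via central differences.
    Ix = np.zeros_like(img); Iy = np.zeros_like(img)
    Ix[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0
    Iy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0
    # Second-moment matrix entries, summed over a (2*win+1)^2 box window.
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    R = np.zeros_like(img)
    h, w = img.shape
    for yy in range(win, h - win):
        for xx in range(win, w - win):
            sl = (slice(yy - win, yy + win + 1), slice(xx - win, xx + win + 1))
            a, b, c = Ixx[sl].sum(), Iyy[sl].sum(), Ixy[sl].sum()
            det, tr = a * b - c * c, a + b
            R[yy, xx] = det - k * tr * tr  # Harris: det(M) - k * trace(M)^2
    return R

# A white square on black background: corners should score highest.
img = np.zeros((20, 20)); img[5:15, 5:15] = 1.0
R = harris_response(img)
y, x = np.unravel_index(np.argmax(R), R.shape)
```

On this toy image the strongest response lands near a corner of the square, while edge midpoints get a negative score.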
Scale invariant detection
Suppose you’re looking for corners
Key idea: find scale that gives local maximum of f
– in both position and scale
– One definition of f: the Harris operator
Lindeberg et al., 1996
Slide from Tinne Tuytelaars
Gaussian pyramid
Image by cmglee, CC BY-SA 3.0
Implementation
• Instead of computing f for larger and larger windows, we can implement using a fixed window size with a Gaussian pyramid (sometimes we need to create in-between levels, e.g. a ¾-size image)
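A minimal sketch of this Gaussian-pyramid implementation, assuming NumPy. The separable blur and factor-2 subsampling are the standard choices; the kernel width and border handling here are illustrative.

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    # Separable Gaussian blur via 1D convolutions ('same' mode, so the
    # borders darken slightly -- acceptable for a sketch).
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2)); k /= k.sum()
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)

def gaussian_pyramid(img, levels=4):
    # Each level: blur, then keep every other pixel (factor-2 downsample).
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(gaussian_blur(pyr[-1])[::2, ::2])
    return pyr

img = np.random.default_rng(0).random((64, 64))
pyr = gaussian_pyramid(img)
print([p.shape for p in pyr])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```

Running a fixed-size detector window over each level then corresponds to larger and larger windows in the original image.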
Another common definition of f
• The Laplacian of Gaussian (LoG):

  ∇²g = ∂²g/∂x² + ∂²g/∂y²

  (very similar to a Difference of Gaussians (DoG) – i.e. a Gaussian minus a slightly smaller Gaussian)
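The LoG ≈ DoG claim is easy to check numerically. A 1D sketch assuming NumPy; the scale ratio k = 1.2 is an illustrative choice for the "slightly smaller" Gaussian.

```python
import numpy as np

def gauss(x, s):
    # 1D Gaussian with standard deviation s.
    return np.exp(-x**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

sigma, k = 2.0, 1.2
x = np.arange(-15.0, 16.0)

# 1D Laplacian of Gaussian (second derivative of the Gaussian).
log_1d = (x**2 - sigma**2) / sigma**4 * gauss(x, sigma)
# Difference of Gaussians: a Gaussian minus a slightly smaller one.
dog_1d = gauss(x, k * sigma) - gauss(x, sigma)

# Up to an overall scale factor the two curves nearly coincide.
corr = np.corrcoef(log_1d, dog_1d)[0, 1]
```

The correlation between the two kernels comes out very close to 1, which is why DoG is used as a cheap LoG approximation in practice.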
Laplacian of Gaussian
• “Blob” detector
  (figure: image * LoG = response; blob centers appear as minima and maxima of the response)
• Find maxima and minima of LoG operator in
space and scale
Scale selection
• At what scale does the Laplacian achieve a
maximum response for a binary circle of
radius r?
(figure: the binary circle image and its Laplacian response)
Characteristic scale
• We define the characteristic scale as the scale
that produces peak of Laplacian response
characteristic scale
T. Lindeberg (1998). "Feature detection with automatic scale selection."
International Journal of Computer Vision 30 (2): pp 77--116.
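For a binary circle of radius r, the standard answer (Lindeberg) is that the scale-normalized Laplacian response peaks at σ = r/√2, and this can be reproduced numerically. A sketch assuming NumPy; it evaluates the normalized LoG response at the disk center by summing an analytic LoG kernel over the disk's pixels.

```python
import numpy as np

def log_kernel(sigma, size):
    # Analytic Laplacian-of-Gaussian values on an integer grid.
    ax = np.arange(-size, size + 1)
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    return (r2 - 2 * sigma**2) / (2 * np.pi * sigma**6) * np.exp(-r2 / (2 * sigma**2))

def disk_response(radius, sigma):
    # Scale-normalized LoG response at the center of a binary disk:
    # sum the kernel over the pixels inside the disk, times sigma^2.
    size = int(3 * sigma + radius)
    ax = np.arange(-size, size + 1)
    x, y = np.meshgrid(ax, ax)
    inside = x**2 + y**2 <= radius**2
    return sigma**2 * log_kernel(sigma, size)[inside].sum()

radius = 8.0
sigmas = np.arange(2.0, 10.01, 0.25)
best = sigmas[np.argmax([abs(disk_response(radius, s)) for s in sigmas])]
print(best, radius / np.sqrt(2))  # peak close to r / sqrt(2)
```

The response is negative for a bright disk on dark background (the LoG kernel is negative at its center), so the peak is found on the magnitude.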
Find local maxima in 3D position-scale space
• Compute the scale-normalized Laplacian Lxx(σ) + Lyy(σ) at a range of scales σ
• Keep points that are local maxima in both position and scale → list of (x, y, s)
K. Grauman, B. Leibe
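Finding local maxima in position-scale space reduces to a 26-neighbor comparison in a 3D response stack. A brute-force sketch, assuming NumPy and a precomputed stack of Laplacian responses (one slice per scale):

```python
import numpy as np

def local_maxima_3d(stack, threshold=0.0):
    # stack: responses indexed by (scale, y, x). A point is kept if it
    # exceeds the threshold and all 26 neighbors in position and scale.
    S, H, W = stack.shape
    keypoints = []
    for s in range(1, S - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                v = stack[s, y, x]
                nb = stack[s-1:s+2, y-1:y+2, x-1:x+2]
                if v > threshold and v >= nb.max() and (nb == v).sum() == 1:
                    keypoints.append((x, y, s))
    return keypoints

rng = np.random.default_rng(1)
stack = rng.random((4, 10, 10)) * 0.1
stack[2, 5, 5] = 1.0  # plant one strong response
print(local_maxima_3d(stack, threshold=0.5))  # [(5, 5, 2)]
```

Real implementations vectorize this scan and refine each (x, y, s) with sub-pixel interpolation, but the neighborhood test is the core idea.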
Note: The LoG and DoG operators are both rotation equivariant (covariant)
Local features: main components
1) Detection: Identify the interest points
2) Description: Extract a vector feature descriptor surrounding each interest point: x1 = [x1^(1), …, xd^(1)]
3) Matching: Determine correspondence between descriptors in two views: x2 = [x1^(2), …, xd^(2)]
Kristen Grauman
Feature descriptors
We know how to detect good points
Next question: How to match them?
Answer: Come up with a descriptor for each point,
find similar descriptors between the two images
Feature descriptors
We know how to detect good points
Next question: How to match them?
Lots of possibilities
– Simple option: match square windows around the point
– State of the art approach: SIFT
• David Lowe, UBC http://www.cs.ubc.ca/~lowe/keypoints/
Invariance vs. discriminability
• Invariance:
– Descriptor shouldn’t change even if image is
transformed
• Discriminability:
– Descriptor should be highly unique for each point
Rotation invariance for
feature descriptors
• Find dominant orientation of the image patch
– E.g., given by x_max, the eigenvector of H corresponding to λ_max (the larger eigenvalue)
– Or simply the orientation of the (smoothed) gradient
– Rotate the patch according to this angle
Figure by Matthew Brown
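The "orientation of the (smoothed) gradient" option can be sketched directly, assuming NumPy. Note that averaging gradient vectors over a patch can cancel out; SIFT instead takes the peak of an orientation histogram.

```python
import numpy as np

def dominant_orientation(patch):
    # Angle (radians) of the mean gradient over the patch.
    gy, gx = np.gradient(patch)
    return np.arctan2(gy.sum(), gx.sum())

# Intensity ramp increasing along +x: dominant orientation ~ 0 radians.
ramp = np.tile(np.arange(10.0), (10, 1))
print(dominant_orientation(ramp))  # 0.0
```

Rotating the patch by this angle (the next step on the slide) makes the descriptor rotation invariant.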
Multiscale Oriented PatcheS descriptor
Take 40x40 square window around detected feature
– Scale to 1/5 size (using prefiltering)
– Rotate to horizontal
– Sample 8x8 square window centered at feature
– Intensity normalize the window by subtracting the mean, dividing by the standard deviation in the window
CSE 576: Computer Vision
Adapted from slide by Matthew Brown
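A simplified, axis-aligned sketch of the sampling and normalization steps above (the rotation step is omitted), assuming NumPy; the 5-pixel sample spacing mirrors the 40x40 → 8x8 reduction.

```python
import numpy as np

def mops_like_descriptor(img, y, x, spacing=5):
    # Axis-aligned sketch of the MOPS idea (no rotation step): sample an
    # 8x8 grid of pixels spaced 'spacing' apart around (y, x), then
    # normalize to zero mean and unit standard deviation.
    offs = (np.arange(8) - 3.5) * spacing
    ys = (y + offs).astype(int)[:, None]
    xs = (x + offs).astype(int)[None, :]
    patch = img[ys, xs].astype(float)
    return (patch - patch.mean()) / patch.std()

img = np.random.default_rng(2).random((100, 100))
d = mops_like_descriptor(img, 50, 50)
print(d.shape)  # (8, 8)
```

The mean/std normalization is what buys the affine-intensity invariance discussed earlier: adding a constant or rescaling the intensities leaves the descriptor unchanged.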
Detections at multiple scales
Scale Invariant Feature Transform
Basic idea:
• Take 16x16 square window around detected feature
• Compute edge orientation (angle of the gradient minus 90°) for each pixel
• Throw out weak edges (threshold gradient magnitude)
• Create histogram of surviving edge orientations
(figure: angle histogram over [0, 2π])
Adapted from slide by David Lowe
SIFT descriptor
Full version
• Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)
• Compute an orientation histogram for each cell
• 16 cells * 8 orientations = 128 dimensional descriptor
Adapted from slide by David Lowe
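The cell-histogram construction can be sketched as follows, assuming NumPy. This toy version skips SIFT's Gaussian weighting, trilinear interpolation, and descriptor normalization, and bins raw gradient angles rather than edge orientation (gradient angle minus 90°).

```python
import numpy as np

def sift_like_descriptor(patch):
    # patch: 16x16. One 8-bin gradient-orientation histogram per 4x4
    # cell, over a 4x4 grid of cells -> 4*4*8 = 128-dim vector.
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    bins = (ang / (2 * np.pi) * 8).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            sl = (slice(4 * cy, 4 * cy + 4), slice(4 * cx, 4 * cx + 4))
            for b, m in zip(bins[sl].ravel(), mag[sl].ravel()):
                desc[cy, cx, b] += m  # magnitude-weighted vote
    return desc.ravel()

patch = np.random.default_rng(3).random((16, 16))
print(sift_like_descriptor(patch).shape)  # (128,)
```

Thresholding the magnitudes before voting would implement the "throw out weak edges" step from the previous slide.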
Properties of SIFT
Extraordinarily robust matching technique
– Can handle changes in viewpoint
• Up to about 60 degrees of out-of-plane rotation
– Can handle significant changes in illumination
• Sometimes even day vs. night (below)
– Fast and efficient—can run in real time
– Lots of code available
• http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT
Feature matching
Given a feature in I1, how to find the best match
in I2?
1. Define distance function that compares two
descriptors
2. Test all the features in I2, find the one with min
distance
Feature distance
How to define the difference between two features f1, f2?
– Simple approach: L2 distance, ||f1 - f2 ||
– can give small distances for ambiguous (incorrect) matches
(figure: patch f1 in image I1 and a similar-looking but incorrect patch f2 in image I2)
Feature distance
How to define the difference between two features f1, f2?
• Better approach: ratio distance = ||f1 - f2 || / || f1 - f2’ ||
• f2 is best SSD match to f1 in I2
• f2’ is 2nd best SSD match to f1 in I2
• gives large values for ambiguous matches
(figure: f1 in image I1; best match f2 and second-best match f2' in image I2)
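The ratio test above (Lowe's ratio test) can be sketched as follows, assuming NumPy; the 0.8 acceptance threshold is an illustrative default.

```python
import numpy as np

def match_ratio(desc1, desc2, max_ratio=0.8):
    # For each descriptor in desc1, find its two nearest neighbors in
    # desc2 (L2 distance) and accept the match only if the ratio
    # best / second-best is below max_ratio.
    matches = []
    for i, f in enumerate(desc1):
        d = np.linalg.norm(desc2 - f, axis=1)
        j, j2 = np.argsort(d)[:2]
        if d[j] < max_ratio * d[j2]:
            matches.append((i, j))
    return matches

rng = np.random.default_rng(4)
desc2 = rng.random((20, 128))
desc1 = desc2[:5] + 0.01 * rng.random((5, 128))  # noisy copies of 5 descriptors
print(match_ratio(desc1, desc2))  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
```

An ambiguous feature has a second-best match almost as close as the best one, so its ratio approaches 1 and the match is rejected.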
Feature distance
• Does the SSD vs “ratio distance” change the
best match to a given feature in image 1?
Feature matching example
58 matches (thresholded by ratio score)
We’ll deal with
outliers later
Feature matching example
51 matches (thresholded by ratio score)
Evaluating the results
How can we measure the performance of a feature matcher?
(figure: candidate matches plotted along a feature-distance axis, at distances 50, 75, and 200)
True/false positives
How can we measure the performance of a feature matcher?
(figure: the match at distance 50 is a true match; the match at distance 200 is a false match)
The distance threshold affects performance
– True positives = # of detected matches that are correct
• Suppose we want to maximize these—how to choose threshold?
– False positives = # of detected matches that are incorrect
• Suppose we want to minimize these—how to choose threshold?
Evaluating the results
How can we measure the performance of a feature matcher?
true positive rate (recall) = # true positives / # matching features (positives)

false positive rate = # false positives / # unmatched features (negatives)

(figure: example operating point at true positive rate 0.7, false positive rate 0.1)
Evaluating the results
How can we measure the performance of a feature matcher?
ROC curve (“Receiver Operating Characteristic”)

true positive rate (recall) = # true positives / # matching features (positives)

false positive rate (1 - specificity) = # false positives / # unmatched features (negatives)

Single number: Area Under the Curve (AUC); e.g. AUC = 0.87 (1 is the best)

(figure: ROC curve passing through the operating point (0.1, 0.7))
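The ROC construction and AUC can be computed directly from a list of match distances and ground-truth labels. A sketch assuming NumPy; the true/false labels attached to the example distances (50, 75, 200, …) are hypothetical.

```python
import numpy as np

def roc_points(distances, is_correct):
    # Sweep the distance threshold from small to large; matches with
    # distance below the threshold count as detections.
    order = np.argsort(distances)
    correct = np.asarray(is_correct)[order]
    tpr = np.cumsum(correct) / correct.sum()      # true positive rate (recall)
    fpr = np.cumsum(~correct) / (~correct).sum()  # false positive rate
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])

def auc(fpr, tpr):
    # Trapezoidal area under the ROC curve; 1.0 is a perfect matcher.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Hypothetical ground-truth labels for a handful of match distances.
dists = np.array([50, 75, 200, 120, 90])
good = np.array([True, True, False, False, True])
f, t = roc_points(dists, good)
print(auc(f, t))  # 1.0 -- every true match scores below every false one
```

Sliding the threshold traces out the curve; a matcher whose true matches all have smaller distance than its false matches reaches AUC = 1.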