Computer Vision Course
School of Information and Communication Technology

Computer Vision
Chapter 4: Feature detection and Image matching
(Part 2: Feature extraction and matching)
Computer Vision Group
School of Information and Communications Technology


Plan

• Edge detection
• General approach
• Image gradient, Canny detector, Laplacian
• Line detection from contour points (Hough transform)
• Feature Extraction
• Global features
• Local features
• Image matching and Applications


Feature extraction: what and why?

A ball? An official match ball for Qatar 2022?

Matrix of integers:
- Redundant information
- Many discrete numbers
→ Decision model

A (compact) representation that describes the object/image well and
that is used to distinguish one object from another
(called Features)


Feature extraction: what and why?

Can we GO or STOP? → Useful feature: color
Player vs Ball? → Useful feature: shape

https://www.theguardian.com/


Good feature?

• A good feature will help us recognize an object in all the ways it may appear
• Identifiable
• Easily tracked and compared
• Consistent across different scales, lighting conditions, and viewing angles
• Still visible in noisy images or when only part of an object is visible


Local features vs Global features

• Two types of features are extracted from the image: local and global features (descriptors)
• Global features:
  • Describe the image as a whole to generalize the entire object
  • Include contour representations, shape descriptors, and texture features
  • Examples: Invariant Moments (Hu, Zernike), Histogram of Oriented Gradients (HOG), PHOG, Co-HOG, …
• Local features:
  • Describe the image patches (key points in the image) of an object
  • Represent the texture/color in an image patch
  • Examples: SIFT, SURF, LBP, BRISK, MSER, FREAK, …


Feature extraction

• Global features
• Color / Shape / Texture
• Local features


Global features?

How to distinguish these objects?


Types of features

• Contour representation, shape features
• Color descriptors
• Texture features


Color features

• Histogram (e.g., a 256-bin vs. a 16-bin intensity histogram)


Distance / Similarity

• L1 or L2 (Euclidean) distances are often used

$d_{L1}(H, G) = \sum_{i=1}^{N} |h_i - g_i|$

• Histogram intersection

$\cap(H, G) = \dfrac{\sum_i \min(h_i, g_i)}{\sum_i g_i}$
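A minimal sketch of both comparisons on 16-bin intensity histograms (the filenames image.png and other.png are placeholders):

```python
import cv2
import numpy as np

# the two image files are placeholder names
h = cv2.calcHist([cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)],
                 [0], None, [16], [0, 256]).ravel()
g = cv2.calcHist([cv2.imread("other.png", cv2.IMREAD_GRAYSCALE)],
                 [0], None, [16], [0, 256]).ravel()

d_l1 = np.sum(np.abs(h - g))                         # L1 distance
d_l2 = np.sqrt(np.sum((h - g) ** 2))                 # L2 (Euclidean) distance
intersection = np.sum(np.minimum(h, g)) / np.sum(g)  # histogram intersection
```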


Advantages of histogram

• Invariant to basic geometric transformations:
  • Rotation
  • Translation
  • Zoom (scaling)


Some inconveniences

• The similarity between colors in adjacent bins is not taken into account
• The spatial distribution of pixel values is not considered: two different images may have the same histogram


Some inconveniences

• Background effect: d(I1, I2) vs. d(I1, I3)? (three example images I1, I2, I3)
• Color representation dependency (color space), device dependency, …


Texture features

• A texture can be defined as:
  • a region with variation of intensity
  • a spatial organization of pixels


Texture features

• There are several methods for analyzing textures:
  • First-order statistics: statistics on the histogram
  • Co-occurrence matrices: searching for patterns
  • Frequency analysis: Gabor filters
  • …
• The most difficult part is finding a good representation (good parameters) for each texture


Texture features

Texture features (taxonomy):
• Statistical: general statistics parameters, Haralick's co-occurrence matrices, Tamura features
• Spectral: PWT, TWT, DCT, DST, DHT, complex wavelets, Gabor filters, ICA filters
• Model-based: Markov random fields, fractals
• Geometrical: Voronoi tessellation features, structural methods


First order statistics

• Histogram-based: mean, variance, skewness, kurtosis, energy, entropy, …
• Given an image with n pixels:
  • h(i) represents the number of occurrences of the i-th gray level
  • L-1: maximum gray level
  • P(i): normalized histogram


First order statistics

Mean (image brightness): $\mu = \sum_{i=0}^{L-1} i \cdot P(i)$

Variance (deviation of gray levels from the mean): $\sigma^2 = \sum_{i=0}^{L-1} (i - \mu)^2 \cdot P(i)$

Skewness (degree of histogram asymmetry around the mean): $s = \frac{1}{\sigma^3} \sum_{i=0}^{L-1} (i - \mu)^3 \cdot P(i)$

Kurtosis (histogram sharpness): $k = \frac{1}{\sigma^4} \sum_{i=0}^{L-1} (i - \mu)^4 \cdot P(i) - 3$

Entropy (randomness): $H = -\sum_{i=0}^{L-1} P(i) \log P(i)$

Energy (smoothness): $E = \sum_{i=0}^{L-1} P(i)^2$
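A small sketch of these statistics, assuming an 8-bit grayscale image held in a NumPy array:

```python
import numpy as np

def first_order_stats(gray, L=256):
    """First-order texture statistics from the normalized histogram P(i)."""
    h, _ = np.histogram(gray, bins=L, range=(0, L))
    P = h / h.sum()                                # normalized histogram P(i)
    i = np.arange(L)
    mu = np.sum(i * P)                             # mean (brightness)
    var = np.sum((i - mu) ** 2 * P)                # variance
    sigma = np.sqrt(var)
    skew = np.sum((i - mu) ** 3 * P) / sigma ** 3  # asymmetry around the mean
    kurt = np.sum((i - mu) ** 4 * P) / sigma ** 4 - 3   # sharpness
    entropy = -np.sum(P[P > 0] * np.log(P[P > 0]))      # randomness
    energy = np.sum(P ** 2)                             # smoothness
    return mu, var, skew, kurt, entropy, energy
```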

First order statistics

• Skewness vs. kurtosis (figure: red curve is a normal distribution with mean = 5, variance = 4)


GLCM (Gray Level Co-occurrence Matrices)


• Features generated from first-order statistics:
  • Provide information related to the gray-level distribution of the image
  • But do not give any information about the relative positions of the various gray levels within the image
⇒ Gray Level Co-occurrence Matrices measure second-order image statistics


GLCM (Gray Level Co-occurrence Matrices)


• GLCM describes how frequently two pixels with gray levels ci, cj appear in the image, separated by a distance d in direction β
• Haralick features
• Matrix of size L x L
  • L is the number of gray levels in the image (e.g., 256)
  • We often reduce that number, giving an 8x8, 16x16 or 32x32 matrix
• Many matrices, one for each distance and direction:
  • Relative distance (in pixel numbers) d: 1, 2, 3 (, 4, …)
  • Relative direction β: 0°, 45°, 90°, 135° (, …)
• Processing time can be very long


GLCM (Gray Level Co-occurrence Matrices)


$CM_{d,\beta}(c_i, c_j) = \dfrac{\mathrm{card}(\{(p_1, p_2) \mid I(p_1) = c_i, I(p_2) = c_j, N_{d,\beta}(p_1, p_2) = \mathrm{true}\})}{\mathrm{card}(\{(p_1, p_2) \mid N_{d,\beta}(p_1, p_2) = \mathrm{true}\})}$

• Numerator: the number of pairs of pixels (p1, p2) where p2 is a neighbor of p1, p1 has gray level ci and p2 has gray level cj
• Denominator: the number of all pairs (p1, p2) where p2 is a neighbor of p1 (used to normalize the GLCM)
• card(·): the cardinality of a set
• $N_{d,\beta}(p_1, p_2) = \mathrm{true}$: pixel p2 is a neighbor of pixel p1 at distance d in direction β


GLCM (Gray Level Co-occurrence Matrices)


Example on how to compute these matrices:

Image:
1 4 4 3
4 2 3 2
1 2 1 4
1 2 2 3

Matrix for distance = 1 and direction = 0° (rows: ci, the gray level of p1; columns: cj, the gray level of p2), to be filled in:
     1  2  3  4
1    ?  ?  ?  ?
2    ?  ?  ?  ?
3    ?  ?  ?  ?
4    ?  ?  ?  ?

We loop over the image and, for each pair of pixels (p1, p2) at the given distance and orientation, we increment the corresponding cell of the co-occurrence matrix.


GLCM (Gray Level Co-occurrence Matrices)


Example on how to compute these matrices: the first pair of neighboring pixels, (1, 4)

Image:            Matrix for distance = 1 and direction = 0°:
1 4 4 3                1  2  3  4
4 2 3 2           1    0  0  0  1
1 2 1 4           2    0  0  0  0
1 2 2 3           3    0  0  0  0
                  4    0  0  0  0


GLCM (Gray Level Co-occurrence Matrices)


Example on how to compute these matrices: after also counting the pair (4, 4)

Image:            Matrix for distance = 1 and direction = 0°:
1 4 4 3                1  2  3  4
4 2 3 2           1    0  0  0  1
1 2 1 4           2    0  0  0  0
1 2 2 3           3    0  0  0  0
                  4    0  0  0  1


GLCM (Gray Level Co-occurrence Matrices)


Example on how to compute these matrices (final):

Image:        Matrix for distance = 1        Matrix for distance = 1
              and direction = 0°:            and direction = 45°:
1 4 4 3            1  2  3  4                     1  2  3  4
4 2 3 2       1    0  2  0  2                1    0  2  1  0
1 2 1 4       2    1  1  2  0                2    1  1  0  0
1 2 2 3       3    0  1  0  0                3    0  0  0  1
              4    0  1  1  1                4    0  2  1  0

…and so on for each matrix (several matrices at the end)
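A direct (unnormalized) sketch of this counting loop; the offset (dx, dy) = (1, 0) reproduces the 0° matrix above, and (1, 1) the 45° one (45° here corresponds to the down-right diagonal in array coordinates):

```python
import numpy as np

def glcm(img, levels, dx, dy):
    """Count co-occurrences of gray levels (ci, cj) for the offset (dx, dy)."""
    cm = np.zeros((levels, levels), dtype=np.int32)
    rows, cols = img.shape
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                cm[img[y, x] - 1, img[y2, x2] - 1] += 1  # gray levels 1..L
    return cm

img = np.array([[1, 4, 4, 3],
                [4, 2, 3, 2],
                [1, 2, 1, 4],
                [1, 2, 2, 3]])
print(glcm(img, 4, dx=1, dy=0))  # distance 1, direction 0°, as on the slide
```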


GLCM (Gray Level Co-occurrence Matrices)


• Most important/popular parameters computed from the GLCM:

Angular Second Moment (ASM), minimal when all elements are equal:
$ASM = \sum_i \sum_j CM^2(i, j)$

Entropy, a measure of chaos, maximal when all elements are equal:
$Entropy = -\sum_i \sum_j CM(i, j) \log(CM(i, j))$

Contrast, small when the big elements are near the main diagonal:
$Contrast = \sum_i \sum_j (i - j)^2 \, CM(i, j)$

Inverse difference moment (IDM), small when the big elements are far from the main diagonal:
$IDM = \sum_i \sum_j \dfrac{CM(i, j)}{1 + (i - j)^2}$
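A minimal sketch of these four parameters, assuming cm is an (unnormalized) co-occurrence matrix such as the one computed above:

```python
import numpy as np

def glcm_features(cm):
    """ASM, entropy, contrast and IDM from a co-occurrence matrix."""
    CM = cm / cm.sum()                                  # normalize first
    i, j = np.indices(CM.shape)
    asm = np.sum(CM ** 2)                               # angular second moment
    entropy = -np.sum(CM[CM > 0] * np.log(CM[CM > 0]))  # measure of chaos
    contrast = np.sum((i - j) ** 2 * CM)                # weights off-diagonal cells
    idm = np.sum(CM / (1 + (i - j) ** 2))               # inverse difference moment
    return asm, entropy, contrast, idm
```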


GLCM (Gray Level Co-occurrence Matrices)


• Haralick features:
  • For each GLCM, we can compute up to 14 parameters characterizing the texture, the most important being: mean, variance, energy, inertia, entropy, inverse difference moment
• Ref: http://haralick.org/journals/TexturalFeatures.pdf


Invariances

• All features are functions of the distance d and the orientation β
• Rotation? Average over all directions
• Scaling? Multi-resolution analysis


Texture features comparison

Source: William Robson Schwartz et al. Evaluation of Feature Descriptors for Texture Classification, 2012 JEI

Shape features

• Contour-based features:
  • Chain coding, polygon approximation, geometric parameters, angular profile, surface, perimeter, …
• Region-based features:
  • Invariant moments, …


Shape features

Shape features (taxonomy):
• Boundary-based methods:
  • Geometrical: perimeter, eccentricity, curvature, axes directionality, …
  • Signatures: centroid distance, complex coordinates, curvature signature, turning angle
  • Signature descriptors: Fourier descriptors, UNL-Fourier, NFD, wavelet descriptors, B-splines
  • Others: chain codes
• Region-based methods:
  • Global: geometrical (area, compactness, Euler number, grid method), moment invariants, Zernike moments, pseudo-Zernike moments
  • Decomposition: triangulation, medial axis transform (skeleton transform)


Examples: angular profile, …


Examples: Image moments

• Moments: $M_{p,q} = \sum_x \sum_y x^p y^q \, I(x, y)$
  • $M_{0,0}$ = area of the region D
  • $M_{0,1}$, $M_{1,0}$ = centroid of D
• Central moments: invariant to translation


Invariant moments (Hu's moments)

Hu's seven moments are invariant to translation, scale, and rotation, and mostly to reflection; the last one changes sign under image reflection.
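These invariants are available directly in OpenCV; a short sketch (shape.png is a placeholder for a binary shape image):

```python
import cv2
import numpy as np

shape = cv2.imread("shape.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename
m = cv2.moments(shape)        # raw, central and normalized central moments
hu = cv2.HuMoments(m)         # the seven Hu invariant moments (7x1 array)
# log-scale them: their magnitudes span many orders of magnitude
hu_log = -np.sign(hu) * np.log10(np.abs(hu))
```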


Examples: Hu's moments

6 images and their Hu Moments

https://www.learnopencv.com/wp-content/uploads/2018/12/HuMoments-Shape-Matching.png


Shape Context

https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/shape/sc_digits.html


Examples: HOG

• Idea: the appearance of objects in an image can be described using the distribution of edge directions
• HOG: Dalal, N., and Triggs, B. (2005). Histograms of oriented gradients for human detection, CVPR 2005

Source: LearnOpenCV.com
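A minimal HOG sketch with OpenCV's default person-detection parameters (64x128 window, 8x8 cells, 9 orientation bins); person.png is a placeholder filename:

```python
import cv2

img = cv2.imread("person.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename
img = cv2.resize(img, (64, 128))      # the default HOG detection window
hog = cv2.HOGDescriptor()             # default: 8x8 cells, 9 orientation bins
descriptor = hog.compute(img)         # 3780-dimensional vector for one window
```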


Examples: PHOG

PHOG:
Pyramid Histogram of Oriented Gradients

Source: http://www.robots.ox.ac.uk/~vgg/research/caltech/phog.html


Feature extraction

• Global features
• Local features
• Interest point detector
• Local descriptor


Why local features?

• Image matching: a challenging problem

Source: CS131 - Juan Carlos Niebles and Ranjay Krishna


Image matching

(Example photos of the Trevi Fountain in Rome, by Diva Sian, swashford, and scgbt)
Slide credit: Steve Seitz


Harder Still?

NASA Mars Rover images Slide credit: Steve Seitz


Answer Below (Look for tiny colored squares)

NASA Mars Rover images with SIFT feature matches Slide credit: Steve Seitz
(Figure by Noah Snavely)

Motivation for using local features

• Global representations have major limitations
• Instead, describe and match only local regions
• Increased robustness to:
  • Occlusions
  • Articulation
  • Intra-category variations
Source: CS131 - Juan Carlos Niebles and Ranjay Krishna

Local features and applications

• Partial search
• Object recognition / detection

Sivic and Zisserman, 2003

D. Lowe 2002


Local features and applications

• Panorama:
• Detect feature points in both images
• Find corresponding pairs

[Darya Frolova and Denis Simakov]



Local features and applications

• Panorama:
• Detect feature points in both images
• Find corresponding pairs
• Use these pairs to align images

[Darya Frolova and Denis Simakov]


• Image matching:

Source: Jim Little, Lowe: features, UBC.


Local feature extraction


• Local features: how do we determine image patches / local regions?
  • Dividing into patches with a regular grid: without knowledge about image content
  • Keypoint detection or image segmentation: based on the content of the image


Common Requirements

• Problem 1: detect the same point independently in both images
  • Otherwise there is no chance to match!
  • We need a repeatable detector!
• Problem 2: for each point, correctly recognize the corresponding one
  • We need a reliable and distinctive descriptor!


Slide credit: Darya Frolova, Denis Simakov

Requirements: Summary

• Region extraction needs to be repeatable and accurate
  • Invariant to translation, rotation, scale changes
  • Robust or covariant to out-of-plane (≈ affine) transformations
  • Robust to lighting variations, noise, blur, quantization
• Locality: features are local, therefore robust to occlusion and clutter
• Quantity: we need a sufficient number of regions to cover the object
• Distinctiveness: the regions should contain "interesting" structure
• Efficiency: close to real-time performance


Main questions

• Where will the interest points come from?
  • What are salient features that we'll detect in multiple views?
• How to describe a local region?
• How to establish correspondences, i.e., compute matches?


Feature extraction

• Global features
• Local features
• Interest point detector
• Local descriptor
• Matching


Interest points: why and where?

Yarbus eye tracking


Source: Derek Hoiem, Computer Vision, University of Illinois.


Keypoint Localization

• Goals:
• Repeatable detection
• Precise localization
• Interesting content
⇒ Look for two-dimensional signal changes

Slide credit: Bastian Leibe


Finding Corners

• Key property:
• In the region around a corner, image gradient has two or
more dominant directions
• Corners are repeatable and distinctive

C. Harris and M. Stephens. "A Combined Corner and Edge Detector." Proceedings of the 4th Alvey Vision Conference, 1988.
Slide credit: Svetlana Lazebnik


Corners as distinctive interest points


• Design criteria
  • We should easily recognize the point by looking through a small window (locality)
  • Shifting the window in any direction should give a large change in intensity (good localization)

"flat" region: no change in all directions
"edge": no change along the edge direction
"corner": significant change in all directions
Slide credit: Alyosha Efros


Corners versus edges

Corner:  large change in both directions
Edge:    small change along the edge, large change across it
Nothing: small change in both directions


Harris detector formulation

Change of intensity for the shift [u, v]:

$E(u, v) = \sum_{x,y} w(x, y) \, [I(x + u, y + v) - I(x, y)]^2$

with window function w(x, y), shifted intensity I(x + u, y + v), and intensity I(x, y).
The window function w(x, y) is either 1 inside the window and 0 outside, or a Gaussian.


Source: R. Szeliski

Corner Detection by Auto-correlation

Change in appearance of window w(x, y) for the shift [u, v]:

$E(u, v) = \sum_{x,y} w(x, y) \, [I(x + u, y + v) - I(x, y)]^2$

(Figure: image I(x, y), window w(x, y), and error surface E(u, v); the value E(0, 0) is marked)

Corner Detection by Auto-correlation

Change in appearance of window w(x, y) for the shift [u, v]:

$E(u, v) = \sum_{x,y} w(x, y) \, [I(x + u, y + v) - I(x, y)]^2$

(Figure: image I(x, y), window w(x, y), and error surface E(u, v); the value E(3, 2) is marked)


Harris detector formulation

• This measure of change can be approximated by:

$E(u, v) \approx [u \; v] \, M \begin{bmatrix} u \\ v \end{bmatrix}$

where M is a 2×2 matrix computed from image derivatives, summed over the image region (the area we are checking for a corner):

$M = \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$

($I_x I_y$: the gradient with respect to x times the gradient with respect to y)
Slide credit: Rick Szeliski


What does this matrix reveal?

• First, let's consider an axis-aligned corner:

$M = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$

• This means:
  • Dominant gradient directions align with the x or y axis
  • If either λ is close to 0, then this is not a corner, so look for locations where both are large
• What if we have a corner that is not aligned with the image axes?
Slide credit: David Jacobs


General case

• Since M is symmetric, we have the eigenvalue decomposition

$M = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R$

• We can visualize M as an ellipse, with axis lengths determined by the eigenvalues and orientation determined by the rotation matrix R:
  • direction of the fastest change: axis length $(\lambda_{max})^{-1/2}$
  • direction of the slowest change: axis length $(\lambda_{min})^{-1/2}$

Adapted from Darya Frolova, Denis Simakov


Interpreting the eigenvalues

• Classification of image points using the eigenvalues of M:
  • "Corner": λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions
  • "Edge": λ2 >> λ1 (or λ1 >> λ2)
  • "Flat" region: λ1 and λ2 are small; E is almost constant in all directions

Slide credit: Kristen Grauman


Corner response function

$\theta = \det(M) - \alpha \, \mathrm{trace}(M)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2$

• "Corner": θ > 0
• "Edge": θ < 0
• "Flat" region: |θ| small
• Fast approximation: avoids computing the eigenvalues
• α: constant (0.04 to 0.06)

Slide credit: Kristen Grauman


Window Function w(x,y)

• Option 1: uniform window (w(x, y) = 1 in the window, 0 outside)
  • Sum over a square window:

$M = \sum_{x,y} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$

  • Problem: not rotation invariant
• Option 2: smooth with a Gaussian
  • The Gaussian already performs a weighted sum:

$M = g(\sigma) * \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$

  • The result is rotation invariant
Slide credit: Bastian Leibe


Summary: Harris Detector [Harris88]

• Compute the second moment matrix (autocorrelation matrix):
  1. Image derivatives: $I_x$, $I_y$
  2. Squares of derivatives: $I_x^2$, $I_y^2$, $I_x I_y$
  3. Gaussian filter $g(\sigma_I)$: $g(I_x^2)$, $g(I_y^2)$, $g(I_x I_y)$

$M(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$

• 4. Compute the cornerness function (two strong eigenvalues):

$\theta = \det[M(\sigma_I, \sigma_D)] - \alpha \, [\mathrm{trace}(M(\sigma_I, \sigma_D))]^2 = g(I_x^2) \, g(I_y^2) - [g(I_x I_y)]^2 - \alpha \, [g(I_x^2) + g(I_y^2)]^2$

• 5. Perform non-maximum suppression

C. Harris and M. Stephens. "A Combined Corner and Edge Detector." Proceedings of the 4th Alvey Vision Conference: pages 147-151, 1988. Slide credit: Krystian Mikolajczyk
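A minimal sketch of this pipeline with OpenCV's built-in Harris response (image.png is a placeholder filename, and the thresholding stands in for a full non-maximum suppression):

```python
import cv2
import numpy as np

gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename
gray = np.float32(gray)
# blockSize: neighborhood of the sum; ksize: Sobel aperture; k: the constant alpha
theta = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
corners = theta > 0.01 * theta.max()  # keep strong responses, then suppress non-maxima
```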


Harris Detector: Workflow

Slide adapted from Darya Frolova, Denis Simakov



Harris Detector: Workflow

Compute corner responses θ

Slide adapted from Darya Frolova, Denis Simakov



Harris Detector: Workflow

Take points where θ > threshold

Slide adapted from Darya Frolova, Denis Simakov


Harris Detector: Workflow

Take only the local maxima of θ, where θ > threshold

Slide adapted from Darya Frolova, Denis Simakov



Harris Detector: Workflow

Resulting Harris points

Slide adapted from Darya Frolova, Denis Simakov



Harris Detector: Responses [Harris88]

Effect: a very precise corner detector.

Slide credit: Krystian Mikolajczyk


Harris Detector: Properties

• Translation invariance
• Rotation invariance?
  • The ellipse rotates, but its shape (i.e., the eigenvalues) remains the same
  • The corner response θ is invariant to image rotation
• Scale invariance?
  • Not invariant to image scale! When the image is enlarged, all points along a rounded corner will be classified as edges

Slide credit: Kristen Grauman

Harris Detector: Properties

• Partially invariant to affine intensity change


Invariance

• Extract patch from each image individually

Slide adapted from T. Tuytelaars ECCV 2006 tutorial



Automatic scale selection

• Solution: design a function on the region which is "scale invariant" (the same for corresponding regions, even if they are at different scales)
  • Example: average intensity. For corresponding regions (even of different sizes) it will be the same.
• For a point in one image, we can consider it as a function of region size (patch width)

(Figure: f vs. region size for Image 1 and Image 2, with scale = 1/2)



Automatic scale selection

• Common approach: take a local maximum of this function
• Observation: the region size at which the maximum is achieved should be invariant to image scale
• Important: this scale-invariant region size is found in each image independently!

(Figure: f vs. region size, with maxima at s1 in Image 1 and s2 in Image 2; scale = 1/2)



Automatic Scale Selection

f ( I i1im ( x, s )) = f ( I i1im ( x¢, s ¢))

Same operator responses if the patch contains the same image up to scale
factor.

K. Grauman, B. Leibe


Example

Function responses for increasing scale (scale signature)

f ( I i1im ( x, s )) f ( I i1im ( x¢, s ))


Scale Invariant Detection

• A "good" function for scale detection has one stable sharp peak

(Figure: f vs. region size; a single sharp peak is good, flat or multi-peak responses are bad)

• For usual images: a good function would be one which responds to contrast (a sharp local intensity change)


What is a useful signature function?


• Functions for determining scale: f = Kernel * Image
• Kernels:
  • Laplacian: $L = \sigma^2 \, (G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma))$
  • Difference of Gaussians: $DoG = G(x, y, k\sigma) - G(x, y, \sigma)$
  where the Gaussian is $G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}}$
• Note: both kernels are invariant to scale and rotation

What is a useful signature function?


• Laplacian-of-Gaussian = “blob” detector

Source: K. Grauman, B. Leibe


Characteristic scale

• We define the characteristic scale as the scale that produces a peak of the Laplacian response

(Figure: Laplacian response vs. scale; the peak marks the characteristic scale)
T. Lindeberg (1998). "Feature detection with automatic scale selection." IJCV 30 (2): pp 77--116.
Source: Lana Lazebnik

Laplacian-of-Gaussian (LoG)

• Interest points: local maxima in the scale space of the LoG

$L_{xx}(\sigma) + L_{yy}(\sigma)$

evaluated over a stack of scales σ1 … σ5
⇒ List of (x, y, σ)

Source: K. Grauman, B. Leibe


Alternative approach

Approximate LoG with Difference-of-Gaussian (DoG).


Alternative approach

• Approximate LoG with Difference-of-Gaussian (DoG):
  1. Blur the image with a σ Gaussian kernel
  2. Blur the image with a kσ Gaussian kernel
  3. Subtract the two blurred images

• A small k gives a closer approximation to LoG, but usually we want to build a scale space quickly out of this. k = 1.6 gives an appropriate scale space (k = sqrt(2) is also used).

(Figure: blurred image minus more-blurred image = DoG)

Source: K. Grauman, B. Leibe
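A minimal sketch of these three steps (image.png is a placeholder filename):

```python
import cv2
import numpy as np

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
sigma, k = 1.6, 1.6                            # k = 1.6 as suggested above
g1 = cv2.GaussianBlur(img, (0, 0), sigma)      # 1. blur with sigma
g2 = cv2.GaussianBlur(img, (0, 0), k * sigma)  # 2. blur with k*sigma
dog = g2 - g1                                  # 3. DoG = G(k*sigma) - G(sigma)
```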



Find local maxima in the position-scale space of DoG

(Figure: a Gaussian pyramid is built from the input image at scales σ, kσ, 2kσ, …; adjacent levels are subtracted to form DoG images, and maxima are found across both position and scale)

⇒ List of (x, y, s)


Harris-Laplacian

• Harris-Laplacian¹: find the local maximum of:
  • the Harris corner detector in space (image coordinates)
  • the Laplacian in scale

¹ K. Mikolajczyk, C. Schmid. "Indexing Based on Scale Invariant Interest Points". ICCV 2001
² D. Lowe. "Distinctive Image Features from Scale-Invariant Keypoints". IJCV 2004


Scale Invariant Detectors

• Harris-Laplacian¹: find the local maximum of:
  • the Harris corner detector in space (image coordinates)
  • the Laplacian in scale
• SIFT (D. Lowe)²: find the local maximum of:
  • the Difference of Gaussians in space and scale

¹ K. Mikolajczyk, C. Schmid. "Indexing Based on Scale Invariant Interest Points". ICCV 2001
² D. Lowe. "Distinctive Image Features from Scale-Invariant Keypoints". IJCV 2004


DoG (SIFT) Keypoint Detector

• DoG at multiple octaves
• Extrema detection in scale space
• Keypoint localization
  • Interpolation
  • Removing unstable points
• Orientation assignment


DoG (SIFT) Detector

• DoG at multiple octaves:

$D(\sigma) \equiv (G(k\sigma) - G(\sigma)) * I$

computed from the stack of Gaussian-blurred images $G(\sigma) * I$, $G(k\sigma) * I$, $G(k^2\sigma) * I$, …


DoG (SIFT) Detector

• Scale-space extrema: choose all extrema within a 3x3x3 neighborhood across $D(\sigma)$, $D(k\sigma)$, $D(k^2\sigma)$
• X is selected if it is larger or smaller than all 26 neighbors (its 8 neighbors in the current image and 9 neighbors each in the scales above and below)


DoG (SIFT) Detector

• Orientation assignment:
  • Create a histogram of local gradient directions at the selected scale
  • Assign the orientation at the peak of the smoothed histogram
  • Each key specifies stable 2D coordinates (x, y, scale, orientation)
  • If there are 2 major orientations, use both.


Example of keypoint detection

(a) 233x189 image
(b) 832 DoG extrema
(c) 729 left after the peak-value threshold
(d) 536 left after testing the ratio of principal curvatures (removing edge responses)


DoG (SIFT) Detector

A SIFT keypoint: {x, y, scale, dominant orientation}

Source: Distinctive Image Features from Scale-Invariant Keypoints – IJCV 2004
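With OpenCV (version 4.4 or later ships SIFT in the main module), each detected keypoint exposes exactly these fields; a short sketch, with image.png as a placeholder:

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename
sift = cv2.SIFT_create()
keypoints = sift.detect(img, None)
for kp in keypoints[:5]:
    print(kp.pt, kp.size, kp.angle)  # (x, y), scale (patch diameter), orientation
```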



Scale Invariant Detectors

• Experimental evaluation of detectors w.r.t. scale change
• Repeatability rate: (# correspondences) / (# possible correspondences)

K.Mikolajczyk, C.Schmid. “Indexing Based on Scale Invariant Interest Points”. ICCV 2001

Slide credit: CS131 - Juan Carlos Niebles and Ranjay Krishna


Many existing detectors available

• Hessian & Harris [Beaudet ‘78], [Harris ‘88]

• Laplacian, DoG [Lindeberg ‘98], [Lowe ‘99]

• Harris-/Hessian-Laplace [Mikolajczyk & Schmid ‘01]

• Harris-/Hessian-Affine [Mikolajczyk & Schmid ‘04]

• EBR and IBR [Tuytelaars & Van Gool ‘04]

• MSER [Matas ‘02]

• Salient Regions [Kadir & Brady ‘01]

• Others…

• These detectors have become a basic building block for many recent applications in Computer Vision.

Slide credit: Bastian Leibe


Feature extraction

• Global features
• Local features
• Interest point detector
• Local descriptor
• Matching


Local Descriptor

• A compact, good representation of local information
• Invariant to:
  • Geometric transformations: rotation, translation, scaling, …
  • Camera viewpoint changes
  • Illumination
• Examples:
  • SIFT, SURF (Speeded Up Robust Features), PCA-SIFT, …
  • LBP, BRISK, MSER, FREAK, …


Invariant local features

• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters

Following slides credit: CVPR 2003 Tutorial on Recognition and Matching Based on Local Invariant Features David Lowe


Advantages of invariant local features


• Locality:
• features are local, so robust to occlusion and clutter (no prior
segmentation)
• Distinctiveness:
• individual features can be matched to a large database of objects
• Quantity:
• many features can be generated for even small objects
• Efficiency:
• close to real-time performance
• Extensibility:
• can easily be extended to wide range of differing feature types, with
each adding robustness


Becoming rotation invariant

• We are given a keypoint and its scale from DoG
• We will select a characteristic orientation for the keypoint (based on the most prominent gradient there)
• We will describe all features relative to this orientation
(orientation histogram over 0 to 2π)


SIFT descriptor formation

• Use the blurred image associated with the keypoint's scale
• Take image gradients over the keypoint neighborhood
• In order to achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation

Source: Distinctive Image Features from Scale-Invariant Keypoints – IJCV 2004


http://campar.in.tum.de/twiki/pub/Chair/TeachingWs13TDCV/feature_descriptors.pdf


SIFT descriptor formation

• Create an array of orientation histograms (a 4x4 array is shown)
• Each local orientation histogram has n orientation bins (here n = 8, over 0 to 2π)
• The SIFT authors found that the best results were obtained with 8 orientation bins per histogram and a 4x4 histogram array ⇒ a SIFT descriptor is a vector of 128 values

SIFT: Distinctive Image Features from Scale-Invariant Keypoints – IJCV 2004


Image: Ashish A Gupta, PhD thesis 2013
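A short sketch confirming the 128-value layout with OpenCV (image.png is a placeholder filename):

```python
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename
sift = cv2.SIFT_create()
kps, desc = sift.detectAndCompute(img, None)
print(desc.shape)  # (number of keypoints, 128) = 4x4 histograms x 8 bins each
```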


SIFT descriptor: more

• More about the techniques used to make SIFT more stable and robust: see the details in Section 6.2 of https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf


SIFT

• Extraordinarily robust matching technique
  • Can handle changes in viewpoint: up to about 60 degrees of out-of-plane rotation
  • Can handle significant changes in illumination, sometimes even day vs. night (below)
  • Fast and efficient; can run in real time

Slide credit: Steve Seitz


Sensitivity to number of histogram orientations


Feature stability to noise


• Match features after a random change in image scale & orientation, with differing levels of image noise
• Find the nearest neighbor in a database of 30,000 features

David G. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV, 60, 2 (2004), pp. 91-110


Feature stability to affine change


• Match features after a random change in image scale & orientation, with 2% image noise and affine distortion
• Find the nearest neighbor in a database of 30,000 features


Distinctiveness of features
• Vary the size of the database of features, with a 30 degree affine change and 2% image noise
• Measure the % correct for the single nearest-neighbor match


SIFT Keypoint Descriptor: summary

1. Blur the image using the scale of the keypoint (scale invariance)
2. Compute gradients with respect to the keypoint orientation (rotation invariance)
3. Compute orientation histograms in 8 directions over 4x4 sample regions

Source: Distinctive Image Features from Scale-Invariant Keypoints – IJCV 2004


http://campar.in.tum.de/twiki/pub/Chair/TeachingWs13TDCV/feature_descriptors.pdf

Other detectors and descriptors

Popular features: SURF, HOG, SIFT
http://campar.in.tum.de/twiki/pub/Chair/TeachingWs13TDCV/feature_descriptors.pdf

Summary of some local features:
http://www.cse.iitm.ac.in/~vplab/courses/CV_DIP/PDF/Feature_Detectors_and_Descriptors.pdf


Feature extraction

• Global features
• Local features
• Interest point detector
• Local descriptor
• Matching


Feature matching

Given a feature in I1, how do we find the best match in I2?

1. Define a distance function that compares two descriptors (L1, L2, cosine, Mahalanobis, … distance)
2. Test all the features in I2 and find the one with the minimum distance

OpenCV:
- Brute-force matching
- FLANN matching: Fast Library for Approximate Nearest Neighbors [Muja and Lowe, 2009]

Marius Muja and David G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In VISAPP (1), pages 331-340, 2009
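A minimal brute-force matching sketch with SIFT descriptors (img1.png and img2.png are placeholder filenames):

```python
import cv2

img1 = cv2.imread("img1.png", cv2.IMREAD_GRAYSCALE)  # placeholder filenames
img2 = cv2.imread("img2.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)  # keep only mutual best matches
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
```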


Feature matching

• How do we define the difference between two features f1, f2?
• Simple approach: use only the distance value d(f1, f2)
  ⇒ can give a good score to very ambiguous matches
• Better approaches: add additional constraints
  • Ratio of distances
  • Spatial constraints between neighboring pixels
  • Fitting the transformation, then refining the matches (RANSAC)


Feature matching

• Simple approach: use the distance value d(f1, f2)
  ⇒ can give a good score to very ambiguous matches

(Figure: feature f1 in image I1 matched to f2 in image I2)

Feature matching

• Better approach: the ratio of distances = d(f1, f2) / d(f1, f2')
  • f2 is the best match to f1 in I2
  • f2' is the 2nd-best SSD match to f1 in I2
  • An ambiguous/bad match will have a ratio close to 1
  • Look for unique matches, which have a low ratio

(Figure: f1 in I1, with candidate matches f2 and f2' in I2)
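A sketch of this ratio test, reusing des1 and des2 from the matching sketch above (the 0.8 threshold is one common choice):

```python
import cv2

bf = cv2.BFMatcher(cv2.NORM_L2)
pairs = bf.knnMatch(des1, des2, k=2)  # the 2 nearest neighbors: f2 and f2'
good = [m for m, n in pairs if m.distance < 0.8 * n.distance]  # low-ratio matches
```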

Ratio of distances reliable for matching

David G. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV, 60, 2 (2004), pp. 91-110


Feature matching

• Better approaches: spatial constraints between neighboring pixels

Source: from slides of Valérie Gouet-Brunet



Feature matching

• Better approaches: fitting the transformation (RANSAC algorithm)
  • Fitting a 2D affine transformation matrix:
    • six variables
    • each point pair gives two equations ⇒ at least three point pairs are needed
    • solve by least squares
  • RANSAC: refinement of matches
    • compute the error of each match under the fitted transformation
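A sketch of this RANSAC fit with OpenCV, reusing kp1, kp2 and the good matches from the sketches above:

```python
import cv2
import numpy as np

src = np.float32([kp1[m.queryIdx].pt for m in good])  # matched coordinates in I1
dst = np.float32([kp2[m.trainIdx].pt for m in good])  # matched coordinates in I2

# fit the 2x3 affine matrix (six unknowns, so at least three pairs) with RANSAC
M, inlier_mask = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                      ransacReprojThreshold=3.0)
```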


Evaluating the results

How can we measure the performance of a feature matcher?

(Figure: three candidate matches with feature distances 50, 75 and 200; the best feature distance is the smallest)


True/false positives

The distance threshold affects performance:
• True positives = # of detected matches that are correct
  • Suppose we want to maximize these: how should we choose the threshold?
• False positives = # of detected matches that are incorrect
  • Suppose we want to minimize these: how should we choose the threshold?

(Figure: matches with feature distances 50 (a true match), 75, and 200 (a false match))


Image matching

• How do we define the distance between 2 images I1, I2?
• Using global features: easy
  d(I1, I2) = d(feature of I1, feature of I2)
• Using local features:
  • Voting strategy
  • Solving an optimization problem (time consuming)
  • Building a "global" feature from local features: BoW (bag-of-words, bag-of-features), VLAD, …


Voting strategy

(Figure: a region selected in the input image is used to query images in a database)
The similarity between 2 images is based on the number of matches.

Source: Modified from slides of Valérie Gouet-Brunet


Optimization problem

• Transportation problem:
  I1: {(r_i, w_i), i = 1, …, N} (supplier)
  I2: {(r'_j, w_j), j = 1, …, M} (consumer)
  d(I1, I2) = ?

Find the flow $f_{ij}$ that minimizes the total cost:

$d(I_1, I_2) = \min \sum_i \sum_j f_{ij} \cdot d(r_i, r'_j)$

subject to:

$f_{ij} \geq 0, \quad \sum_i f_{ij} \leq w_j, \quad \sum_j f_{ij} \leq w_i, \quad \sum_i \sum_j f_{ij} = \min\Big(\sum_i w_i, \sum_j w_j\Big)$

The Earth Mover's Distance is the cost of the optimal flow $f^*$, normalized by the total flow:

$d_{EMD}(I_1, I_2) = \dfrac{\sum_i \sum_j f^*_{ij} \cdot d(r_i, r'_j)}{\sum_i \sum_j f^*_{ij}}$

http://vellum.cz/~mikc/oss-projects/CarRecognition/doc/dp/node29.html

Bag-of-words

• A local feature is analogous to a word
• An image is analogous to a document
• Apply a technique from textual document representation: the vector model


Visual Vocabulary …

1. Extract local features from a set of images
2. Build a visual vocabulary (dictionary) using a clustering method
3. An image is represented by a bag of words ⇒ it can be represented by a tf.idf vector


Bag of words: outline

1. Extract features
2. Learn a "visual vocabulary"
3. Quantize features using the visual vocabulary
4. Represent images by frequencies of "visual words"
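A compact sketch of steps 2-4 (the vocabulary size k = 100 and the stacked descriptor array all_descriptors, of shape N x 128, are assumptions):

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq  # any k-means implementation works

k = 100                                   # vocabulary size (an assumption)
# all_descriptors: SIFT descriptors stacked from the training images (N x 128)
vocab, _ = kmeans2(all_descriptors.astype(np.float64), k, minit="++", seed=0)

def bow_vector(descriptors):
    """Quantize one image's descriptors and count visual-word frequencies."""
    words, _ = vq(descriptors.astype(np.float64), vocab)
    hist, _ = np.histogram(words, bins=np.arange(k + 1))
    return hist / hist.sum()              # normalized bag-of-words vector
```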


Applications


Object detection/recognition/search

Sivic and Zisserman, 2003

Lowe 2002
Rothganger et al. 2003


Object detection/recognition

David Lowe, Distinctive Image Features from Scale-Invariant Keypoints, IJCV 2004


Application: Image Panoramas


Slide credit: Darya Frolova, Denis Simakov


Application: Image Panoramas


• Procedure:
• Detect feature points in both images
• Find corresponding pairs
• Use these pairs to align the images

Slide credit: Darya Frolova, Denis Simakov



Automatic mosaicing

http://www.cs.ubc.ca/~mbrown/autostitch/autostitch.html


Wide baseline stereo

[Image from T. Tuytelaars ECCV 2006 tutorial]


CBIR (Content-based image retrieval)


CBIR: partial retrieval

Source: http://www-rocq.inria.fr/imedia


CBIR: BoW with SIFT + histogram

(Pipeline diagram, translated from Vietnamese:)
• Image set stored in the DB (20 images/class) → SIFT feature set
• 5 images/class → build the visual dictionary
• Compute a weight vector for each image (~80 images/class, ~60 images/class)
• Database describing the SIFT image features as weight vectors
• Test image set

Source: graduation thesis (ĐATN), Phạm Xuân Trường K52 - BK


CBIR: BoW with SIFT + histogram

Source: graduation thesis (ĐATN), Phạm Xuân Trường K52 - BK



CBIR: BoW with SIFT + histogram

Source: graduation thesis (ĐATN), Phạm Xuân Trường K52 - BK


References

• Lectures 5, 6: CS231 - Juan Carlos Niebles and Ranjay Krishna, Stanford Vision and Learning Lab
• Vision par Ordinateur, Alain Boucher, IFI
• SIFT: Keypoint detector (ubc.ca)
• SURF: https://people.ee.ethz.ch/~surf/eccv06.pdf
• Harris corner detector: https://home.cis.rit.edu/~cnspci/references/dip/feature_extraction/harris1988.pdf
• CBIR: J. Sivic & A. Zisserman (2003). "Video Google: A Text Retrieval Approach to Object Matching in Videos" (PDF). Proc. of ICCV
• HOG for human detection: https://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf


THANK YOU !
