CVML Mulakat Notlari (CVML Interview Notes)

SIFT / SURF vs HOG:

Reference: https://medium.com/@deepanshut041/introduction-to-sift-scale-invariant-feature-transform-65d7f3a72d40
The scale space of an image is a function L(x, y, σ) produced by convolving the input image with a Gaussian kernel (blurring) at different scales. Scale space is separated into octaves, and the number of octaves and scales depends on the size of the original image. So we generate several octaves of the original image; each octave's image size is half of the previous one.

Within an octave, images are progressively blurred using the Gaussian blur operator. Mathematically, "blurring" refers to the convolution of the Gaussian operator with the image. The Gaussian blur has a particular expression, or "operator", that is applied to each pixel; the result is the blurred image.

Now we use those blurred images to generate another set of images, the Difference of Gaussians (DoG). These DoG images are great for finding interesting keypoints in the image. The difference of Gaussians is obtained as the difference of the Gaussian blurring of an image at two different scales, say σ and kσ. This process is repeated for the different octaves of the image in the Gaussian pyramid.

Up to now, we have generated a scale space and used it to calculate the Difference of Gaussians, which serves as a scale-invariant approximation of the Laplacian of Gaussian. Each pixel is then compared with its 8 neighbours in the same scale as well as 9 pixels in the next scale and 9 pixels in the previous scale, for a total of 26 comparisons. If it is a local extremum, it is a potential keypoint; this basically means that the keypoint is best represented at that scale.
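A minimal sketch of building one octave of blurred images and their DoG with OpenCV; the values of sigma, k and the number of scales are illustrative choices rather than SIFT's exact parameters, and "image.jpg" is a placeholder:

import cv2
import numpy as np

def dog_octave(gray, sigma=1.6, k=2 ** 0.5, num_scales=5):
    # Blur the image at progressively larger sigmas within one octave.
    blurred = [cv2.GaussianBlur(gray, (0, 0), sigma * (k ** i))
               for i in range(num_scales)]
    # Difference of Gaussians: subtract adjacent blur levels.
    return [cv2.subtract(blurred[i + 1], blurred[i])
            for i in range(num_scales - 1)]

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
dogs = dog_octave(img)
# The next octave would repeat this on the image downsampled by a factor of 2.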

The SIFT descriptor takes a 16x16 neighbourhood around the keypoint and divides it into sixteen 4x4 windows. Over each of these 4x4 windows it computes an 8-bin histogram of oriented gradients, which gives the 128-dimensional (16 × 8) descriptor. While computing each histogram, it also interpolates between neighbouring angle bins. A Gaussian of half the window size, centered at the center of the 16x16 block, is used to weight the values across the whole 16x16 descriptor.

HOG, on the other hand, only computes a simple histogram of oriented gradients, as the name says.

1) In SIFT, Gaussian smoothing is applied in order to compute the DoG (difference of Gaussians). Scale-space extrema detection then gives the feature points, and for each feature point a histogram of oriented gradients is computed over a 16x16 neighbourhood, which yields a 128-length descriptor. HOG, in contrast, computes edge gradients over the whole image and the orientation of each pixel, and builds a histogram from them.
2) HOG is used to extract global features, whereas SIFT extracts local features.
3) SIFT is scale and rotation invariant, whereas HOG is not.
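A minimal sketch contrasting the two, assuming OpenCV >= 4.4 (where SIFT lives in the main module) and a placeholder "image.jpg"; the 64x128 window is HOGDescriptor's default person-detection window, chosen here just for illustration:

import cv2

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)

# SIFT: local features -- one 128-D descriptor per detected keypoint.
sift = cv2.SIFT_create()
keypoints, sift_desc = sift.detectAndCompute(img, None)   # sift_desc has shape N x 128

# HOG: a single global descriptor for a fixed-size window.
hog = cv2.HOGDescriptor()
window = cv2.resize(img, (64, 128))
hog_desc = hog.compute(window)   # one long feature vector for the whole window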

LBP:
The LBP feature vector, in its simplest form, is created in the following manner:

● Divide the examined window into cells (e.g. 16x16 pixels for each cell).
● For each pixel in a cell, compare the pixel to each of its 8 neighbors (on its left-top,
left-middle, left-bottom, right-top, etc.). Follow the pixels along a circle, i.e. clockwise
or counter-clockwise.
● Where the center pixel's value is greater than the neighbor's value, write "0".
Otherwise, write "1". This gives an 8-digit binary number (which is usually converted
to decimal for convenience).
● Compute the histogram, over the cell, of the frequency of each "number" occurring
(i.e., each combination of which pixels are smaller and which are greater than the
center). This histogram can be seen as a 256-dimensional feature vector.
● Optionally normalize the histogram.
● Concatenate (normalized) histograms of all cells. This gives a feature vector for the
entire window.
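A small NumPy sketch of the basic LBP code for a single cell, using the comparison rule described above (the cell size and the fixed clockwise neighbour order are illustrative):

import numpy as np

def lbp_cell_histogram(cell):
    # cell: 2-D uint8 array; returns a 256-bin histogram of LBP codes.
    h, w = cell.shape
    codes = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = cell[y, x]
            # The 8 neighbours, visited clockwise starting at the top-left.
            neigh = [cell[y-1, x-1], cell[y-1, x], cell[y-1, x+1], cell[y, x+1],
                     cell[y+1, x+1], cell[y+1, x], cell[y+1, x-1], cell[y, x-1]]
            # Center greater than neighbour -> "0", otherwise "1".
            bits = ['0' if center > n else '1' for n in neigh]
            codes.append(int(''.join(bits), 2))
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist

# Concatenating the (normalized) histograms of all cells gives the window's feature vector.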

ORB (Oriented FAST and Rotated BRIEF):


ORB is basically a fusion of the FAST keypoint detector and the BRIEF descriptor, with many modifications to enhance performance. First it uses FAST to find keypoints, then applies the Harris corner measure to find the top N points among them. It also uses an image pyramid to produce multi-scale features.
Unlike BRIEF, ORB is comparatively scale and rotation invariant while still employing the
very efficient Hamming distance metric for matching. As such, it is preferred for real-time
applications.

FAST:
1. Select a pixel p in the image which is to be identified as an interest point or not. Let its intensity be I_p.
2. Select an appropriate threshold value t.
3. Consider a circle of 16 pixels around the pixel under test.
4. The pixel p is a corner if there exists a set of n contiguous pixels on the circle (of 16 pixels) which are all brighter than I_p + t, or all darker than I_p − t; n was chosen to be 12.
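A sketch of ORB in OpenCV, detecting keypoints in two placeholder images and matching the binary descriptors with the Hamming distance:

import cv2

img1 = cv2.imread("frame1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)          # FAST + Harris ranking + image pyramid internally
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Binary descriptors -> Hamming distance; crossCheck keeps only mutual best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)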

Motion Estimation:
• Optical flow
– Recover image motion at each pixel from spatio-temporal image brightness variations
• Feature tracking
– Extract visual features (corners, textured areas) and “track” them over multiple frames

Bag of Words:

We detect features, extract descriptors from each image in the dataset, and build a visual dictionary. Detecting features and extracting descriptors can be done with feature extraction algorithms (for example, SIFT, KAZE, etc.). Next, we cluster the descriptors (using k-means, DBSCAN or another clustering algorithm). The centers of the clusters are used as the visual dictionary's vocabulary (the visual words). Finally, for each image, we make a frequency histogram counting how often each visual word occurs in that image. Those histograms are our bag of visual words (BOVW).

Use Nearest neighbour or SVM for classification.
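A sketch of this pipeline assuming OpenCV SIFT and scikit-learn's KMeans; the vocabulary size of 100 and the training-image list are illustrative:

import cv2
import numpy as np
from sklearn.cluster import KMeans

def bovw_histograms(images, n_words=100):
    sift = cv2.SIFT_create()
    per_image_desc = [sift.detectAndCompute(img, None)[1] for img in images]
    # Cluster all descriptors; the cluster centers are the visual dictionary.
    kmeans = KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(per_image_desc))
    hists = []
    for desc in per_image_desc:
        words = kmeans.predict(desc)                     # assign each descriptor to a visual word
        hist, _ = np.histogram(words, bins=n_words, range=(0, n_words))
        hists.append(hist)
    return np.array(hists), kmeans

# The returned histograms can then be fed to a nearest-neighbour classifier or an SVM.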

Histogram comparison: Earth Mover's Distance, correlation, chi-square, Bhattacharyya
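A minimal sketch of three of these metrics using OpenCV's compareHist on two toy float32 histograms:

import cv2
import numpy as np

h1 = np.array([0.2, 0.5, 0.3], dtype=np.float32)
h2 = np.array([0.3, 0.4, 0.3], dtype=np.float32)

corr = cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)          # higher means more similar
chisq = cv2.compareHist(h1, h2, cv2.HISTCMP_CHISQR)         # lower means more similar
bhatt = cv2.compareHist(h1, h2, cv2.HISTCMP_BHATTACHARYYA)  # lower means more similar
# Earth Mover's Distance takes "signatures" (weights plus coordinates) instead; see cv2.EMD.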

Harris Corner and Shi-Tomasi Corner:

The Harris corner detector basically looks at the difference in intensity for a displacement of (u, v) in all directions. With w(x, y) as the window function, this is expressed as:

E(u, v) = Σ_{x,y} w(x, y) [ I(x + u, y + v) − I(x, y) ]²

We are looking for windows that produce a large E value. To achieve that, the term inside the square brackets has to take high values.

Applying a Taylor series expansion and keeping the first-order derivatives, for small shifts [u, v] we get a bilinear approximation:

E(u, v) ≈ [u v] M [u; v],   where   M = Σ_{x,y} w(x, y) [ I_x², I_x·I_y ; I_x·I_y, I_y² ]

It was found that the eigenvalues of this matrix M determine how suitable a window is. A score, R, is calculated for each window:

R = det(M) − k · (trace(M))²

The Shi-Tomasi corner detector is based entirely on the Harris corner detector. However, one slight variation in the selection criterion made this detector much better than the original; it works quite well in cases where even the Harris corner detector fails. Here is the minor change that Shi and Tomasi made to the original Harris corner detector:

The Harris corner detector has a corner selection criterion: a score is calculated for each pixel, and if the score is above a certain value, the pixel is marked as a corner. The score is computed from the two eigenvalues; that is, the two eigenvalues are passed to a function, which manipulates them and gives back a score. Shi and Tomasi suggested that this function should be done away with, and only the eigenvalues themselves should be used to check whether the pixel is a corner or not.
The scoring function in the Harris corner detector was given by:

R = λ1 · λ2 − k · (λ1 + λ2)²

Instead of this, Shi-Tomasi proposed:

R = min(λ1, λ2)

If R is above a threshold, the window is marked as a corner.
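A short sketch of both detectors in OpenCV (cornerHarris uses the det/trace score, goodFeaturesToTrack thresholds on the minimum eigenvalue); the parameter values and "image.jpg" are illustrative:

import cv2
import numpy as np

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Harris: R = det(M) - k * trace(M)^2, computed per pixel.
harris_response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
harris_corners = harris_response > 0.01 * harris_response.max()   # boolean corner mask

# Shi-Tomasi: keep points whose min(lambda1, lambda2) is large enough.
shi_tomasi_pts = cv2.goodFeaturesToTrack(gray, maxCorners=200,
                                         qualityLevel=0.01, minDistance=10)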

KLT (Kanade–Lucas–Tomasi) Tracker:
• Find a good point to track (Harris corner)
• Use the intensity second-moment matrix and the difference across frames to find the displacement
• Iterate and use coarse-to-fine search to deal with larger movements
• When creating long tracks, check the appearance of the registered patch against the appearance of the initial patch to find points that have drifted
---
1. Detect Harris corners in the first frame of the video.
2. For each detected Harris corner, compute the motion between consecutive frames using optical flow (translation) and a local affine transformation.
3. Now link these motion vectors from frame-to-frame to track the corners.
4. Generate new Harris corners after a specific number of frames (say, 10 to 20) to
compensate for new points entering the scene or to discard the ones going out of the
scene.
5. Track the new and old Harris points.
• cost function: sum of squared intensity differences between template and window
• optimization technique: gradient descent
• model learning: no update / last frame / convex combination
• attractive properties:
– fast
– easily extended to image-to-image transformations with multiple parameters
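A minimal sketch of this pipeline with OpenCV: Shi-Tomasi corners tracked by pyramidal (coarse-to-fine) Lucas-Kanade, re-detecting points every 15 frames; the file name and parameter values are illustrative:

import cv2

cap = cv2.VideoCapture("video.mp4")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=10)

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade: track each point from the previous frame to this one.
    new_points, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
    points = new_points[status.flatten() == 1].reshape(-1, 1, 2)   # keep successfully tracked points
    prev_gray = gray
    frame_idx += 1
    if frame_idx % 15 == 0:   # periodically re-detect to pick up new corners entering the scene
        points = cv2.goodFeaturesToTrack(gray, maxCorners=200,
                                         qualityLevel=0.01, minDistance=10)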

Correlation filter based tracking and KCF:


The basic idea of correlation filter tracking is to estimate an optimal image filter such that filtering the input image produces a desired response. The desired response is typically of Gaussian shape centered at the target location, so the score decreases with the distance from the target.

The filter is trained from translated (shifted) instances of the target patch. At test time, the response of the filter is evaluated and its maximum gives the new position of the target. The filter is trained online and updated with every frame so that the tracker adapts to moderate target changes.

A major advantage of the correlation filter tracker is its computational efficiency: the computation can be performed efficiently in the Fourier domain, so the tracker runs faster than real time, at several hundred FPS.

----

Filter based trackers model the appearance of objects using filters trained on example
images. The target is initially selected based on a small tracking window centered on the
object in the first frame. From this point on, tracking and filter training work together. The
target is tracked by correlating the filter over a search window in the next frame; the location
corresponding to the maximum value in the correlation output indicates the new position of
the target. An online update is then performed based on that new location.

To create a fast tracker, correlation is computed in the Fourier domain using the Fast Fourier Transform (FFT) [15]. First, the 2D Fourier transforms of the input image, F = F(f), and of the filter, H = F(h), are computed. The Convolution Theorem states that correlation becomes an element-wise multiplication in the Fourier domain. Using the ⊙ symbol to denote element-wise multiplication and ∗ to indicate the complex conjugate, correlation takes the form:
G = F ⊙ H∗ (1)
The correlation output is transformed back into the spatial domain using the inverse FFT.
The bottleneck in this process is computing the forward and inverse FFTs so that the entire
process has an upper bound time of O(P log P) where P is the number of pixels in the
tracking window.
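A NumPy sketch of equation (1): correlate a window with a filter via the FFT and read the target's new position off the peak of the response map (the array contents here are placeholders):

import numpy as np

def correlate_fft(window, filt):
    F = np.fft.fft2(window)
    H = np.fft.fft2(filt, s=window.shape)   # zero-pad the filter to the window size
    G = F * np.conj(H)                      # element-wise product with the conjugate, G = F ⊙ H*
    return np.real(np.fft.ifft2(G))         # back to the spatial domain

window = np.random.rand(64, 64)
filt = np.random.rand(64, 64)
response = correlate_fft(window, filt)
dy, dx = np.unravel_index(np.argmax(response), response.shape)   # peak = new target position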

BACF (background aware correlation filter tracking):


Learning CF trackers in the frequency domain, however, comes at the high cost of learning from circularly shifted examples of the foreground target. These shifted patches are implicitly generated through the circulant property of correlation in the frequency domain and are used as negative examples for training the filter [20]. All shifted patches are plagued by circular boundary effects and are not truly representative of negative patches in real-world scenes.
These boundary effects have been shown to have a drastic impact on tracking performance,
due to a number of factors. First, learning from limited shifted patches may lead to training
an over-fitted filter which is not well generalized to rapid visual deformation e.g. caused by
fast motion [10]. Second, the lack of real negative training examples can drastically degrade
the robustness of such trackers against cluttered background, and as a result, increase the
risk of tracking drift specifically when the target and background display similar visual cues.
Third, discarding background information from the learning process may reduce the tracker’s
ability to distinguish the target from occluding patches. This limits the potential of such trackers to re-detect the target after an occlusion or out-of-plane movement.
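A tiny NumPy sketch of why circularly shifted patches are synthetic: np.roll wraps pixels around the border, just as the circulant structure implicitly does, so the "background" in a shifted patch is really wrapped-around foreground:

import numpy as np

patch = np.arange(16).reshape(4, 4)
shifted = np.roll(patch, shift=(1, 1), axis=(0, 1))
# The last row and column of `patch` reappear at the top/left of `shifted`:
# a circular boundary artifact, not genuine background content around the target.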

BACF is capable of learning/updating filters from real negative examples densely extracted
from the background. We demonstrate that learning trackers from negative background
patches, instead of shifted foreground patches, achieves superior accuracy with real-time
performance. This paper offers the following contributions:
• We propose a new correlation filter for real-time visual tracking. Unlike prior CF-based trackers, in which negative examples are limited to circularly shifted patches, our tracker is trained from real negative training examples extracted from the background.

CRF:
Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting
structured data, such as sequences, trees and lattices. The underlying idea is that of
defining a conditional probability distribution over label sequences given a particular
observation sequence, rather than a joint distribution over both label and observation
sequences. The primary advantage of CRFs over hidden Markov models is their conditional
nature, resulting in the relaxation of the independence assumptions required by HMMs in
order to ensure tractable inference. Additionally, CRFs avoid the label bias problem, a
weakness exhibited by maximum entropy Markov models (MEMMs) and other conditional
Markov models based on directed graphical models. CRFs outperform both MEMMs and
HMMs on a number of real-world tasks in many fields, including bioinformatics,
computational linguistics and speech recognition.
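For a linear-chain CRF, this conditional distribution has the standard form (the notation here is the usual textbook one, not taken from these notes' sources):

p(y | x) = (1 / Z(x)) · exp( Σ_t Σ_k λ_k · f_k(y_{t−1}, y_t, x, t) )

where the f_k are feature functions defined over adjacent labels and the observation sequence, the λ_k are their learned weights, and Z(x) normalizes over all possible label sequences.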

---
Hidden Markov Models are generative, and give output by modeling the joint probability
distribution. On the other hand, Conditional Random Fields are discriminative, and model the
conditional probability distribution. CRFs don’t rely on the independence assumption (that
the labels are independent of each other), and avoid label bias. One way to look at it is that
Hidden Markov Models are a very specific case of Conditional Random Fields, with constant
transition probabilities used instead. HMMs relate to naive Bayes in the same way that CRFs relate to logistic regression: the HMM is the sequence version of the (generative) naive Bayes classifier, while the linear-chain CRF is the sequence version of (discriminative) logistic regression.

Hough Transform

The Hough transform is a technique which can be used to isolate features of a particular
shape within an image. Because it requires that the desired features be specified in some
parametric form, the classical Hough transform is most commonly used for the detection of
regular curves such as lines, circles, ellipses, etc. A generalized Hough transform can be
employed in applications where a simple analytic description of a feature(s) is not possible.

The Hough transform is a feature extraction technique used in image analysis, computer
vision, and digital image processing. The purpose of the technique is to find imperfect
instances of objects within a certain class of shapes by a voting procedure. This voting
procedure is carried out in a parameter space, from which object candidates are obtained as
local maxima in a so-called accumulator space that is explicitly constructed by the algorithm
for computing the Hough transform.
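A short sketch of the classical line Hough transform in OpenCV: edge pixels vote in the (rho, theta) parameter space and peaks in the accumulator are returned as lines (the thresholds and "image.jpg" are illustrative):

import cv2
import numpy as np

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 50, 150)

# Each returned entry is the (rho, theta) of an accumulator peak with at least 200 votes.
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=200)

# Circles vote in a 3-parameter (x, y, r) space; cv2.HoughCircles works analogously.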
