COMPUTER VISION

LEC 1
BY : DR. SHEFALI ARORA CHOUHAN
ASSISTANT PROFESSOR, DEPT. OF CSE
COURSE OUTCOMES

► Understand key features of Computer Vision to analyse and interpret the visible world
around us
► Design and implement multi-dimensional signal processing, feature extraction, pattern
analysis, visual geometric modelling, and stochastic optimization
► Apply computer vision concepts to biometrics, medical diagnosis, document processing,
mining of visual content, surveillance, and advanced rendering
SYLLABUS
What is an Image?

► A signal is a function which depends on some variable


► An image is a signal which can be modelled as a 2D or 3D function
► The values of the function correspond to physical quantities such as image brightness,
pressure, etc.
THE 3Rs OF COMPUTER VISION

► The central problems in computer vision are recognition, reconstruction and


reorganization
► Recognition is about attaching semantic category labels to objects and scenes as well as to
events and activities.
► Reconstruction is traditionally about estimating shape, spatial layout, reflectance and
illumination – which could be used together to render the scene to produce an image.
► Reorganization is our term for what is usually called “perceptual organization” in human
vision; the “re” prefix makes the analogy with recognition and reconstruction more salient.
BASIC CONCEPTS

► In monochrome images the minimum value corresponds to black and the maximum to
white.
► The different values the intensity function can take are called gray levels.
► The gray level indicates the brightness of a pixel
► For a normalized intensity function, f(x,y) = 0 corresponds to black and f(x,y) = 1 to white
DIGITIZATION

► Discretization: the function f(x,y) is sampled into an M×N array. Each element of this
matrix is called a pixel
► Quantization: the continuous range of f(x,y) is divided into K intervals and each sample is given a value
► Digital images are commonly quantized to 256 gray levels
COLOR QUANTIZATION

► Colour images are quantized as three vectors, one each for the red, green, and blue components
► Any colour is a combination of these three primary colours
► There are various colour models
RGB TO GRAYSCALE CONVERSION

► Average method: the simplest approach. For an RGB image, add the R, G, and B values
and divide by 3 to obtain the grayscale value.
Grayscale = (R + G + B) / 3
► Weighted method: red has the longest wavelength of the three primary colours, while
green has a shorter wavelength and is the most soothing to the eye. The contribution
of red is therefore decreased, the contribution of green increased, and the blue
contribution kept in between the two.
New grayscale image = (0.3 * R) + (0.59 * G) + (0.11 * B)
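A minimal sketch of both conversions, assuming an image at the hypothetical path '/content/shapes.jpg':

import cv2
import numpy as np

img = cv2.imread('/content/shapes.jpg')            # OpenCV loads images in BGR order
b, g, r = cv2.split(img.astype(np.float32))

gray_average = (r + g + b) / 3                     # average method
gray_weighted = 0.3 * r + 0.59 * g + 0.11 * b      # weighted method

gray_average = gray_average.astype(np.uint8)
gray_weighted = gray_weighted.astype(np.uint8)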
import cv2
from matplotlib import pyplot as plt

im = cv2.imread('/content/shapes.jpg')

# OpenCV reads BGR; convert to RGB before displaying with matplotlib
im_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

# Display the image using matplotlib
plt.imshow(im_rgb)
plt.show()

# Split the RGB image into its three channels first
r, g, b = cv2.split(im_rgb)

plt.subplot(131)
plt.imshow(r, cmap='Reds')
plt.title('Red Channel')

plt.subplot(132)
plt.imshow(g, cmap='Greens')
plt.title('Green Channel')

plt.subplot(133)
plt.imshow(b, cmap='Blues')
plt.title('Blue Channel')

plt.show()

import cv2
import numpy as np
from google.colab.patches import cv2_imshow

src = cv2.imread('/content/shapes.jpg')
print(src.shape)

# Keep only the red channel (index 2, since OpenCV uses BGR order)
red_channel = src[:, :, 2]
red_img = np.zeros(src.shape, dtype=np.uint8)
red_img[:, :, 2] = red_channel
cv2_imshow(red_img)
MORE PARAMETERS

► Color consistency
► Shadows
► Lighting
► Brightness
► Contrast
# Assumes img is a colour image loaded with cv2.imread and converted to RGB
r, g, b = cv2.split(img)
plt.imshow(r)
plt.show()
plt.imshow(g)
plt.show()
plt.imshow(b)
plt.show()
HSI MODEL

► Represents colours the way the human eye perceives them


► Three components: Hue, Saturation & Intensity
► Saturation and intensity range from 0 to 1
► At intensity 1 the colour appears the same regardless of hue and saturation

[Figure: the HSI colour solid, with its saturation and intensity axes]
IMAGE PRE-PROCESSING

► Series of operations at the lowest level of abstraction


► Both the input and the output are images
► The objective is to improve the image
► Remove distortions
► Improve quality or highlight important features
COMMON OPERATIONS

► Gray level transformations


► Histograms
► Geometric transformations
► Arithmetic Operations
► Convolution
► Smoothing
HSI MODEL

import cv2
import numpy as np
from google.colab.patches import cv2_imshow

bgr_img = cv2.imread('/content/shapes.jpg')
hsv_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV)
cv2_imshow(hsv_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
What will this do?

colored_negative = abs(255 - im_rgb)   # inverts every channel: the colour negative
cv2_imshow(colored_negative)
BRIGHTNESS OF IMAGE

► Load the image
► Define a variable with the amount of brightness to be increased

brightness_increase = 50
# Work in a wider type so the addition does not wrap around at 255, then clip
brightened_image = np.clip(img.astype(np.int16) + brightness_increase, 0, 255).astype(np.uint8)
GRAY LEVEL HISTOGRAMS

► Depicts the frequency of occurrence of each gray value


► Can be interpreted as a probability density function (PDF)
► The PDF represents the likelihood of a pixel having a particular intensity value
► To convert a histogram to a PDF, you need to normalize it.
► Normalization involves dividing each bin count by the total number of pixels in the image.
► The normalized histogram values then represent probabilities, indicating the likelihood of a pixel
having a specific intensity value.
► The area under the PDF curve should sum to 1, as it represents the probability of a pixel having
an intensity value anywhere within the intensity range.
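A short sketch of this normalization, assuming img is a grayscale image loaded with cv2.imread:

hist = cv2.calcHist([img], [0], None, [256], [0, 256])   # raw bin counts
pdf = hist / hist.sum()                                  # divide by the total pixel count
print(pdf.sum())                                         # sums to ~1.0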
import cv2
import numpy as np
from matplotlib import pyplot as plt
from google.colab.patches import cv2_imshow

path = '/content/2.jpg'
img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
cv2_imshow(img)

# calcHist expects a list of images
dst = cv2.calcHist([img], [0], None, [256], [0, 256])

plt.hist(img.ravel(), 256, [0, 256])
plt.title('Histogram for gray scale image')
plt.show()
# For the per-channel histograms, load the image in colour
image = cv2.imread(path)
image = cv2.resize(image, (200, 200))

b, g, r = cv2.split(image)

hist_b = cv2.calcHist([b], [0], None, [256], [0, 256])
hist_g = cv2.calcHist([g], [0], None, [256], [0, 256])
hist_r = cv2.calcHist([r], [0], None, [256], [0, 256])

plt.plot(hist_b, color='blue', label='Blue Channel')
plt.plot(hist_g, color='green', label='Green Channel')
plt.plot(hist_r, color='red', label='Red Channel')

plt.title('RGB Histogram')
plt.xlabel('Pixel Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()
HISTOGRAM EQUALIZATION

► Histogram equalization is a method in image processing of contrast adjustment using the


image’s histogram
► This allows for areas of lower local contrast to gain a higher contrast
► The goal is to create an image with evenly distributed gray levels
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

img = cv2.imread('/content/2.jpg', 0)
equ = cv2.equalizeHist(img)
res = np.hstack((img, equ))   # show the input and the equalized result side by side
cv2_imshow(res)
cv2.waitKey(0)
cv2.destroyAllWindows()
Equalized Histogram
AFFINE TRANSFORMATIONS

► An affine transformation is a geometric transformation that preserves lines and parallelism
(but not necessarily distances and angles)
► To find the transformation matrix, we need three points from the input image and their
corresponding locations in the output image
Common transformations

► Rotation: cv2.getRotationMatrix2D(center, angle, scale) builds the transformation matrix M
that is then used to rotate an image.

► Translation: a translation matrix is created and passed to cv2.warpAffine to shift the
object's location.
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

image = cv2.imread('/content/2.jpg')
height, width = image.shape[:2]
tx, ty = width / 4, height / 4

# create the translation matrix using tx and ty; it is a NumPy array
translation_matrix = np.array([
    [1, 0, tx],
    [0, 1, ty]
], dtype=np.float32)

translated_image = cv2.warpAffine(src=image, M=translation_matrix, dsize=(width, height))

# display the translated image
cv2_imshow(translated_image)
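A corresponding sketch for the rotation case mentioned above, reusing image, width and height from the translation example:

center = (width // 2, height // 2)
M_rot = cv2.getRotationMatrix2D(center, 45, 1.0)   # rotate by 45 degrees, scale 1.0
rotated_image = cv2.warpAffine(image, M_rot, (width, height))
cv2_imshow(rotated_image)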
AFFINE TRANSFORMATIONS

img = cv2.imread('/content/2.jpg')
rows, cols, ch = img.shape

pts1 = np.float32([[50, 50], [200, 50], [50, 200]])
pts2 = np.float32([[10, 100], [200, 50], [100, 250]])

M = cv2.getAffineTransform(pts1, pts2)
dst = cv2.warpAffine(img, M, (cols, rows))
cv2_imshow(dst)
DIFFERENTIATE BETWEEN
TRANSFORMATIONS

► Euclidean
► Affine
► Projective
Affine Vs Non Affine

Affine: includes scaling, translation, etc.; parallelism is preserved.
Non-affine: includes projective transformations (also called homographies); parallelism is not preserved.
Euclidean transformation

► Affine transformations cover rotation, shearing, translation, scaling, etc., using a 2×3 matrix
(refer to the earlier slides)
► A Euclidean transformation allows only rotation and translation
► It is a subset of the affine transform
► Preserves distance and shape
► Parallelism is maintained
► Also called an isometric transform
Projective/Perspective Transformation

► 3D scenes are projected onto a 2D image plane
► The resultant image depends on the camera's viewpoint
► Ratios and dimensions of objects change
► Angles may not be preserved
► Parallelism is not preserved
► A generalization of the affine transform
► For an affine transformation the projection vector is zero, so an affine
transformation can be considered a particular case of a perspective transformation.
► Since the transformation matrix M has 8 constants (degrees of freedom), we select 4 points in
the input image and map them to the desired locations in the output image according to the
use-case; this gives 8 equations in 8 unknowns, which can be solved for M.
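A minimal sketch of a perspective transform in OpenCV; the four point pairs below are illustrative values, not taken from the slides:

import cv2
import numpy as np
from google.colab.patches import cv2_imshow

img = cv2.imread('/content/2.jpg')

# Four source points and the locations they should map to
pts1 = np.float32([[56, 65], [368, 52], [28, 387], [389, 390]])
pts2 = np.float32([[0, 0], [300, 0], [0, 300], [300, 300]])

M = cv2.getPerspectiveTransform(pts1, pts2)   # 3x3 matrix, 8 degrees of freedom
dst = cv2.warpPerspective(img, M, (300, 300))
cv2_imshow(dst)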
CONVOLUTION AND FILTERING

► To detect vertical and horizontal edges, we can convolve the image with suitable filters
► An m×m filter is applied to an n×n image to extract features and detect edges
► Valid convolution gives an (n-m+1) × (n-m+1) output image
TYPES OF FILTERS

► Weights in the filter matrix are chosen to extract features or detect edges
► For instance, the Sobel filter adds more weight to the central row of pixels
as compared to the plain averaging filter used before
Use of different filters helps to add robustness
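As a small illustration (not from the slides), a Sobel filter can be applied with cv2.Sobel:

import cv2

gray = cv2.imread('/content/shapes.jpg', cv2.IMREAD_GRAYSCALE)
sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)   # responds to vertical edges
sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)   # responds to horizontal edges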
CONVOLUTIONAL NEURAL
NETWORKS
• A CNN typically has three layers: a convolutional layer, a pooling layer,
and a fully connected layer.
• The convolution layer is the core building block of the CNN. It carries the
main portion of the network’s computational load.
• This layer performs a dot product between two matrices, where one
matrix is the set of learnable parameters otherwise known as a kernel,
and the other matrix is the restricted portion of the receptive field.
For an n×n input convolved with a k×k kernel (e.g. 64 such kernels), each kernel produces an
(n-k+1) × (n-k+1) feature map; for example, 2×2 pooling applied to a 4×4 feature map gives a
2×2 output.
POOLING LAYER
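A small NumPy sketch of 2x2 max pooling on a 4x4 feature map (illustrative, not from the slides):

import numpy as np

feature_map = np.arange(16).reshape(4, 4)                    # a 4x4 feature map
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))    # 2x2 max pooling with stride 2
print(pooled.shape)                                          # (2, 2)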
FULLY CONNECTED LAYER

► Neurons in this layer have full connectivity with all neurons in the preceding and
succeeding layers, as in a regular fully connected neural network.
► Its output can therefore be computed as usual by a matrix multiplication followed by a
bias offset.
► The FC layer helps to map the representation between the input and the output.
NON-LINEAR ACTIVATION FUNCTIONS

► Sigmoid
► The sigmoid non-linearity has the mathematical form σ(x) = 1 / (1 + e^(-x))
► It takes a real-valued number and “squashes” it into a range between 0 and 1.
► Tanh
► Tanh squashes a real-valued number to the range [-1, 1]. Like sigmoid, the activation
saturates, but — unlike the sigmoid neurons — its output is zero centered.
► ReLU
► The Rectified Linear Unit (ReLU) has become very popular in the last few years. It
computes the function f(x) = max(0, x). In other words, the activation is simply thresholded at
zero.
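A minimal NumPy sketch of the three activations described above:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                  # squashes to (-1, 1), zero centered

def relu(x):
    return np.maximum(0, x)            # thresholds at zero

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x), tanh(x), relu(x))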
YOLO - YOU ONLY LOOK ONCE

► Object detection is the problem of both locating AND classifying objects


► Goal of YOLO algorithm is to do object detection both fast AND with high accuracy
Image Localization Using YOLO

► Image localization is the process of identifying the correct location of one or
multiple objects using bounding boxes, which correspond to rectangular shapes
around the objects.
► This process is sometimes confused with image classification or image
recognition, which aim to predict the class of an image, or of an object within an
image, as one of a set of categories or classes.
► The illustration below corresponds to the visual representation of the previous
explanation. The object detected within the image is “Person.”
Advantages

► High Detection Accuracy


► Speed
► Better Generalization
► Open-source
Approaches Behind YOLO

- Residual blocks
► This first step starts by dividing the original image (A) into NxN grid cells of equal shape, where
N in our case is 4 shown on the image on the right. Each cell in the grid is responsible for
localizing and predicting the class of the object that it covers, along with the
probability/confidence value.
-Bounding boxes
The next step is to determine the bounding boxes which correspond to rectangles highlighting all
the objects in the image. We can have as many bounding boxes as there are objects within a given
image.
► YOLO determines the attributes of these bounding boxes using a single regression module in
the following format, where Y is the final vector representation for each bounding box.
► Y = [pc, bx, by, bh, bw, c1, c2]
► This is especially important during the training phase of the model.
YOLO- YOU ONLY LOOK ONCE

• Given an image, generate bounding boxes, one for each detectable object in the image
• For each bounding box, output 5 predictions: x, y, w, h, confidence
• Also output the class
x, y — coordinates of the center of the bounding box
w, h — width and height
confidence — probability that the bounding box contains an object
class — classification of the object in the bounding box
YOLO- YOU ONLY LOOK ONCE
Fourier Transform

► Converts an image from the spatial domain to the frequency domain


► Used for correction and filtering of images
► The FFT is applied to grayscale images
► Examining individual frequency bands can help to detect spots, periodic noise, etc. in the image
Discrete FFT

► A sampled version of the Fourier transform
► Computed from the sets of samples which describe the spatial image
► Helps to find periodic patterns in spatial-domain images
► The inverse FFT converts an image from the frequency domain back to the spatial domain
► Separation is based on sine and cosine components:
e^(iθ) = cos θ + i sin θ,  e^(-iθ) = cos θ - i sin θ
► The term in the exponential is called the basis function.
► The inverse FFT reconstructs f(x,y) from these basis functions.
FOURIER TRANSFORM

import cv2
import numpy as np
from matplotlib import pyplot as plt

image = cv2.imread('/content/shapes.jpg', cv2.IMREAD_GRAYSCALE)

f_transform = np.fft.fft2(image)
f_transform_shifted = np.fft.fftshift(f_transform)

magnitude_spectrum = np.log(np.abs(f_transform_shifted) + 1)

plt.subplot(121), plt.imshow(image, cmap='gray')
plt.title('Original Image'), plt.xticks([]), plt.yticks([])

plt.subplot(122), plt.imshow(magnitude_spectrum, cmap='gray')
plt.title('Fourier Transform'), plt.xticks([]), plt.yticks([])

plt.show()
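Since the slides mention the inverse FFT, a short follow-up sketch reconstructs the image from the shifted spectrum computed above:

# Undo the shift, then apply the inverse FFT
f_unshifted = np.fft.ifftshift(f_transform_shifted)
reconstructed = np.abs(np.fft.ifft2(f_unshifted))

plt.imshow(reconstructed, cmap='gray')
plt.title('Reconstructed from inverse FFT')
plt.show()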
IMAGE NOISE & FILTERS

► Noise adds random variations in brightness and color information of an existing image
► Adding noise to images can help in testing the performance of image processing
algorithms such as denoising, segmentation, and feature detection under different
levels of noise.
TYPES OF FILTERS

► LINEAR
Gaussian
Box filter
Weighted Average Filter
► NON-LINEAR
Median Filter
Min filter
Max filter
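A brief sketch contrasting a linear and a non-linear filter from the lists above, assuming a noisy input image at a hypothetical path:

import cv2

img = cv2.imread('/content/noisy.jpg')
box_filtered = cv2.blur(img, (5, 5))        # linear: simple box (average) filter
median_filtered = cv2.medianBlur(img, 5)    # non-linear: each pixel replaced by the window median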
IMAGE SMOOTHING

► Box filter- Uses a box matrix / kernel with equal coefficients

► Weighted average filter- Gives more weight to pixels near the output location

► Gaussian filter- Gets weights using 2D Gaussian function


A Gaussian kernel

import numpy as np

def gaussian_kernel(size, sigma=1.0):
    # Evaluate the 2D Gaussian at each (x, y) offset from the kernel center
    kernel = np.fromfunction(
        lambda x, y: (1 / (2 * np.pi * sigma**2)) * np.exp(-((x - (size - 1) / 2)**2 + (y - (size - 1) / 2)**2) / (2 * sigma**2)),
        (size, size)
    )
    return kernel / np.sum(kernel)   # normalize so the weights sum to 1

kernel_size = 5
sigma = 1.0
gaussian_kernel_matrix = gaussian_kernel(kernel_size, sigma)
print("Gaussian Kernel Matrix:")
for row in gaussian_kernel_matrix:
    print(["{:.5f}".format(value) for value in row])
IMAGE SMOOTHING USING GAUSSIAN
FILTER

► The Gaussian filter is the most popular smoothing filter


► The window weights follow a Gaussian distribution
► The influence of neighboring pixels decreases with distance from the center
► The degree of smoothing depends on the standard deviation
► The larger the standard deviation, the broader the Gaussian distribution
► The normalization factor is important to ensure that the area under the distribution remains
the same
GAUSSIAN NOISE & GAUSSIAN FILTER

import cv2
import matplotlib.pyplot as plt

def apply_gaussian_filter(image, kernel_size=(5, 5), sigma=4):
    return cv2.GaussianBlur(image, kernel_size, sigma)

noisy_image = cv2.imread('/content/noisy.jpg')

filtered_image = apply_gaussian_filter(noisy_image)

# Save the filtered image to a file
cv2.imwrite('path/to/your/output/image_filtered.jpg', filtered_image)

plt.subplot(131), plt.imshow(cv2.cvtColor(noisy_image, cv2.COLOR_BGR2RGB)), plt.title('Original Image')
plt.subplot(133), plt.imshow(cv2.cvtColor(filtered_image, cv2.COLOR_BGR2RGB)), plt.title('Filtered Image')
plt.show()
EDGE DETECTION: CANNY EDGE
DETECTION

► Contours are found by looking for differences between adjacent pixel values


► Regions are found by looking for similarities between adjacent pixel values
► To segment images, we can separate pixels based on gray levels, depth, texture, etc.
► Contours appear where they correspond to edges
► The Canny Edge Detector helps to detect the various edges in an image:
► Step 1: Apply a Gaussian filter (for noise removal)
► Step 2: Find the gradients
► Step 3: Apply non-maximum suppression (keep only the local maxima along the gradient direction)
► Step 4: Binarize the image using double thresholding and hysteresis
NOISE REDUCTION & GRADIENT
CALCULATION

► Edge detection results are highly sensitive to image noise


► A Gaussian filter is applied to smooth the image
► Edges correspond to a change in pixel intensity.
► To detect this, the easiest way is to apply filters that highlight the intensity
change in both directions, horizontal (x) and vertical (y), by convolving
Sobel filters with the image
NOISE REDUCTION & GRADIENT
CALCULATION

► The gradient magnitude and angle are calculated as:
|G| = sqrt(Gx² + Gy²),  θ = arctan(Gy / Gx)
► The result is almost the expected one, but some of the edges are thick
and others are thin.
► The Non-Max Suppression step will help us thin the thick ones.
NON-MAX SUPPRESSION

► The algorithm goes through all the points in the gradient intensity matrix and keeps
the pixels with the maximum value along the edge directions.
► If one of the two neighbouring pixels along the gradient direction is more intense than the
one being processed, then only the more intense one is kept.
► In that case the intensity value of the current pixel (i, j) is set to 0. If no pixel
along the edge direction has a more intense value, the value of the current
pixel is kept.
• Create a matrix initialized to 0 of the same size as the original gradient intensity
matrix;
• Identify the edge direction based on the angle value from the angle matrix;
• Check if the pixel in the same direction has a higher intensity than the pixel that is
currently processed;
• Return the image processed with the non-max suppression algorithm.
• We can still notice some variation in the edges' intensity: some pixels
seem to be brighter than others.
DOUBLE THRESHOLDING

• A high threshold is used to identify the strong pixels (intensity higher than the high
threshold)
• A low threshold is used to identify the non-relevant pixels (intensity lower than the
low threshold)
• All pixels with intensity between the two thresholds are flagged as weak; the
hysteresis mechanism then identifies which of them can be considered
strong and which are non-relevant.
• Hysteresis consists of transforming weak pixels into strong ones, if and only if
at least one of the pixels around the one being processed is a strong one.
CANNY EDGE DETECTOR

import cv2
import numpy as np
from google.colab.patches import cv2_imshow

image = cv2.imread('/content/shapes.jpg', cv2.IMREAD_GRAYSCALE)

blurred_image = cv2.GaussianBlur(image, (5, 5), 0)

canny_edges = cv2.Canny(blurred_image, 30, 150)

cv2_imshow(image)
cv2_imshow(canny_edges)
cv2.waitKey(0)
cv2.destroyAllWindows()
LAPLACIAN OPERATOR

► Edge detection algorithms using the Sobel operator work on the first derivative of an
image.
► Once the image is smoothed, the derivatives Ix and Iy w.r.t. x and y are
calculated by convolving I with the Sobel kernels Kx and Ky, respectively.
► The first derivative of an image can be sensitive to noise
► The Laplacian operator makes use of the second derivative of the image
LAPLACIAN OF GAUSSIAN (LOG)

► We can approximate the second derivatives by using the following convolutional


kernels
► An edge occurs where the graph of the second derivative crosses zero
► Calculating just the Laplacian will result in a lot of noise, so we need to convolve a
Gaussian smoothing filter with the Laplacian filter to reduce noise prior to
computing the second derivatives.
► The LoG kernel is convolved with a grayscale input image to detect the zero
crossings of the second derivative. We set a threshold for these zero crossings
and retain only those zero crossings that exceed the threshold.
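A minimal LoG-style sketch in OpenCV (Gaussian smoothing followed by the Laplacian), assuming a grayscale input:

import cv2

img = cv2.imread('/content/shapes.jpg', cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(img, (3, 3), 0)          # smooth first to suppress noise
log = cv2.Laplacian(blurred, cv2.CV_64F, ksize=3)   # then take the second derivative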
Types of Laplacian operator

► The positive Laplacian uses a standard mask in which the center element is negative,
the corner elements are zero, and the remaining elements are 1.
► The positive Laplacian operator is used to extract outward edges in an image.
► The negative Laplacian operator also has a standard mask, in which the center element
is positive, the corner elements are zero, and the rest of the elements are -1.
► The negative Laplacian operator is used to extract inward edges in an image
► The Laplacian operator achieves a sharpening effect by enhancing the grayscale contrast
of the image. As a second-order differential operator, it enhances areas with sudden
grayscale changes in the image and weakens areas with slow grayscale changes. However, the
processed image loses the direction information of the edges and the noise is enhanced.
CAMERA GEOMETRY

► How does a camera map 3D points onto the image plane under perspective projection?
► A camera projects a 3D scene onto a 2D image plane. This transformation can be
represented by a projection matrix P, which maps a 3D point to a 2D image point
► Determining the external and internal parameters of a camera is called camera calibration
► Estimating linear camera models is easier than estimating non-linear camera models
► Linear models are used to develop camera calibration
and to determine the projection matrix
LINEAR CAMERA MODEL: 3D to 2D
INTRINSIC MATRIX

► Given mx as the number of pixels per mm in the X direction and my as the number of pixels per
mm in the Y direction:
u = mx * xi + ox
v = my * yi + oy
► (ox, oy) is the centre of the image (the principal point)
► fx, fy, ox, oy (with fx = f*mx and fy = f*my, f being the focal length) are known as the intrinsic
parameters of a camera
► Also called the camera's internal geometry
► The corresponding intrinsic (calibration) matrix is
K = [[fx, 0, ox],
     [0, fy, oy],
     [0,  0,  1]]
EXTRINSIC MATRIX

► The position C and orientation R of the camera are the extrinsic parameters


► R is the rotation matrix describing the camera's orientation
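Putting the two together (a standard formulation, stated here as a summary rather than taken from the slide's figure): the full projection matrix is P = K [R | t] with t = -R C, so a homogeneous 3D point X maps to the image point x = P X.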
EXTRACTING THE PARAMETERS
STEREO VISION

► The process of comparing two or more images of the same scene


► Recovers the 3D structure of a scene from 2D images
► Also called binocular vision
► Used in self-driving cars, robotics, etc.
Stereo Vision Acquisition
STEREO VISION

► Images are captured from one integrated stereo vision camera or from two cameras at a time
► Also called binocular vision
► Camera calibration is more accurate with an integrated camera
► With multiple cameras, there should be no movement between the captures
► The orientation of one camera with respect to the other must be known
► Used for depth extraction
DEPTH ESTIMATION

► Take images from two cameras


► Calculate the disparities between the images
► Obtain the disparity maps and depth maps
► This gives the exact depth, or distance, of the object
► Disparity is calculated as the difference between xl and xr
Camera systems at (0,0,0) and (b,0,0)
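With this standard setup (baseline b between the cameras, focal length f), depth follows from similar triangles; the symbols here follow common convention rather than the slide itself:
Depth Z = (f * b) / (xl - xr) = (f * b) / disparity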
DEPTH ESTIMATION

► Disparity is inversely proportional to the depth of a point


► If a point is close to the two-camera system, its disparity will be large
► Disparity shrinks as we move away
► The closer a point (and the larger its disparity), the brighter it is in the disparity map
► Stereo matching refers to finding the disparities
► There is no disparity in the vertical direction
import cv2 as cv
from matplotlib import pyplot as plt

# StereoBM works on single-channel (grayscale) images
imgR = cv.imread('/content/right.png', cv.IMREAD_GRAYSCALE)
imgL = cv.imread('/content/lef2t.jpg', cv.IMREAD_GRAYSCALE)

stereo = cv.StereoBM_create(numDisparities=16, blockSize=15)

disparity = stereo.compute(imgL, imgR)
plt.imshow(disparity, 'gray')
plt.show()
BASIC MORPHOLOGICAL OPERATIONS
ON IMAGES
MORPHOLOGICAL OPERATIONS

• Performed on binarized images


• Each pixel is adjusted based on value in the neighborhood
• Processing done based on kernel which defines the operation
EROSIONS

• Erodes away the boundaries of the foreground object


• Used to diminish the features of an image.
• A kernel (a matrix of odd size: 3, 5, 7, ...) is slid over the image.
• A pixel in the original image (either 1 or 0) is kept as 1 only if all the pixels
under the kernel are 1; otherwise it is eroded (set to 0).
• It therefore decreases the white (foreground) region in the image
DILATION & EROSION

• Dilation expands the boundaries of an object in an image. This is done by convolving the
image with a structuring element, which determines the size and shape of the dilation. The
output of the dilation operation is a new image where the pixels of the original image are
expanded or dilated.
• Erosion is a morphological operation that shrinks the boundaries of an object in an image.
This is done by convolving the image with a structuring element, which determines the
size and shape of the erosion. The output of the erosion operation is a new image where
the pixels of the original image are eroded or shrunk.
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

img = cv2.imread('/content/ip.png', 0)
kernel = np.ones((5, 5), np.uint8)

img_erosion = cv2.erode(img, kernel, iterations=1)
img_dilation = cv2.dilate(img, kernel, iterations=1)

cv2_imshow(img)
cv2_imshow(img_erosion)
cv2_imshow(img_dilation)
cv2.waitKey(0)
OPENING AND CLOSING

• Opening is simply another name for erosion followed by dilation.


• It is useful in removing noise
• Closing is the reverse of opening: dilation followed by erosion.
• It is useful in closing small holes inside the foreground objects, or small black points on
the object.
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

# Reading the input image
img = cv2.imread('/content/k.png', 0)

# Taking a matrix of size 5 as the kernel
kernel = np.ones((5, 5), np.uint8)

opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)

cv2_imshow(img)
cv2_imshow(opening)
cv2.waitKey(0)
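Closing can be obtained with the same call by changing the operation flag (a small addition to the slide's snippet, reusing img and kernel above):

closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)   # dilation followed by erosion
cv2_imshow(closing)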
GRADIENT

► It is the difference between dilation and erosion of an image.


► Helps to find outlines of images
TOP HAT

► Highlights minor details of images


► Input image - Opening
BLACK HAT

► Highlights dark objects or regions on a bright background


► Closing - Image
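A short sketch computing gradient, top hat and black hat with OpenCV's morphologyEx flags, reusing the same kind of input as the earlier morphology slides:

import cv2
import numpy as np
from google.colab.patches import cv2_imshow

img = cv2.imread('/content/k.png', 0)
kernel = np.ones((5, 5), np.uint8)

gradient = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)   # dilation - erosion
tophat = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)       # image - opening
blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)   # closing - image

cv2_imshow(gradient)
cv2_imshow(tophat)
cv2_imshow(blackhat)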
LINE DETECTION: HOUGH TRANSFORM

► Addresses the problem of extraneous and missing data in edge detection


► Given the edge points, we can detect the lines passing through them
► Detects lines of the form y = mx + c
► The image space contains the points, and the parameter space represents the lines passing
through each point
LINE DETECTION: HOUGH TRANSFORM

► Create an accumulator array A


► Set all values A(m,c) = 0
► For each edge point (xi, yi):
Convert the line from the (x,y) plane to the (m,c) plane using c = -m*xi + yi
For every m along this line, increment the accumulator:
A(m,c) = A(m,c) + 1 for all lines passing through these points
At the points of intersection (parameters shared by several edge points), the accumulator
collects values such as 2, 3 and so on
Disadvantages

► If the accumulator array is small, lines might get missed


► If the array is large, it wastes memory
► Solution: use the line equation x sin θ - y cos θ + r = 0
► θ lies between 0 and 180 degrees
► r is finite (the distance of the line from the origin)
► This is a better parameterization
Using line parameters

► Each edge point maps to a sinusoidal curve in the (r, θ) parameter space


► Better parameterization
► Mapping as shown in the figure
► Too large an accumulator array may merge different lines
► Too small an array might lead to lines being missed
► Extract the peaks in the accumulator array
Detecting a circle

► Works on a given set of edge points on a circle


► (x-a)² + (y-b)² = r²
► A point (xi, yi) on the circle maps to a circle
in the Hough (a,b) space
All these circles intersect at the point (a,b), the centre of the original circle
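OpenCV also provides a Hough-based circle detector; a hedged sketch with illustrative parameter values:

import cv2
import numpy as np

img = cv2.imread('/content/shapes.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)

circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                           param1=50, param2=30, minRadius=0, maxRadius=0)
if circles is not None:
    for x, y, r in np.uint16(np.around(circles[0])):
        cv2.circle(img, (x, y), r, (0, 255, 0), 2)   # draw each detected circle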
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

img = cv2.imread('/content/shapes.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150, apertureSize=3)
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)

for r_theta in lines:
    arr = np.array(r_theta[0], dtype=np.float64)
    r, theta = arr
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a * r
    y0 = b * r
    # The slide's snippet was truncated here; the standard continuation extends
    # the line in both directions and draws it
    x1 = int(x0 + 1000 * (-b))
    y1 = int(y0 + 1000 * a)
    x2 = int(x0 - 1000 * (-b))
    y2 = int(y0 - 1000 * a)
    cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)
cv2_imshow(img)
How to solve?

► Consider an image of 100x100


► Select the first edge point (x,y) and vary θ from 0 to 180 degrees
► Compute the corresponding value of r for each θ
► For every (r, θ) pair, increment the accumulator array by 1
► Take the next point and repeat the procedure
► For instance, the blue point (in the figure) will be voted up
► The accumulator cell with the maximum votes indicates a line
Corner detection: Harris & Hessian

► A corner is a point whose local neighborhood is characterized by large intensity


variation in all directions.
► Corners are important features in computer vision because they are points stable over
changes of viewpoint and illumination
► Large variation in gradients at all points of interest
► Intersection of two lines
► Matching of corners is easier than edges
Harris Corner Detector

► Recognize a point by looking through a small window


► Shifting the window in any direction should give a large change in intensity
► There is little change in intensity in a flat region, or along an edge when shifting parallel to the edge
► There is a large change in intensity in all directions at a corner
► The corner response R = det(M) - k(trace M)² is computed from the second-moment matrix M
► When |R| is small, the region is flat.
► When R < 0, the point lies on an edge.
► When R is large (λ1 and λ2 are both large and λ1 ∼ λ2), the region is a corner.
Algorithm

► Apply a Gaussian filter to smooth out any noise


► Apply Sobel operator to find the x and y gradient values for every pixel in the
grayscale image
► For each pixel p in the grayscale image, consider a 3×3 window around it and
compute the corner strength function. Call this its Harris value.
► Find all pixels that exceed a certain threshold and are the local maxima within a
certain window (to prevent redundant dupes of features)
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

image = cv2.imread('/content/hh.png')
operatedImage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
operatedImage = np.float32(operatedImage)

# blockSize=2, Sobel aperture ksize=5, Harris parameter k=0.07
dest = cv2.cornerHarris(operatedImage, 2, 5, 0.07)

# Dilate the response to mark the corners more visibly
dest = cv2.dilate(dest, None)

# Mark strong corners in red
image[dest > 0.01 * dest.max()] = [0, 0, 255]

cv2_imshow(image)
Advantages of Harris Detector

► It finds the pixel-intensity displacement (u, v) such that the function E is
maximized for pixels in the window
► It estimates the second-moment matrix to get a clue about whether a corner lies
inside the window, by looking at the matrix's eigenvalues.
► If one eigenvalue is significantly higher => the derivative with respect to one direction
is much stronger than the other => the pixel lies on an edge
► If both eigenvalues are small => the pixel intensities do not change in any direction =>
the pixel lies in a flat region
► If both eigenvalues are large => the pixel intensities change strongly in
both the x and y directions => the pixel lies on a corner
Hessian Detector

► While the basic ideas of detecting corners remain the same as the Harris detector,
the Hessian detector makes use of the Hessian matrix and determinant, instead of
second-moment matrix M and corner response function R, respectively.
► Entries in Hessian matrix are second derivatives.
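In the standard formulation (summarized here, not reproduced from the slide's figure), the Hessian at a pixel is
H = [[Ixx, Ixy], [Ixy, Iyy]]
and interest points are taken where det(H) = Ixx * Iyy - Ixy² is a local maximum above a threshold.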
Disadvantages

► Once a corner gets magnified by zooming and becomes bigger than the size of the window,
the Harris and Hessian detectors can no longer detect the corner.
► This is because what the detectors perceive through the window is no longer a corner
but an edge, due to the scale change.
Feature Extraction

► This is an area of image processing that uses algorithms to detect and isolate various
desired portions of a digitized image.
► A feature is a significant piece of information extracted from an image which provides
more detailed understanding of the image.
► Examples: detecting faces in an image filled with people and other objects; detecting
facial features such as eyes, nose, and mouth; detecting edges, so that a feature can be
extracted and compared with another
Feature Detection

► Feature detection identifies the presence of a certain type of feature or object in an
image.


► It is usually achieved by studying the statistical variations of certain regions
and their backgrounds to locate unusual activity.
► Once an interesting feature has been detected, the representation of this feature is
compared with all possible features known to the processor.
Need For Feature Descriptors

► Need to recognize objects with unique and descriptive features in the process of object
recognition
► Detection of different feature families:
► Local pixels (SIFT, SURF..)
► Global pixel features (Histogram, Texture, Color)
► Shape of pixel regions(Area, Perimeter)
► Basis sets (FFT, Haar Wavelet)
Characteristics of features

► Salient
► Robust to clutter
► Repeatable
► Fewer and efficient
Local Feature Descriptors

► Detectors and descriptors can be used in combination or independently for local feature
description
► Searching strategies can be pixel-wise or tiled
► Aims to find pieces of objects
► Examples: SIFT, SURF, HOG
Global Feature Descriptors

► Texture Histograms
► Spatial Dependency matrix
► Regional Descriptors
Shape Features

► Area
► Perimeter
► Centroid
Basis set descriptors

► HAAR Wavelets
► Fourier transforms
Basic CV Pipeline

► Sensor processing
► Image Processing
► Global Metrics
► Local features
► Training
► Augmentation & Control
► Performance
SIFT : Feature Detector and Descriptor

► Scale Invariant Feature Transform


► 2D object detection
► Image Alignment
► Detects patches with local appearance
► Detects key interest points
► Handles multiple scales (position, magnification)
► SIFT detector and descriptor
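A minimal SIFT sketch, assuming an OpenCV build (>= 4.4) that includes SIFT:

import cv2
from google.colab.patches import cv2_imshow

img = cv2.imread('/content/shapes.jpg', cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)   # 128-dimensional descriptors

out = cv2.drawKeypoints(img, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2_imshow(out)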