Computer Vision
LEC 1
BY : DR. SHEFALI ARORA CHOUHAN
ASSISTANT PROFESSOR, DEPT. OF CSE
COURSE OUTCOMES
► Understand key features of Computer Vision to analyse and interpret the visible world around us
► Design and implement multi-dimensional signal processing, feature extraction, pattern analysis, visual geometric modelling, and stochastic optimization
► Apply computer vision concepts to biometrics, medical diagnosis, document processing, mining of visual content, surveillance, and advanced rendering
SYLLABUS
What is an Image?
► In monochrome images the minimum value corresponds to black and the maximum to
white.
► The different values the intensity function can take are called gray levels.
► The gray level indicates the brightness of a pixel
► For a normalized intensity function, f(x,y) = 0 corresponds to black and f(x,y) = 1 to white
DIGITIZATION
► Discretization: the function f(x,y) is sampled into an M×N array. Each element of this matrix is called a pixel
► Quantization: the continuous range of f(x,y) is divided into K intervals, and each interval is assigned a value
► Digital images are typically quantized to up to 256 gray levels (see the sketch below)
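A minimal NumPy sketch of gray-level quantization (the image array and number of levels are illustrative assumptions):

    import numpy as np

    def quantize(gray, levels=4):
        """Map a float image in [0, 1] to `levels` evenly spaced gray levels."""
        # Scale to [0, levels), take the interval index, then map back to [0, 1]
        idx = np.clip((gray * levels).astype(int), 0, levels - 1)
        return idx / (levels - 1)

    gray = np.random.rand(4, 4)        # stand-in for a sampled M x N image
    print(quantize(gray, levels=4))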
COLOR QUANTIZATION
► A coloured image can be quantized into three vectors: one each for the red, green, and blue components
► Any colour is a combination of these three primary colours
► There are various colour models
RGB TO GRAYSCALE CONVERSION
► Average method: the simplest approach. Since the image is RGB, add R, G, and B and divide by 3 to obtain the grayscale value.
Grayscale = (R + G + B) / 3
► Weighted method: red has the longest wavelength of the three primary colours, while green has a shorter wavelength and is the colour the eye is most sensitive to. We therefore decrease the contribution of red, increase the contribution of green, and keep the contribution of blue between the two (see the sketch below).
New grayscale value = (0.3 * R) + (0.59 * G) + (0.11 * B)
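A minimal NumPy/OpenCV sketch of both conversions (the file path is the one used elsewhere in these slides; variable names are illustrative):

    import cv2
    import numpy as np

    img = cv2.imread('/content/shapes.jpg')        # OpenCV loads channels as B, G, R
    b, g, r = cv2.split(img.astype(np.float32))

    gray_avg = (r + g + b) / 3                      # average method
    gray_wt  = 0.3 * r + 0.59 * g + 0.11 * b        # weighted method

    gray_avg = gray_avg.astype(np.uint8)
    gray_wt  = gray_wt.astype(np.uint8)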
► import cv2
► from matplotlib import pyplot as plt
► im = cv2.imread('/content/shapes.jpg')
► Color consistency
► Shadows
► Lighting
► Brightness
► Contrast
► b, g, r = cv2.split(im)   # OpenCV stores channels in B, G, R order
► plt.imshow(r, cmap='gray')
► plt.show()
► plt.imshow(g, cmap='gray')
► plt.show()
► plt.imshow(b, cmap='gray')
► plt.show()
HSI MODEL
► Hue
► Saturation
► Intensity
IMAGE PRE-PROCESSING
► import cv2
► import numpy as np
► from google.colab.patches import cv2_imshow
► bgr_img = cv2.imread('/content/shapes.jpg')
► hsv_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV)
► cv2_imshow(hsv_img)
► cv2.waitKey(0)
► cv2.destroyAllWindows()
What will this do?
► colored_negative = abs(255-im_rgb)
► cv2_imshow(colored_negative)
BRIGHTNESS OF IMAGE
► b, g, r = cv2.split(image)
► hist_b = cv2.calcHist([b], [0], None, [256], [0, 256])
► hist_g = cv2.calcHist([g], [0], None, [256], [0, 256])
► hist_r = cv2.calcHist([r], [0], None, [256], [0, 256])
► plt.plot(hist_b, color='b', label='Blue')
► plt.plot(hist_g, color='g', label='Green')
► plt.plot(hist_r, color='r', label='Red')
► plt.title('RGB Histogram')
► plt.xlabel('Pixel Value')
► plt.ylabel('Frequency')
► plt.legend()
► plt.show()
HISTOGRAM EQUALIZATION
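A minimal sketch of histogram equalization with OpenCV (grayscale input assumed; the file path is illustrative):

    import cv2
    from google.colab.patches import cv2_imshow

    gray = cv2.imread('/content/shapes.jpg', cv2.IMREAD_GRAYSCALE)
    equalized = cv2.equalizeHist(gray)   # spreads the gray levels to flatten the histogram
    cv2_imshow(gray)
    cv2_imshow(equalized)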
► An affine transformation is a geometric transformation that preserves lines and parallelism (but not necessarily distances and angles)
► To find the transformation matrix, we need three points from the input image and their corresponding locations in the output image
Common transformations
► img = cv2.imread('/content/2.jpg')
► rows, cols, ch = img.shape
► pts1 = np.float32([[50, 50], [200, 50], [50, 200]])    # three points in the input image
► pts2 = np.float32([[10, 100], [200, 50], [100, 250]])  # their locations in the output image
► M = cv2.getAffineTransform(pts1, pts2)
► dst = cv2.warpAffine(img, M, (cols, rows))
► cv2_imshow(dst)
DIFFERENTIATE BETWEEN TRANSFORMATIONS
► Euclidean
► Affine
► Projective
Affine vs Non-affine
► Affine: includes scaling and translation; parallelism is preserved
► Non-affine: includes projective transformations (also called homographies); parallelism is not preserved
Euclidean transformation
► Consists of rotation and translation; preserves distances, angles, and parallelism
Projective transformation
► 3D scenes are projected to 2D
► The resultant image depends on the camera’s viewpoint
► Ratios and dimensions of objects change
► Angles may not be preserved
► Parallelism is not preserved
► The perspective (projective) transformation is a generalized affine transform
► For an affine transformation the projection vector is equal to 0; thus, an affine transformation can be considered a particular case of a perspective transformation
► Since the transformation matrix M is defined by 8 constants (degrees of freedom), to find this matrix we select 4 points in the input image and map them to the desired locations in the output image according to the use case. This gives 8 equations in 8 unknowns, which can be solved easily (see the sketch below).
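A minimal OpenCV sketch of a perspective transformation (the point coordinates, output size, and file path are illustrative assumptions):

    import cv2
    import numpy as np
    from google.colab.patches import cv2_imshow

    img = cv2.imread('/content/2.jpg')

    # four points in the input image and their desired locations in the output
    pts1 = np.float32([[56, 65], [368, 52], [28, 387], [389, 390]])
    pts2 = np.float32([[0, 0], [300, 0], [0, 300], [300, 300]])

    M = cv2.getPerspectiveTransform(pts1, pts2)    # 3x3 homography, 8 degrees of freedom
    dst = cv2.warpPerspective(img, M, (300, 300))
    cv2_imshow(dst)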
CONVOLUTION AND FILTERING
► In order to detect vertical and horizontal edges, we can convolve the image with suitable filters
► An m×m filter is applied to an n×n image to extract features and detect edges
► Valid convolution gives an output of dimension (n−m+1) × (n−m+1), as in the sketch below
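A minimal OpenCV sketch of filtering by convolution (the kernel shown is a simple vertical-edge filter; the file path is illustrative):

    import cv2
    import numpy as np
    from google.colab.patches import cv2_imshow

    img = cv2.imread('/content/shapes.jpg', cv2.IMREAD_GRAYSCALE)

    # 3x3 vertical-edge kernel: responds to intensity changes in the x direction
    kernel = np.array([[-1, 0, 1],
                       [-1, 0, 1],
                       [-1, 0, 1]], dtype=np.float32)

    edges = cv2.filter2D(img, -1, kernel)   # -1 keeps the source depth
    cv2_imshow(edges)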
Solve
Filters
TYPES OF FILTERS
► Weights are assigned to the filter matrix in order to extract features or detect edges
► For instance, the Sobel filter gives more weight to the central row of pixels compared to the filter used before
Use of different filters helps to add robustness
CONVOLUTIONAL NEURAL NETWORKS
• A CNN typically has three layers: a convolutional layer, a pooling layer,
and a fully connected layer.
• The convolution layer is the core building block of the CNN. It carries the
main portion of the network’s computational load.
• This layer performs a dot product between two matrices, where one
matrix is the set of learnable parameters otherwise known as a kernel,
and the other matrix is the restricted portion of the receptive field.
• Convolving an n×n input with a k×k kernel (e.g., 64 such kernels) gives an (n−k+1) × (n−k+1) feature map.
• Pooling a 4×4 feature map with a 2×2 window gives a 2×2 output.
POOLING LAYER
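A minimal NumPy sketch of the output sizes above: a valid convolution followed by 2×2 max pooling (array sizes are illustrative):

    import numpy as np

    def conv2d_valid(img, kernel):
        # Valid convolution: slide the kernel only where it fits entirely
        n, k = img.shape[0], kernel.shape[0]
        out = np.zeros((n - k + 1, n - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i+k, j:j+k] * kernel)
        return out

    def max_pool2x2(fmap):
        # Take the maximum of each non-overlapping 2x2 block
        h, w = fmap.shape
        return fmap[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).max(axis=(1, 3))

    img = np.random.rand(6, 6)         # n = 6
    kernel = np.random.rand(3, 3)      # k = 3
    fmap = conv2d_valid(img, kernel)   # (6-3+1) x (6-3+1) = 4 x 4
    pooled = max_pool2x2(fmap)         # 2 x 2
    print(fmap.shape, pooled.shape)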
FULLY CONNECTED LAYER
► Neurons in this layer have full connectivity with all neurons in the preceding and succeeding layers, as in a regular feed-forward neural network.
► The output can therefore be computed as a matrix multiplication followed by the addition of a bias.
► The FC layer helps to map the learned representation to the output (see the sketch below).
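A minimal NumPy sketch of a fully connected layer (the layer sizes are illustrative assumptions):

    import numpy as np

    x = np.random.rand(128)       # flattened input features
    W = np.random.rand(10, 128)   # learnable weights (10 outputs)
    b = np.random.rand(10)        # learnable bias

    y = W @ x + b                 # matrix multiplication followed by a bias
    print(y.shape)                # (10,)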
NON-LINEAR ACTIVATION FUNCTIONS
► Sigmoid
► The sigmoid non-linearity has the mathematical form
σ(k) = 1 / (1 + e^(−k))
► It takes a real-valued number and “squashes” it into a range between 0 and 1.
► Tanh
► Tanh squashes a real-valued number to the range [-1, 1]. Like sigmoid, the activation
saturates, but — unlike the sigmoid neurons — its output is zero centered.
► ReLU
► The Rectified Linear Unit (ReLU) has become very popular in recent years. It computes the function f(k) = max(0, k). In other words, the activation is simply thresholded at zero (see the sketch below).
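A minimal NumPy sketch of the three activations:

    import numpy as np

    def sigmoid(k):
        return 1.0 / (1.0 + np.exp(-k))   # squashes to (0, 1)

    def tanh(k):
        return np.tanh(k)                 # squashes to (-1, 1), zero-centered

    def relu(k):
        return np.maximum(0, k)           # thresholds at zero

    k = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(sigmoid(k), tanh(k), relu(k), sep='\n')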
YOLO- YOU ONLY LOOK ONCE - 4/2
- Residual blocks
► This first step starts by dividing the original image (A) into N×N grid cells of equal shape, where N in our case is 4, as shown in the image on the right. Each cell in the grid is responsible for localizing and predicting the class of the object that it covers, along with the probability/confidence value.
-Bounding boxes
The next step is to determine the bounding boxes which correspond to rectangles highlighting all
the objects in the image. We can have as many bounding boxes as there are objects within a given
image.
► YOLO determines the attributes of these bounding boxes using a single regression module in
the following format, where Y is the final vector representation for each bounding box.
► Y = [pc, bx, by, bh, bw, c1, c2]
► This is especially important during the training phase of the model.
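A minimal NumPy sketch of decoding one bounding-box vector in the Y format above (the numeric values and the two-class setup are illustrative assumptions):

    import numpy as np

    # Y = [pc, bx, by, bh, bw, c1, c2]
    Y = np.array([0.9, 0.5, 0.4, 0.3, 0.2, 0.1, 0.8])

    pc = Y[0]                 # confidence that the box contains an object
    bx, by = Y[1], Y[2]       # centre of the bounding box (relative to the cell)
    bh, bw = Y[3], Y[4]       # height and width of the bounding box
    class_probs = Y[5:]       # per-class probabilities (c1, c2)

    if pc > 0.5:
        print('object of class', int(np.argmax(class_probs)), 'at', (bx, by), 'size', (bw, bh))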
YOLO- YOU ONLY LOOK ONCE
• Given an image generate bounding boxes, one for each detectable object in image
• For each bounding box, output 5 predictions: x, y, w, h, confidence.
• Also output class
x, y (coordinates for center of bounding box)
w,h (width and height)
confidence (probability bounding box has object)
class (classification of object in bounding box)
YOLO- YOU ONLY LOOK ONCE
Fourier Transform
► Sampled FFT
► Sets of samples which describe the spatial image
► Helps to find periodic patterns in spatial domain images
► Inverse FFT converts image from frequency to spatial domain
► Separation based on sine and cosine components
► e^(iθ) = cos θ + i sin θ and e^(−iθ) = cos θ − i sin θ
► The exponential term is called the basis function.
Inverse FFT:
FOURIER TRANSFORM
► import cv2
► import numpy as np
► from matplotlib import pyplot as plt
► image = cv2.imread('/content/shapes.jpg', cv2.IMREAD_GRAYSCALE)   # grayscale input assumed
► f_transform = np.fft.fft2(image)
► f_transform_shifted = np.fft.fftshift(f_transform)
► magnitude_spectrum = np.log(np.abs(f_transform_shifted) + 1)
► plt.subplot(121), plt.imshow(image, cmap='gray')
► plt.subplot(122), plt.imshow(magnitude_spectrum, cmap='gray')
► plt.show()
IMAGE NOISE & FILTERS 7/2
► Noise adds random variations in brightness and color information of an existing image
► Adding noise to images can help in testing the performance of image processing
algorithms such as denoising, segmentation, and feature detection under different
levels of noise.
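A minimal NumPy/OpenCV sketch of adding Gaussian and salt-and-pepper noise (noise parameters and the file path are illustrative assumptions):

    import cv2
    import numpy as np

    img = cv2.imread('/content/shapes.jpg', cv2.IMREAD_GRAYSCALE)

    # Gaussian noise: zero-mean random variation added to every pixel
    gauss = np.random.normal(0, 15, img.shape)
    noisy_gauss = np.clip(img.astype(np.float32) + gauss, 0, 255).astype(np.uint8)

    # Salt-and-pepper noise: a small fraction of pixels forced to 0 or 255
    noisy_sp = img.copy()
    mask = np.random.rand(*img.shape)
    noisy_sp[mask < 0.02] = 0
    noisy_sp[mask > 0.98] = 255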
TYPES OF FILTERS
► LINEAR
Gaussian
Box filter
Weighted Average Filter
► NON-LINEAR
Median Filter
Min filter
Max filter
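A minimal OpenCV sketch of one linear and one non-linear filter (kernel sizes and the file path are illustrative assumptions):

    import cv2

    img = cv2.imread('/content/shapes.jpg', cv2.IMREAD_GRAYSCALE)

    box    = cv2.blur(img, (5, 5))             # box (average) filter  - linear
    gauss  = cv2.GaussianBlur(img, (5, 5), 1)  # Gaussian filter       - linear
    median = cv2.medianBlur(img, 5)            # median filter         - non-linear (good for salt-and-pepper noise)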
IMAGE SMOOTHING
► Weighted average filter- Gives more weight to pixels near the output location
► import numpy as np
► def gaussian_kernel(size, sigma=1.0):
►     kernel = np.fromfunction(
►         lambda x, y: (1/(2*np.pi*sigma**2)) * np.exp(-((x-(size-1)/2)**2 + (y-(size-1)/2)**2)/(2*sigma**2)),
►         (size, size)
►     )
►     return kernel / np.sum(kernel)
► kernel_size = 5
► sigma = 1.0
► gaussian_kernel_matrix = gaussian_kernel(kernel_size, sigma)
► print("Gaussian Kernel Matrix:")
► for row in gaussian_kernel_matrix:
►     print(["{:.5f}".format(value) for value in row])
IMAGE SMOOTHING USING GAUSSIAN FILTER
► The result is almost the expected one, but some of the edges are thick and others are thin
► The Non-Max Suppression step will help us thin out the thick ones
NON-MAX SUPPRESSION
► The algorithm goes through all the points of the gradient intensity matrix and finds the pixels with the maximum value along the edge directions.
► If one of the two neighbouring pixels along the edge direction is more intense than the pixel being processed, only the more intense one is kept, and the intensity of the current pixel (i, j) is set to 0.
► If no pixel in the edge direction has a higher intensity, the value of the current pixel is kept.
• Create a matrix initialized to 0 of the same size of the original gradient intensity
matrix;
• Identify the edge direction based on the angle value from the angle matrix;
• Check if the pixel in the same direction has a higher intensity than the pixel that is
currently processed;
• Return the image processed with the non-max suppression algorithm.
• We can still notice some variation regarding the edges’ intensity: some pixels
seem to be brighter than others.
DOUBLE THRESHOLDING
• High threshold is used to identify the strong pixels (intensity higher than the high
threshold)
• Low threshold is used to identify the non-relevant pixels (intensity lower than the
low threshold)
• All pixels having intensity between both thresholds are flagged as weak and the
Hysteresis mechanism will help us identify the ones that could be considered as
strong and the ones that are considered as non-relevant.
• Hysteresis consists of transforming weak pixels into strong ones if and only if at least one of the pixels around the one being processed is a strong one (see the sketch below)
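A minimal NumPy sketch of the double-threshold step (the threshold and marker values are illustrative; `gradient` is assumed to be the non-max-suppressed gradient magnitude):

    import numpy as np

    def double_threshold(gradient, low=50, high=100):
        strong = np.uint8(255)
        weak = np.uint8(75)
        result = np.zeros_like(gradient, dtype=np.uint8)
        result[gradient >= high] = strong                      # strong pixels
        result[(gradient >= low) & (gradient < high)] = weak   # weak pixels, resolved later by hysteresis
        return result, weak, strong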
CANNY EDGE DETECTOR
► import cv2
► import numpy as np
► from google.colab.patches import cv2_imshow
► image = cv2.imread('/content/shapes.jpg', cv2.IMREAD_GRAYSCALE)
► edges = cv2.Canny(image, 100, 200)   # low and high thresholds (illustrative values)
► cv2_imshow(edges)
► Edge detection algorithms using Sobel Operator work on the first derivative of an
image.
► When the image is smoothed, the derivatives Ix and Iy with respect to x and y are calculated by convolving I with the Sobel kernels Kx and Ky, respectively (see the sketch below)
► The first derivative of an image may be subject to noise
► The Laplacian operator makes use of the second derivative of the image
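A minimal OpenCV sketch of both operators (the kernel size and file path are illustrative assumptions):

    import cv2
    from google.colab.patches import cv2_imshow

    img = cv2.imread('/content/shapes.jpg', cv2.IMREAD_GRAYSCALE)

    Ix = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)   # first derivative w.r.t. x
    Iy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)   # first derivative w.r.t. y
    lap = cv2.Laplacian(img, cv2.CV_64F)             # second derivative

    cv2_imshow(cv2.convertScaleAbs(Ix))
    cv2_imshow(cv2.convertScaleAbs(Iy))
    cv2_imshow(cv2.convertScaleAbs(lap))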
LAPLACIAN OF GAUSSIAN (LOG)
► The positive Laplacian operator uses a standard mask in which the centre element is negative and the corner elements are zero.
► The positive Laplacian operator is used to extract outward edges in an image.
► The negative Laplacian operator also has a standard mask, in which the centre element is positive, all corner elements are zero, and the rest of the elements are −1.
► The negative Laplacian operator is used to extract inward edges in an image.
► The Laplacian operator achieves a sharpening effect by enhancing the grayscale contrast of the image. As a second-order differential operator, it enhances areas with sudden grayscale changes and weakens areas with slow grayscale changes. However, the processed image loses the direction information of the edges, and noise is enhanced.
CAMERA GEOMETRY
► How does a camera map 3D points onto the image plane through perspective projection?
► A camera projects a 3D scene onto a 2D image plane. This transformation can be
represented using a projection matrix P, which maps a 3D point to a 2D image point
► Determining external and internal parameters of a camera is called camera calibration
► Estimating linear models is easier than non-linear camera models.
► Used to develop camera calibration
► Determine projection matrix
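A minimal NumPy sketch of projecting a 3D point onto the image plane with a 3×4 projection matrix P (the matrix entries and the point are illustrative assumptions):

    import numpy as np

    # P = K [R | t] : 3x4 projection matrix (illustrative values)
    P = np.array([[800.0,   0.0, 320.0, 0.0],
                  [  0.0, 800.0, 240.0, 0.0],
                  [  0.0,   0.0,   1.0, 0.0]])

    X = np.array([0.2, 0.1, 2.0, 1.0])      # 3D point in homogeneous coordinates
    x = P @ X                               # homogeneous image point
    u, v = x[0] / x[2], x[1] / x[2]         # divide by the third coordinate
    print(u, v)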
LINEAR CAMERA MODEL: 3D to 2D
INTRINSIC MATRIX
► Given mx as number of pixels per mm in the X direction and my as number of pixels per
mm in the Y direction
► u = mx * xi + ox
► v = my * yi + oy
► Consider (ox, oy) to be the centre of the image
► fx, fy, ox, oy are known as the intrinsic parameters of the camera
► Also called the camera’s internal geometry
► The corresponding intrinsic matrix (also called the calibration matrix):
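A minimal NumPy sketch of the calibration matrix K built from the intrinsic parameters above (the numeric values are illustrative assumptions):

    import numpy as np

    fx, fy = 800.0, 800.0   # focal length in pixels (f * mx, f * my)
    ox, oy = 320.0, 240.0   # principal point (image centre)

    K = np.array([[fx, 0.0, ox],
                  [0.0, fy, oy],
                  [0.0, 0.0, 1.0]])
    print(K)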
EXTRINSIC MATRIX
► Images are captured from one integrated stereo-vision camera or from two cameras at a time
► Also called binocular vision
► Camera calibration is more accurate with an integrated camera
► With multiple cameras, there must be no movement between the captures
► The extrinsic parameters give the orientation of one camera with respect to another
► Used for depth extraction
DEPTH ESTIMATION
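A minimal OpenCV sketch of disparity-based depth estimation from a stereo pair (the file names and block-matching parameters are illustrative assumptions):

    import cv2
    from google.colab.patches import cv2_imshow

    imgL = cv2.imread('/content/left.png', cv2.IMREAD_GRAYSCALE)
    imgR = cv2.imread('/content/right.png', cv2.IMREAD_GRAYSCALE)

    # Block matching: disparity is inversely proportional to depth
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(imgL, imgR)
    cv2_imshow(cv2.convertScaleAbs(disparity))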
• Dilation expands the boundaries of an object in an image. This is done by convolving the
image with a structuring element, which determines the size and shape of the dilation. The
output of the dilation operation is a new image where the pixels in the original image are
expanded or dilated.
• Erosion is a morphological operation that shrinks the boundaries of an object in an image.
This is done by convolving the image with a structuring element, which determines the
size and shape of the erosion. The output of the erosion operation is a new image where
the pixels in the original image are eroded or shrunk.
► import cv2
► import numpy as np
► from google.colab.patches import cv2_imshow
► img = cv2.imread('/content/ip.png', 0)
► kernel = np.ones((5, 5), np.uint8)
► img_erosion = cv2.erode(img, kernel, iterations=1)
► img_dilation = cv2.dilate(img, kernel, iterations=1)
► cv2_imshow(img)
► cv2_imshow(img_erosion)
► cv2_imshow(img_dilation)
► cv2.waitKey(0)
OPENING AND CLOSING
► cv2_imshow(image)
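A minimal OpenCV sketch of opening and closing, reusing the structuring element above (the kernel size and file path are illustrative assumptions):

    import cv2
    import numpy as np
    from google.colab.patches import cv2_imshow

    image = cv2.imread('/content/ip.png', 0)
    kernel = np.ones((5, 5), np.uint8)

    opening = cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)    # erosion then dilation: removes small specks
    closing = cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel)   # dilation then erosion: fills small holes
    cv2_imshow(opening)
    cv2_imshow(closing)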
Advantages of Harris Detector
► While the basic ideas of detecting corners remain the same as the Harris detector,
the Hessian detector makes use of the Hessian matrix and determinant, instead of
second-moment matrix M and corner response function R, respectively.
► Entries in Hessian matrix are second derivatives.
Disadvantages
► Once a corner gets magnified and becomes bigger than the size of the window by
zooming, the Harris and Hessian can no longer detect the corner.
► It is because what the detectors perceive through the window is not a corner
anymore but an edge due to the scale change.
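A minimal OpenCV sketch of the Harris corner detector (the parameter values and file path are illustrative assumptions):

    import cv2
    import numpy as np
    from google.colab.patches import cv2_imshow

    img = cv2.imread('/content/shapes.jpg')
    gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

    # blockSize = neighbourhood size, ksize = Sobel aperture, k = Harris free parameter
    R = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

    img[R > 0.01 * R.max()] = [0, 0, 255]   # mark strong corner responses in red
    cv2_imshow(img)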
Feature Extraction
► This is an area of image processing that uses algorithms to detect and isolate various
desired portions of a digitized image.
► A feature is a significant piece of information extracted from an image which provides
more detailed understanding of the image.
► Example, Detecting of faces in an image filled with people and other objects, Detecting of
facial features such as eyes, nose, mouth, Detecting of edges, so that a feature can be
extracted and compared with another
Feature Detection
► Need to recognize objects with unique and descriptive features in the process of object
recognition
► Detection of different feature families:
► Local pixel features (SIFT, SURF, ...)
► Global pixel features (Histogram, Texture, Color)
► Shape of pixel regions (Area, Perimeter)
► Basis sets (FFT, Haar Wavelet)
Characteristics of features
► Salient
► Robust to clutter
► Repeatable
► Few in number and efficient to compute
Local Feature Descriptors
► Detectors and descriptors can be used combined or independently for local feature
descriptions
► Searching strategies can be pixel-wise or tiled
► Aims to find pieces of objects
► SIFT, SURF, HOG
Global Feature Descriptors
► Texture Histograms
► Spatial Dependency matrix
► Regional Descriptors
Shape Features
► Area
► Perimeter
► Centroid
Basis set descriptors
► HAAR Wavelets
► Fourier transforms
Basic CV Pipeline
► Sensor processing
► Image Processing
► Global Metrics
► Local features
► Training
► Augmentation & Control
► Performance
SIFT : Feature Detector and Descriptor
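A minimal OpenCV sketch of SIFT keypoint detection and description (requires a recent OpenCV build with SIFT included; the file path is illustrative):

    import cv2
    from google.colab.patches import cv2_imshow

    img = cv2.imread('/content/shapes.jpg')
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    sift = cv2.SIFT_create()                                    # detector + descriptor
    keypoints, descriptors = sift.detectAndCompute(gray, None)  # descriptors: N x 128

    out = cv2.drawKeypoints(img, keypoints, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
    cv2_imshow(out)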