Chapter 1: Visual Perception and Imaging Fundamentals
a. Discuss the human visual system's modulation transfer function (MTF) and its implications for computer
vision systems. How does understanding the human visual system inform the design of computer vision
algorithms?
b. Analyze the process of image degradation and restoration, focusing on various noise models and the
application of spatial, inverse and Wiener filtering techniques. Provide examples of scenarios where each
filtering method is most effective.
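As a concrete reference for part (b), below is a minimal NumPy sketch of the degradation/restoration pipeline: blur plus additive Gaussian noise, followed by inverse and Wiener filtering in the frequency domain. The image, kernel, and noise variance are illustrative assumptions, not values from the question.

import numpy as np

# Illustrative image and blur kernel (assumed values, for demonstration only).
rng = np.random.default_rng(0)
img = rng.uniform(0, 1, (64, 64))
h = np.zeros((64, 64)); h[:3, :3] = 1.0 / 9.0    # 3x3 box blur, zero-padded
H = np.fft.fft2(h)

# Degradation model: g = h * f + n (convolution plus additive Gaussian noise).
sigma2 = 0.01
g = np.real(np.fft.ifft2(np.fft.fft2(img) * H)) + rng.normal(0, np.sqrt(sigma2), img.shape)

# Inverse filter: unstable where |H| is small (amplifies noise).
F_inv = np.fft.fft2(g) / (H + 1e-12)

# Wiener filter: F_hat = conj(H) / (|H|^2 + K) * G, K = noise-to-signal power ratio.
K = sigma2 / np.var(img)
F_wiener = np.conj(H) / (np.abs(H) ** 2 + K) * np.fft.fft2(g)

print("MSE inverse:", np.mean((np.real(np.fft.ifft2(F_inv)) - img) ** 2))
print("MSE Wiener :", np.mean((np.real(np.fft.ifft2(F_wiener)) - img) ** 2))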
c. An 8-bit grayscale image of size 512x512 pixels is sampled and quantized. The image is
originally represented with a sampling rate of 200 pixels per inch (ppi) and each pixel value can
range from 0 to 255.
i. Sampling:
1. Calculate the spatial resolution of the image in terms of inches
(i.e., the dimensions of the image in inches).
2. If the sampling rate is reduced to 50 ppi, what will be the new
dimensions of the image in pixels?
ii. Quantization:
1. Given that the image is originally 8-bit, how many distinct
grayscale levels are available?
2. If the image is quantized to 2 bits, how many distinct grayscale
levels are now available?
3. Calculate the Mean Squared Error (MSE) introduced by reducing
the quantization from 8 bits to 2 bits assuming the original pixel
values are uniformly distributed.
iii. File Size:
1. Calculate the file size of the original 8-bit image in bytes.
2. Calculate the file size of the quantized 2-bit image in bytes.
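A short Python check for part (c), using the standard relations: dimensions = pixels / ppi, levels = 2^bits, uniform-quantization MSE = Delta^2 / 12, and file size = pixels x bits / 8 (image data only, no header).

# Worked check for question 1(c).
W = H = 512          # pixels
ppi = 200            # original sampling rate
bits_orig, bits_new = 8, 2

# i. Sampling
size_in = W / ppi                    # 512 / 200 = 2.56 inches per side
new_pixels = size_in * 50            # resample the same 2.56" at 50 ppi -> 128 pixels

# ii. Quantization
levels_orig = 2 ** bits_orig         # 256
levels_new = 2 ** bits_new           # 4
step = 256 / levels_new              # quantization step Delta = 64
mse = step ** 2 / 12                 # uniform quantization error model: Delta^2 / 12 ~= 341.33

# iii. File size
bytes_orig = W * H * bits_orig // 8  # 262144 bytes (256 KB)
bytes_new = W * H * bits_new // 8    # 65536 bytes (64 KB)

print(size_in, new_pixels, levels_orig, levels_new, round(mse, 2), bytes_orig, bytes_new)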
Chapter 2: Low-Level and Intermediate-Level Vision
a. Explain the morphological operations (e.g., dilation, erosion) and their role in boundary extraction and
hole filling. Provide examples of practical applications where these operations are used.
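For part (a), a small sketch using SciPy's morphology routines on an arbitrary binary mask (an assumed example), showing boundary extraction (A minus its erosion) and hole filling:

import numpy as np
from scipy import ndimage

# Arbitrary binary mask with a one-pixel hole, for demonstration.
A = np.zeros((7, 7), dtype=bool)
A[1:6, 1:6] = True
A[3, 3] = False

se = np.ones((3, 3), dtype=bool)       # 3x3 structuring element

dilated = ndimage.binary_dilation(A, structure=se)
eroded = ndimage.binary_erosion(A, structure=se)

boundary = A & ~eroded                 # boundary extraction: A \ (A eroded by B)
filled = ndimage.binary_fill_holes(A)  # hole filling

print(boundary.astype(int))
print(filled.astype(int))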
b. Examine the scale-invariant feature transform (SIFT) algorithm. Describe its key stages and explain
how it achieves invariance to image scale and rotation. Discuss its strengths and limitations compared to
other feature detection algorithms.
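For part (b), a minimal OpenCV sketch of running SIFT end to end (SIFT is included in opencv-python 4.4+); "image.png" is a placeholder filename:

import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)   # placeholder path

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each keypoint carries a location, scale, and orientation -- the quantities
# SIFT normalizes for to achieve scale and rotation invariance.
for kp in keypoints[:3]:
    print(kp.pt, kp.size, kp.angle)
print("descriptor shape:", descriptors.shape)         # (num_keypoints, 128)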
c. What do you understand by thresholding in the context of image processing?
Consider a grayscale image with the following pixel intensity distribution:
Intensity level    Number of pixels
0                  80
1                  20
2                  10
3                  50
4                  10
Using Otsu's binarization method, calculate the optimal threshold that maximizes
the between-class variance.
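A direct NumPy computation of Otsu's criterion for this histogram: for each candidate threshold t, split the levels into classes {0..t} and {t+1..4} and evaluate the between-class variance sigma_B^2 = w0 * w1 * (mu0 - mu1)^2.

import numpy as np

# Histogram from the question: intensity levels 0..4.
hist = np.array([80, 20, 10, 50, 10], dtype=float)
levels = np.arange(len(hist))
N = hist.sum()

best_t, best_var = None, -1.0
for t in range(len(hist) - 1):          # class 0 = {0..t}, class 1 = {t+1..4}
    w0 = hist[: t + 1].sum() / N
    w1 = 1.0 - w0
    if w0 == 0 or w1 == 0:
        continue
    mu0 = (levels[: t + 1] * hist[: t + 1]).sum() / hist[: t + 1].sum()
    mu1 = (levels[t + 1:] * hist[t + 1:]).sum() / hist[t + 1:].sum()
    var_b = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
    if var_b > best_var:
        best_t, best_var = t, var_b
    print(f"t={t}: sigma_B^2 = {var_b:.4f}")

print("optimal threshold:", best_t)     # t = 1 maximizes sigma_B^2 (~1.90)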
d. Given the following 5x5 image matrix, apply the Sobel operator to detect edges at the center pixel. Use
the standard Sobel kernels:
Gx = [ -1  0  +1 ;  -2  0  +2 ;  -1  0  +1 ]
Gy = [ -1 -2  -1 ;   0  0   0 ;  +1 +2  +1 ]
Calculate the gradient magnitude at the center pixel.
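Since the question's 5x5 matrix is not reproduced above, the sketch below uses a clearly labeled hypothetical matrix and shows the mechanics: correlate the 3x3 neighborhood of the center pixel with Gx and Gy, then take the magnitude sqrt(gx^2 + gy^2).

import numpy as np

# Standard Sobel kernels (horizontal and vertical gradients).
Kx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
Ky = np.array([[-1, -2, -1],
               [ 0,  0,  0],
               [ 1,  2,  1]], dtype=float)

# Hypothetical 5x5 image (the question's actual matrix is not reproduced here).
img = np.array([[10, 10, 10, 10, 10],
                [10, 10, 10, 10, 10],
                [10, 10, 50, 80, 80],
                [10, 10, 80, 80, 80],
                [10, 10, 80, 80, 80]], dtype=float)

patch = img[1:4, 1:4]        # 3x3 neighborhood of the center pixel (2, 2)
gx = np.sum(Kx * patch)      # correlation: element-wise product and sum
gy = np.sum(Ky * patch)
magnitude = np.hypot(gx, gy) # sqrt(gx^2 + gy^2)
print(gx, gy, magnitude)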
Chapter 3: Camera Models and Camera Calibration
a. Provide a detailed explanation of the thin lens equation and its significance in understanding the
concepts of focus and depth of field in digital image processing. How do these concepts impact
image quality and camera calibration?
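A small worked example of the thin lens equation 1/f = 1/z_o + 1/z_i, with an assumed focal length and object distance (illustrative values only):

# Thin lens equation: 1/f = 1/z_o + 1/z_i.
# Assumed values for illustration: f = 50 mm lens, object at 2 m.
f = 50.0            # focal length in mm
z_o = 2000.0        # object distance in mm

z_i = 1.0 / (1.0 / f - 1.0 / z_o)       # image distance behind the lens
print(f"image distance: {z_i:.2f} mm")  # ~51.28 mm; objects nearer/farther focus at
                                        # other depths, the origin of finite depth of field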
b. Discuss the methods used for camera calibration, including the use of calibration patterns and the
estimation of camera parameters from the projection matrix. Analyze the challenges involved in achieving
accurate calibration and ways to address them.
Chapter 4: Multiple View Geometry and Stereo Correspondence
a. Explain the process of triangulation in multiple view vision systems. How does epipolar geometry
facilitate the determination of 3D point positions from multiple images? Provide a detailed example to
support your explanation.
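For part (a), a minimal linear (DLT) triangulation sketch: each view's projection matrix contributes two linear constraints on the homogeneous 3D point, solved via SVD. The camera matrices and point below are illustrative assumptions.

import numpy as np

def triangulate(P1, P2, x1, x2):
    # Linear (DLT) triangulation: each view contributes two rows of A X = 0.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                # dehomogenize

# Illustrative rig: identity camera and a second camera translated along x.
K = np.eye(3)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0, 1.0])
x1 = P1 @ X_true; x1 = x1[:2] / x1[2]
x2 = P2 @ X_true; x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))     # recovers ~[0.5, 0.2, 4.0]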
b. Analyze the role of the essential and fundamental matrices in stereo vision. Compare the eight-point
algorithm with other methods for estimating these matrices, discussing their relative advantages and
disadvantages.
Chapter 5: Motion Analysis
a. Discuss the computation and application of optical flow fields in motion analysis. Provide a detailed
explanation of at least two optical flow estimation techniques, highlighting their differences and use
cases.
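As one concrete technique for part (a), a minimal NumPy sketch of a single Lucas-Kanade step: build the 2x2 normal equations from spatial and temporal gradients over a window and solve for (u, v). The two frames are synthetic, generated analytically so the true flow (0.5, 0) is known.

import numpy as np

# Smooth synthetic image and a copy translated 0.5 px to the right.
y, x = np.mgrid[0:32, 0:32].astype(float)
f = lambda x, y: np.sin(0.3 * x) * np.cos(0.2 * y)
I1 = f(x, y)
I2 = f(x - 0.5, y)

# Gradients: central differences in space, frame difference in time.
Iy, Ix = np.gradient(I1)
It = I2 - I1

# Lucas-Kanade normal equations over one window:
# [sum Ix^2   sum IxIy] [u]    [sum IxIt]
# [sum IxIy   sum Iy^2] [v] = -[sum IyIt]
w = (slice(8, 24), slice(8, 24))
A = np.array([[np.sum(Ix[w] ** 2),    np.sum(Ix[w] * Iy[w])],
              [np.sum(Ix[w] * Iy[w]), np.sum(Iy[w] ** 2)]])
b = -np.array([np.sum(Ix[w] * It[w]), np.sum(Iy[w] * It[w])])
u, v = np.linalg.solve(A, b)
print(f"estimated flow: u={u:.2f}, v={v:.2f}")   # close to (0.50, 0.00)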
b. Evaluate the Kanade-Lucas-Tomasi (KLT) tracker in the context of motion tracking. How does the KLT
tracker compare with other motion tracking methods such as those based on Kalman filters? Discuss
scenarios where KLT would be preferred over other methods.
c. Consider a one-dimensional discrete-time linear system described by the following state-space
model:
x_k = A x_{k-1} + B u_k + w_k,   w_k ~ N(0, Q)   (process model)
z_k = H x_k + v_k,               v_k ~ N(0, R)   (measurement model)
Given:
● A=2; B=1; H=3; Q=0.2; R=0.1
● Initial state estimate x̂_0 = 0
● Initial estimation error covariance P_0 = 2
● Control inputs u = [2, 3, 4]
● Measurements z = [0.2, 4.4, 5.3]
Calculate the state estimates x̂_k and the estimation error covariances P_k for
k = 1, 2, 3 (one predict/update step per control input and measurement pair).
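A direct implementation of the predict/update recursion with the question's numbers:

# 1-D Kalman filter with the question's parameters.
A, B, H, Q, R = 2.0, 1.0, 3.0, 0.2, 0.1
x, P = 0.0, 2.0                    # initial estimate x̂_0 and covariance P_0
u = [2.0, 3.0, 4.0]                # control inputs u_1..u_3
z = [0.2, 4.4, 5.3]                # measurements z_1..z_3

for k in range(3):
    # Predict
    x_pred = A * x + B * u[k]
    P_pred = A * P * A + Q
    # Update
    K = P_pred * H / (H * P_pred * H + R)    # Kalman gain
    x = x_pred + K * (z[k] - H * x_pred)
    P = (1.0 - K * H) * P_pred
    print(f"k={k + 1}: x̂ = {x:.4f}, P = {P:.4f}")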
Chapter 6: Image Classification and Object Detection
a. A CNN model has the following architecture:
● Convolutional layer with 64 filters of size 3x3, stride 2, and padding 2
● Max pooling layer with size 2x2 and stride 1
● Convolutional layer with 128 filters of size 2x2, stride 2, and padding 1
● Max pooling layer with size 2x2 and stride 2
● Fully connected layer with 256 neurons
● Output layer with 10 neurons (for classification)
Given an input image of size 28x28x1, calculate:
● the output dimensions after each layer;
● the total number of trainable parameters.
Show all steps of your calculation.
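A short script for part (a) applying the standard formulas: out = floor((n + 2p - k)/s) + 1 for the spatial size and (k*k*C_in + 1)*C_out for convolution parameters; pooling layers contribute no trainable parameters.

# Layer-by-layer size and parameter count for question 6(a).
def conv_out(n, k, s, p):
    return (n + 2 * p - k) // s + 1    # out = floor((n + 2p - k) / s) + 1

c = 1; params = 0

# Conv1: 64 filters 3x3, stride 2, padding 2
h = w = conv_out(28, 3, 2, 2)          # 15 -> 15x15x64
params += (3 * 3 * c + 1) * 64; c = 64 # weights + biases = 640

# MaxPool 2x2, stride 1 (no parameters)
h = w = conv_out(h, 2, 1, 0)           # 14 -> 14x14x64

# Conv2: 128 filters 2x2, stride 2, padding 1
h = w = conv_out(h, 2, 2, 1)           # 8 -> 8x8x128
params += (2 * 2 * c + 1) * 128; c = 128   # 32896

# MaxPool 2x2, stride 2 (no parameters)
h = w = conv_out(h, 2, 2, 0)           # 4 -> 4x4x128

# Fully connected 256, then output 10
flat = h * w * c                       # 4 * 4 * 128 = 2048
params += (flat + 1) * 256             # 524544
params += (256 + 1) * 10               # 2570

print("total trainable parameters:", params)   # 560650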
b. Explain the architecture of a typical CNN and describe its major components,
e.g. the convolution layer, pooling layer, and classification (fully connected) layer.
c. What is transfer learning? Explain at least two transfer learning techniques in computer
vision.
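Two common techniques for part (c), sketched with torchvision; the ImageNet-pretrained ResNet-18 and the 10-class head are assumed examples, not part of the question.

import torch.nn as nn
from torchvision import models

# Technique 1: feature extraction -- freeze the pretrained backbone,
# train only a new classification head.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)   # new head for an assumed 10-class task

# Technique 2: fine-tuning -- additionally unfreeze the last residual block
# and train it together with the head, typically at a small learning rate.
for p in model.layer4.parameters():
    p.requires_grad = True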