IP Notes Unit 2,3,4,5
📌 Key Components:
• Cornea: Transparent layer that first bends (refracts) light entering the eye.
• Pupil: The black circular opening that adjusts in size to control the amount of light.
• Iris: The colored part of the eye; it adjusts the pupil size based on brightness.
• Lens: Focuses light rays onto the retina by changing its shape (accommodation).
• Retina: The innermost layer with light-sensitive cells—this is where the image is actually
formed.
🧠 Photoreceptors:
◦ Rods: detect brightness and work in dim light.
◦ Cones: detect color; there are three types, each sensitive to a different range of wavelengths.
How we see (path of light):
1. Light enters through the cornea and pupil and is focused by the lens onto the retina.
2. It hits the retina, where rods and cones convert it to electrical signals.
3. These signals travel via the optic nerve to the visual cortex in the brain.
📍 Fun fact: The image formed on the retina is actually inverted, but the brain flips it for correct
perception!
Our brain interprets different wavelengths of light as different colors. This is thanks to cone cells in
our retina.
• The trichromatic theory proposes that the human eye has three types of cone cells, each sensitive to a different
range of wavelengths:
◦ S-cones: short wavelengths (blue)
◦ M-cones: medium wavelengths (green)
◦ L-cones: long wavelengths (red)
🧠 Our brain combines signals from these three cones to perceive all colors. For example, when the red- and green-sensitive cones are stimulated together, we perceive yellow.
This is the portion of the electromagnetic spectrum that the human eye can detect:
Color     Wavelength Range
Violet    ~380–450 nm
Blue      ~450–495 nm
Green     ~495–570 nm
Yellow    ~570–590 nm
Orange    ~590–620 nm
Red       ~620–750 nm
📍 Interesting fact: We don’t see “pure” colors all the time—our perception is often a mix of
multiple wavelengths!
Alternative Color Models – HSV and CMYK
HSV is more intuitive for humans than RGB because it separates color (hue) from brightness
(value) and vividness (saturation).
📌 Components:
• Hue (H):
▪ 0° = Red
▪ 120° = Green
▪ 240° = Blue
• Saturation (S): How vivid or pure the color is (0% = washed out/gray, 100% = fully vivid).
• Value (V): How bright the color is (0% = black, 100% = full brightness).
🖨 CMYK (Cyan, Magenta, Yellow, Black)
📌 Components:
• Cyan (C)
• Magenta (M)
• Yellow (Y)
🧾 In subtractive mixing:
• Mixing all CMY ideally makes black, but in reality, it makes a muddy brown, so Black (K)
is added for depth and contrast.
🖨 Use Cases:
Model   Use Case   Type          Notes
RGB     Screens    Additive      Light-based model
HSV     Design     Additive      More intuitive for human color selection
CMYK    Printing   Subtractive   Ink-based model
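To make the comparison concrete, here is a minimal sketch (file name assumed) converting an image between these color models with OpenCV:

import cv2

img = cv2.imread('image.jpg')                # OpenCV loads images in BGR order

# BGR -> HSV: for 8-bit images H lies in [0, 179], S and V in [0, 255]
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)

# For ideal inks, CMY is simply the complement of RGB
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
cmy = 255 - rgb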
Illumination Models – Ambient, Diffuse, Specular
These models describe how light interacts with surfaces, which is super important in computer
graphics and 3D rendering to make scenes look realistic.
💡 1. Ambient Illumination
• Constant background light that reaches every surface equally, with no particular direction or source.
📌 Features:
• Independent of surface orientation and viewer position; gives every object a uniform base brightness.
✅ Use Case:
Provides base lighting so that parts of the scene not directly lit are still visible.
💡 2. Diffuse Illumination
• This is light that hits a surface and scatters evenly in all directions.
✅ Use Case:
Creates matte-looking surfaces and helps reveal the shape and depth of an object.
💡 3. Specular Illumination
• This simulates shiny highlights or reflections on glossy surfaces.
📌 Features:
• Based on the angle between the viewer and the reflected light.
✅ Use Case:
Used for metallic, glassy, or wet surfaces. Adds realism and detail.
✨ Visual Analogy:
🖼 Halftones
📌 What is it?
Halftoning is a technique that simulates shades of gray (or color) using dots of different sizes or
spacing.
◦ Small dots = Lighter areas
◦ Large dots = Darker areas
• Works well because printers typically print using only one ink color per channel (e.g.,
black in grayscale, CMYK in color).
🔍 Example:
Up close, a halftone photo is just a bunch of dots. From a distance, the eye blends them into
smooth shades.
🎛 Dithering
📌 What is it?
Dithering is a technique that creates the illusion of color depth by arranging pixels of different
colors.
• Instead of varying the size of dots (like halftoning), it spreads colors around to simulate
intermediate shades.
🧠 How it works:
• If a device supports only 2 colors (e.g., black & white), you can still simulate gray by
mixing the two in a certain pattern.
🔧 Types of Dithering:
• Floyd–Steinberg Dithering: Error diffusion method; spreads the error of a pixel to its
neighbors (see the sketch below).
• Ordered Dithering: Uses a fixed pattern to determine how pixels are displayed.
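As referenced above, a minimal Floyd–Steinberg sketch (pure NumPy, function name assumed) that spreads each pixel's quantization error to its not-yet-visited neighbors:

import numpy as np

def floyd_steinberg(gray):
    # Binarize a grayscale image while diffusing the quantization error
    img = gray.astype(np.float32).copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 255.0 if old > 127 else 0.0
            img[y, x] = new
            err = old - new
            # Classic 7/16, 3/16, 5/16, 1/16 error-diffusion weights
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h and x > 0:
                img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:
                img[y + 1, x] += err * 5 / 16
            if y + 1 < h and x + 1 < w:
                img[y + 1, x + 1] += err * 1 / 16
    return img.astype(np.uint8)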
🆚 Halftone vs Dithering:
Aspect       Halftoning                   Dithering
Uses dots    Yes, of varying size         No, uses pixels in patterns
Common in    Printing                     Digital media & low-bit graphics
Simulates    Continuous tones in print    More colors with a limited palette
UNIT-3
Histogram Operations
A histogram is a graphical representation of the distribution of pixel intensities (brightness) in an
image. It shows how many pixels in an image have a particular intensity value. In an 8-bit
grayscale image, pixel intensities range from 0 (black) to 255 (white).
1. Histogram Equalization
What is it?
Histogram Equalization is a method to enhance the contrast of an image. The main goal is to
spread out the pixel intensity values more evenly across the entire range, resulting in an image with
better contrast.
• X-axis: Represents pixel intensity values (0 for black, 255 for white in an 8-bit grayscale
image).
• Y-axis: Represents the number of pixels at each intensity value.
How it works:
1. Calculate the histogram of the image: It shows the distribution of pixel values.
2. Compute the cumulative distribution function (CDF) from the histogram. The CDF is
essentially the cumulative sum of the histogram values.
3. Normalize the CDF: This step rescales the cumulative values so that they span the entire
intensity range, typically from 0 to 255.
4. Map each pixel's intensity: The pixel values in the original image are mapped to new
values using the normalized CDF. The result is an image where pixel intensities are more
evenly distributed.
• If an image is too dark or too bright (e.g., there are very few dark or bright pixels),
histogram equalization can stretch the pixel values to cover a broader range, improving the
overall visibility.
• Contrast adjustment: You can stretch or shrink the histogram to enhance image contrast.
• Image equalization: The process of adjusting the image to make the histogram flat,
spreading pixel values across the entire range.
Effect:
• The image will appear with enhanced contrast, especially in low-contrast regions.
2. Histogram Stretching
What is it?
Histogram stretching (also called contrast stretching) linearly rescales the pixel intensities of an image so that they span the full available range.
How it works:
1. Find the minimum and maximum pixel values present in the image.
2. Stretch the pixel values: The minimum pixel value is mapped to 0, and the maximum is
mapped to 255.
3. Every pixel intensity in the image is then linearly rescaled to fit the new intensity range.
• This technique is useful for improving the visibility of an image that might have very
narrow pixel intensity ranges, such as when an image is underexposed.
Effect:
• The pixel intensities are stretched across the full range (0–255), resulting in improved
contrast for the image.
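A minimal sketch of histogram stretching (file name assumed); cv2.normalize gives the same result in one call:

import cv2
import numpy as np

image = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)

# Map [min, max] of the image linearly onto [0, 255]
min_val, max_val = image.min(), image.max()
stretched = ((image - min_val) / (max_val - min_val) * 255).astype(np.uint8)

# Equivalent one-liner
stretched_cv = cv2.normalize(image, None, 0, 255, cv2.NORM_MINMAX)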
What is it?
Histogram Specification (or Histogram Matching) is a technique used to modify the histogram of
an image so that it matches a given reference histogram.
How it works:
1. Compute the histogram (and CDF) of the source image.
2. Compute the histogram (and CDF) of the reference (target) image.
3. Use the reference histogram to modify the pixel values of the source image, ensuring that
the resulting image has a similar intensity distribution to the target.
• When you need to adjust an image to match the brightness and contrast of a desired
reference image (e.g., matching an image to a standard look).
Effect:
• The image will adopt a histogram similar to the reference image, which may involve both
contrast and brightness adjustments.
• Image Segmentation: Better contrast can make it easier to segment objects or regions of
interest in an image.
• Visual Consistency: Matching histograms can be used to make images visually consistent
in brightness and contrast.
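Histogram specification itself is easiest with scikit-image's match_histograms; the library choice and file names below are assumptions, not part of the original notes:

import cv2
from skimage.exposure import match_histograms

source = cv2.imread('source.jpg', cv2.IMREAD_GRAYSCALE)
reference = cv2.imread('reference.jpg', cv2.IMREAD_GRAYSCALE)

# Remap source intensities so its histogram follows the reference image
matched = match_histograms(source, reference)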
Here’s an example of how you might perform histogram equalization using OpenCV (a popular
computer vision library):
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load the image in grayscale (replace with the actual path to your image)
image = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)

# Apply histogram equalization
equalized_image = cv2.equalizeHist(image)
# Original Histogram
plt.subplot(1, 2, 1)
plt.hist(image.ravel(), bins=256, color='black')
plt.title('Original Histogram')
# Equalized Histogram
plt.subplot(1, 2, 2)
plt.hist(equalized_image.ravel(), bins=256, color='black')
plt.title('Equalized Histogram')
plt.show()
# Save the equalized image
cv2.imwrite('equalized_image.jpg', equalized_image)
Explanation:
• plt.hist(): Displays the histogram for both the original and equalized images.
Conclusion
Histogram operations such as equalization, stretching, and specification reshape an image's intensity distribution to improve contrast or match a desired appearance.
Color Operations
Color operations refer to modifying or enhancing the color properties of an image, including
adjusting the brightness, contrast, or color balance. These operations work with different color
spaces or models (like RGB, HSV, etc.) and apply mathematical functions to modify pixel values.
1. Brightness Adjustment
What is it?
Brightness adjustment changes the overall lightness of an image by uniformly increasing or decreasing all pixel intensity values.
How it works:
• To adjust the brightness, you add or subtract a constant value to all pixel intensity values in
an image.
• If the pixel values are in the range [0, 255], adding a positive value will increase the
brightness, while subtracting a value will decrease it.
Mathematical Representation:
g(x, y) = f(x, y) + c, where c > 0 brightens and c < 0 darkens the image (results are clipped to [0, 255]).
• Increase brightness: When an image appears too dark or lacks detail in dark areas,
increasing the brightness can make the details more visible.
• Decrease brightness: When an image is too bright or washed out, reducing the brightness
can help enhance details.
Effect:
• Increased brightness results in an image that appears lighter and more illuminated.
• Decreased brightness results in an image that appears darker.
2. Contrast Adjustment
What is it?
Contrast adjustment involves modifying the difference between the darkest and lightest areas of
the image. Increasing the contrast makes the image appear more vibrant and detailed, while
decreasing the contrast results in a more muted image.
How it works:
• Contrast can be adjusted by stretching or compressing the pixel intensity values. The goal
is to either expand the intensity range to improve visual separation between light and dark
areas or compress it to create a flatter image.
• Contrast adjustment is typically done using a linear transformation of the pixel values.
Mathematical Representation:
g(x, y) = α · f(x, y) + β, where α > 1 increases contrast, 0 < α < 1 decreases it, and β shifts brightness.
• Increase contrast: When an image appears flat with little distinction between dark and light
areas, increasing contrast can make it more visually striking.
• Decrease contrast: When the image has too many stark transitions, reducing contrast can
soften the image.
Effect:
• Increased contrast results in brighter whites and darker blacks, making details stand out
more.
• Decreased contrast results in an image with more even intensity distribution and less visual
differentiation between light and dark areas.
3. Color Balance
What is it?
Color balance refers to the adjustment of the intensities of the primary colors (Red, Green, and Blue
in the RGB model) to correct or enhance the overall color appearance of the image. This operation
is particularly useful when the image has an undesirable color cast (e.g., too much red, too much
blue).
How it works:
• In the RGB color model, an image is represented as a combination of Red, Green, and Blue
channels. By adjusting the intensities of these channels, you can change the image's color
balance.
• You can modify the intensity of each color channel by adding or subtracting a constant value
or by multiplying with a scaling factor.
• Warm tones: If the image looks too cold (e.g., too much blue), you can increase the red and
green channels to give the image a warmer appearance.
• Cool tones: If the image looks too warm (e.g., too much red), you can reduce the red
channel or increase the blue channel to make the image cooler.
Effect:
• Adjusting the color balance makes the image look more natural or visually pleasing by
correcting unwanted color casts or enhancing certain tones.
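A minimal color-balance sketch (file name and scaling factors assumed) that warms an image by scaling individual channels:

import cv2

img = cv2.imread('image.jpg')            # BGR channel order in OpenCV
b, g, r = cv2.split(img)

# Warm the image: boost red slightly, reduce blue slightly
r = cv2.convertScaleAbs(r, alpha=1.1, beta=0)
b = cv2.convertScaleAbs(b, alpha=0.9, beta=0)

balanced = cv2.merge([b, g, r])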
4. Channel Manipulation
What is it?
Channel manipulation involves manipulating individual color channels (Red, Green, Blue) of an
image to either modify or enhance certain parts of the image.
How it works:
• Each channel can be manipulated separately. For instance, increasing the red channel will
enhance the red elements of the image, while reducing the blue channel can make the image
look warmer.
Example Use Case:
• Red Channel Enhancement: If you want to make the image appear more reddish, you can
increase the values of the red channel and adjust the others accordingly.
• Grayscale Conversion: By setting the red, green, and blue channels to the same value for
all pixels, you can convert the image to grayscale.
Effect:
• Channel manipulation can dramatically change the look of an image and allow for artistic
effects or corrective color adjustments.
Here’s an example using OpenCV to adjust the brightness and contrast of an image:
import cv2
import numpy as np
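The adjustment step itself is not shown above; one possible sketch (file name and factors are assumptions) uses cv2.convertScaleAbs:

# Load the image (replace with the actual path to your image)
image = cv2.imread('input_image.jpg')

# alpha > 1 raises contrast, beta > 0 raises brightness
adjusted = cv2.convertScaleAbs(image, alpha=1.3, beta=40)

cv2.imwrite('adjusted_image.jpg', adjusted)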
Conclusion
• Color operations allow you to adjust and manipulate the appearance of an image, either to
enhance certain features (like brightness and contrast) or to correct color balance issues.
• Brightness and contrast adjustment can significantly improve the visibility of details in an
image, while color balance and channel manipulation provide ways to correct or enhance
specific tones.
Pixel Operations
1. Thresholding
What is it?
Thresholding is a simple method of image segmentation that converts a grayscale image into a
binary image (black and white) based on a threshold value. It helps separate objects in an image
from the background by making pixels above a certain intensity white (255) and those below the
threshold black (0).
• Object detection: Thresholding is commonly used to separate objects from the background
in cases where objects have a clear intensity contrast with the background.
• Binarization: When converting grayscale images to binary images for further analysis.
Effect:
• The result is a binary image where only the pixels that meet the threshold criteria are white,
and the rest are black.
2. Inversion
What is it?
Inversion is the process of inverting the pixel values of an image. For each pixel, its intensity is
inverted by subtracting the pixel value from the maximum possible value (usually 255 for 8-bit
images).
• Negative images: Inverting the pixel values can create a photographic negative, which can
be useful in artistic effects or for creating more distinct separations between different
regions of an image.
Effect:
• The darkest pixels (0) become the lightest (255), and the lightest pixels (255) become the
darkest (0).
3. Pixel Addition/Subtraction
What is it?
Pixel addition and subtraction involve adding or subtracting a constant value to/from the pixel
intensities of an image. This can be used to adjust brightness, enhance features, or perform other
transformations.
• Brightness adjustment: By adding a constant value to all pixels, you can increase the
overall brightness of an image.
• Image correction: Pixel subtraction can be used to darken or correct an overexposed image.
Effect:
• Adding a constant brightens the whole image; subtracting darkens it (values are clipped to the 0–255 range).
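A minimal sketch (file name assumed); cv2.add and cv2.subtract saturate at 0 and 255 instead of wrapping around:

import cv2

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

brighter = cv2.add(img, 50)       # every pixel +50, clipped at 255
darker = cv2.subtract(img, 50)    # every pixel -50, clipped at 0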
4. Bitwise Operations
What is it?
Bitwise operations are logical operations that directly manipulate the bits of pixel values. These
operations are used for tasks like image masking, image merging, and logical comparison between
pixel values.
• AND (&)
• OR (|)
• XOR (^)
• NOT (~)
How it works:
For each pixel, the corresponding bitwise operation is applied between two pixel values.
• AND: The result pixel will have 1 only if both the pixels have 1 in the corresponding bit
position.
• OR: The result pixel will have 1 if either of the pixels has 1 in the corresponding bit
position.
• XOR: The result pixel will have 1 if only one of the pixels has 1 in the corresponding bit
position.
• Masking: Bitwise operations are useful for creating masks where certain areas of the image
are preserved and others are ignored.
• Combining Images: Bitwise operations can be used to combine two images in creative
ways, such as overlaying one image over another.
Effect:
• Produces masked, combined, or inverted versions of the input images depending on the operation used.
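A minimal masking sketch (file name and circle geometry assumed):

import cv2
import numpy as np

img = cv2.imread('image.jpg')

# Build a mask: a filled white circle on a black background
mask = np.zeros(img.shape[:2], dtype=np.uint8)
cv2.circle(mask, (img.shape[1] // 2, img.shape[0] // 2), 100, 255, -1)

# AND keeps only the pixels inside the circle; NOT produces the negative
masked = cv2.bitwise_and(img, img, mask=mask)
negative = cv2.bitwise_not(img)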
5. Color Channel Operations
What is it?
In color images, operations can be applied on individual color channels (Red, Green, and Blue). For
example, you might want to enhance only the red channel of an image while leaving the other
channels unchanged.
How it works:
For each color channel, you can manipulate its pixel values in various ways, such as increasing the
intensity of the red channel or setting the green channel to zero for a monochromatic image with
only red hues.
Example Use Case:
• Grayscale Conversion: By averaging or selecting one color channel (such as the green
channel), you can create a grayscale image from a color image.
• Color isolation: Isolate a color channel to enhance or focus on a particular color in the
image.
Effect:
• Grayscale: Reduces the image to shades of gray by manipulating the RGB channels.
• Color enhancement: Increases or decreases the intensity of specific colors in the image.
import cv2
import numpy as np

# Load the image in grayscale (replace with the actual path to your image)
image = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)
# Thresholding
_, thresholded_image = cv2.threshold(image, 128, 255,
cv2.THRESH_BINARY)
# Inversion
inverted_image = cv2.bitwise_not(image)
• Thresholding: Converts the image to binary based on a threshold value (128 in this case).
Conclusion
• These operations allow you to manipulate the individual pixel values of an image to achieve
various effects, such as segmentation, enhancement, and masking.
Non-Linear and Morphological Operations
1. Noise Filtering
What is it?
Noise filtering refers to the process of removing unwanted variations or "noise" from an image.
Noise can be caused by various factors like sensor errors, transmission errors, or environmental
conditions. Noise filtering smooths the image and makes it easier to perform further image analysis.
Types of Noise:
• Gaussian Noise: A form of statistical noise with a bell-curve distribution of pixel values.
• Salt-and-Pepper Noise: Noise that appears as black and white pixels scattered randomly
over the image.
How it works:
• Mean Filter: Replaces each pixel with the average value of its neighbors.
• Median Filter: Replaces each pixel with the median value of its neighbors. It is particularly
effective at removing salt-and-pepper noise.
• Gaussian Filter: Applies a weighted average where closer pixels contribute more to the
final value, smoothing the image while preserving edges.
Example Use Case:
• Noise Removal: Filters like median or Gaussian filters can be used to remove salt-and-
pepper noise or Gaussian noise from images.
Effect:
• Gaussian Filter: Smoothens the image by reducing high-frequency noise, preserving edges
better than a mean filter.
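A minimal sketch applying the three filters with OpenCV (file name and kernel sizes assumed):

import cv2

img = cv2.imread('noisy_image.jpg', cv2.IMREAD_GRAYSCALE)

mean_filtered = cv2.blur(img, (5, 5))                 # simple averaging
median_filtered = cv2.medianBlur(img, 5)              # best for salt-and-pepper noise
gaussian_filtered = cv2.GaussianBlur(img, (5, 5), 0)  # weighted (bell-curve) averaging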
2. Dilation
What is it?
Dilation is a morphological operation used to expand the white regions (foreground) of a binary
image. It involves placing a structuring element (a small binary shape) on each pixel of the image
and setting the pixel to 1 if any part of the structuring element overlaps with the foreground.
How it works:
• For each pixel in the image, check if the structuring element overlaps with any white pixel
(1) in the neighborhood.
• Enhance Object Size: Dilation is often used to expand the size of objects in a binary image.
This can help in connecting broken parts of an object.
Effect:
• Increases the area of foreground objects and fills small holes or gaps.
3. Erosion
What is it?
Erosion is the opposite of dilation. It is a morphological operation that shrinks the foreground
(white) regions of a binary image. It works by placing a structuring element at each pixel and
setting the pixel to 0 if any part of the structuring element overlaps with the background (black) in
the image.
How it works:
• For each pixel in the image, check if the structuring element fits completely within the
foreground (1) region.
• Shrink Objects: Erosion is useful when you want to shrink the foreground objects in a
binary image.
• Remove Small Objects: It is often used to remove small noise and artifacts in an image.
Effect:
• Shrinks the foreground objects and removes small isolated white regions (noise).
4. Majority Operation
What is it?
The majority operation involves replacing the value of a pixel with the most frequent value in its
neighborhood. It is a non-linear operation, often used for noise reduction.
How it works:
• For each pixel, examine its neighborhood and find the most frequent pixel value (either 0 or
1 for binary images).
• Noise Reduction: The majority operation can be used to remove isolated noise pixels in a
binary image, ensuring that pixels are replaced by the majority value of their neighbors.
Effect:
• Removes isolated noise and smooths the image while preserving overall structure.
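For a binary image the majority value of a neighborhood equals its median, so a median filter can act as a simple majority filter (sketch, file name assumed):

import cv2

gray = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)

# Median of a 3x3 binary neighborhood = its majority value
majority_filtered = cv2.medianBlur(binary, 3)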
Example (dilation and erosion with OpenCV):
import cv2
import numpy as np

# Load the image in grayscale (replace with the actual path to your image)
image = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)

# Define a 3x3 structuring element (kernel)
kernel = np.ones((3, 3), np.uint8)
# Convert the image to binary
_, binary_image = cv2.threshold(image, 128, 255,
cv2.THRESH_BINARY)
# Perform Dilation
dilated_image = cv2.dilate(binary_image, kernel,
iterations=1)
# Perform Erosion
eroded_image = cv2.erode(binary_image, kernel, iterations=1)
• Dilation: The dilated image will show an expanded foreground, and small gaps in objects
may be filled.
• Erosion: The eroded image will shrink the foreground objects, and small objects may be
removed.
Conclusion
• Noise filtering: It removes unwanted variations in an image, improving clarity for further
processing.
• Dilation and Erosion: These morphological operations are used to modify the shape of
objects in a binary image. Dilation expands objects, while erosion shrinks them.
• Majority operation: A noise reduction technique that replaces pixel values with the most
common value in its neighborhood.
Morphological operations are particularly useful when dealing with binary images or images where
structural changes are important, such as in object detection, image segmentation, and noise
reduction.
Geometric Operations
Geometric transformations are a fundamental toolset that enables the modification of
an image’s spatial arrangement. These transformations are essential for a variety of
applications, including computer vision, medical imaging, robotics, and more.
Reflection
Re ection transformation in image processing is a geometric operation that involves
ipping an image across a speci c axis. The re ection can be done horizontally,
vertically, or diagonally, resulting in a mirrored version of the original image.
Horizontal Re ection
In horizontal re ection, each row of pixels is reversed, creating a mirror image along
the horizontal axis. Let I represent the original image and I_rh represent the
horizontally re ected image.
This is the equation of this transformation:
where i and j are row and column indices, and h is the height of the image. Here’s the
code implementation of horizontal re ection.
def horizontal_re ection(image):
    height, width, channels = image.shape
    reflected_image = np.zeros((height, width, channels), dtype=np.uint8)
    for i in range(height):
        reflected_image[i, :] = image[height - i - 1, :]
    return reflected_image
Vertical Reflection
In vertical reflection, each column of pixels is reversed, creating a mirror image along
the vertical axis. Let I represent the original image and I_rv represent the vertically
reflected image.
The equation of this transformation is given by:

I_rv(i, j) = I(i, w - j - 1)

where i and j are row and column indices, and w is the width of the image. Here’s the
code implementation of vertical reflection.
def vertical_reflection(image):
    height, width, channels = image.shape
    reflected_image = np.zeros((height, width, channels), dtype=np.uint8)
    for j in range(width):
        reflected_image[:, j] = image[:, width - j - 1]
    return reflected_image
Translation
Translation is a fundamental geometric transformation involving the shifting of an
object within an image from one location to another. This shift can occur both
horizontally and vertically, determined by specified offset values measured in pixels.
The translation equations for transforming the coordinates of a point (x, y) to a new
point (x’, y’) with respective horizontal and vertical offsets (tx, ty) can be expressed
as follows:

x’ = x + tx
y’ = y + ty

where tx represents the horizontal shift and ty represents the vertical shift, both
denoted in the number of pixels by which we need to shift in their respective
directions.
To implement this translation transformation in Python,
the cv2.warpAffine() function is utilized, requiring the translation matrix and the
image as input. Here’s a Python script demonstrating how to achieve image
translation:
def translate_image(image, tx, ty):
    # Define the translation matrix
    translation_matrix = np.float32([[1, 0, tx], [0, 1, ty]])
    # Apply the translation (output keeps the original width x height)
    rows, cols = image.shape[:2]
    img_translation = cv2.warpAffine(image, translation_matrix, (cols, rows))
    return img_translation
Scaling
Scaling, a core geometric transformation, involves resizing an image in the
horizontal (x) and/or vertical (y) directions. This transformation is achieved using
scaling equations, which determine the new size of the image based on specified
scaling factors.
Let’s denote the scaling factors for the x and y directions as Sx and Sy respectively.
The scaling equations for transforming the coordinates of a point (x, y) to a
new point (x’, y’) with the scaling factors Sx and Sy can be expressed as
follows:

x’ = Sx · x
y’ = Sy · y
In these equations, x’ and y’ represent the coordinates of the point after scaling,
while x and y are the original coordinates. The scaling factors Sx and Sy determine
the extent of scaling in the respective directions. If Sx and Sy are greater than 1, the
image is enlarged in the x and y directions, respectively. Conversely, if Sx and Sy are
less than 1, the image is reduced in size.
To implement scaling in Python, we use transformation matrices. The scaling
transformation matrix takes the form of a 2×2 array:

[ Sx  0 ]
[ 0   Sy ]
In this matrix, Sx represents the scaling factor in the x-direction, and Sy represents
the scaling factor in the y-direction. The scaling matrix is then applied to each point
in the image to achieve the desired scaling effect.
Implementing scaling in Python involves using the scaling matrix and the
`cv2.warpAffine()` function from the OpenCV library.
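A possible sketch is shown below; the function name and the use of warpAffine with a resized output are assumptions (cv2.resize would work equally well):

import cv2
import numpy as np

def scale_image(image, sx, sy):
    # 2x3 scaling matrix expected by warpAffine
    scaling_matrix = np.float32([[sx, 0, 0], [0, sy, 0]])
    rows, cols = image.shape[:2]
    # Output size grows or shrinks with the scaling factors
    img_scaled = cv2.warpAffine(image, scaling_matrix,
                                (int(cols * sx), int(rows * sy)))
    return img_scaled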
Rotation
Rotation is a geometric transformation that involves changing the orientation of an
image by a specified angle around a given axis or point. The rotation can be
mathematically expressed using equations:

x’ = x · cos(θ) - y · sin(θ)
y’ = x · sin(θ) + y · cos(θ)

where θ is the rotation angle.
# Load the image (replace with the actual path to your image)
original_image = cv2.imread('path_to_your_image.jpg')
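A minimal rotation sketch using cv2.getRotationMatrix2D on the image loaded above; the function name and the 45° angle are assumptions:

import cv2

def rotate_image(image, angle):
    rows, cols = image.shape[:2]
    # 2x3 rotation matrix about the image centre, scale = 1.0
    rotation_matrix = cv2.getRotationMatrix2D((cols / 2, rows / 2), angle, 1.0)
    return cv2.warpAffine(image, rotation_matrix, (cols, rows))

rotated_image = rotate_image(original_image, 45)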
Shearing
Shearing, much like rotation, is a fundamental geometric transformation used in
image processing. Unlike translation, shearing involves shifting the pixel values of an
image either horizontally or vertically, but the shift is not uniform across the image.
This creates a distorted effect by displacing parts of the image in different directions.
Mathematically, shearing can be expressed using equations that describe the
transformation of coordinates. Let’s denote the original coordinates as x and y and the
transformed coordinates after shearing as x’ and y’. The shearing can be applied
either horizontally or vertically.
Horizontal Shearing:

x’ = x + k · y
y’ = y

Vertical Shearing:

x’ = x
y’ = y + k · x
In these equations, k represents the shearing factor, determining the amount of shear
applied in the respective direction. When k is positive, it shifts the points towards a
specific direction, and when k is negative, it shifts in the opposite direction.
Visualization helps to grasp this concept better. Imagine a rectangular image, and
applying horizontal shearing will shift the upper part of the image to the right and the
lower part to the left, creating a trapezoidal shape.
To implement image shearing in Python, we’ll use the similar approach of creating a
shearing matrix and applying it using the `cv2.warpAffine()` function from the
OpenCV library. Below is a Python script demonstrating how to achieve image
shearing:
def shear_image(image, shear_factor):
    # Define the shearing matrix (horizontal shear)
    shearing_matrix = np.float32([[1, shear_factor, 0], [0, 1, 0]])
    # Widen the output so the sheared content is not cropped
    rows, cols = image.shape[:2]
    img_sheared = cv2.warpAffine(image, shearing_matrix,
                                 (int(cols + abs(shear_factor) * rows), rows))
    return img_sheared
Conclusion
Geometric transformations are the building blocks of image processing, allowing us
to alter images in meaningful ways. From simple translations to complex perspective
corrections, these transformations empower various applications that rely on accurate
image manipulation. As we continue to advance in the field of image processing, a
deeper understanding of geometric transformations will be indispensable for
innovating and improving the way we perceive and interact with images.
UNIT-4
Image Filters
Image filters are mathematical operations applied to images to enhance features, suppress
noise, or extract information like edges, textures, or patterns.
🧃 Simple Definition:
An image filter modifies the intensity values of pixels by using a kernel or mask, typically by
convolution.
Purpose              Examples
Smoothing            Blur, Gaussian Filter
Sharpening           Laplacian, High-Pass Filter
Edge Detection       Sobel, Prewitt, Canny
Noise Removal        Median, Gaussian
Feature Extraction   Gabor, Wavelet
The filtering operation (2D convolution) is expressed as:

g(x, y) = ∑∑ f(x-i, y-j) * h(i, j)
🧊 Example Filter (Box Blur 3x3):
plaintext
CopyEdit
[1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9]
This averages the 3x3 neighborhood around each pixel → blurs the image.
Image as a Signal
📌 What is an Image?
• Just like a 1D signal (like sound), an image has amplitude (pixel intensity) and spatial
coordinates (x, y).
• The value at each (x, y) location is the brightness or color (in case of RGB images).
🔍 Key Concepts:
Concept           Meaning
Spatial Domain    Where an image is represented by pixels.
Pixel             A single point in an image; has intensity (grayscale) or RGB values.
Grayscale Image   Each pixel has one intensity value (0–255).
Color Image       Each pixel has 3 values – Red, Green, Blue (RGB).
Analog Image      Continuous – infinite values. Like natural light.
Digital Image     Sampled (discrete) in space and intensity.
f(x, y) = intensity at pixel location (x, y)
For example, a 3x3 grayscale image:
[ 10 20 30
40 50 60
70 80 90 ]
🧮 Mathematical Representation:
f(x, y), where:
- x, y are spatial coordinates
- f is intensity (grayscale value)
In signals terms:
• 1D signal: f(t)
• 2D signal (image): f(x, y)
• Apply signal processing techniques like convolution, Fourier Transform, filtering, etc.
📝 Exam Tip:
An image is treated as a two-dimensional signal, where the intensity of each pixel represents the
amplitude, and the spatial coordinates (x, y) represent the independent variables. This allows us to
apply signal processing techniques to perform operations such as filtering, edge detection, and
frequency analysis.
Image processing involves modifying or analyzing digital images to improve quality or extract
information.
Spatial Domain Processing
📌 Definition:
You apply a function or filter directly on the pixel intensity values (f(x, y)) of the image.
⚙ Formula:
g(x, y) = T[f(x, y)]
Where:
• f(x, y) is the input image,
• g(x, y) is the processed output image, and
• T is an operator applied to the pixel (or its neighborhood).
🔧 Techniques Used:
Technique                Purpose
Filtering                Smoothing, sharpening
Edge Detection           Detect boundaries
Histogram Equalization   Improve contrast
Morphological Ops        Shape-based processing
📌 Examples:
Averaging (mean) kernel:
[ 1 1 1
  1 1 1
  1 1 1 ] / 9
Frequency Domain Processing
📌 Definition:
Processing an image by transforming it into the frequency domain using a mathematical tool like
the Fourier Transform.
Because many image features (edges, textures, patterns) become easier to analyze and modify
when seen in terms of frequency components.
Fourier Transform:
F(u, v) = ∑∑ f(x, y) * e^(-j2π(ux/M + vy/N))
• Converts spatial image f(x, y) → frequency domain F(u, v)
• u, v = frequency components
• M, N = image dimensions
🔄 Inverse Fourier Transform:
f(x, y) = ∑∑ F(u, v) * e^(j2π(ux/M + vy/N))
📊 Frequency Interpretation:
Frequency   Meaning
Low         Smooth areas, gradual changes (background)
High        Sharp transitions, edges, noise
📝 Exam Tips:
Q1: What is the difference between spatial and frequency domain techniques?
Spatial domain techniques manipulate pixels directly (e.g., smoothing, sharpening), while frequency
domain techniques modify the image’s frequency components using transforms like the Fourier
Transform.
Q2: Why use frequency domain?
Some image operations (e.g., noise reduction, texture analysis) are more effective or easier in the
frequency domain because they isolate different patterns or noise by frequency.
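A minimal frequency-domain sketch with NumPy's FFT; the file name and the 30-pixel low-pass box are assumptions:

import cv2
import numpy as np

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Forward 2D FFT, shifted so low frequencies sit at the centre
F = np.fft.fftshift(np.fft.fft2(img))

# Log-magnitude spectrum (useful for visualization)
magnitude = 20 * np.log(np.abs(F) + 1)

# Crude low-pass filter: keep a square around the centre, zero everything else
rows, cols = img.shape
mask = np.zeros_like(F)
mask[rows//2 - 30:rows//2 + 30, cols//2 - 30:cols//2 + 30] = 1
smoothed = np.abs(np.fft.ifft2(np.fft.ifftshift(F * mask)))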
At the core of convolution is the kernel (or filter), a small matrix (e.g., 3x3, 5x5), which is applied
to the image by sliding it over each pixel in the image. The value of each pixel in the output image
is calculated by taking the weighted sum of the corresponding input image pixels, where the
weights are provided by the kernel.
Example of Convolution:
Let’s say we have an image f(x, y) and a 3x3 kernel h(i, j):
Image (f):
[ 3 5 7 ]
[ 4 8 6 ]
[ 2 6 9 ]
Kernel (h):
[ 0 -1 0 ]
[ -1 4 -1 ]
[ 0 -1 0 ]
Convolution operation:
We apply the kernel to the image at each position, multiply corresponding elements, and sum them
up. For the center pixel of the image (pixel at position (1,1) in a 0-based index):
(3 * 0) + (5 * -1) + (7 * 0)
+ (4 * -1) + (8 * 4) + (6 * -1)
+ (2 * 0) + (6 * -1) + (9 * 0)
= -5 - 4 + 32 - 6 - 6 = 11
This gives the new pixel value for the center pixel (1,1) in the output image.
Different filters (kernels) are used for various operations. Some of the most common kernels
include:
Linear filters are filters that use linear operations (like addition and multiplication) to modify the
image. In the case of convolution, a filter is applied to the image by performing linear operations
over the pixel intensities and using a convolution operation.
Linear filters are called so because the output of the filter is a weighted sum of input pixel values,
meaning they do not alter the original image's structure, but they instead combine the image’s
pixels in some way that results in blurring, sharpening, or edge detection.
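A minimal sketch applying such kernels with OpenCV's filter2D (file name assumed):

import cv2
import numpy as np

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# 3x3 box (averaging) kernel and the Laplacian edge kernel described below
box_kernel = np.ones((3, 3), np.float32) / 9.0
laplacian_kernel = np.array([[0, -1, 0],
                             [-1, 4, -1],
                             [0, -1, 0]], dtype=np.float32)

blurred = cv2.filter2D(img, -1, box_kernel)
edges = cv2.filter2D(img, -1, laplacian_kernel)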
1. Box Filter (Average Filter): This is the simplest linear filter, where the output pixel value is
the average of all the surrounding pixels in a window (kernel).
Example of a 3x3 Box Filter:

[ 1/9 1/9 1/9
  1/9 1/9 1/9
  1/9 1/9 1/9 ]

This kernel is used to blur or smooth the image by averaging the pixel values.
2. Gaussian Filter: The Gaussian filter smooths an image by giving more weight to the
central pixels and less weight to the surrounding pixels.
Gaussian Kernel (5x5 example):

[ 1  4  6  4 1
  4 16 24 16 4
  6 24 36 24 6
  4 16 24 16 4
  1  4  6  4 1 ] / 256

This kernel helps in reducing high-frequency noise (like sharp edges and fine details) in an
image.
3. Sobel Filter (Edge Detection): The Sobel filter is a gradient-based operator used for
detecting edges in images. It calculates the gradient of image intensity, emphasizing changes
in pixel values along the x-axis (horizontal) or y-axis (vertical).
Sobel Kernel (Horizontal edge detection):

[ -1 0 1
  -2 0 2
  -1 0 1 ]
4. Laplacian Filter (Edge Detection): The Laplacian filter is a second-order derivative
operator that detects areas of rapid intensity change, which is useful for detecting edges in
the image.
Laplacian Kernel:

[  0 -1  0
  -1  4 -1
   0 -1  0 ]
🔵 3. Applying Convolution in Image Processing:
Convolution is at the heart of most image processing techniques, where it is used in filters for
blurring, edge detection, sharpening, and more. Convolution is used in a wide variety of
applications:
• Edge Detection: To identify boundaries of objects in the image (using filters like Sobel or
Prewitt).
🧠 Important Concepts:
• Kernel Size: The size of the kernel (e.g., 3x3, 5x5) determines the extent of the
neighborhood used for the operation. Larger kernels usually lead to more smoothing or
blurring, while smaller kernels retain more detail.
• Edge Effects: When applying convolution near the image borders, you often encounter edge
effects because there are fewer neighboring pixels available to apply the kernel. Several
strategies like zero-padding or mirroring the border pixels are used to handle these cases.
1. Blurring Filters: Used for reducing noise or details, commonly used for preprocessing
before feature extraction or object detection.
2. Edge Detection Filters: Used to identify boundaries in images, which is essential for tasks
like object detection and image segmentation.
3. Sharpening Filters: Used to enhance fine details, which is often useful in improving the
visibility of certain features in an image.
• Always remember to explain the kernel's role: Whether it's for blurring, sharpening, or
edge detection, you should be able to describe what the kernel does to the image.
• Know the kernel values for common filters like Sobel, Gaussian, and Laplacian. Being
able to identify the kernel and its purpose will be key to answering exam questions correctly.
Thresholding and Band-Pass Filters
🔵 Thresholding in Image Processing
📌 What is Thresholding?
Thresholding converts a grayscale image into a binary image by comparing each pixel against a threshold value T.
🧠 Purpose of Thresholding:
To separate objects (foreground) from the background so that further analysis works on a simple black-and-white image.
✅ Types of Thresholding:
1. Global Thresholding:
◦ A single threshold T is applied to the entire image.
◦ Simple and efficient but not suitable for images with varying lighting conditions.
◦ Formula:
g(x, y) = 255 if f(x, y) > T, otherwise 0
2. Adaptive Thresholding:
◦ Threshold is calculated locally for each pixel based on a neighborhood around it.
3. Otsu's Thresholding:
◦ An automated method that calculates the optimal threshold based on the image
histogram. It works by maximizing the variance between the foreground and
background.
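A minimal sketch of both automatic methods (file name assumed):

import cv2

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Otsu: the threshold value is chosen automatically from the histogram
otsu_t, otsu_binary = cv2.threshold(img, 0, 255,
                                    cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive: a separate threshold is computed for each 11x11 neighborhood
adaptive = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)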
🧠 Applications of Thresholding:
• Separating objects from the background (e.g., document binarization for OCR, cell segmentation in medical images).
🧠 Thresholding Example (threshold T = 150):
Input:
 50 180 210
130  90 200
180 240 160
Output:
  0 255 255
  0   0 255
255 255 255
Here, all pixels above 150 become 255 (foreground), and all below become 0 (background).
🔵 Band-Pass Filters
A band-pass filter is a combination of a low-pass and high-pass filter. It allows signals (or
frequencies in the image) within a certain range (the passband) to pass through while blocking
the frequencies outside that range (the stopband). In image processing, band-pass filters can be
used to enhance certain features within a specific frequency range.
✅ Band-Pass Filter Properties:
1. Low-Pass Filter: Passes low frequencies (smooth components) and attenuates high
frequencies (details and edges).
2. High-Pass Filter: Passes high frequencies (edges and noise) and attenuates low
frequencies (smooth areas).
3. Band-Pass Filter: Passes a specific range of frequencies and attenuates frequencies outside
of this range.
Imagine an image with both smooth and textural elements. A band-pass filter can be used to isolate
the textures by blocking both the very high frequencies (noise) and the very low frequencies
(smooth areas), passing only the intermediate frequencies that represent texture or edge-like
features.
• Image Texture Enhancement: Isolate textures in an image while removing large smooth
areas or noise.
• Edge Detection: Enhance edges that lie within a specific frequency range.
• Noise Removal: Reduce unwanted high-frequency noise while keeping important low-
frequency content.
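One simple way to build a band-pass filter is a difference of Gaussians; a sketch (file name and sigma values assumed):

import cv2

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Subtracting a heavy blur from a light blur keeps the mid-frequency band
light_blur = cv2.GaussianBlur(img, (0, 0), sigmaX=1)
heavy_blur = cv2.GaussianBlur(img, (0, 0), sigmaX=5)
band_pass = cv2.subtract(light_blur, heavy_blur)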
📌 Exam Tip:
• Thresholding is simple, but ensure you understand adaptive and Otsu's methods as they
are often favored in complex images.
• Band-Pass Filters require you to understand how frequency ranges are selected and
applied in image filtering. Focus on their dual role of removing both high and low-
frequency components to leave the middle range intact.
🔹 1. Gaussian Filter
🎯 What is a Gaussian Filter?
A Gaussian filter is a linear filter that’s used mainly for image smoothing (blurring), noise
reduction, and preprocessing before edge detection or other image processing tasks.
It’s called “Gaussian” because it uses the Gaussian function (bell-shaped curve) to assign weights
to the pixel values within a kernel (window).
🧮 Mathematical Formula:
The 2D Gaussian function used to generate the filter kernel is:

G(x, y) = (1 / (2πσ²)) · e^(-(x² + y²) / (2σ²))

• Closer pixels (near the kernel centre) contribute more.
• Farther pixels contribute less.
This makes it ideal for removing high-frequency noise without distorting edges too much.
• Edge Detection Preprocessing: Applied before Sobel or Canny to suppress irrelevant details.
Filter Behavior
Feature   Value
Type      Low-pass filter
Effect    Smoothens image, reduces noise
Kernel    Gaussian bell-curve
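In OpenCV a Gaussian filter is a single call (file name and kernel size assumed):

import cv2

img = cv2.imread('image.jpg')

# 5x5 Gaussian kernel; sigma = 0 lets OpenCV derive it from the kernel size
smoothed = cv2.GaussianBlur(img, (5, 5), 0)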
🔷 2. Wavelet Filter
🎯 What is a Wavelet Filter?
A wavelet filter is used for analyzing and transforming signals/images at different scales or
resolutions. It’s not just for smoothing — it’s about breaking down the image into high- and low-
frequency components, both in space and scale.
Think of it as a microscope for images: you can zoom in and out to analyze fine vs coarse details.
📊 Wavelet vs Gaussian
import cv2
import pywt  # PyWavelets (assumed available) for the discrete wavelet transform
import matplotlib.pyplot as plt

# Load a grayscale image (replace with the actual path to your image)
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Single-level 2D DWT: LL = approximation, LH/HL/HH = detail sub-bands
LL, (LH, HL, HH) = pywt.dwt2(img, 'haar')

# Visualize sub-bands
plt.figure(figsize=(10, 6))
titles = ['LL', 'LH', 'HL', 'HH']
for i, band in enumerate([LL, LH, HL, HH]):
plt.subplot(2, 2, i + 1)
plt.imshow(band, cmap='gray')
plt.title(titles[i])
plt.axis('off')
plt.tight_layout()
plt.show()
Property                    Description
Localized                   In both time (space) and frequency
Multiresolution             Capture coarse and fine details
Sparsity                    Many coefficients near zero → ideal for compression
Orthogonal / Biorthogonal   Supports perfect reconstruction
Edge-aware                  Captures edges very effectively
Feature      Value
Type         Multi-resolution filter
Effect       Decomposes image into frequency subbands
Domain       Both spatial and frequency
Reversible   Yes (can reconstruct original image)
Uses         Compression, denoising, analysis, watermarking
🔶 3. Gabor Filter
🎯 What is a Gabor Filter?
A Gabor filter is one of the most powerful tools for texture analysis and edge detection in images.
It's inspired by how human vision perceives textures and edges — that’s why it’s used in face
recognition, iris scanning, fingerprint analysis, etc.
🧮 Mathematical Formula
The 2D Gabor filter is defined as:

g(x, y) = exp( -(x'² + γ²·y'²) / (2σ²) ) · cos( 2π·f·x' + φ )

where x' = x·cos(θ) + y·sin(θ) and y' = -x·sin(θ) + y·cos(θ).
Symbol   Meaning
σ        Standard deviation of Gaussian (controls spread)
f        Frequency of sinusoidal wave
θ        Orientation of filter
φ        Phase offset
γ        Spatial aspect ratio (ellipticity)
You can generate a bank of Gabor filters at multiple angles (e.g. 0°, 45°, 90°, 135°) and
frequencies to detect texture in different directions.
🧪 How It Works in Practice
1. Convolve the image with a bank of Gabor filters.
2. Each filter will respond strongly to features that match its orientation and frequency.
◦ Detect edges
◦ Analyze texture
Property                Meaning
Localized               Both in space and frequency
Orientation-specific    You can detect edges in chosen directions
Frequency-sensitive     Detects patterns of specific scale
Biologically inspired   Mimics visual cortex in mammals
Non-reversible          Just feature extraction, not reconstruction
import cv2
import matplotlib.pyplot as plt

# Build a Gabor kernel: 31x31, sigma=4, theta=0, wavelength=10, gamma=0.5, psi=0
gabor_kernel = cv2.getGaborKernel((31, 31), 4.0, 0, 10.0, 0.5, 0)

# Apply to image
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
filtered_img = cv2.filter2D(img, cv2.CV_8UC3, gabor_kernel)
# Show results
plt.subplot(1, 2, 1)
plt.imshow(img, cmap='gray')
plt.title("Original")
plt.subplot(1, 2, 2)
plt.imshow(filtered_img, cmap='gray')
plt.title("Gabor Filtered")
plt.show()
Feature       Value
Type          Linear, edge/texture filter
Effect        Emphasizes edges/patterns in specific orientations
Orientation   Directional sensitivity
Frequency     Yes
Uses          Texture analysis, biometrics, face/fingerprint recognition
UNIT-5
Contours
According to OpenCV:
“Contours are a curve that simply joins all the continuous points along the boundary having the
same color or intensity.”
• Fourier Descriptors
• Shape Context
🔸 Contour Properties
Property       Description
Hierarchy      Relationship between contours; helps in nesting structure detection.
Area           Total number of pixels inside the contour. Useful for size filtering.
Perimeter      Length of the contour boundary.
Centroid       The center of mass (calculated using image moments).
Orientation    Angle of object alignment or rotation.
Convexity      Checks if the shape is convex or concave.
Bounding Box   Smallest rectangle that fits the contour.
Convex Hull    Tightest convex shape around the contour.
1. Thresholding-Based Preprocessing
2. Edge-Based Preprocessing
Algorithm                     Description
Canny Edge Detection          Multi-step method; highly accurate for edge detection.
Sobel Operator                Gradient-based; highlights horizontal and vertical edges.
Laplacian of Gaussian (LoG)   Detects zero-crossings in second derivative; captures fine edges.
Scharr Operator               Improved version of Sobel; better accuracy and rotation invariance.
CNN-Based Methods             Deep learning models (e.g., using CNNs) trained for contour/edge detection.
🟡 Basic Detection:
import cv2
import numpy as np

# Load the image and create a binary version (path assumed)
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

contours, _ = cv2.findContours(thresh, cv2.RETR_TREE,
                               cv2.CHAIN_APPROX_SIMPLE)
🟢 Masking & Highlighting Contours:
highlight = np.ones_like(img)
cv2.drawContours(highlight, contours, -1, (0, 200, 175),
cv2.FILLED)
mask = np.zeros_like(img)
cv2.drawContours(mask, contours, -1, (255, 255, 255),
cv2.FILLED)
foreground = cv2.bitwise_and(img, mask)
Technique            Purpose
Filtering            Smooth or enhance detected contours.
Morphological Ops    Modify shape/size using dilation, erosion, etc.
Feature Extraction   Extract useful data like area, centroid, etc., for object classification.
Field                   Applications
Object Detection        Identify objects using shape outlines.
Medical Imaging         Detect tumors, cells, organs, etc.
Robotics                Shape-based navigation or object manipulation.
Gesture Recognition     Hand contour for finger counting, sign language, etc.
Industrial Inspection   Detect size, shape, or position of products in quality control.
Traffic Monitoring      Detect and track vehicles using their contour outlines.
✅ Summary
• Contours highlight the boundary information of objects.
• Various properties (area, convexity, orientation, etc.) are used to analyze shapes.
• OpenCV provides simple and effective tools to detect and manipulate contours.
• Contours are essential for recognition, classification, and tracking tasks in modern CV
systems.
Image Segmentation
Image segmentation divides an image into meaningful regions or objects. In simple terms, the goal is to:
• Extract meaningful information for further analysis (e.g., classification, recognition)
1. Thresholding-Based Segmentation
• Simplest method.
import cv2

# Load a grayscale image (path assumed)
gray_img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Binary Threshold
ret, binary = cv2.threshold(gray_img, 127, 255,
                            cv2.THRESH_BINARY)
2. Region-Based Segmentation
• Segments pixels into regions based on predefined criteria like similar intensity, color,
texture.
Types:
• Region Growing: Starts from seed points and grows by adding similar neighboring pixels.
• Region Splitting & Merging: Divides the image and merges similar adjacent regions.
3. Edge-Based Segmentation
4. Clustering-Based Segmentation
Algorithm       Description
K-Means         Groups pixels into K clusters based on color/texture
Mean Shift      Iterative algorithm for finding dense regions in feature space
Fuzzy C-Means   Like K-means but allows soft membership (a pixel can belong to multiple clusters)
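A minimal K-Means color segmentation sketch with OpenCV (file name and K assumed):

import cv2
import numpy as np

img = cv2.imread('image.jpg')
pixels = img.reshape((-1, 3)).astype(np.float32)

# Cluster pixel colours into K groups
K = 3
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, K, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Paint every pixel with its cluster centre to get the segmented image
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)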
5. Watershed Algorithm
cv2.watershed(image, markers)
6. Deep Learning-Based Segmentation
Modern, accurate, and widely used. Learns features directly from data.
Popular Architectures:
Model        Description
U-Net        Widely used in medical imaging. Encoder-decoder with skip connections.
Mask R-CNN   Extends Faster R-CNN for instance segmentation.
DeepLab      Uses atrous convolution for better boundary precision.
• scikit-image
• TensorFlow / Keras
🎯 Applications of Segmentation
✅ Summary
• Segmentation is essential for scene understanding in images and videos.
🧩 Template Matching in Image Processing
Essentially, you provide a template (a small image) and search where it appears in a larger
image.
It's a form of pattern recognition used to detect and locate objects in an image.
1. Choose a small template image (the pattern you want to find).
2. The algorithm slides the template across the input image (like a sliding window).
3. At each position, a similarity score between the template and the underlying image patch is computed using one of the methods below.
Method                 Description
cv2.TM_CCOEFF          Correlation coefficient
cv2.TM_CCOEFF_NORMED   Normalized correlation
cv2.TM_CCORR           Cross-correlation
cv2.TM_CCORR_NORMED    Normalized cross-correlation
cv2.TM_SQDIFF          Square difference (lower is better)
cv2.TM_SQDIFF_NORMED   Normalized square difference
import cv2

# Load images
img = cv2.imread('main_image.jpg', 0)
template = cv2.imread('template.jpg', 0)
w, h = template.shape[::-1]
# Template matching
res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
# Draw a rectangle around the best match (max score for TM_CCOEFF_NORMED)
top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(img, top_left, bottom_right, 255, 2)

cv2.imshow('Detected', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
⚠ Limitations
• Not scale or rotation invariant (fails if object is rotated or resized).
🧠 Alternatives / Improvements
Method                                 Description
Feature Matching (e.g., SIFT, ORB)     Detects keypoints and matches descriptors — better for scale and rotation changes.
Convolutional Neural Networks (CNNs)   Learn patterns and features for robust object detection.
Image Pyramids                         Helps with scale invariance in template matching.
✅ Summary
• Template Matching is a simple but effective method for pattern detection.
Stereo Imaging
📸 You use two cameras placed side by side to capture two views of the same scene. The disparity
(difference in position of objects) between these images helps in estimating depth.
Application              Purpose
Autonomous vehicles      Depth sensing for obstacle detection
3D reconstruction        Creating 3D models of real-world scenes
Augmented reality (AR)   Understanding depth for interactive overlays
Robotics                 Navigation and environment understanding
Medical imaging          Enhanced visualization of anatomy
3. Calculate disparity (horizontal shift) for corresponding points.
Method                       Description
Block Matching               Divides image into blocks and finds best match in other image
Semi-Global Matching (SGM)   Optimizes disparity with local and global constraints
Graph Cuts                   Models disparity estimation as a graph optimization problem
Deep Learning-Based          Learns disparity estimation using CNNs or deep stereo networks
🔹 Step-by-Step Example:
import cv2
import numpy as np
# Load stereo images (left and right views; paths assumed)
imgL = cv2.imread('left.jpg', 0)
imgR = cv2.imread('right.jpg', 0)

# Compute a disparity map with simple block matching
stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgL, imgR)
• Occlusions: Some parts visible in one view and not in the other.
💡 Modern Improvements
• Use of deep learning models (like PSMNet, StereoNet).
✅ Summary
• Stereo Imaging helps estimate depth and create 3D information from 2D images.
• Useful in many modern technologies like autonomous cars, robotics, and AR/VR.
📸 Computational Photography
It aims to improve image quality, extract information, or create novel visual experiences using
computation.
🔍 Core Goals
• Enhance low-light images
Technique                          Description
HDR Imaging (High Dynamic Range)   Combines multiple exposures to create a balanced image with bright and dark areas correctly exposed
Panorama Stitching                 Merges multiple images to form a wide-angle or 360° view
Depth from Focus/Defocus           Extracts depth info based on how blurry or sharp objects appear
Refocusing/Light Field Imaging     Lets you refocus an image after capture (e.g., Lytro cameras)
Image Deblurring                   Removes motion blur caused by camera shake or moving objects
Super Resolution                   Enhances image resolution using interpolation or deep learning
Photometric Stereo                 Recovers surface normals and lighting using multiple images under varying lighting conditions
Style Transfer & Filters           Uses AI to apply artistic styles or filters to images (e.g., Prisma app)
📱 Real-World Applications
import cv2
import numpy as np

# Load differently exposed images and their exposure times (paths/values assumed)
img1 = cv2.imread('exposure_low.jpg')
img2 = cv2.imread('exposure_mid.jpg')
img3 = cv2.imread('exposure_high.jpg')
times = np.array([1/30.0, 1/8.0, 1/2.0], dtype=np.float32)

# Merge to HDR
merge_debvec = cv2.createMergeDebevec()
hdr = merge_debvec.process([img1, img2, img3], times)
# Tonemap to display
tonemap = cv2.createTonemap(2.2)
ldr = tonemap.process(hdr)
ldr = cv2.normalize(ldr, None, 0, 255, cv2.NORM_MINMAX)
ldr = cv2.convertScaleAbs(ldr)
✅ Summary
• Computational Photography transforms basic image capture into smart image processing.
🧠 Convolutional Neural Networks (CNNs)
CNNs are inspired by the visual cortex of the human brain and are the backbone of computer vision
tasks like image classification, object detection, and segmentation.
Layer                        Description
Convolutional Layer          Applies filters/kernels to extract features like edges or textures.
Activation Function          Adds non-linearity (e.g., ReLU = max(0, x)) to make the network learn complex patterns.
Pooling Layer                Downsamples the feature maps to reduce dimensionality and computation. Common pooling: MaxPooling.
Fully Connected Layer (FC)   Final layers that flatten the data and classify based on learned features.
Dropout Layer                Prevents overfitting by randomly disabling neurons during training.
1. Input
The raw image (e.g., a 32x32 RGB image)
2. Convolution Layer
Apply multiple filters to extract features
(e.g., edges, corners)
3. Activation (ReLU)
Introduce non-linearity
4. Pooling
Reduce the feature map size
(e.g., from 32x32 → 16x16)
5. Repeat
Multiple conv → relu → pool layers
(deeper = more abstract features)
🔍 Example Architecture
Input (32x32x3)
↓
Conv Layer (5x5 filter, 6 filters) → ReLU
↓
Max Pooling (2x2)
↓
Conv Layer (5x5 filter, 16 filters) → ReLU
↓
Max Pooling (2x2)
↓
Flatten
↓
Fully Connected Layer (120) → ReLU
↓
Fully Connected Layer (84) → ReLU
↓
Output Layer (10 classes, softmax)
This is similar to LeNet-5, one of the earliest CNNs used for digit recognition.
📚 Applications of CNNs
Area                    Application
Image Classification    Recognizing objects in images (e.g., cats, dogs, traffic signs)
Object Detection        Identifying and locating objects (e.g., YOLO, SSD)
Semantic Segmentation   Labeling every pixel (e.g., medical imaging)
Facial Recognition      Identifying or verifying faces (e.g., Face ID)
Scene Understanding     Autonomous vehicles, robotics
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
Conv2D(32, (3, 3), activation='relu', input_shape=(64,
64, 3)),
MaxPooling2D(pool_size=(2, 2)),
Flatten(),
Dense(128, activation='relu'),
Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
loss='categorical_crossentropy', metrics=['accuracy'])
✅ Summary
• CNNs are powerful models for image and visual data.
So we use a special type of deep learning model called a Convolutional Neural Network (CNN) to
help the computer understand what’s in the image.
It’s called "convolutional" because it uses a small grid (called a filter or kernel) that slides over the
image to detect patterns like:
• Edges 🔲
• Corners ◻
• Shapes 🟡
If we give all these pixel values directly to a normal neural network, it will:
• Have far too many weights to learn, and
• Ignore the spatial arrangement of the pixels.
CNNs fix this by sliding small filters over local regions of the image. Each filter:
• Looks for edges, like where one color sharply changes to another.
For example, a vertical-edge filter responds strongly wherever a bright region meets a dark region.
👁 Real-life example:
When you upload a photo to Facebook and it tags your friend’s face — it’s using a CNN to
recognize features like:
• Eyes
• Nose
• Mouth
→ and says “This looks like Akanksha!”