IP Notes Unit 2,3,4,5

UNIT-2

Basics of Image Perception: Anatomy of Human


Vision
The human eye works like a biological camera. Here's a deeper look into how we perceive images:

📌 Key Components:

• Cornea: Transparent layer that first bends (refracts) light entering the eye.

• Pupil: The black circular opening that adjusts in size to control the amount of light.

• Iris: The colored part of the eye; it adjusts the pupil size based on brightness.

• Lens: Focuses light rays onto the retina by changing its shape (accommodation).

• Retina: The innermost layer with light-sensitive cells—this is where the image is actually
formed.

🧠 Photoreceptors:

• Rods (about 120 million):

◦ Work in dim light (night vision).

◦ Sensitive to brightness, but not color.

◦ Help in detecting shapes and motion.

• Cones (about 6 million):

◦ Work in bright light.

◦ Responsible for color vision.

◦ Three types:

▪ S-cones (Short wavelength) → Blue

▪ M-cones (Medium wavelength) → Green

▪ L-cones (Long wavelength) → Red

🔄 Image Formation Process:

1. Light enters and is focused by the cornea and lens.

2. It hits the retina, where rods and cones convert it to electrical signals.

3. These signals travel via the optic nerve to the visual cortex in the brain.

4. The brain processes the signals and forms a complete image.

📍 Fun fact: The image formed on the retina is actually inverted, but the brain flips it for correct
perception!


Color Perception – Trichromatic Color Models and


Visible Light Spectrum
🎨 Color Perception

Our brain interprets different wavelengths of light as different colors. This is thanks to cone cells in
our retina.

📌 Trichromatic Theory (Young-Helmholtz Theory):

• Proposes that the human eye has three types of cone cells, each sensitive to a different
range of wavelengths:

◦ S-cones – Sensitive to short wavelengths (blue light, ~420 nm)

◦ M-cones – Sensitive to medium wavelengths (green light, ~530 nm)

◦ L-cones – Sensitive to long wavelengths (red light, ~560 nm)

🧠 Our brain combines signals from these three cones to perceive all colors. For example:

• Equal stimulation of all three = white

• Stimulation of only L-cones = red

• Stimulation of M and L = yellow, and so on.

🌈 Visible Light Spectrum

This is the portion of the electromagnetic spectrum that the human eye can detect:

Color    Wavelength Range
Violet   ~380–450 nm
Blue     ~450–495 nm
Green    ~495–570 nm
Yellow   ~570–590 nm
Orange   ~590–620 nm
Red      ~620–750 nm

• Below 380 nm = Ultraviolet (invisible)

• Above 750 nm = Infrared (also invisible)

📍 Interesting fact: We don’t see “pure” colors all the time—our perception is often a mix of
multiple wavelengths!


Alternative Color Models – HSV and CMYK


These models are used for different applications like digital screens, graphic design, or printing.
Let’s break them down:

🎨 HSV (Hue, Saturation, Value)

HSV is more intuitive for humans than RGB because it separates color (hue) from brightness
(value) and vividness (saturation).

📌 Components:

• Hue (H):

◦ The type of color (red, green, blue, etc.)

◦ Measured in degrees (0°–360°) on a color wheel.

▪ 0° = Red

▪ 120° = Green

▪ 240° = Blue

• Saturation (S):

◦ How pure the color is (0% = gray, 100% = full color).

• Value (V):

◦ How bright the color is (0% = black, 100% = full brightness).


💡 Use Cases:

• Common in color pickers in design tools (e.g., Photoshop, Figma).

• Helps users choose and tweak colors more intuitively.
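For a quick hands-on illustration, here is a minimal OpenCV sketch (the file name is a placeholder; note that OpenCV stores hue in the range 0–179 rather than 0–360, and S and V in 0–255):

import cv2

# Load a BGR image (placeholder path)
img = cv2.imread('photo.jpg')

# Convert BGR -> HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)

# Example tweak: boost saturation by 20% (saturates at 255)
s = cv2.convertScaleAbs(s, alpha=1.2, beta=0)

more_vivid = cv2.cvtColor(cv2.merge([h, s, v]), cv2.COLOR_HSV2BGR)
cv2.imwrite('more_vivid.jpg', more_vivid)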

🖨 CMYK (Cyan, Magenta, Yellow, Key/Black)

CMYK is a subtractive color model used in printing.

📌 Components:

• Cyan (C)

• Magenta (M)

• Yellow (Y)

• Key (Black) (K)

🧾 In subtractive mixing:

• Paper starts white, and ink removes (subtracts) brightness.

• Mixing all CMY ideally makes black, but in reality, it makes a muddy brown, so Black (K)
is added for depth and contrast.

🖨 Use Cases:

• Used by printers, magazines, newspapers, etc.

• RGB to CMYK conversion is necessary for color accuracy in print.
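For intuition, here is a minimal sketch of the common naive RGB→CMYK formula (real print workflows use ICC color profiles instead; inputs are assumed to be normalized to [0, 1]):

def rgb_to_cmyk(r, g, b):
    """Naive RGB -> CMYK conversion for values in [0, 1]."""
    k = 1 - max(r, g, b)            # Key (black) = 1 - brightest component
    if k == 1:                      # Pure black: avoid division by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return c, m, y, k

print(rgb_to_cmyk(1.0, 0.0, 0.0))   # Pure red -> (0.0, 1.0, 1.0, 0.0)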

🆚 Quick Comparison Table:

Model   Use Case   Type         Notes
RGB     Screens    Additive     Light-based model
HSV     Design     Additive     More intuitive for human color selection
CMYK    Printing   Subtractive  Ink-based model

Illumination Models – Ambient, Diffuse, Specular
These models describe how light interacts with surfaces, which is super important in computer
graphics and 3D rendering to make scenes look realistic.

💡 1. Ambient Illumination

• Think of this as the general background light in a scene.

• It doesn’t come from a specific source.

• Affects all objects equally, regardless of their position or orientation.

📌 Features:

• Simulates indirect light (like sunlight bouncing off walls).

• Prevents parts of objects from appearing completely black.

• Doesn’t cast shadows.

✅ Use Case:

Good for ensuring basic visibility of objects in dark scenes.

💡 2. Diffuse Illumination

• This is light that hits a surface and scatters evenly in all directions.

• It depends on the angle between the light and the surface.

📌 Based on Lambert's Cosine Law:

• Intensity = Light Intensity × cos(θ), where θ is the angle between:

◦ the surface normal (perpendicular vector), and

◦ the direction of incoming light.

✅ Use Case:

Creates matte-looking surfaces and helps reveal the shape and depth of an object.

💡 3. Specular Illumination

• This simulates shiny highlights or reflections on glossy surfaces.

• It depends on the viewer’s position, unlike diffuse light.

📌 Features:

• High intensity in a small region → shinier surface.

• Based on the angle between the viewer and the reflected light.

• Controlled by a shininess factor (e.g., Phong or Blinn-Phong models).

✅ Use Case:

Used for metallic, glassy, or wet surfaces. Adds realism and detail.

✨ Visual Analogy:

Imagine a shiny ball under a spotlight:

• Ambient light lights up the whole ball dimly.

• Diffuse shows its roundness by adding shading.

• Specular gives it that bright spot of shine.
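To make the three terms concrete, here is a minimal NumPy sketch of a Phong-style intensity for one surface point; the vectors, coefficients, and shininess value below are illustrative assumptions, not values from these notes:

import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def phong_intensity(normal, light_dir, view_dir, ka=0.1, kd=0.7, ks=0.4, shininess=32):
    n = normalize(normal)
    l = normalize(light_dir)
    v = normalize(view_dir)

    ambient = ka                                 # constant background light
    diffuse = kd * max(np.dot(n, l), 0.0)        # Lambert's cosine law
    r = 2 * np.dot(n, l) * n - l                 # reflection of the light direction
    specular = ks * max(np.dot(r, v), 0.0) ** shininess

    return ambient + diffuse + specular

# Example: light directly overhead, viewer slightly off-axis
print(phong_intensity(np.array([0.0, 0.0, 1.0]),
                      np.array([0.0, 0.0, 1.0]),
                      np.array([0.3, 0.0, 1.0])))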


Density – Halftones and Dithering


These techniques are used to simulate continuous-tone images (like photographs) using limited
colors—especially in printing and digital displays.

🖼 Halftones

📌 What is it?

Halftoning is a technique that simulates shades of gray (or color) using dots of different sizes or
spacing.

• The dots are printed close together and vary in size:

◦ Large dots = Darker areas

◦ Small dots = Lighter areas

🖨 Where it’s used:

• Newspapers, magazines, and other print media.

• Works well because printers typically print using only one ink color per channel (e.g.,
black in grayscale, CMYK in color).

🔍 Example:

Up close, a halftone photo is just a bunch of dots. From a distance, the eye blends them into
smooth shades.

🎛 Dithering

📌 What is it?

Dithering is a technique that creates the illusion of color depth by arranging pixels of different
colors.

• Instead of varying the size of dots (like halftoning), it spreads colors around to simulate
intermediate shades.

🧠 How it works:

• If a device supports only 2 colors (e.g., black & white), you can still simulate gray by
mixing the two in a certain pattern.

🔧 Types of Dithering:

• Floyd–Steinberg Dithering: Error diffusion method; spreads the error of a pixel to its
neighbors.

• Ordered Dithering: Uses a fixed pattern to determine how pixels are displayed.

💻 Where it’s used:

• Low-color displays, old video games, GIFs.

• Also used in image compression to reduce file sizes.
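As a concrete sketch of the Floyd–Steinberg error-diffusion idea (weights 7/16, 3/16, 5/16, 1/16; an 8-bit grayscale NumPy array is assumed as input):

import numpy as np

def floyd_steinberg(gray):
    """Dither an 8-bit grayscale array to pure black and white."""
    img = gray.astype(np.float32).copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = 255.0 if old >= 128 else 0.0
            img[y, x] = new
            err = old - new
            # Spread the quantization error to not-yet-processed neighbours
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h and x > 0:
                img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:
                img[y + 1, x] += err * 5 / 16
            if y + 1 < h and x + 1 < w:
                img[y + 1, x + 1] += err * 1 / 16
    return img.astype(np.uint8)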

🆚 Halftone vs Dithering:

Feature     Halftone                    Dithering
Uses dots   Yes, of varying size        No, uses pixels in patterns
Common in   Printing                    Digital media & low-bit graphics
Simulates   Continuous tones in print   More colors with limited palette

UNIT-3
Histogram Operations
A histogram is a graphical representation of the distribution of pixel intensities (brightness) in an
image. It shows how many pixels in an image have a particular intensity value. In an 8-bit
grayscale image, pixel intensities range from 0 (black) to 255 (white).

Key Histogram Operations

1. Histogram Equalization

What is it?

Histogram Equalization is a method to enhance the contrast of an image. The main goal is to
spread out the pixel intensity values more evenly across the entire range, resulting in an image with
better contrast.

A histogram in image processing is a graphical representation of the intensity distribution of an image's pixels.

📌 What does it do?

• X-axis: Represents pixel intensity values (0 for black, 255 for white in an 8-bit grayscale
image).

• Y-axis: Represents the frequency of pixels with that intensity.

How it works:

1. Calculate the histogram of the image: It shows the distribution of pixel values.

2. Compute the cumulative distribution function (CDF) from the histogram. The CDF is
essentially the cumulative sum of the histogram values.

3. Normalize the CDF: This step rescales the cumulative values so that they span the entire
intensity range, typically from 0 to 255.

4. Map each pixel's intensity: The pixel values in the original image are mapped to new
values using the normalized CDF. The result is an image where pixel intensities are more
evenly distributed.
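A minimal NumPy sketch of these four steps, assuming an 8-bit grayscale array named image (OpenCV's cv2.equalizeHist(), shown later, does the same thing in one call):

import numpy as np

def equalize(image):
    # 1. Histogram of the 8-bit image
    hist, _ = np.histogram(image.flatten(), bins=256, range=(0, 256))

    # 2. Cumulative distribution function
    cdf = hist.cumsum()

    # 3. Normalize the CDF to the range 0-255 (ignoring empty bins)
    cdf_masked = np.ma.masked_equal(cdf, 0)
    cdf_norm = (cdf_masked - cdf_masked.min()) * 255 / (cdf_masked.max() - cdf_masked.min())
    lut = np.ma.filled(cdf_norm, 0).astype(np.uint8)

    # 4. Map each pixel through the normalized CDF
    return lut[image]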

Example Use Case:

• If an image is too dark or too bright (e.g., there are very few dark or bright pixels),
histogram equalization can stretch the pixel values to cover a broader range, improving the
overall visibility.

• Contrast adjustment: You can stretch or shrink the histogram to enhance image contrast.

• Image equalization: The process of adjusting the image to make the histogram flat,
spreading pixel values across the entire range.
Effect:

• The image will appear with enhanced contrast, especially in low-contrast regions.

2. Histogram Stretching

What is it?

Histogram Stretching (also known as Contrast Stretching) is a simpler approach to improve an image's contrast by linearly scaling the pixel values to cover the full intensity range (0 to 255 for an 8-bit image).

How it works:

1. Identify the minimum and maximum pixel values in the image.

2. Stretch the pixel values: The minimum pixel value is mapped to 0, and the maximum is
mapped to 255.

3. Every pixel intensity in the image is then linearly rescaled to fit the new intensity range.
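A minimal NumPy sketch of this linear rescaling, assuming an 8-bit grayscale array:

import numpy as np

def stretch_contrast(image):
    lo, hi = image.min(), image.max()
    if hi == lo:                      # flat image: nothing to stretch
        return image.copy()
    stretched = (image.astype(np.float32) - lo) * 255.0 / (hi - lo)
    return stretched.astype(np.uint8)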

Example Use Case:

• This technique is useful for improving the visibility of an image that might have very
narrow pixel intensity ranges, such as when an image is underexposed.

Effect:

• The pixel intensities are stretched across the full range (0–255), resulting in improved
contrast for the image.

3. Histogram Specification (Matching)

What is it?

Histogram Specification (or Histogram Matching) is a technique used to modify the histogram of
an image so that it matches a given reference histogram.

How it works:

1. Calculate the histogram of the source image.

2. Calculate the histogram of the target (reference) image.

3. Use the reference histogram to modify the pixel values of the source image, ensuring that
the resulting image has a similar intensity distribution to the target.
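One practical way to do this is scikit-image's match_histograms; a short sketch assuming scikit-image is installed and the file names are placeholders:

import cv2
from skimage.exposure import match_histograms

source = cv2.imread('source_image.jpg', cv2.IMREAD_GRAYSCALE)
reference = cv2.imread('reference_image.jpg', cv2.IMREAD_GRAYSCALE)

# Remap the source so its histogram resembles the reference histogram
matched = match_histograms(source, reference)

cv2.imwrite('matched_image.jpg', matched.astype('uint8'))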

Example Use Case:

• When you need to adjust an image to match the brightness and contrast of a desired
reference image (e.g., matching an image to a standard look).

Effect:

• The image will adopt a histogram similar to the reference image, which may involve both
contrast and brightness adjustments.

Why Use Histogram Operations?

• Enhancing Contrast: As mentioned, histogram equalization and stretching can dramatically enhance the visibility of details in dark or bright areas.

• Image Segmentation: Better contrast can make it easier to segment objects or regions of
interest in an image.

• Visual Consistency: Matching histograms can be used to make images visually consistent
in brightness and contrast.

Example Code (Histogram Equalization in Python)

Here’s an example of how you might perform histogram equalization using OpenCV (a popular
computer vision library):

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Read an image in grayscale


image = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)

# Perform histogram equalization


equalized_image = cv2.equalizeHist(image)

# Display the original and equalized histograms


plt.figure(figsize=(12, 6))

# Original Histogram
plt.subplot(1, 2, 1)
plt.hist(image.ravel(), bins=256, color='black')
plt.title('Original Histogram')

# Equalized Histogram
plt.subplot(1, 2, 2)
plt.hist(equalized_image.ravel(), bins=256, color='black')
plt.title('Equalized Histogram')

plt.show()

# Save the equalized image
cv2.imwrite('equalized_image.jpg', equalized_image)
Explanation:

• cv2.equalizeHist(): This function is used to equalize the histogram of the grayscale image.

• plt.hist(): Displays the histogram for both the original and equalized images.

Conclusion

• Histogram operations are fundamental in image processing to improve image quality, contrast, and visualization.

• Histogram Equalization and Histogram Stretching are commonly used techniques to enhance image clarity, especially in low-contrast or poorly-lit images.

Color Operations
Color operations refer to modifying or enhancing the color properties of an image, including
adjusting the brightness, contrast, or color balance. These operations work with different color
spaces or models (like RGB, HSV, etc.) and apply mathematical functions to modify pixel values.

1. Brightness Adjustment

What is it?

Brightness adjustment involves altering the overall lightness or darkness of an image. By increasing the brightness, you make the image lighter, and by decreasing it, you make the image darker.

How it works:

• To adjust the brightness, you add or subtract a constant value to all pixel intensity values in
an image.

• If the pixel values are in the range [0, 255], adding a positive value will increase the
brightness, while subtracting a value will decrease it.

Mathematical Representation:

g(x, y) = f(x, y) + b

where b > 0 brightens the image and b < 0 darkens it; results are clipped to the valid range [0, 255].

Example Use Case:

• Increase brightness: When an image appears too dark or lacks detail in dark areas,
increasing the brightness can make the details more visible.

• Decrease brightness: When an image is too bright or washed out, reducing the brightness
can help enhance details.

Effect:

• Increased brightness results in an image that appears lighter and more illuminated.

• Decreased brightness results in an image that appears darker.

2. Contrast Adjustment

What is it?

Contrast adjustment involves modifying the difference between the darkest and lightest areas of
the image. Increasing the contrast makes the image appear more vibrant and detailed, while
decreasing the contrast results in a more muted image.

How it works:

• Contrast can be adjusted by stretching or compressing the pixel intensity values. The goal
is to either expand the intensity range to improve visual separation between light and dark
areas or compress it to create a flatter image.

• Contrast adjustment is typically done using a linear transformation of the pixel values.

Mathematical Representation:

g(x, y) = α · f(x, y) + β

where α > 1 increases contrast and 0 < α < 1 decreases it (β optionally shifts brightness); results are clipped to [0, 255].

Example Use Case:

• Increase contrast: When an image appears flat with little distinction between dark and light
areas, increasing contrast can make it more visually striking.

• Decrease contrast: When the image has too many stark transitions, reducing contrast can
soften the image.

Effect:

• Increased contrast results in brighter whites and darker blacks, making details stand out
more.

• Decreased contrast results in an image with more even intensity distribution and less visual
differentiation between light and dark areas.

3. Color Balance Adjustment

What is it?

Color balance refers to the adjustment of the intensities of the primary colors (Red, Green, and Blue
in the RGB model) to correct or enhance the overall color appearance of the image. This operation
is particularly useful when the image has an undesirable color cast (e.g., too much red, too much
blue).

How it works:

• In the RGB color model, an image is represented as a combination of Red, Green, and Blue
channels. By adjusting the intensities of these channels, you can change the image's color
balance.

• You can modify the intensity of each color channel by adding or subtracting a constant value
or by multiplying with a scaling factor.

Example Use Case:

• Warm tones: If the image looks too cold (e.g., too much blue), you can increase the red and
green channels to give the image a warmer appearance.

• Cool tones: If the image looks too warm (e.g., too much red), you can reduce the red
channel or increase the blue channel to make the image cooler.

Effect:

• Adjusting the color balance makes the image look more natural or visually pleasing by
correcting unwanted color casts or enhancing certain tones.
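A minimal sketch of warming an image by scaling its channels (OpenCV loads images in B, G, R order; the scale factors here are illustrative):

import cv2
import numpy as np

img = cv2.imread('input_image.jpg').astype(np.float32)

b, g, r = cv2.split(img)
r *= 1.10   # boost red slightly
g *= 1.03   # boost green a touch
b *= 0.95   # pull blue down

warmer = np.clip(cv2.merge([b, g, r]), 0, 255).astype(np.uint8)
cv2.imwrite('warmer_image.jpg', warmer)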

4. Channel Manipulation

What is it?

Channel manipulation involves manipulating individual color channels (Red, Green, Blue) of an
image to either modify or enhance certain parts of the image.

How it works:

• An RGB image consists of three channels: Red, Green, and Blue.

• Each channel can be manipulated separately. For instance, increasing the red channel will
enhance the red elements of the image, while reducing the blue channel can make the image
look warmer.

Example Use Case:

• Red Channel Enhancement: If you want to make the image appear more reddish, you can
increase the values of the red channel and adjust the others accordingly.

• Grayscale Conversion: By setting the red, green, and blue channels to the same value for
all pixels, you can convert the image to grayscale.

Effect:

• Channel manipulation can dramatically change the look of an image and allow for artistic
effects or corrective color adjustments.
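A short sketch of isolating channels with cv2.split()/cv2.merge() (file names are placeholders):

import cv2
import numpy as np

img = cv2.imread('input_image.jpg')
b, g, r = cv2.split(img)

# Keep only the red information (zero out blue and green)
zeros = np.zeros_like(b)
red_only = cv2.merge([zeros, zeros, r])

# Use a single channel as a simple grayscale image
gray_from_green = g

cv2.imwrite('red_only.jpg', red_only)
cv2.imwrite('gray_from_green.jpg', gray_from_green)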

Example Code (Brightness and Contrast Adjustment in Python)

Here’s an example using OpenCV to adjust the brightness and contrast of an image:

import cv2
import numpy as np

# Read the image


image = cv2.imread('input_image.jpg')

# Brightness and contrast adjustment


alpha = 1.2 # Contrast control (1.0-3.0)
beta = 50 # Brightness control (0-100)

# Apply the formula: new_image = alpha * image + beta


adjusted_image = cv2.convertScaleAbs(image, alpha=alpha,
beta=beta)

# Display the original and adjusted image


cv2.imshow('Original Image', image)
cv2.imshow('Adjusted Image', adjusted_image)

# Wait until a key is pressed


cv2.waitKey(0)
cv2.destroyAllWindows()

# Save the adjusted image


cv2.imwrite('adjusted_image.jpg', adjusted_image)
Explanation:

• alpha: Controls the contrast (a higher value increases contrast).

• beta: Controls the brightness (a higher value increases brightness).

Conclusion

• Color operations allow you to adjust and manipulate the appearance of an image, either to
enhance certain features (like brightness and contrast) or to correct color balance issues.

• Brightness and contrast adjustment can significantly improve the visibility of details in an image, while color balance and channel manipulation provide ways to correct or enhance specific tones.


Pixel-Level Operations in Image Processing


Pixel-level operations are the most fundamental operations in image processing, where individual
pixel values are manipulated directly to achieve various effects. These operations can involve
changing pixel values based on mathematical functions, color manipulations, or conditions.

1. Thresholding

What is it?

Thresholding is a simple method of image segmentation that converts a grayscale image into a
binary image (black and white) based on a threshold value. It helps separate objects in an image
from the background by making pixels above a certain intensity white (255) and those below the
threshold black (0).

Example Use Case:

• Object detection: Thresholding is commonly used to separate objects from the background
in cases where objects have a clear intensity contrast with the background.

• Binarization: When converting grayscale images to binary images for further analysis.

Effect:

• The result is a binary image where only the pixels that meet the threshold criteria are white,
and the rest are black.

2. Inversion

What is it?

Inversion is the process of inverting the pixel values of an image. For each pixel, its intensity is
inverted by subtracting the pixel value from the maximum possible value (usually 255 for 8-bit
images).

Example Use Case:

• Negative images: Inverting the pixel values can create a photographic negative, which can
be useful in artistic effects or for creating more distinct separations between different
regions of an image.

Effect:
• The darkest pixels (0) become the lightest (255), and the lightest pixels (255) become the
darkest (0).

3. Pixel Addition/Subtraction

What is it?

Pixel addition and subtraction involve adding or subtracting a constant value to/from the pixel
intensities of an image. This can be used to adjust brightness, enhance features, or perform other
transformations.

Example Use Case:

• Brightness adjustment: By adding a constant value to all pixels, you can increase the
overall brightness of an image.

• Image correction: Pixel subtraction can be used to darken or correct an overexposed image.

Effect:

• Addition: Increases the pixel values, making the image brighter.

• Subtraction: Decreases the pixel values, making the image darker.
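A small sketch showing why clipping matters when adding a constant to 8-bit pixels (plain NumPy addition wraps around, while OpenCV's saturating arithmetic does not):

import cv2
import numpy as np

img = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)

# Naive uint8 addition wraps around (250 + 50 -> 44), so clip explicitly
brighter = np.clip(img.astype(np.int16) + 50, 0, 255).astype(np.uint8)

# cv2.add saturates at 255 instead of wrapping
also_brighter = cv2.add(img, np.full_like(img, 50))

darker = np.clip(img.astype(np.int16) - 50, 0, 255).astype(np.uint8)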

4. Bitwise Operations

What is it?

Bitwise operations are logical operations that directly manipulate the bits of pixel values. These
operations are used for tasks like image masking, image merging, and logical comparison between
pixel values.

Common bitwise operations include:

• AND (&)

• OR (|)

• XOR (^)

• NOT (~)

How it works:

For each pixel, the corresponding bitwise operation is applied between two pixel values.

• AND: The result pixel will have 1 only if both the pixels have 1 in the corresponding bit
position.

• OR: The result pixel will have 1 if either of the pixels has 1 in the corresponding bit
position.

• XOR: The result pixel will have 1 if only one of the pixels has 1 in the corresponding bit
position.

• NOT: Inverts all the bits of the pixel.

Example Use Case:

• Masking: Bitwise operations are useful for creating masks where certain areas of the image
are preserved and others are ignored.

• Combining Images: Bitwise operations can be used to combine two images in creative
ways, such as overlaying one image over another.

Effect:

• AND: Allows you to isolate certain bits or regions.

• OR: Combines information from two images or regions.

• XOR: Creates a new pattern based on differences.

• NOT: Reverses the pixel information, effectively inverting the image.
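A minimal masking sketch with cv2.bitwise_and(); the circular mask is just an illustrative region of interest:

import cv2
import numpy as np

img = cv2.imread('input_image.jpg')

# Build a binary mask: white circle on a black background
mask = np.zeros(img.shape[:2], dtype=np.uint8)
cv2.circle(mask, (img.shape[1] // 2, img.shape[0] // 2), 100, 255, -1)

# Keep only the pixels where the mask is white
masked = cv2.bitwise_and(img, img, mask=mask)

# Invert an image with bitwise NOT
inverted = cv2.bitwise_not(img)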

5. Color Channel Operations

What is it?

In color images, operations can be applied on individual color channels (Red, Green, and Blue). For
example, you might want to enhance only the red channel of an image while leaving the other
channels unchanged.

How it works:

For each color channel, you can manipulate its pixel values in various ways, such as increasing the
intensity of the red channel or setting the green channel to zero for a monochromatic image with
only red hues.

Example Use Case:

• Grayscale Conversion: By averaging or selecting one color channel (such as the green
channel), you can create a grayscale image from a color image.

• Color isolation: Isolate a color channel to enhance or focus on a particular color in the
image.

Effect:

• Grayscale: Reduces the image to shades of gray by manipulating the RGB channels.

• Color enhancement: Increases or decreases the intensity of specific colors in the image.

Example Code (Thresholding and Inversion in Python)

import cv2
import numpy as np

# Read the image in grayscale


image = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)

# Thresholding
_, thresholded_image = cv2.threshold(image, 128, 255,
cv2.THRESH_BINARY)

# Inversion
inverted_image = cv2.bitwise_not(image)

# Display the original, thresholded, and inverted images


cv2.imshow('Original Image', image)
cv2.imshow('Thresholded Image', thresholded_image)
cv2.imshow('Inverted Image', inverted_image)

# Wait until a key is pressed and close the windows


cv2.waitKey(0)
cv2.destroyAllWindows()

# Save the images


cv2.imwrite('thresholded_image.jpg', thresholded_image)
cv2.imwrite('inverted_image.jpg', inverted_image)
Explanation:

• Thresholding: Converts the image to binary based on a threshold value (128 in this case).

• Inversion: Inverts the pixel values of the grayscale image.

Conclusion

• Pixel-level operations are fundamental in image processing and include a variety of


techniques such as thresholding, inversion, and bitwise operations.

• These operations allow you to manipulate the individual pixel values of an image to achieve
various effects, such as segmentation, enhancement, and masking.


Non-Linear and Morphological Operations


Non-linear operations in image processing often focus on structuring the image to enhance features
or remove noise. These operations don't follow a simple pixel-by-pixel mathematical relationship
but instead depend on the local neighborhood of each pixel. Morphological operations, a subset of
non-linear operations, are particularly useful in binary images and deal with the shape or structure
of objects.

1. Noise Filtering

What is it?

Noise filtering refers to the process of removing unwanted variations or "noise" from an image. Noise can be caused by various factors like sensor errors, transmission errors, or environmental conditions. Noise filtering smooths the image and makes it easier to perform further image analysis.

Types of Noise:

• Gaussian Noise: A form of statistical noise with a bell-curve distribution of pixel values.

• Salt-and-Pepper Noise: Noise that appears as black and white pixels scattered randomly
over the image.

How it works:

Noise filtering uses various techniques, such as:

• Mean Filter: Replaces each pixel with the average value of its neighbors.

• Median Filter: Replaces each pixel with the median value of its neighbors. It is particularly
effective at removing salt-and-pepper noise.

• Gaussian Filter: Applies a weighted average where closer pixels contribute more to the
final value, smoothing the image while preserving edges.

Example Use Case:

• Noise Removal: Filters like median or Gaussian filters can be used to remove salt-and-
pepper noise or Gaussian noise from images.

• Smoothing: Reducing high-frequency components in an image to remove noise without significantly affecting the edges.

Effect:

• Mean Filter: Reduces high-frequency noise, but may blur edges.

• Median Filter: Effectively removes salt-and-pepper noise without blurring edges.

• Gaussian Filter: Smoothens the image by reducing high-frequency noise, preserving edges
better than a mean filter.
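A short sketch of the three filters with OpenCV (kernel sizes are illustrative and must be odd for the median and Gaussian filters):

import cv2

img = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)

mean_filtered = cv2.blur(img, (5, 5))                  # mean (box) filter
median_filtered = cv2.medianBlur(img, 5)               # good for salt-and-pepper noise
gaussian_filtered = cv2.GaussianBlur(img, (5, 5), 0)   # sigma derived from kernel size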

2. Dilation

What is it?

Dilation is a morphological operation used to expand the white regions (foreground) of a binary
image. It involves placing a structuring element (a small binary shape) on each pixel of the image
and setting the pixel to 1 if any part of the structuring element overlaps with the foreground.

How it works:

• For each pixel in the image, check if the structuring element overlaps with any white pixel
(1) in the neighborhood.

• If it does, set the pixel under the structuring element to 1.

Example Use Case:

• Enhance Object Size: Dilation is often used to expand the size of objects in a binary image.
This can help in connecting broken parts of an object.

• Feature enhancement: It is used in object recognition to enhance features like edges or


regions of interest.

Effect:

• Increases the area of foreground objects and fills small holes or gaps.

3. Erosion

What is it?

Erosion is the opposite of dilation. It is a morphological operation that shrinks the foreground
(white) regions of a binary image. It works by placing a structuring element at each pixel and
setting the pixel to 0 if any part of the structuring element overlaps with the background (black) in
the image.
How it works:

• For each pixel in the image, check if the structuring element fits completely within the
foreground (1) region.

• If it doesn’t fit, set the pixel to 0.

Example Use Case:

• Shrink Objects: Erosion is useful when you want to shrink the foreground objects in a
binary image.

• Remove Small Objects: It is often used to remove small noise and artifacts in an image.

Effect:

• Reduces the size of foreground objects and removes small noise.

4. Majority Operation

What is it?

The majority operation involves replacing the value of a pixel with the most frequent value in its
neighborhood. It is a non-linear operation, often used for noise reduction.

How it works:

• For each pixel, examine its neighborhood and find the most frequent pixel value (either 0 or
1 for binary images).

• Replace the pixel with the majority value.

Example Use Case:

• Noise Reduction: The majority operation can be used to remove isolated noise pixels in a
binary image, ensuring that pixels are replaced by the majority value of their neighbors.

Effect:

• Removes isolated noise and smooths the image while preserving overall structure.
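Because the median of nine 0/255 values is whichever value occurs in the majority of the window, a 3×3 median filter is one simple way to sketch the majority operation on a binary image:

import cv2

# Assume a 0/255 binary image, e.g. from thresholding
gray = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)
_, binary_image = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)

# Each pixel is replaced by the majority value of its 3x3 neighbourhood,
# so isolated noise pixels disappear
majority_filtered = cv2.medianBlur(binary_image, 3)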

Example Code (Dilation and Erosion in Python)

Here’s an example in Python using OpenCV to perform dilation and erosion:

import cv2
import numpy as np

# Read the image in grayscale


image = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)

# Convert the image to binary
_, binary_image = cv2.threshold(image, 128, 255,
cv2.THRESH_BINARY)

# Define a structuring element (kernel)


kernel = np.ones((5, 5), np.uint8)

# Perform Dilation
dilated_image = cv2.dilate(binary_image, kernel,
iterations=1)

# Perform Erosion
eroded_image = cv2.erode(binary_image, kernel, iterations=1)

# Display the original, dilated, and eroded images


cv2.imshow('Original Image', binary_image)
cv2.imshow('Dilated Image', dilated_image)
cv2.imshow('Eroded Image', eroded_image)

# Wait until a key is pressed and close the windows


cv2.waitKey(0)
cv2.destroyAllWindows()

# Save the images


cv2.imwrite('dilated_image.jpg', dilated_image)
cv2.imwrite('eroded_image.jpg', eroded_image)
Explanation:

• Dilation: The dilated image will show an expanded foreground, and small gaps in objects
may be filled.

• Erosion: The eroded image will shrink the foreground objects, and small objects may be
removed.

Conclusion

• Noise filtering: It removes unwanted variations in an image, improving clarity for further
processing.

• Dilation and Erosion: These morphological operations are used to modify the shape of
objects in a binary image. Dilation expands objects, while erosion shrinks them.

• Majority operation: A noise reduction technique that replaces pixel values with the most
common value in its neighborhood.

Morphological operations are particularly useful when dealing with binary images or images where
structural changes are important, such as in object detection, image segmentation, and noise
reduction.


Geometric Operations
Geometric transformations are a fundamental toolset that enables the modification of
an image’s spatial arrangement. These transformations are essential for a variety of
applications, including computer vision, medical imaging, robotics, and more.

Geometric transformations involve altering an image’s geometry by manipulating its
pixels based on predefined mathematical operations. These operations can include
translations, rotations, scalings, and shearing, among others. By applying these
transformations, we can reposition, resize, or distort an image in various ways while
preserving its structural integrity.

Understanding the intricacies of geometric transformations in image processing is
critical for practitioners and researchers alike. It forms the foundation for a wide
array of image-based applications, such as image registration, object recognition,
panoramic image stitching, and augmented reality.

Reflection
Reflection transformation in image processing is a geometric operation that involves
flipping an image across a specific axis. The reflection can be done horizontally,
vertically, or diagonally, resulting in a mirrored version of the original image.

Horizontal Reflection
In horizontal reflection, each row of pixels is reversed, creating a mirror image along
the horizontal axis. Let I represent the original image and I_rh represent the
horizontally reflected image.
This is the equation of this transformation:

I_rh(i, j) = I(h - 1 - i, j)

where i and j are row and column indices, and h is the height of the image. Here’s the
code implementation of horizontal reflection.

import numpy as np

def horizontal_reflection(image):
    height, width, channels = image.shape
    reflected_image = np.zeros((height, width, channels), dtype=np.uint8)
    for i in range(height):
        # Row i of the output comes from row (height - 1 - i) of the input
        reflected_image[i, :] = image[height - i - 1, :]
    return reflected_image

Vertical Reflection
In vertical reflection, each column of pixels is reversed, creating a mirror image along
the vertical axis. Let I represent the original image and I_rv represent the vertically
reflected image.
The equation of this transformation is given by:

I_rv(i, j) = I(i, w - 1 - j)

where i and j are row and column indices, and w is the width of the image. Here’s the
code implementation of vertical reflection.

def vertical_reflection(image):
    height, width, channels = image.shape
    reflected_image = np.zeros((height, width, channels), dtype=np.uint8)
    for j in range(width):
        # Column j of the output comes from column (width - 1 - j) of the input
        reflected_image[:, j] = image[:, width - j - 1]
    return reflected_image

Translation
Translation is a fundamental geometric transformation involving the shifting of an
object within an image from one location to another. This shift can occur both
horizontally and vertically, determined by specified offset values measured in pixels.
The translation equations for transforming the coordinates of a point (x, y) to a new
point (x’, y’) with respective horizontal and vertical offsets (tx, ty) can be expressed
as follows:

x’ = x + tx
y’ = y + ty

To facilitate this transformation, we utilize a transformation matrix in the form of a
2×3 array:

M = [ 1  0  tx ]
    [ 0  1  ty ]
where tx represents the horizontal shift and ty represents the vertical shift, both
denoted in the number of pixels by which we need to shift in their respective
directions.
To implement this translation transformation in Python, the cv2.warpAffine() function
is utilized, requiring the translation matrix and the image as input. Here’s a Python
script demonstrating how to achieve image translation:
import cv2
import numpy as np

def translate_image(image, tx, ty):
    # Define the translation matrix
    translation_matrix = np.float32([[1, 0, tx], [0, 1, ty]])

    # Apply the translation using cv2.warpAffine()
    img_translation = cv2.warpAffine(image, translation_matrix,
                                     (image.shape[1], image.shape[0]))

    return img_translation

# Example translation parameters (tx, ty)
tx = 50
ty = -30

# Load your image (assuming you have 'image' variable)
# img = cv2.imread('your_image_path.jpg')

# Translate the image
translated_img = translate_image(img, tx, ty)
This script defines a function translate_image() to translate an input image by the
specified horizontal (tx) and vertical (ty) shifts using the provided transformation
matrix and the cv2.warpAffine() function.

Scaling
Scaling, a core geometric transformation, involves resizing an image in the
horizontal (x) and/or vertical (y) directions. This transformation is achieved using
scaling equations, which determine the new size of the image based on specified
scaling factors.

Let’s denote the scaling factors for the x and y directions as Sx and Sy respectively.
The scaling equations for transforming the coordinates of a point (x, y) to a
new point (x’, y’) with the scaling factors Sx and Sy can be expressed as
follows:

x’ = Sx * x
y’ = Sy * y
In these equations, x’ and y’ represent the coordinates of the point after scaling,
while x and y are the original coordinates. The scaling factors Sx and Sy determine
the extent of scaling in the respective directions. If Sx and Sy are greater than 1, the
image is enlarged in the x and y directions, respectively. Conversely, if Sx and Sy are
less than 1, the image is reduced in size.
To implement scaling in Python, we use transformation matrices. The scaling
transformation matrix takes the form of a 2×2 array:

S = [ Sx  0 ]
    [ 0  Sy ]
In this matrix, Sx represents the scaling factor in the x-direction, and Sy represents
the scaling factor in the y-direction. The scaling matrix is then applied to each point
in the image to achieve the desired scaling effect.
Implementing scaling in Python involves using the scaling matrix and the
`cv2.warpAffine()` function from the OpenCV library. Here’s a Python script
demonstrating how to achieve image scaling:

def scale_image(image, scale_x, scale_y):
    # Define the scaling matrix
    scaling_matrix = np.float32([[scale_x, 0, 0], [0, scale_y, 0]])

    # Apply the scaling using cv2.warpAffine()
    img_scaled = cv2.warpAffine(image, scaling_matrix,
                                (image.shape[1], image.shape[0]))

    return img_scaled

# Example scaling factors (scale_x, scale_y)
scale_x = 1.5  # Scaling factor for the x-direction
scale_y = 0.8  # Scaling factor for the y-direction

# Load your image (assuming you have 'image' variable)
# img = cv2.imread('your_image_path.jpg')

# Scale the image
scaled_img = scale_image(img, scale_x, scale_y)
In this script, we define a function scale_image() that takes an input image and
scaling factors scale_x and scale_y to resize the image accordingly. We use a scaling
matrix with the specified scaling factors and apply it to the image
using cv2.warpAffine() to achieve the scaling effect.

Rotation
Rotation is a geometric transformation that involves changing the orientation of an
image by a specified angle around a given axis or point. The rotation can be
mathematically expressed using equations:

x’ = x * cos(θ) - y * sin(θ)
y’ = x * sin(θ) + y * cos(θ)
Here, x’ and y’ represent the coordinates of the point after rotation, and x and y are
the original coordinates. The angle θ determines the amount of rotation to be applied.
Trigonometric functions such as cosine (cos) and sine (sin) play a fundamental role in
these equations, influencing the rotation outcome.
It’s important to handle points that fall outside the boundary of the output image
appropriately based on the specific application requirements. In practical
implementation, programming languages like Python utilize these equations within
image manipulation frameworks to achieve accurate and efficient image rotation.
In Python, implementing image rotation involves utilizing the rotation equations
within image manipulation libraries or frameworks. One popular library for this
purpose is OpenCV. Here’s a brief guide on how to perform image rotation using
OpenCV:
def rotate_image(image, angle):
    height, width = image.shape[:2]
    rotation_matrix = cv2.getRotationMatrix2D((width / 2, height / 2), angle, 1)
    rotated_image = cv2.warpAffine(image, rotation_matrix, (width, height))
    return rotated_image

# Load the image (replace with the actual path to your image)
original_image = cv2.imread('path_to_your_image.jpg')

# Define the rotation angle (in degrees)
rotation_angle = 45

# Rotate the image
rotated_image = rotate_image(original_image, rotation_angle)

Shearing
Shearing, much like rotation, is a fundamental geometric transformation used in
image processing. Unlike translation, shearing involves shifting the pixel values of an
image either horizontally or vertically, but the shift is not uniform across the image.
This creates a distorted effect by displacing parts of the image in different directions.
Mathematically, shearing can be expressed using equations that describe the
transformation of coordinates. Let’s denote the original coordinates as x and y and the
transformed coordinates after shearing as x’ and y’. The shearing can be applied
either horizontally or vertically.
Horizontal Shearing:

x’ = x + k * y
y’ = y

Vertical Shearing:

x’ = x
y’ = y + k * x
In these equations, k represents the shearing factor, determining the amount of shear
applied in the respective direction. When k is positive, it shifts the points towards a
specific direction, and when k is negative, it shifts in the opposite direction.
Visualization helps to grasp this concept better. Imagine a rectangular image, and
applying horizontal shearing will shift the upper part of the image to the right and the
lower part to the left, creating a trapezoidal shape.
To implement image shearing in Python, we’ll use a similar approach of creating a
shearing matrix and applying it using the `cv2.warpAffine()` function from the
OpenCV library. Below is a Python script demonstrating how to achieve image
shearing:
def shear_image(image, shear_factor):
    # Define the shearing matrix
    shearing_matrix = np.float32([[1, shear_factor, 0], [0, 1, 0]])

    # Apply the shearing using cv2.warpAffine()
    img_sheared = cv2.warpAffine(image, shearing_matrix,
                                 (image.shape[1], image.shape[0]))

    return img_sheared

# Example shearing factor
shear_factor = 0.2  # Shearing factor for the x-direction (horizontal shearing)

# Load your image (assuming you have 'image' variable)
# img = cv2.imread('your_image_path.jpg')

# Shear the image
sheared_img = shear_image(img, shear_factor)
In this script, we define a function `shear_image()` that takes an input image and a
shearing factor to shear the image horizontally. We use a shearing matrix with the
specified shearing factor and apply it to the image using `cv2.warpAffine()` to
achieve the shearing effect. The original and sheared images can then be displayed or
saved using OpenCV.

Conclusion
Geometric transformations are the building blocks of image processing, allowing us
to alter images in meaningful ways. From simple translations to complex perspective
corrections, these transformations empower various applications that rely on accurate
image manipulation. As we continue to advance in the field of image processing, a
deeper understanding of geometric transformations will be indispensable for
innovating and improving the way we perceive and interact with images.

UNIT-4
Image Filters

📌 What are Image Filters?

Image filters are mathematical operations applied to images to enhance features, suppress
noise, or extract information like edges, textures, or patterns.

🧃 Simple Definition:

An image filter modifies the intensity values of pixels by using a kernel or mask, typically by
convolution.

🔍 Where Are Filters Used?

Purpose             Examples
Smoothing           Blur, Gaussian Filter
Sharpening          Laplacian, High-Pass Filter
Edge Detection      Sobel, Prewitt, Canny
Noise Removal       Median, Gaussian
Feature Extraction  Gabor, Wavelet

🧰 How Do Filters Work?

• A filter (kernel/mask) is a small matrix (e.g., 3x3, 5x5).

• It is applied to every pixel and its neighborhood using convolution.

• The pixel's value is replaced with the result of the convolution.

🧮 Convolution Operation (Basic Formula):

Let f(x, y) be the image, and h(i, j) the filter kernel.

g(x, y) = ∑∑ f(x-i, y-j) * h(i, j)

🧊 Example Filter (Box Blur 3x3):

[1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9]
This averages the 3x3 neighborhood around each pixel → blurs the image.
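A minimal sketch of applying this (or any other) kernel with OpenCV's generic kernel-filtering helper cv2.filter2D (which technically computes correlation, but that only matters for non-symmetric kernels):

import cv2
import numpy as np

img = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)

# 3x3 box-blur kernel: every weight is 1/9
kernel = np.ones((3, 3), np.float32) / 9.0

blurred = cv2.filter2D(img, -1, kernel)   # -1 keeps the input depth (uint8)
cv2.imwrite('blurred_image.jpg', blurred)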

✅ Summary Table of Common Filters:

Filter Type     Goal                 Example Filters
Smoothing       Blur, remove noise   Box, Gaussian, Median
Sharpening      Highlight edges      Laplacian, High-boost
Edge Detection  Detect boundaries    Sobel, Prewitt, Canny
Feature         Texture, frequency   Gabor, Wavelet

Image as a Signal
📌 What is an Image?

• A digital image is a 2D signal.

• Just like a 1D signal (like sound), an image has amplitude (pixel intensity) and spatial
coordinates (x, y).

• The value at each (x, y) location is the brightness or color (in case of RGB images).

🔍 Key Concepts:

Concept          Meaning
Spatial Domain   Where an image is represented by pixels.
Pixel            A single point in an image; has intensity (grayscale) or RGB values.
Grayscale Image  Each pixel has one intensity value (0–255).
Color Image      Each pixel has 3 values – Red, Green, Blue (RGB).
Analog Image     Continuous – infinite values. Like natural light.
Digital Image    Sampled (discrete) in space and intensity.

📊 Image as a Matrix (Digital Signal Representation):

An image can be represented as:

f(x, y) = intensity at pixel location (x, y)
For example, a 3x3 grayscale image:

[ 10 20 30
40 50 60
70 80 90 ]

🧮 Mathematical Representation:

A digital image is a 2D function:

f(x, y), where:
- x, y are spatial coordinates
- f is intensity (grayscale value)
In signals terms:

• 1D signal: f(t)

• 2D signal (image): f(x, y)

🎯 Why Treat an Image as a Signal?

Because then we can:

• Apply signal processing techniques like convolution, Fourier Transform, filtering, etc.

• Analyze edges, patterns, textures.

• Enhance image quality or extract features.

📝 Exam Tip:

If asked “How is an image treated as a signal?”, write:

An image is treated as a two-dimensional signal, where the intensity of each pixel represents the
amplitude, and the spatial coordinates (x, y) represent the independent variables. This allows us to
apply signal processing techniques to perform operations such as filtering, edge detection, and
frequency analysis.

Processing in Spatial and Frequency Domain


🟦 1. What is Image Processing?

Image processing involves modifying or analyzing digital images to improve quality or extract
information.

There are two major domains to process an image:

🌐 A. Spatial Domain Processing

📌 Definition:

Processing an image by directly working on its pixels.

You apply a function or filter directly on the pixel intensity values (f(x, y)) of the image.

⚙ Formula:

g(x, y) = T[f(x, y)]
Where:

• f(x, y) = original image

• T = transformation operator (e.g., convolution, histogram equalization)

• g(x, y) = processed image

🔧 Techniques Used:

Technique               Purpose
Filtering               Smoothing, sharpening
Edge Detection          Detect boundaries
Histogram Equalization  Improve contrast
Morphological Ops       Shape-based processing

📌 Examples:

• Blur using a kernel like:

[ 1 1 1
  1 1 1
  1 1 1 ] / 9

• Edge detection using a Sobel kernel

🌊 B. Frequency Domain Processing

📌 Definition:

Processing an image by transforming it into the frequency domain using a mathematical tool like
the Fourier Transform.

🧠 Why Frequency Domain?

Because many image features (edges, textures, patterns) become easier to analyze and modify
when seen in terms of frequency components.

🔁 2D Fourier Transform (Formula):

F(u, v) = ∑∑ f(x, y) * e^(-j2π(ux/M + vy/N))
• Converts spatial image f(x, y) → frequency domain F(u, v)

• u, v = frequency components

• M, N = image dimensions

🔄 Inverse Fourier Transform:

Used to get the image back after processing:

f(x, y) = ∑∑ F(u, v) * e^(j2π(ux/M + vy/N))

⚙ Frequency Domain Techniques:

Filter Type       Purpose
Low-Pass Filter   Remove high-frequency noise
High-Pass Filter  Highlight edges/textures
Band-Pass Filter  Keep specific frequency range

📊 Frequency Interpretation:

Frequency  Meaning
Low        Smooth areas, gradual changes (background)
High       Sharp transitions, edges, noise
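A minimal NumPy sketch of frequency-domain low-pass filtering: forward FFT, keep a circle of low frequencies around the centre (the radius of 30 is an illustrative choice), then inverse FFT:

import cv2
import numpy as np

img = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)

# Forward transform, with the zero frequency shifted to the centre
F = np.fft.fftshift(np.fft.fft2(img))

# Circular low-pass mask
rows, cols = img.shape
y, x = np.ogrid[:rows, :cols]
mask = (y - rows // 2) ** 2 + (x - cols // 2) ** 2 <= 30 ** 2

# Keep low frequencies, zero out the rest, then transform back
smoothed = np.fft.ifft2(np.fft.ifftshift(F * mask))
smoothed = np.clip(np.abs(smoothed), 0, 255).astype(np.uint8)
cv2.imwrite('lowpass_image.jpg', smoothed)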

🎯 Comparison: Spatial vs Frequency Domain

Feature      Spatial Domain              Frequency Domain
Operates On  Pixels                      Frequencies (via Fourier Transform)
Tools Used   Filters, Masks, Histograms  DFT, FFT, Frequency Filters
Best For     Local enhancement           Global pattern detection, filtering
Examples     Smoothing, Edge detection   Noise removal, Texture analysis

📝 Exam Tips:

Q1: What are spatial and frequency domain techniques?

Spatial domain techniques manipulate pixels directly (e.g., smoothing, sharpening), while frequency
domain techniques modify the image’s frequency components using transforms like the Fourier
Transform.

Q2: Why use frequency domain?

Some image operations (e.g., noise reduction, texture analysis) are more effective or easier in the
frequency domain because they isolate different patterns or noise by frequency.

Image Filters - Convolution and Linear Filters


🔵 1. What is Convolution in Image Processing?

Convolution is a fundamental operation in image processing that involves combining an image


with a lter kernel (or mask) to produce a transformed version of the original image. The purpose
of convolution is to apply a lter that enhances or suppresses speci c features (such as edges,
blurring, or noise) in the image.

🧠 How does Convolution Work?

At the core of convolution is the kernel (or filter), a small matrix (e.g., 3x3, 5x5), which is applied
to the image by sliding it over each pixel in the image. The value of each pixel in the output image
is calculated by taking the weighted sum of the corresponding input image pixels, where the
weights are provided by the kernel.

Example of Convolution:

Let’s say we have an image f(x, y) and a 3x3 kernel h(i, j):

Image (f):

[ 3 5 7 ]
[ 4 8 6 ]
[ 2 6 9 ]
Kernel (h):

[ 0 -1 0 ]
[ -1 4 -1 ]
[ 0 -1 0 ]
Convolution operation:

We apply the kernel to the image at each position, multiply corresponding elements, and sum them
up. For the center pixel of the image (pixel at position (1,1) in a 0-based index):

(3 * 0) + (5 * -1) + (7 * 0)
+ (4 * -1) + (8 * 4) + (6 * -1)
+ (2 * 0) + (6 * -1) + (9 * 0)
= -5 - 4 + 32 - 6 - 6 = 11
This gives the new pixel value for the center pixel (1,1) in the output image.

🧠 Types of Kernels in Convolution:

Different filters (kernels) are used for various operations. Some of the most common kernels
include:

1. Smoothing (Blur) Kernel – Reduces noise by averaging pixel values.

2. Edge Detection Kernel – Highlights edges in the image.

3. Sharpening Kernel – Enhances fine details.

🔵 2. What are Linear Filters?

Linear filters are filters that use linear operations (like addition and multiplication) to modify the image. In the case of convolution, a filter is applied to the image by performing linear operations over the pixel intensities and using a convolution operation.

Linear filters are called so because the output of the filter is a weighted sum of input pixel values, meaning they do not alter the original image's structure, but they instead combine the image’s pixels in some way that results in blurring, sharpening, or edge detection.

🧠 Types of Linear Filters:

1. Box Filter (Average Filter): This is the simplest linear filter, where the output pixel value is the average of all the surrounding pixels in a window (kernel).
Example of a 3x3 Box Filter:

[ 1/9 1/9 1/9 ]
[ 1/9 1/9 1/9 ]
[ 1/9 1/9 1/9 ]

This kernel is used to blur or smooth the image by averaging the pixel values.

2. Gaussian Filter: The Gaussian filter smooths an image by giving more weight to the central pixels and less weight to the surrounding pixels.
Gaussian Kernel (5x5 example):

[ 1  4  6  4  1 ]
[ 4 16 24 16  4 ]
[ 6 24 36 24  6 ]
[ 4 16 24 16  4 ]
[ 1  4  6  4  1 ] / 256

This kernel helps in reducing high-frequency noise (like sharp edges and fine details) in an image.

3. Sobel Filter (Edge Detection): The Sobel filter is a gradient-based operator used for detecting edges in images. It calculates the gradient of image intensity, emphasizing changes in pixel values along the x-axis (horizontal) or y-axis (vertical).
Sobel Kernel (horizontal gradient, x-direction):

[ -1 0 1 ]
[ -2 0 2 ]
[ -1 0 1 ]

4. Laplacian Filter (Edge Detection): The Laplacian filter is a second-order derivative operator that detects areas of rapid intensity change, which is useful for detecting edges in the image.
Laplacian Kernel:

[  0 -1  0 ]
[ -1  4 -1 ]
[  0 -1  0 ]

🔵 3. Applying Convolution in Image Processing:

Convolution is at the heart of most image processing techniques, where it is used in filters for blurring, edge detection, sharpening, and more. Convolution is used in a wide variety of applications:

• Blurring: To reduce noise and fine details.

• Edge Detection: To identify boundaries of objects in the image (using filters like Sobel or
Prewitt).

• Sharpening: To enhance details and edges in an image.
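A short sketch of gradient-based edge detection with OpenCV's built-in Sobel operator (applying the Sobel kernels shown above via cv2.filter2D would give a similar result):

import cv2
import numpy as np

img = cv2.imread('input_image.jpg', cv2.IMREAD_GRAYSCALE)

# Gradients along x and y (64-bit float so negative slopes are not clipped)
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)

# Gradient magnitude, scaled back to 8 bits
edges = np.clip(np.sqrt(gx ** 2 + gy ** 2), 0, 255).astype(np.uint8)
cv2.imwrite('edges.jpg', edges)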

🧠 Important Concepts:

• Convolution vs Correlation: The difference lies in how the kernel is applied. In convolution, the kernel is flipped before applying it to the image. In correlation, the kernel is applied without flipping. They are often used interchangeably in image processing.

• Kernel Size: The size of the kernel (e.g., 3x3, 5x5) determines the extent of the
neighborhood used for the operation. Larger kernels usually lead to more smoothing or
blurring, while smaller kernels retain more detail.

• Edge Effects: When applying convolution near the image borders, you often encounter edge
effects because there are fewer neighboring pixels available to apply the kernel. Several
strategies like zero-padding or mirroring the border pixels are used to handle these cases.

📌 Applications of Linear Filters:

1. Blurring Filters: Used for reducing noise or details, commonly used for preprocessing
before feature extraction or object detection.

2. Edge Detection Filters: Used to identify boundaries in images, which is essential for tasks
like object detection and image segmentation.

3. Sharpening Filters: Used to enhance fine details, which is often useful in improving the
visibility of certain features in an image.

📝 Exam Tip for Convolution and Linear Filters:

• Always remember to explain the kernel's role: Whether it's for blurring, sharpening, or
edge detection, you should be able to describe what the kernel does to the image.

• Know the kernel values for common filters like Sobel, Gaussian, and Laplacian. Being
able to identify the kernel and its purpose will be key to answering exam questions correctly.

Thresholding and Band-Pass Filters
🔵 Thresholding in Image Processing

📌 What is Thresholding?

Thresholding is a simple, yet powerful image segmentation technique. It involves converting a grayscale image into a binary image by assigning pixels either a foreground or background value, based on their intensity levels. Essentially, it divides the image into two regions: one where pixel intensity is above a threshold and the other where it is below that threshold.

🧠 Purpose of Thresholding:

• Segmentation: Divide an image into regions of interest (foreground and background).

• Feature Extraction: Isolate important features like objects, shapes, etc.

✅ Types of Thresholding:

1. Global Thresholding:

   ◦ Applies the same threshold to all pixels in the image.

   ◦ Simple and efficient but not suitable for images with varying lighting conditions.

   Formula:

   If pixel intensity I(x, y) > T, set pixel value = 255 (foreground)
   Else, set pixel value = 0 (background)

   Where T is the threshold value.

   Example:
   A threshold T of 128 means any pixel with a value above 128 becomes white (255), and
   below 128 becomes black (0).

2. Adaptive Thresholding:

   ◦ Threshold is calculated locally for each pixel based on a neighborhood around it.

   ◦ Useful for images with varying lighting or shadows.

   Example: In a poorly lit image, an adaptive threshold might set a different threshold for
   each region to improve segmentation.

3. Otsu's Thresholding:

   ◦ An automated method that calculates the optimal threshold based on the image
   histogram. It works by maximizing the variance between the foreground and
   background.

   ◦ Widely used in scenarios where manual thresholding is impractical.

🧠 Applications of Thresholding:

• Binarization: Convert grayscale or color images to binary.

• Object Detection: Identify and segment objects from the background.

• OCR (Optical Character Recognition): Preprocess images to isolate text.

🧠 Thresholding Example:

If we have a grayscale image like this:

50 180 210
130 90 200
180 240 160

With a threshold value T = 150, the result will be:

0 255 255
0 0 255
255 255 255

Here, all pixels above 150 become 255 (foreground), and all below become 0 (background).
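
A minimal OpenCV sketch of the three approaches (the file name and parameter values are illustrative):

import cv2

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Global thresholding with a fixed T = 150
_, global_bin = cv2.threshold(img, 150, 255, cv2.THRESH_BINARY)

# Adaptive thresholding: T is computed per pixel from a 21x21 neighborhood
adaptive_bin = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 21, 5)

# Otsu's method: pass 0 as the threshold and let OpenCV pick the optimal T
otsu_T, otsu_bin = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold chosen:", otsu_T)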

🔵 Band-Pass Filters

📌 What is a Band-Pass Filter?

A band-pass filter is a combination of a low-pass and high-pass filter. It allows signals (or
frequencies in the image) within a certain range (the passband) to pass through while blocking
the frequencies outside that range (the stopband). In image processing, band-pass filters can be
used to enhance certain features within a specific frequency range.

✅ Band-Pass Filter Properties:

1. Low-Pass Filter: Passes low frequencies (smooth components) and attenuates high
frequencies (details and edges).

2. High-Pass Filter: Passes high frequencies (edges and noise) and attenuates low
frequencies (smooth areas).

3. Band-Pass Filter: Passes a specific range of frequencies and attenuates frequencies outside
of this range.

🧠 How Band-Pass Filters Work:

Imagine an image with both smooth and textural elements. A band-pass filter can be used to isolate
the textures by blocking both the very high frequencies (noise) and the very low frequencies
(smooth areas), passing only the intermediate frequencies that represent texture or edge-like
features.
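
In practice, a band-pass effect is often built as a difference of two Gaussian blurs (DoG): subtracting a strongly blurred copy from a lightly blurred copy removes both the finest noise and the smooth low-frequency background, leaving the mid-frequency band. A minimal sketch (file name and sigma values are illustrative):

import cv2

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Small sigma removes only the finest detail; large sigma keeps only coarse structure
low_cut = cv2.GaussianBlur(img, (0, 0), 1.0)
high_cut = cv2.GaussianBlur(img, (0, 0), 5.0)

# Their difference contains the mid-frequency band (textures, medium-scale edges)
band_pass = cv2.subtract(low_cut, high_cut)

cv2.imwrite('band_pass.jpg', band_pass)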

🧠 Applications of Band-Pass Filters:

• Image Texture Enhancement: Isolate textures in an image while removing large smooth
areas or noise.

• Edge Detection: Enhance edges that lie within a specific frequency range.

• Noise Removal: Reduce unwanted high-frequency noise while keeping important low-
frequency content.

🔴 Summary of Key Concepts

| Technique | Purpose | Use Case |
|---|---|---|
| Thresholding | Segmentation of an image into foreground and background | Binarization, object detection, OCR |
| Global Threshold | Single global threshold value for the entire image | Simple segmentation tasks |
| Adaptive Threshold | Local thresholding based on surrounding pixels | Handling images with varying lighting |
| Otsu's Thresholding | Automatic thresholding using histogram analysis | Optimal thresholding in diverse images |
| Band-Pass Filters | Isolate specific frequency ranges of the image | Texture extraction, edge detection, noise removal |

📌 Exam Tip:

• Thresholding is simple, but ensure you understand adaptive and Otsu's methods as they
are often favored in complex images.

• Band-Pass Filters require you to understand how frequency ranges are selected and
applied in image filtering. Focus on their dual role of removing both high- and low-
frequency components to leave the middle range intact.

🎯 Gaussian, Wavelet, and Gabor Filters


Awesome! Let's dive deeper into each filter, one-by-one. We'll start with the Gaussian Filter, go
super detailed, and once you're ready, we'll move to the next one (Wavelet, then Gabor).

🔹 1. Gaussian Filter
🎯 What is a Gaussian Filter?
A Gaussian filter is a linear filter that's used mainly for image smoothing (blurring), noise
reduction, and preprocessing before edge detection or other image processing tasks.

It’s called “Gaussian” because it uses the Gaussian function (bell-shaped curve) to assign weights
to the pixel values within a kernel (window).

🧮 Mathematical Formula:
The 2D Gaussian function used to generate the filter kernel is:

G(x, y) = (1 / (2πσ²)) · exp( −(x² + y²) / (2σ²) )

where σ (sigma) controls the spread of the bell curve: a larger σ gives a wider, stronger blur.

🧠 Concept Behind It:


• Pixels closer to the center have more influence (higher weights).

• Farther pixels contribute less.

• It’s isotropic – same effect in all directions.

This makes it ideal for removing high-frequency noise without distorting edges too much.
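
In OpenCV this filter is available directly as cv2.GaussianBlur; a minimal sketch (the file name and kernel size are illustrative). Because the 2D Gaussian is separable, OpenCV applies it as two fast 1D passes internally:

import cv2

img = cv2.imread('image.jpg')

# 5x5 Gaussian kernel; sigma = 0 lets OpenCV derive sigma from the kernel size
blurred = cv2.GaussianBlur(img, (5, 5), 0)

cv2.imwrite('blurred.jpg', blurred)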

🧪 Example Kernel (5x5, σ = 1):

The 5x5 kernel shown earlier (the binomial [1 4 6 4 1] pattern divided by 256) is a good integer approximation of a Gaussian kernel with σ ≈ 1.

⚙ Use Cases of the Gaussian Filter

| Use Case | Description |
|---|---|
| Noise Reduction | Smoothens out grainy images, removes random pixel noise |
| Edge Detection Preprocessing | Applied before Sobel or Canny to suppress irrelevant details |
| Image Pyramids | Used in scale-space techniques like the Laplacian pyramid |
| Optical Flow | In motion estimation, helps stabilize features |
| Computer Vision | Enhances performance of object detection, segmentation, etc. |

📊 Box Filter vs Gaussian Filter

| Filter | Behavior |
|---|---|
| Box Filter | Averages all pixels equally → unnatural blur |
| Gaussian Filter | Weighs center pixels more → natural-looking blur |

📎 Summary of Gaussian Filter

| Feature | Value |
|---|---|
| Type | Low-pass filter |
| Effect | Smoothens image, reduces noise |
| Kernel | Gaussian bell-curve |
| Domain | Spatial (pixel domain) |
| Speed | Fast (especially with separable kernels) |

Awesome! Let’s move to the next one: the Wavelet Filter — this one’s a bit deeper conceptually
but super powerful, especially in compression and multi-resolution analysis. Let's break it down
step-by-step.

🔷 2. Wavelet Filter
🎯 What is a Wavelet Filter?
A wavelet filter is used for analyzing and transforming signals/images at different scales or
resolutions. It's not just for smoothing — it's about breaking down the image into high- and low-
frequency components, both in space and scale.

Think of it as a microscope for images: you can zoom in and out to analyze fine vs coarse details.

🧠 Core Idea Behind Wavelets


Unlike Fourier transform (which uses sine and cosine), wavelets use short, localized "waves" that
are:

• Localized in time and frequency

• Can analyze sharp edges or smooth regions separately

• Better for non-stationary signals like images (edges, textures, etc.)

🧮 Basic Wavelet Transform Process (2D Images)


1. Apply the wavelet filter row-wise and column-wise
2. Decomposes the image into 4 sub-bands:
◦ LL (approximation, low-low) → smoothed version

◦ LH (horizontal details, low-high)


◦ HL (vertical details, high-low)

◦ HH (diagonal details, high-high)

This is called a single-level wavelet decomposition.

👉 You can recursively apply this to LL → multilevel wavelet decomposition!

📷 Visualization (1-Level Decomposition):


+---------+---------+
| LL | LH |
| | |
+---------+---------+
| HL | HH |
| | |
+---------+---------+
• LL: blurred/smoothed image (base info)

• LH, HL, HH: edge & detail info in various directions

🔍 Popular Wavelets Used in Images

| Wavelet Name | Feature |
|---|---|
| Haar | Simplest, fast, blocky results |
| Daubechies (db1, db2, ...) | Better quality, smooth, compact support |
| Symlets | Symmetrical, better for image edges |
| Coiflets | More vanishing moments (for smoothness) |

📊 Wavelet vs Gaussian

| Feature | Gaussian Filter | Wavelet Filter |
|---|---|---|
| Goal | Blur / smooth | Analyze structure at multiple scales |
| Type | Linear, convolution-based | Transform-based (like Fourier) |
| Preserves Edges | Partially | Yes, very well |
| Frequency Analysis | No | Yes (multi-resolution) |
| Reversible | No | Yes (can reconstruct original image) |
⚙ Use Cases of Wavelet Filters

| Use Case | Description |
|---|---|
| Image Compression | JPEG2000 uses the wavelet transform instead of the DCT (used in JPEG) |
| Denoising | Suppress noise in high-frequency sub-bands (e.g., HH) |
| Edge Detection | Edges appear clearly in LH/HL/HH sub-bands |
| Watermarking | Embed data in LL or HH sub-bands |
| Medical Imaging | Multiresolution helps analyze small structures like tumors |

🧪 Example in Python (PyWavelets)


import pywt
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

# Apply 2D Wavelet Transform
coeffs = pywt.dwt2(img, 'haar')
LL, (LH, HL, HH) = coeffs

# Visualize sub-bands
plt.figure(figsize=(10, 6))
titles = ['LL', 'LH', 'HL', 'HH']
for i, band in enumerate([LL, LH, HL, HH]):
    plt.subplot(2, 2, i + 1)
    plt.imshow(band, cmap='gray')
    plt.title(titles[i])
    plt.axis('off')
plt.tight_layout()
plt.show()
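
Because the transform is reversible, a simple denoising sketch can shrink the detail sub-bands and reconstruct the image with pywt.idwt2 (continuing from the decomposition above; the threshold value is illustrative):

import pywt

# Shrink small detail coefficients; in practice the threshold is estimated from the noise level
threshold = 20
LH_d = pywt.threshold(LH, threshold, mode='soft')
HL_d = pywt.threshold(HL, threshold, mode='soft')
HH_d = pywt.threshold(HH, threshold, mode='soft')

# Reconstruct the denoised image from the modified sub-bands
denoised = pywt.idwt2((LL, (LH_d, HL_d, HH_d)), 'haar')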

🧠 Key Properties of Wavelets

| Property | Description |
|---|---|
| Localized | In both time (space) and frequency |
| Multiresolution | Captures coarse and fine details |
| Sparsity | Many coefficients near zero → ideal for compression |
| Orthogonal / Biorthogonal | Supports perfect reconstruction |
| Edge-aware | Captures edges very effectively |

📎 Summary of Wavelet Filter:

| Feature | Value |
|---|---|
| Type | Multi-resolution filter |
| Effect | Decomposes image into frequency sub-bands |
| Domain | Both spatial and frequency |
| Reversible | Yes (can reconstruct original image) |
| Uses | Compression, denoising, analysis, watermarking |


Awesome, let's dive into the third and final one: Gabor Filter — one of the most powerful tools
when it comes to texture analysis and edge detection in images.

🔷 3. Gabor Filter — Deep Dive

🎯 What is a Gabor Filter?


A Gabor filter is a linear filter used to detect edges, textures, or patterns in specific orientations
and scales. It's like a localized frequency detector — combining a sinusoidal wave with a
Gaussian envelope.

It's inspired by how human vision perceives textures and edges — that's why it's used in face
recognition, iris scanning, fingerprint analysis, etc.

🧠 Core Idea Behind Gabor Filters


You take a sine wave (to capture frequency), and you localize it using a Gaussian. The result is a
filter that looks like a striped blob. These stripes can be rotated, scaled, and stretched.

🧮 Mathematical Formula
The 2D Gabor filter is defined as:

g(x, y) = exp( −(x′² + γ²·y′²) / (2σ²) ) · cos( 2πf·x′ + φ )

where x′ = x·cosθ + y·sinθ and y′ = −x·sinθ + y·cosθ. The parameters are summarized below:
| Symbol | Meaning |
|---|---|
| σ | Standard deviation of the Gaussian (controls spread) |
| f | Frequency of the sinusoidal wave |
| θ | Orientation of the filter |
| φ | Phase offset |
| γ | Spatial aspect ratio (ellipticity) |

📸 What It Looks Like Visually


Gabor filters look like wave patterns localized within a Gaussian blob, like this:

~~~~~ <- horizontal pattern


Or at other angles like:

/////
You can generate a bank of Gabor filters at multiple angles (e.g. 0°, 45°, 90°, 135°) and
frequencies to detect texture in different directions.
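
A minimal sketch of such a filter bank in OpenCV (the file name and kernel parameters are illustrative):

import cv2
import numpy as np

img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)

responses = []
# One kernel per orientation: 0°, 45°, 90°, 135°
for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
    # Arguments: ksize, sigma, theta, lambda (wavelength), gamma, psi
    kernel = cv2.getGaborKernel((21, 21), 5.0, theta, 10.0, 0.5, 0,
                                ktype=cv2.CV_32F)
    responses.append(cv2.filter2D(img, cv2.CV_32F, kernel))

# Keep the strongest response at each pixel across all orientations
texture_map = np.max(np.stack(responses), axis=0)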

🧪 How It Works in Practice
1. Convolve the image with a bank of Gabor lters.

2. Each lter will respond strongly to features that match its orientation and frequency.

3. Combine or analyze lter responses to:

◦ Detect edges

◦ Analyze texture

◦ Extract features for classification

📊 Gabor vs Wavelet vs Gaussian

| Feature | Gaussian | Wavelet | Gabor |
|---|---|---|---|
| Main Use | Smoothing | Multi-scale analysis | Texture & edge analysis |
| Frequency Analysis | No | Yes | Yes |
| Orientation Selective | No | Partially | Yes |
| Reversible Transform | No | Yes | No |
| Edge Detection | Weak | Moderate–Strong | Very strong |
| Biological Basis | No | No | Yes (human visual cortex) |

⚙ Use Cases of Gabor Filter

| Use Case | Description |
|---|---|
| Texture Segmentation | Classify image regions based on texture |
| Face Recognition | Capture facial features at different orientations |
| Fingerprint/Iris Analysis | Enhance lines/patterns for recognition |
| Document Image Analysis | Detect printed text patterns |
| Biomedical Imaging | Detect patterns in tissues (MRI, CT) |

📌 Key Properties of Gabor Filter

| Property | Meaning |
|---|---|
| Localized | Both in space and frequency |
| Orientation-specific | You can detect edges in chosen directions |
| Frequency-sensitive | Detects patterns of a specific scale |
| Biologically inspired | Mimics the visual cortex in mammals |
| Non-reversible | Just feature extraction, not reconstruction |

🧪 Example in Python (OpenCV + Gabor Kernel)


import cv2
import numpy as np
import matplotlib.pyplot as plt

# Create Gabor kernel


gabor_kernel = cv2.getGaborKernel((21, 21), 5.0, np.pi / 4, 10.0, 0.5, 0,
                                  ktype=cv2.CV_32F)

# Apply to image
img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
filtered_img = cv2.filter2D(img, cv2.CV_8UC3, gabor_kernel)

# Show results
plt.subplot(1, 2, 1)
plt.imshow(img, cmap='gray')
plt.title("Original")

plt.subplot(1, 2, 2)
plt.imshow(filtered_img, cmap='gray')
plt.title("Gabor Filtered")
plt.show()

📎 Summary of Gabor Filter

| Feature | Value |
|---|---|
| Type | Linear, edge/texture filter |
| Effect | Emphasizes edges/patterns in specific orientations |
| Orientation | Directional sensitivity |
| Frequency | Yes |
| Uses | Texture analysis, biometrics, face/fingerprint recognition |

UNIT -5

📘 Contour Properties & Its Applications


(As per OpenCV and Advanced Topics in Computer Vision)

🔹 What are Contours?


Definition:
Contours are curves or outlines that represent the boundaries of objects or shapes within an image.
They join continuous points having the same intensity or color, effectively outlining object
structures in an image.

According to OpenCV:
“Contours are a curve that simply joins all the continuous points along the boundary having the
same color or intensity.”

🔍 Why Contours are Useful


• Used for object boundary detection.

• Helpful in shape analysis and object classification.

• Effective in grayscale or binary images for isolating regions of interest.

✳ Contour Representation Methods


• Chain Code

• Fourier Descriptors

• Shape Context

🔸 Contour Properties

| Property | Description |
|---|---|
| Hierarchy | Relationship between contours; helps in nesting structure detection. |
| Area | Total number of pixels inside the contour. Useful for size filtering. |
| Perimeter | Length of the contour boundary. |
| Centroid | The center of mass (calculated using image moments). |
| Orientation | Angle of object alignment or rotation. |
| Convexity | Checks if the shape is convex or concave. |
| Bounding Box | Smallest rectangle that fits the contour. |
| Convex Hull | Tightest convex shape around the contour. |
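
Most of these properties map directly onto OpenCV calls; a minimal sketch (the file name is illustrative, and the contours are found with the same functions described in the next section):

import cv2

img = cv2.imread('shapes.jpg', cv2.IMREAD_GRAYSCALE)
_, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    area = cv2.contourArea(cnt)              # Area
    perimeter = cv2.arcLength(cnt, True)     # Perimeter of a closed contour
    x, y, w, h = cv2.boundingRect(cnt)       # Bounding Box
    hull = cv2.convexHull(cnt)               # Convex Hull
    is_convex = cv2.isContourConvex(cnt)     # Convexity

    M = cv2.moments(cnt)                     # Centroid from image moments
    if M['m00'] != 0:
        cx, cy = int(M['m10'] / M['m00']), int(M['m01'] / M['m00'])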

🔧 Contour Detection Techniques

1. Thresholding-Based Preprocessing

Used to separate object from background using brightness/intensity.

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)


thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 21, 5)

2. Edge-Based Preprocessing

Use edge detection as a pre-step before finding contours.

contours, _ = cv2.findContours(thresh, cv2.RETR_TREE,
                               cv2.CHAIN_APPROX_SIMPLE)
detected_contours = img.copy()
cv2.drawContours(detected_contours, contours, -1, (0, 255, 0), -1)

🔹 Common Edge Detection Algorithms

| Algorithm | Description |
|---|---|
| Canny Edge Detection | Multi-step method; highly accurate for edge detection. |
| Sobel Operator | Gradient-based; highlights horizontal and vertical edges. |
| Laplacian of Gaussian (LoG) | Detects zero-crossings in the second derivative; captures fine edges. |
| Scharr Operator | Improved version of Sobel; better accuracy and rotation invariance. |
| CNN-Based Methods | Deep learning models (e.g., using CNNs) trained for contour/edge detection. |

🧰 Contour Detection Using OpenCV

🟡 Basic Detection:

contours, _ = cv2.findContours(thresh, cv2.RETR_TREE,
                               cv2.CHAIN_APPROX_SIMPLE)

🟢 Masking & Highlighting Contours:

highlight = np.ones_like(img)
cv2.drawContours(highlight, contours, -1, (0, 200, 175), cv2.FILLED)

mask = np.zeros_like(img)
cv2.drawContours(mask, contours, -1, (255, 255, 255), cv2.FILLED)
foreground = cv2.bitwise_and(img, mask)

⚙ Contour Manipulation Techniques

| Technique | Purpose |
|---|---|
| Filtering | Smooth or enhance detected contours. |
| Morphological Ops | Modify shape/size using dilation, erosion, etc. |
| Feature Extraction | Extract useful data like area, centroid, etc., for object classification. |

💼 Applications of Contour Properties

| Field | Applications |
|---|---|
| Object Detection | Identify objects using shape outlines. |
| Medical Imaging | Detect tumors, cells, organs, etc. |
| Robotics | Shape-based navigation or object manipulation. |
| Gesture Recognition | Hand contour for finger counting, sign language, etc. |
| Industrial Inspection | Detect size, shape, or position of products in quality control. |
| Traffic Monitoring | Detect and track vehicles using their contour outlines. |

✅ Summary
• Contours highlight the boundary information of objects.

• Various properties (area, convexity, orientation, etc.) are used to analyze shapes.

• OpenCV provides simple and effective tools to detect and manipulate contours.

• Contours are essential for recognition, classification, and tracking tasks in modern CV
systems.

🧩 Overview of Image Segmentation in Computer


Vision

🔍 What is Image Segmentation?


Image Segmentation is a computer vision technique that involves partitioning an image into
multiple segments or regions. The goal is to simplify or change the representation of an image
to make it more meaningful and easier to analyze.

In simple terms:

“Segmentation divides an image into its constituent parts or objects.”

🎯 Goals of Image Segmentation


• Identify objects or boundaries (lines, curves, etc.)

• Locate regions of interest (e.g., face, tumor, vehicles)

• Extract meaningful information for further analysis (e.g., classification, recognition)

📚 Types of Segmentation Techniques

1. Thresholding-Based Segmentation

• Simplest method.

• Divides image based on intensity value (gray or color levels).

• Pixels are grouped into foreground and background.

# Binary Threshold
ret, binary = cv2.threshold(gray_img, 127, 255,
cv2.THRESH_BINARY)

2. Region-Based Segmentation

• Segments pixels into regions based on predefined criteria like similar intensity, color,
texture.

Types:

• Region Growing: Starts from seed points and grows by adding similar neighboring pixels.

• Region Splitting & Merging: Divides the image and merges similar adjacent regions.

3. Edge-Based Segmentation

• Uses edges (discontinuities) in intensity to detect object boundaries.

• Often uses Canny, Sobel, or Laplacian filters.

4. Clustering-Based Segmentation

• Groups pixels based on similarity using clustering algorithms like:

| Algorithm | Description |
|---|---|
| K-Means | Groups pixels into K clusters based on color/texture |
| Mean Shift | Iterative algorithm for finding dense regions in feature space |
| Fuzzy C-Means | Like K-means but allows soft membership (a pixel can belong to multiple clusters) |
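
A minimal K-Means color segmentation sketch with OpenCV (K and the file name are illustrative):

import cv2
import numpy as np

img = cv2.imread('image.jpg')
pixels = img.reshape(-1, 3).astype(np.float32)   # every pixel becomes a 3D color sample

# Stop after 10 iterations or when cluster centers move less than 1.0
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 4
_, labels, centers = cv2.kmeans(pixels, K, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Replace each pixel by the color of its cluster center
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)
cv2.imwrite('segmented.jpg', segmented)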

5. Watershed Algorithm

• Treats image like a topographic surface.

• Segments based on catchment basins and ridgelines.

• Very useful in separating overlapping objects.

cv2.watershed(image, markers)
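
A fuller sketch of the classic marker-based watershed pipeline (hedged: the file name is illustrative, and it assumes bright objects on a dark background):

import cv2
import numpy as np

image = cv2.imread('coins.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Sure background (dilated objects) and sure foreground (distance-transform peaks)
kernel = np.ones((3, 3), np.uint8)
sure_bg = cv2.dilate(binary, kernel, iterations=3)
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.7 * dist.max(), 255, 0)
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)

# Label the sure-foreground regions as markers; the unknown band stays 0
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown == 255] = 0

markers = cv2.watershed(image, markers)
image[markers == -1] = (0, 0, 255)   # watershed ridge lines drawn in red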

6. Deep Learning-Based Segmentation

Modern, accurate, and widely used. Learns features directly from data.

Popular Architectures:

| Model | Description |
|---|---|
| U-Net | Widely used in medical imaging. Encoder-decoder with skip connections. |
| Mask R-CNN | Extends Faster R-CNN for instance segmentation. |
| DeepLab | Uses atrous convolution for better boundary precision. |

🧠 Types of Segmentation by Output


| Type | Description |
|---|---|
| Semantic Segmentation | Assigns a label to each pixel (e.g., all cars labeled the same) |
| Instance Segmentation | Differentiates between individual instances of objects (e.g., car1, car2) |
| Panoptic Segmentation | Combines semantic + instance segmentation for complete scene understanding |

🧰 Libraries and Tools for Segmentation


• OpenCV

• scikit-image

• TensorFlow / Keras

• PyTorch (with torchvision or Detectron2)

🎯 Applications of Segmentation

| Domain | Use Cases |
|---|---|
| Medical Imaging | Tumor detection, organ segmentation |
| Autonomous Vehicles | Road, pedestrian, and vehicle segmentation |
| Agriculture | Plant health, fruit counting, weed detection |
| Satellite Imaging | Land cover classification, change detection |
| Augmented Reality | Real-time background separation |
| Robotics | Object grasping, navigation maps |

✅ Summary
• Segmentation is essential for scene understanding in images and videos.

• It divides an image into regions or objects for analysis.

• Techniques range from simple thresholding to advanced deep learning models.

• Deep segmentation models (like Mask R-CNN, U-Net) are state-of-the-art.

Sure! Let’s now dive into the next advanced topic:

🧩 Template Matching in Image Processing

🔍 What is Template Matching?


Template Matching is a technique in digital image processing for finding small parts of an image
that match a template image.

Essentially, you provide a template (a small image) and search where it appears in a larger
image.

It's a form of pattern recognition used to detect and locate objects in an image.

🎯 Use Cases of Template Matching

| Application | Example Use |
|---|---|
| Object Detection | Finding logos, icons, faces in an image |
| Industrial Automation | Detecting parts on an assembly line |
| Gaming Bots | Recognizing game characters or elements |
| Medical Imaging | Locating specific anatomical features |
| Document Analysis | Locating signatures or stamps |

🛠 How Template Matching Works


1. You provide a template image (a small patch).

2. The algorithm slides the template across the input image (like a sliding window).

3. At each position, it calculates a similarity score.

4. The location with the highest similarity is considered the match.

🔣 Similarity Measurement Methods


OpenCV supports multiple methods:

| Method | Description |
|---|---|
| cv2.TM_CCOEFF | Correlation coefficient |
| cv2.TM_CCOEFF_NORMED | Normalized correlation |
| cv2.TM_CCORR | Cross-correlation |
| cv2.TM_CCORR_NORMED | Normalized cross-correlation |
| cv2.TM_SQDIFF | Square difference (lower is better) |
| cv2.TM_SQDIFF_NORMED | Normalized square difference |

🧪 Code Example in OpenCV


import cv2

# Load images
img = cv2.imread('main_image.jpg', 0)
template = cv2.imread('template.jpg', 0)
w, h = template.shape[::-1]

# Template matching
res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)

# Draw rectangle on match


top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(img, top_left, bottom_right, 255, 2)

cv2.imshow('Detected', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

⚠ Limitations
• Not scale or rotation invariant (fails if object is rotated or resized).

• Sensitive to lighting changes.

• Computationally expensive for large templates or images.

• Can return false positives if the template is not distinctive enough.

🧠 Alternatives / Improvements

| Method | Description |
|---|---|
| Feature Matching (e.g., SIFT, ORB) | Detects keypoints and matches descriptors — better for scale and rotation changes. |
| Convolutional Neural Networks (CNNs) | Learn patterns and features for robust object detection |
| Image Pyramids | Helps with scale invariance in template matching |
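
A hedged sketch of the image-pyramid idea: resize the template over a range of scales and keep the best score (the scale range and file names are illustrative):

import cv2
import numpy as np

img = cv2.imread('main_image.jpg', 0)
template = cv2.imread('template.jpg', 0)

best = (-1.0, None, None)   # (score, location, scale)
for scale in np.linspace(0.5, 1.5, 11):
    t = cv2.resize(template, None, fx=scale, fy=scale)
    if t.shape[0] > img.shape[0] or t.shape[1] > img.shape[1]:
        continue   # skip scales where the template is larger than the search image
    res = cv2.matchTemplate(img, t, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(res)
    if max_val > best[0]:
        best = (max_val, max_loc, scale)

print("Best score %.2f at %s (scale %.2f)" % best)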

✅ Summary
• Template Matching is a simple but effective method for pattern detection.

• Best for fixed-size, fixed-orientation templates in controlled environments.

• For more robustness, consider feature-based or deep learning methods.

🥽 Stereo Imaging in Computer Vision

🌐 What is Stereo Imaging?


Stereo imaging (also known as stereopsis or stereo vision) is a technique that mimics human
binocular vision to perceive depth from two slightly different images taken from different
viewpoints (like how our left and right eyes see).

📸 You use two cameras placed side by side to capture two views of the same scene. The disparity
(difference in position of objects) between these images helps in estimating depth.

🎯 Applications of Stereo Imaging

| Application | Purpose |
|---|---|
| Autonomous vehicles | Depth sensing for obstacle detection |
| 3D reconstruction | Creating 3D models of real-world scenes |
| Augmented reality (AR) | Understanding depth for interactive overlays |
| Robotics | Navigation and environment understanding |
| Medical imaging | Enhanced visualization of anatomy |

🔧 Working Principle of Stereo Vision


1. Capture two images from slightly different viewpoints.

2. Find correspondences (matching pixels) between the two images.

3. Calculate disparity (horizontal shift) for corresponding points.

4. Use triangulation to estimate the depth (Z coordinate).

📐 Triangulation & Depth Estimation

For a calibrated, rectified stereo pair, depth follows directly from disparity:

Z = (f · B) / d

where f is the focal length, B is the baseline (distance between the two cameras), and d is the disparity (horizontal shift of the point between the left and right images). A larger disparity means the point is closer to the cameras.
🧰 Stereo Matching Algorithms

| Method | Description |
|---|---|
| Block Matching | Divides the image into blocks and finds the best match in the other image |
| Semi-Global Matching (SGM) | Optimizes disparity with local and global constraints |
| Graph Cuts | Models disparity estimation as a graph optimization problem |
| Deep Learning-Based | Learns disparity estimation using CNNs or deep stereo networks |

🔍 Stereo Imaging in OpenCV

🔹 Step-by-Step Example:

import cv2
import numpy as np
# Load stereo images (left and right views)
imgL = cv2.imread('left.jpg', 0)
imgR = cv2.imread('right.jpg', 0)

# Create stereo matcher object


stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)

# Compute disparity map


disparity = stereo.compute(imgL, imgR)

# Normalize for visualization


disp = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX)
disp = np.uint8(disp)

cv2.imshow("Disparity Map", disp)


cv2.waitKey(0)
cv2.destroyAllWindows()
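
To turn the disparity map into metric depth, apply the triangulation relation Z = f · B / d; a hedged continuation of the code above (the focal length and baseline values are illustrative and would come from camera calibration):

# Continuing from the code above
focal_length_px = 700.0   # illustrative; obtained from camera calibration
baseline_m = 0.12         # illustrative distance between the two cameras, in metres

# StereoBM returns fixed-point disparities scaled by 16; guard against division by zero
disp_float = disparity.astype(np.float32) / 16.0
disp_float[disp_float <= 0] = 0.1
depth_m = (focal_length_px * baseline_m) / disp_float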

⚠ Challenges in Stereo Imaging


• Textureless regions: Hard to find correspondences.

• Occlusions: Some parts visible in one view and not in the other.

• Lighting differences: Shadows or brightness affect matching.

• Real-time performance: Processing can be computationally intensive.

💡 Modern Improvements
• Use of deep learning models (like PSMNet, StereoNet).

• Multi-view stereo for higher accuracy 3D reconstruction.

• Integration with LiDAR or Time-of-Flight sensors for hybrid depth sensing.

✅ Summary
• Stereo Imaging helps estimate depth and create 3D information from 2D images.

• Useful in many modern technologies like autonomous cars, robotics, and AR/VR.

• Performance depends on accurate disparity estimation and good calibration.


Great! Let's now dive into the next topic:

📸 Computational Photography

🌐 What is Computational Photography?


Computational Photography is the intersection of computer graphics, computer vision, and
photography, where software algorithms are used to enhance or extend the capabilities of digital
photography beyond the limitations of traditional cameras.

It aims to improve image quality, extract information, or create novel visual experiences using
computation.

🔍 Core Goals
• Enhance low-light images

• Improve sharpness and focus

• Remove noise and motion blur

• Create panoramas or HDR images

• Capture depth and 3D information

• Reconstruct scenes or textures

🔧 Key Techniques in Computational Photography

| Technique | Description |
|---|---|
| HDR Imaging (High Dynamic Range) | Combines multiple exposures to create a balanced image with bright and dark areas correctly exposed |
| Panorama Stitching | Merges multiple images to form a wide-angle or 360° view |
| Depth from Focus/Defocus | Extracts depth info based on how blurry or sharp objects appear |
| Refocusing/Light Field Imaging | Lets you refocus an image after capture (e.g., Lytro cameras) |
| Image Deblurring | Removes motion blur caused by camera shake or moving objects |
| Super Resolution | Enhances image resolution using interpolation or deep learning |
| Photometric Stereo | Recovers surface normals and lighting using multiple images under varying lighting conditions |
| Style Transfer & Filters | Uses AI to apply artistic styles or filters to images (e.g., Prisma app) |
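
Several of these techniques are exposed as high-level OpenCV APIs. For example, panorama stitching can be done with the Stitcher class; a minimal sketch (assuming OpenCV 4 and illustrative file names):

import cv2

images = [cv2.imread(name) for name in ('left.jpg', 'middle.jpg', 'right.jpg')]

stitcher = cv2.Stitcher_create()
status, panorama = stitcher.stitch(images)

if status == 0:   # 0 means the stitch succeeded (Stitcher_OK)
    cv2.imwrite('panorama.jpg', panorama)
else:
    print('Stitching failed with status', status)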

📱 Real-World Applications

| Field | Example Use |
|---|---|
| Smartphone Cameras | Night mode, Portrait mode (bokeh), Auto HDR |
| AR/VR | Scene understanding and real-time mapping |
| Medical Imaging | Enhancing details in X-rays or MRIs |
| Cultural Heritage | Digitally restoring old or damaged photographs |
| Security & Forensics | Enhancing unclear surveillance footage |

🤖 Deep Learning in Computational Photography


Modern computational photography often leverages neural networks to:

• Predict missing image parts (inpainting)

• Remove noise while preserving details

• Enhance color and contrast intelligently

• Perform semantic segmentation for better scene understanding

📘 Example: HDR Creation in OpenCV


import cv2
import numpy as np

# Load differently exposed images
img1 = cv2.imread('img_low.jpg')
img2 = cv2.imread('img_mid.jpg')
img3 = cv2.imread('img_high.jpg')

# Exposure times (in seconds) for each shot; Debevec's method requires them
# (values here are illustrative and must match how the photos were taken)
times = np.array([1/30.0, 1/8.0, 1/2.0], dtype=np.float32)

# Merge to HDR
merge_debevec = cv2.createMergeDebevec()
hdr = merge_debevec.process([img1, img2, img3], times)

# Tonemap to display
tonemap = cv2.createTonemap(2.2)
ldr = tonemap.process(hdr)
ldr = cv2.normalize(ldr, None, 0, 255, cv2.NORM_MINMAX)
ldr = cv2.convertScaleAbs(ldr)

cv2.imshow("HDR Image", ldr)
cv2.waitKey(0)
cv2.destroyAllWindows()

✅ Summary
• Computational Photography transforms basic image capture into smart image processing.

• It allows for better photo quality and creative visual effects.

• It's widely used in consumer tech, research, security, and art.

🤖 Introduction to Convolutional Neural Networks


(CNNs)

🧠 What are CNNs?


Convolutional Neural Networks (CNNs) are a class of deep neural networks, specifically
designed to process data with a grid-like structure, such as images.

They are inspired by the visual cortex of the human brain and are the backbone of computer vision
tasks like image classification, object detection, and segmentation.

📷 Why CNNs for Images?


Images are high-dimensional and contain spatial hierarchies (edges, shapes, textures). CNNs use
convolutions to:

• Preserve spatial relationships

• Reduce the number of parameters

• Learn features hierarchically (from edges to complex objects)

🔍 Key Components of CNN

| Layer | Description |
|---|---|
| Convolutional Layer | Applies filters/kernels to extract features like edges or textures. |
| Activation Function | Adds non-linearity (e.g., ReLU = max(0, x)) to make the network learn complex patterns. |
| Pooling Layer | Downsamples the feature maps to reduce dimensionality and computation. Common pooling: MaxPooling. |
| Fully Connected Layer (FC) | Final layers that flatten the data and classify based on learned features. |
| Dropout Layer | Prevents overfitting by randomly disabling neurons during training. |

🔧 How CNN Works – Step-by-Step


1. Input Image
(e.g., 32x32 RGB image → 3 channels)

2. Convolution Layer
Apply multiple lters to extract features
(e.g., edges, corners)

3. Activation (ReLU)
Introduce non-linearity

4. Pooling
Reduce the feature map size
(e.g., from 32x32 → 16x16)

5. Repeat
Multiple conv → relu → pool layers
(deeper = more abstract features)

6. Fully Connected Layer


Final prediction (e.g., cat vs dog)

🔍 Example Architecture
Input (32x32x3)

Conv Layer (5x5 filter, 6 filters) → ReLU

Max Pooling (2x2)

Conv Layer (5x5 filter, 16 filters) → ReLU

Max Pooling (2x2)

Flatten

Fully Connected Layer (120) → ReLU

Fully Connected Layer (84) → ReLU

Output Layer (10 classes, softmax)

This is similar to LeNet-5, one of the earliest CNNs used for digit recognition.

📚 Applications of CNNs

| Area | Application |
|---|---|
| Image Classification | Recognizing objects in images (e.g., cats, dogs, traffic signs) |
| Object Detection | Identifying and locating objects (e.g., YOLO, SSD) |
| Semantic Segmentation | Labeling every pixel (e.g., medical imaging) |
| Facial Recognition | Identifying or verifying faces (e.g., Face ID) |
| Scene Understanding | Autonomous vehicles, robotics |

🔬 Training a CNN – Overview


1. Data: Large dataset of labeled images

2. Loss Function: Measures prediction error (e.g., Cross Entropy Loss)

3. Optimizer: Updates weights to reduce loss (e.g., SGD, Adam)

4. Backpropagation: Calculates gradients to adjust filters

5. Epochs: Repeats process multiple times to learn

🧪 Python Code Sample (Keras)


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPooling2D(pool_size=(2, 2)),

    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),

    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy', metrics=['accuracy'])
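
To actually train it you would call model.fit on labeled data; a hedged usage sketch with randomly generated placeholder arrays (shapes only, not a real dataset):

import numpy as np

# Placeholder data: 100 random 64x64 RGB images and 10 one-hot encoded classes
x_train = np.random.rand(100, 64, 64, 3).astype('float32')
y_train = np.eye(10)[np.random.randint(0, 10, size=100)]

model.fit(x_train, y_train, epochs=5, batch_size=16)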

✅ Summary
• CNNs are powerful models for image and visual data.

• They automatically learn features without manual engineering.

• Used in nearly every major computer vision application today.

🧠 What are CNNs? (Convolutional Neural Networks)


Imagine you’re trying to teach a computer to recognize what’s in a picture — like a dog, a cat, or a
car. Just giving it the whole image as-is won’t work because computers don’t “see” like us. They
see numbers (pixel values).

So we use a special type of deep learning model called a Convolutional Neural Network (CNN) to
help the computer understand what’s in the image.

It's called "convolutional" because it uses a small grid (called a filter or kernel) that slides over the
image to detect patterns like:

• Edges 🔲

• Corners ◻

• Shapes 🟡

📷 Why CNNs are perfect for images?


Here’s the key idea:
An image is basically a grid of pixels (tiny dots with color values). For example:

• A 100x100 image has 10,000 pixels!

• If it's colored, it has 3 layers (Red, Green, Blue = RGB)

Problem with regular neural networks:

If we give all these pixel values directly to a normal neural network, it will:

• Have millions of weights, making it slow and hard to train

• Forget the position of pixels (like where an eye is in a face)

CNNs fix this by:

✅ Looking at small sections of the image at a time


✅ Reusing filters to detect features like edges and textures
✅ Preserving the layout of the image (spatial relationships)

🧩 Think of CNNs like this:


Imagine looking at a photo and moving a small window over it bit by bit.
This window:

• Looks for edges, like where one color sharply changes to another.

• Then, as it keeps scanning, it builds up a sense of what’s in the image.

For example:

• First layer may detect lines ➖

• Second layer may detect shapes (like eyes, nose 👁 👃 )

• Third layer may recognize faces 🧑 🐶

👁 Real-life example:

When you upload a photo to Facebook and it tags your friend’s face — it’s using a CNN to
recognize features like:

• Eyes

• Nose

• Mouth
→ and says “This looks like Akanksha!”
