DIP Notes Unit 1, 3, 4, 5
i. Digital image processing is a method to convert an image into a digital form and
perform some operations on it, in order to get an enhanced image, or to extract some
useful information.
ii. Digital image processing is the analysis and manipulation of digitised images,
especially in order to improve their quality, i.e., the processing of digital images by
means of a digital computer.
iii. In this process, the input is an image and the output may be an image or
characteristics of the image
iv. Digital image processing includes 3 basic steps:
a. Importing the image
b. Analysing and manipulating the image which may include data compression
and image enhancement
c. Output which is an enhanced image or a report based on image analysis
v. Some purposes of digital image processing are as follows:
a. Visualisation
b. Image enhancement.
c. Image sharpening and restoration
d. Image retrieval
e. Pattern and object recognition.
f. Image analysis
vi. There are three types of digital image processing:
a. Low level processing: input and output both are images.
b. Mid-level processing: the input is an image, but the output is an
extracted part of the image (image segmentation).
c. High level processing: the input is an image, but the output is analysis result
of the image. Example: face recognition.
vii. There are various fundamental steps in digital image processing; they are as follows:
a. Image acquisition:
This is the first step or process of the fundamental steps of digital image
processing. Image acquisition could be as simple as being given an image
that is already in digital form. Generally, the image acquisition stage involves
pre-processing, such as scaling etc.
b. Image enhancement:
Image enhancement is among the simplest and most appealing areas of
digital image processing. The idea behind enhancement techniques is to
bring out detail that is obscured, or simply to highlight certain features of
interest in an image, for example by changing brightness and contrast
(a short code sketch follows this list).
c. Image restoration:
Image restoration is about recovering an image that has been degraded due
to factors like noise, blur, or distortion. Image restoration is an area that also
deals with improving the appearance of an image. However, unlike
enhancement, which is subjective, image restoration is objective, in the sense
that restoration techniques tend to be based on mathematical or probabilistic
models of image degradation
d. Colour image processing:
Colour image processing is an area that has been gaining importance
because of the significant increase in the use of digital images over the
Internet. This phase involves working with color images and processing them
using color models. Color image processing can be performed using different
color representations, such as RGB (Red, Green, Blue), HSV (Hue,
Saturation, Value), or CMYK (Cyan, Magenta, Yellow, Key).
e. Wavelets and multi resolution processing:
This phase deals with the representation of images at different levels of
resolution. It involves decomposing an image into different frequency
components, which can help capture various levels of detail
f. Image compression:
Image compression is the process of reducing the size of an image file
without significantly degrading its quality, which helps in efficient storage and
transmission
g. Morphological processing:
Morphological processing is a set of image processing techniques that deal
with the shape or structure of objects within an image. It focuses on the
extraction and analysis of geometric structures.
h. Segmentation:
Image segmentation is the process of dividing an image into meaningful
segments or regions. It’s crucial for tasks like object detection or recognition.
i. Representation and description:
After segmentation, the image is represented in a form that can be processed
and analyzed further. The representation is crucial for understanding the
structure and characteristics of the segmented objects.
j. Object recognition:
Object detection and recognition involves identifying and labeling objects
within an image based on predefined criteria or descriptors
k. Knowledge base:
Knowledge may be as simple as detailing regions of an image where the
information of interest is known to be located, thus limiting the search that has
to be conducted in seeking that information.
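A minimal sketch of the enhancement step from point (b): adjusting brightness and contrast with OpenCV. The file name input.jpg and the alpha/beta values are illustrative assumptions.

```python
import cv2

# Load an image as a NumPy array in BGR order (file name is an assumption).
img = cv2.imread("input.jpg")

# Simple linear point operation: g(x, y) = alpha * f(x, y) + beta
alpha = 1.2   # contrast gain (assumed value)
beta = 25     # brightness offset (assumed value)

# convertScaleAbs scales, shifts and clips the result to the 0-255 range.
enhanced = cv2.convertScaleAbs(img, alpha=alpha, beta=beta)

cv2.imwrite("enhanced.jpg", enhanced)
```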
4. Briefly describe relationship and connectivity between
pixels.
6. Short note on
a. Chessboard distance.
b. City-block distance.
8. What is resolution?
i. Resolution is an important aspect of digital image processing.
ii. The word "resolution" may mean many things. In the context of digital technology it
describes the crispness and clarity of images seen on screens, based on the number
of pixels arranged horizontally and vertically.
iii. Image resolution quantifies how close two lines (say, one dark and one light)
can be to each other and still be visibly resolved; in other words, the clarity of the image.
iv. There are 2 common types of resolution: Spatial and Gray-Level (intensity)
resolution.
v. Spatial Resolution:
a. Definition: It represents the number of pixels used to define an image in the
spatial domain. Spatial resolution is the smallest discernible change in an
image.
b. Units: Usually measured in pixels per inch (PPI) or dots per inch (DPI).
c. High Spatial Resolution: An image with a large number of small pixels. It
provides more detail and clarity.
d. Low Spatial Resolution: An image with fewer, larger pixels. It appears blocky
or pixelated.
e. Example:
i. A high-resolution image might be 1920x1080 pixels (Full HD).
ii. A low-resolution image might be 640x480 pixels (VGA).
vi. Intensity Resolution:
a. Definition: It refers to the number of intensity or color levels used to represent
each pixel in the image.
b. Units: Measured in bits per pixel (bpp).
i. 1 bpp: Black and white (2 intensity levels).
ii. 8 bpp: 256 grayscale levels.
iii. 24 bpp: 16.7 million colors (true color).
c. Higher Intensity Resolution: Captures finer intensity or color variations.
d. Lower Intensity Resolution: Causes banding or posterization effects.
e. Example:
i. A grayscale image with 8-bit intensity resolution has 256 levels of
gray.
ii. A color image with 24-bit resolution can represent 16.7 million colors.
vii. Trade-offs in Resolution:
a. High Resolution:
i. Advantages: Greater detail and accuracy.
ii. Disadvantages: Requires more storage and processing power.
b. Low Resolution:
i. Advantages: Smaller file size and faster processing.
ii. Disadvantages: Loss of detail and quality.
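As a rough illustration of the intensity-resolution trade-off above, the sketch below re-quantizes an 8-bit grayscale image to fewer gray levels with NumPy; the file name and the choice of 3 bits per pixel are assumptions.

```python
import cv2
import numpy as np

# Read the image as 8-bit grayscale, i.e. 256 intensity levels (file name assumed).
gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

def reduce_intensity_levels(img, bits):
    """Re-quantize an 8-bit image to 2**bits gray levels."""
    step = 256 // (2 ** bits)
    # Map every pixel to the lower bound of its quantization bin.
    return ((img // step) * step).astype(np.uint8)

# 3 bpp -> only 8 gray levels; banding/posterization becomes visible.
low_bpp = reduce_intensity_levels(gray, 3)
cv2.imwrite("quantized_3bpp.png", low_bpp)
```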
9. Explain structure of human eye
Skip
Unit-2
1. Explain Image Enhancement in spatial domain
i. Spatial domain refers to the image plane itself, and approaches in this category
are based on direct manipulation of pixels in an image.
ii. The spatial domain is used to define the actual spatial coordinates of pixels within
an image
iii. Image enhancement in the spatial domain refers to techniques that improve
the visual quality of an image by modifying its pixel values directly, unlike
frequency-domain techniques, which are based on modifying the Fourier
transform of the image.
iv. Suppose we have a digital image that can be represented by a two-dimensional
random field f(x, y).
v. Spatial domain processes will be denoted by the expression:
g(x, y) = T[f(x, y)]   or   s = T(r)
vi. The values of pixels before and after processing will be denoted by r and s,
respectively.
vii. Here f(x, y) is the input image, g(x, y) is the processed (output) image, and T is an
operator on f.
viii. T stands for the transformation function, which works on the input image in order to
enhance it and produce the output image.
ix. The three basic types of functions used frequently for image enhancement in
the spatial domain are (a minimal sketch of a power-law transform follows this list):
a. Linear functions.
b. Logarithmic functions.
c. Power-law functions.
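A minimal sketch of one such point transformation, the power-law (gamma) transform s = c·r^γ, applied pixel-wise with NumPy; the file name and the values of c and γ are assumptions.

```python
import cv2
import numpy as np

# Read an 8-bit grayscale image and normalise it to [0, 1] (file name assumed).
r = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64) / 255.0

c = 1.0        # scaling constant (assumed)
gamma = 0.5    # gamma < 1 brightens dark regions (assumed)

# Power-law transformation applied to every pixel: s = c * r^gamma
s = c * np.power(r, gamma)

# Rescale back to the 0-255 range and save.
cv2.imwrite("gamma_corrected.png", np.uint8(np.clip(s, 0, 1) * 255))
```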
x. Skeleton
a. Skeletonization is a process for reducing foreground regions in a binary
image to a skeletal remnant that largely preserves the extent and connectivity
of the original region while throwing away most of the original foreground
pixels.
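One common way to compute such a skeleton is the iterative morphological recipe sketched below (erosion and opening with OpenCV); this is only one of several skeletonization algorithms, and the input file name is an assumption.

```python
import cv2
import numpy as np

# Binary input: white (255) foreground on a black background (file name assumed).
img = cv2.imread("binary_shape.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
skeleton = np.zeros_like(binary)
work = binary.copy()

# At each step, keep the pixels removed by erosion-then-opening,
# then continue with the eroded image until nothing is left.
while cv2.countNonZero(work) > 0:
    eroded = cv2.erode(work, kernel)
    opened = cv2.dilate(eroded, kernel)
    skeleton = cv2.bitwise_or(skeleton, cv2.subtract(work, opened))
    work = eroded

cv2.imwrite("skeleton.png", skeleton)
```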
Unit-4
1. Short note on Color images.
i. A color image is a type of digital image that represents visual information using
multiple color channels, typically red, green, and blue (RGB).
ii. These images are used to represent the real-world appearance of objects by
combining colors in varying intensities.
iii. They are described using color models like RGB, HSV, or CMYK, and are widely
used in diverse applications such as multimedia, computer vision, and scientific
imaging.
iv. RGB (Red, Green, Blue): Most common model for digital displays. Each pixel has
three values corresponding to red, green, and blue intensities.
v. CMY/CMYK (Cyan, Magenta, Yellow, Black): Used in printing.
vi. HSV (Hue, Saturation, Value): Represents color in a more intuitive way based on its
attributes.
vii. Advantages of Color Images
a. Realistic representation of scenes and objects.
b. Essential for applications like photography, video, and visualizations.
viii. Limitations
a. Storage and Bandwidth: Higher data requirements compared to grayscale
images.
b. Processing Complexity: Requires advanced algorithms for analysis and
enhancement.
c. Illumination Sensitivity: Color perception may vary under different lighting
conditions.
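A minimal sketch of working with these colour models in code: converting an image between BGR, HSV and grayscale with OpenCV, and using the HSV representation to isolate a colour range; the file name and the hue/saturation/value bounds are assumptions.

```python
import cv2
import numpy as np

# OpenCV loads colour images with channels in BGR order (file name assumed).
bgr = cv2.imread("photo.jpg")

# Convert between colour models.
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)

# Example use of HSV: select strongly saturated reddish pixels.
lower = np.array([0, 120, 70])       # assumed lower bounds for hue, sat, value
upper = np.array([10, 255, 255])     # assumed upper bounds
red_mask = cv2.inRange(hsv, lower, upper)

cv2.imwrite("red_mask.png", red_mask)
```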
5. Explain segmentation using thresholding.
iv. Advantages:
a. Simple and computationally efficient.
b. Works well for images with distinct intensity differences.
v. Limitations:
a. Fails for images with overlapping intensity ranges between regions.
b. Sensitive to noise and illumination variations.
c. Ineffective for complex textures or gradients.
vi. Applications:
a. Document text extraction.
b. Medical imaging.
c. Object detection in computer vision.
6. Short note on Edge-based segmentation using:
a. Laplacian mask
b. Sobel mask
c. Prewitt mask
d. Canny edge detection
i. Laplacian Operator
a. The Laplacian operator is also a derivative operator used to find
edges in an image. The Laplacian is a second-order derivative mask. It can be
further divided into the positive Laplacian and the negative Laplacian.
ii. The Sobel operator is very similar to the Prewitt operator. It is also a derivative mask
and is used for edge detection. It calculates edges in both the horizontal and
vertical directions.
iii. The Prewitt operator is used for detecting edges horizontally and vertically.
iv. The Canny edge detector is a multi-stage algorithm (smoothing, gradient
computation, non-maximum suppression, and hysteresis thresholding) that produces
thin, well-localized edges. A minimal code sketch applying these operators follows.
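A minimal sketch applying these operators with OpenCV and NumPy; the file name, kernel sizes and Canny thresholds are assumptions (OpenCV has no built-in Prewitt operator, so its masks are applied with filter2D).

```python
import cv2
import numpy as np

# Grayscale input image (file name assumed).
gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Sobel: first-order derivative masks in the x and y directions.
sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
sobel_mag = cv2.magnitude(sobel_x, sobel_y)

# Prewitt: apply the 3x3 horizontal and vertical masks explicitly.
prewitt_kx = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float64)
prewitt_x = cv2.filter2D(gray, cv2.CV_64F, prewitt_kx)
prewitt_y = cv2.filter2D(gray, cv2.CV_64F, prewitt_kx.T)

# Laplacian: second-order derivative mask, responds to edges in all directions.
laplacian = cv2.Laplacian(gray, cv2.CV_64F, ksize=3)

# Canny: multi-stage edge detector (100 and 200 are assumed hysteresis thresholds).
canny_edges = cv2.Canny(gray, 100, 200)

cv2.imwrite("canny_edges.png", canny_edges)
```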
7. Explain Hough Transform
i. The Hough Transform is a popular technique in computer vision and image
processing, used for detecting geometric shapes like lines, circles, and other
parametric curves.
ii. It has numerous applications in various domains such as medical imaging, robotics,
and autonomous driving.
iii. Fundamentally, it transfers these shapes' representation from the spatial domain to
the parameter space, allowing for effective detection even in the face of distortions
like noise or occlusion.
iv. The accumulator array, sometimes referred to as the parameter space or Hough
space, is the first thing that the Hough Transform creates.
v. The available parameter values for the shapes that are being detected are
represented by this space. The slope (m) and y-intercept (b) of a line, for instance,
could be the parameters in the line detection scenario.
vi. For each edge point in the image, the Hough Transform computes the corresponding
curve in the parameter space by iterating over the possible parameter values;
every parameter combination that the curve passes through receives a "vote",
and these votes are recorded in the accumulator array.
vii. Finally, the program finds peaks in the accumulator array that correspond to the
parameters of the shapes it has identified. These peaks indicate the lines, circles,
or other shapes present in the image (a line-detection sketch follows the applications list).
viii. Applications of Hough Transform
a. Line Detection: In applications such as road lane detection, architectural
feature recognition, and industrial inspection, Hough Transform is often used
to detect straight lines.
b. Circle Detection: It is widely used in applications like detecting circular objects
(e.g., detecting traffic signs, medical imaging for detecting circular structures
like blood vessels or tumors).
c. Shape Matching: In pattern recognition, Hough Transform can be used to
match specific geometric shapes to a given image.
d. Robotics: In robotic vision, the Hough Transform helps in detecting features
like walls, obstacles, or paths.
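A minimal line-detection sketch using OpenCV's Hough transform; note that cv2.HoughLines uses the (ρ, θ) normal parameterization ρ = x·cosθ + y·sinθ rather than the slope-intercept form mentioned above, and the file name, Canny thresholds and vote threshold are assumptions.

```python
import cv2
import numpy as np

# Edge map first, then the Hough accumulator (file name assumed).
gray = cv2.imread("road.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 50, 150)

# Each detected line is returned as (rho, theta); 200 votes is an assumed threshold.
lines = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=200)

output = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
if lines is not None:
    for rho, theta in lines[:, 0]:
        # Convert (rho, theta) back to two distant points on the line for drawing.
        a, b = np.cos(theta), np.sin(theta)
        x0, y0 = a * rho, b * rho
        p1 = (int(x0 + 1000 * (-b)), int(y0 + 1000 * a))
        p2 = (int(x0 - 1000 * (-b)), int(y0 - 1000 * a))
        cv2.line(output, p1, p2, (0, 0, 255), 2)

cv2.imwrite("hough_lines.png", output)
```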
1. What is image compression? Explain the need for image compression.
i. Image compression is the process of reducing the size of an image file without
compromising the image quality beyond a certain acceptable limit.
ii. This reduction in file size is achieved by eliminating redundant data within the image.
iii. The goal is to reduce the storage space needed for the image and minimize the
bandwidth required for its transmission, while maintaining the image quality as much
as possible.
iv. There are two main types of image compression:
a. Lossy Compression:
i. In lossy compression, some of the image data is permanently
removed, leading to a loss of quality.
ii. The reduction in quality is typically not noticeable to the human eye
but can be significant if compressed too much.
iii. JPEG is the most common example of a lossy compression format.
b. Lossless Compression:
i. In lossless compression, no image data is lost, and the original image
can be perfectly reconstructed from the compressed version.
ii. While lossless compression may not achieve as much reduction in
size as lossy compression, it ensures that the image quality is
preserved.
iii. PNG and GIF are examples of lossless compression formats (a size-comparison sketch follows at the end of this answer).
v. The need for image compression arises due to several practical reasons, including:
a. Storage Efficiency:
i. Large File Sizes: Digital images, especially high-resolution ones, can
have large file sizes. Storing high-quality images can consume
significant disk space, especially for large collections, such as medical
images, satellite images, or photographs.
ii. Reduces Storage Requirements: Image compression reduces the
storage space required for each image, allowing more images to be
stored on devices with limited storage, such as mobile phones,
computers, and cloud storage systems.
b. Faster Transmission and Downloading:
i. Bandwidth Limitation: Sending high-resolution images over the
internet requires significant bandwidth. This can result in slower
transmission times, especially over slow or congested networks. By
compressing images, their file size decreases, enabling faster loading,
downloading, and sharing.
ii. Web Usage: For websites, reducing the size of images helps in faster
page loading times, leading to better user experience and potentially
improved search engine rankings (since page speed is a ranking
factor).
c. Efficient Network Communication
i. Transmission Over Limited Networks: In scenarios like satellite
communication, mobile networks, and video conferencing, transmitting
large image files can be problematic due to bandwidth constraints.
Compressing images reduces the amount of data transmitted over
these limited networks, improving communication efficiency.
ii. Streaming Media: Image compression is also crucial in streaming
applications, where large amounts of image data (such as video
frames) need to be transmitted in real-time. Compression ensures
smooth playback with reduced buffering.
d. Improved Performance for Applications
i. Real-Time Processing: In applications like video surveillance, medical
imaging, and remote sensing, large image files may need to be
processed in real-time. By compressing the images, the computational
load and memory requirements are reduced, leading to faster
processing times.
ii. Storage in Embedded Systems: Devices like cameras, smartphones,
and drones often use image compression to store images in a limited
memory. Compressing images allows these devices to store more
images without compromising on image capture quality.
e. Preservation of Data Integrity
i. Reducing Data Redundancy: In an image, certain data is redundant,
such as repetitive patterns, similar pixel values, or areas of uniform
color. Image compression helps remove this redundant data, leading
to efficient storage without losing valuable information in the image,
especially in lossless compression.
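The size-comparison sketch referred to above: the same image saved losslessly (PNG) and lossily (JPEG) with OpenCV, so the difference in file size and the lossless round trip can be checked directly; the input file name and the JPEG quality setting are assumptions.

```python
import os
import cv2

img = cv2.imread("photo.jpg")  # file name assumed

# Lossless (PNG) versus lossy (JPEG at an assumed quality of 60).
cv2.imwrite("out_lossless.png", img)
cv2.imwrite("out_lossy.jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 60])

print("PNG  (lossless):", os.path.getsize("out_lossless.png"), "bytes")
print("JPEG (lossy)   :", os.path.getsize("out_lossy.jpg"), "bytes")

# Lossless round trip: the PNG decodes back to exactly the same pixel values.
assert (cv2.imread("out_lossless.png") == img).all()
```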
2. Explain following Redundancy in images
a. Data redundancy
b. Coding redundancy
c. Interpixel redundancy
d. Psychovisual redundancy
i. Redundancy in images refers to the presence of repetitive or unnecessary
information that can be exploited to compress the image without losing important
details.
ii. Image redundancy is the key factor that enables image compression techniques to
reduce file size while maintaining quality.
iii. There are different types of redundancy in images, which can be classified into the
following categories:
iv. Data Redundancy
a. Definition:
i. Data redundancy refers to the repetition of data within the image file. It
occurs when the same information is stored multiple times in the
image, and it can be removed or reduced during compression without
affecting the image's content.
b. Examples:
i. Repeated Patterns: In an image, large areas with uniform colors or
patterns can cause data redundancy. For example, an image with a
blue sky or a flat-colored background can be compressed by noting
that the color remains constant over a large area.
ii. Same Pixel Values: Multiple adjacent pixels may have the same color
or intensity values, especially in images with smooth gradients or
areas of uniform color. This redundancy can be reduced by storing
only one value and referencing it multiple times.
c. Compression Techniques:
i. Run-Length Encoding (RLE): This technique is used to encode data
redundancy. It stores sequences of identical values as a single value
and a count, reducing the amount of data required to represent
repetitive pixel values.
v. Coding Redundancy
a. Definition:
i. Coding redundancy arises from inefficient coding schemes or the use
of fixed-length codes to represent pixel values. It refers to the use of
codes that take up more bits than necessary to represent the
information.
b. Examples:
i. Huffman Coding: In images, some pixel values (or combinations of
pixel values) occur more frequently than others. Coding redundancy
can be reduced by assigning shorter codes to frequently occurring
values and longer codes to less frequent values, which optimizes the
storage.
c. Compression Techniques:
i. Entropy Encoding: This is a technique used to reduce coding
redundancy by assigning shorter codes to more frequent symbols
(such as pixel values). Examples include Huffman Coding and
Arithmetic Coding. These algorithms effectively compress data by
minimizing the average number of bits per symbol.
vi. Interpixel Redundancy
a. Definition:
i. Interpixel redundancy refers to the correlation or similarity between
adjacent pixels or neighboring regions in an image. This redundancy
is present because adjacent pixels often have similar values,
especially in regions of uniform color or smooth gradients.
b. Examples:
i. Smooth Gradients: In natural images or photographs, neighboring
pixels tend to have similar color values, creating interpixel
redundancy. For example, in an image of a sunset, the pixels that
make up the sky may be similar to each other, leading to redundancy.
ii. Edges: Even in areas with high contrast (such as object boundaries),
there is often some degree of correlation between adjacent pixels,
where certain patterns of edge transitions repeat across the image.
c. Compression Techniques:
i. Prediction: One method to exploit interpixel redundancy is to predict
the value of a pixel based on its neighboring pixels, encoding only the
difference (error) between the predicted and actual pixel value. This is
done in techniques such as Differential Pulse Code Modulation
(DPCM).
ii. Transform Coding: In methods like Discrete Cosine Transform (DCT)
(used in JPEG compression), the image is transformed into a
frequency domain where redundant low-frequency information is
grouped together, and high-frequency components are discarded.
vii. Psychovisual Redundancy
a. Definition:
i. Psychovisual redundancy exploits the limitations of human vision. The
human visual system is less sensitive to certain kinds of image details,
such as small color variations or high-frequency components, making
them redundant from a perceptual perspective.
b. Examples:
i. Color Sensitivity: Humans are more sensitive to changes in brightness
(luminance) than to changes in color (chrominance). As a result, small
color variations may not be noticeable, even though they are present
in the image.
ii. High-frequency Details: The human eye has limited resolution for
detecting fine details, especially at high frequencies. Therefore, high-
frequency image components (such as sharp textures or noise) can
be removed or compressed without a noticeable loss in perceived
quality.
c. Compression Techniques:
i. Chroma Subsampling: This technique reduces the resolution of the
color components (chrominance) of an image while keeping the
brightness component (luminance) at full resolution.
ii. Perceptual Quantization: This method reduces the precision of certain
image details that are less noticeable to the human eye, thus
achieving better compression without significant perceptual loss.
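A minimal sketch of the chroma-subsampling idea just described: keep the luminance channel at full resolution and store the chrominance channels at half resolution in each direction (roughly 4:2:0); the file name is an assumption.

```python
import cv2

bgr = cv2.imread("photo.jpg")                          # file name assumed
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
y, cr, cb = cv2.split(ycrcb)

# Downsample only the chrominance channels (Cr, Cb) by a factor of 2.
h, w = y.shape
cr_small = cv2.resize(cr, (w // 2, h // 2), interpolation=cv2.INTER_AREA)
cb_small = cv2.resize(cb, (w // 2, h // 2), interpolation=cv2.INTER_AREA)

# Reconstruct: upsample the chroma and merge with the full-resolution luminance.
# The colour detail lost here is rarely noticeable, because the human eye is
# less sensitive to chrominance than to luminance.
cr_up = cv2.resize(cr_small, (w, h), interpolation=cv2.INTER_LINEAR)
cb_up = cv2.resize(cb_small, (w, h), interpolation=cv2.INTER_LINEAR)
reconstructed = cv2.cvtColor(cv2.merge([y, cr_up, cb_up]), cv2.COLOR_YCrCb2BGR)

cv2.imwrite("chroma_subsampled.png", reconstructed)
```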
3. Explain Run-length coding.
i. Run-length encoding (RLE) is a form of lossless data compression in which runs of
data (consecutive occurrences of the same data value) are stored as a single
occurrence of that data value and a count of its consecutive occurrences, rather than
as the original run.
ii. This method replaces consecutive repeating occurrences of a symbol with one
occurrence of the symbol followed by the number of occurrences.
iii. Example: when encoding an image built up from colored dots, the sequence "green
green green green " is shortened to "green x 4"
iv. For example, if the input string is “wwwwaaadexxxxxx”, then the function should
return “w04a03d01e01x06”
v. Follow the steps below to solve this problem (a code sketch follows these notes):
a. Pick the first character from the source string.
b. Append the picked character to the destination string.
c. Count the number of subsequent occurrences of the picked character and
append the count to the destination string.
d. Pick the next character and repeat steps (b), (c) and (d) if the end of the string
has NOT been reached.
vi. Can compress any type of data but cannot achieve high compression ratios
compared to other compression methods.
vii. The original data can be perfectly reconstructed from the encoded data.
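The code sketch referred to in the steps above: a small run-length encoder and decoder in Python. The two-digit count format follows the "w04a03d01e01x06" example and therefore only handles runs of up to 99 identical symbols.

```python
def rle_encode(text: str) -> str:
    """Run-length encode a string as symbol + two-digit count pairs."""
    if not text:
        return ""
    encoded = []
    current, count = text[0], 1
    for ch in text[1:]:
        if ch == current:
            count += 1
        else:
            encoded.append(f"{current}{count:02d}")
            current, count = ch, 1
    encoded.append(f"{current}{count:02d}")
    return "".join(encoded)


def rle_decode(encoded: str) -> str:
    """Reverse the encoding by reading symbol + two-digit count pairs."""
    decoded = []
    for i in range(0, len(encoded), 3):
        symbol, count = encoded[i], int(encoded[i + 1:i + 3])
        decoded.append(symbol * count)
    return "".join(decoded)


if __name__ == "__main__":
    original = "wwwwaaadexxxxxx"
    packed = rle_encode(original)
    print(packed)                          # w04a03d01e01x06
    assert rle_decode(packed) == original  # lossless: perfect reconstruction
```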