Image Processing for Beginners
1. Introduction
An image is a picture: a way of recording and presenting information
visually. Since vision is the most advanced of our senses, it is not
surprising that images play the single most important role in human
perception. The extraordinary amount of information that an image can convey
has been recognized for centuries: one picture is worth a thousand words.
However, unlike human beings, imaging machines can capture and
operate on images generated by sources that cannot be seen by humans.
These include X-ray, ultrasound, electron microscopy, and computer-
generated images. Thus, image processing has become an essential field
that encompasses a wide and varied range of applications.
2. Basic definitions
• Image processing is a general term for the wide range of
techniques that exist for manipulating and modifying images in
various ways.
• A digital image may be defined as a finite, discrete
representation of the original continuous image. A digital image
is composed of a finite number of elements called pixels, each
of which has a particular location and value.
Figure 1.1 The electromagnetic spectrum arranged according to energy per photon.
Figure 1.2 Examples of Gamma-Ray imaging a) PET image b) Star explosion 15,000 years
ago
Figure 1.2(b) shows a star that exploded about 15,000 years ago, imaged in
the gamma-ray band. Unlike the previous example shown in Figure 1.2(a),
this image was obtained using the natural radiation of the object being
imaged.
X-rays are widely used for imaging in medicine, industry and astronomy.
In medicine, chest X-ray, illustrated in Figure 1.3(a), is widely used for
medical diagnostics.
Imaging in the Visible and Infrared Bands
The visual band of the EM spectrum is the most familiar in all activities
and has the widest scope of application. The infrared band is often used in
conjunction with visual imaging (multispectral imaging). Applications
include light microscopy, astronomy, remote sensing, industry, and law
enforcement. Figure 1.5(a) shows a microprocessor image magnified 60 times
with a light microscope, Figure 1.5(b) illustrates an infrared satellite
image of the Americas, and Figure 1.5(c) shows a multispectral image of a
hurricane taken by a weather satellite.
Figure 1.5 Examples of visible and infrared imaging. a) Microprocessor magnified 60 times.
b) Infrared satellite image of the US. c) Multispectral image of a hurricane.
Figure 1.7 MRI images of a human (a) knee, and (b) spine.
Figure 1.8 Examples of ultrasound imaging. a) Baby b) another view of baby
Figure 1.9 (a) image of damaged integrated circuit magnified 2500 times (b) fractal image
Figure 1.10 (a) Face recognition system for PDA (b) Iris recognition (c) Fingerprint
recognition
Image processing encompasses techniques whose inputs and outputs are
images, as well as image analysis techniques whose inputs are images, but
whose outputs are attributes extracted from those images.
Types of digital images include:
1. Binary images
Binary images are the simplest type of images and can take on two
values, typically black and white, or 0 and 1. A binary image is referred
to as a 1-bit image because it takes only 1 binary digit to represent each
pixel. These types of images are frequently used in applications where the
only information required is general shape or outline, for example optical
character recognition (OCR).
Binary images are often created from the gray-scale images via a
threshold operation, where every pixel above the threshold value is turned
white (‘1’), and those below it are turned black (‘0’). In the figure below,
we see examples of binary images.
Figure 2.1 Binary images. (a) Object outline. (b) Page of text used in OCR application.
2. Gray-scale images
Gray-scale images are referred to as monochrome (one-color) images.
3. Color images
Color images can be modeled as three-band monochrome image data,
where each band of data corresponds to a different color. The actual
information stored in the digital image data is the gray-level information
in each spectral band.
Typical color images are represented as red, green, and blue (RGB
images). Using the 8-bit monochrome standard as a model, the
corresponding color image would have 24-bits/pixel (8-bits for each of
the three color bands red, green, and blue). The figure below illustrates a
representation of a typical RGB color image.
4. Multispectral images
Multispectral images typically contain information outside the normal
human perceptual range. This may include infrared, ultraviolet, X-ray,
acoustic, or radar data. These are not images in the usual sense because
the information represented is not directly visible to the human visual system.
However, the information is often represented in visual form by mapping
the different spectral bands to RGB components.
Most image file formats fall into the category of bitmap images,
for example:
§ PPM (Portable Pix Map) format
§ TIFF (Tagged Image File Format)
§ GIF (Graphics Interchange Format)
§ JPEG (Joint Photographic Experts Group) format
§ BMP (Windows Bitmap)
§ PNG (Portable Network Graphics)
§ XWD (X Window Dump)
l = f(xa, yb)
From the above equations, it is evident that l lies in the range
Lmin ≤ l ≤ Lmax
where Lmin is positive and Lmax is finite.
The interval [Lmin , Lmax] is called the gray scale. Common practice is to
shift this interval numerically to the interval [0, L-1], where l = 0 is
considered black and l = L-1 is considered white on the gray scale. All
intermediate values are shades of gray varying from black to white.
Figure 2.1 Generating a digital image. (a) Continuous image, (b) A scan line from A to B in
the continuous image (c) Sampling and quantization, (d) Digital scan line.
The digital samples resulting from both sampling and quantization are
shown in Figure 2.1(d). Starting at the top of the image and carrying out
this procedure line by line produces a two-dimensional digital image as
shown in Figure 2.3.
Note that:
• The number of selected values in the sampling process is known as
the image spatial resolution. This is simply the number of pixels
relative to the given image area.
• The number of selected values in the quantization process is called
the grey-level (color level) resolution. This is expressed in terms
of the number of bits allocated to the color levels.
• The quality of a digitized image depends on the resolution
parameters on both processes.
Each element of this matrix array is called a pixel. The spatial resolution
(number of pixels) of the digital image is M × N. The gray-level
resolution (number of gray levels) L is
L = 2^k
where k is the number of bits used to represent the gray levels of the
digital image. When an image can have 2^k gray levels, we refer to the
image as a "k-bit image". For example, an image with 256 possible gray-
level values is called an 8-bit image.
The gray levels are integers in the interval [0, L-1]. This interval is called
the gray scale.
The number, b, of bits required to store a digitized image is
b = M × N × k
Example:
For an 8-bit image of size 512×512, determine its gray-scale and storage
size.
Solution: k = 8, M = N = 512
Number of gray levels L = 2^k = 2^8 = 256
The gray scale is [0, 255]
Storage size b = M × N × k = 512 × 512 × 8 = 2,097,152 bits
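The storage arithmetic above can be checked with a few lines of Python (a sketch; the function name is ours):

```python
def storage_bits(M, N, k):
    """Number of bits b = M * N * k needed to store an M x N, k-bit image."""
    return M * N * k

bits = storage_bits(512, 512, 8)
print(bits)        # 2097152 bits
print(bits // 8)   # 262144 bytes (256 KB)
```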
Figure 2.4 A 1024×1024, 8-bit image subsampled down to size 32×32 pixels.
To see the effects resulting from the reduction in the number of samples,
we bring all the subsampled images up to size 1024×1024 by row and
column pixel replication. The resulting images are shown in the figure
below.
Figure 2.5 (a) 1024×1024, 8-bit image. (b) through (f) 512×512, 256×256, 128×128, 64×64,
and 32×32 images resampled into 1024×1024 pixels by row and column duplication
Comparing Figure 2.5(a) with the 512×512 image in Figure 2.5(b), we find
that the level of detail lost is too fine to be seen on the printed
page at the scale at which these images are shown. Next, the 256×256
image in Figure 2.5(c) shows a very slight fine checkerboard pattern in
the borders between flower petals and the black background. A slightly
more pronounced graininess throughout the image also is beginning to
appear. These effects are much more visible in the 128×128 image in
Figure 2.5(d), and they become pronounced in the 64×64 and 32×32
images in Figures 2.5(e) and (f), respectively.
Example
The pixel values of the following 5×5 image are represented by 8-bit
integers:
Determine f with a gray-level resolution of 2^k for (i) k = 5 and (ii) k = 3.
Solution:
Dividing the image by 2 reduces its gray-level resolution by 1 bit.
Hence, to reduce the gray-level resolution from 8 bits to 5 bits,
8 − 5 = 3 bits must be removed.
Thus, we divide the 8-bit image by 2^3 = 8 to get the following 5-bit
image:
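The reduction described above can be sketched in Python using plain nested lists as images (the function name and sample values are ours, not from the notes):

```python
def reduce_gray_levels(img, k_from, k_to):
    """Reduce gray-level resolution by integer division by 2**(k_from - k_to)."""
    factor = 2 ** (k_from - k_to)
    return [[p // factor for p in row] for row in img]

# 8-bit sample values reduced to 5 bits (divide by 2**3 = 8):
print(reduce_gray_levels([[200, 100], [255, 0]], 8, 5))  # [[25, 12], [31, 0]]
```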
2. Distance Measures
For pixels p, q, and z, with coordinates (x, y), (s, t), and (v, w),
respectively, D is a distance function or metric if
a) D(p, q) ≥ 0 (D(p, q) = 0 iff p = q),
b) D(p, q) = D(q, p), and
c) D(p, z) ≤ D(p, q) + D(q, z).
• The Euclidean distance between p and q is defined as
De(p, q) = [(x − s)² + (y − t)²]^(1/2)
Original image → image with rows expanded → image with rows and columns expanded
Note that the zoomed image has size 2M-1 × 2N-1. However, we can use
techniques such as padding which means adding new columns and/or
rows to the original image in order to perform bilinear interpolation to get
zoomed image of size 2M × 2N.
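Zooming by row-and-column pixel replication (zero-order hold) can be sketched as follows; note that this variant replicates every pixel, so the result is 2M × 2N rather than 2M−1 × 2N−1:

```python
def zoom_replicate(img, factor=2):
    """Zoom an image by replicating each pixel `factor` times along rows and columns."""
    rows = []
    for row in img:
        expanded = [p for p in row for _ in range(factor)]  # repeat each column
        rows.extend([expanded] * factor)                    # repeat each row
    return rows

print(zoom_replicate([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```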
Figure 3.1 Top row: images zoomed from 128×128, 64×64, and 32×32 pixels to 1024×1024
pixels using nearest-neighbor interpolation. Bottom row: same sequence, but using bilinear
interpolation.
B) Shrinking
Shrinking may be viewed as undersampling. Image shrinking is
performed by row-column deletion. For example, to shrink an image by
one-half, we delete every other row and column.
Image algebra
There are two categories of algebraic operations applied to images:
• Arithmetic
• Logic
These operations are performed on a pixel-by-pixel basis between two or
more images, except for the NOT logic operation which requires only one
image. For example, to add images I1 and I2 to create I3:
I3(x,y) = I1(x,y) + I2(x,y)
(c) Subtracting image (b) from (a). Only moving objects appear in the resulting image
The logic operations AND, OR, and NOT form a complete set, meaning
that any other logic operation (XOR, NOR, NAND) can be created by a
combination of these basic elements. They operate in a bit-wise fashion
on pixel data.
The AND and OR operations are used to perform masking operations, that
is, selecting subimages of an image, as shown in the figure below.
Masking is also called Region of Interest (ROI) processing.
(a) Original image. (b) AND image mask. (c) Resulting image, (a) AND (b).
(d) Original image. (e) OR image mask. (f) Resulting image, (d) OR (e).
Figure 3.6 Image masking
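A minimal sketch of the AND masking operation on list-based images (mask white = 255 selects the region of interest; the sample values are ours):

```python
def and_mask(img, mask):
    """Bitwise AND of two images: keeps pixels where the mask is white (255)."""
    return [[p & m for p, m in zip(ri, rm)] for ri, rm in zip(img, mask)]

img  = [[120, 200], [50, 255]]
mask = [[255, 0], [0, 255]]   # selects top-left and bottom-right pixels
print(and_mask(img, mask))    # [[120, 0], [0, 255]]
```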
The NOT operation creates a negative of the original image, as shown in
the figure below, by inverting each bit within each pixel value.
Image Histogram
The histogram of a digital image is a plot that records the frequency
distribution of gray levels in that image. In other words, the histogram is
a plot of the gray-level values versus the number of pixels at each gray
value. The shape of the histogram provides us with useful information
about the nature of the image content.
The histogram of a digital image f of size M × N with gray levels
in the range [0, L-1] is the discrete function
h(rk) = nk
where rk is the kth gray level and nk is the number of pixels in the image
having gray level rk.
The next figure shows an image and its histogram.
Note that the horizontal axis of the histogram plot (Figure 3.8(b))
represents the gray-level values rk, from 0 to 255. The vertical axis
represents the values of h(rk), i.e. the number of pixels having gray level rk.
The next figure shows another image and its histogram.
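Computing h(rk) = nk directly, as defined above, takes only a few lines (a sketch; the tiny test image is ours):

```python
def histogram(img, L=256):
    """Return h where h[k] is the number of pixels with gray level k."""
    h = [0] * L
    for row in img:
        for p in row:
            h[p] += 1
    return h

h = histogram([[0, 1, 1], [2, 1, 0]], L=4)
print(h)  # [2, 3, 1, 0]
```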
Image Enhancement
Image enhancement aims to process an image so that the output image is
“more suitable” than the original. It is used to solve some computer
imaging problems, or to improve “image quality”. Image enhancement
techniques include smoothing, sharpening, highlighting features, or
normalizing illumination for display and/or analysis.
Image negatives
The negative of an image with gray levels in the range [0, L-1] is
obtained by using the expression
s = (L − 1) − r
Figure 4.1 (a) Original digital mammogram. (b) Negative image obtained by negative
transformation
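The negative transformation s = (L − 1) − r is one line per pixel (the sample values are ours):

```python
def negative(img, L=256):
    """Image negative: s = (L - 1) - r for every pixel r."""
    return [[(L - 1) - p for p in row] for row in img]

print(negative([[0, 100], [200, 255]]))  # [[255, 155], [55, 0]]
```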
Piecewise-linear transformation
The form of piecewise linear functions can be arbitrarily complex. Some
important transformations can be formulated only as piecewise functions,
for example thresholding:
For any 0 < t < 255 the threshold transform can be defined as:
[Plot: thresholding transform — input gray level r (0–255) vs. output gray level s (0–255)]
Thresholding has another form, used to generate binary images from
gray-scale images, i.e.:
[Plot: thresholding transform producing a binary output — input gray level r (0–255) vs. output gray level s]
The figure below shows a gray-scale image and the binary image resulting
from thresholding the original by 120:
Figure 4.5 Thresholding. (a) Gray-scale image. (b) Result of thresholding (a) by 120
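Thresholding to a binary image, as in Figure 4.5, can be sketched as follows (the threshold and sample values are ours):

```python
def threshold(img, t):
    """Binary thresholding: pixels above t become white (255), others black (0)."""
    return [[255 if p > t else 0 for p in row] for row in img]

print(threshold([[50, 130], [120, 200]], 120))  # [[0, 255], [0, 255]]
```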
Example:
For the following piecewise linear chart determine the equation of
the corresponding grey-level transforms:
[Plot: piecewise-linear transform — input gray level r (0–255) vs. output gray level s (0–255)]
Solution
We use the straight-line formula, s = s1 + ((s2 − s1)/(r2 − r1))(r − r1),
to compute the equation of each line segment from two of its points.
Log transformation
The general form of the log transformation is
s = c log(1 + r)
where c is a constant and r ≥ 0.
[Plot: log transform — input gray level r (0–255) vs. output gray level s]
Power-law transformation
Power-law transformations have the basic form
s = c r^γ
where c and γ (gamma) are positive constants.
Figure 4.11 (a) Original MRI image of a human spine. (b)–(d) Results of applying the
power-law transformation with c = 1 and γ = 0.6, 0.4, and 0.3, respectively.
We note that, as gamma decreased from 0.6 to 0.4, more detail became
visible. A further decrease of gamma to 0.3 enhanced a little more detail
in the background, but began to reduce contrast ("washed-out" image).
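A sketch of the power-law (gamma) transform s = c·r^γ, computed on intensities normalized to [0, 1] and scaled back to the gray scale (the normalization choice and sample values are ours):

```python
def power_law(img, c=1.0, gamma=0.4, L=256):
    """Power-law transform: s = c * r**gamma on normalized intensities."""
    return [[round((L - 1) * c * (p / (L - 1)) ** gamma) for p in row]
            for row in img]

out = power_law([[0, 64, 255]], gamma=0.4)
print(out)  # dark mid-tones are pushed up; 0 and 255 stay fixed
```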
Figure 4.12 (a) Original bright image. (b)–(d) Results of applying the power-law
transformation with c = 1 and γ = 3, 4, and 5, respectively.
We note that suitable results were obtained with gamma values of 3.0
and 4.0. The result obtained with γ = 5.0 has areas that are too dark, in
which some detail is lost.
The figure below illustrates a gray image shown in four basic gray-level
characteristics: dark, light, low-contrast, and high-contrast. The right side
of the figure shows the histograms corresponding to these images.
Figure 4.13 Four basic image types: dark, light, low-contrast, high-contrast, and their
corresponding histograms.
Contrast stretching
Contrast stretching aims to increase (expand) the dynamic range of an image.
It transforms the gray levels in the range [0, L-1] by a piecewise-linear function.
The figure below shows a typical transformation used for contrast
stretching.
The locations of points
(r1, s1) and (r2, s2)
control the shape of the
transformation function.
[Plot: contrast-stretching transform — input gray level r (0–255) vs. output gray level s (0–255), with control points (r1, s1) and (r2, s2)]
This transformation will be used to increase the contrast of the image
shown in the figure below:
Figure 4.16 Contrast stretching. (a) Original image. (b) Histogram of (a). (c) Result of
contrast stretching. (d) Histogram of (c).
For a given plot, we use the equation of a straight line to compute the
piecewise-linear function for each segment. For example, in the plot of
Figure 4.15, for input gray values in the interval [28, 75] we get:
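The piecewise-linear contrast-stretching function controlled by (r1, s1) and (r2, s2) can be sketched as follows (the control-point values in the example call are hypothetical, not taken from Figure 4.15):

```python
def contrast_stretch(r, r1, s1, r2, s2, L=256):
    """Piecewise-linear stretch through control points (r1, s1) and (r2, s2)."""
    if r < r1:                                      # first segment: (0, 0) -> (r1, s1)
        return s1 * r / r1 if r1 > 0 else 0
    if r <= r2:                                     # middle segment: (r1, s1) -> (r2, s2)
        return s1 + (s2 - s1) * (r - r1) / (r2 - r1)
    return s2 + (L - 1 - s2) * (r - r2) / (L - 1 - r2)  # last segment to (255, 255)

# Hypothetical control points (80, 40) and (160, 220):
print(contrast_stretch(120, 80, 40, 160, 220))  # 130.0
```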
Figure 4.18 (a) Low-contrast image. (b) Histogram of (a). (c) High-contrast image resulting
from applying the contrast stretching of Figure 4.17 to (a). (d) Histogram of (c).
Gray-level slicing
Gray-level slicing aims to highlight a specific range [A…B] of gray
levels. It simply maps all gray levels in the chosen range to a high value.
Other gray levels are either mapped to a low value (Figure 4.19(a)) or left
unchanged (Figure 4.19(b)). Gray-level slicing is used for enhancing
features such as masses of water in satellite imagery. Thus it is useful for
feature extraction.
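Gray-level slicing in both variants (background mapped to a low value, or background left unchanged) can be sketched as follows (the parameter names and sample values are ours):

```python
def gray_slice(img, a, b, high=255, low=0, preserve=False):
    """Map gray levels in [a, b] to `high`; others to `low` or leave unchanged."""
    return [[high if a <= p <= b else (p if preserve else low) for p in row]
            for row in img]

img = [[10, 120], [150, 200]]
print(gray_slice(img, 100, 160))                 # [[0, 255], [255, 0]]
print(gray_slice(img, 100, 160, preserve=True))  # [[10, 255], [255, 200]]
```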
Figure 4.19 Gray-level slicing. (b) Operation that intensifies the desired gray-level range
while preserving other values; (c) result of applying (b) on (a) (background unchanged).
(d) Operation that intensifies the desired gray-level range while changing other values to
black; (e) result of applying (d) on (a) (background changed to black).
Histogram Equalization
Histogram equalization is an automatic enhancement technique that produces
an output (enhanced) image with a near-uniformly distributed histogram.
For continuous functions, the intensity (gray level) in an image
may be viewed as a random variable with its probability density function
(PDF). The PDF at a gray level r represents the expected proportion
(likelihood) of occurrence of gray level r in the image. A transformation
function has the form
s = T(r) = (L − 1) ∫0^r pr(w) dw
where pr is the PDF of the input gray levels.
The right side of this equation is known as the cumulative histogram for
the input image. This transformation is called histogram equalization or
histogram linearization.
Because a histogram is an approximation to a continuous PDF, perfectly
flat histograms are rare in applications of histogram equalization. Thus,
the histogram equalization results in a near uniform histogram. It spreads
the histogram of the input image so that the gray levels of the equalized
(enhanced) image span a wider range of the gray scale. The net result is
contrast enhancement.
Example:
Suppose that a 3-bit image (L = 8) of size 64 × 64 pixels has the gray-level
(intensity) distribution shown in the table below. Perform histogram
equalization on this image.
rk        nk
r0 = 0    790
r1 = 1    1023
r2 = 2    850
r3 = 3    656
r4 = 4    329
r5 = 5    245
r6 = 6    122
r7 = 7    81
Solution:
M × N = 4096
We compute the normalized histogram pr(rk) = nk / (M × N):
rk        nk      pr(rk)
r0 = 0    790     0.19
r1 = 1    1023    0.25
r2 = 2    850     0.21
r3 = 3    656     0.16
r4 = 4    329     0.08
r5 = 5    245     0.06
r6 = 6    122     0.03
r7 = 7    81      0.02
Applying the transformation function sk = (L − 1) Σ pr(rj), summed over j = 0 … k, gives
s0 = 1.33, s1 = 3.08, s2 = 4.55, s3 = 5.67, s4 = 6.23, s5 = 6.65, s6 = 6.86, s7 = 7.00
We round the values of s to the nearest integer:
s0 = 1, s1 = 3, s2 = 5, s3 = 6, s4 = 6, s5 = 7, s6 = 7, s7 = 7
These are the values of the equalized histogram. Note that there are only
five distinct gray levels.
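The discrete mapping used in this example reduces to a few lines of Python; running it on the example's pixel counts reproduces the rounded output levels:

```python
def equalize_map(hist, L):
    """Histogram-equalization mapping: s_k = round((L-1) * cumulative fraction)."""
    n = sum(hist)
    s, cum = [], 0
    for count in hist:
        cum += count
        s.append(round((L - 1) * cum / n))
    return s

hist = [790, 1023, 850, 656, 329, 245, 122, 81]  # n_k from the table above
print(equalize_map(hist, L=8))  # [1, 3, 5, 6, 6, 7, 7, 7]
```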
Figure 5.1 Left column original images. Center column corresponding histogram equalized
images. Right column histograms of the images in the center column.
Although all the histograms of the equalized images are different, these
images themselves are visually very similar. This is because the
difference between the original images is simply one of contrast, not of
content.
However, in some cases histogram equalization may introduce noise and
other undesired effects into the output images, as shown in the figure below.
Note:
The size of mask must be odd (i.e. 3×3, 5×5, etc.) to ensure it has a
center. The smallest meaningful size is 3×3.
Example:
Use the following 3×3mask to perform the convolution process on the
shaded pixels in the 5×5 image below. Write the filtered image.
3×3 mask:
0    1/6  0
1/6  1/3  1/6
0    1/6  0

5×5 image:
30   40   50   70   90
40   50   80   60   100
35   255  70   0    120
30   45   80   100  130
40   50   90   125  140
Solution:
New value at the first shaded pixel = 50 × (1/3) + (40 + 40 + 80 + 255) × (1/6) = 85.8 ≈ 85,
and so on …
Filtered image =
30   40   50   70   90
40   85   65   61   100
35   118  92   58   120
30   84   77   89   130
40   50   90   125  140
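The convolution above can be reproduced with a direct sketch; following the worked answer, results are truncated toward zero, and border pixels are copied unchanged:

```python
def convolve3x3(img, mask):
    """Apply a 3x3 mask to interior pixels; borders are left unchanged."""
    M, N = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, M - 1):
        for j in range(1, N - 1):
            acc = sum(mask[di + 1][dj + 1] * img[i + di][j + dj]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out[i][j] = int(acc + 1e-9)  # truncate, as in the worked example
    return out

mask = [[0, 1/6, 0], [1/6, 1/3, 1/6], [0, 1/6, 0]]
img = [[30, 40, 50, 70, 90],
       [40, 50, 80, 60, 100],
       [35, 255, 70, 0, 120],
       [30, 45, 80, 100, 130],
       [40, 50, 90, 125, 140]]
filtered = convolve3x3(img, mask)
print(filtered)
```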
Spatial Filters
Spatial filters can be classified by effect into:
1. Smoothing Spatial Filters: also called lowpass filters. They include:
1.1 Averaging linear filters
1.2 Order-statistics nonlinear filters.
2. Sharpening Spatial Filters: also called highpass filters. For example,
the Laplacian linear filter.
Standard average filter:
1 1 1
1 1 1
1 1 1

Weighted average filter:
1 2 1
2 4 2
1 2 1
Note:
A weighted average filter has different coefficients, giving more
importance (weight) to some pixels at the expense of others. The idea
is to reduce blurring in the smoothing process.
Figure 5.2 Effect of averaging filter. (a) Original image. (b)-(f) Results of smoothing with
square averaging filter masks of sizes n = 3,5,9,15, and 35, respectively.
Order-statistics filters
are nonlinear spatial filters whose response is based on ordering (ranking)
the pixels contained in the neighborhood, and then replacing the value of
the center pixel with the value determined by the ranking result.
Examples include Max, Min, and Median filters.
Median filter
It replaces the value at the center by the median pixel value in the
neighborhood, (i.e. the middle element after they are sorted). Median
filters are particularly useful in removing impulse noise (also known as
salt-and-pepper noise), where salt corresponds to gray level 255 and pepper to gray level 0.
In a 3×3 neighborhood the median is the 5th largest value, in a 5×5
neighborhood the 13th largest value, and so on.
For example, suppose that a 3×3 neighborhood has gray levels (10,
20, 0, 20, 255, 20, 20, 25, 15). These values are sorted as
(0,10,15,20,20,20,20,25,255), which results in a median of 20 that
replaces the original pixel value 255 (salt noise).
Example:
Consider the following 5×5 image:
20 30 50 80 100
30 20 80 100 110
25 255 70 0 120
30 30 80 100 130
40 50 90 125 140
Apply a 3×3 median filter on the shaded pixels, and write the filtered
image.
Solution
For the first shaded pixel (value 255), the 3×3 neighborhood sorts to:
20, 25, 30, 30, 30, 70, 80, 80, 255 → median = 30
For the second shaded pixel (value 70):
0, 20, 30, 70, 80, 80, 100, 100, 255 → median = 80
For the third shaded pixel (value 0):
0, 70, 80, 80, 100, 100, 110, 120, 130 → median = 100
Filtered image =
20   30   50   80   100
30   20   80   100  110
25   30   80   100  120
30   30   80   100  130
40   50   90   125  140
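A sketch that reproduces this median-filter example (only the three shaded positions are filtered, and the medians are taken from the original pixel values):

```python
def median3x3(img, pixels):
    """Apply a 3x3 median filter at the given (row, col) positions."""
    out = [row[:] for row in img]
    for i, j in pixels:
        neigh = sorted(img[i + di][j + dj]
                       for di in (-1, 0, 1) for dj in (-1, 0, 1))
        out[i][j] = neigh[4]  # the 5th of 9 sorted values is the median
    return out

img = [[20, 30, 50, 80, 100],
       [30, 20, 80, 100, 110],
       [25, 255, 70, 0, 120],
       [30, 30, 80, 100, 130],
       [40, 50, 90, 125, 140]]
result = median3x3(img, [(2, 1), (2, 2), (2, 3)])
print(result[2])  # [25, 30, 80, 100, 120]
```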
Figure 5.3 Effect of median filter. (a) Image corrupted by salt & pepper noise. (b) Result of
applying 3×3 standard averaging filter on (a). (c) Result of applying 3×3 median filter on (a).
The second-order partial derivatives of the digital image f(x,y) are:
∂²f/∂x² = f(x+1, y) + f(x−1, y) − 2f(x, y)
∂²f/∂y² = f(x, y+1) + f(x, y−1) − 2f(x, y)
We conclude that:
• 1st derivative detects thick edges while 2nd derivative detects thin
edges.
• 2nd derivative has much stronger response at gray-level step than 1st
derivative.
Thus, we can expect a second-order derivative to enhance fine detail (thin
lines, edges, including noise) much more than a first-order derivative.
Since the Laplacian filter is a linear spatial filter, we can apply it using
the same mechanism as the convolution process. This produces a
Laplacian image that has grayish edge lines and other discontinuities, all
superimposed on a dark, featureless background.
Background features can be "recovered" while still preserving the
sharpening effect of the Laplacian operation simply by adding the
original and Laplacian images.
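Combining the two steps, sharpening with the centre −4 Laplacian mask amounts to g = f − ∇²f; a sketch on list images, with results clamped to [0, 255] (the clamping and the test values are ours):

```python
def laplacian_sharpen(img):
    """Sharpen interior pixels: g = f - laplacian(f), clamped to [0, 255]."""
    M, N = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, M - 1):
        for j in range(1, N - 1):
            lap = (img[i-1][j] + img[i+1][j] + img[i][j-1] + img[i][j+1]
                   - 4 * img[i][j])
            out[i][j] = min(255, max(0, img[i][j] - lap))
    return out

# A single bright dot gets amplified relative to its flat surroundings:
print(laplacian_sharpen([[10, 10, 10], [10, 50, 10], [10, 10, 10]])[1][1])  # 210
```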
Figure 5.5 Example of applying Laplacian filter. (a) Original image. (b) Laplacian image.
(c) Sharpened image.
Note that F(0,0) equals the average value of f(x,y) and is referred to as the dc
component of the spectrum.
It is common practice to multiply the image f(x,y) by (−1)^(x+y). In this
case, the DFT of f(x,y)(−1)^(x+y) has its origin located at the centre of the
image, i.e. at (u,v) = (M/2, N/2).
The figure below shows a gray image and its centered Fourier spectrum.
Figure 6.1 (a) Gray image. (b) Centered Fourier spectrum of (a)
Phase spectrum
Phase data contains information about where objects are in the image, i.e.
it holds spatial information as shown in the Figure below.
Inverse 2D-DFT
After performing the Fourier transform, if we want to convert the image
from the frequency domain back to the original spatial domain, we apply
the inverse transform. The inverse 2D-DFT is defined as:
f(x, y) = Σu Σv F(u, v) e^(j2π(ux/M + vy/N)), for u = 0 … M−1 and v = 0 … N−1
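The forward/inverse round trip, including centring the spectrum, can be sketched with NumPy (note np.fft.fft2 is unnormalized, so F(0,0)/MN gives the average gray level, and np.fft.fftshift centres the spectrum like multiplying f(x,y) by (−1)^(x+y); the tiny test image is ours):

```python
import numpy as np

f = np.array([[10.0, 20.0],
              [30.0, 40.0]])   # a tiny test image

F = np.fft.fft2(f)             # forward 2D-DFT
dc = F[0, 0] / f.size          # dc component / MN = average gray level
Fc = np.fft.fftshift(F)        # centred spectrum
g = np.fft.ifft2(F).real       # inverse 2D-DFT recovers the image

print(dc.real)                 # 25.0
```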
Low-pass filter
High-pass filter
The results of applying these two filters on the image in Figure 6.1(a) are
shown in the figure below.