IMAGE FEATURES USING
WAVELETS AND APPLICATIONS TO
DOCUMENT IMAGE PROCESSING
Dr. S A Angadi
Professor
Department of Computer Science and Engineering
PG Centre,
Visvesvaraya Technological University, Belgaum
Contents
Document Image Processing : An Introduction
Wavelets : An Overview
Discrete Wavelet Transform and Images
Image Features using Wavelets
Applications
Document Image Processing: An Introduction
DIP/DIA is the theory and practice of recovering the symbol
structures of digital images scanned from paper or produced by
computer
DIA is a subfield of Digital Image processing
Digital images of natural objects: X-rays, fingerprints, faces, scenery,
etc. are NOT part of DIA
Digital images of symbolic objects: Postal addresses, printed articles,
forms, music sheets, engineering drawings, topographic maps belong to
DIA
Source: Scanners, printers, fax machines, hand!
Incidental text: license plates, billboards, subtitles, in photos and video
WWW ??
DIA’s grand goal is to take us to the paperless office
Document Image Processing
200 dpi images 400 dpi images
Document Image Examples
Document Image Processing
[Figure: annotated postal envelope. Labelled regions: Sender’s Address, Endorsement, Post Mark, Digital Meter Mark, Linear Code, Delivery Address, and the return instruction “In Case of Undeliverable as Addressed Return to Sender”]
Document Image Examples
-Postal documents
Document Image Processing
Document Image Examples-Forms
Document Image Processing
Document Image Examples-Unconstrained Text
Document Image Processing
Document Image
Examples-Graphics
Document Image Processing
Document Image Analysis
  Textual Processing
    Optical Character Recognition: text
    Page Layout Analysis: skew, blocks, paragraphs
  Graphical Processing
    Line Processing: lines, curves, corners
    Region and Symbol Processing: filled regions
Document Image Processing
The ultimate solution of Doc. Image Processing would be for computers
to deal with paper documents as they deal with other forms of computer
media.
A file of picture elements/ pixels is the raw input data to Doc. IP/ Doc.
IA systems
The first step in document analysis is to perform processing on this
image to prepare it for further analysis. Such processing includes:
thresholding to reduce a gray-scale or color image to a binary image,
reduction of noise to reduce extraneous data, and
thinning and region detection to enable easier subsequent detection of
pertinent features and objects of interest.
This stage is called pixel-level processing (also known in the literature
as preprocessing or low-level processing)
Document Image Processing
Document -> Data Capture -> Pixel Level Processing -> Feature Level Analysis
Feature Level Analysis -> Text Analysis and Recognition
Feature Level Analysis -> Graphics Analysis and Recognition
Document Image Processing
The next major step is extraction of intermediate
features that aid in the final recognition
The features may be taken from individual pixels, from a
collection of pixels, or by transforming the information in a
collection of pixels (e.g. RLC features, normalization and
feature extraction, shape descriptors, compactness,
asymmetry, topology, contour smoothness, Hough
transform features, DCT features, DWT features, etc.)
The features extracted from the image are subjected to
processing, for deriving the information from text/
graphics
Document Image Processing
There are two main types of analysis that are
applied to text in documents (Text processing)
One is optical character recognition (OCR) to
derive the meaning of the characters and words
from their bit-mapped images, and the other is page
layout analysis to discover formatting of the text,
and from that to derive meaning associated with the
positional and functional blocks in which the text is
located
Document Image Processing
Graphics recognition and interpretation is an
important topic in document image analysis since
graphics elements pervade textual material, with
diagrams illustrating concepts in the text, company
logos heading business letters, and lines separating
fields in tables and sections of text (graphics
Processing)
The text and graphics regions may be intermixed
Different types of processing require different features
Document Image Features
The features employed for document image analysis fall into three
categories:
Image Features
Textual Features
Structural Features
Image features are either extracted directly from the image (e.g. the
density of black pixels in a region) or extracted from a segmented
image (e.g. the number of horizontal lines in a segmented block)
Image features extracted at the level of a whole image are called
global image features;
Image features extracted from the regions of an image are called
local image features
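The distinction between global and local image features can be illustrated with the black-pixel density mentioned above. This is only a sketch: the function names and the 2 x 2 grid of regions are assumptions made for illustration, and the page is assumed to be a binary array in which 1 marks a black pixel.

```python
import numpy as np

def global_density(binary):
    """Global image feature: fraction of black pixels over the whole page."""
    return float(np.mean(binary))

def local_densities(binary, grid=(2, 2)):
    """Local image features: black-pixel density of each region in a grid."""
    binary = np.asarray(binary, dtype=float)
    bands = np.array_split(binary, grid[0], axis=0)          # split into row bands
    return np.array([[block.mean()
                      for block in np.array_split(band, grid[1], axis=1)]
                     for band in bands])

# Toy 4 x 4 page whose top-left quarter is black.
page = np.zeros((4, 4), dtype=int)
page[:2, :2] = 1

g = global_density(page)
loc = local_densities(page)
```

The global value summarises the whole page, while the grid of local values shows where on the page the ink is concentrated.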
Document Image Features
Structural features (e.g. relationships between
objects in the page) are obtained from physical or
logical layout analysis
Textual features (e.g. presence of keywords) may
be computed from OCR output or directly from
document images
Some classifiers use only image features, only
structural features, or only textual features; others
use a combination of features from several groups
Document Image Features
Image Features:
  Various levels of density
  Attributes of connected components
  Column/row gaps
  Text histogram
  Location and size of cells
  Line and text features
  Physical layout features
Structural Features:
  Physical layout
  Logical structures
  Results of functional labeling
  Spatial relations
Textual Features:
  Textual features obtained before layout analysis
  Textual features from OCR results
All these features can be subjected to transforms to obtain more meaningful information
Wavelet Transform: An Overview
What are transforms?
Transforms are applied to a raw signal/data to obtain
further information that is not readily available from
the raw data.
The Fourier Transform gives the frequency content of a
time domain signal.
It also gives the point-value representation of a
polynomial (its values at the roots of unity)
Why do we need the frequency content of a signal? (e.g. ECG)
Wavelet Transform: An Overview
FT provides a signal which contains only the frequency
domain information
It does not give any information of the signal in the
time domain
When time localization of the frequency components is
needed we require a transform giving time frequency
representation
STFT, Wigner Distributions and Wavelet Transforms
provide time frequency representation
Both FT and WT are reversible transforms- one can go
from raw data to processed data and back
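The loss of time information in the Fourier transform can be seen in a small experiment. This is a sketch under assumed values: the sampling rate and the two tone frequencies are arbitrary choices for illustration. Two signals contain the same two tones in opposite order, and their magnitude spectra come out identical.

```python
import numpy as np

fs = 256                          # assumed sampling rate in Hz
t = np.arange(fs) / fs            # one second of samples
low = np.sin(2 * np.pi * 10 * t)  # 10 Hz tone
high = np.sin(2 * np.pi * 50 * t) # 50 Hz tone

# Same two tones, opposite order in time.
low_then_high = np.concatenate([low, high])
high_then_low = np.concatenate([high, low])

# Magnitude spectra: they show WHICH frequencies occur, not WHEN they occur.
mag_a = np.abs(np.fft.rfft(low_then_high))
mag_b = np.abs(np.fft.rfft(high_then_low))
same_spectrum = np.allclose(mag_a, mag_b)
```

The two signals differ sample by sample, yet `same_spectrum` is true, which is why a time-frequency representation is needed when the order of events matters.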
Wavelet Transform: An Overview
“The wavelet transform is a tool that cuts up data,
functions or operators into different frequency
components, and then studies each component with
a resolution matched to its scale”
Wavelet Transform: An Overview
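$$\mathrm{CWT}_x^{\psi}(\tau, s) = \Psi_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt$$
tau and s are the translation and scale parameters
psi(t) is the transforming function and is called the
mother wavelet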
there are two main differences between the STFT and
the CWT
1. The Fourier transforms of the windowed signals are not taken,
and therefore single peak will be seen corresponding to a
sinusoid, i.e., negative frequencies are not computed
2. The width of the window is changed as the transform is
computed for every single spectral component, which is probably
the most significant characteristic of the wavelet transform
Wavelet Transform: An Overview
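the CWT can be thought of as the inner product of
the test signal with the basis functions psi_(tau,s)(t):
$$\mathrm{CWT}_x^{\psi}(\tau, s) = \int x(t)\, \psi^{*}_{\tau,s}(t)\, dt$$
where
$$\psi_{\tau,s}(t) = \frac{1}{\sqrt{|s|}}\, \psi\!\left(\frac{t - \tau}{s}\right)$$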
Wavelet Transform: An Overview
Wavelet transform is referred to as “Fourier Transform of 20th Century”
Wavelets are wavelike oscillatory signals of finite bandwidth both in Time
and in Frequency
Wavelets are basis functions of spaces with certain properties
Wavelets provide time scale(frequency) representation of non stationary
signals
Based on multiresolution approximation (MRA)
• Approximate a function at various resolutions using a scaling function,
φ(t)
• Keep track of details lost using wavelet functions, ψ(t)
• Reconstruct the original signal by adding approximation and detail
coefficients
Implemented by using a series of lowpass and highpass filters
• Lowpass filters are associated with the scaling function and provide approximation
• Highpass filters are associated with the wavelet function and provide detail lost in approximating the signal
Discrete Wavelet Transform
The foundations of the DWT go back to 1976, when Croisier, Esteban,
and Galand devised a technique to decompose discrete time signals.
Crochiere, Webber, and Flanagan did similar work on the coding of speech
signals in the same year; they named their analysis scheme subband
coding
A Discrete Wavelet Transform is any Transform for which
wavelets are discretely sampled
It transforms a discrete time signal to a discrete wavelet
representation
It converts an input series x0, x1, ..., x(n-1) into one high-pass
wavelet coefficient series and one low-pass wavelet coefficient
series (of length n/2 each)
Discrete Wavelet Transform
[Figure: the signal passes through lowpass and highpass filters, giving the Approximation (a) and the Details (d)]
Discrete Wavelet Transform
The procedure starts with passing this signal
(sequence) through a half band digital lowpass
filter with impulse response h[n]
Filtering a signal corresponds to the mathematical
operation of convolution of the signal with the
impulse response of the filter.
Discrete Wavelet Transform
The DWT analyzes the signal at different frequency
bands with different resolutions by decomposing the
signal into a coarse approximation and detail information
DWT employs two sets of functions, called scaling
functions and wavelet functions, which are associated
with low pass and highpass filters, respectively.
The decomposition of the signal into different frequency
bands is simply obtained by successive highpass and
lowpass filtering of the time domain signal
The original signal x[n] is first passed through a
halfband highpass filter g[n] and a lowpass filter h[n]
Discrete Wavelet Transform
After the filtering, half of the samples can be
eliminated according to Nyquist’s rule, since
the signal now has a highest frequency of π/2
radians instead of π
The signal can therefore be subsampled by 2,
simply by discarding every other sample. This
constitutes one level of decomposition and can
mathematically be expressed as follows:
$$y_{high}[k] = \sum_{n} x[n]\, g[2k - n]$$
$$y_{low}[k] = \sum_{n} x[n]\, h[2k - n]$$
Discrete Wavelet Transform
This decomposition halves the time resolution since
only half the number of samples now characterizes the
entire signal. However, this operation doubles the
frequency resolution, since the frequency band of the
signal now spans only half the previous frequency
band, effectively reducing the uncertainty in the
frequency by half
At every level, the filtering and subsampling will
result in half the number of samples (and hence half
the time resolution) and half the frequency band
spanned (and hence double the frequency resolution)
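One level of decomposition (filter, then discard every other sample) can be sketched as follows. The Haar filter pair is an assumption chosen because it is the shortest orthonormal pair; the slides do not prescribe a particular filter, and the function name is illustrative.

```python
import numpy as np

def dwt_one_level(x, h, g):
    """Convolve with lowpass h and highpass g, then keep every second sample."""
    low = np.convolve(x, h)[1::2]    # approximation coefficients
    high = np.convolve(x, g)[1::2]   # detail coefficients
    return low, high

s = 1 / np.sqrt(2)
h = np.array([s, s])     # Haar lowpass (assumed filter choice)
g = np.array([s, -s])    # Haar highpass

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, detail = dwt_one_level(x, h, g)
```

Each output has half as many samples as the input, and because the filters are orthonormal the total energy of `approx` and `detail` equals the energy of `x`.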
Discrete Wavelet Transform
In practice, such transformation will be applied
recursively on the low-pass series until the desired
number of iterations is reached
The frequencies that are most prominent in the original signal
will appear as high amplitudes in that region of the DWT
signal that includes those particular frequencies
The difference of this transform from the Fourier transform is
that the time localization of these frequencies will not be lost
However, the time localization will have a resolution that
depends on which level they appear
This procedure in effect offers a good time resolution at high
frequencies, and good frequency resolution at low frequencies
Most practical signals encountered are of this type
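The recursive application to the low-pass series can be sketched as below, again assuming the Haar wavelet and an input length divisible by 2 raised to the number of levels. The function names are illustrative.

```python
import numpy as np

def haar_step(x):
    """One Haar analysis step: pairwise scaled sums and differences."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_dwt(x, levels):
    """Apply the step repeatedly to the low-pass (approximation) series."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)            # details[0] is the finest level
    return a, details

x = np.arange(16, dtype=float)
a3, ds = haar_dwt(x, 3)
```

The detail series shrink from 8 to 4 to 2 samples: the finest level keeps the best time resolution, while the coarser levels cover narrower frequency bands with fewer samples.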
Discrete Wavelet Transform
One important property of the discrete wavelet
transform is the relationship between the impulse
responses of the highpass and lowpass filters. The
highpass and lowpass filters are not independent of
each other, and they are related by
$$g[L - 1 - n] = (-1)^{n}\, h[n]$$
where g[n] is the highpass filter, h[n] is the lowpass filter, and L is the
filter length (in number of points).
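A short check of this filter relationship, assuming the commonly used form g[L-1-n] = (-1)^n h[n] and taking the 4-tap Daubechies lowpass filter as the example h[n]:

```python
import numpy as np

r3 = np.sqrt(3.0)
# 4-tap Daubechies lowpass filter (orthonormal scaling).
h = np.array([1 + r3, 3 + r3, 3 - r3, 1 - r3]) / (4 * np.sqrt(2))
L = len(h)

# Build the highpass filter from the lowpass one: g[L-1-n] = (-1)^n h[n].
g = np.empty(L)
for n in range(L):
    g[L - 1 - n] = (-1) ** n * h[n]

lowpass_gain = h.sum()         # sqrt(2) for an orthonormal lowpass filter
highpass_dc = g.sum()          # zero: the highpass filter rejects constants
cross = np.dot(h, g)           # zero: the two filters are orthogonal
```

The highpass filter is the lowpass filter reversed with alternating signs, so it needs no separate design.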
Discrete Wavelet Transform
Note that the two filters are odd index alternated
reversed versions of each other. Lowpass to
highpass conversion is provided by the (-1)^n term.
Filters satisfying this condition are commonly used
in signal processing, and they are known as
Quadrature Mirror Filters (QMF). The two filtering
and subsampling operations can be expressed by
$$y_{high}[k] = \sum_{n} x[n]\, g[-n + 2k]$$
$$y_{low}[k] = \sum_{n} x[n]\, h[-n + 2k]$$
Discrete Wavelet Transform
The reconstruction in this case is very easy since half band
filters form orthonormal bases
The procedure is followed in reverse order for the
reconstruction
The signals at every level are upsampled by two, passed
through the synthesis filters g’[n], and h’[n] (highpass and
lowpass, respectively), and then added
The interesting point here is that the analysis and synthesis
filters are identical to each other, except for a time reversal
Therefore, the reconstruction formula becomes (for each layer)
$$x[n] = \sum_{k} \left( y_{high}[k]\, g[-n + 2k] + y_{low}[k]\, h[-n + 2k] \right)$$
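The reconstruction steps (upsample by two, filter with the time-reversed filters, add) can be sketched for one level. The Haar filters and the helper names are assumptions made for illustration.

```python
import numpy as np

s = 1 / np.sqrt(2)
h = np.array([s, s])     # Haar analysis lowpass (assumed filter choice)
g = np.array([s, -s])    # Haar analysis highpass

def analyze(x):
    """Filter and keep every second sample."""
    return np.convolve(x, h)[1::2], np.convolve(x, g)[1::2]

def upsample(c):
    """Insert a zero after every coefficient."""
    up = np.zeros(2 * len(c))
    up[0::2] = c
    return up

def synthesize(low, high):
    """Upsample, filter with the time-reversed filters, and add."""
    y = np.convolve(upsample(low), h[::-1]) + np.convolve(upsample(high), g[::-1])
    return y[:2 * len(low)]          # drop the trailing convolution sample

x = np.array([3.0, 7.0, 1.0, 8.0, 2.0, 2.0, 9.0, 4.0])
low, high = analyze(x)
x_rec = synthesize(low, high)
```

For this orthonormal pair `x_rec` matches `x` to floating-point precision, which illustrates that the synthesis filters are just the analysis filters reversed in time.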
Discrete Wavelet Transform
However, if the filters are not ideal halfband, then
perfect reconstruction cannot be achieved.
Although it is not possible to realize ideal filters,
under certain conditions it is possible to find filters
that provide perfect reconstruction
The most famous ones are the ones developed by
Ingrid Daubechies, and they are known as
Daubechies’ wavelets
2D- Discrete Wavelet Transform
Significant lossy data reduction is possible using
DWT
How do we generalize these concepts to 2D?
2D functions correspond to images: f(x,y) becomes I[m,n], the
intensity function
Reasons to take 2D-DWT of an image
Compression
Denoising
Feature extraction
Discrete Wavelet Transforms
2D-DWT of an image
We start by defining a two-dimensional scaling and
wavelet functions
$$\varphi_s(x, y) = \varphi(x)\,\varphi(y) \qquad \psi_s(x, y) = \psi(x)\,\psi(y)$$
“Subset” of scale and position based on power of two
rather than every “possible” set of scale and position in
continuous wavelet transform
Behaves like a filter bank: signal in, coefficients out
Discrete Wavelet Transform
2 D DWT for Image
Discrete Wavelet Transform
The 2-D DWT of an image has applications in image
compression and image recognition
DWT on Images
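[Figure: one level of 2-D DWT analysis. The input image is filtered along the rows with the analysis filters H~ (lowpass) and G~ (highpass) and its columns are downsampled (2↓1). Each result is then filtered along the columns with H~ and G~ and its rows are downsampled (1↓2). The four outputs are LL (approximation A), LH (horizontal detail D^(h)), HL (vertical detail D^(v)) and HH (diagonal detail D^(d)).]
[Figure: subband layout under repeated decomposition. The input image is split into LL, LH, HL and HH; the LL band is then split again into LLL, LLH, LHL and LHH.]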
DWT on Images
2↓1: Downsample columns along the rows: for each row, keep the
even indexed columns, discard the odd indexed columns
1↓2: Downsample rows along the columns: for each column, keep
the even indexed rows, discard the odd indexed rows
2↑1: Upsample columns along the rows: for each row, insert a zero
between every two samples (columns)
1↑2: Upsample rows along the columns: for each column, insert a zero
between every two samples (rows)
DWT on Images
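[Figure: one level of 2-D DWT synthesis. The LL (A), LH (D^(h)), HL (D^(v)) and HH (D^(d)) subbands have their rows upsampled (1↑2) and are filtered with H or G; the results are combined, their columns are upsampled (2↑1), filtered with H or G, and added to give the original image.]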
We perform the 2-D wavelet transform by applying 1-D wavelet
transform first on rows and then on columns.
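[Figure: f(m, n) is filtered along the rows with H and G and downsampled by 2; each output is then filtered along the columns with H and G and downsampled by 2, giving LL, LH, HL and HH.]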
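The rows-then-columns procedure can be sketched with the Haar wavelet (an assumed choice) on an image with even dimensions. The function names are illustrative; the subband names follow the slides, with the first letter referring to the row filter and the second to the column filter.

```python
import numpy as np

def haar_1d(a, axis):
    """One Haar step along the given axis: returns (lowpass, highpass) halves."""
    a = np.moveaxis(a, axis, -1)
    lo = (a[..., 0::2] + a[..., 1::2]) / np.sqrt(2)
    hi = (a[..., 0::2] - a[..., 1::2]) / np.sqrt(2)
    return np.moveaxis(lo, -1, axis), np.moveaxis(hi, -1, axis)

def haar_dwt2(img):
    """One level of the 2-D DWT: 1-D transform on rows, then on columns."""
    img = np.asarray(img, dtype=float)
    L, H = haar_1d(img, axis=1)      # filter along each row
    LL, LH = haar_1d(L, axis=0)      # then along each column
    HL, HH = haar_1d(H, axis=0)
    return LL, LH, HL, HH

img = np.arange(64, dtype=float).reshape(8, 8)
LL, LH, HL, HH = haar_dwt2(img)

flat = np.full((4, 4), 3.0)          # constant image: no detail at all
fLL, fLH, fHL, fHH = haar_dwt2(flat)
```

Each subband is a quarter of the input size. For the constant image all three detail bands are zero and only LL carries energy.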
Integer based Wavelets
By using a so-called lifting scheme, integer-
based wavelets can be created.
Using the integer-based wavelet, one can
simplify the computation.
Integer-based wavelets are also easier to
implement by a VLSI chip than non-integer
wavelets.
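A minimal sketch of an integer-based wavelet built by lifting, assuming the integer Haar transform (often called the S-transform) as the example; the slides do not name a specific integer wavelet. It uses only integer additions, subtractions and shifts, and it is exactly invertible.

```python
import numpy as np

def int_haar_forward(x):
    """Integer Haar via lifting: predict the odd samples, then update the even ones."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2], x[1::2]
    d = odd - even               # predict step: integer detail
    s = even + (d >> 1)          # update step: floor((even + odd) / 2)
    return s, d

def int_haar_inverse(s, d):
    """Undo the lifting steps in reverse order."""
    even = s - (d >> 1)
    odd = d + even
    x = np.empty(2 * len(s), dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([5, 2, 7, 7, 0, 255, 13, 12])
s, d = int_haar_forward(x)
x_back = int_haar_inverse(s, d)
```

Because every step is undone exactly, no rounding error accumulates, which is one reason such transforms suit hardware implementations.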
Image Features using Wavelets
The wavelet transform provides an appropriate basis for image
handling because of its beneficial features
The characteristics of the wavelet transform are:
The ability to compact most of the signal’s energy into a few
transformation coefficients, which is called energy compaction
The ability to capture and represent effectively low frequency
components (such as image backgrounds) as well as high frequency
transients (such as image edges)
The variable resolution decomposition with almost uncorrelated
coefficients
The ability of a progressive transmission, which facilitates the
reception of an image at different qualities
Image Features using Wavelets
The wavelet transform of an image yields four
types of coefficients, namely:
Approximation coefficients
Horizontal coefficients
Vertical coefficients
Diagonal coefficients
Energy values of the transformed image at various
levels are also computed
Image Features using Wavelets
Different type of texture features can be extracted
at various levels of decomposition from wavelets,
some of the wavelets employed are,
Applications of Wavelets to Image Processing
Wavelets for Image Compression
The discrete wavelet transform decomposes an image into a set of
successively smaller orthogonal images. Often it is possible to coarsely
quantize or eliminate low valued coefficients without sacrificing the integrity
of the image
For a given image, you can compute the DWT of, say, each row, and discard
all values in the DWT that are less than a certain threshold
We then save only those DWT coefficients that are above the threshold for
each row, and when we need to reconstruct the original image, we simply pad
each row with as many zeros as the number of discarded coefficients, and use
the inverse DWT to reconstruct each row of the original image
We can also analyze the image at different frequency bands, and reconstruct
the original image by using only the coefficients that are of a particular band
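The row-wise thresholding scheme can be sketched as follows. This assumes a one-level Haar transform per row and compares coefficient magnitudes against the threshold; the test image, the threshold value and the function names are made up for illustration.

```python
import numpy as np

def haar_rows(img):
    """One-level Haar DWT of every row: [approximation | detail]."""
    a = (img[:, 0::2] + img[:, 1::2]) / np.sqrt(2)
    d = (img[:, 0::2] - img[:, 1::2]) / np.sqrt(2)
    return np.hstack([a, d])

def inverse_haar_rows(coeffs):
    """Inverse of haar_rows."""
    half = coeffs.shape[1] // 2
    a, d = coeffs[:, :half], coeffs[:, half:]
    out = np.empty_like(coeffs)
    out[:, 0::2] = (a + d) / np.sqrt(2)
    out[:, 1::2] = (a - d) / np.sqrt(2)
    return out

def compress(img, threshold):
    """Zero the coefficients whose magnitude is below the threshold, then invert."""
    coeffs = haar_rows(np.asarray(img, dtype=float))
    kept = np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)
    return kept, inverse_haar_rows(kept)

# Toy image: four flat blocks per row plus a small ripple.
row = np.repeat([10.0, 200.0, 50.0, 120.0], 4) + 0.5 * np.sin(np.arange(16))
img = np.tile(row, (8, 1))
kept, approx_img = compress(img, threshold=5.0)
```

In this toy case every detail coefficient falls below the threshold, so half of the coefficients are dropped while the reconstruction error per pixel stays below the threshold.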
Applications of Wavelets to Image Processing
Wavelets for Image Enhancement
Since the DWT decomposes the image into components of different
size, position and orientation, you can alter the coefficients before
reconstruction so that you can attenuate certain characteristics of the
image
Image Fusion
Image fusion combines 2 or more registered images of the same
object into a single image that in most cases is better than the
originals. This is helpful in the medical imaging field, where multiple
images from different machines are employed
The images are combined in the transform domain by taking the
highest amplitude coefficient at each position and then performing
the inverse transform on the fused coefficients
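The maximum-amplitude fusion rule can be sketched as below, assuming two registered images of equal, even size and a one-level Haar transform; the helper names are illustrative and practical systems often use more levels and other wavelets.

```python
import numpy as np

def fwd(a, axis):
    """One Haar step along an axis, returned as [lowpass; highpass]."""
    a = np.moveaxis(a, axis, 0)
    lo = (a[0::2] + a[1::2]) / np.sqrt(2)
    hi = (a[0::2] - a[1::2]) / np.sqrt(2)
    return np.moveaxis(np.concatenate([lo, hi]), 0, axis)

def inv(c, axis):
    """Inverse of fwd along the same axis."""
    c = np.moveaxis(c, axis, 0)
    half = c.shape[0] // 2
    lo, hi = c[:half], c[half:]
    out = np.empty_like(c)
    out[0::2] = (lo + hi) / np.sqrt(2)
    out[1::2] = (lo - hi) / np.sqrt(2)
    return np.moveaxis(out, 0, axis)

def fuse(img_a, img_b):
    """Keep, at each position, the coefficient with the larger magnitude."""
    ca = fwd(fwd(np.asarray(img_a, dtype=float), 1), 0)
    cb = fwd(fwd(np.asarray(img_b, dtype=float), 1), 0)
    fused = np.where(np.abs(ca) >= np.abs(cb), ca, cb)
    return inv(inv(fused, 0), 1)

img = np.arange(16, dtype=float).reshape(4, 4)
same = fuse(img, img)                  # fusing an image with itself
with_blank = fuse(img, np.zeros((4, 4)))
```

Fusing an image with itself, or with an all-zero image, returns the image unchanged, which is a quick sanity check on the rule.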
Word Level Script Identification of Text in Low Resolution Images of
Display Boards using Wavelet Features
Wavelet features of image for recognition
applications
Coefficients
Transforms of coefficients
Energy Coefficients etc
Word Level Script Identification of Text in Low Resolution Images of
Display Boards using Wavelet Features
The method classifies an input word into one of five scripts, namely
Devanagari, Kannada, English, Malayalam and Tamil.
The method investigates use of
zone wise wavelet energy features
wavelet log mean deviation features and
newly obtained properties of wavelet coefficients
for script identification of text in low resolution images of
display boards.
City Block Distance Measure for Script Identification using Wavelet Features
Preprocessing and Feature Extraction
The preprocessing is done to binarize the image and generate
bounding box around it.
Zone Wise Wavelet Energy Features
The detailed coefficient Dj1 (Horizontal Energy Band) is
divided into three horizontal and four vertical zones at each
level j, as shown in the figure on the next slide.
The three horizontal zones, namely the top, middle and
bottom zones, cover 30%, 40% and 30% of the band respectively,
and the vertical zones are all of equal size.
Feature Extraction
Zone Wise Wavelet Energy Features
Three horizontal and four vertical zones of detailed coefficient Dj1
• Further, 3 horizontal and 4 vertical zone energy features are obtained at
each level j.
• The method also computes 2 relational features at each level j: the difference
between the top and middle zones, and between the middle and bottom zones.
Feature Extraction
Then, the wavelet energy feature is computed from the detailed coefficient
Dj3 (Diagonal Energy Band) at every level j.
Hence, totally 10 wavelet energy features are obtained at each level j.
These features (10 at each level) are stored into feature vector X.
$$E_{j1h1} = \frac{1}{M \times N} \sum_{m=1}^{u_1} \sum_{n=1}^{N} D_{j1}(m, n)$$
$$E_{j1h2} = \frac{1}{M \times N} \sum_{m=u_1+1}^{u_2} \sum_{n=1}^{N} D_{j1}(m, n)$$
$$E_{j1h3} = \frac{1}{M \times N} \sum_{m=u_2+1}^{M} \sum_{n=1}^{N} D_{j1}(m, n)$$
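The three horizontal zone energies and the two relational features can be sketched as follows. Several points are assumptions: u1 and u2 are taken as the rounded 30% and 70% row boundaries, the sums use the coefficients as they appear in the slide equations (a practical system might use magnitudes or squares instead), and the function name is made up.

```python
import numpy as np

def horizontal_zone_energies(D):
    """Zone energies of one detail band D (M x N) plus two relational features."""
    D = np.asarray(D, dtype=float)
    M, N = D.shape
    u1 = int(round(0.30 * M))        # assumed last row of the top zone (30%)
    u2 = int(round(0.70 * M))        # assumed last row of the middle zone (next 40%)
    top = D[:u1].sum() / (M * N)
    mid = D[u1:u2].sum() / (M * N)
    bot = D[u2:].sum() / (M * N)
    # Relational features: top minus middle, and middle minus bottom.
    return np.array([top, mid, bot, top - mid, mid - bot])

band = np.ones((10, 8))              # toy detail band
features = horizontal_zone_energies(band)
```

For the all-ones band the three energies are simply the zone fractions 0.3, 0.4 and 0.3, which makes the 30/40/30 split easy to verify.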
Feature Extraction
Wavelet Log Mean Deviation Features
•The method computes 3 wavelet log mean deviation features at every
resolution level j, which are stored into feature vector X.
$$LMD_{jp} = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \log\!\left( \frac{|D_{jp}(m, n)|}{S_j\, \partial} + 1 \right)$$
•In the current work, the value ∂ = 0.001 is used
•During experiments it was observed that the values obtained give a better representation of
texture.
•Further, 2 additional features that model the relation between the detailed energy bands are
determined as stated in the equations below.
•Hence, this step records 10 features (5 at each level) into feature vector X.
LMDj4 = LMDj1 - LMDj2;
LMDj5 = LMDj2 - LMDj3;
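A sketch of the log mean deviation features for one level. It assumes the form LMD = mean(log(|D| / (S_j * delta) + 1)) with delta = 0.001, where the placement of delta and the value of the normalizing constant S_j are assumptions; the band contents and all names are illustrative.

```python
import numpy as np

def log_mean_deviation(D, S_j, delta=0.001):
    """Mean of log(|D| / (S_j * delta) + 1) over one detail band (assumed form)."""
    D = np.asarray(D, dtype=float)
    return float(np.mean(np.log(np.abs(D) / (S_j * delta) + 1.0)))

def lmd_features(D1, D2, D3, S_j, delta=0.001):
    """Three LMD values plus the two relational differences for one level j."""
    l1 = log_mean_deviation(D1, S_j, delta)
    l2 = log_mean_deviation(D2, S_j, delta)
    l3 = log_mean_deviation(D3, S_j, delta)
    return np.array([l1, l2, l3, l1 - l2, l2 - l3])

S_j = 2.0                                        # hypothetical normalizing constant
unit = S_j * 0.001 * (np.e - 1.0)                # magnitude that gives log(...) = 1
bands = [np.full((4, 4), unit), np.zeros((4, 4)), np.full((4, 4), -unit)]
feats = lmd_features(*bands, S_j=S_j)
```

The logarithm compresses large coefficients, so the feature responds to how widely detail is spread over the band rather than to a few strong edges.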
Feature Extraction
Wavelet Vertical Run Features
•A wavelet vertical run R(Ø,d) is defined as a sequence of consecutive wavelet
coefficients that runs for a distance greater than or equal to a specified value
d in the direction Ø = 90 degrees (the value 90 is fixed for the vertical
direction).
•The wavelet vertical run feature WRFj2z is the number of occurrences of
wavelet vertical runs in a given area or region.
•These statistical features are obtained from the vertical detailed coefficient Dj2
divided into four equal sized vertical regions/zones, giving
8 features over both decomposition levels (4 features for every level j), which
are recorded into feature vector X.
Feature Extraction
Wavelet Vertical Run Features
[Figure: four vertical zones of detailed coefficient Dj2, labelled WRFj21, WRFj22, WRFj23, WRFj24]
$$WRF_{j2z} = \sum_{n = 1 + (k-1) \cdot \text{zone\_size}}^{k \cdot \text{zone\_size}} R(\text{Ø}, d)$$
X = [Ej1h1 Ej1h2 Ej1h3 Ej1h4 Ej1h5 Ej1v1 Ej1v2 Ej1v3 Ej1v4 Ej3 LMDj1 LMDj2 LMDj3 LMDj4 LMDj5
WRFj1 WRFj2 WRFj3 WRFj4, j=1,2 ]
Knowledge Base Construction
•For the purpose of knowledge base construction,
the images were captured from display boards of
government offices in India. The image database
consists of 1450 Kannada, 1200 English, 900
Malayalam, 900 Tamil and 900 Devanagari script
word images of varying resolutions
•The images in the database are characterized by
variable number of characters, variable font size and
style, uneven thickness and spacing between
characters, minimal information context, small skew,
noise and other degradations.
Knowledge Bases Construction
•Then, 70% of the different samples from each
script are chosen to train the system. The stored
information in the knowledge base sufficiently
characterizes all variations in input and script class
separation.
•It is also noticed that, training system with more
samples will improve the performance of the system.
At the end of training, four knowledge bases,
WD_IMKB_KAN, WD_IMKB_ENG,
WD_IMKB_MAL and WD_IMKB_TAM, are generated for the
Kannada, English, Malayalam and Tamil scripts.
Knowledge Bases Construction
•Testing is carried out for all word images of
database containing 70% trained and 30% test
samples
•The experimentation is also done to identify the
script of 14081 word images (13271 Kannada and
810 English words) of experimental data set 1 and
14252 word images (13442 Kannada and 810
English words) of data set 2
Script Class Identification
• Computational Strategy for Devanagari Script
Identification
•In this stage, horizontal run statistics of the test word image are
used to determine whether the word written on the display board
image belongs to Hindi (Devanagari script) or to another script.
•Initially, the horizontal runs of length greater than 6 are
computed for every row of the word image and are stored into a run
feature vector HRV. The vector records the row number and run
length count of all runs for all rows.
•Then, the model uses a linear discriminant function D1 to classify
the word image into two classes w1 and w2 based on the run vector,
where w1 corresponds to Hindi script and w2 corresponds to
the other scripts category.
Script Class Identification
City Block Distance Measure for Script Identification
•In this stage, the test data instance is processed to obtain wavelet
features, and a feature vector Xt is constructed.
•Xt = [tEj1h1 tEj1h2 tEj1h3 tEj1h4 tEj1h5 tEj1v1 tEj1v2
tEj1v3 tEj1v4 tEj3 tLMDj1 tLMDj2 tLMDj3 tLMDj4 tLMDj5
tWRFj1 tWRFj2 tWRFj3 tWRFj4, j=1,2 ]
•Then, the smallest city block distance between test data
instance Xt and data set of each knowledge base is determined
to obtain distances d1, d2, d3, and d4.
Script Class Identification
The smallest distance between test word image and knowledge base is
used to identify the script class.
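The minimum city block distance rule can be sketched as below. The knowledge bases are represented here as small arrays of stored feature vectors, one row per training sample; the toy two-dimensional data, the short labels and the function name are hypothetical and stand in for the real feature vectors.

```python
import numpy as np

def identify_script(x_t, knowledge_bases):
    """Return the script whose knowledge base holds the nearest stored vector.

    Distance is the city block (L1) distance: the sum of absolute differences.
    """
    x_t = np.asarray(x_t, dtype=float)
    distances = {
        name: float(np.abs(np.asarray(kb, dtype=float) - x_t).sum(axis=1).min())
        for name, kb in knowledge_bases.items()
    }
    best = min(distances, key=distances.get)
    return best, distances

# Hypothetical toy knowledge bases with two stored vectors each.
kbs = {
    "KAN": [[0.0, 0.0], [1.0, 1.0]],
    "ENG": [[5.0, 5.0], [6.0, 6.0]],
}
label, dists = identify_script([0.9, 1.2], kbs)
```

Each knowledge base contributes its smallest distance to the test vector, and the script with the overall smallest value is selected.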
Results and Analysis
The effectiveness of proposed methodology for
script identification using wavelet features has
been evaluated for 33683 low resolution word
images. The images were captured from display
boards of government offices in India.
The proposed methodology has produced good
results for low resolution word images
containing text of different size, font, and
alignment with varying background.
Results and Analysis
The approach also identifies script of small skewed text regions.
Hence, the proposed method is robust and achieves an identification
accuracy of 92% for Kannada Script, 97.65% for English, 82.5%
for Malayalam and 87% for Tamil Script.
A closer examination of results revealed that misclassifications
arise due to minimal information content, noise and larger skew,
which affect the texture of region of text and performance of the
texture based approach.
It is also found that, if the knowledge bases are trained for all
variations and degradations, better performance can be obtained.
Useful Links
Matlab wavelet tool using guide
http://www.wavelet.org
http://www.multires.caltech.edu/teaching/
http://www-dsp.rice.edu/software/RWT/
www.multires.caltech.edu/teaching/courses/waveletcourse/sig95.course.pdf
http://www.amara.com/current/wavelet.html