Pattern Recognition
1. Pattern Recognition Principles
by Julius T. Tou and Rafael C. Gonzalez
2. Pattern Recognition and Machine Learning
by Christopher M. Bishop
3. Pattern Classification
by Richard Duda, Peter Hart, and David Stork
Introduction
Pattern Recognition (PR) is an important
aspect of Artificial Intelligence (AI).
What is AI?
AI is the part of computer science
concerned with designing intelligent
computer systems, i.e., systems that
exhibit the characteristics we associate
with intelligence in human behavior:
understanding language, learning,
reasoning, solving problems, etc.
Some topics of AI
Natural Language Processing, to enable a
system to communicate successfully in
natural languages.
Knowledge Representation, to store what
the system knows.
Automated Reasoning, to use stored
information to answer questions and to
draw new conclusions.
Contd…
Machine Learning, to adapt to new
circumstances and to detect patterns.
Computer Vision, to perceive objects.
Robotics, to manipulate objects and
move about.
❖ Note: Every aspect of human
intelligence involves recognition
and/or learning.
Some Examples of Learning…
When we see, we recognize or learn.
When we hear, we recognize or learn.
When we touch, we recognize or learn.
When we smell, we recognize or learn.
When we taste, we recognize or learn.
Contd…
In fact, whatever we do with our five
sense organs involves a kind of
recognition or learning.
Thus recognition is an important part of
human intelligence, and accordingly
pattern recognition has become an
important part of Artificial Intelligence.
The problem of pattern recognition can
be divided into two parts:
Part-1…
➢ First part: the study of the mechanisms
by which humans and other living
organisms recognize patterns. This part
is related to disciplines like physiology,
psychology, and biology.
Part-2…
➢ Second part: the development of theory
and techniques for designing devices
that can perform these recognition
tasks automatically. This area is related
to engineering, computer science, and
information sciences. In this curriculum,
we shall deal with the second part, i.e.,
the problems of automatic machine
recognition of patterns.
Role of memory in pattern
recognition
There are two aspects of the memory in
this process.
• First, the part which holds the
information we can recall, e.g., a poem,
a face, a vocabulary, a theorem, etc. To
use computer terminology, this form of
memory is addressable, and its content
can be recalled. In this type of problem,
PR is a two-step process:
➢ Knowledge representation
➢ Searching
Contd…
• Secondly, the information which
presumably is stored somewhere but
which we cannot retrieve: for example,
we cannot describe how we balance
when we walk, how we recognize
speech, or how we drive a car, although
the information must be stored
somewhere in our brain and nervous
system.
Contd…
This form of memory is not addressable,
and its content cannot be recalled. In
this case, the PR process is a two-fold
task:
➢ Developing decision rules based on
previous knowledge (Learning)
➢ Using them to take decisions regarding
unknown patterns (Classification)
Some of the important Pattern
Recognition application areas
1) MAN-MACHINE COMMUNICATION -
(a) Automatic Speech Recognition,
(b) Speaker Identification,
(c) OCR Systems,
(d) Cursive Script Recognition
(e) Image Understanding.
Contd…
2) BIOMEDICAL Applications-
(a) ECG, EEG, EMG Analysis,
(b) Cytological, Histological and other
Stereological Applications,
(c) X-ray Analysis
(d) Medical Diagnostics.
Contd…
3) APPLICATIONS IN PHYSICS -
(a) High Energy Physics,
(b) Bubble Chamber and other Forms of
Track Analysis.
4) CRIME AND CRIMINAL
DETECTION-
Contd…
(a) Fingerprints,
(b) Handwriting,
(c) Speech Sounds, and
(d) Photographs.
5) NATURAL RESOURCES STUDY
AND ESTIMATION -
Contd…
(a) Agriculture,
(b) Hydrology,
(c) Forestry,
(d) Geology,
(e) Environment,
(f) Cloud Pattern,
(g) Urban Quality
Contd…
6) STEREOLOGICAL
APPLICATIONS-
(a) Metal Processing,
(b) Mineral Processing, and
(c) Biology.
7) MILITARY APPLICATIONS -
Contd…
(a) Detection of Nuclear Explosions,
(b) Missile Guidance and Detection,
(c) Radar and Sonar Signal Detection,
(d) Target Identification,
(e) Naval Submarine Detection, etc.
A simple problem
To recognize the voices of unknown
speakers.
What is a Pattern?
A pattern is a description of an object
that we are going to recognize.
➢ Description means a set of
measurements
➢ But not all of these measurements are
significant in the context of
recognition
Example
Given a set of professional basketball
players and wrestlers, how do we
recognize whether an individual from
the set plays one game or the other?
➢ Measurements: Age, Qualification,
Height, Weight
➢ Basis of feature selection: basketball
players are taller and slimmer, whereas
wrestlers are comparatively shorter and
heavier
➢ Features: Height, Weight
Pattern vector or Feature
vector
When each feature is considered as a
component of a vector, it is called a
Pattern vector or Feature vector.
Feature Space
A pattern vector is represented as a
point in n-dimensional Euclidean space.
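As a minimal illustration in Python (all numbers below are invented for the basketball-player/wrestler example above), each pattern vector can be stored as a point in a two-dimensional feature space:

```python
import numpy as np

# Hypothetical (height in cm, weight in kg) feature vectors, invented
# for the basketball-player / wrestler example above.
basketball_players = np.array([[201.0, 95.0], [198.0, 90.0]])
wrestlers = np.array([[172.0, 120.0], [168.0, 110.0]])

# Each row is a pattern (feature) vector, i.e. a point in 2-D feature space.
print(basketball_players.shape)  # (2, 2): two patterns, two features each
```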
Decision Space
When different regions of the feature
space are identified as belonging to
different classes, the space is called a
decision space.
Contd…
Operating stages in a
Pattern Classifier
Steps in a typical PR System:
Physical System → Measurement Space → Feature Space → Decision Space
Classification of PR
Pattern Recognition is first divided into Supervised and Unsupervised approaches; Unsupervised approaches are further divided into Non-Hierarchical (Partitional) and Hierarchical methods.
Different approaches in PR
techniques/methodologies
❑ Pattern Classification by Decision Functions
❑ Pattern Classification by Distance Functions
❑ Pattern Classification by Likelihood Functions
❑ Trainable Pattern Classifiers: The Deterministic Approach
❑ Trainable Pattern Classifiers: The Statistical Approach
❑ Syntactic Pattern Recognition
❑ Pattern Preprocessing and Feature Selection
Introduction to Decision Function
[Figure: two classes c1 and c2 in the (x1, x2) plane, separated by the line x2 = x1]
The separating line x2 = x1 can be written as
w1 x1 + w2 x2 + w3 = 0 … (1)
where w1 = 1, w2 = -1, w3 = 0.
❏ Let d(X) = w1 x1 + w2 x2 + w3. Then:
❏ If d(X) > 0, then X ∈ c1
❏ If d(X) < 0, then X ∈ c2
❏ If d(X) = 0, decide arbitrarily
➢ This is the equation of a straight line
separating two classes in 2-D
➢ If the two classes are in 3-D, we need a plane to
separate them:
w1 x1 + w2 x2 + w3 x3 + w4 = 0
Introduction to Decision
Function
❏ If the two classes are in a feature space of dimensionality
greater than 3 (ℝn with n > 3), then we need a hyperplane
to separate them:
w1 x1 + w2 x2 + w3 x3 + … + wn xn + wn+1 = 0
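A minimal sketch of this sign rule, using the line x2 = x1 from the earlier slide (w1 = 1, w2 = -1, w3 = 0):

```python
import numpy as np

def decide(w, x):
    """Two-class linear decision rule from the slides:
    d(X) = w1*x1 + ... + wn*xn + w_{n+1}.
    Class c1 if d(X) > 0, c2 if d(X) < 0, arbitrary if d(X) = 0."""
    d = np.dot(w[:-1], x) + w[-1]
    if d > 0:
        return "c1"
    if d < 0:
        return "c2"
    return "tie: decide arbitrarily"

# The line x2 = x1 from the earlier slide: w1 = 1, w2 = -1, w3 = 0.
w = np.array([1.0, -1.0, 0.0])
print(decide(w, np.array([3.0, 1.0])))  # d = 3 - 1 = 2 > 0, so c1
```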
Pattern Classification by
Decision Functions
Let C1, C2, …, Cj, …, Cm be designated as
the m possible pattern classes in an
N-dimensional feature space, and let
X = [x1, x2, …, xn, …, xN]'
Contd…
be the unknown pattern vector, where
xn represents the nth feature
measurement.
❑ If the pattern X is a member of
class Ck, the decision function dk(X)
associated with the class
Ck, k = 1, 2, …, m, must then possess
the largest value. In other words,
Contd…
Decide X ∈ Ck if dk(X) > dj(X) for all j ≠ k,
k, j = 1, 2, …, m.
Ties are resolved arbitrarily. The
decision boundary in the N-dimensional
feature space ΩX, between the regions
associated with classes Ck and Cj,
would be governed by the expression
Contd…
dk(X) - dj(X) = 0, with k ≠ j, k, j = 1, 2, …, m.
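A minimal sketch of this maximum-value rule (the decision values below are hypothetical):

```python
import numpy as np

def classify(decision_values):
    """Assign X to the class Ck whose decision function dk(X) is largest.
    Ties are resolved arbitrarily (here, by taking the first maximum)."""
    return int(np.argmax(decision_values)) + 1  # classes numbered 1..m

# Hypothetical values d1(X), d2(X), d3(X) for one unknown pattern X.
print(classify([0.2, 1.7, -0.5]))  # prints 2, i.e. decide X ∈ C2
```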
❑ Let us now consider two hypothetical
pattern classes in R2 as shown below
Contd…
[Figure: A simple decision function for two pattern classes]
Contd…
Let d(X) = w1x1 + w2x2 + w3 be the
equation of a separating line, where
the w's are parameters and x1, x2 are
the general coordinate variables.
❑ It is clear from the figure that
X ∈ C1 (or ω1) if d(X) > 0, and
X ∈ C2 (or ω2) if d(X) < 0.
Contd…
The success of this pattern
classification scheme depends on two
factors:
◼ The form of d(X)
◼ One's ability to determine its
coefficients
The first problem is directly related to
the geometrical properties of the pattern
classes under consideration. Unless
some a priori information is available,
Contd…
the only way to establish the effectiveness
of a chosen decision function is by direct
trial.
❑ Once a certain function (or functions if
more than two classes are involved) has
been selected, the problem becomes the
determination of the coefficients. Several
adaptive and training schemes exist that
can solve this problem.
Contd…
Sometimes, sample patterns can be
utilized to determine the coefficients
that characterize an already specified
decision function.
General n-dimensional
form of decision function
A general linear decision function is of
the form
d(X) = w1x1 + w2x2 + … + wnxn + wn+1 = W0'X + wn+1
where W0 = (w1, w2, …, wn)' is referred
to as the weight or parameter vector.
Contd…
It is a widely accepted convention to
append a 1 after the last component of
all pattern vectors, giving
d(X) = W'X
where X = (x1, x2, …, xn, 1)' and
W = (w1, w2, …, wn, wn+1)' are called the
augmented pattern and weight vectors,
respectively. Since the same quantity is
equally appended to all patterns, the
basic geometrical properties of the
pattern classes are not disturbed.
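A minimal sketch of the augmentation convention (the weight values below are hypothetical):

```python
import numpy as np

def augment(x):
    """Append a 1 after the last component, giving the augmented pattern."""
    return np.append(x, 1.0)

# Augmented weight vector W = (w1, ..., wn, w_{n+1})'; values are hypothetical.
W = np.array([2.0, -1.0, 0.5])   # n = 2
x = np.array([1.0, 3.0])

d = np.dot(W, augment(x))        # d(X) = W'X with the augmented pattern
print(d)                         # 2*1 - 1*3 + 0.5 = -0.5
```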
Contd…
In the two-class case a decision
function d(X) is assumed to have the
property
d(X) = W'X > 0 if X ∈ ω1, and
d(X) = W'X < 0 if X ∈ ω2.
When we have more than two classes,
denoted by ω1, ω2, …, ωM, we consider
the following multi-class cases.
Contd…
Case 1. Each pattern class is
separable from the other classes by a
single decision surface. In this case
there are M decision functions with
the property
di(X) = Wi'X > 0 if X ∈ ωi, and di(X) < 0 otherwise, for i = 1, 2, …, M
Contd…
where Wi = (wi1, wi2, …, win, wi,n+1)' is the weight
vector associated with the ith decision function.
Case 2. Each pattern class is separable from
every other individual class by a distinct
decision surface, that is, the classes are
pairwise separable. In this case there are
M(M-1)/2 decision surfaces (the number of
combinations of M classes taken two at a
time). The decision functions here are of the
form dij(X) = Wij'X and have the property that,
Contd…
if X belongs to class ωi, then dij(X) > 0 for all j ≠ i.
These functions also have the property that
dij(X) = -dji(X).
It is not uncommon to find problems involving
a combination of cases 1 and 2. These
situations require fewer than the M(M-1)/2
decision surfaces that would be needed if all
the classes were only pairwise separable.
Example 1
A simple example of multi-class case 1
is shown in the following figure.
Contd…
It is noted that each class is separable from
the rest by a single decision boundary.
❑ If X belongs to class ω1, then d1(X) > 0 while
d2(X) < 0 and d3(X) < 0
❑ The boundary between class ω1 and the other
classes is given by the values of x for which
d1(X)=0
❑ As a numerical illustration, assume the
decision functions of the figure above to be
d1(X) = -x1 + x2,  d2(X) = x1 + x2 - 5,  d3(X) = -x2 + 1
Contd…
❑ The three decision boundaries are,
therefore,
-x1+x2=0, x1+x2-5=0, -x2+1=0
❑ For example, suppose that it is
desired to classify the pattern
X = (6, 5)'. Substituting this pattern
into the three decision functions yields
Contd…
d1(X) = -1, d2(X) = 6, d3(X) = -4
❑ Since d2(X) > 0 while d1(X) < 0 and
d3(X) < 0, the pattern is assigned to
class ω2
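This case-1 assignment can be checked with a short Python sketch (same decision functions and test pattern as above):

```python
import numpy as np

# Decision functions from the example (multi-class case 1):
# d1(X) = -x1 + x2,  d2(X) = x1 + x2 - 5,  d3(X) = -x2 + 1
def d1(x): return -x[0] + x[1]
def d2(x): return x[0] + x[1] - 5
def d3(x): return -x[1] + 1

x = np.array([6.0, 5.0])
values = [d1(x), d2(x), d3(x)]
print(values)  # [-1.0, 6.0, -4.0]

# Only d2(X) > 0, so the pattern is assigned to class w2, as in the text.
positive = [i + 1 for i, v in enumerate(values) if v > 0]
print(positive)  # [2]
```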
Example 2
The following Figure illustrates three
pattern classes separable under case 2
conditions.
[Figure: three pattern classes with pairwise decision boundaries d12(x) = 0, d13(x) = 0, and d23(x) = 0]
Contd…
Here no class is separable from the
others by a single decision surface;
each boundary shown is capable of
separating just two classes.
❑ Let us assume the following
numerical values:
d12(X)=-x1-x2+5, d13(X)=-x1+3,
d23(X)=-x1+x2
Contd…
❑ The decision boundaries are again
determined by setting the decision
function equal to zero. The decision
regions, however, are now given by
the positive sides of multiple decision
boundaries.
❑ For example, if X belongs to class ω1,
then d12(X) > 0 and d13(X) > 0
Contd…
❑ The value of d23(X) in this region is
irrelevant, since d23(X) is not related
to class ω1.
Suppose that it is desired to classify
the pattern X = (4, 3)'. Substituting
this pattern into the above decision
functions yields
d12(X) = -2, d13(X) = -1, d23(X) = -1
Contd…
❑ Since these functions have the
property that dij(X) = -dji(X), it
follows that d21(X) = 2, d31(X) = 1,
d32(X) = 1.
❑ That is, d3j(X) > 0 for j = 1, 2.
❑ Therefore we assign the pattern to
class ω3.
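The case-2 assignment can likewise be checked with a short sketch; the helper d(i, j, x) below is hypothetical and encodes the three given functions together with the property dij(X) = -dji(X):

```python
import numpy as np

# Pairwise decision functions from the example (multi-class case 2):
# d12(X) = -x1 - x2 + 5,  d13(X) = -x1 + 3,  d23(X) = -x1 + x2
def d(i, j, x):
    table = {(1, 2): -x[0] - x[1] + 5,
             (1, 3): -x[0] + 3,
             (2, 3): -x[0] + x[1]}
    if (i, j) in table:
        return table[(i, j)]
    return -d(j, i, x)  # property dij(X) = -dji(X)

x = np.array([4.0, 3.0])
for i in (1, 2, 3):
    # X belongs to class wi if dij(X) > 0 for all j != i.
    if all(d(i, j, x) > 0 for j in (1, 2, 3) if j != i):
        print("assign to class w%d" % i)  # prints: assign to class w3
```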
Geometrical properties of linear
decision functions (Hyperplane
Properties)
In the two-class problem, as well as
multi-class cases 1 and 2, the
equation of the surface separating the
pattern classes is obtained by letting
the decision functions be equal to
zero. In other words, in the two-class
case the surface between the two
pattern populations is given by the
equation
Contd…
d(X) = w1x1 + w2x2 + … + wnxn + wn+1 = 0 ……(1)
❑ In case 1, the equation of the boundary
between ωi and the remaining classes is
given by
di(X) = wi1x1 + wi2x2 + … + winxn + wi,n+1 = 0 ……(2)
Contd…
❑ Similarly, in case 2, the boundary
between ωi and ωj is given by
dij(X) = wij1x1 + wij2x2 + … + wijnxn + wij,n+1 = 0
❑ In general, the equation of the
decision surface between classes ωi
and ωj is given by
d(X) = w1x1 + w2x2 + … + wnxn + wn+1
     = W0'X + wn+1 = 0 ……(3)
where W0 = (w1, w2, …, wn)'.
Equation (3) is recognized as the
equation of a line when n=2 and as
the equation of a plane when n=3.
When n>3, Eq. (3) is the equation of
a hyperplane.
Contd…
A hyperplane is schematically
shown in the following figure.
[Figure 1: Some geometrical properties of hyperplanes; the hyperplane is W0'X + wn+1 = 0]
Contd…
Let u be a unit normal to the
hyperplane at some point p,
oriented toward the positive side of
the hyperplane. From geometrical
considerations, the equation of the
hyperplane may be written as
u'(X - p) = 0 ……(4)
or
u'X = u'p ……(5)
Contd…
Dividing Eq. (3) by ||W0|| = √(w1² + w2² + … + wn²)
results in the equation
W0'X / ||W0|| = -wn+1 / ||W0|| ……(6)
Comparing Eqs. (5) and (6), we see that the
unit normal to the hyperplane is given by
Contd…
u = W0 / ||W0|| ……(7)
Also,
u'p = -wn+1 / ||W0|| ……(8)
Contd…
It is seen by comparing Fig. 1 and Eq.
(8) that the absolute value of u’p
represents the normal distance from
the origin to the hyperplane.
Denoting this distance by Du, we
obtain
Du = |wn+1| / ||W0|| ……(9)
Contd…
Fig. 1 also reveals that the
normal distance Dx from the
hyperplane to an arbitrary point X is
given by
Dx = u'X - u'p = W0'X/||W0|| + wn+1/||W0|| = (W0'X + wn+1)/||W0||
Contd…
The unit normal u indicates the orientation
of the hyperplane. If any component of u is
zero, the hyperplane is parallel to the
coordinate axis corresponding to that
component. Therefore, since u = W0/||W0||,
it is possible to tell by inspection of the
vector W0 whether a particular hyperplane
is parallel to any of the coordinate axes. We
also see from Eq. (9) that if wn+1 = 0, the
hyperplane passes through the origin.
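A minimal sketch of Eqs. (7)-(9) and the point-to-hyperplane distance (the hyperplane below, x1 + x2 - 2 = 0, is a hypothetical example):

```python
import numpy as np

def hyperplane_geometry(W0, w_np1, x):
    """Geometric quantities from Eqs. (7)-(9): unit normal u, distance Du
    from the origin to the hyperplane W0'X + w_{n+1} = 0, and signed
    distance Dx from the hyperplane to an arbitrary point x."""
    norm = np.linalg.norm(W0)          # ||W0|| = sqrt(w1^2 + ... + wn^2)
    u = W0 / norm                      # Eq. (7)
    Du = abs(w_np1) / norm             # Eq. (9)
    Dx = (np.dot(W0, x) + w_np1) / norm
    return u, Du, Dx

# Hypothetical hyperplane x1 + x2 - 2 = 0 in R^2, tested at the point (2, 2).
u, Du, Dx = hyperplane_geometry(np.array([1.0, 1.0]), -2.0,
                                np.array([2.0, 2.0]))
print(u, Du, Dx)  # u ≈ [0.707 0.707], Du ≈ 1.414, Dx ≈ 1.414
```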
Generalized Decision Functions
A generalized form of the decision
function can be defined as :
d(X) = w1 f1(X) + w2 f2(X) + … + wK fK(X) + wK+1
     = Σ (i = 1 to K+1) wi fi(X) ……(1)
Contd…
where the {fi(X)}, i = 1, 2, …, K, are real,
single-valued functions of the pattern
X, fK+1(X) = 1, and K+1 is the number
of terms used in the expansion.
Eq. (1) represents an infinite variety
of decision functions, depending on
the choice of the functions {fi(X)} and
on the number of terms used in the
expansion.
Contd…
In spite of the fact that the above
equation could represent very
complex decision functions, it is
possible to treat these functions as
linear by virtue of a transformation.
For that we define a vector X* whose
components are the functions fi(X),
that is,
Contd…
X* = (f1(X), f2(X), …, fK(X), 1)' ……(2)
Using Eq. (2), we may express Eq. (1) as
d(X) = W'X* ……(3)
Contd…
where W = (w1, w2, …, wK, wK+1)'.
❑ Once evaluated, the functions {fi(X)} are
nothing more than a set of numerical
values, and X* is simply a K-dimensional
vector which has been augmented by a 1.
Therefore, Eq. (3) represents a linear
function with respect to the new patterns
X*. Thus any decision function of the form
shown in Eq. (1) can be treated as linear by
virtue of this transformation.
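As a minimal sketch (the quadratic expansion below is one hypothetical choice of the functions {fi(X)}), a nonlinear decision function becomes linear in the transformed pattern X*:

```python
import numpy as np

def to_x_star(x):
    """Map a 2-D pattern X to X* = (f1(X), ..., fK(X), 1)' using a
    hypothetical quadratic expansion f = (x1^2, x2^2, x1*x2, x1, x2)."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, x1 * x2, x1, x2, 1.0])

# d(X) = W'X* is linear in X*, even though the boundary below
# (the circle x1^2 + x2^2 = 4) is quadratic in the original pattern X.
W = np.array([1.0, 1.0, 0.0, 0.0, 0.0, -4.0])
x = np.array([1.0, 1.0])
print(np.dot(W, to_x_star(x)))  # 1 + 1 - 4 = -2, i.e. inside the circle
```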
Thank You