Unit – V
Pattern Recognition
Dr. K. Sampath Kumar
SCSE/GU
Introduction to Pattern Recognition
The applications of Pattern Recognition can be found everywhere. Examples include disease categorization, prediction of survival rates for patients with a specific disease, fingerprint verification, face recognition, iris discrimination, chromosome shape discrimination, optical character recognition, texture discrimination, and speech recognition.
• Pattern recognition is a branch of machine
learning that focuses on the recognition of patterns
and regularities in data.
• Pattern recognition systems are in many cases
trained from labeled "training" data (supervised
learning), but when no labeled data are available
other algorithms can be used to discover previously
unknown patterns (unsupervised learning).
• Machine learning is the common term for supervised learning methods and originates from artificial intelligence, whereas KDD (knowledge discovery in databases) and data mining have a larger focus on unsupervised methods and a stronger connection to business use.
• Pattern recognition has its origins in engineering, and
the term is popular in the context of computer vision:
a leading computer vision conference is
named Conference on Computer Vision and Pattern
Recognition.
Pattern Recognition?
“The assignment of a physical object or event to one of
several pre-specified categories” -- Duda & Hart
•A pattern is an object, process or event.
•A class (or category) is a set of patterns that share common attributes (features), usually from the same information source.
•During recognition (or classification), classes are assigned to the objects.
•A classifier is a machine that performs such a task.
Pattern Recognition Phases
• Preprocessing
• Classification
A Complete PR System
Problem formulation: Input object → Measurements → Preprocessing → Features → Classification → Class label
Basic ingredients:
•Measurement space (e.g., image intensity, pressure)
•Features (e.g., corners, spectral energy)
•Classifier (soft or hard)
•Decision boundary
•Training sample
•Probability of error
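To make these ingredients concrete, here is a minimal end-to-end sketch of such a system in Python; the synthetic measurements and the nearest-mean classifier are illustrative assumptions, not part of the slides.

```python
import numpy as np

# A minimal sketch of the pipeline: input object -> measurements ->
# preprocessing -> features -> classification -> class label.
# The synthetic data and the nearest-mean classifier are assumptions.

rng = np.random.default_rng(0)

# Training sample: 2-D measurements for two classes
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))  # class w1
X2 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(50, 2))  # class w2

def features(x):
    # Preprocessing/feature extraction placeholder: identity here;
    # in practice, e.g., corners or spectral energy.
    return np.asarray(x, dtype=float)

# "Training": store the class means. The decision boundary of this hard
# classifier is the perpendicular bisector between the two means.
mu1 = features(X1).mean(axis=0)
mu2 = features(X2).mean(axis=0)

def classify(x):
    """Assign a class label to an input object (hard decision)."""
    f = features(x)
    return "w1" if np.linalg.norm(f - mu1) <= np.linalg.norm(f - mu2) else "w2"

print(classify([0.5, 0.2]))  # -> w1
print(classify([2.8, 3.1]))  # -> w2
```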
A Pattern Recognition Paradigm
Texture Discrimination
Shape Discrimination
Optical Character Recognition
Face Recognition & Discrimination
Are They From the Same Person?
Statistical Pattern Recognition
Outline
• Basic Probability Theory
• Bayesian Decision Theory
Probability theory
Probability is a mathematical model to help us study
physical systems in an ‘average’ sense
Kinds of probability
• Classical: ratio of favorable outcomes to the total number of outcomes
$P(E) = \dfrac{N_E}{N}$
• Relative frequency: measure of frequency of occurrence
$P(E) = \lim_{N \to \infty} \dfrac{N_E}{N}$
• Axiomatic theory of probability
Probability Theory
Conditional probability: the probability of B given A is
$P(B \mid A) = \dfrac{P(AB)}{P(A)}, \quad P(A) \ne 0$
Since $P(AB) = P(BA)$,
$P(A \mid B)\, P(B) = P(B \mid A)\, P(A)$
or, $P(A \mid B) = \dfrac{P(B \mid A)\, P(A)}{P(B)}$  (Bayes theorem)
• Unconditional (total) probability: let A1, A2, …, AC be mutually exclusive events such that $\bigcup_{i=1}^{C} A_i = \Omega$; then for any event B,
$P(B) = \sum_{i=1}^{C} P(B \mid A_i)\, P(A_i)$
and hence, for any j,
$P(A_j \mid B) = \dfrac{P(B \mid A_j)\, P(A_j)}{\sum_{i=1}^{C} P(B \mid A_i)\, P(A_i)}$
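A quick numeric check of these formulas on a made-up two-class example (say, disease status A1/A2 and a positive test B); all numbers are illustrative assumptions.

```python
# Total probability and Bayes theorem on a made-up 2-class example:
# A1 = sick, A2 = healthy, B = positive test. Numbers are assumptions.

P_A = [0.01, 0.99]          # priors P(A1), P(A2): mutually exclusive, sum to 1
P_B_given_A = [0.95, 0.05]  # likelihoods P(B|A1), P(B|A2)

# Unconditional probability: P(B) = sum_i P(B|Ai) P(Ai)
P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))

# Bayes theorem: P(A1|B) = P(B|A1) P(A1) / P(B)
P_A1_given_B = P_B_given_A[0] * P_A[0] / P_B

print(f"P(B)    = {P_B:.4f}")           # 0.0590
print(f"P(A1|B) = {P_A1_given_B:.4f}")  # 0.1610
```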
Random variables
• Expected value: $E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$
• Conditional expectation: $E[X \mid B] = \int_{-\infty}^{\infty} x\, f_{X \mid B}(x)\, dx$
• (Central) moments: $m_{ij} = E\big[(X - E[X])^i (Y - E[Y])^j\big]$
• Variance: $\operatorname{Var}[X] = E\big[(X - E[X])^2\big]$
• Covariance: $\operatorname{cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]\,E[Y]$
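These definitions have direct sample-based estimates. A minimal NumPy sketch, where the bivariate normal parameters are illustrative assumptions:

```python
import numpy as np

# Estimate expectation, variance and covariance from samples and compare
# with E[XY] - E[X]E[Y]. The distribution parameters are assumptions.

rng = np.random.default_rng(1)
mean = [1.0, -2.0]
cov = [[2.0, 0.8],
       [0.8, 1.0]]
X, Y = rng.multivariate_normal(mean, cov, size=100_000).T

E_X, E_Y = X.mean(), Y.mean()            # E[X], E[Y]
var_X = ((X - E_X) ** 2).mean()          # Var[X] = E[(X - E[X])^2]
cov_XY = ((X - E_X) * (Y - E_Y)).mean()  # E[(X - E[X])(Y - E[Y])]

print(var_X)                       # ~2.0
print(cov_XY)                      # ~0.8
print((X * Y).mean() - E_X * E_Y)  # E[XY] - E[X]E[Y], same as cov_XY
```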
Joint Random Variables
• X and Y are random variables defined on the same sample
space
Joint distribution function:
$F_{XY}(x, y) = P(X \le x,\ Y \le y)$
Joint probability density function:
$f_{XY}(x, y) = \dfrac{\partial^2}{\partial x\, \partial y} F_{XY}(x, y)$
so that
$F_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(u, v)\, dv\, du$
Marginal Density Functions
Since the event $(X \le x)$ equals $(X \le x) \cap (Y \le \infty)$,
$F_X(x) = P(X \le x) = P(X \le x,\ Y \le \infty) = F_{XY}(x, \infty)$
$f_X(x) = \dfrac{d}{dx} F_X(x) = \dfrac{d}{dx} F_{XY}(x, \infty) = \dfrac{d}{dx} \int_{-\infty}^{x} \left[ \int_{-\infty}^{\infty} f_{XY}(u, y)\, dy \right] du = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy$
Similarly, $f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx$
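A small numerical check of the marginalization formula, using an assumed joint density of two independent standard normals so the answer is known in closed form:

```python
import numpy as np
from scipy import integrate, stats

# Check f_X(x) = integral of f_XY(x, y) dy for an assumed joint density:
# X ~ N(0,1) and Y ~ N(0,1) independent, so f_XY(x,y) = f_X(x) f_Y(y).

def f_XY(x, y):
    return stats.norm.pdf(x) * stats.norm.pdf(y)

x = 0.7
marginal, _ = integrate.quad(lambda y: f_XY(x, y), -np.inf, np.inf)

print(marginal)           # ~0.3123
print(stats.norm.pdf(x))  # closed-form f_X(0.7), same value
```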
Bayesian Decision Theory
• Consider C classes w1, …, wC, with a priori probabilities P(w1), …, P(wC), assumed known.
• To minimize the error probability, with no extra information, we would assign a pattern to class wj if
$P(w_j) > P(w_k), \quad k = 1, \dots, C;\ k \ne j$
Bayesian Decision Theory
• If we have an observation vector x, considered to be a random variable whose distribution is given by p(x|w), then assign x to class wj if
$P(w_j \mid \mathbf{x}) > P(w_k \mid \mathbf{x}), \quad k = 1, \dots, C;\ k \ne j$  (MAP rule)
or,
$\dfrac{p(\mathbf{x} \mid w_j)\, P(w_j)}{p(\mathbf{x})} > \dfrac{p(\mathbf{x} \mid w_k)\, P(w_k)}{p(\mathbf{x})}$
or,
$p(\mathbf{x} \mid w_j)\, P(w_j) > p(\mathbf{x} \mid w_k)\, P(w_k), \quad k = 1, \dots, C;\ k \ne j$
For the 2-class case, the decision rule is the likelihood ratio test:
$L(\mathbf{x}) = \dfrac{p(\mathbf{x} \mid w_1)}{p(\mathbf{x} \mid w_2)} > \dfrac{P(w_2)}{P(w_1)} \ \Rightarrow\ \mathbf{x} \in w_1$
Bayesian Decision Theory
Likelihood Ratio Test
[Figure: likelihood ratio test example with p(x|w1) = N(0,1), p(x|w2) = 0.6N(1,1) + 0.4N(-1,2), and P(w1) = P(w2) = 0.5. Top panel: the weighted densities p(x|w1)P(w1) and p(x|w2)P(w2) versus x. Bottom panel: the likelihood ratio L(x) versus x against the threshold P(w2)/P(w1).]
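The test in this figure can be reproduced numerically. A sketch using SciPy, assuming N(m, v) denotes a normal with mean m and variance v:

```python
import numpy as np
from scipy.stats import norm

# Likelihood ratio test for the slides' example:
#   p(x|w1) = N(0,1),  p(x|w2) = 0.6 N(1,1) + 0.4 N(-1,2),
#   P(w1) = P(w2) = 0.5.
# Assumes N(m, v) means mean m and variance v, so N(-1,2) has sd sqrt(2).

P_w1 = P_w2 = 0.5

def p_x_given_w1(x):
    return norm.pdf(x, loc=0.0, scale=1.0)

def p_x_given_w2(x):
    return 0.6 * norm.pdf(x, loc=1.0, scale=1.0) + \
           0.4 * norm.pdf(x, loc=-1.0, scale=np.sqrt(2.0))

def decide(x):
    """Assign w1 when L(x) = p(x|w1)/p(x|w2) exceeds P(w2)/P(w1)."""
    L = p_x_given_w1(x) / p_x_given_w2(x)
    return "w1" if L > P_w2 / P_w1 else "w2"

for x in (-2.0, 0.0, 2.0):
    print(x, decide(x))
```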
Probability of error
$P(e \mid \mathbf{x}) = \sum_{i=1,\ i \ne j}^{C} P(w_i \mid \mathbf{x}) = 1 - P(w_j \mid \mathbf{x})$
which is minimized when P(wj|x) is maximum.
The average probability of error is
$P(e) = \int P(e \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$
For every x we ensure that P(e|x) is minimum, so that the integral is as small as possible.
Conditional Risk & Bayes’ Risk
• Loss: a measure of the cost of making an error;
$\lambda(a_i \mid w_j)$ = cost of assigning a pattern x to wi when x ∈ wj
• Conditional risk: the overall risk of choosing action ai given x is
$R(a_i \mid \mathbf{x}) = \sum_{j=1}^{C} \lambda(a_i \mid w_j)\, P(w_j \mid \mathbf{x})$
Assuming the zero-one loss $\lambda(a_i \mid w_j) = 0$ if $i = j$ and $1$ if $i \ne j$,
$R(a_i \mid \mathbf{x}) = \sum_{j \ne i} P(w_j \mid \mathbf{x}) = 1 - P(w_i \mid \mathbf{x})$
To minimize the average probability of error, choose the i that maximizes the a posteriori probability P(wi|x). If the action is chosen such that for every x the overall risk is minimized, the resulting minimum overall risk is called the Bayes’ risk.
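A minimal numeric sketch of the conditional risk; the posteriors and the loss matrix below are illustrative assumptions. It confirms that with the zero-one loss, minimizing risk is the same as choosing the maximum posterior.

```python
import numpy as np

# Conditional risk R(a_i|x) = sum_j loss[i, j] * P(w_j|x).
# The posteriors and the loss matrix are illustrative assumptions.

posteriors = np.array([0.2, 0.7, 0.1])  # P(w_j|x) for C = 3 classes

zero_one = 1.0 - np.eye(3)              # loss(a_i|w_j) = 0 if i == j else 1

risks = zero_one @ posteriors           # R(a_i|x) for each action a_i
print(risks)                            # [0.8 0.3 0.9] = 1 - posteriors
print(np.argmin(risks) == np.argmax(posteriors))  # True: MAP minimizes risk
```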
Bayes decision rule - Reject option
• Partition the sample space into 2 regions, where t is a threshold:
$R = \{\mathbf{x} \mid 1 - \max_i P(w_i \mid \mathbf{x}) > t\}$  (reject region)
$A = \{\mathbf{x} \mid 1 - \max_i P(w_i \mid \mathbf{x}) \le t\}$  (accept region)
• Probability of rejection: $r(t) = \int_R p(\mathbf{x})\, d\mathbf{x}$
R is empty when $t \ge \frac{C-1}{C}$, since $\max_i P(w_i \mid \mathbf{x}) \ge 1/C$.
• Error rate: $e(t) = \int_A \big(1 - \max_i P(w_i \mid \mathbf{x})\big)\, p(\mathbf{x})\, d\mathbf{x}$
[Figure: posteriors P(w1|x) and P(w2|x) versus x; the reject region R lies where both posteriors fall below the level 1 - t, flanked by the accept regions A.]
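A minimal sketch of the reject rule for a two-class problem; the posterior values and the threshold t are illustrative assumptions.

```python
import numpy as np

# Reject option: reject x when 1 - max_i P(w_i|x) > t, otherwise assign
# the MAP class. Posteriors and threshold are illustrative assumptions.

def decide_with_reject(posteriors, t):
    posteriors = np.asarray(posteriors)
    if 1.0 - posteriors.max() > t:        # x falls in the reject region R
        return "reject"
    return f"w{posteriors.argmax() + 1}"  # accept region A: MAP label

t = 0.3
print(decide_with_reject([0.9, 0.1], t))    # confident -> w1
print(decide_with_reject([0.55, 0.45], t))  # ambiguous -> reject
```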