Pattern Recognition
Fisher’s Discriminant Analysis
Dr. Subrata Datta
Dept. of AIML
NSEC
Introduction
• Linear discriminant analysis (LDA), normal discriminant analysis (NDA),
or discriminant function analysis is a generalization of Fisher's linear
discriminant.
• Fisher first formulated the linear discriminant for two classes in 1936;
later, in 1948, C. R. Rao generalized it to multiple classes.
• It is a method used to find a linear combination of features that separates
two or more classes of objects or events.
• LDA is a supervised learning technique.
• LDA is closely related to analysis of variance (ANOVA) and regression
analysis.
• LDA is also closely related to principal component analysis (PCA)
and factor analysis.
• LDA projects data onto a lower-dimensional space that maximizes the
separation between the classes.
Goal of LDA
• The goal of LDA is to project features from a higher-dimensional space onto
a lower-dimensional space, in order to avoid the curse of dimensionality and
to reduce computational cost.
Benefits of LDA
• Logistic regression is one of the most popular linear classification models;
it performs well for binary classification but falls short on multi-class
problems and on well-separated classes, which LDA handles quite efficiently.
• LDA can also be used in data preprocessing to reduce the number of
features, just as PCA is, which reduces the computing cost significantly.
• LDA is also used in face recognition. In the Fisherfaces method, LDA
extracts discriminative features from face images; coupled with Eigenfaces
it produces effective results.
Limitations of LDA
• Linear decision boundaries may not effectively separate non-linearly
separable classes. More flexible boundaries are desired.
• When the number of features exceeds the number of observations, LDA might
not perform as desired. This is called the Small Sample Size (SSS) problem;
regularization is required.
How does LDA work?
Step 1 - Compute the within-class scatter matrix.
Step 2 - Compute the between-class scatter matrix.
Step 3 - Find the projection vectors.
Step 4 - Dimension reduction.
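These four steps are what common libraries do internally. As a minimal illustrative sketch (not part of the original slides, and assuming scikit-learn and NumPy are available), scikit-learn's LinearDiscriminantAnalysis applied to the two-class data of Problem 1 below:

    # Minimal sketch; the data is taken from Problem 1 below.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X = np.array([[4, 1], [2, 4], [2, 3], [3, 6], [4, 4],           # class C1
                  [9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], float) # class C2
    y = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])

    lda = LinearDiscriminantAnalysis(n_components=1)  # project onto one dimension
    z = lda.fit_transform(X, y)                       # steps 1-4 handled internally
    print(lda.scalings_.ravel())                      # learned projection direction
    print(z.ravel())                                  # 1-D projected data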
Problem 1
• Consider two classes C1 and C2. The data points in C1 are (4,1), (2,4),
(2,3), (3,6) and (4,4). The data points in C2 are (9,10), (6,8), (9,5),
(8,7) and (10,8). Classify the data points using Fisher's LDA.
Compute the within-class scatter matrix SW
• Formula of the within-class scatter matrix: SW = S1 + S2
• S1 is the covariance matrix for class C1.
• S2 is the covariance matrix for class C2.
• Formula for the covariance matrices:
  S1 = (1/n1) Σ_{x∈C1} (x - m1)(x - m1)^T
  S2 = (1/n2) Σ_{x∈C2} (x - m2)(x - m2)^T
  where n1 and n2 are the numbers of points in C1 and C2.
• Here, m1 and m2 are the class means of C1 and C2 respectively.
Within-class scatter matrix contd..
Now, m1={(4+2+2+3+4)/5, (1+4+3+6+4)/5} = (3, 3.6)
And m2={(9+6+9+8+10)/5, (10+8+5+7+8)/5} = (8.4, 7.6)
Compute the deviation of each point in C1 from the class mean m1 = (3, 3.6):

Point in C1    x - m1x (= x - 3)    y - m1y (= y - 3.6)
x1 (4,1)        1                   -2.6
x2 (2,4)       -1                    0.4
x3 (2,3)       -1                   -0.6
x4 (3,6)        0                    2.4
x5 (4,4)        1                    0.4
Within-class scatter matrix contd..
Therefore,
S1 = (1/5) { [1 -2.6]^T [1 -2.6] + [-1 0.4]^T [-1 0.4] + [-1 -0.6]^T [-1 -0.6] + [0 2.4]^T [0 2.4] + [1 0.4]^T [1 0.4] }
Ex: [1 -2.6]^T * [1 -2.6] = [[1, -2.6], [-2.6, 6.76]]
In this way we get S1 = [[0.8, -0.4], [-0.4, 2.64]]
Similarly we get for class 2: S2 = [[1.84, -0.04], [-0.04, 2.64]]
Therefore,
SW = S1 + S2 = [[2.64, -0.44], [-0.44, 5.28]]
Between-class scatter matrix
Formula: SB = (m1 - m2)(m1 - m2)^T
Here m1 - m2 = (-5.4, -4), so
SB = [-5.4, -4]^T [-5.4, -4] = [[29.16, 21.6], [21.6, 16]]
Projection vector
Let the components of the projection vector be V1 and V2, so the projection
vector is V = [V1, V2]^T.
Formula for calculation of the projection vector (an eigenvalue problem):
SW^(-1) SB V = λV  ⇒  (SW^(-1) SB - λI) V = 0  ⇒  |SW^(-1) SB - λI| = 0
Now, SW^(-1) = Adj(SW) / det(SW) = Adj([[2.64, -0.44], [-0.44, 5.28]]) / det([[2.64, -0.44], [-0.44, 5.28]])
Projection vector contd..
det(SW) = (2.64)(5.28) - (-0.44)(-0.44) = 13.9392 - 0.1936 = 13.7456
Adj(SW) = [[5.28, 0.44], [0.44, 2.64]]
SW^(-1) = (1/13.7456) [[5.28, 0.44], [0.44, 2.64]] = [[0.3841, 0.0320], [0.0320, 0.1921]]
SW^(-1) SB = [[11.89, 8.81], [5.08, 3.76]]
Projection vector contd..
SW^(-1) SB V = λV
|SW^(-1) SB - λI| = 0  ⇒  |[[11.89, 8.81], [5.08, 3.76]] - λ[[1, 0], [0, 1]]| = 0  ⇒  |[[11.89 - λ, 8.81], [5.08, 3.76 - λ]]| = 0
(11.89 - λ)(3.76 - λ) - 8.81 × 5.08 = 0
λ² - 15.65λ ≈ 0 (the constant term is negligible)  ⇒  λ(λ - 15.65) = 0  ⇒  λ = 15.65
Now, putting the value of λ in the equation of the projection vector:
SW^(-1) SB V = λV
[[11.89, 8.81], [5.08, 3.76]] [V1, V2]^T = 15.65 [V1, V2]^T
After solving, we get W = [V1, V2]^T = [0.91, 0.39]^T
Step 4: Dimension reduction
y = W^T X, where W is the projection vector and X is a data point.
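As a cross-check on the arithmetic of Problem 1, a NumPy sketch (illustrative, not part of the original slides) of Steps 1-4:

    import numpy as np

    # Data of Problem 1.
    C1 = np.array([[4, 1], [2, 4], [2, 3], [3, 6], [4, 4]], float)
    C2 = np.array([[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], float)
    m1, m2 = C1.mean(axis=0), C2.mean(axis=0)   # class means (3, 3.6) and (8.4, 7.6)

    # Step 1: within-class scatter (class covariances, as in the slides).
    S1 = (C1 - m1).T @ (C1 - m1) / len(C1)
    S2 = (C2 - m2).T @ (C2 - m2) / len(C2)
    SW = S1 + S2                                # [[2.64, -0.44], [-0.44, 5.28]]

    # Step 2: between-class scatter.
    d = (m1 - m2).reshape(-1, 1)
    SB = d @ d.T                                # [[29.16, 21.6], [21.6, 16]]

    # Step 3: projection vector = dominant eigenvector of SW^-1 SB.
    vals, vecs = np.linalg.eig(np.linalg.inv(SW) @ SB)
    W = vecs[:, np.argmax(vals.real)].real
    W = W / np.linalg.norm(W)                   # approx [0.92, 0.39], matching W above up to rounding and sign

    # Step 4: dimension reduction y = W^T x for every point.
    print(W, C1 @ W, C2 @ W)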
Problem 2 (LDA)
Problem 2: Factory ‘ABC’ produces rings whose qualities are measured in
terms of Curvature and Diameter. The quality control report is as follows.
Classify the rings using LDA.
Curvature Diameter Quality control report
2.95 6.63 Passed
2.53 7.79 Passed
3.57 5.65 Passed
3.16 5.47 Passed
2.58 4.46 Not Passed
2.16 6.22 Not Passed
3.27 3.52 Not Passed
KNN Classifier
• K-Nearest Neighbour is one of the simplest Machine Learning algorithms,
based on the Supervised Learning technique.
• The K-NN algorithm assesses the similarity between a new case/data point
and the available cases and puts the new case into the category that is
most similar to it.
• The K-NN algorithm stores all the available data and classifies a new data
point based on similarity. This means that when new data appears, it can
easily be assigned to a well-suited category using the K-NN algorithm.
• K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any
assumption on underlying data.
• It is also called a lazy learner algorithm because it does not learn from
the training set immediately; instead it stores the dataset and performs the
computation at classification time.
• At the training phase the KNN algorithm just stores the dataset; when it
receives new data, it classifies that data into the category most similar
to the new data.
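A minimal NumPy sketch of this idea (illustrative only; the helper name knn_classify is mine, not from the slides):

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, query, k=5):
        # X_train and y_train are NumPy arrays of stored points and their labels.
        dists = np.linalg.norm(X_train - query, axis=1)   # Euclidean distance to every stored point
        nearest = np.argsort(dists)[:k]                   # indices of the k nearest neighbours
        return Counter(y_train[nearest]).most_common(1)[0][0]   # majority vote over their labels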
Example of KNN
Suppose we have an image of a creature that looks similar to both a cat and a
dog, and we want to know whether it is a cat or a dog. For this identification
we can use the KNN algorithm, since it works on a similarity measure. Our KNN
model will find the features of the new image most similar to those of the cat
and dog images and, based on the most similar features, will put it in either
the cat or the dog category.
Why KNN Algorithm?
KNN Algorithm
Example of KNN on Iris dataset
Sl # Sepal length Sepal width Species
1 5.3 3.7 Setosa
2 5.1 3.8 Setosa
3 7.2 3.0 Virginica
4 5.4 3.4 Setosa
5 5.1 3.3 Setosa
6 5.4 3.9 Setosa
7 7.4 2.8 Virginica
8 6.1 2.8 Versicolor
9 7.3 2.9 Virginica
10 6.0 2.7 Versicolor
11 5.8 2.8 Virginica
12 6.3 2.3 Versicolor
13 5.1 2.5 Versicolor
14 6.3 2.5 Versicolor
15 5.5 2.4 Versicolor
Question:
Find out the species for Sepal length 5.2 and
Sepal width 3.1.
q=(5.2,3.1)
Solution
Find the distance using the Euclidean distance measure:
d(p1, p2) = sqrt((x1 - x2)² + (y1 - y2)²)
d(p1, q) = sqrt((5.3 - 5.2)² + (3.7 - 3.1)²) = 0.608
Sl # Sepal length Sepal width Species Distance Rank
1 5.3 3.7 Setosa 0.608 3
2 5.1 3.8 Setosa 0.707 6
3 7.2 3.0 Virginica 2.002 13
4 5.4 3.4 Setosa 0.36 2
5 5.1 3.3 Setosa 0.22 1
6 5.4 3.9 Setosa 0.82 8
7 7.4 2.8 Virginica 2.22 15
8 6.1 2.8 Versicolor 0.94 10
9 7.3 2.9 Virginica 2.1 14
10 6.0 2.7 Versicolor 0.89 9
11 5.8 2.8 Virginica 0.67 5
12 6.3 2.3 Versicolor 1.36 12
13 5.1 2.5 Versicolor 0.60 4
14 6.3 2.5 Versicolor 1.25 11
15 5.5 2.4 Versicolor 0.75 7
• Let K = 5, so find the 5 nearest data points.
Sepal length Sepal width Species Distance Rank
5.1 3.3 Setosa 0.22 1
5.4 3.4 Setosa 0.36 2
5.3 3.7 Setosa 0.608 3
5.1 2.5 Versicolor 0.60 4
5.8 2.8 Virginica 0.67 5
• As the number of Setosa instances among the neighbours is higher than that
of the other species, the species for (5.2, 3.1) is Setosa.
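The vote above can be reproduced with a short NumPy check (a sketch, not part of the original slides), using the 15 rows of the table:

    import numpy as np
    from collections import Counter

    X = np.array([[5.3, 3.7], [5.1, 3.8], [7.2, 3.0], [5.4, 3.4], [5.1, 3.3],
                  [5.4, 3.9], [7.4, 2.8], [6.1, 2.8], [7.3, 2.9], [6.0, 2.7],
                  [5.8, 2.8], [6.3, 2.3], [5.1, 2.5], [6.3, 2.5], [5.5, 2.4]])
    y = np.array(["Setosa", "Setosa", "Virginica", "Setosa", "Setosa",
                  "Setosa", "Virginica", "Versicolor", "Virginica", "Versicolor",
                  "Virginica", "Versicolor", "Versicolor", "Versicolor", "Versicolor"])

    q = np.array([5.2, 3.1])
    dists = np.linalg.norm(X - q, axis=1)        # Euclidean distance to the query
    nearest = np.argsort(dists)[:5]              # K = 5 nearest neighbours
    print(Counter(y[nearest]).most_common(1))    # majority class: Setosa (3 of 5)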
Problem 2 (KNN)
Consider the following table, which gives the height, age and weight (target)
values for 10 people. The weight of an eleventh person, ID11, is missing; we
need to predict the weight of this person from their height and age using the
KNN algorithm. Consider K = 3.
Solution
Here the unknown point is ID11, i.e. Q = (5.5, 38, z). Step 1: Find the
distance of every point from (5.5, 38).
ID height age Distance from (5.5, 38) Rank
1 5 45 7.02 5
2 5.1 26 12.01 8
3 5.6 30 8.00 6
4 5.9 34 4.02 3
5 4.8 40 2.12 2
6 5.8 36 2.02 1
7 5.3 19 19.00 10
8 5.8 28 10.00 7
9 5.5 23 15.00 9
10 5.8 32 6.01 4
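Step 1 can be reproduced with a short NumPy sketch (illustrative; the weights themselves are not reproduced in the table above, so only the distances and ranks are computed here — with K = 3 the predicted weight would be the average of the weights of IDs 6, 5 and 4):

    import numpy as np

    # (height, age) for IDs 1-10, taken from the table above.
    pts = np.array([[5.0, 45], [5.1, 26], [5.6, 30], [5.9, 34], [4.8, 40],
                    [5.8, 36], [5.3, 19], [5.8, 28], [5.5, 23], [5.8, 32]])
    q = np.array([5.5, 38])                      # ID11, whose weight is unknown

    dists = np.linalg.norm(pts - q, axis=1)      # Euclidean distance from (5.5, 38)
    order = np.argsort(dists) + 1                # nearest IDs first: 6, 5, 4, ...
    print(np.round(dists, 2))
    print(order)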
KNN problem
Height, weight and T-shirt size of some customers are given. Predict the T-shirt
size of a new customer whose height and weight are 161 cm and 61 kg
respectively using KNN classification method. Consider k=3.
Height Weight T-shirt size
158 58 M
158 59 M
158 63 L
160 59 L
160 60 M
160 60 M
163 61 L
163 64 XL
160 64 XL
163 61 L
165 62 XL
165 65 XL
165 62 XL
168 65 XL
Parzen-window method
• Parzen-window is a non-parametric method.
• It is used for density estimation.
• Parzen-window based classification refers to density-based classification.
• Density estimation in Pattern Recognition can be achieved by using the
approach of the Parzen Windows.
• Parzen window density estimation technique is a kind of generalization of
the histogram technique.
Parametric vs non-parametric measure
• Parametric methods require the form of the density (Gaussian, Poisson, etc.)
to be known and try to estimate the missing parameters of that known density
form. Unfortunately, the form of the density is not always known. Examples:
maximum likelihood, Bayesian learning.
• Non-parametric methods, on the other hand, do not require any known form of
the density. Examples: Parzen-window, KNN.
Basics of parzen-window
• A d-dimensional hypercube (window) is considered, which is assumed to
contain k of the data samples.
• The length of the edge of the hypercube is assumed to be hn.
• Hence the volume of the hypercube is: Vn = hn^d
• A hypercube window function φ(u) is used, which is the indicator function of
the unit hypercube centered at the origin:
φ(u) = 1 if |ui| <= 0.5 for every component i
φ(u) = 0 otherwise
Here, u is a vector, u = (u1, u2, …, ud)^T.
Parzen-window density function
φ = Parzen-window function
Vn = hn^d (volume of the window)
pn(x) = (1/n) Σ_{i=1}^{n} (1/Vn) φ((x - xi)/hn)
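A direct transcription of this estimator for the one-dimensional case (a sketch; the function name is mine, not from the slides):

    import numpy as np

    def parzen_hypercube_1d(x, samples, h):
        # Scaled offsets (x - xi) / hn for every sample.
        u = (x - np.asarray(samples, dtype=float)) / h
        # Window function: 1 if the sample falls inside the window of width h around x, else 0.
        phi = (np.abs(u) <= 0.5).astype(float)
        # pn(x) = (1/n) * sum_i (1/Vn) * phi(...), with Vn = h in one dimension.
        return phi.sum() / (len(samples) * h)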
Mathematical problems
Problem 1: There are 7 samples in a 1-dimensional dataset D = (2, 3, 4, 8, 10,
11, 12). Consider a Parzen-window width of h = 3. Estimate the density at x = 1.
Solution:
Here, n = 7, h = 3 and, in one dimension, Vn = h = 3.
pn(1) = (1/7) Σ_{i=1}^{7} (1/3) φ((1 - xi)/3)
      = (1/21) [φ(-1/3) + φ(-2/3) + φ(-1) + φ(-7/3) + φ(-3) + φ(-10/3) + φ(-11/3)]
      = (1/21) [1 + 0 + 0 + 0 + 0 + 0 + 0] = 1/21 ≈ 0.048
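Using the one-dimensional sketch given after the density formula above, the same value follows (illustrative check):

    D = [2, 3, 4, 8, 10, 11, 12]
    print(parzen_hypercube_1d(1, D, h=3))   # 0.0476..., i.e. 1/21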
Parzen-window for Gaussian distribution
• Based on the Gaussian function, the Parzen-window formula becomes:
pn(x) = (1/n) Σ_{i=1}^{n} (1/(√(2π)σ)) exp(-(xi - x)² / (2σ²))
• Problem 2:
Given a set of five data points x1=2, x2= 2.5, x3= 3, x4= 1 and x5= 6, find
Parzen probability density function (pdf) estimates at x= 3, using the Gaussian
function with σ= 1 as window function.
Solution:
(1/(√(2π)σ)) exp(-(x1 - x)²/(2σ²)) = (1/√(2π)) exp(-(2 - 3)²/2) = 0.2420
Similarly, the contributions of x2, x3, x4 and x5 are 0.3521, 0.3989, 0.0540
and 0.0044 respectively.
In this way we get p(x=3) = (1/5)*[0.2420 + 0.3521 + 0.3989 + 0.0540 + 0.0044] =
0.2103
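A short NumPy check of this result (a sketch, not part of the original slides):

    import numpy as np

    samples = np.array([2.0, 2.5, 3.0, 1.0, 6.0])   # x1 ... x5
    x, sigma = 3.0, 1.0

    # Gaussian kernel contribution of each sample, then the average over n = 5 samples.
    contrib = np.exp(-(samples - x) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    print(np.round(contrib, 4))      # [0.242  0.3521 0.3989 0.054  0.0044]
    print(round(contrib.mean(), 4))  # 0.2103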