Pattern Recognition
Fisher’s Discriminant Analysis
Dr. Subrata Datta
Dept. of AIML
NSEC
Introduction
• Linear discriminant analysis (LDA), normal discriminant analysis (NDA),
or discriminant function analysis is a generalization of Fisher's linear
discriminant.
• Fisher first formulated the linear discriminant for two classes in 1936;
later, in 1948, C. R. Rao generalized it to multiple classes.
• It is a method used to find a linear combination of features that separates
two or more classes of objects or events.
• LDA is a supervised learning technique.
• LDA is closely related to analysis of variance (ANOVA) and regression
analysis.
• LDA is also closely related to principal component analysis (PCA)
and factor analysis.
• LDA projects data onto a lower-dimensional space that maximizes the
separation between the classes.
Goal of LDA
• The goal of LDA is to project features from a higher-dimensional space onto
a lower-dimensional space, in order to avoid the curse of dimensionality and
to reduce computational cost.
Benefits of LDA
• Logistic regression is one of the most popular linear classification models;
it performs well for binary classification but falls short on multi-class
problems and on well-separated classes, which LDA handles quite efficiently.
• LDA can also be used in data preprocessing to reduce the number of
features, just as PCA is, which reduces the computing cost significantly.
• LDA is also used in face recognition. In the Fisherfaces method, LDA
extracts discriminative features from face images; coupled with Eigenfaces
it produces effective results.
Limitations of LDA
• Linear decision boundaries may not effectively separate non-linearly
separable classes. More flexible boundaries are desired.
• When the number of features exceeds the number of observations, LDA might
not perform as desired. This is called the Small Sample Size (SSS) problem;
regularization is required.
How does LDA work?
Step 1 - Compute the within-class scatter matrix.
Step 2 - Compute the between-class scatter matrix.
Step 3 - Find the projection vectors.
Step 4 - Dimension reduction.
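These four steps are what common libraries do internally. As a minimal illustrative sketch (not part of the original slides, and assuming scikit-learn and NumPy are available), scikit-learn's LinearDiscriminantAnalysis applied to the two-class data of Problem 1 below:

    # Minimal sketch; the data is taken from Problem 1 below.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X = np.array([[4, 1], [2, 4], [2, 3], [3, 6], [4, 4],           # class C1
                  [9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], float) # class C2
    y = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])

    lda = LinearDiscriminantAnalysis(n_components=1)  # project onto one dimension
    z = lda.fit_transform(X, y)                       # steps 1-4 handled internally
    print(lda.scalings_.ravel())                      # learned projection direction
    print(z.ravel())                                  # 1-D projected data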
Problem 1
• Consider two classes C1 and C2. The data points in C1 are (4,1), (2,4),
(2,3), (3,6) and (4,4). The data points in C2 are (9,10), (6,8), (9,5),
(8,7) and (10,8). Classify the data points using Fisher's LDA.
Compute the within-class scatter matrix SW
• Formula of the within-class scatter matrix: SW = S1 + S2
• S1 is the covariance matrix for class C1.
• S2 is the covariance matrix for class C2.
• Formula for the covariance matrices:
  S1 = (1/n1) Σ_{x∈C1} (x - m1)(x - m1)^T
  S2 = (1/n2) Σ_{x∈C2} (x - m2)(x - m2)^T
  where n1 and n2 are the numbers of points in C1 and C2.
• Here, m1 and m2 are the class means of C1 and C2 respectively.
Within-class scatter matrix contd..
Now, m1={(4+2+2+3+4)/5, (1+4+3+6+4)/5} = (3, 3.6)
And m2={(9+6+9+8+10)/5, (10+8+5+7+8)/5} = (8.4, 7.6)
Compute the deviation of each point in C1 from the class mean m1 = (3, 3.6):

Point in C1    x - m1x (= x - 3)    y - m1y (= y - 3.6)
x1 (4,1)        1                   -2.6
x2 (2,4)       -1                    0.4
x3 (2,3)       -1                   -0.6
x4 (3,6)        0                    2.4
x5 (4,4)        1                    0.4
Within-class scatter matrix contd..
Therefore,
S1 = (1/5) { [1 -2.6]^T [1 -2.6] + [-1 0.4]^T [-1 0.4] + [-1 -0.6]^T [-1 -0.6] + [0 2.4]^T [0 2.4] + [1 0.4]^T [1 0.4] }
Ex: [1 -2.6]^T * [1 -2.6] = [[1, -2.6], [-2.6, 6.76]]
In this way we get S1 = [[0.8, -0.4], [-0.4, 2.64]]
Similarly we get for class 2: S2 = [[1.84, -0.04], [-0.04, 2.64]]
Therefore,
SW = S1 + S2 = [[2.64, -0.44], [-0.44, 5.28]]
Between-class scatter matrix
Formula: SB = (m1 - m2)(m1 - m2)^T
Here m1 - m2 = (-5.4, -4), so
SB = [-5.4, -4]^T [-5.4, -4] = [[29.16, 21.6], [21.6, 16]]
Projection vector
Let the components of the projection vector be V1 and V2, so the projection
vector is V = [V1, V2]^T.
Formula for calculation of the projection vector (an eigenvalue problem):
SW^(-1) SB V = λV  ⇒  (SW^(-1) SB - λI) V = 0  ⇒  |SW^(-1) SB - λI| = 0
Now, SW^(-1) = Adj(SW) / det(SW) = Adj([[2.64, -0.44], [-0.44, 5.28]]) / det([[2.64, -0.44], [-0.44, 5.28]])
Projection vector contd..
det(SW) = (2.64)(5.28) - (-0.44)(-0.44) = 13.9392 - 0.1936 = 13.7456
Adj(SW) = [[5.28, 0.44], [0.44, 2.64]]
SW^(-1) = (1/13.7456) [[5.28, 0.44], [0.44, 2.64]] = [[0.3841, 0.0320], [0.0320, 0.1921]]
SW^(-1) SB = [[11.89, 8.81], [5.08, 3.76]]
Projection vector contd..
SW^(-1) SB V = λV
|SW^(-1) SB - λI| = 0  ⇒  |[[11.89, 8.81], [5.08, 3.76]] - λ[[1, 0], [0, 1]]| = 0  ⇒  |[[11.89 - λ, 8.81], [5.08, 3.76 - λ]]| = 0
(11.89 - λ)(3.76 - λ) - 8.81 × 5.08 = 0
λ² - 15.65λ ≈ 0 (the constant term is negligible)  ⇒  λ(λ - 15.65) = 0  ⇒  λ = 15.65
Now, putting the value of λ in the equation of the projection vector:
SW^(-1) SB V = λV
[[11.89, 8.81], [5.08, 3.76]] [V1, V2]^T = 15.65 [V1, V2]^T
After solving, we get W = [V1, V2]^T = [0.91, 0.39]^T
Step 4: Dimension reduction
y = W^T X, where W is the projection vector and X is a data point.
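As a cross-check on the arithmetic of Problem 1, a NumPy sketch (illustrative, not part of the original slides) of Steps 1-4:

    import numpy as np

    # Data of Problem 1.
    C1 = np.array([[4, 1], [2, 4], [2, 3], [3, 6], [4, 4]], float)
    C2 = np.array([[9, 10], [6, 8], [9, 5], [8, 7], [10, 8]], float)
    m1, m2 = C1.mean(axis=0), C2.mean(axis=0)   # class means (3, 3.6) and (8.4, 7.6)

    # Step 1: within-class scatter (class covariances, as in the slides).
    S1 = (C1 - m1).T @ (C1 - m1) / len(C1)
    S2 = (C2 - m2).T @ (C2 - m2) / len(C2)
    SW = S1 + S2                                # [[2.64, -0.44], [-0.44, 5.28]]

    # Step 2: between-class scatter.
    d = (m1 - m2).reshape(-1, 1)
    SB = d @ d.T                                # [[29.16, 21.6], [21.6, 16]]

    # Step 3: projection vector = dominant eigenvector of SW^-1 SB.
    vals, vecs = np.linalg.eig(np.linalg.inv(SW) @ SB)
    W = vecs[:, np.argmax(vals.real)].real
    W = W / np.linalg.norm(W)                   # approx [0.92, 0.39], matching W above up to rounding and sign

    # Step 4: dimension reduction y = W^T x for every point.
    print(W, C1 @ W, C2 @ W)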
Problem 2 (LDA)
Problem 2: Factory ‘ABC’ produces rings whose qualities are measured in
terms of Curvature and Diameter. The quality control report is as follows.
Classify the rings using LDA.
Curvature Diameter Quality control report
2.95 6.63 Passed
2.53 7.79 Passed
3.57 5.65 Passed
3.16 5.47 Passed
2.58 4.46 Not Passed
2.16 6.22 Not Passed
3.27 3.52 Not Passed
KNN Classifier
• K-Nearest Neighbour is one of the simplest Machine Learning algorithms,
based on the Supervised Learning technique.
• The K-NN algorithm assesses the similarity between a new case/data point
and the available cases and puts the new case into the category that is
most similar to it.
• The K-NN algorithm stores all the available data and classifies a new data
point based on similarity. This means that when new data appears, it can
easily be assigned to a well-suited category using the K-NN algorithm.
• K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any
assumption on underlying data.
• It is also called a lazy learner algorithm because it does not learn from
the training set immediately; instead it stores the dataset and performs the
computation at classification time.
• At the training phase the KNN algorithm just stores the dataset; when it
receives new data, it classifies that data into the category most similar
to the new data.
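A minimal NumPy sketch of this idea (illustrative only; the helper name knn_classify is mine, not from the slides):

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, query, k=5):
        # X_train and y_train are NumPy arrays of stored points and their labels.
        dists = np.linalg.norm(X_train - query, axis=1)   # Euclidean distance to every stored point
        nearest = np.argsort(dists)[:k]                   # indices of the k nearest neighbours
        return Counter(y_train[nearest]).most_common(1)[0][0]   # majority vote over their labels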
Example of KNN
Suppose we have an image of a creature that looks similar to both a cat and a
dog, and we want to know whether it is a cat or a dog. For this identification
we can use the KNN algorithm, since it works on a similarity measure. Our KNN
model will find the features of the new image most similar to those of the cat
and dog images and, based on the most similar features, will put it in either
the cat or the dog category.
Why KNN Algorithm?
KNN Algorithm
Example of KNN on Iris dataset
Sl # Sepal length Sepal width Species
1 5.3 3.7 Setosa
2 5.1 3.8 Setosa
3 7.2 3.0 Virginica
4 5.4 3.4 Setosa
5 5.1 3.3 Setosa
6 5.4 3.9 Setosa
7 7.4 2.8 Virginica
8 6.1 2.8 Versicolor
9 7.3 2.9 Virginica
10 6.0 2.7 Versicolor
11 5.8 2.8 Virginica
12 6.3 2.3 Versicolor
13 5.1 2.5 Versicolor
14 6.3 2.5 Versicolor
15 5.5 2.4 Versicolor
Question:
Find out the species for Sepal length 5.2 and
Sepal width 3.1.
q=(5.2,3.1)
Solution
Find the distance using the Euclidean distance measure:
d(p1, p2) = sqrt((x1 - x2)² + (y1 - y2)²)
d(p1, q) = sqrt((5.3 - 5.2)² + (3.7 - 3.1)²) = 0.608
Sl # Sepal length Sepal width Species Distance Rank
1 5.3 3.7 Setosa 0.608 3
2 5.1 3.8 Setosa 0.707 6
3 7.2 3.0 Virginica 2.002 13
4 5.4 3.4 Setosa 0.36 2
5 5.1 3.3 Setosa 0.22 1
6 5.4 3.9 Setosa 0.82 8
7 7.4 2.8 Virginica 2.22 15
8 6.1 2.8 Versicolor 0.94 10
9 7.3 2.9 Virginica 2.1 14
10 6.0 2.7 Versicolor 0.89 9
11 5.8 2.8 Virginica 0.67 5
12 6.3 2.3 Versicolor 1.36 12
13 5.1 2.5 Versicolor 0.60 4
14 6.3 2.5 Versicolor 1.25 11
15 5.5 2.4 Versicolor 0.75 7
• Let K = 5, so find the 5 nearest data points.
Sepal length Sepal width Species Distance Rank
5.1 3.3 Setosa 0.22 1
5.4 3.4 Setosa 0.36 2
5.3 3.7 Setosa 0.608 3
5.1 2.5 Versicolor 0.60 4
5.8 2.8 Virginica 0.67 5
• As the number of Setosa instances among the neighbours is higher than that
of the other species, the species for (5.2, 3.1) is Setosa.
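The vote above can be reproduced with a short NumPy check (a sketch, not part of the original slides), using the 15 rows of the table:

    import numpy as np
    from collections import Counter

    X = np.array([[5.3, 3.7], [5.1, 3.8], [7.2, 3.0], [5.4, 3.4], [5.1, 3.3],
                  [5.4, 3.9], [7.4, 2.8], [6.1, 2.8], [7.3, 2.9], [6.0, 2.7],
                  [5.8, 2.8], [6.3, 2.3], [5.1, 2.5], [6.3, 2.5], [5.5, 2.4]])
    y = np.array(["Setosa", "Setosa", "Virginica", "Setosa", "Setosa",
                  "Setosa", "Virginica", "Versicolor", "Virginica", "Versicolor",
                  "Virginica", "Versicolor", "Versicolor", "Versicolor", "Versicolor"])

    q = np.array([5.2, 3.1])
    dists = np.linalg.norm(X - q, axis=1)        # Euclidean distance to the query
    nearest = np.argsort(dists)[:5]              # K = 5 nearest neighbours
    print(Counter(y[nearest]).most_common(1))    # majority class: Setosa (3 of 5)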
Problem 2 (KNN)
Consider the following table, which gives the height, age and weight (target)
values for 10 people. The weight of an eleventh person, ID11, is missing; we
need to predict the weight of this person from their height and age using the
KNN algorithm. Consider K = 3.
Solution
Here the unknown point is ID11, i.e. Q = (5.5, 38, z). Step 1: Find the
distance of every point from (5.5, 38).
ID height age Distance from (5.5, 38) Rank
1 5 45 7.02 5
2 5.1 26 12.01 8
3 5.6 30 8.00 6
4 5.9 34 4.02 3
5 4.8 40 2.12 2
6 5.8 36 2.02 1
7 5.3 19 19.00 10
8 5.8 28 10.00 7
9 5.5 23 15.00 9
10 5.8 32 6.01 4
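Step 1 can be reproduced with a short NumPy sketch (illustrative; the weights themselves are not reproduced in the table above, so only the distances and ranks are computed here — with K = 3 the predicted weight would be the average of the weights of IDs 6, 5 and 4):

    import numpy as np

    # (height, age) for IDs 1-10, taken from the table above.
    pts = np.array([[5.0, 45], [5.1, 26], [5.6, 30], [5.9, 34], [4.8, 40],
                    [5.8, 36], [5.3, 19], [5.8, 28], [5.5, 23], [5.8, 32]])
    q = np.array([5.5, 38])                      # ID11, whose weight is unknown

    dists = np.linalg.norm(pts - q, axis=1)      # Euclidean distance from (5.5, 38)
    order = np.argsort(dists) + 1                # nearest IDs first: 6, 5, 4, ...
    print(np.round(dists, 2))
    print(order)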
KNN problem
Height, weight and T-shirt size of some customers are given. Predict the T-shirt
size of a new customer whose height and weight are 161 cm and 61 kg
respectively using KNN classification method. Consider k=3.
Height Weight T-shirt size
158 58 M
158 59 M
158 63 L
160 59 L
160 60 M
160 60 M
163 61 L
163 64 XL
160 64 XL
163 61 L
165 62 XL
165 65 XL
165 62 XL
168 65 XL
Parzen-window method
• Parzen-window is a non-parametric method.
• It is used for density estimation.
• Parzen-window based classification refers to density-based classification.
• Density estimation in Pattern Recognition can be achieved by using the
approach of the Parzen Windows.
• Parzen window density estimation technique is a kind of generalization of
the histogram technique.
Parametric vs non-parametric measure
• Parametric methods require the form of the density (Gaussian, Poisson, etc.)
to be known and try to estimate the missing parameters of that known density
form. Unfortunately, the form of the density is not always known. Examples:
maximum likelihood, Bayesian learning.
• Non-parametric methods, on the other hand, do not require any known form of
the density. Examples: Parzen-window, KNN.
Basics of parzen-window
• A d-dimensional hypercube (window) is considered, which is assumed to
contain k of the data samples.
• The length of the edge of the hypercube is assumed to be hn.
• Hence the volume of the hypercube is: Vn = hn^d
• A hypercube window function φ(u) is used, which is the indicator function of
the unit hypercube centered at the origin:
φ(u) = 1 if |ui| <= 0.5 for every component i
φ(u) = 0 otherwise
Here, u is a vector, u = (u1, u2, …, ud)^T.
Parzen-window density function
φ = Parzen-window function
Vn = hn^d (volume of the window)
pn(x) = (1/n) Σ_{i=1}^{n} (1/Vn) φ((x - xi)/hn)
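A direct transcription of this estimator for the one-dimensional case (a sketch; the function name is mine, not from the slides):

    import numpy as np

    def parzen_hypercube_1d(x, samples, h):
        # Scaled offsets (x - xi) / hn for every sample.
        u = (x - np.asarray(samples, dtype=float)) / h
        # Window function: 1 if the sample falls inside the window of width h around x, else 0.
        phi = (np.abs(u) <= 0.5).astype(float)
        # pn(x) = (1/n) * sum_i (1/Vn) * phi(...), with Vn = h in one dimension.
        return phi.sum() / (len(samples) * h)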
Mathematical problems
Problem 1: There are 7 samples in a 1-dimensional dataset D = (2, 3, 4, 8, 10,
11, 12). Consider a Parzen-window width of h = 3. Estimate the density at x = 1.
Solution:
Here, n = 7, h = 3 and, in one dimension, Vn = h = 3.
pn(1) = (1/7) Σ_{i=1}^{7} (1/3) φ((1 - xi)/3)
      = (1/21) [φ(-1/3) + φ(-2/3) + φ(-1) + φ(-7/3) + φ(-3) + φ(-10/3) + φ(-11/3)]
      = (1/21) [1 + 0 + 0 + 0 + 0 + 0 + 0] = 1/21 ≈ 0.048
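Using the one-dimensional sketch given after the density formula above, the same value follows (illustrative check):

    D = [2, 3, 4, 8, 10, 11, 12]
    print(parzen_hypercube_1d(1, D, h=3))   # 0.0476..., i.e. 1/21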
Parzen-window for Gaussian distribution
• Based on the Gaussian function, the Parzen-window formula becomes:
pn(x) = (1/n) Σ_{i=1}^{n} (1/(√(2π)σ)) exp(-(xi - x)² / (2σ²))
• Problem 2:
Given a set of five data points x1=2, x2= 2.5, x3= 3, x4= 1 and x5= 6, find
Parzen probability density function (pdf) estimates at x= 3, using the Gaussian
function with σ= 1 as window function.
Solution:
(1/(√(2π)σ)) exp(-(x1 - x)²/(2σ²)) = (1/√(2π)) exp(-(2 - 3)²/2) = 0.2420
Similarly, the contributions of x2, x3, x4 and x5 are 0.3521, 0.3989, 0.0540
and 0.0044 respectively.
In this way we get p(x=3) = (1/5)*[0.2420 + 0.3521 + 0.3989 + 0.0540 + 0.0044] =
0.2103
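A short NumPy check of this result (a sketch, not part of the original slides):

    import numpy as np

    samples = np.array([2.0, 2.5, 3.0, 1.0, 6.0])   # x1 ... x5
    x, sigma = 3.0, 1.0

    # Gaussian kernel contribution of each sample, then the average over n = 5 samples.
    contrib = np.exp(-(samples - x) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    print(np.round(contrib, 4))      # [0.242  0.3521 0.3989 0.054  0.0044]
    print(round(contrib.mean(), 4))  # 0.2103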