LDA Slides N
Uploaded by Divya B

Linear Discriminant Analysis (LDA)

Learning Objectives
• Introduction
• How does LDA work?
• Applications of LDA
• Comparisons with other methods
• Summary
Introduction

What is Linear Discriminant Analysis (LDA)?

• LDA is used for classifying observations into a class or category based on predictor (independent) variables.

• LDA creates a model to predict the classes of new or future observations.

• LDA can be used for classifying observations into more than two classes. Here, we shall look only at the case of two classes.
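As a concrete illustration, fitting such a model and predicting the class of new observations can be sketched in a few lines. This is a minimal sketch assuming scikit-learn is installed; the toy data and labels below are hypothetical.

```python
# A minimal fit/predict sketch, assuming scikit-learn is installed.
# The toy data and class labels below are hypothetical.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = [[1.0, 2.0], [2.0, 1.0], [3.0, 2.5],      # class 0 observations
     [7.0, 8.0], [8.0, 7.0], [9.0, 8.5]]      # class 1 observations
y = [0, 0, 0, 1, 1, 1]

model = LinearDiscriminantAnalysis().fit(X, y)
print(model.predict([[2.0, 2.0], [8.0, 8.0]]))  # one predicted class per new observation
```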
Introduction

Linear Discriminant Analysis (LDA) :

• LDA uses linear combinations of independent variables to predict the class of a response variable.

• LDA assumes that the independent variables are normally distributed.

• When these assumptions are satisfied, LDA creates a linear decision boundary. (Note that LDA can perform well even when its assumptions are violated.)

• LDA finds a linear combination of predictor variables (a Linear Discriminant Function) that best separates the classes of the response variable.
Introduction

[Figure: Linear Discriminant Function that separates two classes (in this case labeled red and blue).]
How does LDA work?
• Let Y be the dependent variable having two levels, 0 and 1, corresponding to
two classes.

• Let X denote all the independent variables taken together, X = (X1, X2, …, Xp).

• LDA computes Prob(Y=1 | X), the probability that Y = 1 "given" the observed values of X.

• Consequently, Prob (Y=0 | X) = 1 – Prob (Y=1 | X).

• The calculation of this posterior probability is done using Bayes Theorem for
calculating conditional probabilities.
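The posterior calculation can be sketched as a small function; the density and prior values used in the example call are hypothetical, for illustration only.

```python
# A minimal sketch of the Bayes' Theorem calculation LDA performs.
# The density and prior values in the example call are hypothetical.

def posterior_prob_y1(f_x_given_y1, f_x_given_y0, prior_y1):
    """Prob(Y=1 | X) for two classes via Bayes' Theorem."""
    prior_y0 = 1.0 - prior_y1
    numerator = f_x_given_y1 * prior_y1
    return numerator / (numerator + f_x_given_y0 * prior_y0)

# Class-1 density 0.30 and class-0 density 0.10 at the observed X, equal priors:
print(round(posterior_prob_y1(0.30, 0.10, 0.5), 3))  # 0.75
```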
How does LDA work?

By Bayes' Theorem,

Prob(Y=1 | X) = [ f(X | Y=1) Prob(Y=1) ] / [ f(X | Y=1) Prob(Y=1) + f(X | Y=0) Prob(Y=0) ]

Consider the terms on the right.

• The prior probability Prob(Y=1) is usually just the proportion of 1's in the data.

• Prob(Y=0) = 1 - Prob(Y=1) is then the proportion of 0's in the data.


How does LDA work?
The two conditional distributions f(X|Y=0) and f(X|Y=1) describe the behavior of all
independent variables for each of the classes, 0 and 1.

They are calculated using the following assumptions:


• f(X1, X2, …, Xp | Y=1) has a distribution in which each individual variable Xi has a normal distribution, N(μi1, σi1).

• Similarly, f(X1, X2, …, Xp | Y=0) has a distribution in which each individual Xi has a normal distribution, N(μi0, σi0).
How does LDA work?
For these two distributions, LDA assumes that the means may be different, so in general μi1 ≠ μi0 for each predictor Xi.

• That is what separates the two classes: the average values of the independent variables are different.

• (For these two multivariate distributions, LDA assumes that the standard deviations are the same, so σi1 = σi0.)

• Under these assumptions, Prob(Y=1|X) and Prob(Y=0|X) are calculated.
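These ingredients (class means, a pooled standard deviation, priors, and normal densities) can be estimated and combined in a short sketch. The one-predictor data below is made up for illustration.

```python
import math
import statistics

# Sketch: estimating the LDA ingredients from a tiny hypothetical sample
# with one predictor. The data values are made up for illustration.
x0 = [1.0, 2.0, 3.0]   # observations from class Y = 0
x1 = [6.0, 7.0, 8.0]   # observations from class Y = 1

mu0, mu1 = statistics.mean(x0), statistics.mean(x1)   # class means differ
n0, n1 = len(x0), len(x1)

# Shared standard deviation: pool the within-class sample variances.
pooled_var = ((n0 - 1) * statistics.variance(x0) +
              (n1 - 1) * statistics.variance(x1)) / (n0 + n1 - 2)
sigma = math.sqrt(pooled_var)

def normal_pdf(x, mu, sd):
    return math.exp(-((x - mu) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

prior1 = n1 / (n0 + n1)          # proportion of 1's in the data
x_new = 5.5
num = normal_pdf(x_new, mu1, sigma) * prior1
den = num + normal_pdf(x_new, mu0, sigma) * (1 - prior1)
print(round(num / den, 3))       # posterior Prob(Y=1 | X=5.5), close to 1 here
```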


How does LDA work?
For both logistic regression and LDA, the coefficients have the same interpretation for classification.

• If the coefficients are large in absolute value and different from zero,
then the corresponding Xi plays a significant role in correctly classifying
observations into the classes.

• If the coefficients are small in absolute value and close to zero, then
the corresponding Xi does not play a significant role in correctly
classifying observations into classes.
How does LDA work?
At its simplest, LDA defines a classification rule as follows:

If Prob(Y=1|X) ≥ Prob(Y=0|X), then classify X = (X1, X2, …, Xp) as coming from class 1.

If Prob(Y=1|X) < Prob(Y=0|X), then classify X = (X1, X2, …, Xp) as coming from class 0.

• In other words, classify observations into the more likely class.

• This rule is equivalent to classifying as coming from class 1 if Prob(Y=1|X) ≥ 0.5.
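As code, the rule is a one-line threshold (the probabilities in the example calls are illustrative):

```python
# The classification rule above as a tiny helper (threshold 0.5, two classes).

def classify(prob_y1):
    """Classify into the more likely of the two classes."""
    return 1 if prob_y1 >= 0.5 else 0

print(classify(0.73))  # 1
print(classify(0.28))  # 0
```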


How does LDA work?
• Under the assumptions used in the conditional probability calculations using the Bayes
theorem, it can be shown (we omit the math) that the rule Prob (Y=1 | X) ≥ 0.5 is
equivalent to:

β0 + β1X1 + β2X2 + β3X3 + ... + βpXp ≥ 0


for some coefficients β0, β1, ... βp.

• β0 + β1X1 + β2X2 + β3X3 + ... + βpXp is the Linear Discriminant Function.

• It is treated as a score for evaluating which class an observation falls in, or to predict
the class of a new observation (based on whether the value of the Linear
Discriminant Function is positive or negative).
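In the one-predictor case, the coefficients of the Linear Discriminant Function can be written in closed form from the class means, the shared standard deviation, and the prior. The parameter values in this sketch are hypothetical.

```python
import math

# Sketch: in the one-predictor case with shared sigma, the LDF coefficients
# have a closed form. The parameter values below are hypothetical.
mu0, mu1, sigma, prior1 = 2.0, 7.0, 1.0, 0.5

beta1 = (mu1 - mu0) / sigma ** 2
beta0 = -(mu1 ** 2 - mu0 ** 2) / (2 * sigma ** 2) + math.log(prior1 / (1 - prior1))

def ldf(x):
    return beta0 + beta1 * x

# The sign of the score predicts the class.
print(ldf(5.5) >= 0)  # True: classify as class 1 (closer to mu1)
print(ldf(3.0) >= 0)  # False: classify as class 0 (closer to mu0)
```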
How does LDA work?
• Note that logistic regression also works in a similar way and we can see
the equivalence of the probability rule and the linear combination rule
there as well.

• For logistic regression, Prob(Y=1|X) = exp(β0 + β1X1 + ... + βpXp) / (1 + exp(β0 + β1X1 + ... + βpXp)).

• The condition Prob(Y=1|X) ≥ 0.5 is the same as the condition β0 + β1X1 + ... + βpXp ≥ 0.
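A small sketch of the logistic function makes the equivalence visible: the probability crosses 0.5 exactly where the linear score crosses 0.

```python
import math

# The predicted probability crosses 0.5 exactly where the linear score t crosses 0.

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

for t in (-2.0, 0.0, 3.0):
    print(t, round(logistic(t), 3), logistic(t) >= 0.5)
```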


How does LDA work?
Standardized coefficients
• Standardized coefficients are those that are calculated from standardized original variables/predictors.

• They are unit free, like the standardized variables themselves.

• Standardized coefficients represent each independent variable's weight in the linear discriminant function.

• The larger the standardized beta coefficient in absolute magnitude, the larger the respective variable's own contribution to the discrimination as specified by the Linear Discriminant Function.
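A sketch of the scaling involved: multiplying a raw coefficient by its predictor's standard deviation gives the unit-free weight. The predictor values and raw coefficient below are hypothetical.

```python
import statistics

# Sketch: scaling a raw coefficient by the predictor's standard deviation
# gives a unit-free (standardized) coefficient. Values are hypothetical:
# a predictor measured in dollars and its raw per-dollar LDF coefficient.
x = [100.0, 200.0, 300.0, 400.0]
raw_beta = 0.004

standardized_beta = raw_beta * statistics.stdev(x)   # effect per standard deviation
print(round(standardized_beta, 3))
```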
How does LDA work?

[Figure: Linear Discriminant Function that separates two classes (in this case labeled red and blue).]

-30 + 2*X1 + 3*X2 ≥ 0  →  classify as red
-30 + 2*X1 + 3*X2 < 0  →  classify as blue

The equation of the boundary line is X2 = 10 - (2/3)*X1.
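This example rule can be checked directly:

```python
# The slide's example rule, checked directly for a few points.

def classify_color(x1, x2):
    score = -30 + 2 * x1 + 3 * x2
    return "red" if score >= 0 else "blue"

print(classify_color(9, 6))   # red:  -30 + 18 + 18 = 6 >= 0
print(classify_color(3, 5))   # blue: -30 + 6 + 15 = -9 < 0

# A point on the boundary line X2 = 10 - (2/3)*X1 scores (essentially) 0:
print(round(-30 + 2 * 6 + 3 * (10 - (2 / 3) * 6), 9))  # 0.0
```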
How does LDA work?
Input predictor variable values X1, X2, …, Xp. Then either route gives the same classification:

• Compute the linear discriminant function LDF = β0 + β1X1 + β2X2 + β3X3 + ... + βpXp.
  If LDF ≥ 0, classify Y as 1; if LDF < 0, classify Y as 0.

• Compute the posterior probability P = Prob(Y=1 | X1, X2, …, Xp).
  If P ≥ 0.5, classify Y as 1; if P < 0.5, classify Y as 0.
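The agreement of the two routes can be verified with hypothetical coefficients, using the logistic form of the posterior that holds under the LDA assumptions:

```python
import math

# Sketch: the two routes in the flow above agree, because under the LDA
# assumptions P = 1 / (1 + exp(-LDF)). The beta values are hypothetical.
betas = [-30.0, 2.0, 3.0]   # beta0, beta1, beta2

def score(x):
    return betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))

def route_ldf(x):
    return 1 if score(x) >= 0 else 0

def route_posterior(x):
    p = 1.0 / (1.0 + math.exp(-score(x)))
    return 1 if p >= 0.5 else 0

for x in [(9, 6), (3, 5), (10, 10), (0, 0)]:
    assert route_ldf(x) == route_posterior(x)
print("both routes give the same classification")
```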


Applications of LDA

Applications of LDA include:

• Identification of types of customers who are likely to cancel a subscription or buy a new subscription for a magazine.

• Pattern recognition. For example, to distinguish objects, faces, cars, animals, etc. based on image features.

• Risk assessment. For example, to distinguish between companies that are likely to default on loans and companies that are financially healthy.
Comparisons with other methods
Method               Dependent Variable   Independent Variables
ANOVA                Numeric              Categorical
Linear Regression    Numeric              Numeric and/or Categorical
Logistic Regression  Categorical          Numeric and/or Categorical
LDA                  Categorical          Numeric

• Note: LDA is also sometimes done with categorical X’s, by converting them to dummy variables. It has
been found that LDA is capable of classifying well in that situation too.
Comparisons with other methods
                           Principal Component Analysis (PCA)          Linear Discriminant Analysis (LDA)
Kind of Machine Learning   Unsupervised Learning                       Supervised Learning
Statistical Output         Linear Combinations of Variables            Linear Combinations of Variables
Objective of Output        Maximize Variability of Linear Combination  Use Linear Combination to Separate Two Classes
Other Objective of Output  Dimension Reduction                         Dimension Reduction
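The contrast can be sketched with scikit-learn (assuming it is installed; the data is hypothetical): both methods output one linear combination per observation, but only LDA uses the class labels.

```python
# A sketch contrasting the two methods, assuming scikit-learn is installed.
# PCA ignores the class labels; LDA uses them. The data below is hypothetical.
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = [[1.0, 2.0], [2.0, 1.0], [3.0, 2.5],
     [7.0, 8.0], [8.0, 7.0], [9.0, 8.5]]
y = [0, 0, 0, 1, 1, 1]

Z_pca = PCA(n_components=1).fit_transform(X)                            # unsupervised
Z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)  # supervised

# Both outputs are one linear combination of the inputs per observation.
print(Z_pca.shape, Z_lda.shape)
```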


Summary
• LDA is a classification technique used in Statistics and Machine Learning with applications to classification and pattern recognition.

• The LDA model uses Bayes' Theorem to estimate probabilities and predict classes for new observations.

• LDA, like logistic regression, expresses a binary dependent variable as a linear combination of independent variables.

• In a given classification problem, we can try both logistic regression and LDA and see which one performs better.

• Another goal of LDA is to project the input features in higher-dimensional space onto a one-dimensional space represented by a discriminant function. Thus, it is also a dimension reduction technique.
