
CLASSIFICATION METHODS

Dr. Trilok Nath Pandey
SCOPE, VIT, Chennai

LOGISTIC REGRESSION

Outline
Cases:
 Orange Juice Brand Preference
 Credit Card Default Data
Why Not Linear Regression?
Simple Logistic Regression
 Logistic Function
 Interpreting the coefficients
 Making Predictions
 Adding Qualitative Predictors
Multiple Logistic Regression

Case 1: Brand Preference for Orange Juice

• We would like to predict what customers prefer to buy:
Citrus Hill or Minute Maid orange juice? (OJ data in ISLR)

• The Y (Purchase) variable is categorical: 0 or 1

• The X (LoyalCH) variable is a numerical value (between 0
and 1) which specifies how loyal the customer is to
Citrus Hill (CH) orange juice

• Can we use Linear Regression when Y is categorical?



Why not Linear Regression?

When Y only takes on values of 0 and 1, why is
standard linear regression inappropriate?

[Figure: linear regression fit of Purchase (0/1) against LoyalCH.
How do we interpret fitted values greater than 1? How do we
interpret fitted values of Y between 0 and 1?]

Problems
• The regression line β0 + β1X can take on any value
between negative and positive infinity.

• In the orange juice classification problem, Y can only take
on two possible values: 0 or 1.

• Therefore the regression line almost always predicts the
wrong value for Y in classification problems.

Solution: Use Logistic Function


• Instead of trying to predict Y, let’s try to predict P(Y = 1),
i.e., the probability a customer buys Citrus Hill (CH) juice.
• Thus, we can model P(Y = 1) using a function that gives
outputs between 0 and 1.
• We can use the logistic function:

  p = P(Y = 1) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

• Logistic Regression!

[Figure: plot of the logistic function for X between -5 and 5;
all outputs lie between 0 and 1.]

Logistic Regression
• Logistic regression is very similar to linear regression.

• We come up with b0 and b1 to estimate β0 and β1.

• We have similar problems and questions as in linear
regression, e.g. Is β1 equal to 0? How sure are we about
our guesses for β0 and β1?

[Figure: logistic fit of P(Purchase) against LoyalCH, with MM
labelled near the top of the curve and CH near the bottom.]

If LoyalCH is about 0.6 then Pr(CH) ≈ 0.7.



Case 2: Credit Card Default Data

We would like to be able to predict which customers are
likely to default.

Possible X variables are:
 Annual Income
 Monthly credit card balance

The Y variable (Default) is categorical: Yes or No

How do we check the relationship between Y and X?


The Default Dataset



Why not Linear Regression?

If we fit a linear regression to the Default data, then for
very low balances we predict a negative probability, and for
high balances we predict a probability above 1!

When Balance < 500, Pr(default) is negative!

Logistic Function on Default Data

• Now the probability of default is close to, but not less
than, zero for low balances, and close to, but not above,
1 for high balances.

Interpreting 1
• Interpreting what 1 means is not very easy with logistic
regression, simply because we are predicting P(Y) and
not Y.
• If 1 =0, this means that there is no relationship between
Y and X.
• If 1 >0, this means that when X gets larger so does the
probability that Y = 1.
• If 1 <0, this means that when X gets larger, the
probability that Y = 1 gets smaller.
• But how much bigger or smaller depends on where we
are on the slope

Are the coefficients significant?

• We still want to perform a hypothesis test to see whether
we can be sure that β0 and β1 are significantly different
from zero.
• We use a Z test instead of a t test, but that doesn't
change the way we interpret the p-value.
• Here the p-value for balance is very small, and b1 is
positive, so we are confident that if the balance increases,
the probability of default will increase as well.

Making Predictions
• Suppose an individual has an average balance of $1000.
What is their probability of default?

• The predicted probability of default for an individual with a
balance of $1000 is less than 1%.

• For a balance of $2000, the probability is much higher,
and is equal to 0.586 (58.6%).
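These predictions can be reproduced with the fitted logistic function. A minimal sketch, assuming the coefficient values reported for the Default data in ISLR (β0 ≈ -10.6513, β1 ≈ 0.0055); treat them as given rather than refit here:

```python
import math

# Coefficients for the Default data (values as reported in ISLR;
# assumed here, not refit from the raw data).
b0, b1 = -10.6513, 0.0055

def p_default(balance):
    """Predicted probability of default: e^(b0 + b1*X) / (1 + e^(b0 + b1*X))."""
    z = b0 + b1 * balance
    return math.exp(z) / (1 + math.exp(z))

print(round(p_default(1000), 5))  # under 1%
print(round(p_default(2000), 3))  # 0.586
```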

Qualitative Predictors in Logistic Regression

• We can predict whether an individual defaults by checking
whether she is a student or not. Thus we can use a
qualitative variable "Student" coded as (Student = 1,
Non-student = 0).
• b1 is positive: this indicates that students tend to have
higher default probabilities than non-students.

Multiple Logistic Regression

• We can fit a multiple logistic regression just like a regular
multiple regression.

Multiple Logistic Regression- Default Data


• Predict Default using:
• Balance (quantitative)
• Income (quantitative)
• Student (qualitative)

Predictions
• A student with a credit card balance of $1,500 and an
income of $40,000 has an estimated probability of default
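This prediction can be sketched with the multiple-regression coefficients reported for the Default data in ISLR (intercept ≈ -10.869, balance ≈ 0.00574, income ≈ 0.003 with income in $1000s, student ≈ -0.6468); these values are assumed from that source, not refit here:

```python
import math

# Multiple logistic regression coefficients for the Default data
# (assumed from ISLR; income is measured in thousands of dollars).
b0, b_balance, b_income, b_student = -10.869, 0.00574, 0.003, -0.6468

def p_default(balance, income, student):
    z = b0 + b_balance * balance + b_income * income + b_student * student
    return 1 / (1 + math.exp(-z))

# Student with a balance of $1,500 and income of $40,000:
print(round(p_default(1500, 40, 1), 3))  # 0.058
# Non-student with the same balance and income:
print(round(p_default(1500, 40, 0), 3))  # 0.105
```

Note that the student coefficient is negative here, which sets up the "apparent contradiction" discussed on the next slides.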

An Apparent Contradiction!

[Tables: the coefficient of Student is positive in the
single-predictor model but negative in the multiple
regression model.]

Students (Orange) vs. Non-students (Blue)

To whom should credit be offered?

• A student is riskier than a non-student if no information
about the credit card balance is available.

• However, that student is less risky than a non-student with
the same credit card balance!

Logistic Regression in Machine Learning


• Logistic regression is a Supervised Learning technique. It is
used for predicting a categorical dependent variable from a
given set of independent variables.
• Logistic regression predicts the output of a categorical
dependent variable. Therefore the outcome must be a
categorical or discrete value: Yes or No, 0 or 1, True or
False, etc. But instead of giving an exact value of 0 or 1,
it gives probabilistic values which lie between 0 and 1.
• Logistic Regression is very similar to Linear Regression,
except in how they are used: Linear Regression is used for
solving regression problems, whereas Logistic Regression is
used for solving classification problems.
• In Logistic Regression, instead of fitting a straight
regression line, we fit an "S"-shaped logistic function.
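Fitting that S-shaped function can be sketched with a minimal gradient-descent routine on a toy one-dimensional dataset (all data and names below are made up for illustration, not from the slides):

```python
import math

# Toy 1-D dataset: class 0 at low x, class 1 at high x.
X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
Y = [0,   0,   0,   0,   1,   1,   1,   1]

b0, b1 = 0.0, 0.0
lr = 0.5
for _ in range(5000):
    g0 = g1 = 0.0
    for x, y in zip(X, Y):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))
        g0 += (p - y)            # gradient of the negative log-likelihood
        g1 += (p - y) * x
    b0 -= lr * g0 / len(X)
    b1 -= lr * g1 / len(X)

# The fitted S-curve gives low probability for low x, high for high x.
p_low  = 1 / (1 + math.exp(-(b0 + b1 * 0.5)))
p_high = 1 / (1 + math.exp(-(b0 + b1 * 3.0)))
print(p_low < 0.5 < p_high)  # True
```

In practice one would use a library fit (e.g. scikit-learn or statsmodels); the loop above only illustrates the idea.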

Logistic Regression in Machine Learning

• Logistic Function (Sigmoid Function):


• The sigmoid function is a mathematical function used to
map the predicted values to probabilities.
• It maps any real value into another value within a range of
0 and 1.
• The S-form curve is called the Sigmoid function or the
logistic function.

Logistic Regression in Machine Learning

• We use the concept of a threshold value, which splits the
predicted probabilities into classes 0 and 1: values above
the threshold map to 1, and values below the threshold
map to 0.
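The thresholding step is just a comparison; a minimal sketch (the default cut-off of 0.5 is the usual convention, not something fixed by the slides):

```python
def classify(p, threshold=0.5):
    """Map a predicted probability to a class label using a threshold."""
    return 1 if p >= threshold else 0

print([classify(p) for p in [0.1, 0.49, 0.5, 0.92]])  # [0, 0, 1, 1]
```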

Logistic Regression in Machine Learning

• It's an S-shaped curve that can take any real-valued
number and map it into a value between 0 and 1:

  1 / (1 + e^(-value))

• Here e is the base of the natural logarithms (Euler's
number, or the EXP() function) and value is the actual
numerical value that you want to transform. Below is a
plot of the numbers between -5 and 5 transformed into the
range 0 and 1 using the logistic function.
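The transformation of the values between -5 and 5 can be computed directly:

```python
import math

def sigmoid(value):
    """Logistic function: 1 / (1 + e^(-value))."""
    return 1 / (1 + math.exp(-value))

for v in range(-5, 6):
    print(v, round(sigmoid(v), 3))
# Outputs rise monotonically from about 0.007 at -5,
# through 0.5 at 0, to about 0.993 at 5.
```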


Logistic Regression in Machine Learning

• The odds equal the probability that Y=1 divided by the probability that Y=0.

• For example, if the probability that Y=1 is 0.8, then the probability that Y=0
is 1 - 0.8 = 0.2.

• Odds = P(Y=1)/P(Y=0) = 0.8/0.2 = 4



Logistic Regression in Machine Learning


• Logistic Regression uses logit() to classify the
outcomes.
• If the probability of an event occurring is P, the
probability that it will not occur is (1-P).
• Odds = P/(1-P)
• Taking the log of the odds gives us:
• Log of Odds = log(P/(1-P))
• This is the logit function.
• Probability ranges from 0 to 1.
• Odds range from 0 to ∞.
• Log odds range from −∞ to ∞.


Logistic Regression in Machine Learning


• The inverse of the logit function is the sigmoid
function.
• That is, if you have a probability p,
  sigmoid(logit(p)) = p.
• The sigmoid function maps arbitrary real values back
to the range [0, 1].
• Sigmoid function:
• Sigmoid is a mathematical function that takes any real
number and maps it to a probability between 0 and 1.
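The inverse relationship sigmoid(logit(p)) = p can be checked numerically:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def logit(p):
    """Log-odds: log(p / (1 - p))."""
    return math.log(p / (1 - p))

# sigmoid undoes logit: applying both recovers the original probability.
for p in [0.1, 0.5, 0.8]:
    print(p, round(sigmoid(logit(p)), 6))
```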


Representation Used for Logistic Regression

  p = 1 / (1 + e^-(B0 + B1·x1 + … + Bk·xk))

where:

• p = probability that y=1 given the values of the input features x.
• x1, x2, .., xk = set of input features.
• B0, B1, .., Bk = parameter values to be estimated via the maximum
likelihood method.
• Each coefficient Bi is interpreted as the change in the log-odds
for a unit change in the input feature it is associated with.
• Bt = vector of coefficients
• X = vector of input features
• Estimating the values of B0, B1, .., Bk involves the concepts of
probability, odds and log odds.


Example
• Consider a dataset of male and female students, split by
whether they secure honours. (Of the 109 females, 32 secure
honours and 77 do not.)

Probability:

• The probability of an event is the number of instances of
that event divided by the total number of instances
present.
• Thus, the probability of females securing honours:

• 32/109 = 0.29

Odds:

• The odds of an event are the probability of that event
occurring (probability that y=1) divided by the probability
that it does not occur.
• Thus, the odds of females securing honours:

Odds:
• 32/77 = 0.42
• This is interpreted as:

• 32/77 => For every 32 females that secure honours, there
are 77 females that do not secure honours.
• 32/77 => There are 32 females that secure honours for
every 109 (i.e. 32+77) females.


Log odds:

• The logit or log-odds of an event is the log of the odds.
This refers to the natural log (base e).

• Thus, the log-odds of females securing honours:

• log(0.42) ≈ -0.88



• Q: Find the odds ratio of graduating with honours for


females and males.

Calculations for probability:

  p = e^(B0 + B1·x) / (1 + e^(B0 + B1·x))

where:

• B0, B1, .., Bk are interpreted as the change in log-odds for a
unit change in the associated input feature.
• As B0 is the coefficient not associated with any input feature,
B0 = log-odds of the reference level, x = 0 (i.e. x = male):
• B0 = log[odds(male graduating with honours)]
• As B1 is the coefficient of the input feature 'female',
B1 = the change in log-odds obtained with a unit change in x,
i.e. the difference in log-odds between x = female and x = male.

Calculations:

• From the calculation in the section ‘odds ratio(OR)’,



• B1= log (1.82)
• B1= 0.593

• Thus, the LogR equation becomes

• y= -1.47 + 0.593* female


• where the value of female is substituted as 0 or 1 for male
and female respectively.

• Now, let us try to find the probability of a female securing
honours when there is only one input feature present: 'female'.

• Substitute female = 1 in: y = -1.47 + 0.593*female

• Thus, y = log[odds(female)] = -1.47 + 0.593*1 = -0.877

• As log-odds = -0.877,
• odds = e^(Bt·X) = e^(-0.877) = 0.416
• And the probability is calculated as:
• p = odds/(1 + odds) = 0.416/1.416 = 0.29
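The whole probability → odds → log-odds → probability round trip from this worked example can be checked directly. Converting odds back via odds/(1 + odds) recovers the probability exactly, since (32/77)/(109/77) = 32/109:

```python
import math

# Worked example from the slides: of 109 females, 32 secure honours
# and 77 do not.
p = 32 / 109                 # probability ≈ 0.29
odds = 32 / 77               # odds ≈ 0.42
log_odds = math.log(odds)    # log-odds ≈ -0.88

# Converting back from odds to probability recovers p exactly.
p_back = odds / (1 + odds)
print(round(p, 2), round(odds, 2), round(log_odds, 2), round(p_back, 2))
# 0.29 0.42 -0.88 0.29
```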
Estimated Regression Equation
Example:
Consider the following training examples:
Marks scored: X = [81 42 61 59 78 49]
Grade (Pass/Fail): Y = [Pass Fail Pass Fail Pass Fail]
Assume we want to model the probability of Y in the form

  p(x) = e^(β0 + β1x) / (1 + e^(β0 + β1x))

which is parameterized by (β0, β1).
(i) Which of the following parameters would you use to model p(x)?
(a) (-119, 2) (b) (-120, 2) (c) (-121, 2)
(ii) With the chosen parameters, what should be the minimum mark to ensure the
student gets a 'Pass' grade with 95% probability?

 Among the three, the maximum likelihood value is for β0 = -120, β1 = 2.
 Therefore, we use these values to model p(x).
 Substituting p(x) = 0.95, β0 = -120 and β1 = 2, we get
   xmin = 61.47
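Both parts of the example can be verified numerically: part (i) by comparing the likelihood of the training labels under each candidate parameter pair, and part (ii) by inverting the logistic function at p(x) = 0.95:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

X = [81, 42, 61, 59, 78, 49]
Y = [1, 0, 1, 0, 1, 0]          # Pass = 1, Fail = 0

def likelihood(b0, b1):
    """Product over examples of p(x) for passes and 1 - p(x) for fails."""
    L = 1.0
    for x, y in zip(X, Y):
        p = sigmoid(b0 + b1 * x)
        L *= p if y == 1 else (1 - p)
    return L

# (i) Compare the three candidate parameter pairs.
candidates = [(-119, 2), (-120, 2), (-121, 2)]
best = max(candidates, key=lambda b: likelihood(*b))
print(best)                     # (-120, 2)

# (ii) Solve p(x) = 0.95, i.e. b0 + b1*x = log(0.95/0.05).
b0, b1 = best
x_min = (math.log(0.95 / 0.05) - b0) / b1
print(round(x_min, 2))          # 61.47
```

The same script, with X and Y swapped for the new marks, answers the Problem below.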
Problem:
Consider the following training examples:
Marks scored: X = [75 40 64 53 82 45]
Grade (Pass/Fail): Y = [Pass Fail Pass Fail Pass Fail]
Assume we want to model the probability of Y in the form

  p(x) = e^(β0 + β1x) / (1 + e^(β0 + β1x))

which is parameterized by (β0, β1).
(i) Which of the following parameters would you use to model p(x)?
(a) (-119, 2) (b) (-120, 2) (c) (-121, 2)
(ii) With the chosen parameters, what should be the minimum mark to
ensure the student gets a 'Pass' grade with 95% probability?
