
CLASSIFICATION METHODS

Dr. Trilok Nath Pandey
SCOPE, VIT, Chennai

LOGISTIC REGRESSION

Outline
Cases:
 Orange Juice Brand Preference
 Credit Card Default Data
Why Not Linear Regression?
Simple Logistic Regression
 Logistic Function
 Interpreting the coefficients
 Making Predictions
 Adding Qualitative Predictors
Multiple Logistic Regression

Case 1: Brand Preference for Orange Juice

• We would like to predict what customers prefer to buy:
Citrus Hill or Minute Maid orange juice? (OJ data in ISLR)

• The Y (Purchase) variable is categorical: 0 or 1

• The X (LoyalCH) variable is a numerical value (between 0
and 1) which specifies how loyal the customer is to
Citrus Hill (CH) orange juice

• Can we use Linear Regression when Y is categorical?



Why not Linear Regression?

When Y only takes on values of 0 and 1, why is
standard linear regression inappropriate?

[Figure: linear regression fit of Purchase (0/1) against LoyalCH.
How do we interpret fitted values greater than 1? How do we
interpret fitted values of Y between 0 and 1?]

Problems
• The regression line β0 + β1X can take on any value
between negative and positive infinity.

• In the orange juice classification problem, Y can only take
on two possible values: 0 or 1.

• Therefore the regression line almost always predicts the
wrong value for Y in classification problems.

Solution: Use Logistic Function


• Instead of trying to predict Y, let’s try to predict P(Y = 1),
i.e., the probability a customer buys Citrus Hill (CH) juice.
• Thus, we can model P(Y = 1) using a function that gives
outputs between 0 and 1.
• We can use the logistic function:

  p = P(Y = 1) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

• Logistic Regression!

[Figure: plot of the logistic function for X between -5 and 5;
all outputs lie between 0 and 1.]

Logistic Regression
• Logistic regression is very similar to linear regression.

• We come up with b0 and b1 to estimate β0 and β1.

• We have similar problems and questions as in linear
regression, e.g. Is β1 equal to 0? How sure are we about
our guesses for β0 and β1?

[Figure: logistic fit of P(Purchase) against LoyalCH, with MM
labelled near the top of the curve and CH near the bottom.]

If LoyalCH is about 0.6 then Pr(CH) ≈ 0.7.



Case 2: Credit Card Default Data

We would like to be able to predict which customers are
likely to default.

Possible X variables are:
 Annual Income
 Monthly credit card balance

The Y variable (Default) is categorical: Yes or No

How do we check the relationship between Y and X?


The Default Dataset



Why not Linear Regression?

If we fit a linear regression to the Default data, then for
very low balances we predict a negative probability, and for
high balances we predict a probability above 1!

When Balance < 500, Pr(default) is negative!

Logistic Function on Default Data

• Now the probability of default is close to, but not less
than, zero for low balances, and close to, but not above,
1 for high balances.

Interpreting 1
• Interpreting what 1 means is not very easy with logistic
regression, simply because we are predicting P(Y) and
not Y.
• If 1 =0, this means that there is no relationship between
Y and X.
• If 1 >0, this means that when X gets larger so does the
probability that Y = 1.
• If 1 <0, this means that when X gets larger, the
probability that Y = 1 gets smaller.
• But how much bigger or smaller depends on where we
are on the slope

Are the coefficients significant?

• We still want to perform a hypothesis test to see whether
we can be sure that β0 and β1 are significantly different
from zero.
• We use a Z test instead of a t test, but that doesn't
change the way we interpret the p-value.
• Here the p-value for balance is very small, and b1 is
positive, so we are confident that if the balance increases,
the probability of default will increase as well.

Making Predictions
• Suppose an individual has an average balance of $1000.
What is their probability of default?

• The predicted probability of default for an individual with a
balance of $1000 is less than 1%.

• For a balance of $2000, the probability is much higher,
and is equal to 0.586 (58.6%).
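These predictions can be reproduced with the fitted logistic function. A minimal sketch, assuming the coefficient values reported for the Default data in ISLR (β0 ≈ -10.6513, β1 ≈ 0.0055); treat them as given rather than refit here:

```python
import math

# Coefficients for the Default data (values as reported in ISLR;
# assumed here, not refit from the raw data).
b0, b1 = -10.6513, 0.0055

def p_default(balance):
    """Predicted probability of default: e^(b0 + b1*X) / (1 + e^(b0 + b1*X))."""
    z = b0 + b1 * balance
    return math.exp(z) / (1 + math.exp(z))

print(round(p_default(1000), 5))  # under 1%
print(round(p_default(2000), 3))  # 0.586
```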

Qualitative Predictors in Logistic Regression

• We can predict whether an individual defaults by checking
whether she is a student or not. Thus we can use a
qualitative variable "Student" coded as (Student = 1,
Non-student = 0).
• b1 is positive: this indicates that students tend to have
higher default probabilities than non-students.

Multiple Logistic Regression

• We can fit a multiple logistic regression just like a regular
multiple regression.

Multiple Logistic Regression- Default Data


• Predict Default using:
• Balance (quantitative)
• Income (quantitative)
• Student (qualitative)

Predictions
• A student with a credit card balance of $1,500 and an
income of $40,000 has an estimated probability of default
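This prediction can be sketched with the multiple-regression coefficients reported for the Default data in ISLR (intercept ≈ -10.869, balance ≈ 0.00574, income ≈ 0.003 with income in $1000s, student ≈ -0.6468); these values are assumed from that source, not refit here:

```python
import math

# Multiple logistic regression coefficients for the Default data
# (assumed from ISLR; income is measured in thousands of dollars).
b0, b_balance, b_income, b_student = -10.869, 0.00574, 0.003, -0.6468

def p_default(balance, income, student):
    z = b0 + b_balance * balance + b_income * income + b_student * student
    return 1 / (1 + math.exp(-z))

# Student with a balance of $1,500 and income of $40,000:
print(round(p_default(1500, 40, 1), 3))  # 0.058
# Non-student with the same balance and income:
print(round(p_default(1500, 40, 0), 3))  # 0.105
```

Note that the student coefficient is negative here, which sets up the "apparent contradiction" discussed on the next slides.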

An Apparent Contradiction!

[Tables: the coefficient of Student is positive in the
single-predictor model but negative in the multiple
regression model.]

Students (Orange) vs. Non-students (Blue)

To whom should credit be offered?

• A student is riskier than a non-student if no information
about the credit card balance is available.

• However, that student is less risky than a non-student with
the same credit card balance!

Logistic Regression in Machine Learning


• Logistic regression is a Supervised Learning technique. It is
used for predicting a categorical dependent variable from a
given set of independent variables.
• Logistic regression predicts the output of a categorical
dependent variable. Therefore the outcome must be a
categorical or discrete value: Yes or No, 0 or 1, True or
False, etc. But instead of giving an exact value of 0 or 1,
it gives probabilistic values which lie between 0 and 1.
• Logistic Regression is very similar to Linear Regression,
except in how they are used: Linear Regression is used for
solving regression problems, whereas Logistic Regression is
used for solving classification problems.
• In Logistic Regression, instead of fitting a straight
regression line, we fit an "S"-shaped logistic function.
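Fitting that S-shaped function can be sketched with a minimal gradient-descent routine on a toy one-dimensional dataset (all data and names below are made up for illustration, not from the slides):

```python
import math

# Toy 1-D dataset: class 0 at low x, class 1 at high x.
X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
Y = [0,   0,   0,   0,   1,   1,   1,   1]

b0, b1 = 0.0, 0.0
lr = 0.5
for _ in range(5000):
    g0 = g1 = 0.0
    for x, y in zip(X, Y):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))
        g0 += (p - y)            # gradient of the negative log-likelihood
        g1 += (p - y) * x
    b0 -= lr * g0 / len(X)
    b1 -= lr * g1 / len(X)

# The fitted S-curve gives low probability for low x, high for high x.
p_low  = 1 / (1 + math.exp(-(b0 + b1 * 0.5)))
p_high = 1 / (1 + math.exp(-(b0 + b1 * 3.0)))
print(p_low < 0.5 < p_high)  # True
```

In practice one would use a library fit (e.g. scikit-learn or statsmodels); the loop above only illustrates the idea.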

Logistic Regression in Machine Learning

• Logistic Function (Sigmoid Function):


• The sigmoid function is a mathematical function used to
map the predicted values to probabilities.
• It maps any real value into another value within a range of
0 and 1.
• The S-form curve is called the Sigmoid function or the
logistic function.

Logistic Regression in Machine Learning

• We use the concept of a threshold value, which splits the
predicted probabilities into classes 0 and 1: values above
the threshold map to 1, and values below the threshold
map to 0.
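The thresholding step is just a comparison; a minimal sketch (the default cut-off of 0.5 is the usual convention, not something fixed by the slides):

```python
def classify(p, threshold=0.5):
    """Map a predicted probability to a class label using a threshold."""
    return 1 if p >= threshold else 0

print([classify(p) for p in [0.1, 0.49, 0.5, 0.92]])  # [0, 0, 1, 1]
```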

Logistic Regression in Machine Learning

• It's an S-shaped curve that can take any real-valued
number and map it into a value between 0 and 1:

  1 / (1 + e^(-value))

• Here e is the base of the natural logarithms (Euler's
number, or the EXP() function) and value is the actual
numerical value that you want to transform. Below is a
plot of the numbers between -5 and 5 transformed into the
range 0 and 1 using the logistic function.
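The transformation of the values between -5 and 5 can be computed directly:

```python
import math

def sigmoid(value):
    """Logistic function: 1 / (1 + e^(-value))."""
    return 1 / (1 + math.exp(-value))

for v in range(-5, 6):
    print(v, round(sigmoid(v), 3))
# Outputs rise monotonically from about 0.007 at -5,
# through 0.5 at 0, to about 0.993 at 5.
```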


Logistic Regression in Machine Learning

• The odds equal the probability that Y=1 divided by the probability that Y=0.

• For example, if the probability that Y=1 is 0.8, then the probability that Y=0
is 1 - 0.8 = 0.2.

• Odds = P(Y=1)/P(Y=0) = 0.8/0.2 = 4



Logistic Regression in Machine Learning


• Logistic Regression uses logit() to classify the
outcomes.
• If the probability of an event occurring is P, the
probability that it will not occur is (1-P).
• Odds = P/(1-P)
• Taking the log of the odds gives us:
• Log of Odds = log(P/(1-P))
• This is the logit function.
• Probability ranges from 0 to 1.
• Odds range from 0 to ∞.
• Log odds range from −∞ to ∞.


Logistic Regression in Machine Learning


• The inverse of the logit function is the sigmoid
function.
• That is, if you have a probability p,
  sigmoid(logit(p)) = p.
• The sigmoid function maps arbitrary real values back
to the range [0, 1].
• Sigmoid function:
• Sigmoid is a mathematical function that takes any real
number and maps it to a probability between 0 and 1.
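The inverse relationship sigmoid(logit(p)) = p can be checked numerically:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def logit(p):
    """Log-odds: log(p / (1 - p))."""
    return math.log(p / (1 - p))

# sigmoid undoes logit: applying both recovers the original probability.
for p in [0.1, 0.5, 0.8]:
    print(p, round(sigmoid(logit(p)), 6))
```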


Representation Used for Logistic Regression

  p = 1 / (1 + e^-(B0 + B1·x1 + … + Bk·xk))

where:

• p = probability that y=1 given the values of the input features x.
• x1, x2, .., xk = set of input features.
• B0, B1, .., Bk = parameter values to be estimated via the maximum
likelihood method.
• Each coefficient Bi is interpreted as the change in the log-odds
for a unit change in the input feature it is associated with.
• Bt = vector of coefficients
• X = vector of input features
• Estimating the values of B0, B1, .., Bk involves the concepts of
probability, odds and log odds.


Example
• Consider a dataset of male and female students, split by
whether they secure honours. (Of the 109 females, 32 secure
honours and 77 do not.)

Probability:

• The probability of an event is the number of instances of
that event divided by the total number of instances
present.
• Thus, the probability of females securing honours:

• 32/109 = 0.29

Odds:

• The odds of an event are the probability of that event
occurring (probability that y=1) divided by the probability
that it does not occur.
• Thus, the odds of females securing honours:

Odds:
• 32/77 = 0.42
• This is interpreted as:

• 32/77 => For every 32 females that secure honours, there
are 77 females that do not secure honours.
• 32/77 => There are 32 females that secure honours for
every 109 (i.e. 32+77) females.


Log odds:

• The logit or log-odds of an event is the log of the odds.
This refers to the natural log (base e).

• Thus, the log-odds of females securing honours:

• log(0.42) ≈ -0.88



• Q: Find the odds ratio of graduating with honours for


females and males.

Calculations for probability:

  p = e^(B0 + B1·x) / (1 + e^(B0 + B1·x))

where:

• B0, B1, .., Bk are interpreted as the change in log-odds for a
unit change in the associated input feature.
• As B0 is the coefficient not associated with any input feature,
B0 = log-odds of the reference level, x = 0 (i.e. x = male):
• B0 = log[odds(male graduating with honours)]
• As B1 is the coefficient of the input feature 'female',
B1 = the change in log-odds obtained with a unit change in x,
i.e. the difference in log-odds between x = female and x = male.

Calculations:

• From the calculation in the section ‘odds ratio(OR)’,



• B1= log (1.82)
• B1= 0.593

• Thus, the LogR equation becomes

• y= -1.47 + 0.593* female


• where the value of female is substituted as 0 or 1 for male
and female respectively.

• Now, let us try to find the probability of a female securing
honours when there is only one input feature present: 'female'.

• Substitute female = 1 in: y = -1.47 + 0.593*female

• Thus, y = log[odds(female)] = -1.47 + 0.593*1 = -0.877

• As log-odds = -0.877,
• odds = e^(Bt·X) = e^(-0.877) = 0.416
• And the probability is calculated as:
• p = odds/(1 + odds) = 0.416/1.416 = 0.29
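The whole probability → odds → log-odds → probability round trip from this worked example can be checked directly. Converting odds back via odds/(1 + odds) recovers the probability exactly, since (32/77)/(109/77) = 32/109:

```python
import math

# Worked example from the slides: of 109 females, 32 secure honours
# and 77 do not.
p = 32 / 109                 # probability ≈ 0.29
odds = 32 / 77               # odds ≈ 0.42
log_odds = math.log(odds)    # log-odds ≈ -0.88

# Converting back from odds to probability recovers p exactly.
p_back = odds / (1 + odds)
print(round(p, 2), round(odds, 2), round(log_odds, 2), round(p_back, 2))
# 0.29 0.42 -0.88 0.29
```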
Estimated Regression Equation
Example:
Consider the following training examples:
Marks scored: X = [81 42 61 59 78 49]
Grade (Pass/Fail): Y = [Pass Fail Pass Fail Pass Fail]
Assume we want to model the probability of Y in the form

  p(x) = e^(β0 + β1x) / (1 + e^(β0 + β1x))

which is parameterized by (β0, β1).
(i) Which of the following parameters would you use to model p(x)?
(a) (-119, 2) (b) (-120, 2) (c) (-121, 2)
(ii) With the chosen parameters, what should be the minimum mark to ensure the
student gets a 'Pass' grade with 95% probability?

 Among the three, the maximum likelihood value is for β0 = -120, β1 = 2.
 Therefore, we use these values to model p(x).
 Substituting p(x) = 0.95, β0 = -120 and β1 = 2, we get
   xmin = 61.47
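Both parts of the example can be verified numerically: part (i) by comparing the likelihood of the training labels under each candidate parameter pair, and part (ii) by inverting the logistic function at p(x) = 0.95:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

X = [81, 42, 61, 59, 78, 49]
Y = [1, 0, 1, 0, 1, 0]          # Pass = 1, Fail = 0

def likelihood(b0, b1):
    """Product over examples of p(x) for passes and 1 - p(x) for fails."""
    L = 1.0
    for x, y in zip(X, Y):
        p = sigmoid(b0 + b1 * x)
        L *= p if y == 1 else (1 - p)
    return L

# (i) Compare the three candidate parameter pairs.
candidates = [(-119, 2), (-120, 2), (-121, 2)]
best = max(candidates, key=lambda b: likelihood(*b))
print(best)                     # (-120, 2)

# (ii) Solve p(x) = 0.95, i.e. b0 + b1*x = log(0.95/0.05).
b0, b1 = best
x_min = (math.log(0.95 / 0.05) - b0) / b1
print(round(x_min, 2))          # 61.47
```

The same script, with X and Y swapped for the new marks, answers the Problem below.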
Problem:
Consider the following training examples:
Marks scored: X = [75 40 64 53 82 45]
Grade (Pass/Fail): Y = [Pass Fail Pass Fail Pass Fail]
Assume we want to model the probability of Y in the form

  p(x) = e^(β0 + β1x) / (1 + e^(β0 + β1x))

which is parameterized by (β0, β1).
(i) Which of the following parameters would you use to model p(x)?
(a) (-119, 2) (b) (-120, 2) (c) (-121, 2)
(ii) With the chosen parameters, what should be the minimum mark to
ensure the student gets a 'Pass' grade with 95% probability?
