Marketing Analytics (MBA)
CHAPTER 17: LOGISTIC REGRESSION
To IIEBM (Marketing)
Batch 2020-21 (Semester 3)
Credenca Data Solutions Pvt. Ltd.
Office # 301, Tower S4,
Magarpatta Cybercity, PUNE 411013
+91 88 05 99 2525 (Mob)
+91 20 25363592 (Landline)
Linear Regression
[Figure: scatter plot of Marks vs. Attendance, one point per student, with a fitted line]
• Calculate R² and determine if Marks and Attendance are correlated. Large values imply a large effect.
• Calculate a p-value to determine whether the R² value is statistically significant.
• Use the fitted line to predict Marks given Attendance (sketched below).
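A minimal sketch of these three steps in Python using scipy.stats.linregress; the attendance/marks numbers are made-up placeholders, not course data:

```python
import numpy as np
from scipy import stats

# Illustrative (made-up) data: attendance (%) and marks for a few students
attendance = np.array([60, 65, 70, 75, 80, 85, 90, 95])
marks      = np.array([48, 55, 52, 63, 70, 74, 80, 88])

fit = stats.linregress(attendance, marks)

print(f"R^2     = {fit.rvalue**2:.3f}")   # effect size
print(f"p-value = {fit.pvalue:.4f}")      # is the fit statistically significant?
print(f"Predicted marks at 72% attendance: {fit.intercept + fit.slope*72:.1f}")
```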
Multiple Regression
[Figure: Marks plotted against Attendance and Hours Spent, with a fitted plane]
Multiple regression allows us to:
• Calculate R²
• Calculate a p-value
• Use the fitted plane to predict Marks given Attendance and Hours in Self Study
Compare Between Two Models
Comparing the two models tells us whether we need to measure both Attendance and Hours Spent to predict Marks, or whether we can get away with just Attendance (see the sketch below).
[Figure: side-by-side fits of the simple model (Attendance only) and the fuller model (Attendance and Hours Spent)]
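A sketch of both ideas in Python, fitting the simple and the fuller model with numpy.linalg.lstsq and comparing their R² values. The data are made-up placeholders, and a real comparison would also look at p-values (e.g. an F-test):

```python
import numpy as np

# Made-up illustrative data, not the course data
attendance = np.array([60, 65, 70, 75, 80, 85, 90, 95])
hours      = np.array([ 2,  1,  3,  2,  4,  3,  5,  6])
marks      = np.array([48, 55, 52, 63, 70, 74, 80, 88], dtype=float)

def r_squared(X, y):
    """Fit y = X @ coef by least squares and return R^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    pred = X @ coef
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

ones   = np.ones_like(marks)
simple = np.column_stack([ones, attendance])          # Marks ~ Attendance
full   = np.column_stack([ones, attendance, hours])   # Marks ~ Attendance + Hours

print(f"R^2 (Attendance only)    = {r_squared(simple, marks):.3f}")
print(f"R^2 (Attendance + Hours) = {r_squared(full, marks):.3f}")
# If the second R^2 is barely higher, we can get away with just Attendance.
```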
Logistic Regression
• Logistic regression predicts whether something is TRUE or FALSE, instead of predicting something continuous like Marks.
[Figure: students plotted against Attendance, labeled "Passed in exams" (PASS) or "Failed in exams" (FAIL)]
Logistic Regression
• Instead of fitting a line to the data, logistic regression fits an "S"-shaped "logistic function".
• The curve goes from 0 (FAIL) to 1 (PASS).
• The curve tells us the probability that a student will pass based on attendance (see the sketch below).
[Figure: S-shaped logistic curve of the probability of passing vs. Attendance]
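A sketch of the "S"-shaped curve itself; the intercept and slope here are hypothetical placeholders chosen only to make the shape visible, not fitted values:

```python
import math

def logistic(attendance, intercept=-10.0, slope=0.15):
    """Hypothetical S-shaped curve: probability of passing vs. attendance.
    The intercept and slope are made-up placeholders, not fitted values."""
    score = intercept + slope * attendance
    return 1.0 / (1.0 + math.exp(-score))

for att in (40, 60, 67, 80, 100):
    print(f"attendance {att:3d}%  ->  P(pass) = {logistic(att):.2f}")
```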
Logistic Regression
• Although logistic regression tells us the probability that a student will pass or not, it is usually used for classification.
• For example, if the probability that a student will pass is > 50%, we classify the student as passed; otherwise we classify the student as failed. (See the sketch below.)
[Figure: logistic curve with a 50% cutoff separating FAIL from PASS classifications]
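Turning those probabilities into a PASS/FAIL classification with a 50% cutoff, reusing the hypothetical logistic() from the previous sketch:

```python
def classify(attendance, cutoff=0.5):
    """Classify a student as PASS when P(pass) exceeds the cutoff."""
    return "PASS" if logistic(attendance) > cutoff else "FAIL"

print(classify(55))  # P(pass) ~ 0.15 -> FAIL
print(classify(90))  # P(pass) ~ 0.97 -> PASS
```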
Logistic Regression
• Just like with linear regression, we can make simple models: passing predicted by Attendance.
• Or complicated models: passing predicted by Attendance and Hours in Self Study.
• Logistic regression's ability to provide probabilities and classify new samples makes it a popular method.
Difference: Linear and Logistic Regression
One big difference between linear regression and logistic regression is how the line is fit to the data.
Difference: Linear and Logistic Regression
With linear regression, we fit the line using "least squares".
Difference: Linear and Logistic Regression
In other words, we find the line that minimizes the sum of the squares of the residuals (written out below).
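In symbols, with β₀ and β₁ as the intercept and slope (a standard formulation; this notation is not on the slide itself):

```latex
\min_{\beta_0,\,\beta_1}\; \sum_{i=1}^{n} \Bigl(\text{Marks}_i - (\beta_0 + \beta_1\,\text{Attendance}_i)\Bigr)^2
```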
Difference: Linear and Logistic Regression
We also use the residuals to calculate R² and to compare simple models to complicated models.
Difference: Linear and Logistic Regression
Logistic regression doesn't have the same concept of a "residual", so it can't use least squares and it can't calculate R².
Difference: Linear and Logistic Regression
Instead, it uses something called "maximum likelihood".
("Maximum likelihood" is discussed later.)
High and Low Probabilities
[Figure: logistic curve of the probability a student will pass (0 = FAIL, 1 = PASS) vs. Attendance; a student far to the right on the curve has a high probability of passing, a student far to the left has a low probability of passing]
Y-Axis Values
In linear regression, the value on the y-axis can (in theory) be any number. In logistic regression, the y-axis is confined to probability values between 0 and 1.
[Figure: left panel, Marks vs. Attendance (any value); right panel, probability a student will pass vs. Attendance (0 = FAIL to 1 = PASS)]
Log(Odds Ratio)
To solve this problem, the y-axis in logistic regression is transformed from the "probability of passing" to the log(odds of passing), so, just like the y-axis in linear regression, it can go from -infinity to +infinity.
[Figure: logistic curve with the probability y-axis running from 0 (FAIL) to 1 (PASS)]
Log(Odds Ratio)
log(odds of passing) = log(p / (1 - p))   (the "logit function")
where "p" is the probability of passing.
Log(Odds Ratio)
The center of the curve corresponds to p = 0.5:
log(0.5 / (1 - 0.5)) = log(1) = 0
Similarly, if we plug in other probability values, we convert probabilities onto this new number scale. (Some worked values follow.)
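The worked values referred to above (natural log, rounded):

```latex
\begin{aligned}
p = 0.88 &: \;\log(0.88/0.12) \approx +2 \\
p = 0.73 &: \;\log(0.73/0.27) \approx +1 \\
p = 0.50 &: \;\log(0.50/0.50) = 0 \\
p = 0.27 &: \;\log(0.27/0.73) \approx -1 \\
p = 0.12 &: \;\log(0.12/0.88) \approx -2
\end{aligned}
```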
Log(Odds Ratio)
[Figure: the transformed y-axis, log(odds of passing), with ticks at -3, -2, -1, ..., running from -infinity to +infinity]
Data for Logistic Regression
Data: predicting, based on age, the chance that a person will subscribe.
The data has the age and subscription status (1 = subscriber, 0 = nonsubscriber) for 41 people; the first 20 rows are shown below.

Person#   Age   Subscribe?
   1      20    0
   2      23    0
   3      24    0
   4      25    0
   5      25    1
   6      26    0
   7      26    0
   8      28    0
   9      28    0
  10      29    0
  11      30    0
  12      30    0
  13      30    0
  14      30    0
  15      30    0
  16      30    1
  17      32    0
  18      32    0
  19      33    0
  20      33    0
Logistic Regression Model
• Start from log(p / (1 - p)) = log(odds)
• Exponentiate both sides:                  p / (1 - p) = e^log(odds)
• Multiply both sides by (1 - p):           p = (1 - p) e^log(odds)
• Expand (1 - p) e^log(odds):               p = e^log(odds) - p e^log(odds)
• Add p e^log(odds) to both sides:          p + p e^log(odds) = e^log(odds)
• Pull p out:                               p (1 + e^log(odds)) = e^log(odds)
• Divide both sides by (1 + e^log(odds)):   p = e^log(odds) / (1 + e^log(odds))
This can also be written as p = 1 / (1 + e^(-log(odds))).
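A quick numeric check of the final form: if log(odds) = 2, then

```latex
p = \frac{1}{1 + e^{-2}} = \frac{1}{1 + 0.1353} \approx 0.88
```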
Logistic Regression Model
In Equation 2, Ln(p / (1 - p)) = intercept + slope*Age, the left-hand side is referred to as the log odds ratio, because the odds, p / (1 - p), is the ratio of the probability of success (dependent variable = 1) to the probability of failure (dependent variable = 0).
Logistic Regression Model
• If you raise e to both sides of Equation 2 and use the fact that e^(Ln x) = x, you can rewrite Equation 2 as one of the following:
  p = e^(intercept + slope*Age) / (1 + e^(intercept + slope*Age))
  p = 1 / (1 + e^(-(intercept + slope*Age)))
Equation 3 is often referred to as the logistic regression model (or sometimes the logit regression model), because the function e^x / (1 + e^x) is known as the logistic function.
Maximum Likelihood Estimate of Logistic Regression Model
• Essentially, in the magazine-subscription example, maximum likelihood estimation chooses the slope and intercept that maximize, given the age of each person, the probability (or likelihood) of the observed pattern of subscribers and nonsubscribers.
• For each observation in which the person was a subscriber, the probability that the person was a subscriber is given by Equation 4; for each observation in which the person was not a subscriber, the probability that the person was not a subscriber is given by 1 - (right side of Equation 4).
Maximum Likelihood Estimate of Logistic Regression Model
• If you choose the slope and intercept to maximize the product of these probabilities, then you are "maximizing the likelihood" of what you have observed.
• Unfortunately, the product of these probabilities proves to be a very small number, so it is convenient to maximize the natural logarithm of the product instead.
• The following equation (Equation 5) makes it easy to maximize the log likelihood:
  Log Likelihood = Σ Ln(likelihood of observation i), where the likelihood is p_i for a subscriber and 1 - p_i for a nonsubscriber.
Steps For Maximum Likelihood Estimation
• Enter trial values of the intercept and slope in D1:D2, and name D1:D2 using Create from Selection.
• Copy the formula =intercept+slope*D4 from F4 to F5:F44, to create a “score” for each observation.
• Copy the formula =EXP(F4)/(1+EXP(F4)) from G4 to G5:G44 to use Equation 4 to compute for each observation the
estimated probability that the person is a subscriber.
• Copy the formula =1-G4 from H4 to H5:H44 to compute the probability of the person not being a subscriber.
• Copy the formula =IF(E4=1,G4,1-G4) from I4 to I5:I44 to compute the likelihood of each observation.
• In I2 the formula =PRODUCT(I4:I44) computes the likelihood of the observed subscriber and nonsubscriber data. Note that this likelihood is a very small number.
• Copy the formula =LN(I4) from J4 to J5:J44, to compute the logarithm of each observation's probability.
• Use Equation 5 in cell J2 to compute the Log Likelihood with the formula =SUM(J4:J44).
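The same maximum likelihood estimation can be sketched outside Excel. Here is a minimal Python version using scipy.optimize.minimize on the negative log likelihood; the ages and outcomes below are just the first 20 rows of the table as an illustration, and running it on all 41 rows should approximately reproduce the Solver result:

```python
import numpy as np
from scipy.optimize import minimize

# First 20 rows of the subscription data (1 = subscriber, 0 = nonsubscriber);
# the full model would use all 41 observations.
age = np.array([20, 23, 24, 25, 25, 26, 26, 28, 28, 29,
                30, 30, 30, 30, 30, 30, 32, 32, 33, 33])
sub = np.array([ 0,  0,  0,  0,  1,  0,  0,  0,  0,  0,
                 0,  0,  0,  0,  0,  1,  0,  0,  0,  0])

def neg_log_likelihood(params):
    intercept, slope = params
    score = intercept + slope * age           # "score" column (F in the worksheet)
    p = np.exp(score) / (1 + np.exp(score))   # Equation 4 (G column)
    like = np.where(sub == 1, p, 1 - p)       # likelihood of each row (I column)
    return -np.sum(np.log(like))              # maximize log likelihood = minimize its negative

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="Nelder-Mead")
intercept, slope = result.x
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```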
Solver Setup
Use Excel's Solver to maximize the log likelihood in cell J2 by changing the intercept and slope in D1:D2.
Result
• Intercept: -5.661
• Slope: 0.128
Logistic Regression to Estimate Probabilities
• Predicting the chance that a 44-year-old person will subscribe:
Score = Intercept + Slope*Age
      = -5.661 + 0.128*44
      = -0.023689446   (computed with the unrounded Solver coefficients; the rounded values shown above give about -0.029)
Using Equation 4:
p = e^(-0.023689446) / (1 + e^(-0.023689446))
  = 0.494077915
So the model estimates that a 44-year-old has roughly a 49.4% chance of subscribing.
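The same calculation as a small Python helper, using the rounded coefficients from the slide:

```python
import math

def subscribe_probability(age, intercept=-5.661, slope=0.128):
    """Equation 4 with the fitted (rounded) Solver coefficients."""
    score = intercept + slope * age
    return math.exp(score) / (1 + math.exp(score))

print(f"P(subscribe | age 44) = {subscribe_probability(44):.3f}")  # ~0.49
```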