Statistics
Logistic Regression
Shaheena Bashir
FALL, 2019
Outline
- Background
- Introduction
  - Logit Transformation
  - Assumptions
- Estimation
- Example
  - Analysis
  - How Good is the Fitted Model?
- Single Categorical Predictor
- Types of Logistic Regression Models
Background
Motivating Example
Scatter Plot
[Figure: Relationship between Age & CHD; coronary heart disease (0/1) plotted against age (20-70 years). The raw 0/1 points form two horizontal bands.]
Not informative!!
Regression Model: Objective
- Describe the relationship between an outcome (dependent or response) variable and a set of independent (predictor or explanatory) variables by some regression model (equation).
- Predict some future outcome based on the regression model.
How to model the relationship of CHD with age?
Background
- What distinguishes a logistic regression model from the linear regression model is that the outcome variable is binary (or dichotomous), e.g.:
  - Whether a tumor is malignant (Yes=1) or not (No=0)
  - Whether a newborn baby has low birth weight (Yes=1) or not (No=0)
  - Whether a student gets admission at LUMS (Yes=1) or not (No=0)
For a categorical response variable, the assumption that the errors follow a normal distribution fails.
Tabular Form of CHD Data

Age Group     n    CHD Present   Proportion with CHD
20-29        10      1           0.10
30-34        15      2           0.13
35-39        12      3           0.25
40-44        15      5           0.33
45-49        13      6           0.46
50-54         8      5           0.63
55-59        17     13           0.76
60-69        10      8           0.80
Total       100     43
Proportion of Individuals with CHD
[Figure: Relationship between Age & CHD; proportion with CHD in each age group plotted against age (20-70 years), rising steadily from about 0.10 to 0.80.]
Introduction
Logistic Regression Model
- The response variable in logistic regression is categorical. The linear regression model, i.e., Y = Xβ + ε, does not work well for a few reasons:
  - The response values, 0 and 1, are arbitrary, so modeling the actual values of Y is not exactly of interest.
  - Our interest is in modeling the probability that each individual in the population responds with 0 or 1.
  - The error terms in this case do not follow a normal distribution.
Thus, we might consider modeling P, the probability, as the response variable.
Sigmoid Function
Modeling the probability as the response raises some problems:
- Although the probability generally increases with age, we know that P, like all probabilities, can only fall within the boundaries of 0 and 1.
- It is better to assume that the relationship between age and P is sigmoidal (S-shaped), rather than a straight line.
- It is possible, however, to find a linear relationship between age and a function of P. Although a number of functions work, one of the most useful is the logit function.
Logit Transformation
Logit Function
The logit function ln(p/(1-p)) (also called the log-odds) is simply the log of the ratio of P(Y = 1) to P(Y = 0):

    ln(p/(1-p)) = Xβ

The odds:

    p/(1-p) = exp(Xβ)

Solving,

    p = Pr(Y = 1|X = x) = exp(y) / [1 + exp(y)] = 1 / [1 + exp(-y)]

gives the standard logistic function, where y = Xβ.
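The pair of transformations above can be sketched in a few lines (a Python sketch; the deck's own computations use R):

```python
import math

def logit(p):
    """Log-odds: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def logistic(y):
    """Standard logistic function: p = 1 / (1 + exp(-y)), the inverse of the logit."""
    return 1.0 / (1.0 + math.exp(-y))

# Round trip: applying the logistic function to the log-odds recovers p
p = 0.8
y = logit(p)                   # ln(0.8 / 0.2) = ln 4 ≈ 1.386
print(round(logistic(y), 3))   # → 0.8
```

The round trip is the whole point: modeling is done on the unbounded logit scale, and the logistic function maps fitted values back to probabilities.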
Logit Function
g(x) = ln(p/(1-p)) has many of the desirable properties of a linear regression model:
- It may be continuous.
- It is linear in the parameters.
- It has the potential for a range between −∞ and +∞, depending on the range of x.
Summary: Logit Transformation

Quantity               Formula         min    max
Probability            p               0      1
Odds                   p/(1-p)         0      ∞
Logit or 'Log-Odds'    ln(p/(1-p))     −∞     ∞

Logit stretches the probability scale.
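A quick numeric illustration of the stretching (a Python sketch; the values follow directly from the formulas in the table): symmetric probabilities map to symmetric log-odds, and probabilities near 0 or 1 are pushed far out along the real line.

```python
import math

# Probability -> odds -> logit, for a few values of p
for p in (0.1, 0.5, 0.9, 0.99):
    odds = p / (1 - p)
    print(f"p = {p:<5} odds = {odds:8.3f}  logit = {math.log(odds):+.3f}")
```

Note the symmetry: p = 0.1 and p = 0.9 give logits of equal magnitude and opposite sign.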
Assumptions

Linear Regression           Logistic Regression
ε ∼ N(0, σ²)                Y ∼ Bin(p)
Y = Xβ + ε                  ln(p/(1-p)) = Xβ
Y|X ∼ N(Xβ, σ²)             Y|X ∼ Bin(p)
Estimation
Estimation of Parameters of Regression Model: β
- The method of maximum likelihood yields values for the unknown parameters that maximize the probability of obtaining the observed set of data.
- For logistic regression, the likelihood equations are non-linear in the parameters β and require special methods for their solution.
- These methods are iterative in nature and have been programmed into available logistic regression software.
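The iterations in question are typically Newton-Raphson (equivalently, iteratively reweighted least squares). A minimal pure-Python sketch for a single-predictor model illustrates the idea; the function name is hypothetical, and real software (R's glm, for instance) handles convergence and numerical stability far more carefully:

```python
import math

def fit_logistic(x, y, iters=25):
    """Newton-Raphson for logit(p) = b0 + b1*x (a teaching sketch,
    not production code). Each step solves a 2x2 linear system."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = pi * (1.0 - pi)      # Bernoulli variance, the IRLS weight
            g0 += yi - pi            # gradient of the log-likelihood
            g1 += (yi - pi) * xi
            h00 += w                 # Fisher information (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta_new = beta + H^{-1} * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

On data with a single binary predictor the fit reproduces the closed-form log-odds, which makes a handy sanity check.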
Example
Example: CHD Data
- Is age a risk factor for CHD? How does the probability of CHD change with age?
- Outcome variable: CHD (Yes, No)
- Predictor: Age (in years)
Logistic regression models the probability of some event occurring as a linear function of a set of predictors.
Analysis
CHD Analysis

    ln(p̂/(1-p̂)) = −5.31 + 0.11·Age

- The coefficient is interpreted as the marginal increase in the log odds of CHD when age increases by 1 year.

              Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)     -5.31       1.13       -4.68      0.00
age              0.11       0.02        4.61      0.00

OR = exp(0.11) = 1.116
The odds of getting CHD are · · · · · · when age increases by 1 year.
Fitted Values

    p = exp(β0 + β1·X) / [1 + exp(β0 + β1·X)]
      = exp(−5.31 + 0.11·Age) / [1 + exp(−5.31 + 0.11·Age)]
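The fitted values can be computed directly from this formula (a Python sketch using the fitted coefficients −5.31 and 0.11):

```python
import math

def p_chd(age, b0=-5.31, b1=0.11):
    """Fitted probability of CHD at a given age, from the model above."""
    eta = b0 + b1 * age
    return math.exp(eta) / (1.0 + math.exp(eta))

# The probability crosses 0.5 where b0 + b1*age = 0, i.e. near age 48
print(round(p_chd(30), 2), round(p_chd(48.3), 2), round(p_chd(65), 2))
```

Evaluating at a grid of ages traces out the S-shaped curve shown on the next slide.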
R Software

mod1 <- glm(chd ~ age, family = 'binomial', data = chdage)
summary(mod1)
predict(mod1, type = 'response')
anova(mod1, test = 'Chisq')
plot(mod1)
Predicted Probabilities
[Figure: predicted probabilities of CHD from the fitted model plotted against age (20-70 years); the fitted curve rises in an S-shape.]
How Good is the Fitted Model?
Analysis of Deviance

Model: binomial, link: logit
Terms added sequentially (first to last)

       Df   Deviance   Resid. Df   Resid. Dev   Pr(>Chi)
NULL                      99         136.66
Age     1    29.31        98         107.35     6.168e-08 ***

- Deviance is a measure of goodness of fit of a generalized linear model. Or rather, it's a measure of badness of fit.
- If our new model explains the data better than the null model, there should be a significant reduction in the deviance, which can be tested against the chi-square distribution to give a p-value.
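The p-value in the table can be reproduced from the deviance drop. For 1 degree of freedom, the chi-square upper tail reduces to a complementary error function, so no statistics library is needed (a Python sketch):

```python
import math

null_dev, resid_dev = 136.66, 107.35
drop = null_dev - resid_dev              # 29.31, the deviance explained by Age
# For df = 1: P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(drop / 2.0))
print(f"{p_value:.3e}")                  # ≈ 6.2e-08, matching the ANOVA table
```

The tiny p-value is the formal version of the statement above: adding Age reduces the deviance far more than chance alone would.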
Hosmer-Lemeshow Goodness of Fit
How well our model fits depends on the difference between the model and the observed data.

library(ResourceSelection)
hoslem.test(as.numeric(chdage$chd) - 1, fitted(mod1))

R Output:
Hosmer and Lemeshow goodness of fit (GOF) test
data: as.numeric(chdage$chd) - 1, fitted(mod1)
X-squared = 2.2243, df = 8, p-value = 0.9734

Our model appears to fit well because we have no significant difference between the model and the observed data (i.e., the p-value > 0.05).
Single Categorical Predictor
Simple Logistic Regression Model with a Categorical Predictor
- How some function of the probability of a categorical response is linearly related to a predictor
- Interpretation of the resulting intercept β0 & the slope β1 where the predictor variable is also binary
Case-Control Study: A Recap Example

Past exposure   CHD Cases   Controls (without disease)
Smokers            112            176
Non-smokers         88            224
Totals             200            400

Odds of CHD for Smokers = · · ·
Odds of CHD for Non-Smokers = · · ·
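For reference, the odds can be computed directly from the table counts (a Python sketch; within each exposure group, the odds are cases divided by controls):

```python
# Counts from the case-control table above
smoker_cases, smoker_controls = 112, 176
nonsmoker_cases, nonsmoker_controls = 88, 224

odds_smokers = smoker_cases / smoker_controls            # cases : controls among smokers
odds_nonsmokers = nonsmoker_cases / nonsmoker_controls   # cases : controls among non-smokers
odds_ratio = odds_smokers / odds_nonsmokers

print(round(odds_smokers, 3), round(odds_nonsmokers, 3), round(odds_ratio, 2))
# → 0.636 0.393 1.62
```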
Case-Control Study: A Recap Example Cont'd
Let yi be the binary response variable:
- yi = 1 if CHD = yes
- yi = 0 if CHD = no

Past exposure    yi    ni
Smokers         112   288
Non-smokers      88   312

Then yi ∼ Bin(ni, pi).
Let xi be the binary predictor of past smoking:
- xi = 1 if past smoker
- xi = 0 if non-smoker in the past
Case-Control Study: A Recap Example Cont'd
The probability of CHD, pi, can be modeled as:

    logit(pi) = β0 + β1·xi

- If xi = 1, then logit(pi|xi = 1) = β0 + β1·(1)
- If xi = 0, then logit(pi|xi = 0) = β0

    β1 = logit(pi|xi = 1) − logit(pi|xi = 0) = log [odds(xi = 1) / odds(xi = 0)]

∴ OR = · · · · · ·
Example: Logistic Regression

              Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)     -0.93       0.13       -7.43      0.00
pastsmoke1       0.48       0.17        2.76      0.01

- For past smokers, xi = 1, so ln(odds of CHD) = β0 + β1; ∴ Odds for smokers = · · ·
- For past non-smokers, xi = 0, so ln(odds of CHD) = β0; ∴ Odds for non-smokers = · · ·
OR = · · ·
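Exponentiating the linear predictor at xi = 0 and xi = 1 gives the two odds, and exp(β1) gives the odds ratio (a Python sketch using the rounded estimates from the output above):

```python
import math

b0, b1 = -0.93, 0.48              # rounded estimates from the fitted model

odds_nonsmokers = math.exp(b0)        # xi = 0
odds_smokers = math.exp(b0 + b1)      # xi = 1
odds_ratio = math.exp(b1)             # = odds_smokers / odds_nonsmokers

print(round(odds_nonsmokers, 2), round(odds_smokers, 2), round(odds_ratio, 2))
# → 0.39 0.64 1.62 (cf. 88/224, 112/176, and their ratio from the 2x2 table)
```

Up to coefficient rounding, these match the odds computed directly from the case-control counts.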
Types of Logistic Regression Models
- Binary Logistic Regression Model: The categorical response is dichotomous (has only two possible outcomes), e.g., an email is Spam or Not.
- Multinomial Logistic Regression Model: Three or more categories without ordering (polytomous response), e.g., predicting food choices (Veg, Non-Veg, Vegan).
- Ordinal Logistic Regression Model: Three or more categories with ordering, e.g., movie rating from 1 to 5, teaching evaluation by students, etc.