Logistic Regression
Introduction
Regression analysis and logistic regression
Concept of logistic regression
Types of logistic regression
Regression Analysis and Logistic Regression
Regression analysis
Based on the principle of Least Squares Estimation (LSE)
The parameters are chosen to minimize the sum of squared errors (SSE)
Minimizes the error in prediction
If the errors are normally distributed with constant variance, LSE estimates the parameters
accurately; that is, the model is the best possible, with the smallest standard errors
Applicable when the dependent variable follows a normal distribution
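As a minimal sketch of the LSE principle, the snippet below fits a line to hypothetical data (the x and y values are made up for illustration) by minimizing the sum of squared errors:

```python
# A minimal sketch of Least Squares Estimation (LSE) using NumPy.
import numpy as np

# Hypothetical data: y is roughly 2*x + 1 plus a small error
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])

# LSE chooses (b0, b1) to minimize the sum of squared errors ||y - X b||^2
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(b0, 2), round(b1, 2))  # 1.04 1.99
```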
Logistic regression analysis
Used when the dependent variable does not follow a normal distribution
The dependent variable may take only 2, 3, or a few more distinct outcomes
Regression Analysis and Logistic Regression
Logistic regression
Uses maximum likelihood estimation (MLE) to estimate the model parameters.
In MLE, the likelihood is the probability of the observed data set given a set of proposed values
for the parameters.
The principle of MLE is to estimate parameters by choosing the parameter values that give the
largest possible likelihood.
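The MLE principle can be sketched with a small Bernoulli example (the data and the candidate parameter values below are hypothetical):

```python
# A minimal sketch of the MLE principle for a Bernoulli outcome.

# Hypothetical observed data: 7 successes out of 10 trials
data = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]

def likelihood(p, data):
    """Probability of the observed data given a proposed parameter p."""
    out = 1.0
    for y in data:
        out *= p if y == 1 else (1 - p)
    return out

# MLE picks the candidate p with the largest likelihood;
# for 7 successes in 10 trials that is 7/10
candidates = [0.3, 0.5, 0.7, 0.9]
best = max(candidates, key=lambda p: likelihood(p, data))
print(best)  # 0.7
```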
Note
Regression analysis predicts the value of a dependent variable
Logistic regression predicts the probability of a given value of a dependent variable
Both estimate their respective model parameters
Regression Analysis and Logistic Regression
A Regression model and Logistic Regression model
An Example
Hours (xi): 0.50 0.75 1.00 1.25 1.50 1.75 1.75 2.00 2.25 2.50 2.75 3.00 3.25 3.50 4.00 4.25 4.50 4.75 5.00 5.50
Pass (yi):  0    0    0    0    0    0    1    0    1    0    1    0    1    0    1    1    1    1    1    1
[Figure: probability of passing vs. hours studied, with the two outcome categories (Pass = 1, Fail = 0)]
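The pass/fail data above can be fit by a logistic regression. The slides do not show how the fit is computed, so as a sketch, plain gradient ascent on the log-likelihood is used here (a simple stand-in for the iteratively reweighted least squares that statistical packages typically use):

```python
import math

# Hours studied (xi) and pass/fail outcome (yi) from the example above
hours = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
         2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Gradient ascent on the log-likelihood of the logistic model
b0, b1, lr = 0.0, 0.0, 0.01
for _ in range(50000):
    g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(hours, passed))
    g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(hours, passed))
    b0 += lr * g0
    b1 += lr * g1

print(round(b0, 2), round(b1, 2))  # converges to roughly b0 = -4.08, b1 = 1.50
```

With these coefficients, the estimated probability of passing after 2 hours of study is about 0.26, and after 4 hours about 0.87.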
Concept of Logistic Regression
Introduction
Developed and popularized primarily by
Joseph Berkson, who coined the term
logit in 1944.
Logistic regression is a statistical method
that uses a logistic function to model a
binary dependent variable, although many
more complex extensions exist.
Logistic regression (or logit regression)
estimates the parameters of a logistic
model.
Concept of Logistic Regression
Introduction
A binary logistic model has a
dependent variable with two
possible values, such as Pass or
Fail, Happy or Sad, etc.
It is represented by an indicator
variable, where the two values
are labeled '1' and '0'
Concept of Logistic Regression
What is logistic regression?
The logistic function takes the following form:

t = ln( px / (1 - px) )

p(x1, ..., xm) = px = e^(β0 + β1x1 + ··· + βmxm) / (1 + e^(β0 + β1x1 + ··· + βmxm))

px = e^t / (1 + e^t)
Binary regression: the case when p has two outcomes: success and failure
It defines the odds, the ratio of the probability of success to the probability of failure:

odds = px / (1 - px) = e^(β0 + β1x1 + ··· + βmxm)

The logarithm of the odds (called the logit) is

t = ln(odds) = β0 + β1x1 + ··· + βmxm
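A small numeric sketch of these relations for a single predictor, using hypothetical coefficients β0 = -3.0 and β1 = 0.5: computing the probability, then the odds, and taking the logarithm recovers the logit t.

```python
import math

# Hypothetical coefficients for a single-predictor logistic model
beta0, beta1 = -3.0, 0.5

def p_x(x):
    """Logistic function: probability of success given x."""
    t = beta0 + beta1 * x          # logit: a linear function of x
    return math.exp(t) / (1 + math.exp(t))

p = p_x(4.0)                        # here t = -3.0 + 0.5*4.0 = -1.0
odds = p / (1 - p)                  # equals e**t
logit = math.log(odds)              # recovers t
print(round(logit, 6))  # -1.0
```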
An Illustration
A sample is collected to examine the effect of a toxic substance on tumors. Each subject is
examined for the toxic content in the body and then for the presence (1) or absence (0) of
tumors. The independent variable is the concentration of the toxic substance, "Conc".
The number of subjects at each concentration (N) and the number having tumors
("Tumor") are shown in the table.
Odds and ln(odds) are also included in the table.
Conc   N    Tumor   Odds     ln(odds)
0.0    50   2       0.0417   -3.18
2.1    54   5       0.1020   -2.28
5.4    46   5       0.1220   -2.10
8.0    51   10      0.2439   -1.41
15.0   50   40      4.0000   +1.39
19.5   52   42      4.2000   +1.44

Here odds = Tumor / (N - Tumor) and t = ln( p / (1 - p) ) = ln(odds).
Here, we find a relation between t
(ln(odds)) and x (Conc):
t = β0 + β1 x
Thus, the logit is a linear function of x
For these data, the parameters can be calculated as
β0 = -3.204 and β1 = 0.2628
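Since the logit is linear in Conc, β0 and β1 can be approximated by an ordinary least-squares fit of ln(odds) on Conc. The slide's values (β0 = -3.204, β1 = 0.2628) presumably come from a maximum-likelihood fit on the raw counts, so this simple unweighted sketch gives similar but not identical numbers:

```python
# Unweighted least-squares fit of ln(odds) on Conc; ln(odds) values
# are recomputed from the odds column of the table.
conc = [0.0, 2.1, 5.4, 8.0, 15.0, 19.5]
ln_odds = [-3.18, -2.28, -2.10, -1.41, 1.39, 1.44]

n = len(conc)
mean_x = sum(conc) / n
mean_y = sum(ln_odds) / n
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, ln_odds))
      / sum((x - mean_x) ** 2 for x in conc))
b0 = mean_y - b1 * mean_x
print(round(b0, 3), round(b1, 4))  # roughly -3.140 and 0.2540
```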
For example, a subject exposed to a
concentration of 10 has an estimated
probability of tumor of

t = -3.204 + 0.2628 × 10 = -0.576

p10 = e^t / (1 + e^t) = 0.36
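The estimate above can be checked numerically with the fitted coefficients from the slide:

```python
import math

# Probability of tumor at Conc = 10, using the slide's fitted coefficients
beta0, beta1 = -3.204, 0.2628
t = beta0 + beta1 * 10              # logit at x = 10
p10 = math.exp(t) / (1 + math.exp(t))
print(round(p10, 2))  # 0.36
```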
Concept of Logistic Regression
Logit in Logistic Regression
The log-odds (the logarithm of the odds) is a linear combination of
one or more independent variables ("predictors").
The independent variables can each be a continuous variable (taking
any real value).
The predicted probability varies between 0 (such as certainly false)
and 1 (such as certainly true).