BINARY LOGISTIC REGRESSION
Linear Regression is defined by the statement:

$$Y_i \sim N(\beta_1 + \beta_2 X_{i2} + \dots + \beta_k X_{ik},\ \sigma^2)$$

or

$$Y_i \sim N\Big(\sum_{j=1}^{k} \beta_j X_{ij},\ \sigma^2\Big), \quad i = 1,2,\dots,n, \quad j = 1,2,\dots,k, \quad X_{i1} = 1\ \forall\, i.$$
In BINARY LOGISTIC REGRESSION, Y assumes the values 0 and 1 and is therefore a Bernoulli random variable; the explanatory variables can be discrete or continuous but are treated as fixed.
The basic form of Logistic regression can be derived using Bayes’ rule. Assume that k = 2, so that there is one non-trivial explanatory variable X and a constant term. Then

$$
\begin{aligned}
P(Y=0 \mid X) &= \frac{P(Y=0)\,P(X \mid Y=0)}{P(Y=0)\,P(X \mid Y=0) + P(Y=1)\,P(X \mid Y=1)} \\
&= \frac{1}{1 + \dfrac{P(Y=1)\,P(X \mid Y=1)}{P(Y=0)\,P(X \mid Y=0)}} \\
&= \frac{1}{1 + \exp\left( \log \dfrac{P(Y=1)\,P(X \mid Y=1)}{P(Y=0)\,P(X \mid Y=0)} \right)} \\
&= \frac{1}{1 + \exp\left( \log \dfrac{P(Y=1)}{P(Y=0)} + \log \dfrac{P(X \mid Y=1)}{P(X \mid Y=0)} \right)} \\
&= \frac{1}{1 + \exp(\beta_1 + \beta_2 X)}, \qquad (1)
\end{aligned}
$$

where

$$\beta_1 = \log \frac{P(Y=1)}{P(Y=0)}$$

and

$$\beta_2 X = \log \frac{P(X \mid Y=1)}{P(X \mid Y=0)},$$

if X is discrete. Also, (1) implies

$$P(Y=1 \mid X) = \frac{\exp(\beta_1 + \beta_2 X)}{1 + \exp(\beta_1 + \beta_2 X)}.$$
If X is continuous, (1) holds with the density f(·) in place of P. In other words,

$$P(Y_i = 1 \mid X_{i1}) = F(\beta_1 + \beta_2 X_{i1}),$$

where

$$F(x) = \frac{\exp(x)}{1 + \exp(x)}.$$

The conditional probability function is:

$$
f(y \mid X_{i1}) = P(Y_i = y \mid X_{i1})
= \big(F(\beta_1 + \beta_2 X_{i1})\big)^{y}\,\big(1 - F(\beta_1 + \beta_2 X_{i1})\big)^{1-y}
= \begin{cases} F(\beta_1 + \beta_2 X_{i1}) & \text{if } y = 1 \\ 1 - F(\beta_1 + \beta_2 X_{i1}) & \text{if } y = 0. \end{cases}
$$
Thus, the logistic regression model is:

$$Y_i \mid X_{i1} \sim \text{Bernoulli}\left( \frac{\exp(\beta_1 + \beta_2 X_{i1})}{1 + \exp(\beta_1 + \beta_2 X_{i1})} \right)$$

or

$$\pi_i = P(Y_i = 1 \mid X_{i1}) = \frac{\exp(\beta_1 + \beta_2 X_{i1})}{1 + \exp(\beta_1 + \beta_2 X_{i1})}$$

or

$$\log \frac{\pi_i}{1 - \pi_i} = \beta_1 + \beta_2 X_{i1}$$

or

$$\operatorname{logit}(\pi_i) = \beta_1 + \beta_2 X_{i1}.$$

The term Logistic Regression derives from the fact that the function $F(x) = \dfrac{\exp(x)}{1 + \exp(x)}$ is known as the Logistic Function.
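To make the model concrete, here is a minimal Python sketch (NumPy assumed; the coefficient values and covariates are made up purely for illustration) that evaluates the logistic function F and the implied success probabilities:

```python
import numpy as np

def logistic(x):
    """Logistic function F(x) = exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))  # algebraically equivalent, numerically safer form

# Hypothetical coefficients and covariate values (illustrative only)
beta1, beta2 = -1.0, 0.5
x = np.array([0.0, 1.0, 2.0, 3.0])

pi = logistic(beta1 + beta2 * x)  # P(Y_i = 1 | X_i1 = x_i)
print(pi)
```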
ASSUMPTIONS
▪ The data $Y_1, Y_2, \dots, Y_n$ are independently distributed, i.e., cases are independent.
▪ Binary logistic regression model assumes Bernoulli distribution of the response.
▪ It does NOT assume a linear relationship between the dependent variable and the independent variables, but it does assume a linear relationship between the logit of the response and the explanatory variables: $\operatorname{logit}(\pi_i) = \beta_1 + \beta_2 X_{i1}$.
▪ The independent (explanatory) variables may even be power terms or other nonlinear transformations of the original independent variables.
▪ The homogeneity of variance does NOT need to be satisfied. In fact, it is not even possible in
many cases given the model structure.
▪ Errors need to be independent but NOT normally distributed.
▪ It uses maximum likelihood estimation (MLE) rather than ordinary least squares (OLS) to
estimate the parameters, and thus relies on large-sample approximations.
For modelling, Logistic Regression is often used to estimate probabilities as a function of the explanatory variables X and the parameters β. Often these probabilities are used to find odds, odds ratios and relative risks.
ODDS AND ODDS RATIOS
The odds is the ratio of the probability that something is true to the probability that it is not true. Thus,

$$\operatorname{Odd}(X) = \frac{P(Y_i = 1 \mid X_{i1})}{P(Y_i = 0 \mid X_{i1})} = \exp(\beta_1 + \beta_2 X_{i1}).$$
The odds ratio is the ratio of two odds for different values of $X_{i1}$, say $X_{i1} = x$ and $X_{i1} = x + \Delta x$:

$$\frac{\operatorname{Odd}(x + \Delta x)}{\operatorname{Odd}(x)} = \frac{\exp(\beta_1 + \beta_2 (x + \Delta x))}{\exp(\beta_1 + \beta_2 x)} = \exp(\beta_2 \Delta x),$$

where $\Delta x$ is a small change in x.
Then,

$$
\begin{aligned}
\lim_{\Delta x \to 0} \frac{1}{\Delta x}\,\frac{\operatorname{Odd}(x + \Delta x) - \operatorname{Odd}(x)}{\operatorname{Odd}(x)}
&= \lim_{\Delta x \to 0} \frac{\exp(\beta_2 \Delta x) - 1}{\Delta x} \\
&= \beta_2 \lim_{\Delta x \to 0} \frac{\exp(\beta_2 \Delta x) - 1}{\beta_2 \Delta x} \\
&= \beta_2 \left. \frac{d \exp(u)}{du} \right|_{u=0} \\
&= \beta_2 \exp(0) \\
&= \beta_2 .
\end{aligned}
$$

Thus, $\beta_2$ may be interpreted as the relative change in the odds due to a small change $\Delta x$ in $X_{i1}$:

$$\frac{\operatorname{Odd}(x + \Delta x) - \operatorname{Odd}(x)}{\operatorname{Odd}(x)} = \frac{\operatorname{Odd}(x + \Delta x)}{\operatorname{Odd}(x)} - 1 \approx \beta_2 \Delta x.$$
If $X_{i1}$ is a binary variable itself, $X_{i1} = 0$ or $X_{i1} = 1$, then the only reasonable choices for $x + \Delta x$ and $x$ are 1 and 0, respectively, so that

$$\frac{\operatorname{Odd}(1)}{\operatorname{Odd}(0)} - 1 = \frac{\operatorname{Odd}(1) - \operatorname{Odd}(0)}{\operatorname{Odd}(0)} = \exp(\beta_2) - 1.$$

Only if $\beta_2$ is small may we use the approximation $\exp(\beta_2) - 1 \approx \beta_2$. If not, one has to interpret $\beta_2$ in terms of the log of the odds ratio involved:

$$\log \frac{\operatorname{Odd}(1)}{\operatorname{Odd}(0)} = \beta_2 .$$
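A minimal numerical sketch of these odds and odds-ratio calculations in Python (the fitted coefficients below are assumed values, purely for illustration):

```python
import numpy as np

# Hypothetical fitted coefficients (illustrative only)
beta1, beta2 = -0.8, 0.4

def odds(x):
    """Odds of Y = 1 at covariate value x: exp(beta1 + beta2 * x)."""
    return np.exp(beta1 + beta2 * x)

# Odds ratio for a unit change (e.g., binary X: x = 0 versus x = 1)
print(odds(1.0) / odds(0.0))              # equals exp(beta2)

# Relative change in the odds; close to beta2 only when beta2 is small
print(odds(1.0) / odds(0.0) - 1.0, beta2) # exp(beta2) - 1 vs. beta2
```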
GENERALIZATION
If $k > 2$ and the $X_{ij}$ are independent,

$$\log \frac{P(X \mid Y=1)}{P(X \mid Y=0)} = \sum_{j=2}^{k} \log \frac{P(X_{ij} \mid Y_i = 1)}{P(X_{ij} \mid Y_i = 0)}.$$

Setting

$$\beta_j X_{ij} = \log \frac{P(X_{ij} \mid Y_i = 1)}{P(X_{ij} \mid Y_i = 0)},$$

one can extend the model and obtain the general logistic regression model

$$Y_i \mid X_{ij} \sim \text{Bernoulli}\left( \frac{\exp\big(\beta_1 + \sum_{j=2}^{k} \beta_j X_{ij}\big)}{1 + \exp\big(\beta_1 + \sum_{j=2}^{k} \beta_j X_{ij}\big)} \right).$$
Regardless of whether the X's are dichotomous, polychotomous or continuous, Logistic Regression is a way to identify the distribution of Y as a function of X and of the parameter β, just as linear regression is a way to identify the distribution of Y as a function of X and of a (different) parameter β.
The interpretation of the coefficients $\beta_j$, $j = 2,3,\dots,k$, in the logistic model is given by

$$\frac{\operatorname{Odd}\big(X_{i1}, \dots, X_{i,j-1},\ X_{ij} + \Delta X_{ij},\ X_{i,j+1}, \dots, X_{ik}\big)}{\operatorname{Odd}\big(X_{i1}, \dots, X_{i,j-1},\ X_{ij},\ X_{i,j+1}, \dots, X_{ik}\big)} - 1 \approx \beta_j \Delta X_{ij},$$

if $\Delta X_{ij}$ is small. That is, $\beta_j \Delta X_{ij}$ may be interpreted as the relative change in the odds due to a small change $\Delta X_{ij}$ in $X_{ij}$, the other explanatory variables being held fixed.
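A short vectorized sketch of the general model (NumPy assumed; the design matrix and coefficient vector are made-up illustrative values):

```python
import numpy as np

# Design matrix with a leading column of ones (X_i1 = 1 for all i);
# the remaining columns are hypothetical covariates.
X = np.array([[1.0, 0.5, 2.0],
              [1.0, 1.5, 0.3],
              [1.0, 2.5, 1.1]])
beta = np.array([-0.5, 0.8, -0.2])  # (beta_1, ..., beta_k), assumed values

eta = X @ beta                      # linear predictor: beta_1 + sum_j beta_j X_ij
pi = 1.0 / (1.0 + np.exp(-eta))     # P(Y_i = 1 | X_i)
print(pi)
```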
ESTIMATION OF PARAMETERS
Let k = 2. The parameters $\beta_1$ and $\beta_2$ are estimated using the method of maximum likelihood. The log of the likelihood function $L(\beta_1, \beta_2)$ is given as:

$$
\begin{aligned}
\log L(\beta_1, \beta_2) &= \sum_{i=1}^{n} \log f(y_i \mid X_{i1}, \beta_1, \beta_2) \\
&= \sum_{i=1}^{n} y_i \log F(\beta_1 + \beta_2 X_{i1}) + \sum_{i=1}^{n} (1 - y_i) \log\big(1 - F(\beta_1 + \beta_2 X_{i1})\big) \\
&= \sum_{i=1}^{n} y_i \log \frac{F(\beta_1 + \beta_2 X_{i1})}{1 - F(\beta_1 + \beta_2 X_{i1})} + \sum_{i=1}^{n} \log\big(1 - F(\beta_1 + \beta_2 X_{i1})\big) \\
&= \sum_{i=1}^{n} y_i (\beta_1 + \beta_2 X_{i1}) - \sum_{i=1}^{n} \log\big(1 + \exp(\beta_1 + \beta_2 X_{i1})\big).
\end{aligned}
$$

Then

$$\frac{\partial \log L(\beta_1, \beta_2)}{\partial \beta_1} = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \frac{\exp(\beta_1 + \beta_2 X_{i1})}{1 + \exp(\beta_1 + \beta_2 X_{i1})} = \sum_{i=1}^{n} (y_i - \pi_i)$$

and

$$\frac{\partial \log L(\beta_1, \beta_2)}{\partial \beta_2} = \sum_{i=1}^{n} y_i X_{i1} - \sum_{i=1}^{n} \frac{X_{i1} \exp(\beta_1 + \beta_2 X_{i1})}{1 + \exp(\beta_1 + \beta_2 X_{i1})} = \sum_{i=1}^{n} (y_i - \pi_i) X_{i1}.$$
Setting these derivatives to zero gives transcendental equations, so it is not possible to obtain closed-form solutions for $\hat\beta_1$ and $\hat\beta_2$. Newton-Raphson can be used to obtain $\hat\beta = (\hat\beta_1, \hat\beta_2)'$:

• Guess an initial value of $\hat\beta = (\hat\beta_1, \hat\beta_2)'$, say $\hat\beta^{(0)} = (\hat\beta_1^{(0)}, \hat\beta_2^{(0)})'$.

• Use

$$\hat\beta^{(t+1)} = \hat\beta^{(t)} + (-H)^{-1} \begin{pmatrix} \partial \log L(\beta_1, \beta_2) / \partial \beta_1 \\ \partial \log L(\beta_1, \beta_2) / \partial \beta_2 \end{pmatrix}_{\beta = \hat\beta^{(t)}},$$

where H is the Hessian matrix given as

$$H = \begin{pmatrix} \dfrac{\partial^2 \log L(\beta_1, \beta_2)}{\partial \beta_1^2} & \dfrac{\partial^2 \log L(\beta_1, \beta_2)}{\partial \beta_1 \partial \beta_2} \\[2ex] \dfrac{\partial^2 \log L(\beta_1, \beta_2)}{\partial \beta_1 \partial \beta_2} & \dfrac{\partial^2 \log L(\beta_1, \beta_2)}{\partial \beta_2^2} \end{pmatrix},$$

iteratively, till two consecutive values of $\hat\beta$ are approximately equal.

The estimated variance-covariance matrix of $\hat\beta$ is $(-H)^{-1}$. The diagonal elements of this matrix give the estimated variances, and their square roots the estimated standard errors, of the estimates of $\beta_1$ and $\beta_2$.

For k > 2, the result can be generalized.
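The scheme above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not production code: the data are simulated, and the starting value and tolerance are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (true coefficients chosen arbitrarily for illustration)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # X_i1 = 1, one covariate
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.2])))))

beta = np.zeros(2)                        # initial guess beta_hat^(0)
for _ in range(25):
    pi = 1 / (1 + np.exp(-X @ beta))      # pi_i at the current iterate
    grad = X.T @ (y - pi)                 # score vector d log L / d beta
    H = -(X.T * (pi * (1 - pi))) @ X      # Hessian of log L
    step = np.linalg.solve(-H, grad)      # (-H)^(-1) times the score
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:      # consecutive values approximately equal
        break

cov = np.linalg.inv(-H)                   # estimated variance-covariance matrix
se = np.sqrt(np.diag(cov))                # estimated standard errors
print(beta, se)
```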
TESTING OF HYPOTHESES
I. Testing the significance of a single regression coefficient
If the sample size is large, then under $H_0 : \beta_j = \beta_{j0}$,

$$\frac{\sqrt{n}\,(\hat\beta_j - \beta_{j0})}{\hat s_{\hat\beta_j}} \sim N(0, 1), \quad j = 1,2,\dots,k.$$

This result can be used to test whether the coefficient $\beta_j$ is zero or not, $j = 2,3,\dots,k$. The null hypothesis $H_0 : \beta_j = 0$, $j = 2,\dots,k$, is of interest since this hypothesis implies that the conditional probability $P(Y_i = 1 \mid X_{ij})$ does not depend on $X_{ij}$, $j = 2,3,\dots,k$. Under $H_0 : \beta_j = 0$,

$$\frac{\sqrt{n}\,\hat\beta_j}{\hat s_{\hat\beta_j}} \sim N(0, 1), \quad j = 2,\dots,k.$$

This statistic is called a pseudo t-value, as it is used in the same way as the t-value in linear regression, and $\hat s_{\hat\beta_j}$ is called the standard error of $\hat\beta_j$. The test statistic is also called Wald’s statistic and the corresponding test Wald’s test.
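Continuing the Newton-Raphson sketch above, a Wald test of $H_0 : \beta_j = 0$ might look as follows (SciPy's normal distribution assumed available; the `beta` and `se` values are placeholders standing in for the estimates computed earlier, whose standard errors from $(-H)^{-1}$ already carry the $1/\sqrt{n}$ factor, so no explicit $\sqrt{n}$ appears):

```python
import numpy as np
from scipy.stats import norm

# beta and se as computed in the Newton-Raphson sketch above;
# placeholder values here so the snippet runs stand-alone
beta = np.array([-0.48, 1.17])
se = np.array([0.11, 0.13])

z = beta / se                      # Wald (pseudo t) statistics for H0: beta_j = 0
p_values = 2 * norm.sf(np.abs(z))  # two-sided p-values
print(z, p_values)
```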
II. Testing the joint significance of all predictors
We are interested in testing $H_0 : \beta_2 = \beta_3 = \dots = \beta_m = 0$ ($m \le k$) against the alternative hypothesis that at least one of $\beta_2, \beta_3, \dots, \beta_m$ is not equal to zero. For this we proceed as follows:

Re-estimate the logit model using

$$\log L\big(0, 0, \dots, 0, \hat\beta_{m+1}, \hat\beta_{m+2}, \dots, \hat\beta_k\big) = \max_{\beta_{m+1}, \beta_{m+2}, \dots, \beta_k} \log L\big(0, 0, \dots, 0, \beta_{m+1}, \beta_{m+2}, \dots, \beta_k\big).$$

Then, under $H_0$,

$$LR_m = -2 \log \frac{L\big(0, 0, \dots, 0, \hat\beta_{m+1}, \hat\beta_{m+2}, \dots, \hat\beta_k\big)}{L\big(\hat\beta_1, \hat\beta_2, \dots, \hat\beta_k\big)} \sim \chi^2_{m-1}.$$

This is the LIKELIHOOD RATIO test, which is right-sided.
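A self-contained sketch of the likelihood-ratio computation (simulated data; the helper `fit_logit` just repeats the Newton-Raphson loop sketched above, and the iteration count is an arbitrary choice):

```python
import numpy as np
from scipy.stats import chi2

def fit_logit(X, y, iters=25):
    """Newton-Raphson MLE for logistic regression, as sketched above."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        pi = 1 / (1 + np.exp(-X @ beta))
        beta += np.linalg.solve((X.T * (pi * (1 - pi))) @ X, X.T @ (y - pi))
    return beta

def log_lik(X, y, beta):
    """log L = sum_i [ y_i * eta_i - log(1 + exp(eta_i)) ]."""
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

# Simulated data: intercept plus two covariates (k = 3), coefficients assumed
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([-0.3, 0.9, 0.0])))))

m = 3                             # test H0: beta_2 = beta_3 = 0 (intercept stays free)
X0 = X[:, :1]                     # restricted design: intercept only
LR = -2 * (log_lik(X0, y, fit_logit(X0, y)) - log_lik(X, y, fit_logit(X, y)))
print(LR, chi2.sf(LR, df=m - 1))  # right-sided chi-square p-value
```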
PREDICTION WITH LOGISTIC REGRESSION
From a prediction point of view, logistic regression can be used for classification, with zero and one taken as the class labels.

Suppose data of the form $(Y_i, X_{i1})$, $i = 1,2,\dots,n$, are available and estimates of the parameters have been obtained. These estimators are consistent and asymptotically normally distributed. The objective is to estimate the conditional probability of an event such as $Y_{n+1} = 1$ given $X_{n+1,1}$. This is given as:

$$\text{Est.}\,P(Y_{n+1} = 1 \mid X_{n+1,1}) = \frac{\exp(\hat\beta_1 + \hat\beta_2 X_{n+1,1})}{1 + \exp(\hat\beta_1 + \hat\beta_2 X_{n+1,1})}.$$

If the above probability is greater than one half, one is led to predict $Y_{n+1} = 1$; otherwise $Y_{n+1} = 0$ for the given $X_{n+1,1}$.
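A minimal sketch of this prediction rule (the fitted coefficients are placeholders; in practice they would come from the Newton-Raphson fit above):

```python
import numpy as np

beta1_hat, beta2_hat = -0.5, 1.2   # assumed fitted values (illustrative)
x_new = 0.7                        # new covariate value X_{n+1,1}

p_hat = 1 / (1 + np.exp(-(beta1_hat + beta2_hat * x_new)))
y_pred = 1 if p_hat > 0.5 else 0   # predict Y_{n+1} = 1 iff estimated P > 1/2
print(p_hat, y_pred)
```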