Logistic Regression: Interaction Terms
Interactions in Logistic Regression
• For linear regression with predictors X1 and X2, we saw
  that an interaction model is one in which the
  interpretation of the effect of X1 depends on the value of
  X2, and vice versa.
• Exactly the same is true for logistic regression.
• The simplest interaction model includes a predictor
  variable formed by multiplying two ordinary predictors:
logit(P(Y = 1)) = β0 + β1 × X1 + β2 × X2 + β3 × X1 × X2
• The final term, β3 × X1 × X2, is the interaction term.
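The dependence of the effect of X1 on X2 can be checked numerically. The sketch below is only an illustration: the coefficient values and the function names `logit_p` and `effect_of_x1` are assumptions, not part of the original slides.

```python
def logit_p(x1, x2, b0, b1, b2, b3):
    """Linear predictor logit(P(Y = 1)) for the interaction model."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def effect_of_x1(x2, b1, b3):
    """Change in logit(P(Y = 1)) per unit increase in X1, at a given X2."""
    return b1 + b3 * x2


# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2, b3 = -2.0, 0.5, 0.25, 0.75

for x2 in (0, 1, 2):
    # Difference in log-odds between X1 = 1 and X1 = 0 at this value of X2.
    diff = logit_p(1, x2, b0, b1, b2, b3) - logit_p(0, x2, b0, b1, b2, b3)
    print(f"X2 = {x2}: difference = {diff}, b1 + b3 * X2 = {effect_of_x1(x2, b1, b3)}")
```

With β3 = 0 the printed difference would be the same for every value of X2; with β3 ≠ 0 it changes with X2.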
We will look at the interpretation of interactions in 3 cases:
1. Interaction between two dummy variables.
2. Interaction between a dummy and a continuous variable.
3. Interaction between two continuous variables.
Interaction Between 2 Dummy Variables
• Consider a logistic model for the risk of suffering a heart
  attack over a year in terms of gender and smoking status:
logit P(Y = 1) = β0 + β1 sex + β2 smoke + β3 (sex × smoke)
• sex indicates gender (male=1, female=0).
• smoke indicates smoking status (smokes=1, does not=0).
Interpreting the Intercept
logit P(Y = 1) = β0 + β1 sex + β2 smoke + β3 (sex × smoke)
• In order to interpret β0 we need to find a situation in
  which the final three terms in the equation vanish.
• This happens when an observation corresponds to a
  female non-smoker, for then sex=0 and smoke=0:
logit P(Y = 1) = β0 + β1 × 0 + β2 × 0 + β3 (0 × 0)
               = β0
• Consequently, β0 is the log-odds in favour of a female
  non-smoker suffering a heart attack.
Interpretations of Other Quantities Involving β0
We can also give interpretations on the odds scale and on the
probability scale:
• exp(β0) is the odds in favour of a female non-smoker
  suffering a heart attack.
• exp(β0) / (1 + exp(β0)) is the probability of a female
  non-smoker suffering a heart attack.
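As a minimal sketch of these two conversions, the snippet below turns a log-odds value into odds and into a probability. The value of `beta0` is hypothetical and the function names are chosen here for illustration.

```python
import math


def odds_from_log_odds(log_odds):
    """Odds in favour of the event: exp(log-odds)."""
    return math.exp(log_odds)


def prob_from_log_odds(log_odds):
    """Probability of the event: exp(log-odds) / (1 + exp(log-odds))."""
    return math.exp(log_odds) / (1.0 + math.exp(log_odds))


beta0 = -3.0  # hypothetical intercept, not an estimate from real data

odds = odds_from_log_odds(beta0)
prob = prob_from_log_odds(beta0)
print(f"log-odds = {beta0}, odds = {odds:.4f}, probability = {prob:.4f}")
```

The probability and the odds are linked by probability = odds / (1 + odds), so either can be recovered from the other.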
Interpreting β1 and β2
logit P(Y = 1) = β0 + β1 sex + β2 smoke + β3 (sex × smoke)
• We would know how to interpret β1 if the interaction
  term were not there, since in that case we would just
  have an ordinary multivariate logistic model.
• The interaction term vanishes when an observation
  corresponds to a non-smoker, for then smoke=0:
logit P(Y = 1) = β0 + β1 × sex + β2 × 0 + β3 (sex × 0)
               = β0 + β1 × sex
• Amongst non-smokers:
logit P(Y = 1) = β0 + β1 × sex
• We know how to interpret β1 in this case, as it is a
  univariate logistic model.
• β1 is the log-odds ratio comparing males and females
  amongst non-smokers.
• exp(β1) is the odds ratio comparing males and females
  amongst non-smokers.
logit P(Y = 1) = β0 + β1 sex + β2 smoke + β3 (sex × smoke)
• To interpret β2 we need to get rid of the interaction term
  without getting rid of the β2 smoke term.
• The same argument applies as before, but now set sex=0 (female):
logit P(Y = 1) = β0 + β1 × 0 + β2 × smoke + β3 (0 × smoke)
               = β0 + β2 × smoke
• β2 is the log-odds ratio comparing smokers with
  non-smokers amongst females.
Interpreting β3
logit P(Y = 1) = β0 + β1 sex + β2 smoke + β3 (sex × smoke)
• To interpret β3, rewrite the regression equation:
logit P(Y = 1) = β0 + [β1 + β3 smoke] sex + β2 smoke
• This looks like a multivariate regression model with sex
  and smoke as predictors, where:
  • β1 + β3 smoke is the log-odds ratio for males vs. females;
  • β2 is the log-odds ratio for smokers vs. non-smokers
    (amongst females, as before).
• The log-odds ratio for males vs. females is β1 + β3 in
  smokers (smoke=1) and β1 in non-smokers (smoke=0).
• So β3 is the difference between the log-odds ratio comparing
  males vs. females in smokers and the log-odds ratio
  comparing males vs. females in non-smokers.
logit P(Y = 1) = β0 + β1 sex + β2 smoke + β3 (sex × smoke)
• We could just as well have rewritten the equation this way:
logit P(Y = 1) = β0 + β1 sex + [β2 + β3 sex] smoke
• β3 is the difference between the log-odds ratio comparing
  smokers vs. non-smokers in males and the log-odds ratio
  comparing smokers vs. non-smokers in females.
• So we have two ways of thinking about β3:
  1. either as the modification of the effect of smoke by sex,
  2. or as the modification of the effect of sex by smoke.
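On the odds scale, the two readings of β3 can be written out as a ratio of odds ratios. The OR notation with a conditioning bar is introduced here for illustration only and does not appear in the original slides.

```latex
\begin{align*}
\mathrm{OR}_{\text{sex} \mid \text{smoke}=1} &= \exp(\beta_1 + \beta_3), &
\mathrm{OR}_{\text{sex} \mid \text{smoke}=0} &= \exp(\beta_1), \\
\mathrm{OR}_{\text{smoke} \mid \text{sex}=1} &= \exp(\beta_2 + \beta_3), &
\mathrm{OR}_{\text{smoke} \mid \text{sex}=0} &= \exp(\beta_2),
\end{align*}
\[
\exp(\beta_3)
  = \frac{\mathrm{OR}_{\text{sex} \mid \text{smoke}=1}}{\mathrm{OR}_{\text{sex} \mid \text{smoke}=0}}
  = \frac{\mathrm{OR}_{\text{smoke} \mid \text{sex}=1}}{\mathrm{OR}_{\text{smoke} \mid \text{sex}=0}}.
\]
```

A difference of log-odds ratios therefore becomes a ratio of odds ratios after exponentiating.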
Quick Lookup Table
We can draw up a table for the 4 types of observation:

Row  sex     smoke  logit(P(Y = 1))
1    Male    Yes    β0 + β1 + β2 + β3
2    Male    No     β0 + β1
3    Female  Yes    β0 + β2
4    Female  No     β0

• This allows us to find the function of the parameters
  corresponding to a log-odds ratio and vice versa.
• e.g. row 3 - row 4 shows us that the log-odds ratio for smokers
  vs. non-smokers amongst females is β2.
• e.g. row 1 - row 2 shows us that the log-odds ratio for smokers
  vs. non-smokers amongst males is β2 + β3.
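The lookup table can be reproduced and differenced in a few lines. This is a sketch under assumed, purely hypothetical coefficient values; the dictionary `table` and the variable names are illustrative.

```python
from itertools import product

# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2, b3 = -3.0, 0.5, 1.0, 0.25


def logit_p(sex, smoke):
    """logit(P(Y = 1)) with sex (male=1, female=0) and smoke (yes=1, no=0)."""
    return b0 + b1 * sex + b2 * smoke + b3 * sex * smoke


# One entry per type of observation, keyed by (sex, smoke).
table = {(sex, smoke): logit_p(sex, smoke) for sex, smoke in product((1, 0), repeat=2)}

# Row 3 - row 4: smokers vs. non-smokers amongst females.
lor_smoke_females = table[(0, 1)] - table[(0, 0)]
# Row 1 - row 2: smokers vs. non-smokers amongst males.
lor_smoke_males = table[(1, 1)] - table[(1, 0)]
# The difference between the two log-odds ratios recovers the interaction.
interaction = lor_smoke_males - lor_smoke_females

print("females:", lor_smoke_females, "(should equal b2)")
print("males:  ", lor_smoke_males, "(should equal b2 + b3)")
print("difference:", interaction, "(should equal b3)")
```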
Interaction Between a Dummy Variable and a
Continuous Variable
• Consider a logistic model where the main predictors are
  sex (a dummy coded as before) and age (in years):
logit P(Y = 1) = β0 + β1 sex + β2 age + β3 (sex × age)
• β0 is the log-odds in favour of a female aged 0 suffering
  a heart attack.
• β1 is the log-odds ratio for males vs. females amongst
  people aged 0.
• β2 is the log-odds ratio corresponding to an increase in
  age by 1 year amongst females.
• β3 is the difference between the log-odds ratio
  corresponding to an increase in age by 1 year amongst males
  and the log-odds ratio corresponding to an increase in
  age by 1 year amongst females.
• β3 is also the difference between the log-odds ratios for
  males vs. females in two age-homogeneous groups which differ
  by 1 year.
Quick Lookup Table
Again we can draw up a table, this time considering groups of
individuals aged z and z + 1:

Row  sex     age    logit(P(Y = 1))
1    Male    z + 1  β0 + β1 + β2 (z + 1) + β3 (z + 1)
2    Male    z      β0 + β1 + β2 z + β3 z
3    Female  z + 1  β0 + β2 (z + 1)
4    Female  z      β0 + β2 z

• e.g. row 3 - row 4 shows us that the log-odds ratio
  corresponding to an increase in age by 1 year amongst
  females is β2.
• e.g. row 2 - row 4 shows us that the log-odds ratio for males vs.
  females amongst people aged z is β1 + β3 z.
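To see how the male vs. female comparison changes with age, the sketch below evaluates β1 + β3 z at a few ages. The coefficients are hypothetical and the function names are chosen here for illustration.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2, b3 = -6.0, 1.5, 0.0625, -0.015625


def logit_p(sex, age):
    """logit(P(Y = 1)) with sex (male=1, female=0) and age in years."""
    return b0 + b1 * sex + b2 * age + b3 * sex * age


def log_or_male_vs_female(age):
    """Row 2 - row 4 of the table: equals b1 + b3 * age."""
    return logit_p(1, age) - logit_p(0, age)


def log_or_per_year(sex, age):
    """Log-odds ratio for a 1-year increase in age within one sex."""
    return logit_p(sex, age + 1) - logit_p(sex, age)


for age in (40, 50, 60):
    lor = log_or_male_vs_female(age)
    print(f"age {age}: log-odds ratio = {lor:.4f}, odds ratio = {math.exp(lor):.4f}")

# The per-year log-odds ratio is b2 for females and b2 + b3 for males,
# whatever starting age is used.
print(log_or_per_year(0, 50), log_or_per_year(1, 50))
```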
Interaction Between 2 Continuous Variables
• Consider a logistic model where the main predictors are
  BP (blood pressure in mmHg) and age (in years):
logit P(Y = 1) = β0 + β1 BP + β2 age + β3 (BP × age)
• β0 is the log-odds in favour of a person with a BP of
  0 mmHg and age 0 suffering a heart attack.
• This interpretation is ridiculous: the model cannot apply
  when age or BP is close to 0, but we hope it is good for the
  ranges we are interested in.
• β1 is the log-odds ratio corresponding to an increase in BP
  by 1 mmHg amongst people aged 0.
• β2 is the log-odds ratio corresponding to an increase in
  age by 1 year amongst people with a BP of 0 mmHg.
• β3 is the difference between the log-odds ratios
  corresponding to an increase in age of 1 year for two
  BP-homogeneous groups which differ by 1 mmHg.
• β3 is also the difference between the log-odds ratios
  corresponding to an increase in BP of 1 mmHg for two
  age-homogeneous groups which differ by 1 year.
Quick Lookup Table
Again we can draw up a table, this time considering
individuals with BP w and w + 1 and aged z and z + 1:

Row  BP     age    logit(P(Y = 1))
1    w + 1  z + 1  β0 + β1 (w + 1) + β2 (z + 1) + β3 (w + 1)(z + 1)
2    w + 1  z      β0 + β1 (w + 1) + β2 z + β3 (w + 1) z
3    w      z + 1  β0 + β1 w + β2 (z + 1) + β3 w (z + 1)
4    w      z      β0 + β1 w + β2 z + β3 w z

• e.g. row 3 - row 4 shows us that the log-odds ratio
  corresponding to an increase in age by 1 year amongst
  those of BP w is β2 + β3 w.
• e.g. row 2 - row 4 shows us that the log-odds ratio
  corresponding to an increase in BP by 1 mmHg amongst
  those aged z is β1 + β3 z.
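The differences read off the table for two continuous predictors can be verified numerically. The sketch below assumes hypothetical coefficients and illustrative function names; it is not fitted to any data.

```python
# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2, b3 = -8.0, 0.02, 0.05, 0.001


def logit_p(bp, age):
    """logit(P(Y = 1)) with BP in mmHg and age in years."""
    return b0 + b1 * bp + b2 * age + b3 * bp * age


def log_or_age(bp, age):
    """1-year increase in age at fixed BP: equals b2 + b3 * bp."""
    return logit_p(bp, age + 1) - logit_p(bp, age)


def log_or_bp(bp, age):
    """1 mmHg increase in BP at fixed age: equals b1 + b3 * age."""
    return logit_p(bp + 1, age) - logit_p(bp, age)


w, z = 120, 50  # an arbitrary BP and age inside a plausible range

print("age effect at BP w:   ", log_or_age(w, z))
print("BP effect at age z:   ", log_or_bp(w, z))
# Either difference of differences recovers b3 (up to rounding error).
print("change in age effect: ", log_or_age(w + 1, z) - log_or_age(w, z))
print("change in BP effect:  ", log_or_bp(w, z + 1) - log_or_bp(w, z))
```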
Final Comment on Interpretation
• Remember: whenever you give an interpretation of a
  quantity γ in terms of a log-odds ratio, there is always an
  equivalent interpretation of exp(γ) as an odds ratio.
• Whenever you give an interpretation of a quantity γ as
  the log-odds in favour of an event, you can always give
  two equivalent interpretations:
  1. of exp(γ) as the odds in favour of the event,
  2. of exp(γ) / (1 + exp(γ)) as the probability of the event.