Introduction to Regression Models
1. Linear Probability Model (LPM)
A Linear Probability Model (LPM) is a regression model used for binary
dependent variables (0/1).
It uses Ordinary Least Squares (OLS) to estimate the probability of an event
occurring.
Formula: Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_kX_k + \epsilon, which with
E[\epsilon | X] = 0 implies
P(Y = 1 | X) = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_kX_k
Pros:
o Easy to interpret coefficients.
o Simple to estimate.
Cons:
o Predicted probabilities can be < 0 or > 1, which is not realistic.
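o OLS standard errors assume constant variance (homoskedasticity), but the LPM
error variance is P(Y=1|X)[1 - P(Y=1|X)], which varies with X, so
heteroskedasticity-robust standard errors are needed.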
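The out-of-range problem listed under Cons can be seen in a small worked sketch. The data below are invented for illustration, and `fit_ols_line` is a hypothetical helper that applies the textbook one-regressor OLS formulas; it is not taken from any library.

```python
# Minimal LPM sketch: OLS with one regressor on a binary (0/1) outcome.
# The data are made up for illustration only.

def fit_ols_line(x, y):
    """Return (intercept, slope) from the closed-form simple OLS formulas."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_ols_line(x, y)
fitted = [b0 + b1 * xi for xi in x]

# The slope is read directly as the change in P(Y = 1) per unit of x.
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
for xi, p in zip(x, fitted):
    flag = "" if 0.0 <= p <= 1.0 else "  <- outside [0, 1]"
    print(f"x = {xi}: fitted P(Y = 1) = {p:.3f}{flag}")
```

With these numbers the slope is 1/6 and the intercept is -0.25, so the fitted "probability" is about -0.083 at x = 1 and about 1.083 at x = 8, even though both points are inside the sample.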
2. Logit Model
A logit (logistic regression) model is used when the dependent variable is binary (0 or 1).
It models the log-odds of the event, log[P/(1 - P)], as a linear function of the regressors.
Formula: P(Y = 1 | X) = \frac{e^{\beta_0 + \beta_1X_1 + ... + \beta_kX_k}}{1 + e^{\beta_0 + \beta_1X_1 + ... + \beta_kX_k}}
Pros:
o Predicted probabilities always lie between 0 and 1.
o Works well with categorical and continuous variables.
Cons:
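o Coefficients measure changes in log-odds, which are less intuitive than the
LPM's direct changes in probability.
o The effect of a regressor on the probability depends on X, so marginal effects
must be computed for probability interpretation.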
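A short sketch may help connect the log-odds, the probability, and the marginal effect. The coefficients `beta0` and `beta1` are hypothetical values chosen for illustration, not estimates from any data. The marginal effect uses the standard logit result dP/dx = beta1 * P * (1 - P).

```python
import math

def logistic(z):
    """Map a linear index z to a probability in (0, 1); equals e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -1.0, 0.5

def prob(x):
    """P(Y = 1 | x) under the logit model with one regressor."""
    return logistic(beta0 + beta1 * x)

def marginal_effect(x):
    """dP/dx for the logit model: beta1 * P * (1 - P)."""
    p = prob(x)
    return beta1 * p * (1.0 - p)

for x in (-10.0, 0.0, 2.0, 10.0):
    p = prob(x)
    log_odds = math.log(p / (1.0 - p))  # recovers beta0 + beta1 * x
    print(f"x = {x:6.1f}  P = {p:.4f}  log-odds = {log_odds:7.3f}  "
          f"dP/dx = {marginal_effect(x):.4f}")

# exp(beta1) is the odds ratio for a one-unit increase in x.
odds_ratio = math.exp(beta1)
print(f"odds ratio per unit of x = {odds_ratio:.4f}")
```

The coefficient is constant on the log-odds scale, while the marginal effect on the probability is largest where P = 0.5 (here at x = 2) and shrinks toward zero in the tails. This is why marginal effects are usually reported at chosen values of X or averaged over the sample.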
3. Multinomial Logit (MNL) & Multinomial Probit
Used when the dependent variable has more than two unordered categories (e.g.,
choosing among 3 job types).
Multinomial Logit (MNL): Assumes Independence of Irrelevant Alternatives
(IIA), meaning that the odds of choosing one category over another do not depend
on which other alternatives are available.
Multinomial Probit: Does not assume IIA and allows for correlated error terms but
is computationally intensive.
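The IIA property can be checked numerically with a minimal sketch of the MNL choice probabilities, P(j) = exp(V_j) / sum over m of exp(V_m). The job-type labels and utility values below are hypothetical and serve only as an illustration.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities: exp(V_j) / sum_m exp(V_m)."""
    shift = max(utilities.values())  # subtract the max for numerical stability
    exp_v = {alt: math.exp(v - shift) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}

# Hypothetical systematic utilities for job types (illustrative values).
two_jobs = mnl_probabilities({"manual": 1.0, "clerical": 0.5})
three_jobs = mnl_probabilities(
    {"manual": 1.0, "clerical": 0.5, "professional": 0.8}
)

# Odds of "manual" versus "clerical", with and without the third alternative.
odds_two = two_jobs["manual"] / two_jobs["clerical"]
odds_three = three_jobs["manual"] / three_jobs["clerical"]

print("two alternatives:  ", two_jobs)
print("three alternatives:", three_jobs)
print(f"odds manual/clerical: {odds_two:.6f} vs {odds_three:.6f}")
```

Both odds equal exp(1.0 - 0.5): adding the third alternative lowers each probability but leaves their ratio unchanged. That is the restriction IIA imposes, and it is what the multinomial probit relaxes by allowing correlated errors.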
4. Ordered Choice Model (Ordered Logit/Probit)
Used when the dependent variable has more than two ordered categories (e.g.,
survey responses: poor, fair, good, excellent).
Ordered Logit/Probit:
o Assumes that there is an underlying continuous variable determining the
choice.
o Uses threshold values to determine category placement.
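The latent-variable and threshold idea can be sketched for the ordered logit case. The category probabilities are P(Y = j) = F(c_j - index) - F(c_{j-1} - index), where F is the logistic CDF and the c_j are the thresholds. The cut points and index values below are hypothetical, chosen only to illustrate the mechanics.

```python
import math

def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(index, thresholds):
    """Category probabilities for a given linear index and increasing cut points.

    Uses P(Y = j) = F(c_j - index) - F(c_{j-1} - index),
    with the outer cut points taken as -infinity and +infinity.
    """
    cdf = [0.0] + [logistic_cdf(c - index) for c in thresholds] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cdf) - 1)]

categories = ["poor", "fair", "good", "excellent"]
thresholds = [-1.0, 0.0, 1.5]  # hypothetical cut points, must be increasing

for index in (0.0, 1.0):  # hypothetical values of the linear index
    probs = ordered_logit_probs(index, thresholds)
    summary = ", ".join(f"{c}: {p:.3f}" for c, p in zip(categories, probs))
    print(f"index = {index}: {summary}")
```

Raising the index shifts probability mass toward the higher categories while the thresholds stay fixed. An ordered probit would follow the same structure with the standard normal CDF in place of the logistic CDF.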
5. Count Data Models
Used when the dependent variable is a non-negative integer count (0, 1, 2, 3, ...),
e.g., the number of doctor visits.
Common models:
o Poisson Regression: Assumes that the conditional mean and variance are equal
(equidispersion, a property of the Poisson distribution).
o Negative Binomial Regression: Used when there is overdispersion (variance
> mean).
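A rough first check for overdispersion is to compare the sample variance of the counts with their mean. The sketch below does this on invented doctor-visit counts and also compares the observed share of zeros with what a Poisson distribution with the same mean would predict. It ignores covariates, so it is only a raw diagnostic and not a formal test.

```python
import math
import statistics

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical number of doctor visits for ten people (illustrative only).
visits = [0, 0, 1, 0, 2, 0, 7, 1, 0, 12]

mean_visits = statistics.mean(visits)
var_visits = statistics.variance(visits)  # sample variance (n - 1 denominator)
dispersion = var_visits / mean_visits     # close to 1 under a Poisson model

share_zeros = visits.count(0) / len(visits)
poisson_zeros = poisson_pmf(0, mean_visits)

print(f"mean = {mean_visits:.2f}, variance = {var_visits:.2f}, "
      f"variance/mean = {dispersion:.2f}")
print(f"observed share of zeros = {share_zeros:.2f}, "
      f"Poisson-implied share = {poisson_zeros:.2f}")
```

In this made-up sample the variance is several times the mean and zeros are far more common than a Poisson distribution with the same mean implies. That is the kind of pattern that motivates the negative binomial model.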
6. Survival Analysis Models
Used when analyzing time until an event occurs (e.g., time until failure of a
machine, time until a patient recovers).
Common models:
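o Kaplan-Meier estimator: Non-parametric method for estimating survival
functions, allowing for right-censored observations.
o Cox Proportional Hazards Model: Semi-parametric model for the effect of
covariates on the hazard rate (and hence on survival time), assuming hazard
ratios that are constant over time.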
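The Kaplan-Meier estimator can be written out in a few lines. It is the product over event times of (1 - d_i / n_i), where d_i is the number of events at time t_i and n_i is the number still at risk just before t_i. The durations and censoring flags below are invented, and `kaplan_meier` is a hypothetical helper written for this sketch, not a library function.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  observed durations
    events: 1 if the event occurred at that time, 0 if the observation
            was censored (still event-free when last seen)
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects with duration >= t are still at risk just before t.
        at_risk = sum(1 for ti in times if ti >= t)
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical machine lifetimes in months; 0 marks a censored observation.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.2f}")
```

Censored observations never trigger a drop in the curve, but they count in the risk set up to the time they leave the sample. Here the estimate steps from 0.80 at t = 2 to 0.60 at t = 3 and 0.30 at t = 5. A Cox model would require an iterative partial-likelihood fit and is not sketched here.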