PROBIT AND LOGIT REGRESSION
ANALYSIS
Definition
What are Probit and Logit Models?
Probit and Logit are statistical regression models used when the dependent variable is
binary, meaning it has only two categories (e.g., pass/fail, yes/no, buy/don’t buy).
- Logit Model: Uses the logistic (sigmoid) function to model the probability of an event. It is
widely used and relatively easier to interpret.
Logit(p) = ln(p / (1 - p)) = β₀ + β₁X
- Probit Model: Uses the cumulative standard normal distribution to estimate probabilities.
Often applied when the underlying latent variable is assumed to follow a normal
distribution.
When to Use Probit or Logit Models
Use either model when:
- The outcome (dependent variable) is binary (e.g., 1 = Pass, 0 = Fail).
- The goal is to predict the probability of the outcome based on one or more independent
variables.
Examples:
- Will a customer buy a product (Yes/No) based on their income?
- Will a patient recover (Yes/No) based on treatment type?
Assumptions and How to Assess Them
Assumption Description Assessment Method
Binary Outcome The dependent variable Inspect coding of dependent
should be coded as 0 or 1 variable
Independence of Each observation must be Ensure data is not repeated
Observations independent or clustered
Linearity in the Logit/Probit Predictors should be Box-Tidwell Test or
linearly related to the log- examine plots of residuals
odds
No Multicollinearity Predictors should not be Check Variance Inflation
highly correlated Factor (VIF)
Sufficient Sample Size Enough data is required for Rule: ≥10 events per
reliable estimation predictor
Hypothetical Data and Probit/Logit Analysis
Hypothetical Dataset:
ID Group (1 = Score PASS (1 = Score ≥ 7,
Morning) else 0)
1 1 8 1
2 1 7 1
3 1 9 1
4 1 8 1
5 1 10 1
6 0 5 0
7 0 6 0
8 0 4 0
9 0 6 0
10 0 5 0
Logistic Regression Step-by-Step
Model:
- Dependent Variable: PASS
- Independent Variable: Group (1 = Morning, 0 = Night)
Step 1: Logistic Regression Equation
Logit(p) = ln(p / (1 - p)) = β₀ + β₁ × Group
Step 2: SPSS Output (Hypothetical):
Variable B (Coefficient) Wald p-value Exp(B) (Odds Ratio)
Group +8.0 0.001 2980
APA Interpretation of the Results
A binary logistic regression was conducted to assess whether study time (morning vs.
night) predicted test success (PASS). The logistic model was statistically significant, χ²(1) =
[insert if needed], p = .001. Students in the morning group were significantly more likely to
pass (Exp(B) = 2980), indicating a strong relationship between study timing and
performance.
Post Hoc Test Analysis
Although post hoc tests are generally not used in logistic regression, model diagnostics help
assess the model's performance:
Diagnostic Test Result Interpretation
Pseudo R² (McFadden’s) 0.75 75% of variance in PASS
explained by the model
Classification Table 100% accuracy Perfect classification in this
hypothetical case
ROC Curve (AUC) AUC = 1.0 Model perfectly
distinguishes between PASS
and FAIL
Hosmer-Lemeshow Test p > 0.05 Model fits the data well
Confusion Matrix:
Actual PASS (1) Actual FAIL (0)
Predicted PASS 5 0
Predicted FAIL 0 5
THANK YOU