Logistic Regression: Concepts, Applications, and Evaluation Metrics
Instructor: Dr. Faridoon Khan
Introduction to Logistic Regression
• Logistic Regression is a statistical model primarily used for binary classification tasks. Unlike linear regression, which predicts a continuous outcome, logistic regression predicts the probability of a binary outcome and maps it to a range between 0 and 1.
• Goal: The primary objective of logistic regression is to predict probabilities for binary outcomes (e.g., success/failure, yes/no, positive/negative).
• This probability can then be converted into a classification based on a threshold (e.g., “yes” if P > 0.5, “no” if P ≤ 0.5).
How It Works
• Logistic regression estimates the relationship between one or more predictor variables and the log odds of the binary outcome.
• The model uses the logistic function (or sigmoid function) to map these log odds to a probability, providing an interpretable likelihood for each observation.
Applications of Logistic Regression
• Healthcare: Used to predict disease risk (e.g., heart disease, diabetes) based on factors like age, cholesterol levels, and lifestyle.
• Finance: Applied in credit scoring, fraud detection, and risk assessment.
• Marketing: Helps predict customer responses, conversions, or churn.
• Social Sciences: Analyzes survey responses or predicts voting behaviors.
Assumptions of Logistic Regression
• Linear relationship between predictors and log odds.
• Independence of observations.
• Minimal multicollinearity among predictors.
• Sufficient sample size for accurate estimation.
The Logistic Function
• Formula:
P(Y = 1 | X) = e^(β₀ + β₁ * X₁ + ... + βₚ * Xₚ) / (1 + e^(β₀ + β₁ * X₁ + ... + βₚ * Xₚ))
• The logistic function produces an S-shaped curve that maps log odds to probabilities between 0 and 1.
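The formula above can be sketched in a few lines of Python; the input values are illustrative only:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real-valued log odds to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The S-shape: large negative log odds -> near 0, zero -> exactly 0.5,
# large positive log odds -> near 1
for z in (-6, -2, 0, 2, 6):
    print(z, round(sigmoid(z), 4))
```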
Understanding Odds
• Odds represent the likelihood of success relative to failure in a given scenario. They provide a way to compare the probability of an event occurring (success) versus the probability of it not occurring (failure).
• Formula:
Odds = P(success) / (1 - P(success))
• P(success) is the probability of the event happening; 1 - P(success) is the probability of the event not happening (failure).
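A quick sketch of the odds formula; the probability is a made-up example:

```python
def odds(p):
    """Odds of success: P(success) / (1 - P(success))."""
    return p / (1.0 - p)

# With P(success) = 0.8, success is four times as likely as failure
print(round(odds(0.8), 6))  # 4.0
```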
Log Odds and Log Odds Ratio
• Log Odds: The logarithm of odds is a transformation that converts odds into a linear scale, which is essential for modeling in logistic regression.
• It allows the relationship between predictor variables and the outcome to be expressed in a linear form, making it easier to interpret and estimate the effect of each predictor.
• Formula:
Log odds = log(Odds) = log(P(Y = 1) / (1 - P(Y = 1)))
Why Use Log Odds in Logistic Regression?
• Linear relationship with predictors.
• Simplifies interpretation of coefficients.
• Enables probability estimation using log transformations.
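To illustrate the last point, a minimal sketch showing that the log-odds (logit) transform and the logistic function are inverses, so probabilities can always be recovered from the linear scale:

```python
import math

def logit(p):
    """Log odds: log(p / (1 - p)), the linear scale used for modeling."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Logistic function: converts log odds back into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# The two transformations undo each other, so a probability survives a round trip
for p in (0.1, 0.5, 0.9):
    print(p, round(inv_logit(logit(p)), 10))
```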
Odds Ratio
• Definition: An odds ratio (OR) compares the likelihood of an event occurring between two groups.
• Formula:
OR = e^β
• β is the coefficient of the predictor variable in the logistic regression model.
• e^β represents the multiplicative change in the odds associated with a one-unit increase in the predictor.
• Interpretation: OR > 1 indicates increased odds; OR < 1 indicates decreased odds.
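As a small worked example, assuming a hypothetical fitted coefficient β = 0.2:

```python
import math

beta = 0.2                    # hypothetical coefficient from a fitted model
odds_ratio = math.exp(beta)   # OR = e^beta

# OR > 1: each one-unit increase in the predictor multiplies the odds
# by this factor (e^0.2 is roughly 1.22)
print(round(odds_ratio, 4))
```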
Logistic Regression Transformation
• Transformation from log odds to probabilities:
• P(Y = 1 | X) = e^(β₀ + β₁X₁ + ... + βₚXₚ) / (1 + e^(β₀ + β₁X₁ + ... + βₚXₚ))
Interpreting Coefficients in Logistic Regression
• Positive coefficient: increases the log odds of the outcome.
• Negative coefficient: decreases the log odds.
• Example: If β₁ = 0.2, a 1-unit increase in X₁ raises the log odds by 0.2 (equivalently, multiplies the odds by e^0.2 ≈ 1.22).
Confusion Matrix
• A confusion matrix shows:
• True Positives (TP)
• True Negatives (TN)
• False Positives (FP)
• False Negatives (FN)
• Helps evaluate classification performance.
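The four counts can be tallied directly from true and predicted labels; the labels below are made-up for illustration (1 = positive class):

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Each pair (true, predicted) falls into exactly one of the four cells
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(tp, tn, fp, fn)  # 3 3 1 1
```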
Accuracy Metric
• Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Limitation: accuracy may not be a good measure for imbalanced datasets.
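A tiny illustration of that limitation, with a made-up 95/5 class split:

```python
# A useless model that never predicts the positive class still scores
# high accuracy when positives are rare
y_true = [0] * 95 + [1] * 5   # 95% negatives, 5% positives
y_pred = [0] * 100            # always predict the majority class

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print(accuracy)  # 0.95, despite catching none of the positive cases
```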
Precision and Recall
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• Useful for measuring the model’s effectiveness in identifying true positives.
F1 Score
• F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
• Balances precision and recall, especially useful for imbalanced classes.
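The three metrics computed from hypothetical confusion-matrix counts:

```python
# Illustrative counts: 30 true positives, 10 false positives, 20 false negatives
tp, fp, fn = 30, 10, 20

precision = tp / (tp + fp)   # 30 / 40 = 0.75
recall = tp / (tp + fn)      # 30 / 50 = 0.60
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, round(f1, 4))
```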
Receiver Operating Characteristic (ROC) Curve
• Plots True Positive Rate vs. False Positive Rate.
• Shows the trade-off between sensitivity and specificity at different thresholds.
Area Under the ROC Curve (AUC)
• AUC = 1.0: Perfect classifier. The model distinguishes perfectly between classes, with no overlap in predicted probabilities.
• AUC = 0.5: No better than random guessing. The model has no discriminatory power and is equivalent to flipping a coin.
• 0.5 < AUC < 1.0: The model has some discriminatory power; the closer to 1, the better it distinguishes between classes.
• AUC < 0.5: Worse than random guessing. This rarely happens and may suggest the model is systematically predicting the opposite of the correct outcome.
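AUC can also be computed without plotting: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A sketch with made-up scores:

```python
from itertools import product

def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count half) -- equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

y = [1, 1, 0, 0]
print(auc(y, [0.9, 0.8, 0.3, 0.1]))  # perfectly separated -> 1.0
print(auc(y, [0.9, 0.2, 0.8, 0.1]))  # one mis-ranked pair out of four -> 0.75
```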
Log-Loss (Cross-Entropy Loss)
• Formula:
Log-Loss = -(1/N) * Σ(yᵢ * log(pᵢ) + (1 - yᵢ) * log(1 - pᵢ))
• Penalizes confident but incorrect predictions, crucial for probability-based models.
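A direct translation of the formula; the labels and probabilities below are illustrative:

```python
import math

def log_loss(y_true, probs):
    """Average cross-entropy: -(1/N) * sum(y*log(p) + (1-y)*log(1-p))."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, probs)) / n

# A confident wrong prediction (p = 0.9 for a true label of 0) is penalized
# far more than a hesitant wrong one (p = 0.6 for a true label of 0)
print(round(log_loss([0], [0.9]), 4))  # ~2.3026
print(round(log_loss([0], [0.6]), 4))  # ~0.9163
```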
Application Example: Heart Disease Prediction
• Using logistic regression to predict the likelihood of heart disease based on patient data.
• Predictors: Age, Cholesterol level, Blood Pressure, etc.
Logistic Regression Equation for Heart Disease
• Formula:
P(Heart Disease = 1 | X) = e^(β₀ + β₁ * Age + β₂ * Cholesterol + β₃ * Blood Pressure) / (1 + e^(β₀ + β₁ * Age + β₂ * Cholesterol + β₃ * Blood Pressure))
Predictor Variables and Coefficients
• Age: β₁ = 0.05
• Cholesterol: β₂ = 0.02
• Blood Pressure: β₃ = 0.04
• Interpreting each coefficient in the context of heart disease.
Example Calculation (Step-by-Step)
• Scenario: Predicting heart disease for a patient with Age = 55, Cholesterol = 240, and Blood Pressure = 140.
Step 1: Calculate Log Odds
• Log odds = -3 + (0.05 * Age) + (0.02 * Cholesterol) + (0.04 * Blood Pressure)
• Substitute values:
• = -3 + (0.05 * 55) + (0.02 * 240) + (0.04 * 140) = -3 + 2.75 + 4.8 + 5.6 = 10.15
Step 2: Convert Log Odds to Probability
• Convert log odds to probability using:
• P(Heart Disease) = e^(Log Odds) / (1 + e^(Log Odds))
• With log odds = 10.15: P(Heart Disease) = e^10.15 / (1 + e^10.15) ≈ 0.99996, an extremely high predicted risk under these example coefficients.
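The two steps combined in Python, using the slides' example coefficients (β₀ = -3, β₁ = 0.05, β₂ = 0.02, β₃ = 0.04):

```python
import math

# Coefficients and patient values from the worked example
b0, b_age, b_chol, b_bp = -3.0, 0.05, 0.02, 0.04
age, chol, bp = 55, 240, 140

# Step 1: log odds = -3 + 2.75 + 4.8 + 5.6 = 10.15
log_odds = b0 + b_age * age + b_chol * chol + b_bp * bp

# Step 2: convert to probability with the logistic function
prob = math.exp(log_odds) / (1 + math.exp(log_odds))

print(round(log_odds, 2), round(prob, 5))  # 10.15 and ~0.99996
```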
Final Prediction Interpretation
• In logistic regression, interpreting the predicted probability lets us classify an observation based on a chosen threshold. For example, if we set a threshold of 0.5:
1. If P(Heart Disease) > 0.5: classify the individual as high risk for heart disease.
2. If P(Heart Disease) ≤ 0.5: classify the individual as low risk for heart disease.
Evaluating Heart Disease Model Performance
• Use the confusion matrix, accuracy, precision, recall, and F1 score to assess model performance on heart disease prediction.
ROC Curve for Heart Disease Prediction
• Plot the ROC curve to assess the model’s ability to discriminate between high and low risk of heart disease.
Summary
• Interpretability: Logistic regression allows us to understand the impact of each predictor on the outcome, making it easier to explain results in practical terms.
• Probabilistic Output: The model provides probabilities, giving us insight into how likely an outcome is, rather than a strict yes/no answer.
• Evaluation Metrics: Metrics like accuracy, AUC, precision, and recall help us assess model performance, especially in fields with significant consequences for misclassification.
Conclusion
• Logistic regression’s combination of interpretability, flexibility, and effectiveness in handling binary classification tasks makes it a valuable tool for real-world applications across various fields.
• By quantifying relationships and delivering probabilistic predictions, it supports informed decision-making, especially in areas where understanding risk and likelihood is crucial.