Detailed Logistic Regression

Logistic regression is a statistical model used for binary classification that predicts the probability of a binary outcome, mapping it to a range between 0 and 1. It is widely applied in fields such as healthcare, finance, marketing, and social sciences, utilizing metrics like accuracy, precision, recall, and AUC to evaluate model performance. The model's interpretability and probabilistic output make it a valuable tool for informed decision-making in various real-world applications.

Uploaded by

Sami Jahangir
Copyright
© All Rights Reserved

Logistic Regression
Concepts, Applications, and Evaluation Metrics
Instructor: Dr. Faridoon Khan
Introduction to Logistic Regression
• Logistic Regression is a statistical model primarily used for binary classification tasks. Unlike linear regression, which predicts a continuous outcome, logistic regression predicts the probability of a binary outcome and maps it to a range between 0 and 1.
• Goal: The primary objective of logistic regression is to predict probabilities for binary outcomes (e.g., success/failure, yes/no, positive/negative).
• This probability can then be converted into a classification (e.g., "yes" if P > 0.5, "no" if P ≤ 0.5) based on a threshold.
How It Works
• Logistic regression estimates the relationship between one or more predictor variables and the log odds of the binary outcome.
• The model uses the logistic function (or sigmoid function) to map these log odds to a probability, providing an interpretable likelihood for each observation.
Applications of Logistic Regression
• Healthcare: Used to predict disease risk (e.g., heart disease, diabetes) based on factors like age, cholesterol levels, and lifestyle.
• Finance: Applied in credit scoring, fraud detection, and risk assessment.
• Marketing: Helps predict customer responses, conversions, or churn.
• Social Sciences: Analyzes survey responses or predicts voting behaviors.
Assumptions of Logistic Regression
• Linear relationship between predictors and log odds.
• Independence of observations.
• Minimal multicollinearity among predictors.
• Sufficient sample size for accurate estimation.
The Logistic Function
• Formula:
  P(Y = 1 | X) = e^(β₀ + β₁X₁ + ... + βₚXₚ) / (1 + e^(β₀ + β₁X₁ + ... + βₚXₚ))
• The logistic function produces an S-shaped curve that maps log odds to probabilities between 0 and 1.
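The formula above can be sketched in a few lines of Python; the coefficient and feature values below are made up purely for illustration, not taken from any fitted model:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(intercept, coefs, features):
    """P(Y = 1 | X) for a logistic model with the given coefficients."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return sigmoid(z)

# Hypothetical coefficients: z = -1 + 0.5*2 + 0.25*4 = 1
p = predict_probability(intercept=-1.0, coefs=[0.5, 0.25], features=[2.0, 4.0])
print(round(p, 3))  # sigmoid(1) ≈ 0.731
```

Note that `sigmoid(0) = 0.5`: a linear predictor of zero corresponds to even odds.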
Understanding Odds
• Odds represent the likelihood of success relative to failure in a given scenario. They provide a way to compare the probability of an event occurring (success) versus the probability of it not occurring (failure).
• Formula:
  Odds = P(success) / (1 - P(success))
• P(success) is the probability of the event happening; 1 - P(success) is the probability of the event not happening (failure).
Log Odds and Log Odds Ratio
• Log Odds: The logarithm of odds is a transformation that converts odds into a linear scale, which is essential for modeling in logistic regression.
• It allows the relationship between predictor variables and the outcome to be expressed in a linear form, making it easier to interpret and estimate the effect of each predictor.
• Formula:
  Log odds = log(Odds) = log(P(Y = 1) / (1 - P(Y = 1)))
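A minimal sketch of the odds and log-odds transformations (the probabilities are illustrative):

```python
import math

def odds(p):
    """Odds of success for probability p: P(success) / (1 - P(success))."""
    return p / (1.0 - p)

def log_odds(p):
    """Log odds (logit): the natural log of the odds."""
    return math.log(odds(p))

# A probability of 0.75 means success is 3x as likely as failure.
print(odds(0.75))     # 3.0
print(log_odds(0.5))  # 0.0 -- even odds sit at zero on the log-odds scale
```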
Why Use Log Odds in Logistic Regression?
• Linear relationship with predictors.
• Simplifies interpretation of coefficients.
• Enables probability estimation using log transformations.
Odds Ratio
• Definition: An odds ratio compares the likelihood of an event occurring between two groups.
• Formula:
  OR = e^β
• β is the coefficient of the predictor variable in the logistic regression model.
• e^β represents the change in odds associated with a one-unit increase in the predictor.
• Interpretation: OR > 1 indicates increased odds; OR < 1 indicates decreased odds.
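Turning a fitted coefficient into an odds ratio is a one-liner; the β value below is hypothetical, chosen only to illustrate the exponentiation:

```python
import math

def odds_ratio(beta):
    """Odds ratio for a one-unit increase in the predictor: e^beta."""
    return math.exp(beta)

beta = 0.2  # hypothetical coefficient, not from a real model
print(round(odds_ratio(beta), 3))  # ≈ 1.221: each unit multiplies the odds by ~1.22
print(odds_ratio(0.0))             # 1.0: a zero coefficient leaves the odds unchanged
```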
Logistic Regression Transformation
• Transformation from log odds to probabilities:
  P(Y = 1 | X) = e^(β₀ + β₁X₁ + ... + βₚXₚ) / (1 + e^(β₀ + β₁X₁ + ... + βₚXₚ))
Interpreting Coefficients in Logistic Regression
• Positive coefficient: increases the log odds of the outcome.
• Negative coefficient: decreases the log odds.
• Example: If β₁ = 0.2, a 1-unit increase in X₁ raises the log odds by 0.2 (equivalently, multiplies the odds by e^0.2 ≈ 1.22).
Confusion Matrix
• A confusion matrix shows:
  • True Positives (TP)
  • True Negatives (TN)
  • False Positives (FP)
  • False Negatives (FN)
• Helps evaluate classification performance.
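The four counts can be tallied directly from predicted and actual labels; a minimal sketch with made-up labels:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Illustrative labels: one missed positive (FN) and one false alarm (FP)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```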
Accuracy Metric
• Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Limitation: accuracy may not be a good measure for imbalanced datasets.
Precision and Recall
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• Useful for measuring the model’s effectiveness in identifying true positives.
F1 Score
• F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
• Balances precision and recall; especially useful for imbalanced classes.
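The accuracy, precision, recall, and F1 formulas above, computed from confusion-matrix counts (the counts here are made up for illustration):

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(prec, rec):
    return 2 * prec * rec / (prec + rec)

tp, tn, fp, fn = 40, 45, 5, 10  # hypothetical counts
prec, rec = precision(tp, fp), recall(tp, fn)
print(accuracy(tp, tn, fp, fn))       # 0.85
print(round(prec, 3))                 # 0.889
print(rec)                            # 0.8
print(round(f1_score(prec, rec), 3))  # 0.842
```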
Receiver Operating Characteristic (ROC) Curve
• Plots the True Positive Rate vs. the False Positive Rate.
• Shows the trade-off between sensitivity and specificity at different thresholds.
Area Under the ROC Curve (AUC)
• AUC = 1.0: Perfect classifier. The model distinguishes perfectly between classes, with no overlap in predicted probabilities.
• AUC = 0.5: No better than random guessing. The model has no discriminatory power and is equivalent to flipping a coin.
• 0.5 < AUC < 1.0: The model has some level of discriminatory power; the closer to 1, the better it distinguishes between classes.
• AUC < 0.5: Worse than random guessing. This rarely happens and may suggest the model is systematically predicting the opposite of the correct outcome.
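One way to compute AUC directly is as the probability that a randomly chosen positive gets a higher score than a randomly chosen negative (ties count half); a small sketch with made-up scores:

```python
def auc(y_true, scores):
    """Rank-based AUC: fraction of positive/negative pairs ordered correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative scores: every positive outscores every negative, so AUC = 1.0
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.3, 0.8, 0.4]
print(auc(y_true, scores))  # 1.0
```

This O(n²) pairwise version is fine for illustration; production code typically sorts by score instead.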
Log-Loss (Cross-Entropy Loss)
• Formula:
  Log-Loss = -1/N * Σ [ y * log(p) + (1 - y) * log(1 - p) ]
• Penalizes confident but incorrect predictions, crucial for probability-based models.
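The log-loss formula as a short sketch; the labels and predicted probabilities are illustrative:

```python
import math

def log_loss(y_true, probs):
    """Average cross-entropy: -1/N * sum of y*log(p) + (1-y)*log(1-p)."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, probs)) / n

y_true = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.8, 0.3]
print(round(log_loss(y_true, probs), 4))  # 0.1976

# A confident wrong prediction is punished hard:
print(round(log_loss([1], [0.01]), 4))    # 4.6052
```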
Application Example: Heart Disease Prediction
• Using logistic regression to predict the likelihood of heart disease based on patient data.
• Predictors: Age, Cholesterol level, Blood Pressure, etc.
Logistic Regression Equation for Heart Disease
• Formula:
  P(Heart Disease = 1 | X) = e^(β₀ + β₁ * Age + β₂ * Cholesterol + β₃ * Blood Pressure) / (1 + e^(β₀ + β₁ * Age + β₂ * Cholesterol + β₃ * Blood Pressure))
Predictor Variables and Coefficients
• Age: β₁ = 0.05
• Cholesterol: β₂ = 0.02
• Blood Pressure: β₃ = 0.04
• Each coefficient is interpreted in the context of heart disease: a positive value means higher values of that predictor increase the log odds of disease.
Example Calculation (Step-by-Step)
• Scenario: Predicting heart disease for a patient with Age = 55, Cholesterol = 240, and Blood Pressure = 140.
Step 1: Calculate Log Odds
• Log odds = -3 + (0.05 * Age) + (0.02 * Cholesterol) + (0.04 * Blood Pressure)
• Substitute values:
  Log odds = -3 + (0.05 * 55) + (0.02 * 240) + (0.04 * 140) = -3 + 2.75 + 4.8 + 5.6 = 10.15
Step 2: Convert Log Odds to Probability
• Convert log odds to probability using:
  P(Heart Disease) = e^(Log Odds) / (1 + e^(Log Odds))
• With log odds = 10.15: P(Heart Disease) = e^10.15 / (1 + e^10.15) ≈ 0.99996.
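The two steps above, end to end in Python; the intercept (-3) and coefficients come from the slides and are illustrative, not a fitted clinical model:

```python
import math

# Illustrative coefficients from the slides (not a real fitted model)
intercept = -3.0
b_age, b_chol, b_bp = 0.05, 0.02, 0.04

age, cholesterol, blood_pressure = 55, 240, 140

# Step 1: log odds = -3 + 2.75 + 4.8 + 5.6 = 10.15
log_odds = intercept + b_age * age + b_chol * cholesterol + b_bp * blood_pressure
print(round(log_odds, 2))  # 10.15

# Step 2: probability via the logistic transformation
p = math.exp(log_odds) / (1 + math.exp(log_odds))
print(round(p, 5))  # 0.99996 -- high risk at a 0.5 threshold
```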
Final Prediction Interpretation
• In logistic regression, interpreting the predicted probability helps us classify an observation based on a chosen threshold. For example, with a threshold of 0.5:
  1. If P(Heart Disease) > 0.5: classify the individual as high risk for heart disease.
  2. If P(Heart Disease) ≤ 0.5: classify the individual as low risk for heart disease.
Evaluating Heart Disease Model Performance
• Use the confusion matrix, accuracy, precision, recall, and F1 score to assess model performance on heart disease prediction.
ROC Curve for Heart Disease Prediction
• Plot the ROC curve to assess the model’s ability to discriminate between high and low risk of heart disease.
Summary
• Interpretability: Logistic regression allows us to understand the impact of each predictor on the outcome, making it easier to explain results in practical terms.
• Probabilistic Output: The model provides probabilities, giving us insight into how likely an outcome is, rather than a strict yes/no answer.
• Evaluation Metrics: Metrics like accuracy, AUC, precision, and recall help us assess model performance, especially in fields with significant consequences for misclassification.
Conclusion
• Logistic regression’s combination of interpretability, flexibility, and effectiveness in handling binary classification tasks makes it a valuable tool for real-world applications across various fields.
• By quantifying relationships and delivering probabilistic predictions, it supports informed decision-making, especially in areas where understanding risk and likelihood is crucial.
