
LO3 Logistic Regression1

This document provides an overview of logistic regression. It begins with the objectives of explaining how logistic regression handles binary dependent variables, transforms them into likelihood and probability measures, and interprets results. It then discusses how regression analysis is used with independent and dependent variables. Specifically, it covers binomial, ordinal, and multinomial logistic regression. Key aspects of logistic regression covered include assumptions, modeling probability rather than direct outcomes, using the logit function to ensure probabilities are between 0 and 1, and using maximum likelihood estimation to calculate coefficients.

Uploaded by

ntr.notoria.gaming


Machine Learning

Logistic Regression
By E. Cheteni

www.belgiumcampus.ac.za
Lesson Objectives

Rationale for Logistic Regression

• Identify the types of variables used for dependent and
independent variables in the application of logistic
regression
• Describe the method used to transform binary measures
into the likelihood and probability measures used in logistic
regression.
• Interpret the results of a logistic regression analysis and
assess predictive accuracy.

Overview of Regression Analysis

E.g. What influences a person's salary?

Regression analysis is used to achieve two goals

Independent Variables [Predictors] → Dependent Variable (Criterion)

Forms of regression Analysis

Categorical variables with more than one value

Handling Categorical Variables
• Categorical variables are typically encoded using techniques like one-
hot encoding or dummy coding to represent them as binary variables.
• One-hot encoding creates a binary variable for each category (e.g.
gender_male, gender_female), while dummy coding uses one less
variable to represent the categories.
• Multicollinearity among categorical variables with multiple levels can
inflate standard errors and lead to unreliable coefficient estimates.

Handling Categorical Variables
• Example of one-hot encoding using a categorical variable ‘Color’ with three levels:
‘Red’, ‘Green’ and ‘Blue’.
One-Hot Encoding:
• One-hot encoding creates a binary variable for each category, where 1 indicates
the presence of the category and 0 indicates absence.
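The one-hot scheme described above can be sketched in a few lines of plain Python. The helper name `one_hot_encode`, the `Color_` column prefix, and the sample observations are assumptions made for illustration; they do not come from the slides.

```python
def one_hot_encode(values, categories, prefix):
    """Return one dict per value with a 0/1 entry for every category."""
    return [
        {f"{prefix}_{category}": int(value == category) for category in categories}
        for value in values
    ]


# Illustrative observations of the 'Color' variable (made up for this sketch).
colors = ["Red", "Green", "Blue", "Green"]
levels = ["Red", "Green", "Blue"]

one_hot_rows = one_hot_encode(colors, levels, prefix="Color")
for color, row in zip(colors, one_hot_rows):
    print(color, row)
```

Each row contains exactly one 1, so the three columns always sum to 1.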

Handling Categorical Variables
• Examples of dummy coding using a categorical variable ‘Color’ with three levels:
‘Red’, ‘Green’ and ‘Blue’.
Dummy Coding:
• Dummy coding uses one less variable than the number of categories to represent
the categories.
• Typically, one category is used as the reference category.
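A minimal sketch of dummy coding for the same 'Color' variable follows. Choosing 'Red' as the reference category, the helper name `dummy_encode`, and the sample observations are all assumptions made for illustration.

```python
def dummy_encode(values, categories, reference, prefix):
    """Return 0/1 entries for every category except the reference one."""
    kept = [category for category in categories if category != reference]
    return [
        {f"{prefix}_{category}": int(value == category) for category in kept}
        for value in values
    ]


# Illustrative observations (made up); 'Red' is the assumed reference category.
colors = ["Red", "Green", "Blue", "Green"]
levels = ["Red", "Green", "Blue"]

dummy_rows = dummy_encode(colors, levels, reference="Red", prefix="Color")
for color, row in zip(colors, dummy_rows):
    print(color, row)
```

The reference category is represented by a row of all zeros, so three levels need only two columns.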

What is logistic regression?
• A classification algorithm, based on regression analysis, that is used when
the dependent variable (target) is categorical (for example, binary)

When do we use LR?

• Continuous—such as temperature in degrees Celsius or weight in grams.

• Continuous data can be categorized into intervals by:
• coding
• splitting the values into equal ranges

• Discrete, ordinal—data which can be placed into some kind of order on a scale.
• E.g., on a happiness scale, a score of 1 indicates a lower degree of happiness than a score of 5,
but there is no way of determining the numerical distance between each of the points on the scale.

• Discrete, nominal—data which fits into named groups which do not represent any
order or scale.
• E.g., eye color may fit into the categories “blue”, “brown”, or “green”, but there is no hierarchy to these
categories.

Types of logistic regression
Binomial logistic regression
• In binomial logistic regression, the dependent variable has only two possible values,
such as 0 or 1, Pass or Fail, Yes or No, High or Low.

Ordinal logistic regression

• In ordinal logistic regression, the dependent variable has three or more possible ordered
values, such as "Low", "Medium", or "High".

Multinomial logistic regression

• In multinomial logistic regression, the dependent variable has three or more possible unordered
values, such as "cat", "dog", or "sheep".

Examples of Binary classification problems

• Spam Detection : Predicting if an email is Spam or not

• Credit Card Fraud : Predicting if a given credit card
transaction is fraudulent or not
• Health : Predicting if a given mass of tissue is benign or
malignant
• Marketing : Predicting if a given user will buy an insurance
product or not
• Banking : Predicting if a customer will default on a loan.

Logistic regression assumptions
Assumptions: Logistic regression makes certain assumptions about the
data, including:
1. Binary output/target: logistic regression is used for classification
problems, so the target must be binary and encoded as 0 or 1.
2. Linear relationship: the model is built on a linear equation, so the
predictors are assumed to be linearly related to the log-odds of the outcome.
3. Independent inputs: highly correlated input variables (multicollinearity)
can make the coefficient estimates unstable and may prevent the model from converging.

Logistic Regression
• Consider the Default data set, where the response default falls into
one of two categories, Yes or No.
• Rather than modeling this response Y directly, logistic regression
models the probability that Y belongs to a particular category.
• For the Default data, logistic regression models the probability of
default.
• For example, the probability of default given balance can be written
as
$$\Pr(\text{default} = \text{Yes} \mid \text{balance})$$

Logistic Regression
• The values of $\Pr(\text{default} = \text{Yes} \mid \text{balance})$, which we abbreviate p(balance), will
range between 0 and 1.
• Then for any given value of balance, a prediction can be made for
default.
• For example, one might predict default = Yes for any individual for
whom p(balance) > 0.5.
• Alternatively, if a company wishes to be conservative in predicting
individuals who are at risk for default, then they may choose to use a
lower threshold, such as p(balance) > 0.1.
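The effect of the threshold can be illustrated with a short sketch. The function name `predict_default` and the probability values are made up for illustration; only the 0.5 and 0.1 thresholds come from the text above.

```python
def predict_default(probability, threshold=0.5):
    """Label an individual 'Yes' when the estimated probability exceeds the threshold."""
    return "Yes" if probability > threshold else "No"


# Hypothetical estimated p(balance) values for four individuals.
estimated = [0.03, 0.12, 0.45, 0.80]

standard = [predict_default(p) for p in estimated]
conservative = [predict_default(p, threshold=0.1) for p in estimated]

print("threshold 0.5:", standard)
print("threshold 0.1:", conservative)
```

Lowering the threshold flags more individuals as at risk, at the cost of more false alarms.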

Logistic Regression Model
• A linear regression model for this probability, p(X), would be:
$$p(X) = \beta_0 + \beta_1 X$$
• If we use this approach to predict default=Yes using balance, then we obtain:

• for balances close to zero we predict a negative probability of default;
• if we were to predict for very large balances, we would get values bigger than 1.
• These predictions are not sensible, since of course the true probability of default,
regardless of credit card balance, must fall between 0 and 1.

Logistic Regression Model
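• To avoid this problem, we must model p(X) using a function that gives
outputs between 0 and 1 for all values of X. Logistic regression uses the logistic function:
$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$
• After a bit of manipulation of the above, we find that
$$\frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}$$
• The quantity $p(X)/[1 - p(X)]$ is called the odds, and can take on any value between 0 and
∞.
• Values of the odds close to 0 and ∞ indicate very low and very high
probabilities of default, respectively.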
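As a rough numerical check, the sketch below evaluates the standard logistic function, p(X) = e^(β0 + β1·X) / (1 + e^(β0 + β1·X)), and the corresponding odds. The coefficient values and the X values are hypothetical, chosen only to show the behaviour.

```python
import math


def logistic(x, beta0, beta1):
    """Logistic function; algebraically equal to e^z / (1 + e^z) with z = beta0 + beta1 * x."""
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical coefficients, not estimated from any data set.
beta0, beta1 = -4.0, 0.01

probabilities = [logistic(x, beta0, beta1) for x in (0, 200, 400, 800)]
for x, p in zip((0, 200, 400, 800), probabilities):
    print(x, round(p, 3), round(odds(p), 3))
```

Every probability stays strictly between 0 and 1, while the odds grow without bound as X increases.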
Logit or Log(odds)

• The logistic regression model has a logit that is linear in X:
$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$$
• Coefficients in logistic regression represent the change in the log-odds of the
outcome associated with a one-unit change in the predictor
variable.
• In a linear regression model, 𝛽1 gives the average change in Y associated with a one-unit
increase in X.
• By contrast, in a logistic regression model, increasing X by one
unit changes the log odds by 𝛽1.
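To see what a change of 𝛽1 in the log odds means for the odds themselves, one can compare the odds at X + 1 with the odds at X. The short derivation below assumes the standard logistic model, in which the odds equal $e^{\beta_0 + \beta_1 X}$:

```latex
\frac{p(X+1) / \bigl(1 - p(X+1)\bigr)}{p(X) / \bigl(1 - p(X)\bigr)}
  = \frac{e^{\beta_0 + \beta_1 (X + 1)}}{e^{\beta_0 + \beta_1 X}}
  = e^{\beta_1}
```

So a one-unit increase in X multiplies the odds by $e^{\beta_1}$, whatever the starting value of X. The resulting change in p(X) itself is not constant; it depends on the current value of X.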
Estimating the Regression Coefficients

• The coefficients 𝛽0 and 𝛽1 are unknown, and must be estimated based
on the available training data.
• The method of maximum likelihood is used to estimate the regression coefficients.

• The basic intuition behind using maximum likelihood:

• we seek estimates for 𝛽0 and 𝛽1 such that the predicted probability 𝑝(𝑋𝑖) of
default for each individual corresponds as closely as possible to the individual's
observed default status.
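The intuition above can be made concrete with a small sketch that maximizes the log-likelihood by plain gradient ascent. This is one simple way to do it, not necessarily the method used by any particular software package. The toy data, learning rate, and step count are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Map a linear score z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(beta0, beta1, xs, ys):
    """Sum of log p(x_i) over cases with y_i = 1 and log(1 - p(x_i)) over cases with y_i = 0."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(beta0 + beta1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_by_gradient_ascent(xs, ys, learning_rate=0.1, steps=5000):
    """Repeatedly move (beta0, beta1) in the direction that increases the log-likelihood."""
    beta0, beta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(beta0 + beta1 * x)  # observed minus predicted
            grad0 += error
            grad1 += error * x
        beta0 += learning_rate * grad0 / n
        beta1 += learning_rate * grad1 / n
    return beta0, beta1


# Toy data (made up): larger x makes y = 1 more likely, with some overlap.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_by_gradient_ascent(xs, ys)
print("estimates:", round(b0, 3), round(b1, 3))
print("log-likelihood:", round(log_likelihood(b0, b1, xs, ys), 3))
```

The fitted coefficients give a higher log-likelihood than the starting point (0, 0), which is what "corresponds as closely as possible" means here.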

Estimating the Regression Coefficients
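• The intuition behind maximum likelihood is formalized by the likelihood function:
$$\ell(\beta_0, \beta_1) = \prod_{i:\, y_i = 1} p(x_i) \prod_{i':\, y_{i'} = 0} \left(1 - p(x_{i'})\right)$$
• The estimates of 𝛽0 and 𝛽1 are chosen to maximize this likelihood function.

• We see that the estimate of 𝛽1 is 0.0055; this indicates that a one-unit increase in balance is associated with an
increase in the log odds of default by 0.0055 units.
• We can measure the accuracy of the coefficient estimates by computing their standard
errors.
• For instance, the z-statistic associated with 𝛽1 is equal to $\hat{\beta}_1 / SE(\hat{\beta}_1)$.
• So a large (absolute) value of the z-statistic indicates evidence against the null hypothesis
H0 : 𝛽1 = 0. This null hypothesis implies that
$$p(X) = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$$
• i.e. the probability of default does not depend on balance.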

Example 1 - Making Predictions
• Once the coefficients have been estimated, we can compute the
probability of default for any given credit card balance.

• For example, for an individual with a balance of R1,000:
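The calculation can be sketched as follows. The slope 0.0055 is the value quoted in these slides. The intercept is not stated in the text, so the sketch assumes the value −10.6513 from the textbook treatment of the Default data in An Introduction to Statistical Learning, which these slides appear to follow.

```python
import math


def predicted_probability(balance, beta0, beta1):
    """Evaluate the logistic function e^z / (1 + e^z) with z = beta0 + beta1 * balance."""
    z = beta0 + beta1 * balance
    return math.exp(z) / (1.0 + math.exp(z))


BETA1 = 0.0055     # slope for balance, as quoted in the slides
BETA0 = -10.6513   # ASSUMPTION: intercept taken from the textbook example, not from the slides

p_default = predicted_probability(1000, BETA0, BETA1)
print(f"Estimated probability of default: {p_default:.5f}")
```

Under these assumed coefficients the estimated probability of default is well below 1%.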

Example 2
• For example, the Default data set contains the qualitative variable
student.

• To fit a model that uses student status as a predictor variable, we
simply create a dummy variable that takes on a value of 1 for students
and 0 for non-students.

Exercise
• Question: For the Default data, consider the estimated coefficients of the logistic regression model that
predicts the probability of default using balance, income, and student status. Student
status is encoded as a dummy variable student[Yes], with a value of 1 for a student and a
value of 0 for a non-student. In fitting this model, income was measured in thousands of
dollars.

1. Given a student with a credit card balance of R1,500 and an income of R40,000,
estimate the probability of default.
2. Given a non-student with the same balance and income, estimate the probability of default.
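The mechanics of the exercise can be sketched with a small helper. The coefficient values below are placeholders invented for illustration, NOT the estimates the exercise refers to; substitute the values from the coefficient table before reading anything into the output. The function and key names are also assumptions.

```python
import math


def default_probability(balance, income_thousands, is_student, coef):
    """Logistic model with three predictors; student status enters as a 0/1 dummy."""
    z = (
        coef["intercept"]
        + coef["balance"] * balance
        + coef["income"] * income_thousands
        + coef["student_yes"] * (1 if is_student else 0)
    )
    return 1.0 / (1.0 + math.exp(-z))


# PLACEHOLDER coefficients (hypothetical): replace with the table's estimates.
placeholder_coef = {
    "intercept": -10.0,
    "balance": 0.005,
    "income": 0.003,
    "student_yes": -0.5,
}

# Income is entered in thousands, so 40,000 becomes 40.
p_student = default_probability(1500, 40, True, placeholder_coef)
p_non_student = default_probability(1500, 40, False, placeholder_coef)
print(round(p_student, 4), round(p_non_student, 4))
```

With a negative placeholder coefficient on the student dummy, the student's estimated probability comes out lower than the non-student's at the same balance and income.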

THANK YOU
