Chapter 10 – Logistic Regression
Logistic Regression
⚫Extends idea of linear regression to
situation where outcome variable is
categorical
⚫Widely used, particularly where a
structured model is useful to explain
(=profiling) or to predict
⚫ Finding the factors that differentiate
between male and female top executives
⚫We focus on binary classification,
i.e. Y = 0 or Y = 1
The Logit
Goal: Find a function of the predictor
variables that relates them to a 0/1
outcome
⚫Instead of Y as outcome variable (like in
linear regression), we use a function of Y
called the logit
⚫Logit can be modeled as a linear function of
the predictors
⚫The logit can be mapped back to a
probability, which, in turn, can be mapped
to a class
Step 1: Logistic Response Function
p = probability of belonging to class 1
Need to relate p to predictors with a function
that guarantees 0 ≤ p ≤ 1
Standard linear function (as shown below)
does not:
p = β0 + β1x1 + β2x2 + … + βqxq   (q = number of predictors)
The fix: use the logistic response function
p = 1 / (1 + e^−(β0 + β1x1 + β2x2 + … + βqxq))   (eq. 10.2 in textbook)
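A minimal numeric sketch (coefficients and x values chosen only for illustration) contrasting the two functions:

import numpy as np

b0, b1 = -2.0, 0.05           # illustrative coefficients, single predictor
x = np.array([0, 40, 100])    # illustrative predictor values

print(b0 + b1 * x)                          # linear: -2.0 and 3.0 fall outside [0, 1]
print(1 / (1 + np.exp(-(b0 + b1 * x))))     # logistic (eq. 10.2): values stay strictly between 0 and 1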
Step 2: The Odds
The odds of an event are defined as:
Odds = p / (1 − p)   (eq. 10.3; p = probability of the event)
Or, given the odds of an event, the probability
of the event can be computed by:
p = Odds / (1 + Odds)   (eq. 10.4)
We can also relate the odds to the predictors:
Odds = e^(β0 + β1x1 + β2x2 + … + βqxq)   (eq. 10.5)
To get this result, substitute 10.2
into 10.4
Step 3: Take log on both sides
This gives us the logit:
log(Odds) = β0 + β1x1 + β2x2 + … + βqxq = logit   (eq. 10.6)
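A worked numeric example of the three quantities (p = 0.8 chosen only for illustration):
Odds = p / (1 − p) = 0.8 / 0.2 = 4
logit = log(Odds) = log(4) ≈ 1.386
and back again: p = Odds / (1 + Odds) = 4 / 5 = 0.8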
Logit, cont.
So, the logit is a linear function of predictors
x1, x2, …
⚫ Takes values from -infinity to +infinity
Review the relationship between logit, odds,
and probability (see Chapter 10)
Figure: Odds (a) and logit (b) as functions of p
Example
Personal Loan Offer
(UniversalBank.csv)
Outcome variable: accept bank loan (0/1)
Predictors: Demographic info, and info about
their bank relationship
Single Predictor Model
Modeling loan acceptance on income (x)
Assume fitted coefficients (more later): b0 = -6.3525, b1 = 0.0392
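As a quick check, a minimal sketch (illustrative only; it assumes Income is recorded in $1000s, as in the UniversalBank data) plugs these fitted values into the logistic response function for one customer:

import numpy as np

b0, b1 = -6.3525, 0.0392      # fitted intercept and Income coefficient from above
income = 100                  # hypothetical customer with $100K annual income

logit = b0 + b1 * income                  # the logit (linear in the predictor)
p = 1 / (1 + np.exp(-logit))              # logistic response function (eq. 10.2)
print(round(p, 3))                        # about 0.08: this customer is unlikely to accept at a 0.5 cutoff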
Seeing the Relationship
Last step - classify
Model produces an estimated probability of
being a “1”
⚫Convert to a classification by establishing
cutoff level
⚫If estimated prob. > cutoff, classify as “1”
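A minimal sketch of this classification step (the probabilities below are made up for illustration):

import numpy as np

p = np.array([0.03, 0.62, 0.48, 0.91])    # estimated probabilities of being a "1"
cutoff = 0.5                              # popular initial choice (next slide)
print((p > cutoff).astype(int))           # [0 1 0 1]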
Ways to Determine Cutoff
⚫0.50 is popular initial choice
⚫Additional considerations (see Chapter 5)
⚫ Maximize classification accuracy
⚫ Maximize sensitivity (subject to min. level of
specificity)
⚫ Minimize false positives (subject to max. false
negative rate)
⚫ Minimize expected cost of misclassification
(need to specify costs)
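For instance, a minimal sketch of the first option (choose the cutoff that maximizes classification accuracy); the labels and probabilities here are made up for illustration:

import numpy as np
from sklearn.metrics import accuracy_score

actual = np.array([0, 0, 1, 0, 1, 1, 0, 1])                          # hypothetical validation labels
proba = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.90, 0.20, 0.60])   # hypothetical p(1) values

cutoffs = np.arange(0.05, 0.95, 0.05)
acc = [accuracy_score(actual, (proba > c).astype(int)) for c in cutoffs]
print('best cutoff by accuracy:', round(cutoffs[int(np.argmax(acc))], 2))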
Example, cont.
⚫Estimates of β’s are derived through an
iterative process called maximum likelihood
estimation
⚫Let’s include all 12 predictors in the model
now
Data Prep
import pandas as pd

bank_df = pd.read_csv('UniversalBank.csv')
bank_df.drop(columns=['ID', 'ZIP Code'], inplace=True)
bank_df.columns = [c.replace(' ', '_') for c in bank_df.columns]

# Treat education as categorical, convert to dummy variables
bank_df['Education'] = bank_df['Education'].astype('category')
new_categories = {1: 'Undergrad', 2: 'Graduate', 3: 'Advanced/Professional'}
bank_df['Education'] = bank_df['Education'].cat.rename_categories(new_categories)
bank_df = pd.get_dummies(bank_df, prefix_sep='_', drop_first=True)

y = bank_df['Personal_Loan']
X = bank_df.drop(columns=['Personal_Loan'])
Fitting Model
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from dmba import AIC_score

# partition data
train_X, valid_X, train_y, valid_y = train_test_split(X, y, test_size=0.4, random_state=1)

# fit a logistic regression (set penalty=l2 and C=1e42 to avoid regularization)
logit_reg = LogisticRegression(penalty="l2", C=1e42, solver='liblinear')
logit_reg.fit(train_X, train_y)

print('intercept ', logit_reg.intercept_[0])
print(pd.DataFrame({'coeff': logit_reg.coef_[0]}, index=X.columns).transpose())
print('AIC', AIC_score(valid_y, logit_reg.predict(valid_X), df=len(train_X.columns) + 1))
Results
intercept  -12.61895521314035

coefficients for logit:
  Age                               -0.032549
  Experience                         0.034160
  Income                             0.058824
  Family                             0.614095
  CCAvg                              0.240534
  Mortgage                           0.001012
  Securities_Account                -1.026191
  CD_Account                         3.647933
  Online                            -0.677862
  CreditCard                        -0.955980
  Education_Graduate                 4.192204
  Education_Advanced/Professional    4.341697

AIC  -709.1524769205962
Converting from logit to probabilities
logit_reg_pred = logit_reg.predict(valid_X)
logit_reg_proba = logit_reg.predict_proba(valid_X)
logit_result = pd.DataFrame({'actual': valid_y,
                             'p(0)': [p[0] for p in logit_reg_proba],
                             'p(1)': [p[1] for p in logit_reg_proba],
                             'predicted': logit_reg_pred})

# display four different cases
interestingCases = [2764, 932, 2721, 702]
print(logit_result.loc[interestingCases])
actual p(0) p(1) predicted
2764 0 0.976 0.024 0
932 0 0.335 0.665 1
2721 1 0.032 0.968 1
702 1 0.986 0.014 0
Interpreting Odds, Probability
For predictive classification, we typically use
probability with a cutoff value
For explanatory purposes, odds have a useful
interpretation:
⚫If we increase x1 by one unit, holding x2, x3, …,
xq constant, then
⚫e^b1 is the factor by which the odds of
belonging to class 1 are multiplied
⚫ Recall (eq. 10.5): Odds = e^(β0 + β1x1 + β2x2 + … + βqxq)
⚫ Consider a single predictor, Income, holding the
remaining predictors constant:
Odds(Personal Loan = Yes | Income) = e^(β0 + β1·Income)
⚫ So, e^β1 is the multiplicative factor by which the
odds (of belonging to class 1) increase
when the value of x1 is increased by 1
unit, holding all other predictors constant.
If β1 < 0, an increase in x1 is associated with
a decrease in the odds of belonging to
class 1, whereas a positive value of β1 is
associated with an increase in the odds.
Loan Example:
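For the loan example, a minimal sketch (assuming the logit_reg model and X from the earlier code) converts each estimated coefficient to the multiplicative change in the odds per one-unit increase:

import numpy as np
import pandas as pd

# e^b for each predictor: multiplicative change in the odds per one-unit increase
odds_multipliers = pd.Series(np.exp(logit_reg.coef_[0]), index=X.columns)
print(odds_multipliers)
# e.g. exp(0.058824) is about 1.06 for Income:
# each extra $1K of income multiplies the odds of accepting the loan by roughly 1.06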
Evaluating Classification
Performance
Performance measures: Confusion matrix
and % of misclassifications
More useful in this example: gains and lift charts
(the terms are sometimes used interchangeably)
Python’s Gains
Chart
import matplotlib.pyplot as plt
from dmba import gainsChart, liftChart

df = logit_result.sort_values(by=['p(1)'], ascending=False)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
gainsChart(df.actual, ax=axes[0])
liftChart(df['p(1)'], title=False, ax=axes[1])
plt.show()
Gains chart: cumulative number of 1’s yielded by the
model, moving through records sorted by predicted
probability of being a 1, compared with the number of
1’s yielded by selecting records randomly.
Python’s Lift Chart
(Produced by the same code as the gains chart on the
previous slide; the lift chart is the right-hand panel.)
The top decile (i.e. the 10% of records most likely
to be 1’s according to the model) contains 7.8 times
as many 1’s as a random selection of the same size.
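To see where that number comes from, a minimal sketch (assuming the sorted data frame df from the chart code) computes the top-decile lift directly:

# proportion of 1's in the top 10% of records (ranked by p(1)),
# divided by the overall proportion of 1's
n_top = int(len(df) * 0.10)
top_decile_lift = df['actual'].iloc[:n_top].mean() / df['actual'].mean()
print(round(top_decile_lift, 1))   # about 7.8 for this example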
Multicollinearity
Problem: As in linear regression, if one
predictor is a linear combination of other
predictor(s), model estimation will fail
⚫Note that in such a case, we have at least
one redundant predictor
Solution: Remove extreme redundancies
(by dropping predictors via variable
selection, or by data reduction methods such
as PCA)
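One simple way to screen for such redundancies (a sketch, not the chapter's method; it assumes the X data frame from the data prep step) is to look for highly correlated predictor pairs:

import numpy as np

# pairwise absolute correlations; keep only the upper triangle to avoid duplicate pairs
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
print(upper.stack().sort_values(ascending=False).head())   # top candidates for dropping or data reduction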
Variable Selection
This is the same issue as in linear regression
⚫ The number of correlated predictors can grow when we
create derived variables such as interaction terms
(e.g. Income x Family), to capture more complex
relationships
⚫ Problem: Overly complex models have the danger of
overfitting
⚫ Solution: Reduce variables via automated selection of
variable subsets (as with linear regression)
⚫ See Chapter 6
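In scikit-learn, one convenient alternative to subset search (a sketch assuming train_X and train_y from the earlier partition) is L1-regularized logistic regression, which drives some coefficients exactly to zero:

from sklearn.linear_model import LogisticRegression

# smaller C = stronger penalty = fewer predictors retained; C=0.1 is illustrative
l1_reg = LogisticRegression(penalty='l1', C=0.1, solver='liblinear')
l1_reg.fit(train_X, train_y)
kept = [col for col, b in zip(train_X.columns, l1_reg.coef_[0]) if b != 0]
print('predictors with nonzero coefficients:', kept)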
Summary
⚫Logistic regression is similar to linear
regression, except that it is used with a
categorical response
⚫It can be used for explanatory tasks
(=profiling) or predictive tasks
(=classification)
⚫The predictors are related to the response Y
via a nonlinear function called the logit
⚫As in linear regression, reducing predictors
can be done via variable selection
⚫Logistic regression can be generalized to
more than two classes
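For the last point, a minimal sketch of multi-class logistic regression in scikit-learn; the iris data here is only an illustration, not part of the chapter's example:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X_iris, y_iris = load_iris(return_X_y=True)      # three classes
multi_reg = LogisticRegression(max_iter=1000)    # handles >2 classes via a multinomial model
multi_reg.fit(X_iris, y_iris)
print(multi_reg.predict_proba(X_iris[:3]))       # one probability per class; rows sum to 1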