Lecture - 4 - Classification - Logistic Regression

Logistic regression is used for classification problems with binary or multi-class qualitative dependent variables. It models the probabilities of different class outcomes as a function of predictor variables. The logistic function is used to constrain the output probabilities between 0 and 1. Maximum likelihood estimation is applied to estimate the model coefficients. The estimated coefficients can then be used to predict class probabilities for new observations and classify them based on probability thresholds. Multiple logistic regression extends the approach to multiple predictor variables.


Classification – Logistic Regression

Dr. Mehmet Yasin Ulukuş

Mindset Institute - Mehmet Yasin Ulukuş



Why not Linear Regression?
• Suppose that we are trying to predict the medical condition of a patient in the
emergency room on the basis of her symptoms.
• Three possible diagnoses: stroke, drug overdose, and epileptic seizure.
• We could consider encoding these values as a quantitative response variable, Y, as
follows, and use least squares to fit a regression:

    Y = 1 if stroke; 2 if drug overdose; 3 if epileptic seizure

• Why not this coding then?

    Y = 1 if epileptic seizure; 2 if stroke; 3 if drug overdose

• Each such coding implies an ordering on the outcomes, but there is no particular
reason that this needs to be the case.


Why not Linear Regression?
• A 1, 2, 3 coding would be reasonable if the response variable's values take on a
natural ordering, such as mild, moderate, and severe, and if we felt the gap between
mild and moderate was similar to the gap between moderate and severe.
• Unfortunately, in general there is no natural way to convert a qualitative response
variable with more than two levels into a quantitative response that is ready for
linear regression.


Why not Linear Regression?
• For a binary (two-level) qualitative response, the situation is better.
• If there are only two possibilities, stroke and drug overdose, we could code

    Y = 0 if stroke; 1 if drug overdose

• We could then fit a linear regression to this binary response, and predict drug
overdose if Ŷ > 0.5 and stroke otherwise.
• Regression by least squares does make sense for a binary response.
• It can be shown that the Ŷ obtained using linear regression is in fact an estimate
of Pr(drug overdose | X) in this special case.


Logistic Regression
• Recall the Default data again.


Logistic Regression
• Rather than modeling this response directly, logistic regression models the
probability that Y belongs to a particular category.
• For the Default data, for example, logistic regression models the probability of
default given balance:

    Pr(default = Yes | balance)

which we abbreviate p(balance).
• Now we can use these probabilities to classify individuals. For example, we might
classify customers with p(balance) > 0.5 as default = Yes customers.
• Alternatively, if a company wishes to be conservative in predicting individuals who
are at risk for default, then it may choose to use a lower threshold, such as
p(balance) > 0.1, to call them default = Yes.


Logistic Regression
• For simplicity, let p(X) = Pr(Y = 1 | X).
• We talked of using a linear regression model to represent these probabilities:

    p(X) = β0 + β1X

• This approach can produce values less than zero or greater than one.
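As a quick illustration of this unboundedness, here is a small sketch (synthetic data; the setup and all values are assumed for illustration) that fits least squares to a 0/1 response and evaluates the fitted line away from the bulk of the data:

```python
import numpy as np

# Synthetic 0/1 response: y flips from 0 to 1 as x grows (assumed setup).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = (x + rng.normal(0, 2, size=200) > 5).astype(float)

# Least-squares line fitted to the binary response.
b1, b0 = np.polyfit(x, y, deg=1)

def line(v):
    """'Probability' predicted by the linear model at input v."""
    return b0 + b1 * v

# The line is unbounded: far enough out it leaves the interval [0, 1].
print(line(-5), line(15))
```

Evaluating the line outside the observed range shows predicted "probabilities" below 0 and above 1, which is exactly the problem the logistic function fixes.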



Logistic Regression
• To avoid this problem, we must model p(X) using a function that gives outputs
between 0 and 1 for all values of X.
• In logistic regression, we use the logistic function:

    p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

• The logistic function will always produce an S-shaped curve of this form, and so
regardless of the value of X, we will obtain a sensible prediction.
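A minimal sketch of the logistic function (the coefficient values here are assumed purely for illustration), showing that its output stays strictly between 0 and 1 even for extreme inputs:

```python
import math

def logistic(x, b0=-1.0, b1=2.0):
    """Logistic function p(x) = e^(b0+b1*x) / (1 + e^(b0+b1*x)).

    Written in the numerically stable 1/(1+e^-eta) form; coefficients
    are illustrative defaults, not fitted values.
    """
    eta = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-eta))

# Even far from the origin, outputs remain probabilities in (0, 1).
print(logistic(-100), logistic(0.0), logistic(100))
```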
Logistic Regression
• After a bit of manipulation, we find that

    p(X) / (1 − p(X)) = e^(β0 + β1X)

• The quantity p(X)/(1 − p(X)) is called the odds, which can take on any value
between 0 and ∞.
• The terminology comes from gambling.
• On average, nine out of every ten people with an odds of 9 will default, since
p(X) = 0.9 implies an odds of 0.9/(1 − 0.9) = 9.
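The probability-odds conversion can be checked in a couple of lines (values taken from the slide's example):

```python
# Probability of default 0.9 corresponds to odds of 9 (nine defaults
# for every non-default), and the mapping inverts cleanly.
p = 0.9
odds = p / (1 - p)          # 9
p_back = odds / (1 + odds)  # back to 0.9
print(odds, p_back)
```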



Logistic Regression
• By taking the logarithm, we arrive at

    log( p(X) / (1 − p(X)) ) = β0 + β1X

• So the logarithm of the odds (the log odds, or logit) is modeled by a linear
function of X.
• A one-unit increase in X multiplies the odds by e^(β1).
• Higher odds mean higher probabilities.
• If β1 is positive, then increasing X will be associated with increasing p(X); if β1
is negative, then increasing X will be associated with decreasing p(X).


Logistic Regression: Estimating Coefficients
• The coefficients β0 and β1 are unknown and must be estimated based on the
available training data.
• We prefer a general method called maximum likelihood.
• The basic intuition behind using maximum likelihood for logistic regression is as
follows: we seek estimates for β0 and β1 such that the predicted probability of
default for each individual corresponds as closely as possible to the individual's
observed default status.


Logistic Regression: Estimating Coefficients
• In other words, we try to find β̂0 and β̂1 such that plugging these estimates into
the model for p(X),

    p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X)),

yields a number close to one for all individuals who defaulted, and a number close
to zero for all individuals who did not.
• This intuition can be formalized using a mathematical equation called a likelihood
function:

    ℓ(β0, β1) = ∏ (i: yi = 1) p(xi) × ∏ (i′: yi′ = 0) (1 − p(xi′))

• The estimates β̂0 and β̂1 are chosen to maximize this likelihood function.


Logistic Regression: Estimating Coefficients
• Instead of maximizing the likelihood, we can maximize the log-likelihood (the
maximizer is unchanged under any increasing function), which turns the products
into sums.
• The log-likelihood can be written as

    log ℓ(β0, β1) = Σ (i: yi = 1) log p(xi) + Σ (i′: yi′ = 0) log(1 − p(xi′))

• The estimates β̂0 and β̂1 are chosen to maximize the log-likelihood function.
• The log-likelihood function is concave.
• Take derivatives with respect to β0 and β1, set them to zero, and solve the
resulting equation system.
• The equation system is non-linear, but there are iterative algorithms that can
solve it (Newton-Raphson).
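A minimal sketch of this procedure: Newton-Raphson on the two-parameter logistic log-likelihood, run on synthetic data whose "true" coefficients are assumed for illustration (this is not the Default data):

```python
import numpy as np

# Synthetic data from an assumed logistic model.
rng = np.random.default_rng(0)
beta_true = np.array([-1.0, 2.0])
X = np.column_stack([np.ones(500), rng.normal(size=500)])  # [1, x] design
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - p)                 # score: gradient of log-likelihood
    W = p * (1 - p)
    hess = -(X * W[:, None]).T @ X       # Hessian: negative definite (concave)
    beta = beta - np.linalg.solve(hess, grad)   # Newton-Raphson update

print(beta)  # close to beta_true, up to sampling noise
```

Because the log-likelihood is concave, these Newton iterations converge to the unique maximizer in only a handful of steps.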



Logistic Regression: Default Data Model
• For the Default data, the logistic regression parameter estimates are as follows
(R reports the standard errors and p-values; Scikit-Learn does not):

                 Coefficient   Std. error   z-statistic   p-value
    Intercept       -10.6513       0.3612         -29.5   <0.0001
    balance           0.0055       0.0002          24.9   <0.0001

• We see that β̂1 = 0.0055 > 0; this indicates that an increase in balance is
associated with an increase in the probability of default.
• To be precise, a one-unit increase in balance is associated with an increase in the
log odds of default by 0.0055 units.


Logistic Regression: Predictions
• Once the coefficients have been estimated, it is a simple matter to compute the
probability of default for any given credit card balance.
• We predict that the default probability for an individual with a balance of
$1,000 is

    p̂(1000) = e^(-10.6513 + 0.0055 × 1000) / (1 + e^(-10.6513 + 0.0055 × 1000)) ≈ 0.00576

• In contrast, the predicted probability of default for an individual with a balance
of $2,000 is much higher, and equals 0.586, or 58.6%.
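These predictions can be reproduced directly from the estimated coefficients; the sketch below assumes the balance-only estimates reported above:

```python
import math

# Fitted coefficients for the balance-only Default model (from the slide).
b0, b1 = -10.6513, 0.0055

def p_default(balance):
    """Predicted probability of default at a given credit card balance."""
    eta = b0 + b1 * balance
    return math.exp(eta) / (1 + math.exp(eta))

print(round(p_default(1000), 5))  # ≈ 0.00576
print(round(p_default(2000), 3))  # ≈ 0.586
```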



Logistic Regression: Qualitative inputs
• One can use qualitative predictors with the logistic regression model using the
dummy variable approach we have seen in linear regression.
• As an example, the Default data set contains the qualitative variable student.
• To fit the model, we simply create a dummy variable that takes on a value of 1 for
students and 0 for non-students.
• The fitted model is as follows:

    log( p(X) / (1 − p(X)) ) = -3.5041 + 0.4049 × student

• The predictions are as follows:

    Pr(default = Yes | student = Yes) ≈ 0.0431
    Pr(default = Yes | student = No) ≈ 0.0292
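These student and non-student predictions can be checked in a few lines, assuming the fitted student-dummy coefficients for the Default data (-3.5041 and 0.4049):

```python
import math

# Fitted coefficients of the default ~ student model (student dummy: 1/0).
b0, b1 = -3.5041, 0.4049

def p_default(student):
    """Predicted default probability for student = 1 or non-student = 0."""
    eta = b0 + b1 * student
    return math.exp(eta) / (1 + math.exp(eta))

print(round(p_default(1), 4))  # ≈ 0.0431 for students
print(round(p_default(0), 4))  # ≈ 0.0292 for non-students
```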



Multiple Logistic Regression
• We now consider the problem of predicting a binary response using multiple
predictors.
• Our model is now a generalization of the single-predictor model:

    log( p(X) / (1 − p(X)) ) = β0 + β1X1 + … + βpXp

where X1, …, Xp are predictors.
• Probabilities are computed by

    p(X) = e^(β0 + β1X1 + … + βpXp) / (1 + e^(β0 + β1X1 + … + βpXp))

• We use the maximum likelihood method to estimate β0, β1, …, βp.


Multiple Logistic Regression
• The table below shows the coefficient estimates for a logistic regression model
that uses balance, income (in thousands of dollars), and student status to predict
the probability of default:

                    Coefficient   Std. error   z-statistic   p-value
    Intercept          -10.8690       0.4923        -22.08   <0.0001
    balance              0.0057       0.0002         24.74   <0.0001
    income               0.0030       0.0082          0.37    0.7115
    student[Yes]        -0.6468       0.2362         -2.74    0.0062

• The p-values associated with balance and the dummy variable for student status are
very small, indicating that each of these variables is associated with the
probability of default.
• For example, a student with a credit card balance of $1,500 and an income of
$40,000 has an estimated probability of default of approximately 0.058.
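This probability can be reproduced from the coefficients above; the sketch below keeps the balance coefficient at slightly higher precision (0.00574, an assumption beyond the rounded table value) so the result rounds to 0.058:

```python
import math

# Multiple logistic regression coefficients for the Default data
# (income in thousands of dollars; student dummy = 1 for students).
b0, b_bal, b_inc, b_stu = -10.8690, 0.00574, 0.0030, -0.6468

def p_default(balance, income, student):
    eta = b0 + b_bal * balance + b_inc * income + b_stu * student
    return math.exp(eta) / (1 + math.exp(eta))

# Student with a $1,500 balance and $40,000 income (income entered as 40).
print(round(p_default(1500, 40, 1), 3))  # ≈ 0.058
```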



Multiple Logistic Regression Classification
• We estimate probabilities using

    p(x) = e^(β0 + β1x1 + β2x2) / (1 + e^(β0 + β1x1 + β2x2))

• Now, we can use thresholds to assign classes.
• For instance, if p(x) > 0.5, then we assign the observation to class 1.
• This can be rewritten as: if β0 + β1x1 + β2x2 > 0, then assign class 1; if
β0 + β1x1 + β2x2 < 0, assign class 0.
• Now, note that β0 + β1x1 + β2x2 is linear in x, hence the decision boundary is
linear.
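The equivalence of the two rules can be verified numerically; the coefficients below are assumed purely for illustration:

```python
import numpy as np

# Thresholding p(x) at 0.5 is the same as thresholding the linear part
# at 0, so the boundary is the line b0 + b1*x1 + b2*x2 = 0.
b0, b1, b2 = -1.0, 2.0, -3.0
rng = np.random.default_rng(1)
pts = rng.normal(size=(1000, 2))

eta = b0 + b1 * pts[:, 0] + b2 * pts[:, 1]
p = 1 / (1 + np.exp(-eta))

same = np.array_equal(p > 0.5, eta > 0)
print(same)  # the two rules classify every point identically
```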



Logistic Regression Python Example
• See Logistic Regression Jupyter Notebook File for details.



CONFUSION MATRIX
• A confusion matrix, shown below for the Default data, is a convenient way to
display this information:

                      True No     True Yes     Total
    Predicted No        9,644         252      9,896
    Predicted Yes          23          81        104
    Total               9,667         333     10,000

• In total, 275 individuals among 10,000 are misclassified.
• Only 23 out of 9,667 of the individuals who did not default were incorrectly
labeled.
• However, of the 333 individuals who defaulted, 252 (or 75.7%) were missed.
• From the perspective of a credit card company that is trying to identify high-risk
individuals, this would be unacceptable.
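A confusion matrix like this can be produced with Scikit-Learn's `confusion_matrix`; the toy labels below are hypothetical, not the Default data:

```python
from sklearn.metrics import confusion_matrix

# Rows of the matrix are true classes, columns are predicted classes,
# so the off-diagonal entries count the two kinds of errors.
y_true = [0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[4 1]
#  [1 2]]
```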



Types of Classification Errors Summary
• Possible results when applying a classifier or diagnostic test to a population:

                        Predicted negative       Predicted positive
    Actual negative     True Negative (TN)       False Positive (FP)
    Actual positive     False Negative (FN)      True Positive (TP)

• People use different terminology for these errors.


Setting Different Thresholds
• As we have seen, Logistic Regression is trying to approximate the Bayes
classifier
• That is, the Bayes classifier will yield the smallest possible total number of
misclassified observations, irrespective of which class the errors come
from
• In contrast, a credit card company might particularly wish to avoid
incorrectly classifying an individual who will default, whereas incorrectly
classifying an individual who will not default, though still to be avoided, is
less problematic
• We will now see that it is possible to modify the classification threshold in
order to develop a classifier that better meets the credit card company's needs.



Setting Different Thresholds
• In the two-class case, this amounts to assigning an observation to the default class if

Pr(default = Yes|X = x) > 0.5

• However, if we are concerned about incorrectly predicting the default status for
individuals who default, then we can consider lowering this threshold
• For instance, we might label any customer with probability of default above 20% to the
default class
Pr(default = Yes|X = x) > 0.2

• To do this, all we need to do is compute the probabilities generated by the
logistic regression model and classify customers as defaulters if the predicted
probability exceeds 0.2.
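The effect of lowering the threshold is easy to see on a handful of hypothetical predicted probabilities:

```python
import numpy as np

# Hypothetical predicted default probabilities for five customers.
probs = np.array([0.05, 0.15, 0.25, 0.45, 0.60])

flag_50 = probs > 0.5   # default rule: threshold at 50%
flag_20 = probs > 0.2   # conservative rule: flags more customers

print(flag_50.sum(), flag_20.sum())  # 1 vs 3 customers flagged
```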



Setting Different Thresholds
• Now we have the following confusion matrix:

                      True No     True Yes     Total
    Predicted No        9,432         138      9,570
    Predicted Yes         235         195        430
    Total               9,667         333     10,000

• Of the 333 individuals who default, we now miss only 138; the error rate among
defaulters is 41.4%.
• This is a vast improvement over the error rate of 75.7% that resulted from using
the threshold of 50%.
• However, this improvement comes at a cost: now 235 individuals who do not default
are incorrectly classified.
• As a result, the overall error rate has increased slightly, to 3.73%.
• This trade-off is preferable for the credit card company.


ROC Curves
• The ROC curve is a popular graphic for
simultaneously displaying the two types of
errors for all possible thresholds
• ROC is an acronym for receiver operating
characteristics
• The overall performance of a classifier,
summarized over all possible thresholds, is
given by the area under the (ROC) curve (AUC)
• An ideal ROC curve will hug the top left corner, so the larger the area under the
curve, the better the classifier.
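Scikit-Learn computes the ROC curve and the AUC directly; the toy labels and scores below are hypothetical:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy example: two negatives, two positives, with predicted scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, scores)  # one point per threshold
auc = roc_auc_score(y_true, scores)
print(auc)  # 0.75 for this toy example
```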



Logistic Regression Confusion Matrix and ROC Python Example

• See Logistic Regression Jupyter Notebook File for details.



Multi Class Classification
• Just as binary classification involves predicting whether something is from one of
two classes (e.g. "black" or "white", "dead" or "alive", etc.),
• multiclass problems involve classifying something into one of N classes (e.g.
"red", "white", or "blue", etc.).
• Similarly, if we had access to the exact probability distribution, we could still
use the Bayes optimal classifier.
• The classifier would simply pick the class for which the conditional probability is
highest.
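A minimal sketch of this rule, with hypothetical conditional class probabilities for a single observation:

```python
import numpy as np

# Pick the class with the highest conditional probability
# (probabilities are hypothetical and sum to 1 across classes).
classes = ["red", "white", "blue"]
probs = np.array([0.2, 0.5, 0.3])

prediction = classes[int(np.argmax(probs))]
print(prediction)  # "white"
```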



Multi Class Classification
• See Multi Class Classification Jupyter Notebook File for details.



Logistic Regression: Non Linear Decision Boundaries
• Just as in linear regression, similar non-linear transformations can be done in
logistic regression.
• For example, let's say we have two independent variables x1 and x2.
• Instead of a linear model, we may choose a 2nd-order polynomial, so the model will
look as follows:

    log( p(x) / (1 − p(x)) ) = β0 + β1x1 + β2x2 + β3x1² + β4x2²

• Note that the above model may again be treated as a linear model, but with four
independent variables: x1, x2, x1², x2².
• The decision boundary created by this model will be a second-order polynomial.
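A sketch of such a model, with coefficients assumed for illustration and chosen so that the decision boundary is the circle x1² + x2² = 1:

```python
import numpy as np

# Logistic model that is linear in the four features x1, x2, x1^2, x2^2
# but quadratic in (x1, x2). These coefficients give the circular
# boundary x1^2 + x2^2 = 1.
b = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])  # intercept, x1, x2, x1^2, x2^2

def classify(x1, x2):
    """Return True when the model assigns class 1 (p > 0.5)."""
    z = np.array([1.0, x1, x2, x1**2, x2**2])
    p = 1 / (1 + np.exp(-(b @ z)))
    return bool(p > 0.5)

print(classify(0.0, 0.0))  # False: origin is inside the circle (eta = -1)
print(classify(2.0, 0.0))  # True: outside the circle (eta = 3 > 0)
```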


