
MODULE 2

LEARNING WITH
REGRESSION AND TREES
2.1 Learning with Regression: Linear Regression,
Multivariate Linear Regression, Logistic Regression.
Darakhshan Khan
LOGISTIC REGRESSION
• This type of statistical model (also known as logit model) is often used for classification
• Logistic regression estimates the probability of an event occurring, such as voted or
didn’t vote, based on a given dataset of independent variables.
• Since the outcome is a probability, the dependent variable is bounded between 0 and
1.
• Consider y variable (binary classification)
• 0: negative class
• 1: positive class
• Examples
• Email: spam / not spam
• Online transactions: fraudulent / not fraudulent
• Tumor: malignant / not malignant
LOGISTIC REGRESSION (1)
• Issue 1 of Linear Regression
• Using linear regression and then thresholding the classifier output (i.e. anything over some value is yes, else no)
• In our example, linear regression with thresholding
seems to work i.e. it does a reasonable job of stratifying
the data points into one of two classes
• But what if we had a single Yes with a very large tumour size?
• This outlier would shift the fitted line, so some of the existing yeses would now be classified as nos

LOGISTIC REGRESSION (2)

Issue 2 of Linear Regression


• We know y is 0 or 1
• But the linear model can give values larger than 1 or less than 0
• So, logistic regression generates a value that is always between 0 and 1
• Logistic regression is a classification algorithm - don't be confused by the name

LOGISTIC REGRESSION – MODEL REPRESENTATION

• What function is used to represent the model in classification?


• The aim of this classifier is to output values between 0 and 1
• Using linear regression, hθ(x) = θTx
• For the classification hypothesis representation we use hθ(x) = g(θTx)
• where g(z) takes a real number z
• g(z) = 1/(1 + e^(-z))
• This is the sigmoid function, or the logistic function
• Combining these equations, the hypothesis can be written as hθ(x) = 1/(1 + e^(-θTx))
• What does the sigmoid function look like?
• It crosses 0.5 at z = 0, then flattens out
• It asymptotes at 0 and 1
• Given this we need to fit θ to our data (a small sketch of the sigmoid and hypothesis follows below)
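As a small illustration, here is a minimal sketch in Python (assuming NumPy; the values of theta and x are made up purely to show the call):

    import numpy as np

    def sigmoid(z):
        # Logistic function g(z) = 1/(1 + e^(-z)); the output always lies between 0 and 1.
        return 1.0 / (1.0 + np.exp(-z))

    def hypothesis(theta, x):
        # h_theta(x) = g(theta^T x); x must include the bias term x0 = 1.
        return sigmoid(np.dot(theta, x))

    theta = np.array([-3.0, 1.0, 1.0])   # illustrative parameters
    x = np.array([1.0, 2.0, 2.0])        # x0 = 1, x1 = 2, x2 = 2
    print(hypothesis(theta, x))          # ~0.73, read as the estimated P(y = 1 | x; theta)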
LOGISTIC REGRESSION – MODEL REPRESENTATION (1)
Interpreting output
• When (hθ(x)) outputs a number, treat that value as the estimated probability that
y=1 on input x
• Example: if x is a feature vector with x0 = 1 (as always) and x1 = tumourSize (some value)
• hθ(x) = 0.7 tells the patient there is a 70% chance of the tumour being malignant
• More formal notation, hθ(x) = P(y=1|x ; θ)
• Probability that y=1, given x, parameterized by θ
• Since this is a binary classification task we know y = 0 or 1
• So the following must be true
P(y=1|x ; θ) + P(y=0|x ; θ) = 1

• P(y=0|x ; θ) = 1 - P(y=1|x ; θ)
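As a tiny illustration of this complement rule (the 0.7 value is the one from the tumour example above; the snippet itself is just an assumed Python sketch):

    p_y1 = 0.7              # estimated P(y = 1 | x; theta), i.e. h_theta(x)
    p_y0 = 1.0 - p_y1       # P(y = 0 | x; theta) = 1 - P(y = 1 | x; theta)
    print(p_y0)             # 0.3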
LOGISTIC REGRESSION - DECISION
BOUNDARY

• Better understanding of what the hypothesis function (model) looks like


• One way of using the sigmoid function is:
• When the probability of y being 1 is greater than 0.5 then we can predict y = 1
• Else we predict y = 0

• Looking at the sigmoid function, g(z) is greater than or equal to 0.5 when z is greater than or equal to 0
• So if z is positive, g(z) is greater than 0.5, where z = θTx
• So when θTx >= 0, then hθ(x) >= 0.5
• So what we've shown is that the hypothesis predicts y = 1 when θT x >= 0
• The corollary of that when θT x <= 0 then the hypothesis predicts y = 0
LOGISTIC REGRESSION - DECISION BOUNDARY (1)
• Example: hθ(x) = g(θ0 + θ1x1 + θ2x2)

• Assume, θ0 = -3, θ1 = 1, θ2 = 1
• So our parameter vector is a column vector with the above values, i.e., θT is a row vector = [-3, 1, 1]
• What does this mean? The z here becomes θT x
• We predict "y = 1" if -3x0 + 1x1 + 1x2 >= 0
• -3 + 1x1 + 1x2 >= 0

• We can also rewrite this as: if x1 + x2 >= 3, then we predict y = 1


• If we plot x1 + x2 = 3, we graphically plot our decision boundary
• Means we have these two regions on the graph
• Blue = false, Magenta = true
• Line = decision boundary
• Concretely, the straight line is the set of points where hθ(x) = 0.5 exactly
• The decision boundary is a property of the hypothesis
• Means we can create the boundary with the hypothesis(function) and parameters without any data
• Later, we use the data to determine the parameter values
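A minimal sketch of this decision rule in Python (assuming NumPy; the predict helper and the test points below are illustrative, not from the slides):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    theta = np.array([-3.0, 1.0, 1.0])       # theta0 = -3, theta1 = 1, theta2 = 1

    def predict(x1, x2):
        # Predict y = 1 when theta^T x >= 0, i.e. when x1 + x2 >= 3.
        x = np.array([1.0, x1, x2])          # x0 = 1 is the bias term
        return 1 if np.dot(theta, x) >= 0 else 0

    print(predict(1.0, 1.0))                              # 0: below the boundary x1 + x2 = 3
    print(predict(2.5, 2.5))                              # 1: above the boundary
    print(sigmoid(np.dot(theta, [1.0, 1.5, 1.5])))        # a point exactly on the line gives 0.5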
LOGISTIC REGRESSION – NON LINEAR DECISION
BOUNDARY
• Get logistic regression to fit a complex non-linear data set, i.e. add higher order terms to the hypothesis function
• hθ(x) = g(θ0 + θ1x1 + θ2x2 + θ3x1^2 + θ4x2^2)
• We take the transpose of the θ vector times the input vector, i.e. θT was [-1, 0, 0, 1, 1]
• So, predict "y = 1" if -1 + x1^2 + x2^2 >= 0, i.e. x1^2 + x2^2 >= 1
• If we plot x1^2 + x2^2 = 1, this gives a circle with a radius of 1 around the origin
• This indicates more complex decision boundaries can be built by fitting complex parameters to this (relatively) simple hypothesis
• More complex decision boundaries?
• By using higher order polynomial terms, we can get even more complex
decision boundaries
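A minimal sketch of this circular boundary in Python (assuming NumPy; the test points are illustrative):

    import numpy as np

    theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])   # theta^T = [-1, 0, 0, 1, 1]

    def predict(x1, x2):
        # Features are [1, x1, x2, x1^2, x2^2]; predict y = 1 when theta^T x >= 0,
        # i.e. when x1^2 + x2^2 >= 1 (outside the unit circle).
        x = np.array([1.0, x1, x2, x1 ** 2, x2 ** 2])
        return 1 if np.dot(theta, x) >= 0 else 0

    print(predict(0.5, 0.5))   # 0: inside the circle (0.25 + 0.25 < 1)
    print(predict(1.0, 1.0))   # 1: outside the circle (1 + 1 >= 1)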
LOGISTIC REGRESSION – COST FUNCTION
(ERROR/LOSS)
• Fit θ parameters
• Define the optimization objective for the cost function and fit the parameters

LOGISTIC REGRESSION – COST FUNCTION
(ERROR/LOSS)
• Linear regression uses the following function to determine θ: J(θ) = (1/2m) Σi (hθ(x(i)) - y(i))^2
• Instead of writing the squared error term, we can write it as "cost()"
• cost(hθ(x(i)), y(i)) = (1/2)(hθ(x(i)) - y(i))^2
• This evaluates the cost for an individual example using the same measure as used in linear regression
• We can then redefine J(θ) as J(θ) = (1/m) Σi cost(hθ(x(i)), y(i))
• To further simplify it we can get rid of the superscripts
• If we use this function for logistic regression, it is a non-convex function for parameter optimization
• We have some function - J(θ) - for determining the parameters
• Our hypothesis function has a non-linearity (the sigmoid function in hθ(x))
• This is a complicated non-linear function
• If you take hθ(x) and plug it into the cost() function, and then plug the cost() function into J(θ) and plot J(θ), we find many local optima -> a non-convex function
• Lots of local minima mean gradient descent may not find the global optimum - it may get stuck in a local minimum
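A minimal sketch of this squared-error per-example cost in Python (assuming NumPy; theta, x and y are illustrative values). As the slide notes, averaging this over the training set gives a non-convex J(θ) because of the sigmoid inside the square:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def squared_error_cost(theta, x, y):
        # cost(h_theta(x), y) = (1/2) * (h_theta(x) - y)^2 for a single example
        h = sigmoid(np.dot(theta, x))
        return 0.5 * (h - y) ** 2

    print(squared_error_cost(np.array([-3.0, 1.0, 1.0]), np.array([1.0, 2.0, 2.0]), 1))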
LOGISTIC REGRESSION – COST FUNCTION (ERROR/LOSS) (1)
• We need a different, convex, logistic regression cost function to which we can apply gradient descent

• This is our logistic regression cost function, i.e. the penalty the algorithm pays: cost(hθ(x), y) = -log(hθ(x)) if y = 1, and -log(1 - hθ(x)) if y = 0
• Plot the function for y = 1: the cost evaluates as -log(hθ(x))
• So when hθ(x) = 1 (correct), the cost is 0; the cost slowly increases as we become "more" wrong, i.e. as hθ(x) approaches 0
• X axis is what we predict
• Y axis is the cost associated with that prediction
• This cost function has some interesting properties
• If y = 1 and hθ(x) = 1
• If the hypothesis predicts exactly 1 and that's exactly correct, then the cost is 0 (exactly, not nearly 0)
• As hθ(x) goes to 0
• The cost goes to infinity; this captures the intuition that if hθ(x) = 0 (predicting P(y=1|x; θ) = 0) but y = 1, this will penalize the learning algorithm with a massive cost
• What about if y = 0? Then the cost is evaluated as -log(1 - hθ(x))
• Just the inverse of the other function
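A minimal sketch of this piecewise cost in Python (assuming NumPy; the probability values are illustrative), showing that the penalty grows as the prediction moves toward the wrong class:

    import numpy as np

    def log_cost(h, y):
        # Per-example logistic cost: -log(h) if y = 1, -log(1 - h) if y = 0
        return -np.log(h) if y == 1 else -np.log(1.0 - h)

    print(log_cost(0.99, 1))   # ~0.01: confident and correct -> tiny cost
    print(log_cost(0.01, 1))   # ~4.6:  confident and wrong   -> large cost
    print(log_cost(0.01, 0))   # ~0.01: the mirror case for y = 0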
LOGISTIC REGRESSION – SIMPLIFIED COST FUNCTION
• Define a simpler way to write the cost function and apply gradient descent to the logistic regression

• Rather than writing the cost function as two lines/two cases, we can compress them into one equation, which is more efficient
• cost(hθ(x), y) = -y log(hθ(x)) - (1 - y) log(1 - hθ(x))
• If y = 1, then -log(hθ(x)) - (0)log(1 - hθ(x)) = -log(hθ(x))
• Which is what we had before when y = 1

• If y = 0, then -(0)log(hθ(x)) - (1)log(1 - hθ(x)) = -log(1- hθ(x))


• Which is what we had before when y = 0

• The cost function for the θ parameters can then be defined as J(θ) = -(1/m) Σi [ y(i) log(hθ(x(i))) + (1 - y(i)) log(1 - hθ(x(i))) ]
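A minimal sketch of this combined cost over a whole training set in Python (assuming NumPy; X, y and theta are made-up illustrative values):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cost_J(theta, X, y):
        # J(theta) = -(1/m) * sum( y*log(h) + (1 - y)*log(1 - h) ), with h = g(X theta)
        m = len(y)
        h = sigmoid(X @ theta)
        return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

    X = np.array([[1.0, 1.0, 1.0],     # each row is [x0 = 1, x1, x2]
                  [1.0, 2.5, 2.0]])
    y = np.array([0, 1])
    print(cost_J(np.array([-3.0, 1.0, 1.0]), X, y))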


LOGISTIC REGRESSION – SIMPLIFIED COST FUNCTION(1)

• To fit parameters θ:
• Find parameters θ which minimize J(θ)
• This means we have a set of parameters to use in our model for future
predictions
• Then, if we're given some new example with a set of features x, we can take the θ which we generated, and output our prediction using hθ(x) = 1/(1 + e^(-θTx))
• Which is P(y=1 | x ; θ), the probability that y = 1, given x, parameterized by θ


LOGISTIC REGRESSION – GRADIENT DESCENT
How to minimize the logistic regression cost function J(θ)?
• Use gradient descent as before
• Repeatedly update each parameter using a learning rate α (for the derivation refer to the notes):
• θj := θj - α (1/m) Σi (hθ(x(i)) - y(i)) xj(i), simultaneously for all j
• It looks identical, but the hypothesis for Logistic Regression is different from Linear Regression
• Ensuring gradient descent is running correctly: plot J(θ) against the number of iterations and check that it decreases (a small sketch of the update loop follows below)
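A minimal sketch of this update loop in Python (assuming NumPy; the learning rate, iteration count and tiny dataset are illustrative choices, not prescribed by the slides):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gradient_descent(X, y, alpha=0.1, iters=1000):
        # Repeatedly apply theta_j := theta_j - alpha * (1/m) * sum((h - y) * x_j)
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(iters):
            h = sigmoid(X @ theta)                 # predictions for all examples
            theta -= alpha * (X.T @ (h - y)) / m   # vectorised partial derivatives
        return theta

    X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0]])  # x0 = 1 plus one feature
    y = np.array([0, 0, 1, 1])
    print(gradient_descent(X, y))                  # learned parameter vector theta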

LOGISTIC REGRESSION – MULTICLASS
CLASSIFICATION PROBLEMS
• Similar terms: One-vs-all or One-vs-rest
• Examples
• Email folders or tags (4 classes): Work, Friends, Family, Hobby
• Medical Diagnosis (3 classes): Not ill, Cold, Flu
• Weather (4 classes): Sunny, Cloudy, Rainy, Snow
• Binary vs Multi-class

LOGISTIC REGRESSION – MULTICLASS
CLASSIFICATION PROBLEMS
One-vs-all (One-vs-rest)
• Split the data into distinct groups, one per class, and compare each class against all the rest
• If you have k classes, you need to
train k logistic regression classifiers
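A minimal sketch of one-vs-rest in Python (assuming NumPy and reusing the gradient-descent update sketched earlier; the 3-class dataset is made up for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gradient_descent(X, y, alpha=0.1, iters=3000):
        # Same batch update as on the gradient descent slide
        theta = np.zeros(X.shape[1])
        for _ in range(iters):
            theta -= alpha * (X.T @ (sigmoid(X @ theta) - y)) / len(y)
        return theta

    X = np.array([[1, 1.0, 1.0], [1, 1.2, 0.8],    # class 0 examples, rows are [x0 = 1, x1, x2]
                  [1, 4.0, 1.0], [1, 3.8, 1.2],    # class 1 examples
                  [1, 2.5, 4.0], [1, 2.7, 3.8]])   # class 2 examples
    labels = np.array([0, 0, 1, 1, 2, 2])

    # k = 3 classes -> train 3 binary classifiers, each treating one class as 1 and the rest as 0
    thetas = [gradient_descent(X, (labels == i).astype(float)) for i in range(3)]

    # Predict by picking the class whose classifier reports the highest P(y = 1 | x; theta)
    x_new = np.array([1.0, 3.9, 0.9])
    print(np.argmax([sigmoid(t @ x_new) for t in thetas]))   # expected to print 1 for this point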

