Prepared by: Dr. Hanaa Bayomi
Updated by: Prof. Abeer ElKorany
Lecture 4: Logistic Regression
Flach describes three types of machine learning models [Fla12]:
• Geometric models — e.g. Linear Regression
• Logical models — e.g. Decision Tree
• Probabilistic models — e.g. Logistic Regression
CLASSIFICATION
The classification problem is just like the regression problem, except that the values y we now want to predict take on only a small number of discrete values.
Some examples of classification problems:
• Email: Spam / Not spam
• Tumor: Malignant / Benign
• Transaction: Fraudulent / Not fraudulent
CLASSIFICATION
[Figure: fitting a straight line to binary (0/1) data and thresholding the prediction at 0.5; the linear hypothesis can output any value in (−∞, ∞), so it is a poor fit for classification.]
Logistic Regression
[Figure: the logistic (sigmoid) function maps inputs from (−∞, ∞) into the interval (0, 1).]
Logistic regression with two features, X1 and X2, each ranging from 0 to 3.
Logistic regression using polynomial features (to obtain non-linear decision boundaries).
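Polynomial features let a linear model in the expanded feature space draw a curved boundary in the original space. The sketch below is illustrative only: the feature map and the parameter values are hypothetical, chosen so the decision boundary θᵀφ(x) = 0 is the unit circle.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Map (x1, x2) to polynomial features [1, x1, x2, x1^2, x2^2] so the
# decision boundary theta^T phi(x) = 0 can be a curve rather than a line.
def poly_features(x1, x2):
    return np.array([1.0, x1, x2, x1**2, x2**2])

# Hypothetical parameters whose boundary is the circle x1^2 + x2^2 = 1:
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

inside = sigmoid(theta @ poly_features(0.2, 0.2))   # < 0.5 -> predict class 0
outside = sigmoid(theta @ poly_features(2.0, 2.0))  # > 0.5 -> predict class 1
```

Points inside the circle get a probability below 0.5 and points outside get one above 0.5, even though the model is still linear in θ.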
Linear Regression vs. Logistic Regression
1. Linear Regression: models the relationship between a dependent variable and one or more independent variables, assuming a linear relationship. It is primarily used to predict continuous numeric values.
2. Logistic Regression: models the relationship between a dependent variable and one or more independent variables in order to predict the probability of an event, i.e. a binary outcome. It is commonly used for classification problems where the dependent variable is categorical.
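To make the contrast concrete, here is a minimal sketch: both models compute the same linear score θᵀx, but logistic regression passes it through the sigmoid to obtain a probability. The parameter and feature values are hypothetical.

```python
import numpy as np

# Linear regression outputs the raw score, which can be any real number;
# logistic regression squashes that score into (0, 1) via the sigmoid.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.5, -1.0])   # hypothetical parameters
x = np.array([4.0, 1.0])        # hypothetical feature vector

linear_score = theta @ x             # unbounded: here 0.5*4 - 1.0*1 = 1.0
probability = sigmoid(linear_score)  # always in (0, 1)
```

The score 1.0 maps to a probability of about 0.731, which can then be thresholded at 0.5 for a class prediction.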
How to choose parameters: the cost function
Linear regression uses the squared-error cost J(θ) = (1/2m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)².
Plugging the sigmoid hypothesis of logistic regression into this squared-error cost makes J(θ) non-convex, with many local minima, so gradient descent is not guaranteed to reach the global minimum. The logistic cost function defined next is convex.
Logistic cost function
Cost(h_θ(x), y) = −log(h_θ(x))        if y = 1
Cost(h_θ(x), y) = −log(1 − h_θ(x))    if y = 0
If y = 1: the cost is 0 when h_θ(x) = 1 and grows to ∞ as h_θ(x) → 0.
If y = 0: the cost is 0 when h_θ(x) = 0 and grows to ∞ as h_θ(x) → 1.
It is required to find the parameters w and b that minimize this cost.
Simplified loss function
L(h_θ(x), y) = −[ y log(h_θ(x)) + (1 − y) log(1 − h_θ(x)) ]
When y = 1 this reduces to −log(h_θ(x)); when y = 0 it reduces to −log(1 − h_θ(x)).
Cost function
J(w, b) = (1/m) Σᵢ₌₁ᵐ L(h_θ(x⁽ⁱ⁾), y⁽ⁱ⁾)
This is based on maximum likelihood principles from statistics. It is required to find the parameters w and b that minimize this cost.
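The cost above can be computed in a few lines of NumPy. This is a minimal sketch with a tiny hypothetical dataset; the parameter values are illustrative, not fitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Cross-entropy cost: J(w, b) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
def cost(w, b, X, y):
    h = sigmoid(X @ w + b)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical one-feature dataset:
X = np.array([[0.5], [1.5], [2.5], [3.5]])
y = np.array([0, 0, 1, 1])

J = cost(np.array([1.0]), -2.0, X, y)
```

Because every per-example term is nonnegative, J is always ≥ 0, and it shrinks as the predicted probabilities move toward the true labels.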
Gradient Descent
Want: min_θ J(θ)
Repeat {
    θ_j := θ_j − α ∂J(θ)/∂θ_j
} (simultaneously update all θ_j)
GRADIENT DESCENT
▪ In linear regression: θ_j := θ_j − α (1/m) Σᵢ₌₁ᵐ (h_θ(xᵢ) − yᵢ) x_ij
▪ In logistic regression:
θ_j := θ_j + α (1/m) Σᵢ₌₁ᵐ ( yᵢ − 1 / (1 + e^(−θᵀxᵢ)) ) x_ij
We can now use gradient ascent to maximize the log-likelihood ℓ(θ); the update rule above is repeated until convergence.
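The update rule above can be vectorized over all parameters at once. Below is a minimal batch gradient descent sketch; the dataset, learning rate, and iteration count are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Batch gradient descent on the logistic cost. Each iteration applies
#   theta_j := theta_j + alpha * (1/m) * sum_i (y_i - h(x_i)) * x_ij
# simultaneously for all j (the vectorized form below).
def fit_logistic(X, y, alpha=0.1, iters=5000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta += alpha * (X.T @ (y - h)) / m
    return theta

# Hypothetical 1-D data; the first column of X is the intercept term.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = fit_logistic(X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(int)
```

On this separable toy data the fitted model recovers the labels exactly; real data usually needs feature scaling and a stopping criterion rather than a fixed iteration count.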
DEFINITION
▪ Binary Logistic Regression
• We have a set of feature vectors X with corresponding binary outputs:
X = {x₁, x₂, ..., xₙ}ᵀ
Y = {y₁, y₂, ..., yₙ}ᵀ, where yᵢ ∈ {0, 1}
• We want to model p(y|x). A first attempt is the linear model
p(yᵢ = 1 | xᵢ, θ) = Σⱼ θⱼ x_ij = xᵢθ
By definition p(yᵢ = 1 | xᵢ, θ) ∈ [0, 1]. We want to transform the probability to remove the range restriction, since xᵢθ can take any real value.
USING ODDS
▪ Odds
p: probability of an event occurring
1 − p: probability of the event not occurring
The odds for event i are then defined as
oddsᵢ = pᵢ / (1 − pᵢ)
Taking the log of the odds removes the range restrictions:
log( pᵢ / (1 − pᵢ) ) = Σⱼ θⱼ x_ij = xᵢθ
This way we map the probabilities from the [0, 1] range onto the entire real line.
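The odds and log-odds (logit) mappings are easy to verify numerically. This is a small sketch of the round trip: probabilities in (0, 1) map to the whole real line via the logit, and the sigmoid maps them back.

```python
import numpy as np

# logit: (0, 1) -> (-inf, inf), the log of the odds p / (1 - p)
def logit(p):
    return np.log(p / (1 - p))

# The sigmoid is the inverse of the logit: (-inf, inf) -> (0, 1)
def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

even = logit(0.5)                   # 0.0: even odds (p = 1 - p)
likely = logit(0.9)                 # positive log-odds
round_trip = inv_logit(logit(0.25)) # recovers 0.25
```

A probability of 0.5 has odds 1 and log-odds 0; probabilities above 0.5 map to positive log-odds and below 0.5 to negative ones.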
LOGISTIC REGRESSION MODEL
Linear regression: h_θ(x) = θᵀx
Logistic regression: h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(−z)) is the sigmoid function; g(z) → 1 as z → ∞, g(z) → 0 as z → −∞, and g(0) = 0.5.
p(yᵢ = 1 | xᵢ, θ) = 1 / (1 + e^(−θᵀxᵢ))
p(yᵢ = 0 | xᵢ, θ) = 1 − 1 / (1 + e^(−θᵀxᵢ))
These combine into a single expression:
p(yᵢ | xᵢ; θ) = ( 1 / (1 + e^(−θᵀxᵢ)) )^yᵢ · ( 1 − 1 / (1 + e^(−θᵀxᵢ)) )^(1−yᵢ)
so h_θ(x) = p(y = 1 | x; θ).
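The combined expression p(yᵢ | xᵢ; θ) = h^y (1 − h)^(1−y) is the Bernoulli likelihood of one example; it equals h when y = 1 and 1 − h when y = 0. A quick sketch with hypothetical parameter values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Bernoulli likelihood of a single example:
# p(y | x; theta) = h^y * (1 - h)^(1 - y), where h = sigmoid(theta^T x)
def bernoulli_prob(y, x, theta):
    h = sigmoid(theta @ x)
    return h**y * (1 - h)**(1 - y)

theta = np.array([1.0, -0.5])   # hypothetical parameters
x = np.array([1.0, 2.0])        # here theta @ x = 0, so h = 0.5

p1 = bernoulli_prob(1, x, theta)  # probability of y = 1
p0 = bernoulli_prob(0, x, theta)  # probability of y = 0
```

The two cases always sum to 1, which is what makes the product over all examples a proper likelihood to maximize.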
Logistic Regression: Multi-class classification (one-vs-all)
Multiclass classification examples:
• Email foldering/tagging: Work, Friends, Family, Hobby
• Medical diagnosis: Not ill, Cold, Flu
• Weather: Sunny, Cloudy, Rain, Snow
Binary classification vs. multi-class classification:
[Figure: two-class data vs. three-class data in the x1–x2 plane.]
One-vs-all (one-vs-rest): for each class i, train a separate binary logistic regression classifier h_θ⁽ⁱ⁾(x), treating class i as positive and all other classes as negative.
[Figure: the three resulting binary problems, one per class, in the x1–x2 plane.]
To classify a new input x, pick the class i that maximizes h_θ⁽ⁱ⁾(x).
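The one-vs-all scheme can be sketched directly on top of binary logistic regression. The data below is a hypothetical 1-D, three-class toy set; the training routine is the same gradient-descent update used earlier in the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, iters=3000):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta += alpha * (X.T @ (y - sigmoid(X @ theta))) / len(y)
    return theta

# One-vs-all: fit one binary classifier per class (class k vs. the rest).
def one_vs_all(X, y, n_classes):
    return [fit_logistic(X, (y == k).astype(float)) for k in range(n_classes)]

# Predict the class whose classifier assigns the highest probability.
def predict(thetas, X):
    scores = np.column_stack([sigmoid(X @ t) for t in thetas])
    return scores.argmax(axis=1)

# Hypothetical data: intercept column plus one feature, three classes.
X = np.array([[1.0, v] for v in [0.0, 1.0, 4.0, 5.0, 8.0, 9.0]])
y = np.array([0, 0, 1, 1, 2, 2])

thetas = one_vs_all(X, y, 3)
preds = predict(thetas, X)
```

Note that the middle class is not linearly separable from the union of the other two, yet the argmax over the three classifiers can still assign it correctly, which is exactly why one-vs-all is useful.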