DDA3020 Machine Learning
Lecture 06 Logistic Regression
Jicong Fan
School of Data Science, CUHK-SZ
October 10/12, 2022
Outline
1 Review of last week
2 Classification and representation
3 Logistic regression
4 Regularized logistic regression
5 Probabilistic perspective of logistic regression
6 Summary: linear regression vs. logistic regression
Linear regression: deterministic perspective
Linear hypothesis function: $f_{w,b}(x) = x^\top w + b$, or simply $f_w(x) = x^\top w$ by concatenating $b$ and $w$ together and augmenting $x$ to $[1; x]$
Linear regression by minimizing the residual sum of squares (RSS):
$$w^* = \arg\min_w J(w), \quad \text{where } J(w) = \frac{1}{2}\sum_{i=1}^m (x_i^\top w - y_i)^2 = \frac{1}{2}\|Xw - y\|^2$$
Two solutions:
Closed-form solution: $w^* = (X^\top X)^{-1} X^\top y$
Gradient descent: $w \leftarrow w - \alpha X^\top (Xw - y)$, repeated for multiple iterations until convergence
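As a concrete illustration, here is a minimal numpy sketch of both solutions (not part of the original slides; the synthetic data, the learning rate, and the iteration count are assumptions made for demonstration):

import numpy as np

# Synthetic data: the design matrix X has a leading column of ones,
# so its first weight plays the role of the bias b.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + 0.1 * rng.normal(size=100)

# Closed-form solution: w* = (X^T X)^{-1} X^T y
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent: w <- w - alpha * X^T (X w - y)
w = np.zeros(X.shape[1])
alpha = 0.005
for _ in range(1000):
    w -= alpha * X.T @ (X @ w - y)

print(w_closed, w)  # the two estimates should agree up to numerical precision

Both solvers minimize the same convex objective, so they recover (approximately) the same weights.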
Linear regression: probabilistic perspective
We assume that $y = w^\top x + e$, where $e \sim \mathcal{N}(0, \sigma^2)$ is called observation noise or residual error
$y$ is also a random variable, and its conditional probability is $p(y|x, w) = \mathcal{N}(w^\top x, \sigma^2)$
Maximum log-likelihood estimation:
$$w_{MLE} = \arg\max_w \log L(w|\mathcal{D}) = \arg\max_w \log \prod_{i=1}^m p(y_i|x_i, w) \quad (1)$$
$$= \arg\max_w \sum_{i=1}^m \log p(y_i|x_i, w) = \arg\max_w \sum_{i=1}^m \log \mathcal{N}(w^\top x_i, \sigma^2) \quad (2)$$
$$= \arg\max_w -\frac{1}{2}\log\big(\sigma^{2m}(2\pi)^m\big) - \frac{1}{2\sigma^2}\sum_{i=1}^m (y_i - w^\top x_i)^2 \quad (3)$$
$$= \arg\min_w \frac{1}{2}\sum_{i=1}^m (y_i - w^\top x_i)^2 \quad (4)$$
Variants of linear regression
Ridge regression to avoid over-fitting, through MAP estimation:
$$w_{MAP} = \arg\max_w \sum_{i=1}^m \log p(y_i|x_i, w) + \log p(w) \quad (5)$$
$$= \arg\max_w \sum_{i=1}^m \log \mathcal{N}(w^\top x_i, \sigma^2) + \log \mathcal{N}(w|0, \tau^2 I) \quad (6)$$
$$\equiv \arg\min_w \sum_{i=1}^m (w^\top x_i - y_i)^2 + \lambda\|w\|_2^2 \quad (7)$$
Polynomial regression: linear model with basis expansion $\phi(x)$
$$f_{w,b}(x) = b + \sum_{i=1}^d w_i x_i + \sum_{i=1}^d\sum_{j=1}^d w_{ij} x_i x_j + \sum_{i=1}^d\sum_{j=1}^d\sum_{k=1}^d w_{ijk} x_i x_j x_k + \ldots = \phi(x)^\top w, \quad (8)$$
$$\phi(x) = [1, x_1, \ldots, x_d, \ldots, x_i x_j, \ldots, x_i x_j x_k, \ldots]^\top,$$
$$w = [b, w_1, \ldots, w_d, \ldots, w_{ij}, \ldots, w_{ijk}, \ldots]^\top.$$
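A minimal numpy sketch combining the degree-2 basis expansion of Eq. (8) with the ridge solution of Eq. (7) (added for illustration; the feature map, the value of lam, and the synthetic data are assumptions, and for simplicity the constant feature's weight is penalized as well):

import numpy as np

def poly2_features(X):
    """Degree-2 basis expansion phi(x) = [1, x_1, ..., x_d, ..., x_i * x_j, ...]."""
    m, d = X.shape
    cross = [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack([np.ones(m), X] + cross)

def ridge_fit(Phi, y, lam=1.0):
    """Closed-form ridge solution: w = (Phi^T Phi + lam * I)^{-1} Phi^T y."""
    n = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n), Phi.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + X[:, 0] - 2.0 * X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=200)
w = ridge_fit(poly2_features(X), y)  # weights over the expanded features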
Variants of linear regression
Lasso regression to obtain a sparse model:
$$w_{MAP} = \arg\max_w \sum_{i=1}^m \log \mathcal{N}(w^\top x_i, \sigma^2) + \log \mathrm{Lap}(w|0, b) \quad (9)$$
$$= \arg\min_w \sum_{i=1}^m (w^\top x_i - y_i)^2 + \lambda\|w\|_1 \quad (10)$$
Robust regression for data with outliers:
$$w_{MLE} = \arg\min_w \sum_{i=1}^m |w^\top x_i - y_i| \quad (11)$$
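The absolute-error objective in Eq. (11) is not differentiable at zero residual, but it can still be minimized with a subgradient method. A minimal sketch (added for illustration; the step size and iteration count are arbitrary choices, not values from the slides):

import numpy as np

def robust_fit(X, y, lr=0.01, n_iters=2000):
    """Minimize (1/m) * sum_i |x_i^T w - y_i| by subgradient descent.

    A subgradient of |x_i^T w - y_i| w.r.t. w is sign(x_i^T w - y_i) * x_i
    (np.sign(0) = 0 is a valid subgradient at the kink).
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        residual = X @ w - y
        w -= lr * X.T @ np.sign(residual) / len(y)
    return w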
Summary of different linear regressions
Note that the uniform distribution will not change the mode of the likelihood.
Thus, MAP estimation with a uniform prior corresponds to MLE.
p(y|x, w) p(w) regression method
Gaussian Uniform Least squares
Gaussian Gaussian Ridge regression
Gaussian Laplace Lasso regression
Laplace Uniform Robust regression
Student Uniform Robust regression
[Figure: contours in parameter space (axes $u_1$, $u_2$) marking the ML estimate, the MAP estimate, and the prior mean.]
Classification
Classification: classifying input data into discrete states
Email filtering: spam / not spam?
Weather forecast: sunny / not sunny?
Tumor: malignant / benign?
The label y ∈ {0, 1}:
y = 0: negative class, e.g., not spam, not sunny, benign
y = 1: positive class, e.g., spam, sunny, malignant
Threshold classifier with linear regression
We assume a linear hypothesis function $f_{w,b}(x) = x^\top w + b$
A simple threshold classifier with this hypothesis function is
If $f_{w,b}(x) > 0.5$, then $y = 1$, i.e., malignant tumor
If $f_{w,b}(x) < 0.5$, then $y = 0$, i.e., benign tumor
Threshold classifier with linear regression
It seems that the simple threshold classifier with linear regression works
well on this classification task
However, if there is a positive sample with very large tumor size (plot
above), what will happen?
The fitted hypothesis function will change significantly, causing some positive samples to be misclassified as negative (not malignant). How can we handle this? By adjusting the threshold value, or by adopting robust linear regression.
Threshold classifier with linear regression
But there is still something weird.
Our goal is to predict $y \in \{0, 1\}$, but the prediction could be $f_{w,b}(x) > 1$ or $f_{w,b}(x) < 0$, which does not serve our purpose.
A desired hypothesis function for this task should satisfy $f_{w,b}(x) \in [0, 1]$.
Threshold classifier with linear regression
Exercise: Which statements are true?
If linear regression does not work well, as in the example above, feature scaling may help
If the training set satisfies $y_i \in [0, 1]$ for all points $(x_i, y_i)$, then the linear hypothesis function satisfies $f_{w,b}(x_i) \in [0, 1]$ for all values of $x_i$
None of the above is correct
Hypothesis representation
A desired hypothesis function for this task should satisfy $f_{w,b}(x) \in [0, 1]$
To this end, we introduce a novel function, as follows:
$$f_{w,b}(x) = g(w^\top x) \in [0, 1], \quad g(z) = \frac{1}{1 + \exp(-z)},$$
where $g(\cdot)$ is called the sigmoid function or logistic function (shown below)
[Figure: the sigmoid function $g(z) = 1/(1 + \exp(-z))$ plotted for $z \in [-10, 10]$; it increases monotonically from 0 to 1, with $g(0) = 0.5$.]
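In code, the logistic function is a one-liner; the variant below (an implementation detail added here, not discussed in the slides) avoids overflow in exp for inputs of large magnitude:

import numpy as np

def sigmoid(z):
    """Numerically stable logistic function g(z) = 1 / (1 + exp(-z))."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))          # always <= 1, so exp never overflows
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

The later snippets in this lecture reuse this helper.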
Hypothesis representation
Interpretation of the sigmoid/logistic function:
$f_{w,b}(x)$ = estimated probability that $y = 1$ for input $x$.
For example (plot below), if $f_{w,b}(x) = 0.8$, it means that a patient with tumor size $x$ has an 80% chance of the tumor being malignant. In this task, a larger tumor size gives a larger probability of the tumor being malignant.
Thus, we can say that
$$f_{w,b}(x) = P(y = 1|x; w).$$
[Figure: the sigmoid curve, illustrating how the input is mapped to a predicted probability $P(y = 1|x; w)$ in $[0, 1]$.]
Decision boundary
[Figure: the sigmoid function $g(z)$, highlighting that $g(z) \ge 0.5$ exactly when $z \ge 0$.]
In logistic regression, we have
$$f_{w,b}(x) = g(w^\top x + b) = P(y = 1|x; w) \in [0, 1], \quad g(z) = \frac{1}{1 + \exp(-z)}.$$
Suppose that if $f_{w,b}(x) \ge 0.5$, then we predict $y = 1$; if $f_{w,b}(x) < 0.5$, then we predict $y = 0$
Correspondingly, if $w^\top x + b \ge 0$, we predict $y = 1$; if $w^\top x + b < 0$, then we predict $y = 0$.
It determines the decision boundary, which is the curve/hyperplane corresponding to $f_{w,b}(x) = 0.5$, or $w^\top x + b = 0$
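Because $g(z) \ge 0.5$ exactly when $z \ge 0$, the label can be predicted from the sign of the linear score alone. A small sketch (added for illustration; X is assumed to hold one example per row):

import numpy as np

def predict_label(X, w, b):
    """Logistic-regression decision rule: y = 1 iff f(x) >= 0.5, i.e. w^T x + b >= 0."""
    return (X @ w + b >= 0).astype(int)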
Decision boundary
$f_{w,b}(x) = g(b + w_1 x_1 + w_2 x_2) = g(-3 + x_1 + x_2)$
Predict $y = 1$ if $-3 + x_1 + x_2 \ge 0$, i.e., the decision boundary is the line $x_1 + x_2 = 3$
Decision boundary
Figure: Non-linear decision boundary
$f_{w,b}(x) = g(b + w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_2^2) = g(-1 + x_1^2 + x_2^2)$
Predict $y = 1$ if $-1 + x_1^2 + x_2^2 \ge 0$, i.e., the decision boundary is the circle $x_1^2 + x_2^2 = 1$
Cost function
Training set: $m$ training examples $\{(x_i, y_i)\}_{i=1}^m$
Hypothesis function: $f_{w,b}(x) = g(w^\top x + b) = \frac{1}{1 + \exp(-w^\top x - b)}$
Cost function:
Linear regression: $J(w) = \frac{1}{2m}\sum_{i=1}^m (f_{w,b}(x_i) - y_i)^2 = \frac{1}{2m}\|Xw - y\|^2$, which is called the $\ell_2$ loss or residual sum of squares
It is convex w.r.t. $w$ for linear regression
Logistic regression: if we adopt the same cost function for logistic regression, we have
$$J(w) = \frac{1}{2m}\sum_{i=1}^m (g(w^\top x_i) - y_i)^2.$$
However, it is non-convex w.r.t. $w$.
Exercise 1: Prove the $\ell_2$ loss is convex w.r.t. $w$ for linear regression.
Exercise 2: Prove the $\ell_2$ loss is non-convex w.r.t. $w$ for logistic regression.
Cost function
Cross-entropy:
$$H(p, q) = -\int_x p(x)\log(q(x))\,dx \quad \text{or} \quad -\sum_x p(x)\log(q(x)),$$
where $p(x), q(x)$ are probability density functions (PDFs) of $x$ if $x$ is a continuous random variable, or probability mass functions if $x$ is a discrete random variable
We set
ground-truth posterior probability: $y(x) = P(y = 1|x)$,
predicted posterior probability: $f_{w,b}(x) = P(y = 1|x; w)$.
Cross-entropy loss:
$$\mathrm{cost}\big(y(x), f_{w,b}(x)\big) = H\big(y(x), f_{w,b}(x)\big) = -P(y = 1|x)\log P(y = 1|x; w) - P(y = 0|x)\log P(y = 0|x; w) = \begin{cases} -\log(f_{w,b}(x)), & \text{if } y(x) = 1 \\ -\log(1 - f_{w,b}(x)), & \text{if } y(x) = 0 \end{cases}$$
Cost function for logistic regression
Cross-entropy loss:
$$\mathrm{cost}\big(y(x), f_{w,b}(x)\big) = \begin{cases} -\log(f_{w,b}(x)), & \text{if } y(x) = 1 \\ -\log(1 - f_{w,b}(x)), & \text{if } y(x) = 0 \end{cases}$$
For $y = 1$: if $f_{w,b}(x) = 1$, i.e., $P(y = 1|x; w) = 1$, then the prediction equals the ground-truth label and the cost is 0.
For $y = 1$: if $f_{w,b}(x) \to 0$, i.e., $P(y = 1|x; w) \to 0$, then it should be penalized with a very large cost. Here we have $\mathrm{cost}(y(x), f_{w,b}(x)) \to \infty$.
Cost function for logistic regression
Cross-entropy loss:
$$\mathrm{cost}\big(y(x), f_{w,b}(x)\big) = \begin{cases} -\log(f_{w,b}(x)), & \text{if } y(x) = 1 \\ -\log(1 - f_{w,b}(x)), & \text{if } y(x) = 0 \end{cases}$$
For $y = 0$: if $f_{w,b}(x) = 0$, i.e., $P(y = 1|x; w) = 0$, then the prediction equals the ground-truth label and the cost is 0
For $y = 0$: if $f_{w,b}(x) \to 1$, i.e., $P(y = 1|x; w) \to 1$, then it should be penalized with a very large cost. Here we have $\mathrm{cost}(y(x), f_{w,b}(x)) \to \infty$
Cost function for logistic regression
Cross-entropy loss:
$$\mathrm{cost}\big(y(x), f_{w,b}(x)\big) = \begin{cases} -\log(f_{w,b}(x)), & \text{if } y(x) = 1 \\ -\log(1 - f_{w,b}(x)), & \text{if } y(x) = 0 \end{cases}$$
Exercise: Which statements are true?
If $f_{w,b}(x) = y$, then $\mathrm{cost}(y(x), f_{w,b}(x)) = 0$ for both $y = 0$ and $y = 1$
If $y = 0$, then $\mathrm{cost}(y(x), f_{w,b}(x)) \to \infty$ as $f_{w,b}(x) \to 1$
If $y = 0$, then $\mathrm{cost}(y(x), f_{w,b}(x)) \to \infty$ as $f_{w,b}(x) \to 0$
Regardless of whether $y = 0$ or $y = 1$, if $f_{w,b}(x) = 0.5$, then $\mathrm{cost}(y(x), f_{w,b}(x)) > 0$
Cost function of logistic regression
The cost function of logistic regression is
$$J(w) = \frac{1}{m}\sum_{i=1}^m \mathrm{cost}\big(y_i, f_{w,b}(x_i)\big), \quad \mathrm{cost}\big(y(x), f_{w,b}(x)\big) = \begin{cases} -\log(f_{w,b}(x)), & \text{if } y(x) = 1 \\ -\log(1 - f_{w,b}(x)), & \text{if } y(x) = 0 \end{cases}$$
The above cost function can be simplified as follows:
$$J(w) = -\frac{1}{m}\sum_{i=1}^m \Big[ y_i\log(f_{w,b}(x_i)) + (1 - y_i)\log(1 - f_{w,b}(x_i)) \Big].$$
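A direct numpy translation of this cost (a sketch added for illustration; it reuses the sigmoid helper defined earlier, and the small eps clip is only there to avoid evaluating log(0)):

import numpy as np

def logistic_cost(w, b, X, y, eps=1e-12):
    """Cross-entropy cost J(w) = -(1/m) * sum_i [y_i log f_i + (1 - y_i) log(1 - f_i)]."""
    f = sigmoid(X @ w + b)          # predicted probabilities P(y = 1 | x_i; w)
    f = np.clip(f, eps, 1.0 - eps)  # guard against log(0)
    return -np.mean(y * np.log(f) + (1.0 - y) * np.log(1.0 - f))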
Exercise: Please prove that J(w) is convex w.r.t. w.
Gradient descent for logistic regression
Learn $w$ by minimizing $J(w)$, i.e.,
$$w^* = \arg\min_w J(w) = \arg\min_w -\frac{1}{m}\sum_{i=1}^m \Big[ y_i\log(f_{w,b}(x_i)) + (1 - y_i)\log(1 - f_{w,b}(x_i)) \Big].$$
Gradient descent: repeat the following update until convergence
$$w \leftarrow w - \alpha\nabla_w J(w), \quad \nabla_w J(w) = \frac{1}{m}\sum_{i=1}^m \big[f_{w,b}(x_i) - y_i\big]\,x_i$$
How to define convergence? Compute the change of $J(w)$ or $w$ over the last $K$ steps; if the change is below a threshold, the algorithm can be regarded as converged. Remember that choosing a suitable learning rate $\alpha$ is important for reaching a good solution.
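Putting the update rule and the convergence check into code (a minimal sketch reusing the sigmoid and logistic_cost helpers above; the learning rate, iteration cap, and tolerance are illustrative choices):

import numpy as np

def logistic_gd(X, y, alpha=0.1, n_iters=5000, tol=1e-7):
    """Fit logistic regression by gradient descent on the cross-entropy cost."""
    m, d = X.shape
    w, b = np.zeros(d), 0.0
    prev = np.inf
    for _ in range(n_iters):
        f = sigmoid(X @ w + b)
        w -= alpha * X.T @ (f - y) / m   # grad_w J = (1/m) sum_i (f_i - y_i) x_i
        b -= alpha * np.mean(f - y)      # same form for the bias term
        cur = logistic_cost(w, b, X, y)
        if abs(prev - cur) < tol:        # simple convergence check on J(w)
            break
        prev = cur
    return w, b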
Gradient descent for logistic regression
Exercise: Suppose you are running a logistic regression model and you monitor the learning procedure to find a suitable learning rate $\alpha$. Which of the following is a reasonable way to make sure $\alpha$ is set properly and that gradient descent is running correctly?
Plot $J(w) = -\frac{1}{m}\sum_{i=1}^m (y_i - f_{w,b}(x_i))^2$ as a function of the number of iterations (i.e., the horizontal axis is the iteration number) and make sure $J(w)$ is decreasing on every iteration.
Plot $J(w) = -\frac{1}{m}\sum_{i=1}^m \big[ y_i\log(f_{w,b}(x_i)) + (1 - y_i)\log(1 - f_{w,b}(x_i)) \big]$ as a function of the number of iterations (i.e., the horizontal axis is the iteration number) and make sure $J(w)$ is decreasing on every iteration.
Plot $J(w)$ as a function of $w$ and make sure it is decreasing on every iteration.
Plot $J(w)$ as a function of $w$ and make sure it is convex.
Multi-class classification
Binary classification: in the above examples and derivations, we only considered the binary classification problem, i.e., $y \in \{0, 1\}$.
Multi-class/multi-category classification: however, many practical problems involve multi-category outputs, i.e., $y \in \{1, \ldots, C\}$:
Weather forecast: sunny, cloudy, rain, snow
Email tagging: work, friends, families, hobby
Multi-class classification: one-vs-all
One-vs-all logistic regression:
Train a binary logistic regression $f_{w_j,b_j}(\cdot)$ for each class $j$, treating all samples of the other classes as the negative class
For a new test sample $x$, predict its class as $\arg\max_j f_{w_j,b_j}(x)$
Pros: easy to implement
Cons: the training cost is high, and the approach is difficult to scale to tasks with a large number of classes
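A sketch of the one-vs-all scheme built on the binary trainer from the previous section (added for illustration; logistic_gd and sigmoid are the helpers sketched earlier, and labels are assumed to be integers 0, ..., C-1):

import numpy as np

def one_vs_all_fit(X, y, n_classes):
    """Train one binary logistic regression per class (class j vs. the rest)."""
    return [logistic_gd(X, (y == j).astype(float)) for j in range(n_classes)]

def one_vs_all_predict(X, models):
    """Predict the class whose binary classifier outputs the largest probability."""
    scores = np.column_stack([sigmoid(X @ w + b) for (w, b) in models])
    return np.argmax(scores, axis=1)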
Multi-class classification: Softmax regression
Softmax function:
$$f^{(j)}_{W,b}(x) = \frac{\exp(w_j^\top x + b_j)}{\sum_{c=1}^C \exp(w_c^\top x + b_c)} = P(y = j|x; W, b), \quad (12)$$
where $W = [w_1, \ldots, w_C]$, $b = [b_1; b_2; \ldots; b_C]$, with $C$ being the number of classes. For simplicity, in the following we write $f^{(j)}_{W,b}(\cdot)$ as $f_{w_j,b_j}(\cdot)$
Cost function:
$$J(W) = -\frac{1}{m}\sum_{i=1}^m\sum_{j=1}^C \mathbb{I}(y_i = j)\log\big(f_{w_j,b_j}(x_i)\big), \quad (13)$$
where $\mathbb{I}(a) = 1$ if $a$ is true, otherwise $\mathbb{I}(a) = 0$.
Multi-class classification: Softmax regression
It can also be optimized by gradient descent:
$$w_j \leftarrow w_j - \alpha\frac{\partial J(W)}{\partial w_j},$$
$$\frac{\partial J(W)}{\partial w_j} = -\frac{1}{m}\sum_{i=1}^m \Bigg[\frac{\mathbb{I}(y_i = j)}{f_{w_j,b_j}(x_i)}\cdot\frac{\partial f_{w_j,b_j}(x_i)}{\partial w_j} + \sum_{c \ne j} \frac{\mathbb{I}(y_i = c)}{f_{w_c,b_c}(x_i)}\cdot\frac{\partial f_{w_c,b_c}(x_i)}{\partial w_j}\Bigg]$$
$$\frac{\partial f_{w_j,b_j}(x_i)}{\partial w_j} = f_{w_j,b_j}(x_i)\cdot\big(1 - f_{w_j,b_j}(x_i)\big)\cdot x_i, \qquad \frac{\partial f_{w_c,b_c}(x_i)}{\partial w_j} = -f_{w_j,b_j}(x_i)\cdot f_{w_c,b_c}(x_i)\cdot x_i \;\;(c \ne j)$$
$$\Longrightarrow \frac{\partial J(W)}{\partial w_j} = \frac{1}{m}\sum_{i=1}^m \big(f_{w_j,b_j}(x_i) - \mathbb{I}(y_i = j)\big)\,x_i \quad (14)$$
Note: $\{w_c\}_{c=1}^C$ should be updated in parallel, rather than sequentially.
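A compact sketch of softmax regression trained with Eq. (14) (added for illustration; all classes are updated in parallel as the note requires; the learning rate and iteration count are arbitrary, and labels are assumed to be 0-based integers rather than the 1-based labels used above):

import numpy as np

def softmax_gd(X, y, n_classes, alpha=0.1, n_iters=3000):
    """Softmax regression: one weight column and one bias per class."""
    m, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                         # one-hot labels, shape (m, C)
    for _ in range(n_iters):
        scores = X @ W + b                           # shape (m, C)
        scores -= scores.max(axis=1, keepdims=True)  # shift for numerical stability
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)            # P[i, j] = f_{w_j, b_j}(x_i)
        G = (P - Y) / m                              # Eq. (14) for all classes at once
        W -= alpha * X.T @ G                         # parallel update of all w_j
        b -= alpha * G.sum(axis=0)
    return W, b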
Overfitting in linear regression
Overfitting: If we have too many features, the learned hypothesis may fit the
training data very well (low bias), but fail to generalize to new examples.
Overfitting in logistic regression
[Figure: three logistic-regression decision boundaries illustrating under-fitting, good fitting, and over-fitting.]
Addressing Overfitting
Generally, there are two approaches to address the overfitting problem:
Reducing the number of features:
Feature selection
Dimensionality reduction (introduced in later lectures)
Regularization:
Keep all features, but reduce the magnitude/value of each parameter, so that each feature contributes only a little to predicting y
In the following, we will focus on the regularization-based approach.
Regularized logistic regression
The objective function of regularized logistic regression is formulated as follows:
$$\bar{J}(w) = J(w) + \frac{\lambda}{2m}\sum_{j=1}^d w_j^2 = -\frac{1}{m}\sum_{i=1}^m \Big[ y_i\log(f_{w,b}(x_i)) + (1 - y_i)\log(1 - f_{w,b}(x_i)) \Big] + \frac{\lambda}{2m}\sum_{j=1}^d w_j^2.$$
Note: the bias parameter $w_0$ (or $b$) is not regularized/penalized.
The above objective function can also be solved by gradient descent, as follows:
$$w_0 \leftarrow w_0 - \frac{\alpha}{m}\sum_{i=1}^m \big(f_{w,b}(x_i) - y_i\big)\cdot x_i(0), \quad \text{where } x_i(0) = 1, \;\forall i$$
$$w_j \leftarrow w_j - \frac{\alpha}{m}\Big[\sum_{i=1}^m \big(f_{w,b}(x_i) - y_i\big)\cdot x_i(j) + \lambda\cdot w_j\Big], \quad j = 1, \ldots, d,$$
where $x_i(j)$ denotes the $j$-th entry of the augmented $x_i$, $j = 0, \ldots, d$.
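These two update rules translate directly into code (a sketch added for illustration; it reuses the sigmoid helper from earlier, leaves the bias unpenalized, and uses placeholder values for lam, alpha, and the iteration count):

import numpy as np

def regularized_logistic_gd(X, y, lam=1.0, alpha=0.1, n_iters=5000):
    """Gradient descent on the L2-regularized cross-entropy cost (bias not penalized)."""
    m, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        f = sigmoid(X @ w + b)
        w -= alpha / m * (X.T @ (f - y) + lam * w)  # regularized update for w_1, ..., w_d
        b -= alpha / m * np.sum(f - y)              # plain update for the bias term w_0
    return w, b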
Regularized logistic regression
Exercise: When using regularized logistic regression, which of these is the best way to monitor whether gradient descent is working correctly?
Plot $J(w)$ as a function of the number of iterations and make sure it's decreasing
Plot $J(w) - \frac{\lambda}{2m}\sum_{j=1}^d w_j^2$ as a function of the number of iterations and make sure it's decreasing
Plot $J(w) + \frac{\lambda}{2m}\sum_{j=1}^d w_j^2$ as a function of the number of iterations and make sure it's decreasing
Plot $\sum_{j=1}^d w_j^2$ as a function of the number of iterations and make sure it's decreasing
Logistic regression: probabilistic modeling
Behind logistic regression for binary classification, we assume that both the feature $x$ and the label $y$ are random variables, as follows:
$$\mu(x|w) = \mathrm{Sigmoid}(w^\top x), \qquad y(x|w) \sim \mathrm{Bernoulli}\big(\mu(x|w)\big).$$
Then, we have
$$P(y|x; w) = \begin{cases} \mu & \text{if } y = 1, \\ 1 - \mu & \text{if } y = 0. \end{cases}$$
The log-likelihood function of $P(y|x; w)$ is formulated as
$$L(w) = y\log(\mu) + (1 - y)\log(1 - \mu).$$
Thus, we obtain
$$\max_w L(w) \equiv \min_w J(w).$$
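To make the last equivalence explicit (a short derivation added here; $L_i(w)$ denotes the log-likelihood of the $i$-th training example), summing over the $m$ training examples gives exactly $-m\,J(w)$:
$$\sum_{i=1}^m L_i(w) = \sum_{i=1}^m \Big[ y_i\log\mu(x_i|w) + (1 - y_i)\log\big(1 - \mu(x_i|w)\big)\Big] = -m\,J(w),$$
so $\arg\max_w \sum_{i=1}^m L_i(w) = \arg\min_w J(w)$.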
Logistic regression: probabilistic modeling
Behind logistic regression, we assume that
$$\mu(x|w) = \mathrm{Sigmoid}(w^\top x), \qquad y(x|w) \sim \mathrm{Bernoulli}\big(\mu(x|w)\big).$$
$\ell_2$-regularized logistic regression: we further assume $w \sim \mathcal{N}(w|0, \sigma^2 I)$, then we have
$$\max_w L(w) + \log \mathcal{N}(w|0, \sigma^2 I) \equiv \min_w J(w) + \frac{\lambda}{2m}\sum_{j=1}^d w_j^2.$$
$\ell_1$-regularized logistic regression: if we assume $w \sim \mathrm{Laplace}(w|0, b)$, then we have
$$\max_w L(w) + \log \mathrm{Laplace}(w|0, b) \equiv \min_w J(w) + \frac{\lambda}{2m}\sum_{j=1}^d |w_j|.$$
Summary: linear regression vs. logistic regression
Linear regression vs. logistic regression:
Task: regression (linear regression) vs. classification (logistic regression)
Hypothesis $f_{w,b}(x)$: $w^\top x + b \in (-\infty, \infty)$ vs. $g(w^\top x + b) \in [0, 1]$
Objective $J(w)$: $\frac{1}{2m}\sum_{i=1}^m (y_i - w^\top x_i)^2$ vs. $-\frac{1}{m}\sum_{i=1}^m \big[ y_i\log(f_{w,b}(x_i)) + (1 - y_i)\log(1 - f_{w,b}(x_i)) \big]$
Solution: closed-form or gradient descent vs. gradient descent
Note: each variant of linear/logistic regression can be derived from both the deterministic and the probabilistic perspectives.
Own reading: both linear regression and logistic regression are special cases of generalized linear models. If interested, you can find more details in Section 4 of the book "Pattern Recognition and Machine Learning", Bishop, 2006.