Data Science L19 - Logistic Regression

The document provides an overview of Linear and Logistic Regression, focusing on their applications in predictive modeling and classification tasks. It explains the concepts of ground truth data, cost functions, and the sigmoid function, which is crucial for logistic regression. Additionally, it discusses the decision boundary and the importance of using appropriate cost functions for optimization in machine learning.


ED5340 - Data Science: Theory and Practise

L19 - Logistic Regression (Credit to Andrew Ng)

Ramanathan Muthuganapathy (https://ed.iitm.ac.in/~raman)


Course web page: https://ed.iitm.ac.in/~raman/datascience.html
Moodle page: Available at https://courses.iitm.ac.in/
Linear Regression
Predictive problem - Continuous input / output

• Ground truth data - Input feature / output (x, y) are the knowns
• Use a model / hypothesis as h(w)
• Develop an error / cost / loss function J(w) = J(y, ȳ) = J(y, h(w))
• The weights are identified by min J(w)
• Essentially, the ML problem is now reduced to an optimization problem.
• Weights are identified using optimization (a minimal sketch follows).
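To make the optimization view concrete, here is a minimal sketch, assuming NumPy and a synthetic 1-D dataset (none of the names below come from the slides), that minimizes the squared-error cost J(w) by gradient descent:

    import numpy as np

    # Synthetic ground truth: y = 2x + 1 plus noise
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 100)
    y = 2 * x + 1 + rng.normal(0, 0.5, 100)

    w0, w1 = 0.0, 0.0        # weights of the hypothesis h(w) = w0 + w1 * x
    alpha = 0.01             # learning rate
    for _ in range(2000):
        y_hat = w0 + w1 * x                 # model output ȳ
        grad_w0 = np.mean(y_hat - y)        # ∂J/∂w0 for J = (1/2m) Σ (ȳ - y)²
        grad_w1 = np.mean((y_hat - y) * x)  # ∂J/∂w1
        w0 -= alpha * grad_w0
        w1 -= alpha * grad_w1

    print(w0, w1)   # approaches (1, 2), the parameters of the ground truth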
Linear Regression
Predictive

• Ground truth data - Input feature / output (x, y) are the knowns
• Use a model / hypothesis as h(w) and cost function J(w)

[Figure: Input (x) → Hypothesis h(w) with weights / parameters → Output (ȳ); the loss function J(w) compares the output against the ground truth y]


Logistic Regression
Classification (binary)

• Ground truth data - Input feature / output (x, y) are the knowns
• Output is either 0 or 1

[Figure: Output (y) vs. tumor size (x); the outputs take only the values 0 and 1]


Logistic Regression
Classification (binary) - Examples

• Spam / Not spam
• Malignant / Benign
• Fraud / No fraud
• Good / bad grades

[Figure: Output (y) vs. tumor size (x); the outputs take only the values 0 and 1]


Logistic Regression
Classification (binary)

• Ground truth data - Input feature / output (x, y) are the knowns
• Output is either 0 or 1
• 0 - Benign
• 1 - Malignant

[Figure: Output (y) vs. tumor size (x), with benign examples at y = 0 and malignant examples at y = 1]


Logistic Regression
Hypothesis - Linear Regression Model

• Ground truth data - Input feature / output (x, y) are the knowns
• Output is either 0 or 1
• 0 - Small
• 1 - Large
• ȳ^(i) = hw(x^(i)) = w0 + w1 x^(i)

[Figure: Output (y) vs. tumor size (x), with the fitted line hw(x^(i))]


Logistic Regression
Hypothesis - Linear Regression Model with thresholding

• hw(x^(i)) ≥ 0.5, y = 1
• hw(x^(i)) < 0.5, y = 0

[Figure: Output (y) vs. T-shirt size (x), with the fitted line hw(x^(i)) and the 0.5 threshold marked]


Logistic Regression
Hypothesis - Increase the training data

• hw(x^(i)) ≥ 0.5, y = 1
• hw(x^(i)) < 0.5, y = 0

[Figure: Output (y) vs. T-shirt size (x), with additional training data included]


Logistic Regression
Hypothesis - Increase the training data

• hw(x^(i)) ≥ 0.5, y = 1
• hw(x^(i)) < 0.5, y = 0
• Misclassification starts happening
• Not a good idea to use Linear Regression (a small demonstration follows)
• The linear hypothesis is also unbounded: it can give values y < 0 or y > 1

[Figure: Output (y) vs. T-shirt size (x); the refit line hw(x^(i)) shifts the 0.5 threshold and misclassifies points]
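A hedged illustration of the problem, assuming NumPy and made-up 1-D data; the single very large point plays the role of the extra training data on the slide:

    import numpy as np

    # Sizes with labels 0 (small) below 5 and 1 (large) above
    x = np.array([1., 2., 3., 4., 6., 7., 8., 9.])
    y = np.array([0., 0., 0., 0., 1., 1., 1., 1.])

    def fit_and_predict(x, y):
        # Least-squares line, thresholded at 0.5
        w1, w0 = np.polyfit(x, y, 1)
        return (w0 + w1 * x >= 0.5).astype(int)

    print(fit_and_predict(x, y))        # [0 0 0 0 1 1 1 1] - matches y
    # One extra, very large positive example tilts and flattens the line
    x2, y2 = np.append(x, 40.0), np.append(y, 1.0)
    print(fit_and_predict(x2, y2)[:8])  # the point at x = 6 is now misclassified as 0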


Logistic Regression
Sigmoid function

• hw(x) = w^T x (the linear model)
• hw(x) = σ(w^T x) (the logistic model)
• σ(z) = 1 / (1 + e^(−z))
• σ(z) is called the Sigmoid or Logistic function.
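A minimal NumPy sketch of this function (the name sigmoid is my own):

    import numpy as np

    def sigmoid(z):
        # Logistic function: 1 / (1 + exp(-z))
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(0.0))                    # 0.5
    print(sigmoid(np.array([-6.0, 6.0])))  # ~0.0025 and ~0.9975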


Logistic Regression
Sigmoid function

• σ(z) = 1 / (1 + e^(−z))
• σ(z) is called the Sigmoid or Logistic function.

[Figure: Plot of σ(z)]


Logistic Regression
Sigmoid function

• σ(z) = 1 / (1 + e^(−5z))
• σ(z) with the coefficient 5

[Figure: Plot of σ(5z), steeper than σ(z)]


Logistic Regression
Sigmoid function

• σ(z) = 1 / (1 + e^(−10z))
• σ(z) with the coefficient 10

[Figure: Plot of σ(10z), steeper still]


Logistic Regression
Sigmoid function

• σ(z) = 1 / (1 + e^(−100z))
• σ(z) with the coefficient 100

[Figure: Plot of σ(100z), nearly a step function]


Logistic Regression
Sigmoid function

• σ(z) = 1 / (1 + e^(−z))
• A smoother approximation of the step function
• This means what? (As the coefficient on z grows, the sigmoid approaches the unit step; a small sketch follows.)
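A small sketch, assuming NumPy, showing how σ(kz) approaches the unit step at z = 0 as the coefficient k grows:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z = 0.1
    for k in (1, 5, 10, 100):
        # The same point z = 0.1 is pushed ever closer to 1
        print(k, sigmoid(k * z))
    # 1 -> 0.52, 5 -> 0.62, 10 -> 0.73, 100 -> 0.99995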


Logistic Regression
Sigmoid - Observations

• σ(z) = 1 / (1 + e^(−z))
• 0 ≤ σ(z) ≤ 1

[Figure: Plot of σ(z) with the level 0.5 marked]


Logistic Regression
Sigmoid - Observations

• σ(z) = 1 / (1 + e^(−z))
• Value of σ(z) at z = 0? (It is 0.5.)

[Figure: Plot of σ(z) with the level 0.5 marked]


Logistic Regression
Sigmoid - Observations

• σ(z) = 1 / (1 + e^(−z))
• z ≥ 0, σ(z) ≥ 0.5
• z < 0, σ(z) < 0.5

[Figure: Plot of σ(z) with the level 0.5 marked]


Logistic Regression
Sigmoid - Observations

• σ(z) = 1 / (1 + e^(−z))
• z ≥ 0, σ(z) ≥ 0.5
• z < 0, σ(z) < 0.5
• σ(z) crosses the value 0.5 at z = 0

[Figure: Plot of σ(z) with the level 0.5 marked]


Logistic Regression
Sigmoid - Observations

• hw(x) = σ(w^T x)
• hw(x) = 1 / (1 + e^(−w^T x))

[Figure: Plot of σ with the level 0.5 marked]


Logistic Regression
Sigmoid - Observations

• hw(x) = σ(w^T x)
• hw(x) = 1 / (1 + e^(−w^T x))
• w^T x ≥ 0, σ(w^T x) ≥ 0.5
• w^T x < 0, σ(w^T x) < 0.5

[Figure: Plot of σ with the level 0.5 marked]


Logistic Regression
Sigmoid - Interpretation

• hw(x) - Estimated probability that y = 1 at x
• hw(x) = 0.85 means the probability that the size is large is 85%, and hence y = 1
• y = 1 if hw(x) ≥ 0.5
• y = 0 if hw(x) < 0.5
• w^T x ≥ 0, σ(w^T x) ≥ 0.5
• w^T x < 0, σ(w^T x) < 0.5 (a small prediction sketch follows)
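A minimal prediction sketch under this interpretation, assuming NumPy; the weights and inputs are made up for illustration:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict(w, x):
        # Probability that y = 1, plus the hard label from thresholding at 0.5
        p = sigmoid(np.dot(w, x))
        return p, int(p >= 0.5)     # p >= 0.5 exactly when w.x >= 0

    w = np.array([1.0, -2.0])       # w[0] multiplies the bias input x[0] = 1
    print(predict(w, np.array([1.0, 0.2])))  # w.x = 0.6 >= 0 -> p ≈ 0.65, y = 1
    print(predict(w, np.array([1.0, 0.8])))  # w.x = -0.6 < 0 -> p ≈ 0.35, y = 0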


Logistic Regression
Decision boundary

• hw(x^(i)) ≥ 0.5, y = 1
• hw(x^(i)) < 0.5, y = 0
• hw(x^(i)) = σ(w0 x0 + w1 x1 + w2 x2)
• w0 = −5, w1 = 1, w2 = 1
• Apply w^T x ≥ 0: with x0 = 1 this reads −5 + x1 + x2 ≥ 0, so the boundary is the line x1 + x2 = 5 (checked in the sketch below)
• Linear decision boundary.
• You can also get a non-linear decision boundary.

[Figure: x2 vs. x1 with the line hw(x^(i)) = 0.5 separating the two classes]
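A small NumPy check of this worked example (the helper names are mine):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w = np.array([-5.0, 1.0, 1.0])   # w0, w1, w2 from the slide

    def h(x1, x2):
        return sigmoid(w @ np.array([1.0, x1, x2]))  # x0 = 1

    # The boundary h = 0.5 is the line x1 + x2 = 5
    print(h(1, 1))  # x1 + x2 = 2 < 5 -> 0.047, predict y = 0
    print(h(4, 4))  # x1 + x2 = 8 > 5 -> 0.953, predict y = 1
    print(h(2, 3))  # exactly on the boundary -> 0.5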


Logistic Regression
Cost function

• We need hw(x^(i)) = σ(w0 x0 + w1 x1 + w2 x2)
• We need to find the weights w_i's
• For that, we need a cost function.

[Figure: x2 vs. x1 with the decision boundary hw(x^(i)) = 0.5]


Logistic Regression
Cost function - Squared cost function

• Let us look at the squared distance cost function.
• Assume we have hw(x) = 1 / (1 + e^(−w^T x))
• J(w) = (1/2m) Σ_{i=1}^m (hw(x^(i)) − y^(i))²

[Figure: x2 vs. x1 with the decision boundary hw(x^(i)) = 0.5]


Logistic Regression
Cost function - Squared cost function

• In one variable.
• Not very desirable

[Figure: The squared cost J(w) plotted in one weight variable]


Logistic Regression
Cost function - Squared cost function (two variables)

• Non-convex; the cost plot looks pretty bad! (a small demonstration follows)
• Not very desirable

[Figure: The squared cost plotted in two weight variables]
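A hedged demonstration, assuming NumPy and a tiny made-up dataset: scan one weight direction and look at the second differences of the squared cost; a convex function sampled on a uniform grid would never have negative ones.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Tiny 1-D dataset
    x = np.array([-2.0, -1.0, 1.0, 2.0])
    y = np.array([0.0, 0.0, 1.0, 1.0])

    ws = np.linspace(-10, 10, 401)
    J = np.array([np.mean((sigmoid(w * x) - y) ** 2) / 2 for w in ws])

    # Negative second differences reveal concave stretches of J
    print((np.diff(J, 2) < 0).any())   # True: the squared cost is not convex in w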


Note on Logistic Regression for prediction (MLR book)

• To model population growth
• To get to a saturation level
• Squared distance cost function
• Now it is synonymous with classification


Logistic Regression
Cross-Entropy cost function

• cost(hw(x), y) =
    −log(hw(x))        if y = 1
    −log(1 − hw(x))    if y = 0
  (a small sketch of this cost follows)
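A minimal sketch of the piecewise cost in NumPy (the function name is mine):

    import numpy as np

    def cost(h, y):
        # Cross-entropy for a single example
        return -np.log(h) if y == 1 else -np.log(1.0 - h)

    print(cost(0.99, 1))  # ~0.01: confident and correct, tiny cost
    print(cost(0.01, 1))  # ~4.61: confident and wrong, large cost
    print(cost(0.01, 0))  # ~0.01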


Logistic Regression
Cross-Entropy cost function

• cost(hw(x), y) =
    −log(hw(x))        if y = 1
    −log(1 − hw(x))    if y = 0
• For y = 1: hw(x) = 1, the cost is 0
• For y = 1: hw(x) = 0, penalization with a large cost

[Figure: −log(hw(x)) plotted against hw(x)]


Logistic Regression
Cross-Entropy cost function

• cost(hw(x), y) =
    −log(hw(x))        if y = 1
    −log(1 − hw(x))    if y = 0
• For y = 0: hw(x) = 0, the cost is 0
• For y = 0: hw(x) = 1, penalization with a large cost

[Figure: −log(1 − hw(x)) plotted against hw(x)]


Logistic Regression
Cross-Entropy cost function - Putting things together

• cost(hw(x), y) = −y log(hw(x)) − (1 − y) log(1 − hw(x))
• J(w) = −y log(hw(x)) − (1 − y) log(1 − hw(x))
• At y = 1, J(w) = ?


Logistic Regression
Cross-Entropy cost function - Putting things together

• cost(hw(x), y) = −y log(hw(x)) − (1 − y) log(1 − hw(x))
• J(w) = −y log(hw(x)) − (1 − y) log(1 − hw(x))
• At y = 0, J(w) = ?


Logistic Regression
Cross-Entropy cost function - Minimization

• cost(hw(x), y) = −y log(hw(x)) − (1 − y) log(1 − hw(x))
• J(w) = −y log(hw(x)) − (1 − y) log(1 − hw(x))
• min J(w)


Logistic Regression
Gradient descent!

• J(w) = −(1/m) Σ_{i=1}^m [ y^(i) log(hw(x^(i))) + (1 − y^(i)) log(1 − hw(x^(i))) ]
• min J(w)


Logistic Regression
Gradient descent!

• J(w) = −(1/m) Σ_{i=1}^m [ y^(i) log(hw(x^(i))) + (1 − y^(i)) log(1 − hw(x^(i))) ]
• ∂J/∂w = (∂J/∂h) · (∂h/∂w)


Logistic Regression
Gradient descent!

• J(w) = −(1/m) Σ_{i=1}^m [ y^(i) log(hw(x^(i))) + (1 − y^(i)) log(1 − hw(x^(i))) ]
• ∂J/∂h = ?


Logistic Regression
Gradient descent!

• J(w) = −(1/m) Σ_{i=1}^m [ y^(i) log(hw(x^(i))) + (1 − y^(i)) log(1 − hw(x^(i))) ]
• ∂J/∂h = −(y/h) − ((1 − y)/(1 − h)) · (−1)
• ∂J/∂h = (h − y) / (h(1 − h))


Logistic Regression
Gradient descent!

• J(w) = −(1/m) Σ_{i=1}^m [ y^(i) log(hw(x^(i))) + (1 − y^(i)) log(1 − hw(x^(i))) ]
• ∂h/∂w = ?


Logistic Regression
Gradient descent!

• σ(z) = 1 / (1 + e^(−z))
• ∂h/∂w = ?
• ∂σ/∂z = ? (one way to work it out is shown below)
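One way to fill in this step, written out in standard calculus (the slide leaves it as a question):

    \frac{\partial \sigma}{\partial z}
      = \frac{\partial}{\partial z}\,(1 + e^{-z})^{-1}
      = \frac{e^{-z}}{(1 + e^{-z})^{2}}
      = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
      = \sigma(z)\,(1 - \sigma(z))

since 1 − σ(z) = e^(−z) / (1 + e^(−z)). With h = σ(w^T x), the chain rule then gives ∂h/∂w = σ(1 − σ) x.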


Logistic Regression
Gradient descent!

• J(w) = −(1/m) Σ_{i=1}^m [ y^(i) log(hw(x^(i))) + (1 − y^(i)) log(1 − hw(x^(i))) ]
• ∂h/∂w = σ(1 − σ) x
• ∂J/∂h = (h − y) / (h(1 − h))
• ∂J/∂w = (∂J/∂h) · (∂h/∂w)


Logistic Regression
Gradient descent!

• J(w) = −(1/m) Σ_{i=1}^m [ y^(i) log(hw(x^(i))) + (1 − y^(i)) log(1 − hw(x^(i))) ]
• ∂h/∂w = σ(1 − σ) x
• ∂J/∂w = (h − y) x, since h = σ and the h(1 − h) factor cancels σ(1 − σ) (a numerical check follows)
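A small numerical check of this gradient for one example, assuming NumPy; the weights and data are made up:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def J(w, x, y):
        h = sigmoid(np.dot(w, x))
        return -y * np.log(h) - (1 - y) * np.log(1 - h)

    w = np.array([0.3, -0.2])
    x = np.array([1.0, 2.0])   # x[0] = 1 is the bias input
    y = 1.0

    analytic = (sigmoid(np.dot(w, x)) - y) * x   # (h - y) x
    eps = 1e-6
    numeric = np.array([(J(w + eps * e, x, y) - J(w - eps * e, x, y)) / (2 * eps)
                        for e in np.eye(2)])
    print(analytic, numeric)   # the two gradients agree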


Logistic Regression
Gradient descent!

• J(w) = −(1/m) Σ_{i=1}^m [ y^(i) log(hw(x^(i))) + (1 − y^(i)) log(1 − hw(x^(i))) ]
• ∂J/∂w_j = (1/m) Σ_{i=1}^m (hw(x^(i)) − y^(i)) x_j^(i)


Logistic Regression
Gradient descent update

• J(w) = −(1/m) Σ_{i=1}^m [ y^(i) log(hw(x^(i))) + (1 − y^(i)) log(1 − hw(x^(i))) ]
• ∂J/∂w_j = (1/m) Σ_{i=1}^m (hw(x^(i)) − y^(i)) x_j^(i)
• w_j^(k+1) = w_j^(k) − α_k ∂J/∂w_j (a full training sketch follows)
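Putting the pieces together: a minimal batch gradient-descent sketch in NumPy on a made-up 1-D dataset (all names are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Leading column of ones supplies the bias term w0
    X = np.array([[1., 0.5], [1., 1.5], [1., 3.0], [1., 4.5]])
    y = np.array([0., 0., 1., 1.])

    w = np.zeros(2)
    alpha = 0.5
    for k in range(5000):
        h = sigmoid(X @ w)
        grad = X.T @ (h - y) / len(y)   # ∂J/∂w_j = (1/m) Σ (h - y) x_j
        w -= alpha * grad               # w_j <- w_j - alpha * ∂J/∂w_j

    print(w)                            # learned weights
    print(sigmoid(X @ w).round(2))      # probabilities approach 0, 0, 1, 1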


Logistic Regression
Plot the cost function J(w)

[Figure: Plot of the cost function J(w); a plotting sketch follows]
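A plotting sketch, assuming NumPy and Matplotlib, that records J(w) at every iteration of the training loop above:

    import numpy as np
    import matplotlib.pyplot as plt

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    X = np.array([[1., 0.5], [1., 1.5], [1., 3.0], [1., 4.5]])
    y = np.array([0., 0., 1., 1.])

    w, alpha, costs = np.zeros(2), 0.5, []
    for k in range(2000):
        h = np.clip(sigmoid(X @ w), 1e-9, 1 - 1e-9)   # guard the logs
        costs.append(np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h)))
        w -= alpha * X.T @ (h - y) / len(y)

    plt.plot(costs)
    plt.xlabel("iteration")
    plt.ylabel("J(w)")
    plt.show()   # J(w) decreases steadily for a suitable alpha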


One-vs-all (OvA) multi-class classification
OvA

[Figure: Training data with more than two classes]


One-vs-all (OvA) multi-class classification
OvA

[Figure: The binary classifier hw^(1)(x): class 1 vs. the rest]


One-vs-all (OvA) multi-class classification
OvA

[Figure: The binary classifier hw^(2)(x): class 2 vs. the rest]


One-vs-all (OvA) multi-class classification
OvA

[Figure: The binary classifier hw^(3)(x): class 3 vs. the rest]


One-vs-all (OvA) multi-class classification
OvA

[Figure: The three one-vs-all classifiers hw^(1)(x), hw^(2)(x), hw^(3)(x) together]


One-vs-all (OvA) multi-class classification
OvA - Fusion rule

• Pick the class whose classifier responds most strongly: max_i hw^(i)(x) (a minimal sketch follows)
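A minimal OvA sketch in NumPy: one logistic classifier per class, then the fusion rule max_i hw^(i)(x). The toy data and all names are made up:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_binary(X, y, alpha=0.1, iters=3000):
        # Logistic regression for one "class c vs. the rest" problem
        w = np.zeros(X.shape[1])
        for _ in range(iters):
            w -= alpha * X.T @ (sigmoid(X @ w) - y) / len(y)
        return w

    # Three 2-D clusters; the leading ones column is the bias input
    X = np.array([[1., 0.0, 0.0], [1., 0.5, 0.5],    # class 0
                  [1., 4.0, 0.0], [1., 4.5, 0.5],    # class 1
                  [1., 2.0, 3.0], [1., 2.5, 3.5]])   # class 2
    labels = np.array([0, 0, 1, 1, 2, 2])

    # One binary classifier per class (labels relabeled as c vs. the rest)
    W = np.array([train_binary(X, (labels == c).astype(float)) for c in range(3)])

    # Fusion rule: the class with the largest hw^(i)(x) wins
    print(sigmoid(X @ W.T).argmax(axis=1))   # recovers [0 0 1 1 2 2]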
