Data Science L19 - Logistic Regression (Credit to Andrew Ng)
• Ground truth data - Input feature / output (x, y) are the knowns

• Use a model / hypothesis as h(w)

• Develop an error / cost / loss function J(w) = J(y, ȳ) = J(y, h(w))

• The weights are identified by min J(w)

• Essentially, the ML problem is now reduced to an optimization problem.

• Weights are identified using optimization.
Ramanathan Muthuganapathy, Department of Engineering Design, IIT Madras
Linear Regression (Predictive)

• Ground truth data - Input feature / output (x, y) are the knowns

• Use a model / hypothesis as h(w) and cost function J(w)

[Diagram: Input (x) → Hypothesis h(w) with Weights / Parameters → predicted Output (ȳ); the Loss function J(w) compares ȳ against the ground-truth Output (y)]
• Output is either 0 or 1

• Good / bad grades

[Plot: output y (0 or 1) vs. tumor size (x)]
• Output is either 0 or 1

• 0 - Benign

• 1 - Malignant

[Plot: output y (0 or 1) vs. tumor size (x)]
• Output is either 0 or 1

• 0 - Small

• 1 - Large

• ȳ^(i) = h_w(x^(i)) = w0 + w1 x^(i)

[Plot: a straight line fitted through the 0/1 data vs. input size (x)]
• h_w(x^(i)) ≥ 0.5, y = 1

• h_w(x^(i)) < 0.5, y = 0

[Plot: linear hypothesis h_w(x^(i)) with the 0.5 threshold marked; output y (0 or 1) vs. T-shirt size (x)]
• h_w(x^(i)) ≥ 0.5, y = 1

• h_w(x^(i)) < 0.5, y = 0

• Misclassification starts happening

• Not a good idea to use 0.5 with Linear Regression

• With Linear Regression, the prediction can be y < 0 or y > 1

[Plot: the fitted line shifts with additional data, so the 0.5 threshold misclassifies points; output y vs. T-shirt size (x)]
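To see the y < 0 / y > 1 problem concretely, here is a small sketch with made-up 1-D data (feature values and labels are assumptions, not from the slides): a least-squares line fitted to 0/1 labels happily predicts values outside [0, 1].

```python
import numpy as np

# Hypothetical 1-D data: feature x and 0/1 class labels (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 8.0, 9.0, 10.0, 30.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Plain linear regression: least-squares fit of y = w0 + w1*x.
A = np.column_stack([np.ones_like(x), x])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

y_hat = A @ w
# The linear hypothesis is unbounded, so predictions escape [0, 1].
print(y_hat.max())  # > 1 for the far-right point
```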
• h_w(x) = w^T x

• h_w(x) = σ(w^T x)

• σ(z) = 1 / (1 + e^(−z))

• σ(z) is called the Sigmoid or Logistic function.
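A minimal NumPy sketch of the sigmoid, checking the properties the following slides rely on (σ(0) = 0.5, outputs strictly between 0 and 1):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid / logistic function: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5 exactly
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```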
• σ(z) = 1 / (1 + e^(−5z)): σ(z) with 5

• σ(z) = 1 / (1 + e^(−10z)): σ(z) with 10

• σ(z) = 1 / (1 + e^(−100z)): σ(z) with 100

[Plots: as the coefficient of z increases, σ(z) rises more steeply around z = 0]

• σ(z) = 1 / (1 + e^(−z)) is a smoother approximation of the step function

• This means what?
• σ(z) = 1 / (1 + e^(−z))

• 0 < σ(z) < 1

• Value of σ(z) at z = 0? σ(0) = 0.5
• σ(z) = 1 / (1 + e^(−z))

• z ≥ 0, σ(z) ≥ 0.5

• z < 0, σ(z) < 0.5

• σ(z) crosses 0.5 at z = 0
• h_w(x) = σ(w^T x)

• h_w(x) = 1 / (1 + e^(−w^T x))

• w^T x ≥ 0, σ(w^T x) ≥ 0.5

• w^T x < 0, σ(w^T x) < 0.5
• h_w(x^(i)) ≥ 0.5, y = 1

• h_w(x^(i)) < 0.5, y = 0

• h_w(x^(i)) = σ(w0 x0 + w1 x1 + w2 x2)

• w0 = −5, w1 = 1, w2 = 1

• Apply w^T x ≥ 0: with x0 = 1, this gives −5 + x1 + x2 ≥ 0, i.e. x1 + x2 ≥ 5

• Linear decision boundary: h_w(x^(i)) = 0.5 along the line x1 + x2 = 5

[Plot: x1 vs. x2 plane, split by the line x1 + x2 = 5]
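With the slide's weights (w0 = −5, w1 = 1, w2 = 1), the predict-1 region w^T x ≥ 0 reduces to x1 + x2 ≥ 5. A quick sketch (the test points are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights from the slide: w0 = -5, w1 = 1, w2 = 1 (x0 is the constant 1).
w = np.array([-5.0, 1.0, 1.0])

def predict(x1, x2):
    # h_w(x) = sigma(w0*x0 + w1*x1 + w2*x2); predict 1 when w^T x >= 0,
    # i.e. when x1 + x2 >= 5 -- a linear decision boundary.
    z = w @ np.array([1.0, x1, x2])
    return 1 if sigmoid(z) >= 0.5 else 0

print(predict(1.0, 1.0))  # x1 + x2 = 2 < 5  -> 0
print(predict(3.0, 4.0))  # x1 + x2 = 7 >= 5 -> 1
print(predict(2.0, 3.0))  # x1 + x2 = 5, exactly on the boundary -> 1
```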
• We need h_w(x^(i)) = σ(w0 x0 + w1 x1 + w2 x2)

• We need to find the weights w_i's

• Cost function.

[Plot: x1 vs. x2 plane with the decision boundary h_w(x^(i)) = 0.5]
• J(w) = (1/2m) Σ_{i=1}^{m} (h_w(x^(i)) − y^(i))^2

• h_w(x) = 1 / (1 + e^(−w^T x))

• In one variable, plotting J(w) against w shows it is not convex

• Not very desirable
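The non-convexity can be checked numerically: with a sigmoid hypothesis inside a squared-error cost, the discrete second derivative of J along w changes sign. The 1-D data below is made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 1-D data with conflicting labels (illustrative only).
x = np.array([-4.0, -2.0, 1.0, 3.0])
y = np.array([1.0, 0.0, 1.0, 0.0])

def J(w):
    # Squared-error cost with a sigmoid hypothesis, in one variable w.
    h = sigmoid(w * x)
    return np.sum((h - y) ** 2) / (2 * len(x))

ws = np.linspace(-6, 6, 241)
Js = np.array([J(w) for w in ws])
# Discrete second derivative: a convex function would keep it >= 0
# everywhere; here the sign flips, so J(w) is non-convex.
second_diff = Js[2:] - 2 * Js[1:-1] + Js[:-2]
print(np.any(second_diff < 0) and np.any(second_diff > 0))
```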
• cost(h_w(x), y) = −log(h_w(x)) if y = 1; −log(1 − h_w(x)) if y = 0

• For y = 1: h_w(x) = 1, cost is 0

• For y = 1: h_w(x) = 0, penalization with large cost

• For y = 0: h_w(x) = 0, cost is 0

• For y = 0: h_w(x) = 1, penalization with large cost
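The two branches of the cost can be sketched directly (per-example cost only; the h values below are illustrative):

```python
import numpy as np

def cost(h, y):
    """Per-example logistic cost: -log(h) if y = 1, -log(1 - h) if y = 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

# y = 1: a confident correct prediction costs ~0, a confident wrong one explodes.
print(cost(0.99, 1))   # small
print(cost(0.01, 1))   # large, about 4.6
# y = 0: symmetric behaviour.
print(cost(0.01, 0))   # small
print(cost(0.99, 0))   # large, about 4.6
```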
• ∂J/∂h = ?

• J = −y log(h) − (1 − y) log(1 − h)

• ∂J/∂h = −y/h + (1 − y)/(1 − h)

• ∂J/∂h = (h − y) / (h(1 − h))
• ∂h/∂w = ?

• σ(z) = 1 / (1 + e^(−z))

• ∂σ/∂z = σ(1 − σ)

• With h = σ(w^T x): ∂h/∂w = σ(1 − σ) x

• ∂J/∂h = (h − y) / (h(1 − h))
• ∂J/∂w = (∂J/∂h) (∂h/∂w)

• ∂J/∂w = [(h − y)/(h(1 − h))] σ(1 − σ) x = (h − y) x, since h = σ and h(1 − h) = σ(1 − σ)
• ∂J/∂wj = Σ_{i=1}^{m} (h_w(x^(i)) − y^(i)) xj^(i)
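The summed gradient above vectorises as X^T (h − y). A sketch with a finite-difference check of the derivation (the data and weights here are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(w, X, y):
    """dJ/dw_j = sum_i (h_w(x^(i)) - y^(i)) x_j^(i), i.e. X^T (h - y)."""
    h = sigmoid(X @ w)
    return X.T @ (h - y)

def J(w, X, y):
    # Summed cross-entropy cost, matching the gradient above.
    h = sigmoid(X @ w)
    return np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h))

# Illustrative data: first column of X is the constant feature x0 = 1.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
w = np.array([0.1, -0.2])

# Central finite differences of J should match the analytic gradient.
eps = 1e-6
num = np.array([(J(w + eps * e, X, y) - J(w - eps * e, X, y)) / (2 * eps)
                for e in np.eye(2)])
print(np.allclose(gradient(w, X, y), num))
```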
• wj^(k+1) = wj^(k) − αk ∂J/∂wj
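Putting the pieces together, a minimal batch gradient-descent trainer for logistic regression; the toy 1-D data, the step size α and the iteration count are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, alpha=0.01, iters=5000):
    # Batch gradient descent:
    # w_j <- w_j - alpha * sum_i (h_w(x^(i)) - y^(i)) x_j^(i)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ w)
        w -= alpha * (X.T @ (h - y))
    return w

# Toy separable 1-D data: small x -> class 0, large x -> class 1.
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
X = np.column_stack([np.ones_like(x), x])  # prepend x0 = 1
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w = fit(X, y)
preds = (sigmoid(X @ w) >= 0.5).astype(int)
print(preds)  # should recover the labels: [0 0 0 1 1 1]
```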
• One classifier per class: h_w^(1)(x), h_w^(2)(x), h_w^(3)(x)

[Plots: multi-class data; each h_w^(i)(x) separates one class from the rest]

• Predict the class i with max_i h_w^(i)(x)
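The max_i h_w^(i)(x) rule can be sketched as follows; the three weight vectors below are hand-picked for illustration, not trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight vector per class (rows); values chosen for illustration so that
# each classifier scores high on "its" region of the (x1, x2) plane.
W = np.array([
    [2.0, -1.0, -1.0],   # class 0: high h when x1 and x2 are small
    [-3.0, 2.0, -1.0],   # class 1: high h when x1 is large
    [-3.0, -1.0, 2.0],   # class 2: high h when x2 is large
])

def predict(x1, x2):
    x = np.array([1.0, x1, x2])    # x0 = 1
    scores = sigmoid(W @ x)        # h_w^(i)(x) for each class i
    return int(np.argmax(scores))  # max_i h_w^(i)(x)

print(predict(0.0, 0.0))  # 0
print(predict(3.0, 0.0))  # 1
print(predict(0.0, 3.0))  # 2
```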