Linear classifiers:
Parameter learning
Quality metric for logistic regression:
Maximum likelihood estimation
P(y=+1|x,ŵ) = 1 / (1 + e^(−ŵᵀ h(x)))
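A minimal NumPy sketch of this prediction (the feature map h(x) = [1, #awesome, #awful] and the coefficient values are illustrative assumptions, not from the slides):

```python
import numpy as np

def predict_probability(h_x, w_hat):
    """P(y=+1 | x, w) = 1 / (1 + exp(-w^T h(x))) for a single feature vector h(x)."""
    score = np.dot(w_hat, h_x)               # score = w^T h(x)
    return 1.0 / (1.0 + np.exp(-score))      # sigmoid squashes the score into (0, 1)

# Hypothetical coefficients for h(x) = [1, #awesome, #awful]
w_hat = np.array([0.0, 1.0, -1.5])
print(predict_probability(np.array([1.0, 2.0, 1.0]), w_hat))  # ~0.62
```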
[Block diagram: Training Data → x → Feature extraction → h(x) → ML model → ŷ; the Quality metric (comparing against y) drives the ML algorithm, which learns ŵ]
Learning problem
Training data:
N observations (xi,yi)
x[1] = #awesome x[2] = #awful y = sentiment
2 1 +1
0 2 -1
3 3 -1
4 1 +1
1 1 +1
2 4 -1
0 3 -1
0 1 -1
2 1 +1

Optimize quality metric on training data → ŵ
Finding best coefficients
x[1] = #awesome x[2] = #awful y = sentiment
2 1 +1
0 2 -1
3 3 -1
4 1 +1
1 1 +1
2 4 -1
0 3 -1
0 1 -1
2 1 +1
Finding best coefficients
Negative data points:
x[1] = #awesome x[2] = #awful y = sentiment
0 2 -1
3 3 -1
2 4 -1
0 3 -1
0 1 -1

Positive data points:
x[1] = #awesome x[2] = #awful y = sentiment
2 1 +1
4 1 +1
1 1 +1
2 1 +1
Finding best coefficients
For the negative data points, a good ŵ should give P(y=+1|xi,ŵ) close to 0.0; for the positive data points, P(y=+1|xi,ŵ) close to 1.0.
Pick ŵ that makes the predicted probabilities match the observed labels as closely as possible.
Quality metric = Likelihood function
Negative data points: want P(y=+1|xi,ŵ) ≈ 0.0. Positive data points: want P(y=+1|xi,ŵ) ≈ 1.0.
No ŵ achieves perfect predictions (usually).
Likelihood ℓ(w): measures quality of fit for the model with coefficients w.
Data likelihood
Quality metric: probability of data
Example x[1]=2, x[2]=1, y=+1: if the model is good, it should predict y=+1, so pick w to maximize P(y=+1|x[1]=2, x[2]=1,w).
Example x[1]=0, x[2]=2, y=-1: if the model is good, it should predict y=-1, so pick w to maximize P(y=-1|x[1]=0, x[2]=2,w).
Maximizing likelihood
(probability of data)
Data point  x[1]  x[2]  y
x1,y1       2     1     +1
x2,y2       0     2     -1
x3,y3       3     3     -1
x4,y4       4     1     +1
x5,y5       1     1     +1
x6,y6       2     4     -1
x7,y7       0     3     -1
x8,y8       0     1     -1
x9,y9       2     1     +1

Choose w to maximize the probability of each observation; must combine into a single measure of quality.
Learn logistic regression model with
maximum likelihood estimation (MLE)
Data point  x[1]  x[2]  y    Choose w to maximize
x1,y1       2     1     +1   P(y=+1|x[1]=2, x[2]=1,w)
x2,y2       0     2     -1   P(y=-1|x[1]=0, x[2]=2,w)
x3,y3       3     3     -1   P(y=-1|x[1]=3, x[2]=3,w)
x4,y4       4     1     +1   P(y=+1|x[1]=4, x[2]=1,w)
…

ℓ(w) = P(y1|x1,w) · P(y2|x2,w) · P(y3|x3,w) · P(y4|x4,w) · … = ∏_{i=1}^N P(yi|xi,w)
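A minimal NumPy sketch of this likelihood, assuming H is an N×D matrix whose rows are the feature vectors h(xi) and y holds labels in {+1, -1} (the names are illustrative):

```python
import numpy as np

def data_likelihood(H, y, w):
    """l(w) = prod_i P(y_i | x_i, w), with labels y_i in {+1, -1}."""
    scores = H.dot(w)                                      # score(x_i) = w^T h(x_i)
    p_plus = 1.0 / (1.0 + np.exp(-scores))                 # P(y = +1 | x_i, w)
    per_point = np.where(y == +1, p_plus, 1.0 - p_plus)    # P(y_i | x_i, w)
    return np.prod(per_point)                              # product over all N points
```

In practice one maximizes the log of this product, since a product of many small probabilities underflows quickly.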
Finding best linear classifier
with gradient ascent
Find “best” classifier
Maximize likelihood over all possible w0, w1, w2:

ℓ(w) = ∏_{i=1}^N P(yi|xi,w)

ℓ(w0=0, w1=1,   w2=-1.5) = 10^-6
ℓ(w0=1, w1=1,   w2=-1.5) = 10^-5
ℓ(w0=1, w1=0.5, w2=-1.5) = 10^-4
…

Best model: highest likelihood ℓ(w) → ŵ = (w0=1, w1=0.5, w2=-1.5)

[Plot: candidate decision boundaries in the #awesome vs. #awful plane, both axes 0-4]
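To make the comparison concrete, here is a sketch that scores the three candidate coefficient vectors above on the 9-point training table, reusing the data_likelihood function from the earlier sketch (the resulting values will not match the illustrative 10^-6 … 10^-4 numbers on the slide):

```python
import numpy as np

# Training table: rows are h(x) = [1, #awesome, #awful], labels in {+1, -1}
H = np.array([[1, 2, 1], [1, 0, 2], [1, 3, 3], [1, 4, 1], [1, 1, 1],
              [1, 2, 4], [1, 0, 3], [1, 0, 1], [1, 2, 1]], dtype=float)
y = np.array([+1, -1, -1, +1, +1, -1, -1, -1, +1])

candidates = [np.array([0.0, 1.0, -1.5]),
              np.array([1.0, 1.0, -1.5]),
              np.array([1.0, 0.5, -1.5])]
w_best = max(candidates, key=lambda w: data_likelihood(H, y, w))  # highest l(w) wins
```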
Maximizing likelihood
Maximize function over all possible w0, w1, w2:

max over w0,w1,w2 of ∏_{i=1}^N P(yi|xi,w)

No closed-form solution → use gradient ascent. ℓ(w0,w1,w2) is a function of 3 variables.
Review of gradient ascent
Finding the max
via hill climbing
Algorithm:
while not converged:
    w(t+1) ← w(t) + η · dℓ/dw |w(t)
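A minimal one-dimensional sketch of this hill-climbing loop (the function names and toy objective are illustrative assumptions):

```python
def hill_climb(dl_dw, w0, eta=0.1, epsilon=1e-6, max_iter=10000):
    """Repeat w <- w + eta * dl/dw until the derivative is (nearly) zero."""
    w = w0
    for _ in range(max_iter):
        g = dl_dw(w)                  # derivative evaluated at the current w
        if abs(g) < epsilon:          # convergence criterion (see next slide)
            break
        w = w + eta * g               # step uphill
    return w

# Toy example: maximize l(w) = -(w - 3)^2, so dl/dw = -2(w - 3); maximum at w = 3
print(hill_climb(lambda w: -2.0 * (w - 3.0), w0=0.0))  # -> approximately 3.0
```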
Convergence criteria
For concave functions, the optimum occurs when the derivative dℓ/dw = 0.
In practice, stop when |dℓ/dw| < ε.

Algorithm:
while not converged:
    w(t+1) ← w(t) + η · dℓ/dw |w(t)
Moving to multiple dimensions:
Gradients
∇ℓ(w) = [∂ℓ(w)/∂w0, ∂ℓ(w)/∂w1, …, ∂ℓ(w)/∂wD]ᵀ   (vector of the D+1 partial derivatives)
Contour plots
Gradient ascent
Algorithm:
while not converged:
    w(t+1) ← w(t) + η · ∇ℓ(w(t))
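The same idea in multiple dimensions, as a small generic helper that takes any gradient function (an illustrative sketch, not the course's code):

```python
import numpy as np

def gradient_ascent(grad_fn, w_init, eta=0.1, epsilon=1e-6, max_iter=10000):
    """w <- w + eta * grad l(w), stopping once the gradient norm falls below epsilon."""
    w = np.array(w_init, dtype=float)
    for _ in range(max_iter):
        g = grad_fn(w)                      # gradient vector at the current w
        if np.linalg.norm(g) < epsilon:
            break
        w = w + eta * g
    return w
```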
Learning algorithm for
logistic regression
Derivative of (log-)likelihood
∂ℓ(w)/∂wj = Σ_{i=1}^N hj(xi) · (1[yi=+1] − P(y=+1|xi,w))

Sum over data points; hj(xi) is the feature value; the term in parentheses is the difference between the truth 1[yi=+1] and the prediction P(y=+1|xi,w).

Indicator function: 1[yi=+1] = 1 if yi=+1, and 0 otherwise.
Computing derivative
∂ℓ(w(t))/∂wj = Σ_{i=1}^N hj(xi) · (1[yi=+1] − P(y=+1|xi,w(t)))

Current coefficients: w0(t) = 0, w1(t) = 1, w2(t) = -2
x[1]  x[2]  y    P(y=+1|xi,w)   Contribution to derivative for w1
2     1     +1   0.5            2 · (1 − 0.5)  =  1.0
0     2     -1   0.02           0 · (0 − 0.02) =  0.0
3     3     -1   0.05           3 · (0 − 0.05) = -0.15
4     1     +1   0.88           4 · (1 − 0.88) =  0.48

Total derivative: 1.0 + 0.0 − 0.15 + 0.48 = 1.33
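The table's numbers can be reproduced with a few lines of NumPy; this sketch plugs the four data points and w(t) = (0, 1, -2) into the derivative formula for w1 (variable names are illustrative):

```python
import numpy as np

w = np.array([0.0, 1.0, -2.0])                                   # (w0, w1, w2) from the slide
H = np.array([[1, 2, 1], [1, 0, 2], [1, 3, 3], [1, 4, 1]], dtype=float)
y = np.array([+1, -1, -1, +1])

p_plus = 1.0 / (1.0 + np.exp(-H.dot(w)))         # ~[0.50, 0.02, 0.05, 0.88]
indicator = (y == +1).astype(float)              # 1[y_i = +1]
contrib_w1 = H[:, 1] * (indicator - p_plus)      # h_1(x_i) * (1[y_i=+1] - P(y=+1|x_i,w))
print(contrib_w1, contrib_w1.sum())              # ~[1.0, 0.0, -0.15, 0.48], total ~1.33
```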
Derivative of (log-)likelihood: Interpretation
∂ℓ(w)/∂wj = Σ_{i=1}^N hj(xi) · (1[yi=+1] − P(y=+1|xi,w))

(sum over data points: feature value times the difference between truth and prediction)

If hj(xi) = 1, the contribution of data point i to ∂ℓ(w)/∂wj is:
- yi=+1 and P(y=+1|xi,w) ≈ 1: contribution ≈ 0 (already predicted correctly)
- yi=+1 and P(y=+1|xi,w) ≈ 0: contribution ≈ +1, pushing wj up
- yi=-1 and P(y=+1|xi,w) ≈ 1: contribution ≈ -1, pushing wj down
- yi=-1 and P(y=+1|xi,w) ≈ 0: contribution ≈ 0 (already predicted correctly)
Summary of gradient ascent
for logistic regression
init w(1) = 0 (or randomly, or smartly), t = 1
while ||∇ℓ(w(t))|| > ε:
    for j = 0, …, D:
        partial[j] = Σ_{i=1}^N hj(xi) · (1[yi=+1] − P(y=+1|xi,w(t)))
        wj(t+1) ← wj(t) + η · partial[j]
    t ← t + 1
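Putting the pieces together, here is a sketch of this summary algorithm in NumPy; the layout of H (rows are h(xi)) and y (labels in {+1, -1}) is an assumption, and a vectorized update replaces the explicit loop over j:

```python
import numpy as np

def logistic_regression_mle(H, y, eta=0.1, epsilon=1e-6, max_iter=10000):
    """Gradient ascent on the likelihood of a logistic regression model.

    H: N x (D+1) matrix whose rows are h(x_i); y: N labels in {+1, -1}.
    Update: w_j <- w_j + eta * sum_i h_j(x_i) * (1[y_i=+1] - P(y=+1 | x_i, w)).
    """
    w = np.zeros(H.shape[1])                       # init w(1) = 0
    indicator = (y == +1).astype(float)            # 1[y_i = +1]
    for _ in range(max_iter):
        p_plus = 1.0 / (1.0 + np.exp(-H.dot(w)))   # P(y=+1 | x_i, w(t)) for every i
        partial = H.T.dot(indicator - p_plus)      # all partial[j], j = 0, ..., D, at once
        if np.linalg.norm(partial) <= epsilon:     # loop while ||grad l(w(t))|| > epsilon
            break
        w = w + eta * partial
    return w
```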
Choosing the step size η
Learning curve:
Plot quality (likelihood) over iterations
If step size is too small, can take a
long time to converge
Compare convergence with
different step sizes
Careful with step sizes that are too large
Very large step sizes can even
cause divergence or wild oscillations
Simple rule of thumb for picking step size η
• Unfortunately, picking the step size requires a lot of trial and error
• Try several values, exponentially spaced
  - Goal: plot learning curves to
    • find one η that is too small (smooth, but moving too slowly)
    • find one η that is too large (oscillation or divergence)
• Try values in between to find the "best" η (see the sketch below)
• Advanced tip: can also try a step size that decreases with iterations, e.g., ηt = η0/t
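A rough sketch of that trial-and-error procedure, reusing H and y from the earlier training-table example and comparing the final log-likelihood for a few exponentially spaced step sizes (the specific η values and iteration count are arbitrary illustrative choices):

```python
import numpy as np

def log_likelihood(H, y, w):
    """log l(w) = sum_i log P(y_i | x_i, w); the log avoids numerical underflow."""
    return np.sum(-np.log(1.0 + np.exp(-y * H.dot(w))))

indicator = (y == +1).astype(float)
for eta in [1e-4, 1e-3, 1e-2, 1e-1, 1.0]:           # exponentially spaced candidates
    w = np.zeros(H.shape[1])
    curve = []                                       # learning curve for this step size
    for _ in range(200):
        p_plus = 1.0 / (1.0 + np.exp(-H.dot(w)))
        w = w + eta * H.T.dot(indicator - p_plus)
        curve.append(log_likelihood(H, y, w))
    print(eta, curve[-1])   # too small: barely improves; too large: oscillates or diverges
```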
Summary of logistic
regression classifier
P(y=+1|x,ŵ) = 1 / (1 + e^(−ŵᵀ h(x)))
[Block diagram: Training Data → x → Feature extraction → h(x) → ML model → ŷ; the Quality metric (comparing against y) drives the ML algorithm, which learns ŵ]
What you can do now…
• Measure quality of a classifier using the likelihood function
• Interpret the likelihood function as the probability of the observed data
• Learn a logistic regression model with gradient ascent
• (Optional) Derive the gradient ascent update rule for logistic regression
Simplest link function: sign(z)
sign(z) = +1 if z ≥ 0, and -1 otherwise

But sign(z) only outputs -1 or +1, with no probabilities in between, even though we want an output in [0.0, 1.0] as the score z ranges over (-∞, +∞).
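A two-line, purely illustrative comparison of what the two link functions return for the same scores:

```python
import numpy as np

scores = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])    # z = w^T h(x) for a few inputs
print(np.where(scores >= 0.0, +1, -1))            # sign(z): only -1 or +1, no confidence
print(1.0 / (1.0 + np.exp(-scores)))              # sigmoid: probabilities in (0, 1)
```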
Finding best coefficients
Negative data points:
x[1] = #awesome x[2] = #awful y = sentiment
0 2 -1
3 3 -1
2 4 -1
0 3 -1
0 1 -1

Positive data points:
x[1] = #awesome x[2] = #awful y = sentiment
2 1 +1
4 1 +1
1 1 +1
2 1 +1
Score(xi) = ŵᵀh(xi) ranges over (-∞, +∞); the model maps it to P(y=+1|xi,ŵ) in [0.0, 1.0].
Quality metric: probability of data
P(y=+1|x,ŵ) = 1 / (1 + e^(−ŵᵀ h(x)))
Example x[1]=2, x[2]=1, y=+1: if the model is good, it should predict y=+1, so increase the probability that y=+1 and choose w to make P(y=+1|x[1]=2, x[2]=1,w) large.
Example x[1]=0, x[2]=2, y=-1: if the model is good, it should predict y=-1, so increase the probability that y=-1 and choose w to make P(y=-1|x[1]=0, x[2]=2,w) large.
Maximizing likelihood
(probability of data)
Data point  x[1]  x[2]  y
x1,y1       2     1     +1
x2,y2       0     2     -1
x3,y3       3     3     -1
x4,y4       4     1     +1
x5,y5       1     1     +1
x6,y6       2     4     -1
x7,y7       0     3     -1
x8,y8       0     1     -1
x9,y9       2     1     +1

Choose w to maximize the probability of each observation; must combine into a single measure of quality.
Learn logistic regression model with
maximum likelihood estimation (MLE)
• Choose coefficients w that maximize the likelihood:
  ℓ(w) = ∏_{i=1}^N P(yi|xi,w)
• No closed-form solution → use gradient ascent