Support Vector Machines & Kernels
Lecture 6
David Sontag
New York University
Slides adapted from Luke Zettlemoyer and Carlos Guestrin,
and Vibhav Gogate
Dual SVM derivation (1) – the linearly separable case
Original optimization problem:
  min_{w,b} ½ ||w||²   subject to   y_j (w · x_j + b) ≥ 1   for all j
Rewrite the constraints as y_j (w · x_j + b) − 1 ≥ 0, with one Lagrange multiplier α_j ≥ 0 per example.
Lagrangian:
  L(w, b, α) = ½ ||w||² − Σ_j α_j [ y_j (w · x_j + b) − 1 ]
Our goal now is to solve:
  min_{w,b} max_{α ≥ 0} L(w, b, α)
Dual SVM derivation (2) – the linearly separable case
(Primal)   min_{w,b} max_{α ≥ 0} L(w, b, α)
Swap min and max:
(Dual)   max_{α ≥ 0} min_{w,b} L(w, b, α)
Slater's condition from convex optimization guarantees that
these two optimization problems are equivalent!
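Why the swap is justified, in one line (weak duality always holds; Slater's condition upgrades it to equality for this convex problem):

```latex
% Weak duality (always true):
\max_{\alpha \ge 0} \min_{w,b} L(w,b,\alpha) \;\le\; \min_{w,b} \max_{\alpha \ge 0} L(w,b,\alpha)
% Slater's condition (a strictly feasible point exists for this convex QP)
% implies strong duality: the inequality holds with equality.
```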
Dual SVM derivation (3) – the linearly separable case
[Figure: example feature mapping Φ(x) = (x^(1), …, x^(n), x^(1)x^(2), x^(1)x^(3), …, e^{x^(1)}, …)]
(Dual)   max_{α ≥ 0} min_{w,b} L(w, b, α)
Can solve for optimal w, b as a function of α:
  ∂L/∂w = w − Σ_j α_j y_j x_j = 0   ⟹   w = Σ_j α_j y_j x_j
  ∂L/∂b = − Σ_j α_j y_j = 0   ⟹   Σ_j α_j y_j = 0
Substituting these values back in (and simplifying), we obtain:
(Dual)   max_{α ≥ 0} Σ_j α_j − ½ Σ_{j,k} α_j α_k y_j y_k (x_j · x_k)   subject to   Σ_j α_j y_j = 0
Sums are over all training examples; α_j α_k y_j y_k are scalars; x_j · x_k is a dot product.
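The simplification referred to above, spelled out (standard algebra using w = Σ_j α_j y_j x_j and Σ_j α_j y_j = 0):

```latex
\begin{aligned}
L(w,b,\alpha) &= \tfrac{1}{2}\|w\|^2 - \sum_j \alpha_j \big[ y_j (w \cdot x_j + b) - 1 \big] \\
&= \tfrac{1}{2} \sum_{j,k} \alpha_j \alpha_k y_j y_k (x_j \cdot x_k)
   - \sum_{j,k} \alpha_j \alpha_k y_j y_k (x_j \cdot x_k)
   - b \underbrace{\textstyle\sum_j \alpha_j y_j}_{=\,0}
   + \sum_j \alpha_j \\
&= \sum_j \alpha_j - \tfrac{1}{2} \sum_{j,k} \alpha_j \alpha_k y_j y_k (x_j \cdot x_k).
\end{aligned}
```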
So, in the dual formulation we will solve for α directly!
• w and b are computed from α (if needed)
Dual SVM derivation (3) – the linearly separable case
Lagrangian:
  L(w, b, α) = ½ ||w||² − Σ_j α_j [ y_j (w · x_j + b) − 1 ]
α_j > 0 for some j implies the corresponding constraint is tight. We use this to obtain b:
(1)  y_j (w · x_j + b) = 1
(2)  w · x_j + b = y_j   (multiply both sides by y_j, using y_j² = 1)
(3)  b = y_j − w · x_j
Classification rule using dual solution
Using the dual solution, classify a new example x as:
  ŷ = sign( Σ_j α_j y_j (x_j · x) + b )
The sum only needs the dot product of the feature vector of the new example with the support vectors (the x_j with α_j > 0).
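A minimal end-to-end sketch (the toy data and the use of scipy's SLSQP solver are assumptions, not the lecture's method): solve the separable-case dual numerically, recover w and b, and apply the classification rule.

```python
# Sketch: solve  max_a  sum_j a_j - 1/2 sum_{j,k} a_j a_k y_j y_k (x_j . x_k)
#         s.t.   a_j >= 0,  sum_j a_j y_j = 0
# with a generic constrained optimizer, then recover w, b and predict.
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy data (assumed for illustration)
X = np.array([[2.0, 2.0], [2.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

G = (y[:, None] * X) @ (y[:, None] * X).T    # G[j,k] = y_j y_k (x_j . x_k)

def neg_dual(a):                             # minimize the negative dual objective
    return 0.5 * a @ G @ a - a.sum()

n = len(y)
res = minimize(neg_dual, np.zeros(n), method="SLSQP",
               bounds=[(0, None)] * n,
               constraints={"type": "eq", "fun": lambda a: a @ y})
alpha = res.x

w = (alpha * y) @ X                          # w = sum_j a_j y_j x_j
sv = np.argmax(alpha)                        # any j with a_j > 0 lies on the margin
b = y[sv] - w @ X[sv]                        # b = y_j - w . x_j  (tight constraint)

# Classification rule using the dual solution: sign( sum_j a_j y_j (x_j . x_new) + b )
x_new = np.array([1.5, 2.5])
print(np.sign((alpha * y) @ (X @ x_new) + b))
```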
Dual for the non-separable case
Primal: solve for w, b, and the slacks ξ_j:
  min_{w,b,ξ} ½ ||w||² + C Σ_j ξ_j   subject to   y_j (w · x_j + b) ≥ 1 − ξ_j,   ξ_j ≥ 0   for all j
Dual:
  max_α Σ_j α_j − ½ Σ_{j,k} α_j α_k y_j y_k (x_j · x_k)   subject to   Σ_j α_j y_j = 0,   0 ≤ α_j ≤ C
What changed?
• Added upper bound of C on α_j!
• Intuitive explanation:
  • Without slack, α_j → ∞ when constraints are violated (points misclassified)
  • Upper bound of C limits the α_j, so misclassifications are allowed
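A quick empirical check of the box constraint, as a sketch assuming scikit-learn (not the lecture's code): every α_j (reported as y_j α_j in dual_coef_) is capped at C.

```python
# Sketch: in the soft-margin dual, |alpha_j| <= C for every support vector.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# Overlapping classes, so some points end up misclassified or inside the margin.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=1)

C = 0.5
clf = SVC(kernel="linear", C=C).fit(X, y)
print(np.abs(clf.dual_coef_).max() <= C + 1e-9)   # True: the alphas are bounded by C
```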
Support vectors
• Complementary slackness conditions:
  α_j* [ y_j (w* · x_j + b) − 1 + ξ_j* ] = 0   and   (C − α_j*) ξ_j* = 0
• Support vectors: points x_j such that y_j (w* · x_j + b) ≤ 1
  (includes all j such that α_j* > 0, but also possibly additional points
  where α_j* = 0 ∧ y_j (w* · x_j + b) = 1)
• Note: the SVM dual solution may not be unique!
Dual SVM interpretation: Sparsity
[Figure: margin boundaries w·x + b = +1 and w·x + b = −1 on either side of the separating hyperplane w·x + b = 0]
Final solution tends to be sparse:
• α_j = 0 for most j
• don't need to store these points to compute w or make predictions
Non-support vectors:
• α_j = 0
• moving them will not change w
Support vectors:
• α_j ≥ 0
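To see the sparsity numerically, a small sketch assuming scikit-learn (not part of the lecture): most α_j come out exactly zero, and only the support vectors need to be stored.

```python
# Sketch: inspect how sparse the dual solution is.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("training points: ", len(X))
print("support vectors: ", len(clf.support_))            # indices j with alpha_j > 0
print("y_j * alpha_j:   ", clf.dual_coef_.ravel()[:5])   # signed dual coefficients
# Predictions only need the support vectors; the rest of X can be discarded.
```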
SVM with kernels
• Never compute features explicitly!!!
  – Compute dot products in closed form
• Predict with:
  ŷ = sign( Σ_j α_j y_j K(x_j, x) + b )
• O(n²) time in size of dataset to compute objective
  – much work on speeding this up
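A sketch of what "predict with" looks like in code (the function and variable names here are illustrative, not from the lecture): the feature vectors Φ(x) are never built; K(x_j, x) is evaluated in closed form instead.

```python
# Sketch of kernelized prediction over the support vectors.
import numpy as np

def predict(x_new, X_sv, y_sv, alpha_sv, b, kernel):
    """sign( sum_j alpha_j y_j K(x_j, x_new) + b )"""
    k = np.array([kernel(x_j, x_new) for x_j in X_sv])
    return np.sign((alpha_sv * y_sv) @ k + b)

# e.g. a quadratic kernel evaluated in closed form:
quadratic = lambda u, v: (u @ v + 1) ** 2
```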
Quadratic kernel
[Tommi Jaakkola]
Quadratic kernel
Feature mapping given by:
[Cynthia Rudin]
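A concrete 2-D check, assuming the quadratic kernel K(u, v) = (u · v + 1)² and its standard explicit feature map: the kernel value equals a dot product in the expanded feature space.

```python
# Numerical check (2-D case): Phi(u) = (1, sqrt(2) u1, sqrt(2) u2, u1^2, u2^2, sqrt(2) u1 u2)
import numpy as np

def phi(u):
    u1, u2 = u
    return np.array([1.0, np.sqrt(2) * u1, np.sqrt(2) * u2,
                     u1 ** 2, u2 ** 2, np.sqrt(2) * u1 * u2])

u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print((u @ v + 1) ** 2)    # kernel, computed in closed form
print(phi(u) @ phi(v))     # same value via explicit features
```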
Common kernels
• Polynomials of degree exactly d:
  K(u, v) = (u · v)^d
• Polynomials of degree up to d:
  K(u, v) = (u · v + 1)^d
• Gaussian kernels:
  K(u, v) = exp( − ||u − v||² / 2σ² )   (||u − v||²: Euclidean distance, squared)
• And many others: very active area of research!
  (e.g., structured kernels that use dynamic programming to evaluate, string kernels, …)
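The same kernels written out as code, as a small sketch (σ and d are free parameters):

```python
import numpy as np

def poly_exact(u, v, d):      # polynomial of degree exactly d: (u . v)^d
    return (u @ v) ** d

def poly_up_to(u, v, d):      # polynomial of degree up to d: (u . v + 1)^d
    return (u @ v + 1) ** d

def gaussian(u, v, sigma):    # Gaussian/RBF: exp(-||u - v||^2 / (2 sigma^2))
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))
```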
Gaussian kernel
[Figure: decision surface of a Gaussian-kernel SVM; level sets, i.e. points where the decision function equals some constant r; support vectors highlighted]
[Cynthia Rudin] [mblondel.org]
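A minimal sketch (assuming scikit-learn; not the lecture's setup) of fitting a Gaussian-kernel SVM to data that is not linearly separable:

```python
# Gaussian-kernel SVM on concentric circles (gamma plays the role of 1/(2 sigma^2)).
from sklearn.svm import SVC
from sklearn.datasets import make_circles

X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma=2.0).fit(X, y)
print("training accuracy:       ", clf.score(X, y))
print("number of support vectors:", clf.support_vectors_.shape[0])
```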
Kernel algebra
Q: How would you prove that the “Gaussian kernel” is a valid kernel?
A: Expand the Euclidean norm as follows:
  exp( − ||u − v||² / 2σ² ) = exp( − ||u||² / 2σ² ) · exp( u · v / σ² ) · exp( − ||v||² / 2σ² )
To see that the middle factor is a kernel, use the Taylor series expansion of the exponential, together with repeated application of (a), (b), and (c):
  exp( u · v / σ² ) = Σ_{n=0}^{∞} (u · v)^n / (σ^{2n} n!)
Then, apply (e) from above.
The feature mapping is infinite dimensional!
[Justin Domke]
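A small numerical sanity check (a sketch, not a proof): on any finite sample, a valid kernel's Gram matrix must be positive semidefinite.

```python
# Check that a Gaussian-kernel Gram matrix has no (significantly) negative eigenvalues.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
sigma = 1.0

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2 * sigma ** 2))

print(np.linalg.eigvalsh(K).min() >= -1e-10)   # True, up to numerical error
```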
Overfitting?
• Huge feature space with kernels: should we worry about
overfitting?
– SVM objective seeks a solution with large margin
• Theory says that large margin leads to good generalization
(we will see this in a couple of lectures)
– But everything overfits sometimes!!!
– Can control overfitting by:
• Setting C
• Choosing a better kernel
• Varying parameters of the kernel (width of Gaussian, etc.)
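A sketch of the usual way to tune these knobs in practice, assuming scikit-learn: cross-validate over C and the Gaussian kernel width (gamma).

```python
# Grid search over C and gamma with 5-fold cross-validation.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.1, 1, 10, 100],
                                "gamma": [0.01, 0.1, 1, 10]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```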