SVMC
An introduction to Support Vector Machines Classification
6.783, Biomedical Decision Support
Lorenzo Rosasco
([email protected])
Department of Brain and Cognitive Science
MIT
Friday, October 30, 2009
A typical problem
We have a cohort of patients from two groups, say A and B.
We wish to devise a classification rule to
distinguish patients of one group from
patients of the other group.
Learning and Generalization
Goal: correctly classify new patients.
Plan
1. Linear SVM
2. Non Linear SVM: Kernels
3. Tuning SVM
4. Beyond SVM: Regularization Networks
Learning from Data
To make predictions we need information about the patients:
patient 1: $x = (x_1, \dots, x_n)$
patient 2: $x = (x_1, \dots, x_n)$
...
patient $\ell$: $x = (x_1, \dots, x_n)$
Linear model
Patients of class A are labeled y=1
Patients of class B are labeled y=-1
Linear model:
$$ w \cdot x = \sum_{j=1}^{n} w_j x_j $$
Classification rule:
$$ \mathrm{sign}(w \cdot x) $$
1D Case
[Figure: a 1D example; points with $y=1$ lie where $w \cdot x > 0$, points with $y=-1$ where $w \cdot x < 0$, and the decision boundary is $w \cdot x = 0$.]
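To make the rule concrete, here is a minimal numpy sketch of the classification rule (the weights and patient vectors are hypothetical, not from the slides):

```python
import numpy as np

# Hypothetical weight vector and patients (n = 3 features each).
w = np.array([0.5, -1.2, 0.3])
patients = np.array([
    [1.0, 0.2, 0.7],   # w.x = 0.47 > 0, classified as y = +1
    [0.1, 1.5, -0.4],  # w.x = -1.87 < 0, classified as y = -1
])

# Classification rule: sign(w . x) for each patient.
predictions = np.sign(patients @ w)
print(predictions)  # [ 1. -1.]
```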
How do we find a good solution?
[Figure: a 2D classification problem; each point is $x = (x_1, x_2)$, labeled $y=1$ or $y=-1$.]
[Figure: a sequence of candidate separating hyperplanes $w \cdot x = 0$, each with $w \cdot x > 0$ on one side and $w \cdot x < 0$ on the other; many different hyperplanes separate the training data equally well.]
[Figure: the maximum margin hyperplane; the margin M is shown between the two classes.]
The margin M measures the distance between the two closest points of the two classes.
Maximum Margin Hyperplane
...with little effort one can show that maximizing the margin M is equivalent to maximizing
$$ \frac{1}{\|w\|} $$

Linear and Separable SVM
$$ \min_{w \in \mathbb{R}^n} \|w\|^2 \quad \text{subject to: } y_i (w \cdot x_i) \ge 1, \; i = 1, \dots, \ell $$

Bias and Slack
The SVM introduced by Vapnik includes an unregularized bias term b, leading to classification via a function of the form:
$$ f(x) = \mathrm{sign}(w \cdot x + b) $$
Typically the off-set term is added to the solution. In practice, we also want to work with datasets that are not linearly separable.
A more general Algorithm

There are two things we would like to improve:
1. Allow for errors
2. Non Linear Models

Measuring errors
[Figure: slack variables $\xi_i$ measure how much each training point violates the margin.]
The New Primal: Linear SVM with Slack Variables
With slack variables, the primal SVM problem becomes:
$$ \min_{w \in \mathbb{R}^n,\; \xi \in \mathbb{R}^\ell,\; b \in \mathbb{R}} \; C \sum_{i=1}^{\ell} \xi_i + \frac{1}{2}\|w\|^2 $$
$$ \text{subject to: } y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad i = 1, \dots, \ell $$
$$ \qquad\qquad \xi_i \ge 0, \quad i = 1, \dots, \ell $$
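As an illustration, here is a minimal sketch of this primal problem using the cvxpy solver, a tool choice of mine (the toy data are hypothetical):

```python
import cvxpy as cp
import numpy as np

# Hypothetical toy data: X is (l, n), labels y in {-1, +1}.
X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
l, n = X.shape
C = 1.0  # regularization parameter

w = cp.Variable(n)
b = cp.Variable()
xi = cp.Variable(l)  # slack variables

# Primal objective: C * sum(xi) + (1/2) ||w||^2
objective = cp.Minimize(C * cp.sum(xi) + 0.5 * cp.sum_squares(w))
# Constraints: y_i (w . x_i + b) >= 1 - xi_i and xi_i >= 0
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)
```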
Optimization
How do we solve this minimization problem?
(...and why do we call it SVM anyway?)
Some facts
Representer Theorem
Dual Formulation
Box Constraints and Support Vectors
Representer Theorem
The solution to the minimization problem
can be written as
$$ w \cdot x = \sum_{i=1}^{\ell} c_i (x \cdot x_i) $$

Dual Problem

In terms of the coefficients c, the primal problem becomes:
$$ \min_{c \in \mathbb{R}^\ell,\; b \in \mathbb{R},\; \xi \in \mathbb{R}^\ell} \; C \sum_{i=1}^{\ell} \xi_i + \frac{1}{2} c^T K c $$
$$ \text{subject to: } y_i \Big( \sum_{j=1}^{\ell} c_j K(x_i, x_j) + b \Big) \ge 1 - \xi_i, \quad i = 1, \dots, \ell $$
$$ \qquad\qquad \xi_i \ge 0, \quad i = 1, \dots, \ell $$

The coefficients can be found by solving the dual:
$$ \max_{\alpha \in \mathbb{R}^\ell} \; \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \alpha^T Q \alpha $$
$$ \text{subject to: } \sum_{i=1}^{\ell} y_i \alpha_i = 0 $$
$$ \qquad\qquad 0 \le \alpha_i \le C, \quad i = 1, \dots, \ell $$

Here $Q_{ij} = y_i y_j (x_i \cdot x_j)$ and $\alpha_i = c_i / y_i$.
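Correspondingly, a sketch of the dual as a quadratic program in $\alpha$ (same hypothetical toy data as above; the small jitter on Q is a numerical convenience for the solver, not part of the formulation):

```python
import cvxpy as cp
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
l, C = len(y), 1.0

# Q_ij = y_i y_j (x_i . x_j); the jitter keeps Q numerically PSD.
Q = np.outer(y, y) * (X @ X.T) + 1e-8 * np.eye(l)

alpha = cp.Variable(l)
objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.quad_form(alpha, Q))
constraints = [cp.sum(cp.multiply(y, alpha)) == 0, alpha >= 0, alpha <= C]
cp.Problem(objective, constraints).solve()

print(alpha.value)  # alpha_i > 0 only for the support vectors
```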
Optimality Conditions: Determining b

With little effort one can show the following. Suppose we have the optimal $\alpha_i$'s. Also suppose (this happens in practice) that there exists an $i$ satisfying $0 < \alpha_i < C$. Then:
$$ \alpha_i < C \implies \xi_i = 0 $$
$$ \alpha_i > 0 \implies y_i \Big( \sum_{j=1}^{\ell} y_j \alpha_j K(x_i, x_j) + b \Big) - 1 + \xi_i = 0 $$
$$ \implies b = y_i - \sum_{j=1}^{\ell} y_j \alpha_j K(x_i, x_j) $$

The solution is sparse: training points with $\alpha_i = 0$ do not contribute to the solution.
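A sketch of how one might recover b and the support vectors from a dual solution (the function name and tolerance are my own illustration):

```python
import numpy as np

def offset_and_support_vectors(alpha, y, K_mat, C, tol=1e-6):
    """Recover b and the support-vector indices from a dual solution alpha."""
    sv = np.where(alpha > tol)[0]  # points with alpha_i > 0 are support vectors
    # Any i with 0 < alpha_i < C sits exactly on the margin, so y_i f(x_i) = 1.
    i = np.where((alpha > tol) & (alpha < C - tol))[0][0]
    b = y[i] - np.sum(y * alpha * K_mat[:, i])
    return b, sv
```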
Sparse Solution
Note that the solution depends only on the training set points (no dependence on the number of features!).
Feature Map
$$ \Phi : X \to F $$
Hyperplanes in the feature space,
$$ f(x) = w \cdot \Phi(x) = \langle w, \Phi(x) \rangle, $$
are non linear functions in the original space.
A Key Observation

$$ \min_{c \in \mathbb{R}^\ell,\; b \in \mathbb{R},\; \xi \in \mathbb{R}^\ell} \; C \sum_{i=1}^{\ell} \xi_i + \frac{1}{2} c^T K c $$
$$ \text{subject to: } y_i \Big( \sum_{j=1}^{\ell} c_j K(x_i, x_j) + b \Big) \ge 1 - \xi_i, \quad i = 1, \dots, \ell $$
$$ \qquad\qquad \xi_i \ge 0, \quad i = 1, \dots, \ell $$

$$ \max_{\alpha \in \mathbb{R}^\ell} \; \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \alpha^T Q \alpha $$
$$ \text{subject to: } \sum_{i=1}^{\ell} y_i \alpha_i = 0, \qquad 0 \le \alpha_i \le C, \quad i = 1, \dots, \ell $$

The solution depends only on $Q_{ij} = y_i y_j (x_i \cdot x_j)$.

Idea: use $Q_{ij} = y_i y_j (\Phi(x_i) \cdot \Phi(x_j))$.
Kernels and Feature Maps

The crucial quantity is the inner product
$$ K(x, t) = \Phi(x) \cdot \Phi(t), $$
called the Kernel.

A function is called a Kernel if it is:
1. symmetric
2. positive definite
Examples of Kernels

Very common examples of symmetric pd kernels are:

Linear kernel:
$$ K(x, x') = x \cdot x' $$
Gaussian kernel:
$$ K(x, x') = e^{-\frac{\|x - x'\|^2}{2\sigma^2}}, \quad \sigma > 0 $$
Polynomial kernel:
$$ K(x, x') = (x \cdot x' + 1)^d, \quad d \in \mathbb{N} $$

For specific applications, designing an effective kernel is a challenging problem.
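A minimal numpy sketch of these three kernels (the function names are mine):

```python
import numpy as np

def linear_kernel(x, t):
    return np.dot(x, t)

def gaussian_kernel(x, t, sigma=1.0):
    return np.exp(-np.linalg.norm(x - t) ** 2 / (2 * sigma ** 2))

def polynomial_kernel(x, t, d=2):
    return (np.dot(x, t) + 1) ** d
```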
Non Linear SVM

Summing up:
1. Define the Feature Map either explicitly or via a kernel
2. Find a linear solution in the Feature space
3. Use the same solver as in the linear case

The representer theorem now gives:
$$ w \cdot \Phi(x) = \sum_{i=1}^{\ell} c_i (\Phi(x) \cdot \Phi(x_i)) = \sum_{i=1}^{\ell} c_i K(x, x_i) $$
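In code, the learned function is evaluated through this kernel expansion; a sketch with hypothetical coefficients c and bias b (any kernel function, such as the ones above, can be plugged in):

```python
import numpy as np

def predict(x_new, X_train, c, b, kernel):
    """Evaluate f(x) = sum_i c_i K(x, x_i) + b and classify by its sign."""
    fx = sum(c_i * kernel(x_new, x_i) for c_i, x_i in zip(c, X_train)) + b
    return np.sign(fx)
```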
Example in 1D
[Figure: a 1D example where a non linear function separates the classes $y=1$ and $y=-1$.]
Software
Good Large-Scale Solvers
SVM Light: http://svmlight.joachims.org
SVM Torch: http://www.torch.ch
libSVM:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Model Selection
We have to fix the regularization parameter C.
We have to choose the kernel (and its parameters).

Using default values is usually a BAD, BAD idea.
Regularization Parameter
With slack variables, the primal SVM problem becomes:
$$ \min_{w \in \mathbb{R}^n,\; \xi \in \mathbb{R}^\ell,\; b \in \mathbb{R}} \; C \sum_{i=1}^{\ell} \xi_i + \frac{1}{2}\|w\|^2 $$
$$ \text{subject to: } y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, \ell $$

Large C: we try to minimize errors, ignoring the complexity of the solution.
Small C: we ignore the errors to obtain a simple solution.
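One way to see this trade-off empirically; a sketch using scikit-learn (my tool choice, on synthetic data):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=100))  # noisy linear labels

for C in [0.01, 0.1, 1, 10, 100]:
    scores = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=5)
    print(f"C={C:>6}: CV accuracy = {scores.mean():.3f}")
```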
Which Kernel?

For very high dimensional data the linear kernel is often the default choice:
1. it allows computational speed ups
2. it is less prone to overfitting

The Gaussian kernel with proper tuning is another common choice.

Whenever possible, use prior knowledge to build problem specific features or kernels.
2D demo
Large and Small Margin Hyperplanes
[Figure: two panels, (a) and (b), comparing a large margin and a small margin hyperplane.]
Practical Rules
We can choose C (and the kernel parameters) via cross validation:
1. Holdout set: split the data into a Training Set and a Validation Set
2. K-fold cross validation; K = # of examples is called Leave One Out
K-Fold CV
We have to compute several solutions...
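For instance, a cross-validated grid search over C and the Gaussian kernel parameter; a scikit-learn sketch on synthetic data (the grid values are illustrative):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=100))

# gamma plays the role of 1/(2 sigma^2) in scikit-learn's RBF kernel.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```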
A Rule of Thumb

This is how the CV error typically looks:
[Figure: simulation curves of log10 GCKL, log10 GACV, log10 BR MISCLASS, and log10 BRXA as functions of the tuning parameters; the CV-type error has a minimum over the parameter range.]

Fix a reasonable kernel, then fine tune C.
Which values do we start from?
For the Gaussian kernel, pick $\sigma$ of the order of the average distance between points.

Take min (and max) C as the value for which the training set error does not increase (decrease) anymore.
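A sketch of the $\sigma$ heuristic, reading "average distance" as the mean pairwise distance (the gamma conversion follows scikit-learn's RBF convention):

```python
import numpy as np
from scipy.spatial.distance import pdist

X = np.random.default_rng(0).normal(size=(100, 2))  # hypothetical training inputs

# sigma of the order of the average pairwise distance between points
sigma = pdist(X).mean()
gamma = 1.0 / (2 * sigma ** 2)  # equivalent RBF parameter in scikit-learn
print(sigma, gamma)
```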
Computational Considerations
The training time depends on the parameters: the more we fit, the slower the algorithm.

Typically the computational burden is in the selection of the regularization parameter (solvers for the regularization path help here).
Regularization Networks
SVMs are an example of a family of algorithms of the form:
$$ C \sum_{i=1}^{\ell} V(y_i, w \cdot \Phi(x_i)) + \|w\|^2 $$
V is called the loss function.
Hinge Loss
[Figure: the 0-1 loss and the hinge loss $V = \max(0,\, 1 - y\, w \cdot \Phi(x))$, plotted as functions of $y\, w \cdot \Phi(x)$. SVM corresponds to the hinge choice of V.]
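A one-line numpy version of the hinge loss, for reference (the max(0, 1 − yf) form; the slide only shows the plot):

```python
import numpy as np

def hinge_loss(y, fx):
    """Hinge loss max(0, 1 - y*f(x)); compare with the 0-1 loss (y*fx < 0)."""
    return np.maximum(0.0, 1.0 - y * fx)
```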
Loss Functions
[Figure: plots of several common loss functions.]
Representer Theorem
For a LARGE class of loss functions:
$$ w \cdot \Phi(x) = \sum_{i=1}^{\ell} c_i (\Phi(x) \cdot \Phi(x_i)) = \sum_{i=1}^{\ell} c_i K(x, x_i) $$
The way we compute the coefficients depends on the considered loss function.
Regularized LS
The simplest, yet powerful, algorithm is probably RLS.

Square loss:
$$ V(y, w \cdot \Phi(x)) = (y - w \cdot \Phi(x))^2 $$

Algorithm:
$$ \Big( Q + \frac{1}{C} I \Big) c = y, \qquad Q_{i,j} = K(x_i, x_j) $$

Leave one out can be computed at the price of one (!!!) solution.
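A numpy sketch of RLS together with the leave-one-out shortcut (I take the slide to allude to the standard identity $y_i - f^{(-i)}(x_i) = (y_i - f(x_i))/(1 - G_{ii})$ with $G = Q(Q + I/C)^{-1}$):

```python
import numpy as np

def rls_fit_loo(Q, y, C):
    """Solve (Q + I/C) c = y; also return leave-one-out residuals in one shot."""
    A = Q + np.eye(len(y)) / C
    c = np.linalg.solve(A, y)
    G = Q @ np.linalg.inv(A)                 # maps y to the training predictions Qc
    loo = (y - Q @ c) / (1.0 - np.diag(G))   # LOO residuals without refitting
    return c, loo
```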
Summary
Separable, Linear SVM
Non Separable, Linear SVM
Non Separable, Non Linear SVM
How to use SVM