Support Vector Machines ( . .
I SVM " " " "
.
T minimi e:
T L :
S L , , , unique
( ) . O " ", + , " "
ppo ec o . T . T " "
, ( ) . N ,
" " 2D . I 3D ;
, .
S L
. I , Q P . A SVM P ' SMO (S
M O ) . F SVM , ,
/ .
Useful Equations for solving SVM questions
A. Equations derived from optimizing the Lagrangian:
1. Partial of the Lagrangian with respect to b: $\sum_i \alpha_i y_i = 0$
N .
S ( ) 0.
2. Partial of the Lagrangian with respect to w: $w = \sum_i \alpha_i y_i x_i$
F .
T .
S
F (
).
S , .
B. Equations from the boundaries and constraints:
3. The Decision boundary: $\sum_i \alpha_i y_i K(x_i, x) + b = 0$
G , .
T ,
support vectors .
S
4. Positive gutter: $\sum_i \alpha_i y_i K(x_i, x) + b = +1$
G , .
For use when the Kernel is linear.
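5. Negative gutter: $\sum_i \alpha_i y_i K(x_i, x) + b = -1$
6. The width of the margin (or road): $m = \frac{2}{\|w\|}$
where $\|w\| = \sqrt{\sum_i w_i^2}$
Alternate formula for the two support vector case:
This equation is useful when solving SVM problems in 1D or 2D, where the width of the road can be visually determined.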
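The width-of-the-road formula is easy to check numerically. The sketch below assumes the standard form m = 2/||w||; the helper name `margin_width` and the sample weight vector are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def margin_width(w):
    """Width of the road, m = 2 / ||w||, for a weight vector w."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return 2.0 / norm

# Hypothetical weights: ||w|| = 5, so the road is 2/5 wide.
example_width = margin_width([3.0, 4.0])
print(example_width)
```

Scaling w by a constant c > 0 scales the width by 1/c, which is why the scale of w has to be pinned down by the gutter constraints.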
Common SVM Kernels:
Linear Kernel: $K(u, v) = u \cdot v$
In document classification, feature vectors are composed of binary word features: I(word=foo) outputs 1 if the word "foo" appears in the document, 0 if it does not.
Each document is represented as a vocabulary-length feature vector. The support vectors found are generally particularly salient documents (documents best at discriminating the topics being classified).
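As a rough illustration of binary word features with a linear kernel, the sketch below builds indicator vectors over a tiny vocabulary and takes their dot product. The vocabulary, the documents, and the function names are all hypothetical.

```python
def binary_features(document, vocabulary):
    """One indicator per vocabulary word: 1 if the word appears, else 0."""
    words = set(document.lower().split())
    return [1 if term in words else 0 for term in vocabulary]

def linear_kernel(u, v):
    """Plain dot product of two feature vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Hypothetical toy vocabulary and documents.
vocabulary = ["svm", "kernel", "margin", "football"]
doc_a = binary_features("the svm kernel trick", vocabulary)
doc_b = binary_features("svm margin and kernel width", vocabulary)

# With binary features the kernel counts vocabulary words shared by both documents.
shared = linear_kernel(doc_a, doc_b)
print(doc_a, doc_b, shared)
```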
Decomposable Kernels
Idea: Define a transform $\Phi$ that maps input vectors into a different (usually higher) dimensional space where the data is (more easily) linearly separable.
Example:

Polynomial Kernel (n > 1)
Example: Quadratic Kernel:
In 2D the resulting decision boundary can look parabolic, linear, or hyperbolic, depending on which terms in the expansion dominate.
Here is an expansion of the quadratic kernel, with u = [x, y]:
HW: Try this Kernel using Professor Winston's demo.
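To make the idea of a decomposable quadratic kernel concrete, here is a minimal sketch. It assumes the common form K(u, v) = (u . v + 1)^2; the constant 1 and the feature map `phi` are assumptions, since the exact formula is not preserved in these notes.

```python
import math

def quadratic_kernel(u, v):
    """Assumed form: K(u, v) = (u . v + 1)^2 for 2D inputs."""
    dot = u[0] * v[0] + u[1] * v[1]
    return (dot + 1.0) ** 2

def phi(u):
    """Explicit 6D feature map whose dot product reproduces the kernel."""
    x, y = u
    r2 = math.sqrt(2.0)
    return [x * x, y * y, r2 * x * y, r2 * x, r2 * y, 1.0]

u, v = (1.0, 2.0), (3.0, -1.0)
lhs = quadratic_kernel(u, v)                       # kernel evaluated directly
rhs = sum(a * b for a, b in zip(phi(u), phi(v)))   # dot product in the 6D space
print(lhs, rhs)
```

The x*x, y*y and x*y terms give parabolic or hyperbolic boundaries, and the x and y terms give linear ones, depending on which weights dominate.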
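Radial Basis Function (RBF) or Gaussian Kernel
Will fit almost any data. May exhibit overfitting when used improperly.
Similar to KNN but with all points having a vote; the weight of each vote is determined by a Gaussian. Points farther away get less of a vote than points nearby.
In 2D, generated decision boundaries resemble contour circles around clusters of +ve and -ve points. Support vectors are generally +ve or -ve points that are closest to the opposing cluster. The contour space drawn results from the sum of support vector Gaussians.
HW: Try this Kernel using Professor Winston's demo.
When $\sigma$ is large you get flatter Gaussians. When $\sigma$ is small you get sharper Gaussians. (Hence when using a small $\sigma$, contour density will appear closer / denser around support vector points.)
Here is the Kernel in 2D expanded out, with u = [x, y]:
As a point gets closer to a support vector, the kernel approaches exp(0) = 1. As a point moves far away from a support vector, it approaches exp(-infinity) = 0.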
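The limiting behavior of the Gaussian kernel can be sketched as follows. The form exp(-||u - v||^2 / (2 sigma^2)) is an assumption (one common convention); the exact normalization used in these notes is not preserved.

```python
import math

def rbf_kernel(u, v, sigma=1.0):
    """Assumed form: exp(-||u - v||^2 / (2 * sigma^2))."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

support_vector = (0.0, 0.0)
on_top = rbf_kernel(support_vector, (0.0, 0.0))      # distance 0: exp(0)
far_away = rbf_kernel(support_vector, (10.0, 10.0))  # large distance: near 0

# Same point at distance 1, seen through a sharp and a flat Gaussian.
sharp = rbf_kernel(support_vector, (1.0, 0.0), sigma=0.5)
flat = rbf_kernel(support_vector, (1.0, 0.0), sigma=2.0)
print(on_top, far_away, sharp, flat)
```

A larger sigma keeps the kernel value high farther from the support vector, which is the "flatter Gaussian" described above.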
Sigmoidal (tanh) Kernel
Allows for combination of linear decision boundaries.
Properties of tanh:
Similar to the sigmoid function.
Ranges from -1 to +1.
tanh(x) => +1 when x >> 0
tanh(x) => -1 when x << 0
Resulting decision boundaries are logical combinations of linear boundaries. Not too different from second-layer neurons in Neural Nets.
Like RBF, may exhibit overfitting when improperly used.
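The saturation of tanh can be checked with a small sketch. The kernel form tanh(scale * u . v + offset) and the parameter names `scale` and `offset` are assumptions; the notes' own formula is not preserved.

```python
import math

def tanh_kernel(u, v, scale=1.0, offset=0.0):
    """Assumed form: tanh(scale * (u . v) + offset)."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    return math.tanh(scale * dot + offset)

strongly_aligned = tanh_kernel((10.0, 10.0), (10.0, 10.0))    # argument >> 0
strongly_opposed = tanh_kernel((10.0, 10.0), (-10.0, -10.0))  # argument << 0
orthogonal = tanh_kernel((1.0, 0.0), (0.0, 1.0))              # argument = 0
print(strongly_aligned, strongly_opposed, orthogonal)
```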
Linear combination of Kernels
Idea: Kernel functions are closed under addition and scaling (by a positive number).
Scaling: $a K(u, v)$ for a > 0
Linear combination: $a K_1(u, v) + b K_2(u, v)$ for a, b > 0
Method 1 of Solving SVM parameters by inspection:
This is a step-by-step solution to Problem 2.A from 2006 quiz 4:
We are given the following graph with +ve and -ve points on the x-y axis: a +ve point x1 at (0, 0) and a -ve point x2 at (4, 4).
Can an SVM separate this? i.e. is it linearly separable? Heck yeah! Using the line above.
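Part 2A: Provide a decision boundary:
We can find the decision boundary by graphical inspection.
1. The decision boundary lies on the line: y = -x + 4
2. We have a +ve support vector at (0, 0) with line equation y = -x
3. We have a -ve support vector at (4, 4) with line equation y = -x + 8
Given the equation for the decision boundary, we next massage the algebra to get the decision boundary to conform with the desired form, namely: $w_1 x + w_2 y + b > 0$
1. $y < -x + 4$ (< because +ve is below the line)
2. $x + y - 4 < 0$
3. $-x - y + 4 > 0$ (multiplied by -1)
4. $(-1)x + (-1)y + 4 > 0$ (writing out the coefficients explicitly)
Now we can read the solution from the equation coefficients:
$w_1 = -1$, $w_2 = -1$, $b = 4$
Next, using our formula for width of road, we check that these weights give a road width of: $2/\|w\| = 2/\sqrt{2} = \sqrt{2}$.
WAIT! This is clearly not the width of the "widest" road/margin.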
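The mismatch can be checked numerically. This sketch takes the coefficients read straight off y = -x + 4, that is w = (-1, -1), and compares the resulting road width 2/||w|| with the gap between the two support vectors, which here sit directly across the boundary from each other.

```python
import math

# Weights read directly off the boundary equation -x - y + 4 > 0.
w = (-1.0, -1.0)
road_from_w = 2.0 / math.hypot(*w)      # 2 / ||w||

# The two support vectors from the problem statement.
x1, x2 = (0.0, 0.0), (4.0, 4.0)
widest_road = math.dist(x1, x2)         # gap the widest road must span

print(road_from_w, widest_road)
```

The first number is a factor of 4 too small, which is what motivates rescaling the boundary equation by a constant c.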
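We remember that any multiple c (c > 0) of the boundary equation is still the same decision boundary. So all equations of the form:
$-cx - cy + 4c > 0$
still describe this decision boundary. So here is a more general solution:
$w_1 = -c$, $w_2 = -c$, and $b = 4c$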
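Using The Width of the Road Constraint
Graphically we see that the widest margin should be: $4\sqrt{2}$
The optimal weight vector and intercept can be solved by solving for c constrained by the width of the road.
Length of w in terms of c: $\|w\| = \sqrt{c^2 + c^2} = c\sqrt{2}$
Now plugging all this into the margin width equation and solving for c, we get:
$\frac{2}{\|w\|} = 4\sqrt{2}$ => $\frac{2}{c\sqrt{2}} = 4\sqrt{2}$ => $c = \frac{2}{8}$ => $c = \frac{1}{4}$
This means the true weight vector and intercept for the SVM solution should be:
$w = [-\frac{1}{4}, -\frac{1}{4}]$ and $b = 1$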
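A short sketch of this step: solve 2/(c*sqrt(2)) = 4*sqrt(2) for c, build w = (-c, -c) and b = 4c, and confirm that the two support vectors land exactly on the gutters. The function name `h` for the decision function is an assumption.

```python
import math

target_width = 4 * math.sqrt(2)              # widest road, read off the graph

# ||w|| = c * sqrt(2), and 2 / ||w|| must equal the target width.
c = 2.0 / (target_width * math.sqrt(2))
w = (-c, -c)
b = 4 * c

def h(point):
    """Decision function w . x + b for a 2D point."""
    return w[0] * point[0] + w[1] * point[1] + b

positive_gutter = h((0.0, 0.0))   # +ve support vector
negative_gutter = h((4.0, 4.0))   # -ve support vector
print(c, w, b, positive_gutter, negative_gutter)
```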
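Next we solve for the alphas, using the w vector equation and Equation 1.
Plugging in the support vector values of x1 and x2: $[-\frac{1}{4}, -\frac{1}{4}] = \alpha_1 (+1) [0, 0] + \alpha_2 (-1) [4, 4]$
We get two identical equations: $-\frac{1}{4} = -4\alpha_2$, so $\alpha_2 = \frac{1}{16}$
Using Equation 1, we can solve for the other alpha: $\alpha_1 - \alpha_2 = 0$, so $\alpha_1 = \frac{1}{16}$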
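The same two steps can be sketched with exact fractions, starting from w = (-1/4, -1/4) with a +ve support vector at (0, 0) and a -ve support vector at (4, 4). Variable names are illustrative.

```python
from fractions import Fraction

w = (Fraction(-1, 4), Fraction(-1, 4))
x1, y1 = (0, 0), +1     # +ve support vector
x2, y2 = (4, 4), -1     # -ve support vector

# w = alpha1*y1*x1 + alpha2*y2*x2; x1 is the origin, so only alpha2 appears.
alpha2 = w[0] / (y2 * x2[0])

# Equation 1: alpha1*y1 + alpha2*y2 = 0.
alpha1 = -alpha2 * y2 / y1

# Rebuild w from the alphas as a consistency check.
w_check = tuple(alpha1 * y1 * a + alpha2 * y2 * b for a, b in zip(x1, x2))
print(alpha1, alpha2, w_check)
```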
Part 2B: Does the boundary change if a +ve point x3 is added at (-1, -1)?
No. The support vectors are still at points 1 and 2. The decision boundary stays the same.
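One way to see why the boundary does not move: a new +ve point only matters if it violates the constraint w . x + b >= 1. The sketch below assumes the solved values w = (-1/4, -1/4), b = 1 and takes the added point to be (-1, -1) (the minus signs are an assumption, since a +ve point at (1, 1) would fall inside the margin).

```python
w = (-0.25, -0.25)
b = 1.0

def h(point):
    """Decision function w . x + b for a 2D point."""
    return w[0] * point[0] + w[1] * point[1] + b

x3 = (-1.0, -1.0)                 # assumed location of the added +ve point
outside_margin = h(x3) >= 1.0     # True means the constraint is already met
print(h(x3), outside_margin)
```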
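Part 2C: What if point 2 (-ve) is moved to coordinate (k, k)?
How will the alpha values change: increase, decrease, or stay the same? When k = 2? and k = 8?
Answer: Go back to how we solved for alpha:
Plugging in x2: $w = -\alpha_2 [k, k]$
Solving for $\alpha_2$: $\alpha_2 = \frac{\|w\|}{k\sqrt{2}}$
Using the fact that $\|w\| = \sqrt{w_1^2 + w_2^2}$,
and width of road/margin $m = \frac{2}{\|w\|}$.
We express alpha in terms of the margin m: $\alpha_2 = \frac{\sqrt{2}}{m k}$
Answer:
When k changes from 4 to 2: the margin (road width) is halved and k is also halved. So alpha increases by a factor of 4.
When k changes from 4 to 8: the margin is doubled and k is also doubled. So alpha decreases by a factor of 4.
Though we don't provide a full proof here, alpha in general changes inversely with m.
Wider road -> lower alpha. Narrower road -> higher alpha.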
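The scaling can be checked by redoing the Part 2A derivation for a general k. With the +ve support vector at (0, 0) and the -ve one at (k, k), the margin is k*sqrt(2), so w = (-1/k, -1/k), and w = -alpha * (k, k) gives alpha = 1/k^2. The function name is hypothetical.

```python
from fractions import Fraction

def alpha_for_k(k):
    """Alpha of each support vector when the -ve point sits at (k, k)."""
    # margin = k*sqrt(2)  =>  ||w|| = sqrt(2)/k  =>  w = (-1/k, -1/k)
    # w = -alpha * (k, k) =>  alpha = 1 / k^2
    return Fraction(1, k * k)

baseline = alpha_for_k(4)
narrower = alpha_for_k(2)   # road halved
wider = alpha_for_k(8)      # road doubled
print(baseline, narrower, wider)
```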
Method 2: Solving for alpha, b, and w without visual inspection (By computing Kernels and solving Constraint equations)
Example from 2005 Final Exam.
In this problem you are told that you have the following points.
-ve points: A at (0, 0), B at (1, 1)
+ve points: C at (2, 0)
and that these points lie on the gutter in the SVM max margin solution.
Step 1. Compute all kernel function values, which in this case are all dot products.
K(A, A) = 0*0+0*0 = 0 K(A, B) = 0*1+0*1 = 0 K(A, C) = 0*2+0*0 = 0
K(B, A) = 1*0+1*0 = 0 K(B, B) = 1*1+1*1 = 2 K(B, C) = 1*2+1*0 = 2
K(C, A) = 2*0+0*0 = 0 K(C, B) = 2*1+0*1 = 2 K(C, C) = 2*2+0*0 = 4
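The kernel table can be generated mechanically, which also guards against arithmetic slips (note that K(C, A) = 2*0 + 0*0 = 0). A minimal sketch, with the dictionary layout chosen for illustration:

```python
points = {"A": (0, 0), "B": (1, 1), "C": (2, 0)}

def dot(u, v):
    """Linear kernel: plain dot product."""
    return sum(ui * vi for ui, vi in zip(u, v))

# Kernel value for every ordered pair of points.
K = {(p, q): dot(points[p], points[q]) for p in points for q in points}

for p in points:
    print("  ".join(f"K({p},{q})={K[(p, q)]}" for q in points))
```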
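Step 2: Write out the system of equations, using SVM constraints:
Constraint 1: $\sum_i \alpha_i y_i = 0$,
Constraint 2: $\sum_i \alpha_i y_i K(x_i, x) + b = +1$, the positive gutter.
Constraint 3: $\sum_i \alpha_i y_i K(x_i, x) + b = -1$, the negative gutter.
This will yield 4 equations.
Each entry in an alpha column is the coefficient $y_i K(x_i, x)$ multiplying that alpha.
        alpha_A      alpha_B       alpha_C      b     =
C1      -1           -1            +1           0     0
C3.A    -1*0 = 0     -1*0 = 0      +1*0 = 0     +1    -1
C3.B    -1*0 = 0     -1*2 = -2     +1*2 = 2     +1    -1
C2.C    -1*0 = 0     -1*2 = -2     +1*4 = 4     +1    +1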
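For clarity here are the four equations:
C1: $-\alpha_A - \alpha_B + \alpha_C = 0$
C3.A: $b = -1$
C3.B: $-2\alpha_B + 2\alpha_C + b = -1$
C2.C: $-2\alpha_B + 4\alpha_C + b = +1$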
Step 3: Use our favorite method of solving linear equations to solve for the 4 unknowns.
Answer:
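One way to carry out Step 3 is plain Gauss-Jordan elimination. The sketch below builds the four equations from the points and labels given above (using K(C, A) = 2*0 + 0*0 = 0) and solves them with exact fractions. The helper `solve` and the variable names are illustrative, not part of the original notes.

```python
from fractions import Fraction

points = {"A": (0, 0), "B": (1, 1), "C": (2, 0)}
labels = {"A": -1, "B": -1, "C": +1}
names = ["A", "B", "C"]

def K(p, q):
    """Linear kernel between two named points."""
    return sum(a * b for a, b in zip(points[p], points[q]))

def solve(rows):
    """Gauss-Jordan elimination on an augmented matrix, with row swaps."""
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * p for a, p in zip(m[r], m[col])]
    return [row[-1] for row in m]

# Unknowns: alpha_A, alpha_B, alpha_C, b. Last column is the right-hand side.
rows = [[labels[n] for n in names] + [0, 0]]                  # sum alpha_i y_i = 0
for p in names:                                               # gutter equation at p
    rows.append([labels[q] * K(q, p) for q in names] + [1, labels[p]])

alpha_A, alpha_B, alpha_C, b = solve(rows)
alphas = {"A": alpha_A, "B": alpha_B, "C": alpha_C}
w = tuple(sum(alphas[n] * labels[n] * points[n][i] for n in names) for i in range(2))
print(alpha_A, alpha_B, alpha_C, b, w)
```

Under these inputs the solve gives alpha_A = 0, alpha_B = 1, alpha_C = 1, b = -1 and w = (1, -1), so point A sits on the gutter but carries zero weight.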
This is a more general way to solve SVM parameters, without the help of geometry. This method can be applied to problems
where "margin" width or boundary equation cannot be derived by inspection (e.g. > 2D).
NOTE: We used the gutter constraints as equalities above because we are told that the given points lie on the "gutter".
More realistically, if we were given more points, and not all points lay on the gutters, then we would be solving a system of
inequalities (because the gutter equations are really constraints on >= 1 or <= -1).
In the quadratic programming solvers used to solve SVMs, we are in fact doing just that: we are minimizing a target function
by subjecting it to a system of linear inequality constraints.
Example of SVMs with a Non-Linear Kernel
From Part 2E of 2006 Q4. You are given the graph below and the following kernel:
and you are asked to solve for the equation of the decision boundary.
Step 1: First, decompose the kernel into a dot product of functions:
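Answer:
Step 2: Convert all original points into the new space using the transform. (We are going from 2D to 1D.)
Positive points are at:
Negative points are at:
Step 3: Plot the points in the new space; this appears as a line from 0 to 8,
with positive points at 0, 2, 4 and negative points at 6, 8.
The support vectors lie on either side of the gap between the two classes (between values of 4 and 6).
Hence the decision boundary (maximum margin) should be: $\Phi(u) < 5$, where $\Phi(u)$ is the transformed 1D value.
The < is due to the positive points all being less than 5.
Expanding the determined decision boundary in terms of the components of u, we get:
Square both sides:
Cleaner (standard form):
This is a circle with radius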
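Once the points are mapped to 1D, the maximum-margin boundary is the midpoint between the closest opposing points. The sketch below works only with the transformed values stated above (0, 2, 4 positive; 6, 8 negative) and does not assume any particular kernel or transform.

```python
positives = [0, 2, 4]    # transformed +ve points
negatives = [6, 8]       # transformed -ve points

inner_positive = max(positives)   # +ve support vector in 1D
inner_negative = min(negatives)   # -ve support vector in 1D

threshold = (inner_positive + inner_negative) / 2   # max-margin boundary
margin = inner_negative - inner_positive            # width of the 1D road

def classify(value):
    """+1 below the threshold, -1 at or above it."""
    return +1 if value < threshold else -1

print(threshold, margin)
```

Substituting the transform back in for the 1D value is what turns this threshold into a curve in the original 2D space.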
An Abstract Lesson on Support Vector Behavior
Suppose you have the above set of points. Let's solve the SVM parameters by inspection.
1. Boundary equation:
=> =>
2. Read off the w and b and multiply by c (c > 0):
3. Now apply the width of the road/margin constraint:
Plugging in the length of w, and solving for c:
=>
4. Now we have the SVM's optimal w and b:
5. Next, solve for the alphas using the Lagrangian equations:
and
a) From expanding the first equation, we get:
which leads to equations:
and
b) From expanding the second equation, we get:
c) Putting the equations from a) and b) together we can solve for the alphas.
and similarly for
We ee ha he + e ec a ha a e i ba ed he a i f di a ce de e i ed b a d . If = ee
e a , he = =
Observation A:
Q: Suppose we moved point A to the origin at (0, 0). What happens to the alphas?
A: Thi c fig a i ba ica i ie = 0; e ge : a d .
Conceptually, point A becomes the sole primary support vector because point A is directly across from point B. Point
A takes up all the share of the "pressure" in holding up the margin; point C, though still on the gutter, effectively becomes a
non-support vector. So this implies that points on the gutter may not always serve the role of being a support vector.
Observation B:
Q: Suppose we changed k, by moving point B up or down the axis: what happens to the alphas?
A: All the alphas are proportional to
If k decreases, the road narrows, the alphas increase. Analogy: the supports need to apply more "pressure" to push the
margin tighter.
If k increases, the road widens, the alphas decrease. Analogy: a wider road needs less "pressure" on the supports to hold it in
place.