
UET

Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

INT3405 - Machine Learning


Lecture 5: Classification (P3) - SVM
Duc-Trong Le & Hoang Van Xiem

Hanoi, 03/2024
Outline
● Problem and Intuition
● Formulation of Linear SVM
○ Hard Margin SVM
○ Soft Margin SVM
○ Primal/dual Problems
● Nonlinear SVM with Kernel
○ Kernel Tricks
○ SVM with Kernel
● Multi-class classification

Recap: Bayes Theorem & Decision Boundary
● Bayes theorem: P(y | x) = P(x | y) P(y) / P(x), i.e., posterior ∝ likelihood × prior
● The decision boundary is the set of points where the class posteriors are equal


History
● SVMs were introduced in COLT-92 by Boser, Guyon & Vapnik and have been popular ever since
● Theoretically well-motivated algorithm: developed from Statistical Learning Theory (Vapnik & Chervonenkis) since the 1960s
● Empirically good performance: successful applications in many fields (bioinformatics, text, image recognition, ...)
● Centralized website: www.kernel-machines.org


Problem Setting
● Problem setting
○ Training data: {(x_i, y_i)}, i = 1, …, N, with x_i ∈ R^d
○ For two-class (binary) classification: y_i ∈ {−1, +1}
● Goal
○ To find an optimal hyperplane (decision boundary) w^T x + b = 0 that separates all the data


Intuition

● One possible solution



Intuition

● Another possible solution


Intuition

● Too many other possible solutions


Intuition

● Which one is better than the others?
● How do we define “better”?


Intuition: Maximum Margin
● Intuition of “margin”
○ The margin of a linear classifier is the width by which the boundary could be increased before hitting a data point
● Idea of SVM
○ Find the separating hyperplane that maximizes the margin


Support Vector Machines (SVM)
● The data points closest to the hyperplane (those lying on the margin) are the support vectors; they alone determine the maximum-margin solution


SVM: Optimization Formulation (1)
● From margin to norm
○ Margin: the distance between the two parallel hyperplanes w^T x + b = +1 and w^T x + b = −1, which equals 2 / ||w||
○ Maximizing the margin is therefore equivalent to minimizing (1/2) ||w||^2
● Constraints
○ Separation with margin, i.e.,
  w^T x_i + b ≥ +1 if y_i = +1
  w^T x_i + b ≤ −1 if y_i = −1
○ Simplified as the equivalent constraint y_i (w^T x_i + b) ≥ 1 for i = 1, …, N
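The margin formula follows from the point-to-hyperplane distance; a short derivation (standard, not shown explicitly on the slide):

```latex
d\bigl(x, \{w^\top x + b = 0\}\bigr) = \frac{|w^\top x + b|}{\lVert w \rVert}
\quad\Rightarrow\quad
\text{margin} = \frac{1}{\lVert w \rVert} + \frac{1}{\lVert w \rVert} = \frac{2}{\lVert w \rVert}.
```

(The two terms are the distances from the hyperplane to the closest positive and negative points, for which w^T x + b = ±1.)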


SVM: Optimization Formulation (2)
● SVM as a Quadratic Programming (QP) problem:
  min_{w,b} (1/2) ||w||^2   s.t.  y_i (w^T x_i + b) ≥ 1, i = 1, …, N
○ Convex problem; has a unique minimum
○ Quadratic objective function
○ Linear inequality constraints
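As a concrete (non-lecture) illustration, here is a minimal sketch of solving this QP with the cvxpy modeling library on a synthetic separable dataset; the data and variable names are made up for the example:

```python
import cvxpy as cp
import numpy as np

# Toy separable data: two well-separated 2-D clusters (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
               rng.normal(+2.0, 0.5, size=(20, 2))])
y = np.array([-1.0] * 20 + [+1.0] * 20)

w = cp.Variable(2)
b = cp.Variable()

# Hard-margin primal: minimize (1/2)||w||^2  s.t.  y_i (w^T x_i + b) >= 1.
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                  [cp.multiply(y, X @ w + b) >= 1])
prob.solve()

print("w =", w.value, " b =", b.value)
print("margin width = 2/||w|| =", 2 / np.linalg.norm(w.value))
```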


Linearly Non-separable Cases
● What if the data are not linearly separable?
● In such cases, the hard margin SVM cannot be applied directly


Soft Margin SVM
● Standard linear SVM
○ Introduce slack variables ξ_i ≥ 0
○ Relax the constraints: y_i (w^T x_i + b) ≥ 1 − ξ_i
○ Penalize the relaxation
● Primal problem:
  min_{w,b,ξ} (1/2) ||w||^2 + C Σ_i ξ_i   s.t.  y_i (w^T x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0
● C is a regularization parameter: the soft margin SVM trades off maximizing the margin against minimizing the misclassification (slack) penalty


Linearly Non-separable Case
● Re-written as an unconstrained optimization (hinge loss):
  min_{w,b} (1/2) ||w||^2 + C Σ_i max(0, 1 − y_i (w^T x_i + b))
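Since this objective is an unconstrained sum of convex terms, it can be minimized by (sub)gradient descent. A minimal NumPy sketch, assuming labels in {−1, +1}; the learning rate and epoch count are illustrative choices:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.001, epochs=500):
    """Subgradient descent on (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # points that violate the margin (nonzero hinge loss)
        # Subgradient of the hinge term is -y_i x_i on violating points, 0 elsewhere.
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```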


Linearly Non-separable Case
● The objective balances model complexity against training error:

  Model                            Model complexity   Training error
  Support Vector Machine           (1/2) ||w||^2      C Σ_i max(0, 1 − y_i (w^T x_i + b))
  Regularized logistic regression  λ ||w||^2          Σ_i log(1 + exp(−y_i (w^T x_i + b)))

● Choice of parameter C:
○ Large C: lower bias, high variance
○ Small C: higher bias, low variance
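To see the trade-off in practice, a quick scikit-learn sketch on synthetic overlapping clusters (the dataset and the parameter grid are illustrative):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so some training error is unavoidable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: train acc={clf.score(X, y):.3f}, "
          f"support vectors={clf.n_support_.sum()}")
```

Smaller C tolerates more margin violations (more support vectors, wider margin); larger C fits the training data more aggressively.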
Dual Form of SVM
● Introducing Lagrange multipliers α_i ≥ 0 converts the primal problem into its dual:
  max_α Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j   s.t.  0 ≤ α_i ≤ C, Σ_i α_i y_i = 0
● The optimal weights are recovered as w = Σ_i α_i y_i x_i; only the support vectors have α_i > 0

https://www.quora.com/What-is-primal-and-dual-formulation-in-SVM


Suppose we’re in 1-dimension

● What would SVMs do with this data?


Suppose we’re in 1-dimension

● Not a big surprise: the maximum-margin separator is simply the point midway between the two classes


Harder 1-dimensional Dataset
● What if the two classes are interleaved on the line, so that no single threshold separates them?
● The classic remedy: map each point into a higher-dimensional space, e.g., x ↦ (x, x²), where the data become linearly separable


SVM: Nonlinear Case
● Limitation of linear SVM
○ Linear SVM classifiers are too restricted for complex classification tasks where the data are not linearly separable in the input space
● Basic idea of nonlinear SVM
○ Map the data into a richer feature space including nonlinear features, then construct a separating hyperplane in that space (in the same way as before)


SVM: Nonlinear Case
● First, define a feature mapping φ: R^d → R^D (typically D ≫ d)
● Then learn a hyperplane in the feature space: w^T φ(x) + b = 0
● Almost the same primal form of SVM:
  min_{w,b,ξ} (1/2) ||w||^2 + C Σ_i ξ_i   s.t.  y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i, ξ_i ≥ 0


SVM: Nonlinear Case
● The dual problem:
  max_α Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j φ(x_i)^T φ(x_j)   s.t.  0 ≤ α_i ≤ C, Σ_i α_i y_i = 0
● The optimal solution: w = Σ_i α_i y_i φ(x_i)


How to choose the feature mapping?
• Polynomial mapping
• Example (degree 2, x ∈ R^2): φ(x) = (x_1^2, √2 x_1 x_2, x_2^2)
• Problem of using an explicit feature mapping:
  • The dimensionality of φ(x) can be very large, making w hard to represent explicitly in memory and the QP hard to solve


Kernel Tricks
• Idea: replace the dot product with a kernel function K(x, x′) = φ(x)^T φ(x′)
• Not all functions are kernel functions
• A function K can be a kernel if it is
○ Symmetric: K(x, x′) = K(x′, x)
○ Positive semi-definite (PSD): the “Gram matrix” K defined by K_ij = K(x_i, x_j) is PSD
  (PSD means c^T K c ≥ 0 for every vector c)
• Benefits
○ Efficiency: computing K(x, x′) is often cheaper than computing φ(x), φ(x′) and their dot product
○ Flexibility: various kernel functions can be chosen, as long as the existence of φ is guaranteed (Mercer’s condition)
Kernel Functions
• Linear kernel: K(x, x′) = x^T x′
• Polynomial kernel (degree d): K(x, x′) = (x^T x′ + c)^d
• Gaussian / RBF kernel: K(x, x′) = exp(−||x − x′||^2 / (2σ^2))
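All three are one-liners; a small NumPy sketch (the parameter defaults here are arbitrary):

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z

def poly_kernel(x, z, c=1.0, d=3):
    return (x @ z + c) ** d

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))
```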


Kernel Functions
● Example: polynomial kernel of degree 2 for x, z ∈ R^2:
  K(x, z) = (x^T z)^2 = (x_1 z_1 + x_2 z_2)^2 = x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2 = φ(x)^T φ(z)
  with φ(x) = (x_1^2, √2 x_1 x_2, x_2^2)
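The identity is easy to check numerically; a sketch with arbitrary test vectors:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D input, matching the expansion above.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print((x @ z) ** 2)     # kernel value: (x^T z)^2 = 16.0
print(phi(x) @ phi(z))  # explicit mapping + dot product: also 16.0
```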


Gaussian/RBF Kernel
● The kernel can be an inner product in an infinite-dimensional space. Assume x ∈ R and σ = 1:
  exp(−(x − z)^2 / 2) = exp(−x^2/2) exp(−z^2/2) exp(xz)
                      = exp(−x^2/2) exp(−z^2/2) Σ_{k≥0} (xz)^k / k!
                      = φ(x)^T φ(z)
  where φ(x) = exp(−x^2/2) · (1, x, x^2/√2!, x^3/√3!, …)
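The infinite expansion can be checked numerically by truncating it; a sketch for scalar inputs (the truncation level K = 20 is arbitrary):

```python
import numpy as np
from scipy.special import factorial

def phi(x, K=20):
    # Truncated feature map for K(x, z) = exp(-(x - z)^2 / 2), x a scalar.
    ks = np.arange(K)
    return np.exp(-x ** 2 / 2) * x ** ks / np.sqrt(factorial(ks))

x, z = 0.7, -0.3
print(np.exp(-(x - z) ** 2 / 2))  # exact kernel value
print(phi(x) @ phi(z))            # truncated inner product, nearly identical
```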


Nonlinear SVM with Kernel (1)
● Introduces nonlinearity into the model
● Computationally efficient
● The dual form:
  max_α Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)   s.t.  0 ≤ α_i ≤ C, Σ_i α_i y_i = 0
● The decision function:
  f(x) = sign( Σ_i α_i y_i K(x_i, x) + b )
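In practice the dual is solved by off-the-shelf libraries; a minimal scikit-learn sketch on a nonlinearly separable toy dataset:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Internally, prediction uses sign(sum_i alpha_i y_i K(x_i, x) + b).
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print("train accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```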


Nonlinear SVM with Kernel (2)–(3)
● (Figure-only slides.)


Nonlinear SVM with Kernel (4)
● The inner product in the feature space (a similarity score) is computed implicitly
● Any linear classification method can easily be extended to a nonlinear feature space (e.g., kernelized logistic regression)
● Non-vectorial data can be used as well (as long as the kernel matrix is PSD)
● Questions:
○ Which kernel should be used? How should its parameters be set?
○ One kernel for each feature type, or one for all?


Curse of Kernelization
● Challenge
○ Training kernel classifiers is often much more computationally expensive
○ For kernel SVM, a typical QP solver needs O(N^3) time; even faster solvers (e.g., SMO) typically need at least O(N^2)
○ Linear classifiers, in contrast, can be trained much faster, typically in linear time O(N)
● Question
○ How can kernel machines be trained on large-scale datasets?


Kernel Approximation
● Our goal
○ To construct a new representation z(x) so that z(x)^T z(x′) ≈ K(x, x′)
● Linear model
○ The hypothesis can be rewritten:
  f(x) = Σ_i α_i y_i K(x_i, x) ≈ Σ_i α_i y_i z(x_i)^T z(x) = w^T z(x),
  where w = Σ_i α_i y_i z(x_i)
○ Then apply linear classifiers to the new representation z
● Two methods
○ Kernel functional approximation: Fourier method (random Fourier features)
○ Kernel matrix approximation: Nyström method
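Both methods are available in scikit-learn; a minimal sketch comparing random Fourier features (RBFSampler) and the Nyström approximation, each feeding a linear SVM (all parameters are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import Nystroem, RBFSampler
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

for feat in (RBFSampler(gamma=1.0, n_components=100, random_state=0),
             Nystroem(gamma=1.0, n_components=100, random_state=0)):
    # feat.transform(x) produces z(x) with z(x)^T z(x') ~ K(x, x').
    clf = make_pipeline(feat, LinearSVC()).fit(X, y)
    print(type(feat).__name__, "accuracy:", clf.score(X, y))
```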


Multi-class Classification
● Consider k classes
● One-against-the-rest: train k binary SVMs:
○ 1st class vs. classes 2, …, k
○ 2nd class vs. classes 1, 3, …, k
○ …
● This yields k decision functions f_j(x) = w_j^T x + b_j, j = 1, …, k


Multi-class Classification
● Prediction: ŷ = arg max_j (w_j^T x + b_j)
● Reason: if x belongs to the 1st class, then we should have
  w_1^T x + b_1 ≥ +1   and   w_j^T x + b_j ≤ −1 for j ≠ 1


Multi-class Classification
● One-against-one: train k(k − 1)/2 binary SVMs, one per pair of classes:
  (1,2), (1,3), . . . , (1,k), (2,3), (2,4), . . . , (k−1,k)
● Example: with 4 classes, there are 6 binary SVMs


Multi-class Classification
● For a test point, evaluate all binary SVMs
● Select the class with the largest number of votes
● Decision values may be used as well (e.g., to break ties)
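Both schemes are implemented in scikit-learn; a quick sketch on the 3-class Iris dataset (SVC uses one-against-one internally, LinearSVC uses one-against-the-rest):

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC, LinearSVC

X, y = load_iris(return_X_y=True)

ovo = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
ovr = LinearSVC().fit(X, y)

print(ovo.decision_function(X[:1]).shape)  # (1, 3): k(k-1)/2 pairwise scores
print(ovr.decision_function(X[:1]).shape)  # (1, 3): one score per class
```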


Multi-class Classification
● There are many other methods
● A comparison in [Hsu and Lin, 2002]
● Accuracy is similar for many problems
● But 1-against-1 is fastest for training
● Assume solving one SVM optimization problem of size n takes super-linear time in n
● 1 vs. all
○ k problems, each with N data points
● 1 vs. 1
○ k(k − 1)/2 problems, each with 2N/k data points on average

Chih-Wei Hsu and Chih-Jen Lin, “A comparison of methods for multiclass support vector machines,” IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415–425, March 2002.
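To see why 1-against-1 tends to be faster, a back-of-the-envelope comparison, assuming (as is typical for QP-based solvers) that training on n points costs about O(n^q) with q ≈ 2:

```latex
\underbrace{k \cdot O(N^{q})}_{\text{1-vs-all}}
\quad\text{vs.}\quad
\underbrace{\tfrac{k(k-1)}{2} \cdot O\!\bigl((2N/k)^{q}\bigr)
  \approx O\!\bigl(2^{q-1}\,k^{2-q}\,N^{q}\bigr)}_{\text{1-vs-1}}
```

For q = 2 this is roughly kN^2 versus 2N^2, so one-against-one is cheaper whenever k > 2, despite training more classifiers.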


Summary
● Problem and Intuition
● Formulation of Linear SVM
○ Hard Margin SVM
○ Soft Margin SVM
○ Primal/dual Problems
● Nonlinear SVM with Kernel
○ Kernel Tricks
○ SVM with Kernel
● Multi-class classification
UET
Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

Thank you
Email me
[email protected]
