CS-471: Machine Learning
Week 7: Support Vector Machine
Instructor: Dr. Daud Abdullah
Lecture Outline
• Recap: K-NN algorithm
• Support Vector Machine
• Concept of Kernel in SVM
• Hinge Loss
Support Vector Machines
• In supervised learning, the performance of many learning algorithms is often similar
• The choice between learning algorithm A and learning algorithm B is therefore fairly flexible
• What matters more is the amount of data you have and your skill in applying these algorithms
• Support vector machine
• Compared to logistic regression, the SVM sometimes gives a cleaner, and sometimes a more powerful, way of learning complex non-linear functions
Recap: Logistic Regression
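As a reminder, a standard formulation of the logistic regression hypothesis and regularized cost (notation here follows the usual course convention and is assumed, not copied from the slide):

h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^{2}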
Alternative View of Logistic Regression
This alternative view gives the SVM computational advantages, leading to an easier optimization problem
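A sketch of the usual SVM objective this view leads to, where cost_1 and cost_0 are piecewise-linear stand-ins for the two logistic loss terms (standard formulation, assumed rather than copied from the slide):

\min_{\theta} \; C \sum_{i=1}^{m} \left[ y^{(i)} \, \mathrm{cost}_1\!\left(\theta^{T} x^{(i)}\right) + \left(1 - y^{(i)}\right) \mathrm{cost}_0\!\left(\theta^{T} x^{(i)}\right) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^{2}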
Logistic Regression
Support Vector Machine
SVM Hypothesis
Hypothesis
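In the standard formulation (assumed here), the SVM hypothesis predicts directly from the sign of \theta^{T} x:

h_\theta(x) = \begin{cases} 1 & \text{if } \theta^{T} x \ge 0 \\ 0 & \text{otherwise} \end{cases}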
Support Vector Machines: Large Margin Intuition
Support Vector Machine
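The standard margin conditions behind this intuition (assumed formulation): cost_1(z) is zero only for z \ge 1 and cost_0(z) is zero only for z \le -1, so the SVM asks for more than a correct sign:

\text{If } y = 1: \; \theta^{T} x \ge 1 \;(\text{not just} \ge 0) \qquad \text{If } y = 0: \; \theta^{T} x \le -1 \;(\text{not just} < 0)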
SVM Decision Boundary
Note: when C is very large, we want the cost term multiplied by C to be as small as possible, ideally zero
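In the standard large-C development (assumed here), the objective then reduces to:

\min_{\theta} \; \frac{1}{2} \sum_{j=1}^{n} \theta_j^{2} \quad \text{s.t.} \quad \theta^{T} x^{(i)} \ge 1 \text{ if } y^{(i)} = 1, \qquad \theta^{T} x^{(i)} \le -1 \text{ if } y^{(i)} = 0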
SVM Decision Boundary: Linearly Separable Case
The SVM is sometimes also called a large margin classifier, and this is a consequence of the optimization problem we solved
Large Margin Classifier in the Presence of an Outlier
For a single outlier, the decision boundary should ideally not change in the way shown above
Large Margin Classifier in the Presence of an Outlier
C not too large
If the regularization parameter C were very large, the SVM would end up changing the decision boundary. If C is reasonably small, the black line will remain the decision boundary.
Support Vector Machine
The Concept of Kernels in SVM
Non-Linear Decision Boundary
Is there a different/better choice of features f1, f2, f3, …?
Kernel
• Given x, compute new features depending on proximity to landmarks l(1), l(2), l(3)
• Manually select three landmark points
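The standard Gaussian-kernel feature used for this (assumed formulation, matching the Gaussian kernel named later in the lecture):

f_i = \mathrm{similarity}\!\left(x, l^{(i)}\right) = \exp\!\left( -\frac{\lVert x - l^{(i)} \rVert^{2}}{2 \sigma^{2}} \right)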
Kernel and Similarity
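The key property of this similarity measure (a standard fact about the Gaussian kernel):

x \approx l^{(i)} \;\Rightarrow\; f_i \approx \exp(0) = 1, \qquad x \text{ far from } l^{(i)} \;\Rightarrow\; f_i \approx 0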
Example
SVM Parameters
C (plays the role of 1/λ):
Large C: lower bias, high variance (corresponds to small λ)
Small C: higher bias, low variance (corresponds to large λ)
Choosing the landmarks
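In the standard approach (assumed here), the landmarks are placed at the training examples themselves, so each example gets one Gaussian-similarity feature per landmark. A minimal NumPy sketch of that idea; function and variable names are illustrative, not from the slides.

import numpy as np

def gaussian_kernel_features(X, landmarks, sigma=1.0):
    """Map each example x to features f_i = exp(-||x - l^(i)||^2 / (2 sigma^2))."""
    # Squared Euclidean distance between every example and every landmark
    diffs = X[:, np.newaxis, :] - landmarks[np.newaxis, :, :]   # shape (m, k, n)
    sq_dists = np.sum(diffs ** 2, axis=2)                        # shape (m, k)
    return np.exp(-sq_dists / (2 * sigma ** 2))                  # shape (m, k)

# Standard choice: use the m training examples themselves as landmarks
X_train = np.array([[1.0, 2.0], [2.0, 0.5], [0.0, 1.5]])
landmarks = X_train.copy()
F = gaussian_kernel_features(X_train, landmarks, sigma=0.5)
print(F.shape)   # (3, 3): one similarity feature per landmark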
Kernel (Similarity) Functions
Note: Do perform feature scaling before using the Gaussian kernel
In-Class Activity (10mins)
• Discussion about Hinge Loss
• Formula for calculation of loss
• What is the mathematical condition for hinge loss to be zero?
• Why do we use hinge loss instead of other loss functions in SVM?
Hinge Loss
• Hinge loss is the loss function used in Support Vector Machines (SVMs) for
classification tasks. It helps maximize the margin between data points and
the decision boundary.
• For a given training sample (x_i, y_i) where:
• y_i ∈ {−1, +1} (labels must be −1 or +1),
• w is the weight vector, and
• b is the bias,
• the hinge loss is defined as:
L_{\text{hinge}} = \max\!\left(0,\; 1 - y_i \left(w \cdot x_i + b\right)\right)
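A quick numeric sketch of this formula in Python; the weights and sample values are illustrative.

import numpy as np

def hinge_loss(w, b, x, y):
    """Hinge loss for one sample: max(0, 1 - y * (w . x + b)), with y in {-1, +1}."""
    return max(0.0, 1.0 - y * (np.dot(w, x) + b))

w, b = np.array([0.5, -1.0]), 0.1
# Correctly classified and outside the margin: loss is 0
print(hinge_loss(w, b, np.array([4.0, 0.0]), +1))   # 1 - (2.1) = -1.1 -> 0.0
# Correctly classified but inside the margin: small positive loss
print(hinge_loss(w, b, np.array([1.0, 0.0]), +1))   # 1 - (0.6) = 0.4
# Misclassified: larger loss
print(hinge_loss(w, b, np.array([1.0, 0.0]), -1))   # 1 + (0.6) = 1.6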
Hinge Loss
• Encourages a maximum margin between classes.
• Penalizes misclassified points and those inside the margin.
• Works well with the SVM optimization objective.
Hinge Loss vs. Zero-One Loss
• The vertical axis represents the
value of the Hinge loss (in blue)
and zero-one loss (in green) for
fixed t = 1, while the horizontal axis
represents the value of the
prediction y.
• The plot shows that the Hinge loss
penalizes predictions y < 1,
corresponding to the notion of a
margin in a support vector
machine.
Questions?