
CS-471: Machine Learning

Week 7: Support Vector Machine


Instructor: Dr. Daud Abdullah
Lecture Outline

• Recap: K-NN algorithm

• Support Vector Machine

• Concept of Kernel in SVM

• Hinge Loss

2
Support Vector Machines
• In supervised learning, the performance of many learning algorithms is often similar
• The choice between learning algorithm A and learning algorithm B is therefore fairly flexible
• What matters more is the amount of data you have and your skill in applying these algorithms
• Support vector machine
• Compared to logistic regression, the SVM sometimes gives a cleaner, and sometimes a more powerful, way of learning complex non-linear functions

3
Recap: Logistic Regression

4
Alternative View of Logistic Regression

5
Recap: Logistic Regression

6
Recap: Logistic Regression

7
Alternative View of Logistic Regression

This gives the SVM computational advantages, leading to an easier optimization problem
8
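For reference, a sketch of the objective this alternative view leads to, assuming the standard notation (the symbols θ, cost₀, cost₁, m, n, and C below are assumptions, since they do not appear in the extracted slide text). Logistic regression's log-loss terms are replaced by piecewise-linear costs:

\[
\min_{\theta}\; C \sum_{i=1}^{m}\Big[ y^{(i)}\,\mathrm{cost}_1(\theta^{T}x^{(i)}) + (1-y^{(i)})\,\mathrm{cost}_0(\theta^{T}x^{(i)}) \Big] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^{2}
\]

Because cost₁ and cost₀ are flat-then-linear rather than logarithmic, the resulting optimization problem is easier to solve, which is the computational advantage referred to above.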
Logistic Regression

Support Vector Machine

9
SVM Hypothesis

Hypothesis

10
Support Vector Machines: Large Margin Intuition
11
Support Vector Machine

12
Support Vector Machine

13
SVM Decision Boundary

14
SVM Decision Boundary

Note: When C is very large, you want the term multiplied by C to be zero (or as close to zero as possible)
15
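Assuming the same standard notation as in the sketch above (not shown in the extracted slides), making the term multiplied by C zero means that, for very large C, the optimizer is pushed to satisfy

\[
\theta^{T}x^{(i)} \ge 1 \ \text{ if } y^{(i)} = 1, \qquad \theta^{T}x^{(i)} \le -1 \ \text{ if } y^{(i)} = 0,
\]

so that the first term of the objective vanishes and the problem reduces to

\[
\min_{\theta}\; \frac{1}{2}\sum_{j=1}^{n}\theta_j^{2} \quad \text{subject to the constraints above.}
\]

The solution of this reduced problem is what gives the large-margin decision boundary discussed on the following slides.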
SVM Decision Boundary

16
SVM Decision Boundary: Linearly Separable Case

The SVM is sometimes also called a large margin classifier; this is a consequence of the optimization problem it solves
17
Large Margin Classifier in Presence of Outlier

18
Large Margin Classifier in Presence of Outlier

Ideally, the decision boundary (DB) should not change in response to a single outlier in the way shown above

19
Large Margin Classifier in Presence of Outlier

C not too large

If the regularization parameter C were very large, the SVM would end up changing the
decision boundary to accommodate the outlier. If C is reasonably small, the black line
will remain the decision boundary (see the sketch below).
20
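A minimal sketch of this behaviour, assuming scikit-learn is available (the toy data, the outlier position, and the two values of C are illustrative choices, not taken from the slides):

import numpy as np
from sklearn.svm import LinearSVC

# Toy, linearly separable data plus one outlier labelled with the positive class.
X = np.array([[1.0, 1.0], [2.0, 2.0], [1.0, 2.0],   # negative class
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0],   # positive class
              [1.5, 1.5]])                           # outlier labelled positive
y = np.array([0, 0, 0, 1, 1, 1, 1])

for C in (0.01, 1000.0):
    clf = LinearSVC(C=C, max_iter=100_000).fit(X, y)
    print(f"C={C}: w={clf.coef_.ravel()}, b={clf.intercept_[0]:.3f}")

# With a very large C the fitted boundary shifts to accommodate the outlier;
# with a small C it stays close to the boundary that separates the two clusters.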
Support Vector Machine

The Concept of Kernels in SVM

21
Non-Linear Decision Boundary

Is there a different/better choice of features f1, f2, f3, …?

22
Kernel

• Given x, compute new features depending on proximity to landmarks l(1), l(2), l(3)
• Manually select three landmark points (see the sketch below)

23
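A minimal sketch of computing such features with a Gaussian similarity function, using NumPy only (the landmark coordinates, the test point, and σ are illustrative assumptions):

import numpy as np

def gaussian_similarity(x, landmark, sigma=1.0):
    # f = exp(-||x - l||^2 / (2 * sigma^2)): close to 1 near the landmark, close to 0 far away
    return np.exp(-np.sum((x - landmark) ** 2) / (2 * sigma ** 2))

# Three manually chosen landmarks l(1), l(2), l(3)
landmarks = [np.array([3.0, 5.0]), np.array([3.0, 2.0]), np.array([1.0, 4.0])]

x = np.array([3.2, 4.8])                      # an example input
f = [gaussian_similarity(x, l) for l in landmarks]
print(f)   # f1 is near 1 (x is close to l(1)); f2 and f3 are much smaller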
Kernel

24
Kernel and Similarity

25
Example

26
27
28
29
SVM Parameters

C ( = 1/𝜆 ):

Large C: lower bias, high variance (corresponds to a small 𝜆)

Small C: higher bias, low variance (corresponds to a large 𝜆)

30
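In practice this trade-off is usually handled by choosing C with cross-validation. A minimal sketch assuming scikit-learn (the dataset and the parameter grid are illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Small C -> stronger regularization (higher bias); large C -> weaker regularization (higher variance).
search = GridSearchCV(SVC(kernel="rbf"), param_grid={"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)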
Choosing the landmarks

31
Kernel (Similarity) Functions

Note: Perform feature scaling before using the Gaussian kernel

32
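A minimal sketch of why the feature-scaling note matters, assuming scikit-learn (the dataset is an illustrative choice):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Without scaling, features with large ranges dominate the squared distances inside
# the Gaussian (RBF) kernel; scaling puts all features on a comparable scale.
unscaled = SVC(kernel="rbf")
scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

print("unscaled:", cross_val_score(unscaled, X, y, cv=5).mean())
print("scaled:  ", cross_val_score(scaled, X, y, cv=5).mean())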
In-Class Activity (10 mins)

• Discussion about Hinge Loss


• Formula for calculation of loss
• What is the mathematical condition for hinge loss to be zero?
• Why do we use hinge loss instead of other loss functions in SVM?

33
Hinge Loss
• Hinge loss is the loss function used in Support Vector Machines (SVMs) for
classification tasks. It helps maximize the margin between data points and
the decision boundary.
• For a given training sample (𝑥𝑖, 𝑦𝑖) where:
• 𝑦𝑖 ∈ {−1, +1} (labels must be −1 or +1),
• w is the weight vector, and
• b is the bias,
• the hinge loss is defined as:
𝐿𝐻𝑖𝑛𝑔𝑒 = max(0, 1 − 𝑦𝑖 (𝑤 ⋅ 𝑥𝑖 + 𝑏))

34
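A minimal NumPy sketch of this definition, vectorized over a batch of samples (the weights and data below are illustrative):

import numpy as np

def hinge_loss(w, b, X, y):
    # Mean of max(0, 1 - y_i * (w . x_i + b)) over the batch, with labels y_i in {-1, +1}
    margins = y * (X @ w + b)
    return np.mean(np.maximum(0.0, 1.0 - margins))

w = np.array([0.5, -0.3])
b = 0.1
X = np.array([[2.0, 1.0], [-1.0, 3.0], [0.5, 0.5]])
y = np.array([1, -1, 1])    # labels must be -1 or +1

print(hinge_loss(w, b, X, y))
# The loss of a sample is zero exactly when y_i * (w . x_i + b) >= 1.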
Hinge Loss
• Encourages a maximum margin between classes.
• Penalizes misclassified points and points that fall inside the margin.
• Works well with the SVM optimization objective.

35
Hinge Loss vs. Zero-One Loss
• The vertical axis represents the value of the hinge loss (in blue) and the zero-one loss (in green) for fixed t = 1, while the horizontal axis represents the value of the prediction y.
• The plot shows that the hinge loss penalizes predictions y < 1, corresponding to the notion of a margin in a support vector machine.

36
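A minimal matplotlib sketch that reproduces the described plot (fixed t = 1; the axis range is an illustrative choice):

import numpy as np
import matplotlib.pyplot as plt

y = np.linspace(-2, 2, 401)                  # prediction value (horizontal axis)
t = 1                                        # fixed target

hinge = np.maximum(0.0, 1.0 - t * y)         # max(0, 1 - t*y)
zero_one = (np.sign(y) != t).astype(float)   # 1 when the prediction has the wrong sign

plt.plot(y, hinge, "b", label="Hinge loss")
plt.plot(y, zero_one, "g", label="Zero-one loss")
plt.xlabel("prediction y")
plt.ylabel("loss (t = 1)")
plt.legend()
plt.show()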
Questions?
37
