NPTEL
NPTEL ONLINE CERTIFICATION COURSE
Introduction to Machine Learning
Lecture 28
Prof. Balaraman Ravindran
Computer Science and Engineering
Indian Institute of Technology Madras
Support Vector Machines II
Interpretation & Analysis
So this is the optimization problem, and it is actually a simple optimization problem: a quadratic objective and a set of linear constraints. We have already seen how to solve this; you had a convex optimization tutorial, and one of the things we expect from that tutorial is that you know how to solve this kind of problem. So what we do after this is write the Lagrangian.
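For reference, here is my reconstruction of the problem on the slide, assuming the standard hard-margin setup from the previous lecture (separating hyperplane xᵀβ + β0 = 0, labels yi ∈ {−1, +1}):

$$
\min_{\beta,\,\beta_0}\ \tfrac{1}{2}\|\beta\|^{2}
\qquad \text{subject to}\quad y_i\,(x_i^{T}\beta + \beta_0) \;\ge\; 1,\quad i = 1,\dots,N,
$$

with the Lagrangian (primal) function, using multipliers αi ≥ 0,

$$
L_P \;=\; \tfrac{1}{2}\|\beta\|^{2} \;-\; \sum_{i=1}^{N} \alpha_i \big[\, y_i\,(x_i^{T}\beta + \beta_0) - 1 \,\big].
$$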
(Refer Slide Time: 01:46)
I have to apply this for every data point, so i runs from 1 to N. And I put a P there, so this is the primal. We will have to form the dual of this; the dual looks a lot easier to solve — it actually is a lot easier to solve — so we will go ahead and work with the dual. First I take the derivatives: the derivative with respect to β, which you can set to zero and solve, and the derivative with respect to β0. Now is where I am going to do some hand waving, but you can go through the computation yourself: take those, substitute them back into this, and do a lot of simplification. Remember we have this β squared term here, so I am going to get αiαjyiyj kinds of terms.
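Written out (again my reconstruction of the hand-waved steps), setting the derivatives of L_P to zero gives

$$
\frac{\partial L_P}{\partial \beta} = 0 \;\Rightarrow\; \beta = \sum_{i=1}^{N}\alpha_i y_i x_i,
\qquad
\frac{\partial L_P}{\partial \beta_0} = 0 \;\Rightarrow\; \sum_{i=1}^{N}\alpha_i y_i = 0,
$$

and substituting these back into L_P yields the dual objective

$$
L_D \;=\; \sum_{i=1}^{N}\alpha_i \;-\; \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\,\alpha_j\, y_i\, y_j\, x_i^{T}x_j ,
$$

which is to be maximized over the αi.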
(Refer to slide time 03.26)
(Refer to slide time: 04.38)
So the dual is going to be of a slightly simpler form. Why is it slightly simpler? Because the constraints I have to consider have become a lot simpler: it is just that the αi's should be non-negative — that is all my constraints are. It turns out that there are efficient ways of solving optimization problems of this form, so you do not have to worry about it; there are lots of packages that solve SVMs for you.
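As a rough illustration of what such a package is doing, here is one way to hand the dual above to a generic convex solver — a minimal sketch using cvxpy; the function name and setup are mine, and production SVM libraries (for example LIBSVM) use specialised solvers such as SMO rather than a general-purpose QP solver:

```python
import numpy as np
import cvxpy as cp

def fit_hard_margin_dual(X, y):
    """Solve the hard-margin SVM dual for X of shape (n, d), y in {-1, +1}^n."""
    n = X.shape[0]
    alpha = cp.Variable(n)
    G = y[:, None] * X                        # row i is y_i * x_i
    # L_D = sum_i alpha_i - 1/2 * || sum_i alpha_i y_i x_i ||^2
    #     = sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j x_i^T x_j
    objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(G.T @ alpha))
    constraints = [alpha >= 0,                              # dual feasibility
                   cp.sum(cp.multiply(alpha, y)) == 0]      # from dL_P/dbeta_0 = 0
    cp.Problem(objective, constraints).solve()
    return alpha.value
```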
But you still need to know what kind of optimization problem you are solving — I do not want you to use it as a black box — and essentially what you are going to be solving is this dual. Now, when you have a solution to a problem that has both a primal and a dual (and we can actually show that the duality gap is zero in this case, so I will not go into that), the point is that the solution has to satisfy certain conditions.
We have already looked at these: the KKT conditions. If you do not remember them, please go back and revise. There are a whole bunch of requirements. You need the solution to be primal feasible, and you need the solution to be dual feasible. Dual feasible means your αi's have to be non-negative; primal feasible means the original constraints have to hold, because it is a solution to the primal. And then you have complementary slackness, which in this case becomes the following.
(Refer to slide time 06.37)
In the notes I think you saw this condition written as λifi = 0; here the multiplier is αi and fi is the constraint yi(xiᵀβ + β0) − 1, so it is αifi = 0. That is the fourth condition, and together these are the KKT conditions that need to be satisfied.
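Collected in one place (my summary, in the notation above), the conditions are

$$
\begin{aligned}
&\beta = \textstyle\sum_i \alpha_i y_i x_i, \qquad \textstyle\sum_i \alpha_i y_i = 0 && \text{(stationarity)}\\
&y_i\,(x_i^{T}\beta + \beta_0) - 1 \;\ge\; 0 && \text{(primal feasibility)}\\
&\alpha_i \;\ge\; 0 && \text{(dual feasibility)}\\
&\alpha_i\,\big[\, y_i\,(x_i^{T}\beta + \beta_0) - 1 \,\big] \;=\; 0 && \text{(complementary slackness)}
\end{aligned}
$$

for every i = 1, …, N.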
So what does this tell us? It tells us a couple of things. One: we know what the form of β should be. What is the form of β? It has to be Σi αi yi xi. So essentially your β is obtained by taking certain data points from your training data and adding them up, suitably multiplied by the desired output: if xi's output was positive, the multiplier yi is +1; if xi's output was negative, yi is −1.
So it is going to take a few of those points and add them up. This should remind you of perceptrons. If you remember, what we did with perceptrons is we took whatever was misclassified and kept adding it to the weight vector. In some sense we are doing something very similar here, but instead of a somewhat heuristic approach to the optimization — we did do gradient descent, but we just said we would arbitrarily pick the set of misclassified points and do gradient descent on those, and so on — here we started off by saying we will maximize the distance to the closest point, and from that we derived something that looks very suspiciously like the perceptron update rule. In fact, nowadays when people say they are going to train a perceptron, they are more often doing this than using the perceptron learning rule right away. Now, something else that you can observe:
the complementary slackness condition has to be satisfied for every point. Let us look at it; there are two factors in the product, so when will αi [ yi (xiᵀβ + β0) − 1 ] be 0? When either αi is 0 or yi (xiᵀβ + β0) − 1 is 0. Yes, you are right, but can you give me a geometric answer? αi has to be zero whenever the other factor is not 0. So when will the other factor not be 0? When xi is not the closest point. If xi is the closest point, it will be bang on the margin, and for such a point that factor will be 0. For a point over here, yi (xiᵀβ + β0) will be greater than 1, and for a point over there it will likewise be greater than 1.
You see that: since yi (xiᵀβ + β0) is greater than 1, the factor in the square brackets is non-zero, so those αi's have to be 0. So what does this mean? It means that points that are farther away from the hyperplane do not contribute to finding β; because their α's are zero, points far away from the hyperplane play no role in finding β. In fact, the points that contribute to β are exactly the points that lie on the margin.
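In code, continuing the earlier sketch (alpha comes from the hypothetical fit_hard_margin_dual above, and the 1e-6 tolerance is my own choice to absorb solver noise):

```python
alpha = fit_hard_margin_dual(X, y)
support = alpha > 1e-6                              # only margin points have alpha_i > 0
beta = (alpha[support] * y[support]) @ X[support]   # beta = sum_i alpha_i y_i x_i
```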
In fact, for this data set that we drew here, there are only two important points — that one and this one — because only two points lie on the margin. Such points, which lie on the margin, are known as support points or support vectors, and your β is going to depend only on the support points. What about β0? We can plug any support point into the margin condition here and solve for β0.
Take one of these support points, plug it in, and solve for β0. Which support point do you pick? Ideally all of them should give you the same answer, but that usually does not happen, for numerical reasons. So what people typically do is plug in all the support points, solve for β0 from each one in turn, and take the average: every support point gives a slightly different β0, and you just take the average.
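Continuing the same sketch, the margin condition yi (xiᵀβ + β0) = 1 on a support point gives β0 = yi − xiᵀβ (since yi ∈ {−1, +1}), and averaging over the support points:

```python
beta0 = np.mean(y[support] - X[support] @ beta)     # average of y_i - x_i^T beta over support points
```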
So that is how you compute the hyperplane at the end of it. Now, a question came up: when would α be 0 even though the data point is on the margin? That is one case where it can happen: essentially when you have two points at the same location — not merely collinear, but repeated, two data points at exactly the same point. By definition the support vectors all lie on the same line (the margin) anyway, so collinearity alone cannot be the issue. In such cases it could happen, but these are generally degenerate cases. Sure, call them support vectors if you want.
One more thing to note is my f̂: how is this going to look, now that I have this form for β? Substituting, it is essentially going to look like Σi αi yi xiᵀx — I can flip those inner products around either way — plus β0.
(Refer to slide time 15.24)
So if you think about it — and I will come back to this point later — if you look at the dual, the data enter only through inner products xiᵀxj, and if you look at the final classifier I am going to use, again I only have inner products xiᵀx. So if I have a very efficient way of computing these inner products, I can do some tricks with this whole thing; we will come back to that. I just want you to remember this. Any questions on this?
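Just to fix that point before we move on, here is how the fitted classifier looks in the running sketch (names carried over from my earlier code): the data enter only through inner products with the support vectors.

```python
def f_hat(x_new):
    """f_hat(x) = sum over support vectors of alpha_i y_i x_i^T x, plus beta0."""
    return (alpha[support] * y[support]) @ (X[support] @ x_new) + beta0

# Predicted class for a query point x_new of shape (d,):  np.sign(f_hat(x_new))
```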
Before we move on, I just wanted to point out something. Think about how LDA works. LDA essentially does density estimation: you make some assumptions about the form of the probability distribution. What assumption do you make? That it is Gaussian, with equal covariance across all the classes.
That essentially means that every data point in your training set contributes towards the parameters you are estimating. So the β you estimate there will depend on all the data points given to you, whether they are close to the hyperplane or very far away from it. All of the data points determine your class boundary, and that means it becomes a little susceptible to noise.
If I have one or two data points that were generated by noise, even those will contribute to determining the separating hyperplane. On the other hand, with this kind of optimal hyperplane we only worry about points that are close to the boundary. I can do whatever I want out here — move a few points around and things like that — and it does not really matter.
What matters is whether any noise enters close to the boundary. So in some sense, if my noise is uniform, LDA will be affected more, because even if the noise inserts some points far away, the LDA classifier will change, whereas my optimal-hyperplane classifier will be affected only by the fraction of the noise that changes the actual decision surface. Having said that, I should point out that if your data is truly Gaussian with equal covariance, LDA is actually optimal — provably optimal — while this one will depend on the actual data that you get. But in general I would say this is preferable because it is more stable. People remember what stability is, right? Small changes in the data should not cause the classifier to change significantly.
Here, small changes in the data will not cause the classifier to change significantly, in an expected sense. If I take a support vector and move it somewhere else, the class boundary will change. But I have a whole bunch of other points which I can move around, and nothing will happen to the class boundary, unless I move one of them closer to the hyperplane than the existing support vectors. If I take a point from here and move it over there, of course the class boundary will change; but as long as I do not change which points are the support vectors, I get back the same classification surface again and again. So in that sense the SVM — we will come to SVMs in a little bit — this kind of optimal hyperplane is very stable.
IIT Madras Production
Funded by
Department of Higher Education
Ministry of Higher Education
Government of India
www.nptel.ac.in
Copyrights Reserved