CSE 575: Statistical Machine Learning
Jingrui He
CIDSE, ASU

Instance-based Learning

1-Nearest Neighbor
Four things make a memory-based learner:
1. A distance metric: Euclidean (and many more)
2. How many nearby neighbors to look at? One
3. A weighting function (optional): Unused
4. How to fit with the local points? Just predict the same output as the nearest neighbor.
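
A minimal sketch of these four choices in Python (illustrative names, NumPy assumed; Euclidean distance, one neighbor, no weighting, copy the neighbor's output):

```python
# 1-Nearest Neighbor: predict the output of the single closest training point.
import numpy as np

def one_nn_predict(x_query, X_train, y_train):
    """Return the output of the Euclidean-nearest training example."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # 1. distance metric
    return y_train[np.argmin(dists)]                   # 2-4. copy the nearest output

# Tiny example: two 2-d training points with outputs 0 and 1.
X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array([0, 1])
print(one_nn_predict(np.array([0.9, 0.8]), X_train, y_train))  # -> 1
```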

Consistency of 1-NN
- Consider an estimator f_n trained on n examples (e.g., 1-NN, regression, ...)
- The estimator is consistent if the true error goes to zero as the amount of data increases; e.g., for noise-free data, consistent if error_true(f_n) -> 0 as n -> infinity
- Regression is not consistent! (Representation bias)
- 1-NN is consistent (under some mild fine print)
- What about variance???

1-NN overfits?

k-Nearest Neighbor
Four things make a memory-based learner:
1. A distance metric: Euclidean (and many more)
2. How many nearby neighbors to look at? k
3. A weighting function (optional): Unused
4. How to fit with the local points? Just predict the average output among the k nearest neighbors.
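
The same sketch with k neighbors, averaging their outputs as in item 4 (illustrative names, NumPy assumed):

```python
# k-Nearest Neighbor: average the outputs of the k Euclidean-nearest points.
import numpy as np

def knn_predict(x_query, X_train, y_train, k=3):
    """Average the outputs of the k nearest training examples."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # distance metric
    nearest = np.argsort(dists)[:k]                    # indices of the k closest points
    return y_train[nearest].mean()                     # unweighted average (weighting unused)

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
print(knn_predict(np.array([1.5]), X_train, y_train, k=2))  # -> 2.5
```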

k-Nearest Neighbor (here k=9)
K-nearest neighbor for function fitting smooths away noise, but there are clear deficiencies.
What can we do about all the discontinuities that k-NN gives us?

Curse of dimensionality for instance-based learning
- Must store and retrieve all data!
  - Most real work done during testing
  - For every test sample, must search through the whole dataset - very slow!
  - There are fast methods for dealing with large datasets, e.g., tree-based methods, hashing methods, ... (see the sketch below)
- Instance-based learning often poor with noisy or irrelevant features
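
A minimal sketch of that speedup (assuming scikit-learn is available; synthetic data, illustrative sizes): both searches return the same neighbors, but the tree-based index avoids scanning every stored point for each query.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))    # stored training data
queries = rng.normal(size=(5, 3))   # test samples

# Brute force: every query scans all stored points.
brute = NearestNeighbors(n_neighbors=1, algorithm="brute").fit(X)
# Tree-based index: queries descend a k-d tree instead.
tree = NearestNeighbors(n_neighbors=1, algorithm="kd_tree").fit(X)

_, idx_brute = brute.kneighbors(queries)
_, idx_tree = tree.kneighbors(queries)
assert np.array_equal(idx_brute, idx_tree)  # same answers, faster lookup at scale
```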

Support Vector Machines

Linear classifiers - Which line is better?
Data: example i: (x_i, y_i)
w.x = sum_j w^(j) x^(j)
[Figure: candidate separating lines between the two classes; decision boundary w.x + b = 0]

Pick the one with the largest margin!
w.x = sum_j w^(j) x^(j)
[Figure: maximum-margin separating line, decision boundary w.x + b = 0]

Maximize the margin
[Figure: margin around the decision boundary w.x + b = 0]

But there are many planes...
[Figure: decision boundary w.x + b = 0]

Review: Normal to a plane

Normalized margin - Canonical hyperplanes
[Figure: points x+ and x- on the canonical hyperplanes w.x + b = +1 and w.x + b = -1, on either side of the decision boundary w.x + b = 0; the margin is the distance between x+ and x-.]

Margin maximization using canonical hyperplanes
[Figure: canonical hyperplanes w.x + b = +1 and w.x + b = -1 around the decision boundary w.x + b = 0; the margin between them is 2/||w||.]

Support vector machines (SVMs)
- Solve efficiently by quadratic programming (QP)
  - Well-studied solution algorithms
- Hyperplane defined by support vectors
[Figure: maximum-margin hyperplane w.x + b = 0 between the canonical hyperplanes w.x + b = +1 and w.x + b = -1; margin 2/||w||, with the support vectors lying on the canonical hyperplanes.]
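
For reference, a sketch of the standard hard-margin primal QP over canonical hyperplanes (the usual textbook form; notation may differ slightly from the slides):

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\,\mathbf{w}\cdot\mathbf{w}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \quad \forall i
```

Maximizing the margin 2/||w|| is equivalent to minimizing w.w, which is what makes the problem a QP.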

What if the data is not linearly separable?
Use features of features of features of features...

What if the data is still not linearly separable?
- Minimize w.w and number of training mistakes
  - Tradeoff two criteria?
- Tradeoff #(mistakes) and w.w
  - 0/1 loss
  - Slack penalty C
  - Not QP anymore
  - Also doesn't distinguish near misses and really bad mistakes

Slack variables - Hinge loss
- If margin >= 1, don't care
- If margin < 1, pay linear penalty
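
A sketch of the usual soft-margin primal with slack variables xi_i and penalty C (standard form; the slide's exact notation is assumed):

```latex
\min_{\mathbf{w},\,b,\,\xi}\ \tfrac{1}{2}\,\mathbf{w}\cdot\mathbf{w} \;+\; C\sum_i \xi_i
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0 \quad \forall i
```

Each xi_i measures how far example i falls inside the margin (zero when the margin is at least 1), which is exactly the linear penalty above.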

Side note: What's the difference between SVMs and logistic regression?
- SVM: hinge loss
- Logistic regression: log loss
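
The two losses as commonly written, with f(x) = w.x + b and y in {-1, +1} (a sketch of the standard forms, not necessarily the exact expressions on the slide):

```latex
\ell_{\text{hinge}}\bigl(y, f(\mathbf{x})\bigr) = \max\bigl(0,\ 1 - y\,f(\mathbf{x})\bigr),
\qquad
\ell_{\text{log}}\bigl(y, f(\mathbf{x})\bigr) = \ln\bigl(1 + e^{-y\,f(\mathbf{x})}\bigr)
```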

Constrained optimization

Lagrange multipliers - Dual variables
- Moving the constraint to the objective function
- Lagrangian:
- Solve:

Lagrange multipliers - Dual variables
- Solving:
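
As a reminder of the general recipe (a generic sketch, not the specific SVM problem): for min_x f(x) subject to g(x) <= 0, introduce a dual variable alpha >= 0 and move the constraint into the objective:

```latex
L(x, \alpha) = f(x) + \alpha\, g(x), \qquad \alpha \ge 0
```

The primal is min_x max_{alpha >= 0} L(x, alpha); the dual swaps the order, max_{alpha >= 0} min_x L(x, alpha).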

Dual SVM derivation (1) - the linearly separable case

Dual SVM derivation (2) - the linearly separable case

Dual SVM interpretation
[Figure: decision boundary w.x + b = 0 determined by the support vectors]

Dual SVM formulation - the linearly separable case
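
A sketch of the standard dual for the linearly separable case (usual textbook form, with dual variables alpha_i; the slide's notation is assumed):

```latex
\max_{\alpha}\ \sum_i \alpha_i \;-\; \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, (\mathbf{x}_i \cdot \mathbf{x}_j)
\quad \text{subject to} \quad
\alpha_i \ge 0\ \ \forall i, \qquad \sum_i \alpha_i y_i = 0
```

The weight vector is recovered as w = sum_i alpha_i y_i x_i, and only the support vectors have alpha_i > 0.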

Dual SVM derivation - the non-separable case

Dual SVM formulation - the non-separable case
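
In the standard non-separable (soft-margin) dual, the objective is unchanged and the slack penalty C only caps the dual variables (again a sketch of the usual form):

```latex
\max_{\alpha}\ \sum_i \alpha_i \;-\; \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, (\mathbf{x}_i \cdot \mathbf{x}_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C\ \ \forall i, \qquad \sum_i \alpha_i y_i = 0
```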

Why did we learn about the dual SVM?
- There are some quadratic programming algorithms that can solve the dual faster than the primal
- But, more importantly, the kernel trick!!!
  - Another little detour

Reminder from last time: What if the data is not linearly separable?
Use features of features of features of features...
Feature space can get really large really quickly!

Higher order polynomials
[Plot: number of monomial terms vs. number of input dimensions, for m input features and polynomial degree d = 2, 3, 4]
Grows fast! d = 6, m = 100: about 1.6 billion terms
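
A quick numeric check of that growth (a sketch: counting monomials of degree exactly d over m input features, i.e. C(m + d - 1, d), which reproduces the roughly 1.6 billion figure):

```python
# Number of degree-d monomials over m variables: C(m + d - 1, d).
from math import comb

def num_monomials(m: int, d: int) -> int:
    """Count the monomials of degree exactly d in m input features."""
    return comb(m + d - 1, d)

for d in (2, 3, 4):
    print(f"d={d}, m=100: {num_monomials(100, d):,} terms")

print(f"d=6, m=100: {num_monomials(100, 6):,} terms")  # ~1.6 billion
```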

Dual formulation only depends on dot-products, not on w!

Dot-product of polynomials
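
A small numeric sketch of the idea for degree d = 2 and two input features (hand-written feature map, NumPy assumed): the dot product of the explicit quadratic feature vectors equals (x . z)^2, so the high-dimensional dot product has a cheap closed form.

```python
import numpy as np

def phi2(v):
    """Explicit degree-2 monomial features of a 2-d vector (with sqrt(2) scaling)."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

x = np.array([1.0, 3.0])
z = np.array([2.0, -1.0])

explicit = phi2(x) @ phi2(z)   # dot product in feature space
kernel = (x @ z) ** 2          # closed form: (x . z)^2

print(explicit, kernel)        # both 1.0
assert np.isclose(explicit, kernel)
```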

Finally: the kernel trick!
- Never represent features explicitly
  - Compute dot products in closed form
- Constant-time high-dimensional dot-products for many classes of features
- Very interesting theory - Reproducing Kernel Hilbert Spaces

Polynomial kernels
- All monomials of degree d in O(d) operations:
- How about all monomials of degree up to d?
  - Solution 0:
  - Better solution:
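
The formulas usually meant here (a sketch; reading "Solution 0" as a sum of the lower-degree kernels is an assumption, while the better trick adds a constant inside the power):

```latex
\text{Degree exactly } d:\quad K(\mathbf{x},\mathbf{z}) = (\mathbf{x}\cdot\mathbf{z})^d
\qquad
\text{Solution 0:}\quad K(\mathbf{x},\mathbf{z}) = \sum_{k=0}^{d} (\mathbf{x}\cdot\mathbf{z})^k
\qquad
\text{Better solution:}\quad K(\mathbf{x},\mathbf{z}) = (\mathbf{x}\cdot\mathbf{z} + 1)^d
```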

Common kernels
- Polynomials of degree d
- Polynomials of degree up to d
- Gaussian kernels
- Sigmoid
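
Their standard closed forms (a sketch of the usual parameterizations; the slide's exact constants and symbols are assumed):

```latex
\begin{aligned}
\text{Polynomial of degree } d:\quad & K(\mathbf{x},\mathbf{z}) = (\mathbf{x}\cdot\mathbf{z})^d \\
\text{Polynomial of degree up to } d:\quad & K(\mathbf{x},\mathbf{z}) = (\mathbf{x}\cdot\mathbf{z} + 1)^d \\
\text{Gaussian (RBF):}\quad & K(\mathbf{x},\mathbf{z}) = \exp\!\left(-\frac{\lVert \mathbf{x}-\mathbf{z}\rVert^2}{2\sigma^2}\right) \\
\text{Sigmoid:}\quad & K(\mathbf{x},\mathbf{z}) = \tanh(\eta\,\mathbf{x}\cdot\mathbf{z} + \nu)
\end{aligned}
```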

Overfitting?
- Huge feature space with kernels - what about overfitting???
- Maximizing margin leads to a sparse set of support vectors
- Some interesting theory says that SVMs search for a simple hypothesis with large margin
- Often robust to overfitting

What about at classification time?
- For a new input x, if we need to represent φ(x), we are in trouble!
- Recall the classifier: sign(w.φ(x) + b)
- Using kernels we are cool!

SVMs with kernels
- Choose a set of features and a kernel function
- Solve the dual problem to obtain the support vector weights α_i
- At classification time, compute the kernel expansion over the support vectors
- Classify as its sign (see the sketch below)
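
A minimal prediction sketch (assuming the dual weights alpha_i, labels y_i, support vectors, bias b, and a kernel K were already obtained from training; the names here are illustrative):

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two vectors."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def svm_classify(x, support_vectors, alphas, labels, bias, kernel=rbf_kernel):
    """sign( sum_i alpha_i * y_i * K(x_i, x) + b ) over the support vectors."""
    score = sum(a * y * kernel(sv, x)
                for a, y, sv in zip(alphas, labels, support_vectors))
    return np.sign(score + bias)
```

Only the support vectors (alpha_i > 0) appear in the sum, so the features φ(x) are never represented explicitly.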

What's the difference between SVMs and Logistic Regression?

                                          SVMs         Logistic Regression
  Loss function                           Hinge loss   Log-loss
  High-dimensional features with kernels  Yes!         No

Kernels in logistic regression
- Define the weights in terms of the support vectors:
- Derive a simple gradient descent rule on α_i
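
A sketch of the substitution usually intended here (some versions also fold in the labels y_i; the exact form on the slide is assumed): write the weights as a combination of the training points, w = sum_i α_i φ(x_i), so that

```latex
P(y = 1 \mid \mathbf{x}) \;=\; \frac{1}{1 + \exp\!\bigl(-\,\mathbf{w}\cdot\phi(\mathbf{x}) - b\bigr)}
\;=\; \frac{1}{1 + \exp\!\bigl(-\sum_i \alpha_i\, K(\mathbf{x}_i, \mathbf{x}) - b\bigr)}
```

and gradient descent can then be carried out on the α_i directly, using only kernel evaluations.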

What's the difference between SVMs and Logistic Regression? (Revisited)

                                          SVMs          Logistic Regression
  Loss function                           Hinge loss    Log-loss
  High-dimensional features with kernels  Yes!          Yes!
  Solution sparse                         Often yes!    Almost always no!
  Semantics of output                     Margin        Real probabilities