(chapters 1,2,3,4)
Introduction to Kernels
Max Welling
October 1 2004
Introduction
• What is the goal of (pick your favorite name):
- Machine Learning
- Data Mining
- Pattern Recognition
- Data Analysis
- Statistics
Automatic detection of non-coincidental structure in data.
• Desiderata:
- Robust algorithms insensitive to outliers and wrong
model assumptions.
- Stable algorithms: generalize well to unseen data.
- Computationally efficient algorithms: large datasets.
Let’s Learn Something
What is the common characteristic (structure) among the following
statistical methods?
1. Principal Components Analysis
2. Ridge regression
3. Fisher discriminant analysis
4. Canonical correlation analysis
Answer:
We consider linear combinations of the input vector: $f(x) = w^T x$.
Linear algorithms are very well understood and enjoy strong guarantees
(convexity, generalization bounds).
Can we carry these guarantees over to non-linear algorithms?
Feature Spaces
$\phi: x \mapsto \phi(x)$, $\quad \mathbb{R}^d \to F$
a non-linear mapping to F, where F can be:
1. a high-dimensional space
2. an infinite-dimensional countable space: $\ell_2$
3. a function space (Hilbert space)
example: $\phi(x, y) = (x^2,\; y^2,\; \sqrt{2}\,xy)$
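A quick numerical check (a Python/NumPy sketch; the two input vectors are arbitrary) that the explicit map above realizes a kernel: the feature-space inner product equals the squared input-space inner product, $\langle \phi(a), \phi(b) \rangle = \langle a, b \rangle^2$.

```python
import numpy as np

def phi(v):
    """Explicit quadratic feature map phi(x, y) = (x^2, y^2, sqrt(2)*x*y)."""
    x, y = v
    return np.array([x**2, y**2, np.sqrt(2) * x * y])

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

# Inner product in feature space ...
lhs = phi(a) @ phi(b)
# ... equals the squared inner product in input space: <a, b>^2
rhs = (a @ b) ** 2
print(lhs, rhs)   # both give 1.0
```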
Ridge Regression (duality)
problem:
$$\min_w \; \sum_{i=1}^{\ell} (y_i - w^T x_i)^2 + \lambda \|w\|^2$$
($y_i$: targets, $x_i$: inputs, $\lambda \|w\|^2$: regularization)
solution:
$$w = (X^T X + \lambda I_d)^{-1} X^T y \qquad (d \times d \text{ inverse})$$
$$\;\;\; = X^T (X X^T + \lambda I_\ell)^{-1} y \qquad (\ell \times \ell \text{ inverse})$$
$$\;\;\; = X^T (G + \lambda I_\ell)^{-1} y, \qquad G_{ij} = \langle x_i, x_j \rangle \quad (\text{Gram matrix})$$
$$\;\;\; = \sum_{i=1}^{\ell} \alpha_i x_i \qquad (\text{linear combination of the data: Dual Representation})$$
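A minimal Python/NumPy sketch (random data of assumed shape, $\lambda = 0.1$ chosen arbitrarily) verifying that the primal and dual solutions above coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
ell, d, lam = 50, 5, 0.1          # sample size, input dimension, regularization
X = rng.standard_normal((ell, d)) # rows are the inputs x_i
y = rng.standard_normal(ell)      # targets

# Primal solution: (d x d) inverse
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Dual solution: (ell x ell) inverse via the Gram matrix G = X X^T
G = X @ X.T
alpha = np.linalg.solve(G + lam * np.eye(ell), y)
w_dual = X.T @ alpha              # w = sum_i alpha_i x_i

print(np.allclose(w_primal, w_dual))  # True
```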
Kernel Trick
Note: In the dual representation we used the Gram matrix
to express the solution.
Kernel Trick: replace $x \to \phi(x)$; then
$$G_{ij} = \langle x_i, x_j \rangle \;\;\longrightarrow\;\; G_{ij} = \langle \phi(x_i), \phi(x_j) \rangle = K(x_i, x_j)$$
If we use algorithms that only depend on the Gram-matrix, G,
then we never have to know (compute) the actual features $\phi(x)$.
This is the crucial point of kernel methods.
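A sketch of this point in Python/NumPy (the RBF kernel, the toy data, and $\lambda$ are illustrative choices, not from the slides): dual ridge regression run entirely on kernel evaluations, so the features $\phi(x)$ are never computed.

```python
import numpy as np

def rbf_kernel(a, b, c=1.0):
    """k(x, y) = exp(-||x - y||^2 / c); its feature space is infinite-dimensional."""
    return np.exp(-np.sum((a - b) ** 2) / c)

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
lam = 0.1

# Gram matrix K_ij = k(x_i, x_j): the only object the algorithm touches
K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Prediction at a new point also uses only kernel evaluations: f(x) = sum_i alpha_i k(x_i, x)
x_new = np.array([0.5, -0.3])
f_new = alpha @ np.array([rbf_kernel(xi, x_new) for xi in X])
print(f_new)
```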
Modularity
Kernel methods consist of two modules:
1) The choice of kernel (this is non-trivial)
2) The algorithm which takes kernels as input
Modularity: Any kernel can be used with any kernel-algorithm.
some kernels:
- $k(x, y) = e^{-\|x - y\|^2 / c}$
- $k(x, y) = \langle x, y \rangle^d$
- $k(x, y) = \tanh(\langle x, y \rangle)$
- $k(x, y) = \dfrac{1}{\|x - y\|^2 + c^2}$
some kernel algorithms:
- support vector machine
- Fisher discriminant analysis
- kernel regression
- kernel PCA
- kernel CCA
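A small Python/NumPy illustration of this modularity (parameter values are arbitrary): each kernel above is just a function of two vectors, and any kernel algorithm only ever asks for the Gram matrix it induces.

```python
import numpy as np

# Kernels as interchangeable functions of two vectors
kernels = {
    "rbf":        lambda x, y, c=1.0: np.exp(-np.sum((x - y) ** 2) / c),
    "polynomial": lambda x, y, d=2:   (x @ y) ** d,
    "sigmoid":    lambda x, y:        np.tanh(x @ y),
    "rational":   lambda x, y, c=1.0: 1.0 / (np.sum((x - y) ** 2) + c ** 2),
}

def gram_matrix(k, X):
    """Any kernel algorithm only needs this matrix, whatever k is."""
    return np.array([[k(xi, xj) for xj in X] for xi in X])

X = np.random.default_rng(2).standard_normal((5, 3))
for name, k in kernels.items():
    print(name, gram_matrix(k, X).shape)
```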
What is a proper kernel?
Definition: A finitely positive semi-definite function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$
is a symmetric function of its arguments for which the matrices formed
by restriction to any finite subset of points are positive semi-definite:
$$\alpha^T K \alpha \ge 0 \quad \forall \alpha$$
Theorem: A function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ can be written
as $k(x, y) = \langle \phi(x), \phi(y) \rangle$, where $\phi$ is a feature map
$\phi: x \mapsto \phi(x) \in F$, iff $k(x, y)$ satisfies the finitely positive semi-definiteness property.
Relevance: We can now check whether $k(x, y)$ is a proper kernel using
only properties of $k(x, y)$ itself, i.e. without the need to know the feature map!
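A Python/NumPy sketch of this check on a finite subset of points (the points are random and the similarity functions are illustrative): the Gram matrix of a proper kernel has no negative eigenvalues, while a non-kernel similarity fails the test.

```python
import numpy as np

def gram(k, X):
    return np.array([[k(xi, xj) for xj in X] for xi in X])

def is_psd(K, tol=1e-10):
    """A symmetric matrix is PSD iff all eigenvalues are >= 0 (up to tolerance)."""
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

X = np.random.default_rng(3).standard_normal((20, 4))

rbf = lambda x, y: np.exp(-np.sum((x - y) ** 2))
print(is_psd(gram(rbf, X)))                       # True: proper kernel

not_kernel = lambda x, y: -np.sum((x - y) ** 2)   # negative squared distance
print(is_psd(gram(not_kernel, X)))                # False: not a proper kernel
```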
Reproducing Kernel Hilbert Spaces
The proof of the above theorem proceeds by constructing a very
special feature map (note that more than one feature map may give rise to the same kernel):
$$\phi: x \mapsto \phi(x) = k(x, \cdot)$$
i.e. we map to a function space.
definition of the function space:
$$f(\cdot) = \sum_{i=1}^{m} \alpha_i k(x_i, \cdot) \qquad \text{(any } m, \{x_i\})$$
$$\langle f, g \rangle = \sum_{i=1}^{m} \sum_{j=1}^{m'} \alpha_i \beta_j k(x_i, x_j)$$
$$\langle f, f \rangle = \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j k(x_i, x_j) \ge 0 \quad (\text{finitely positive semi-definite})$$
reproducing property:
$$\langle f, \phi(x) \rangle = \langle f, k(x, \cdot) \rangle = \sum_{i=1}^{m} \alpha_i \langle k(x_i, \cdot), k(x, \cdot) \rangle = \sum_{i=1}^{m} \alpha_i k(x_i, x) = f(x)$$
in particular: $\langle \phi(x), \phi(y) \rangle = k(x, y)$
Mercer’s Theorem
Theorem: X is compact, $k(x, y)$ is a symmetric continuous function s.t.
$$T_k f = \int k(\cdot, x)\, f(x)\, dx$$
is a positive semi-definite operator: $T_k \succeq 0$, i.e.
$$\int k(x, y)\, f(x)\, f(y)\, dx\, dy \ge 0 \qquad \forall f \in L_2(X)$$
Then there exists an orthonormal feature basis of eigen-functions
such that:
$$k(x, y) = \sum_{i=1}^{\infty} \phi_i(x)\, \phi_i(y)$$
Hence: $k(x, y)$ is a proper kernel.
Note: Here we construct feature vectors in $L_2$, whereas the RKHS
construction was in a function space.
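A finite-sample analogue in Python/NumPy (illustrative RBF kernel and random points): eigendecomposing the Gram matrix gives non-negative eigenvalues, and the eigenvectors scaled by the square roots of the eigenvalues act as feature vectors whose inner products reconstruct the kernel values, mirroring the expansion above.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((25, 3))
k = lambda x, y: np.exp(-np.sum((x - y) ** 2))
K = np.array([[k(xi, xj) for xj in X] for xi in X])

# Eigendecomposition of the (symmetric PSD) Gram matrix
lam, V = np.linalg.eigh(K)
print(lam.min() >= -1e-10)                 # True: no negative eigenvalues

# Finite-dimensional "Mercer features": column j of V scaled by sqrt(lambda_j)
Phi = V * np.sqrt(np.clip(lam, 0, None))   # row i is the feature vector of x_i
print(np.allclose(Phi @ Phi.T, K))         # True: <phi(x_i), phi(x_j)> = K_ij
```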
Learning Kernels
• All information is tunneled through the Gram-matrix information
bottleneck.
• The real art is to pick an appropriate kernel.
e.g. take the RBF kernel: $k(x, y) = e^{-\|x - y\|^2 / c}$
if c is very small: G=I (all data are dissimilar): over-fitting
if c is very large: G=1 (all data are very similar): under-fitting
We need to learn the kernel. Here are some ways to combine
kernels to improve them:
$$k(x, y) = \alpha\, k_1(x, y) + \beta\, k_2(x, y), \qquad \alpha, \beta \ge 0$$
$$k(x, y) = k_1(x, y)\, k_2(x, y)$$
$$k(x, y) = k_1(\phi(x), \phi(y))$$
[figure: the proper kernels form a cone, with $k_1$ and $k_2$ marked inside it;
any positive polynomial combination of kernels is again a kernel]
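A quick Python/NumPy illustration (arbitrary data and bandwidths) of the two RBF regimes, plus a check that a conic combination and a product of two Gram matrices remain positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((10, 3))
sqdist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

for c in (1e-3, 1.0, 1e3):
    G = np.exp(-sqdist / c)
    # c small -> G close to the identity (over-fitting);
    # c large -> G close to the all-ones matrix (under-fitting)
    print(c, np.round(G.mean(), 3))

K1 = np.exp(-sqdist / 1.0)
K2 = (X @ X.T + 1.0) ** 2                        # polynomial-kernel Gram matrix
for K in (0.5 * K1 + 2.0 * K2, K1 * K2):         # conic combination, elementwise product
    print(np.linalg.eigvalsh(K).min() >= -1e-8)  # True: still PSD
```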
Stability of Kernel Algorithms
Our objective for learning is to improve generalization performance:
cross-validation, Bayesian methods, generalization bounds, ...
Call $\hat{E}_S[f(x)] = 0$ a pattern in a sample S.
Is this pattern also likely to be present in new data: $E_P[f(x)] \approx 0$?
We can use concentration inequalities (McDiarmid's theorem)
to prove that:
Theorem: Let $S = \{x_1, \ldots, x_\ell\}$ be an IID sample from P and define
the sample mean of $f(x)$ as $\hat{f} = \frac{1}{\ell}\sum_{i=1}^{\ell} f(x_i)$; then it follows that:
$$P\!\left( \big\|\hat{f} - E_P[f]\big\| \le \frac{R}{\sqrt{\ell}}\Big(2 + \sqrt{2 \ln \tfrac{1}{\delta}}\Big) \right) \ge 1 - \delta, \qquad R = \sup_x \|f(x)\|$$
(the probability that the sample mean and the population mean differ by less than this amount
is more than $1 - \delta$, independent of P!)
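A small simulation in Python/NumPy (the distribution, the bounded function f, $\ell$ and $\delta$ are all assumed) illustrating the statement: over repeated samples, the deviation between sample mean and population mean stays below the $\frac{R}{\sqrt{\ell}}(2 + \sqrt{2\ln(1/\delta)})$ level far more often than the guaranteed $1 - \delta$.

```python
import numpy as np

rng = np.random.default_rng(6)
ell, delta, trials = 200, 0.05, 2000
f = lambda x: np.clip(x, -1.0, 1.0)     # bounded function, so R = sup |f(x)| = 1
R = 1.0
bound = (R / np.sqrt(ell)) * (2 + np.sqrt(2 * np.log(1 / delta)))

pop_mean = 0.0                          # f is odd and the distribution is symmetric around 0
deviations = np.array([
    abs(f(rng.standard_normal(ell)).mean() - pop_mean) for _ in range(trials)
])
print(bound, np.mean(deviations <= bound))   # empirical frequency well above 1 - delta
```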
Rademacher Complexity
Problem: we only checked the generalization performance for a
single fixed pattern f(x).
What if we want to search over a function class F?
Intuition: we need to incorporate the complexity of this function class.
Rademacher complexity captures the ability of the function class to
fit random noise ($\sigma_i = \pm 1$, uniformly distributed).
[figure: random $\pm 1$ labels fitted by two functions $f_1$, $f_2$ from the class]
empirical RC:
$$\hat{R}_\ell(F) = E_\sigma\!\left[\, \sup_{f \in F} \Big| \frac{2}{\ell} \sum_{i=1}^{\ell} \sigma_i f(x_i) \Big| \;\Big|\; x_1, \ldots, x_\ell \right]$$
$$R_\ell(F) = E_S E_\sigma\!\left[\, \sup_{f \in F} \Big| \frac{2}{\ell} \sum_{i=1}^{\ell} \sigma_i f(x_i) \Big| \right]$$
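A Monte Carlo sketch in Python/NumPy of the empirical Rademacher complexity, following the definition above directly; the function class (a handful of fixed threshold functions on 1-D data) and the sample are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
ell = 100
x = rng.uniform(-1, 1, ell)

# A small function class F: threshold functions f_t(x) = sign(x - t)
thresholds = np.linspace(-1, 1, 21)
F = np.sign(x[None, :] - thresholds[:, None])   # shape (|F|, ell): f(x_i) for each f

def empirical_rc(F_values, n_draws=5000):
    """E_sigma[ sup_f | (2/ell) sum_i sigma_i f(x_i) | ] by Monte Carlo over sigma."""
    ell = F_values.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, ell))
    sums = np.abs(sigma @ F_values.T) * (2.0 / ell)   # shape (n_draws, |F|)
    return sums.max(axis=1).mean()

print(empirical_rc(F))   # a small value, shrinking roughly like 1/sqrt(ell) as ell grows
```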
Generalization Bound
Theorem: Let f be a function in F which maps to [0,1] (e.g. loss functions).
Then, with probability at least $1 - \delta$ over random draws of samples of size $\ell$,
every f satisfies:
$$E_P[f(x)] \;\le\; \hat{E}_{\mathrm{data}}[f(x)] + R_\ell(F) + \sqrt{\frac{\ln(2/\delta)}{2\ell}}$$
$$\phantom{E_P[f(x)]} \;\le\; \hat{E}_{\mathrm{data}}[f(x)] + \hat{R}_\ell(F) + 3\sqrt{\frac{\ln(2/\delta)}{2\ell}}$$
Relevance: The expected pattern $E[f] = 0$ will also be present in a new
data set, if the last two terms are small:
- complexity of the function class F small
- number of training data $\ell$ large
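A trivial numerical evaluation in Python of the second bound, with made-up values for the empirical mean, the empirical RC, $\ell$ and $\delta$, just to show the scale of the three terms:

```python
import numpy as np

emp_mean, rc_hat, ell, delta = 0.05, 0.10, 1000, 0.05   # assumed values
slack = 3 * np.sqrt(np.log(2 / delta) / (2 * ell))
print(emp_mean + rc_hat + slack)   # upper bound on E_P[f(x)], here roughly 0.28
```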
Linear Functions (in feature space)
Consider the function class:
$$F_B = \{\, f: x \mapsto \langle w, \phi(x) \rangle \;:\; \|w\| \le B \,\}, \qquad k(x, y) = \langle \phi(x), \phi(y) \rangle$$
and a sample: $S = \{x_1, \ldots, x_\ell\}$.
Then, the empirical RC of $F_B$ is bounded by:
$$\hat{R}_\ell(F_B) \le \frac{2B}{\ell}\sqrt{\mathrm{tr}(K)}$$
Relevance: Since $\{\, x \mapsto \sum_{i=1}^{\ell} \alpha_i k(x_i, x) \;:\; \alpha^T K \alpha \le B^2 \,\} \subseteq F_B$, it follows that
if we control the norm $\alpha^T K \alpha = \|w\|^2$ in kernel algorithms, we control
the complexity of the function class (regularization).
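A Python/NumPy sketch (RBF kernel, random data, B = 1, all assumed) comparing a Monte Carlo estimate of the empirical Rademacher complexity of this kernel-linear class with the $\frac{2B}{\ell}\sqrt{\mathrm{tr}(K)}$ bound; for this class the sup over f has a closed form, which the code uses.

```python
import numpy as np

rng = np.random.default_rng(8)
ell, B = 100, 1.0
X = rng.standard_normal((ell, 2))
sqdist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sqdist)                               # RBF Gram matrix

# For F_B the sup over f is attained in closed form:
#   sup_{||w|| <= B} |(2/ell) sum_i sigma_i <w, phi(x_i)>| = (2B/ell) sqrt(sigma^T K sigma)
sigma = rng.choice([-1.0, 1.0], size=(5000, ell))
rc_mc = (2 * B / ell) * np.mean(np.sqrt(np.einsum("ni,ij,nj->n", sigma, K, sigma)))

bound = (2 * B / ell) * np.sqrt(np.trace(K))
print(rc_mc, bound)                               # the estimate sits below the bound
```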
Margin Bound (classification)
Theorem: Choose $c > 0$ (the margin).
F: $f(x, y) = -y\, g(x)$, $\;y \in \{+1, -1\}$
S: $\{(x_1, y_1), \ldots, (x_\ell, y_\ell)\}$, an IID sample
$\delta \in (0, 1)$: probability of violating the bound.
$$P_P[\, y \ne \mathrm{sign}(g(x)) \,] \;\le\; \frac{1}{\ell c} \sum_{i=1}^{\ell} \xi_i \;+\; \frac{4}{\ell c}\sqrt{\mathrm{tr}(K)} \;+\; 3\sqrt{\frac{\ln(2/\delta)}{2\ell}}$$
(probability of misclassification)
$$\xi_i = \big(c - y_i\, g(x_i)\big)_+ \qquad (\text{slack variable})$$
$$(f)_+ = f \text{ if } f \ge 0, \text{ and } 0 \text{ otherwise}$$
Relevance: We can bound our classification error on new samples. Moreover, we have a
strategy to improve generalization: choose the margin c as large as possible such
that all samples are correctly classified: $\xi_i = 0$ (e.g. support vector machines).
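A Python/NumPy sketch (toy labels, a toy scoring function g, a linear-kernel Gram matrix, and assumed c and $\delta$) that simply evaluates the three terms of the bound for a given classifier:

```python
import numpy as np

rng = np.random.default_rng(9)
ell, c, delta = 200, 0.5, 0.05
X = rng.standard_normal((ell, 2))
y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(ell))   # toy labels
g = lambda Z: Z[:, 0]                                    # toy real-valued classifier
K = X @ X.T                                              # linear-kernel Gram matrix

xi = np.maximum(0.0, c - y * g(X))                       # slack variables (c - y_i g(x_i))_+
bound = (xi.sum() / (ell * c)
         + 4 * np.sqrt(np.trace(K)) / (ell * c)
         + 3 * np.sqrt(np.log(2 / delta) / (2 * ell)))
print(bound)   # upper bound on misclassification probability (may exceed 1 for a loose setup)
```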