Foundations of Machine Learning
Kernel Methods
Mehryar Mohri
Courant Institute and Google Research
[email protected]
Motivation
Efficient computation of inner products in high
dimension.
Non-linear decision boundary.
Non-vectorial inputs.
Flexible selection of more complex features.
This Lecture
Kernels
Kernel-based algorithms
Closure properties
Sequence Kernels
Negative kernels
Non-Linear Separation
Linear separation impossible in most problems.
Non-linear mapping from input space to high-dimensional feature space: Φ: X → F.
Generalization ability: independent of dim(F ),
depends only on margin and sample size.
Kernel Methods
Idea:
• Define K: X × X → R, called kernel, such that:
Φ(x) · Φ(y) = K(x, y).
• K often interpreted as a similarity measure.
Benefits:
• Efficiency: K is often more efficient to compute than Φ and the dot product.
• Flexibility: K can be chosen arbitrarily so long as the existence of Φ is guaranteed (PDS condition or Mercer's condition).
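To illustrate the efficiency point, here is a minimal sketch (not from the original slides; names and parameters are mine) comparing the cost of evaluating a polynomial kernel directly with the size of the explicit feature space it implicitly corresponds to:

```python
import numpy as np
from math import comb

def poly_kernel(x, y, c=1.0, d=4):
    """Degree-d polynomial kernel K(x, y) = (x . y + c)^d, computed in O(N) time."""
    return (np.dot(x, y) + c) ** d

rng = np.random.default_rng(0)
N, d = 1000, 4
x, y = rng.normal(size=N), rng.normal(size=N)

print(poly_kernel(x, y, d=d))  # one inner product in R^N
# Dimension of the explicit feature space (all monomials of degree <= d):
print(comb(N + d, d))          # ~4.2e10 coordinates -- far too many to enumerate
```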
PDS Condition
Definition: a kernel K: X × X → R is positive definite symmetric (PDS) if for any {x_1, ..., x_m} ⊆ X, the matrix K = [K(x_i, x_j)]_{ij} ∈ R^{m×m} is symmetric positive semi-definite (SPSD).
K SPSD if symmetric and one of the 2 equiv. cond.'s:
• its eigenvalues are non-negative.
• for any c ∈ R^{m×1},
c⊤Kc = Σ_{i,j=1}^m c_i c_j K(x_i, x_j) ≥ 0.
Terminology: PDS for kernels, SPSD for kernel
matrices (see (Berg et al., 1984)).
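The SPSD condition can be checked numerically on a sample; the following sketch (mine, using the Gaussian kernel introduced later in the lecture) builds a kernel matrix and verifies symmetry and non-negative eigenvalues:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel, a standard PDS kernel."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # sample {x_1, ..., x_m}
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

eigvals = np.linalg.eigvalsh(K)                       # K is symmetric
print(np.allclose(K, K.T), eigvals.min() >= -1e-10)   # symmetric and PSD up to rounding
```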
Example - Polynomial Kernels
Definition:
∀x, y ∈ R^N, K(x, y) = (x·y + c)^d, c > 0.
Example: for N = 2 and d = 2,
K(x, y) = (x_1 y_1 + x_2 y_2 + c)²
= (x_1², x_2², √2 x_1 x_2, √(2c) x_1, √(2c) x_2, c) · (y_1², y_2², √2 y_1 y_2, √(2c) y_1, √(2c) y_2, c).
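The expansion above can be verified numerically; the sketch below (mine; c is left as a parameter) builds the explicit six-dimensional feature map for N = 2, d = 2 and checks that its inner product matches the kernel:

```python
import numpy as np

def phi(x, c):
    """Explicit feature map for (x . y + c)^2 with N = 2."""
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1,
                     np.sqrt(2 * c) * x2,
                     c])

c = 1.0
x, y = np.array([0.3, -1.2]), np.array([2.0, 0.5])
assert np.isclose(phi(x, c) @ phi(y, c), (np.dot(x, y) + c) ** 2)
```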
XOR Problem
Use the second-degree polynomial kernel with c = 1:
[Figure: left panel, the four XOR points (±1, ±1) in the input space (x_1, x_2); right panel, their images (1, 1, ±√2, ±√2, ±√2, 1) in the feature space, plotted along the coordinates (x_1, √2 x_1 x_2).]
Linearly non-separable in the input space; linearly separable in the feature space by x_1 x_2 = 0.
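For a concrete check of this slide, a minimal sketch (mine, using scikit-learn rather than anything from the lecture): an SVM with the kernel (x·y + 1)² classifies all four XOR points correctly.

```python
import numpy as np
from sklearn.svm import SVC

# XOR data: the label is the sign of x1 * x2.
X = np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]], dtype=float)
y = np.array([1, 1, -1, -1])

# (x . y + 1)^2 corresponds to degree=2, gamma=1, coef0=1 in scikit-learn's parametrization.
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=10.0).fit(X, y)
print(clf.predict(X))   # expected: [ 1  1 -1 -1]
```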
Normalized Kernels
Definition: the normalized kernel K' associated to a kernel K is defined by
∀x, x' ∈ X, K'(x, x') = 0 if K(x, x) = 0 or K(x', x') = 0, and
K'(x, x') = K(x, x') / √(K(x, x) K(x', x')) otherwise.
• If K is PDS, then K' is PDS:
Σ_{i,j=1}^m c_i c_j K(x_i, x_j) / √(K(x_i, x_i) K(x_j, x_j)) = Σ_{i,j=1}^m c_i c_j ⟨Φ(x_i), Φ(x_j)⟩ / (‖Φ(x_i)‖_H ‖Φ(x_j)‖_H)
= ‖ Σ_{i=1}^m [c_i / ‖Φ(x_i)‖_H] Φ(x_i) ‖²_H ≥ 0.
• By definition, for all x with K(x, x) ≠ 0, K'(x, x) = 1.
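On a kernel matrix, this normalization is a simple element-wise rescaling; a minimal sketch (mine, assuming no zero diagonal entries):

```python
import numpy as np

def normalize_gram(K):
    """K'(x, x') = K(x, x') / sqrt(K(x, x) K(x', x')); assumes a strictly positive diagonal."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

K = np.array([[4.0, 2.0],
              [2.0, 9.0]])
print(normalize_gram(K))   # unit diagonal; off-diagonal entry 2 / (2 * 3) = 1/3
```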
Other Standard PDS Kernels
Gaussian kernels:
K(x, y) = exp(−‖x − y‖² / (2σ²)), σ ≠ 0.
• Normalized kernel of (x, x') ↦ exp(x·x' / σ²).
Sigmoid kernels:
K(x, y) = tanh(a(x·y) + b), a, b ≥ 0.
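A vectorized way to build the Gaussian kernel matrix of a sample from pairwise squared distances (my own sketch); note that its diagonal is identically 1, consistent with the Gaussian kernel being a normalized kernel:

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(5, 3))
K = gaussian_gram(X, sigma=2.0)
print(np.diag(K))   # all ones
```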
Reproducing Kernel Hilbert Space
(Aronszajn, 1950)
Theorem: let K: X × X → R be a PDS kernel. Then, there exists a Hilbert space H and a mapping Φ from X to H such that
∀x, y ∈ X, K(x, y) = Φ(x) · Φ(y).
Proof: for any x ∈ X, define Φ(x) ∈ R^X as follows:
∀y ∈ X, Φ(x)(y) = K(x, y).
• Let H_0 = {Σ_{i∈I} a_i Φ(x_i) : a_i ∈ R, x_i ∈ X, card(I) < ∞}.
• We are going to define an inner product ⟨·,·⟩ on H_0.
• Definition: for any f = Σ_{i∈I} a_i Φ(x_i), g = Σ_{j∈J} b_j Φ(y_j),
⟨f, g⟩ = Σ_{i∈I, j∈J} a_i b_j K(x_i, y_j) = Σ_{j∈J} b_j f(y_j) = Σ_{i∈I} a_i g(x_i).
• ⟨·,·⟩ does not depend on representations of f and g.
• ⟨·,·⟩ is bilinear and symmetric.
• ⟨·,·⟩ is positive semi-definite since K is PDS: for any f,
⟨f, f⟩ = Σ_{i,j∈I} a_i a_j K(x_i, x_j) ≥ 0.
• note: for any f_1, ..., f_m and c_1, ..., c_m,
Σ_{i,j=1}^m c_i c_j ⟨f_i, f_j⟩ = ⟨Σ_{i=1}^m c_i f_i, Σ_{j=1}^m c_j f_j⟩ ≥ 0.
⟨·,·⟩ is a PDS kernel on H_0.
• ⟨·,·⟩ is definite:
• first, Cauchy-Schwarz inequality for PDS kernels: if K is PDS, then for all x, y ∈ X the matrix
M = [K(x, x)  K(x, y); K(y, x)  K(y, y)]
is SPSD. In particular, the product of its eigenvalues, det(M), is non-negative:
det(M) = K(x, x) K(y, y) − K(x, y)² ≥ 0.
• since ⟨·,·⟩ is a PDS kernel, for any f ∈ H_0 and x ∈ X,
⟨f, Φ(x)⟩² ≤ ⟨f, f⟩ ⟨Φ(x), Φ(x)⟩.
• observe the reproducing property of ⟨·,·⟩:
∀f ∈ H_0, ∀x ∈ X, f(x) = Σ_{i∈I} a_i K(x_i, x) = ⟨f, Φ(x)⟩.
• thus, [f(x)]² ≤ ⟨f, f⟩ K(x, x) for all x ∈ X, which shows the definiteness of ⟨·,·⟩.
• Thus, ⟨·,·⟩ defines an inner product on H_0, which thereby becomes a pre-Hilbert space.
• H_0 can be completed to form a Hilbert space H in which it is dense.
Notes:
• H is called the reproducing kernel Hilbert space
(RKHS) associated to K.
• A Hilbert space such that there exists Φ: X → H with K(x, y) = Φ(x)·Φ(y) for all x, y ∈ X is also called a feature space associated to K. Φ is called a feature mapping.
• Feature spaces associated to K are in general not
unique.
This Lecture
Kernels
Kernel-based algorithms
Closure properties
Sequence Kernels
Negative kernels
SVMs with PDS Kernels
(Boser, Guyon, and Vapnik, 1992)
Constrained optimization:
max_α Σ_{i=1}^m α_i − (1/2) Σ_{i,j=1}^m α_i α_j y_i y_j K(x_i, x_j)    (with K(x_i, x_j) = Φ(x_i)·Φ(x_j))
subject to: 0 ≤ α_i ≤ C ∧ Σ_{i=1}^m α_i y_i = 0, i ∈ [1, m].
Solution:
h(x) = sgn(Σ_{i=1}^m α_i y_i K(x_i, x) + b),
with b = y_i − Σ_{j=1}^m α_j y_j K(x_j, x_i) for any x_i with 0 < α_i < C.
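The dual solution can be used directly with a precomputed Gram matrix; the following sketch (mine, using scikit-learn's "precomputed" kernel option, where dual_coef_ holds α_i y_i for the support vectors and intercept_ holds b) rebuilds the decision function h from the dual coefficients:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)     # XOR-like labels

K = (X @ X.T + 1.0) ** 2                        # degree-2 polynomial Gram matrix
clf = SVC(kernel="precomputed", C=1.0).fit(K, y)

def h(x):
    """h(x) = sgn(sum_i alpha_i y_i K(x_i, x) + b), summing over support vectors only."""
    k = (X[clf.support_] @ x + 1.0) ** 2
    return np.sign(clf.dual_coef_[0] @ k + clf.intercept_[0])

print(np.mean([h(x) == yi for x, yi in zip(X, y)]))   # training accuracy, close to 1
```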
Rad. Complexity of Kernel-Based Hypotheses
Theorem: let K: X × X → R be a PDS kernel and let Φ: X → H be a feature mapping associated to K. Let S ⊆ {x : K(x, x) ≤ R²} be a sample of size m, and let H = {x ↦ w·Φ(x) : ‖w‖_H ≤ Λ}. Then,
R̂_S(H) ≤ Λ √(Tr[K]) / m ≤ √(R² Λ² / m).
Proof:
R̂_S(H) = (1/m) E_σ[ sup_{‖w‖≤Λ} w · Σ_{i=1}^m σ_i Φ(x_i) ] ≤ (Λ/m) E_σ[ ‖Σ_{i=1}^m σ_i Φ(x_i)‖_H ]
≤ (Λ/m) [ E_σ ‖Σ_{i=1}^m σ_i Φ(x_i)‖²_H ]^{1/2}    (Jensen's ineq.)
= (Λ/m) [ Σ_{i=1}^m ‖Φ(x_i)‖²_H ]^{1/2} = (Λ/m) [ Σ_{i=1}^m K(x_i, x_i) ]^{1/2}
= Λ √(Tr[K]) / m ≤ √(R² Λ² / m).
Generalization: Representer Theorem
(Kimeldorf and Wahba, 1971)
Theorem: let K: X × X → R be a PDS kernel with H the corresponding RKHS. Then, for any non-decreasing function G: R → R and any L: R^m → R ∪ {+∞}, the problem
argmin_{h∈H} F(h) = argmin_{h∈H} G(‖h‖_H) + L(h(x_1), ..., h(x_m))
admits a solution of the form h* = Σ_{i=1}^m α_i K(x_i, ·).
If G is further assumed to be increasing, then any solution has this form.
• Proof: let H_1 = span({K(x_i, ·) : i ∈ [1, m]}). Any h ∈ H admits the decomposition h = h_1 + h^⊥ according to H = H_1 ⊕ H_1^⊥.
• Since G is non-decreasing,
G(‖h_1‖_H) ≤ G(√(‖h_1‖²_H + ‖h^⊥‖²_H)) = G(‖h‖_H).
• By the reproducing property, for all i ∈ [1, m],
h(x_i) = ⟨h, K(x_i, ·)⟩ = ⟨h_1, K(x_i, ·)⟩ = h_1(x_i).
• Thus, L(h(x_1), ..., h(x_m)) = L(h_1(x_1), ..., h_1(x_m)) and F(h_1) ≤ F(h).
• If G is increasing, then F(h_1) < F(h) when ‖h^⊥‖_H ≠ 0, and any solution of the optimization problem must be in H_1.
Kernel-Based Algorithms
PDS kernels used to extend a variety of algorithms
in classification and other areas:
• regression.
• ranking.
• dimensionality reduction.
• clustering.
But, how do we define PDS kernels?
This Lecture
Kernels
Kernel-based algorithms
Closure properties
Sequence Kernels
Negative kernels
Closure Properties of PDS Kernels
Theorem: Positive definite symmetric (PDS)
kernels are closed under:
• sum,
• product,
• tensor product,
• pointwise limit,
• composition with a power series with non-
negative coefficients.
Closure Properties - Proof
Proof: closure under sum:
c⊤Kc ≥ 0 ∧ c⊤K'c ≥ 0 ⟹ c⊤(K + K')c ≥ 0.
• closure under product: with K = MM⊤,
Σ_{i,j=1}^m c_i c_j (K_{ij} K'_{ij}) = Σ_{i,j=1}^m c_i c_j [Σ_{k=1}^m M_{ik} M_{jk}] K'_{ij}
= Σ_{k=1}^m Σ_{i,j=1}^m (c_i M_{ik}) (c_j M_{jk}) K'_{ij}
= Σ_{k=1}^m [c_1 M_{1k}, ..., c_m M_{mk}] K' [c_1 M_{1k}, ..., c_m M_{mk}]⊤ ≥ 0.
• Closure under tensor product:
• definition: for all x_1, x_2, y_1, y_2 ∈ X,
(K_1 ⊗ K_2)(x_1, y_1, x_2, y_2) = K_1(x_1, x_2) K_2(y_1, y_2).
• thus, it is a PDS kernel as the product of the two PDS kernels
(x_1, y_1, x_2, y_2) ↦ K_1(x_1, x_2) and (x_1, y_1, x_2, y_2) ↦ K_2(y_1, y_2).
• Closure under pointwise limit: if for all x, y ∈ X,
lim_{n→∞} K_n(x, y) = K(x, y),
then (∀n, c⊤K_n c ≥ 0) ⟹ lim_{n→∞} c⊤K_n c = c⊤Kc ≥ 0.
• Closure under composition with power series:
• assumptions: K is a PDS kernel with |K(x, y)| < ρ for all x, y ∈ X, and f(x) = Σ_{n=0}^∞ a_n x^n, a_n ≥ 0, is a power series with radius of convergence ρ.
• f ∘ K is a PDS kernel since K^n is PDS by closure under product, Σ_{n=0}^N a_n K^n is PDS by closure under sum, and f ∘ K is their pointwise limit.
Example: for any PDS kernel K, exp(K) is PDS.
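A numerical illustration of these closure properties (my own check, not part of the slides): starting from a PDS Gram matrix, its element-wise square and its element-wise exponential remain positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 4))
K = X @ X.T                                 # linear-kernel Gram matrix: PDS by construction

def min_eig(M):
    return np.linalg.eigvalsh(M).min()

print(min_eig(K) >= -1e-6)                  # K is PSD
print(min_eig(K * K) >= -1e-6)              # closure under (element-wise) product
print(min_eig(np.exp(K)) >= -1e-6)          # exp(K): composition with a power series
```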
This Lecture
Kernels
Kernel-based algorithms
Closure properties
Sequence Kernels
Negative kernels
Sequence Kernels
Definition: Kernels defined over pairs of strings.
• Motivation: computational biology, text and
speech classification.
• Idea: two sequences are related when they share
some common substrings or subsequences.
• Example: bigram kernel;
K(x, y) = Σ_{u bigram} count_x(u) × count_y(u).
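A direct implementation of this bigram kernel (my own sketch; the transducer construction below computes the same quantity):

```python
from collections import Counter

def bigram_counts(s):
    """Multiset of contiguous bigrams of s."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def bigram_kernel(x, y):
    """K(x, y) = sum over bigrams u of count_x(u) * count_y(u)."""
    cx, cy = bigram_counts(x), bigram_counts(y)
    return sum(cx[u] * cy[u] for u in cx)

print(bigram_kernel("abab", "aabb"))   # shared bigram 'ab': 2 occurrences * 1 occurrence = 2
```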
Weighted Transducers
[Figure: a weighted transducer with states 0, 1, 2, 3 (final weight 0.1) and transitions labeled input:output/weight, such as a:b/0.1, b:a/0.2, a:a/0.4, a:b/0.5, b:a/0.3, b:a/0.6.]
T(x, y) = sum of the weights of all accepting paths with input x and output y.
Example: T(abb, baa) = 0.1 × 0.2 × 0.3 × 0.1 + 0.5 × 0.3 × 0.6 × 0.1.
Rational Kernels over Strings
(Cortes et al., 2004)
Definition: a kernel K: Σ* × Σ* → R is rational if K = T for some weighted transducer T.
Definition: let T_1: Σ* × Σ* → R and T_2: Σ* × Σ* → R be two weighted transducers. Then, the composition of T_1 and T_2 is defined for all x, y ∈ Σ* by
(T_1 ∘ T_2)(x, y) = Σ_z T_1(x, z) T_2(z, y).
Definition: the inverse of a transducer T: Σ* × Σ* → R is the transducer T⁻¹: Σ* × Σ* → R obtained from T by swapping input and output labels.
PDS Rational Kernels
General Construction
Theorem: for any weighted transducer T: Σ* × Σ* → R, the function K = T ∘ T⁻¹ is a PDS rational kernel.
Proof: by definition, for all x, y ∈ Σ*,
K(x, y) = Σ_z T(x, z) T(y, z).
• K is the pointwise limit of (K_n)_{n≥0} defined by
∀x, y ∈ Σ*, K_n(x, y) = Σ_{|z|≤n} T(x, z) T(y, z).
• K_n is PDS since for any sample (x_1, ..., x_m),
K_n = AA⊤ with A = (T(x_i, z_j))_{i∈[1,m], j∈[1,N]}, where z_1, ..., z_N are the strings of length at most n.
PDS Sequence Kernels
PDS sequence kernels in computational biology, text classification, and other applications:
• special instances of PDS rational kernels.
• PDS rational kernels easy to define and modify.
• single general algorithm for their computation:
composition + shortest-distance computation.
• no need for a specific ‘dynamic-programming’
algorithm and proof for each kernel instance.
• general sub-family: based on counting
transducers.
Counting Transducers
[Figure: counting transducer T_X with states 0 and 1 (final weight 1), self-loops a:ε/1 and b:ε/1 on both states, and a transition X:X/1 from 0 to 1. Example: X = ab, Z = bbabaabba; the two accepting alignments are εεabεεεεε and εεεεεabεε.]
X may be a string or an automaton representing a regular expression.
Count of X in Z: sum of the weights of the accepting paths of Z ∘ T_X.
Transducer Counting Bigrams
[Figure: transducer T_bigram with states 0, 1, 2 (final weight 1), self-loops a:ε/1 and b:ε/1 on states 0 and 2, and transitions a:a/1, b:b/1 from 0 to 1 and from 1 to 2.]
Count of the bigram ab in Z given by Z ∘ T_bigram ∘ ab.
Transducer Counting Gappy Bigrams
[Figure: transducer T_gappy bigram, identical to T_bigram except that the self-loops on state 1 are a:ε/λ and b:ε/λ.]
Count of the gappy bigram ab in Z given by Z ∘ T_gappy bigram ∘ ab, with gap penalty λ ∈ (0, 1).
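Reading off the transducer, each occurrence of a (possibly non-contiguous) pair a…b contributes λ raised to the number of skipped symbols; a direct sketch of that count (my own reading, not code from the lecture):

```python
def gappy_bigram_count(z, bigram, lam):
    """Weighted count of the gappy bigram (a, b) in z: every pair of positions i < j with
    z[i] == a and z[j] == b contributes lam ** (j - i - 1), one factor per skipped symbol."""
    a, b = bigram
    return sum(lam ** (j - i - 1)
               for i in range(len(z)) if z[i] == a
               for j in range(i + 1, len(z)) if z[j] == b)

print(gappy_bigram_count("abb", "ab", 0.5))   # 1 (no gap) + 0.5 (one-symbol gap) = 1.5
```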
Composition
Theorem: the composition of two weighted transducers is also a weighted transducer.
Proof: constructive proof based on the composition algorithm.
• states identified with pairs.
• ε-free case: transitions defined by
E = {((q_1, q_1'), a, c, w_1 ⊗ w_2, (q_2, q_2')) : (q_1, a, b, w_1, q_2) ∈ E_1, (q_1', b, c, w_2, q_2') ∈ E_2}.
• general case: use of an intermediate ε-filter.
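A minimal sketch of the ε-free transition rule above (mine; transitions are tuples (source, input, output, weight, destination) and weights are multiplied in the real semiring):

```python
from collections import defaultdict

def compose(E1, E2):
    """epsilon-free composition: match output labels of E1 with input labels of E2."""
    by_input = defaultdict(list)
    for (q1p, b, c, w2, q2p) in E2:
        by_input[b].append((q1p, c, w2, q2p))
    return [((q1, q1p), a, c, w1 * w2, (q2, q2p))
            for (q1, a, b, w1, q2) in E1
            for (q1p, c, w2, q2p) in by_input[b]]

E1 = [(0, "a", "b", 0.5, 1)]    # toy transducer T1
E2 = [(0, "b", "c", 0.4, 1)]    # toy transducer T2
print(compose(E1, E2))          # [((0, 0), 'a', 'c', 0.2, (1, 1))]
```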
Composition Algorithm
ε-Free Case
[Figure: example of ε-free composition: two weighted transducers and their composition, whose states are pairs such as (0, 0), (1, 1), (2, 1), (3, 1), (3, 2), (3, 3)/0.42, and whose transition weights are the products of the matched transition weights, e.g. a:b/0.1 composed with b:a/0.2 gives a:a/0.02.]
Complexity: O(|T_1| |T_2|) in general, linear in some cases.
Redundant ε-Paths Problem
(MM, Pereira, and Riley, 1996; Pereira and Riley, 1997)
[Figure: when T_1 has output ε's and T_2 has input ε's, several composition paths can match the same pair of strings. The ε's are first marked, yielding T̃_1 and T̃_2, and a filter transducer F is composed in between so that exactly one of the redundant ε-paths is kept:]
T = T̃_1 ∘ F ∘ T̃_2.
Kernels for Other Discrete Structures
Similarly, PDS kernels can be defined on other
discrete structures:
• Images,
• graphs,
• parse trees,
• automata,
• weighted automata.
This Lecture
Kernels
Kernel-based algorithms
Closure properties
Sequence Kernels
Negative kernels
Questions
Gaussian kernels have the form exp(−d²) where d is a metric.
• for what other functions d does exp(−d²) define a PDS kernel?
• what other PDS kernels can we construct from a
metric in a Hilbert space?
Negative Definite Kernels
(Schoenberg, 1938)
Definition: a function K: X × X → R is said to be a negative definite symmetric (NDS) kernel if it is symmetric and if for all {x_1, ..., x_m} ⊆ X and c ∈ R^{m×1} with 1⊤c = 0,
c⊤Kc ≤ 0.
Clearly, if K is PDS, then −K is NDS, but the converse does not hold in general.
Examples
The squared distance ‖x − y‖² in a Hilbert space H defines an NDS kernel: if Σ_{i=1}^m c_i = 0,
Σ_{i,j=1}^m c_i c_j ‖x_i − x_j‖² = Σ_{i,j=1}^m c_i c_j (x_i − x_j)·(x_i − x_j)
= Σ_{i,j=1}^m c_i c_j (‖x_i‖² + ‖x_j‖² − 2 x_i·x_j)
= Σ_{i,j=1}^m c_i c_j (‖x_i‖² + ‖x_j‖²) − 2 Σ_{i=1}^m c_i x_i · Σ_{j=1}^m c_j x_j
≤ Σ_{i,j=1}^m c_i c_j (‖x_i‖² + ‖x_j‖²)
= Σ_{j=1}^m c_j Σ_{i=1}^m c_i ‖x_i‖² + Σ_{i=1}^m c_i Σ_{j=1}^m c_j ‖x_j‖² = 0.
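A numerical check of this computation (my own sketch): for any coefficient vector summing to zero, the quadratic form of the squared-distance matrix is non-positive.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # D2[i, j] = ||x_i - x_j||^2

c = rng.normal(size=10)
c -= c.mean()                                                # enforce 1^T c = 0
print(c @ D2 @ c <= 1e-10)                                   # the NDS condition holds
```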
NDS Kernels - Property
(Schoenberg, 1938)
Theorem: let K: X × X → R be an NDS kernel such that for all x, y ∈ X, K(x, y) = 0 iff x = y. Then, there exists a Hilbert space H and a mapping Φ: X → H such that
∀x, y ∈ X, K(x, y) = ‖Φ(x) − Φ(y)‖².
Thus, under the hypothesis of the theorem, √K defines a metric.
PDS and NDS Kernels
(Schoenberg, 1938)
Theorem: let K: X × X → R be a symmetric kernel. Then:
• K is NDS iff exp(−tK) is a PDS kernel for all t > 0.
• Let K' be defined, for any x_0, by
K'(x, y) = K(x, x_0) + K(y, x_0) − K(x, y) − K(x_0, x_0)
for all x, y ∈ X. Then, K is NDS iff K' is PDS.
Example
The kernel defined by K(x, y) = exp(−t‖x − y‖²) is PDS for all t > 0 since ‖x − y‖² is NDS.
The kernel exp(−|x − y|^p) is not PDS for p > 2. Otherwise, for any t > 0, {x_1, ..., x_m} ⊆ X and c ∈ R^{m×1},
Σ_{i,j=1}^m c_i c_j e^{−t|x_i − x_j|^p} = Σ_{i,j=1}^m c_i c_j e^{−|t^{1/p} x_i − t^{1/p} x_j|^p} ≥ 0.
This would imply that |x − y|^p is NDS for p > 2, but that cannot be (see past homework assignments).
Conclusion
PDS kernels:
• rich mathematical theory and foundation.
• general idea for extending many linear
algorithms to non-linear prediction.
• flexible method: any PDS kernel can be used.
• widely used in modern algorithms and
applications.
• can we further learn a PDS kernel and a hypothesis based on that kernel from labeled data? (see tutorial: http://www.cs.nyu.edu/~mohri/icml2011-tutorial/).
References
• N. Aronszajn, Theory of Reproducing Kernels, Trans. Amer. Math. Soc., 68, 337-404, 1950.
• Peter Bartlett and John Shawe-Taylor. Generalization performance of support vector
machines and other pattern classifiers. In Advances in kernel methods: support vector learning,
pages 43–54. MIT Press, Cambridge, MA, USA, 1999.
• Christian Berg, Jens Peter Reus Christensen, and Paul Ressel. Harmonic Analysis on
Semigroups. Springer-Verlag: Berlin-New York, 1984.
• Bernhard Boser, Isabelle M. Guyon, and Vladimir Vapnik. A training algorithm for optimal
margin classifiers. In proceedings of COLT 1992, pages 144-152, Pittsburgh, PA, 1992.
• Corinna Cortes, Patrick Haffner, and Mehryar Mohri. Rational Kernels: Theory and
Algorithms. Journal of Machine Learning Research (JMLR), 5:1035-1062, 2004.
• Corinna Cortes and Vladimir Vapnik, Support-Vector Networks, Machine Learning, 20,
1995.
• Kimeldorf, G. and Wahba, G. Some results on Tchebycheffian Spline Functions, J. Mathematical
Analysis and Applications, 33, 1 (1971) 82-95.
References
• James Mercer. Functions of Positive and Negative Type, and Their Connection with the
Theory of Integral Equations. In Proceedings of the Royal Society of London. Series A,
Containing Papers of a Mathematical and Physical Character, Vol. 83, No. 559, pp. 69-70, 1909.
• Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Weighted Automata in Text and
Speech Processing, In Proceedings of the 12th biennial European Conference on Artificial
Intelligence (ECAI-96), Workshop on Extended finite state models of language. Budapest,
Hungary, 1996.
• Fernando C. N. Pereira and Michael D. Riley. Speech Recognition by Composition of
Weighted Finite Automata. In Finite-State Language Processing, pages 431-453. MIT Press,
1997.
• I. J. Schoenberg, Metric Spaces and Positive Definite Functions. Transactions of the American
Mathematical Society, Vol. 44, No. 3, pp. 522-536, 1938.
• Vladimir N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer, Berlin, 1982.
• Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
• Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.
Appendix
Mercer’s Condition
(Mercer, 1909)
Theorem: let X ⊂ R^N be a compact subset and let K: X × X → R be in L_∞(X × X) and symmetric. Then, K admits a uniformly convergent expansion
K(x, y) = Σ_{n=0}^∞ a_n φ_n(x) φ_n(y), with a_n > 0,
iff for any function c in L_2(X),
∫_{X×X} c(x) c(y) K(x, y) dx dy ≥ 0.
SVMs with PDS Kernels
Constrained optimization (∘: Hadamard product):
max_α 2 · 1⊤α − (α ∘ y)⊤ K (α ∘ y)
subject to: 0 ≤ α ≤ C ∧ α⊤y = 0.
Solution:
h = sgn(Σ_{i=1}^m α_i y_i K(x_i, ·) + b),
with b = y_i − (α ∘ y)⊤ K e_i for any x_i with 0 < α_i < C.