The Kernel Trick, Gram
Matrices, and Feature Extraction
CS6787 Lecture 4 — Fall 2017
Momentum for Principal
Component Analysis
CS6787 Lecture 3.1 — Fall 2017
Principal Component Analysis
• Setting: find the dominant eigenvalue-eigenvector pair of a positive
semidefinite symmetric matrix A.
$$u_1 = \arg\max_x \frac{x^T A x}{x^T x}$$
• Many ways to write this problem, e.g.
$$\sqrt{\lambda_1}\, u_1 = \arg\min_x \left\| x x^T - A \right\|_F^2$$
where $\|B\|_F$ is the Frobenius norm: $\|B\|_F^2 = \sum_i \sum_j B_{i,j}^2$
PCA: A Non-Convex Problem
• PCA is not convex in any of these formulations
• Why? Think about the solutions to the problem: u and –u
• Two distinct solutions → can’t be convex
• Can we still use momentum to run PCA more quickly?
Power Iteration
• Before we apply momentum, we need to choose what base algorithm
we’re using.
• Simplest algorithm: power iteration
• Repeatedly multiply by the matrix A to get an answer
$$x_{t+1} = A x_t$$
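• As a concrete reference, here is a minimal NumPy sketch of power iteration (the per-step normalization is not part of the bare update above; it is assumed here just to keep the iterate’s magnitude under control):
```python
import numpy as np

def power_iteration(A, num_iters=100, seed=0):
    """Estimate the dominant eigenvalue/eigenvector pair of a symmetric PSD matrix A."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    for _ in range(num_iters):
        x = A @ x                      # the power iteration update x_{t+1} = A x_t
        x /= np.linalg.norm(x)         # rescale so the iterate doesn't overflow/underflow
    return x @ A @ x, x                # Rayleigh quotient estimate of lambda_1, and u_1
```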
Why does Power Iteration Work?
• Let the eigendecomposition of A be $A = \sum_{i=1}^n \lambda_i u_i u_i^T$
• For $\lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n$
• Power iteration converges in direction because the cosine-squared of the angle to $u_1$ is
$$\cos^2(\theta) = \frac{(u_1^T x_t)^2}{\|x_t\|^2} = \frac{(u_1^T A^t x_0)^2}{\|A^t x_0\|^2} = \frac{\lambda_1^{2t} (u_1^T x_0)^2}{\sum_{i=1}^n \lambda_i^{2t} (u_i^T x_0)^2}$$
$$= 1 - \frac{\sum_{i=2}^n \lambda_i^{2t} (u_i^T x_0)^2}{\sum_{i=1}^n \lambda_i^{2t} (u_i^T x_0)^2} = 1 - \Omega\!\left(\left(\frac{\lambda_2}{\lambda_1}\right)^{2t}\right)$$
What about a more general algorithm?
• Use both current iterate, and history of past iterations
$$x_{t+1} = \alpha_t A x_t + \beta_{t,1} x_{t-1} + \beta_{t,2} x_{t-2} + \cdots + \beta_{t,t} x_0$$
• for fixed parameters α and β
• What class of functions can we express in this form?
• Notice: xt is always a degree-t polynomial in A times x0
• Can prove by induction that we can express ANY polynomial
Power Iteration and Polynomials
• Can also think of power iteration as a degree-t polynomial of A
$$x_t = A^t x_0$$
• Is there a better degree-t polynomial to use than $f_t(x) = x^t$?
• If we use a different polynomial, then we get
$$x_t = f_t(A) x_0 = \sum_{i=1}^n f_t(\lambda_i)\, u_i u_i^T x_0$$
• Ideal solution: choose polynomial with zeros at all non-dominant eigenvalues
• Practically, make $f_t(\lambda_1)$ as large as possible while keeping $|f_t(\lambda)| \le 1$ for all $|\lambda| \le \lambda_2$
Chebyshev Polynomials Again
• It turns out that Chebyshev polynomials solve this problem.
• Recall: $T_0(x) = 1$, $T_1(x) = x$, and
$$T_{n+1}(x) = 2x\, T_n(x) - T_{n-1}(x)$$
• Nice properties:
$$|x| \le 1 \;\Rightarrow\; |T_n(x)| \le 1$$
Chebyshev Polynomials
[Figure: plots of the Chebyshev polynomials $T_0(u) = 1$, $T_1(u) = u$, $T_2(u) = 2u^2 - 1$, and higher-degree polynomials on the interval $[-1.5, 1.5]$.]
Chebyshev Polynomials Again
• It turns out that Chebyshev polynomials solve this problem.
• Recall: $T_0(x) = 1$, $T_1(x) = x$, and
$$T_{n+1}(x) = 2x\, T_n(x) - T_{n-1}(x)$$
• Nice properties:
$$|x| \le 1 \;\Rightarrow\; |T_n(x)| \le 1 \qquad\qquad T_n(1 + \epsilon) \approx \Theta\!\left(\left(1 + \sqrt{2\epsilon}\right)^n\right)$$
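• A small sketch of evaluating $T_n$ directly from this recurrence; the print statements just illustrate the two properties (the growth constant is only approximate):
```python
import numpy as np

def chebyshev_T(n, x):
    """Evaluate the Chebyshev polynomial T_n(x) via the recurrence
    T_0(x) = 1, T_1(x) = x, T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)."""
    x = np.asarray(x, dtype=float)
    if n == 0:
        return np.ones_like(x)
    t_prev, t_curr = np.ones_like(x), x
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

# Bounded by 1 on [-1, 1], but grows roughly like (1 + sqrt(2*eps))^n just outside it.
print(np.max(np.abs(chebyshev_T(10, np.linspace(-1, 1, 101)))))  # <= 1 (up to round-off)
print(chebyshev_T(10, 1.1))                                      # large, order of (1 + sqrt(0.2))**10
```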
Using Chebyshev Polynomials
• So we can choose our polynomial f in terms of T
• Want: $f_t(\lambda_1)$ to be as large as possible, subject to $|f_t(\lambda)| \le 1$ for all $|\lambda| \le \lambda_2$
• To make this work, set
$$f_n(x) = T_n\!\left(\frac{x}{\lambda_2}\right)$$
• Can do this by running the update
$$x_{t+1} = \frac{2A}{\lambda_2} x_t - x_{t-1} \quad\Rightarrow\quad x_t = T_t\!\left(\frac{A}{\lambda_2}\right) x_0$$
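• A minimal sketch of this update in NumPy, assuming an estimate of $\lambda_2$ is available (in practice it has to be guessed or treated as a hyperparameter); the periodic rescaling is a numerical safeguard and does not change the direction, because the recurrence is linear:
```python
import numpy as np

def momentum_pca(A, lambda2, num_iters=100, seed=0):
    """Power iteration with momentum: x_{t+1} = (2A/lambda2) x_t - x_{t-1},
    so that x_t is proportional to T_t(A / lambda2) x_0."""
    rng = np.random.default_rng(seed)
    x_prev = rng.standard_normal(A.shape[0])        # plays the role of x_0
    x_curr = (A @ x_prev) / lambda2                 # x_1 = (A/lambda2) x_0 starts the recurrence
    for _ in range(num_iters - 1):
        x_prev, x_curr = x_curr, (2.0 / lambda2) * (A @ x_curr) - x_prev
        scale = np.linalg.norm(x_curr)              # rescale both iterates by the same factor;
        x_prev, x_curr = x_prev / scale, x_curr / scale  # the linear recurrence keeps its direction
    u = x_curr / np.linalg.norm(x_curr)
    return u @ A @ u, u                             # Rayleigh quotient estimate of lambda_1, and u_1
```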
Convergence of Momentum PCA
$$\frac{x_t}{\|x_t\|} = \frac{\sum_{i=1}^n T_t\!\left(\frac{\lambda_i}{\lambda_2}\right) u_i u_i^T x_0}{\sqrt{\sum_{i=1}^n T_t^2\!\left(\frac{\lambda_i}{\lambda_2}\right) (u_i^T x_0)^2}}$$
• Cosine-squared of angle to dominant component:
$$\cos^2(\theta) = \frac{(u_1^T x_t)^2}{\|x_t\|^2} = \frac{T_t^2\!\left(\frac{\lambda_1}{\lambda_2}\right)(u_1^T x_0)^2}{\sum_{i=1}^n T_t^2\!\left(\frac{\lambda_i}{\lambda_2}\right)(u_i^T x_0)^2}$$
Convergence of Momentum PCA (continued)
$$\cos^2(\theta) = \frac{T_t^2\!\left(\frac{\lambda_1}{\lambda_2}\right)(u_1^T x_0)^2}{\sum_{i=1}^n T_t^2\!\left(\frac{\lambda_i}{\lambda_2}\right)(u_i^T x_0)^2} = 1 - \frac{\sum_{i=2}^n T_t^2\!\left(\frac{\lambda_i}{\lambda_2}\right)(u_i^T x_0)^2}{\sum_{i=1}^n T_t^2\!\left(\frac{\lambda_i}{\lambda_2}\right)(u_i^T x_0)^2}$$
$$\ge 1 - \frac{\sum_{i=2}^n (u_i^T x_0)^2}{T_t^2\!\left(\frac{\lambda_1}{\lambda_2}\right)(u_1^T x_0)^2} = 1 - \Omega\!\left(T_t^{-2}\!\left(\frac{\lambda_1}{\lambda_2}\right)\right)$$
Convergence of Momentum PCA (continued)
$$\cos^2(\theta) \ge 1 - \Omega\!\left(T_t^{-2}\!\left(\frac{\lambda_1}{\lambda_2}\right)\right) = 1 - \Omega\!\left(T_t^{-2}\!\left(1 + \frac{\lambda_1 - \lambda_2}{\lambda_2}\right)\right)$$
$$= 1 - \Omega\!\left(\left(1 + \sqrt{\frac{2(\lambda_1 - \lambda_2)}{\lambda_2}}\right)^{-2t}\right)$$
• Recall that standard power iteration had:
$$\cos^2(\theta) = 1 - \Omega\!\left(\left(\frac{\lambda_2}{\lambda_1}\right)^{2t}\right) = 1 - \Omega\!\left(\left(1 + \frac{\lambda_1 - \lambda_2}{\lambda_2}\right)^{-2t}\right)$$
• So the momentum rate is asymptotically faster than power iteration
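• To make that comparison concrete, here is one hedged way to read off iteration counts from the two rates, writing $\Delta = (\lambda_1 - \lambda_2)/\lambda_2$ and treating the gap as small (constants dropped):
\begin{align*}
\text{Power iteration:} \quad (1+\Delta)^{-2t} \le \epsilon
  &\iff t \ge \frac{\log(1/\epsilon)}{2\log(1+\Delta)}
   \approx \frac{\log(1/\epsilon)}{2\Delta}, \\
\text{Momentum:} \quad \bigl(1+\sqrt{2\Delta}\bigr)^{-2t} \le \epsilon
  &\iff t \ge \frac{\log(1/\epsilon)}{2\log\bigl(1+\sqrt{2\Delta}\bigr)}
   \approx \frac{\log(1/\epsilon)}{2\sqrt{2\Delta}}.
\end{align*}
• That is, the dependence on the eigengap improves from roughly $1/\Delta$ to $1/\sqrt{\Delta}$.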
Questions?
The Kernel Trick, Gram
Matrices, and Feature Extraction
CS6787 Lecture 4 — Fall 2017
Basic Linear Models
• For classification using model vector w
$$\text{output} = \operatorname{sign}(w^T x)$$
• Optimization methods vary; here’s logistic regression ($y_i \in \{-1, 1\}$)
$$\text{minimize}_w \quad \frac{1}{n} \sum_{i=1}^n \log\left(1 + \exp(-w^T x_i y_i)\right)$$
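• For reference, a minimal NumPy sketch of this objective and its gradient (the names and the use of `logaddexp` for numerical stability are my own choices, not from the slides):
```python
import numpy as np

def logistic_loss(w, X, y):
    """Average logistic loss (1/n) sum_i log(1 + exp(-y_i w^T x_i)).
    X: (n, d) data matrix; y: (n,) labels in {-1, +1}."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))    # log(1 + exp(-m)), numerically stable

def logistic_grad(w, X, y):
    """Gradient of the average logistic loss with respect to w."""
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))          # d/dm log(1 + exp(-m)) = -1/(1 + exp(m))
    return (X.T @ coeffs) / X.shape[0]
```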
Benefits of Linear Models
• Fast classification: just one dot product
• Fast training/learning: just a few basic linear algebra operations
• Drawback: limited expressivity
• Can only capture linear classification boundaries → bad for many problems
• How do we let linear models represent a broader class of decision
boundaries, while retaining the systems benefits?
The Kernel Method
• Idea: in a linear model we can think about the similarity between two
training examples x and y as being
$$x^T y$$
• This is related to the rate at which a random classifier will separate x and y
• Kernel methods replace this dot-product similarity with an arbitrary
Kernel function that computes the similarity between x and y
$$K(x, y) : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$$
Kernel Properties
• What properties do kernels need to have to be useful for learning?
• Key property: kernel must be symmetric K(x, y) = K(y, x)
• Key property: kernel must be positive semi-definite
$$\forall c_i \in \mathbb{R},\; x_i \in \mathcal{X}: \quad \sum_{i=1}^n \sum_{j=1}^n c_i c_j K(x_i, x_j) \ge 0$$
• Can check that the dot product has this property
Facts about Positive Semidefinite Kernels
• Sum of two PSD kernels is a PSD kernel
K(x, y) = K1 (x, y) + K2 (x, y) is a PSD kernel
• Product of two PSD kernels is a PSD kernel
K(x, y) = K1 (x, y)K2 (x, y) is a PSD kernel
• Scaling by any function on both sides is a kernel
K(x, y) = f (x)K1 (x, y)f (y) is a PSD kernel
Other Kernel Properties
• Useful property: kernels are often non-negative
$$K(x, y) \ge 0$$
• Useful property: kernels are often scaled such that
$$K(x, y) \le 1, \quad\text{and}\quad K(x, y) = 1 \Leftrightarrow x = y$$
• These properties capture the idea that the kernel is expressing the similarity
between x and y
Common Kernels
• Gaussian kernel/RBF kernel: de-facto kernel in machine learning
$$K(x, y) = \exp\left(-\|x - y\|^2\right)$$
• We can validate that this is a kernel
• Symmetric? ✅
• Positive semi-definite? ✅ WHY?
• Non-negative? ✅
• Scaled so that K(x,x) = 1? ✅
Common Kernels (continued)
• Linear kernel: just the inner product $K(x, y) = x^T y$
• Polynomial kernel: $K(x, y) = (1 + x^T y)^p$
• Laplacian kernel: $K(x, y) = \exp\left(-\|x - y\|_1\right)$
• Last layer of a neural network:
if the last layer outputs $\phi(x)$, then the kernel is $K(x, y) = \phi(x)^T \phi(y)$
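• Hedged NumPy sketches of these kernels for single vectors (the bandwidth parameter `gamma` is an added hyperparameter; the forms on the slides correspond to `gamma = 1`):
```python
import numpy as np

def linear_kernel(x, y):
    return x @ y                                     # plain inner product

def polynomial_kernel(x, y, p=3):
    return (1.0 + x @ y) ** p                        # (1 + x^T y)^p

def rbf_kernel(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))     # exp(-gamma ||x - y||^2)

def laplacian_kernel(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum(np.abs(x - y)))    # exp(-gamma ||x - y||_1)
```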
Classifying with Kernels
• An equivalent way of writing a linear model on a training set is
$$\text{output}(x) = \operatorname{sign}\left(\left(\sum_{i=1}^n w_i x_i\right)^T x\right)$$
• We can kernel-ize this by replacing the dot products with kernel evaluations
$$\text{output}(x) = \operatorname{sign}\left(\sum_{i=1}^n w_i K(x_i, x)\right)$$
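• A direct (and deliberately naive) NumPy sketch of this kernelized classifier; note that it touches every training example for each prediction, which is the computational cost issue discussed below:
```python
import numpy as np

def kernel_predict(x, X_train, w, kernel):
    """Kernelized classifier: sign(sum_i w_i K(x_i, x)).
    X_train: (n, d) training examples; w: (n,) per-example weights."""
    scores = np.array([kernel(x_i, x) for x_i in X_train])
    return np.sign(w @ scores)
```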
Learning with Kernels
• An equivalent way of writing linear-model logistic regression is
$$\text{minimize}_w \quad \frac{1}{n} \sum_{i=1}^n \log\left(1 + \exp\left(-\left(\sum_{j=1}^n w_j x_j\right)^T x_i y_i\right)\right)$$
• We can kernel-ize this by replacing the dot products with kernel evaluations
$$\text{minimize}_w \quad \frac{1}{n} \sum_{i=1}^n \log\left(1 + \exp\left(-\sum_{j=1}^n w_j y_i K(x_j, x_i)\right)\right)$$
The Computational Cost of Kernels
• Recall: benefit of learning with kernels is that we can express a wider
class of classification functions
• Recall: another benefit is linear classifier learning problems are
“easy” to solve because they are convex, and gradients easy to compute
• Major cost of learning naively with Kernels: have to evaluate K(x, y)
• For SGD, need to do this effectively n times per update
• Computationally intractable unless K is very simple
The Gram Matrix
• Address this computational problem by pre-computing the kernel
function for all pairs of training examples in the dataset.
$$G_{i,j} = K(x_i, x_j)$$
• Transforms the learning problem into
$$\text{minimize}_w \quad \frac{1}{n} \sum_{i=1}^n \log\left(1 + \exp\left(-y_i e_i^T G w\right)\right)$$
• This is much easier than recomputing the kernel at each iteration
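• A minimal sketch of precomputing G and evaluating the objective above (loop-based on purpose, to mirror the definition; a vectorized construction would be preferable in practice):
```python
import numpy as np

def gram_matrix(X, kernel):
    """Precompute G[i, j] = K(x_i, x_j) once, before any training iterations."""
    n = X.shape[0]
    G = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            G[i, j] = kernel(X[i], X[j])
    return G

def gram_logistic_loss(w, G, y):
    """(1/n) sum_i log(1 + exp(-y_i e_i^T G w)); the score for example i is (G w)_i."""
    margins = y * (G @ w)
    return np.mean(np.logaddexp(0.0, -margins))
```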
Problems with the Gram Matrix
• Suppose we have n examples in our training set.
• How much memory is required to store the Gram matrix G?
• What is the cost of taking the product Gi w to compute a gradient?
• What happens if we have one hundred million training examples?
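• One back-of-the-envelope way to answer these questions, assuming 4-byte floats and $n = 10^8$ examples (the exact numbers are only illustrative):
\begin{align*}
\text{memory for } G &: \; n^2 \times 4 \text{ bytes} = (10^8)^2 \times 4 \text{ bytes} = 4 \times 10^{16} \text{ bytes} \approx 40 \text{ petabytes}, \\
\text{one row product } G_i w &: \; O(n) \approx 10^8 \text{ multiply-adds per stochastic gradient}, \\
\text{full product } G w &: \; O(n^2) \approx 10^{16} \text{ multiply-adds}.
\end{align*}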
Feature Extraction
• Simple case: let’s imagine that X is a finite set {1, 2, …, k}
• We can define our kernel as a matrix $M \in \mathbb{R}^{k \times k}$ with
$$M_{i,j} = K(i, j)$$
• Since M is positive semidefinite, it has a square root U with $U^T U = M$, so
$$\sum_{l=1}^{k} U_{l,i} U_{l,j} = M_{i,j} = K(i, j)$$
Feature Extraction (continued)
• So if we define a feature mapping $\phi(i) = U e_i$ then
$$\phi(i)^T \phi(j) = \sum_{l=1}^{k} U_{l,i} U_{l,j} = M_{i,j} = K(i, j)$$
• The kernel is equivalent to a dot product in some space
• In fact, this is true for all kernels, not just finite ones
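• A sketch of recovering such a feature map from a finite kernel matrix via an eigendecomposition-based square root (the clipping of tiny negative eigenvalues is a numerical safeguard, not part of the math above):
```python
import numpy as np

def features_from_kernel_matrix(M):
    """Given a PSD kernel matrix M over a finite set X, return U with U^T U = M,
    so that phi(i) = U e_i (column i of U) satisfies phi(i)^T phi(j) = M[i, j]."""
    eigvals, eigvecs = np.linalg.eigh(M)             # M = V diag(eigvals) V^T
    eigvals = np.clip(eigvals, 0.0, None)            # guard against round-off negatives
    U = np.sqrt(eigvals)[:, None] * eigvecs.T        # U = diag(sqrt(eigvals)) V^T
    return U

# Sanity check (up to floating point): np.allclose(U.T @ U, M)
```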
Classifying with feature maps
• Suppose that we can find a finite-dimensional feature map that satisfies
$$\phi(i)^T \phi(j) = K(i, j)$$
• Then we can simplify our classifier to
$$\text{output}(x) = \operatorname{sign}\left(\sum_{i=1}^n w_i K(x_i, x)\right) = \operatorname{sign}\left(\sum_{i=1}^n w_i \phi(x_i)^T \phi(x)\right) = \operatorname{sign}\left(u^T \phi(x)\right)$$
Learning with feature maps
• Similarly we can simplify our learning objective to
$$\text{minimize}_u \quad \frac{1}{n} \sum_{i=1}^n \log\left(1 + \exp\left(-u^T \phi(x_i) y_i\right)\right)$$
• Take-away: this is just transforming the input data, then running a
linear classifier in the transformed space!
• Computationally: super efficient
• As long as we can transform and store the input data in an efficient way
Problems with Feature Maps
• The dimension of the transformed data may be much larger than the
dimension of the original data.
• Suppose that the feature map is $\phi : \mathbb{R}^d \to \mathbb{R}^D$ and there are n examples
• How much memory is needed to store the transformed features?
• What is the cost of taking the product $u^T \phi(x_i)$ to compute a gradient?
Feature Maps vs. Gram Matrices
• Systems trade-offs exist here.
• When number of examples gets very large, feature maps are better.
• When transformed feature vectors have high dimensionality, Gram
matrices are better.
Another Problem with Feature Maps
• Recall: I said there was always a feature map for any kernel such that
$$\phi(i)^T \phi(j) = K(i, j)$$
• But this feature map is not always finite-dimensional
• For example, the Gaussian/RBF kernel has an infinite-dimensional feature map
• Many kernels we care about in ML have this property
• What do we do if ɸ has infinite dimensions?
• We can’t just compute with it normally!
Solution: Approximate Feature Maps
• Find a finite-dimensional feature map $\phi$ so that
$$K(x, y) \approx \phi(x)^T \phi(y)$$
• Typically, we want to find a family of feature maps $\phi_D : \mathbb{R}^d \to \mathbb{R}^D$ such that
$$\lim_{D \to \infty} \phi_D(x)^T \phi_D(y) = K(x, y)$$
Types of Approximate Feature Maps
• Deterministic feature maps
• Choose a fixed-a-priori method of approximating the kernel
• Generally not very popular because of the way they scale with dimensions
• Random feature maps
• Choose a feature map at random (typically each feature is independent) such that
$$\mathbb{E}\left[\phi(x)^T \phi(y)\right] = K(x, y)$$
• Then prove that, with high probability over some region of interest,
$$\left|\phi(x)^T \phi(y) - K(x, y)\right| \le \epsilon$$
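• As one well-known instance of this recipe, here is a sketch of random Fourier features for the Gaussian/RBF kernel $K(x, y) = \exp(-\gamma \|x - y\|^2)$ (the scaling $\sqrt{2\gamma}$ and the cosine-plus-random-phase form follow the standard construction; `D` and `gamma` are hyperparameters):
```python
import numpy as np

def random_fourier_features(X, D, gamma=1.0, seed=0):
    """Random feature map phi_D with E[phi_D(x)^T phi_D(y)] = exp(-gamma ||x - y||^2),
    with approximation error shrinking as D grows.  X: (n, d); returns (n, D)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))   # frequencies from the kernel's spectral density
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                 # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Usage sketch: transform the data once, then train an ordinary linear model on it, e.g.
# Phi_train = random_fourier_features(X_train, D=2000, gamma=0.5)
```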
Types of Approximate Features (continued)
• Orthogonal randomized feature maps
• Intuition behind this: if we have a feature map where for some i and j
$$e_i^T \phi(x) \approx e_j^T \phi(x)$$
then we can’t actually learn much from having both features.
• Strategy: choose the feature map at random, but subject to the constraint that the
features be “orthogonal” in some way.
• Quasi-random feature maps
• Generate features using a low-discrepancy sequence rather than true randomness
Adaptive Feature Maps
• Everything before this didn’t take the data into account
• Adaptive feature maps look at the actual training set and try to minimize
the kernel approximation error using the training set as a guide
• For example: we can do a random feature map, and then fine-tune the
randomness to minimize the empirical error over the training set
• Gaining in popularity
• Also, neural networks can be thought of as adaptive feature maps.
Systems Tradeoffs
• Lots of tradeoffs here
• Do we spend more work up-front constructing a more sophisticated
approximation, to save work on learning algorithms?
• Would we rather scale with the data, or scale to more complicated
problems?
• Another task for metaparameter optimization
Questions
• Upcoming things:
• Paper 2 review due tonight
• Paper 3 in class on Wednesday
• Start thinking about the class project — it will come faster than you think!