
10-606 Mathematical Foundations for Machine Learning

Machine Learning Department


School of Computer Science
Carnegie Mellon University

Deriving Principal
Component Analysis
(PCA)
Matt Gormley
Lecture 11
Oct. 3, 2018

1
Reminders
• Quiz 1: Linear Algebra (today)
• Homework 3: Matrix Calculus + Probability
– Out: Wed, Oct. 3
– Due: Wed, Oct. 10 at 11:59pm
• Quiz 2: Matrix Calculus + Probability
– In-class, Wed, Oct. 10

3
Q&A
4
DIMENSIONALITY REDUCTION

6
PCA Outline
• Dimensionality Reduction
– High-dimensional data
– Learning (low dimensional) representations
• Principal Component Analysis (PCA)
– Examples: 2D and 3D
– Data for PCA
– PCA Definition
– Objective functions for PCA
– PCA, Eigenvectors, and Eigenvalues
– Algorithms for finding Eigenvectors /
Eigenvalues
• PCA Examples
– Face Recognition
– Image Compression

7
High Dimension Data
Examples of high dimensional data:
– High resolution images (millions of pixels)

8
High Dimension Data
Examples of high dimensional data:
– Multilingual News Stories
(vocabulary of hundreds of thousands of words)

9
High Dimension Data
Examples of high dimensional data:
– Brain Imaging Data (100s of MBs per scan)

Image from (Wehbe et al., 2014)


10
Image from https://pixabay.com/en/brain-mrt-magnetic-resonance-imaging-1728449/
High Dimension Data
Examples of high dimensional data:
– Customer Purchase Data

11
Learning Representations
PCA, Kernel PCA, ICA: Powerful unsupervised learning techniques
for extracting hidden (potentially lower dimensional) structure
from high dimensional datasets.
Useful for:
• Visualization
• More efficient use of resources
(e.g., time, memory, communication)

• Statistical: fewer dimensions → better generalization


• Noise removal (improving data quality)
• Further processing by machine learning algorithms
Slide from Nina Balcan
PRINCIPAL COMPONENT
ANALYSIS (PCA)

16
PCA Outline
• Dimensionality Reduction
– High-dimensional data
– Learning (low dimensional) representations
• Principal Component Analysis (PCA)
– Examples: 2D and 3D
– Data for PCA
– PCA Definition
– Objective functions for PCA
– PCA, Eigenvectors, and Eigenvalues
– Algorithms for finding Eigenvectors / Eigenvalues
• PCA Examples
– Face Recognition
– Image Compression
17
Principal Component Analysis (PCA)

In the case where data lies on or near a low d-dimensional linear subspace, the axes of this subspace are an effective representation of the data.

Identifying these axes is known as Principal Components Analysis, and they can be obtained using classic matrix computation tools (Eigen or Singular Value Decomposition).

Slide from Nina Balcan


2D Gaussian dataset

Slide from Barnabas Poczos


1st PCA axis

Slide from Barnabas Poczos


2nd PCA axis

Slide from Barnabas Poczos


Principal Component Analysis (PCA)
Whiteboard
– Data for PCA
– PCA Definition
– Objective functions for PCA

22
Data for PCA

$$\mathcal{D} = \{\mathbf{x}^{(i)}\}_{i=1}^{N}, \qquad
\mathbf{X} = \begin{bmatrix} (\mathbf{x}^{(1)})^T \\ (\mathbf{x}^{(2)})^T \\ \vdots \\ (\mathbf{x}^{(N)})^T \end{bmatrix}$$

We assume the data is centered, and that each axis has sample variance equal to one:

$$\boldsymbol{\mu} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}^{(i)} = \mathbf{0}, \qquad
\sigma_j^2 = \frac{1}{N} \sum_{i=1}^{N} \big(x_j^{(i)}\big)^2 = 1$$

23
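These centering and unit-variance assumptions correspond to a standard preprocessing step applied before running PCA. A minimal NumPy sketch of that preprocessing (the raw data here is synthetic, only to show the check; it is not from the lecture):

import numpy as np

# Hypothetical raw data matrix: N examples (rows) by M features (columns).
X_raw = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(100, 3))

X = X_raw - X_raw.mean(axis=0)   # center: per-feature sample mean becomes 0
X = X / X.std(axis=0)            # scale: per-feature sample variance becomes 1

print(X.mean(axis=0))            # approximately 0 for every feature
print(X.var(axis=0))             # approximately 1 for every feature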
Sample Covariance Matrix
The sample covariance matrix is given by:

$$\Sigma_{jk} = \frac{1}{N} \sum_{i=1}^{N} \big(x_j^{(i)} - \mu_j\big)\big(x_k^{(i)} - \mu_k\big)$$

Since the data matrix is centered, we can rewrite this as:

$$\Sigma = \frac{1}{N} \mathbf{X}^T \mathbf{X}$$
24
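A quick numerical check of the centered-data shortcut Σ = XᵀX / N against NumPy's own covariance routine (a sketch, not from the slides):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
X = X - X.mean(axis=0)               # center so the simplified formula applies

N = X.shape[0]
Sigma = (X.T @ X) / N                # sample covariance of the centered data

# np.cov treats rows as variables and uses ddof=1 by default; match 1/N instead.
Sigma_np = np.cov(X, rowvar=False, bias=True)
print(np.allclose(Sigma, Sigma_np))  # True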
Maximizing the Variance
Quiz: Consider the two projections below
1. Which maximizes the variance?
2. Which minimizes the reconstruction error?

Option A Option B

25
PCA
Equivalence of Maximizing Variance and Minimizing Reconstruction Error

26
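The slide above only states the equivalence; the standard argument (added here for completeness) decomposes each centered point $\mathbf{x}^{(i)}$ along a unit-length direction $\mathbf{v}$:

$$\|\mathbf{x}^{(i)}\|^2 = \underbrace{(\mathbf{v}^T \mathbf{x}^{(i)})^2}_{\text{squared projection}} + \underbrace{\|\mathbf{x}^{(i)} - (\mathbf{v}^T \mathbf{x}^{(i)})\,\mathbf{v}\|^2}_{\text{reconstruction error}}$$

Summing over $i$, the left-hand side does not depend on $\mathbf{v}$, so choosing $\mathbf{v}$ to maximize the first term (the variance of the projections, since the data is centered) is exactly the same as choosing $\mathbf{v}$ to minimize the second term (the total reconstruction error).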
Principal Component Analysis (PCA)
Whiteboard
– PCA, Eigenvectors, and Eigenvalues
– Algorithms for finding Eigenvectors /
Eigenvalues
– SVD: Relation of Singular Vectors to
Eigenvectors

27
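The "Algorithms for finding Eigenvectors / Eigenvalues" item above is worked on the whiteboard and not captured in this transcript. As one illustrative sketch (mine, not necessarily the algorithm derived in lecture), power iteration finds the leading eigenvector of the sample covariance matrix:

import numpy as np

def power_iteration(Sigma, num_iters=1000):
    """Return the leading eigenvector/eigenvalue of a symmetric PSD matrix."""
    rng = np.random.default_rng(0)
    v = rng.normal(size=Sigma.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = Sigma @ v                 # repeatedly apply the matrix
        v /= np.linalg.norm(v)        # renormalize each step
    eigenvalue = v @ Sigma @ v        # Rayleigh quotient
    return v, eigenvalue

Sigma = np.cov(np.random.default_rng(1).normal(size=(200, 3)), rowvar=False)
v, lam = power_iteration(Sigma)
print(lam, np.linalg.eigvalsh(Sigma)[-1])   # the two values agree

Subsequent eigenvectors can be obtained by deflation (subtract λ v vᵀ from Σ and repeat), though in practice one typically calls an off-the-shelf eigensolver or uses the SVD, as the next slides do.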
SVD for PCA

28
SVD for PCA

29
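The two "SVD for PCA" slides above are derivations/figures not reproduced in this transcript. The relationship they rest on: for a centered N×M data matrix X with SVD X = U S Vᵀ, the right singular vectors V are the eigenvectors of XᵀX, and the squared singular values divided by N are the corresponding sample variances. A small NumPy check of this claim, as a sketch:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
X = X - X.mean(axis=0)                       # centered data, N x M

# Route 1: eigendecomposition of the sample covariance.
Sigma = (X.T @ X) / X.shape[0]
eigvals, eigvecs = np.linalg.eigh(Sigma)     # eigenvalues in ascending order

# Route 2: SVD of the data matrix itself.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Same principal directions (up to sign) and same variances.
print(np.allclose(np.sort(S**2 / X.shape[0]), eigvals))    # True
print(np.allclose(np.abs(Vt[0]), np.abs(eigvecs[:, -1])))  # True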
Principal Component Analysis (PCA)
$\mathbf{X}\mathbf{X}^T \mathbf{v} = \lambda \mathbf{v}$, so $\mathbf{v}$ (the first PC) is an eigenvector of the
sample correlation/covariance matrix $\mathbf{X}\mathbf{X}^T$ (on this slide $\mathbf{X}$ holds one example per
column, so $\mathbf{X}\mathbf{X}^T$ is proportional to the sample covariance).

Sample variance of the projection: $\mathbf{v}^T \mathbf{X}\mathbf{X}^T \mathbf{v} = \lambda\, \mathbf{v}^T \mathbf{v} = \lambda$

Thus, the eigenvalue $\lambda$ denotes the amount of variability captured along that
dimension (aka the amount of energy along that dimension).

Eigenvalues $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \cdots$

• The 1st PC $\mathbf{v}_1$ is the eigenvector of the sample covariance matrix $\mathbf{X}\mathbf{X}^T$
  associated with the largest eigenvalue
• The 2nd PC $\mathbf{v}_2$ is the eigenvector of the sample covariance matrix $\mathbf{X}\mathbf{X}^T$
  associated with the second largest eigenvalue
• And so on …
Slide from Nina Balcan
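A quick numerical sanity check of the claim that the sample variance of the projection onto an eigenvector equals its eigenvalue (a sketch assuming NumPy, with examples stored as rows so the covariance is XᵀX/N as on the earlier "Sample Covariance Matrix" slide):

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 3)) @ np.diag([3.0, 1.0, 0.3])
X = X - X.mean(axis=0)                # centered data

Sigma = (X.T @ X) / X.shape[0]
eigvals, eigvecs = np.linalg.eigh(Sigma)
v1 = eigvecs[:, -1]                   # eigenvector with the largest eigenvalue

proj = X @ v1                         # scalar projection of each example onto v1
print(proj.var(), eigvals[-1])        # the two numbers agree (up to rounding)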
How Many PCs?
• For M original dimensions, the sample covariance matrix is M×M, and has up to M eigenvectors. So M PCs.
• Where does dimensionality reduction come from?
Can ignore the components of lesser significance.

[Bar chart: Variance (%) explained by PC1 through PC10, decreasing from roughly 25% for PC1 down toward 0% for PC10]

• You do lose some information, but if the eigenvalues are small, you don’t lose
much
– M dimensions in original data
– calculate M eigenvectors and eigenvalues
– choose only the first D eigenvectors, based on their eigenvalues (see the cumulative-variance sketch after this slide)
– final data set has only D dimensions

© Eric Xing @ CMU, 2006-2011 32
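A common way to operationalize "choose only the first D eigenvectors, based on their eigenvalues" is a cumulative explained-variance threshold; a sketch in NumPy (the 95% cutoff and the example eigenvalues are illustrative choices, not from the slides):

import numpy as np

def choose_num_components(eigvals, threshold=0.95):
    """Smallest D whose top eigenvalues capture `threshold` of the total variance."""
    vals = np.sort(eigvals)[::-1]               # largest eigenvalue first
    cumulative = np.cumsum(vals) / vals.sum()   # fraction of variance captured by top d
    return int(np.searchsorted(cumulative, threshold) + 1)

print(choose_num_components(np.array([25., 20., 15., 10., 5., 3., 2., 1.])))  # 6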


Slides from Barnabas Poczos

Original sources include:


• Karl Booksh Research group
• Tom Mitchell
• Ron Parr

PCA EXAMPLES

33
Face recognition

Slide from Barnabas Poczos


Challenge: Facial Recognition
• Want to identify specific person, based on facial image
• Robust to glasses, lighting,…
⇒ Can’t just use the given 256 × 256 pixels

Slide from Barnabas Poczos


Applying PCA: Eigenfaces
Method: Build one PCA database for the whole dataset and
then classify based on the weights.

• Example data set: Images of faces
  – Famous Eigenface approach
    [Turk & Pentland], [Sirovich & Kirby]
• Each face x is …
  – 256 × 256 real values (luminance at each location)
  – x ∈ ℝ^{256×256} (view as a 64K-dimensional vector)
• Data matrix X collects the m faces x1, …, xm

Slide from Barnabas Poczos
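A minimal NumPy sketch of the eigenface pipeline just described: build one PCA basis from the training faces, represent every face by its projection weights, and classify a new face by nearest neighbor in weight space. The arrays below are random stand-ins for a real face dataset:

import numpy as np

# Hypothetical data: m face images, each flattened from 256x256 to a 65536-dim row.
rng = np.random.default_rng(4)
faces_train = rng.normal(size=(20, 65536))
labels_train = np.arange(20)
face_test = faces_train[7] + 0.01 * rng.normal(size=65536)

mean_face = faces_train.mean(axis=0)
A = faces_train - mean_face                      # centered faces

# The right singular vectors of A are the eigenfaces; keep the top k.
k = 10
_, _, Vt = np.linalg.svd(A, full_matrices=False)
eigenfaces = Vt[:k]                              # (k, 65536)

# Represent every face by its k PCA weights and classify by nearest neighbor.
train_weights = A @ eigenfaces.T                 # (m, k)
test_weights = (face_test - mean_face) @ eigenfaces.T
nearest = np.argmin(np.linalg.norm(train_weights - test_weights, axis=1))
print("predicted identity:", labels_train[nearest])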


Principal Components

Slide from Barnabas Poczos


Reconstructing…

• … faster if we train with…


– only people w/out glasses
– same lighting conditions
Slide from Barnabas Poczos
Shortcomings
• Requires carefully controlled data:
– All faces centered in frame
– Same size
– Some sensitivity to angle
• Alternative:
– “Learn” one set of PCA vectors for each angle
– Use the one with lowest error

• Method is completely knowledge free


– (sometimes this is good!)
– Doesn’t know that faces are wrapped around 3D objects
(heads)
– Makes no effort to preserve class distinctions

Slide from Barnabas Poczos


Image Compression

Slide from Barnabas Poczos


Original Image

• Divide the original 372×492 image into patches:
• Each patch is an instance that contains 12×12 pixels on a grid
• View each as a 144-D vector

Slide from Barnabas Poczos
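A sketch of the compression experiment described above, assuming NumPy; the 372×492 image here is a random stand-in, and 60 is one of the target dimensions used on the following slides:

import numpy as np

# Stand-in for the 372x492 grayscale image from the slides.
rng = np.random.default_rng(5)
image = rng.random((372, 492))

# Cut the image into non-overlapping 12x12 patches and flatten each to 144-D.
patches = np.array([image[r:r+12, c:c+12].ravel()
                    for r in range(0, 372, 12)
                    for c in range(0, 492, 12)])          # (31*41, 144)

mean_patch = patches.mean(axis=0)
A = patches - mean_patch
_, _, Vt = np.linalg.svd(A, full_matrices=False)

k = 60                                                    # 144-D -> 60-D
codes = A @ Vt[:k].T                                      # compressed patch weights
reconstructed = codes @ Vt[:k] + mean_patch               # back to 144-D
print("mean squared reconstruction error:",
      np.mean((patches - reconstructed) ** 2))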


L2 reconstruction error vs. PCA dimension

Slide from Barnabas Poczos


PCA compression: 144D → 60D

Slide from Barnabas Poczos


PCA compression: 144D → 16D

Slide from Barnabas Poczos


16 most important eigenvectors
[Figure: the 16 leading eigenvectors, each displayed as a 12×12 image patch, arranged in a 4×4 grid]

Slide from Barnabas Poczos


PCA compression: 144D → 6D

Slide from Barnabas Poczos


6 most important eigenvectors

[Figure: the 6 leading eigenvectors, each displayed as a 12×12 image patch, arranged in a 2×3 grid]

Slide from Barnabas Poczos


PCA compression: 144D → 3D

Slide from Barnabas Poczos


3 most important eigenvectors
[Figure: the 3 leading eigenvectors, each displayed as a 12×12 image patch]

Slide from Barnabas Poczos


PCA compression: 144D → 1D

Slide from Barnabas Poczos


60 most important eigenvectors

Looks like the discrete cosine basis functions of JPEG!...


Slide from Barnabas Poczos
2D Discrete Cosine Basis

http://en.wikipedia.org/wiki/Discrete_cosine_transform

Slide from Barnabas Poczos
