Deriving Principal Component Analysis (PCA)
Matt Gormley
Lecture 11
Oct. 3, 2018
Reminders
• Quiz 1: Linear Algebra (today)
• Homework 3: Matrix Calculus + Probability
– Out: Wed, Oct. 3
– Due: Wed, Oct. 10 at 11:59pm
• Quiz 2: Matrix Calculus + Probability
– In-class, Wed, Oct. 10
Q&A
DIMENSIONALITY REDUCTION
PCA Outline
• Dimensionality Reduction
– High-dimensional data
– Learning (low dimensional) representations
• Principal Component Analysis (PCA)
– Examples: 2D and 3D
– Data for PCA
– PCA Definition
– Objective functions for PCA
– PCA, Eigenvectors, and Eigenvalues
– Algorithms for finding Eigenvectors /
Eigenvalues
• PCA Examples
– Face Recognition
– Image Compression
High-Dimensional Data
Examples of high dimensional data:
– High resolution images (millions of pixels)
High-Dimensional Data
Examples of high dimensional data:
– Multilingual News Stories
(vocabulary of hundreds of thousands of words)
High-Dimensional Data
Examples of high dimensional data:
– Brain Imaging Data (100s of MBs per scan)
Learning Representations
PCA, Kernel PCA, ICA: Powerful unsupervised learning techniques
for extracting hidden (potentially lower dimensional) structure
from high dimensional datasets.
Useful for:
• Visualization
• More efficient use of resources
(e.g., time, memory, communication)
PCA Outline
• Dimensionality Reduction
– High-dimensional data
– Learning (low dimensional) representations
• Principal Component Analysis (PCA)
– Examples: 2D and 3D
– Data for PCA
– PCA Definition
– Objective functions for PCA
– PCA, Eigenvectors, and Eigenvalues
– Algorithms for finding Eigenvectors / Eigenvalues
• PCA Examples
– Face Recognition
– Image Compression
Principal Component Analysis (PCA)
Data for PCA

$$\mathcal{D} = \{\mathbf{x}^{(i)}\}_{i=1}^{N}
\qquad
\mathbf{X} = \begin{bmatrix} (\mathbf{x}^{(1)})^T \\ (\mathbf{x}^{(2)})^T \\ \vdots \\ (\mathbf{x}^{(N)})^T \end{bmatrix}$$

We assume the data is centered, and that each axis has sample variance equal to one:

$$\boldsymbol{\mu} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}^{(i)} = \mathbf{0}
\qquad
\sigma_j^2 = \frac{1}{N}\sum_{i=1}^{N} \big(x_j^{(i)}\big)^2 = 1$$
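A minimal numpy sketch of this preprocessing step (toy data and variable names are my own, not from the lecture): center each coordinate and rescale it to unit sample variance, using the 1/N convention from the formulas above.

```python
import numpy as np

# Toy data matrix: N examples (rows), M features (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [0.5, 1.0, 0.0],
                                          [0.0, 0.3, 0.1]])

# Center: subtract the per-feature mean so mu = 0.
X = X - X.mean(axis=0)

# Standardize: divide by the per-feature standard deviation
# (computed with 1/N, matching the slides) so each sigma_j^2 = 1.
X = X / X.std(axis=0)

print(X.mean(axis=0))   # ~ [0, 0, 0]
print(X.var(axis=0))    # ~ [1, 1, 1]
```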
Sample Covariance Matrix
The sample covariance matrix is given by:
$$\Sigma_{jk} = \frac{1}{N}\sum_{i=1}^{N} \big(x_j^{(i)} - \mu_j\big)\big(x_k^{(i)} - \mu_k\big)
\qquad\Longrightarrow\qquad
\boldsymbol{\Sigma} = \frac{1}{N}\mathbf{X}^T \mathbf{X}$$
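As a sanity check of the formula above, here is a small numpy sketch (toy data of my own) that forms Σ = (1/N) XᵀX from centered data and compares it against numpy's built-in covariance with the same 1/N normalization.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X = X - X.mean(axis=0)          # center so the mu_j terms vanish

N = X.shape[0]
Sigma = (X.T @ X) / N           # Sigma_jk = (1/N) sum_i x_j^(i) x_k^(i)

# np.cov with rowvar=False treats columns as variables; bias=True uses 1/N.
assert np.allclose(Sigma, np.cov(X, rowvar=False, bias=True))
```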
Maximizing the Variance
Quiz: Consider the two projections below
1. Which maximizes the variance?
2. Which minimizes the reconstruction error?
[Figure: the same data projected onto two candidate directions, labeled Option A and Option B]
PCA
Equivalence of Maximizing Variance and Minimizing Reconstruction Error
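The whiteboard derivation is not reproduced here, but a quick numerical sketch (toy 2D data of my own) illustrates the equivalence: for any unit vector v, the variance of the projection onto v plus the mean squared error of the rank-1 reconstruction (vᵀx)v equals the total variance, so maximizing one is the same as minimizing the other.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)
N = X.shape[0]
total_var = np.sum(X ** 2) / N          # trace of the sample covariance

for theta in [0.0, 0.5, 1.0, 1.5]:      # a few candidate directions
    v = np.array([np.cos(theta), np.sin(theta)])   # unit vector
    proj_var = np.mean((X @ v) ** 2)                # variance along v
    recon = np.outer(X @ v, v)                      # rank-1 reconstruction
    recon_err = np.mean(np.sum((X - recon) ** 2, axis=1))
    # proj_var + recon_err equals total_var for every direction v.
    print(f"{proj_var:.3f} + {recon_err:.3f} = {proj_var + recon_err:.3f} "
          f"(total variance {total_var:.3f})")
```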
Principal Component Analysis (PCA)
Whiteboard
– PCA, Eigenvectors, and Eigenvalues
– Algorithms for finding Eigenvectors /
Eigenvalues
– SVD: Relation of Singular Vectors to
Eigenvectors
SVD for PCA
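A hedged sketch of the SVD connection (toy data of my own, not the whiteboard notes): if the centered data matrix factors as X = USVᵀ, then the right singular vectors are the eigenvectors of XᵀX, and the squared singular values divided by N are the eigenvalues of the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
X = X - X.mean(axis=0)
N = X.shape[0]

# Eigendecomposition of the sample covariance (1/N) X^T X.
Sigma = (X.T @ X) / N
eigvals, eigvecs = np.linalg.eigh(Sigma)             # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending

# SVD of the centered data matrix itself.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Right singular vectors = eigenvectors (up to sign); s^2 / N = eigenvalues.
assert np.allclose(s ** 2 / N, eigvals)
assert np.allclose(np.abs(Vt), np.abs(eigvecs.T), atol=1e-6)
```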
Principal Component Analysis (PCA)
$\mathbf{X}^T\mathbf{X}\,\mathbf{v} = \lambda \mathbf{v}$, so $\mathbf{v}$ (the first PC) is the eigenvector of the
sample correlation/covariance matrix $\mathbf{X}^T\mathbf{X}$
Sample variance of projection: $\mathbf{v}^T \mathbf{X}^T\mathbf{X}\,\mathbf{v} = \lambda \mathbf{v}^T \mathbf{v} = \lambda$
Eigenvalues $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \cdots$
• The 1st PC $\mathbf{v}_1$ is the eigenvector of the sample covariance matrix $\mathbf{X}^T\mathbf{X}$
associated with the largest eigenvalue
• The 2nd PC $\mathbf{v}_2$ is the eigenvector of the sample covariance matrix
$\mathbf{X}^T\mathbf{X}$ associated with the second largest eigenvalue
• And so on …
Slide from Nina Balcan
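A minimal numpy sketch of this recipe (toy data of my own): form the sample covariance, take its eigenvectors, and sort by eigenvalue so that v₁ corresponds to the largest eigenvalue; the sample variance of the projection onto v₁ then equals λ₁.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 3)) @ np.array([[2.0, 0.3, 0.0],
                                          [0.0, 1.0, 0.2],
                                          [0.0, 0.0, 0.1]])
X = X - X.mean(axis=0)

Sigma = (X.T @ X) / X.shape[0]
eigvals, eigvecs = np.linalg.eigh(Sigma)        # returned in ascending order
order = np.argsort(eigvals)[::-1]               # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

v1 = eigvecs[:, 0]                              # 1st PC: largest eigenvalue
v2 = eigvecs[:, 1]                              # 2nd PC: second largest
# Sample variance of the projection onto v1 equals lambda_1.
print(np.mean((X @ v1) ** 2), eigvals[0])
```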
How Many PCs?
• For M original dimensions, the sample covariance matrix is M × M and has
up to M eigenvectors. So M PCs.
• Where does dimensionality reduction come from?
Can ignore the components of lesser significance.
[Bar chart: variance (%) explained by each component, PC1 through PC10, in decreasing order]
• You do lose some information, but if the eigenvalues are small, you don’t lose
much
– M dimensions in original data
– calculate M eigenvectors and eigenvalues
– choose only the first D eigenvectors, based on their eigenvalues
– final data set has only D dimensions
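A sketch of this selection step (the 95% cutoff below is my own illustrative choice, not from the slides): rank the eigenvalues, compute the fraction of variance each explains, and keep the smallest D that covers the chosen fraction.

```python
import numpy as np

rng = np.random.default_rng(5)
# 10-dimensional toy data where only a few directions carry most of the variance.
scales = np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.3, 0.2, 0.1, 0.05, 0.01])
X = rng.normal(size=(1000, 10)) * scales
X = X - X.mean(axis=0)

eigvals = np.linalg.eigvalsh((X.T @ X) / X.shape[0])[::-1]   # descending
explained = eigvals / eigvals.sum()                          # variance fraction per PC
D = np.searchsorted(np.cumsum(explained), 0.95) + 1          # smallest D covering 95%

print(np.round(100 * explained, 1))   # roughly mirrors the bar chart on the slide
print("keep D =", D, "of", len(eigvals), "components")
```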
PCA EXAMPLES
Face recognition
m faces
[Figure: grid of m face images, each shown as a small (roughly 12×12 pixel) array]
http://en.wikipedia.org/wiki/Discrete_cosine_transform
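A hedged sketch of how PCA is applied in this example (the faces below are random stand-in arrays, purely illustrative): flatten each image to a vector, compute the top-D principal components ("eigenfaces"), and represent or reconstruct each face from its D projection coefficients.

```python
import numpy as np

rng = np.random.default_rng(6)
m, h, w, D = 50, 12, 12, 10           # m faces of 12x12 pixels, keep D components
faces = rng.random(size=(m, h, w))    # stand-in for real face images
X = faces.reshape(m, h * w)           # one flattened face per row
mean_face = X.mean(axis=0)
Xc = X - mean_face

# Top-D eigenvectors of the sample covariance via SVD of the centered data.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenfaces = Vt[:D]                   # each row is a 144-dim "eigenface"

codes = Xc @ eigenfaces.T             # D numbers per face instead of 144 pixels
recon = codes @ eigenfaces + mean_face
print("reconstruction MSE:", np.mean((recon - X) ** 2))
```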