CS303: Mathematical Foundations for AI
Singular Value Decomposition
17 Jan 2025
Recap
• Recap
▶ Matrices as transformations
• Geometric
• Convolution
▶ Determinant
▶ Eigenvalues and Eigenvectors
▶ Eigendecomposition
▶ Power Method
• Singular Value Decomposition
References
• Gilbert Strang, Linear Algebra and Its Applications (textbook): https://rksmvv.ac.in/wp-content/uploads/2021/04/Gilbert_Strang_Linear_Algebra_and_Its_Applicatio_230928_225121.pdf
• https://pabloinsente.github.io/intro-linear-algebra#singular-value-decomposition
• https://mcrovella.github.io/CS132-Geometric-Algorithms/L25SVD.html
• Blog
Low Rank Representation
[Figure: an image of shape (667, 1000, 3) and its reconstructions at lower ranks (compression)]
[Figure: an image of shape (2003, 3000, 3) and its reconstructions at lower ranks (compression)]
Low Rank Representation
Representing X (n × d) at rank 1
• Rank is 1
• Represent every column as a scalar multiple of a single vector u

$$
X = \begin{bmatrix} x_1 & x_2 & \cdots & x_d \end{bmatrix}
\approx \begin{bmatrix} v_1 u & v_2 u & \cdots & v_d u \end{bmatrix}
= u \begin{bmatrix} v_1 & v_2 & \cdots & v_d \end{bmatrix}
= \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}
\begin{bmatrix} v_1 & v_2 & \cdots & v_d \end{bmatrix}
= u v^T
$$
Low Rank Representation
Representing X (n × d) at rank 2
• Rank is 2
• Represent every column as a linear combination of two vectors $u_1$ and $u_2$

$$
\begin{aligned}
X &= \begin{bmatrix} x_1 & x_2 & \cdots & x_d \end{bmatrix} \\
&\approx \begin{bmatrix} v_{11}u_1 + v_{21}u_2 & v_{12}u_1 + v_{22}u_2 & \cdots & v_{1d}u_1 + v_{2d}u_2 \end{bmatrix} \\
&= \begin{bmatrix} v_{11}u_1 & v_{12}u_1 & \cdots & v_{1d}u_1 \end{bmatrix} + \begin{bmatrix} v_{21}u_2 & v_{22}u_2 & \cdots & v_{2d}u_2 \end{bmatrix} \\
&= u_1 v_1^T + u_2 v_2^T \\
&= \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix}
= \begin{bmatrix} u_{11} & u_{21} \\ u_{12} & u_{22} \\ \vdots & \vdots \\ u_{1n} & u_{2n} \end{bmatrix}
\begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1d} \\ v_{21} & v_{22} & \cdots & v_{2d} \end{bmatrix}
\end{aligned}
$$
Low Rank Representation
Representing X (n × d) at rank k
• Rank is k
• Represent every column as a linear combination of k vectors, collected in U (n × k) and $V^T$ (k × d); see the NumPy sketch after the equation

$$
X \approx UV^T = u_1 v_1^T + u_2 v_2^T + \ldots + u_k v_k^T
= \begin{bmatrix} u_{11} & \cdots & u_{k1} \\ u_{12} & \cdots & u_{k2} \\ \vdots & \ddots & \vdots \\ u_{1n} & \cdots & u_{kn} \end{bmatrix}
\begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1d} \\ \vdots & \vdots & \ddots & \vdots \\ v_{k1} & v_{k2} & \cdots & v_{kd} \end{bmatrix}
$$
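To make the factorization concrete, here is a minimal NumPy sketch (not from the original slides; the sizes n, d, k and the random factors are illustrative) showing that the sum of k outer products equals $UV^T$.

```python
import numpy as np

# Rank-k matrix as a sum of k outer products u_i v_i^T (illustrative sizes).
rng = np.random.default_rng(0)
n, d, k = 6, 5, 2

U = rng.standard_normal((n, k))   # columns u_1, ..., u_k
V = rng.standard_normal((d, k))   # columns v_1, ..., v_k

# Term-by-term sum of outer products ...
X_k = sum(np.outer(U[:, i], V[:, i]) for i in range(k))

# ... equals the matrix product U V^T.
assert np.allclose(X_k, U @ V.T)
print(np.linalg.matrix_rank(X_k))  # 2
```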
Image Denoising
[Figure: an image reconstructed at lower ranks (compression)]
[Figure: an image reconstructed at lower ranks (denoising)]
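A minimal sketch of the compression/denoising idea, assuming a synthetic low-rank "image" in place of the lecture's photos; it uses NumPy's built-in SVD, which the remaining slides derive.

```python
import numpy as np

def low_rank(img: np.ndarray, k: int) -> np.ndarray:
    """Keep only the top-k singular components of a 2-D array."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]   # sum_i sigma_i u_i v_i^T

# Synthetic stand-in for an image: a smooth (rank-1) signal plus noise.
rng = np.random.default_rng(0)
clean = np.outer(np.sin(np.linspace(0, 3, 100)), np.cos(np.linspace(0, 3, 120)))
noisy = clean + 0.05 * rng.standard_normal(clean.shape)

denoised = low_rank(noisy, k=1)
print(np.linalg.norm(denoised - clean) < np.linalg.norm(noisy - clean))  # True
```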
Optimal Low Rank Representation
What is the optimal low-rank representation of a matrix A?
• What approach did we use previously to get a low-rank factorization?
• Find the rank-k subspace that preserves the maximum information
Optimal Low Rank Representation
• Find the direction x that maximizes $\|Ax\|$ where $\|x\| = 1$
• For a square matrix, an eigenvector x with $\|x\| = 1$ gives $\|Ax\| = \|\lambda x\| = |\lambda|$; the maximum over eigenvectors is the largest eigenvalue in magnitude
• What about general non-square matrices?

Find the unit vectors x which maximize
$$\|Ax\|^2 = (Ax)^T(Ax) = x^T(A^T A)x$$

What can you say about the matrix $A^T A$? Symmetric
Symmetric Matrices
Definition 1: Symmetric Matrix
A matrix A is symmetric if $A = A^T$
• A is a square matrix
• E.g.
$$\begin{bmatrix} 0 & 4 \\ 4 & 3 \end{bmatrix}, \qquad \begin{bmatrix} 0 & -1 & 5 \\ -1 & 5 & 8 \\ 5 & 8 & 7 \end{bmatrix}$$
Symmetric Matrices
Theorem 1: Spectral Theorem
The eigenvectors of a symmetric matrix are orthogonal.

For e.g.,
$$A = \begin{bmatrix} 6 & -2 & -1 \\ -2 & 6 & -1 \\ -1 & -1 & 5 \end{bmatrix}$$
• Characteristic equation: $-(\lambda - 8)(\lambda - 6)(\lambda - 3) = 0$
• So the eigenvalues are $\lambda_1 = 8$, $\lambda_2 = 6$, $\lambda_3 = 3$
• Eigenvectors are
$$\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
Symmetric Matrices
• Eigenvectors are
$$v_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
• Note that $v_i^T v_j = 0$ for $i \neq j$, i.e. the eigenvectors are orthogonal
• Normalizing the eigenvectors gives
$$V = \begin{bmatrix} -1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{3} \end{bmatrix}$$
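A quick NumPy check of this example (an addition, not part of the slides): the claimed eigenpairs hold and the eigenvectors are mutually orthogonal.

```python
import numpy as np

# The symmetric matrix from the spectral-theorem example.
A = np.array([[ 6, -2, -1],
              [-2,  6, -1],
              [-1, -1,  5]])

v1 = np.array([-1,  1, 0])
v2 = np.array([-1, -1, 2])
v3 = np.array([ 1,  1, 1])

for v, lam in [(v1, 8), (v2, 6), (v3, 3)]:
    assert np.allclose(A @ v, lam * v)    # eigenpairs as claimed

print(v1 @ v2, v1 @ v3, v2 @ v3)          # 0 0 0 -> mutually orthogonal
```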
Proof Spectral Theorem
Given A is symmetric, $A = A^T$.
Consider eigenvectors $v_1$ and $v_2$ corresponding to $\lambda_1$ and $\lambda_2$ s.t. $\lambda_1 \neq \lambda_2$:
$$\begin{aligned}
\lambda_1 v_1^T v_2 &= (\lambda_1 v_1)^T v_2 \\
&= (A v_1)^T v_2 \\
&= v_1^T A^T v_2 \\
&= v_1^T A v_2 \quad \text{(symmetry)} \\
&= v_1^T (\lambda_2 v_2) \\
&= \lambda_2 (v_1^T v_2)
\end{aligned}$$
Therefore $\lambda_1 v_1^T v_2 = \lambda_2 v_1^T v_2 \implies v_1^T v_2 = 0$ as $\lambda_1 \neq \lambda_2$.
Symmetric Matrices
Theorem 2
Any (real) n × n matrix is orthogonally diagonalizable if and only if it is a symmetric matrix.
• Proof
• For the example above:
$$V = \begin{bmatrix} -1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{3} \end{bmatrix}, \qquad D = \begin{bmatrix} 8 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$
• V is an orthogonal matrix; its columns are orthonormal
$$A = VDV^{-1} = VDV^T$$
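The orthogonal diagonalization can likewise be checked numerically; `np.linalg.eigh` is NumPy's eigendecomposition routine for symmetric matrices.

```python
import numpy as np

# Orthogonal diagonalization of the same symmetric A: A = V D V^T.
A = np.array([[ 6., -2., -1.],
              [-2.,  6., -1.],
              [-1., -1.,  5.]])

eigvals, V = np.linalg.eigh(A)          # ascending order: 3, 6, 8
D = np.diag(eigvals)

assert np.allclose(V.T @ V, np.eye(3))  # V is orthogonal
assert np.allclose(V @ D @ V.T, A)      # A = V D V^T
```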
Symmetric Matrices in Equations
• Set of linear equations: $Ax = b$
• Quadratic equations?
$$Q(x) = 5x_1^2 + 3x_2^2 + 2x_3^2 - x_1 x_2 + 8 x_2 x_3$$
• With $x = (x_1, x_2, x_3)^T$, $Q(x) = x^T A x$ where
$$A = \begin{bmatrix} 5 & -1/2 & 0 \\ -1/2 & 3 & 4 \\ 0 & 4 & 2 \end{bmatrix}$$
• There always exists such a symmetric A (a quick check follows)
▶ Coefficients of $x_1^2, x_2^2, \ldots$ on the diagonal
▶ The coefficient of $x_i x_j$ is split between positions ij and ji
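A small sanity check (illustrative, with hypothetical random test points) that this symmetric A reproduces Q(x):

```python
import numpy as np

# Symmetric matrix built from the quadratic form's coefficients.
A = np.array([[ 5.0, -0.5, 0.0],
              [-0.5,  3.0, 4.0],
              [ 0.0,  4.0, 2.0]])

def Q(x):
    x1, x2, x3 = x
    return 5*x1**2 + 3*x2**2 + 2*x3**2 - x1*x2 + 8*x2*x3

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.standard_normal(3)
    assert np.isclose(x @ A @ x, Q(x))   # x^T A x == Q(x)
print("quadratic form matches")
```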
Positive Semi-definite
Definition 2: Positive semi-definite
A matrix A is positive semidefinite if
$$x^T A x \ge 0 \quad \forall x,$$
i.e., the corresponding $Q(x) \ge 0$ for all x
Positive Semi-definite
2 1 4 −1
Positive semidefinite and
1 3 −1 2
16 / 36
Positive Semi-definite (PSD)
Since $x^T A x \ge 0$, consider an eigenvector v of the n × n matrix A:
$$\begin{aligned}
Av &= \lambda v \\
v^T A v &= \lambda v^T v \\
\lambda v^T v &\ge 0 \quad \text{(definition of PSD)} \\
\lambda &\ge 0 \quad \text{(as } v^T v > 0\text{)}
\end{aligned}$$

Theorem 3
A symmetric matrix A is positive definite (semidefinite) if and only if all its eigenvalues are positive (non-negative); this gives a practical test, applied in the sketch below.
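A sketch applying the eigenvalue test of Theorem 3 to the earlier 2 × 2 examples:

```python
import numpy as np

# PSD test via Theorem 3: symmetric A is PSD iff all eigenvalues >= 0.
def is_psd(A, tol=1e-10):
    return np.all(np.linalg.eigvalsh(A) >= -tol)

print(is_psd(np.array([[2., 1.], [1., 3.]])))    # True
print(is_psd(np.array([[4., -1.], [-1., 2.]])))  # True
print(is_psd(np.array([[0., 2.], [2., 0.]])))    # False (eigenvalues ±2)
```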
Positive Semi-definite
Think about it ...
A positive definite matrix never flips a vector about the origin
https://gregorygundersen.com/blog/2022/02/27/positive-definite/
Singular Value Decomposition
The Singular Value Decomposition is the "Swiss Army Knife" and the "Rolls Royce" of matrix decompositions. – Diane O'Leary
• Every matrix A has a Singular Value Decomposition
Singular Value Decomposition (SVD)
Given any n × d matrix A
• What can you say about $AA^T$?
▶ Square (n × n)
▶ Symmetric: $(AA^T)^T = (A^T)^T A^T = AA^T$
▶ $AA^T$ is positive semi-definite ($x^T AA^T x = \|A^T x\|^2 \ge 0$), so its eigenvalues are non-negative
▶ $\mathrm{rank}(AA^T) =?\ \mathrm{rank}(A^T)$
▶ Since it is symmetric, one can choose the eigenvectors to be orthonormal:
$$U = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}$$
SVD
Given any n × d matrix A
• What can you say about $A^T A$?
▶ Square (d × d)
▶ Symmetric: $(A^T A)^T = A^T (A^T)^T = A^T A$
▶ $A^T A$ is positive semi-definite ($x^T A^T A x = \|Ax\|^2 \ge 0$), so its eigenvalues are non-negative
▶ $\mathrm{rank}(A^T A) =?\ \mathrm{rank}(A)$
▶ Since it is symmetric, one can choose the eigenvectors to be orthonormal:
$$V = \begin{bmatrix} v_1 & v_2 & \cdots & v_d \end{bmatrix}$$
SVD
• For $AA^T$, the orthonormal eigenvectors U satisfy $U^T U = I$
• For $A^T A$, the orthonormal eigenvectors V satisfy $V^T V = I$
• $AA^T$ and $A^T A$ have different numbers of eigenvalues (n vs. d), but their non-zero eigenvalues are identical
SVD
Theorem 4
Given an n × d matrix A, both $AA^T$ and $A^T A$ have identical non-zero eigenvalues.

If $A^T A x = \lambda x$ with $\lambda \neq 0$, multiply both sides by A:
$$A(A^T A)x = \lambda Ax \quad \text{i.e.,} \quad AA^T(Ax) = \lambda(Ax),$$
so Ax is an eigenvector of $AA^T$ with the same eigenvalue (a numerical check follows).
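A numerical spot-check of Theorem 4 on a random 4 × 2 matrix (illustrative sizes):

```python
import numpy as np

# A A^T and A^T A share their non-zero eigenvalues (here, 2 of them).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))

ev_big = np.linalg.eigvalsh(A @ A.T)    # 4 eigenvalues, 2 are ~0
ev_small = np.linalg.eigvalsh(A.T @ A)  # 2 eigenvalues

nonzero = ev_big[np.abs(ev_big) > 1e-10]
print(np.allclose(np.sort(nonzero), np.sort(ev_small)))  # True
```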
SVD
• For $AA^T$, the orthonormal eigenvectors U satisfy $U^T U = I$
• For $A^T A$, the orthonormal eigenvectors V satisfy $V^T V = I$
• Let $\lambda_1, \ldots, \lambda_r$ be the r positive eigenvalues of $AA^T$ and $A^T A$

$$S = \mathrm{diag}\big(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_r}, 0, \ldots, 0\big)$$
SVD
Singular Value Decomposition
$$A = USV^T$$
where
$$S = \mathrm{diag}\big(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_r}, 0, \ldots, 0\big) = \mathrm{diag}\big(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0\big)$$
The $\sigma_i$ are the singular values.
Proof of SVD
$$A = USV^T, \qquad U^T U = I, \qquad V^T V = I$$
Solve the above three to find U, S and V.
Proof of SVD
Solve for V
$$\begin{aligned}
A &= USV^T \\
A^T &= (USV^T)^T = VS^TU^T = VSU^T \\
A^T A &= VSU^T(USV^T) = VSSV^T = VS^2V^T \\
A^T A V &= VS^2
\end{aligned}$$
$$A^T A \begin{bmatrix} v_1 & v_2 & \cdots & v_d \end{bmatrix} = \begin{bmatrix} v_1 & v_2 & \cdots & v_d \end{bmatrix} \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_d \end{bmatrix}$$
or
$$A^T A v_i = \lambda_i v_i = \sigma_i^2 v_i$$
Proof of SVD
Solve for U
$$\begin{aligned}
A &= USV^T \\
A^T &= (USV^T)^T = VS^TU^T = VSU^T \\
AA^T &= USV^T(USV^T)^T = USV^TVS^TU^T = US^2U^T \\
AA^T U &= US^2
\end{aligned}$$
or
$$AA^T u_i = \lambda_i u_i = \sigma_i^2 u_i$$
(the construction of U, S, V from these eigendecompositions is sketched below)
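The proof suggests a recipe: take V and the λ's from the eigendecomposition of $A^T A$, set $\sigma_i = \sqrt{\lambda_i}$, and recover $u_i = Av_i / \sigma_i$. A sketch under the assumption that A has full column rank (so no $\sigma_i$ is zero):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

lam, V = np.linalg.eigh(A.T @ A)        # eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]          # sort descending
sigma = np.sqrt(np.clip(lam, 0, None))  # singular values sigma_i = sqrt(lambda_i)

# u_i = A v_i / sigma_i aligns the signs of U with those of V automatically.
U = (A @ V) / sigma
S = np.diag(sigma)

assert np.allclose(U @ S @ V.T, A)      # A = U S V^T
print(np.round(sigma, 4))
```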
Singular Value Decomposition
$$A = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}
\begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{bmatrix}
\begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_d^T \end{bmatrix}
= \begin{bmatrix} \sigma_1 u_1 & \cdots & \sigma_r u_r & 0 & \cdots & 0 \end{bmatrix}
\begin{bmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_d^T \end{bmatrix}
= \sum_{i=1}^{r} \sigma_i u_i v_i^T$$
Optimal Low Rank using SVD
The optimal rank-k (k < r) representation is given by
$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$$
• $u_i$ are the eigenvectors of $AA^T$
• $v_i$ are the eigenvectors of $A^T A$
• $\sigma_i = \sqrt{\lambda_i}$ are the singular values
Example for SVD
Let the matrix A be:
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}.$$
Then
$$A^T A = \begin{bmatrix} 35 & 44 \\ 44 & 56 \end{bmatrix}, \qquad AA^T = \begin{bmatrix} 5 & 11 & 17 \\ 11 & 25 & 39 \\ 17 & 39 & 61 \end{bmatrix}.$$
Example for SVD
• For $A^T A$, the characteristic polynomial is
$$\det \begin{bmatrix} 35 - \lambda & 44 \\ 44 & 56 - \lambda \end{bmatrix} = 0,$$
which simplifies to
$$\lambda^2 - 91\lambda + 24 = 0.$$
• Eigenvalues of $A^T A$:
$$\lambda_1 \approx 90.7355, \qquad \lambda_2 \approx 0.2645.$$
• Singular values ($\sigma_i = \sqrt{\lambda_i}$):
$$\sigma_1 \approx 9.5255, \qquad \sigma_2 \approx 0.5143.$$
Example for SVD
• Eigenvectors of $A^T A$ (columns of V):
$$V = \begin{bmatrix} -0.61962948 & -0.78489445 \\ -0.78489445 & 0.61962948 \end{bmatrix}.$$
• Eigenvectors of $AA^T$ (columns of U):
$$U = \begin{bmatrix} -0.2298477 & 0.88346102 & 0.40824829 \\ -0.52474482 & 0.24078249 & -0.81649658 \\ -0.81964194 & -0.40189603 & 0.40824829 \end{bmatrix}$$
Example for SVD
The diagonal matrix Σ is:
$$\Sigma = \begin{bmatrix} 9.5255 & 0 \\ 0 & 0.5143 \\ 0 & 0 \end{bmatrix}.$$
The singular value decomposition of A is $A = U\Sigma V^T$, where:
$$U = \begin{bmatrix} -0.2298 & 0.8835 & 0.4082 \\ -0.5247 & 0.2408 & -0.8165 \\ -0.8196 & -0.4019 & 0.4082 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 9.5255 & 0 \\ 0 & 0.5143 \\ 0 & 0 \end{bmatrix}, \qquad V^T = \begin{bmatrix} -0.6196 & -0.7849 \\ -0.7849 & 0.6196 \end{bmatrix}$$
Example for SVD
Rank-1 approximation:
$$A_1 = \sigma_1 u_1 v_1^T = 9.5255 \begin{bmatrix} -0.2298 \\ -0.5247 \\ -0.8196 \end{bmatrix} \begin{bmatrix} -0.6196 & -0.7849 \end{bmatrix} = \begin{bmatrix} 1.35662819 & 1.71846235 \\ 3.09719707 & 3.92326845 \\ 4.83776596 & 6.12807454 \end{bmatrix}$$
compared with the original
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$$
(verified with NumPy below)
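The worked example can be reproduced with `np.linalg.svd` (signs of the singular vectors may differ between implementations, but each term $\sigma_i u_i v_i^T$ is unaffected):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 4))                  # [9.5255 0.5143]

# Rank-1 approximation A_1 = sigma_1 u_1 v_1^T.
A1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.round(A1, 4))
# [[1.3566 1.7185]
#  [3.0972 3.9233]
#  [4.8378 6.1281]]
```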
Optimal Low Rank (Intuition)
• Recollect the "maximizing the projection" notion
• Find the unit vector x that maximizes $\|Ax\|$ (equivalently $\|Ax\|^2$)

Theorem 5
Let A be an n × d matrix. Then the maximum value of $\|Ax\|$, where x ranges over unit vectors, is the largest singular value $\sigma_1$.
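A quick numerical spot-check of Theorem 5 on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A)
v1 = Vt[0]                                         # top right-singular vector
print(np.isclose(np.linalg.norm(A @ v1), s[0]))    # True: ||A v_1|| = sigma_1

# Random unit vectors never exceed sigma_1.
X = rng.standard_normal((3, 10000))
X /= np.linalg.norm(X, axis=0)
print(np.linalg.norm(A @ X, axis=0).max() <= s[0] + 1e-12)  # True
```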
Optimal Low Rank (Intuition)
• $\sigma_2$ is the maximum of $\|Ax\|$ over unit vectors x orthogonal to $v_1$
• $\sigma_3$ is the maximum of $\|Ax\|$ over unit vectors x orthogonal to $v_1$ and $v_2$
• and so on ...
• Greedily selecting directions by singular value thus yields the optimal low-rank representation
Optimal Low Rank
Theorem 6: Eckart-Young-Mirsky theorem
For any n × d matrix A, the rank-k representation $A_k$ given by the SVD satisfies
• $\|A - A_k\|_F \le \|A - B_k\|_F$ for any $B_k$ s.t. $\mathrm{rank}(B_k) \le k$
• where $\|A\|_F = \sqrt{\sum_{ij} A_{ij}^2}$ is the Frobenius norm
Optimal Low Rank (Intuition)
Given $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r$,
$$\begin{aligned}
\|A - A_k\|_F &= \Big\| \sum_{i=1}^{r} \sigma_i u_i v_i^T - \sum_{i=1}^{k} \sigma_i u_i v_i^T \Big\|_F \\
&= \Big\| \sum_{i=k+1}^{r} \sigma_i u_i v_i^T \Big\|_F \\
&= \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}
\end{aligned}$$
where the last step uses the orthonormality of the $u_i$ and the $v_i$ (checked numerically below).
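A numerical check of the error formula, and of the Eckart-Young bound against an arbitrary (here random) rank-k competitor:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]

err = np.linalg.norm(A - A_k)           # Frobenius norm by default
print(np.isclose(err, np.sqrt(np.sum(s[k:]**2))))  # True

# Any other rank-k matrix has at least this error.
B_k = rng.standard_normal((6, k)) @ rng.standard_normal((k, 4))
print(np.linalg.norm(A - B_k) >= err)   # True
```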
Applications of SVD
• Image Compression
• Image Denoising
• Latent Semantic Analysis
• Recommendation Systems
• Representation Learning