Lecture 4

CS303: Mathematical Foundations for AI

Singular Value Decomposition


17 Jan 2025
Recap

• Recap
▶ Matrices as transformations
• Geometric
• Convolution
▶ Determinant
▶ Eigenvalues and Eigenvectors
▶ Eigendecomposition
▶ Power Method
• Singular Value Decomposition

1 / 36
References

• https://rksmvv.ac.in/wp-content/uploads/2021/04/Gilbert Strang Linear Algebra and Its Applicatio 230928 225121.pdf — Textbook (Gilbert Strang, Linear Algebra and Its Applications)
• https://pabloinsente.github.io/intro-linear-algebra#singular-value-decomposition
• https://mcrovella.github.io/CS132-Geometric-Algorithms/L25SVD.html
• Blog

2 / 36
Low Rank Representation

Image Shape: (667, 1000, 3)

3 / 36
Low Rank Representation

Image as lower ranks (Compression)

3 / 36
Low Rank Representation

Image Shape: (2003, 3000, 3)

3 / 36
Low Rank Representation

Image as lower ranks (Compression)

3 / 36
Low Rank Representation

X, which is n × d, as rank 1
• Rank is 1
• Represent all columns as scalar multiples of a single vector u

$$
X = \begin{bmatrix} x_1 & x_2 & \dots & x_d \end{bmatrix}
\approx \begin{bmatrix} v_1 u & v_2 u & \dots & v_d u \end{bmatrix}
= u \begin{bmatrix} v_1 & v_2 & \dots & v_d \end{bmatrix}
= u v^T
= \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}
\begin{bmatrix} v_1 & v_2 & \dots & v_d \end{bmatrix}
$$
4 / 36
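As a quick check of the rank-1 picture above, here is a minimal NumPy sketch (not from the slides; the vectors u and v are made-up examples): every column of the outer product u vᵀ is a scalar multiple of u, so the matrix has rank 1.

```python
import numpy as np

# Hypothetical example: build a rank-1 matrix X = u v^T as an outer product.
u = np.array([1.0, 2.0, 3.0])          # n = 3
v = np.array([4.0, 5.0, 6.0, 7.0])     # d = 4

X = np.outer(u, v)                      # n x d matrix, every column is a multiple of u

print(X)
print("rank:", np.linalg.matrix_rank(X))   # expected: 1
print(np.allclose(X[:, 2], v[2] * u))      # True: column j equals v[j] * u
```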
Low Rank Representation
X, which is n × d, as rank 2
• Rank is 2
• Represent all columns as linear combinations of two vectors u1 and u2

$$
\begin{aligned}
X &= \begin{bmatrix} x_1 & x_2 & \dots & x_d \end{bmatrix} \\
&\approx \begin{bmatrix} v_{11} u_1 + v_{21} u_2 & v_{12} u_1 + v_{22} u_2 & \dots & v_{1d} u_1 + v_{2d} u_2 \end{bmatrix} \\
&= \begin{bmatrix} v_{11} u_1 & v_{12} u_1 & \dots & v_{1d} u_1 \end{bmatrix}
 + \begin{bmatrix} v_{21} u_2 & v_{22} u_2 & \dots & v_{2d} u_2 \end{bmatrix} \\
&= u_1 v_1^T + u_2 v_2^T
 = \begin{bmatrix} u_1 & u_2 \end{bmatrix}
   \begin{bmatrix} v_{11} & v_{12} & \dots & v_{1d} \\ v_{21} & v_{22} & \dots & v_{2d} \end{bmatrix}
 = \begin{bmatrix} u_{11} & u_{21} \\ u_{12} & u_{22} \\ \vdots & \vdots \\ u_{1n} & u_{2n} \end{bmatrix}
   \begin{bmatrix} v_{11} & v_{12} & \dots & v_{1d} \\ v_{21} & v_{22} & \dots & v_{2d} \end{bmatrix}
\end{aligned}
$$

5 / 36
Low Rank Representation

X as rank k
• Rank is k
• Represent all columns as linear combinations of the columns of U, which is n × k, with coefficient matrix V^T, which is k × d (see the sketch below)

$$
X \approx UV^T = u_1 v_1^T + u_2 v_2^T + \dots + u_k v_k^T
= \begin{bmatrix} u_{11} & \cdots & u_{k1} \\ u_{12} & \cdots & u_{k2} \\ \vdots & \ddots & \vdots \\ u_{1n} & \cdots & u_{kn} \end{bmatrix}
\begin{bmatrix} v_{11} & v_{12} & \dots & v_{1d} \\ \vdots & \vdots & \ddots & \vdots \\ v_{k1} & v_{k2} & \dots & v_{kd} \end{bmatrix}
$$

6 / 36
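A minimal sketch of the rank-k form (random U and V, purely for illustration): the product UVᵀ equals the sum of the k outer products u_i v_iᵀ and has rank at most k.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 6, 5, 2

U = rng.standard_normal((n, k))   # columns u_1 ... u_k
V = rng.standard_normal((d, k))   # columns v_1 ... v_k

X = U @ V.T                                                  # rank-k representation
X_sum = sum(np.outer(U[:, i], V[:, i]) for i in range(k))    # sum of outer products

print(np.allclose(X, X_sum))        # True: U V^T = sum_i u_i v_i^T
print(np.linalg.matrix_rank(X))     # at most k (here 2)
```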
Image Denoising

Image as lower ranks (Compression)

7 / 36
Image Denoising

Image as lower ranks (Denoising)

7 / 36
Optimal Low Rank Representation

What is the optimal low rank representation of a matrix A?


• What approach did we use previously to get the low rank factorization?
• Find the rank k subspace that preserves the maximum information

8 / 36
• Find the direction x that maximizes ∥Ax∥ where ∥x∥ = 1
• For a square matrix, if x is a unit eigenvector then ∥Ax∥ = ∥λx∥ = |λ|, which is largest for the largest-magnitude eigenvalue
• What about general non-square matrices?
Find the unit vectors x which maximize

∥Ax∥² or (Ax)^T (Ax) or x^T (A^T A) x

What can you say about the matrix A^T A? Symmetric

10 / 36
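A small sketch (random rectangular A, not from the slides) of this maximization: ∥Ax∥² = xᵀ(AᵀA)x for sampled unit vectors never exceeds the largest eigenvalue of the symmetric matrix AᵀA, and approaches it.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))          # a general non-square matrix (made up)

# ||Ax||^2 = x^T (A^T A) x for many random unit vectors x
xs = rng.standard_normal((3, 10000))
xs /= np.linalg.norm(xs, axis=0)
vals = np.einsum('ij,ij->j', A @ xs, A @ xs)     # ||Ax||^2 for each column x

lam_max = np.linalg.eigvalsh(A.T @ A).max()      # largest eigenvalue of symmetric A^T A
print(vals.max(), "<=", lam_max)                 # sampled max approaches lam_max
```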
Symmetric Matrices

Definition 1: Symmetric Matrix

Matrix A is symmetric if A = A T
• A is a square matrix
• E.g.
$$
\begin{bmatrix} 0 & 4 \\ 4 & 3 \end{bmatrix}, \quad
\begin{bmatrix} 0 & -1 & 5 \\ -1 & 5 & 8 \\ 5 & 8 & 7 \end{bmatrix}
$$

11 / 36
Symmetric Matrices

Theorem 1: Spectral Theorem

The eigenvectors of a symmetric matrix are orthogonal


 
For e.g.,
$$
A = \begin{bmatrix} 6 & -2 & -1 \\ -2 & 6 & -1 \\ -1 & -1 & 5 \end{bmatrix}
$$
• Characteristic equation: −(λ − 8)(λ − 6)(λ − 3) = 0
• So eigenvalues are λ1 = 8, λ2 = 6, λ3 = 3
• Eigenvectors are
$$
\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad
\begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}, \quad
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
$$

12 / 36
Symmetric Matrices

     
• Eigenvectors are
$$
v_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad
v_2 = \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}, \quad
v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
$$
• Note that v_i^T v_j = 0 for i ≠ j, i.e., the eigenvectors are orthogonal
• Normalizing each eigenvector to unit length gives
$$
V = \begin{bmatrix} -1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{3} \end{bmatrix}
$$

13 / 36
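A quick NumPy check (a sketch, using the example matrix from the slide) that a symmetric matrix has these eigenvalues and that its eigenvectors can be taken orthonormal.

```python
import numpy as np

A = np.array([[ 6, -2, -1],
              [-2,  6, -1],
              [-1, -1,  5]], dtype=float)   # symmetric example from the slide

lam, V = np.linalg.eigh(A)    # eigh: for symmetric matrices, returns orthonormal eigenvectors
print(lam)                                # [3. 6. 8.] (ascending order)
print(np.allclose(V.T @ V, np.eye(3)))    # True: columns of V are orthonormal
```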
Proof Spectral Theorem

Given: A is symmetric, A = A^T.
Consider eigenvectors v1 and v2 corresponding to eigenvalues λ1 and λ2 with λ1 ≠ λ2.

$$
\begin{aligned}
\lambda_1 v_1^T v_2 &= (\lambda_1 v_1)^T v_2 \\
&= (A v_1)^T v_2 \\
&= v_1^T A^T v_2 \\
&= v_1^T A v_2 \qquad \text{(symmetry)} \\
&= v_1^T (\lambda_2 v_2) \\
&= \lambda_2 (v_1^T v_2)
\end{aligned}
$$

Therefore λ1 v_1^T v_2 = λ2 v_1^T v_2, which implies v_1^T v_2 = 0 since λ1 ≠ λ2.

14 / 36
Symmetric Matrices

Theorem 2
Any (real) n × n matrix is orthogonally diagonalizable if and only if it is a symmetric matrix.

• Proof (illustrated with the example above):
$$
V = \begin{bmatrix} -1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{3} \end{bmatrix}, \quad
D = \begin{bmatrix} 8 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 3 \end{bmatrix}
$$
• V is an orthogonal matrix: its columns are orthonormal

A = VDV^{-1} = VDV^T

14 / 36
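Continuing with the same example, a minimal sketch verifying the orthogonal diagonalization A = VDVᵀ (the eigenvalue order from np.linalg.eigh may differ from the slide).

```python
import numpy as np

A = np.array([[ 6, -2, -1],
              [-2,  6, -1],
              [-1, -1,  5]], dtype=float)

lam, V = np.linalg.eigh(A)       # orthonormal eigenvectors for a symmetric matrix
D = np.diag(lam)

print(np.allclose(V @ D @ V.T, A))          # True: A = V D V^T
print(np.allclose(np.linalg.inv(V), V.T))   # True: V^{-1} = V^T since V is orthogonal
```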
Symmetric Matrices in Equations

• Set of linear equations Ax = b


• Quadratic equations?

Q(x) = 5x1² + 3x2² + 2x3² − x1x2 + 8x2x3

$$
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad Q(x) = x^T A x, \qquad
A = \begin{bmatrix} 5 & -1/2 & 0 \\ -1/2 & 3 & 4 \\ 0 & 4 & 2 \end{bmatrix}
$$

• There always exists a symmetric A
▶ Coefficients of x1², x2², ... on the diagonal
▶ Coefficient of xi xj split between positions ij and ji

15 / 36
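A minimal sketch (random test point, not from the slides) that writes down the symmetric A for the Q(x) above and checks that xᵀAx reproduces the quadratic form.

```python
import numpy as np

# Symmetric matrix for Q(x) = 5x1^2 + 3x2^2 + 2x3^2 - x1*x2 + 8*x2*x3:
# squared terms on the diagonal, each cross term split between positions ij and ji.
A = np.array([[ 5.0, -0.5, 0.0],
              [-0.5,  3.0, 4.0],
              [ 0.0,  4.0, 2.0]])

def Q(x):
    x1, x2, x3 = x
    return 5*x1**2 + 3*x2**2 + 2*x3**2 - x1*x2 + 8*x2*x3

x = np.random.default_rng(2).standard_normal(3)
print(np.isclose(x @ A @ x, Q(x)))   # True: x^T A x equals the quadratic form
```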
Positive Semi-definite

Definition 2: Positive semi-definite


A matrix A is positive semidefinite if

x^T A x ≥ 0 for all x,

i.e., the corresponding Q(x) ≥ 0 for all x

16 / 36
Positive Semi-definite

   
The matrices
$$
\begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 4 & -1 \\ -1 & 2 \end{bmatrix}
$$
are positive semidefinite.

16 / 36
Positive Semi-definite (PSD)
Since x^T A x ≥ 0 for all x, consider an n-dimensional eigenvector v of A (which is n × n):

$$
\begin{aligned}
Av &= \lambda v \\
v^T A v &= \lambda v^T v \\
\lambda v^T v &\ge 0 \qquad \text{(definition of PSD)} \\
\lambda &\ge 0 \qquad \text{(as } v^T v > 0\text{)}
\end{aligned}
$$

Theorem 3
A symmetric matrix A is positive definite (semidefinite) if and only if all its eigenvalues are positive (non-negative).

17 / 36
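A sketch (using the two example matrices from the earlier slide) of checking positive semidefiniteness through the eigenvalues of a symmetric matrix, as Theorem 3 suggests.

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Check positive semidefiniteness of a symmetric matrix via its eigenvalues."""
    return np.all(np.linalg.eigvalsh(A) >= -tol)

B1 = np.array([[2.0, 1.0], [1.0, 3.0]])
B2 = np.array([[4.0, -1.0], [-1.0, 2.0]])

print(is_psd(B1), is_psd(B2))   # True True: all eigenvalues are non-negative
```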
Positive Semi-definite

Think about it ...


A positive definite matrix never flips a vector about the origin
https://gregorygundersen.com/blog/2022/02/27/positive-definite/

17 / 36
Singular Value Decomposition

The Singular Value Decomposition is the “Swiss Army Knife” and the “Rolls Royce” of matrix decompositions. – Dianne O'Leary

• Every matrix A has a Singular Value Decomposition

18 / 36
Singular Value Decomposition (SVD)

Given any n × d matrix A


• What can you say about AA^T?
▶ Square (n × n)
▶ Symmetric: (AA^T)^T = (A^T)^T A^T = AA^T
▶ AA^T is positive semi-definite, so it has non-negative eigenvalues
▶ rank(AA^T) = rank(A^T) = rank(A)
▶ Since it is symmetric, one can choose its eigenvectors to be orthonormal
$$
U = \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}
$$

19 / 36
SVD

Given any n × d matrix A


• What can you say about A^T A?
▶ Square (d × d)
▶ Symmetric: (A^T A)^T = A^T (A^T)^T = A^T A
▶ A^T A is positive semi-definite, so it has non-negative eigenvalues
▶ rank(A^T A) = rank(A)
▶ Since it is symmetric, one can choose its eigenvectors to be orthonormal
$$
V = \begin{bmatrix} v_1 & v_2 & \dots & v_d \end{bmatrix}
$$

20 / 36
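A minimal sketch (random rectangular A as an assumption) checking the listed properties of AAᵀ and AᵀA: both are symmetric, both are positive semi-definite, and both have the same rank as A.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))   # an arbitrary n x d matrix

G1 = A @ A.T    # n x n
G2 = A.T @ A    # d x d

print(np.allclose(G1, G1.T), np.allclose(G2, G2.T))    # both symmetric
print(np.linalg.eigvalsh(G1).min() >= -1e-10)          # eigenvalues non-negative (PSD)
print(np.linalg.matrix_rank(G1),
      np.linalg.matrix_rank(G2),
      np.linalg.matrix_rank(A))                        # all ranks equal
```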
SVD

• For AA^T the orthonormal eigenvectors U satisfy

U^T U = I

• For A^T A the orthonormal eigenvectors V satisfy

V^T V = I

• AA^T (n × n) and A^T A (d × d) may have different numbers of eigenvalues, but their non-zero eigenvalues are identical

21 / 36
SVD

Theorem 4

Given an n × d matrix A, both AA^T and A^T A have identical non-zero eigenvalues.

If A^T A x = λx with λ ≠ 0, multiply both sides on the left by A:

A(A^T A)x = λAx, i.e., AA^T(Ax) = λ(Ax),

so λ is also an eigenvalue of AA^T, with eigenvector Ax.

22 / 36
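A sketch (again with a made-up rectangular A) confirming Theorem 4 numerically: the non-zero eigenvalues of AAᵀ and AᵀA coincide.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 6))

e1 = np.linalg.eigvalsh(A @ A.T)   # 4 eigenvalues
e2 = np.linalg.eigvalsh(A.T @ A)   # 6 eigenvalues (extra zeros)

nz1 = np.sort(e1[e1 > 1e-10])
nz2 = np.sort(e2[e2 > 1e-10])
print(np.allclose(nz1, nz2))       # True: identical non-zero eigenvalues
```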
SVD
• For AA^T the orthonormal eigenvectors U s.t. U^T U = I
• For A^T A the orthonormal eigenvectors V s.t. V^T V = I
• Let λ1, ..., λr be the r positive eigenvalues shared by AA^T and A^T A

$$
S = \begin{bmatrix}
\sqrt{\lambda_1} & & & & \\
& \sqrt{\lambda_2} & & & \\
& & \ddots & & \\
& & & \sqrt{\lambda_r} & \\
& & & & 0
\end{bmatrix}
$$
23 / 36
SVD

Singular Value Decomposition


A = USV^T

$$
\text{where } S = \begin{bmatrix} \sqrt{\lambda_1} & & & \\ & \ddots & & \\ & & \sqrt{\lambda_r} & \\ & & & 0 \end{bmatrix}
= \begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{bmatrix}
$$

The σ's are the singular values.

24 / 36
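In practice the factorization is computed directly; a minimal NumPy sketch (made-up matrix) showing that np.linalg.svd returns U, the singular values, and Vᵀ, and that they reconstruct A.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # A = U S V^T

# Embed the singular values into a rectangular n x d matrix S
S = np.zeros(A.shape)
S[:len(s), :len(s)] = np.diag(s)

print(np.allclose(U @ S @ Vt, A))                                           # True
print(np.allclose(U.T @ U, np.eye(4)), np.allclose(Vt @ Vt.T, np.eye(3)))   # orthonormal
```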
Proof of SVD

A = USV^T
U^T U = I
V^T V = I
Solve the above three to find U, S and V.

25 / 36
Proof of SVD
Solve for V

$$
\begin{aligned}
A &= USV^T \\
A^T &= (USV^T)^T = V S^T U^T \\
A^T A &= V S^T U^T (U S V^T) = V S^T S V^T = V S^2 V^T \\
A^T A V &= V S^2
\end{aligned}
$$
(here S² denotes the d × d diagonal matrix S^T S)

$$
A^T A \begin{bmatrix} v_1 & v_2 & \dots & v_d \end{bmatrix}
= \begin{bmatrix} v_1 & v_2 & \dots & v_d \end{bmatrix}
\begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_d \end{bmatrix}
$$

or A^T A v_i = λ_i v_i = σ_i² v_i, so the v_i are eigenvectors of A^T A.
25 / 36
Proof of SVD

Solve for U

$$
\begin{aligned}
A &= USV^T \\
AA^T &= U S V^T (U S V^T)^T = U S V^T V S^T U^T = U S S^T U^T = U S^2 U^T \\
AA^T U &= U S^2
\end{aligned}
$$

or AA^T u_i = λ_i u_i = σ_i² u_i, so the u_i are eigenvectors of AA^T.

25 / 36
Singular Value Decomposition

 
$$
\begin{aligned}
A &= \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix}
\begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{bmatrix}
\begin{bmatrix} v_1^T \\ \vdots \\ v_d^T \end{bmatrix} \\
&= \begin{bmatrix} \sigma_1 u_1 & \dots & \sigma_r u_r & 0 & \dots \end{bmatrix}
\begin{bmatrix} v_1^T \\ \vdots \\ v_d^T \end{bmatrix} \\
&= \sum_{i=1}^{r} \sigma_i u_i v_i^T
\end{aligned}
$$

26 / 36
Optimal Low Rank using SVD

The optimal rank-k (k < r) representation is given by

$$
A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T
$$

• u_i are the eigenvectors of AA^T

• v_i are the eigenvectors of A^T A

• σ_i = √λ_i are the singular values

27 / 36
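A minimal sketch (the function name svd_rank_k is my own, not from the slides) of building this rank-k representation from a truncated SVD.

```python
import numpy as np

def svd_rank_k(A, k):
    """Rank-k approximation of A built from its top-k singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # sum_{i<=k} sigma_i u_i v_i^T

A = np.random.default_rng(6).standard_normal((8, 5))
A2 = svd_rank_k(A, 2)
print(np.linalg.matrix_rank(A2))         # 2
print(np.linalg.norm(A - A2, 'fro'))     # approximation error in the Frobenius norm
```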
Example for SVD

Let the matrix A be:
$$
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}
$$

•
$$
A^T A = \begin{bmatrix} 35 & 44 \\ 44 & 56 \end{bmatrix}
$$

•
$$
AA^T = \begin{bmatrix} 5 & 11 & 17 \\ 11 & 25 & 39 \\ 17 & 39 & 61 \end{bmatrix}
$$

28 / 36
Example for SVD
• For A^T A, the characteristic polynomial is:
$$
\det \begin{bmatrix} 35 - \lambda & 44 \\ 44 & 56 - \lambda \end{bmatrix} = 0,
$$
which simplifies to:
λ² − 91λ + 24 = 0.

• Eigenvalues of A^T A:
λ1 ≈ 90.7355, λ2 ≈ 0.2645.

• Singular values (σ_i = √λ_i):
σ1 ≈ 9.5255, σ2 ≈ 0.5143.

29 / 36
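A quick check (a sketch) of the numbers in this worked example using NumPy: the eigenvalues of AᵀA and the singular values of A should match the values above.

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])

lam = np.linalg.eigvalsh(A.T @ A)        # eigenvalues of A^T A (ascending)
print(lam)                               # approx [ 0.2645, 90.7355 ]

s = np.linalg.svd(A, compute_uv=False)
print(s)                                 # approx [ 9.5255, 0.5143 ]
print(np.allclose(np.sort(s**2), lam))   # singular values squared = eigenvalues
```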
Example for SVD

• Eigenvectors of A^T A (V):
$$
V = \begin{bmatrix} -0.61962948 & -0.78489445 \\ -0.78489445 & 0.61962948 \end{bmatrix}.
$$

• Eigenvectors of AA^T (U):
$$
U = \begin{bmatrix} -0.2298477 & 0.88346102 & 0.40824829 \\ -0.52474482 & 0.24078249 & -0.81649658 \\ -0.81964194 & -0.40189603 & 0.40824829 \end{bmatrix}.
$$

30 / 36
Example for SVD
The diagonal matrix Σ is:
$$
\Sigma = \begin{bmatrix} 9.5255 & 0 \\ 0 & 0.5143 \\ 0 & 0 \end{bmatrix}.
$$

The singular value decomposition of A is:

A = UΣV^T,

where:
$$
U = \begin{bmatrix} -0.2298 & 0.8835 & 0.4082 \\ -0.5247 & 0.2408 & -0.8165 \\ -0.8196 & -0.4019 & 0.4082 \end{bmatrix}, \quad
\Sigma = \begin{bmatrix} 9.5255 & 0 \\ 0 & 0.5143 \\ 0 & 0 \end{bmatrix}, \quad
V^T = \begin{bmatrix} -0.6196 & -0.7849 \\ -0.7849 & 0.6196 \end{bmatrix}
$$

31 / 36
Example for SVD

Rank 1 approximation

$$
\begin{aligned}
A_1 &= \sigma_1 u_1 v_1^T \\
&= 9.5255 \begin{bmatrix} -0.2298 \\ -0.5247 \\ -0.8196 \end{bmatrix}
\begin{bmatrix} -0.6196 & -0.7849 \end{bmatrix} \\
&= \begin{bmatrix} 1.35662819 & 1.71846235 \\ 3.09719707 & 3.92326845 \\ 4.83776596 & 6.12807454 \end{bmatrix}
\end{aligned}
$$
compared with the original
$$
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}
$$

32 / 36
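A short sketch reproducing this rank-1 approximation with NumPy (sign conventions of the computed singular vectors may differ, but the product σ1 u1 v1ᵀ is the same).

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

A1 = s[0] * np.outer(U[:, 0], Vt[0, :])    # sigma_1 u_1 v_1^T
print(A1)   # approx [[1.3566, 1.7185], [3.0972, 3.9233], [4.8378, 6.1281]]
```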
Optimal Low Rank (Intuition)

• Recollect the “maximize the projection” notion from earlier


• Find the unit vector x that maximizes ∥ Ax ∥ or ∥ Ax ∥2

Theorem 5
Let A be an n × d matrix. Then the maximum value of ∥ Ax ∥, where x ranges
over unit vectors, is the largest singular value σ1

33 / 36
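A small numerical sketch of Theorem 5 (random matrix as an assumption): ∥Ax∥ over sampled unit vectors never exceeds σ1 and gets close to it.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((6, 4))

sigma1 = np.linalg.svd(A, compute_uv=False)[0]     # largest singular value

xs = rng.standard_normal((4, 20000))
xs /= np.linalg.norm(xs, axis=0)                   # random unit vectors
print(np.linalg.norm(A @ xs, axis=0).max(), "<=", sigma1)
```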
Optimal Low Rank (Intuition)

• σ2 is the maximum of ∥Ax∥ when x is restricted to unit vectors orthogonal to v1
• σ3 is the maximum of ∥Ax∥ when x is restricted to unit vectors orthogonal to v1 and v2
• and so on ...
• Greedily selecting directions by singular value helps in finding the optimal low rank representation

33 / 36
Optimal Low Rank

Theorem 6: Eckart–Young–Mirsky theorem

For any n × d matrix A, the rank-k representation A_k given by the SVD satisfies the following:
• ∥A − A_k∥_F ≤ ∥A − B_k∥_F for any B_k s.t. rank(B_k) ≤ k
• where ∥A∥_F = √(Σ_ij A_ij²) is the Frobenius norm

34 / 36
Optimal Low Rank (Intuition)

Given σ1 ≥ σ2 ≥ ... ≥ σr,

$$
\|A - A_k\|_F
= \Big\| \sum_{i=1}^{r} \sigma_i u_i v_i^T - \sum_{i=1}^{k} \sigma_i u_i v_i^T \Big\|_F
= \Big\| \sum_{i=k+1}^{r} \sigma_i u_i v_i^T \Big\|_F
= \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}
$$

35 / 36
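A sketch (random matrix assumption) verifying the identity above: the Frobenius error of the rank-k truncation equals the square root of the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((10, 6))
k = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # rank-k truncation

err = np.linalg.norm(A - A_k, 'fro')
tail = np.sqrt(np.sum(s[k:]**2))                   # sqrt of discarded sigma_i^2
print(np.isclose(err, tail))                       # True
```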
Applications of SVD

• Image Compression
• Image De-noising
• Latent Semantic Analysis
• Recommendation Systems
• Representation Learning

36 / 36
