KASHIF JAVED
EED, UET, Lahore
Lecture 8
Eigenvectors and the Anisotropic Multivariate Normal Distribution
Readings:
▪ https://people.eecs.berkeley.edu/~jrs/189/
Eigenvectors
• Given a square matrix 𝐴, if 𝐴𝑣 = 𝜆𝑣 for some vector 𝑣 ≠ 0, scalar 𝜆, then
𝑣 is an eigenvector of 𝐴 and 𝜆 is the eigenvalue of 𝐴 associated with 𝑣.
• It means that 𝑣 is a magical vector that, after being multiplied by 𝐴, still
points in the same direction, or in exactly the opposite direction
• The eigenvector $v$ is said to be normalized if $|v| = 1$
• If $v$ is normalized, then $v^\top A v = \lambda\, v^\top v = \lambda$
Eigenvectors
• $A = \begin{bmatrix} 3/4 & 5/4 \\ 5/4 & 3/4 \end{bmatrix}$
• Find the eigenvalues and eigenvectors?
Eigenvectors
• $A = \begin{bmatrix} 3/4 & 5/4 \\ 5/4 & 3/4 \end{bmatrix}$
• $(A - \lambda I)\, v = 0$
• Find the roots of $\det(A - \lambda I) = 0$
Eigenvectors
• $A = \begin{bmatrix} 3/4 & 5/4 \\ 5/4 & 3/4 \end{bmatrix}$
• $\lambda^2 - 1.5\lambda - 1 = 0$
• $\lambda_1 = 2$
• $\lambda_2 = -1/2$
Eigenvectors
• $A = \begin{bmatrix} 3/4 & 5/4 \\ 5/4 & 3/4 \end{bmatrix}$
• $\lambda_1 = 2$ and its corresponding eigenvector: $v = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$
• $\lambda_2 = -1/2$ and its corresponding eigenvector: $w = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$
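A minimal numerical check of this worked example, assuming NumPy is available; the matrix and the expected eigenpairs come from the slides above.

```python
# Quick sketch: verify the eigenvalues/eigenvectors of the example matrix.
import numpy as np

A = np.array([[0.75, 1.25],
              [1.25, 0.75]])

# eigh is for symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)       # approximately [-0.5, 2.0]
print(eigenvectors)      # columns are unit eigenvectors (up to sign)

# Verify the defining property A v = lambda v for each pair.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
```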
Theorem
• Theorem: If $v$ is an eigenvector of $A$ with eigenvalue $\lambda$, then $v$ is an eigenvector of $A^k$ with eigenvalue $\lambda^k$, where $k$ is a positive integer
• Proof: $A^2 v = A(\lambda v) = \lambda (A v) = \lambda^2 v$, etc.
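A short sketch checking this theorem numerically on the example matrix, assuming NumPy; the choice of powers is illustrative.

```python
# Sketch: A^k has the same eigenvector v, with eigenvalue lambda^k.
import numpy as np

A = np.array([[0.75, 1.25], [1.25, 0.75]])
v = np.array([1.0, 1.0]) / np.sqrt(2)     # eigenvector of A with eigenvalue 2
lam = 2.0

for k in (2, 3, 4):
    Ak = np.linalg.matrix_power(A, k)
    assert np.allclose(Ak @ v, lam**k * v)   # A^k v = lambda^k v
```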
Eigenvectors
For most matrices, most vectors don’t have this property. So, the ones that do are special, and we call them eigenvectors.
Eigenvectors
Clearly, when you scale an eigenvector, it’s still an eigenvector. Only the direction matters, not the length.
Theorem
• Theorem: If $A$ is invertible, then $v$ is an eigenvector of $A^{-1}$ with eigenvalue $1/\lambda$
• Proof: $A^{-1} v = \frac{1}{\lambda} A^{-1} (\lambda v) = \frac{1}{\lambda} A^{-1} (A v) = \frac{1}{\lambda} v$
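A similar sketch for the inverse, again assuming NumPy and reusing the eigenvectors of the example matrix.

```python
# Sketch: the eigenvectors of A are eigenvectors of A^{-1}, with inverted eigenvalues.
import numpy as np

A = np.array([[0.75, 1.25], [1.25, 0.75]])
A_inv = np.linalg.inv(A)

v = np.array([1.0, 1.0]) / np.sqrt(2)     # eigenvalue 2 for A
w = np.array([-1.0, 1.0]) / np.sqrt(2)    # eigenvalue -1/2 for A

assert np.allclose(A_inv @ v, (1 / 2.0) * v)     # eigenvalue 1/2 for A^{-1}
assert np.allclose(A_inv @ w, (1 / -0.5) * w)    # eigenvalue -2 for A^{-1}
```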
Eigenvectors
Look at the figures but go from right to left.
Eigenvectors
• When you invert a matrix, the eigenvectors don’t change, but the
eigenvalues get inverted
• When you square a matrix, the eigenvectors don’t change, but the
eigenvalues get squared
Spectral Theorem
• Every real, symmetric 𝑛 × 𝑛 matrix has real eigenvalues and 𝑛 eigenvectors that are mutually orthogonal, i.e., $v_i^\top v_j = 0$ for all $i \neq j$
• We can use them as a basis for ℝ𝑛 .
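A short sketch of the spectral theorem in action, assuming NumPy; the random symmetric matrix below is purely illustrative.

```python
# Sketch: a real symmetric matrix has orthonormal eigenvectors, so V^T V = I.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
M = (B + B.T) / 2                      # symmetrize an arbitrary matrix

eigenvalues, V = np.linalg.eigh(M)     # real eigenvalues, orthonormal columns
assert np.allclose(V.T @ V, np.eye(5))                       # mutually orthogonal unit vectors
assert np.allclose(V @ np.diag(eigenvalues) @ V.T, M)        # they reconstruct M
```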
Building a Matrix with Specified
Eigenvectors
• There are a lot of applications where you’re given a matrix, and you want
to extract the eigenvectors and eigenvalues.
• But when you’re learning the math, it’s more intuitive to go in the opposite
direction
• Suppose you know what eigenvectors and eigenvalues you want, and you
want to create the matrix that has those eigenvectors and eigenvalues
Building a Matrix with Specified
Eigenvectors
• Choose 𝑛 mutually orthogonal unit 𝑛-vectors $v_1, \ldots, v_n$; they specify an orthonormal coordinate system
• Let $V = [v_1 \;\ldots\; v_n]$ ⇐ 𝑛 × 𝑛 matrix
• Observe: $V^\top V = I$ (off-diagonal 0’s because the vectors are orthogonal; diagonal 1’s because they’re unit vectors)
• ⇒ $V^\top = V^{-1}$ ⇒ $V V^\top = I$
• 𝑉 is an orthonormal matrix: it acts like a rotation (or a reflection)
Building a Matrix with Specified
Eigenvectors
• Choose some radii 𝜆𝑖 :
• Let $\Lambda = \begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_n \end{bmatrix}$
• A diagonal matrix of eigenvalues
Building a Matrix with Specified
Eigenvectors
• Definition of eigenvector: 𝐴𝑉 = 𝑉Λ
• This is the same definition of eigenvector that was given at the start of the
lecture—𝐴𝑣 = 𝜆𝑣—but this version covers all 𝑛 eigenvectors in one
statement
• How do we find the 𝐴 that satisfies this equation?
⇒ 𝐴𝑉𝑉 ⊤ = 𝑉Λ𝑉 ⊤ which leads us to . . .
Theorem
• $A = V \Lambda V^\top = \sum_{i=1}^{n} \lambda_i\, v_i v_i^\top$ has the chosen eigenvectors/eigenvalues
(each outer product $v_i v_i^\top$ is an 𝑛 × 𝑛 matrix of rank 1)
• This is a matrix factorization called the eigendecomposition
• Λ is the diagonalized version of 𝐴
• Every real, symmetric matrix has one
Building a Matrix with Specified
Eigenvectors
• Example: Using the eigenvectors and eigenvalues from the start of the
lecture
$A = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & -1/2 \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} = \begin{bmatrix} 3/4 & 5/4 \\ 5/4 & 3/4 \end{bmatrix}$
• This completes our task of finding a symmetric matrix with specified
orthonormal eigenvectors and eigenvalues
• It is more common in practice that you need to compute the eigenvectors and eigenvalues of a symmetric matrix, such as a sample covariance matrix
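A sketch that rebuilds this example in code, assuming NumPy; it constructs $A$ both as $V \Lambda V^\top$ and as the sum of rank-1 outer products.

```python
# Sketch: building a matrix from chosen orthonormal eigenvectors and eigenvalues.
import numpy as np

V = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2)    # columns v1, v2 (orthonormal)
Lam = np.diag([2.0, -0.5])                  # chosen eigenvalues

A = V @ Lam @ V.T                                               # V Lambda V^T
A_outer = sum(lam * np.outer(v, v)                              # sum of rank-1 terms
              for lam, v in zip([2.0, -0.5], V.T))

assert np.allclose(A, [[0.75, 1.25], [1.25, 0.75]])
assert np.allclose(A, A_outer)
```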
Building a Matrix with Specified
Eigenvectors
• Observe: $A^2 = V \Lambda V^\top V \Lambda V^\top = V \Lambda^2 V^\top$ and $A^{-2} = V \Lambda^{-2} V^\top$
• This is another way to see that squaring a matrix squares its eigenvalues
without changing its eigenvectors.
• It also suggests a way to define a matrix square root.
Building a Matrix with Specified
Eigenvectors
• Given a symmetric PSD matrix Σ, we can find a symmetric square root matrix $A = \Sigma^{1/2}$:
▪ compute eigenvectors/values of Σ
▪ take square roots of Σ’s eigenvalues (A square root of a diagonal matrix is just
the square roots of the diagonal entries)
▪ reassemble matrix 𝐴 — with the same eigenvectors as Σ but changed
eigenvalues
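A sketch of this recipe, assuming NumPy; the matrix Σ below is an illustrative symmetric PSD example, not one from the slides.

```python
# Sketch: symmetric square root of a symmetric PSD matrix via eigendecomposition.
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])                 # illustrative symmetric PSD matrix

eigenvalues, V = np.linalg.eigh(Sigma)
A = V @ np.diag(np.sqrt(eigenvalues)) @ V.T    # same eigenvectors, square-rooted eigenvalues

assert np.allclose(A @ A, Sigma)               # A really is a square root of Sigma
assert np.allclose(A, A.T)                     # and it is symmetric
```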
Visualizing Quadratic Forms
• To visualize a symmetric matrix, we can graph something called the
quadratic form, which shows how applying the matrix affects the length of
a vector
• The quadratic form of 𝑀 is 𝑥 ⊤ 𝑀𝑥
Visualizing Quadratic Forms
• Let's compare two different functions
$|z|^2 = z^\top z$ ⇐ quadratic; isotropic; isosurfaces are spheres
$|A^{-1} x|^2 = x^\top A^{-2} x$ ⇐ quadratic form of $A^{-2}$ ($A$ is symmetric); anisotropic; isosurfaces are ellipsoids
• We are going to use eigenvectors as a way to transform the shape of the first function into the shape of the second function
[Figures: the map 𝑥 = 𝐴𝑧 sends unit vectors on the circle |𝑧| = 1 (axes 𝑧1, 𝑧2) to points on an ellipse (axes 𝑥1, 𝑥2); the eigenvector 𝑣1 (𝜆1 = 2) is stretched while 𝑣2 (𝜆2 = −1/2) is shrunk and flipped, so the matrix maps circles onto ellipses.]
Visualizing Quadratic Forms
• The isocontours of the quadratic form 𝑥 ⊤ 𝐴−2 𝑥 are ellipsoids determined by
the eigenvectors/values of 𝐴
• $\{x : |A^{-1} x|^2 = 1\}$ is an ellipsoid with axes $v_1, v_2, \ldots, v_n$ and radii $\lambda_1, \lambda_2, \ldots, \lambda_n$, because if $A^{-1} x = v_i$ has length 1 ($v_i$ lies on the unit circle), then $x = A v_i$ has length $\lambda_i$ ($A v_i$ lies on the ellipsoid)
• Special case: 𝐴 is diagonal ⇔ eigenvectors are coordinate axes ⇔ ellipsoids are axis-aligned
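A sketch checking this claim on the example matrix, assuming NumPy: each point $x = A v_i$ lies on the isosurface, and its length is $|\lambda_i|$ (the absolute value, since a radius is nonnegative).

```python
# Sketch: points x = A v_i satisfy x^T A^{-2} x = 1 and have length |lambda_i|.
import numpy as np

A = np.array([[0.75, 1.25], [1.25, 0.75]])
A_inv = np.linalg.inv(A)
eigenvalues, V = np.linalg.eigh(A)

for lam, v in zip(eigenvalues, V.T):
    x = A @ v
    q = x @ (A_inv @ A_inv) @ x                       # quadratic form x^T A^{-2} x
    assert np.isclose(q, 1.0)                         # x lies on the isosurface
    assert np.isclose(np.linalg.norm(x), abs(lam))    # radius |lambda_i|
```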
Visualizing Quadratic Forms
• A symmetric matrix 𝑀 is
• positive definite if 𝑤 𝑇 𝑀𝑤 > 0 for all 𝑤 ≠ 0. ⇔ all eigenvalues positive
• positive semidefinite if 𝑤 𝑇 𝑀𝑤 ≥ 0 for all 𝑤. ⇔ all eigenvalues nonnegative
• indefinite if it has at least one positive eigenvalue & at least one negative eigenvalue
• invertible if it has no zero eigenvalue
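A small sketch classifying a symmetric matrix by the signs of its eigenvalues, assuming NumPy; the helper `classify` and the test matrices are illustrative, not from the slides.

```python
# Sketch: definiteness follows from the eigenvalue signs of a symmetric matrix.
import numpy as np

def classify(M, tol=1e-10):
    lam = np.linalg.eigvalsh(M)            # eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam > -tol):
        return "positive semidefinite"
    if np.any(lam > tol) and np.any(lam < -tol):
        return "indefinite"
    return "negative (semi)definite"

print(classify(np.array([[2.0, 0.0], [0.0, 1.0]])))      # positive definite
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))      # positive semidefinite (eigenvalues 2, 0)
print(classify(np.array([[0.75, 1.25], [1.25, 0.75]])))  # indefinite (eigenvalues 2, -1/2)
```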
Visualizing Quadratic Forms
[Figure: surfaces of quadratic forms; a zero eigenvalue (𝜆 = 0) gives a flat trough with a whole line of minima.]
Positive eigenvalues correspond to axes where the curvature goes up; negative eigenvalues correspond to axes where the curvature goes down.
Visualizing Quadratic Forms
• What does this tell us about $x^\top A^{-2} x$?
• The square of a symmetric matrix is positive semidefinite, and that includes $A^{-2}$
• The eigenvalues of $A^{-2}$ are squares, so they cannot be negative
• If 𝐴−2 exists, it is positive definite. An invertible matrix has no zero
eigenvalues
Anisotropic Gaussians
• A multivariate normal distribution (Gaussian)
$X \sim \mathcal{N}(\mu, \Sigma): \quad f(x) = \dfrac{1}{\sqrt{(2\pi)^d\, |\Sigma|}}\, \exp\!\left(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$
• 𝑋 and µ are 𝑑-vectors. 𝑋 is a random variable with mean µ
• |Σ| is determinant of Σ, which is a 𝑑 × 𝑑 SPD covariance matrix
• Σ −1 is a 𝑑 × 𝑑 SPD precision matrix
Anisotropic Gaussians
• A multivariate normal distribution (Gaussian)
$X \sim \mathcal{N}(\mu, \Sigma): \quad f(x) = \dfrac{1}{\sqrt{(2\pi)^d\, |\Sigma|}}\, \exp\!\left(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$
• Write $f(x) = n(q(x))$, where $n(q) = \dfrac{1}{\sqrt{(2\pi)^d\, |\Sigma|}}\, \exp\!\left(-\tfrac{1}{2} q\right)$ is a function of a scalar
• and $q(x) = (x-\mu)^\top \Sigma^{-1} (x-\mu)$
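A sketch evaluating $f(x) = n(q(x))$ directly from this formula and checking it against SciPy, assuming NumPy and SciPy are available; the values of μ, Σ, and x are illustrative.

```python
# Sketch: anisotropic Gaussian density computed from the formula, checked against SciPy.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def f(x):
    d = len(mu)
    q = (x - mu) @ Sigma_inv @ (x - mu)           # quadratic form q(x)
    return np.exp(-0.5 * q) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

x = np.array([0.3, -1.1])
assert np.isclose(f(x), multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```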
Anisotropic Gaussians
• Write $f(x) = n(q(x))$, where $q(x) = (x-\mu)^\top \Sigma^{-1} (x-\mu)$; here $n : \mathbb{R} \to \mathbb{R}$ is exponential and $q : \mathbb{R}^d \to \mathbb{R}$ is quadratic
• Now 𝑞(𝑥) is a function we understand—it’s just a quadratic bowl centered
at 𝜇, the quadratic form of the precision matrix Σ −1 .
• The other function 𝑛(·) is a simple, monotonic function: an exponential of the negation of half its argument
• This mapping 𝑛(·) does not change the isosurfaces.
Anisotropic Gaussians
• Principle: given a monotonic 𝑛 : ℝ → ℝ, the isosurfaces of 𝑛(𝑞(𝑥)) are the same as those of 𝑞(𝑥) (but with different isovalues)
A paraboloid (left) becomes a bivariate Gaussian (right) after you compose it with a scalar function (center).
Anisotropic Gaussians
• One of the main ideas is that if you understand the isosurfaces of a
quadratic function, then you understand the isosurfaces of a Gaussian,
because they’re the same.
• The differences are in the isovalues—in particular, the Gaussian achieves
its maximum at the mean, and decreases to zero as you move infinitely far
away from the mean
Anisotropic Gaussians
• The isocontours of $(x-\mu)^\top \Sigma^{-1} (x-\mu)$ are determined by the eigenvectors/eigenvalues of $\Sigma^{1/2}$.
• Next lecture, we’ll consider the implications of this a bit more.
Covariance
• Let 𝑅, 𝑆 be random variables—column vectors or scalars
• $\mathrm{Cov}(R, S) = E\!\left[(R - E[R])(S - E[S])^\top\right] = E[R S^\top] - \mu_R \mu_S^\top$
• 𝑉𝑎𝑟(𝑅) = 𝐶𝑜𝑣(𝑅, 𝑅)
• If 𝑅 is a vector, covariance matrix for 𝑅 is
$\mathrm{Var}(R) = \begin{bmatrix} \mathrm{Var}(R_1) & \cdots & \mathrm{Cov}(R_1, R_d) \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}(R_d, R_1) & \cdots & \mathrm{Var}(R_d) \end{bmatrix}$
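A sketch estimating these quantities from synthetic samples, assuming NumPy; the random data and the mixing matrix are purely illustrative.

```python
# Sketch: Cov(R, S) = E[R S^T] - mu_R mu_S^T estimated from samples,
# and Var(R) = Cov(R, R) checked against NumPy's sample covariance.
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((10000, 3))                         # 10000 samples of a 3-vector
S = R @ np.array([[1.0, 0.5],
                  [0.0, 1.0],
                  [0.3, 0.0]]) + rng.standard_normal((10000, 2))

mu_R, mu_S = R.mean(axis=0), S.mean(axis=0)
cov_RS = R.T @ S / len(R) - np.outer(mu_R, mu_S)            # E[R S^T] - mu_R mu_S^T
print(cov_RS)

var_R = R.T @ R / len(R) - np.outer(mu_R, mu_R)             # Var(R) = Cov(R, R)
assert np.allclose(var_R, np.cov(R.T, bias=True))
```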
Iris Flower: Pairwise Scatter plot
[Figures: pairwise scatter plots of the Iris flower dataset.]
Covariance
• For a Gaussian 𝑅 ~ 𝒩 𝜇, Σ , one can show that 𝑉𝑎𝑟(𝑅) = Σ
• 𝑅𝑖 , 𝑅𝑗 independent ⇒ 𝐶𝑜𝑣(𝑅𝑖 , 𝑅𝑗 ) = 0
• the reverse implication is not generally true, but . . .
• 𝐶𝑜𝑣(𝑅𝑖 , 𝑅𝑗 ) = 0 AND multivariate normal dist. ⇒ 𝑅𝑖 , 𝑅𝑗 independent
Covariance
• all features pairwise independent ⇒ 𝑉𝑎𝑟(𝑅) is diagonal
• the reverse is not generally true, but . . .
• 𝑉𝑎𝑟(𝑅) is diagonal AND joint normal
⇔ axis-aligned Gaussian; squared radii on diagonal of Σ = 𝑉𝑎𝑟(𝑅)
⇔ $f(x) = f(x_1)\, f(x_2) \cdots f(x_d)$ ⇐ the multivariate PDF factors into a product of univariate Gaussians
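A sketch of this factorization for a diagonal Σ, assuming NumPy and SciPy; the numbers are illustrative.

```python
# Sketch: for an axis-aligned Gaussian (diagonal Sigma), the joint density equals
# the product of the univariate Gaussian densities of its components.
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([1.0, -2.0, 0.5])
variances = np.array([2.0, 0.5, 1.5])                 # diagonal of Sigma (squared radii)
Sigma = np.diag(variances)

x = np.array([0.2, -1.0, 1.3])
joint = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
product = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(variances)))

assert np.isclose(joint, product)
```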
Covariance
• When the features are independent, you can write the multivariate
Gaussian PDF as a product of univariate Gaussian PDFs
• When they aren’t, you can do a change of coordinates to the eigenvector
coordinate system, and write it as a product of univariate Gaussian PDFs
in eigenvector coordinates
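A sketch of that change of coordinates, assuming NumPy and SciPy; the non-diagonal Σ is illustrative. In the eigenvector coordinates $z = V^\top (x - \mu)$ the covariance is diagonal, so the density factorizes there (the rotation has Jacobian 1, so the density values match).

```python
# Sketch: rotate into the eigenvector coordinate system, where the Gaussian
# density factorizes into univariate Gaussians.
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
eigenvalues, V = np.linalg.eigh(Sigma)        # Sigma = V diag(eigenvalues) V^T

x = np.array([0.3, -1.1])
z = V.T @ (x - mu)                            # coordinates along the eigenvectors

joint = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
product = np.prod(norm.pdf(z, loc=0.0, scale=np.sqrt(eigenvalues)))

assert np.isclose(joint, product)
```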
Bivariate Gaussian Distribution
[Figure: a bivariate Gaussian density over axes 𝑥1 and 𝑥2.]
Bivariate Gaussian Distribution
When the variables are independent, the major axes of the density are parallel to the input axes; the density becomes an ellipse if the variances are different.
Bivariate Gaussian Distribution
The density rotates depending on the sign of the covariance.