Dimensionality Reduction
Differentiable manifolds
Dimensionality reduction
Advantages:
• Visualization: humans can only imagine things in 2D or 3D.
• Computational efficiency: learning algorithms work faster in low dimensions.
• Better performance: the projection might eliminate noise.
• Interpretability: the vectors spanning the subspace might have interesting interpretations.
Fact 1
Any symmetric matrix A ∈ R^{d×d} has d real eigenvalues λ1 ≤ . . . ≤ λd and corresponding eigenvectors v1 , v2 , . . . , vd that can be chosen to form an orthonormal basis of R^d .
Note: If the eigenvalues are not distinct, then the eigenvectors are not unique. However, there is always some choice of eigenvectors which forms an orthonormal basis.
Fact 2 (Rayleigh quotient)
With A , λ1 ≤ . . . ≤ λd and v1 , . . . , vd as in Fact 1,
$$\operatorname*{argmin}_{w \in \mathbb{R}^d \setminus \{0\}} \frac{w^\top A w}{\|w\|^2} = v_1 .$$
Similarly,
$$\operatorname*{argmax}_{w \in \mathbb{R}^d \setminus \{0\}} \frac{w^\top A w}{\|w\|^2} = v_d .$$
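A minimal numerical sanity check of Fact 2, sketched in numpy; the random symmetric test matrix A is an illustrative assumption, not part of the slides.

```python
import numpy as np

# Check that the Rayleigh quotient w^T A w / ||w||^2 of a symmetric matrix
# is maximized by the eigenvector with the largest eigenvalue (v_d).
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T                              # symmetric (p.s.d.) test matrix

eigvals, eigvecs = np.linalg.eigh(A)     # eigenvalues in ascending order
v_top = eigvecs[:, -1]                   # eigenvector with largest eigenvalue

def rayleigh(w, A):
    return (w @ A @ w) / (w @ w)

# The quotient at v_top equals the top eigenvalue and dominates random directions.
print(rayleigh(v_top, A), eigvals[-1])
for _ in range(5):
    w = rng.standard_normal(5)
    assert rayleigh(w, A) <= rayleigh(v_top, A) + 1e-10
```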
Principal Component Analysis
The principal directions in data
Finding the principal subspace
How can we find the most relevant subspace for the data? By finding a basis for it. The individual basis vectors are called the principal components.
The first principal component
The first principal component is the unit vector along which the data has the largest average squared projection:
$$p_1 = \operatorname*{argmax}_{\|v\| = 1} \; \frac{1}{n} \sum_{i=1}^{n} (x_i \cdot v)^2 . \qquad (1)$$
The first principal component
Theorem. The first principal component, p1 , is the eigenvector vd of the sample covariance matrix
$$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top$$
with largest eigenvalue.
Proof.
$$\frac{1}{n} \sum_{i=1}^{n} (x_i \cdot v)^2 = \frac{1}{n} \sum_{i=1}^{n} (v^\top x_i)(x_i^\top v) = v^\top \Bigl( \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top \Bigr) v = v^\top \hat{\Sigma}\, v .$$
Since ‖v‖ = 1 , (1) is equivalent to the Rayleigh quotient optimization problem
$$p_1 = \operatorname*{argmax}_{v \in \mathbb{R}^d \setminus \{0\}} \frac{v^\top \hat{\Sigma}\, v}{\|v\|^2} ,$$
so by Fact 2 (applied with A = Σ̂), p1 is indeed the eigenvector vd of Σ̂ with largest eigenvalue.
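A short numpy sketch of the theorem, assuming the data rows are already centered so that Σ̂ = (1/n) Σ xi xi⊤ is the sample covariance; the synthetic data is illustrative only.

```python
import numpy as np

# First principal component as the top eigenvector of the sample covariance.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.3])  # synthetic data, n x d
X = X - X.mean(axis=0)                                         # center the data

Sigma_hat = (X.T @ X) / X.shape[0]          # (1/n) sum_i x_i x_i^T
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)
p1 = eigvecs[:, -1]                         # eigenvector with largest eigenvalue

# p1 maximizes the average squared projection (1/n) sum_i (x_i . v)^2,
# and that maximum equals the largest eigenvalue.
print(np.mean((X @ p1) ** 2), eigvals[-1])
```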
Further principal components
Recall that Σ̂ can be written as
$$\hat{\Sigma} = \sum_{i=1}^{d} \lambda_i v_i v_i^\top .$$
After we’ve found the first principal component p1 = vd , project the data to span{v1 , . . . , vd−1} . This just removes λd vd vd⊤ from the sum. So the second principal component is p2 = vd−1 , and so on.
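A sketch of how the further components are used in practice: project the data onto the span of the top k eigenvectors of Σ̂. The helper name pca_project and the synthetic data are illustrative assumptions.

```python
import numpy as np

# Projection onto the top-k principal components (eigenvectors of Sigma_hat
# with the k largest eigenvalues).
def pca_project(X, k):
    Sigma_hat = (X.T @ X) / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(Sigma_hat)   # ascending eigenvalues
    P = eigvecs[:, ::-1][:, :k]                    # columns p_1, ..., p_k
    return X @ P                                   # n x k low dimensional coordinates

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))
X = X - X.mean(axis=0)
Y = pca_project(X, 2)
print(Y.shape)   # (100, 2)
```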
DNA data
[Christopher de Coro]
Reconstruction from eigenfaces
[Christopher de Coro]
Example: digits
Summary of PCA
Advantages:
• Finds the best linear projection (maximizes the retained variance)
• Rotationally invariant
Disadvantages:
• Full PCA is expensive to compute
• Components not sparse
• Sensitive to outliers
• Linear, so it cannot capture nonlinear (manifold) structure
NONLINEAR DIMENSIONALITY REDUCTION
• If the data lies close to a linear subspace of R^d , PCA can find it.
• But what if the data lies on a nonlinear manifold? Data which at first looks very high dimensional often really has low dimensional structure.
General principle
Methods
• Multidimensional Scaling
• Isomap
• Locally Linear Embedding
• Laplacian Eigenmaps
• SNE, etc.
Multidimensional scaling (MDS)
Classical MDS
Goal: given data points x1 , . . . , xn , find y1 , . . . , yn ∈ R^p whose pairwise distances match those of the original points as closely as possible,
where Di,j = ‖xi − xj‖² and D∗i,j = ‖yi − yj‖² .
The Gram matrix
The Gram matrix of x1 , . . . , xn is the matrix G ∈ R^{n×n} with
$$G_{i,j} = x_i \cdot x_j .$$
Classical MDS
Approach:
1. Compute the centered Gram matrix G .
2. Solve G∗ = argmin_{G̃ ⪰ 0, rank(G̃) ≤ p} ‖ G̃ − G ‖²_Frob .
3. Find y1 , y2 , . . . , yn ∈ R^p with Gram matrix G∗ .
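A numpy sketch of the three steps, assuming step 1 uses the standard double-centering trick G = −½ J D J (with J = I − 11⊤/n) to obtain the centered Gram matrix from squared Euclidean distances; step 2 keeps the top p nonnegative eigenvalues (the best rank-p p.s.d. approximation in Frobenius norm), and step 3 applies Proposition 3 from the next slide. The function name and test data are illustrative.

```python
import numpy as np

def classical_mds(D_sq, p):
    n = D_sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ D_sq @ J                  # step 1: centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(G)     # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:p]      # step 2: keep the top p eigenvalues ...
    lam = np.clip(eigvals[idx], 0.0, None)   # ... clipped at zero so G* is p.s.d.
    return eigvecs[:, idx] * np.sqrt(lam)    # step 3: rows y_i = [Q Lambda^(1/2)]_{i,*}

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))
D_sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
Y = classical_mds(D_sq, 2)
print(Y.shape)   # (50, 2)
```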
Gram → Data
Proposition 3. Let G ∈ R^{n×n} be a p.s.d. matrix of rank d with eigendecomposition
$$G = Q \Lambda Q^\top .$$
Let xi = ([QΛ^{1/2}]i,∗)⊤ . Then the Gram matrix of {x1 , . . . , xn} is G .
Notation:
• Mi,∗ denotes the i ’th row of M .
• Given D = diag(d1 , . . . , dm) , D^p := diag(d1^p , . . . , dm^p) .
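A quick numerical check of Proposition 3 (the small random Gram matrix is illustrative only).

```python
import numpy as np

# If G = Q Lambda Q^T is p.s.d., the rows of Q Lambda^{1/2} have Gram matrix G.
rng = np.random.default_rng(4)
B = rng.standard_normal((6, 3))
G = B @ B.T                              # p.s.d. Gram matrix of rank 3

eigvals, Q = np.linalg.eigh(G)
Lam_half = np.diag(np.sqrt(np.clip(eigvals, 0.0, None)))
X = Q @ Lam_half                         # row i is x_i^T

print(np.allclose(X @ X.T, G))           # True
```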
Summary of Classical MDS
Isomap
Tenenbaum, de Silva & Langford, 2000
Isomap
Construct a k -nn graph on the data, with edges weighted by distances in R^D .
Underlying assumptions:
1. Data lies on a manifold.
2. Geodesic distance on the manifold is approximated by distance in the graph.
3. The optimal embedding preserves these distances as much as possible.
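A compact, self-contained Isomap sketch built on these assumptions: a k-nn graph weighted by Euclidean distances, shortest path distances as geodesic estimates, and classical MDS on the result. The parameters and the synthetic spiral data are illustrative, and the sketch assumes the k-nn graph comes out connected.

```python
import numpy as np

def isomap(X, k, p):
    n = X.shape[0]
    D = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))

    # k-nn graph: keep edges to the k nearest neighbours, symmetrized.
    W = np.full((n, n), np.inf)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]
        W[i, nbrs] = D[i, nbrs]
        W[nbrs, i] = D[nbrs, i]
    np.fill_diagonal(W, 0.0)

    # Floyd-Warshall: shortest path distances approximate geodesic distances.
    for m in range(n):
        W = np.minimum(W, W[:, m:m + 1] + W[m:m + 1, :])

    # Classical MDS on the squared graph distances.
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (W ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(G)
    idx = np.argsort(eigvals)[::-1][:p]
    lam = np.clip(eigvals[idx], 0.0, None)
    return eigvecs[:, idx] * np.sqrt(lam)

rng = np.random.default_rng(5)
t = np.sort(rng.uniform(np.pi, 3 * np.pi, 300))
X = np.c_[t * np.cos(t), t * np.sin(t), rng.uniform(0, 5, 300)]  # spiral sheet
Y = isomap(X, k=10, p=2)
print(Y.shape)   # (300, 2)
```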
Shortest path distances
$$d(i, j) = \min_{(v_1, v_2, \ldots, v_\ell) \in \mathcal{P}(i,j)} \; \sum_{k=1}^{\ell - 1} \delta_{v_k, v_{k+1}} ,$$
where δu,v denotes the length of edge {u, v} in the graph, and P(i, j) is the set of paths that start at i and end at j (i.e., v1 = i and vℓ = j ).
Shortest path distances
Proposition. The matrix D of all pairwise distances (Di,j = d(i, j)) can be computed in O(n³) time.
Proposition. Let D^(k) be the matrix of shortest path distances along the restricted set of paths where each intermediate vertex comes from {1, 2, . . . , k} . Then D^(k) can be computed from D^(k−1) in O(n²) time.
Floyd–Warshall algorithm
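A direct numpy implementation of the recurrence from the propositions above, sketched with a toy three-vertex example.

```python
import numpy as np

# D^(k) allows vertex k as an additional intermediate stop and is obtained
# from D^(k-1) with one O(n^2) elementwise update; n such updates give
# all-pairs shortest paths in O(n^3) total time.
def floyd_warshall(W):
    """W: n x n matrix of edge lengths, np.inf where there is no edge."""
    D = W.astype(float).copy()
    np.fill_diagonal(D, 0.0)
    for k in range(D.shape[0]):
        # D^(k)[i, j] = min( D^(k-1)[i, j], D^(k-1)[i, k] + D^(k-1)[k, j] )
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D

# Tiny example: a path 0 - 1 - 2 with unit edge lengths.
W = np.array([[0, 1, np.inf],
              [1, 0, 1],
              [np.inf, 1, 0]], dtype=float)
print(floyd_warshall(W))   # distance from 0 to 2 is 2
```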
Isomap example
Properties of Isomap
Locally Linear Embedding (LLE)
Roweis & Saul, 2000
LLE
Again trying to find an embedding R^D → R^d , mapping xi ↦ yi . Again start with a k -nn graph based on distances in R^D .
Phase 1: find the weights
Approximate each xi by an affine combination of its nearest neighbours, i.e., minimize over w
$$\Bigl\| x_i - \sum_{j \in \mathrm{knn}(i)} w_j x_j \Bigr\|^2 = w^\top K^{(i)} w \qquad \text{subject to} \qquad \sum_{j \in \mathrm{knn}(i)} w_j = 1 ,$$
where K^(i) is the local Gram matrix, K^(i)_{j,j′} = (xi − xj)⊤(xi − xj′) , and w = (wj)_{j∈knn(i)} . (The equality uses Σj wj = 1 .)
Phase 1: find the weights
The local optimization problem is handled with the Lagrangian
$$L(w) = w^\top K^{(i)} w - \lambda \Bigl( \sum_{j \in \mathrm{knn}(i)} w_j - 1 \Bigr) ,$$
and we solve
$$\frac{\partial}{\partial w_j} L(w) = 2\, [K^{(i)} w]_j - \lambda = 0 \qquad j \in \mathrm{knn}(i) ,$$
$$w = \tfrac{\lambda}{2}\, (K^{(i)})^{-1} \mathbf{1} \qquad \text{enforcing constraints:} \qquad w = \frac{(K^{(i)})^{-1} \mathbf{1}}{\| (K^{(i)})^{-1} \mathbf{1} \|_1} .$$
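A sketch of Phase 1 for a single point, following the closed form above; the small ridge term added to K^(i) is a common practical safeguard when K^(i) is (nearly) singular and is not part of the slide.

```python
import numpy as np

# Reconstruction weights of x_i from its neighbours: solve K^(i) w = 1
# (up to scale) and normalize the weights to sum to one.
def lle_weights(x_i, neighbours, reg=1e-3):
    Z = x_i - neighbours                          # rows x_i - x_j, j in knn(i)
    K = Z @ Z.T                                   # local Gram matrix K^(i)
    K = K + reg * np.trace(K) * np.eye(K.shape[0])  # regularization (assumption)
    w = np.linalg.solve(K, np.ones(K.shape[0]))
    return w / w.sum()                            # enforce sum_j w_j = 1

rng = np.random.default_rng(6)
x_i = rng.standard_normal(3)
neighbours = x_i + 0.1 * rng.standard_normal((5, 3))
w = lle_weights(x_i, neighbours)
print(w.sum())   # 1.0
```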
Phase 2: find the yi ’s
Keeping the weights from Phase 1 fixed, find low dimensional vectors y1 , . . . , yn that are reconstructed from their neighbours, with the same weights, as well as possible.
Solution.
$$\Psi = \sum_{i,j} M_{i,j} \, y_i^\top y_j \; \ldots$$
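A sketch of Phase 2 under the standard LLE assumption that the cost above uses the matrix M = (I − W)⊤(I − W), where W holds the Phase 1 weights (Wi,j = wj for j ∈ knn(i), zero otherwise); the embedding then consists of the eigenvectors of M with the smallest nonzero eigenvalues.

```python
import numpy as np

# Embedding coordinates from the bottom non-constant eigenvectors of
# M = (I - W)^T (I - W); the all-ones eigenvector (eigenvalue 0) is dropped.
def lle_embed(W, p):
    n = W.shape[0]
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigvals, eigvecs = np.linalg.eigh(M)     # ascending eigenvalues
    return eigvecs[:, 1:p + 1]               # y_i's as rows, p columns
```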
Laplacian Eigenmaps
Belkin and Niyogi, 2002
Spectral Graph Theory
Unweighted graphs
Let G be an unweighted, undirected graph with vertex set V = {1, 2, . . . , n} and edge set E ⊆ V × V .
• The adjacency matrix of G is the matrix A ∈ {0, 1}^{n×n} with
$$A_{i,j} = \begin{cases} 1 & \text{if } i \sim j \\ 0 & \text{otherwise,} \end{cases}$$
• The degree matrix of G is D = diag(d1 , . . . , dn) , where di = Σj Ai,j is the degree of vertex i .
• The graph Laplacian of G is
$$L = D - A .$$
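A minimal numpy sketch of these definitions for a small path graph (the example graph is illustrative).

```python
import numpy as np

# Degree matrix and Laplacian L = D - A of the path graph 1 - 2 - 3 - 4.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # graph Laplacian
print(L)
```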
Laplacian as a quadratic form
The Laplacian can be written as
$$L = \sum_{i \sim j} E_{i,j} \qquad \text{where} \qquad [E_{i,j}]_{p,q} = \begin{cases} 1 & \text{if } p = q = i \text{ or } p = q = j \\ -1 & \text{if } (p, q) = (i, j) \text{ or } (p, q) = (j, i) \\ 0 & \text{otherwise.} \end{cases}$$
$$f^\top L f = \frac{1}{2} \sum_{(i,j) \in E} \bigl( f(i) - f(j) \bigr)^2 .$$
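A quick numerical check of the quadratic form identity on a small unweighted graph (illustrative only; each undirected edge is counted in both directions, matching the factor ½).

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

rng = np.random.default_rng(7)
f = rng.standard_normal(4)
lhs = f @ L @ f
rhs = 0.5 * sum((f[i] - f[j]) ** 2
                for i in range(4) for j in range(4) if A[i, j] == 1)
print(np.isclose(lhs, rhs))   # True
```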
Weighted graphs
Let G be a weighted, undirected graph with edge weights (wi,j)i,j . Note that wi,j = wj,i , and if i ≁ j , then wi,j = 0 .
• The adjacency matrix of G is the matrix A ∈ (R+)^{n×n} with
$$A_{i,j} = \begin{cases} w_{i,j} & \text{if } i \ne j \\ 0 & \text{if } i = j. \end{cases}$$
• With D = diag(d1 , . . . , dn) , where di = Σj wi,j , the Laplacian is again
$$L = D - A .$$
The normalized Laplacian
When the degree distribution is uneven, it is often much better to work with the normalized Laplacian.
Example: cycle graph
Example: path graph
Connectivity
Theorem. The multiplicity of 0 in the spectrum of L (i.e., the number of zero eigenvalues) is the number of connected components of G .
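A quick numerical check of the theorem on a graph with two connected components (illustrative only).

```python
import numpy as np

# Two components: {0, 1} and {2, 3, 4}; the Laplacian should have a zero
# eigenvalue of multiplicity two.
A = np.zeros((5, 5))
A[0, 1] = A[1, 0] = 1.0
A[2, 3] = A[3, 2] = 1.0
A[3, 4] = A[4, 3] = 1.0
L = np.diag(A.sum(axis=1)) - A

eigvals = np.linalg.eigvalsh(L)
print(np.sum(np.isclose(eigvals, 0.0)))   # 2
```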
Fiedler vector
Cheeger’s inequality
Let S ⊂ V , S̄ = V \ S , and E(S, S̄) = Σ_{i∈S} Σ_{j∈S̄} wi,j . Further, for any W ⊆ V , let d(W) = Σ_{i∈W} d(i) .
• The conductance of S is defined as
$$\phi(S) = \frac{E(S, \bar{S})\, d(V)}{d(S)\, d(\bar{S})} .$$
• With φG denoting the minimum of φ(S) over nonempty S ⊊ V (the conductance of G ),
$$\frac{\phi_G^2}{2\, d_{\max}} \le \lambda_2 \le \phi_G ,$$
where dmax is the maximum degree of any vertex in G .
Example
The first few eigenvectors can be used for clustering → spectral graph partitioning.
The Laplace–Beltrami operator
The graph Laplacian is the discrete analog of the Laplace–Beltrami operator.
• The Laplacian operator on R^d is
$$\Delta = \nabla^2 = \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \ldots + \frac{\partial^2}{\partial x_d^2} .$$
• On a Riemannian manifold with metric g , the Laplace–Beltrami operator in local coordinates is
$$\Delta = \frac{1}{\sqrt{\det g}} \sum_{i,j=1}^{d} \partial_i \bigl( \sqrt{\det g}\; g^{i,j} \partial_j \bigr) .$$
Discretization of Laplacian
• In R , the (finite difference) discretization of ∇ = ∂/∂x is derived from
$$(\nabla f)(x) = \frac{\partial}{\partial x} f(x) = \frac{f(x + h/2) - f(x - h/2)}{h} .$$
• Applying this twice gives the discretization of ∆ = ∂²/∂x² :
$$(\Delta f)(x) = \frac{f(x - h) - 2 f(x) + f(x + h)}{h^2} .$$
If we regard f as a vector, f = (. . . , f(x − h), f(x), f(x + h), . . .)⊤ , then the latter is just −Lf/h² , where L is the Laplacian of the line graph. Similarly for grids on R^d . ⟨f, ∆f⟩ is a natural measure of the roughness of f → sheds new light on L as a quadratic form.
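A numerical check of this correspondence, assuming a uniform grid and the path (line) graph Laplacian built by hand: −Lf/h² matches f'' away from the two endpoints.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 101)
h = x[1] - x[0]
f = np.sin(2 * np.pi * x)

n = len(x)
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = A[idx + 1, idx] = 1.0              # path graph adjacency
L = np.diag(A.sum(axis=1)) - A

approx = -(L @ f) / h ** 2                           # discrete Laplacian of f
exact = -(2 * np.pi) ** 2 * np.sin(2 * np.pi * x)    # f''(x)
print(np.max(np.abs(approx[1:-1] - exact[1:-1])))    # small, O(h^2)
```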
The heat equation
The flow of heat in a homogeneous medium is governed by the equation
$$\frac{\partial}{\partial t} f(x, t) = \kappa\, \Delta f(x, t) .$$
∆ is a negative definite self-adjoint operator. The solution to this is
$$f(x, t) = e^{\kappa t \Delta} f(x, 0) \qquad \text{where} \quad e^{T} := I + T + \frac{1}{2} T^2 + \frac{1}{6} T^3 + \ldots .$$
In particular, if our domain M is compact, then the eigenfunctions of ∆ , i.e., ∆gi = λi gi , form a basis for functions on M and
$$f(x, 0) = \sum_i \alpha_i g_i \qquad \Longrightarrow \qquad f(x, t) = \sum_i e^{\lambda_i \kappa t} \alpha_i g_i .$$
The long time behavior of the system is determined by the low |λi| modes!
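A sketch of the same spectral picture on a graph, taking −L as a discrete stand-in for ∆ (so each mode is damped by exp(−λi κ t)); the cycle graph example and parameters are illustrative.

```python
import numpy as np

# Spectral solution of heat flow on a graph: expand f(., 0) in the
# eigenvectors of L and damp each coefficient by exp(-lambda_i * kappa * t).
def heat_flow(L, f0, kappa, t):
    eigvals, eigvecs = np.linalg.eigh(L)
    alpha = eigvecs.T @ f0                       # coefficients alpha_i
    return eigvecs @ (np.exp(-kappa * t * eigvals) * alpha)

# Heat spike on a 10-vertex cycle graph diffusing toward the uniform state.
n = 10
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

f0 = np.zeros(n)
f0[0] = 1.0
print(heat_flow(L, f0, kappa=1.0, t=10.0))       # close to the average 0.1
```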
Laplacian Eigenmaps
[Belkin & Niyogi]
• Turn dimensionality reduction into a graph problem by forming a k -nn mesh, possibly weighted by
$$w_{i,j} = \exp\bigl( -\| x_i - x_j \|^2 / (2 \sigma^2) \bigr) .$$
• Embed according to the eigenvectors of the first p non-zero eigenvalues:
$$\phi : V \to \mathbb{R}^p , \qquad i \mapsto \bigl( v_2(i), \ldots, v_{p+1}(i) \bigr)^\top .$$
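A compact sketch of these two steps; it uses the unnormalized Laplacian of the Gaussian-weighted k-nn graph (a simplification; normalized variants are also common), and the data and parameters are illustrative.

```python
import numpy as np

def laplacian_eigenmap(X, k, p, sigma=1.0):
    n = X.shape[0]
    D_sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

    # Gaussian-weighted k-nn mesh, symmetrized.
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D_sq[i])[1:k + 1]
        W[i, nbrs] = np.exp(-D_sq[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)

    # Embed with the eigenvectors of the first p non-zero eigenvalues of L.
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1:p + 1]               # columns v_2, ..., v_{p+1}

rng = np.random.default_rng(8)
X = rng.standard_normal((100, 5))
Y = laplacian_eigenmap(X, k=8, p=2)
print(Y.shape)   # (100, 2)
```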
Laplacian Eigenmaps: detail
Three different metrics