Topic 2: DIMENSIONALITY REDUCTION

STAT 37710/CAAM 37710/CMSC 35400 Machine Learning


Risi Kondor, The University of Chicago

Dimensionality reduction

In ML, data points are often represented as high dimensional real valued vectors

    x = (x1, x2, x3, . . . , xd)⊤ ∈ Rd.

The individual dimensions are called features (attributes).

Example: pixels of an image, a music file, etc.

But is the problem intrinsically high dimensional? Often we can convert high dimensional problems to lower dimensional ones without losing too much information.

Dimensionality reduction

• Real world data often lie on or near lower dimensional structures (manifolds). (Really?)
  ◦ Variables (features) may be correlated or dependent.
  ◦ Physical systems have a small number of degrees of freedom (e.g., pose and lighting in vision).
• IDEA: find the manifold and restrict the learning algorithm to it.

Differentiable manifolds

In mathematics, a d-dimensional manifold is a topological space such that each point has a neighborhood that is homeomorphic to Rd. A differentiable manifold has additional structure, and a Riemannian manifold has a metric too → geodesics.

Dimensionality reduction

Advantages:
• Visualization: humans can only imagine things in 2D or 3D.
• Computational efficiency: learning algorithms work faster in low dimensions.
• Better performance: the projection might eliminate noise.
• Interpretability: the vectors spanning the subspace might have interesting interpretations.

Dimensionality reduction

Dimensionality reduction is a typical unsupervised learning task. Two types:
• Linear:
  ◦ Principal Component Analysis (PCA)
• Nonlinear (“manifold learning”):
  ◦ Multidimensional scaling
  ◦ Locally linear embedding
  ◦ Isomap
  ◦ Laplacian Eigenmaps
  ◦ Stochastic neighbor embedding
  ◦ etc.

Fact 1

If a matrix A ∈ Rd×d is symmetric, then its (normalized) eigenvectors v1, . . . , vd form an orthonormal basis for Rd.

Note: If the eigenvalues are not distinct, then the eigenvectors are not unique. However, there is always some choice of eigenvectors which forms an orthonormal basis.

Fact 2 (Rayleigh quotient)

Let v1, . . . , vd be the normalized eigenvectors of a symmetric matrix A ∈ Rd×d and let λ1 < λ2 < . . . < λd be the corresponding eigenvalues. Then

    argmin_{w ∈ Rd\{0}}  w⊤Aw / ‖w‖²  =  v1.

Similarly,

    argmax_{w ∈ Rd\{0}}  w⊤Aw / ‖w‖²  =  vd.

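A quick numerical sanity check of Fact 2 (a Python/NumPy sketch, not part of the original slides): for a random symmetric matrix, the top eigenvector attains the maximal Rayleigh quotient, and random directions never beat it.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 5
    B = rng.standard_normal((d, d))
    A = (B + B.T) / 2                      # random symmetric matrix

    # eigh returns eigenvalues in ascending order, with orthonormal eigenvectors
    eigvals, eigvecs = np.linalg.eigh(A)
    v_top = eigvecs[:, -1]                 # eigenvector with the largest eigenvalue

    def rayleigh(w):
        return (w @ A @ w) / (w @ w)

    # the top eigenvector attains the maximal Rayleigh quotient (= largest eigenvalue)
    print(rayleigh(v_top), eigvals[-1])
    # random directions give values that never exceed it
    print(max(rayleigh(rng.standard_normal(d)) for _ in range(1000)) <= eigvals[-1] + 1e-12)
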
Principal Component Analysis
The principal directions in data

Finding the principal subspace

How can we find the most relevant subspace for the data? By finding a basis for it. The individual basis vectors are called the principal components.

The first principal component

Given a data set {x1, x2, . . . , xn} of n vectors in Rd, what is the direction that is most informative for this data?

1. First center the data: xi ← xi − µ, where µ = (1/n) ∑_{i=1}^n xi.
2. Find the unit vector p1 that is the solution to

       p1 = argmax_{‖v‖=1} (1/n) ∑_{i=1}^n (xi · v)².        (1)

This vector is called the first principal component of the data.

The first principal component

Theorem. The first principal component, p1, is the eigenvector vd of the sample covariance matrix

    Σ̂ = (1/n) ∑_{i=1}^n xi xi⊤

with largest eigenvalue.

Proof.

    (1/n) ∑_{i=1}^n (xi · v)² = (1/n) ∑_{i=1}^n (v⊤xi)(xi⊤v) = v⊤ ((1/n) ∑_{i=1}^n xi xi⊤) v = v⊤ Σ̂ v.

Since ‖v‖ = 1, (1) is equivalent to the Rayleigh quotient optimization problem

    p1 = argmax_{v ∈ Rd\{0}}  v⊤ Σ̂ v / ‖v‖²,

so p1 is indeed the eigenvector vd of Σ̂ with largest eigenvalue.

Further principal components

Recall that Σ̂ can be written as

    Σ̂ = ∑_{i=1}^d λi vi vi⊤.

After we’ve found the first principal component p1 = vd, project the data to span{v1, . . . , vd−1}. This just removes λd vd vd⊤ from the sum. So the second principal component is p2 = vd−1, and so on.

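A minimal NumPy sketch of the procedure above (the function name and the toy data are mine): center the data, form the sample covariance Σ̂, and take its leading eigenvectors as principal components.

    import numpy as np

    def pca(X, p):
        """X: (n, d) data matrix, p: number of principal components to keep."""
        mu = X.mean(axis=0)
        Xc = X - mu                               # 1. center the data
        Sigma = (Xc.T @ Xc) / X.shape[0]          # sample covariance (1/n) sum_i x_i x_i^T
        eigvals, eigvecs = np.linalg.eigh(Sigma)  # ascending eigenvalues
        order = np.argsort(eigvals)[::-1]         # sort descending
        P = eigvecs[:, order[:p]]                 # principal components p_1, ..., p_p
        return P, Xc @ P                          # components and p-dimensional projections

    # toy usage: 200 points in R^3 that are nearly one-dimensional
    rng = np.random.default_rng(0)
    X = np.outer(rng.standard_normal(200), [3.0, 2.0, 1.0]) + 0.1 * rng.standard_normal((200, 3))
    P, Y = pca(X, p=1)
    print(P.shape, Y.shape)   # (3, 1) (200, 1)
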
DNA data

[Figure: Matthew Stephens, John Novembre]

Eigenfaces

[Figure: Christopher de Coro]

Reconstruction from eigenfaces

[Figure: Christopher de Coro]

Example: digits

• Often the eigenvalues drop off rapidly (e.g., exponentially).
• Sometimes there is a sharp drop somewhere, called the spectral gap → a natural place to put the cut-off.

[Figure; source: Peter Orbanz]

Summary of PCA

Advantages:
• Finds the best projection
• Rotationally invariant

Disadvantages:
• Full PCA is expensive to compute
• Components are not sparse
• Sensitive to outliers
• Linear

NONLINEAR DIMENSIONALITY REDUCTION

• If the data lies close to a linear subspace of Rd, PCA can find it.
• But what if the data lies on a nonlinear manifold? Data which at first looks very high dimensional often really has low dimensional structure.

General principle

Find a map ϕ : Rd → Rp that maps the manifold to a lower dimensional Euclidean space in a way that preserves local distances as much as possible (some methods can only map the individual data points, not the whole of Rd).

Question: Can this always be done? Depends on the topology.

Methods

• Multidimensional Scaling
• Isomap
• Locally Linear Embedding
• Laplacian Eigenmaps
• SNE, etc.

Multidimensional scaling (MDS)

Classical MDS

• Input: n data points x1, . . . , xn ∈ Rd.
• Output: n corresponding lower dimensional points y1, . . . , yn ∈ Rp (with p ≪ d) that minimize the so-called strain

      E_CMDS = ‖D − D*‖²_Frob = ∑_{i,j} (Di,j − D*i,j)²,

  where Di,j = ‖xi − xj‖² and D*i,j = ‖yi − yj‖².

The Gram matrix

The Gram matrix of {x1, . . . , xn} is the n × n positive semidefinite matrix

    Gi,j = xi · xj.

(Again, we assume that the data has been centered, i.e., ∑i xi = 0.)

[Portrait: Jørgen Pedersen Gram, 1850–1916]

Exercise: Prove that if x1, . . . , xn ∈ Rd, then rank(G) ≤ d.

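A small numerical illustration of the exercise (a sketch, not a proof): for n points in Rd, the n × n Gram matrix G = XX⊤ has rank at most d.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 50, 3
    X = rng.standard_normal((n, d))
    X -= X.mean(axis=0)              # center the data
    G = X @ X.T                      # Gram matrix, G[i, j] = x_i . x_j
    print(np.linalg.matrix_rank(G))  # at most d = 3
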
Classical MDS

Proposition 1. The CMDS problem can equivalently be written as minimizing

    E = ‖G − G*‖²_Frob,

where G is the centered Gram matrix of {x1, . . . , xn} and G* is the Gram matrix of {y1, . . . , yn}.

Approach:
1. Compute the centered Gram matrix G.
2. Solve G* = argmin_{G̃⪰0, rank(G̃)≤p} ‖G̃ − G‖²_Frob.
3. Find y1, y2, . . . , yn ∈ Rp with Gram matrix G*.

Classical MDS

Proposition 2. Let G = QΛQ⊤ be the eigendecomposition of the Gram matrix with Λ = diag(λ1, . . . , λd) and λ1 ≥ . . . ≥ λd. Then

    argmin_{G̃⪰0, rank(G̃)≤p} ‖G̃ − G‖²_Frob = QΛ*Q⊤,

where Λ* = diag(λ1, . . . , λp, 0, 0, . . .).

Exercise: Prove this proposition.

Gram → Data

Proposition 3. Let G ∈ Rn×n be a p.s.d. matrix of rank d with eigendecomposition G = QΛQ⊤. Let xi = [QΛ^{1/2}]i,∗⊤. Then the Gram matrix of {x1, . . . , xn} is G.

Notation:
• Mi,∗ denotes the i’th row of M.
• Given D = diag(d1, . . . , dm), D^p := diag(d1^p, . . . , dm^p).

Exercise: Prove this proposition.

Summary of Classical MDS

1. Compute the centered Gram matrix G (see homework for how).
2. Compute the eigendecomposition QΛQ⊤ of G.
3. Assuming Λ = diag(λ1, . . . , λd) and λ1 ≥ . . . ≥ λd, set Λ* = diag(λ1, . . . , λp, 0, 0, . . .) and G* = QΛ*Q⊤.
4. Let yi = [Q(Λ*)^{1/2}]i,∗⊤.

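A NumPy sketch of steps 1–4 (the double-centering construction of the centered Gram matrix in step 1 is the standard one; on the slides it is deferred to the homework):

    import numpy as np

    def classical_mds(X, p):
        """Classical MDS following steps 1-4 above. X: (n, d) data, p: target dimension."""
        n = X.shape[0]
        D2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)   # squared distances ||x_i - x_j||^2
        J = np.eye(n) - np.ones((n, n)) / n
        G = -0.5 * J @ D2 @ J                 # 1. centered Gram matrix (double centering)
        eigvals, Q = np.linalg.eigh(G)        # 2. eigendecomposition, ascending order
        idx = np.argsort(eigvals)[::-1][:p]   # 3. keep the p largest eigenvalues
        lam = np.clip(eigvals[idx], 0, None)  #    (clip tiny negatives from round-off)
        return Q[:, idx] * np.sqrt(lam)       # 4. rows are the y_i = [Q (Lambda*)^(1/2)]_{i,*}

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 10))
    Y = classical_mds(X, p=2)
    print(Y.shape)                            # (100, 2)
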
Isomap
Tenenbaum, de Silva & Langford, 2000

Isomap

1. Convert the data into a graph (e.g., a symmetrized k-nn graph).
2. Compute all pairs shortest path distances.
3. Use MDS to compute ϕ : Rd → Rp that tries to preserve these distances.

Underlying assumptions:
1. The data lies on a manifold.
2. Geodesic distance on the manifold is approximated by distance in the graph.
3. The optimal embedding preserves these distances as much as possible.

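A self-contained NumPy sketch of the whole Isomap pipeline (the shortest-path step is spelled out on the following slides; the function and parameter names here are mine):

    import numpy as np

    def isomap(X, p=2, k=10):
        """Isomap sketch: k-nn graph -> all-pairs shortest paths -> classical MDS."""
        n = X.shape[0]
        dist = np.sqrt(np.square(X[:, None, :] - X[None, :, :]).sum(-1))

        # 1. symmetrized k-nn graph: keep an edge if i is among j's neighbors or vice versa
        A = np.full((n, n), np.inf)
        nn = np.argsort(dist, axis=1)[:, 1:k + 1]
        for i in range(n):
            A[i, nn[i]] = dist[i, nn[i]]
        A = np.minimum(A, A.T)
        np.fill_diagonal(A, 0.0)

        # 2. all-pairs shortest paths (Floyd-Warshall, vectorized over i and j)
        for m in range(n):
            A = np.minimum(A, A[:, m:m + 1] + A[m:m + 1, :])

        # 3. classical MDS on the squared graph distances
        J = np.eye(n) - np.ones((n, n)) / n
        G = -0.5 * J @ (A ** 2) @ J
        eigvals, Q = np.linalg.eigh(G)
        idx = np.argsort(eigvals)[::-1][:p]
        return Q[:, idx] * np.sqrt(np.clip(eigvals[idx], 0, None))

    # toy usage: noisy circle in R^3, embedded into R^2
    rng = np.random.default_rng(0)
    t = rng.uniform(0, 2 * np.pi, 200)
    X = np.c_[np.cos(t), np.sin(t), 0.05 * rng.standard_normal(200)]
    print(isomap(X, p=2, k=8).shape)          # (200, 2)
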
Shortest path distances

Let G be a weighted graph with vertex set {1, 2, . . . , n} and distances (δi,j)_{i,j=1}^n on the edges. If i and j are not neighbors, then set δi,j = ∞. If i = j, then set δi,j = 0.

The shortest path distance in G from i to j is

    d(i, j) = min_{(v1, v2, . . . , vℓ) ∈ P(i,j)} ∑_{k=1}^{ℓ−1} δ_{vk, vk+1},

where P(i, j) is the set of paths that start at i and end at j (i.e., v1 = i and vℓ = j).

Shortest path distances

Proposition. The matrix D of all pairwise distances (Di,j = d(i, j)) can be computed in O(n³) time.

Proposition. Let D(k) be the matrix of shortest path distances along the restricted set of paths where each intermediate vertex comes from {1, 2, . . . , k}. Then D(k) can be computed from D(k−1) in O(n²) time.

Floyd–Warshall algorithm

INPUT: matrix A with Ai,j = δi,j as on the previous slide

for k = 1 to n {
  for i = 1 to n {
    for j = 1 to n {
      if (Ai,j > Ai,k + Ak,j) then Ai,j ← Ai,k + Ak,j;
    }
  }
}

OUTPUT: matrix A, in which Ai,j is the shortest path distance from vertex i to j

Overall complexity: O(n³).

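The same algorithm as a runnable Python sketch, a direct translation of the pseudocode above:

    import numpy as np

    def floyd_warshall(delta):
        """All-pairs shortest path distances.
        delta: (n, n) array of edge lengths, np.inf for non-neighbors, 0 on the diagonal."""
        A = np.array(delta, dtype=float)
        n = A.shape[0]
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if A[i, j] > A[i, k] + A[k, j]:
                        A[i, j] = A[i, k] + A[k, j]
        return A

    # toy usage: a weighted path graph 0 - 1 - 2
    inf = np.inf
    delta = np.array([[0.0, 1.0, inf],
                      [1.0, 0.0, 2.0],
                      [inf, 2.0, 0.0]])
    print(floyd_warshall(delta))        # the distance from 0 to 2 is 3
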
Isomap example

[Figures]

Properties of Isomap

• One of the first algorithms that can deal with manifolds.
• The topology must still be that of (a patch of) Rp.
• Relatively efficient computation, O(n³).
• Fragile: a single mistake in the k-nn graph can mess up the embedding.
• Not obvious how to set k.

Locally Linear Embedding (LLE)
Roweis & Saul, 2000

LLE

Again trying to find an embedding Rd → Rp, mapping xi ↦ yi. Again start with a k-nn graph based on distances in Rd.

IDEA: Each point should be approximately reconstructable as a linear combination of its neighbors (locally linear property of manifolds):

    xi ≈ ∑_{j ∈ knn(i)} wi,j xj,

where (wi,j)i,j is a matrix of weights. We also have the constraints ∑j wi,j = 1.

Now find an embedding that preserves these weights, i.e., n vectors y1, . . . , yn ∈ Rp such that

    yi ≈ ∑j wi,j yj

for the same matrix of weights.

Phase 1: find the weights

Do this separately for each i. Formulate it as minimizing

    Φ = ‖xi − ∑_{j ∈ knn(i)} wi,j xj‖²   s.t.   ∑j wi,j = 1.

Solution. Thanks to the constraint,

    Φ = ‖∑_{j ∈ knn(i)} wi,j (xi − xj)‖² = w⊤K^(i)w,

where K^(i) is the local Gram matrix, K^(i)_{j,j′} = (xi − xj)⊤(xi − xj′), and w = (wi,j)_{j ∈ knn(i)}.

Phase 1: find the weights

The local optimization problem is

    minimize_w  w⊤K^(i)w   s.t.   w⊤1 = 1.

Introduce the Lagrangian

    L(w, λ) = w⊤K^(i)w − λ(w⊤1 − 1)

and solve

    ∂L/∂wj = [2K^(i)w − λ1]j = 0,   j ∈ knn(i).

Hence w ∝ (K^(i))^{−1}1, and enforcing the constraint gives

    w = (K^(i))^{−1}1 / ‖(K^(i))^{−1}1‖₁.

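A NumPy sketch of the weight computation for a single point i (the small ridge term added to K^(i) is a standard regularization, needed because the local Gram matrix is singular whenever k > d; it is not discussed on the slide):

    import numpy as np

    def lle_weights(X, i, neighbors, reg=1e-3):
        """Reconstruction weights w for point x_i from its neighbors (index array `neighbors`)."""
        Z = X[i] - X[neighbors]                  # rows are x_i - x_j, j in knn(i)
        K = Z @ Z.T                              # local Gram matrix K^(i)
        K += reg * np.trace(K) * np.eye(len(neighbors))   # regularize (K is often singular)
        w = np.linalg.solve(K, np.ones(len(neighbors)))   # proportional to K^(-1) 1
        return w / w.sum()                       # enforce the constraint sum_j w_ij = 1

    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 3))
    print(lle_weights(X, i=0, neighbors=np.array([1, 2, 3, 4])).sum())   # 1.0
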
Phase 2: find the yi’s

Now minimize (w.r.t. y1, . . . , yn)

    Ψ = ∑i ‖yi − ∑j wi,j yj‖²   s.t.   ∑i yi = 0,   (1/n) ∑i yi yi⊤ = I.

Solution.

    Ψ = ∑_{i,j} Mi,j yi⊤yj . . .

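The remaining algebra is not carried out here; a sketch of the standard solution from Roweis & Saul: with M = (I − W)⊤(I − W), the constrained minimizer of Ψ is given by the eigenvectors of M with the smallest non-zero eigenvalues.

    import numpy as np

    def lle_embedding(W, p):
        """Given the (n, n) weight matrix W (rows sum to 1, zero outside knn(i)),
        return the p-dimensional LLE embedding; rows of the result are the y_i."""
        n = W.shape[0]
        M = (np.eye(n) - W).T @ (np.eye(n) - W)
        eigvals, eigvecs = np.linalg.eigh(M)      # ascending eigenvalues
        # discard the constant eigenvector (eigenvalue ~ 0), keep the next p;
        # scaling by sqrt(n) makes (1/n) sum_i y_i y_i^T = I hold exactly
        return eigvecs[:, 1:p + 1] * np.sqrt(n)
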
Laplacian Eigenmaps
Belkin and Niyogi, 2002

Spectral Graph Theory

Spectral graph theory is about relating functions on graphs (i.e., f : V → R, where V is the vertex set of the graph) to the structure of the graph.

Unweighted graphs

Let G be an unweighted, undirected graph with vertex set V = {1, 2, . . . , n} and edge set E ⊆ V × V.

• The adjacency matrix of G is the matrix A ∈ {0, 1}^{n×n} with

      Ai,j = 1 if i ∼ j, and Ai,j = 0 otherwise,

  where i ∼ j means that vertices i and j are adjacent.
• The degree matrix of G is D = diag(d(1), d(2), . . . , d(n)), where d(i) is the degree (number of neighbors) of vertex i.
• The Laplacian matrix of G is

      L = D − A.

Laplacian as a quadratic form

The Laplacian can be written as

    L = ∑_{i∼j} Ei,j,   where   [Ei,j]p,q =  1 if p = q = i or p = q = j,
                                            −1 if (p, q) = (i, j) or (p, q) = (j, i),
                                             0 otherwise.

Therefore we have the fundamental identity that for any f ∈ Rn,

    f⊤Lf = ∑_{i∼j} (f(i) − f(j))².

Equivalently (and confusingly),

    f⊤Lf = (1/2) ∑_{(i,j)∈E} (f(i) − f(j))².

Exercise: Prove that L is a psd matrix.

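A quick numerical check of the identity (a sketch on a small random graph):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    A = rng.integers(0, 2, size=(n, n))
    A = np.triu(A, 1)
    A = A + A.T                               # random symmetric 0/1 adjacency, no self-loops
    D = np.diag(A.sum(axis=1))                # degree matrix
    L = D - A                                 # graph Laplacian

    f = rng.standard_normal(n)
    lhs = f @ L @ f
    # sum over unordered pairs i ~ j of (f(i) - f(j))^2
    rhs = 0.5 * np.sum(A * (f[:, None] - f[None, :]) ** 2)
    print(np.isclose(lhs, rhs))               # True; L is psd since the rhs is >= 0
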
Weighted graphs

Let G be a weighted, undirected graph with edge weights (wi,j)i,j. Note that wi,j = wj,i, and if i ≁ j, then wi,j = 0.

• The adjacency matrix of G is the matrix A ∈ (R+)^{n×n} with

      Ai,j = wi,j if i ≠ j, and Ai,j = 0 if i = j.

• The degree matrix of G is D = diag(d(1), d(2), . . . , d(n)), where d(i) = ∑_{j≠i} wi,j.
• The Laplacian matrix of G is

      L = D − A.

The normalized Laplacian

When the degree distribution is uneven, it is often much better to work with the normalized Laplacian

    L̃ = D^{−1/2} L D^{−1/2} = I − D^{−1/2} A D^{−1/2}.

Example: cycle graph

The Laplacian eigenvectors of the n-cycle are

    fk(vi) = sin(2πki/n),   k = 1, 2, . . . , ⌊n/2⌋,
    gk(vi) = cos(2πki/n),   k = 0, 1, 2, . . . , ⌊(n − 1)/2⌋.

Example: path graph

[Figure]

Connectivity

Theorem. The multiplicity of 0 in the spectrum of L (i.e., the number of zero eigenvalues) is the number of connected components of G.

Fiedler vector

Let λ1 ≤ λ2 ≤ . . . ≤ λn be the eigenvalues of L, and v1, v2, . . . , vn the corresponding normalized eigenvectors.

• By the above, λ1 = 0 and v1 = (1/√n) 1 for any graph.
• The second eigenvector, v2, is called the Fiedler vector, and is particularly informative about how to cluster the graph.

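A sketch illustrating both statements on a toy graph of my own (two triangles joined by a bridge): the multiplicity of the zero eigenvalue counts the connected components, and the sign pattern of the Fiedler vector splits the graph across the bridge.

    import numpy as np

    def laplacian(A):
        return np.diag(A.sum(axis=1)) - A

    # two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3
    A = np.zeros((6, 6))
    for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
        A[i, j] = A[j, i] = 1.0

    eigvals, eigvecs = np.linalg.eigh(laplacian(A))
    print(np.sum(np.isclose(eigvals, 0)))     # 1 zero eigenvalue: the graph is connected

    # remove the bridge: two components, hence two zero eigenvalues
    A2 = A.copy()
    A2[2, 3] = A2[3, 2] = 0.0
    print(np.sum(np.isclose(np.linalg.eigvalsh(laplacian(A2)), 0)))   # 2

    # the Fiedler vector of the connected graph separates the two triangles by sign
    fiedler = eigvecs[:, 1]
    print(np.sign(fiedler))                   # e.g. one triangle negative, the other positive
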
Cheeger’s inequality

Let S ⊂ V, S̄ = V \ S, and E(S, S̄) = ∑_{i∈S} ∑_{j∈S̄} wi,j. Further, for any W ⊆ V, let d(W) = ∑_{i∈W} d(i).

• The conductance of S is defined as

      ϕ(S) = d(V) E(S, S̄) / (d(S) d(S̄)).

• The conductance of the whole graph is ϕG = min_{S⊂V} ϕ(S).
• Cheeger’s inequality states that

      ϕG² / (2 dmax) ≤ λ2 ≤ ϕG,

  where dmax is the maximum degree of any vertex in G.

Example

The first few eigenvectors can be used for clustering → spectral graph partitioning.

The Laplace–Beltrami operator

The graph Laplacian is the discrete analog of the Laplace–Beltrami operator.

• The Laplacian operator on Rd is

      ∆ = ∇² = ∂²/∂x1² + ∂²/∂x2² + . . . + ∂²/∂xd².

• More generally, the Laplace–Beltrami operator on a d-dimensional Riemannian manifold with metric tensor g is

      ∆ = (1/√(det g)) ∑_{i,j=1}^d ∂i (√(det g) g^{i,j} ∂j).

The graph Laplacian can be regarded as a discretization of these operators.

Discretization of the Laplacian

• In R, the (finite difference) discretization of ∇ = ∂/∂x is derived from

      (∇f)(x) = ∂f(x)/∂x = (f(x + h/2) − f(x − h/2)) / h.

• The discretization of ∆ is derived from

      (∆f)(x) = (∇(∇f))(x) = ((∇f)(x + h/2) − (∇f)(x − h/2)) / h
              = (f(x − h) − 2f(x) + f(x + h)) / h².

If we regard f as a vector, f = (. . . , f(x − h), f(x), f(x + h), . . .)⊤, then the latter is just −Lf/h², where L is the Laplacian of the line graph. Similarly for grids on Rd. ⟨f, ∆f⟩ is a natural measure of the roughness of f → sheds new light on L as a quadratic form.

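A small sketch checking this correspondence on a uniform grid: −Lf/h² at interior vertices of the path graph approximates the second derivative of a smooth f.

    import numpy as np

    n, h = 200, 0.01
    x = np.arange(n) * h
    f = np.sin(2 * np.pi * x)

    # Laplacian of the path (line) graph on n vertices
    A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    L = np.diag(A.sum(axis=1)) - A

    second_derivative = -(L @ f) / h ** 2
    exact = -(2 * np.pi) ** 2 * np.sin(2 * np.pi * x)
    # compare away from the boundary vertices, where the path graph has degree 1
    print(np.max(np.abs(second_derivative[1:-1] - exact[1:-1])))   # small, O(h^2)
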
The heat equation

The flow of heat in a homogeneous medium is governed by the equation

    ∂f(x, t)/∂t = κ ∆f(x, t).

∆ is a negative definite self-adjoint operator. The solution is

    f(x, t) = e^{κt∆} f(x, 0),   where   e^T := I + T + (1/2)T² + (1/6)T³ + . . . .

In particular, if our domain M is compact, then the eigenfunctions of ∆, i.e., ∆gi = λi gi, form a basis for functions on M, and

    f(x, 0) = ∑i αi gi   ⟹   f(x, t) = ∑i e^{λi κt} αi gi.

The long time behavior of the system is determined by the low |λi| modes!

Laplacian Eigenmaps
[Belkin & Niyogi]

• Turn dimensionality reduction into a graph problem by forming a k-nn mesh, possibly weighted by

      wi,j = exp(−‖xi − xj‖² / (2σ²)).

• Embed according to the eigenvectors of the first p non-zero eigenvalues:

      ϕ : V → Rp,   i ↦ (v2(i), . . . , vp+1(i))⊤.

• Intuition: these are the smoothest functions on the graph, and they give global coordinates.

Laplacian Eigenmaps: detail

Formulate the problem as minimizing the strain

    E = ∑_{i,j} wi,j ‖yi − yj‖² = 2 tr(Y⊤LY).

Adding the additional constraint Y⊤DY = I, after some algebra this leads to the generalized eigenvalue problem Lv = λDv.

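A sketch of the full pipeline (Gaussian-weighted k-nn graph, then the generalized eigenvalue problem Lv = λDv via scipy.linalg.eigh; the parameters k and σ are tuning choices of mine):

    import numpy as np
    from scipy.linalg import eigh

    def laplacian_eigenmaps(X, p=2, k=10, sigma=1.0):
        n = X.shape[0]
        d2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)    # squared distances

        # symmetrized k-nn graph with Gaussian weights
        nn = np.argsort(d2, axis=1)[:, 1:k + 1]
        W = np.zeros((n, n))
        for i in range(n):
            W[i, nn[i]] = np.exp(-d2[i, nn[i]] / (2 * sigma ** 2))
        W = np.maximum(W, W.T)

        D = np.diag(W.sum(axis=1))
        L = D - W
        # generalized eigenvalue problem L v = lambda D v, eigenvalues ascending
        eigvals, eigvecs = eigh(L, D)
        # skip the constant eigenvector (eigenvalue 0), keep the next p
        return eigvecs[:, 1:p + 1]

    # toy usage: noisy circle in R^3, embedded into R^2
    rng = np.random.default_rng(0)
    t = rng.uniform(0, 2 * np.pi, 300)
    X = np.c_[np.cos(t), np.sin(t), 0.05 * rng.standard_normal(300)]
    print(laplacian_eigenmaps(X, p=2, k=8).shape)    # (300, 2)
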
Three different metrics

Laplacian eigenmaps corresponds to PCA w.r.t. the diffusion metric on the manifold, because the diffusion (heat) kernel is exactly e^{−βL} [Kondor and Lafferty, 2001].