Detailed Explanation of Module 2 Lab 4: t-Distributed Stochastic Neighbor
Embedding (t-SNE)
Section 1: What is t-SNE and Why Use It?
t-SNE stands for t-Distributed Stochastic Neighbor Embedding.
It is an unsupervised, non-linear dimensionality reduction technique, mainly used for
visualizing high-dimensional data in 2D or 3D.
Developed by Laurens van der Maaten and Geoffrey Hinton in 2008, t-SNE helps reveal
patterns, clusters, and relationships in data that are not visible in tables or with linear
methods like PCA [1] .
Section 2: How Does t-SNE Work? (Step-by-Step with Example)
t-SNE works in three main steps, each with a clear purpose and effect:
Step 1: Measure Similarities in High-Dimensional Space
For every pair of data points, t-SNE centers a Gaussian distribution over each point and
measures how dense the other points are under this distribution.
This process produces a set of probabilities (Pij) that reflect how likely points are to be
neighbors in the original high-dimensional space.
The perplexity parameter controls the size of the neighborhood considered for each point
(think of it as a guess of how many close neighbors each point has). Typical values are
between 5 and 50 [1] .
Example:
Suppose you have 1,797 images of handwritten digits (each 8×8 pixels, so 64 features). For
each image, t-SNE calculates its similarity to every other image based on their pixel values,
resulting in a matrix of probabilities that encode local structure.
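To make step 1 concrete, here is a toy NumPy sketch of the Gaussian similarity matrix. The function name `gaussian_similarities` is just for illustration; real t-SNE tunes a separate bandwidth per point so that each point's neighborhood matches the target perplexity, rather than using one fixed sigma as this sketch does.

```python
import numpy as np

def gaussian_similarities(X, sigma=1.0):
    """Toy version of t-SNE's step 1: pairwise probabilities from a
    Gaussian kernel (real t-SNE fits one sigma per point via perplexity)."""
    # Squared Euclidean distances between all pairs of points
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    # Gaussian affinities; a point is never its own neighbor
    affinities = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(affinities, 0.0)
    # Normalize rows to conditional probabilities p(j|i), then symmetrize
    P_cond = affinities / affinities.sum(axis=1, keepdims=True)
    P = (P_cond + P_cond.T) / (2 * len(X))
    return P

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
P = gaussian_similarities(X)
# Nearby points (0 and 1) get far higher probability than distant pairs
print(P[0, 1] > P[0, 2])  # True
```

The matrix `P` sums to 1 over all pairs, so it really is a probability distribution over "who is whose neighbor."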
Step 2: Measure Similarities in Low-Dimensional Space
t-SNE maps all points to a 2D or 3D space, initially at random.
It then computes a new set of probabilities (Qij) using a Student t-distribution (with heavier
tails than a Gaussian).
The heavy tails allow distant points to be modeled more flexibly, helping clusters spread out
and avoid crowding [1] .
Example:
The same digit images are now points on a 2D plane. t-SNE computes how close they are using
the t-distribution, building a new matrix of similarities for the low-dimensional space.
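The low-dimensional counterpart can be sketched the same way. The only change from step 1 is the kernel: a Student t-distribution with one degree of freedom, `1 / (1 + d^2)`, whose heavy tails let distant points stay loosely coupled. The function name below is illustrative, not scikit-learn's API.

```python
import numpy as np

def student_t_similarities(Y):
    """Toy version of t-SNE's step 2: low-dimensional affinities from a
    Student t-distribution with one degree of freedom (heavy tails)."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    # Heavy-tailed kernel: 1 / (1 + d^2) decays much slower than a Gaussian
    num = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(num, 0.0)
    return num / num.sum()  # joint probabilities Q over all pairs

Y = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 0.0]])
Q = student_t_similarities(Y)
print(Q[0, 1] > Q[0, 2])  # True: close pairs still dominate
```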
Step 3: Match the Two Probability Distributions
t-SNE tries to make the low-dimensional similarities (Qij) match the high-dimensional
similarities (Pij) as closely as possible.
It does this by minimizing the Kullback-Leibler (KL) divergence between the two
distributions, using gradient descent.
The points are moved around iteratively until the best match is found, preserving local
neighborhoods and revealing clusters.
Example:
As optimization proceeds, images of the digit "3" are pulled together, "8"s are grouped, and so
on. After enough iterations, the 2D plot shows well-separated clusters for each digit [1] .
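The quantity being minimized in step 3 can be written in a few lines. This is a generic KL-divergence computation (the `eps` guard is an implementation convenience, not part of the formula): it is zero when the two distributions match and grows as they diverge, which is exactly what gradient descent pushes down.

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    """The cost t-SNE minimizes: KL(P || Q) summed over all point pairs."""
    mask = P > 0  # terms with P == 0 contribute nothing
    return float(np.sum(P[mask] * np.log((P[mask] + eps) / (Q[mask] + eps))))

P = np.array([[0.0, 0.4], [0.4, 0.2]])
perfect = kl_divergence(P, P)  # identical distributions -> ~0 cost
mismatched = kl_divergence(P, np.array([[0.0, 0.1], [0.1, 0.8]]))
print(perfect, mismatched)  # mismatched is strictly larger
```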
Section 3: Practical Application – Visualizing Digits
Dataset: 1,797 handwritten digit images (0–9), each 8×8 pixels (64 features).
Goal: Visualize how the digits cluster in 2D using t-SNE.
Code Example:
from sklearn.manifold import TSNE
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Load the 1,797 digit images (64 pixel features each), grouped by class
digits = load_digits()
X = np.vstack([digits.data[digits.target == i] for i in range(10)])
y = np.hstack([digits.target[digits.target == i] for i in range(10)])

# Fit t-SNE; PCA initialization and a fixed seed make the run reproducible
tsne = TSNE(init="pca", random_state=20150101, n_components=2,
            perplexity=30, n_iter=1000)
digits_proj = tsne.fit_transform(X)

# Scatter plot with one color per digit class
palette = np.array(sns.color_palette("hls", 10))
plt.figure(figsize=(8, 8))
plt.scatter(digits_proj[:, 0], digits_proj[:, 1], c=palette[y.astype(int)], s=40)
plt.title("t-SNE visualization")
plt.show()
Interpretation: Each color is a digit. t-SNE clusters similar digits together, making the
structure visible [1] .
Section 4: Understanding and Tuning t-SNE Hyperparameters
Parameter | What It Does | Typical Values / Notes
n_components | Output dimensions (usually 2 or 3 for visualization) | 2 or 3
perplexity | Controls neighborhood size (local vs. global structure) | 5–50 (try several values)
n_iter | Number of optimization steps (iterations) | ≥250 (usually 1000 or more)
method | Optimization algorithm ('barnes_hut' is fast, 'exact' is slower) | 'barnes_hut' for large datasets
Effect of Perplexity:
Low perplexity (e.g., 5): Focuses on very local structure; clusters may be small and tight.
High perplexity (e.g., 100): Considers more global structure; clusters may merge or lose
detail.
Best practice: Try several values and compare results. Perplexity should be less than the
number of points [1] .
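One way to follow this best practice is to fit one embedding per perplexity value and compare the plots side by side. The sketch below uses a 300-image subset of the digits data purely to keep the runtime short; on the full dataset the same loop applies.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

digits = load_digits()
X, y = digits.data[:300], digits.target[:300]  # subset for speed

# One t-SNE fit per perplexity value, plotted side by side
perplexities = [5, 30, 100]
fig, axes = plt.subplots(1, len(perplexities), figsize=(15, 5))
for ax, perp in zip(axes, perplexities):
    proj = TSNE(n_components=2, perplexity=perp,
                init="pca", random_state=0).fit_transform(X)
    ax.scatter(proj[:, 0], proj[:, 1], c=y, cmap="tab10", s=10)
    ax.set_title(f"perplexity={perp}")
plt.show()
```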
Effect of n_iter (Iterations):
Too few iterations: The plot may not stabilize; clusters may look “pinched” or not well
separated.
More iterations: Allows the optimization to converge and produce a clearer map.
Best practice: Iterate until the configuration is stable (often ≥1000) [1] .
Effect of Method:
‘barnes_hut’: Fast, approximate, O(NlogN) time; good for large datasets.
‘exact’: Slower, O(N²) time; more accurate but computationally expensive [1] .
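The speed difference between the two methods is easy to measure directly. A small subset is used here so the O(N²) exact method finishes quickly; on thousands of points the gap widens dramatically.

```python
import time
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X = load_digits().data[:300]  # small subset so the exact method stays fast

results = {}
for method in ["barnes_hut", "exact"]:
    start = time.perf_counter()
    results[method] = TSNE(n_components=2, perplexity=30, method=method,
                           init="pca", random_state=0).fit_transform(X)
    print(f"{method}: {time.perf_counter() - start:.2f} s")
```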
Section 5: Visualizing the Effects of Hyperparameters
Changing Perplexity:
Perplexity 5: Local clusters dominate, but global structure may be lost.
Perplexity 30: Balanced, clear clusters for each digit.
Perplexity 100: Clusters may merge, and points from different digits may mix.
Changing Iterations:
10, 20, 60, 120 steps: Clusters are not yet formed; plots look unstable or “pinched.”
1000 steps: Well-separated, stable clusters.
5000 steps: Similar to 1000, but clusters may be denser [1] .
Section 6: Best Practices and Limitations
Randomness:
t-SNE results can vary between runs due to random initialization. Use random_state for
reproducibility.
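A quick check of this on a small subset of the digits data: with the same random_state, two fits produce identical embeddings; drop the seed and the layouts will generally differ.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X = load_digits().data[:200]  # small subset so the demo runs quickly

# Same random_state on the same machine -> identical embeddings
a = TSNE(n_components=2, perplexity=30, init="pca", random_state=42).fit_transform(X)
b = TSNE(n_components=2, perplexity=30, init="pca", random_state=42).fit_transform(X)
print(np.allclose(a, b))  # True
```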
No True Clustering:
t-SNE is not a clustering algorithm; it only helps visualize clusters.
Interpretation:
The axes in a t-SNE plot have no intrinsic meaning; only the relative positions and
groupings matter.
Parameter Sensitivity:
Results can change with different perplexity or iteration settings. Always experiment
with several values [1] .
Section 7: Summary Table
Step | What Happens | Example (Digits)
1 | Compute similarities (Pij) in high-dimensional space using a Gaussian | How likely two digit images are to be neighbors
2 | Compute similarities (Qij) in low-dimensional space using Student-t | Initial random 2D positions for each image
3 | Minimize KL divergence between Pij and Qij (gradient descent) | Points move until clusters of digits form
Section 8: Exercises and Exploration
Try different perplexity and iteration values to see how the visualization changes.
Use t-SNE for exploration, not for clustering or modeling directly.
Combine with other techniques: Use t-SNE after PCA or as a preprocessing step for
visualization [1] .
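The PCA-then-t-SNE combination mentioned above looks like this in practice. The choice of 30 components and the 500-image subset are arbitrary demo values; the idea is that PCA denoises the data and shrinks the feature count before t-SNE's pairwise computations.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = load_digits().data[:500]  # subset to keep the example quick

# Reduce 64 features to 30 principal components first, then run t-SNE;
# this denoises the data and speeds up the pairwise similarity step
X_reduced = PCA(n_components=30, random_state=0).fit_transform(X)
proj = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_reduced)
print(proj.shape)  # (500, 2)
```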
Section 9: Key Takeaways
t-SNE is a powerful tool for exploring and visualizing high-dimensional data.
It excels at revealing clusters and local structure.
Hyperparameters like perplexity and n_iter greatly influence the results—experiment with
them!
t-SNE is best for visualization and data exploration, not as a preprocessing step for
modeling or clustering [1] .
1. AIML_Module_2_Lab_4_t_SNE.ipynb-Colab.pdf