Detailed Explanation of Module 2 Lab 4: t-Distributed Stochastic Neighbor
Embedding (t-SNE)
Section 1: What is t-SNE and Why Use It?
t-SNE stands for t-Distributed Stochastic Neighbor Embedding.
It is an unsupervised, non-linear dimensionality reduction technique, mainly used for
visualizing high-dimensional data in 2D or 3D.
Developed by Laurens van der Maaten and Geoffrey Hinton in 2008, t-SNE helps reveal
patterns, clusters, and relationships in data that are not visible in tables or with linear
methods like PCA [1] .
Section 2: How Does t-SNE Work? (Step-by-Step with Example)
t-SNE works in three main steps, each with a clear purpose and effect:
Step 1: Measure Similarities in High-Dimensional Space
For every pair of data points, t-SNE centers a Gaussian distribution over each point and
measures how dense the other points are under this distribution.
This process produces a set of probabilities (Pij) that reflect how likely points are to be
neighbors in the original high-dimensional space.
The perplexity parameter controls the size of the neighborhood considered for each point
(think of it as a guess of how many close neighbors each point has). Typical values are
between 5 and 50 [1] .
Example:
Suppose you have 1,797 images of handwritten digits (each 8×8 pixels, so 64 features). For
each image, t-SNE calculates its similarity to every other image based on their pixel values,
resulting in a matrix of probabilities that encode local structure.
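To make step 1 concrete, here is a toy NumPy sketch of the Gaussian similarity matrix. The function name `gaussian_similarities` is just for illustration; real t-SNE tunes a separate bandwidth per point so that each point's neighborhood matches the target perplexity, rather than using one fixed sigma as this sketch does.

```python
import numpy as np

def gaussian_similarities(X, sigma=1.0):
    """Toy version of t-SNE's step 1: pairwise probabilities from a
    Gaussian kernel (real t-SNE fits one sigma per point via perplexity)."""
    # Squared Euclidean distances between all pairs of points
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    # Gaussian affinities; a point is never its own neighbor
    affinities = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(affinities, 0.0)
    # Normalize rows to conditional probabilities p(j|i), then symmetrize
    P_cond = affinities / affinities.sum(axis=1, keepdims=True)
    P = (P_cond + P_cond.T) / (2 * len(X))
    return P

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
P = gaussian_similarities(X)
# Nearby points (0 and 1) get far higher probability than distant pairs
print(P[0, 1] > P[0, 2])  # True
```

The matrix `P` sums to 1 over all pairs, so it really is a probability distribution over "who is whose neighbor."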
Step 2: Measure Similarities in Low-Dimensional Space
t-SNE maps all points to a 2D or 3D space, initially at random.
It then computes a new set of probabilities (Qij) using a Student t-distribution (with heavier
tails than a Gaussian).
The heavy tails allow distant points to be modeled more flexibly, helping clusters spread out
and avoid crowding [1] .
Example:
The same digit images are now points on a 2D plane. t-SNE computes how close they are using
the t-distribution, building a new matrix of similarities for the low-dimensional space.
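The low-dimensional counterpart can be sketched the same way. The only change from step 1 is the kernel: a Student t-distribution with one degree of freedom, `1 / (1 + d^2)`, whose heavy tails let distant points stay loosely coupled. The function name below is illustrative, not scikit-learn's API.

```python
import numpy as np

def student_t_similarities(Y):
    """Toy version of t-SNE's step 2: low-dimensional affinities from a
    Student t-distribution with one degree of freedom (heavy tails)."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    # Heavy-tailed kernel: 1 / (1 + d^2) decays much slower than a Gaussian
    num = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(num, 0.0)
    return num / num.sum()  # joint probabilities Q over all pairs

Y = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 0.0]])
Q = student_t_similarities(Y)
print(Q[0, 1] > Q[0, 2])  # True: close pairs still dominate
```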
Step 3: Match the Two Probability Distributions
t-SNE tries to make the low-dimensional similarities (Qij) match the high-dimensional
similarities (Pij) as closely as possible.
It does this by minimizing the Kullback-Leibler (KL) divergence between the two
distributions, using gradient descent.
The points are moved around iteratively until the best match is found, preserving local
neighborhoods and revealing clusters.
Example:
As optimization proceeds, images of the digit "3" are pulled together, "8"s are grouped, and so
on. After enough iterations, the 2D plot shows well-separated clusters for each digit [1] .
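The quantity being minimized in step 3 can be written in a few lines. This is a generic KL-divergence computation (the `eps` guard is an implementation convenience, not part of the formula): it is zero when the two distributions match and grows as they diverge, which is exactly what gradient descent pushes down.

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    """The cost t-SNE minimizes: KL(P || Q) summed over all point pairs."""
    mask = P > 0  # terms with P == 0 contribute nothing
    return float(np.sum(P[mask] * np.log((P[mask] + eps) / (Q[mask] + eps))))

P = np.array([[0.0, 0.4], [0.4, 0.2]])
perfect = kl_divergence(P, P)  # identical distributions -> ~0 cost
mismatched = kl_divergence(P, np.array([[0.0, 0.1], [0.1, 0.8]]))
print(perfect, mismatched)  # mismatched is strictly larger
```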
Section 3: Practical Application – Visualizing Digits
Dataset: 1,797 handwritten digit images (0–9), each 8×8 pixels (64 features).
Goal: Visualize how the digits cluster in 2D using t-SNE.
Code Example:
from sklearn.manifold import TSNE
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Load the 1,797 digit images (64 pixel features each), grouped by class
digits = load_digits()
X = np.vstack([digits.data[digits.target == i] for i in range(10)])
y = np.hstack([digits.target[digits.target == i] for i in range(10)])

# Fit t-SNE; PCA initialization and a fixed seed make the run reproducible
tsne = TSNE(init="pca", random_state=20150101, n_components=2,
            perplexity=30, n_iter=1000)
digits_proj = tsne.fit_transform(X)

# Scatter plot with one color per digit class
palette = np.array(sns.color_palette("hls", 10))
plt.figure(figsize=(8, 8))
plt.scatter(digits_proj[:, 0], digits_proj[:, 1], c=palette[y.astype(int)], s=40)
plt.title("t-SNE visualization")
plt.show()
Interpretation: Each color is a digit. t-SNE clusters similar digits together, making the
structure visible [1] .
Section 4: Understanding and Tuning t-SNE Hyperparameters
Parameter | What It Does | Typical Values / Notes
n_components | Output dimensions (usually 2 or 3 for visualization) | 2 or 3
perplexity | Controls neighborhood size (local vs. global structure) | 5–50 (try several values)
n_iter | Number of optimization steps (iterations) | ≥250 (usually 1000 or more)
method | Optimization algorithm ('barnes_hut' is fast, 'exact' is slower) | 'barnes_hut' for large datasets
Effect of Perplexity:
Low perplexity (e.g., 5): Focuses on very local structure; clusters may be small and tight.
High perplexity (e.g., 100): Considers more global structure; clusters may merge or lose
detail.
Best practice: Try several values and compare results. Perplexity should be less than the
number of points [1] .
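One way to follow this best practice is to fit one embedding per perplexity value and compare the plots side by side. The sketch below uses a 300-image subset of the digits data purely to keep the runtime short; on the full dataset the same loop applies.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

digits = load_digits()
X, y = digits.data[:300], digits.target[:300]  # subset for speed

# One t-SNE fit per perplexity value, plotted side by side
perplexities = [5, 30, 100]
fig, axes = plt.subplots(1, len(perplexities), figsize=(15, 5))
for ax, perp in zip(axes, perplexities):
    proj = TSNE(n_components=2, perplexity=perp,
                init="pca", random_state=0).fit_transform(X)
    ax.scatter(proj[:, 0], proj[:, 1], c=y, cmap="tab10", s=10)
    ax.set_title(f"perplexity={perp}")
plt.show()
```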
Effect of n_iter (Iterations):
Too few iterations: The plot may not stabilize; clusters may look “pinched” or not well
separated.
More iterations: Allows the optimization to converge and produce a clearer map.
Best practice: Iterate until the configuration is stable (often ≥1000) [1] .
Effect of Method:
‘barnes_hut’: Fast, approximate, O(NlogN) time; good for large datasets.
‘exact’: Slower, O(N²) time; more accurate but computationally expensive [1] .
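The speed difference between the two methods is easy to measure directly. A small subset is used here so the O(N²) exact method finishes quickly; on thousands of points the gap widens dramatically.

```python
import time
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X = load_digits().data[:300]  # small subset so the exact method stays fast

results = {}
for method in ["barnes_hut", "exact"]:
    start = time.perf_counter()
    results[method] = TSNE(n_components=2, perplexity=30, method=method,
                           init="pca", random_state=0).fit_transform(X)
    print(f"{method}: {time.perf_counter() - start:.2f} s")
```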
Section 5: Visualizing the Effects of Hyperparameters
Changing Perplexity:
Perplexity 5: Local clusters dominate, but global structure may be lost.
Perplexity 30: Balanced, clear clusters for each digit.
Perplexity 100: Clusters may merge, and points from different digits may mix.
Changing Iterations:
10, 20, 60, 120 steps: Clusters are not yet formed; plots look unstable or “pinched.”
1000 steps: Well-separated, stable clusters.
5000 steps: Similar to 1000, but clusters may be denser [1] .
Section 6: Best Practices and Limitations
Randomness:
t-SNE results can vary between runs due to random initialization. Use random_state for
reproducibility.
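A quick check of this on a small subset of the digits data: with the same random_state, two fits produce identical embeddings; drop the seed and the layouts will generally differ.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X = load_digits().data[:200]  # small subset so the demo runs quickly

# Same random_state on the same machine -> identical embeddings
a = TSNE(n_components=2, perplexity=30, init="pca", random_state=42).fit_transform(X)
b = TSNE(n_components=2, perplexity=30, init="pca", random_state=42).fit_transform(X)
print(np.allclose(a, b))  # True
```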
No True Clustering:
t-SNE is not a clustering algorithm; it only helps visualize clusters.
Interpretation:
The axes in a t-SNE plot have no intrinsic meaning; only the relative positions and
groupings matter.
Parameter Sensitivity:
Results can change with different perplexity or iteration settings. Always experiment
with several values [1] .
Section 7: Summary Table
Step | What Happens | Example (Digits)
1 | Compute similarities (Pij) in high-dimensional space using a Gaussian | How likely two digit images are to be neighbors
2 | Compute similarities (Qij) in low-dimensional space using Student-t | Initial random 2D positions for each image
3 | Minimize KL divergence between Pij and Qij (gradient descent) | Points move until clusters of digits form
Section 8: Exercises and Exploration
Try different perplexity and iteration values to see how the visualization changes.
Use t-SNE for exploration, not for clustering or modeling directly.
Combine with other techniques: Use t-SNE after PCA or as a preprocessing step for
visualization [1] .
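The PCA-then-t-SNE combination mentioned above looks like this in practice. The choice of 30 components and the 500-image subset are arbitrary demo values; the idea is that PCA denoises the data and shrinks the feature count before t-SNE's pairwise computations.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = load_digits().data[:500]  # subset to keep the example quick

# Reduce 64 features to 30 principal components first, then run t-SNE;
# this denoises the data and speeds up the pairwise similarity step
X_reduced = PCA(n_components=30, random_state=0).fit_transform(X)
proj = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_reduced)
print(proj.shape)  # (500, 2)
```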
Section 9: Key Takeaways
t-SNE is a powerful tool for exploring and visualizing high-dimensional data.
It excels at revealing clusters and local structure.
Hyperparameters like perplexity and n_iter greatly influence the results—experiment with
them!
t-SNE is best for visualization and data exploration, not as a preprocessing step for
modeling or clustering [1] .
1. AIML_Module_2_Lab_4_t_SNE.ipynb-Colab.pdf