SPLEX TME 2
Clustering
The goal of the TME is to learn how to use some popular clustering methods (unsupervised
learning), and how to interpret the results.
We will use the scikit-learn Python library (http://scikit-learn.org), which is already installed
on the computers.
Data (simulated data sets + data sets of TME 1)
We explore two data sets downloadable from the UCI Machine Learning Repository (http://archive.
ics.uci.edu/ml/index.php):
• Breast Cancer Wisconsin (Diagnostic) Data Set (https://archive.ics.uci.edu/ml/datasets/
Breast+Cancer+Wisconsin+(Diagnostic))
• Mice Protein Expression Data Set (https://archive.ics.uci.edu/ml/datasets/Mice+Protein+
Expression)
Libraries
You will need to load the following packages:
import matplotlib.pyplot as plt
from sklearn import cluster
from sklearn.cluster import KMeans
from sklearn import metrics
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_classification
from sklearn.datasets import make_blobs
from sklearn.datasets import make_moons
Analysis
Before analysing the Breast Cancer and Mice data sets, we first analyse three simulated
data sets to better understand what the different clustering methods do, and why they produce
different clusterings. Generate and visualize the artificial data as follows:
# First simulated data set
plt.figure()
plt.title("Two informative features, one cluster per class", fontsize='small')
X1, Y1 = make_classification(n_samples=200, n_features=2, n_redundant=0,
                             n_informative=2, n_clusters_per_class=1)
plt.scatter(X1[:, 0], X1[:, 1], marker='o', c=Y1, s=25, edgecolor='k')

# Second simulated data set
plt.figure()
plt.title("Three blobs", fontsize='small')
X2, Y2 = make_blobs(n_samples=200, n_features=2, centers=3)
plt.scatter(X2[:, 0], X2[:, 1], marker='o', c=Y2, s=25, edgecolor='k')

# Third simulated data set
plt.figure()
plt.title("Non-linearly separable data sets", fontsize='small')
X3, Y3 = make_moons(n_samples=200, shuffle=True, noise=None, random_state=None)
plt.scatter(X3[:, 0], X3[:, 1], marker='o', c=Y3, s=25, edgecolor='k')

plt.show()
Apply the following clustering methods to the three simulated data sets.
Clustering Methods
1. K-means
http://scikit-learn.org/stable/modules/clustering.html#k-means
An example of k-means clustering (where k is the number of clusters you want to produce,
and X is the data matrix):
km = KMeans(n_clusters=k, init='k-means++', max_iter=100, n_init=1)
km.fit(X)
You can also visualize the clustering (and compare it to the true class assignment):
plt.scatter(X[:, 0], X[:, 1], s=10, c=km.labels_)
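Put together on the second simulated data set, the fitted model exposes the assigned labels, the cluster centres, and the within-cluster sum of squares (a minimal sketch; the fixed seeds and k = 3 are choices for reproducibility, not part of the exercise statement):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated Gaussian blobs, as in the second simulated data set
X2, Y2 = make_blobs(n_samples=200, n_features=2, centers=3, random_state=0)

km = KMeans(n_clusters=3, init='k-means++', max_iter=100, n_init=1, random_state=0)
km.fit(X2)

print(km.labels_[:10])       # cluster index assigned to the first ten points
print(km.cluster_centers_)   # one centre per cluster (a 3 x 2 matrix here)
print(km.inertia_)           # within-cluster sum of squared distances
```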
2. Hierarchical clustering
http://scikit-learn.org/stable/modules/clustering.html#hierarchical-clustering
An example of hierarchical clustering (where k is the number of clusters you want to produce,
and X is the data matrix):
for linkage in ('ward', 'average', 'complete'):
    clustering = AgglomerativeClustering(linkage=linkage, n_clusters=k)
    clustering.fit(X)
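For instance, the three linkages can be compared on the blob data by looking at the resulting cluster sizes (a sketch; the seed and the use of numpy's bincount are illustrative choices):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X2, Y2 = make_blobs(n_samples=200, n_features=2, centers=3, random_state=0)

labels_by_linkage = {}
for linkage in ('ward', 'average', 'complete'):
    clustering = AgglomerativeClustering(linkage=linkage, n_clusters=3)
    clustering.fit(X2)
    labels_by_linkage[linkage] = clustering.labels_
    # size of each of the three clusters found with this linkage
    print(linkage, np.bincount(clustering.labels_))
```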
3. Spectral clustering
http://scikit-learn.org/stable/modules/clustering.html#spectral-clustering
An example of spectral clustering (where k is the number of clusters you want to produce,
and X is the data matrix):
spectral = cluster.SpectralClustering(n_clusters=k, eigen_solver='arpack',
                                      affinity='nearest_neighbors')
spectral.fit(X)
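On the two-moons data set, spectral clustering with a nearest-neighbours affinity should recover the two interleaved half-circles, which k-means cannot. A sketch comparing the recovered labels with the true ones via the Adjusted Rand Index (the fixed seeds are illustrative):

```python
from sklearn import cluster, metrics
from sklearn.datasets import make_moons

# Noise-free two-moons data, as in the third simulated data set
X3, Y3 = make_moons(n_samples=200, shuffle=True, noise=None, random_state=0)

spectral = cluster.SpectralClustering(n_clusters=2, eigen_solver='arpack',
                                      affinity='nearest_neighbors', random_state=0)
spectral.fit(X3)

# ARI close to 1 means the two moons were recovered almost perfectly
ari = metrics.adjusted_rand_score(Y3, spectral.labels_)
print(ari)
```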
4. Analyse the results of clustering in terms of
• Homogeneity: metrics.homogeneity_score()
• Completeness: metrics.completeness_score()
• V-measure: metrics.v_measure_score()
• Adjusted Rand Index: metrics.adjusted_rand_score()
• Silhouette Coefficient: metrics.silhouette_score()
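As an illustration, the five scores can be computed in one pass for a k-means clustering of the blob data (a sketch; note that the first four compare the predicted labels against the true labels, while the silhouette coefficient uses only the data and the predicted labels):

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X2, Y2 = make_blobs(n_samples=200, n_features=2, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=1, random_state=0).fit_predict(X2)

scores = {
    'homogeneity':   metrics.homogeneity_score(Y2, labels),
    'completeness':  metrics.completeness_score(Y2, labels),
    'v_measure':     metrics.v_measure_score(Y2, labels),
    'adjusted_rand': metrics.adjusted_rand_score(Y2, labels),
    'silhouette':    metrics.silhouette_score(X2, labels),  # needs X, not Y
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```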
5. What is an optimal clustering method for each simulated data set?
6. Re-run the clustering methods on the Breast Cancer and Mice data sets. Do not include the
class variables in your clustering analysis, but compare the obtained clustering with the true
class labels.
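The Breast Cancer Wisconsin (Diagnostic) data set also ships with scikit-learn as sklearn.datasets.load_breast_cancer, which avoids parsing the UCI files by hand. A sketch of the workflow (standardizing the features and the choice of k = 2 are suggestions, not part of the exercise statement):

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
# Cluster on the features only; the diagnosis labels are held out
X = StandardScaler().fit_transform(data.data)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Compare the clustering with the held-out diagnosis labels
ari = metrics.adjusted_rand_score(data.target, labels)
print(ari)
```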