Clustering
k-means Clustering
https://en.wikipedia.org/wiki/K-means_clustering
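k-means partitions the rows of a data matrix into k clusters so as to minimize the within-cluster sum of squared distances to each cluster's mean. As a rough illustration (a sketch added here, not part of the original notebook), that objective can be computed directly with NumPy:

import numpy as np

def within_cluster_ss(X, labels, centers):
    # Within-cluster sum of squares: the quantity k-means tries to minimize
    return sum(((X[labels == k] - centers[k]) ** 2).sum()
               for k in range(len(centers)))

# Toy example: two obvious clusters on a line
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
labels = np.array([0, 0, 0, 1, 1, 1])
centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])
print(within_cluster_ss(X, labels, centers))  # 0.04: each cluster contributes 0.02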
UCI Machine Learning Repository
https://archive.ics.uci.edu/ml/datasets.php
In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import style
In [2]: style.use('default')
In [3]: iris_df = sns.load_dataset('iris')
iris_df
Out[3]: sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
... ... ... ... ... ...
145 6.7 3.0 5.2 2.3 virginica
146 6.3 2.5 5.0 1.9 virginica
147 6.5 3.0 5.2 2.0 virginica
148 6.2 3.4 5.4 2.3 virginica
149 5.9 3.0 5.1 1.8 virginica
150 rows × 5 columns
In [4]: iris_df['species'].value_counts()
Out[4]: virginica 50
setosa 50
versicolor 50
Name: species, dtype: int64
In [5]: plt.figure(figsize = (7,7))
sns.scatterplot(data = iris_df, x = 'sepal_length', y = 'sepal_width', hue = 'species')
Out[5]: <AxesSubplot:xlabel='sepal_length', ylabel='sepal_width'>
In [6]: plt.figure(figsize = (7,7))
sns.scatterplot(data = iris_df, x = 'sepal_length', y = 'petal_width', hue = 'species')
Out[6]: <AxesSubplot:xlabel='sepal_length', ylabel='petal_width'>
In [7]: plt.figure(figsize = (5,5))
sns.scatterplot(data = iris_df, x = 'petal_length', y = 'petal_width', hue = 'species')
Out[7]: <AxesSubplot:xlabel='petal_length', ylabel='petal_width'>
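Each of the three scatterplots above shows a single pair of features. As an optional shortcut (not part of the original notebook), seaborn's pairplot draws every pairwise combination at once, which makes it easy to see which feature pairs separate the species most cleanly:

# Optional: all pairwise feature relationships in one grid, colored by species
sns.pairplot(iris_df, hue = 'species', height = 2.0)
plt.show()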
Raw Coding the k-means Clustering Algorithm
In [8]: from sklearn.metrics import pairwise_distances_argmin

def find_clusters(X, n_clusters, rseed=0, num_iter=100):
    # 1. Randomly choose clusters
    rng = np.random.RandomState(rseed)
    i = rng.permutation(X.shape[0])[:n_clusters]
    centers = X[i]
    iteration = 1
    while True:
        # 2a. Assign labels based on closest center
        labels = pairwise_distances_argmin(X, centers)
        # 2b. Find new centers from means of points
        new_centers = np.array([X[labels == k].mean(0)
                                for k in range(n_clusters)])
        # 2c. Check for convergence (or stop after num_iter iterations)
        print(num_iter, iteration)
        iteration += 1
        if iteration > num_iter:
            break
        if np.all(centers == new_centers):
            break
        centers = new_centers
    return centers, labels

X = iris_df.iloc[:, :-1].to_numpy()
centers = []
labels = []
for i in [1, 2, 5, 10]:
    out_center, out_label = find_clusters(X, 3, num_iter=i, rseed=0)
    centers.append(out_center)
    labels.append(out_label)
1 1
2 1
2 2
5 1
5 2
5 3
5 4
5 5
10 1
10 2
10 3
10 4
10 5
10 6
10 7
10 8
10 9
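Step 2a above relies on pairwise_distances_argmin, which returns, for each row of the first array, the index of the nearest row in the second array. A small self-contained example (added here for illustration):

import numpy as np
from sklearn.metrics import pairwise_distances_argmin

points = np.array([[0.0, 0.0],
                   [0.2, 0.1],
                   [5.0, 5.0]])
centers = np.array([[0.0, 0.0],
                    [5.0, 5.0]])

# Index of the closest center for each point -> array([0, 0, 1])
print(pairwise_distances_argmin(points, centers))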
In [9]: iris_df
Out[9]: sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
... ... ... ... ... ...
145 6.7 3.0 5.2 2.3 virginica
146 6.3 2.5 5.0 1.9 virginica
147 6.5 3.0 5.2 2.0 virginica
148 6.2 3.4 5.4 2.3 virginica
149 5.9 3.0 5.1 1.8 virginica
150 rows × 5 columns
In [10]: import matplotlib.gridspec as gridspec

fig2 = plt.figure(constrained_layout=True, figsize = (7,7))
spec2 = gridspec.GridSpec(ncols=2, nrows=2, figure=fig2)

f2_ax1 = fig2.add_subplot(spec2[0, 0])
sns.scatterplot(ax = f2_ax1, x = iris_df['sepal_length'], y = iris_df['petal_width'], hue = labels[0])
f2_ax1.scatter(centers[0][:, 0], centers[0][:, -1], marker = '*', color = 'royalblue')
f2_ax1.set_title('Number of Iterations = 1')

f2_ax2 = fig2.add_subplot(spec2[0, 1])
sns.scatterplot(ax = f2_ax2, x = iris_df['sepal_length'], y = iris_df['petal_width'], hue = labels[1])
f2_ax2.scatter(centers[1][:, 0], centers[1][:, -1], marker = '*', color = 'royalblue')
f2_ax2.set_title('Number of Iterations = 2')

f2_ax3 = fig2.add_subplot(spec2[1, 0])
sns.scatterplot(ax = f2_ax3, x = iris_df['sepal_length'], y = iris_df['petal_width'], hue = labels[2])
f2_ax3.scatter(centers[2][:, 0], centers[2][:, -1], marker = '*', color = 'royalblue')
f2_ax3.set_title('Number of Iterations = 5')

f2_ax4 = fig2.add_subplot(spec2[1, 1])
sns.scatterplot(ax = f2_ax4, x = iris_df['sepal_length'], y = iris_df['petal_width'], hue = labels[3])
f2_ax4.scatter(centers[3][:, 0], centers[3][:, -1], marker = '*', color = 'royalblue')
f2_ax4.set_title('Number of Iterations = 10')

fig2.suptitle('Clustering at various iterations')
plt.show()
Using sklearn
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans
In [11]: from sklearn.cluster import KMeans
kmc = KMeans(n_clusters=3, max_iter=600, algorithm = 'full')
X = iris_df.iloc[:, :-1]
kmc.fit(X)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\cluster\_kmeans.py:882: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
  f"KMeans is known to have a memory leak on Windows "
Out[11]: KMeans(algorithm='full', max_iter=600, n_clusters=3)
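Once fitted, the KMeans object can label new observations with predict and reports the final within-cluster sum of squares as inertia_. The measurements below are made up purely for illustration:

# Hypothetical new flowers: (sepal_length, sepal_width, petal_length, petal_width)
new_samples = [[5.0, 3.4, 1.5, 0.2],   # setosa-like
               [6.7, 3.1, 5.6, 2.4]]   # virginica-like
print(kmc.predict(new_samples))   # cluster index assigned to each new sample
print(kmc.inertia_)               # within-cluster sum of squares after fitting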
In [12]: kmc.cluster_centers_
Out[12]: array([[5.006 , 3.428 , 1.462 , 0.246 ],
[5.9016129 , 2.7483871 , 4.39354839, 1.43387097],
[6.85 , 3.07368421, 5.74210526, 2.07105263]])
In [13]: kmc.labels_
Out[13]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2,
2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2,
2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1])
In [14]: pd.crosstab(kmc.labels_, iris_df['species'])
Out[14]: species setosa versicolor virginica
row_0
0 50 0 0
1 0 48 14
2 0 2 36
Metrics
https://scikit-learn.org/stable/modules/clustering.html#clustering-evaluation
In [15]: from sklearn.metrics import silhouette_score
cluster_df = pd.DataFrame(kmc.labels_, columns = ['Cluster ID'])
# cluster_df
full_cluster_df = pd.concat([X.reset_index(drop = True), cluster_df], axis = 1)
full_cluster_df
silhouette_score(full_cluster_df, kmc.labels_, metric='euclidean')
Out[15]: 0.6128676734836785
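Note that full_cluster_df still contains the Cluster ID column, so the score above treats the cluster label itself as an extra feature. Silhouette is normally computed on the measurement columns alone, and since the true species are known here, a label-aware metric such as the adjusted Rand index can also be reported. A minimal sketch (the values will differ from Out[15]):

from sklearn.metrics import silhouette_score, adjusted_rand_score

# Silhouette on the four measurement columns only (no cluster-ID column)
print(silhouette_score(X, kmc.labels_, metric = 'euclidean'))

# Agreement between the k-means assignment and the true species labels
print(adjusted_rand_score(iris_df['species'], kmc.labels_))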
In [ ]: