K Means Clustering

The document discusses the K-means clustering algorithm. K-means clustering is an unsupervised machine learning algorithm that groups unlabeled data points into K number of clusters based on their similarities. It works by assigning each data point to the nearest cluster center and updating cluster centers to be the average of all points assigned to that cluster. This process repeats for a certain number of iterations to properly cluster the data.

Uploaded by

Shobha Kumari Choudhary


K means Clustering – Introduction

K-Means Clustering is an unsupervised machine learning algorithm that groups an
unlabeled dataset into different clusters.
[Figure: K means Clustering]
Unsupervised machine learning is the process of teaching a computer to use
unlabeled, unclassified data and enabling the algorithm to operate on that data
without supervision. Without any prior training on labeled data, the machine's job in
this case is to organize unsorted data according to similarities, patterns, and differences.
The goal of clustering is to divide the population or set of data points into a number
of groups so that the data points within each group are more similar to one another
than to the data points in the other groups. It is essentially a grouping of things
based on how similar and different they are to one another.
We are given a data set of items, each described by values for a set of features
(that is, a feature vector). The task is to categorize those items into groups. To
achieve this, we will use the K-means algorithm, an unsupervised learning algorithm.
The 'K' in the name of the algorithm represents the number of groups/clusters we want
to classify our items into.
(It will help if you think of items as points in an n-dimensional space.) The
algorithm will categorize the items into k groups or clusters of similarity. To
calculate that similarity, we will use the Euclidean distance as a measurement.
The algorithm works as follows:
1. First, we randomly initialize k points, called means or cluster centroids.
2. We categorize each item to its closest mean and we update the mean’s
coordinates, which are the averages of the items categorized in that cluster
so far.
3. We repeat the process for a given number of iterations and at the end, we
have our clusters.
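In symbols, the two steps of each iteration can be written as below. The notation (x_i for an item, \mu_j for the mean of cluster j, C_j for the set of items currently assigned to cluster j) is introduced here for illustration and is not part of the original text. The update is shown in its batch form, where every item is assigned before the means are recomputed.

```latex
% Assignment step: each item goes to the nearest mean (Euclidean distance)
c_i = \arg\min_{j \in \{1,\dots,k\}} \lVert x_i - \mu_j \rVert_2

% Update step: each mean moves to the average of the items assigned to it
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```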
The "points" mentioned above are called means because they are the mean values of
the items categorized in them. To initialize these means, we have a lot of options. An
intuitive method is to initialize the means at random items in the data set. Another
method is to initialize the means at random values between the boundaries of the
data set (if for a feature x the items have values in [0,3], we will initialize the
means with values for x in [0,3]).
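The two initialization options just described could be sketched as follows. This is a minimal illustration, not the original author's code: the function names `init_from_items` and `init_within_bounds`, the toy array `X_demo`, and the seed are all assumptions made for the example.

```python
import numpy as np

def init_from_items(X, k, rng):
    """Option 1: pick k distinct rows of X as the initial means."""
    idx = rng.choice(X.shape[0], size=k, replace=False)
    return X[idx].copy()

def init_within_bounds(X, k, rng):
    """Option 2: draw each coordinate uniformly between that feature's min and max."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return rng.uniform(lo, hi, size=(k, X.shape[1]))

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
X_demo = np.array([[0.0, 1.0], [1.0, 5.0], [2.0, 2.0], [3.0, 4.0]])  # toy data

means_a = init_from_items(X_demo, k=2, rng=rng)
means_b = init_within_bounds(X_demo, k=2, rng=rng)
print(means_a)
print(means_b)
```

Starting from actual items guarantees that every mean begins inside a populated region, while sampling within the bounds can place a mean far from any item.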
The above algorithm in pseudocode is as follows:
Initialize k means with random values
--> For a given number of iterations:
    --> Iterate through items:
        --> Find the mean closest to the item by calculating
            the euclidean distance of the item with each of the means
        --> Assign item to mean
        --> Update mean by shifting it to the average of the items
            in that cluster

Import the necessary Libraries:

We are importing NumPy for numerical computations, Matplotlib to plot the graph,
and make_blobs from sklearn.datasets to generate a sample dataset.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

Create the custom dataset with make_blobs and plot it

X,y = make_blobs(n_samples = 500,n_features = 2,centers = 3,random_state = 23)

fig = plt.figure(0)
plt.grid(True)
plt.scatter(X[:,0],X[:,1])
plt.show()

Output:
Clustering dataset

Initialize the random centroids

Python3
k = 3
clusters = {}
np.random.seed(23)

for idx in range(k):
    # Each coordinate of the center is drawn uniformly from [-2, 2)
    center = 2*(2*np.random.random((X.shape[1],))-1)
    cluster = {
        'center' : center,
        'points' : []
    }
    clusters[idx] = cluster

clusters

Output:
{0: {'center': array([0.06919154, 1.78785042]), 'points': []},
1: {'center': array([ 1.06183904, -0.87041662]), 'points': []},
2: {'center': array([-1.11581855, 0.74488834]), 'points': []}}
Plot the randomly initialized centers with the data points

Python3
plt.scatter(X[:,0],X[:,1])
plt.grid(True)
for i in clusters:
    center = clusters[i]['center']
    plt.scatter(center[0],center[1],marker = '*',c = 'red')
plt.show()

Output:
Data points with random center

Define euclidean distance

Python3
def distance(p1,p2):
    return np.sqrt(np.sum((p1-p2)**2))
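As a quick sanity check of this formula, the sketch below evaluates it on a 3-4-5 right triangle and compares the result with NumPy's built-in norm. The points `a` and `b` are made up for illustration, and the function is repeated so the snippet runs on its own.

```python
import numpy as np

def distance(p1, p2):
    # Square root of the summed squared coordinate differences
    return np.sqrt(np.sum((p1 - p2) ** 2))

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

d = distance(a, b)               # legs 3 and 4, so the hypotenuse is 5
d_norm = np.linalg.norm(a - b)   # NumPy's Euclidean (L2) norm of the difference
print(d, d_norm)                 # 5.0 5.0
```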

Create the functions to assign points and update the cluster centers

Python3
# E-step: assign every point to its nearest cluster center
def assign_clusters(X, clusters):
    for idx in range(X.shape[0]):
        dist = []
        curr_x = X[idx]
        for i in range(k):
            dis = distance(curr_x,clusters[i]['center'])
            dist.append(dis)
        curr_cluster = np.argmin(dist)
        clusters[curr_cluster]['points'].append(curr_x)
    return clusters

# M-step: move each center to the mean of its assigned points
def update_clusters(X, clusters):
    for i in range(k):
        points = np.array(clusters[i]['points'])
        if points.shape[0] > 0:
            new_center = points.mean(axis =0)
            clusters[i]['center'] = new_center
        # Reset the list so the next assignment pass starts empty
        clusters[i]['points'] = []
    return clusters

Create the function to predict the cluster for the data points

Python3
def pred_cluster(X, clusters):
    pred = []
    for i in range(X.shape[0]):
        dist = []
        for j in range(k):
            dist.append(distance(X[i],clusters[j]['center']))
        pred.append(np.argmin(dist))
    return pred

Assign, Update, and predict the cluster center

Python3
clusters = assign_clusters(X,clusters)
clusters = update_clusters(X,clusters)
pred = pred_cluster(X,clusters)
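The calls above perform a single assign/update pass, whereas the algorithm description repeats the two steps for a number of iterations. A minimal, self-contained sketch of such a loop is given below. It is a vectorized variant rather than the dictionary-based code of this document, and `kmeans_fit`, `max_iter`, `tol`, and the toy data are all assumptions introduced for the example.

```python
import numpy as np

def kmeans_fit(X, centers, max_iter=100, tol=1e-6):
    """Repeat the assign and update steps until the centers stop moving."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        # Assign: distance from every point to every center, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update: move each center to the mean of its points (skip empty clusters)
        new_centers = centers.copy()
        for j in range(centers.shape[0]):
            members = X[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    # Final labels for the centers we ended up with
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)

# Toy data: two well-separated groups (values made up for illustration)
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
start = np.array([[0.0, 0.0], [10.0, 10.0]])

centers, labels = kmeans_fit(X_demo, start)
print(centers)
print(labels)
```

Stopping when the centers move less than `tol` is one common choice; a fixed iteration count, as in the pseudocode, would work as well.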

Plot the data points with their predicted cluster center

Python3
plt.scatter(X[:,0],X[:,1],c = pred)
for i in clusters:
    center = clusters[i]['center']
    plt.scatter(center[0],center[1],marker = '^',c = 'red')
plt.show()

Output:

K-means Clustering

Example 2:

Import the necessary libraries

Python3
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

Load the Dataset

 Python3
X, y = load_iris(return_X_y=True)

Elbow Method
Choosing the number of groups to divide the data into is a basic step in any
clustering algorithm. One of the most common techniques for choosing this value of k
is the elbow method: fit the model for a range of k values, record the sum of squared
errors for each, and look for the point where the curve bends.

Python3
# Find the optimum number of clusters
sse = []  # sum of squared errors for each k
for k in range(1,11):
    km = KMeans(n_clusters=k, random_state=2)
    km.fit(X)
    sse.append(km.inertia_)
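The value stored in `sse` is scikit-learn's `inertia_`, the sum of squared distances from each sample to its closest cluster center. A minimal NumPy sketch of that quantity is shown below; `compute_sse` and the toy arrays are names and values made up for illustration, not part of the original example.

```python
import numpy as np

def compute_sse(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    sq_dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.min(axis=1).sum()

# Four points forming two pairs, 10 units apart along the first feature
X_demo = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

one_center = np.array([[5.0, 1.0]])                 # k = 1
two_centers = np.array([[0.0, 1.0], [10.0, 1.0]])   # k = 2

sse_k1 = compute_sse(X_demo, one_center)
sse_k2 = compute_sse(X_demo, two_centers)
print(sse_k1, sse_k2)   # 104.0 4.0
```

The sharp drop from one center to two is the kind of change the elbow plot is meant to reveal.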

Plot the Elbow graph to find the optimum number of clusters

Python3
sns.set_style("whitegrid")
g = sns.lineplot(x=list(range(1,11)), y=sse)
g.set(xlabel ="Number of clusters (k)",
      ylabel = "Sum Squared Error",
      title ='Elbow Method')
plt.show()

Output:

Elbow Method

From the above graph, we can observe an elbow-like bend at k=2 and k=3. So, we
choose K=3.
Build the KMeans clustering model

Python3
kmeans = KMeans(n_clusters = 3, random_state = 2)
kmeans.fit(X)

Output:
KMeans(n_clusters=3, random_state=2)
Find the cluster centers

Python3
kmeans.cluster_centers_

Output:
array([[5.006     , 3.428     , 1.462     , 0.246     ],
       [5.9016129 , 2.7483871 , 4.39354839, 1.43387097],
       [6.85      , 3.07368421, 5.74210526, 2.07105263]])
Predict the cluster group:

Python3
pred = kmeans.fit_predict(X)
pred

Output:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2,
       2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2,
       2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1],
      dtype=int32)
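One way to inspect such a label array is to count the points per cluster and, when true class labels are available (as with `y` from load_iris), tabulate clusters against classes. The sketch below uses short stand-in arrays, `pred_demo` and `y_demo`, whose values are made up for illustration and are not taken from the Iris results.

```python
import numpy as np

# Toy stand-ins for the predicted cluster ids and the true class labels
pred_demo = np.array([0, 0, 0, 1, 1, 2, 2, 2, 1])
y_demo = np.array([0, 0, 0, 1, 1, 2, 2, 1, 1])

# Number of points in each cluster
sizes = np.bincount(pred_demo, minlength=3)

# Contingency table: rows are true classes, columns are cluster ids
table = np.zeros((3, 3), dtype=int)
np.add.at(table, (y_demo, pred_demo), 1)

print(sizes)
print(table)
```

Cluster ids are arbitrary, so a good clustering shows up as one dominant entry per row rather than necessarily as a diagonal.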
Plot the cluster centers with the data points

Python3
plt.figure(figsize=(12,5))

# Left panel: the first two Iris features (sepal length and sepal width)
plt.subplot(1,2,1)
plt.scatter(X[:,0],X[:,1],c = pred, cmap=cm.Accent)
plt.grid(True)
for center in kmeans.cluster_centers_:
    center = center[:2]
    plt.scatter(center[0],center[1],marker = '^',c = 'red')
plt.xlabel("sepal length (cm)")
plt.ylabel("sepal width (cm)")

# Right panel: the last two Iris features (petal length and petal width)
plt.subplot(1,2,2)
plt.scatter(X[:,2],X[:,3],c = pred, cmap=cm.Accent)
plt.grid(True)
for center in kmeans.cluster_centers_:
    center = center[2:4]
    plt.scatter(center[0],center[1],marker = '^',c = 'red')
plt.xlabel("petal length (cm)")
plt.ylabel("petal width (cm)")

plt.show()

Output:

K-means clustering
