K Means Clustering

K-Means Clustering is an unsupervised learning algorithm that groups unlabeled datasets into a predetermined number of clusters based on similarities. The algorithm iteratively assigns data points to the nearest centroids and recalculates centroids until the clusters stabilize. Unsupervised learning also includes association methods, which identify relationships between variables in large datasets.

Uploaded by

priskilla Selvin

K-Means Clustering Algorithm

K-Means Clustering is an unsupervised learning algorithm that is used to
solve clustering problems in machine learning and data science.

Unsupervised learning
Unsupervised learning is a type of machine learning in which models are trained on an unlabeled
dataset and are left to find structure in that data without any supervision.
Suppose an unsupervised learning algorithm is given an input dataset containing
images of different types of cats and dogs. The algorithm is never told which
images are cats and which are dogs, so it starts with no knowledge of the
categories in the dataset. Its task is to identify the image features on its
own. It performs this task by clustering the image dataset into groups
according to the similarities between images.

Types of Unsupervised Learning Algorithms:

Unsupervised learning algorithms can be further categorized into two
types of problems:

o Clustering: Clustering is a method of grouping objects into
clusters such that the objects with the most similarities remain in one
group and have few or no similarities with the objects of another group.
Cluster analysis finds the commonalities between the data objects and
categorizes them according to the presence or absence of those
commonalities.
o Association: Association rule learning is an unsupervised learning
method used for finding relationships between variables in a large
database. It determines the sets of items that occur together in the
dataset. Association rules can make a marketing strategy more effective:
for example, people who buy item X (say, bread) also tend to purchase
item Y (butter or jam). A typical example of association rule learning is
Market Basket Analysis.
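
As a rough illustration of the bread-and-butter example, the sketch below counts how often pairs of items occur together in a handful of made-up shopping baskets. The basket contents and the helper name pair_counts are assumptions introduced here for illustration only; practical association rule mining relies on measures and algorithms that this document does not cover.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative data, not from the source).
baskets = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
]

def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for items in transactions:
        # sorted() gives every pair a canonical (a, b) order with a < b
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
bread_baskets = sum(1 for b in baskets if "bread" in b)
# Share of the baskets containing bread that also contain butter
butter_given_bread = counts[("bread", "butter")] / bread_baskets
```

Here two of the three baskets that contain bread also contain butter, which is the kind of "items that occur together" pattern Market Basket Analysis looks for.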

What is K-Means Algorithm?

o K-Means Clustering is an unsupervised learning algorithm that
groups an unlabeled dataset into different clusters. Here K defines the
number of pre-defined clusters that need to be created in the process:
if K=2, there will be two clusters, for K=3 there will be three
clusters, and so on.
o It is an iterative algorithm that divides the unlabeled dataset into K different clusters
in such a way that each data point belongs to only one group, whose members have similar properties.
o It allows us to cluster the data into different groups and is a convenient
way to discover the categories of groups in an unlabeled dataset on
its own, without the need for any labeled training data.
o It is a centroid-based algorithm, where each cluster is associated with
a centroid. The main aim of this algorithm is to minimize the sum of
squared distances between the data points and the centroids of their
corresponding clusters.
o The algorithm takes the unlabeled dataset as input, divides the dataset
into K clusters, and repeats the process until the clusters stop
changing. The value of K should be predetermined in this
algorithm.
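
The quantity K-Means tries to make small is usually stated as the sum of squared distances between each data point and the centroid of its cluster. The sketch below computes that value, assuming Euclidean distance and points given as tuples of numbers. The function name within_cluster_sum_of_squares and the toy coordinates are illustrative assumptions, not part of the original notes.

```python
import math

def within_cluster_sum_of_squares(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        total += math.dist(point, centroids[label]) ** 2
    return total

# Hypothetical toy data: four 2-D points split into K=2 clusters.
points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0), (10.0, 8.0)]
labels = [0, 0, 1, 1]                  # cluster index of each point
centroids = [(1.0, 2.0), (9.0, 8.0)]   # one centroid per cluster
cost = within_cluster_sum_of_squares(points, labels, centroids)
```

In this toy case every point lies at distance 1 from its centroid, so the cost is 4.0. A tighter grouping of the same points would give a smaller value.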

The below diagram explains the working of the K-Means Clustering Algorithm:
How does the K-Means Algorithm Work?
The working of the K-Means algorithm is explained in the steps below:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids. (They need not be
points from the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the
predefined K clusters.
Step-4: Calculate the mean of the data points in each cluster and place the
new centroid of that cluster there.
Step-5: Repeat the third step, which means reassigning each data point to the
new closest centroid.
Step-6: If any reassignment occurs, go to step-4; otherwise go to FINISH.
Step-7: The model is ready.
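
The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: it assumes Euclidean distance, picks the initial centroids from the data points themselves (one of the options allowed in Step-2), and places each new centroid at the mean of its cluster. The function name k_means, the seed and max_iter parameters, and the toy points are assumptions made for this example.

```python
import math
import random

def k_means(points, k, seed=0, max_iter=100):
    """Cluster `points` (a list of tuples) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step-2: pick K data points as the initial centroids (one common choice).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step-3 / Step-5: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step-6: stop as soon as no point changed cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step-4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
```

Which cluster gets index 0 depends on the random initialization, so only the grouping is meaningful: the first three points should share one label and the last three the other.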
Let's understand the above steps by considering the visual plots:
THEORETICAL EXPLANATION - K-Means Algorithm
Suppose we have two variables, M1 and M2. The x-y scatter plot of these
two variables is given below:

o Let's take the number of clusters to be two, i.e., K=2, so that we
group the data points into two different clusters.
o We need to choose K random points or centroids to form the
clusters. These points can be either points from the dataset or any
other points. Here we select the two points below as the K points;
they are not part of our dataset. Consider the below image:

o Now we will assign each data point of the scatter plot to its closest
K-point or centroid. We compute this using the usual formula for the
distance between two points. To visualize the result, we draw a median
line between the two centroids, that is, the perpendicular bisector of
the segment joining them. Consider the below image:

From the above image, it is clear that the points to the left of the line are
nearer to K1, the blue centroid, and the points to the right of the line are
closer to the yellow centroid. Let's color them blue and yellow for clear
visualization.
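
In code, the dividing line never has to be drawn: comparing the distances from a point to each centroid gives the same answer, because a point lies on the blue side of the line exactly when it is closer to the blue centroid. The sketch below assumes Euclidean distance. The function name assign_to_nearest and the coordinates are made up for illustration, with index 0 standing in for the blue centroid and index 1 for the yellow one.

```python
import math

def assign_to_nearest(points, centroids):
    """Return, for each point, the index of the closest centroid (Euclidean distance)."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical coordinates: the dividing line between these centroids is x = 5.
centroids = [(2.0, 5.0), (8.0, 5.0)]   # 0 = "blue", 1 = "yellow"
points = [(1.0, 4.0), (3.0, 7.0), (7.0, 2.0), (9.0, 6.0)]
labels = assign_to_nearest(points, centroids)
```

The two points with x below 5 receive label 0 and the two with x above 5 receive label 1, matching the left/right split described above.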

o As we need to find the closest clusters, we repeat the process by
choosing new centroids. To choose the new centroids, we compute the
center of gravity of the points in each cluster and place the new
centroids there, as below:

o Next, we will reassign each data point to the new centroids. For this, we
repeat the same process of finding a median line. The median will
be as in the below image:

From the above image, we can see that one yellow point is on the left side of
the line and two blue points are to the right of the line. So, these three
points will be reassigned to their new closest centroids.

As reassignment has taken place, we go back to step-4, which is
finding the new centroids or K-points.
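
Finding the new centroids amounts to averaging the coordinates of the points currently assigned to each cluster. The sketch below shows one way to do that. The function name update_centroids and the sample points are assumptions for illustration, and raising an error for an empty cluster is just one possible design choice.

```python
def update_centroids(points, labels, k):
    """Return the mean (center of gravity) of the points assigned to each of k clusters."""
    centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            raise ValueError(f"cluster {j} has no points")
        # zip(*members) groups the coordinates axis by axis
        centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return centroids

# Hypothetical assignment: the first two points are in cluster 0, the last two in cluster 1.
points = [(1.0, 2.0), (3.0, 4.0), (8.0, 8.0), (10.0, 6.0)]
labels = [0, 0, 1, 1]
new_centroids = update_centroids(points, labels, k=2)
```

For these points the new centroids are (2.0, 3.0) and (9.0, 7.0), the midpoints of the two pairs.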

o We repeat the process by finding the center of gravity of the points
in each cluster, so the new centroids will be as shown in the below image:
o As we have the new centroids, we again draw the median line and
reassign the data points. So, the image will be:

o We can see in the above image that there are no dissimilar data points
on either side of the line, which means no point needs to be reassigned
and our model is formed. Consider the below image:

As our model is ready, we can now remove the assumed centroids, and
the two final clusters will be as shown in the below image:
