Module 3 Clustering

Cluster analysis, or clustering, is a data mining method that groups similar data points together to form clusters, facilitating the organization of unlabelled data. Various clustering methods include partitioning, hierarchical, density-based, grid-based, and model-based approaches, each with unique techniques and applications. For instance, the DBSCAN method identifies density-connected points to form clusters while filtering out noise, making it effective for datasets with arbitrary shapes.

Uploaded by

nazalmhd02


Mod3 – Introduction to Clustering

Cluster analysis
● Cluster analysis, also known as clustering, is a data mining method
that groups similar data points together.
● The goal of cluster analysis is to divide a dataset into groups (or
clusters) such that the data points within each group are more similar
to each other than to data points in other groups.
● The given data is divided into groups by combining similar objects.
Each such group is a cluster: a collection of similar data objects
grouped together.
● Cluster: a collection of data objects that are similar (or related) to
one another within the same group and dissimilar (or unrelated) to the
objects in other groups.
● Cluster analysis: finding similarities between data according to the
characteristics found in the data, and grouping similar data objects
into clusters.
● For example, consider a dataset containing information about different
vehicles such as cars, buses, and bicycles. Because clustering is
unsupervised learning, the records carry no class labels (car, bike,
etc.); all the data is mixed together without structure.
● The task is to assign a group (cluster) to each unlabelled object,
which can be done by clustering.
● The main idea of cluster analysis is to arrange the data points into
clusters: a car cluster containing all the cars, a bike cluster
containing all the bikes, and so on.
● Put simply, clustering partitions unlabelled data into groups of
similar objects. Different similarity measures can be used.
Clustering Methods:
● Partitioning methods
● Hierarchical clustering methods
● Density-based methods
● Grid-based methods
● Model-based methods
Partitioning methods
Given a database of n objects or data tuples, a partitioning method
constructs k partitions of the data, where each partition represents a
cluster and k ≤ n. That is, it classifies the data into k groups, which
together satisfy the following requirements:
(1) each group must contain at least one object, and
(2) each object must belong to exactly one group.
Partitioning methods typically use a technique called iterative
relocation: starting from an initial partitioning, objects are moved from
one group to another to improve the partitioning.
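The two requirements above can be checked mechanically. The following is a minimal sketch, assuming objects are identified by unique labels; the function name `is_valid_partition` is hypothetical and not part of any library.

```python
def is_valid_partition(objects, groups):
    """Return True if `groups` is a valid partition of `objects`."""
    # Requirement (1): every group holds at least one object.
    if any(len(g) == 0 for g in groups):
        return False
    # Requirement (2): every object appears in exactly one group
    # (no object missing, none listed twice). Assumes unique labels.
    assigned = [obj for g in groups for obj in g]
    return sorted(assigned) == sorted(objects)


objects = ["X1", "X2", "X3", "X4", "X5"]
print(is_valid_partition(objects, [["X1", "X2"], ["X3", "X4", "X5"]]))        # valid
print(is_valid_partition(objects, [["X1", "X2"], ["X2", "X3", "X4", "X5"]]))  # X2 is in two groups
print(is_valid_partition(objects, [["X1", "X2", "X3", "X4", "X5"], []]))      # one group is empty
```

Because every group is non-empty and no object is shared, the condition k ≤ n follows automatically.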
PAM (Partitioning Around Medoids) or K-Medoid

i      x    y
X1     2    6
X2     3    4
X3     3    8
X4     4    7
X5     6    2
X6     6    4
X7     7    3
X8     7    4
X9     8    5
X10    7    6
● Apply the K-Medoid clustering algorithm to form two clusters.
● Use Manhattan distance to measure the distance between each data
point and each medoid.
Steps:
– Select two medoids: C1 = (3,4), C2 = (7,4)
– To find the distance between two data points (x1,y1) and (x2,y2), use
the Manhattan distance formula:
Manhattan Distance = |x1 - x2| + |y1 - y2|
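The assignment step for this example can be sketched as follows. The data points and the two initial medoids come from the table above; the helper names (`manhattan`, `assign`) are illustrative, and trying X7 as a replacement medoid is an assumption made here only to show one relocation check, not a step taken from the slides.

```python
points = {
    "X1": (2, 6), "X2": (3, 4), "X3": (3, 8), "X4": (4, 7), "X5": (6, 2),
    "X6": (6, 4), "X7": (7, 3), "X8": (7, 4), "X9": (8, 5), "X10": (7, 6),
}
medoids = {"C1": (3, 4), "C2": (7, 4)}


def manhattan(p, q):
    """Manhattan distance |x1 - x2| + |y1 - y2|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def assign(points, medoids):
    """Assign every point to its nearest medoid; return clusters and total cost."""
    clusters = {name: [] for name in medoids}
    total_cost = 0
    for label, p in points.items():
        nearest = min(medoids, key=lambda m: manhattan(p, medoids[m]))
        clusters[nearest].append(label)
        total_cost += manhattan(p, medoids[nearest])
    return clusters, total_cost


clusters, total_cost = assign(points, medoids)
print(clusters)    # C1: X1..X4, C2: X5..X10
print(total_cost)  # 20

# Assumed relocation check: try X7 = (7, 3) in place of C2 and compare costs.
_, swap_cost = assign(points, {"C1": (3, 4), "C2": points["X7"]})
print(swap_cost)   # 22, higher than 20, so this swap would not be kept
```

The total cost is the sum of distances from each point to its medoid; a swap is only worth keeping when it lowers that sum.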
Hierarchical Clustering Methods:
– Clustering is done by hierarchical decomposition.
– Objects are grouped into a tree of clusters.
There are two types of hierarchical method, depending on whether the hierarchical decomposition is formed
bottom-up (by merging) or top-down (by splitting):

Agglomerative approach: Bottom-up. Initially each object is in a separate cluster (the bottom level).
The method then successively merges the clusters that are closest to one another until all of the objects are
merged into one cluster (the top-most level of the hierarchy), or until a termination condition holds (such
as reaching a desired number of clusters or the diameter of each cluster being within a certain
threshold).
Divisive approach: Top-down. It starts with all
objects in one cluster. In each successive iteration, a cluster
is split into smaller clusters until eventually each object is in
its own cluster, or until a termination condition holds (such as
reaching a desired number of clusters or the diameter of
each cluster being within a certain threshold).
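The bottom-up (agglomerative) approach can be sketched in a few lines. This is a minimal illustration, not a production implementation: the slides do not say how the distance between two clusters is measured, so single linkage (smallest pairwise Euclidean distance) is assumed here, and "desired number of clusters" is used as the termination condition. The function names are hypothetical.

```python
import math


def single_link(a, b):
    """Smallest pairwise distance between two clusters (assumed linkage)."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, k):
    # Bottom level: every object starts in its own cluster.
    clusters = [[p] for p in points]
    # Repeatedly merge the two closest clusters until only k remain.
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
result = agglomerative(data, k=2)
print(result)
```

Setting k=1 runs the merging all the way to the top of the hierarchy. A divisive method would work in the opposite direction, starting from one cluster and splitting.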
Density-based Methods
● Density-based methods are based on the notion of density (the number
of objects or data points in a region). They continue growing a cluster
as long as the density in the "neighbourhood" exceeds some threshold.
● Such methods can be used to filter out noise (outliers) and
discover clusters of arbitrary shape.
DBSCAN
● DBSCAN: Density-Based Spatial Clustering of Applications with Noise.
● It has two input parameters: ε (eps) and MinPts.
● ε (eps): radius of the circle (neighbourhood) drawn with a data object as its centre.
● MinPts: minimum number of data points required inside that circle.
● There are three types of data points:
– Core point: has at least MinPts data points within its ε-neighbourhood.
– Boundary point: not a core point itself, but lies within the ε-neighbourhood of a core point.
– Noise point: neither a core point nor a boundary point.
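The three point types can be illustrated with a short sketch. It assumes Euclidean distance, unique points, and that a point counts as a member of its own neighbourhood (a common convention, but an assumption here); `classify_points` is a hypothetical helper name.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border' (boundary) or 'noise'."""
    # eps-neighbourhood of each point; the point itself is included (assumption).
    neighbours = {
        p: [q for q in points if math.dist(p, q) <= eps] for p in points
    }
    core = {p for p in points if len(neighbours[p]) >= min_pts}
    labels = {}
    for p in points:
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in neighbours[p]):
            labels[p] = "border"  # near a core point, but not dense enough itself
        else:
            labels[p] = "noise"
    return labels


points = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 2), (8, 8)]
labels = classify_points(points, eps=1.0, min_pts=3)
for p, kind in labels.items():
    print(p, kind)
```

With these values, the four points of the square are core points, (3, 2) is a boundary point because it is within eps of the core point (2, 2), and (8, 8) is noise.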
DBSCAN - Procedure
● A density-based cluster is defined as a group of density-connected
points, so the task is to find groups of density-connected points.
● The density-based clustering algorithm works as follows:
– For each point xi, compute the distance between xi and the other
points.
– Find all neighbour points within distance eps of the starting point xi.
Each point with a neighbour count greater than or equal to MinPts is
marked as a core point and as visited. (This step identifies the core
points.)
– For each core point that is not already assigned to a cluster, create
a new cluster. Recursively find all of its density-connected points
and assign them to the same cluster as the core point.
– Iterate through the remaining unvisited points in the dataset.
– Points that do not belong to any cluster are treated as outliers or
noise.
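The procedure above can be sketched as a small function. This is an illustrative version under stated assumptions: Euclidean distance, a point counted in its own neighbourhood, brute-force neighbour search, and an explicit work list in place of recursion. The name `dbscan` and the label convention (-1 for noise) are choices made for this sketch.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisional noise; may turn out to be a boundary point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point: boundary
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep expanding
    return labels


data = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 2),
        (8, 8), (8, 9), (9, 8), (9, 9), (5, 15)]
print(dbscan(data, eps=1.0, min_pts=3))
```

Note that the number of clusters is never passed in; it falls out of eps and MinPts. The brute-force neighbour search is quadratic in the number of points, so an efficient version would rely on a spatial index.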

● Advantages
– Applicable to spatial databases.
– Discovers clusters of arbitrary shape.
– Good efficiency on large databases, i.e., databases of significantly more than just a few
thousand objects.
– Minimal domain knowledge is required to determine the input parameters, which matters because
appropriate values are often not known in advance when dealing with large databases.
– Only two parameters are required.
– The number of clusters does not need to be specified by the user.
– Because it has a concept of noise, it works well even with noisy datasets.
● Disadvantages
– Not good at handling high-dimensional data.

Grid-based methods
● Grid-based methods quantize the object space
into a finite number of cells that form a grid structure. All of the
clustering operations are performed on the grid structure (i.e., on the
quantized space). The main advantage of this approach is its fast
processing time, which is typically independent of the number of data
objects and dependent only on the number of cells in each dimension of
the quantized space.
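The quantization step can be illustrated with a short sketch. It only shows how points are mapped to grid cells and how dense cells are found; the slides do not name a specific grid-based algorithm, so the cell size, the density threshold `min_count`, and the function names are all assumptions made for illustration.

```python
from collections import Counter


def grid_cells(points, cell_size):
    """Map each point to the integer coordinates of the grid cell containing it."""
    return [(int(x // cell_size), int(y // cell_size)) for x, y in points]


def dense_cells(points, cell_size, min_count):
    """Count points per cell and keep the cells holding at least min_count points."""
    counts = Counter(grid_cells(points, cell_size))
    return {cell: n for cell, n in counts.items() if n >= min_count}


points = [(0.5, 0.7), (1.2, 1.9), (1.8, 0.3),
          (6.1, 6.4), (6.9, 7.8), (7.5, 6.2),
          (3.3, 9.1)]
print(dense_cells(points, cell_size=2.0, min_count=3))
```

After this single pass over the data, further work operates only on the per-cell counts, which is why the processing time depends on the number of cells and not on the number of objects.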
Model-based methods
● Model-based methods hypothesize a model for each of
the clusters and find the best fit of the data to the given model.
● EM (Expectation-Maximization) is an algorithm that performs
expectation-maximization analysis based on statistical modeling.
● COBWEB is a conceptual learning algorithm that performs probability analysis
and takes concepts as a model for clusters.
● SOM (self-organizing feature map) is a neural network-based algorithm.
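To make the EM idea concrete, here is a minimal sketch for one-dimensional data. The slides only name EM; the choice of a mixture of two Gaussians as the hypothesized model, the starting values, and the function names are assumptions for illustration.

```python
import math


def gaussian(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, mean_a, mean_b, iterations=50):
    """Fit a two-component 1-D Gaussian mixture, starting from the given means."""
    var_a = var_b = 1.0
    weight_a = 0.5
    for _ in range(iterations):
        # E-step: probability that each value was generated by component A.
        resp = []
        for x in data:
            pa = weight_a * gaussian(x, mean_a, var_a)
            pb = (1 - weight_a) * gaussian(x, mean_b, var_b)
            resp.append(pa / (pa + pb))
        # M-step: re-estimate means, variances and weight from those probabilities.
        na = sum(resp)
        nb = len(data) - na
        mean_a = sum(r * x for r, x in zip(resp, data)) / na
        mean_b = sum((1 - r) * x for r, x in zip(resp, data)) / nb
        # A small floor keeps the variances from collapsing to zero.
        var_a = max(sum(r * (x - mean_a) ** 2 for r, x in zip(resp, data)) / na, 1e-6)
        var_b = max(sum((1 - r) * (x - mean_b) ** 2 for r, x in zip(resp, data)) / nb, 1e-6)
        weight_a = na / len(data)
    return (mean_a, var_a), (mean_b, var_b), weight_a


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
comp_a, comp_b, weight_a = em_two_gaussians(data, mean_a=0.0, mean_b=6.0)
print(comp_a, comp_b, weight_a)
```

On this toy data the two estimated means settle near 1 and 5. Unlike a partitioning method, each value receives a probability of belonging to each cluster instead of a hard assignment.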
ROCK Clustering algorithm
● ROCK stands for RObust Clustering using linKs.
● ROCK belongs to the class of agglomerative hierarchical clustering algorithms.
● ROCK works on categorical attributes.
