Cluster Analysis

AJEENA T
M23CSCS01
TKMCE
Contents
Introduction
Desired features of cluster analysis
Types of data in cluster analysis
Overview of basic clustering methods
DBSCAN - Density-Based Clustering Based on
Connected Regions with High Density
Introduction
Cluster analysis is the process of partitioning a set of data objects into subsets.
The set of clusters resulting from a cluster analysis can be referred to as a clustering.
A cluster is a collection of data objects that are similar to one another within the cluster and dissimilar to objects in other clusters.
Applications include business intelligence, image pattern recognition, web search, biology, and security.
Clustering is sometimes called automatic classification.

Clustering can automatically find the groupings.
Clustering is known as unsupervised learning; the class label information is not present.
Clustering is a form of learning by observation, rather than learning by example.
Desired features of cluster analysis
Scalability: highly scalable algorithms are needed to cluster large data sets.
Ability to deal with different types of attributes: applications may require clustering data of various types, such as binary, nominal, and ordinal data, or mixtures of these types.
Discovery of clusters with arbitrary shape: algorithms based on Euclidean or Manhattan distance measures tend to find spherical clusters of similar size and density. It is important to develop algorithms that can detect clusters of arbitrary shape.
Capability of clustering high-dimensional data: most clustering algorithms are good at handling low-dimensional data; finding clusters of data objects in a high-dimensional space is challenging.
Constraint-based clustering: to find data groups with good clustering behaviour that satisfy specified constraints.
Interpretability and usability: users want clustering results to be interpretable, comprehensible, and usable.
Requirements for domain knowledge to determine input parameters: many clustering algorithms require users to provide domain knowledge in the form of input parameters, such as the desired number of clusters.
Ability to deal with noisy data: clustering methods that are robust to noise are needed.
Incremental clustering and insensitivity to input order: incremental clustering algorithms and algorithms that are insensitive to the input order are needed.
Types of data in cluster analysis

 Interval-scaled variables - continuous measurements on a roughly linear scale, e.g., weight and height, latitude and longitude coordinates
 Binary variables - a variable that can take only 2 values, e.g., a gender variable with the two values male and female
 Nominal or categorical variables - a generalization of the binary variable in that it can take more than 2 states, e.g., red, yellow, blue, green
 Ordinal variables - an ordinal variable can be discrete or continuous; here the order is important, e.g., rank
 Ratio-scaled variables - positive measurements on a nonlinear scale
 Variables of mixed type - a database may contain all the types of variables (binary, nominal, ordinal, interval, and ratio); these are collectively called mixed-type variables (a short sketch of common dissimilarity measures follows below)
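The dissimilarity measure used for clustering depends on the variable type. The following is a minimal sketch, not from the slides and assuming NumPy is available, of two common choices: Euclidean distance for interval-scaled variables and simple matching dissimilarity for binary or nominal variables.

import numpy as np

def euclidean(x, y):
    # interval-scaled variables: straight-line distance between two objects
    return np.sqrt(np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2))

def simple_matching(x, y):
    # binary / nominal variables: fraction of attributes on which the objects differ
    return float(np.mean(np.asarray(x) != np.asarray(y)))

print(euclidean([1.70, 65.0], [1.80, 80.0]))             # height (m), weight (kg)
print(simple_matching(["red", "yes"], ["blue", "yes"]))  # 0.5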
Overview of basic clustering methods
Partitioning method
 Given a set of n objects, a partitioning method constructs k partitions of the data, where each partition represents a cluster and k ≤ n
 i.e., it divides the data into k groups such that each group must contain at least one object
 The basic partitioning methods typically adopt exclusive cluster separation, i.e., each object must belong to exactly one group
 Commonly used partitioning methods are k-means and k-medoids (a sketch of k-means follows below)
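A minimal k-means sketch, assuming scikit-learn is available; the toy data and the choice k = 2 are illustrative, not from the slides.

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],    # one compact group
              [8.0, 8.2], [7.9, 7.8], [8.3, 8.1]])   # another compact group

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # exclusive assignment: each object belongs to exactly one group
print(km.cluster_centers_)  # one centroid per partition, with k <= n

k-medoids works in the same spirit but uses actual objects (medoids) as cluster representatives, which makes it less sensitive to outliers than centroid-based k-means.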
Hierarchical methods
 Creates a hierarchical decomposition of the given set of data objects
 Can be classified as being either agglomerative or divisive
 The agglomerative approach, also called the bottom-up approach, starts with each object forming a separate group and merges the objects (groups) close to one another, until a termination condition holds
 The divisive approach, also known as the top-down approach, starts with all the objects in the same cluster and splits it into smaller clusters, until each object forms its own cluster or a termination condition holds
 Can be distance-based or density- and continuity-based (a sketch of agglomerative clustering follows below)
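A minimal agglomerative (bottom-up) sketch, assuming SciPy is available; the linkage method and the cut at two clusters are illustrative choices.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9]])

Z = linkage(X, method="average")                  # repeatedly merge the closest groups
labels = fcluster(Z, t=2, criterion="maxclust")   # terminate when 2 clusters remain
print(labels)

A divisive method would instead start from a single cluster containing all five objects and split it top-down.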
Grid-based methods
 Quantize the object space into a finite number of cells that form a grid structure
 All the operations are performed on the grid structure
 Fast processing time (typically independent of the number of data objects, yet dependent on grid size); see the sketch below
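An illustrative sketch of the grid idea only, not a specific named grid-based algorithm: objects are quantized into cells once, and subsequent operations work on cell counts rather than on the individual objects.

import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
X = rng.random((1000, 2))            # 1000 objects in the unit square
cell_size = 0.1                      # grid resolution (illustrative)

cells = [tuple(c) for c in np.floor(X / cell_size).astype(int)]
counts = Counter(cells)              # later operations use cell counts, not raw objects

print(len(counts), "occupied cells for", len(X), "objects")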
Density-based methods
 Density-based methods can divide a set of objects into multiple exclusive clusters
 Can find arbitrarily shaped clusters
 Clusters are dense regions of objects in space that are separated by low-density regions
 Cluster density: each point must have a minimum number of points within its "neighbourhood"
 Can be used to filter out noise or outliers
 Can be extended from full space to subspace clustering
Partitioning and hierarchical methods are designed to find spherical-shaped clusters.
When noise or outliers are present, they may inaccurately identify convex regions that include the noise or outliers in the clusters.
To find arbitrarily shaped clusters, we can instead model clusters as dense regions in the data space separated by sparse regions.
This is the main strategy behind density-based clustering methods, which can discover clusters of non-spherical shape, as the comparison below illustrates.
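A small illustration of that point, assuming scikit-learn is available: on two interleaving half-moons, k-means cuts across the non-spherical shapes, while a density-based method (DBSCAN, discussed next) recovers them. The eps and min_samples values are illustrative.

from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

print(set(km))   # {0, 1}: two roughly spherical partitions that split each moon
print(set(db))   # the two dense, arbitrarily shaped clusters (-1 would mark noise)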
DBSCAN: Density-Based Clustering Based on
Connected Regions with High Density

How can we find dense regions in density-based clustering?
 The density of an object o can be measured by the number of objects close to o.
 DBSCAN (density-based spatial clustering of applications with noise) finds core objects, that is, objects that have dense neighbourhoods.
 Core objects and their neighbourhoods together form dense regions as clusters.
 How DBSCAN finds the neighbourhood of an object:
o A user-specified parameter ε > 0 is used to specify the radius of a neighbourhood
o The ε-neighbourhood of an object o is the space within a radius ε centered at o
o The density of a neighbourhood can be measured by the number of objects in the neighbourhood
o DBSCAN uses another user-specified parameter, MinPts, which specifies the density threshold of dense regions
o An object is a core object if the ε-neighbourhood of the object contains at least MinPts objects (see the code sketch after the example below)
 An object p is directly density-reachable from another object q if and only if q is a core object and p is in the ε-neighbourhood of q
 Two objects p1, p2 are density-connected with respect to ε and MinPts if there is an object q such that both p1 and p2 are density-reachable from q with respect to ε and MinPts
 Consider an example of density-reachability and density-connectivity. Let MinPts = 3.
o Here m, p, o, and r are core objects, since each is in an ε-neighbourhood containing at least 3 points
o Object q is directly density-reachable from m, and m is directly density-reachable from p and vice versa
o q is (indirectly) density-reachable from p, because q is directly density-reachable from m and m is directly density-reachable from p
o p is not density-reachable from q, because q is not a core object
o Similarly, r and s are density-reachable from o, and o is density-reachable from r. Thus, o, r, and s are all density-connected
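A minimal sketch of the two basic tests defined above, the ε-neighbourhood query and the core-object test. The function names are mine, not part of any DBSCAN API, and a brute-force distance computation over a NumPy array of objects is assumed.

import numpy as np

def eps_neighbourhood(X, i, eps):
    # indices of all objects within radius eps of object i (including i itself)
    dist = np.linalg.norm(X - X[i], axis=1)
    return np.where(dist <= eps)[0]

def is_core(X, i, eps, min_pts):
    # an object is a core object if its eps-neighbourhood contains at least MinPts objects
    return len(eps_neighbourhood(X, i, eps)) >= min_pts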
DBSCAN Algorithm

mark all objects as unvisited;
do
    randomly select an unvisited object p;
    mark p as visited;
    if the ε-neighbourhood of p has at least MinPts objects
        create a new cluster C, and add p to C;
        let N be the set of objects in the ε-neighbourhood of p;
        for each point p' in N
            if p' is unvisited
                mark p' as visited;
                if the ε-neighbourhood of p' has at least MinPts points,
                    add those points to N;
            if p' is not yet a member of any cluster, add p' to C;
        end for
        output C;
    else mark p as noise;
until no object is unvisited;
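A minimal, illustrative Python translation of the pseudocode above, assuming NumPy; it uses brute-force O(n²) neighbourhood queries and is meant only to mirror the logic, not to be an efficient implementation.

import numpy as np

def region_query(X, i, eps):
    # the eps-neighbourhood of object i: indices of all objects within radius eps
    return list(np.where(np.linalg.norm(X - X[i], axis=1) <= eps)[0])

def dbscan(X, eps, min_pts):
    n = len(X)
    labels = np.full(n, -1)        # -1 = noise / not yet assigned to any cluster
    visited = np.zeros(n, dtype=bool)
    cluster_id = -1

    for p in range(n):             # select an unvisited object p
        if visited[p]:
            continue
        visited[p] = True
        neighbours = region_query(X, p, eps)
        if len(neighbours) < min_pts:
            continue               # mark p as noise (its label stays -1)
        cluster_id += 1            # create a new cluster C and add p to it
        labels[p] = cluster_id
        seeds = list(neighbours)   # N, the set of objects in the eps-neighbourhood of p
        k = 0
        while k < len(seeds):      # for each point p' in N
            q = seeds[k]
            k += 1
            if not visited[q]:
                visited[q] = True
                q_neighbours = region_query(X, q, eps)
                if len(q_neighbours) >= min_pts:
                    seeds.extend(q_neighbours)   # p' is a core object: add its points to N
            if labels[q] == -1:
                labels[q] = cluster_id           # p' not yet in any cluster: add it to C
        # cluster C is now complete: all objects with labels == cluster_id
    return labels

labels = dbscan(np.random.rand(200, 2), eps=0.1, min_pts=5)
print(np.unique(labels))   # cluster ids found, with -1 for noise objects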
THE END
