Density and Grid Based Clustering

Density-Based Clustering is an unsupervised learning method that identifies clusters of arbitrary shapes and handles noise in data. The DBSCAN algorithm classifies points into core, border, and noise categories based on density parameters, while OPTICS improves upon DBSCAN by creating a reachability plot to identify clusters of varying densities. Grid-Based Methods, including STING and CLIQUE, utilize a grid structure for efficient clustering in high-dimensional data, focusing on statistical parameters and density thresholds.

Uploaded by

rayav46818

Density Based Methods

Density-Based Clustering is one of the most popular unsupervised learning methodologies used in model building and machine learning. Data points in the low-density regions that separate clusters are considered noise. The region within a radius ε of a given object is called the ε-neighborhood of the object. If the ε-neighborhood of an object contains at least a minimum number, MinPts, of objects, the object is called a core object.
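
The two definitions can be made concrete with a minimal Python sketch. It assumes points are tuples of coordinates, uses Euclidean distance via `math.dist`, and counts the object itself as part of its own ε-neighborhood (the convention of the original DBSCAN formulation; some texts exclude it). The function names are illustrative only.

```python
import math


def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def is_core_object(points, i, eps, min_pts):
    """points[i] is a core object if its eps-neighborhood holds at least min_pts points."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts


points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
print(eps_neighborhood(points, 0, eps=1.0))           # [0, 1, 2]
print(is_core_object(points, 0, eps=1.0, min_pts=3))  # True
print(is_core_object(points, 3, eps=1.0, min_pts=3))  # False
```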

Major features:

1. It is used to discover clusters of arbitrary shape.
2. It can handle noise in the data.
3. It is a one-scan method.
4. It needs density parameters as a termination condition.

Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

Clusters are dense regions in the data space, separated by regions of lower point density.
The DBSCAN algorithm is based on this intuitive notion of “clusters” and “noise”. The key
idea is that, for each point of a cluster, the neighborhood of a given radius has to contain at
least a minimum number of points.

Parameters Required for DBSCAN Algorithm

1. eps: It defines the neighborhood around a data point, i.e. if the distance between two
points is less than or equal to eps, they are considered neighbors. If eps is chosen too
small, a large part of the data will be considered outliers. If it is chosen too large,
clusters will merge and most of the data points will end up in the same cluster. One way
to choose eps is the k-distance graph.
2. MinPts: The minimum number of neighbors (data points) within the eps radius. The larger
the dataset, the larger the value of MinPts should be. As a general rule, a minimum
MinPts can be derived from the number of dimensions D in the dataset as MinPts >=
D + 1. MinPts should be at least 3.
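
A rough Python sketch of the k-distance heuristic: for every point, compute the distance to its k-th nearest other point, sort these values in descending order, and look for the "knee" where the curve drops sharply. A value near the knee is a candidate for eps. Setting k to MinPts (or MinPts - 1 when the point itself is excluded, as it is here) is a common choice; the helper name and the toy data are assumptions for illustration.

```python
import math


def k_distances(points, k):
    """Distance from every point to its k-th nearest other point, sorted in descending order."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
kd = k_distances(points, k=3)
print([round(d, 2) for d in kd])  # [13.45, 1.41, 1.41, 1.41, 1.41]
```

On this toy data the outlier's 3-distance is about 13.45 while every other point's is about 1.41, so an eps slightly above 1.41 keeps the dense square together and leaves the outlier out.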

In this algorithm, there are three types of data points.

Core Point: A point is a core point if it has at least MinPts points within eps.
Border Point: A point that has fewer than MinPts points within eps but lies in the neighborhood
of a core point.
Noise or outlier: A point that is neither a core point nor a border point.
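
The three categories can be computed directly from the definitions. This Python sketch is illustrative: it uses Euclidean distance, includes each point in its own neighborhood, and checks all pairs (O(n²)), which is fine for a toy example but not for large data.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise' using the definitions above."""
    n = len(points)
    # eps-neighborhood of every point (the point itself is included, a common convention).
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            labels.append("border")  # not core, but inside some core point's neighborhood
        else:
            labels.append("noise")
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
print(classify_points(points, eps=1.0, min_pts=3))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```
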
Steps used in DBSCAN Algorithm

1. Find all the neighbor points within eps of each point, and identify the core points as
those with at least MinPts neighbors.

2. For each core point if it is not already assigned to a cluster, create a new cluster.

3. Recursively find all density-connected points of the core point and assign them to the
same cluster as the core point.
Two points a and b are said to be density-connected if there exists a point c that has a
sufficient number of points in its neighborhood (a core point) and both a and b are within
eps distance of c. This is a chaining process: if b is a neighbor of c, c is a neighbor of d,
and d is a neighbor of e, which in turn is a neighbor of a, and the intermediate points c, d,
and e are all core points, then a and b are density-connected even though b need not be a
direct neighbor of a.

4. Iterate through the remaining unvisited points in the dataset. Points that do not
belong to any cluster are noise.
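
The four steps can be written out as a short, unoptimized Python sketch. It assumes Euclidean distance, neighborhoods that include the point itself, and an O(n²) neighbor search. It expands clusters with an explicit queue instead of recursion, which reaches the same density-connected points without hitting Python's recursion limit. In practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would normally be used.

```python
import math
from collections import deque

NOISE = -1


def dbscan(points, eps, min_pts):
    """Cluster label for every point (NOISE = -1), following Steps 1-4 above."""
    n = len(points)
    # Step 1: eps-neighborhoods (each includes the point itself) and core points.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [None] * n  # None = not assigned to any cluster yet
    cluster_id = 0
    for i in range(n):
        # Step 2: every core point not yet in a cluster starts a new one.
        if not is_core[i] or labels[i] is not None:
            continue
        labels[i] = cluster_id
        queue = deque([i])
        # Step 3: collect all density-connected points (breadth-first, no recursion).
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue  # border points join the cluster but do not extend it
            for q in neighbors[p]:
                if labels[q] is None:
                    labels[q] = cluster_id
                    queue.append(q)
        cluster_id += 1
    # Step 4: whatever is still unassigned is noise.
    return [NOISE if label is None else label for label in labels]


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

A border point reachable from two clusters keeps the label of the first cluster that reaches it, which is also how the classic algorithm behaves.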

Ordering Points to Identify the Clustering Structure (OPTICS)

OPTICS (Ordering Points to Identify the Clustering Structure) is a density-based clustering
algorithm that can extract clusters of varying densities and shapes. It is useful for identifying
clusters of different densities in large, high-dimensional datasets. The main idea behind
OPTICS is to extract the clustering structure of a dataset by identifying the density-connected
points. The algorithm builds a density-based representation of the data by producing an ordered
list of the points; plotting each point's reachability distance in this order gives the
reachability plot. The reachability distance of a point measures how easily that point can be
reached from the points processed before it. Points that are adjacent in the ordering and have
similarly low reachability distances are likely to belong to the same cluster.

OPTICS Algorithm
1. Define a neighborhood radius parameter, Eps, which bounds the lowest density of
clusters that will be considered, together with MinPts.
2. For each point in the dataset, calculate the distance to its nearest neighbors; the distance
to its MinPts-th nearest neighbor is its core distance.
3. Starting with an arbitrary point, calculate the reachability distance of each neighboring
point, based on the density around the point being processed.
4. Always process next the unprocessed point with the smallest reachability distance, and
record the points in the order in which they are processed. Plotting the reachability
distances in this order gives the reachability plot.
5. Extract clusters from the reachability plot by grouping points that are adjacent in the
ordering and have similarly low reachability distances.
OPTICS uses several parameters, including the neighborhood radius (Eps), the number of nearest
neighbors to consider (min_samples, i.e. MinPts), and a reachability-distance cutoff used when
extracting clusters. The algorithm relies on two distances:
1. Core Distance: It is the minimum radius required to classify a given point as a core
point. If the given point is not a core point within Eps, its core distance is undefined.
2. Reachability Distance: It is defined with respect to another data point p. The
reachability distance of a point q with respect to p is the maximum of the core distance of
p and the Euclidean distance (or another distance metric) between p and q. It is undefined
if p is not a core point.
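
The algorithm and the two distances can be tied together in a small Python sketch. Its assumptions: Euclidean distance, the point itself counts toward MinPts (so the core distance is the distance to the MinPts-th closest point including itself), and `math.inf` stands for "undefined". Points are processed in order of smallest reachability distance using a min-heap (`heapq`). It is an O(n²) illustration, not a reference implementation; scikit-learn's `sklearn.cluster.OPTICS` exposes comparable quantities through its `ordering_`, `reachability_`, and `core_distances_` attributes.

```python
import heapq
import math

UNDEFINED = math.inf  # stands in for an "undefined" core or reachability distance


def core_distance(points, i, eps, min_pts):
    """Distance to the min_pts-th closest point (counting points[i] itself), or UNDEFINED."""
    dists = sorted(math.dist(points[i], q) for q in points)
    if len(dists) < min_pts or dists[min_pts - 1] > eps:
        return UNDEFINED
    return dists[min_pts - 1]


def optics(points, eps, min_pts):
    """Return (ordering, reachability values listed in that order)."""
    n = len(points)
    core = [core_distance(points, i, eps, min_pts) for i in range(n)]
    reach = [UNDEFINED] * n
    processed = [False] * n
    ordering = []
    for start in range(n):
        if processed[start]:
            continue
        seeds = [(reach[start], start)]  # min-heap of (reachability distance, index)
        while seeds:
            _, p = heapq.heappop(seeds)
            if processed[p]:
                continue  # stale heap entry: p has already been processed
            processed[p] = True
            ordering.append(p)
            if core[p] == UNDEFINED:
                continue  # only core points make their neighbors reachable
            for q in range(n):
                d = math.dist(points[p], points[q])
                if processed[q] or d > eps:
                    continue
                new_reach = max(core[p], d)  # reachability of q with respect to p
                if new_reach < reach[q]:
                    reach[q] = new_reach
                    heapq.heappush(seeds, (new_reach, q))
    return ordering, [reach[p] for p in ordering]


pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,), (30.0,)]
ordering, plot = optics(pts, eps=5.0, min_pts=2)
print(ordering)  # [0, 1, 2, 3, 4, 5, 6]
print(plot)      # [inf, 1.0, 1.0, inf, 1.0, 1.0, inf]
```

In this 1-D example the groups {0, 1, 2} and {10, 11, 12} appear as valleys of reachability 1.0 separated by an undefined (infinite) value, and the isolated point at 30 is never reachable within Eps.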

Discussion about OPTICS and DBSCAN Clustering:

1. Memory Cost: The OPTICS clustering technique requires more memory, as it
maintains a priority queue (min-heap) to determine the next data point closest to the
point currently being processed in terms of reachability distance. It also requires more
computational power, because the nearest-neighbor queries are more complicated than
the radius queries in DBSCAN.

2. Fewer Parameters: The OPTICS clustering technique does not strictly need the
epsilon parameter; it is included in the algorithm above only to reduce the running time.
This reduces the analytical effort of parameter tuning. This technique does not segregate
the given data into clusters. It merely produces a reachability distance plot, and it is up
to the programmer to interpret the plot and cluster the points accordingly.

3. Handling varying densities: DBSCAN clustering can struggle to handle datasets with
varying densities, as it requires a single value of epsilon to define the neighborhood
size for all points. In contrast, OPTICS can handle varying densities by using the
concept of reachability distance, which adapts to the local density of the data. This
means that OPTICS can identify clusters of different sizes and shapes more effectively
than DBSCAN in datasets with varying densities.

4. Cluster extraction: While both OPTICS and DBSCAN can identify clusters, OPTICS
produces a reachability distance plot that can be used to extract clusters at different
levels of granularity. This allows for more flexible clustering and can reveal clusters
that may not be apparent with a fixed epsilon value in DBSCAN. However, this also
requires more manual interpretation and decision-making on the part of the
programmer.

5. Noise handling: DBSCAN explicitly distinguishes between core points, border
points, and noise points, while OPTICS does not explicitly identify noise points.
Instead, points with high reachability distances can be considered potential noise
points. However, this also means that OPTICS may be less effective at identifying
small clusters that are surrounded by noise points, as these clusters may be merged
with the noise points in the reachability distance plot.

6. Runtime complexity: OPTICS is generally slower than DBSCAN in practice, due to
the use of a priority queue to maintain the reachability distances. However, later
research has proposed optimizations that reduce the computational cost of OPTICS,
making it more scalable for large datasets.

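
Points 4 and 5 can be illustrated with a simple threshold cut of the reachability plot, modeled on the ExtractDBSCAN-Clustering procedure from the original OPTICS paper. Walking along the ordering, a point whose reachability distance exceeds a chosen cut eps_prime (no larger than Eps) either starts a new cluster, if its own core distance is at most eps_prime, or is labeled noise; otherwise it joins the current cluster. The input values below are made up for illustration (two groups of three points and one isolated point), and `math.inf` stands for "undefined".

```python
import math

NOISE = -1


def extract_dbscan(ordering, reachability, core_dist, eps_prime):
    """Flat clustering from an OPTICS ordering, cut at eps_prime (which must not exceed Eps)."""
    labels = {}
    cluster_id = -1
    for p, r in zip(ordering, reachability):
        if r > eps_prime:
            # p cannot be reached from the current cluster at this density level.
            if core_dist[p] <= eps_prime:
                cluster_id += 1
                labels[p] = cluster_id  # p is dense enough to start a new cluster
            else:
                labels[p] = NOISE
        else:
            labels[p] = cluster_id      # p continues the current cluster
    return [labels[i] for i in range(len(ordering))]


inf = math.inf
ordering = [0, 1, 2, 3, 4, 5, 6]
reachability = [inf, 1.0, 1.0, inf, 1.0, 1.0, inf]  # listed in ordering order
core_dist = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, inf]      # indexed by point
print(extract_dbscan(ordering, reachability, core_dist, eps_prime=1.5))  # [0, 0, 0, 1, 1, 1, -1]
print(extract_dbscan(ordering, reachability, core_dist, eps_prime=0.5))  # [-1, -1, -1, -1, -1, -1, -1]
```

Cutting at 1.5 recovers two clusters and one noise point, while a stricter cut at 0.5 labels everything as noise; choosing the cut is the interpretive step described in point 4.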
3.2.7 Grid Based Methods

The grid-based clustering method uses a multi-resolution grid data structure. It quantizes
the object space into a finite number of cells that form a grid structure, on which all the
clustering operations are performed. The main advantage of this method is its fast processing
time, which is typically independent of the number of data objects and depends only on the
number of cells in each dimension of the quantized space.
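
The quantization step can be sketched in a few lines of Python: each point is mapped to the integer coordinates of the cell that contains it, and only per-cell counts are kept. After this single pass, later operations work on the cells, so their cost depends on the number of cells rather than the number of points. The uniform `cell_size` and the lower corner `lower` are assumptions of this sketch; real grid-based methods may use several resolutions or different cell sizes per dimension.

```python
from collections import Counter


def grid_cell(point, lower, cell_size):
    """Integer coordinates of the grid cell that contains point."""
    return tuple(int((x - lo) // cell_size) for x, lo in zip(point, lower))


def quantize(points, lower, cell_size):
    """One pass over the data: count the points that fall into each non-empty cell."""
    return Counter(grid_cell(p, lower, cell_size) for p in points)


points = [(0.2, 0.3), (0.7, 0.1), (1.5, 0.4), (3.9, 3.9)]
counts = quantize(points, lower=(0.0, 0.0), cell_size=1.0)
print(dict(sorted(counts.items())))  # {(0, 0): 2, (1, 0): 1, (3, 3): 1}
```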

Typical examples of the grid-based approach include STING, which explores statistical
information stored in the grid cells; WaveCluster, which clusters objects using a wavelet
transform approach; and CLIQUE, which represents a grid- and density-based approach for
clustering in high-dimensional data space.

STING (Statistical Information Grid)

It is also a grid-based clustering technique. It uses a multidimensional grid data structure
that quantizes the space into a finite number of cells. Instead of focusing on the data points
themselves, the technique focuses on the value space surrounding the data points. The spatial
area is divided into rectangular cells, and there are several levels of cells corresponding to
different resolution levels: each cell at a higher level is partitioned into several cells at the
next lower level.

For each cell, STING precomputes and stores statistical information about the attributes,
such as the count, mean, standard deviation, minimum, and maximum values, and the type of
distribution. These statistical parameters are useful for query processing and other data
analysis tasks.
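
One property that keeps these precomputed parameters cheap is that a parent cell's parameters can be derived from its children without rescanning the data. This Python sketch assumes points in the unit square that carry one numeric attribute value, and a quadtree-style hierarchy in which each cell splits into 2 x 2 children. It keeps raw sums (count, sum, sum of squares, min, max) from which count, mean, standard deviation, minimum, and maximum are derived; the distribution-type parameter of STING is omitted, and all names are hypothetical.

```python
import math

EMPTY = (0, 0.0, 0.0, math.inf, -math.inf)  # (count, sum, sum of squares, min, max)


def merge(a, b):
    """Combine two raw summaries; this is how a parent is derived from its children."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], min(a[3], b[3]), max(a[4], b[4]))


def bottom_layer(points, levels):
    """Raw summaries of the attribute per cell on a 2**levels x 2**levels grid over [0, 1)^2."""
    side = 2 ** levels
    cells = {}
    for x, y, v in points:
        key = (int(x * side), int(y * side))
        cells[key] = merge(cells.get(key, EMPTY), (1, v, v * v, v, v))
    return cells


def parent_layer(children):
    """Next layer up: each parent cell merges its 2 x 2 children, without rescanning the data."""
    parents = {}
    for (cx, cy), summary in children.items():
        key = (cx // 2, cy // 2)
        parents[key] = merge(parents.get(key, EMPTY), summary)
    return parents


def parameters(summary):
    """Statistical parameters stored for a cell: count, mean, std, min, max."""
    n, s, ss, lo, hi = summary
    mean = s / n
    return {"count": n, "mean": mean, "std": math.sqrt(max(ss / n - mean * mean, 0.0)),
            "min": lo, "max": hi}


points = [(0.1, 0.1, 2.0), (0.2, 0.3, 4.0), (0.6, 0.1, 6.0), (0.9, 0.9, 8.0)]
layer2 = bottom_layer(points, levels=2)  # 4 x 4 cells
layer1 = parent_layer(layer2)            # 2 x 2 cells
layer0 = parent_layer(layer1)            # single root cell
print(parameters(layer1[(0, 0)]))  # count 2, mean 3.0, min 2.0, max 4.0
print(parameters(layer0[(0, 0)]))  # count 4, mean 5.0, min 2.0, max 8.0
```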

Steps:
Step 1: Determine a layer to begin the process.
Step 2: For each cell in the layer, calculate the confidence interval (or estimated probability
range) that the cell is relevant to the query.
Step 3: Label the cell as relevant or irrelevant based on the interval calculated.
Step 4: If this layer is the bottom layer, go to Step 6; otherwise, go to Step 5.
Step 5: Go down the hierarchy by one level. Go to Step 2 for those cells that form the
relevant cells of the higher-level layer.
Step 6: If the specification of the query is met, go to Step 8; otherwise, go to Step 7.
Step 7: Retrieve the data that fall into the relevant cells and do further processing. Return
the result that meets the requirements of the query. Go to Step 9.
Step 8: Find the regions of relevant cells. Return those regions that meet the requirements
of the query. Go to Step 9.
Step 9: Stop.
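
A much-simplified Python sketch of this top-down procedure, under stated assumptions: points lie in the unit square, each level doubles the grid resolution (2 x 2 children per cell), and the relevance test of Steps 2 and 3 is replaced by a plain count threshold rather than a confidence interval. Because a parent's count is never smaller than a child's, pruning an irrelevant parent cannot discard a relevant child under this particular test. For brevity, each layer is counted directly from the points, and the sketch stops at a simplified Step 8 by returning the relevant bottom-layer cells; merging them into regions and the data-retrieval path of Step 7 are left out. All names are hypothetical.

```python
def count_pyramid(points, levels):
    """pyramid[l] maps cell (cx, cy) -> number of points, on a 2**l x 2**l grid over [0, 1)^2."""
    pyramid = []
    for level in range(levels + 1):
        side = 2 ** level
        counts = {}
        for x, y in points:
            key = (int(x * side), int(y * side))
            counts[key] = counts.get(key, 0) + 1
        pyramid.append(counts)
    return pyramid


def sting_query(pyramid, min_count):
    """Top-down search for bottom-layer cells holding at least min_count points."""
    candidates = [(0, 0)]  # Step 1: start at the top (root) layer
    for level, counts in enumerate(pyramid):
        # Steps 2-3: label each candidate cell relevant or irrelevant
        # (a count threshold stands in for STING's confidence interval).
        relevant = [c for c in candidates if counts.get(c, 0) >= min_count]
        if level == len(pyramid) - 1:  # Step 4: bottom layer reached
            return sorted(relevant)    # Step 8, simplified: return the relevant cells
        # Step 5: go down one level, but only into children of relevant cells.
        candidates = [(2 * cx + dx, 2 * cy + dy)
                      for cx, cy in relevant for dx in (0, 1) for dy in (0, 1)]
    return []


points = [(0.1, 0.1), (0.15, 0.05), (0.2, 0.2), (0.9, 0.9)]
pyramid = count_pyramid(points, levels=2)
print(sting_query(pyramid, min_count=2))  # [(0, 0)]
print(sting_query(pyramid, min_count=1))  # [(0, 0), (3, 3)]
```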

CLIQUE Algorithm

The CLIQUE algorithm is a density- and grid-based subspace clustering algorithm. It finds
clusters by taking a density threshold and the number of grid intervals as input parameters.
It is specially designed to handle datasets with a large number of dimensions. CLIQUE is
highly scalable with respect to the number of records and the number of dimensions in the
dataset, because it is grid-based and uses the Apriori property effectively.

Working of CLIQUE Algorithm

The CLIQUE algorithm first divides the data space into grids by dividing each dimension
into equal intervals called units. It then identifies the dense units: a unit is dense if the
number of data points it contains exceeds the threshold value.

Once the algorithm finds dense cells along one dimension, it tries to find dense cells along
two dimensions, and it continues in this way until all dense cells in all subspaces, up to the
full set of dimensions, have been found.
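
The bottom-up search can be sketched in Python as follows. The assumptions: each dimension d has known bounds (lo, hi) split into `xi` equal-width intervals, a unit is written as a tuple of (dimension, interval index) pairs sorted by dimension, and a unit is dense when it holds more than `tau` points. Two dense k-dimensional units that agree on their first k - 1 pairs are joined into a (k + 1)-dimensional candidate, which is kept only if all of its k-dimensional projections are dense (the Apriori property) and it is itself dense. Counting is a naive scan, and the names are illustrative.

```python
from itertools import combinations


def interval(x, lo, hi, xi):
    """Index of the equal-width interval (out of xi) of [lo, hi] that contains x."""
    return min(int((x - lo) / (hi - lo) * xi), xi - 1)  # x == hi goes to the last interval


def dense_units(points, bounds, xi, tau):
    """Bottom-up, Apriori-style search for dense units in all subspaces.

    bounds[d] = (lo, hi) for dimension d; xi = intervals per dimension;
    a unit is dense when it holds more than tau points.
    """
    cells = [tuple(interval(x, bounds[d][0], bounds[d][1], xi) for d, x in enumerate(p))
             for p in points]

    def count_in(unit):
        return sum(all(c[d] == i for d, i in unit) for c in cells)

    # Dense units in the 1-dimensional subspaces.
    one_dim = {((d, i),) for c in cells for d, i in enumerate(c)}
    current = {u for u in one_dim if count_in(u) > tau}
    found = set(current)
    while current:
        candidates = set()
        for a, b in combinations(sorted(current), 2):
            # Join two k-dim units that share their first k-1 pairs
            # and end in different dimensions.
            if a[:-1] == b[:-1] and a[-1][0] != b[-1][0]:
                cand = a + (b[-1],)
                # Apriori property: every k-dim projection of the candidate must be dense.
                if all(sub in current for sub in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
        current = {u for u in candidates if count_in(u) > tau}
        found |= current
    return found


points = [(1, 1), (2, 2), (1, 8), (8, 8), (9, 9)]
units = dense_units(points, bounds=[(0, 10), (0, 10)], xi=2, tau=1)
for u in sorted(units, key=lambda u: (len(u), u)):
    print(u)
```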

After finding all dense cells in all dimensions, the algorithm finds the maximal sets
(“clusters”) of connected dense cells.
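
Within a single subspace, grouping dense cells into clusters is a connected-components search. In this Python sketch a dense unit is written simply as a tuple of interval indices, one per dimension of that subspace, and two units are assumed connected when they share a face, i.e. they differ by 1 in exactly one dimension; chains of such neighbors form one cluster. The function name and toy input are illustrative.

```python
def connected_clusters(units):
    """Group the dense units of one subspace into clusters of connected units.

    Each unit is a tuple of interval indices, one per dimension of the subspace.
    Two units are connected when they share a face, i.e. differ by 1 in exactly one dimension.
    """
    remaining = set(units)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, stack = {seed}, [seed]
        while stack:  # depth-first search over face-adjacent dense units
            u = stack.pop()
            for d in range(len(u)):
                for step in (-1, 1):
                    v = u[:d] + (u[d] + step,) + u[d + 1:]
                    if v in remaining:
                        remaining.remove(v)
                        cluster.add(v)
                        stack.append(v)
        clusters.append(sorted(cluster))
    return sorted(clusters)


dense = [(0, 0), (0, 1), (1, 1), (3, 3), (3, 4)]
print(connected_clusters(dense))  # [[(0, 0), (0, 1), (1, 1)], [(3, 3), (3, 4)]]
```

Diagonal neighbors such as (1, 1) and (2, 2) are not connected under the shared-face rule, so they would end up in different clusters.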

Finally, the CLIQUE algorithm generates a minimal description of each cluster. In this way,
clusters are generated in all dense subspaces, with the Apriori approach used to limit the search.
