DBSCAN Clustering
Definition:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a
density-based clustering algorithm that identifies clusters in data by grouping
points that are close together and marking points in low-density regions as
noise or outliers. It does not require specifying the number of clusters in
advance and works well for clusters of arbitrary shape, including nested
clusters.
Key Concepts of DBSCAN:
1. Core Points, Border Points, and Noise
Core Points:
Points with at least a minimum number of neighboring points (MinPts)
within a specified distance (ϵ).
These points are considered central to a cluster.
Border Points:
Points that lie within the ϵ-neighborhood of a core point but do not
themselves have enough neighbors to be a core point.
They "belong" to the cluster of the core point.
Noise Points:
Points that are not core points and are not within the ϵ-neighborhood of any
core point.
Treated as outliers.
2. Parameters
Epsilon (ϵ):
Maximum distance between two points for them to be considered neighbors.
MinPts:
The minimum number of points required to form a dense region
(including the point itself).
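The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name classify_points, the Euclidean distance, and the sample data are assumptions made for the example.

```python
from math import dist  # Euclidean distance (an assumption; other metrics work too)

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'."""
    # eps-neighborhood of every point as a list of indices.
    # A point is always inside its own neighborhood (distance 0).
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbors]

    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Hypothetical sample data: a small square, one point hanging off it, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (5.0, 5.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
for point, label in zip(points, labels):
    print(point, label)
```

With these values the four corners of the square are core points, (2.0, 1.0) is a border point because its only neighbor is the core point (1.0, 1.0), and (5.0, 5.0) is noise.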
Steps:
1. Identify Core Points:
For each point, count how many points fall within its ϵ-neighborhood.
If the count ≥ MinPts, the point is a core point.
In other words, a point is a core point if at least MinPts points
(including itself) lie within radius ϵ of it.
2. Expand Clusters:
Start with an unvisited core point.
Create a new cluster and include all points in its ϵ-neighborhood.
Recursively add all neighboring core points and their neighbors to the
cluster.
3. Classify Points:
Border points (non-core points) are added to the cluster of a core point
whose ϵ-neighborhood contains them. A non-core point can only be added to
a cluster; it is never used to expand the cluster, so expansion stops
there.
Points that do not belong to any cluster once expansion is finished are
classified as noise (outliers).
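The three steps can be put together in a short sketch. It is one possible way to write the algorithm, assuming Euclidean distance and a brute-force neighbor search; the names dbscan and NOISE and the sample points are made up for illustration.

```python
from collections import deque
from math import dist

NOISE = -1  # label used here for noise points (an illustrative convention)

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    # Step 1: eps-neighborhood of every point (each point includes itself).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            # Provisional: may still be claimed as a border point later.
            labels[i] = NOISE
            continue
        # Step 2: i is an unvisited core point, so start a new cluster.
        labels[i] = cluster_id
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                # Step 3: a border point joins the cluster but does not expand it.
                labels[j] = cluster_id
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points expand the cluster
        cluster_id += 1
    return labels

# Hypothetical data: two small groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
          (8, 8), (8, 9), (9, 8),
          (5, 5)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

A queue is used instead of literal recursion so that large clusters do not hit Python's recursion limit; the order in which points are visited can change which cluster a border point ends up in when it is within ϵ of core points from two clusters.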
Advantages
1. No Need to Specify K:
Unlike K-Means, DBSCAN automatically determines the number of
clusters based on the data.
2. Detects Arbitrary Shapes:
Can identify clusters of irregular shapes (e.g., spirals, concentric
circles).
3. Handles Noise:
Effectively identifies outliers as noise points.
4. Works Well for Density-Based Clusters:
Clusters are defined by dense regions of data.
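To illustrate the first three advantages, the sketch below runs a compact DBSCAN variant on two concentric rings plus one far-away point. Everything here is an assumption for the example: the helper names ring and dbscan_labels, the ring sizes, and the parameter values. Note that the number of clusters is never passed in.

```python
from math import cos, sin, pi, dist

def dbscan_labels(points, eps, min_pts):
    """Compact DBSCAN sketch: cluster id per point, -1 for noise."""
    n = len(points)
    nbrs = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [len(nb) >= min_pts for nb in nbrs]
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if not core[start] or labels[start] != -1:
            continue  # clusters are only ever started from unassigned core points
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in nbrs[i]:
                if labels[j] == -1:
                    labels[j] = cluster
                    if core[j]:
                        stack.append(j)  # border points are labeled, not expanded
        cluster += 1
    return labels

def ring(radius, count):
    """Evenly spaced points on a circle around the origin."""
    return [(radius * cos(2 * pi * k / count), radius * sin(2 * pi * k / count))
            for k in range(count)]

# Inner ring, outer ring, and one isolated outlier.
points = ring(1.0, 40) + ring(5.0, 40) + [(20.0, 20.0)]
labels = dbscan_labels(points, eps=1.0, min_pts=3)
n_clusters = len(set(labels) - {-1})
print("clusters found:", n_clusters)
print("outlier label:", labels[-1])
```

Each ring comes out as its own cluster because neighboring points along a ring are closer than ϵ while the gap between the rings is larger than ϵ; the isolated point has no neighbors and stays labeled as noise.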
Limitations
1. Parameter Sensitivity:
The results depend heavily on the choice of ϵ and MinPts.
An ϵ that is too small results in many small clusters or labels most points
as noise, while an ϵ that is too large may merge separate clusters.
2. Varying Densities:
Struggles when clusters have different densities. A single ϵ value may
not work well for all clusters.
3. High Dimensionality:
Distances become less meaningful in high-dimensional data, which makes
it hard to choose a useful ϵ.
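The high-dimensionality limitation can be made concrete with a small experiment. The sketch below assumes uniformly random points in a unit cube and Euclidean distance; the function name relative_contrast and the chosen sizes are illustrative. It measures how much the farthest point differs from the nearest one, relative to the nearest distance.

```python
import random
from math import dist

def relative_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance from one query point."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(dim)]
    distances = [
        dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)

contrast_low = relative_contrast(dim=2)
contrast_high = relative_contrast(dim=1000)
print("relative contrast in 2 dimensions:   ", round(contrast_low, 2))
print("relative contrast in 1000 dimensions:", round(contrast_high, 2))
```

In low dimensions the nearest point is much closer than the farthest one, so a radius ϵ separates "near" from "far". In high dimensions all distances fall into a narrow band, so almost any ϵ either captures nearly every point or nearly none.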