AIML
Dr. Nitin Arvind Shelke
Density based clustering : DBSCAN
• An unsupervised learning method for clustering
• Density-Based Approach: DBSCAN groups points based on density,
identifying dense regions as clusters and sparse regions as noise
(outliers).
• No Need to Predefine Clusters: Unlike K-Means, DBSCAN does not
require specifying the number of clusters beforehand. The number of
clusters follows from the density of the data, given the two parameters
eps and MinPts.
• Handles Arbitrary Shapes & Noise: DBSCAN can find clusters of
arbitrary shape and size, and it labels outliers as noise instead of forcing
them into a cluster. This makes it more robust to noise than
centroid-based methods such as K-Means.
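To make the density idea concrete, here is a minimal from-scratch sketch of the full procedure in Python. It assumes Euclidean distance and counts a point as part of its own neighborhood. The names `region_query` and `dbscan` and the toy data are illustrative, not taken from the lecture. A cluster starts at an unvisited core point and grows through every core point reachable from it; non-core points reached along the way join as border points, and anything never reached keeps the noise label -1.

```python
import math

NOISE = -1  # label for points not reachable from any core point


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return a cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but does not expand
                continue
            if labels[j] is not None:
                continue  # already labelled (a border point keeps its first cluster)
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep growing
        cluster_id += 1
    return labels


# Two dense blobs and one isolated point (illustrative data).
points = [
    (1.0, 1.0), (1.2, 1.1), (0.9, 1.0), (1.1, 0.8),
    (5.0, 5.0), (5.1, 5.2), (4.9, 5.1), (5.2, 4.9),
    (9.0, 1.0),
]
print(dbscan(points, eps=0.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters (two here) is never passed in; it falls out of the density structure. Each neighborhood query is a linear scan, so this sketch runs in O(n²) time; library implementations such as scikit-learn's `sklearn.cluster.DBSCAN` (parameters `eps` and `min_samples`) can use tree-based neighbor search instead.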
• DBSCAN needs two key parameters to define ‘density’:
✓ MinPts: The minimum number of points (a threshold) that must lie
within a point's eps-neighborhood, the point itself included, for that
region to be considered dense.
✓ eps (ε): The radius of the neighborhood around each point; every
point within distance ε of a point is one of its neighbors.
Core, Border, and Outlier Points:
1. Core Points have at least MinPts points (counting the point itself)
within distance ε (Eps).
2. Border Points have fewer than MinPts points within ε but lie within ε
of a core point, i.e. they are reachable from a core point.
3. Outliers (Noise Points) are neither core nor border points.
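The three definitions can be written as set conditions, where D is the data set and dist the chosen distance (Euclidean in the exercise that follows). Because dist(p, p) = 0, p always belongs to its own neighborhood N_ε(p) and therefore counts toward MinPts; this is the convention that reproduces the answers of the worked exercise.

```latex
\begin{align*}
N_\varepsilon(p) &= \{\, q \in D : \operatorname{dist}(p, q) \le \varepsilon \,\} \\
p \text{ is core} &\iff |N_\varepsilon(p)| \ge \mathit{MinPts} \\
p \text{ is border} &\iff p \text{ is not core and } \exists\, q \in N_\varepsilon(p) \text{ with } q \text{ core} \\
p \text{ is noise} &\iff p \text{ is neither core nor border}
\end{align*}
```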
• The DBSCAN algorithm takes two input parameters:
➢ the radius around each point (eps) and the minimum number of data
points that should lie around that point within that radius (MinPts).
• For example, take the point (1.5, 2.5) with eps = 0.3: the circle of
radius 0.3 around it contains only one other point, (1.2, 2.5), as shown
in the figure.
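The membership test behind the figure is a single distance comparison. Only the point (1.2, 2.5) is named in the text, so only that distance is worked out here:

```latex
d\big((1.5,\,2.5),\,(1.2,\,2.5)\big) = \sqrt{(1.5-1.2)^2 + (2.5-2.5)^2} = \sqrt{0.09} = 0.3 \le \varepsilon = 0.3
```

The point lies exactly on the circle, so the neighborhood test must use ≤ rather than <. In code this boundary case is fragile: in IEEE double-precision arithmetic 1.5 − 1.2 evaluates to 0.30000000000000004, so a direct comparison against eps = 0.3 would exclude the point unless a small tolerance is allowed.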
• DBSCAN sorts every data point into one of three types.
Core Point: a point that has at least MinPts points (itself included)
within eps.
Border Point: a point that has fewer than MinPts points within eps but
lies in the neighborhood of a core point.
Noise or outlier: a point that is neither a core point nor a border point.
• Q. Given the points A(3, 7), B(4, 6), C(5, 5), D(6, 4), E(7, 3), F(6, 2),
G(7, 2) and H(8, 4), find the core points, border points and outliers
using DBSCAN.
• 1) Take Eps = 2.5 and MinPts = 4
• 2) Take Eps = 2.5 and MinPts = 3
Steps to solve the DBSCAN Problem
• Step 1: Create the distance matrix by calculating the distance using
the Euclidean distance formula.
• Step 2: Find all the data points that lie in the Eps-neighborhood of
each data point. That is, put every point whose distance from it is
<= Eps into that point's neighborhood set.
• Step 3: Identify the Core Points, Border Points, and Outlier Points
• Step 1: Create the distance matrix by calculating the distance using
the Euclidean distance formula.
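For P = (x_P, y_P) and Q = (x_Q, y_Q), the Euclidean distance, with two entries of the matrix computed from the given coordinates:

```latex
\begin{align*}
d(P, Q) &= \sqrt{(x_P - x_Q)^2 + (y_P - y_Q)^2} \\
d(A, B) &= \sqrt{(3-4)^2 + (7-6)^2} = \sqrt{2} \approx 1.41 \\
d(A, C) &= \sqrt{(3-5)^2 + (7-5)^2} = \sqrt{8} \approx 2.83
\end{align*}
```

With Eps = 2.5, B lies in the neighborhood of A, while C does not.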
Distance Calculation from data point A to other points
Distance Calculation from data point B to other points
Distance Calculation from data point C to other points
Distance Calculation from data point D to other points
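The full distance matrix can be recomputed from the given coordinates. A minimal sketch using Python's `math.dist` (Python 3.8+); the dictionary keyed by pairs of point names is an illustrative layout, and distances are rounded only when printed.

```python
import math

# Coordinates from the exercise.
POINTS = {
    "A": (3, 7), "B": (4, 6), "C": (5, 5), "D": (6, 4),
    "E": (7, 3), "F": (6, 2), "G": (7, 2), "H": (8, 4),
}


def distance_matrix(points):
    """Euclidean distance for every ordered pair of named points."""
    return {
        (p, q): math.dist(points[p], points[q])
        for p in points
        for q in points
    }


dist = distance_matrix(POINTS)

# Print the symmetric matrix, rounded to two decimals for display only.
names = list(POINTS)
print("    " + "".join(f"{n:>6}" for n in names))
for p in names:
    print(f"{p:>4}" + "".join(f"{dist[(p, q)]:6.2f}" for q in names))
```

Only the pairs closer than 2.5 matter for the next step: A-B, B-C, C-D, D-E, E-F and E-H (about 1.41), E-G and F-G (1), D-F and D-H (2), D-G and G-H (about 2.24). Every other pair is at least 2.83 apart.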
• Step 2: Now find all the data points that lie in the Eps-neighborhood
of each data point. That is, put every point whose distance from it is
<= 2.5 into that point's neighborhood set.
Take Eps = 2.5 and MinPts = 4
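Step 2 as a short sketch over the same coordinates; the helper name `eps_neighborhood` is illustrative. One assumption matters: each neighborhood here includes the point itself. That is the convention under which the Step 3 answers hold; for example, F has only three other points within 2.5 (D, E and G), yet it is listed as a core point for MinPts = 4.

```python
import math

# Coordinates from the exercise.
POINTS = {
    "A": (3, 7), "B": (4, 6), "C": (5, 5), "D": (6, 4),
    "E": (7, 3), "F": (6, 2), "G": (7, 2), "H": (8, 4),
}
EPS = 2.5


def eps_neighborhood(points, name, eps):
    """Names of the points within eps of `name`, the point itself included."""
    center = points[name]
    return [q for q, xy in points.items() if math.dist(center, xy) <= eps]


neighborhoods = {p: eps_neighborhood(POINTS, p, EPS) for p in POINTS}
for p, nbrs in neighborhoods.items():
    print(f"N({p}) = {nbrs}, size {len(nbrs)}")
```

The neighborhoods do not depend on MinPts; only the Step 3 comparison of each size against MinPts does.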
• Step 3: Identify the Core Points, Border Points, and Outlier Points
• Eps = 2.5 and MinPts = 4
1) Core Points: D, E, F, G, H (each has at least 4 points, itself included,
within ε = 2.5)
2) Border Point: C (only 3 points within ε, but it lies within ε of core
point D)
3) Outliers: A, B (neither is a core point, and neither lies within ε of
any core point)
• Eps = 2.5 and MinPts = 3
1) Core Points: B, C, D, E, F, G, H (each has at least 3 points, itself
included, within ε = 2.5)
2) Border Point: A (only A and B lie within ε of A, fewer than 3, but B is
a core point)
3) Outliers: None (every point is either a core point or a border point)
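Both answers can be checked mechanically. A minimal sketch, again assuming the point itself counts toward MinPts and treating a non-core point as a border point when some core point lies within Eps of it; the function name `classify` is illustrative.

```python
import math

# Coordinates from the exercise.
POINTS = {
    "A": (3, 7), "B": (4, 6), "C": (5, 5), "D": (6, 4),
    "E": (7, 3), "F": (6, 2), "G": (7, 2), "H": (8, 4),
}


def classify(points, eps, min_pts):
    """Split point names into (core, border, noise) lists.

    A point's eps-neighborhood includes the point itself.
    """
    nbrs = {
        p: [q for q in points if math.dist(points[p], points[q]) <= eps]
        for p in points
    }
    core = [p for p in points if len(nbrs[p]) >= min_pts]
    # Border: not core, but at least one core point lies within eps.
    border = [p for p in points
              if p not in core and any(q in core for q in nbrs[p])]
    noise = [p for p in points if p not in core and p not in border]
    return core, border, noise


for min_pts in (4, 3):
    core, border, noise = classify(POINTS, 2.5, min_pts)
    print(f"MinPts={min_pts}: core={core} border={border} noise={noise}")
# MinPts=4: core=['D', 'E', 'F', 'G', 'H'] border=['C'] noise=['A', 'B']
# MinPts=3: core=['B', 'C', 'D', 'E', 'F', 'G', 'H'] border=['A'] noise=[]
```

Lowering MinPts from 4 to 3 turns B and C into core points, which pulls A in as a border point and leaves no outliers.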