7.2 DBSCAN Clustering
that the attributes are numeric because distance calculations are still used.
The algorithm can be reduced to three steps: defining a threshold density,
classifying the data points, and clustering (Tan et al., 2005).
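The three steps named above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name dbscan_sketch, the toy data, the choice to count a point as part of its own neighborhood, and the use of -1 as the label for points that join no cluster are all illustrative choices rather than anything fixed by the text.

```python
import math

def dbscan_sketch(points, eps, min_points):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for no cluster."""
    n = len(points)
    # Step 1: threshold density. Collect the neighbors within eps of each point
    # (assumption: a point counts as a member of its own neighborhood).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Step 2: classification. A point whose neighborhood is high-density is a core point.
    is_core = [len(nb) >= min_points for nb in neighbors]
    # Step 3: clustering. Grow a cluster outward from each unassigned core point.
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]  # only core points are ever pushed, so only they expand the cluster
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels

# Hypothetical toy data: two tight groups and one isolated point.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (5, 5)]
toy_labels = dbscan_sketch(toy_points, eps=1.5, min_points=3)
print(toy_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The pairwise distance scan keeps the sketch short but is quadratic in the number of points; practical implementations typically use a spatial index for the neighborhood lookup.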
Step 1: Defining Epsilon and MinPoints
The DBSCAN algorithm starts by calculating a density for every data point
in a dataset, using a given fixed radius ε (epsilon). The density of a data
point is the number of data points inside the space defined by radius ε
around it (Fig. 7.14). To determine whether a neighborhood is high-density
or low-density, a threshold number of data points (MinPoints) has to be
defined, at or above which the neighborhood is considered high-density. If
MinPoints is defined as 5, the space of radius ε surrounding data point A is
considered a high-density region. Both ε and MinPoints are user-defined
parameters and can be altered for a dataset.
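The density check in Step 1 amounts to counting neighbors inside a radius and comparing the count with a threshold. The sketch below illustrates this with hypothetical helper names (neighborhood_density, is_high_density) and made-up coordinates; it assumes Euclidean distance and assumes that a point counts toward its own neighborhood, which is a common convention but is not stated in the text.

```python
import math

def neighborhood_density(points, center, eps):
    """Count the points lying within distance eps of center.

    Assumption: if center is itself in points, it is included in the count.
    """
    return sum(1 for p in points if math.dist(p, center) <= eps)

def is_high_density(points, center, eps, min_points):
    """True when the eps-neighborhood of center holds at least min_points points."""
    return neighborhood_density(points, center, eps) >= min_points

# Hypothetical data: point A at the origin, four close neighbors, one distant point.
sample = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5), (3.0, 3.0)]
point_a = sample[0]

print(neighborhood_density(sample, point_a, eps=1.0))           # 5
print(is_high_density(sample, point_a, eps=1.0, min_points=5))  # True
print(is_high_density(sample, (3.0, 3.0), eps=1.0, min_points=5))  # False
```

Changing eps or min_points changes which neighborhoods qualify, which is why both are treated as tunable parameters.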
Step 2: Classification of Data Points
In a dataset, with a given ε and MinPoints, all data points can be classified
into three buckets (Fig. 7.15):
• Core points: A data point is a core point if the space within radius ε
around it is a high-density region, that is, a space that contains at least
MinPoints data points.
• Border points: A border point is not itself a core point but lies within
radius ε of a core point. A border point is the boundary between high-density and
FIGURE 7.15
Core, border, and density points.