Clustering beyond K-Means
Comparison of Clustering Algorithms
K-Means
• Works on only numeric information
• Computationally expensive
• Does not show the process of cluster formation
• Influenced by outliers; forcefully assigns noise points to a cluster
• Does not provide a hierarchical presentation
• Identifying the right value of k is difficult

Agglomerative
• Intuitive and fast
• Less computationally expensive
• Can aid in selection of the correct k value
• Provides a hierarchical order among clusters
• Less impacted by outliers

DBSCAN
• No need to specify the value of k
• Works on finding non-spherical clusters
• Can identify noise in the data
• Immune to the effects of outliers
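As a concrete illustration of the differences above, the short sketch below fits all three algorithms on the same toy dataset with scikit-learn. The dataset (two interleaving half-moons) and the parameter values (number of clusters, eps, min_samples) are illustrative assumptions, not part of the original slides.

# Minimal sketch, not from the slides: the same data run through K-Means,
# agglomerative clustering, and DBSCAN. Parameter values are illustrative.
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
X = StandardScaler().fit_transform(X)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
agglo_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)  # -1 marks noise

# On non-spherical data like the two moons, K-Means and plain agglomerative
# clustering tend to cut across the shapes, while DBSCAN can recover them
# and flags outlying points with the label -1.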
Hierarchical Clustering: Dendrogram
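The dendrogram slide itself is a figure; below is a minimal sketch of how such a dendrogram can be produced with SciPy, assuming a small random dataset used purely for illustration.

# Minimal sketch: building and plotting a dendrogram with SciPy.
# The data here is random and only serves to illustrate the calls.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))       # 20 observations, 2 features

Z = linkage(X, method='ward')      # merge history of agglomerative clustering
dendrogram(Z)
plt.xlabel('Observation index')
plt.ylabel('Merge distance')
plt.show()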
Linkage Functions
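For reference, the standard textbook definitions of the common linkage functions (these are general formulas, not reproduced from the slide): for clusters A and B with pairwise distance d,

D_{\text{single}}(A,B) = \min_{a \in A,\; b \in B} d(a,b)
D_{\text{complete}}(A,B) = \max_{a \in A,\; b \in B} d(a,b)
D_{\text{average}}(A,B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a,b)
\Delta_{\text{Ward}}(A,B) = \frac{|A|\,|B|}{|A| + |B|}\, \lVert \mu_A - \mu_B \rVert^2

Ward's criterion merges the pair of clusters whose merge gives the smallest increase in total within-cluster variance.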
DBSCAN
Choosing DBSCAN Parameters
• MinPts: the minimum number of points required in a point's neighborhood for it to count as a dense region. Common heuristics:
MinPts >= D + 1 (D is the number of dimensions)
MinPts = 2 * D
MinPts = ln(N) (N is the number of observations in the data)
• Radius (eps): chosen by trial and error; a k-distance plot can be used, reading eps off the elbow point (see the sketch below).
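A minimal sketch of this parameter-selection recipe with scikit-learn, under the assumption that X is an already scaled feature matrix (random data is used here as a stand-in); the MinPts heuristics above are combined simply by taking the largest value:

# Minimal sketch: choosing MinPts from the heuristics above and reading eps
# off a k-distance plot, then running DBSCAN. The data is a random stand-in.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # replace with your scaled features

D = X.shape[1]                                # number of dimensions
N = X.shape[0]                                # number of observations
min_pts = max(D + 1, 2 * D, int(np.log(N)))   # take the largest heuristic

# k-distance plot: sorted distance to the MinPts-th nearest neighbour
# (the query point itself is counted as its own first neighbour here).
nbrs = NearestNeighbors(n_neighbors=min_pts).fit(X)
distances, _ = nbrs.kneighbors(X)
k_dist = np.sort(distances[:, -1])
plt.plot(k_dist)
plt.xlabel('Points sorted by k-distance')
plt.ylabel(f'Distance to {min_pts}-th nearest neighbour')
plt.show()                                    # pick eps at the elbow of this curve

eps = 0.5                                     # placeholder; use the elbow value
labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)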
DBSCAN for Retail
• Detect natural groupings of customers based on shopping behavior without prior
assumptions.
• Identify outliers such as fraudulent transactions or unusual purchasing spikes.
• Handle large, complex datasets efficiently, making it a scalable tool for growing businesses.
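A minimal retail-flavored sketch of the noise-as-outlier idea; the customer table, column names, and parameter values below are hypothetical and only illustrate how points labeled -1 can be treated as anomalies:

# Minimal sketch: DBSCAN on toy customer features. 'annual_spend' and
# 'visits_per_month' are assumed column names, not fields of any real dataset.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

customers = pd.DataFrame({
    'annual_spend':     [120, 135, 980, 1010, 130, 5000],
    'visits_per_month': [2, 3, 12, 11, 2, 1],
})

X = StandardScaler().fit_transform(customers)
labels = DBSCAN(eps=0.8, min_samples=2).fit_predict(X)

customers['segment'] = labels
# Rows labelled -1 are noise; here that is the unusual 5000-spend customer,
# which is the kind of point one might review for fraud or data errors.
outliers = customers[customers['segment'] == -1]
print(customers)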
Group Assignment on Clustering
• https://www.kaggle.com/datasets/salahuddinahmedshuvo/ecommerce-consumer-behavior-analysis-data
• Develop a problem statement for clustering
• Apply clustering and generate cluster profiles
• Suggest targeting strategies for the resulting segments
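A possible starting point for the profiling step; the file name and the choice of k are placeholders, so substitute the actual download from the Kaggle dataset and whichever algorithm your problem statement calls for:

# Minimal sketch: cluster the numeric columns and summarise each segment.
# 'consumer_behavior.csv' and n_clusters=4 are placeholders.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv('consumer_behavior.csv')
features = df.select_dtypes('number').dropna(axis=1)

X = StandardScaler().fit_transform(features)
df['cluster'] = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# Cluster profiles: mean of each numeric feature per cluster, plus segment size.
profiles = df.groupby('cluster')[features.columns].mean()
profiles['size'] = df['cluster'].value_counts().sort_index()
print(profiles)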
Thank You!