Experiment 2: Customer Segmentation using K-Means Clustering
Aim:
To implement the K-Means clustering algorithm for customer segmentation using Python and
scikit-learn.
Software Requirements:
Python 3.x, Jupyter Notebook, pandas, matplotlib, seaborn, scikit-learn
Dataset:
Sample customer dataset (Mall_Customers.csv) with features such as Age, Annual Income (k$), and Spending Score (1-100).
Procedure:
1. Import necessary libraries.
2. Load the dataset.
3. Explore and visualize the dataset using scatter plots.
4. Use the Elbow method to determine the optimal number of clusters (k).
5. Apply K-Means clustering algorithm using the determined value of k.
6. Visualize the clusters formed.
7. Interpret the results for business insights.
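Step 5 relies on scikit-learn's KMeans, which hides the iteration itself. As a rough illustration of what happens inside, the following is a minimal pure-Python sketch of the basic assign-then-update loop. It is not scikit-learn's implementation: the function name kmeans_sketch, the toy (income, score) points, and the plain random initialisation (instead of 'k-means++') are all assumptions made for this example.

```python
import random


def kmeans_sketch(points, k, n_iter=100, seed=42):
    """Basic K-Means on a list of (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: plain random initialisation, not the 'k-means++' scheme.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: (annual income in k$, spending score).
toy_points = [(15, 80), (16, 82), (17, 78), (85, 20), (86, 18), (88, 22)]
toy_centroids, toy_labels = kmeans_sketch(toy_points, k=2)
print(toy_centroids, toy_labels)
```

On this toy data the first three points and the last three points end up in separate clusters. Which group receives label 0 depends on the random initialisation.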
Program:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
# Load dataset
data = pd.read_csv('Mall_Customers.csv')
X = data[['Annual Income (k$)', 'Spending Score (1-100)']]
# Elbow method to find optimal k
wcss = []
for i in range(1, 11):
    # Fit K-Means with i clusters and record its WCSS
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # inertia_ = within-cluster sum of squares
plt.plot(range(1, 11), wcss)
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
# Apply K-Means
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
y_kmeans = kmeans.fit_predict(X)
# Visualizing clusters
colors = ['red', 'blue', 'green', 'cyan', 'magenta']
for cluster, color in enumerate(colors):
    # Plot the customers assigned to this cluster
    plt.scatter(X.values[y_kmeans == cluster, 0], X.values[y_kmeans == cluster, 1],
                s=100, c=color, label=f'Cluster {cluster + 1}')
# Plot the cluster centroids
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=300, c='yellow', label='Centroids')
plt.title('Customer Segments')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
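The program stops at the plot, while step 7 of the procedure asks for an interpretation. One possible way to start is to summarise each cluster by its size and average values. The sketch below does this in plain Python; the helper name cluster_profile and the sample rows and labels are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def cluster_profile(rows, labels):
    """Return {cluster: (size, mean income, mean spending score)}."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    profile = {}
    for label, members in sorted(groups.items()):
        n = len(members)
        mean_income = sum(m[0] for m in members) / n
        mean_score = sum(m[1] for m in members) / n
        profile[label] = (n, mean_income, mean_score)
    return profile


# Hypothetical (annual income in k$, spending score) rows and cluster labels.
sample_rows = [(15, 80), (16, 82), (85, 20), (87, 18), (60, 50)]
sample_labels = [0, 0, 1, 1, 2]

profile = cluster_profile(sample_rows, sample_labels)
for label, (size, income, score) in profile.items():
    print(f"Cluster {label + 1}: n={size}, "
          f"mean income={income:.1f}, mean score={score:.1f}")
```

With the program above, a call such as cluster_profile(X.values.tolist(), y_kmeans.tolist()) should give the same kind of summary for the real clusters. A cluster with low income but a high spending score, for example, would call for a different marketing approach than one with high income and a low score.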
Sample Output:
The output consists of:
1. An Elbow plot showing the optimal number of clusters (typically 5 for this dataset).
2. A scatter plot of customers segmented into clusters.
3. Different customer segments visualized based on income and spending score.
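To read the Elbow plot it helps to know what WCSS measures: the sum of squared distances from every point to the centroid of its own cluster. The sketch below computes it by hand for a small made-up dataset. The function name wcss_by_hand, the points, and the three labelings are assumptions chosen for illustration; the labelings are hand-picked and are not K-Means output.

```python
def wcss_by_hand(points, labels):
    """Sum of squared distances from each point to its cluster's mean."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        cx = sum(m[0] for m in members) / len(members)
        cy = sum(m[1] for m in members) / len(members)
        total += sum((m[0] - cx) ** 2 + (m[1] - cy) ** 2 for m in members)
    return total


# Hypothetical data with two obvious groups.
pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

wcss_k1 = wcss_by_hand(pts, [0, 0, 0, 0, 0, 0])  # everything in one cluster
wcss_k2 = wcss_by_hand(pts, [0, 0, 0, 1, 1, 1])  # the two natural groups
wcss_k3 = wcss_by_hand(pts, [0, 0, 0, 1, 1, 2])  # one group split further

print(wcss_k1, wcss_k2, wcss_k3)
```

WCSS falls sharply from one cluster to two and only slightly from two to three. That bend is the "elbow", and it suggests k = 2 for this toy data in the same way the plot suggests k = 5 for the customer dataset.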
Viva Questions:
1. What is the purpose of customer segmentation?
2. How does the K-Means algorithm work?
3. What is the Elbow method?
4. What are the limitations of K-Means clustering?