Mall Customer Segmentation Guide

This document summarizes a customer segmentation project that uses KMeans clustering. The project uses a mall customer dataset containing customer ID, age, gender, annual income and spending score. KMeans clustering groups customers by annual income and spending score into a fixed number of clusters. The libraries used are NumPy, Pandas, Matplotlib, Seaborn and Scikit-Learn. The code generates an elbow plot to choose the number of clusters, performs the clustering, and plots the clusters with their centroids.

Uploaded by

dsingh1be21

Customer Segmentation

Computer Science and Engineering Department

Thapar Institute of Engineering and Technology

(Deemed to be University), Patiala – 147004

Machine Learning Project

Submitted By:

Name : Yogesh Rathee

Roll No : 102103022

Name : Jagveer Singh

Roll No : 102103024

Submitted To:

Ms. Kudratdeep Aulakh

Index

1. Introduction
2. Libraries used
3. Algorithm(s) used
4. Code and Screenshots

1. Introduction

1.1 Mall Customer Segmentation Data

https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python

This data set was created solely for learning customer segmentation
concepts. (The data set's source describes this as market basket analysis,
although that term more commonly refers to association-rule mining over
transactions.) I will demonstrate segmentation using an unsupervised ML
technique (the KMeans clustering algorithm) in its simplest form.

1.2 Description of dataset

You own a supermarket mall and, through membership cards, you have some
basic data about your customers such as customer ID, age, gender, annual
income and spending score. The spending score is something you assign to
each customer based on parameters you define, such as customer behavior
and purchasing data.

2. Libraries Used:

NumPy: NumPy is a Python library for efficient numerical computation,
offering multi-dimensional array support and a wide range of
mathematical functions. It is widely used in data analysis, scientific
research, and machine learning.

Pandas: Pandas is a Python library for data manipulation and analysis,
offering DataFrames and Series for working with structured data
efficiently.

Matplotlib.pyplot: matplotlib.pyplot is the plotting interface of the
Matplotlib library, used for creating 2D data visualizations such as
plots and charts. It is a fundamental tool for data visualization in
Python.

Seaborn: Seaborn is a Python library built on top of Matplotlib for
creating appealing and informative statistical data visualizations.

Sklearn: Scikit-Learn (sklearn) is a Python library for machine learning,
offering a broad set of tools and algorithms for tasks such as
clustering, classification and regression. In this project it provides
the KMeans implementation.

3. Algorithm(s) Used

K-means clustering: K-means clustering is a popular unsupervised machine
learning algorithm. It partitions data into a fixed number of clusters,
often referred to as "k". Clusters are formed based on the similarity
between data points, which aids data segmentation and organization.

The algorithm operates iteratively. Initially, it places "k" cluster centers
within the data space, either randomly or, as in the code in Section 4, with
the k-means++ seeding scheme. Each data point is then assigned to the nearest
cluster center, typically using Euclidean distance. Each cluster center is
then recalculated as the mean of its assigned data points. These two steps
repeat until the cluster assignments and centers no longer change
significantly.
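
The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the Scikit-Learn implementation: the function name kmeans, the choice of random data points as starting centers, and the toy (income, spending score) pairs are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster 2-D points into k groups with plain k-means (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 1: choose k distinct data points as the initial cluster centers
    # (an assumption for this sketch; other seeding schemes exist).
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Hypothetical (income, spending score) pairs forming two obvious groups.
points = [(15, 10), (16, 12), (17, 11), (80, 90), (82, 88), (81, 92)]
labels, centers = kmeans(points, k=2)
print(labels)   # one cluster index per point
print(centers)  # one (income, spending score) mean per cluster
```

Which group receives index 0 depends on the random starting centers, so the labels are only meaningful relative to each other.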

K-means has applications in various fields, such as marketing, image
segmentation, and document clustering. It is useful for revealing natural
groupings in data, making it a valuable tool for data analysis and
preprocessing. However, it does have some limitations, such as sensitivity
to the initial placement of cluster centers and the need to specify "k"
beforehand. Nonetheless, it remains a versatile and valuable method for
data clustering and pattern recognition.
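
Because "k" has to be chosen beforehand, a common approach is to compute the within-cluster sum of squares (WCSS) for several values of k and look for the "elbow" where the curve flattens. The sketch below shows one simple way to locate that bend numerically, using the second difference of the curve. The function name elbow_k and the WCSS values are made up for illustration; they are not results from this project, and reading the elbow off a plot by eye remains a reasonable alternative.

```python
def elbow_k(wcss):
    """Return the k at the sharpest bend of a WCSS curve.

    wcss[i] is the WCSS for k = i + 1. The bend at each interior k is
    estimated with the second difference wcss[k-1] - 2*wcss[k] + wcss[k+1],
    which is a simple heuristic rather than a definitive rule.
    """
    if len(wcss) < 3:
        raise ValueError("need WCSS for at least three values of k")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Made-up WCSS values for k = 1..8, for illustration only.
example_wcss = [1000, 700, 450, 250, 100, 85, 75, 68]
print(elbow_k(example_wcss))
```

This heuristic can be misled by a very steep drop between k = 1 and k = 2, so it is best used as a cross-check on the plotted curve.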
4. Code and Screenshots

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

# loading the data from csv file to a Pandas DataFrame
customer_data = pd.read_csv('D:/ML project/ye rha tere project/Mall_Customers.csv')

# display the first 5 rows of the DataFrame
print(customer_data.head())

# find the number of rows and columns
print(customer_data.shape)

# get some information about the dataset
# (info() prints its summary itself and returns None, so it is not wrapped in print)
customer_data.info()

# check for missing values
print(customer_data.isnull().sum())

# select the annual income and spending score columns (positions 3 and 4) as features
X = customer_data.iloc[:, [3, 4]].values
print(X)

# finding wcss value for different number of clusters
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(X)
    # inertia_ is the within-cluster sum of squares (WCSS) for this k
    wcss.append(kmeans.inertia_)

# plot an elbow graph
sns.set()
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Point Graph')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()

# cluster the data into 5 groups
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=0)

# return a label for each data point based on their cluster
Y = kmeans.fit_predict(X)
print(Y)

# plotting all the clusters and their centroids
plt.figure(figsize=(8, 8))
plt.scatter(X[Y==0,0], X[Y==0,1], s=50, c='green', label='Cluster 1')
plt.scatter(X[Y==1,0], X[Y==1,1], s=50, c='red', label='Cluster 2')
plt.scatter(X[Y==2,0], X[Y==2,1], s=50, c='yellow', label='Cluster 3')
plt.scatter(X[Y==3,0], X[Y==3,1], s=50, c='violet', label='Cluster 4')
plt.scatter(X[Y==4,0], X[Y==4,1], s=50, c='blue', label='Cluster 5')

# plot the centroids
plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1],
            s=100, c='cyan', label='Centroids')

plt.title('Customer Groups')
plt.xlabel('Annual Income')
plt.ylabel('Spending Score')
plt.legend()  # show the cluster and centroid labels defined above
plt.show()
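
The elbow loop in the listing above records kmeans.inertia_ for each k. That value is the within-cluster sum of squares (WCSS): the squared distance from every point to the center of its own cluster, added up. The sketch below computes the same quantity by hand so the definition is explicit. The helper name wcss_by_hand and the toy points, labels and centers are assumptions made for this example, not data from the project.

```python
def wcss_by_hand(points, labels, centers):
    """Sum of squared distances from each point to the center of its own cluster."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centers[label]
        total += sum((a - b) ** 2 for a, b in zip(point, center))
    return total


# Hypothetical toy data: two clusters and their centroids.
toy_points = [(1.0, 1.0), (3.0, 1.0), (10.0, 10.0), (10.0, 14.0)]
toy_labels = [0, 0, 1, 1]
toy_centers = [(2.0, 1.0), (10.0, 12.0)]
print(wcss_by_hand(toy_points, toy_labels, toy_centers))
```

A lower WCSS means tighter clusters, but it always decreases as k grows, which is why the elbow of the curve is used instead of its minimum.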
