NOVYI MIR Research Journal ISSN NO: 0130-7673

Customer Churn Prediction and Segmentation using Machine Learning

Dr. Gunavathi H S, Dhriti Ashtekar, T Meenakshi, Navya N, Prathyusha R
Computer Science and Engineering
Bangalore Institute of Technology
Bangalore, India

VOLUME 9 ISSUE 5 2024 PAGE NO: 530

Abstract— Customer churn occurs when consumers stop using a company's goods or services and move to a rival. It is a problem for businesses because it can reduce sales, damage a brand's reputation, and impede long-term expansion. Numerous factors, such as unsatisfactory products, bad customer service, pricing problems, or low engagement, can result in a loss of clients. The project aims to lower customer attrition by using machine learning algorithms to create a prediction model that identifies customers who are likely to leave. The project's activities include collecting and combining customer data from various sources to ensure proper formatting, determining probable churn drivers via exploratory data analysis, developing a machine learning predictive model, and designing an interface that telecom business users can use to interact with the churn prediction results. Customer data with variables including purchase history, engagement metrics, and support interactions make up the project's input, and the prediction model is trained using these data. The project's output is a forecasting system that identifies the clients who are most likely to depart, giving businesses insight into the reasons why they could leave and empowering them to take preventative measures to keep their customers. Additionally, customer segmentation is done to gain insights about different customer segments. These insights serve as the basis for generating retention strategies for companies.

Keywords— customer churn, prediction model, machine learning, random forest

I. INTRODUCTION
In the modern world, telecom businesses are producing enormous volumes of data at a very high rate. Numerous telecom service providers compete with one another to gain market share, and consumers can choose from a variety of options, including more affordable and superior services. Maximizing profits and surviving in a cutthroat market are telecom companies' ultimate goals. Customer churn arises when a sizable percentage of clients are dissatisfied with the services provided by a telecom business and, as a result, begin to migrate to other service providers.

The objective of churn analysis is to determine the proportion of customers who will quit using a certain good or service. These insights are extracted using data mining techniques, an activity known as customer attrition analysis. Owing to today's fierce competition, numerous companies provide nearly identical products at comparable prices, in terms of both services and quality. By assigning each client a churn probability, churn analysis makes it feasible to anticipate which consumers will discontinue using services or products. The analysis can be carried out by degree of loss (in monetary terms) and by client group. These assessments can be applied to improve customer communication, which increases customer loyalty and persuades customers to stay. The churn rate, also known as customer attrition, is useful when creating marketing campaigns aimed at the target audience. In this way profitability can be raised, or the potential harm from losing clients reduced at a similar rate.

The proposed solution employs a predictive algorithm to pinpoint customers who are likely to defect. The system performs exploratory data analysis to ascertain the variables that can be used to forecast churn. After pre-processing, the model is trained on the customer data that is currently available. Random forests and decision trees are used to build the prediction model. To obtain even better and more effective outcomes, principal component analysis and k-means are used to segment the customers into different clusters. Consequently, businesses can use the system to identify at-risk clients and take appropriate action.

II. LITERATURE REVIEW
This section provides an overview of relevant prior work in customer churn analysis.

Abhishek Gaur, et al. [1], in a 2018 IEEE conference paper, highlight data mining's crucial role in telecom churn analysis. They emphasize the importance of cost-sensitive classification, proposing a partition cost-sensitive CART model. Among the tested methods, gradient boosting yields the best result, at 84.57% AUC.

Irfan Ullah, et al. [2] explore methodologies for telecom churn prediction, focusing on machine learning on large-scale data platforms. They stress the importance of feature engineering and the role of big data platforms in extracting social network analysis features from enormous datasets. Their churn prediction methodology, evaluated on fresh data, proves effective and is successfully deployed, integrating social network analysis and machine learning to bolster telecom customer retention strategies.

Nurul Izzati Mohammed, et al. [3] delve into forecasting customer churn in telecom, discussing ensemble methods, logistic regression, SVM, and decision trees. Their methodology involves data preprocessing, algorithm implementation, and classifier evaluation to optimize prediction. The study aims to refine churn analysis, prioritize client retention, and develop effective prediction models, utilizing a variety of machine learning techniques for an enhanced understanding of churn factors.

Nurul Izzati Mohammed, et al. [4] examine machine learning methods for telecom churn prediction, finding K-means unsuitable for their dataset. They advocate decision trees, which achieve 94.98% accuracy and an 80.80% F1 score in their tests. Their recommendation is significant for telecom firms aiming to increase income and client retention.

Anna Sniegula, et al. [5] concentrate on a random forest model for churn prediction utilizing call detail records. They employ feature selection techniques and compare Random Forest's performance with other algorithms, demonstrating its superiority. Additionally, they conduct customer profiling and identify churning factors, offering valuable insights for telecom firms aiming to enhance retention strategies.

Xin Hu, et al. [6] suggest a hybrid churn prediction model combining BP neural networks and C5.0 decision trees. Weighting the two models by their confidence, the approach achieves high churn prediction accuracy. Empirical validation using supermarket data reinforces the efficacy of the strategy in anticipating customer attrition.

Pushkar Bhuse, et al. [7] compare Random Forest, SVMs, XGBoost, and deep neural networks, investigating the application of machine learning and deep learning to telecom churn prediction. They evaluate Random Forest before grid search and then improve the models with grid search, attaining 90.96% accuracy. The study underscores the efficacy of these techniques, especially highlighting Random Forest's promise for telecom sector applications.

Praveen Lalwani, et al. [8] present a thorough analysis of the studies conducted on predicting customer attrition via machine learning. Their methodology involves EDA and various models such as logistic regression, along with ensemble techniques such as AdaBoost and XGBoost. With feature selection using GSA, their proposed model achieves an AUC score of 84%, underscoring the significance of preprocessing and model evaluation for accurate churn prediction.

Soumi De, et al. [9] introduce SS-IL, a sampling-based stack framework for churn prediction. Utilizing a dataset from a hotel commerce platform, it employs decision trees in ensemble learning and logistic regression as base learners, and Random Forest as a meta learner. Through various sampling strategies, SS-IL enhances precision and F1-score, demonstrating improved performance in churn analysis for imbalanced datasets.

B. Prabadevi, et al. [10] explore churn prediction methodologies, emphasizing deep neural networks (DNN) and convolutional neural networks (CNN) for diverse industries. Their proposed approaches enhance prediction accuracy, with CNN achieving precision rates up to 89%. The findings underscore the efficacy of sophisticated machine learning in supporting customer retention and reducing costs for organizations.

III. PROPOSED METHOD
In the telecom industry, keeping existing customers is frequently just as important as gaining new ones, so churn is a serious problem. It is critical for telecom businesses to comprehend and anticipate customer turnover to ensure profitability and sustainability in a market marked by intense competition and changing consumer preferences. This section briefly describes the steps adopted in analyzing and predicting customer churn and in segmenting the customers using machine learning. Fig. 1 depicts the architecture of the adopted method, which includes the following steps: Data Cleaning, followed by Exploratory Data Analysis (univariate and bivariate analysis), Model Building, and Customer Segmentation.

Fig. 1. Proposed Method Architecture


The predictive random forest model uses the dataset of a telecommunication company. It is specific to the telecom industry and consists of 7043 customer records and 21 attributes such as 'customerID', 'gender', 'SeniorCitizen', 'Partner' and 'Dependents'.

A. Data cleaning
Data cleaning improves the quality of the data. The following steps are performed:
1. A copy of the base data is created for manipulation and processing.
2. Categorical data is converted into a simpler numerical data type.
3. The TotalCharges column is found to contain 11 missing values.
4. Missing value treatment is performed. Given that these entries make up only about 0.16% of the dataset (11 of 7043 records), it is safe to exclude them from further processing.

B. Exploratory Data Analysis
Exploratory data analysis (EDA) is a strategy for evaluating datasets to highlight their key features, and it frequently makes use of visual aids. Its main objectives are understanding the underlying structure and patterns in the data, finding correlations between variables, and identifying outliers and anomalies. The following observations are generated using EDA:

• The count of the target variable per category reveals that the telecom dataset is imbalanced. The ratio of churners to non-churners is approximately 26.54:73.46.

• Univariate Analysis examines one variable at a time, without considering relationships or interactions with other variables. Plotting churn against monthly charges and total charges reveals that churn is high when monthly charges are high and when total charges are low. Univariate analysis thus helps in understanding how individual attributes relate to churn. Insights generated include:
  1. High churn is seen in month-to-month contracts, no online security, no tech support, the first year of subscription, and fibre optic internet.
  2. Low churn is seen in long-term contracts, subscriptions without internet service, and customers engaged for 5+ years.
  3. Factors like gender and availability of PhoneService have almost no impact on churn.

• Bivariate Analysis aims to understand how changes in one variable are related to changes in another variable. Important insights generated from EDA are:
  1. Customers who pay by electronic check have the highest churn.
  2. Customers on month-to-month contracts are likely to stop using the service and churn because they have no contract terms and are free to go.
  3. Customers with no online security and no tech support are high churners.
  4. Non-senior citizens are high churners.

Fig. 2 depicts the heatmap representing the influence of the attributes on churn.

Fig. 2. Heatmap representing the impact of the attributes on churn

C. Model Building
The telecom dataset is then split into two parts: the feature variables (X), containing all independent features except the target, and the target variable (Y), the variable to be predicted (in this case, 'Churn'). The data is further divided into a training set used to train the classifier and a testing set used to test it, using the train_test_split function from the scikit-learn library.

• Decision Tree Model Building:
The decision tree algorithm trains the model by recursively dividing the training data into segments so as to minimize impurity or maximize information gain. Predictions are then made on the testing data: every data point traverses the tree until it reaches a leaf node, and the majority class in that node is assigned as the predicted class.
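The impurity-minimizing split described for the decision tree can be illustrated with a minimal sketch. It assumes Gini impurity as the impurity measure and a single numeric feature; the paper does not state which criterion was used, and the function names and toy data below are hypothetical, chosen only for illustration. A full tree would apply the same search recursively to each resulting segment.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best split `value <= threshold`."""
    n = len(labels)
    best = (None, gini(labels))  # start from the impurity of the unsplit node
    # Candidate thresholds: every distinct value except the largest, because
    # splitting at the largest value would leave the right segment empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


def majority_class(labels):
    """Class assigned to a leaf: the most common label among its samples."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical toy data: monthly charges and whether the customer churned (1) or not (0).
monthly_charges = [20, 25, 30, 70, 80, 95]
churned = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(monthly_charges, churned)
print(f"split at monthly charge <= {threshold}, weighted Gini = {impurity}")
```

In practice a library implementation such as scikit-learn's DecisionTreeClassifier performs this search over all features and handles stopping criteria; the sketch only shows the core idea of choosing the split that leaves the purest segments.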
• Random Forest Model Building:

The random forest algorithm is implemented by defining a Random Forest class. The training dataset is used to build the random forest model. One hundred decision trees are each trained on a bootstrap sample (a random selection of the dataset drawn with replacement), and their predictions are combined to get the final prediction. For the testing data, the random forest ensemble obtains a prediction from each decision tree and uses a majority vote to arrive at the final class.

• Handling Class Imbalance with SMOTE-ENN:
To deal with class imbalance in the dataset and improve accuracy, the Synthetic Minority Over-sampling Technique followed by Edited Nearest Neighbors (SMOTE-ENN) resampling technique is employed. SMOTE generates synthetic customer records for the minority (churner) class by interpolating between existing minority class samples, while ENN removes noisy and borderline samples from both classes using k-nearest neighbors. The resampled data is then split into training and testing datasets, and the decision tree and random forest modeling process is repeated using this balanced dataset.

D. Customer Segmentation
K-means clustering, an unsupervised machine learning approach, is used for customer segmentation. It divides the clientele of telecommunications companies into discrete groups according to their traits and behaviors. To visualize the data effectively, the code preprocesses it by scaling it with StandardScaler and reducing its dimensionality with Principal Component Analysis (PCA).

The elbow method is a visual method used to determine the number of clusters (k) for the k-means clustering algorithm. It does not apply to the decision tree and random forest models, which do not group data points. The elbow method indicates that the ideal number of clusters is 4. Fig. 3 depicts the elbow method graph.

Fig. 3. Elbow method reveals the optimal number of clusters to be 4

K-means is a key algorithm for clustering data. The objective is to divide an unlabelled data set into a certain number of groups, or clusters, so that the data points in each cluster have comparable properties. K-means clustering is applied to partition the data into four clusters. Each cluster's average values are computed to extract insights into the typical behavior of customers within that segment. Fig. 4 depicts the four clusters plotted on the top two principal components.

Fig. 4. Visualization of the k customer segments wherein the red dots represent the centroids.

IV. RESULTS
With a 94% accuracy rate, Random Forest is a reliable algorithm for predicting customer attrition. The decision tree's performance across several metrics is shown in Fig. 5. Following the application of SMOTE-ENN, the decision tree's accuracy is 93%. Similarly, Fig. 6 displays the performance of random forest across different metrics. Its ensemble nature and ability to handle large datasets make Random Forest a reliable choice for such tasks. Customer segmentation plays a vital role by identifying distinct customer groups, allowing tailored marketing strategies, improved customer retention, and better overall business insights.

Fig. 5. Performance of decision tree measured across different parameters.
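Because the dataset is imbalanced, accuracy is usually read alongside precision, recall, and F1-score for the churner class. The sketch below shows how such metrics could be computed from predictions. It is an illustration under assumptions: the paper does not list exactly which metrics appear in its performance figures, and the function names and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the `positive` class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the `positive` (churner) class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy labels: 3 churners (1) among 10 customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)

# A model that never predicts churn still scores 70% accuracy on this toy
# data while finding no churners at all, which is why recall matters here.
always_no = classification_metrics(y_true, [0] * len(y_true))
print(metrics)
print(always_no)
```

The second call illustrates the design point: on imbalanced data a high accuracy figure alone does not show that churners are being identified, so the per-class metrics are the more informative comparison between the decision tree and random forest.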


Fig. 6. Performance of random forest measured across different parameters.

V. CONCLUSION
In conclusion, this project highlights the effectiveness of machine learning in predicting churn and segmenting customers in the telecom industry. By employing decision tree and random forest algorithms, we have developed accurate models to identify at-risk customers and inform retention strategies. Random forest, in particular, excels at handling complex datasets and mitigating overfitting, making it a valuable tool for churn prediction. These insights provide telecom companies with actionable strategies to improve customer retention and drive long-term business success in a competitive market.

REFERENCES
[1] Abhishek Gaur, "Predicting Customer Churn Prediction in Telecom Sector using Various Machine Learning Techniques", IEEE, 2018.
[2] Irfan Ullah, Basit Raza, Ahmad Kamran Malik, Muhammad Imran, Saif Ul Islam, Sung Won Kim, "A Churn Prediction Model Using RF: Analysis Of Machine Learning Techniques For Churn Prediction and Factor Identification In Telecom Sector", IEEE, vol. 7, 2019.
[3] Abdelrahim Kasem Ahmad, Assef Jafar and Kadan Aljoumaa, "Customer churn prediction in telecom using machine learning in big data platform", Springer, 2019.
[4] Nurul Izzati Mohammed, Saiful Adli Ismail, Mohd Nazri Kama, Othman Mohd Yusop, Azri Azmi, "Customer churn prediction in telecommunication industry using machine learning classifiers", Association for Computing Machinery, 2019.
[5] Anna Sniegula, Aneta Poniszewska-Maranda, Milan Popovic, "Study of machine learning methods for customer churn prediction in telecommunication company", Association for Computing Machinery, 2019.
[6] Jaehyun Ahn, Junsik Hwang, Doyoung Kim, Hyukgeun Choi, Shinjin Kang, "A Survey on Churn Analysis in Various Business Domains", IEEE, vol. 8, 2020.
[7] Xin Hu, Yanfei Yang, Lanhua Chen, Siru Zhu, "Research on a customer churn combination prediction model based on decision tree and neural network", in IEEE 5th International Conference on Cloud Computing and Big Data Analytics, pp. 129-1323, IEEE, 2020.
[8] Shuli Wu, Wei-Chuen Yau, Thian-Song Ong and Siew-Chin Chong, "Integrated Churn Prediction and Customer Segmentation Framework for Telco Business", IEEE, vol. 9, 2021.
[9] Praveen Lalwani, Manas Kumar Mishra, Jasroop Singh Chadha, Pratyush Sethi, "Customer churn prediction system: a machine learning approach", Springer, 2021.
[10] Soumi De, P. Prabu, "A Sampling-Based Stack Framework for Imbalanced Learning in Churn Prediction", IEEE, vol. 10, 2022.
