Business Problem

Uploaded by

mansoordool

Business Problem: Improving Customer Retention Rate for a Retail Bank

• Discuss and justify whether your problem relates to a regression analysis or a classification analysis.

Section 2: Understand the Data (30%)

• Explore and discuss whether any of the data attributes you have focused on are closely correlated
with other attributes - either positively or negatively.

• Include at least 3 Tableau-generated visualisations (e.g., map, scatter plot, bar chart, pie chart,
box-and-whisker plot) that give different insights to support your discussions.

Here are three ideas for Tableau-generated visualizations that give different insights to
support your discussion of customer churn prediction:

1. Geographical Churn Rate Map:

o Create a map visualization in Tableau that shows the churn rates across
different geographical regions.
o Use the Geography attribute to plot the data points on a map, with each region
represented by a color or size based on its churn rate.
o This visualization will help identify any spatial patterns or hotspots of churn,
allowing you to target retention efforts in specific regions.
o Insights: The map may reveal that certain regions have significantly higher
churn rates compared to others, indicating potential regional factors
influencing churn. It could also highlight areas where the bank has a strong
customer base with low churn rates.
2. Customer Tenure vs. Balance Scatter Plot:
o Create a scatter plot in Tableau with Tenure on the x-axis and Balance on the
y-axis.
o Color-code the data points based on the churn status (churned or retained).
o This visualization will help explore the relationship between customer tenure
and account balance, and how it relates to churn.
o Insights: The scatter plot may show that customers with longer tenures and
higher balances are more likely to be retained, while those with shorter tenures
and lower balances are at a higher risk of churn. It could also reveal any
outliers or clusters of customers with specific tenure and balance
combinations.
3. Age and Gender Churn Rate Bar Chart:
o Create a grouped bar chart in Tableau that displays the churn rate by age
group and gender.
o Divide the age attribute into meaningful age groups (e.g., 18-24, 25-34, 35-44,
etc.) and place one cluster of bars on the axis for each age group.
o Within each cluster, show one bar per gender (male and female), with the bar
height equal to the share of customers in that age group and gender who
churned. Churn rates are proportions, so place the bars side by side rather
than stacking them.
o Insights: The bar chart may reveal that certain age groups have higher churn
rates than others, indicating that age is a significant factor in churn behavior. It
could also highlight any gender differences in churn rates within each age
group, suggesting potential demographic segments to focus on for retention
efforts.
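
The aggregation behind the age and gender bar chart can be sketched in a few lines. This is a minimal Python illustration only (the report itself calls for Tableau and R); the sample rows, the band edges, and the names `customers`, `age_group`, and `churn_rate_by_segment` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (age, gender, churned flag). Not real data.
customers = [
    (22, "Female", 0), (29, "Male", 1), (31, "Female", 0),
    (38, "Male", 0), (41, "Female", 1), (44, "Male", 1),
    (52, "Female", 1), (58, "Male", 0),
]

def age_group(age):
    """Map an age to a labelled band such as '25-34'."""
    bands = [(18, 24), (25, 34), (35, 44), (45, 54), (55, 64)]
    for low, high in bands:
        if low <= age <= high:
            return f"{low}-{high}"
    return "65+" if age >= 65 else "under 18"

def churn_rate_by_segment(rows):
    """Return {(age_group, gender): churn rate} for each segment present."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for age, gender, flag in rows:
        key = (age_group(age), gender)
        totals[key] += 1
        churned[key] += flag
    return {key: churned[key] / totals[key] for key in totals}

rates = churn_rate_by_segment(customers)
for (group, gender), rate in sorted(rates.items()):
    print(f"{group:>6} {gender:<6} churn rate = {rate:.0%}")
```

Each printed value corresponds to the height of one bar. Segments with very few customers give unstable rates, so it may be worth showing the customer count alongside each bar.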
• Include at least 3 R-generated plots or aggregation tables that give different insights to support
your discussions.

Here are three ideas for R-generated plots or aggregation tables that give different insights
to support your discussion of customer churn prediction:

1. Churn Rate by Number of Products:

o Create an aggregation table in R that shows the churn rate for customers
holding different numbers of products.
o Use the "No Of Products" attribute to group customers based on the number of
products they have (e.g., 1 product, 2 products, 3 products, etc.).
o Calculate the churn rate for each group by dividing the number of churned
customers by the total number of customers in that group.
o Insights: The aggregation table may reveal that the churn rate varies with the
number of products held. For example, customers with only one product may
have a higher churn rate compared to those with multiple products, indicating
the importance of cross-selling and product bundling in reducing churn. Note
that the attribute records how many products a customer holds, not which
ones, so it cannot distinguish specific product combinations.
2. Correlation Heatmap:
o Generate a correlation heatmap in R using the numerical attributes in your
dataset, such as Age, Tenure, Balance, CreditScore, and Estimated Salary.
Including the churn status coded as 0/1 also shows how each attribute
correlates with churn itself.
o The heatmap will display the pairwise correlations between these attributes
using a color scale, with more intense colors indicating stronger positive or
negative correlations.
o Insights: The correlation heatmap may highlight significant correlations
between certain attributes. For instance, if Age and Tenure have a strong
positive correlation, it suggests that older customers tend to have longer
relationships with the bank. Strongly correlated predictors carry overlapping
information, which matters when selecting features (particularly for logistic
regression), while correlations with the churn status point to candidate
predictors for your models.
3. Decision Tree Visualization:
o Train a decision tree model in R using the customer churn data, with the churn
status as the target variable and relevant attributes as predictors.
o Visualize the decision tree using R's plotting packages, such as "rpart.plot"
(for trees fitted with "rpart") or the plot method of the "party" package.
o The decision tree plot will display the split points and conditions at each node,
leading to the classification of customers as churned or retained.
o Insights: The decision tree visualization provides a clear and interpretable
representation of the churn prediction logic. It can identify the most important
attributes and thresholds that contribute to churn. For example, the tree may
reveal that customers with a Balance below a certain threshold and a
CreditScore above a specific value are more likely to churn, providing
actionable insights for targeted retention strategies.
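
The first two ideas above come down to simple arithmetic, sketched here in Python purely to show the computation (the report's appendix would contain the R equivalent). The sample rows and the names `rows`, `churn_rate_table`, and `pearson` are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical rows: (number_of_products, age, balance, churned). Not real data.
rows = [
    (1, 42, 83000.0, 1), (1, 35, 0.0, 0), (1, 51, 120000.0, 1),
    (2, 29, 0.0, 0), (2, 38, 97000.0, 0), (2, 45, 110000.0, 0),
    (3, 48, 135000.0, 1), (1, 33, 64000.0, 0),
]

def churn_rate_table(rows):
    """Aggregate customers, churners, and churn rate per number of products."""
    customers = Counter(r[0] for r in rows)
    churners = Counter(r[0] for r in rows if r[3] == 1)
    return {
        n: {"customers": customers[n], "churned": churners[n],
            "churn_rate": churners[n] / customers[n]}
        for n in sorted(customers)
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

table = churn_rate_table(rows)
age_vs_churn = pearson([r[1] for r in rows], [r[3] for r in rows])
```

Each cell of a correlation heatmap is one such `pearson` value, computed for a pair of numeric columns.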

• Include the R-code you used in the appendix of your report.

Section 3: Prepare the Data (10%)

• If required, explain how you have derived your chosen target attribute(s) in Tableau or in R.

No derivation is needed here, since the churn status is already provided as an attribute in the dataset.

• Discuss and justify what other steps you may have taken to prepare your data, including, where
relevant: removing attributes from consideration, adding further "derived" attributes (e.g., Dates),
imputing "reasonable" values for missing data, transforming attributes, and standardising data
values.

• Prepare suitable separate "Training" and "Testing" datasets.

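An 80/20 split between training and testing data should be sufficient. Stratifying the split on churn status keeps the churn rate the same in both sets.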
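
A stratified 80/20 split can be sketched as follows. This is an illustrative Python version under assumed inputs (the report's appendix would use R); `stratified_split`, the seed, and the sample labels are hypothetical.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), sampling each class separately so both
    sets keep roughly the same churn rate as the full dataset."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels: 100 customers, 20 of whom churned (1 = churned).
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels)
```

Fixing the seed makes the split reproducible, which helps when the same training and testing sets must be reused across all three models.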

• Include any R-code you used to prepare your data in the appendix of your report.

Section 4: Generate and Test Prediction Models (40%)

• Select and justify at least 3 different prediction models (with at least 1 ensemble model) that are
likely to best help with your stated problem objectives.

When selecting prediction models for your customer churn analysis, it's important to consider
the nature of your problem, the characteristics of your dataset, and the interpretability and
performance requirements. Here are three different prediction models, including one
ensemble model, that are commonly used and likely to be effective for churn prediction:

1. Logistic Regression:
o Logistic regression is a popular choice for binary classification problems like
churn prediction.
o It models the probability of an outcome (churn or retention) based on a linear
combination of the predictor variables.
o Logistic regression provides interpretable coefficients that indicate the impact
of each predictor on the churn probability.
o It assumes a linear relationship between the predictors and the log-odds of the
outcome, making it a simple and interpretable model.
o Logistic regression can handle both numerical and categorical predictors and
is less prone to overfitting compared to more complex models.
2. Random Forest (Ensemble Model):
o Random Forest is an ensemble model that combines multiple decision trees to
make predictions.
o It is known for its ability to handle high-dimensional datasets and capture
complex interactions between predictors.
o Random Forest reduces overfitting by creating a diverse set of decision trees
and aggregating their predictions.
o It can handle both numerical and categorical predictors and automatically
captures non-linear relationships.
o Random Forest provides feature importance measures, indicating the relative
contribution of each predictor to the churn prediction.
o It is relatively robust to outliers. Missing values usually still need to be
imputed first; for example, R's randomForest package rejects rows with
missing values by default.
3. Gradient Boosting (e.g., XGBoost, LightGBM):
o Gradient Boosting is another ensemble model that combines weak learners
(usually decision trees) to create a strong predictive model.
o It iteratively trains decision trees, with each tree focusing on correcting the
errors made by the previous trees.
o Gradient Boosting algorithms, such as XGBoost and LightGBM, have gained
popularity due to their excellent performance and scalability.
o They can handle numerical and categorical predictors and capture complex
non-linear relationships.
o Gradient Boosting models provide feature importance measures and can
handle missing data.
o They have hyperparameters that allow for fine-tuning the model's performance
and complexity.
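
To make the logistic regression description concrete, the sketch below shows how a fitted model would turn a linear combination of predictors into a churn probability, and how a coefficient reads as an odds ratio. It is a Python illustration only; the intercept, coefficients, and attribute names are invented for the example and are not fitted values.

```python
from math import exp

def churn_probability(intercept, coefficients, features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    log_odds = intercept + sum(coefficients[name] * value
                               for name, value in features.items())
    return 1.0 / (1.0 + exp(-log_odds))

def odds_ratio(coefficient):
    """Multiplicative change in the odds of churn per one-unit increase."""
    return exp(coefficient)

# Hypothetical coefficients, chosen only for illustration (not fitted values).
intercept = -3.0
coefficients = {"age": 0.05, "is_active_member": -1.0}

p = churn_probability(intercept, coefficients, {"age": 40, "is_active_member": 1})
```

Here the log-odds are -3.0 + 0.05 * 40 - 1.0 = -2.0, which gives a churn probability of about 0.12. Under these assumed coefficients, each extra year of age multiplies the odds of churn by exp(0.05), roughly 1.05.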

Justification:

• Logistic Regression serves as a simple and interpretable baseline model. It provides
insights into the impact of each predictor on churn probability and is suitable when
interpretability is important.
• Random Forest, as an ensemble model, captures complex interactions between
predictors and reduces overfitting. It is robust to outliers and can handle
high-dimensional datasets, making it a strong candidate for churn prediction.
• Gradient Boosting models, such as XGBoost or LightGBM, have shown excellent
performance in various classification tasks, including churn prediction. They can
capture non-linear relationships and handle large datasets efficiently.

By selecting these three models, you cover a range of approaches, from simple and
interpretable (Logistic Regression) to more complex and powerful (Random Forest and
Gradient Boosting). This allows you to compare their performance and gain insights from
different perspectives.

In your report, justify your choice of models based on their strengths and suitability for the
churn prediction problem. Explain how each model can capture the underlying patterns in
your dataset and contribute to achieving your problem objectives.

Remember to evaluate the performance of each model using appropriate evaluation metrics
for binary classification, such as accuracy, precision, recall, F1-score, and area under the
ROC curve (AUC-ROC). Compare the results and discuss the trade-offs between model
complexity and interpretability.

By providing a clear justification for your model selection and evaluating their performance,
you demonstrate a thoughtful approach to predictive modeling and ensure that your churn
prediction analysis is comprehensive and well-grounded.

• Configure your models (e.g., select attributes and/or other model tuning parameters) that you
expect will best deliver relevant insights and/or provide the lowest error rates, justifying your
decisions.

To configure your selected models (Logistic Regression, Random Forest, and Gradient
Boosting) for optimal performance and insights, you need to consider attribute selection and
model tuning parameters. Here's how you can approach this:

1. Attribute Selection:
o Start by exploring the relevance and importance of each attribute in relation to
customer churn.
o Use domain knowledge and insights from exploratory data analysis to identify
attributes that are likely to be predictive of churn.
o Consider statistical methods, such as correlation analysis or chi-square tests, to
assess the relationship between each attribute and the churn outcome.
o Use feature importance measures provided by Random Forest and Gradient
Boosting models to identify the most influential attributes.
o Select a subset of attributes that collectively provide the best predictive power
while avoiding multicollinearity and overfitting.
o Justify your attribute selection based on their relevance to churn prediction
and the insights they can provide to the business.
2. Model Tuning Parameters:
o Each model has its own set of tuning parameters that can impact its
performance and complexity.
o For Logistic Regression:
▪ Consider regularization techniques, such as L1 (Lasso) or L2 (Ridge),
to control overfitting and improve model generalization.
▪ Adjust the regularization strength (e.g., the C parameter in scikit-learn,
or lambda in R's glmnet package) to balance model complexity
against performance.
o For Random Forest:
▪ Tune the number of trees in the forest (n_estimators in scikit-learn;
ntree in R's randomForest package). More trees give more stable
predictions at the cost of longer training time.
▪ Adjust the maximum depth of the trees (max_depth) or the minimum
number of samples required to split a node (min_samples_split) to
control overfitting; in randomForest, nodesize and maxnodes play a
similar role.
▪ Experiment with the number of features considered at each split
(max_features; mtry in randomForest) to introduce randomness and
reduce correlation between trees.
o For Gradient Boosting (XGBoost or LightGBM):
▪ Tune the learning rate (learning_rate; eta in the xgboost R package) to
control the contribution of each tree to the final prediction.
▪ Adjust the maximum depth of the trees (max_depth) to control the
complexity of the model.
▪ Modify the number of boosting rounds (n_estimators; nrounds in the
xgboost R package) to find the optimal number of trees in the ensemble.
▪ Consider regularization parameters, such as L1 (reg_alpha) and L2
(reg_lambda), named alpha and lambda in the xgboost R package, to
prevent overfitting.
o Use techniques like grid search or random search to systematically explore
different combinations of tuning parameters and find the best configuration for
each model.
o Justify your choices of tuning parameters based on their impact on model
performance, generalization, and computational efficiency.
3. Model Evaluation:
o Split your dataset into training and testing sets to evaluate the performance of
each model.
o Use cross-validation techniques, such as k-fold cross-validation, to assess the
model's performance on multiple subsets of the training data.
o Evaluate the models using relevant performance metrics for churn prediction,
such as accuracy, precision, recall, F1-score, and AUC-ROC.
o Compare the performance of the models and select the one that provides the
best balance between predictive accuracy and interpretability.
o Justify your model selection based on the evaluation results and the business
requirements for churn prediction.
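
The grid search and k-fold cross-validation steps above can be sketched as follows. This is a minimal Python illustration, not the report's R code: to keep it self-contained it uses a tiny k-nearest-neighbours classifier as a stand-in for the three models, and the dataset, the `n_neighbors` grid, and all function names are hypothetical.

```python
import random
from itertools import product
from statistics import mean

def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices once, then yield (train_idx, test_idx) per fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def grid_search(param_grid, fit_and_score, n_rows, k=5):
    """Score every parameter combination by its mean k-fold validation score."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, train, test)
                  for train, test in k_fold_indices(n_rows, k)]
        results.append({"params": params, "mean_score": mean(scores)})
    best = max(results, key=lambda r: r["mean_score"])
    return best, results

# Hypothetical one-feature dataset and a k-nearest-neighbours stand-in model.
X = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.65, 0.70, 0.80, 0.90]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def knn_accuracy(params, train_idx, test_idx):
    """Validation accuracy of k-NN with params['n_neighbors'] neighbours."""
    k = params["n_neighbors"]
    correct = 0
    for i in test_idx:
        nearest = sorted(train_idx, key=lambda j: abs(X[j] - X[i]))[:k]
        votes = sum(y[j] for j in nearest)
        prediction = 1 if 2 * votes > k else 0
        correct += prediction == y[i]
    return correct / len(test_idx)

best, results = grid_search({"n_neighbors": [1, 3, 5]}, knn_accuracy, len(X), k=5)
```

The search runs on the training data only; the held-out testing set is used once at the end to report the chosen configuration's performance. Adding more keys to the grid dictionary searches every combination of their values.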

In your report, document the attribute selection process and explain the reasoning behind
your choices. Provide the statistical analysis or feature importance measures used to support
your decisions.

For model tuning, describe the tuning parameters considered for each model and the range of
values explored. Present the results of the grid search or random search, highlighting the best-
performing configurations for each model.

Evaluate the models using appropriate performance metrics and present the results in a clear
and concise manner. Justify your final model selection based on the evaluation outcomes and
the trade-offs between performance and interpretability.

By carefully selecting attributes and tuning model parameters, you can optimize your models
to deliver relevant insights and achieve the lowest error rates for churn prediction. This
systematic approach demonstrates a thorough understanding of the modeling process and
enhances the credibility of your analysis.

• Run these models, discussing the model outputs and drawing, where possible, insights related to
your problem.

• Select proper evaluation metrics to measure the accuracy of your models. Determine and
comment on the best model across your 3 prediction models.

To evaluate the accuracy of your churn prediction models (Logistic Regression, Random
Forest, and Gradient Boosting), you need to select appropriate evaluation metrics. Since
churn prediction is a binary classification problem, you can use the following metrics:

1. Accuracy:
o Accuracy measures the overall correctness of the model's predictions.
o It is calculated as the ratio of correct predictions (true positives + true
negatives) to the total number of instances.
o However, accuracy alone may not be sufficient if your dataset is imbalanced,
as it can be biased towards the majority class.
2. Precision:
o Precision focuses on the correctness of the positive predictions (churned
customers).
o It is calculated as the ratio of true positives (correctly predicted churned
customers) to the total predicted positives (true positives + false positives).
o High precision indicates that when the model predicts a customer as churned,
it is highly likely to be correct.
3. Recall (Sensitivity):
o Recall measures the model's ability to identify churned customers correctly.
o It is calculated as the ratio of true positives (correctly predicted churned
customers) to the total actual positives (true positives + false negatives).
o High recall indicates that the model captures a large portion of the actual
churned customers.
4. F1-score:
o The F1-score is the harmonic mean of precision and recall.
o It provides a balanced measure of the model's performance, considering both
precision and recall.
o F1-score is useful when you want to find a balance between precision and
recall and when the class distribution is imbalanced.
5. Area Under the ROC Curve (AUC-ROC):
o AUC-ROC measures the model's ability to discriminate between churned and
non-churned customers.
o It plots the true positive rate (recall) against the false positive rate at various
classification thresholds.
o A higher AUC-ROC indicates better model performance, with a value of 1
representing a perfect classifier.
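
The five metrics above can be computed directly from predictions, as in this minimal Python sketch (shown only to make the definitions concrete; the labels, predictions, and scores are hypothetical). AUC-ROC is computed here through its pairwise interpretation: the share of (churned, retained) customer pairs in which the churned customer receives the higher score, with ties counted as half.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def auc_roc(y_true, scores):
    """Probability that a random churner is scored above a random non-churner."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels, hard predictions, and predicted churn probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

In this example there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so accuracy is 0.75 while precision, recall, and F1 are each 2/3.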

To determine the best model among your three prediction models, you should evaluate them
using these metrics on the testing dataset. Here's an example of how you can present and
comment on the results:

Model                 Accuracy   Precision   Recall   F1-score   AUC-ROC

Logistic Regression   0.85       0.80        0.75     0.77       0.83
Random Forest         0.88       0.85        0.82     0.83       0.90
Gradient Boosting     0.90       0.88        0.86     0.87       0.93

Based on these example figures, the Gradient Boosting model appears to be the
best-performing model among the three. It achieves the highest accuracy, precision, recall,
F1-score, and AUC-ROC compared to Logistic Regression and Random Forest.

The high accuracy indicates that the Gradient Boosting model correctly predicts the churn
status for a large proportion of customers. The high precision suggests that when the model
predicts a customer as churned, it is highly likely to be correct. The high recall indicates that
the model captures a significant portion of the actual churned customers.

The F1-score, which balances precision and recall, is also the highest for the Gradient
Boosting model, indicating a good overall performance. The high AUC-ROC value suggests
that the model has a strong ability to discriminate between churned and non-churned
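customers.

However, it's important to consider the specific business requirements and the trade-off
between different metrics. For example, if the cost of false positives (incorrectly predicting a
customer as churned) is high, you may prioritize precision over recall. Conversely, if missing a
customer who goes on to churn (a false negative) costs more than an unnecessary retention
offer, recall deserves more weight.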
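
One way to make this trade-off explicit is to attach a cost to each type of error and pick the classification threshold that minimises the total. The Python sketch below illustrates the idea; the costs, candidate thresholds, scores, and function names are hypothetical and would need to come from the business context.

```python
def total_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Cost of flagging as churners everyone scored at or above the threshold."""
    pairs = list(zip(y_true, scores))
    fp = sum(1 for t, s in pairs if t == 0 and s >= threshold)
    fn = sum(1 for t, s in pairs if t == 1 and s < threshold)
    return fp * cost_fp + fn * cost_fn

def best_threshold(y_true, scores, thresholds, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total error cost."""
    return min(thresholds,
               key=lambda t: total_cost(y_true, scores, t, cost_fp, cost_fn))

# Hypothetical labels (1 = churned) and predicted churn probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
candidates = [0.25, 0.5, 0.75]

# Missed churners are expensive: a low threshold (favouring recall) wins.
recall_first = best_threshold(y_true, scores, candidates, cost_fp=1, cost_fn=5)
# Wasted retention offers are expensive: a high threshold (favouring precision) wins.
precision_first = best_threshold(y_true, scores, candidates, cost_fp=5, cost_fn=1)
```

With these example numbers, the first call selects 0.25 and the second selects 0.75, showing how the preferred operating point moves with the relative cost of the two error types.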

In your report, present the evaluation metrics for each model in a clear and concise manner,
using tables or visualizations. Provide a commentary on the performance of each model and
justify your selection of the best model based on the evaluation results and business
considerations.

Remember to also discuss any limitations or assumptions made during the evaluation process
and consider the practical implications of implementing the chosen model for churn
prediction in the business context.

• Discuss what steps you may have taken to improve your individual models.

• Include any R-code you used in the appendix of your report.

To improve the performance of your individual churn prediction models (Logistic
Regression, Random Forest, and Gradient Boosting), you can take several steps. Here are
some common techniques and considerations:

1. Feature Engineering:
o Explore creating new features or transforming existing ones to capture
additional information relevant to churn prediction.
o Derive features based on domain knowledge, such as calculating customer
lifetime value, customer engagement scores, or aggregating transactional data.
o Experiment with feature interactions or combinations that may have predictive
power.
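o Apply feature scaling or normalization techniques to ensure that the features
have similar ranges and avoid any particular feature dominating the others.
This matters mainly for regularized Logistic Regression; tree-based models
such as Random Forest and Gradient Boosting are largely unaffected by
rescaling.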
2. Handling Class Imbalance:
o If your dataset has a significant class imbalance (i.e., a much higher proportion
of non-churned customers compared to churned customers), it can affect
model performance.
o Consider techniques such as oversampling the minority class (e.g., using
SMOTE), undersampling the majority class, or assigning class weights to
balance the importance of each class during training.
o Evaluate the impact of these techniques on the model's performance and
choose the approach that yields the best results.
3. Hyperparameter Tuning:
o Fine-tune the hyperparameters of each model to optimize its performance.
o Use techniques like grid search or random search to systematically explore
different combinations of hyperparameters.
o For Logistic Regression, tune the regularization strength (e.g., C parameter)
and consider different penalty terms (L1 or L2).
o For Random Forest, experiment with the number of trees, maximum depth,
minimum samples per leaf, and other relevant hyperparameters.
o For Gradient Boosting, tune the learning rate, maximum depth, number of
boosting rounds, and regularization parameters.
o Use cross-validation to evaluate the model's performance for each
hyperparameter combination and select the best configuration.
4. Ensemble Methods:
o Explore combining multiple models to create an ensemble that leverages the
strengths of each individual model.
o Techniques like voting (majority or weighted) or stacking can be used to
aggregate the predictions of different models.
o Experiment with different combinations of models and evaluate the
performance of the ensemble compared to the individual models.
5. Model Interpretation and Feature Importance:
o Analyze the coefficients or feature importances of each model to gain insights
into the most influential predictors of churn.
o Use techniques like permutation importance or SHAP (SHapley Additive
exPlanations) to assess the impact of each feature on the model's predictions.
o Visualize the feature importances or create partial dependence plots to
understand the relationship between the features and churn probability.
o Use these insights to refine the feature selection process or guide further data
exploration and business decisions.
6. Cross-Validation and Model Stability:
o Assess the stability and generalization ability of each model using cross-
validation techniques.
o Use strategies like k-fold cross-validation or stratified k-fold cross-validation
to evaluate the model's performance across different subsets of the data.
o Analyze the variation in performance metrics across the cross-validation folds
to ensure that the model is not overfitting or underfitting.
o Consider using techniques like early stopping or model checkpointing to
prevent overfitting during training.
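
Two of the class imbalance techniques from step 2 can be sketched briefly. This Python illustration assumes binary labels and uses hypothetical data and function names; the weighting formula shown, n_samples / (n_classes * class_count), is the common "balanced" heuristic rather than the only option, and plain random oversampling stands in for SMOTE, which synthesises new minority rows instead of duplicating existing ones.

```python
import random
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are equal."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        extra = rng.choices(pool, k=target - count)
        out_rows.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_rows, out_labels

# Hypothetical training data: 8 retained customers (0) and 2 churned (1).
train_rows = [[i] for i in range(10)]
train_labels = [0] * 8 + [1] * 2

weights = balanced_class_weights(train_labels)
new_rows, new_labels = random_oversample(train_rows, train_labels)
```

Resampling should be applied to the training set only; the testing set keeps its original class balance so that the reported metrics reflect the churn rate the model will actually face.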

In your report, discuss the specific steps you took to improve each individual model. Provide
details on the feature engineering techniques applied, the handling of class imbalance,
hyperparameter tuning process, and any ensemble methods used.

Present the results of these improvement steps, comparing the performance metrics before
and after each modification. Highlight the most effective techniques for each model and
discuss the insights gained from model interpretation and feature importance analysis.

Finally, summarize the overall impact of these improvements on the models' performance and
discuss any trade-offs or limitations encountered during the optimization process.

By documenting the steps taken to improve the individual models, you demonstrate a
thorough and iterative approach to model development and showcase your ability to optimize
the models for better churn prediction performance.
