0% found this document useful (0 votes)

27 views12 pages

SPPUML3

Machine learning lab assignment 3

Uploaded by

kanaseaditya800

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

27 views12 pages

SPPUML3

Machine learning lab assignment 3

Uploaded by

kanaseaditya800

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 12

In [39]: 1 #Name:- Kanase Aditya Madhukar

2 #Roll No.:- 2441059

3 #Batch:- D
4 #Assignment no 3

Given a bank customer, build a neural

network-based classifier that can determine
whether
they will leave or not in the next 6 months. Dataset Description: The case study is from an
open-source dataset from Kaggle. The dataset contains 10,000 sample points with 14
distinct features such as CustomerId, CreditScore, Geography, Gender, Age, Tenure,
Balance, etc. Link to the Kaggle project: https://www.kaggle.com/barelydedicated/bank-
customer-churn-modeling (https://www.kaggle.com/barelydedicated/bank-customer-churn-
modeling) Perform following steps:

1. Read the dataset.

2. Distinguish the feature and target set and divide the data set into training and test sets.
3. Normalize the train and test data.
4. Initialize and build the model. Identify the points of improvement and implement the
same.
5. Print the accuracy score and confusion matrix (5 points).

In [ ]: 1 import pandas as pd
2 import numpy as np
3 import matplotlib.pyplot as plt
4 import seaborn as sns
5 from sklearn.preprocessing import StandardScaler
6 import io

Read the Dataset

In [2]: 1 from google.colab import files
2 uploaded=files.upload()

Choose Files No file chosen

Upload widget is only available when the cell has been executed in the current browser session.
Please rerun this cell to enable.

Saving bank.csv to bank.csv

In [40]: 1 df=pd.read_csv(io.StringIO(uploaded['bank.csv'].decode('utf-8')))
2 df.head()

Out[40]: RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Bala

0 1 15634602 Hargrave 619 France Female 42 2

1 2 15647311 Hill 608 Spain Female 41 1 8380

2 3 15619304 Onio 502 France Female 42 8 15966

3 4 15701354 Boni 699 France Female 39 1

4 5 15737888 Mitchell 850 Spain Female 43 2 12551

2. Drop the Columns which are unique for all users

In [41]: 1 df=df.drop(['RowNumber','CustomerId','Surname'],axis=1)
2 df.head()

Out[41]: CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard Is

0 619 France Female 42 2 0.00 1 1

1 608 Spain Female 41 1 83807.86 1 0

2 502 France Female 42 8 159660.80 3 1

3 699 France Female 39 1 0.00 2 0

4 850 Spain Female 43 2 125510.82 1 1

In [42]: 1 df.isna().any()
2 df.isna().sum()

Out[42]: CreditScore 0
Geography 0
Gender 0
Age 0
Tenure 0
Balance 0
NumOfProducts 0
HasCrCard 0
IsActiveMember 0
EstimatedSalary 0
Exited 0
dtype: int64
BiVariate Analysis
In [43]: 1 print(df.shape)
2 df.info()

(10000, 11)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CreditScore 10000 non-null int64
1 Geography 10000 non-null object
2 Gender 10000 non-null object
3 Age 10000 non-null int64
4 Tenure 10000 non-null int64
5 Balance 10000 non-null float64
6 NumOfProducts 10000 non-null int64
7 HasCrCard 10000 non-null int64
8 IsActiveMember 10000 non-null int64
9 EstimatedSalary 10000 non-null float64
10 Exited 10000 non-null int64
dtypes: float64(2), int64(7), object(2)
memory usage: 859.5+ KB

In [44]: 1 df.describe()

Out[44]: CreditScore Age Tenure Balance NumOfProducts HasCrCar

count 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.0000

mean 650.528800 38.921800 5.012800 76485.889288 1.530200 0.7055

std 96.653299 10.487806 2.892174 62397.405202 0.581654 0.4558

min 350.000000 18.000000 0.000000 0.000000 1.000000 0.0000

25% 584.000000 32.000000 3.000000 0.000000 1.000000 0.0000

50% 652.000000 37.000000 5.000000 97198.540000 1.000000 1.0000

75% 718.000000 44.000000 7.000000 127644.240000 2.000000 1.0000

max 850.000000 92.000000 10.000000 250898.090000 4.000000 1.0000

Before performing Bivariate analysis, Lets bring all the features to the same
range
In [45]: 1 ## Scale the data
2 scaler=StandardScaler()
3 ## Extract only the Numerical Columns to perform Bivariate Analysis
4 subset=df.drop(['Geography','Gender','HasCrCard','IsActiveMember'],axis
5 scaled=scaler.fit_transform(subset)
6 scaled_df=pd.DataFrame(scaled,columns=subset.columns)
7 sns.pairplot(scaled_df,diag_kind='kde')
8

Out[45]: <seaborn.axisgrid.PairGrid at 0x7fe8126f0940>

In [46]: 1 sns.heatmap(scaled_df.corr(),annot=True,cmap='rainbow')

Out[46]: <matplotlib.axes._subplots.AxesSubplot at 0x7fe7cb4ef9b0>

From the above plots, We can see that there is no significant
Linear relationship between the features

In [47]: 1 ## Categorical Features vs Target Variable

2 sns.countplot(x='Geography',data=df,hue='Exited')
3 plt.show()
4 sns.countplot(x='Gender',data=df,hue='Exited')
5 plt.show()
6 sns.countplot(x='HasCrCard',data=df,hue='Exited')
7 plt.show()
8 sns.countplot(x='IsActiveMember',data=df,hue='Exited')
9 plt.show()
Analysing the Numerical Features relationship with the Target
variable. Here 'Exited' is the Target Feature.

In [50]: 1 subset=subset.drop('Exited',axis=1)
2 for i in subset.columns:
3 sns.boxplot(df['Exited'],df[i],hue=df['Gender'])
4 plt.show()

/usr/local/lib/python3.6/dist-packages/seaborn/_decorators.py:43: Futur
eWarning: Pass the following variables as keyword args: x, y. From vers
ion 0.12, the only valid positional argument will be `data`, and passin
g other arguments without an explicit keyword will result in an error o
r misinterpretation.
FutureWarning

Insights from Bivariate Plots

1. The Avg Credit Score seem to be almost the same for Active and Churned customers
2. Young People seem to stick to the bank compared to older people
3. The Average Bank Balance is high for Churned Customers
4. The churning rate is high with German Customers
5. The Churning rate is high among the Non-Active Members

4. Distinguish the Target and Feature Set and divide the dataset
into Training and Test sets

In [51]: 1 X=df.drop('Exited',axis=1)
2 y=df.pop('Exited')
3
In [52]: 1 from sklearn.model_selection import train_test_split
2 X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.10,rando
3 X_train,X_val,y_train,y_val=train_test_split(X_train,y_train,test_size=
4 print("X_train size is {}".format(X_train.shape[0]))
5 print("X_val size is {}".format(X_val.shape[0]))
6 print("X_test size is {}".format(X_test.shape[0]))
7
8

X_train size is 8100

X_val size is 900
X_test size is 1000

In [53]: 1 ## Standardising the train, Val and Test data

2 from sklearn.preprocessing import StandardScaler
3 scaler=StandardScaler()
4 num_cols=['CreditScore','Age','Tenure','Balance','NumOfProducts','Estim
5 num_subset=scaler.fit_transform(X_train[num_cols])
6 X_train_num_df=pd.DataFrame(num_subset,columns=num_cols)
7 X_train_num_df['Geography']=list(X_train['Geography'])
8 X_train_num_df['Gender']=list(X_train['Gender'])
9 X_train_num_df['HasCrCard']=list(X_train['HasCrCard'])
10 X_train_num_df['IsActiveMember']=list(X_train['IsActiveMember'])
11 X_train_num_df.head()
12 ## Standardise the Validation data
13 num_subset=scaler.fit_transform(X_val[num_cols])
14 X_val_num_df=pd.DataFrame(num_subset,columns=num_cols)
15 X_val_num_df['Geography']=list(X_val['Geography'])
16 X_val_num_df['Gender']=list(X_val['Gender'])
17 X_val_num_df['HasCrCard']=list(X_val['HasCrCard'])
18 X_val_num_df['IsActiveMember']=list(X_val['IsActiveMember'])
19 ## Standardise the Test data
20 num_subset=scaler.fit_transform(X_test[num_cols])
21 X_test_num_df=pd.DataFrame(num_subset,columns=num_cols)
22 X_test_num_df['Geography']=list(X_test['Geography'])
23 X_test_num_df['Gender']=list(X_test['Gender'])
24 X_test_num_df['HasCrCard']=list(X_test['HasCrCard'])
25 X_test_num_df['IsActiveMember']=list(X_test['IsActiveMember'])
26
27

In [54]: 1 ## Convert the categorical features to numerical

2 X_train_num_df=pd.get_dummies(X_train_num_df,columns=['Geography','Gend
3 X_test_num_df=pd.get_dummies(X_test_num_df,columns=['Geography','Gender
4 X_val_num_df=pd.get_dummies(X_val_num_df,columns=['Geography','Gender']
5 X_train_num_df.head()

Out[54]: CreditScore Age Tenure Balance NumOfProducts EstimatedSalary HasCrCard

0 -1.178587 -1.041960 -1.732257 0.198686 0.820905 1.560315 1

1 -0.380169 -1.326982 1.730718 -0.022020 -0.907991 -0.713592 1

2 -0.349062 1.808258 -0.693364 0.681178 0.820905 -1.126515 1

3 0.625629 2.378302 -0.347067 -1.229191 0.820905 -1.682740 1

4 -0.203895 -1.136967 1.730718 0.924256 -0.907991 1.332535 1

Initialise and build the Model

In [55]: 1 from tensorflow.keras import Sequential

2 from tensorflow.keras.layers import Dense
3
4 model=Sequential()
5 model.add(Dense(7,activation='relu'))
6 model.add(Dense(10,activation='relu'))
7 model.add(Dense(1,activation='sigmoid'))

In [56]: 1 import tensorflow as tf

2 optimizer=tf.keras.optimizers.Adam(0.01)
3 model.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['

In [57]: 1 model.fit(X_train_num_df,y_train,epochs=100,batch_size=10,verbose=1)

Epoch 1/100
810/810 [==============================] - 1s 1ms/step - loss: 0.4511 -
accuracy: 0.8054
Epoch 2/100
810/810 [==============================] - 1s 1ms/step - loss: 0.3623 -
accuracy: 0.8493
Epoch 3/100
810/810 [==============================] - 1s 1ms/step - loss: 0.3543 -
accuracy: 0.8541
Epoch 4/100
810/810 [==============================] - 1s 1ms/step - loss: 0.3433 -
accuracy: 0.8561
Epoch 5/100
810/810 [==============================] - 1s 1ms/step - loss: 0.3291 -
accuracy: 0.8692
Epoch 6/100
810/810 [==============================] - 1s 1ms/step - loss: 0.3488 -
accuracy: 0.8560
Epoch 7/100
810/810 [ ] 1s 1ms/step loss: 0 3439

Predict the Results using 0.5 threshold

In [58]: 1 y_pred_val=model.predict(X_val_num_df)
2 y_pred_val[y_pred_val>0.5]=1
3 y_pred_val[y_pred_val <0.5]=0
In [59]: 1 y_pred_val=y_pred_val.tolist()
2 X_compare_val=X_val.copy()
3 X_compare_val['y_actual']=y_val
4 X_compare_val['y_pred']=y_pred_val
5 X_compare_val.head(10)

Out[59]: CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard

340 642 Germany Female 40 6 129502.49 2 0

8622 706 Germany Male 36 9 58571.18 2 1

8401 535 Spain Male 58 1 0.00 2 1

4338 714 Spain Male 25 2 0.00 1 1

8915 606 France Male 36 1 155655.46 1 1

2624 605 Spain Female 29 3 116805.82 1 0

2234 720 France Female 38 10 0.00 2 1

349 582 France Male 39 5 0.00 2 1

3719 850 France Female 62 1 124678.35 1 1

2171 526 Germany Male 58 9 190298.89 2 1

Confusion Matrix of the Validation set

In [60]: 1 from sklearn.metrics import confusion_matrix
2 cm_val=confusion_matrix(y_val,y_pred_val)
3 cm_val

Out[60]: array([[694, 22],

[ 96, 88]])

From the above confusion matrix, Out of 900

Validation dataset observations, our model accurately
predicted 694+88=782 and made 96+22=118 incorrect
predictions.
In [61]: 1 Accuracy=782/900
2 print("Accuracy of the Model on the Validation Data set is 86.89%")

Accuracy of the Model on the Validation Data set is 86.89%

In [62]: 1 loss1,accuracy1=model.evaluate(X_train_num_df,y_train,verbose=False)
2 loss2,accuracy2=model.evaluate(X_val_num_df,y_val,verbose=False)
3 print("Train Loss {}".format(loss1))
4 print("Train Accuracy {}".format(accuracy1))
5 print("Val Loss {}".format(loss2))
6 print("Val Accuracy {}".format(accuracy2))

Train Loss 0.33421364426612854

Train Accuracy 0.8649382591247559
Val Loss 0.348032146692276
Val Accuracy 0.8688889145851135

Since our Training Accuracy and Validation Accuracy are pretty

close, we can conclude that our model generalises well. So, lets
apply the model on the Test set and make predictions and
evaluate the model against the Test.

In [63]: 1 from sklearn import metrics

2 y_pred_test=model.predict(X_test_num_df)
3 y_pred_test[y_pred_test>0.5]=1
4 y_pred_test[y_pred_test <0.5]=0
5 cm_test=metrics.confusion_matrix(y_test,y_pred_test)
6 cm_test
7 print("Test Confusion Matrix")
8

Test Confusion Matrix

In [64]: 1 cm_test

Out[64]: array([[756, 38],

[121, 85]])

In [65]: 1 loss3,accuracy3=model.evaluate(X_test_num_df,y_test,verbose=False)
2 print("Test Accuracy is {}".format(accuracy3))
3 print("Test loss is {}".format(loss3))

Test Accuracy is 0.8410000205039978

Test loss is 0.38615888357162476

In [ ]: 1

Supervised Decision Trees A Case Study For AllLife Bank
No ratings yet
Supervised Decision Trees A Case Study For AllLife Bank
50 pages
Data Mining for Business Insights
83% (12)
Data Mining for Business Insights
34 pages
Data Mining - Project
100% (2)
Data Mining - Project
11 pages
DAV Lab Manual Yashraj
No ratings yet
DAV Lab Manual Yashraj
28 pages
Churn Analysis of Bank Customers
100% (1)
Churn Analysis of Bank Customers
12 pages
Credit Scores Classification
No ratings yet
Credit Scores Classification
104 pages
Total Quality Management Exam - Prelim
100% (2)
Total Quality Management Exam - Prelim
4 pages
Capstone Project
No ratings yet
Capstone Project
33 pages
Bank Customer Churn Analysis - Jupyter Notebook
No ratings yet
Bank Customer Churn Analysis - Jupyter Notebook
11 pages
Customer Bank Churn Analysis: Non Competetive Part
No ratings yet
Customer Bank Churn Analysis: Non Competetive Part
33 pages
Customer Churn Analysis 1740361695
No ratings yet
Customer Churn Analysis 1740361695
14 pages
SLAC-Proposal-May 19, 2023
No ratings yet
SLAC-Proposal-May 19, 2023
16 pages
Churn Prediction Model
No ratings yet
Churn Prediction Model
36 pages
Kunal Assignment 3
No ratings yet
Kunal Assignment 3
19 pages
#Group: B (ML) : Numpy NP Pandas PD
No ratings yet
#Group: B (ML) : Numpy NP Pandas PD
9 pages
Healthcare Insurance Prediction Main
No ratings yet
Healthcare Insurance Prediction Main
74 pages
Bank Customer Churn Prediction Guide
No ratings yet
Bank Customer Churn Prediction Guide
14 pages
BankX Marketing 1744722258
No ratings yet
BankX Marketing 1744722258
29 pages
Professional Education Test
No ratings yet
Professional Education Test
7 pages
Kunal DA-12 Assignment-4
No ratings yet
Kunal DA-12 Assignment-4
26 pages
Ensemble Techniques Project
100% (2)
Ensemble Techniques Project
28 pages
Ml3.ipynb - Colab
No ratings yet
Ml3.ipynb - Colab
11 pages
DSC Project 442
No ratings yet
DSC Project 442
12 pages
Exp 81
No ratings yet
Exp 81
7 pages
Observation: Import As Import As Import As Import As
No ratings yet
Observation: Import As Import As Import As Import As
31 pages
Dsbda Exp4 Part1
No ratings yet
Dsbda Exp4 Part1
39 pages
Credit Card Approval Prediction
No ratings yet
Credit Card Approval Prediction
90 pages
Exp 8 - LM
No ratings yet
Exp 8 - LM
10 pages
Mlproj
No ratings yet
Mlproj
49 pages
Deep Learning Practical File
No ratings yet
Deep Learning Practical File
18 pages
Building Logistic Regression Model in Python
No ratings yet
Building Logistic Regression Model in Python
24 pages
Telco Churn Analysis
No ratings yet
Telco Churn Analysis
9 pages
Week 4 LAB
No ratings yet
Week 4 LAB
26 pages
EDA Assignment 8 Bank Bivariate Multivariate Analysis A
No ratings yet
EDA Assignment 8 Bank Bivariate Multivariate Analysis A
12 pages
Naive Bayes Vs Logistic Regression
No ratings yet
Naive Bayes Vs Logistic Regression
16 pages
Sunbase Data Assignment
No ratings yet
Sunbase Data Assignment
11 pages
Customer Churn Prediction
No ratings yet
Customer Churn Prediction
16 pages
LDA CreditCardDefault Code N
No ratings yet
LDA CreditCardDefault Code N
11 pages
Machine Learning Credit Rating Model
No ratings yet
Machine Learning Credit Rating Model
12 pages
Churn For Bank Customers
No ratings yet
Churn For Bank Customers
28 pages
Report
No ratings yet
Report
17 pages
FRA Business Report
100% (1)
FRA Business Report
21 pages
Kritika Sejwal 24MCI10023 ML Lab Project Report
No ratings yet
Kritika Sejwal 24MCI10023 ML Lab Project Report
10 pages
Predictive Modelling For Customer's Credit Worthiness: 1. Data Exploration Insights
No ratings yet
Predictive Modelling For Customer's Credit Worthiness: 1. Data Exploration Insights
3 pages
Predicting Credit Risk 1713295035
No ratings yet
Predicting Credit Risk 1713295035
19 pages
Customer Churn Model Analysis
No ratings yet
Customer Churn Model Analysis
2 pages
Germany Credit Analysis
No ratings yet
Germany Credit Analysis
41 pages
'Universalbank - CSV': #Reading The File
No ratings yet
'Universalbank - CSV': #Reading The File
4 pages
Bank Customer Segmentation Guide
No ratings yet
Bank Customer Segmentation Guide
53 pages
Common and Proper Nouns Lesson Plan
80% (5)
Common and Proper Nouns Lesson Plan
4 pages
1 - Understanding - The - Problem - and - The - Data - Ipynb - Colaboratory
No ratings yet
1 - Understanding - The - Problem - and - The - Data - Ipynb - Colaboratory
9 pages
Final Project Making Predictions From Data-Course 2: October 6, 2020
No ratings yet
Final Project Making Predictions From Data-Course 2: October 6, 2020
20 pages
IT Professional Resume
No ratings yet
IT Professional Resume
2 pages
Data Analysis in The Banking Sector: Pandas Fundamentals
No ratings yet
Data Analysis in The Banking Sector: Pandas Fundamentals
16 pages
Background in A Research Paper
No ratings yet
Background in A Research Paper
2 pages
Predictionof Customer Churnin Banking Industry
No ratings yet
Predictionof Customer Churnin Banking Industry
16 pages
Research Questionnaire Direction: Kindly Put A Check Mark or Fill Out The Space Provided Before That Corresponds
No ratings yet
Research Questionnaire Direction: Kindly Put A Check Mark or Fill Out The Space Provided Before That Corresponds
4 pages
Zindi Financial Inclusion Guide
No ratings yet
Zindi Financial Inclusion Guide
12 pages
Lesson 1&2
No ratings yet
Lesson 1&2
6 pages
Classification - Bank - Marketing - Dataset - Jupyter Notebook
No ratings yet
Classification - Bank - Marketing - Dataset - Jupyter Notebook
23 pages
Progress Report 2
No ratings yet
Progress Report 2
10 pages
Newsflash December 2012 FINAL
No ratings yet
Newsflash December 2012 FINAL
60 pages
Bank Marketing Ingles
No ratings yet
Bank Marketing Ingles
37 pages
MDSK Atc Revit MEP Essentials
No ratings yet
MDSK Atc Revit MEP Essentials
2 pages
Quadexp IDS Project
No ratings yet
Quadexp IDS Project
22 pages
Practical 3
No ratings yet
Practical 3
8 pages
Complete Name: End of The Program Evaluation
No ratings yet
Complete Name: End of The Program Evaluation
4 pages
Final Synthesis Project
No ratings yet
Final Synthesis Project
8 pages
Unit of Work
No ratings yet
Unit of Work
23 pages
ESC 17 - Module 2 Quiz
No ratings yet
ESC 17 - Module 2 Quiz
2 pages
ALS A E Reviewer Mock Test LS4 Life Skills
No ratings yet
ALS A E Reviewer Mock Test LS4 Life Skills
31 pages
Group 1 5b Report
No ratings yet
Group 1 5b Report
10 pages
Vadnana Luthra Orignal
No ratings yet
Vadnana Luthra Orignal
11 pages
Shadowing Technique Boosts Pronunciation
No ratings yet
Shadowing Technique Boosts Pronunciation
20 pages
Arithmatic Circuit
No ratings yet
Arithmatic Circuit
7 pages
Group 5 Dseb64a Report
No ratings yet
Group 5 Dseb64a Report
10 pages
Intermediate Relational Database Certificate
No ratings yet
Intermediate Relational Database Certificate
1 page
DSP Final Exam FS 2022
No ratings yet
DSP Final Exam FS 2022
1 page
School Fee Structure 2024
No ratings yet
School Fee Structure 2024
1 page
Cognitive Learning Strategies Guide
No ratings yet
Cognitive Learning Strategies Guide
1 page
Concept Note - HEALTH WORKERS' DAY CELEBRATION HONORING DEDICATION AND ENHANCING WELL-BEING
No ratings yet
Concept Note - HEALTH WORKERS' DAY CELEBRATION HONORING DEDICATION AND ENHANCING WELL-BEING
4 pages
Exam Results
No ratings yet
Exam Results
2 pages
Entry-Level Sales Hiring Platform
No ratings yet
Entry-Level Sales Hiring Platform
7 pages
Racism in The Ugly Duckling
No ratings yet
Racism in The Ugly Duckling
4 pages
Matrix On Strategic Plan For Customs Development (SPCD)
No ratings yet
Matrix On Strategic Plan For Customs Development (SPCD)
4 pages
CRM for Retail Efficiency
No ratings yet
CRM for Retail Efficiency
80 pages
Hjorland (2002) Análisis de Dominio en La CI. 11 Enfoques
No ratings yet
Hjorland (2002) Análisis de Dominio en La CI. 11 Enfoques
13 pages
Alumni Admissions Essays
No ratings yet
Alumni Admissions Essays
10 pages