import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv('telecom churn.csv')
data
Output: the full DataFrame, 7043 rows × 38 columns (Customer ID, Gender, Age, Married, Number of Dependents, City, Zip Code, Latitude, Longitude, ...)
data.head(10)
Output: the first 10 rows of the DataFrame (10 rows × 38 columns)
data.tail(10)
Output: the last 10 rows of the DataFrame, index 7033 to 7042 (10 rows × 38 columns)
Find the shape of the dataset
data.shape
(7043, 38)
print("Number of Rows", data.shape[0])
print("Number of Columns", data.shape[1])
Number of Rows 7043
Number of Columns 38
data.info()
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 38 columns):
 #   Column                             Non-Null Count  Dtype
 0   Customer ID                        7043 non-null   object
 1   Gender                             7043 non-null   object
 2   Age                                7043 non-null   int64
 3   Married                            7043 non-null   object
 4   Number of Dependents               7043 non-null   int64
 5   City                               7043 non-null   object
 6   Zip Code                           7043 non-null   int64
 7   Latitude                           7043 non-null   float64
 8   Longitude                          7043 non-null   float64
 9   Number of Referrals                7043 non-null   int64
 10  Tenure in Months                   7043 non-null   int64
 11  Offer                              7043 non-null   object
 12  Phone Service                      7043 non-null   object
 13  Avg Monthly Long Distance Charges  6361 non-null   float64
 14  Multiple Lines                     6361 non-null   object
 15  Internet Service                   7043 non-null   object
 16  Internet Type                      5517 non-null   object
 17  Avg Monthly GB Download            5517 non-null   float64
 18  Online Security                    5517 non-null   object
 19  Online Backup                      5517 non-null   object
 20  Device Protection Plan             5517 non-null   object
 21  Premium Tech Support               5517 non-null   object
 22  Streaming TV                       5517 non-null   object
 23  Streaming Movies                   5517 non-null   object
 24  Streaming Music                    5517 non-null   object
 25  Unlimited Data                     5517 non-null   object
 26  Contract                           7043 non-null   object
 27  Paperless Billing                  7043 non-null   object
 28  Payment Method                     7043 non-null   object
 29  Monthly Charge                     7043 non-null   float64
 30  Total Charges                      7043 non-null   float64
 31  Total Refunds                      7043 non-null   float64
 32  Total Extra Data Charges           7043 non-null   int64
 33  Total Long Distance Charges        7043 non-null   float64
 34  Total Revenue                      7043 non-null   float64
 35  Customer Status                    7043 non-null   object
 36  Churn Category                     1869 non-null   object
 37  Churn Reason                       1869 non-null   object
dtypes: float64(9), int64(6), object(23)
memory usage: 2.0+ MB
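The non-null counts above show that several columns have gaps (for example, Churn Category has 1869 non-null values out of 7043). A minimal sketch that turns `isnull()` into a per-column table of missing counts and percentages follows; `missing_summary` is a hypothetical helper name, and the four-row DataFrame is made up to stand in for `data`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values for each column that has any."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Drop fully populated columns and list the emptiest column first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical stand-in for `data` (not real customers)
sample = pd.DataFrame({
    "Customer ID": ["A", "B", "C", "D"],
    "Internet Type": ["Fiber Optic", np.nan, "DSL", np.nan],
    "Churn Category": [np.nan, "Competitor", np.nan, np.nan],
})

print(missing_summary(sample))
```

Called as `missing_summary(data)`, it should list exactly the columns whose non-null count in `data.info()` is below 7043.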
data.columns
Index(['Customer ID', 'Gender', 'Age', 'Married', 'Number of Dependents',
       'City', 'Zip Code', 'Latitude', 'Longitude', 'Number of Referrals',
       'Tenure in Months', 'Offer', 'Phone Service',
       'Avg Monthly Long Distance Charges', 'Multiple Lines',
       'Internet Service', 'Internet Type', 'Avg Monthly GB Download',
       'Online Security', 'Online Backup', 'Device Protection Plan',
       'Premium Tech Support', 'Streaming TV', 'Streaming Movies',
       'Streaming Music', 'Unlimited Data', 'Contract', 'Paperless Billing',
       'Payment Method', 'Monthly Charge', 'Total Charges', 'Total Refunds',
       'Total Extra Data Charges', 'Total Long Distance Charges',
       'Total Revenue', 'Customer Status', 'Churn Category', 'Churn Reason'],
      dtype='object')
Descriptive Statistics
print(data.describe())
Output (transposed and rounded to two decimals; "…" marks an illegible value):
                                   count      mean      std       min       25%       50%       75%       max
Age                                 7043     46.51    16.75     19.00     32.00     46.00     60.00     80.00
Number of Dependents                7043      0.47     0.96      0.00      0.00      0.00      0.00      9.00
Zip Code                            7043  93486.07  1856.77  90001.00  92101.00  93518.00  95329.00  96150.00
Latitude                            7043     36.20     2.47     32.56     33.99     36.21     38.16     41.96
Longitude                           7043   -119.76     2.15   -124.30   -121.79   -119.60   -117.97   -114.19
Number of Referrals                 7043      1.95     3.00      0.00      0.00         …      3.00     11.00
Tenure in Months                    7043     32.39    24.54      1.00      9.00     29.00     55.00     72.00
Avg Monthly Long Distance Charges   6361     25.42    14.20      1.01     13.05     25.69     37.68     49.99
Avg Monthly GB Download             5517     26.19    19.59      2.00     13.00     21.00     30.00     85.00
Monthly Charge                      7043     63.60    31.20    -10.00     30.40         …     89.75    118.75
Total Charges                       7043   2280.38  2266.23     18.80    400.15   1394.55   3786.60   8684.80
Total Refunds                       7043      1.96     7.90      0.00      0.00      0.00      0.00     49.79
Total Extra Data Charges            7043      6.86    25.10      0.00      0.00      0.00      0.00    150.00
Total Long Distance Charges         7043    749.10   846.66      0.00     70.55    401.44   1191.16   3564.72
Total Revenue                       7043   3034.38  2865.20     21.36    605.61   2108.64   4801.15  11979.34
data.isnull().sum()  # check null values
Customer ID                             0
Gender                                  0
Age                                     0
Married                                 0
Number of Dependents                    0
City                                    0
Zip Code                                0
Latitude                                0
Longitude                               0
Number of Referrals                     0
Tenure in Months                        0
Offer                                   0
Phone Service                           0
Avg Monthly Long Distance Charges     682
Multiple Lines                        682
Internet Service                        0
Internet Type                        1526
Avg Monthly GB Download              1526
Online Security                      1526
Online Backup                        1526
Device Protection Plan               1526
Premium Tech Support                 1526
Streaming TV                         1526
Streaming Movies                     1526
Streaming Music                      1526
Unlimited Data                       1526
Contract                                0
Paperless Billing                       0
Payment Method                          0
Monthly Charge                          0
Total Charges                           0
Total Refunds                           0
Total Extra Data Charges                0
Total Long Distance Charges             0
Total Revenue                           0
Customer Status                         0
Churn Category                       5174
Churn Reason                         5174
dtype: int64
sns.heatmap(data.isnull())  # heatmap visual for null values
[Figure: heatmap of data.isnull(), one column per feature (Zip Code, Longitude, Tenure in Months, Online Security, Multiple Lines, ..., Churn Category), colour bar from 0.0 to 1.0]
data['Churn Category'].isnull()
Output: a boolean Series that is True wherever Churn Category is missing
Name: Churn Category, Length: 7043, dtype: bool
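Churn Category is filled in for only 1869 of the 7043 rows. One way to check which customers those gaps belong to is to cross-tabulate the missing-value mask against Customer Status with `pd.crosstab`. The sketch below runs on hypothetical rows; the status label "Stayed" is an assumption standing in for non-churned customers.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; "Stayed" is an assumed label for non-churned customers
sample = pd.DataFrame({
    "Customer Status": ["Churned", "Stayed", "Churned", "Stayed", "Stayed"],
    "Churn Category": ["Competitor", np.nan, "Price", np.nan, np.nan],
})

# Label each row by whether its churn category is present or missing
category_state = sample["Churn Category"].isnull().map({True: "missing", False: "present"})

# Rows: customer status; columns: present/missing; cells: number of customers
table = pd.crosstab(sample["Customer Status"], category_state)
print(table)
```

If, on the real data, every missing value falls under a non-churned status, the gaps are structural (there is no churn reason to record) rather than data-entry errors.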
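# Remove missing values
# Caution: Churn Category and Churn Reason are empty for customers who did not churn,
# so this drop keeps only churned customers
data.dropna(how="any", inplace=True)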
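`dropna(how="any")` removes every row with at least one missing value. Because the churn columns are empty for non-churned customers, this step also discards all of them, which is why the later `value_counts()` shows only Churned. If the analysis should compare churned and non-churned customers, one alternative is to fill the structural gaps with explicit labels. A minimal sketch, assuming that an empty service column means "no such service"; the fill labels "No Internet" and "Not Churned" and the status "Stayed" are hypothetical choices:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the telecom data; "Stayed" is an assumed status label
sample = pd.DataFrame({
    "Customer Status": ["Churned", "Stayed", "Stayed"],
    "Internet Type": ["DSL", np.nan, "Fiber Optic"],
    "Avg Monthly GB Download": [12.0, np.nan, 30.0],
    "Churn Category": ["Competitor", np.nan, np.nan],
})

cleaned = sample.copy()
# Assumption: an empty internet column means the customer has no internet service
cleaned["Internet Type"] = cleaned["Internet Type"].fillna("No Internet")
cleaned["Avg Monthly GB Download"] = cleaned["Avg Monthly GB Download"].fillna(0)
# Churn details only apply to churned customers, so label the others explicitly
cleaned["Churn Category"] = cleaned["Churn Category"].fillna("Not Churned")

print("rows kept by dropna:", len(sample.dropna(how="any")))
print("rows kept after filling:", len(cleaned))
```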
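# Check for duplicate rows and remove any that exist
dup = data.duplicated().any()
print("Duplicates found:", dup)
data.drop_duplicates(inplace=True)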
data.describe(include='all')
Output (11 rows × 38 columns; the first eight columns, transposed and rounded to two decimals; "…" marks an illegible value):
                      count  unique  top        freq      mean      std       min       25%       50%       75%       max
Customer ID            1586    1586  …             1       NaN      NaN       NaN       NaN       NaN       NaN       NaN
Gender                 1586       2  Female      803       NaN      NaN       NaN       NaN       NaN       NaN       NaN
Age                    1586     NaN  NaN         NaN     50.17    17.68     19.00     35.00     50.00     65.75     80.00
Married                1586       2  No         1007       NaN      NaN       NaN       NaN       NaN       NaN       NaN
Number of Dependents   1586     NaN  NaN         NaN      0.09     0.46      0.00      0.00      0.00      0.00      5.00
City                   1586     598  San Diego   161       NaN      NaN       NaN       NaN       NaN       NaN       NaN
Zip Code               1586     NaN  NaN         NaN  93433.36  1830.00  90001.00  92117.00  93278.50  95309.00  96150.00
Latitude               1586     NaN  NaN         NaN     36.01     2.51     32.56     33.87     35.36     38.07     41.96
data['Customer ID'].unique()
array(['0004-TLHLJ', '0011-IGKFF', '0013-EXCHZ', ..., '9965-YOKZB', '9985-MAVIX', '9992-RRAMN'], dtype=object)
data['Customer Status'].value_counts()
Churned    1586
Name: Customer Status, dtype: int64
data[['Tenure in Months', 'Monthly Charge', 'Total Revenue']].describe().T
Output ("…" marks an illegible value):
                   count         mean          std    min     25%       50%        75%       max
Tenure in Months  1586.0    18.902900    19.759294   1.00    3.00    11.000    31.0000     72.00
Monthly Charge    1586.0    81.108670    19.814937      …   71.00    83.925    95.3500    118.35
Total Revenue     1586.0  2221.295648  2543.063161  46.92  260.28  1142.350  3414.0875  11195.44
plt.figure(figsize=(6, 6))
data['Tenure in Months'].hist()
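The `describe().T` summary above covers churned customers only, because the earlier `dropna` kept no other customers. With the non-churned rows retained, `groupby` puts the same measures side by side for each status. A sketch on hypothetical rows, again assuming "Stayed" as the non-churn label:

```python
import pandas as pd

# Hypothetical rows; "Stayed" is an assumed label for non-churned customers
sample = pd.DataFrame({
    "Customer Status": ["Churned", "Churned", "Stayed", "Stayed"],
    "Tenure in Months": [2, 10, 40, 60],
    "Monthly Charge": [90.0, 80.0, 50.0, 60.0],
})

# Median of each numeric column within each customer status
medians = sample.groupby("Customer Status")[["Tenure in Months", "Monthly Charge"]].median()
print(medians)
```

Swapping `.median()` for `.describe()` gives the full set of statistics per status.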
plt.figure(figsize=(6, 6))
data['Monthly Charge'].hist()
plt.show()
plt.figure(figsize=(6, 6))
data['Total Revenue'].hist()
plt.show()
[Figure: histogram of Total Revenue]
UNIVARIATE ANALYSIS
In this analysis, we observe how various features influence customer churn.
data['Customer Status'].describe()  # distribution of customer status
count        1586
unique          1
top       Churned
freq         1586
Name: Customer Status, dtype: object
data['Churn Category'].hist()  # histogram chart for Churn Category
[Figure: histogram of Churn Category with bars for Competitor, Dissatisfaction, Other, Attitude and Price]
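A histogram is designed for numeric values. For a categorical column such as Churn Category, counting each label with `value_counts()` gives one exact total per category, which can then be drawn as a bar chart. A sketch with made-up categories:

```python
import pandas as pd

# Hypothetical churn categories for seven churned customers
churn_category = pd.Series(
    ["Competitor", "Competitor", "Dissatisfaction", "Price",
     "Competitor", "Attitude", "Other"],
    name="Churn Category",
)

# One count per category, largest first
counts = churn_category.value_counts()
print(counts)

# With matplotlib available, counts.plot(kind="bar") draws one bar per category.
```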
Customer churn by gender
plt.figure(figsize=(7, 7))
sns.countplot(x='Customer Status', hue='Gender', data=data)
plt.show()
[Figure: count plot of Customer Status (Churned) split by Gender]
From the distribution above, churned customers are split almost evenly between female (803) and male (783) customers, so there is only a slight difference between the genders in churning.
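Because rows without a churn category were dropped earlier, the plot above compares numbers of churned customers rather than churn rates. A churn rate per gender also needs the customers who stayed; the sketch below computes it with `pd.crosstab(..., normalize="index")` on a small hypothetical sample in which "Stayed" is an assumed label for non-churned customers.

```python
import pandas as pd

# Hypothetical rows; "Stayed" is an assumed label for non-churned customers
sample = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"],
    "Customer Status": ["Churned", "Stayed", "Stayed", "Stayed",
                        "Churned", "Churned", "Stayed", "Stayed"],
})

# Share of each status within each gender (each row sums to 1)
rates = pd.crosstab(sample["Gender"], sample["Customer Status"], normalize="index")
print(rates)
```

Here the two genders have equal-sized groups, yet the churn rates differ (0.25 vs 0.50), which raw counts of churned customers alone would not reveal.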
plt.figure(figsize=(10, 7))
sns.countplot(x='Customer Status', hue='Churn Category', data=data)
plt.show()
[Figure: count plot of Customer Status split by Churn Category]
[Figure: Total Revenue (y-axis) against Monthly Charge (x-axis)]
# Sample data for Online Backup and Online Payment
categories = ['Online Backup', 'Online Payment']
values = [7, 30]  # You can adjust these values based on your data
# The second colour was cut off in the source; 'lightgreen' is a placeholder
plt.pie(values, labels=categories, autopct='%1.1f%%', startangle=90, colors=['skyblue', 'lightgreen'])
plt.axis('equal')
plt.title('Usage Distribution: Online Backup vs. Online Payment')
plt.show()
[Figure: pie chart "Usage Distribution: Online Backup vs. Online Payment"]
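The pie chart above uses hard-coded sample values (`[7, 30]`), so it does not reflect the dataset, and "Online Payment" is not one of the 38 columns. A sketch that derives the shares for the Online Backup column from the data instead; the values below are hypothetical, and the NaN mimics the fact that the real column has only 5517 non-null values.

```python
import numpy as np
import pandas as pd

# Hypothetical Online Backup answers; NaN mimics the column's missing values
online_backup = pd.Series(["Yes", "No", "No", np.nan, "Yes", "No"], name="Online Backup")

# Percentage of each answer among non-missing values (value_counts drops NaN by default)
shares = online_backup.value_counts(normalize=True) * 100
print(shares.round(1))

# With matplotlib available:
# plt.pie(shares, labels=shares.index, autopct="%1.1f%%", startangle=90)
```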
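# numeric_only=True restricts the correlation to numeric columns
# (newer pandas versions raise an error on text columns otherwise)
corr_matrix = data.corr(numeric_only=True)
# Plotting a heatmap of the correlation matrix
plt.figure(figsize=(12, 10))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Matrix')
plt.show()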
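With 15 numeric columns, the annotated heatmap is dense. A small helper that lists the most strongly correlated column pairs can make it easier to read. `top_correlations` is a hypothetical name and the sample data is made up; the helper reads only the upper triangle of the matrix so each pair appears once.

```python
import numpy as np
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = df.corr(numeric_only=True)
    # Keep only the upper triangle so each pair appears once and self-pairs are dropped
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Hypothetical sample: revenue grows with tenure, age is roughly unrelated
sample = pd.DataFrame({
    "Tenure in Months": [1, 12, 24, 36, 48],
    "Total Revenue": [50.0, 900.0, 1800.0, 2700.0, 3600.0],
    "Age": [30, 62, 41, 25, 55],
    "City": ["A", "B", "C", "D", "E"],  # non-numeric, skipped by numeric_only=True
})

print(top_correlations(sample, n=3))
```

On the real data, `top_correlations(data, n=10)` would list the ten strongest pairs from the same matrix the heatmap draws.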