
Module 3.2

The document discusses Exploratory Data Analysis (EDA) and its importance for data scientists in understanding datasets, particularly in analyzing customer churn in a subscription-based business. It outlines objectives for analyzing a simulated dataset of 200 customers, including calculating statistics, identifying patterns, and drawing conclusions about customer behavior. The session emphasizes the significance of customer retention and the factors influencing churn, such as satisfaction scores and payment issues.

Uploaded by

Squall Lionheart

3.2 E.D.A: DIGGING INTO DATASETS

Data scientists use Exploratory Data Analysis (EDA) to
analyze and examine data sets and describe their key
properties, frequently with the help of data visualization
techniques. EDA helps data scientists determine how best to
manipulate data sources to obtain the answers they require,
making it easier to detect patterns, identify anomalies,
test a hypothesis, or confirm assumptions. In this session,
we’ll get to understand it better through hands-on learning
with a simulated but close-to-real-life dataset, presented
below.

A dedicated customer base says a lot about a company or a
service. It shows not only that customers are satisfied with
the products or services, but also that the company puts a
lot of effort into building a relationship with them. Loyal
clients are more likely to stick with a business, recommend it
to their friends and coworkers, and choose it over
competitors.

It is much easier to keep existing customers than to find
new ones. This means a business should focus more on
preserving its relationships with current customers than on
attempting to draw in new ones. This is also why it is very
important to know the customer churn rate.

Customer churn is the rate at which a business loses
customers over a given period of time. A high churn rate
indicates that many customers have lost interest in buying
the products or services, for a variety of reasons, which
may point to problems with the business.
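
As a rough illustration of the definition above, one common
way to express churn as a number is the share of customers at
the start of a period who are gone by its end. The sketch
below assumes that convention. The function name churnRate
and the figures passed to it are hypothetical and are not
part of this module's dataset.

```r
# Hypothetical helper: churn rate as a fraction of the customers
# a business had at the start of the period (assumed convention).
churnRate <- function(customersAtStart, customersLost) {
  stopifnot(customersAtStart > 0,
            customersLost >= 0,
            customersLost <= customersAtStart)
  customersLost / customersAtStart
}

# Made-up figures for illustration: 200 customers at the start, 60 lost
rate <- churnRate(customersAtStart = 200, customersLost = 60)
cat("Churn rate:", rate * 100, "%\n")
```

When churn is stored as a 0/1 column, as it is later in this
module, the mean of that column gives the same fraction.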

HowTo : DataMine&AnalyzeWithR Page 114 of 131 EBBertulfo


In this session, you will examine a simple dataset and
determine why some customers abandon a service. The
provided dataset is designed to model customer behavior in
a subscription-based business. You will take a closer look
at the dataset to find possible reasons why some customers
stayed and others left the service being offered. With this
module, you will be able to execute basic data analysis
steps and hopefully master some key data exploration
approaches.

Objectives:
• Understand the structure of a dataset and explore its
contents.
• Calculate basic statistics (like averages) to summarize data.
• Use simple visualizations to identify patterns.
• Draw basic conclusions about customer behavior based on
your analysis.

The Dataset:
Let’s start by understanding the dataset that we are going
to use in this learning session. When a customer "churns,"
it means they have stopped using a product or service. This
typically raises a lot of questions for companies that value
their customers; more often than not, they want to know why
customers churn so they can address those reasons and
improve customer retention.

We will create a sample dataset that represents a
close-to-real-world situation, using a sample of 200
customers. The dataset includes the following columns:
• customerID [1-200]
• monthlyUsage: average monthly usage [in hours]
• satisfactionScore [1-10]
• subscriptionLength [in months]
• paymentStatus [0-1], where 0 means the customer missed a
subscription payment and 1 means the account is active
• churn [0-1], where 0 means the customer is still actively
using the service and 1 means the customer quit using the
service

To create this simulated dataset, run the R script below:

# Create the dataset
set.seed(123) # make sure that this dataset can be reproduced

# Start with the vector data, one vector per column
customerID <- 1:200
monthlyUsage <- round(rnorm(200, mean = 30, sd = 10), 1)
satisfactionScore <- sample(1:10, 200, replace = TRUE)
subscriptionLength <- sample(1:24, 200, replace = TRUE)
# prob gives the chance of drawing 0 and 1, in that order
paymentStatus <- sample(c(0, 1), 200, replace = TRUE, prob = c(0.2, 0.8))
churn <- sample(c(0, 1), 200, replace = TRUE, prob = c(0.7, 0.3))

# Once the vectors are all set, create the dataset using data.frame()
customerData <- data.frame(customerID, monthlyUsage, satisfactionScore,
                           subscriptionLength, paymentStatus, churn)

# Test if customerData has been created successfully
head(customerData)

Now that we have a working dataset, you can describe it
first through these guide questions. Try to write some R
scripts that answer the questions below. Doing this before
starting any deeper data analysis will allow you to
understand the dataset better, which will then lead you to a
much more comprehensive description of the data. If you make
this process a habit, it will help you efficiently build
your R scripting and data analysis skills.

• How many customers are represented in this dataset?

• What information is available for each customer?

• Are there any missing values or unusual entries in each
column? (For example, negative values or zeros in
columns where they don’t make sense.)

• What are the minimum, maximum, and range of subscription
lengths in the dataset?

• What does this tell you about the customer base?

• What percentage of customers have missed a payment
versus those who haven’t?

• What does this distribution tell you about payment
reliability?
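
One possible way to approach the guide questions above is
sketched below, using only base R functions (nrow(), names(),
colSums(), is.na(), range(), table(), and prop.table()). So
that the block runs on its own, it builds a tiny made-up data
frame called toyData with the same column names as
customerData. The name toyData and all of its values are
assumptions for illustration only; to answer the questions for
real, apply the same calls to customerData.

```r
# Hypothetical miniature stand-in for customerData (values are made up)
toyData <- data.frame(
  customerID         = 1:5,
  monthlyUsage       = c(28.5, 35.1, NA, 41.0, 19.7),
  satisfactionScore  = c(7, 3, 9, 5, 8),
  subscriptionLength = c(12, 3, 24, 6, 18),
  paymentStatus      = c(1, 0, 1, 1, 1),
  churn              = c(0, 1, 0, 0, 1)
)

# How many customers, and what information is stored for each?
nCustomers <- nrow(toyData)
columns <- names(toyData)

# Missing values per column, and a check for negative usage hours
missingPerColumn <- colSums(is.na(toyData))
negativeUsage <- sum(toyData$monthlyUsage < 0, na.rm = TRUE)

# Minimum and maximum subscription length, and the gap between them
lengthRange <- range(toyData$subscriptionLength)
lengthSpread <- diff(lengthRange)

# Percentage of customers per payment status (0 = missed payment)
paymentPct <- prop.table(table(toyData$paymentStatus)) * 100

cat("Customers:", nCustomers, "\n")
cat("Columns:", columns, "\n")
print(missingPerColumn)
cat("Negative usage entries:", negativeUsage, "\n")
cat("Subscription length min/max:", lengthRange, "spread:", lengthSpread, "\n")
print(paymentPct)
```

The functions str() and summary() give a quick overview of the
same information in a single call each.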

Once you get the hang of it, let’s dig deeper into the
dataset and determine the following:

1. Calculate average values for monthly usage and
satisfaction score to understand general customer
behavior.

# Calculate average monthly usage
avgUsage <- mean(customerData$monthlyUsage)

# Calculate average satisfaction score
avgSatisfaction <- mean(customerData$satisfactionScore)

# Display results
cat("Average Monthly Usage:", avgUsage, "\n")
cat("Average Satisfaction Score:", avgSatisfaction, "\n")

• What is the average usage time per month?

• Are customers generally satisfied (based on the average
satisfaction score)?
2. See if customers who churn (leave the service) have lower
satisfaction scores or lower monthly usage than those
who stay.

# Average usage and satisfaction for customers who churned vs. stayed
avgUsageChrnd <- mean(customerData$monthlyUsage[customerData$churn == 1])
avgUsageRtnd <- mean(customerData$monthlyUsage[customerData$churn == 0])
avgSatChrnd <- mean(customerData$satisfactionScore[customerData$churn == 1])
avgSatRtnd <- mean(customerData$satisfactionScore[customerData$churn == 0])

# Display results
cat("Average Monthly Usage (Churned):", avgUsageChrnd, "\n")
cat("Average Monthly Usage (Retained):", avgUsageRtnd, "\n")
cat("Average Satisfaction Score (Churned):", avgSatChrnd, "\n")
cat("Average Satisfaction Score (Retained):", avgSatRtnd, "\n")

• Do customers who churn tend to have lower satisfaction
scores than those who stay?

• Is monthly usage lower for customers who churn?
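
The four subset means in step 2 can also be computed in one
call. The sketch below uses base R's aggregate() with a
formula to average both columns for each churn group. To stay
runnable on its own it uses a small made-up data frame,
toyData, whose name and values are assumptions; passing
data = customerData instead would apply the same idea to the
module's dataset.

```r
# Hypothetical miniature stand-in for customerData (values are made up)
toyData <- data.frame(
  monthlyUsage      = c(30, 20, 40, 10),
  satisfactionScore = c(8, 4, 6, 2),
  churn             = c(0, 1, 0, 1)
)

# Mean of both columns for each churn status (0 = retained, 1 = churned)
groupMeans <- aggregate(cbind(monthlyUsage, satisfactionScore) ~ churn,
                        data = toyData, FUN = mean)
print(groupMeans)
```

Each row of the result corresponds to one churn status, which
makes the churned and retained averages easy to compare side
by side.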

3. See if there is a connection between payment issues and
customer churn.

# Calculate the percentage of customers with payment issues
# (0 = missed payment) by churn status
# Note: the column is named churn (lowercase); R names are case-sensitive
paymentIssuesChrnd <-
  mean(customerData$paymentStatus[customerData$churn == 1] == 0)

paymentIssuesRtnd <-
  mean(customerData$paymentStatus[customerData$churn == 0] == 0)

# Display results
cat("Percentage of Payment Issues (Churned):", paymentIssuesChrnd * 100, "%\n")
cat("Percentage of Payment Issues (Retained):", paymentIssuesRtnd * 100, "%\n")

• Are customers who leave the service more likely to have
payment issues?

• Does it seem that missed payments contribute to
customer churn?
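
Another way to look at payment issues against churn is a
cross-tabulation. The sketch below counts customers for each
combination of churn and paymentStatus with table(), then
converts each row to percentages with prop.table(). The data
frame toyData is a made-up stand-in (its name and values are
assumptions) so that the block runs on its own; the same two
calls can be applied to the columns of customerData.

```r
# Hypothetical miniature stand-in for customerData (values are made up)
toyData <- data.frame(
  paymentStatus = c(0, 1, 1, 1, 0, 0, 1, 1),
  churn         = c(1, 1, 0, 0, 1, 0, 0, 0)
)

# Rows = churn status, columns = payment status (0 = missed payment)
counts <- table(churn = toyData$churn,
                paymentStatus = toyData$paymentStatus)

# margin = 1 makes each row (each churn group) sum to 100 percent
rowPct <- prop.table(counts, margin = 1) * 100

print(counts)
print(round(rowPct, 1))
```

Reading along a row shows, for that churn group, what share
of customers missed a payment and what share did not.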

4. Use simple plots to see how monthly usage and
satisfaction score vary by churn status.

# Boxplot for Monthly Usage by Churn Status
boxplot(customerData$monthlyUsage ~ customerData$churn,
        main = "Monthly Usage by Churn Status",
        xlab = "Churn Status (0 = Retained, 1 = Churned)",
        ylab = "Monthly Usage (hours)",
        col = c("lightblue", "lightpink"))

# Boxplot for Satisfaction Score by Churn Status
boxplot(customerData$satisfactionScore ~ customerData$churn,
        main = "Satisfaction Score by Churn Status",
        xlab = "Churn Status (0 = Retained, 1 = Churned)",
        ylab = "Satisfaction Score",
        col = c("lightgreen", "lightcoral"))
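
A boxplot is drawn from a five-number summary per group:
lower whisker, lower hinge, median, upper hinge, and upper
whisker. When the picture alone is hard to read, the numbers
behind it can be inspected by calling boxplot() with
plot = FALSE. The sketch below does this on a small made-up
data frame, toyData (its name and values are assumptions), so
that it runs on its own.

```r
# Hypothetical miniature stand-in for customerData (values are made up)
toyData <- data.frame(
  monthlyUsage = c(10, 20, 30, 40, 50, 5, 15, 25, 35, 45),
  churn        = rep(c(0, 1), each = 5)
)

# plot = FALSE returns the statistics instead of drawing the plot
boxStats <- boxplot(monthlyUsage ~ churn, data = toyData, plot = FALSE)

# One column per churn group; rows run from lower to upper whisker
fiveNumbers <- boxStats$stats
colnames(fiveNumbers) <- boxStats$names
rownames(fiveNumbers) <- c("lowerWhisker", "lowerHinge", "median",
                           "upperHinge", "upperWhisker")
print(fiveNumbers)
```

The median row is usually the first thing to compare between
the retained and churned groups.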

5. Summarize your findings to answer the main question:
Why do some customers churn?

Based on your analysis, write a brief summary of the
factors that appear to influence customer churn. Consider
these questions:

• Do customers with lower usage and satisfaction scores
churn more often?

• Are payment issues more common among customers
who churn?
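
To put a number on "churn more often," one option is to
split customers into groups and compare the churn rate of
each group. The sketch below does this for satisfaction
score with tapply(). The cut-off of 5 between the "low" and
"high" bands is an assumption chosen for illustration, and
toyData is a made-up stand-in so that the block runs on its
own. Keep in mind that in the simulated customerData each
column is drawn independently with rnorm() or sample(), so
the gap between groups there may turn out to be small.

```r
# Hypothetical miniature stand-in for customerData (values are made up)
toyData <- data.frame(
  satisfactionScore = c(2, 4, 5, 7, 9, 10, 3, 8),
  churn             = c(1, 1, 0, 0, 0, 1, 1, 0)
)

# Assumed cut-off: scores of 5 or below count as "low"
satisfactionBand <- ifelse(toyData$satisfactionScore <= 5,
                           "low (1-5)", "high (6-10)")

# churn is 0/1, so its mean within a band is that band's churn rate
churnRateByBand <- tapply(toyData$churn, satisfactionBand, mean) * 100
print(churnRateByBand)
```

The same pattern works for monthly usage by replacing the
banding rule, for example with a comparison against the
average usage.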

This marks the end of this module’s learning objectives,
but please keep your data and make sure to save it, as we
will use it to prepare a document to better present our
findings.
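
One possible way to save the data is sketched below, using
base R's write.csv() for a plain-text copy and saveRDS() for
a copy that keeps R's column types exactly. The file names
and the use of tempdir() are assumptions made so that the
block runs anywhere, and toyData is a made-up stand-in; in
practice you would pass customerData and a path inside your
own project folder.

```r
# Hypothetical miniature stand-in for customerData (values are made up)
toyData <- data.frame(customerID = 1:3, churn = c(0, 1, 0))

# Assumed file locations; replace tempdir() with your project folder
csvPath <- file.path(tempdir(), "customerData.csv")
rdsPath <- file.path(tempdir(), "customerData.rds")

# CSV: readable by other tools; row.names = FALSE avoids an extra column
write.csv(toyData, csvPath, row.names = FALSE)

# RDS: R-specific format that preserves the data frame as it is
saveRDS(toyData, rdsPath)

# Read both copies back to confirm they were saved
fromCsv <- read.csv(csvPath)
fromRds <- readRDS(rdsPath)
print(fromCsv)
```

Saving the script that creates the dataset, together with
its set.seed(123) line, is another way to get the same data
back later.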
