CRISP-DM framework
The CRISP-DM (Cross-Industry Standard Process for Data Mining) framework is a widely used
methodology for guiding analytics and data mining projects. Its structured approach involves
six phases: Business Understanding, Data Understanding, Data Preparation, Modeling,
Evaluation, and Deployment.
Figure: the six CRISP-DM phases, with example questions and notes for each.
Business Understanding: Optimize prices to boost revenue. Segment customers to tailor product offers. Pinpoint failure in our supply chain.
Data Understanding: What data do we have? What data do we need? Availability, quality, frequency.
Data Preparation: Time-consuming.
Modeling: The model makes calculations or predictions.
Evaluation.
Deployment.
When applying CRISP-DM to teaching business analytics, educators can structure their
curriculum around these phases to provide students with a comprehensive understanding of
the analytics process. This approach allows students to gain hands-on experience with real-
world datasets, develop critical thinking skills, and understand the iterative nature of
analytics projects.
By incorporating CRISP-DM into their teaching, educators can prepare students to
effectively tackle complex business problems using data-driven approaches, equipping them
with valuable skills for today's data-driven business environment. Additionally, students
learn how to communicate their findings and insights effectively, which is crucial for driving
informed decision-making within organizations.
Data mining and machine learning projects require a structured approach to ensure that the
process is thorough and the results are relevant and actionable. The CRISP-DM methodology,
or Cross-Industry Standard Process for Data Mining, is a widely used framework that
provides a step-by-step guide to data mining and machine learning projects. By following the
CRISP-DM process, data scientists and analysts can ensure that they are addressing the
business problem effectively and producing models that provide accurate insights.
In this blog post, we will explore the six phases of the CRISP-DM process in detail and
provide tips and insights on approaching each phase effectively. We will cover the initial
business understanding phase, data understanding, data preparation, modeling, evaluation,
and deployment. By the end of this post, you will have a thorough understanding of the
CRISP-DM methodology and how to use it to drive data mining and machine learning
projects forward. Whether you are new to data mining or an experienced data scientist, this
post will provide valuable insights and practical tips for success.
The following sections describe each of these phases in detail:
1. Business Understanding: This phase is all about understanding the business problem and
identifying how data mining can help solve it. Examples of topics to cover in this phase
include:
Identifying the business objectives and goals.
Defining the scope of the project.
Identifying the key stakeholders and their needs.
Determining the success criteria for the project.
2. Data Understanding: In this phase, data sources are identified, and data is collected and
explored. The goal is to gain a deeper understanding of the data and identify any data quality
issues. Topics to cover in this phase include:
Identifying the relevant data sources.
Collecting and exploring the data.
Assessing the quality of the data.
Identifying any missing data or outliers.
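To make the data quality checks above concrete, here is a minimal sketch using only the Python standard library. The function name profile_column, the sample values, and the choice of the 1.5 x IQR rule for outliers are assumptions made for illustration; your project may call for a different outlier definition.

```python
import statistics

def profile_column(values):
    """Summarize one numeric column: missing count and IQR-based outliers.

    `values` is a list of numbers where None marks a missing entry.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # With n=4, quantiles() returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Anything outside the fences is flagged for review, not deleted.
    outliers = [v for v in present if v < low or v > high]
    return {"rows": len(values), "missing": missing, "outliers": outliers}

# Hypothetical monthly call minutes for ten customers.
call_minutes = [120, 135, None, 128, 140, 131, 900, 125, None, 138]
report = profile_column(call_minutes)
print(report)
```

On this sample, the report counts two missing entries and flags 900 as the only outlier. Whether such a value is an error or a genuine heavy user is a question to take back to the business side.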
3. Data Preparation: In this phase, data is cleaned, transformed, and formatted to be used in
modeling. This includes selecting relevant features and dealing with missing data, outliers,
and other issues. Topics to cover in this phase include:
Cleaning and transforming the data.
Handling missing data and outliers.
Selecting relevant features.
Normalizing or standardizing the data.
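Two of the preparation steps listed above, handling missing data and standardizing, can be sketched in a few lines. This is an illustrative example, not a prescribed method: median imputation is only one of several reasonable strategies, and the helper names and sample values are hypothetical.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical monthly data usage in GB, with two missing readings.
data_usage_gb = [2.0, None, 4.0, 6.0, None, 8.0]
filled = impute_median(data_usage_gb)
scaled = standardize(filled)
print(filled)
print([round(v, 2) for v in scaled])
```

In a real project, the median, mean, and standard deviation should be computed on the training data only and then reused on new data, so that information from the test set does not leak into the model.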
4. Modeling: In this phase, various modeling techniques are applied to the prepared data.
This includes selecting appropriate algorithms and building, testing, and validating models.
Topics to cover in this phase include:
Selecting appropriate algorithms.
Building and testing models.
Validating models to ensure they meet the success criteria.
Iterating on the models to improve their accuracy.
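The build, test, and validate loop above can be illustrated with a deliberately small model. The sketch below holds out part of the data and fits a one-level decision tree (a "stump") on a single feature. The data, the helper names, and the 25% test fraction are all assumptions chosen for illustration; a real project would normally use an established library and more than one feature.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(threshold, rows):
    """Share of rows where 'x > threshold' matches the label."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)

def fit_stump(rows):
    """Try every observed x as a threshold and keep the most accurate one.

    Each row is (x, label); the stump predicts 1 when x > threshold.
    """
    best_threshold, best_score = None, -1.0
    for threshold in sorted({x for x, _ in rows}):
        score = accuracy(threshold, rows)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold

# Hypothetical rows: (support calls last month, churned? 1 = yes).
rows = [(0, 0), (1, 0), (1, 0), (2, 0), (2, 0), (3, 0),
        (4, 1), (5, 1), (5, 1), (6, 1), (7, 1), (8, 1)]

train, test = train_test_split(rows)
threshold = fit_stump(train)
train_accuracy = accuracy(threshold, train)
test_accuracy = accuracy(threshold, test)
print(threshold, train_accuracy, test_accuracy)
```

The point of the held-out rows is that the test accuracy, not the training accuracy, is what you compare against the success criteria before iterating on the model.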
5. Evaluation: In this phase, the models are evaluated and compared to determine which
model is best suited for the business problem. The evaluation criteria are based on the
business objectives. Topics to cover in this phase include:
Evaluating the models against the success criteria.
Comparing different models to determine which is best suited for the problem.
Analyzing the results of the models to gain insights.
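As a sketch of how models can be compared against a success criterion, the example below scores three sets of predictions by precision and recall, keeps only the models that reach a minimum recall, and then picks the most precise of those. The model names, the predictions, and the recall target of 0.7 are hypothetical; the right metric and threshold come from the business objectives defined in phase 1.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def select_model(y_true, predictions, min_recall):
    """Keep models meeting the recall target, then pick the most precise."""
    scored = {name: precision_recall(y_true, y_pred)
              for name, y_pred in predictions.items()}
    eligible = {name: s for name, s in scored.items() if s[1] >= min_recall}
    if not eligible:
        return None, scored
    best = max(eligible, key=lambda name: eligible[name][0])
    return best, scored

# Hypothetical held-out labels and predictions from three candidate models.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = {
    "model_a": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    "model_b": [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
    "model_c": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}

best, scored = select_model(y_true, predictions, min_recall=0.7)
print(best, scored)
```

Here model_c has perfect precision but misses most positives, so it fails the recall target. Of the two remaining models, model_b is the more precise and is selected.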
6. Deployment: In this final phase, the best model is deployed into production, and the
project results are communicated to stakeholders. This includes documenting the process and
providing recommendations for ongoing monitoring and improvement. Topics to cover in this
phase include:
Deploying the model into production.
Communicating the project results to stakeholders.
Documenting the process for future reference.
Providing recommendations for ongoing monitoring and improvement.
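Deployment and ongoing monitoring can take many forms. The sketch below shows one simple pattern under stated assumptions: the model is a single threshold, it is exported together with a baseline of the training data as a JSON string, and a basic drift check compares recent inputs against that baseline. The function names and the two-standard-deviation limit are hypothetical choices, not a standard.

```python
import json
import statistics

def export_model(threshold, training_values):
    """Serialize the model and a training-data baseline to a JSON string."""
    return json.dumps({
        "threshold": threshold,
        "baseline_mean": statistics.fmean(training_values),
        "baseline_std": statistics.pstdev(training_values),
    })

def predict(model_json, x):
    """Load the exported model and predict 1 when x exceeds the threshold."""
    model = json.loads(model_json)
    return int(x > model["threshold"])

def drift_alert(model_json, recent_values, max_shift=2.0):
    """Flag drift when the recent mean moves more than max_shift baseline
    standard deviations away from the training mean."""
    model = json.loads(model_json)
    shift = abs(statistics.fmean(recent_values) - model["baseline_mean"])
    return shift > max_shift * model["baseline_std"]

# Hypothetical training feature: support calls per customer.
training_calls = [0, 1, 1, 2, 2, 3, 4, 5]
artifact = export_model(threshold=3, training_values=training_calls)

print(predict(artifact, 5))                   # customer with 5 calls
print(drift_alert(artifact, [1, 2, 3, 2]))    # inputs similar to training
print(drift_alert(artifact, [8, 9, 7, 10]))   # inputs far from training
```

Storing the baseline next to the model is a design choice that keeps the monitoring check self-contained: whoever loads the model also has what is needed to notice when incoming data no longer resembles the training data.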
A good example is a use case where you want to predict customer churn for a telecom
company.
In the Business Understanding phase, you would identify the business objectives, such as
reducing customer churn and increasing customer satisfaction.
In the Data Understanding phase, you would collect and explore data on
customer behavior, such as call and data usage, and assess the quality of the
data.
In the Data Preparation phase, you would clean and transform the data, and
select relevant features such as call duration and data usage.
In the Modeling phase, you would build and test models to predict customer
churn, using techniques such as logistic regression or decision trees.
In the Evaluation phase, you would evaluate the models against the success
criteria, and compare different models to determine which is best suited for the
problem.
Finally, in the Deployment phase, you would deploy the best model into
production and communicate the results to stakeholders.
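Since logistic regression is one of the techniques named for the churn use case, here is a minimal sketch of it written from scratch with batch gradient descent. Everything specific in it is an assumption for illustration: the two features (support calls and data usage, both pre-scaled to the range 0 to 1), the eight customers, the learning rate, and the number of epochs. In practice you would likely use a library implementation and far more data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, learning_rate=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, label in zip(X, y):
            z = sum(w * x for w, x in zip(weights, row)) + bias
            error = sigmoid(z) - label  # gradient of log loss w.r.t. z
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        # Average the gradients over the batch and take one step.
        weights = [w - learning_rate * g / len(X)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias

def churn_probability(weights, bias, row):
    """Predicted probability that a customer churns."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)

# Hypothetical customers: [support calls (scaled), data usage (scaled)].
X = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.2, 0.9],
     [0.8, 0.2], [0.9, 0.1], [0.7, 0.3], [0.9, 0.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = churned

weights, bias = fit_logistic(X, y)
at_risk = churn_probability(weights, bias, [0.85, 0.15])
loyal = churn_probability(weights, bias, [0.15, 0.85])
print(round(at_risk, 3), round(loyal, 3))
```

In this toy data, customers with many support calls and low usage churn, so the fitted model assigns a high churn probability to the first new customer and a low one to the second. The probabilities, rather than hard yes/no labels, are what make it possible to rank customers for a retention campaign.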