Lecture 7 & 8 Data Mining

Data mining is the process of discovering patterns and correlations in large data sets to predict outcomes, which can help businesses increase revenues and improve customer relationships. It has various applications across industries such as retail, banking, and healthcare, utilizing methods like CRISP-DM and SEMMA for systematic project execution. Common myths and mistakes in data mining include misconceptions about its capabilities and the importance of thorough data preparation.

Uploaded by

Adarsh Singhania

Data mining

We will cover
• Data mining concepts and applications
• Process
• Methods
• Data mining myths and blunders
Meaning
• The process of digging through data to discover hidden connections and predict future trends
• Data mining is the process of finding patterns and correlations within large data sets to predict outcomes. With a broad range of techniques, businesses can use this information to increase revenues, cut costs, improve customer relationships, reduce risks, and more.
Example of Data Mining
• Grocery stores are well-known users of data mining techniques. Many supermarkets offer free loyalty cards that give customers access to reduced prices not available to non-members. The cards make it easy for stores to track who is buying what, when they are buying it, and at what price. After analyzing this data, stores can offer customers coupons targeted to their buying habits and decide when to put items on sale or sell them at full price.
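
As a minimal sketch of the loyalty-card idea above, the snippet below chooses a coupon for each customer from purchase history. The transaction records, the card IDs, and the function name `top_item_per_customer` are hypothetical and made up for illustration; a real retailer would work with far richer data and methods.

```python
from collections import Counter, defaultdict

# Hypothetical loyalty-card records as (card_id, item) pairs. Illustrative data only.
TRANSACTIONS = [
    ("card-1", "coffee"),
    ("card-1", "coffee"),
    ("card-1", "milk"),
    ("card-2", "diapers"),
    ("card-2", "diapers"),
    ("card-2", "diapers"),
    ("card-2", "milk"),
]


def top_item_per_customer(transactions):
    """Return each card's most frequently purchased item."""
    counts = defaultdict(Counter)
    for card_id, item in transactions:
        counts[card_id][item] += 1
    # most_common(1) returns a one-element list holding an (item, count) pair.
    return {card: c.most_common(1)[0][0] for card, c in counts.items()}


coupons = top_item_per_customer(TRANSACTIONS)
print(coupons)  # {'card-1': 'coffee', 'card-2': 'diapers'}
```

Counting purchases per card is the simplest possible "buying habit"; it stands in here for the pattern discovery the slide describes.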
Applications
Data Mining Applications
• Customer Relationship Management
– Maximize return on marketing campaigns
– Improve customer retention (churn analysis)
– Maximize customer value (cross-, up-selling)
– Identify and treat most valued customers

• Banking and Other Financial Services
– Automate the loan application process
– Detect fraudulent transactions
– Maximize customer value (cross-, up-selling)
– Optimize cash reserves with forecasting
Data Mining Applications (cont.)
• Retailing and Logistics
– Optimize inventory levels at different locations
– Improve the store layout and sales promotions
– Optimize logistics by predicting seasonal effects
– Minimize losses due to limited shelf life

• Manufacturing and Maintenance

– Predict/prevent machinery failures
– Identify anomalies in production systems to optimize the use of manufacturing capacity
– Discover novel patterns to improve product quality
Data Mining Applications
• Brokerage and Securities Trading
– Predict changes in certain bond prices
– Forecast the direction of stock fluctuations
– Assess the effect of events on market movements
– Identify and prevent fraudulent activities in trading

• Insurance
– Forecast claim costs for better business planning
– Determine optimal rate plans
– Optimize marketing to specific customers
– Identify and prevent fraudulent claim activities
Data Mining Applications (cont.)
Highly popular application areas for data mining:
• Computer hardware and software
• Science and engineering
• Government and defense
• Homeland security and law enforcement
• Travel industry
• Healthcare
• Medicine
• Entertainment industry
• Sports
• Etc.
Process
Data Mining Process
• A manifestation of best practices
• A systematic way to conduct DM projects
• Different groups have different versions
• Most common standard processes:
– CRISP-DM (Cross-Industry Standard Process for
Data Mining)
– SEMMA (Sample, Explore, Modify, Model, and
Assess)
– KDD (Knowledge Discovery in Databases)
Data Mining Process: CRISP-DM

[Figure: the CRISP-DM cycle. Six numbered steps arranged around the data sources: 1 Business Understanding, 2 Data Understanding, 3 Data Preparation, 4 Model Building, 5 Testing and Evaluation, 6 Deployment.]
Data Mining Process: CRISP-DM
Step 1: Business Understanding
Step 2: Data Understanding
Step 3: Data Preparation (!)
(Steps 1 to 3 account for ~85% of total project time)
Step 4: Model Building
Step 5: Testing and Evaluation
Step 6: Deployment
• The process is highly repetitive and experimental (DM: art versus science?)
Data Preparation – A Critical DM Task
Real-world Data → Data Consolidation → Data Cleaning → Data Transformation → Data Reduction → Well-formed Data
• Data Consolidation
– Collect data
– Select data
– Integrate data
• Data Cleaning
– Impute missing values
– Reduce noise in data
– Eliminate inconsistencies
• Data Transformation
– Normalize data
– Discretize/aggregate data
– Construct new attributes
• Data Reduction
– Reduce number of variables
– Reduce number of cases
– Balance skewed data
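
Three of the preparation tasks listed above (imputing missing values, normalizing, and discretizing) can be sketched in a few lines of Python. This is an illustrative sketch only: the `ages` data, the cutoff of 35, and all function names are assumptions, and mean imputation and min-max scaling are just one common choice for each task.

```python
from statistics import mean

# Hypothetical raw attribute; None marks a missing value. Illustrative data only.
ages = [20, 30, None, 40]


def impute_mean(values):
    """Data cleaning: replace missing values with the mean of the known ones."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid dividing by zero when all values are equal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, cutoff):
    """Data transformation: turn a numeric attribute into two labeled bins."""
    return ["low" if v < cutoff else "high" for v in values]


cleaned = impute_mean(ages)                 # [20, 30, 30, 40]
scaled = min_max_normalize(cleaned)         # [0.0, 0.5, 0.5, 1.0]
bins = discretize(cleaned, cutoff=35)       # ['low', 'low', 'low', 'high']
print(cleaned, scaled, bins)
```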
Data Mining Process: SEMMA
[Figure: the SEMMA cycle: Sample → Explore → Modify → Model → Assess, then back to Sample.]
• Sample: Generate a representative sample of the data
• Explore: Visualization and basic description of the data
• Modify: Select variables, transform variable representations
• Model: Use a variety of statistical and machine learning models
• Assess: Evaluate the accuracy and usefulness of the models
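
The five SEMMA steps can be traced in a toy Python walk-through. Everything here is an assumption made for illustration: the synthetic `POPULATION` of (monthly spend, churned) records, the sample sizes, and the one-rule "model". It is a sketch of the sequence of steps, not of any particular tool's implementation.

```python
import random

random.seed(0)  # fixed seed so the random sample is reproducible

# Hypothetical population of (monthly_spend, churned) records. Illustrative only.
POPULATION = [(spend, spend < 40) for spend in range(100)]

# Sample: draw a random subset and hold part of it back for assessment.
sample = random.sample(POPULATION, 40)
train, holdout = sample[:30], sample[30:]

# Explore: a basic description of the sampled data.
avg_spend = sum(spend for spend, _ in train) / len(train)


# Modify: transform raw spend into a binary "below average" variable.
def below_average(spend):
    return spend < avg_spend


# Model: a one-rule classifier that predicts churn for below-average spenders.
def predict_churn(spend):
    return below_average(spend)


# Assess: accuracy on held-out records the model has not seen.
hits = [predict_churn(spend) == churned for spend, churned in holdout]
accuracy = sum(hits) / len(hits)
print(f"average spend: {avg_spend:.1f}, holdout accuracy: {accuracy:.2f}")
```

Holding out part of the sample keeps the Assess step honest: the rule is judged on records that played no part in setting its threshold.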

Data Mining Myths and Blunders
Data Mining Myths
• Data mining …
– provides instant solutions/predictions
– is not yet viable for business applications
– requires a separate, dedicated database
– can only be done by those with advanced degrees
– is only for large firms that have lots of customer data
– is just another name for good old statistics
Common Data Mining Mistakes
1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data mining is and what it really can/cannot do
3. Leaving insufficient time for data acquisition, selection and preparation
4. Looking only at aggregated results and not at individual records/predictions
5. Being sloppy about keeping track of the data mining procedure and results
Common Data Mining Mistakes (cont.)
6. Ignoring suspicious (good or bad) findings and quickly moving on
7. Running mining algorithms repeatedly and blindly, without thinking about the next stage
8. Naively believing everything you are told about the data
9. Naively believing everything you are told about your own data mining analysis
10. Measuring your results differently from the way your sponsor measures them
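
Mistake 4 (looking only at aggregated results) is easy to demonstrate with a small sketch. The labels below are hypothetical, and the "model" that predicts every transaction as legitimate is a deliberately naive stand-in: its aggregate accuracy looks strong, while the individual predictions show it misses every fraud case.

```python
# Hypothetical labels: 1 = fraudulent, 0 = legitimate. Illustrative data only.
actual = [0] * 95 + [1] * 5

# A deliberately naive "model" that predicts every transaction is legitimate.
predicted = [0] * 100

# Aggregated result: the share of predictions that match the actual label.
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Individual records the aggregate hides: indices of fraud cases the model missed.
missed_fraud = [
    i for i, (a, p) in enumerate(zip(actual, predicted)) if a == 1 and p == 0
]

print(f"accuracy: {accuracy:.2f}")  # accuracy: 0.95
print(f"missed fraud cases: {len(missed_fraud)} of {sum(actual)}")  # 5 of 5
```

The same example touches on mistake 10: a sponsor who cares about fraud caught would measure this model very differently from an analyst reporting overall accuracy.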
