Lecture 7 & 8 Data Mining

Data mining is the process of discovering patterns and correlations in large data sets to predict outcomes, which can help businesses increase revenues and improve customer relationships. It has various applications across industries such as retail, banking, and healthcare, utilizing methods like CRISP-DM and SEMMA for systematic project execution. Common myths and mistakes in data mining include misconceptions about its capabilities and the importance of thorough data preparation.

Uploaded by

Adarsh Singhania

Data mining

We will cover
• Data mining concepts and applications
• Process
• Methods
• Data mining myths and blunders
Meaning
• The process of digging through data to discover hidden connections and predict future trends
• Data mining is the process of finding patterns and correlations within large data sets to predict outcomes. Using a broad range of techniques, businesses can use this information to increase revenues, cut costs, improve customer relationships, reduce risks and more.
Example of Data Mining
• Grocery stores are well-known users of data mining techniques. Many supermarkets offer free loyalty cards that give customers access to reduced prices not available to non-members. The cards make it easy for stores to track who is buying what, when they are buying it and at what price. After analyzing this data, stores can offer customers coupons targeted to their buying habits and decide when to put items on sale or when to sell them at full price.
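One way a store might turn loyalty-card records into coupon targets is to count which items are bought together. The sketch below is only an illustration under stated assumptions: the baskets are made-up data, `pair_rules` is a hypothetical helper, and support and confidence are the standard association-rule measures, not something the lecture prescribes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loyalty-card baskets: one set of purchased items per store visit.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support < min_support:
            continue
        # confidence(a -> b) = share of baskets with a that also contain b
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest-confidence rules first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "bread -> milk" with high confidence suggests that a milk coupon sent to bread buyers is likely to be relevant; a real project would use far more baskets and a dedicated algorithm.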
Applications
Data Mining Applications
• Customer Relationship Management
– Maximize return on marketing campaigns
– Improve customer retention (churn analysis)
– Maximize customer value (cross-, up-selling)
– Identify and treat most valued customers

• Banking and Other Financial Services
– Automate the loan application process
– Detect fraudulent transactions
– Maximize customer value (cross-, up-selling)
– Optimize cash reserves with forecasting
Data Mining Applications (cont.)
• Retailing and Logistics
– Optimize inventory levels at different locations
– Improve the store layout and sales promotions
– Optimize logistics by predicting seasonal effects
– Minimize losses due to limited shelf life

• Manufacturing and Maintenance
– Predict/prevent machinery failures
– Identify anomalies in production systems to optimize the use of manufacturing capacity
– Discover novel patterns to improve product quality
Data Mining Applications
• Brokerage and Securities Trading
– Predict changes on certain bond prices
– Forecast the direction of stock fluctuations
– Assess the effect of events on market movements
– Identify and prevent fraudulent activities in trading

• Insurance
– Forecast claim costs for better business planning
– Determine optimal rate plans
– Optimize marketing to specific customers
– Identify and prevent fraudulent claim activities
Data Mining Applications (cont.)
• Computer hardware and software
• Science and engineering
• Government and defense
• Homeland security and law enforcement
• Travel industry
• Healthcare
• Medicine
• Entertainment industry
• Sports
• Etc.
Highly popular application areas for data mining
Process
Data Mining Process
• A manifestation of best practices
• A systematic way to conduct DM projects
• Different groups have different versions
• Most common standard processes:
– CRISP-DM (Cross-Industry Standard Process for Data Mining)
– SEMMA (Sample, Explore, Modify, Model, and Assess)
– KDD (Knowledge Discovery in Databases)
Data Mining Process: CRISP-DM

[Figure: the CRISP-DM cycle, arranged around central Data Sources: 1 Business Understanding, 2 Data Understanding, 3 Data Preparation, 4 Model Building, 5 Testing and Evaluation, 6 Deployment]
Data Mining Process: CRISP-DM
Step 1: Business Understanding
Step 2: Data Understanding
Step 3: Data Preparation (!)
Step 4: Model Building
Step 5: Testing and Evaluation
Step 6: Deployment
(Steps 1 to 3 account for ~85% of total project time)
• The process is highly repetitive and experimental (DM: art versus science?)
Data Preparation – A Critical DM Task
Real-world Data
• Data Consolidation
– Collect data
– Select data
– Integrate data
• Data Cleaning
– Impute missing values
– Reduce noise in data
– Eliminate inconsistencies
• Data Transformation
– Normalize data
– Discretize/aggregate data
– Construct new attributes
• Data Reduction
– Reduce number of variables
– Reduce number of cases
– Balance skewed data
Well-formed Data
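A few of the cleaning and transformation tasks above can be sketched in plain Python. This is a minimal illustration, not a recommended pipeline: the records, field names, cut points and helper functions are all hypothetical, and mean imputation and min-max scaling are just one common choice for "impute missing values" and "normalize data".

```python
from statistics import mean

# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 25, "income": 30000},
    {"age": None, "income": 52000},
    {"age": 47, "income": None},
    {"age": 36, "income": 88000},
]

def impute_mean(rows, field):
    """Data cleaning: replace missing values with the mean of the observed ones."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def min_max_normalize(rows, field):
    """Data transformation: rescale a numeric field to the range [0, 1]."""
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in rows]

def discretize(rows, field, new_field, cut_points, labels):
    """Data transformation: map a numeric field onto ordered bins."""
    def label_for(value):
        for cut, label in zip(cut_points, labels):
            if value < cut:
                return label
        return labels[-1]  # values at or above the last cut point
    return [{**r, new_field: label_for(r[field])} for r in rows]

cleaned = impute_mean(impute_mean(records, "age"), "income")
binned = discretize(cleaned, "age", "age_group", [30, 45],
                    ["young", "middle", "senior"])
prepared = min_max_normalize(binned, "income")

for row in prepared:
    print(row)
```

Each helper returns new rows instead of modifying its input, so the original records stay available for checking what each step changed.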
Data Mining Process: SEMMA
[Figure: the SEMMA cycle: Sample, Explore, Modify, Model, Assess, then back to Sample]
• Sample: Generate a representative sample of the data
• Explore: Visualization and basic description of the data
• Modify: Select variables, transform variable representations
• Model: Use a variety of statistical and machine learning models
• Assess: Evaluate the accuracy and usefulness of the models
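The five SEMMA steps can be walked through on toy data as follows. Everything here is an assumption made for illustration: the synthetic population, the variable names, and the very simple threshold "model" standing in for the statistical and machine learning models a real project would compare.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: (monthly_spend, responded_to_offer).
# Responders are generated with higher spend on average.
population = []
for _ in range(1000):
    responded = rng.random() < 0.3
    spend = rng.gauss(80 if responded else 50, 10)
    population.append((spend, responded))

# Sample: draw a random sample and split it into training and holdout parts.
sample = rng.sample(population, 200)
train, holdout = sample[:150], sample[150:]

# Explore: basic description of the training data.
spends = [s for s, _ in train]
center, spread = mean(spends), pstdev(spends)
print(f"mean spend={center:.1f}, std dev={spread:.1f}")

# Modify: transform the variable into z-scores using training statistics.
def standardize(spend):
    return (spend - center) / spread

# Model: a one-variable rule with a threshold halfway between the class means.
mean_yes = mean(standardize(s) for s, r in train if r)
mean_no = mean(standardize(s) for s, r in train if not r)
threshold = (mean_yes + mean_no) / 2

def predict(spend):
    return standardize(spend) > threshold

# Assess: accuracy on the holdout records the model has not seen.
correct = sum(predict(s) == r for s, r in holdout)
accuracy = correct / len(holdout)
print(f"holdout accuracy={accuracy:.2f}")
```

Assessing on a holdout part of the sample, not on the training part, gives a more honest estimate of how the rule would behave on new customers.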


Data Mining Myths and Blunders
Data Mining Myths
• Data mining …
– provides instant solutions/predictions
– is not yet viable for business applications
– requires a separate, dedicated database
– can only be done by those with advanced degrees
– is only for large firms that have lots of customer data
– is just another name for good old statistics
Common Data Mining Mistakes
1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data mining is and what it really can/cannot do
3. Not leaving sufficient time for data acquisition, selection and preparation
4. Looking only at aggregated results and not at individual records/predictions
5. Being sloppy about keeping track of the data mining procedure and results
Common Data Mining Mistakes
6. Ignoring suspicious (good or bad) findings and quickly moving on
7. Running mining algorithms repeatedly and blindly, without thinking about the next stage
8. Naively believing everything you are told about the data
9. Naively believing everything you are told about your own data mining analysis
10. Measuring your results differently from the way your sponsor measures them
