
Data Mining

The document discusses data mining as a crucial process for extracting valuable patterns from large datasets, driven by the increasing availability of quality data and the need for competitive advantage. It outlines various data mining tasks, applications across different industries, and common misconceptions and mistakes associated with data mining practices. Key methods and algorithms used in data mining, such as classification and clustering, are also highlighted, along with the importance of data quality and preparation.

Uploaded by

Adarsh Singhania

Data Mining for Business Intelligence
Data Mining Concepts and Definitions
Why Data Mining?
- More intense competition at the global scale
- Recognition of the value in data sources
- Availability of quality data on customers, vendors, transactions, Web, etc.
- Consolidation and integration of data repositories into data warehouses
- The exponential increase in data processing and storage capabilities, and decrease in cost
- Movement toward conversion of
Definition of Data Mining
- The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases (Fayyad et al., 1996)
- Keywords in this definition: process, nontrivial, valid, novel, potentially useful, understandable
- Data mining: a misnomer?
- Other names: knowledge extraction, pattern analysis, knowledge discovery, information harvesting, pattern
Data Mining at the Intersection of Many Disciplines
[Figure: data mining shown at the intersection of pattern recognition, machine learning, mathematical modeling, databases, and management science & information systems]
Data Mining Characteristics/Objectives
- Source of data for DM is often a consolidated data warehouse (not always!).
- DM environment is usually a client-server or a Web-based information systems architecture.
- Data is the most critical ingredient for DM, and it may include soft/unstructured data.
- The miner is often an end user.
- Striking it rich requires creative thinking.
Data in Data Mining
- Data: a collection of facts usually obtained as the result of experiences, observations, or experiments
- Data may consist of numbers, words, and images
- Data: lowest level of abstraction (from which information and knowledge are derived)
- DM with different data types?
- Other data types?
[Figure: taxonomy of data. Data divides into Categorical (Nominal, Ordinal) and Numerical (Interval, Ratio)]
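As a rough illustration of why the measurement scale matters for mining, the sketch below picks a summary statistic that is meaningful on each of the four scales. All sample values and variable names are invented for this example and do not come from the slides.

```python
from statistics import mean, median, mode

# Hypothetical sample values, one list per measurement scale.
blood_type = ["A", "O", "O", "B", "O"]                      # nominal: labels only
satisfaction = ["low", "high", "medium", "high", "medium"]  # ordinal: ordered labels
temperature_c = [18.5, 21.0, 19.5]                          # interval: differences are meaningful
income = [30000, 45000, 60000]                              # ratio: true zero, ratios are meaningful

# Nominal data supports counting, so the mode is a valid summary.
most_common_type = mode(blood_type)

# Ordinal data supports ranking, so the median is valid once labels map to ranks.
rank_of = {"low": 0, "medium": 1, "high": 2}
label_of = {rank: label for label, rank in rank_of.items()}
median_satisfaction = label_of[median(rank_of[s] for s in satisfaction)]

# Interval data supports differences and means (but not "twice as hot").
mean_temperature = mean(temperature_c)
temperature_spread = max(temperature_c) - min(temperature_c)

# Ratio data additionally supports statements such as "twice as much".
income_ratio = max(income) / min(income)

print(most_common_type, median_satisfaction, mean_temperature,
      temperature_spread, income_ratio)
```

Many algorithms expect numerical input, so categorical attributes usually need an encoding step first; the rank mapping above is one simple option for ordinal values.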
What Does DM Do? How Does it Work?
- DM extracts patterns from data
- Pattern? A mathematical (numeric and/or symbolic) relationship among data items
- Types of patterns
  - Association: co-occurring items (beer and diapers in a market basket analysis)
  - Prediction: predicts future occurrences based on the past (Super Bowl winner, temperature on a specific day)
  - Cluster: natural groupings (segmentation based on demographics or past purchase behavior)
  - Sequential (or time series) relationships: e.g., an existing bank customer with a checking account will open a savings account within a year
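The association pattern can be made concrete with the two standard rule measures, support and confidence. The sketch below computes both for a "diapers implies beer" rule; the five baskets are invented for illustration and the function names are this example's own.

```python
# Hypothetical market baskets, invented for illustration only.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) from the baskets."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# Rule under test: customers who buy diapers also buy beer.
rule_support = support({"beer", "diapers"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Here two of the five baskets contain both items, and two of the three diaper baskets also contain beer, so the rule has support 0.40 and confidence of about 0.67.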
A Taxonomy for Data Mining Tasks

Data Mining Task  | Learning Method | Popular Algorithms
Prediction        | Supervised      | Classification and Regression Trees, ANN, SVM, Genetic Algorithms
Classification    | Supervised      | Decision trees, ANN/MLP, SVM, Rough sets, Genetic Algorithms
Regression        | Supervised      | Linear/Nonlinear Regression, Regression trees, ANN/MLP, SVM
Association       | Unsupervised    | Apriori, OneR, ZeroR, Eclat
Link analysis     | Unsupervised    | Expectation Maximization, Apriori Algorithm, Graph-based Matching
Sequence analysis | Unsupervised    | Apriori Algorithm, FP-Growth technique
Clustering        | Unsupervised    | K-means, ANN/SOM
Outlier analysis  | Unsupervised    | K-means, Expectation Maximization (EM)
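K-means is listed in the table as a popular clustering algorithm. A minimal sketch of its assign-then-update loop (Lloyd's algorithm) on one-dimensional data follows. The spend values and the hand-picked starting centroids are assumptions made for this example; real use would normally involve several attributes and a randomized or smarter initialization.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical monthly spend per customer, in arbitrary units.
spend = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(spend, centroids=[1.0, 12.0])
print(centroids, clusters)
```

No labels are supplied anywhere, which is what makes the method unsupervised: the two customer segments emerge from the distances alone.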


Other Data Mining Tasks
- These are in addition to the primary DM tasks (prediction, association, clustering)
- Time-series forecasting
  - Part of sequence or link analysis?
- Visualization
  - Another data mining task?
- Types of DM
  - Hypothesis-driven data mining
  - Discovery-driven data mining
Data Mining Applications
- Customer Relationship Management
  - Maximize return on marketing campaigns
  - Improve customer retention (churn analysis)
  - Maximize customer value (cross- or up-selling)
  - Identify and treat most valued customers
- Banking & Other Financial
  - Automate the loan application process
  - Detect fraudulent transactions
  - Maximize customer value (cross- and up-selling)
  - Optimize cash reserves with forecasting
Data Mining Applications (cont.)
- Retailing and Logistics
  - Optimize inventory levels at different locations
  - Improve the store layout and sales promotions
  - Optimize logistics by predicting seasonal effects
  - Minimize losses due to limited shelf life
- Manufacturing and Maintenance
  - Predict/prevent machinery failures
  - Identify anomalies in production systems to optimize manufacturing capacity
Data Mining Applications (cont.)
- Brokerage and Securities Trading
  - Predict changes in certain bond prices
  - Forecast the direction of stock fluctuations
  - Assess the effect of events on market movements
  - Identify and prevent fraudulent activities in trading
- Insurance
  - Forecast claim costs for better business planning
  - Determine optimal rate plans
  - Optimize marketing to specific customers
Data Mining Applications (cont.)
- Computer hardware and software
- Science and engineering
- Government and defense
- Homeland security and law enforcement
- Travel industry
- Healthcare (highly popular application area for data mining)
- Medicine (highly popular application area for data mining)
- Entertainment industry
- Sports
- Etc.
Data Mining Methods: Classification
- Most frequently used DM method
- Part of the machine-learning family
- Employs supervised learning
- Learns from past data, classifies new data
- The output variable is categorical (nominal or ordinal) in nature
- Classification versus regression?
- Classification versus clustering?
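One way to see the classification-versus-regression distinction is to run the same learner on a categorical target and on a numerical target. The sketch below uses a one-nearest-neighbor rule purely because it is short; it is not a method the slide prescribes, and the training pairs (customer age against a label or an amount) are invented.

```python
def nearest_neighbor(train, x):
    """Return the target of the training example whose input is closest to x."""
    closest_input, target = min(train, key=lambda pair: abs(pair[0] - x))
    return target


# Hypothetical past data: (age, responded to offer?) gives a categorical target.
classification_train = [(22, "no"), (35, "yes"), (58, "yes")]

# Hypothetical past data: (age, amount spent) gives a numerical target.
regression_train = [(22, 120.0), (35, 310.0), (58, 450.0)]

new_customer_age = 30
predicted_class = nearest_neighbor(classification_train, new_customer_age)
predicted_amount = nearest_neighbor(regression_train, new_customer_age)
print(predicted_class, predicted_amount)
```

Both calls are supervised, since each training example carries a known target. Clustering differs in that no target is given at all and the groups must be found from the inputs alone.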
Classification Techniques
- Decision tree analysis
- Statistical analysis
- Neural networks
- Support vector machines
- Case-based reasoning
- Bayesian classifiers
- Genetic algorithms
- Rough sets
Decision Trees
- Employs the divide and conquer method
- Recursively divides a training set until each division consists of examples from one class

A general algorithm for decision tree building:
1. Create a root node and assign all of the training data to it.
2. Select the best splitting attribute.
3. Add a branch to the root node for each value of the split. Split the data into mutually exclusive subsets along the lines of the specific split.
4. Repeat the steps 2 and 3 for each and every leaf node until the stopping criteria
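The four steps above can be sketched in a few lines of Python. Several details are assumptions of this sketch rather than anything stated on the slide: "best" is taken to mean highest information gain, the stopping criteria are "node is pure" or "no attributes left", attributes are categorical, and the tiny loan data set is invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the data on attr."""
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    """Return a nested dict {attribute: {value: subtree_or_label}}."""
    # Assumed stopping criteria: the node is pure, or no attributes remain.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: select the best splitting attribute (here: highest gain).
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    node = {best: {}}
    # Step 3: one branch per value, each with a mutually exclusive subset.
    for value in {row[best] for row in rows}:
        pairs = [(row, lab) for row, lab in zip(rows, labels) if row[best] == value]
        sub_rows = [row for row, _ in pairs]
        sub_labels = [lab for _, lab in pairs]
        # Step 4: repeat steps 2 and 3 on the new node.
        node[best][value] = build_tree(sub_rows, sub_labels, remaining)
    return node


def classify(tree, row):
    """Follow branches until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


# Hypothetical loan applications, invented for illustration.
rows = [
    {"income": "high", "credit": "good"},
    {"income": "high", "credit": "bad"},
    {"income": "low", "credit": "good"},
    {"income": "low", "credit": "bad"},
    {"income": "low", "credit": "bad"},
    {"income": "high", "credit": "bad"},
]
labels = ["approve", "approve", "approve", "reject", "reject", "approve"]

# Step 1: the root node receives all of the training data.
tree = build_tree(rows, labels, ["income", "credit"])
prediction = classify(tree, {"income": "low", "credit": "bad"})
print(tree, prediction)
```

On this data the root splits on income, and only the low-income branch needs a second split on credit. Note that classify raises a KeyError for an attribute value never seen in training; a fuller implementation would fall back to a majority label.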
Data Mining Software
- Commercial
  - IBM SPSS Modeler (formerly Clementine)
  - SAS – Enterprise Miner
  - IBM – Intelligent Miner
  - StatSoft – Statistica Data Miner
  - … many more
- Free and/or Open Source
  - RapidMiner
  - Weka
  - … many more
[Figure: bar chart of data mining tool usage, with series "Total (w/ others)" and "Alone" on a horizontal axis from 0 to 120. Tools shown: SPSS PASW Modeler (formerly Clementine), RapidMiner, SAS / SAS Enterprise Miner, Microsoft Excel, Your own code, Weka (now Pentaho), KXEN, MATLAB, Other commercial tools, KNIME, Microsoft SQL Server, Other free tools, Zementis, Oracle DM, Statsoft Statistica, Salford CART, Mars, other, Orange, Angoss, C4.5, C5.0, See5, Bayesia, Insightful Miner/S-Plus (now TIBCO), Megaputer, Viscovery, Clario Analytics, Miner3D, Thinkanalytics. Source: KDNuggets.com, May 2009]
Data Mining Myths
- Data mining …
  - provides instant solutions/predictions.
  - is not yet viable for business applications.
  - requires a separate, dedicated database.
  - can only be done by those with advanced degrees.
  - is only for large firms that have lots of customer data.
  - is another name for good-old
Common Data Mining Blunders
1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data mining is and what it really can/cannot do
3. Not leaving sufficient time for data acquisition, selection and preparation
4. Looking only at aggregated results and not at individual records/predictions
5. Being sloppy about keeping track of the data mining procedure and results
Common Data Mining Mistakes
6. Ignoring suspicious (good or bad) findings and quickly moving on
7. Running mining algorithms repeatedly and blindly, without thinking about the next stage
8. Naively believing everything you are told about the data
9. Naively believing everything you are told about your own data mining analysis
10. Measuring your results differently from
