
DM 1

This document provides an introduction to data mining and warehousing. It discusses the motivation for data mining due to the explosive growth of data. It defines data mining as the automated analysis of massive data sets to extract useful patterns and knowledge. The document outlines the data mining process and some common techniques, including classification, clustering, regression, and prediction. It also discusses potential applications of data mining such as market analysis, risk analysis, and fraud detection.


Data Mining & Warehousing

Chapter 1. Introduction

Data Warehousing/Mining
Chapter 1. Introduction

• Motivation: Why data mining?
• What is data mining?
• Data mining: On what kind of data?
• Data mining functionality
• Are all the patterns interesting?
• Classification of data mining systems
• Major issues in data mining

• 1 zettabyte = 1 trillion gigabytes.

• 5,200 GB of data for every person on Earth.

Why Data Mining?
• The explosive growth of data: from terabytes to petabytes
• Data collection and data availability
• Automated data collection tools, database systems, the Web, computerized society
• Major sources of abundant data
• Business: Web, e-commerce, transactions, stocks, …
• Science: remote sensing, bioinformatics, scientific simulation, …
• Society and everyone: news, digital cameras, YouTube, social media, mobile devices, …
• We are drowning in data, but starving for knowledge!
• “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets
• Mine the knowledge from data
Example of Data Volumes
• Data sets are growing. How much data is that?

Size    In bytes                        Roughly equivalent to
1 MB    2^20 or 10^6 bytes              A small novel; a 3½-inch disk
1 GB    2^30 or 10^9 bytes              Reams of paper that could fill the back of a pickup van
1 TB    2^40 or 10^12 bytes             50,000 trees chopped, converted into paper, and printed
2 PB    1 PB = 2^50 or 10^15 bytes      Academic research libraries across the U.S.
5 EB    1 EB = 2^60 or 10^18 bytes      All words ever spoken by human beings
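The powers in the table can be checked with a few lines of Python. This is only an illustrative sketch: the names `UNITS`, `unit_sizes`, and `zb_in_gb` are made up for this example. It lists the binary (2^10 per step) and decimal (10^3 per step) size of each unit and confirms the earlier claim that one zettabyte is one trillion gigabytes in decimal units.

```python
# Binary (2^10 per step) versus decimal (10^3 per step) size of each unit.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def unit_sizes():
    """Map each unit name to a (binary_bytes, decimal_bytes) pair."""
    return {
        unit: (2 ** (10 * step), 10 ** (3 * step))
        for step, unit in enumerate(UNITS, start=1)
    }


sizes = unit_sizes()
for unit, (binary, decimal) in sizes.items():
    # The two conventions drift apart as the units get larger.
    gap = (binary - decimal) / decimal * 100
    print(f"1 {unit}: 2^n = {binary:.3e} bytes, 10^n = {decimal:.0e} bytes, gap {gap:.1f}%")

# Decimal zettabyte in decimal gigabytes: 10^21 / 10^9 = 10^12 (one trillion).
zb_in_gb = sizes["ZB"][1] // sizes["GB"][1]
print(f"1 ZB = {zb_in_gb:,} GB")
```

The binary and decimal figures are close for small units but diverge steadily, which is why the table gives both.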
Evolution of Database Technology

• 1960s:
– Data collection, database creation, IMS and network DBMS
• 1970s:
– Relational data model, relational DBMS implementation
• 1980s:
– RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s–2000s:
– Data mining and data warehousing, multimedia databases, and Web databases

Data Mining: On What Kind of Data?
• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB and information repositories
– Object-oriented and object-relational databases
– Spatial databases
– Time-series data and temporal data
– Text databases and multimedia databases
– Heterogeneous and legacy databases
– WWW

What Is Data Mining?
• Data mining (knowledge discovery in databases):
– Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases
• Alternative names and their “inside stories”:
– Data mining: a misnomer?
– Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• What is not data mining?
– (Deductive) query processing
– Expert systems or small machine learning/statistical programs
Data Mining: A Knowledge Discovery in Databases (KDD) Process
– Data mining: the core of the knowledge discovery process.

[Figure: KDD process flow. Databases → data cleaning and data integration → data warehouse → selection → task-relevant data → data mining → pattern evaluation]
7 Data Mining Steps
• 1. Data cleaning – remove noise and inconsistent data
• 2. Data integration – combine multiple sources
• 3. Data selection – retrieve from the database the data relevant to the analysis task
• 4. Data transformation – data are transformed or consolidated into forms appropriate for mining (e.g., by performing summary or aggregation operations)

7 Data Mining Steps (continued)

• 5. Data mining – intelligent methods are applied to extract data patterns
• 6. Pattern evaluation – identify truly interesting patterns representing knowledge, based on some interestingness measures
• 7. Knowledge presentation – present mined knowledge to the user
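The seven steps can be sketched end to end on a toy data set. Everything below is an assumption made for illustration: the two record sources, the field layout (customer, item, amount), the choice of item-pair counting as the mining method, and the `MIN_SUPPORT` threshold are invented, not taken from the slides.

```python
from collections import Counter
from itertools import combinations

# Two hypothetical sources of purchase records: (customer, item, amount).
store_a = [("c1", "bread", 2.0), ("c1", "milk", 1.5), ("c2", "bread", 2.0), ("c2", None, 3.0)]
store_b = [("c2", "milk", 1.5), ("c3", "bread", 2.0), ("c3", "milk", -1.0), ("c3", "eggs", 3.0)]


def clean(records):
    """Step 1, data cleaning: drop records with a missing item or a non-positive amount."""
    return [r for r in records if r[1] is not None and r[2] > 0]


# Step 2, data integration: combine the cleaned sources.
integrated = clean(store_a) + clean(store_b)

# Step 3, data selection: keep only the fields relevant to the task.
selected = [(customer, item) for customer, item, _ in integrated]

# Step 4, data transformation: aggregate into one basket (set of items) per customer.
baskets = {}
for customer, item in selected:
    baskets.setdefault(customer, set()).add(item)

# Step 5, data mining: count how often each pair of items shares a basket.
pair_counts = Counter()
for items in baskets.values():
    pair_counts.update(combinations(sorted(items), 2))

# Step 6, pattern evaluation: keep pairs whose support meets the threshold.
MIN_SUPPORT = 0.5  # assumed interestingness threshold
patterns = {
    pair: count / len(baskets)
    for pair, count in pair_counts.items()
    if count / len(baskets) >= MIN_SUPPORT
}

# Step 7, knowledge presentation: report the surviving patterns.
for pair, support in sorted(patterns.items()):
    print(f"{' + '.join(pair)}: bought together in {support:.0%} of baskets")
```

Real systems replace each step with far heavier machinery, but the order of the steps and the hand-off between them is the same.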

Data Mining: Classification Schemes

Broad categories:

• General functionality
– Descriptive data mining – characterizes the general properties of the data
– Predictive data mining – performs inference on the current data in order to make predictions
• Different views, different classifications
– Kinds of databases to be mined
– Kinds of knowledge to be discovered
– Kinds of techniques utilized
– Kinds of applications adapted

Patterns to Be Mined

• Predictive data mining
– Classification
– Regression
– Time series analysis
– Prediction
• Descriptive data mining
– Clustering
– Association
– Summarization
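The difference between the two families can be illustrated with one small task from each. This is a sketch under assumptions: the data are made up, regression stands in for the predictive family, and a basic one-dimensional k-means loop stands in for the descriptive family. The function names `fit_line` and `kmeans_1d` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Predictive: fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]  # made-up values, exactly linear for clarity
intercept, slope = fit_line(months, sales)
forecast = intercept + slope * 5  # predict a month that was never observed


def kmeans_1d(values, centers, rounds=10):
    """Descriptive: group 1-D values around k centers with a basic k-means loop."""
    groups = [[] for _ in centers]
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


spend = [5, 6, 7, 50, 52, 54]  # made-up customer spending
centers, groups = kmeans_1d(spend, centers=[0.0, 60.0])

print(f"forecast for month 5: {forecast}")
print(f"cluster centers: {centers}, clusters: {groups}")
```

The regression produces a value for unseen input, while the clustering only describes structure already present in the data; neither cluster carries a label until an analyst interprets it.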

Why Data Mining? Potential Applications
• Database analysis and decision support
– Market analysis and management
  • Target marketing, customer relationship management, market basket analysis, cross-selling, market segmentation
– Risk analysis and management
  • Forecasting, customer retention, improved underwriting, quality control, competitive analysis
– Fraud detection and management
• Other applications
– Text mining (newsgroups, email, documents) and Web analysis
– Intelligent query answering

Market Analysis and Management (1)

• Where are the data sources for analysis?
– Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
• Target marketing
– Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.
• Determine customer purchasing patterns over time
– Conversion of a single bank account to a joint account: marriage, etc.
• Cross-market analysis
– Associations/correlations between product sales
– Prediction based on the association information
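Associations between product sales are commonly quantified with support, confidence, and lift. The sketch below computes all three for one rule on a made-up transaction list; the data and the function names are assumptions for illustration only.

```python
# Made-up market baskets: each set is one customer transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline frequency.

    Values above 1 suggest a positive association, values below 1 a negative one.
    """
    return confidence(antecedent, consequent) / support(consequent)


rule = ({"bread"}, {"milk"})
print(f"support    = {support(rule[0] | rule[1]):.2f}")
print(f"confidence = {confidence(*rule):.2f}")
print(f"lift       = {lift(*rule):.4f}")
```

In this toy data bread and milk appear together often, yet the lift is slightly below 1 because milk is common in every basket anyway. That is why confidence alone can be a misleading basis for prediction.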
Market Analysis and Management (2)

• Customer profiling
– Data mining can tell you what types of customers buy what products (clustering or classification)
• Identifying customer requirements
– Identifying the best products for different customers
– Using prediction to find what factors will attract new customers
• Providing summary information
– Various multidimensional summary reports
– Statistical summary information (data central tendency and variation)
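A statistical summary of central tendency and variation can be produced per customer segment with the standard library. The segments and amounts below are invented for illustration, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, median, pstdev

# Made-up purchase amounts per customer segment.
purchases = {
    "students": [12.0, 15.0, 9.0, 14.0],
    "families": [80.0, 95.0, 60.0, 85.0],
}


def summarize(values):
    """Central tendency (mean, median) and variation (std. deviation, range)."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": pstdev(values),  # population standard deviation
        "range": max(values) - min(values),
    }


summary = {segment: summarize(values) for segment, values in purchases.items()}
for segment, stats in summary.items():
    line = ", ".join(f"{name}={value:.2f}" for name, value in stats.items())
    print(f"{segment}: {line}")
```

Grouping by one attribute (segment) is the simplest case of a multidimensional summary; adding region or month as further keys gives the kind of report the slide refers to.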
Corporate Analysis and Risk Management
• Finance planning and asset evaluation
– Cash flow analysis and prediction
– Contingent claim analysis to evaluate assets
– Cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
• Resource planning:
– Summarize and compare the resources and spending
• Competition:
– Monitor competitors and market directions
– Group customers into classes and apply a class-based pricing procedure
– Set pricing strategy in a highly competitive market

Fraud Detection and Management (1)

• Applications
– Widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.
• Approach
– Use historical data to build models of fraudulent behavior and use data mining to help identify similar instances
• Examples
– Auto insurance: detect a group of people who stage accidents to collect on insurance
– Money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)
– Medical insurance: detect professional patients, rings of doctors, and rings of references
Fraud Detection and Management (2)
• Detecting inappropriate medical treatment
– The Australian Health Insurance Commission identified that in many cases blanket screening tests were requested (saving A$1 million per year).
• Detecting telephone fraud
– Telephone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm.
– British Telecom identified discrete groups of callers with frequent intra-group calls, especially on mobile phones, and broke up a multimillion-dollar fraud.
• Retail
– Analysts estimate that 38% of retail shrink is due to dishonest employees.
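The idea of flagging calls that deviate from an expected norm can be sketched with a z-score test on call duration. This is one simple technique chosen for illustration, not the method used in the cases above; the call history, the threshold of 3 standard deviations, and the function name `deviates` are all assumptions.

```python
from statistics import mean, stdev

# Made-up history of one caller's call durations, in minutes.
history = [3.0, 4.0, 5.0, 4.0, 3.0, 5.0, 4.0, 4.0]
new_calls = [4.5, 3.5, 45.0]


def deviates(value, baseline, threshold=3.0):
    """True if value lies more than `threshold` standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(value - mu) / sigma > threshold


flagged = [call for call in new_calls if deviates(call, history)]
print(f"calls flagged for review: {flagged}")
```

A realistic call model would combine several features from the slide (destination, duration, time of day or week) and would need a baseline with nonzero variance, but the principle is the same: learn the norm from history, then score new behavior against it.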
Other Applications

• Sports
– IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain a competitive advantage for the New York Knicks and the Miami Heat
• Internet Web Surf-Aid
– IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preferences and behavior, analyze the effectiveness of Web marketing, improve Web site organization, etc.

Major Issues in Data Mining (1)

• Mining methodology and user interaction
– Mining different kinds of knowledge in databases
– Interactive mining of knowledge at multiple levels of abstraction
– Incorporation of background knowledge
– Data mining query languages and ad hoc data mining
– Expression and visualization of data mining results
– Handling noise and incomplete data
– Pattern evaluation: the interestingness problem
• Performance and scalability
– Efficiency and scalability of data mining algorithms
– Parallel, distributed, and incremental mining methods

Major Issues in Data Mining (2)

• Issues relating to the diversity of data types
– Handling relational and complex types of data
– Mining information from heterogeneous databases and global information systems (WWW)
• Issues related to applications and social impacts
– Application of discovered knowledge
  • Domain-specific data mining tools
  • Intelligent query answering
  • Process control and decision making
– Integration of the discovered knowledge with existing knowledge: a knowledge fusion problem
– Protection of data security, integrity, and privacy

Summary

• Data mining: discovering interesting patterns from large amounts of data
• A natural evolution of database technology, in great demand, with wide applications
• A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
• Mining can be performed in a variety of information repositories
• Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
• Classification of data mining systems
• Major issues in data mining