Data Mining for Beginners

Data mining is an information extraction activity that aims to discover hidden facts contained within large databases. Some basic data mining tasks include classification, regression, clustering, pattern mining, summarization, and link analysis. Data preprocessing is an important step in the KDD process and involves cleaning data by filling in missing values, smoothing noisy data, identifying outliers, and resolving inconsistencies.

Uploaded by Shaheen Mondal


Unit-4

Introduction to Data Mining

Data mining is an information extraction activity whose goal is to discover hidden facts contained in large databases.

Data Mining Models and Tasks
BASIC TASKS

• Classification: a data mining technique that assigns each data item to one of a set of predefined groups (classes).

• For example, you may wish to use classification to predict whether the weather on a particular day will be “sunny”, “rainy” or “cloudy”. Popular classification techniques include decision trees and neural networks.

Classification

• Given old data about customers and payments, predict a new applicant’s loan eligibility.

[Figure: data on previous customers (age, salary, profession, location, customer type) is used to build a classifier, which yields decision rules such as “Salary > 5 L” and “Prof. = Exec” and labels a new applicant’s data as good or bad.]
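
The loan example above can be sketched as a small hand-written rule-based classifier. The two rules echo the ones on the slide (Salary > 5 L, Prof. = Exec). The function name, the field names, the reading of "5 L" as 5 lakh, and the choice to combine the rules with OR are all assumptions made for illustration; a real classifier would learn its rules from the previous customers' data.

```python
# Hypothetical rule-based classifier for the loan example.
# Assumption: "5 L" means 5 lakh (500,000) and either rule alone is enough.
SALARY_THRESHOLD = 500_000


def classify_applicant(applicant: dict) -> str:
    """Return "good" or "bad" for one applicant record."""
    if applicant["salary"] > SALARY_THRESHOLD:
        return "good"
    if applicant["profession"] == "Exec":
        return "good"
    return "bad"


# Made-up applicants, used only to exercise the rules.
new_applicants = [
    {"age": 34, "salary": 650_000, "profession": "Engineer"},
    {"age": 45, "salary": 400_000, "profession": "Exec"},
    {"age": 23, "salary": 200_000, "profession": "Clerk"},
]

labels = [classify_applicant(a) for a in new_applicants]
print(labels)
```

The first applicant passes the salary rule, the second passes the profession rule, and the third passes neither.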
DATA MINING TASKS (contd.)
• Regression: predicts a numeric value for an individual from information gathered on a previous sample of similar individuals.

Example:
• A person who wants to save for the future can estimate future savings from the current value and several past values, using a linear regression formula to make the prediction.
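
The savings example can be sketched with an ordinary least-squares line fit. The yearly figures below are made up for illustration and the function name `fit_line` is hypothetical; the slide does not say which regression formula or data the person uses.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up past savings (in thousands) for years 1 to 5.
years = [1, 2, 3, 4, 5]
savings = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(years, savings)
predicted_year_6 = slope * 6 + intercept
print(slope, intercept, predicted_year_6)
```

Because the made-up values rise by exactly 2 per year, the fitted line has slope 2 and intercept 8, and the prediction for year 6 is 20.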

DATA MINING TASKS (contd.)
• Clustering: a data mining technique used to place data elements into related groups without advance knowledge of the group definitions.

Example: A department store chain creates special catalogues targeted to various types of customer groups based on attributes such as income, location, etc.
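
One common way to form such groups is k-means, sketched below in plain Python. The slide does not name a clustering algorithm, so k-means is a choice made here for illustration. The customer attributes (income, distance from the store) are made-up values, and seeding the centroids with the first k points is a naive, assumed choice.

```python
import math


def kmeans(points, k, iterations=10):
    """Very small k-means; centroids start at the first k points (assumption)."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                )
            else:
                new_centroids.append(centroids[i])
        centroids = new_centroids
    return centroids, clusters


# Made-up customers: (income in thousands, distance from store in km).
customers = [(20, 2), (80, 30), (22, 3), (85, 28), (21, 1), (78, 32)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)
```

No group labels are supplied in advance; the two groups (low income and nearby, high income and far away) emerge from the data alone, which is the point of the definition above.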

DATA MINING TASKS (contd.)
• Pattern mining: a data mining method that involves finding existing patterns in data. In this context, “patterns” often means association rules. The original motivation for searching for association rules came from the desire to analyze supermarket transaction data, that is, to examine customer behavior in terms of the purchased products.

• For example, the association rule “cold drink ⇒ potato chips (80%)” states that four out of five customers who bought a cold drink also bought potato chips.
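
The 80% in the rule above is the rule's confidence: among transactions containing the left-hand side, the fraction that also contain the right-hand side. A minimal sketch follows; the transactions are made up so that the rule comes out at 80%, and the function names are hypothetical.

```python
def count_containing(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """confidence(A => B) = count(A and B) / count(A)."""
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / count_containing(transactions, antecedent)


# Made-up market baskets.
transactions = [
    {"cold drink", "potato chips"},
    {"cold drink", "potato chips", "bread"},
    {"cold drink", "potato chips"},
    {"cold drink", "potato chips", "butter"},
    {"cold drink", "bread"},
    {"bread", "butter"},
]

rule_confidence = confidence(transactions, {"cold drink"}, {"potato chips"})
print(rule_confidence)
```

Five baskets contain a cold drink and four of those also contain potato chips, so the confidence is 4/5.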

DATA MINING TASKS (contd.)
• Summarization maps data into subsets with associated simple descriptions (characterization or generalization).
  • Example: GATE score

• Link analysis uncovers relationships among data.
  • Association rules
  • Sequential analysis determines sequential patterns.

Data Mining Application: Marketing
• Sales Analysis
  • associations between product sales:
    • bread and butter
    • toothpaste and toothbrush

• Customer Profiling
  • data mining can tell you what types of customers buy what products
• Identifying Customer Requirements
  • identify the best products for different customers
  • use prediction to find what factors will attract new customers

Data Mining Application: Fraud Detection
• Association rule mining can detect a group of people who stage accidents to collect on insurance

• A data mining application can be used to detect suspicious money transactions

• Data mining can be used to help commercial lending decisions and to prevent fraud

Data Preprocessing

Why Data Preprocessing?
• Data in the real world is dirty:
  • incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
    • e.g., occupation=“ ”
  • noisy: containing errors or outliers
    • e.g., Salary=“-10”
  • inconsistent: containing discrepancies in codes or names
    • e.g., Age=“42” Birthday=“03/07/1997”
    • e.g., was rating “1,2,3”, now rating “A, B, C”
    • e.g., discrepancy between duplicate records
13 Data Mining: Concepts and Techniques
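
The three kinds of dirty data listed above can be detected with simple record checks, sketched below using the slide's own examples. The record layout, the function names, the reading of “03/07/1997” as 7 March 1997, and the reference date used to compute the age are assumptions for illustration.

```python
from datetime import date


def age_on(birthday: date, as_of: date) -> int:
    """Whole years between birthday and as_of."""
    years = as_of.year - birthday.year
    if (as_of.month, as_of.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def find_problems(record: dict, as_of: date) -> list:
    """Return a list of data-quality problems found in one record."""
    problems = []
    if not (record.get("occupation") or "").strip():
        problems.append("incomplete: occupation missing")
    if record["salary"] < 0:
        problems.append("noisy: negative salary")
    if age_on(record["birthday"], as_of) != record["age"]:
        problems.append("inconsistent: age does not match birthday")
    return problems


# Assumed reference date for the age check.
reference_date = date(2015, 8, 10)

dirty = {"occupation": " ", "salary": -10, "age": 42, "birthday": date(1997, 3, 7)}
clean = {"occupation": "Exec", "salary": 600000, "age": 18, "birthday": date(1997, 3, 7)}

print(find_problems(dirty, reference_date))
print(find_problems(clean, reference_date))
```

Checks like these only flag problems; deciding how to repair each one (fill in, correct, or drop the record) is the data cleaning task proper.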
Why Is Data Dirty?

• Incomplete data may come from
  • “Not applicable” data value when collected
  • Different considerations between the time when the data was collected and when it is analyzed
  • Human/hardware/software problems
• Noisy data (incorrect values) may come from
  • Faulty data collection instruments
  • Human or computer error at data entry
  • Errors in data transmission
• Inconsistent data may come from
  • Different data sources
  • Functional dependency violation (e.g., modify some linked data)
• Duplicate records also need data cleaning
14 Data Mining: Concepts and Techniques August 10, 2015
Why Is Data Preprocessing Important?

• No quality data, no quality mining results!
  • Quality decisions must be based on quality data
    • e.g., duplicate or missing data may cause incorrect or even misleading statistics
  • A data warehouse needs consistent integration of quality data
• Data extraction, cleaning, and transformation comprise the majority of the work of building a data warehouse


Multi-Dimensional Measure of Data Quality
• Properties of a well-accepted multidimensional view:
  • Accuracy
  • Completeness
  • Consistency
  • Timeliness
  • Believability
  • Value added
  • Interpretability
  • Accessibility


Major Tasks in Data Preprocessing
• Data cleaning
  • Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
• Data integration
  • Integration of multiple databases, data cubes, or files
• Data transformation
  • Normalization and aggregation
• Data reduction
  • Obtains a reduced representation in volume but produces the same or similar analytical results
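
Two of the tasks above, filling in missing values (cleaning) and normalization (transformation), can be sketched in a few lines. Mean imputation and min-max scaling are only one possible choice for each task, picked here for brevity. The function names and the sample values are made up.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up salary column with one missing entry.
salaries = [30.0, None, 50.0, 70.0]

filled = fill_missing_with_mean(salaries)
normalized = min_max_normalize(filled)
print(filled)
print(normalized)
```

The missing entry is replaced by the mean of the three known salaries (50.0), after which the column is rescaled so that the smallest value maps to 0 and the largest to 1.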
Forms of Data Preprocessing


KDD Process

The KDD process
"KDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data."

Steps: The process operates on the following basic steps:
• (i) identifying the goal from the user's point of view (based on the relevant knowledge about the domain),
• (ii) creating a target data set,
• (iii) data preprocessing,
• (iv) data reduction and projection,
• (v) matching the goals of the KDD process to a particular data mining method,
• (vi) exploratory analysis,
• (vii) data mining,
• (viii) interpreting mined patterns,
• (ix) acting on the discovered knowledge.

• These steps can be divided into three tasks:
  • the preprocessing of data (steps i - vi),
  • the mining of data (step vii), and
  • the postprocessing of data (steps viii - ix).

• Domain knowledge helps the process focus on the research content.

Fig.: The KDD Process

KDD Process Ex: Web Log
• Selection:
  • Select log data (dates and locations) to use
• Preprocessing:
  • Remove identifying URLs
  • Remove error logs
• Transformation:
  • Sessionize (sort and group)
• Data Mining:
  • Identify and count patterns
  • Construct data structure
• Interpretation/Evaluation:
  • Identify and display frequently accessed sequences.
• Potential User Applications:
  • Cache prediction
  • Personalization
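
Part of the web log pipeline above (removing error logs, sessionizing, and counting patterns) can be sketched as follows. The log entries, field names, the 30-minute session timeout, and the choice of consecutive page pairs as the "pattern" are all assumptions for illustration; the slide does not specify them.

```python
from collections import Counter

SESSION_TIMEOUT = 1800  # seconds of inactivity that ends a session (assumed)


def remove_errors(entries):
    """Preprocessing: drop requests that returned an error status."""
    return [e for e in entries if e["status"] < 400]


def sessionize(entries):
    """Transformation: sort by user and time, then group into sessions."""
    sessions = []
    last_seen = {}  # user -> (time of last request, current session list)
    for e in sorted(entries, key=lambda e: (e["user"], e["time"])):
        prev = last_seen.get(e["user"])
        if prev is None or e["time"] - prev[0] > SESSION_TIMEOUT:
            session = []
            sessions.append(session)
        else:
            session = prev[1]
        session.append(e["url"])
        last_seen[e["user"]] = (e["time"], session)
    return sessions


def count_pairs(sessions):
    """Data mining: count consecutive page pairs across all sessions."""
    counts = Counter()
    for s in sessions:
        counts.update(zip(s, s[1:]))
    return counts


# Made-up log entries; time is in seconds.
log = [
    {"user": "u1", "time": 0, "url": "/home", "status": 200},
    {"user": "u1", "time": 60, "url": "/products", "status": 200},
    {"user": "u1", "time": 90, "url": "/missing", "status": 404},
    {"user": "u1", "time": 120, "url": "/cart", "status": 200},
    {"user": "u2", "time": 10, "url": "/home", "status": 200},
    {"user": "u2", "time": 50, "url": "/products", "status": 200},
    {"user": "u1", "time": 5000, "url": "/home", "status": 200},
    {"user": "u1", "time": 5030, "url": "/products", "status": 200},
]

sessions = sessionize(remove_errors(log))
pair_counts = count_pairs(sessions)
print(pair_counts.most_common(1))
```

The most frequent pair is the kind of frequently accessed sequence that the interpretation step would display, and that a cache could use to prefetch the likely next page.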

KDD Issues
• Human Interaction
• Outliers
• Interpretation
• Visualization
• Large Datasets
• High Dimensionality

KDD Issues (contd.)
• Multimedia Data
• Missing Data
• Irrelevant Data
• Noisy Data
• Changing Data
• Integration
• Application
