Introduction To Data Mining - Lecture03

Uploaded by

vikum.amarananda47


Introduction to Data Mining
Madava Viranjan
• The world is rich in data

• Repositories store data from multiple heterogeneous data sources

• OLAP is an analysis technique with functionalities such as summarization, consolidation, and aggregation
What is Data Mining?

• The process of discovering interesting patterns and knowledge from large amounts of data

• Is it the same as Knowledge Discovery from Data (KDD)?
KDD vs Data Mining
Data Mining Functionalities

• Class/Concept Description
  • Classes and concepts can be described in summarized terms
• Mining Frequent Patterns
  • Patterns that occur frequently in a dataset
• Classification
  • Find a model that describes and distinguishes classes/concepts
• Cluster Analysis
  • Objects are grouped to maximize intra-class similarity and minimize inter-class similarity
• Are all patterns interesting?

• Can a data mining system generate all of the interesting patterns?

• Can a data mining system generate only the required patterns?

It is a Combination of Subjects
Mining Frequent Patterns
Frequent Patterns

• Frequent patterns are patterns that appear frequently in a data set. A frequent pattern can be a frequent itemset, a frequent sequence, or a frequent substructure.

• Mining frequent patterns leads to the discovery of interesting associations and correlations in data
Frequent Itemset Mining

• Market Basket Analysis

• A typical example of frequent itemset mining
Mining Frequent Itemsets – Apriori Algorithm

• It uses prior knowledge of frequent itemsets to find the frequent itemsets level by level: the frequent k-itemsets are used to generate the candidate (k+1)-itemsets.

• Apriori property
  • All non-empty subsets of a frequent itemset must also be frequent

• Minimum Support Threshold
  • An itemset is frequent only if its frequency (support count) is at least the minimum support
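
The Apriori property can be checked directly on a small dataset. The sketch below is only an illustration: it borrows the nine transactions and the minimum support count of 2 from the example in the following slides, and the helper names `support_count` and `nonempty_subsets` are made up for this sketch.

```python
from itertools import combinations

# The nine transactions from the Apriori example in these slides.
TRANSACTIONS = [
    {"i1", "i2", "i5"},
    {"i2", "i4"},
    {"i2", "i3"},
    {"i1", "i2", "i4"},
    {"i1", "i3"},
    {"i2", "i3"},
    {"i1", "i3"},
    {"i1", "i2", "i3", "i5"},
    {"i1", "i2", "i3"},
]
MIN_SUPPORT = 2  # absolute support count


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in TRANSACTIONS if wanted <= t)


def nonempty_subsets(itemset):
    """Yield every non-empty subset of `itemset` as a sorted tuple."""
    items = sorted(itemset)
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


# {i1, i2, i5} is frequent, so each of its subsets must be frequent too.
subset_counts = {s: support_count(s) for s in nonempty_subsets({"i1", "i2", "i5"})}
all_subsets_frequent = all(n >= MIN_SUPPORT for n in subset_counts.values())

# Contrapositive: {i1, i4} is infrequent, so no superset of it can be frequent.
infrequent_count = support_count({"i1", "i4"})
superset_count = support_count({"i1", "i2", "i4"})

for subset, n in subset_counts.items():
    print(subset, n)
print(all_subsets_frequent, infrequent_count, superset_count)
```

The contrapositive is what makes the property useful in practice: once an itemset falls below the threshold, none of its supersets needs to be counted.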
Mining Frequent Itemsets – Apriori Algorithm Contd.
TID List of item_id

T1 i1, i2, i5

T2 i2, i4

T3 i2, i3

T4 i1, i2, i4

T5 i1, i3

T6 i2, i3

T7 i1, i3

T8 i1, i2, i3, i5

T9 i1, i2, i3

Minimum Support = 2
Mining Frequent Itemsets – Apriori Algorithm Contd.

TID  Computer  Webcam  Antivirus Software  Office Suite  SDCard
T1   1         1       1                   0             0
T2   0         1       1                   1             0
T3   0         0       0                   1             1
T4   1         1       0                   1             0
T5   1         1       1                   0             1
T6   1         1       1                   1             1

Minimum Support = 50%
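
When the data arrives as a 0/1 matrix and the threshold is a percentage, support is simply the fraction of rows in which an item's column is 1. A minimal sketch for the table above follows. It assumes the third column is named "Antivirus Software" (the header is split across two lines in the slide), and the variable names are illustrative.

```python
# Column names; "Antivirus Software" is an assumed reading of the split header.
ITEMS = ["Computer", "Webcam", "Antivirus Software", "Office Suite", "SDCard"]

# One row per transaction, one 0/1 flag per item, in the order of ITEMS.
ROWS = {
    "T1": [1, 1, 1, 0, 0],
    "T2": [0, 1, 1, 1, 0],
    "T3": [0, 0, 0, 1, 1],
    "T4": [1, 1, 0, 1, 0],
    "T5": [1, 1, 1, 0, 1],
    "T6": [1, 1, 1, 1, 1],
}
MIN_SUPPORT = 0.5  # relative support (50%)

n_transactions = len(ROWS)

# Relative support of each single item = column sum / number of transactions.
support = {
    item: sum(row[col] for row in ROWS.values()) / n_transactions
    for col, item in enumerate(ITEMS)
}
frequent_items = [item for item in ITEMS if support[item] >= MIN_SUPPORT]

# The same data as item sets, the form used by level-wise itemset mining.
transactions = {
    tid: {item for item, flag in zip(ITEMS, row) if flag}
    for tid, row in ROWS.items()
}

for item in ITEMS:
    print(f"{item}: {support[item]:.0%}")
```

With six transactions, a 50% threshold corresponds to a support count of at least 3, so every single item in this table is frequent and the pruning only starts at the 2-itemset level.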

Mining Frequent Itemsets – Apriori Algorithm Contd.

• Step 1: Create the candidate 1-itemsets, C1

• Step 2: Apply min_support to C1 to get the frequent 1-itemsets, L1
• Step 3: Join L1 with itself to create the candidate 2-itemsets, C2
• Step 4: Apply min_support to C2 to get the frequent 2-itemsets, L2
• Step 5: Join L2 with itself to create the candidate 3-itemsets, C3. Remove itemsets that do not satisfy the Apriori property.
• Step 6: Apply min_support to C3 to get the frequent 3-itemsets, L3
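
The steps above can be sketched as a short level-wise loop over the nine-transaction example (minimum support count 2). This is an illustrative sketch, not a tuned implementation: the join here takes every union of two frequent (k-1)-itemsets that has exactly k items, which is simpler than the usual prefix-based join but yields the same candidates once the Apriori pruning is applied. All function names are hypothetical.

```python
from itertools import combinations

TRANSACTIONS = [
    {"i1", "i2", "i5"},
    {"i2", "i4"},
    {"i2", "i3"},
    {"i1", "i2", "i4"},
    {"i1", "i3"},
    {"i2", "i3"},
    {"i1", "i3"},
    {"i1", "i2", "i3", "i5"},
    {"i1", "i2", "i3"},
]
MIN_SUPPORT = 2  # absolute support count


def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def frequent_only(candidates):
    """Steps 2, 4, 6: count each candidate and keep those meeting min_support."""
    counted = {c: support_count(c) for c in candidates}
    return {c: n for c, n in counted.items() if n >= MIN_SUPPORT}


def join_and_prune(prev_level, k):
    """Steps 3, 5: join L(k-1) with itself, then apply the Apriori property."""
    prev = set(prev_level)
    joined = {a | b for a in prev for b in prev if len(a | b) == k}
    return {
        c for c in joined
        if all(frozenset(s) in prev for s in combinations(c, k - 1))
    }


# Step 1: the candidate 1-itemsets, C1.
c1 = {frozenset([item]) for t in TRANSACTIONS for item in t}

levels = [frequent_only(c1)]  # levels[0] is L1
k = 2
while levels[-1]:
    levels.append(frequent_only(join_and_prune(levels[-1], k)))
    k += 1
levels.pop()  # the last level is empty; drop it

for size, level in enumerate(levels, start=1):
    for itemset, n in sorted(level.items(), key=lambda kv: sorted(kv[0])):
        print(f"L{size}: {sorted(itemset)} count={n}")
```

On this dataset the loop stops after L3, because the only possible 4-itemset, {i1, i2, i3, i5}, has an infrequent 3-item subset and is pruned before any counting.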
Mining Frequent Itemsets – Apriori Algorithm Contd.
• How to compute confidence?

confidence(A => B) = support_count(A ∪ B) / support_count(A)

{i1, i2}=>i5
{i1, i5}=>i2
{i2, i5}=>i1
i1=>{i2, i5}
i2=>{i1, i5}
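
A small sketch of the confidence calculation for the rules listed above, using the nine-transaction example. It relies on the standard definition, confidence(A => B) = support_count(A ∪ B) / support_count(A); the names `RULES` and `confidence` are illustrative.

```python
TRANSACTIONS = [
    {"i1", "i2", "i5"},
    {"i2", "i4"},
    {"i2", "i3"},
    {"i1", "i2", "i4"},
    {"i1", "i3"},
    {"i2", "i3"},
    {"i1", "i3"},
    {"i1", "i2", "i3", "i5"},
    {"i1", "i2", "i3"},
]


def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def confidence(antecedent, consequent):
    """Share of the transactions containing the antecedent that also contain the consequent."""
    return support_count(antecedent | consequent) / support_count(antecedent)


# The rules from the slide, as (antecedent, consequent) pairs.
RULES = [
    ({"i1", "i2"}, {"i5"}),
    ({"i1", "i5"}, {"i2"}),
    ({"i2", "i5"}, {"i1"}),
    ({"i1"}, {"i2", "i5"}),
    ({"i2"}, {"i1", "i5"}),
]

confidences = [confidence(a, c) for a, c in RULES]
for (a, c), value in zip(RULES, confidences):
    print(f"{sorted(a)} => {sorted(c)}: {value:.0%}")
```

All five rules come from the same frequent itemset {i1, i2, i5}, so they share the same numerator (2); only the support count of the antecedent changes.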
Problems of Apriori Mining

• Needs to generate a huge number of candidate sets

• Needs to scan the whole database repeatedly

Mining Frequent Itemsets – A Pattern Growth Approach

• Divide and conquer approach

• Create a Frequent Pattern tree (FP-Tree)

TID List of item_id

T1 i1, i2, i5

T2 i2, i4

T3 i2, i3

T4 i1, i2, i4

T5 i1, i3

T6 i2, i3

T7 i1, i3

T8 i1, i2, i3, i5

T9 i1, i2, i3
Mining Frequent Itemsets – A Pattern Growth Approach Contd.

Step 1: Derive the frequent 1-itemsets and their support counts (as in Apriori)

Step 2: Create the list ‘L’ by sorting the frequent 1-itemsets in descending order of support count
Step 3: Create the root of the FP-tree and label it ‘null’
Step 4: Scan the database again and, for each transaction, add a branch whose items follow the order of ‘L’, incrementing the counts of nodes on any shared prefix
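
The four construction steps can be sketched as follows for the nine-transaction example with a minimum support count of 2. This is a minimal illustration: the `FPNode` class and `build_fp_tree` function are hypothetical names, ties in support count are broken alphabetically (an arbitrary choice made here), and the header table with node links used by the full algorithm is left out.

```python
from collections import Counter

TRANSACTIONS = [
    ["i1", "i2", "i5"],
    ["i2", "i4"],
    ["i2", "i3"],
    ["i1", "i2", "i4"],
    ["i1", "i3"],
    ["i2", "i3"],
    ["i1", "i3"],
    ["i1", "i2", "i3", "i5"],
    ["i1", "i2", "i3"],
]


class FPNode:
    """One node of the FP-tree: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Step 1: support counts of single items; keep only the frequent ones.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Step 2: list L, sorted by descending support (ties broken by name).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: pos for pos, item in enumerate(order)}

    # Step 3: the root node, labelled 'null' (represented by item=None).
    root = FPNode(None)

    # Step 4: second scan; insert each transaction following the order of L.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # a shared prefix only increments counts
            node = child
    return root, order


def show(node, depth=0):
    """Print the tree with indentation, one node per line."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


root, order = build_fp_tree(TRANSACTIONS, min_support=2)
print(order)
show(root)
```

Because seven of the nine transactions contain i2, most of them share the branch that starts at i2, which is why the tree is much smaller than the database it summarizes.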
Mining Frequent Itemsets – A Pattern Growth Approach Contd.

• Mining starts from each frequent length-1 pattern: construct its conditional pattern base, then construct its conditional FP-tree, and mine that tree recursively.
TID Items

1 {a, b}

2 {b, c, d}

3 {a, c, d, e}

4 {a, d, e}

5 {a, b, c}

6 {a, b, c, d}

7 {a}

8 {a, b, c}

9 {a, b, d}

10 {b, c, e}

Minimum Support = 2
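
For the ten transactions above, the conditional pattern base of an item can be sketched without materializing the tree. The shortcut assumed here is that each prefix path in the FP-tree corresponds to the ordered frequent items that precede the item in some transaction, so collecting and counting those prefixes gives the same result as walking the tree. The function name and the alphabetical tie-breaking are choices made for this sketch.

```python
from collections import Counter

TRANSACTIONS = [
    {"a", "b"},
    {"b", "c", "d"},
    {"a", "c", "d", "e"},
    {"a", "d", "e"},
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"a"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"b", "c", "e"},
]
MIN_SUPPORT = 2  # absolute support count

# Frequent items in descending order of support (ties broken by name).
counts = Counter(item for t in TRANSACTIONS for item in t)
order = sorted(
    (i for i in counts if counts[i] >= MIN_SUPPORT),
    key=lambda i: (-counts[i], i),
)


def conditional_pattern_base(item):
    """Prefix paths that end just before `item`, with their counts."""
    base = Counter()
    for t in TRANSACTIONS:
        if item in t:
            ordered = [i for i in order if i in t]
            prefix = tuple(ordered[: ordered.index(item)])
            if prefix:  # an empty prefix contributes nothing
                base[prefix] += 1
    return base


base_e = conditional_pattern_base("e")
base_b = conditional_pattern_base("b")

# Mining usually proceeds from the least frequent item upwards.
for item in reversed(order):
    print(item, dict(conditional_pattern_base(item)))
```

The conditional FP-tree for an item is then built from its pattern base in the same way the original tree was built from the database, dropping any prefix item whose total count falls below the minimum support.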
• Association rules can be misleading

Total number of transactions = 10000

Buys computer games = 6000
Buys videos = 7500
Buys both = 4000

Min_sup = 30%
Min_confidence = 60%
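
The figures above can be worked through in a few lines. The sketch assumes the rule under discussion is "buys computer games => buys videos" (the slide gives the counts but does not spell out the rule), and the variable names are illustrative.

```python
N_TOTAL = 10_000   # total number of transactions
N_GAMES = 6_000    # transactions containing computer games
N_VIDEOS = 7_500   # transactions containing videos
N_BOTH = 4_000     # transactions containing both
MIN_SUP = 0.30
MIN_CONF = 0.60

# Assumed rule: buys computer games => buys videos
support = N_BOTH / N_TOTAL      # 0.40
confidence = N_BOTH / N_GAMES   # about 0.667
baseline = N_VIDEOS / N_TOTAL   # 0.75, share of all customers buying videos

passes_thresholds = support >= MIN_SUP and confidence >= MIN_CONF
misleading = passes_thresholds and confidence < baseline

print(f"support={support:.1%} confidence={confidence:.1%} baseline={baseline:.1%}")
print(f"passes thresholds: {passes_thresholds}, misleading: {misleading}")
```

The rule clears both thresholds, yet a customer who buys computer games is less likely to buy videos (about 66.7%) than a customer picked at random (75%), which is why support and confidence alone are not enough.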
Correlation Analysis

• In addition to support and confidence, the correlation between the itemsets in a rule is measured.
Correlation Analysis with Lift Measure

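• Lift is a measure used in correlation analysis: lift(A, B) = P(A ∪ B) / (P(A) P(B)), where P(A ∪ B) is the probability that a transaction contains both A and B

• If the result is less than 1, A is negatively correlated with B; if it is greater than 1, A is positively correlated with B; if it equals 1, A and B are independent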
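
As a worked example, lift can be computed for the computer games and videos counts given earlier (10000 transactions, 6000 with games, 7500 with videos, 4000 with both). The sketch uses the standard definition lift(A, B) = P(A and B) / (P(A) P(B)); the function names are illustrative.

```python
def lift(n_both, n_a, n_b, n_total):
    """lift(A, B) = P(A and B) / (P(A) * P(B)), computed from raw counts."""
    return (n_both * n_total) / (n_a * n_b)


def interpret(value, tol=1e-9):
    """Map a lift value to the kind of correlation it indicates."""
    if abs(value - 1.0) <= tol:
        return "independent"
    if value > 1.0:
        return "positively correlated"
    return "negatively correlated"


# Computer games vs. videos, using the counts from the earlier slide.
games_videos = lift(n_both=4000, n_a=6000, n_b=7500, n_total=10000)
verdict = interpret(games_videos)
print(f"lift = {games_videos:.3f} ({verdict})")
```

The value works out to 0.40 / (0.60 × 0.75) ≈ 0.889, so buying computer games and buying videos are negatively correlated even though the rule passes the support and confidence thresholds.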
