
Mining Frequent Patterns, Associations and Correlations: Basic Concepts and Methods
What Is Pattern Discovery?

• Patterns represent intrinsic and important properties of data sets
• Frequent patterns: sets of items, subsequences, or substructures that occur frequently together (or are strongly correlated) in a data set
• Motivation examples:
1. What products are often purchased together? E.g., the items milk and bread frequently appear together in a transaction
2. What are the typical subsequent purchases after buying an iPad?
3. What kinds of DNA are sensitive to this new drug?
4. What word sequences likely form phrases in this corpus?
Pattern Discovery: Why Is It Important?
• Finding inherent regularities in a data set
• Foundation for many essential data mining tasks:
  • Association, correlation, and causality analysis
  • Mining sequential and structural (e.g., sub-graph) patterns
  • Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  • Classification: discriminative pattern-based analysis
  • Cluster analysis: pattern-based subspace clustering
• Broad applications: market basket analysis, cross-marketing, catalog design, sale campaign analysis, Web log analysis, biological sequence analysis
Market Basket Analysis
• Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional or relational data sets.
• Industries are interested in such patterns in their data.
• Helps in many business decision-making processes, such as:
  • developing marketing strategies
  • catalog design
  • cross-marketing
  • customer shopping behavior analysis
• Market basket analysis: the process analyzes customer buying habits by finding associations between the different items that customers place in their "shopping baskets"
Market Basket Analysis
• These discoveries help retailers gain insight into which items are frequently bought together
• Helps in designing store layouts: frequently bought items can be placed in close proximity
• Helps retailers plan which products can be put on sale at a reduced price
• Consider the universe as the set of items available; then:
  • Each item is represented by a Boolean variable indicating the presence or absence of that item
  • Each basket can be represented by a Boolean vector of values assigned to these variables
  • Analyzing the buying patterns reveals itemsets that are frequently purchased together
• Example rule: computer ⇒ antivirus_software [support = 2%, confidence = 60%]
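As a small illustration of this representation, here is a minimal sketch; the item universe and the baskets below are made up for the example, not taken from the text.

```python
# A minimal sketch of the Boolean-vector view of market baskets.
# The item universe and baskets are illustrative only.
ITEMS = ["bread", "milk", "computer", "antivirus_software"]

def to_boolean_vector(basket):
    """Represent a basket as a Boolean vector over the item universe."""
    return [item in basket for item in ITEMS]

baskets = [
    {"bread", "milk"},
    {"computer", "antivirus_software"},
    {"bread", "milk", "computer"},
]
print(to_boolean_vector(baskets[0]))  # [True, True, False, False]
```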
From Frequent Itemsets to Association Rules
• The patterns can be represented as association rules.
• Support and confidence are two measures of rule interestingness.
• Association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
• computer ⇒ software [support = 2%, confidence = 60%]
• A support of 2% means that 2% of all the transactions contain both items; a confidence of 60% means that 60% of the customers who purchased a computer also bought the software.
• Association rule: X ⇒ Y
• Support, s: the probability that a transaction contains X ∪ Y: support(X ⇒ Y) = P(X ∪ Y)
• Confidence, c: the conditional probability that a transaction containing X also contains Y: c(X ⇒ Y) = P(Y|X) = sup(X ∪ Y) / sup(X)
• Association rules are considered STRONG if they satisfy both a minimum support threshold and a minimum confidence threshold (set by users)
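A minimal sketch of these two measures, computed directly from the definitions above; the toy transactions are illustrative, not from the text.

```python
# Hedged sketch: support and confidence computed from the definitions.
def support(itemset, transactions):
    """Relative support: fraction of transactions containing itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """c(X => Y) = sup(X U Y) / sup(X)."""
    return support(x | y, transactions) / support(x, transactions)

transactions = [{"computer", "software"}, {"computer"}, {"milk", "bread"}]
print(support({"computer", "software"}, transactions))      # ~0.33
print(confidence({"computer"}, {"software"}, transactions)) # 0.5
```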
Basic Concepts: Frequent Patterns and Association Rules

Itemset: I = {i1, …, ik}. D is a set of database transactions; each transaction is associated with an identifier, called a TID. Let A be a set of items. A transaction T is said to contain A if A ⊆ T. An association rule is an implication of the form A ⇒ B, where A ⊂ I, B ⊂ I, A ≠ ∅, B ≠ ∅, and A ∩ B = ∅.

Transaction-id   Items bought
10               A, B, D
20               A, C, D
30               A, D, E
40               B, E, F
50               B, C, D, E, F

• Support, s: the probability that a transaction contains A ∪ B
• Confidence, c: the conditional probability that a transaction containing A also contains B

Example: let sup_min = 50% and conf_min = 50%, and find all rules A ⇒ B with minimum support and confidence.
• Frequent patterns: {A:3, B:3, D:4, E:3, AD:3}
• Association rules: A ⇒ D (60%, 100%), D ⇒ A (60%, 75%)
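A brief sketch verifying the example: enumerating 1- and 2-itemsets over this TDB with an absolute minsup of 3 (50% of 5 transactions) recovers exactly the frequent patterns listed above.

```python
# Sketch verifying the example: recovers {A:3, B:3, D:4, E:3, AD:3}.
from itertools import combinations

tdb = {10: {"A","B","D"}, 20: {"A","C","D"}, 30: {"A","D","E"},
       40: {"B","E","F"}, 50: {"B","C","D","E","F"}}

def count(itemset):
    return sum(itemset <= t for t in tdb.values())

items = sorted(set().union(*tdb.values()))
for k in (1, 2):
    for c in combinations(items, k):
        if count(set(c)) >= 3:          # absolute minsup = 3
            print(set(c), count(set(c)))
```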
Basic Concepts: Frequent Itemsets (Patterns)

• Itemset: a set of one or more items
• k-itemset: X = {x1, …, xk}
• (Absolute) support (count) of X: the occurrence frequency of the itemset, i.e., the number of transactions that contain X
• (Relative) support, s: the probability that a transaction contains X
• An itemset X is frequent if the relative support of X is no less than a minsup threshold
• Confidence: c(X ⇒ Y) = P(Y|X) = sup(X ∪ Y) / sup(X) = support_count(X ∪ Y) / support_count(X)
• Example: let minsup = 50%. Frequent 1-itemsets: Beer: 3 (60%), Nuts: 3 (60%), Diaper: 4 (80%), Eggs: 3 (60%). Frequent 2-itemsets: {Beer, Diaper}: 3 (60%)
Association Rule Mining

• Association rule mining can be viewed as a two-step process (a sketch of step 2 follows the list):
  1. Find all frequent itemsets
  2. Generate strong association rules from the frequent itemsets
• Finding all frequent itemsets: each of these itemsets must occur at least as frequently as a predetermined minimum support count, min_sup
• Generating strong association rules from the frequent itemsets: these rules must satisfy both minimum support and minimum confidence
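A sketch of the second step, assuming support counts for all subsets are already available from step 1; the helper name rules_from_itemset is illustrative.

```python
# Sketch of step 2: generate strong rules X => (itemset - X) from one
# frequent itemset by splitting it and testing confidence.
from itertools import combinations

def rules_from_itemset(itemset, support_count, min_conf):
    """Yield (antecedent, consequent, confidence) for strong rules.

    support_count maps frozensets to absolute support counts; it is
    assumed to contain every nonempty subset of `itemset`.
    """
    itemset = frozenset(itemset)
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            conf = support_count[itemset] / support_count[antecedent]
            if conf >= min_conf:
                yield antecedent, itemset - antecedent, conf

counts = {frozenset("AD"): 3, frozenset("A"): 3, frozenset("D"): 4}
for x, y, c in rules_from_itemset("AD", counts, 0.5):
    print(set(x), "=>", set(y), f"conf={c:.2f}")
```

Note that any rule generated from a frequent itemset automatically satisfies minimum support, so only the confidence test remains.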
Challenge: There Are Too Many Frequent Patterns!

• A long pattern contains a combinatorial number of shorter frequent sub-patterns.
• How many frequent itemsets does the following TDB1 contain, assuming (absolute) minsup = 1?
  TDB1: T1: {a1, a2, …, a100}
• 1-itemsets: {a1}, {a2}, …, {a100}: C(100, 1) = 100
• 2-itemsets: {a1, a2}, {a1, a3}, …: C(100, 2) = 4950
• …
• 100-itemset: {a1, a2, …, a100}: C(100, 100) = 1
• In total: 2^100 − 1 sub-patterns, too huge a set for any computer to compute or store! How to handle such a challenge?
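The count is easy to confirm: by the binomial theorem, the number of nonempty sub-itemsets is the sum of C(100, k) for k = 1 to 100, which equals 2^100 − 1. A one-line check:

```python
# One-line check of the combinatorial count above.
from math import comb

assert sum(comb(100, k) for k in range(1, 101)) == 2**100 - 1
print(2**100 - 1)  # 1267650600228229401496703205375
```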
Closed and Maximal

• Solution 1: Closed patterns: a pattern (itemset) X is closed in a data set D if X is frequent and there exists no proper super-itemset Y such that Y has the same support as X in D.
• Solution 2: Max-patterns: a pattern X is a max-pattern if X is frequent and there exists no (immediate) super-itemset Y such that X ⊂ Y and Y is frequent.
Expressing Patterns in Compressed Form: Closed Patterns and Max-Patterns

My data set (min_sup = 0.5, i.e., an itemset must appear in at least 3 of the 6 transactions):
T1: A, B, C, E    T2: A, C, D, E    T3: B, C, E
T4: A, C, D, E    T5: C, D, E       T6: A, D, E

Support counts:
{A} = 4; {B} = 2; {C} = 5; {D} = 4; {E} = 6
{A,B} = 1; {A,C} = 3; {A,D} = 3; {A,E} = 4; {B,C} = 2; {B,D} = 0; {B,E} = 2; {C,D} = 3; {C,E} = 5; {D,E} = 3
{A,B,C} = 1; {A,B,D} = 0; {A,B,E} = 1; {A,C,D} = 2; {A,C,E} = 3; {A,D,E} = 3; {B,C,D} = 0; {B,C,E} = 2; {C,D,E} = 3
{A,B,C,D} = 0; {A,B,C,E} = 1; {B,C,D,E} = 0
Closed and Maximal: Example

• {A} = 4; not closed ({A,E} has the same support) and not maximal
• {B} = 2; not closed ({B,C} and {B,E} have the same support) and not maximal; note {B} is not even frequent (2 < 3)
• {C} = 5; not closed ({C,E} has the same support) and not maximal
• {D} = 4; closed, but not maximal due to frequent supersets such as {A,D}, {C,D}, and {D,E}
• {E} = 6; closed, but not maximal due to frequent supersets such as {C,E} and {D,E}
• {A,C,E} = 3; closed and maximal frequent: no frequent proper superset exists (e.g., {A,B,C,E} = 1 and {A,C,D,E} = 2 are both infrequent)
• {C,D,E} = 3; closed and maximal frequent (e.g., {B,C,D,E} = 0)
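The classification above can be reproduced by brute force; this is a sketch for checking the example, not an efficient miner.

```python
# Sketch deriving the closed and maximal frequent itemsets of the
# six-transaction data set above by brute-force enumeration.
from itertools import combinations

db = [{"A","B","C","E"}, {"A","C","D","E"}, {"B","C","E"},
      {"A","C","D","E"}, {"C","D","E"}, {"A","D","E"}]
minsup = 3  # 0.5 * 6 transactions

def sup(x):
    return sum(x <= t for t in db)

items = sorted(set().union(*db))
frequent = {frozenset(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k) if sup(set(c)) >= minsup}

# Closed: no proper superset with the same support (such a superset
# would itself be frequent, so searching within `frequent` suffices).
closed = {x for x in frequent
          if not any(x < y and sup(y) == sup(x) for y in frequent)}
# Maximal: no frequent proper superset at all.
maximal = {x for x in frequent if not any(x < y for y in frequent)}

print("closed: ", sorted(map(sorted, closed)))
print("maximal:", sorted(map(sorted, maximal)))
# maximal: [['A','C','E'], ['A','D','E'], ['C','D','E']]
```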
Apriori Pruning and Scalable Mining Methods

• Scalable mining methods: three major approaches
  – Level-wise, join-based approach: Apriori
  – Vertical data format approach
  – Frequent pattern projection and growth
Apriori Pruning and Scalable Mining Methods

• Apriori is a seminal algorithm that uses prior knowledge of frequent itemset properties.
• Apriori employs a level-wise search, where frequent k-itemsets are used to explore (k+1)-itemsets.
• The set of frequent 1-itemsets is found as follows:
  • Scan the database to accumulate the count for each item
  • Retain the items that satisfy minimum support; the resulting set is denoted L1
• L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found.
• To improve the efficiency of the level-wise generation, the Apriori property is used to reduce the search space (a compact sketch of the whole loop follows).
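A compact, hedged sketch of this level-wise loop; candidate generation is inlined here, and the join and prune steps are spelled out on the following slides. minsup is an absolute count.

```python
# Compact sketch of Apriori's level-wise search.
from collections import Counter
from itertools import combinations

def apriori(db, minsup):
    """Return the set of all frequent itemsets (as frozensets) in db."""
    db = [frozenset(t) for t in db]
    counts = Counter(frozenset([i]) for t in db for i in t)
    level = {x for x, c in counts.items() if c >= minsup}        # L1
    frequent, k = set(level), 1
    while level:
        k += 1
        # Candidate Ck: unions of Lk-1 members that form k-itemsets,
        # keeping only those whose (k-1)-subsets are all frequent.
        cands = {a | b for a in level for b in level if len(a | b) == k}
        cands = {c for c in cands
                 if all(frozenset(s) in level
                        for s in combinations(c, k - 1))}
        counts = Counter()
        for t in db:                       # one DB scan per level
            counts.update(c for c in cands if c <= t)
        level = {c for c, n in counts.items() if n >= minsup}    # Lk
        frequent |= level
    return frequent

tdb = [{"A","B","D"}, {"A","C","D"}, {"A","D","E"},
       {"B","E","F"}, {"B","C","D","E","F"}]
print(sorted(map(sorted, apriori(tdb, 3))))
# [['A'], ['A', 'D'], ['B'], ['D'], ['E']]
```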
Apriori Property

• All nonempty subsets of a frequent itemset must also be frequent. Equivalently: if any itemset is infrequent, its supersets should not be generated or tested!
• If an itemset I does not satisfy the minimum support threshold (P(I) < min_sup), then I is not frequent.
• If item A is added to I, then the result I ∪ A cannot occur more frequently than I; hence P(I ∪ A) < min_sup, and I ∪ A is not frequent either.
• This property is called antimonotonicity: if a set cannot pass a test, all of its supersets will fail the same test. Algorithms make use of the Apriori property in a two-step process consisting of join and prune actions.
Join Step (for k ≥ 2)

• To find Lk, generate a set of candidate k-itemsets by joining Lk−1 with itself. The set of candidates is denoted Ck.
• Let l1 and l2 be itemsets in Lk−1. Apriori assumes the items within each itemset are sorted in lexicographic order.
• The join Lk−1 ⋈ Lk−1 is performed; members of Lk−1 are joinable if their first (k−2) items are in common.
• Members l1 and l2 are joined if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ … ∧ (l1[k−2] = l2[k−2]) ∧ (l1[k−1] < l2[k−1]).
• The resulting itemset formed by joining l1 and l2 is {l1[1], l1[2], …, l1[k−2], l1[k−1], l2[k−1]}.
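A sketch of the join step, with itemsets kept as lexicographically sorted tuples; the function name apriori_join is illustrative.

```python
# Sketch of the join step: two sorted (k-1)-itemsets are joinable when
# their first k-2 items agree and the last item of l1 precedes l2's.
def apriori_join(prev_level):
    """Generate candidate k-itemsets (sorted tuples) from L(k-1)."""
    prev = sorted(prev_level)   # lexicographic order, as Apriori assumes
    candidates = []
    for i, l1 in enumerate(prev):
        for l2 in prev[i + 1:]:
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                candidates.append(l1 + (l2[-1],))
    return candidates

L2 = [("A","B"), ("A","C"), ("B","C"), ("B","D")]
print(apriori_join(L2))  # [('A', 'B', 'C'), ('B', 'C', 'D')]
```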
Prune Step

• Initially, scan the DB once to get the frequent 1-itemsets.
• Ck is a superset of Lk: members of Ck may or may not be frequent, but all frequent k-itemsets are included in Ck.
• A database scan is done to determine the count of each candidate in Ck.
• The Apriori property is applied: any candidate with a (k−1)-subset that is not frequent cannot itself be frequent, and is therefore removed from Ck.
• Subset testing can be done efficiently using a hash tree.
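A matching sketch of the prune step; the function names are illustrative, and a plain set stands in for the hash tree mentioned above.

```python
# Sketch of the prune step: drop candidates with an infrequent
# (k-1)-subset, since by the Apriori property they cannot be frequent.
from itertools import combinations

def has_infrequent_subset(candidate, prev_level):
    """True if some (k-1)-subset of the candidate is not in L(k-1)."""
    k = len(candidate)
    return any(s not in prev_level for s in combinations(candidate, k - 1))

def apriori_prune(candidates, prev_level):
    prev_level = set(prev_level)
    return [c for c in candidates if not has_infrequent_subset(c, prev_level)]

L2 = {("A","B"), ("A","C"), ("B","C"), ("B","D")}
C3 = [("A","B","C"), ("B","C","D")]
print(apriori_prune(C3, L2))  # [('A','B','C')]; ('C','D') is infrequent
```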
Frequent Itemset Generation
[Figure: step-by-step generation of candidate and frequent itemsets, not reproduced here]
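Since the original figure is not reproduced, the following hedged stand-in traces the generation over the five-transaction TDB used earlier, printing each candidate set Ck and frequent set Lk (absolute minsup = 3).

```python
# Sketch tracing level-wise generation: prints each Lk and Ck.
from itertools import combinations

db = [{"A","B","D"}, {"A","C","D"}, {"A","D","E"},
      {"B","E","F"}, {"B","C","D","E","F"}]
minsup = 3

def sup(c):
    return sum(set(c) <= t for t in db)

level = sorted((i,) for i in set().union(*db) if sup((i,)) >= minsup)
k = 1
while level:
    print(f"L{k}:", [(c, sup(c)) for c in level])
    prev, k = set(level), k + 1
    cands = [l1 + (l2[-1],) for i, l1 in enumerate(level)
             for l2 in level[i + 1:] if l1[:-1] == l2[:-1]]      # join
    cands = [c for c in cands
             if all(s in prev for s in combinations(c, k - 1))]  # prune
    if cands:
        print(f"C{k}:", cands)
    level = [c for c in cands if sup(c) >= minsup]
```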
