723A75 Advanced Data Mining
TDDD41 Data Mining - Clustering and Association Analysis
Lecture 7: The FP-Growth Algorithm
Johan Alenlöv
IDA, Linköping University, Sweden
Outline
• Content
• Recap
• Frequent Pattern (FP) Growth Algorithm
• Exercise
• Summary
• Literature
• Course Book: Section 5.2.4 (2nd ed.), Section 6.2.4 (3rd ed.)
• Han, J., Pei, J., and Yin, Y. Mining Frequent Patterns without Candidate Generation.
In Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of Data, 2000.
Recap
• Given a database of transactions we want to find association rules,
Item1 , . . . , Itemm → Itemm+1 , . . . , Itemn        (X → Y)
with a user-specified minimum support and confidence.
• support: Fraction of transactions that contain the full itemset Item1 , . . . , Itemn .
(p(X, Y))
• confidence: Fraction of the transactions containing Item1 , . . . , Itemm that also
contain Itemm+1 , . . . , Itemn . (p(Y | X)) Both are illustrated below.
• We find the rules in two steps:
1. Find all frequent itemsets
2. Find all rules with minimum confidence from these sets.
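• As a concrete illustration, here is a minimal Python sketch that computes support and confidence on a hypothetical toy database (the data are made up for illustration):

# Hypothetical toy database: each transaction is a set of items.
transactions = [
    {"F", "A", "C", "D"},
    {"A", "B", "C", "F"},
    {"B", "F"},
    {"A", "C", "F"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"A", "C"}, {"F"}
print(support(X | Y, transactions))                             # p(X, Y) = 0.75
print(support(X | Y, transactions) / support(X, transactions))  # p(Y | X) = 1.0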
Recap: Apriori Algorithm
• Using the following apriori property:
• Every subset of a frequent itemset is frequent.
• Alternatively, every superset of an infrequent itemset is infrequent.
• The Apriori Algorithm works as follows:
1. Find all frequent 1-itemsets.
2. Use the previously found frequent itemsets and the apriori property to generate candidates
for the next frequent itemsets.
3. Scan the candidates against the database to find the frequent itemsets.
Steps 2 and 3 are repeated until no new frequent itemsets are found (see the sketch below).
• We proved by induction that the algorithm is correct.
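• For reference, a minimal, unoptimized Python sketch of this loop. Itemsets are frozensets; treating minsup as an absolute count and the naive candidate join are assumptions of this sketch:

from itertools import combinations

def apriori(transactions, minsup):
    # Step 1: find all frequent 1-itemsets.
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    L = {frozenset([i]) for i in items
         if sum(i in t for t in transactions) >= minsup}
    frequent, k = set(L), 1
    while L:
        k += 1
        # Step 2: generate k-candidates from the frequent (k-1)-itemsets ...
        C = {a | b for a in L for b in L if len(a | b) == k}
        # ... and prune with the apriori property: every (k-1)-subset must be frequent.
        C = {c for c in C
             if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # Step 3: scan the database to keep the frequent candidates.
        L = {c for c in C if sum(c <= t for t in transactions) >= minsup}
        frequent |= L
    return frequent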
Recap: Generate rules
• Given a large itemset L we wish to generate rules
X → L \ X,
where X ⊂ L.
• These rules should have a minimum confidence.
• The algorithm uses the following apriori property:
• If X does not result in a rule with minimum confidence for L, then neither does any
subset X′ ⊂ X, since support(X) ≤ support(X′) implies
confidence(X → L \ X) = support(L)/support(X) ≥ support(L)/support(X′) = confidence(X′ → L \ X′)
1 for all large itemsets lk with k ≥ 2 do
2 call genrules(lk , lk , minconf)
Algorithm: genrules(lk , am , minconf)
Input: A large itemset lk , a set am ⊆ lk , the minimum confidence minconf.
Output: All the rules of the form a → lk \ a with a ⊂ am and confidence at least minconf.
1 A = {(m − 1)-itemsets am−1 |am−1 ⊆ am }
2 for all am−1 ∈ A do
3 conf = support(lk ) / support(am−1 ) // Confidence of the rule am−1 → lk \ am−1
4 if conf ≥ minconf then
5 output the rule am−1 → lk \ am−1 with confidence=conf and support=support(lk )
6 if m − 1 > 1 then call genrules(lk , am−1 , minconf)
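• A direct Python transcription of genrules may help. It assumes support is a dict mapping frozensets to their supports, e.g. collected during the apriori pass; like the pseudocode, it can visit the same antecedent more than once:

from itertools import combinations

def genrules(lk, am, minconf, support, out):
    # Emit all rules a -> lk \ a with a a subset of am and confidence >= minconf.
    for sub in combinations(am, len(am) - 1):   # all (m-1)-subsets of am
        a = frozenset(sub)
        conf = support[lk] / support[a]         # confidence of a -> lk \ a
        if conf >= minconf:
            out.append((a, lk - a, conf, support[lk]))
            if len(a) > 1:                      # recurse only below rules that passed
                genrules(lk, a, minconf, support, out)

# Driver, mirroring the two lines above the procedure:
# for every large itemset lk with |lk| >= 2: genrules(lk, lk, minconf, support, out)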
Recap: Rule Generation Algorithm Proof
1 for all large itemsets lk with k ≥ 2 do
2 call genrules(lk , lk , minconf)
Algorithm: genrules(lk , am , minconf)
Input: A large itemset lk , a set am ⊆ lk , the minimum confidence minconf.
Output: All the rules of the form a → lk \ a with a ⊂ am and confidence at least minconf.
1 A = {(m − 1)-itemsets am−1 |am−1 ⊆ am }
2 for all am−1 ∈ A do
3 conf = support(lk ) / support(am−1 ) // Confidence of the rule am−1 → lk \ am−1
4 if conf ≥ minconf then
5 output the rule am−1 → lk \ am−1 with confidence=conf and support=support(lk )
6 if m − 1 > 1 then call genrules(lk , am−1 , minconf)
• We prove by contradiction that the rule generation algorithm is correct.
• Assume that the algorithm missed a rule with confidence at least minconf. Let
am−1 → lk \ am−1 denote one of the missing rules with the largest antecedent. Then,
• Note that lk has minimum support and, thus, it is output by the apriori algorithm,
since that algorithm is correct.
• If m = k, the rule cannot have been missed, because genrules(lk , lk , minconf) checks
every (k − 1)-subset of lk directly.
• Moreover, if m < k, then, since support(am ) ≤ support(am−1 ),
confidence(am → lk \ am ) = support(lk )/support(am ) ≥ support(lk )/support(am−1 )
= confidence(am−1 → lk \ am−1 ) ≥ minconf.
• Since am−1 → lk \ am−1 is a missed rule with the largest antecedent, the algorithm
didn’t miss the rule am → lk \ am .
• But am → lk \ am being output means genrules(lk , am , minconf) was called, which
checks am−1 , so the algorithm couldn’t have missed the rule am−1 → lk \ am−1 .
• This contradicts our assumption and, thus, the algorithm is correct.
FP-Growth Algorithm
• As before, assume that we have access to some transactional data,
Tid Items
1 F, A, C, D, G, I, M, P
2 A, B, C, F, L, M, O
3 B, F, H, J, O, W
4 B, C, K, S, P
5 A, F, C, E, L, P, M, N
• The FP-Growth algorithm returns all frequent itemsets without candidate
generation, which may save time and space.
• First, it finds the frequent 1-itemsets and sorts the frequent items within each
transaction in support descending order, e.g. with minsup = 3
Tid Items
1 F, C, A, M, P
2 F, C, A, B, M
3 F, B
4 C, B, P
5 F, C, A, M, P
• Then it outputs the frequent 1-itemsets, F, C, A, B, M, and P (this preprocessing step is sketched below).
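• The preprocessing (count, filter, sort) takes only a few lines of Python. A sketch, where ties in the ordering are broken arbitrarily (the slides use F, C, A, B, M, P):

from collections import Counter

def preprocess(transactions, minsup):
    # Count item supports, drop infrequent items, and sort each
    # transaction in support-descending order.
    counts = Counter(i for t in transactions for i in t)
    order = {i: c for i, c in counts.items() if c >= minsup}
    return [sorted((i for i in t if i in order), key=lambda i: -order[i])
            for t in transactions]

db = [set("FACDGIMP"), set("ABCFLMO"), set("BFHJOW"),
      set("BCKSP"), set("AFCELPMN")]
print(preprocess(db, 3))   # e.g. [['F', 'C', 'A', 'M', 'P'], ...] up to ties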
FP-Growth Algorithm
• Given the new sorted set it constructs a so-called FP tree.
Tid Items
1 F, C, A, M, P
2 F, C, A, B, M
3 F, B
4 C, B, P
5 F, C, A, M, P
{}
  F:1
    C:1
      A:1
        M:1
          P:1
FP-Growth Algorithm
• Given the new sorted set it constructs a so-called FP tree.
Tid Items
1 F, C, A, M, P
2 F, C, A, B, M
3 F, B
4 C, B, P
5 F, C, A, M, P
{}
  F:2
    C:2
      A:2
        M:1
          P:1
        B:1
          M:1
FP-Growth Algorithm
• Given the new sorted set it constructs a so-called FP tree.
Tid Items
1 F, C, A, M, P
2 F, C, A, B, M
3 F, B
4 C, B, P
5 F, C, A, M, P
{}
  F:3
    C:2
      A:2
        M:1
          P:1
        B:1
          M:1
    B:1
FP-Growth Algorithm
• Given the new sorted set it constructs a so-called FP tree.
Tid Items
1 F, C, A, M, P
2 F, C, A, B, M
3 F, B
4 C, B, P
5 F, C, A, M, P
{}
  F:3
    C:2
      A:2
        M:1
          P:1
        B:1
          M:1
    B:1
  C:1
    B:1
      P:1
FP-Growth Algorithm
• Given the new sorted set it constructs a so-called FP tree.
Tid Items
1 F, C, A, M, P
2 F, C, A, B, M
3 F, B
4 C, B, P
5 F, C, A, M, P
{}
  F:4
    C:3
      A:3
        M:2
          P:2
        B:1
          M:1
    B:1
  C:1
    B:1
      P:1
• Finally, it mines the FP tree for frequent itemsets instead of the original database.
FP-Growth Algorithm
Algorithm: FP-tree(D, minsup)
Input: A transactional database D, and the minimum support minsup.
Output: The FP tree for D and minsup.
1 Count support for each item in D
2 Remove the infrequent items from the transactions in D
3 Sort the items in each transaction in D in support descending order
4 Create an FP tree with a single node T with T.name = NULL
5 for each transaction I ∈ D do
6 insert-tree(I, T)
Algorithm: insert-tree(I1 , . . . , Im , T)
Input: An itemset I1 , . . . , Im , and a node T in the FP tree.
Output: Modified FP tree.
1 if T has a child N such that N.name = I1 .name then
2 N.count + +
3 else
4 create a new child N of T with N.name = I1 .name and N.count = 1
5 if m > 1 then
6 insert-tree(I2 , . . . , Im , N)
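• A minimal Python rendering of these two procedures (the header table with node-links between equally named nodes, which the paper maintains for mining, is omitted here for brevity):

class Node:
    def __init__(self, name, parent):
        self.name, self.parent = name, parent
        self.count, self.children = 0, {}       # children keyed by item name

def insert_tree(items, T):
    # Insert one support-sorted transaction into the tree rooted at T.
    if not items:
        return
    head, rest = items[0], items[1:]
    N = T.children.get(head)
    if N is None:                               # no child named items[0]: create it
        N = T.children[head] = Node(head, T)
    N.count += 1
    insert_tree(rest, N)

def fp_tree(sorted_db):
    root = Node(None, None)                     # T.name = NULL
    for transaction in sorted_db:
        insert_tree(transaction, root)
    return root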
FP-Growth Algorithm
• The X-conditional database consists of all the prefix paths leading to X in the FP
tree.
{}
  F:4
    C:3
      A:3
        M:2
          P:2
        B:1
          M:1
    B:1
  C:1
    B:1
      P:1

Item  Conditional database
F     -
C     F:3
A     FC:3
B     FCA:1, F:1, C:1
M     FCA:2, FCAB:1
P     FCAM:2, CB:1
• The support of each prefix path in the conditional database is equal to the count
of X for that prefix path.
• The X-conditional database contains all the itemsets in D that end with X.
• It is enough to mine the X-conditional database to find all the frequent itemsets
in D that end with X.
• Re-start the algorithm for the X-conditional database, i.e. call the FP-Growth
algorithm recursively (extracting the prefix paths is sketched below).
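• Collecting the X-conditional database amounts to walking from every X-node up to the root. A sketch reusing Node and fp_tree from the earlier sketch; the paper would follow the header table's node-links instead of searching the tree:

def nodes_of(tree, item):
    # All nodes labelled `item` (found by search here, by node-links in the paper).
    stack, found = [tree], []
    while stack:
        n = stack.pop()
        if n.name == item:
            found.append(n)
        stack.extend(n.children.values())
    return found

def conditional_db(tree, item):
    # Prefix paths leading to `item`, each weighted by that node's count.
    cdb = []
    for n in nodes_of(tree, item):
        path, p = [], n.parent
        while p is not None and p.name is not None:   # stop at the root
            path.append(p.name)
            p = p.parent
        if path:
            cdb.append((path[::-1], n.count))         # root-to-item order
    return cdb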
FP-Growth Algorithm
• If we look at the M-conditional database ({FCA : 2, FCAB : 1}),
Tid Items
1 F, C, A
2 F, C, A
3 F, C, A, B
• After finding the frequent 1-itemsets and sorting the transactions we have
Tid Items
1 F, C, A
2 F, C, A
3 F, C, A
• Output the frequent 1-itemsets, adding M as suffix (FM, CM, AM)
• Build the FP tree and the conditional databases.
{}
  F:3
    C:3
      A:3

Item  Conditional database
F     -
C     F:3
A     FC:3
• Restart the algorithm for the FM, CM, and AM conditional databases.
FP-Growth Algorithm
• For the AM-conditional database ({FC : 3}), or
Tid Items
1 F,C
2 F,C
3 F,C
• After finding the frequent 1-itemsets and sorting the transactions we have
Tid Items
1 F,C
2 F,C
3 F,C
• Output the frequent 1-itemsets, adding AM as a suffix (FAM and CAM).
• Build the FP tree and the conditional databases.
{}
  F:3
    C:3

Item  Conditional database
F     -
C     F:3
• Restart the algorithm for the FAM and CAM conditional databases.
FP-Growth Algorithm
• For the CAM-conditional database ({F : 3}), or
Tid Items
1 F
2 F
3 F
• After finding the frequent 1-itemsets and sorting the transactions we have
Tid Items
1 F
2 F
3 F
• Output the frequent 1-itemsets, adding CAM as a suffix (FCAM).
• Build the FP tree and the conditional databases.
{}
  F:3

Item  Conditional database
F     -
• The conditional database is empty; backtrack.
FP-Growth Algorithm
• To mine the FP tree Tree, call FP-growth(Tree, NULL, minsup).
Algorithm: FP-growth(Tree, α, minsup)
Input: A FP tree Tree, an itemset α, and the minimum support minsup.
Output: All the itemsets in Tree that end with α and have minsup.
1 for each item X in Tree do
2 output the itemset β = X ∪ α with support=X.count
3 build the β conditional database and the corresponding FP tree Treeβ
4 if Treeβ is not empty then call FP-growth(Treeβ , β, minsup)
• The algorithm above can be made more efficient by adding the lines below.
0.1 if Tree has a single branch then
0.2 for each combination β of the nodes in the branch do
0.3 output the itemset β ∪ α with support = minX∈β X.count
0.4 else
• The FP-Growth algorithm is correct (a compact Python sketch follows below).
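• Putting the pieces together, a compact recursive sketch in Python. It works directly on conditional databases represented as (prefix path, count) pairs rather than on explicit FP trees, which is equivalent for the recursion but omits the single-branch shortcut:

from collections import Counter

def fp_growth(cdb, alpha, minsup, out):
    # cdb: list of (path, count) pairs; emits (itemset, support) pairs into out.
    counts = Counter()
    for path, c in cdb:
        for item in path:
            counts[item] += c
    for item, s in counts.items():
        if s < minsup:
            continue
        beta = alpha | {item}
        out.append((beta, s))                   # output the itemset beta = X u alpha
        # beta-conditional database: the prefixes of `item`, weighted by count.
        new_cdb = [(path[:path.index(item)], c)
                   for path, c in cdb
                   if item in path and path.index(item) > 0]
        if new_cdb:                             # non-empty: recurse, else backtrack
            fp_growth(new_cdb, beta, minsup, out)

# Initial call on the preprocessed database from the earlier sketch:
# out = []; fp_growth([(t, 1) for t in preprocess(db, 3)], frozenset(), 3, out)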
FP-Growth Algorithm
• With small values of minsup, candidate-based methods such as Apriori generate many
long candidates, which implies long runtimes due to expensive operations such as
pattern matching, subset checking, storing, etc. FP-Growth avoids candidate
generation altogether.
Exercise
• Run the FP-Growth algorithm on the database below with minsup 2.
Tid Items
1 A, B, E
2 B, D
3 B, C
4 A, B, D
5 A, C
6 B, C
7 A, C
8 A, B, C, E
9 A, B, C
• Show the execution details (i.e. FP tree construction, conditional databases,
recursive calls), not just the frequent itemsets found.
• Solution : {A, B, C, D, E, AB, AC, AE, BC, BD, BE, ABC, ABE}
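• To check your hand execution, one option is the FP-Growth implementation in the mlxtend library (a sketch; assumes mlxtend and pandas are installed, and note that its min_support is a fraction, hence 2/9 here):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

db = [list("ABE"), list("BD"), list("BC"), list("ABD"), list("AC"),
      list("BC"), list("AC"), list("ABCE"), list("ABC")]
te = TransactionEncoder()
df = pd.DataFrame(te.fit(db).transform(db), columns=te.columns_)
print(fpgrowth(df, min_support=2/9, use_colnames=True))   # the 13 itemsets above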
Summary
• Mining transactions to find rules of the form
Item1 , . . . , Itemm → Itemm+1 , . . . , Itemn
with user-defined minimum support and confidence.
• Two-step solution:
1. Find all the large itemsets.
2. Generate all the rules with minimum confidence.
• We have seen two solutions for step 1: the Apriori and FP-Growth algorithms.
• The runtime can differ a lot for small values of minsup.