
723A75 Advanced Data Mining

TDDD41 Data Mining - Clustering and Association Analysis


Lecture 7: FP Grow Algorithm

Johan Alenlöv
IDA, Linköping University, Sweden
Outline

• Content
  • Recap
  • Frequent Pattern (FP) Grow Algorithm
  • Exercise
  • Summary
• Literature
  • Course Book, 2nd ed.: 5.2.4; 3rd ed.: 6.2.4
  • Han, J., Pei, J., and Yin, Y. Mining Frequent Patterns without Candidate Generation. In Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of Data, 2000.

Recap

• Given a database of transactions we want to find association rules

  Item1 , . . . , Itemm → Itemm+1 , . . . , Itemn        (X → Y)

  with a user-specified minimum support and confidence.
• support: the fraction of transactions that contain all the items Item1 , . . . , Itemn of the rule (p(X, Y)).
• confidence: the fraction of transactions containing Item1 , . . . , Itemm that also contain Itemm+1 , . . . , Itemn (p(Y | X)).
• We find the rules in two steps (a sketch of these quantities follows below):
  1. Find all frequent itemsets.
  2. Find all rules with minimum confidence from these sets.
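A minimal Python sketch of the two measures (my own formulation, not from the lecture), assuming transactions are represented as Python sets:

from typing import Iterable

def support(db, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(db, X, Y):
    """p(Y | X): support of the full rule divided by the support of X."""
    return support(db, X | Y) / support(db, X)

db = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}]
print(support(db, {"A", "B"}))       # 0.5 (transactions 1 and 4)
print(confidence(db, {"A"}, {"B"}))  # 1.0 (every transaction with A has B)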

Recap: Apriori Algorithm

• It uses the following apriori property:
  • Every subset of a frequent itemset is frequent.
  • Alternatively, every superset of an infrequent itemset is infrequent.
• The Apriori Algorithm works as follows (a candidate-generation sketch follows below):
  1. Find all frequent 1-itemsets.
  2. Use the previously found frequent itemsets and the apriori property to generate candidates for the next frequent itemsets.
  3. Go through the candidates to find the frequent itemsets.
  Steps 2 and 3 are repeated until no new frequent itemsets are found.
• We proved by induction that the algorithm is correct.
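As an illustration of step 2, here is a hedged Python sketch of candidate generation with apriori pruning; the name apriori_gen and the frozenset representation are my own choices, not the lecture's:

from itertools import combinations

def apriori_gen(frequent_prev, k):
    """Join frequent (k-1)-itemsets and prune with the apriori property."""
    candidates = set()
    for a in frequent_prev:
        for b in frequent_prev:
            union = a | b
            # keep only k-itemsets all of whose (k-1)-subsets are frequent
            if len(union) == k and all(frozenset(s) in frequent_prev
                                       for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

frequent_2 = {frozenset(p) for p in [("A","B"), ("A","C"), ("B","C"), ("B","D")]}
print(apriori_gen(frequent_2, 3))  # {frozenset({'A','B','C'})}
# ABD and BCD are pruned, since AD and CD are not frequent.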

Recap: Generate rules

• Given a large itemset L we wish to generate rules


X → L \ X,
where X ⊂ L.
• These rules should have a minimum confidence.
• The algorithm uses the following apriori property:
• If X does not result in a rule with minimum confidence for L, then neither does any
subset X′ ⊂ X,
confidence(X → L \ X) = support(L) / support(X) ≥ support(L) / support(X′) = confidence(X′ → L \ X′),

since support(X) ≤ support(X′) when X′ ⊂ X.

1 for all large itemsets lk with k ≥ 2 do


2 call genrules(lk , lk , minconf)

Algorithm: genrules(lk , am , minconf)


Input: A large itemset lk , a set am ⊆ lk , the minimum confidence minconf.
Output: All the rules of the form a → lk \ a with a ⊆ am and confidence equal or above minconf.

1 A = {(m − 1)-itemsets am−1 |am−1 ⊆ am }


2 for all am−1 ∈ A do
3 conf = support(lk ) / support(am−1 ) // Confidence of the rule am−1 → lk \ am−1
4 if conf ≥ minconf then
5 output the rule am−1 → lk \ am−1 with confidence=conf and support=support(lk )
6 if m − 1 > 1 then call genrules(lk , am−1 , minconf)
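The pseudocode above can be rendered in Python roughly as follows. This is a sketch under the assumption that supports are precomputed in a dict keyed by frozenset (my representation, not the slides'); rules reached along several recursion paths are deduplicated by collecting them in a set, and, as in line 6, recursion continues only below rules that pass the confidence test:

def genrules(lk, am, support, minconf, rules):
    """Collect all rules a -> lk \\ a with a a proper subset of am
    and confidence >= minconf."""
    for item in am:
        am1 = am - {item}                    # an (m-1)-subset of am
        if not am1:
            continue
        conf = support[lk] / support[am1]    # confidence of am1 -> lk \ am1
        if conf >= minconf:
            rules.add((am1, lk - am1))       # output the rule am1 -> lk \ am1
            if len(am1) > 1:
                genrules(lk, am1, support, minconf, rules)
    return rules

# e.g. with precomputed supports, genrules(frozenset("ABC"), frozenset("ABC"),
# support, minconf, set()) collects all the rules for the large itemset ABC.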

Recap: Rule Generation Algorithm Proof

1 for all large itemsets lk with k ≥ 2 do


2 call genrules(lk , lk , minconf)

Algorithm: genrules(lk , am , minconf)


Input: A large itemset lk , a set am ⊆ lk , the minimum confidence minconf.
Output: All the rules of the form a → lk \ a with a ⊆ am and confidence equal or above minconf.

1 A = {(m − 1)-itemsets am−1 |am−1 ⊆ am }


2 for all am−1 ∈ A do
3 conf = support(lk ) / support(am−1 ) // Confidence of the rule am−1 → lk \ am−1
4 if conf ≥ minconf then
5 output the rule am−1 → lk \ am−1 with confidence=conf and support=support(lk )
6 if m − 1 > 1 then call genrules(lk , am−1 , minconf)

• We prove by contradiction that the rule generation algorithm is correct.
• Assume that the algorithm missed a rule. Let am−1 → lk \ am−1 denote one of the missing rules with the largest antecedent. Then:
  • lk has minimum support and, thus, it is output by the Apriori algorithm, since that algorithm is correct.
  • Hence, the rule generation algorithm cannot have missed the rule if m = k, since the initial call genrules(lk , lk , minconf) checks every (k − 1)-subset of lk .
  • Moreover, if m < k, then
    confidence(am → lk \ am ) = support(lk )/support(am ) ≥ support(lk )/support(am−1 ) = confidence(am−1 → lk \ am−1 ) ≥ minconf.
  • Since am−1 → lk \ am−1 was chosen as a missing rule with the largest antecedent, the algorithm did not miss the rule am → lk \ am .
  • But then the algorithm recursed on am and could not have missed the rule am−1 → lk \ am−1 .
  • This contradicts our assumption and, thus, the algorithm is correct.

FP Grow Algorithm

• As before, assume that we have access to some transactional data:


Tid Items
1 F, A, C, D, G, I, M, P
2 A, B, C, F, L, M, O
3 B, F, H, J, O, W
4 B, C, K, S, P
5 A, F, C, E, L, P, M, N
• The FP grow algorithm returns all frequent itemsets without candidate generation, which may save time and space.
• First, it finds the frequent 1-itemsets and sorts the frequent items within each transaction in support-descending order, e.g. with minsup = 3 (see the sketch below):
Tid Items
1 F, C, A, M, P
2 F, C, A, B, M
3 F, B
4 C, B, P
5 F, C, A, M, P
• Then it outputs the frequent 1-itemsets, F, C, A, B, M, and P.
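A minimal Python sketch of this preprocessing step (my own code, not the slides'; the tie_order argument is an assumption used to reproduce the slides' ordering of equal-support items, F, C, A, B, M, P):

from collections import Counter

def preprocess(db, minsup, tie_order):
    # Count item supports, drop infrequent items, and sort each
    # transaction by descending support (ties broken by tie_order).
    counts = Counter(i for t in db for i in t)
    key = lambda i: (-counts[i], tie_order.index(i))
    return [sorted((i for i in t if counts[i] >= minsup), key=key) for t in db]

db = [list("FACDGIMP"), list("ABCFLMO"), list("BFHJOW"),
      list("BCKSP"), list("AFCELPMN")]
sorted_db = preprocess(db, 3, tie_order="FCABMP")
# [['F','C','A','M','P'], ['F','C','A','B','M'], ['F','B'],
#  ['C','B','P'], ['F','C','A','M','P']]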

FP Grow Algorithm

• Given the sorted transactions it constructs a so-called FP tree.


Tid Items
1 F, C, A, M, P
2 F, C, A, B, M
3 F, B
4 C, B, P
5 F, C, A, M, P

FP tree after inserting transaction 1:

{}
└─ F:1
   └─ C:1
      └─ A:1
         └─ M:1
            └─ P:1

FP Grow Algorithm

• Given the sorted transactions it constructs a so-called FP tree.


Tid Items
1 F, C, A, M, P
2 F, C, A, B, M
3 F, B
4 C, B, P
5 F, C, A, M, P

FP tree after inserting transactions 1–2:

{}
└─ F:2
   └─ C:2
      └─ A:2
         ├─ M:1
         │  └─ P:1
         └─ B:1
            └─ M:1

FP Grow Algorithm

• Given the sorted transactions it constructs a so-called FP tree.


Tid Items
1 F, C, A, M, P
2 F, C, A, B, M
3 F, B
4 C, B, P
5 F, C, A, M, P

FP tree after inserting transactions 1–3:

{}
└─ F:3
   ├─ C:2
   │  └─ A:2
   │     ├─ M:1
   │     │  └─ P:1
   │     └─ B:1
   │        └─ M:1
   └─ B:1

FP Grow Algorithm

• Given the sorted transactions it constructs a so-called FP tree.


Tid Items
1 F, C, A, M, P
2 F, C, A, B, M
3 F, B
4 C, B, P
5 F, C, A, M, P

FP tree after inserting transactions 1–4:

{}
├─ F:3
│  ├─ C:2
│  │  └─ A:2
│  │     ├─ M:1
│  │     │  └─ P:1
│  │     └─ B:1
│  │        └─ M:1
│  └─ B:1
└─ C:1
   └─ B:1
      └─ P:1

FP Grow Algorithm

• Given the sorted transactions it constructs a so-called FP tree.


Tid Items
1 F, C, A, M, P
2 F, C, A, B, M
3 F, B
4 C, B, P
5 F, C, A, M, P

Final FP tree after inserting all five transactions:

{}
├─ F:4
│  ├─ C:3
│  │  └─ A:3
│  │     ├─ M:2
│  │     │  └─ P:2
│  │     └─ B:1
│  │        └─ M:1
│  └─ B:1
└─ C:1
   └─ B:1
      └─ P:1

• Finally, it mines the FP tree for frequent itemsets instead of the original database.
FP Grow Algorithm

Algorithm: FP-tree(D, minsup)


Input: A transactional database D, and the minimum support minsup.
Output: The FP tree for D and minsup.

1 Count support for each item in D


2 Remove the infrequent items from the transactions in D
3 Sort the items in each transaction in D in support descending order
4 Create an FP tree with a single node T with T.name = NULL
5 for each transaction I ∈ D do
6 insert-tree(I, T)

Algorithm: insert-tree(I1 , . . . Im , T)
Input: An itemset I1 , . . . , Im , and a node T in the FP tree.
Output: Modified FP tree.

1 if T has a child N such that N.name = I1 .name then


2 N.count + +
3 else
4 create a new child N of T with N.name = I1 .name and N.count = 1
5 if m > 1 then
6 insert-tree(I2 , . . . , Im , N)
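A minimal Python rendering of FP-tree and insert-tree, building on the preprocess sketch above (class and function names are mine; the per-item node-links and header table from the Han et al. paper are omitted for brevity):

class Node:
    """One FP-tree node: item name, count, parent link, children by name."""
    def __init__(self, name, parent=None):
        self.name, self.count, self.parent = name, 0, parent
        self.children = {}

def insert_tree(items, T):
    # Insert a support-sorted transaction into the subtree rooted at T.
    if not items:
        return
    head, rest = items[0], items[1:]
    if head not in T.children:                 # no matching child: create one
        T.children[head] = Node(head, parent=T)
    T.children[head].count += 1                # matching child: increment count
    insert_tree(rest, T.children[head])

def build_fp_tree(sorted_db):
    root = Node(None)                          # T.name = NULL
    for transaction in sorted_db:
        insert_tree(transaction, root)
    return root

tree = build_fp_tree(sorted_db)                # the tree shown on slide 11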
FP Grow Algorithm

• The X-conditional database consists of all the prefix paths leading to X in the FP
tree.
{}
├─ F:4
│  ├─ C:3
│  │  └─ A:3
│  │     ├─ M:2
│  │     │  └─ P:2
│  │     └─ B:1
│  │        └─ M:1
│  └─ B:1
└─ C:1
   └─ B:1
      └─ P:1

Item  Conditional database
F     -
C     F:3
A     FC:3
B     FCA:1, F:1, C:1
M     FCA:2, FCAB:1
P     FCAM:2, CB:1

• The support of each prefix path in the conditional database is equal to the count
of X for that prefix path.
• The X-conditional database contains all the itemsets in D that end with X.
• It is enough to mine the X-conditional database to find all the frequent itemsets
in D that end with X.
• Re-start the algorithm for the X-conditional database, i.e. call the FP grow algorithm recursively (a prefix-path extraction sketch follows below).
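Given the tree above, the X-conditional database can be read off by collecting, for every node labelled X, the path from the root to that node's parent together with the node's count. A sketch (helper name is mine, building on the Node class above):

def conditional_database(root, item):
    """List of (prefix path, count) pairs forming the item-conditional database."""
    paths = []
    def walk(node):
        if node.name == item:
            prefix, p = [], node.parent
            while p is not None and p.name is not None:   # stop at the NULL root
                prefix.append(p.name)
                p = p.parent
            if prefix:
                paths.append((prefix[::-1], node.count))
        for child in node.children.values():
            walk(child)
    walk(root)
    return paths

print(conditional_database(tree, "M"))
# [(['F', 'C', 'A'], 2), (['F', 'C', 'A', 'B'], 1)]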

FP Grow Algorithm

• If we look at the M-conditional database ({FCA : 2, FCAB : 1}), written as transactions:

Tid Items
1 F, C, A
2 F, C, A
3 F, C, A, B
• After finding the frequent 1-itemsets (B is now infrequent, since minsup = 3) and sorting the transactions we have
Tid Items
1 F, C, A
2 F, C, A
3 F, C, A
• Output the frequent 1-itemsets, adding M as a suffix (FM, CM, AM).
• Build the FP tree and the conditional databases.
{}
└─ F:3
   └─ C:3
      └─ A:3

Item  Conditional database
F     -
C     F:3
A     FC:3

• Restart the algorithm for the FM, CM, and AM conditional databases.

FP Grow Algorithm

• For the AM-conditional database ({FC : 3}), written as transactions:


Tid Items
1 F,C
2 F,C
3 F,C
• After finding the 1-itemsets and sorting the transactions we have
Tid Items
1 F,C
2 F,C
3 F,C
• Output the frequent 1-itemsets, adding AM as a suffix (FAM and CAM).
• Build the FP tree and the conditional databases.
{}
└─ F:3
   └─ C:3

Item  Conditional database
F     -
C     F:3

• Restart the algorithm for the FAM and CAM conditional databases.

FP Grow Algorithm

• For the CAM-conditional database ({F : 3}), written as transactions:


Tid Items
1 F
2 F
3 F
• After finding the 1-itemsets and sorting the transactions we have
Tid Items
1 F
2 F
3 F
• Output the frequent 1-itemsets, adding CAM as a suffix (FCAM).
• Build the FP tree and the conditional databases.
{}
└─ F:3

Item  Conditional database
F     -

• The conditional database for FCAM is empty, so the algorithm backtracks.

FP Grow Algorithm

• To mine the FP tree Tree, call FP-grow(Tree, NULL, minsup).

Algorithm: FP-grow(Tree, α, minsup)


Input: A FP tree Tree, an itemset α, and the minimum support minsup.
Output: All the itemsets in Tree that end with α and have minsup.

1 for each item X in Tree do


2 output the itemset β = X ∪ α with support=X.count
3 build the β conditional database and the corresponding FP tree Treeβ
4 if Treeβ is not empty then call FP-grow(Treeβ , β, minsup)

• The algorithm above can be made more efficient by adding the lines below.

0.1 if Tree has a single branch then


0.2 for each combination β of the nodes in the branch do
0.3 output the itemset β ∪ α with support = minX∈β X.count
0.4 else

• The FP grow algorithm is correct.
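Putting the pieces together, here is a compact, unoptimised sketch of the recursion (mine, not the slides' exact formulation): it operates directly on conditional databases represented as (prefix, count) lists, which is equivalent to mining the corresponding FP trees; the single-branch shortcut is omitted.

from collections import Counter

def fp_grow(cond_db, alpha, minsup, out):
    """Mine all frequent itemsets that end with the suffix alpha.
    cond_db is a list of (support-sorted item list, count) pairs."""
    counts = Counter()
    for items, c in cond_db:
        for i in items:
            counts[i] += c
    for item, c in counts.items():
        if c < minsup:
            continue
        beta = (item,) + alpha
        out[beta] = c                              # output beta with support c
        # the beta-conditional database: prefixes strictly before item
        beta_db = [(items[:items.index(item)], cnt)
                   for items, cnt in cond_db
                   if item in items and items.index(item) > 0]
        if beta_db:
            fp_grow(beta_db, beta, minsup, out)
    return out

result = fp_grow([(t, 1) for t in sorted_db], (), 3, {})
print(result[("F", "C", "A", "M")])  # 3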

FP Grow Algorithm

• With small values of minsup there are many long candidates, which implies long runtimes due to expensive operations such as pattern matching, subset checking, and storing; by avoiding candidate generation, the FP grow algorithm sidesteps these costs.

Exercise

• Run the FP grow algorithm on the database below with minsup 2.


Tid Items
1 A, B, E
2 B, D
3 B, C
4 A, B, D
5 A, C
6 B, C
7 A, C
8 A, B, C, E
9 A, B, C
• Show the execution details (i.e. FP tree construction, conditional databases,
recursive calls), not just the frequent itemsets found.
• Solution: {A, B, C, D, E, AB, AC, AE, BC, BD, BE, ABC, ABE}
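As an independent cross-check of the solution (a brute-force enumeration I added, not the intended FP-grow execution):

from itertools import combinations

db = [set(t) for t in ["ABE", "BD", "BC", "ABD", "AC", "BC", "AC", "ABCE", "ABC"]]
items = sorted(set().union(*db))
frequent = ["".join(c) for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if sum(set(c) <= t for t in db) >= 2]   # support count >= minsup = 2
print(frequent)
# ['A','B','C','D','E','AB','AC','AE','BC','BD','BE','ABC','ABE']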

Summary

• Mining transactions to find rules of the form

Item1 , . . . , Itemm → Itemm+1 , . . . , Itemn

with user-defined minimum support and confidence.


• Two-step solution:
1. Find all the large itemsets.
2. Generate all the rules with minimum confidence.
• We have seen two solutions for step 1: the Apriori and the FP grow algorithms.
• The runtime can differ a lot for small values of minsup.

