Group members:
Rahul Kelaskar A – 636
Anish Khale A - 638
Dhaval Doshi A - 682
Guide: Mr. Gautam Borkar
• Process of exploring and analyzing data
• Iterative multi-step process
• Involves data preparation, search for patterns, knowledge evaluation, and interpretation
• Arrangement or Ordering
• Existence of an underlying organization or structure
Application of algorithms to extract patterns in data.
Act of taking in raw data and taking “action” based on the “category” of the pattern.
Identifies underlying patterns from transformed data.
Input:
A database DB, represented by an FP-tree, and a minimum support threshold S.
Output:
The complete set of frequent patterns.
Method:
call FP-growth(FP-tree, null)
Procedure FP-growth(Tree, α)
{
if Tree contains a single prefix path // Mining single prefix-path FP-tree
then {
  let P be the single prefix-path part of Tree;
  let Q be the multipath part with the top branching node replaced by a null root;
  for each combination (denoted as β) of the nodes in the path P do
    generate pattern β ∪ α with support = minimum support of nodes in β;
  let freq_pattern_set(P) be the set of patterns so generated; }
else let Q be Tree;
for each item ai in Q do { // Mining multipath FP-tree
  generate pattern β = ai ∪ α with support = ai.support;
  construct β’s conditional pattern-base and then β’s conditional FP-tree Treeβ;
  if Treeβ ≠ ∅
  then call FP-growth(Treeβ, β);
  let freq_pattern_set(Q) be the set of patterns so generated; }
return(freq_pattern_set(P) ∪ freq_pattern_set(Q) ∪ (freq_pattern_set(P) × freq_pattern_set(Q)))
}
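The recursion above can be sketched in Python. This is a minimal sketch, not the authors' implementation: instead of a linked FP-tree it keeps every branch as a weighted prefix path, so the single-prefix-path shortcut is left out and every item goes through the multipath loop. The function name `pattern_growth`, the `paths` input (the branches of the FP-tree in the example from [1]) and the minimum support of 3 are assumptions made for illustration.

```python
from collections import Counter

def pattern_growth(paths, min_sup, suffix=frozenset()):
    """Return {pattern: support} for all frequent patterns that extend `suffix`.

    paths: list of (items, count); items is a tuple in one fixed global order.
    """
    # Count the support of every item in the (conditional) database.
    support = Counter()
    for items, count in paths:
        for item in items:
            support[item] += count

    patterns = {}
    for item, sup in support.items():
        if sup < min_sup:
            continue
        beta = suffix | {item}            # pattern beta = ai U alpha
        patterns[beta] = sup
        # Conditional pattern base of beta: the prefix in front of `item`
        # on every path that contains it, weighted by the path count.
        cond = [(items[:items.index(item)], count)
                for items, count in paths if item in items]
        cond = [(prefix, count) for prefix, count in cond if prefix]
        if cond:                          # recurse only on a non-empty base
            patterns.update(pattern_growth(cond, min_sup, beta))
    return patterns

# Branches of the example FP-tree with their counts (assumed input format).
paths = [(tuple("fcamp"), 2), (tuple("fcabm"), 1), (tuple("fb"), 1), (tuple("cbp"), 1)]
patterns = pattern_growth(paths, min_sup=3)
```

Because every path lists its items in the same order, each pattern is generated exactly once, by growing from its last item backwards.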
Example [1]:

Header table
Item   Frequency
f      4
c      4
a      3
b      3
m      3
p      3

FP-tree (each node is item:count; indentation shows parent and child)
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

Conditional pattern bases
Item   Cond. pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1
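The conditional pattern bases in this example can be reproduced with a small tree structure. The following is a sketch under assumptions: the `Node` class, `build_fp_tree`, `conditional_pattern_base` and the `paths` input (the ordered frequent items of each transaction group, read off the tree's branches) are hypothetical names chosen for illustration, not part of the source.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, a count, a parent link and children."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_fp_tree(paths):
    root = Node(None, None)
    header = defaultdict(list)        # item -> all nodes carrying that item
    for items, count in paths:
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count       # shared prefixes accumulate counts
    return root, header

def conditional_pattern_base(header, item):
    """Map each prefix path leading to `item` to the count of that item node."""
    base = {}
    for node in header[item]:
        prefix, parent = [], node.parent
        while parent.item is not None:    # walk up to the null root
            prefix.append(parent.item)
            parent = parent.parent
        if prefix:
            base["".join(reversed(prefix))] = node.count
    return base

# Ordered frequent items per branch, with multiplicities (assumed input).
paths = [("fcamp", 2), ("fcabm", 1), ("fb", 1), ("cbp", 1)]
root, header = build_fp_tree(paths)
bases = {item: conditional_pattern_base(header, item) for item in "cabmp"}
```

Here `bases["m"]` holds the two prefix paths that lead to the two `m` nodes, which is the m-conditional pattern base used in the next step of the example.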
m-conditional pattern base:
fca:2, fcab:1

m-conditional FP-tree
{}
  f:3
    c:3
      a:3

All frequent patterns relating to m:
m,
fm, cm, am,
fcm, fam, cam,
fcam
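Since the m-conditional FP-tree is a single path, its patterns are simply all combinations of the nodes on that path, each joined with m. A minimal sketch of that step follows; `mine_single_path` is a hypothetical helper, the minimum support of 3 is an assumption, and the code is only valid when the surviving items really form one path, as they do here.

```python
from collections import Counter
from itertools import combinations

def mine_single_path(cond_base, suffix, suffix_support, min_sup):
    """Enumerate patterns from a conditional pattern base whose frequent
    items form a single path (assumption; not checked by this sketch)."""
    # Item supports inside the conditional pattern base.
    support = Counter()
    for prefix, count in cond_base.items():
        for item in prefix:
            support[item] += count
    # Keep only frequent items; b:1 is dropped in the example.
    path = [(item, sup) for item, sup in support.items() if sup >= min_sup]

    patterns = {suffix: suffix_support}
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            items = "".join(item for item, _ in combo)
            # Support of a combination is the minimum support of its nodes.
            patterns[items + suffix] = min(sup for _, sup in combo)
    return patterns

m_patterns = mine_single_path({"fca": 2, "fcab": 1}, "m", 3, min_sup=3)
```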
GENERALIZED SEQUENTIAL PATTERN MINING ALGORITHM
1. Initially, every item in DB is a candidate of length 1.
2. For each level (i.e., sequences of length k) do
2.1 Scan the database to collect the support count for each candidate sequence.
2.2 Generate candidate length-(k+1) sequences from length-k frequent sequences using the Apriori property.
3. Repeat until no frequent sequence or no candidate can be found.
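The level-wise loop can be sketched as follows, using the five-sequence example database from these slides with a minimum support of 2. This is a simplified sketch: step 2.2 is replaced by plain prefix extension (append one frequent item to a frequent sequence, either as a new element or inside its last element), not GSP's join-and-prune candidate generation, so it finds the same frequent sequences but may count more candidates. The names `parse`, `contains` and `mine_sequences` are hypothetical.

```python
def parse(text):
    """Parse '(bd)cb(ac)' into (('b','d'), ('c',), ('b',), ('a','c'))."""
    elements, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            elements.append(tuple(sorted(text[i + 1:j])))
            i = j + 1
        else:
            elements.append((text[i],))
            i += 1
    return tuple(elements)

def contains(sequence, pattern):
    """True if `pattern` is a subsequence of `sequence`: each pattern element
    is a subset of a distinct sequence element, in the same order."""
    i = 0
    for element in sequence:
        if i < len(pattern) and set(pattern[i]) <= set(element):
            i += 1
    return i == len(pattern)

def mine_sequences(db, min_sup):
    items = sorted({x for seq in db for element in seq for x in element})
    candidates = [((x,),) for x in items]      # step 1: length-1 candidates
    frequent, frequent_items = {}, None
    while candidates:                          # step 3: stop when none remain
        # Step 2.1: scan the database and count support for each candidate.
        level = {}
        for cand in candidates:
            sup = sum(contains(seq, cand) for seq in db)
            if sup >= min_sup:
                level[cand] = sup
        frequent.update(level)
        if frequent_items is None:
            frequent_items = [cand[0][0] for cand in level]
        # Step 2.2 (simplified): extend each frequent sequence by one item.
        candidates = []
        for seq in level:
            for x in frequent_items:
                candidates.append(seq + ((x,),))                     # new element
                if x > seq[-1][-1]:
                    candidates.append(seq[:-1] + (seq[-1] + (x,),))  # same element
    return frequent

db = [parse(s) for s in
      ["(bd)cb(ac)", "(bf)(ce)b(fg)", "(ah)(bf)abf", "(be)(ce)d", "a(bd)bcb(ade)"]]
patterns = mine_sequences(db, min_sup=2)
```

The loop ends on its own because no candidate longer than the longest database sequence can reach the support threshold.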
Length-1 Candidates

Seq. ID   Sequence
10        <(bd)cb(ac)>
20        <(bf)(ce)b(fg)>
30        <(ah)(bf)abf>
40        <(be)(ce)d>
50        <a(bd)bcb(ade)>

Minimum support = 2

Cand   Sup
<a>    3
<b>    5
<c>    4
<d>    3
<e>    3
<f>    2
<g>    1
<h>    1
Length-2 Candidates

Candidates of the form <xy> (two elements):
       <a>    <b>    <c>    <d>    <e>    <f>
<a>    <aa>   <ab>   <ac>   <ad>   <ae>   <af>
<b>    <ba>   <bb>   <bc>   <bd>   <be>   <bf>
<c>    <ca>   <cb>   <cc>   <cd>   <ce>   <cf>
<d>    <da>   <db>   <dc>   <dd>   <de>   <df>
<e>    <ea>   <eb>   <ec>   <ed>   <ee>   <ef>
<f>    <fa>   <fb>   <fc>   <fd>   <fe>   <ff>

Candidates of the form <(xy)> (one element containing two items):
       <a>    <b>      <c>      <d>      <e>      <f>
<a>           <(ab)>   <(ac)>   <(ad)>   <(ae)>   <(af)>
<b>                    <(bc)>   <(bd)>   <(be)>   <(bf)>
<c>                             <(cd)>   <(ce)>   <(cf)>
<d>                                      <(de)>   <(df)>
<e>                                               <(ef)>
<f>
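The two candidate tables can be enumerated directly, which also confirms the count of 51 candidates reported for the 2nd scan. This is a small sketch; the variable names are chosen for illustration only.

```python
from itertools import combinations, product

frequent_items = ["a", "b", "c", "d", "e", "f"]   # the 6 length-1 patterns

# <xy>: two separate elements, order matters and x may equal y (6 * 6 = 36).
two_elements = [f"<{x}{y}>" for x, y in product(frequent_items, repeat=2)]

# <(xy)>: one element with two distinct items, order irrelevant (15 pairs).
one_element = [f"<({x}{y})>" for x, y in combinations(frequent_items, 2)]

candidates = two_elements + one_element
```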
Scan       Candidates   Sequential patterns   Example candidates
1st scan   8 cand.      6 length-1            <a> <b> <c> <d> <e> <f> <g> <h>
2nd scan   51 cand.     19 length-2           <aa> <ab> … <af> <ba> <bb> … <ff> <(ab)> … <(ef)>
3rd scan   46 cand.     19 length-3           <abb> <aab> <aba> <baa> <bab> …
4th scan   8 cand.      6 length-4            <abba> <(bd)bc> …
5th scan   1 cand.      1 length-5            <(bd)cba>

Candidates are eliminated either because they cannot pass the support threshold or because they are not in the DB at all.
Security (credit card fraud)
Global climate modeling
Business
Disaster Management
[1] Florian Verhein, Frequent Pattern Growth (FP-Growth) Algorithm, 2008.
[2] An Introduction to Apriori-based Method: GSP (Generalized Sequential Patterns), Srikant & Agrawal, EDBT’96.