ASSOCIATION MINING
Association Rule Mining
Given a set of transactions, find rules that will
predict the occurrence of an item based on the
occurrences of other items in the transaction
Market-Basket transactions:

  TID  Items
  1    Bread, Milk
  2    Bread, Diaper, Beer, Eggs
  3    Milk, Diaper, Beer, Coke
  4    Bread, Milk, Diaper, Beer
  5    Bread, Milk, Diaper, Coke

Example of Association Rules:
  {Diaper} → {Beer}
  {Milk, Bread} → {Eggs, Coke}
  {Beer, Bread} → {Milk}
What Is Association Mining?
Association rule mining:
Finding frequent patterns, associations,
correlations, or causal structures among
sets of items or objects in transaction
databases, relational databases, and
other information repositories.
Frequent pattern: pattern (set of items,
sequence, etc.) that occurs frequently in a
database.
Definition: Frequent Itemset

Itemset
  A collection of one or more items
  Example: {Milk, Bread, Diaper}
k-itemset
  An itemset that contains k items
Support count (σ)
  Frequency of occurrence of an itemset
  E.g. σ({Milk, Bread, Diaper}) = 2
Support
  Fraction of transactions that contain an itemset
  E.g. s({Milk, Bread, Diaper}) = 2/5
Frequent Itemset
  An itemset whose support is greater than or equal to a minsup threshold

(Transactions: the market-basket table above.)
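
As an illustration (a sketch, not from the original slides; names are arbitrary), the support count and support of an itemset can be computed directly from the transactions above:

  transactions = [
      {"Bread", "Milk"},
      {"Bread", "Diaper", "Beer", "Eggs"},
      {"Milk", "Diaper", "Beer", "Coke"},
      {"Bread", "Milk", "Diaper", "Beer"},
      {"Bread", "Milk", "Diaper", "Coke"},
  ]

  def support_count(itemset, transactions):
      # sigma(itemset): number of transactions that contain the itemset
      return sum(1 for t in transactions if itemset <= t)

  itemset = {"Milk", "Bread", "Diaper"}
  sigma = support_count(itemset, transactions)   # 2
  s = sigma / len(transactions)                  # 2/5 = 0.4
  print(sigma, s)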
Definition: Association Rule

• Association Rule
  – An implication expression of the form X → Y, where X and Y are itemsets
  – Example: {Milk, Diaper} → {Beer}
• Rule Evaluation Metrics
  – Support (s)
    • Fraction of transactions that contain both X and Y
  – Confidence (c)
    • Measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}
  s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
  c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67

(Transactions: the market-basket table above.)
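
The same kind of sketch (illustrative, not the slides' own code) reproduces the two metrics for {Milk, Diaper} → {Beer}:

  transactions = [
      {"Bread", "Milk"},
      {"Bread", "Diaper", "Beer", "Eggs"},
      {"Milk", "Diaper", "Beer", "Coke"},
      {"Bread", "Milk", "Diaper", "Beer"},
      {"Bread", "Milk", "Diaper", "Coke"},
  ]

  def sigma(itemset):
      # support count: number of transactions containing the itemset
      return sum(1 for t in transactions if itemset <= t)

  X, Y = {"Milk", "Diaper"}, {"Beer"}
  s = sigma(X | Y) / len(transactions)   # support    = 2/5 = 0.4
  c = sigma(X | Y) / sigma(X)            # confidence = 2/3 ≈ 0.67
  print(round(s, 2), round(c, 2))        # 0.4 0.67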
Apriori: A Candidate Generation-and-Test
Approach
Any subset of a frequent itemset must be frequent
if {beer, diaper, nuts} is frequent, so is {beer, diaper}
Every transaction having {beer, diaper, nuts} also contains {beer,
diaper}
Apriori pruning principle: If there is any itemset which is
infrequent, its superset should not be generated/tested!
Method:
generate length (k+1) candidate itemsets from length k frequent
itemsets, and
test the candidates against DB
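
A minimal sketch (not from the slides) of the pruning principle: a candidate (k+1)-itemset is generated and tested only if all of its k-subsets are frequent.

  from itertools import combinations

  def survives_pruning(candidate, frequent_k):
      # candidate: frozenset with k+1 items; frequent_k: set of frozensets with k items
      k = len(candidate) - 1
      return all(frozenset(sub) in frequent_k for sub in combinations(candidate, k))

  # If {beer, diaper} is infrequent, any superset such as {beer, diaper, nuts} is pruned.
  frequent_2 = {frozenset({"beer", "nuts"}), frozenset({"diaper", "nuts"})}
  print(survives_pruning(frozenset({"beer", "diaper", "nuts"}), frequent_2))   # False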
Apriori Algorithm for Frequent Itemset
Generation
A two-step process is followed, consisting of join and
prune actions.
The Apriori Algorithm—An Example  (minimum support 50%, i.e. support count ≥ 2)

Database TDB:
  Tid  Items
  10   A, C, D
  20   B, C, E
  30   A, B, C, E
  40   B, E

1st scan → C1:  {A} 2, {B} 3, {C} 3, {D} 1, {E} 3
L1:             {A} 2, {B} 3, {C} 3, {E} 3

C2 (candidates from L1):  {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
2nd scan → counts:        {A, B} 1, {A, C} 2, {A, E} 1, {B, C} 2, {B, E} 3, {C, E} 2
L2:                       {A, C} 2, {B, C} 2, {B, E} 3, {C, E} 2

C3 (candidate from L2):   {B, C, E}
3rd scan → L3:            {B, C, E} 2

Rules with support ≥ 50% and confidence 100%:
  A → C,  B → E,  BC → E,  CE → B
(The rule BE → C also has support ≥ 50%, but its confidence is only 2/3 ≈ 67%.)
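
This trace can be checked with a brute-force sketch (illustrative only) that enumerates all itemsets and keeps those with support count ≥ 2:

  from itertools import combinations

  tdb = {10: {"A", "C", "D"}, 20: {"B", "C", "E"},
         30: {"A", "B", "C", "E"}, 40: {"B", "E"}}
  items = sorted(set().union(*tdb.values()))
  min_count = 2   # 50% of 4 transactions

  for k in range(1, len(items) + 1):
      Lk = [(c, sum(1 for t in tdb.values() if set(c) <= t)) for c in combinations(items, k)]
      Lk = [(c, n) for c, n in Lk if n >= min_count]
      if not Lk:
          break
      print(f"L{k}:", Lk)
  # L1: A:2, B:3, C:3, E:3   L2: AC:2, BC:2, BE:3, CE:2   L3: BCE:2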
The Apriori Algorithm — Example  (minimum support count = 2)

Database D:
  TID  Items
  100  1, 3, 4
  200  2, 3, 5
  300  1, 2, 3, 5
  400  2, 5

Scan D → C1:  {1} 2, {2} 3, {3} 3, {4} 1, {5} 3
L1:           {1} 2, {2} 3, {3} 3, {5} 3

C2 (candidates from L1):  {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
Scan D → counts:          {1 2} 1, {1 3} 2, {1 5} 1, {2 3} 2, {2 5} 3, {3 5} 2
L2:                       {1 3} 2, {2 3} 2, {2 5} 3, {3 5} 2

C3 (candidate from L2):   {2 3 5}
Scan D → L3:              {2 3 5} 2
Important Details of Apriori
How to generate candidates?
Step 1: self-joining Lk
Step 2: pruning
How to count supports of candidates?
Example of Candidate-generation
L3={abc, abd, acd, ace, bcd}
Self-joining: L3*L3
abcd from abc and abd
acde from acd and ace
Pruning:
acde is removed because ade is not in L3
C4={abcd}
How to Generate Candidates?
Suppose the items in Lk-1 are listed in an order
Step 1: self-joining Lk-1
insert into Ck
select p.item1, p.item2, …, p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1=q.item1, …, p.itemk-2=q.itemk-2, p.itemk-1 < q.itemk-1
Step 2: pruning
  forall itemsets c in Ck do
    forall (k-1)-subsets s of c do
      if (s is not in Lk-1) then delete c from Ck
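
A Python rendering of this join-and-prune step (a sketch under the assumption that each itemset is kept as a lexicographically sorted tuple):

  from itertools import combinations

  def apriori_gen(L_prev):
      # L_prev: list of (k-1)-itemsets as sorted tuples; returns candidate k-itemsets Ck
      L_prev = sorted(L_prev)
      members = set(L_prev)
      Ck = []
      for i, p in enumerate(L_prev):
          for q in L_prev[i + 1:]:
              # self-join: p and q agree on the first k-2 items and p[-1] < q[-1]
              if p[:-1] == q[:-1] and p[-1] < q[-1]:
                  c = p + (q[-1],)
                  # prune: every (k-1)-subset of c must be in L_prev
                  if all(s in members for s in combinations(c, len(c) - 1)):
                      Ck.append(c)
      return Ck

  L3 = [tuple("abc"), tuple("abd"), tuple("acd"), tuple("ace"), tuple("bcd")]
  print(apriori_gen(L3))   # [('a', 'b', 'c', 'd')]; acde is pruned because ade is not in L3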
Applications of Apriori Algorithm
• Education field: Extracting association rules in
data mining of admitted students through
characteristics and specialties.
• Medical field: Analysis of the patient’s database.
• Forestry: Analysis of probability and intensity of
forest fire with the forest fire data.
• Recommender systems: used by companies like
Amazon and Google for the autocomplete feature.
Drawbacks of Apriori Algorithm
Apriori requires the generation of candidate itemsets;
these candidates may be very numerous when the database
contains many items.
Apriori needs multiple scans of the database to check the
support of each candidate itemset generated, and this leads
to high costs.
Construct FP-tree from a Transaction Database  (min_support = 3)

  TID  Items bought                   (ordered) frequent items
  100  {f, a, c, d, g, i, m, p}       {f, c, a, m, p}
  200  {a, b, c, f, l, m, o}          {f, c, a, b, m}
  300  {b, f, h, j, o, w}             {f, b}
  400  {b, c, k, s, p}                {c, b, p}
  500  {a, f, c, e, l, p, m, n}       {f, c, a, m, p}

Steps:
  1. Scan DB once, find frequent 1-itemsets (single-item patterns)
  2. Sort frequent items in frequency-descending order → F-list = f-c-a-b-m-p
  3. Scan DB again, construct the FP-tree by inserting each transaction's
     ordered frequent items

Header Table (item : frequency):  f:4, c:4, a:3, b:3, m:3, p:3

Resulting FP-tree:
  {}
  ├─ f:4
  │   ├─ c:3
  │   │   └─ a:3
  │   │       ├─ m:2
  │   │       │   └─ p:2
  │   │       └─ b:1
  │   │           └─ m:1
  │   └─ b:1
  └─ c:1
      └─ b:1
          └─ p:1
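
A compact Python sketch of this construction (illustrative; the header table's node-links are omitted, and names are arbitrary):

  from collections import Counter

  class FPNode:
      def __init__(self, item, parent):
          self.item, self.parent = item, parent
          self.count = 0
          self.children = {}

  def build_fp_tree(transactions, min_support):
      # 1st scan: count items and keep the frequent ones
      counts = Counter(item for t in transactions for item in t)
      freq = {i: c for i, c in counts.items() if c >= min_support}
      # F-list: frequent items in frequency-descending order (ties broken alphabetically here)
      flist = sorted(freq, key=lambda i: (-freq[i], i))
      rank = {item: r for r, item in enumerate(flist)}
      root = FPNode(None, None)
      # 2nd scan: insert each transaction's ordered frequent items into the tree
      for t in transactions:
          node = root
          for item in sorted((i for i in t if i in rank), key=lambda i: rank[i]):
              child = node.children.get(item)
              if child is None:
                  child = node.children[item] = FPNode(item, node)
              child.count += 1
              node = child
      return root, flist

  transactions = [
      ["f", "a", "c", "d", "g", "i", "m", "p"],
      ["a", "b", "c", "f", "l", "m", "o"],
      ["b", "f", "h", "j", "o", "w"],
      ["b", "c", "k", "s", "p"],
      ["a", "f", "c", "e", "l", "p", "m", "n"],
  ]
  root, flist = build_fp_tree(transactions, min_support=3)
  print(flist)   # ['c', 'f', 'a', 'b', 'm', 'p']; f and c both have count 4,
                 # and the slide's F-list f-c-a-b-m-p breaks that tie the other way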
Benefits of the FP-tree Structure
Completeness
Preserve complete information for frequent pattern
mining
Never break a long pattern of any transaction
Compactness
Reduce irrelevant info: infrequent items are removed
Items appear in frequency-descending order: the more
frequently an item occurs, the more likely it is to be shared
Never larger than the original database (not counting
node-links and the count fields)
For Connect-4 DB, compression ratio could be over
100
DECISION TREES
Example of a Decision Tree

Training Data:
  Tid  Refund  Marital Status  Taxable Income  Cheat
  1    Yes     Single          125K            No
  2    No      Married         100K            No
  3    No      Single          70K             No
  4    Yes     Married         120K            No
  5    No      Divorced        95K             Yes
  6    No      Married         60K             No
  7    Yes     Divorced        220K            No
  8    No      Single          85K             Yes
  9    No      Married         75K             No
  10   No      Single          90K             Yes

Model: Decision Tree
  Refund?
  ├─ Yes → NO
  └─ No → MarSt?
          ├─ Married → NO
          └─ Single, Divorced → TaxInc?
                                ├─ < 80K → NO
                                └─ > 80K → YES
Apply Model to Test Data

Test Data:
  Refund  Marital Status  Taxable Income  Cheat
  No      Married         80K             ?

Start at the root of the tree and follow the branches that match
the test record:
  Refund = No → MarSt = Married → leaf NO
Assign Cheat to "No".
Each internal node represents a test on an attribute of the
instance to be classified, and each outgoing arc a possible
outcome, leading to a further test.
The leaves correspond to classification actions.
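
As a sketch (not part of the slides), the example tree above can be written directly as nested tests; applying it to the test record assigns Cheat = "No":

  def classify(record):
      # Decision tree from the example: Refund -> MarSt -> TaxInc
      if record["Refund"] == "Yes":
          return "NO"
      if record["MaritalStatus"] == "Married":
          return "NO"
      # Single or Divorced: test Taxable Income (in K);
      # the slide labels the leaf branches < 80K and > 80K
      return "NO" if record["TaxableIncome"] < 80 else "YES"

  test = {"Refund": "No", "MaritalStatus": "Married", "TaxableIncome": 80}
  print(classify(test))   # NO -> assign Cheat to "No"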
Decision tree representation (PlayTennis)
[Figure: decision tree for the PlayTennis concept]
  ⟨Outlook = Sunny, Temp = Hot, Humidity = High, Wind = Strong⟩ → No
Decision trees expressivity
Decision trees represent a disjunction of conjunctions of
constraints on the values of attributes:
  (Outlook = Sunny ∧ Humidity = Normal)
  ∨ (Outlook = Overcast)
  ∨ (Outlook = Rain ∧ Wind = Weak)
When to use Decision Trees
Problem characteristics:
Instances can be described by attribute value pairs
Target function is discrete valued
Disjunctive hypothesis may be required
Possibly noisy training data samples
Robust to errors in training data
Missing attribute values
Different classification problems:
Equipment or medical diagnosis
Credit risk analysis
Several tasks in natural language processing
Top-down induction of Decision Trees
ID3 (Quinlan, 1986) is a basic algorithm for learning DT's
Given a training set of examples, the algorithm for building a
DT performs a search in the space of decision trees
The construction of the tree is top-down. The algorithm is
greedy.
The fundamental question is “which attribute should be
tested next? Which question gives us more information?”
Select the best attribute
A descendant node is then created for each possible value of
this attribute and examples are partitioned according to this
value
The process is repeated for each successor node until all the
examples are classified correctly or there are no attributes left
Which attribute is the best classifier?
A statistical property called information gain measures
how well a given attribute separates the training
examples.
Information gain uses the notion of entropy, commonly
used in information theory
Information gain = expected reduction of entropy
Entropy in binary classification
Entropy measures the impurity of a collection of
examples. It depends on the distribution of the
random variable p.
  S is a collection of training examples
  p+ the proportion of positive examples in S
  p– the proportion of negative examples in S

Entropy(S) = – p+ log2 p+ – p– log2 p–        [convention: 0 log2 0 = 0]

Entropy([14+, 0–]) = – 14/14 log2(14/14) – 0 log2(0) = 0
Entropy([9+, 5–])  = – 9/14 log2(9/14) – 5/14 log2(5/14) = 0.94
Entropy([7+, 7–])  = – 7/14 log2(7/14) – 7/14 log2(7/14)
                   = 1/2 + 1/2 = 1            [log2(1/2) = –1]

Note: the log of a number < 1 is negative; 0 ≤ p ≤ 1 and 0 ≤ entropy ≤ 1
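
These values can be reproduced with a small sketch (illustrative, not the slides' code):

  from math import log2

  def entropy(pos, neg):
      # entropy of a collection with `pos` positive and `neg` negative examples
      total = pos + neg
      result = 0.0
      for count in (pos, neg):
          if count > 0:                 # convention: 0 * log2(0) = 0
              p = count / total
              result -= p * log2(p)
      return result

  print(entropy(14, 0))            # 0.0
  print(round(entropy(9, 5), 2))   # 0.94
  print(entropy(7, 7))             # 1.0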
Information gain as entropy
reduction
Information gain is the expected reduction in
entropy caused by partitioning the examples on an
attribute.
The higher the information gain the more effective
the attribute in classifying training data.
Expected reduction in entropy knowing A:

  Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) Entropy(Sv)

  Values(A): the possible values for A
  Sv: the subset of S for which A has value v
Example: expected information gain
Let
  Values(Wind) = {Weak, Strong}
  S = [9+, 5−]
  SWeak = [6+, 2−]
  SStrong = [3+, 3−]
Information gain due to knowing Wind:
  Gain(S, Wind) = Entropy(S) − 8/14 Entropy(SWeak) − 6/14 Entropy(SStrong)
                = 0.94 − 8/14 · 0.811 − 6/14 · 1.00
                = 0.048
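
A quick numeric check of this example (a self-contained sketch, not the slides' code):

  from math import log2

  def entropy(pos, neg):
      total = pos + neg
      return sum(-c / total * log2(c / total) for c in (pos, neg) if c > 0)

  S, S_weak, S_strong = (9, 5), (6, 2), (3, 3)
  n = sum(S)
  gain = (entropy(*S)
          - sum(S_weak) / n * entropy(*S_weak)
          - sum(S_strong) / n * entropy(*S_strong))
  print(round(gain, 3))   # 0.048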
Example
[Table: PlayTennis training examples D1–D14, with attributes Outlook,
Temperature, Humidity, Wind and target PlayTennis]
First step: which attribute to test at the
root?
Which attribute should be tested at the root?
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.084
Gain(S, Temperature) = 0.029
Outlook provides the best prediction for the target
Let's grow the tree:
add to the tree a successor for each possible
value of Outlook
partition the training samples according to the
value of Outlook
After first step
[Figure: partially grown tree after splitting on Outlook at the root]
Second step
Working on the Outlook = Sunny node:
  Gain(SSunny, Humidity) = 0.970 − 3/5 · 0.0 − 2/5 · 0.0 = 0.970
  Gain(SSunny, Wind)     = 0.970 − 2/5 · 1.0 − 3/5 · 0.918 = 0.019
  Gain(SSunny, Temp.)    = 0.970 − 2/5 · 0.0 − 2/5 · 1.0 − 1/5 · 0.0 = 0.570
Humidity provides the best prediction for the target
Let's grow the tree:
  add to the tree a successor for each possible value
  of Humidity
  partition the training samples according to the
  value of Humidity
Second and third steps
[Figure: the completed tree; leaf partitions of the training examples:]
  Outlook = Sunny, Humidity = High   → No  {D1, D2, D8}
  Outlook = Sunny, Humidity = Normal → Yes {D9, D11}
  Outlook = Rain,  Wind = Weak       → Yes {D4, D5, D10}
  Outlook = Rain,  Wind = Strong     → No  {D6, D14}
ID3: algorithm
ID3(X, T, Attrs)   X: training examples,
                   T: target attribute (e.g. PlayTennis),
                   Attrs: other attributes, initially all attributes
Create Root node
If all X's are +, return Root with class +
If all X's are –, return Root with class –
If Attrs is empty return Root with class most common value of T in X
else
  A ← best attribute; decision attribute for Root ← A
  For each possible value vi of A:
    - add a new branch below Root, for test A = vi
    - Xi ← subset of X with A = vi
    - If Xi is empty then add a new leaf with class the most common value of T in X
      else add the subtree generated by ID3(Xi, T, Attrs – {A})
return Root
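
A compact Python rendering of this pseudocode (a sketch: examples are assumed to be dicts mapping attribute names to values, and the best attribute is chosen by the information gain defined earlier; unlike the pseudocode, branches are created only for values observed in X, so the empty-Xi case does not arise):

  from collections import Counter
  from math import log2

  def entropy(examples, T):
      n = len(examples)
      counts = Counter(x[T] for x in examples)
      return sum(-c / n * log2(c / n) for c in counts.values())

  def info_gain(examples, A, T):
      n = len(examples)
      remainder = 0.0
      for v in {x[A] for x in examples}:
          subset = [x for x in examples if x[A] == v]
          remainder += len(subset) / n * entropy(subset, T)
      return entropy(examples, T) - remainder

  def id3(X, T, attrs):
      classes = {x[T] for x in X}
      if len(classes) == 1:                              # all examples share one class
          return classes.pop()
      if not attrs:                                      # no attributes left
          return Counter(x[T] for x in X).most_common(1)[0][0]
      A = max(attrs, key=lambda a: info_gain(X, a, T))   # best attribute
      tree = {A: {}}
      for v in {x[A] for x in X}:                        # one branch per observed value of A
          Xi = [x for x in X if x[A] == v]
          tree[A][v] = id3(Xi, T, [a for a in attrs if a != A])
      return tree

  # e.g. id3(training_examples, "PlayTennis", ["Outlook", "Temperature", "Humidity", "Wind"])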