Chapter Five
Pattern Discovery: Definition
o Pattern discovery attempts to discover hidden linkages between data items
o Given a set of records, each of which contains some number of items from a given collection, it finds the items that tend to occur together
Pattern Discovery: Application
o Shelf management (e.g., supermarkets, pharmacies, bookshops).
Prevalent ≠ Interesting Rules
o Analysts already know about prevalent rules
– Interesting rules are those that deviate from prior expectation
o Mining’s payoff is in finding interesting (surprising) phenomena
o What makes a rule surprising?
– It does not match prior expectation
• e.g., the correlation between milk and cereal remains roughly constant over time, so rediscovering it is not surprising
– It cannot be trivially derived from simpler rules
• Milk 10%, cereal 10%
• Milk & cereal 10% … prevailing
• Eggs 10%
• Milk, cereal & eggs 0.1% … surprising!
[Figure: 1995, “Milk and eggs sell together!”; 1998, “Zzzz... Milk and cereal sell together!”]
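A minimal sketch of the arithmetic behind the “surprising” bullet, assuming independence as the baseline expectation. Milk and cereal always co-occur (10% each and 10% together), so {milk, cereal} behaves as one unit; if eggs were bought independently of it, the triple would be expected in about 10% × 10% = 1% of transactions, and the observed 0.1% is ten times lower. Comparing observed with expected support this way is the idea behind the lift measure, which these slides do not name; the helper `expected_joint_support` is hypothetical.

```python
def expected_joint_support(*supports: float) -> float:
    """Support expected for the union of itemsets that occur independently (hypothetical helper)."""
    result = 1.0
    for s in supports:
        result *= s
    return result


# Percentages from the slide, written as fractions of all transactions.
milk_and_cereal = 0.10  # milk and cereal always appear together, so treat them as one unit
eggs = 0.10
observed_all_three = 0.001

expected_all_three = expected_joint_support(milk_and_cereal, eggs)
ratio = observed_all_three / expected_all_three  # < 1: rarer than independence predicts

print(f"expected {expected_all_three:.3f}, observed {observed_all_three:.3f}, ratio {ratio:.2f}")
```

Running it prints `expected 0.010, observed 0.001, ratio 0.10`: the triple occurs at one tenth of the rate the simpler rules would predict, which is what makes it surprising.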
Pattern Discovery: Basic concepts
o Itemset: a set of one or more items
o k-itemset: an itemset with k items, X = {x1, …, xk}
o Support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
– For a rule X -> Y, the support is the probability that a transaction contains X ∪ Y; it must be no less than a user-defined threshold
– An itemset X is frequent if X’s support is no less than a minsup threshold
o Confidence, c: the probability of finding Y in a transaction that contains all of X1, X2, …, Xn
– i.e., the conditional probability that a transaction containing X also contains Y; a rule is kept only if its confidence is no less than a user-defined threshold (min_conf)
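The two measures can be sketched directly from these definitions: support(X) is the fraction of transactions containing X, and confidence(X -> Y) = support(X ∪ Y) / support(X). This is a minimal illustration, assuming transactions are stored as frozensets of item names; the function names and the four transactions are hypothetical, not taken from the slides.

```python
from typing import FrozenSet, List, Set


def support(itemset: Set[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x: Set[str], y: Set[str], transactions: List[FrozenSet[str]]) -> float:
    """Conditional probability P(Y | X) = support(X ∪ Y) / support(X)."""
    sx = support(x, transactions)
    if sx == 0:
        return 0.0  # X never occurs, so X -> Y is undefined; this sketch reports 0
    return support(x | y, transactions) / sx


# Hypothetical transactions, for illustration only.
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "coke"}),
    frozenset({"milk", "coke", "tea"}),
    frozenset({"bread"}),
]

print(support({"milk", "coke"}, transactions))                 # 0.5
print(round(confidence({"milk"}, {"coke"}, transactions), 3))  # 0.667
```

Here {milk, coke} appears in 2 of 4 transactions (support 0.5), and 2 of the 3 transactions with milk also contain coke (confidence about 0.667).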
Steps in Pattern Discovery
o Pattern discovery finds sets of items that appear “frequently” in the baskets.
o The problem of pattern discovery can be broken down into two steps:
– Finding frequent patterns, i.e., frequent (“large”) itemsets
• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
– Generating association rules from these itemsets
o Association rules are statements of the form {X1, X2, …, Xn} -> Y, meaning that Y is likely to be present in a transaction if X1, X2, …, Xn are all in that transaction.
o Example: Rules Discovered
{Milk} --> {Coke}
{Tea, Milk} --> {Coke}
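The second step (rule generation) can be sketched as follows, assuming step 1 has already produced a dictionary mapping each frequent itemset to its transaction count. Each frequent itemset I is split into a left-hand side X and a right-hand side Y = I minus X, and the rule is kept when count(I) / count(X) reaches the confidence threshold. The function `generate_rules` and the counts below are hypothetical (they correspond to 10 made-up transactions, all itemsets having a count of at least 3).

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def generate_rules(counts: Dict[Itemset, int], min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Step 2: turn frequent itemsets into rules X -> Y.

    `counts` maps each frequent itemset to the number of transactions containing it.
    Every subset of a frequent itemset is also frequent, so count(X) is in `counts`.
    """
    rules = []
    for itemset, count in counts.items():
        # Try every non-empty proper subset X as the left-hand side; Y is the remainder.
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                x = frozenset(lhs)
                conf = count / counts[x]  # confidence = support(X ∪ Y) / support(X)
                if conf >= min_conf:
                    rules.append((x, itemset - x, conf))
    return rules


# Hypothetical counts over 10 transactions (not from the slides).
counts = {
    frozenset({"Milk"}): 5,
    frozenset({"Coke"}): 6,
    frozenset({"Tea"}): 4,
    frozenset({"Milk", "Coke"}): 4,
    frozenset({"Tea", "Milk"}): 3,
    frozenset({"Tea", "Coke"}): 3,
    frozenset({"Tea", "Milk", "Coke"}): 3,
}

for x, y, conf in generate_rules(counts, min_conf=0.8):
    print(sorted(x), "->", sorted(y), round(conf, 2))
```

With min_conf = 0.8 these counts yield three rules: {Milk} -> {Coke} (0.8), {Coke, Tea} -> {Milk} (1.0) and {Milk, Tea} -> {Coke} (1.0), the last being the same shape as the {Tea, Milk} --> {Coke} example.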
Example: Finding frequent itemsets
o Given a support threshold S, sets of items that appear in at least S baskets are called frequent itemsets.
– Support threshold: S = 4 baskets.
B1 = {m, c, b} B2 = {m, p, j}
B3 = {m, b} B4 = {c, j}
B5 = {m, p, b} B6 = {m, c, b, j}
B7 = {c, b, j} B8 = {b, c}
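The frequent itemsets for these eight baskets can be checked by brute force: count every non-empty subset of every basket and keep those appearing in at least S = 4 baskets. This sketch enumerates all subsets of each basket, which grows exponentially with basket size; it is only meant for toy data like this, and practical algorithms (e.g., Apriori) prune the candidate itemsets instead.

```python
from collections import Counter
from itertools import combinations

# The eight baskets B1..B8 from the example.
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"},
    {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"},
    {"c", "b", "j"}, {"b", "c"},
]
S = 4  # support threshold, in baskets

# Brute force: count every non-empty subset of every basket.
counts = Counter()
for basket in baskets:
    items = sorted(basket)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            counts[frozenset(combo)] += 1

frequent = {itemset: n for itemset, n in counts.items() if n >= S}

# Print singletons first, then pairs, each group in alphabetical order.
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The frequent singletons are {b} (6 baskets), {c} (5), {j} (4) and {m} (5); the only frequent pairs are {b, c} and {b, m} (4 baskets each). No triple reaches the threshold ({m, c, b} appears only in B1 and B6), and {p} appears in just 2 baskets.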
Example: Association Rules
o Let min_support = 50% and min_confidence = 50%; identify the frequent item pairs and define the association rules.