Chapter Five

Association Rule Discovery

Compiled by: Nigusu Y.

Pattern Discovery : Definition
o Pattern discovery attempts to discover hidden linkages between data items

o Given a set of records, each of which contains some number of items from a given collection:

– Pattern discovery produces dependency rules that predict the occurrence of an item based on the occurrences of other items.
o Motivation for pattern discovery: finding inherent regularities in data
– What products are often purchased together? Pasta & tea?
– What are the subsequent purchases after buying a PC?
– What kinds of DNA are sensitive to the new drug D?
– Can we find redundant tests in medicine?

Pattern Discovery: Application
o Shelf management (e.g., supermarket, pharmacy, bookshop).

• Goal: To identify items that are bought together by sufficiently many customers.

• Approach: Process the collected sales transaction data to find dependencies among items.

• A classic rule: if a customer buys coffee and milk, then they are very likely to buy tea. So, don’t be surprised if you find tea stacked next to coffee!

{Coffee, Milk} → Tea

Prevalent  Interesting Rules
o Analysts already know about prevalent rules
– Interesting rules are those that deviate from prior Milk and
1995 Eggs sell
expectation
together!
o Mining’s payoff is in finding interesting
(surprising) phenomena
o What makes a rule surprising?
– Does not match prior expectation
• Correlation between milk and cereal remains 1998
Zzzz... Milk and
roughly constant over time cereal sell
o Cannot be trivially derived from simpler rules together!
– Milk 10%, cereal 10%
– Milk & cereal 10% … prevailing
– Eggs 10%
– Milk, cereal & eggs 0.1% … Surprising!
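The milk/cereal/eggs figures can be checked with a little arithmetic. The sketch below assumes (this is not stated on the slide) that "derivable from simpler rules" means the triple's support is predicted by treating eggs as independent of the {milk, cereal} pair; the variable names are illustrative only.

```python
# Supports quoted on the slide, as fractions of all transactions.
support_milk_cereal = 0.10    # {milk, cereal}: "prevailing"
support_eggs = 0.10           # {eggs}
observed_triple = 0.001       # {milk, cereal, eggs}: "surprising"

# Assumption of this sketch: if eggs were bought independently of the
# {milk, cereal} pair, the triple's support would be the product of the two.
expected_triple = support_milk_cereal * support_eggs

# Observed / expected support; a value far from 1 means the simpler rules
# do not explain the triple. (This ratio is the lift of {milk, cereal} -> {eggs}.)
ratio = observed_triple / expected_triple

print(f"expected support: {expected_triple:.3f}")
print(f"observed support: {observed_triple:.3f}")
print(f"observed/expected: {ratio:.2f}")
```

Under independence about 1% of transactions would contain all three items, but only 0.1% do, a tenfold shortfall; that gap is what makes the rule surprising rather than derivable.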
Pattern Discovery: Basic concepts
o Itemset: a set of one or more items
o k-itemset: an itemset containing k items, X = {x1, …, xk}
o Support, s, of an itemset X is the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
– For a rule X → Y, the support is the probability that a transaction contains X ∪ Y; the rule is kept only if this is no less than a user-defined minimum support s
– An itemset X is frequent if X’s support is no less than a minsup threshold
o Confidence, c, of a rule X → Y is the probability of finding Y in a transaction that contains all of X = {X1, X2, …, Xn}
– i.e., c = P(Y | X) = support(X ∪ Y) / support(X); the rule is kept only if c is no less than a user-defined minimum confidence
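The two definitions translate almost directly into code. This is a minimal sketch, assuming transactions are represented as Python sets of item names; the helper names `support` and `confidence` and the toy transactions are hypothetical, invented for illustration.

```python
def support(itemset: set[str], transactions: list[frozenset[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return hits / len(transactions)


def confidence(x: set[str], y: set[str], transactions: list[frozenset[str]]) -> float:
    """P(Y | X): support of X ∪ Y divided by support of X."""
    sx = support(x, transactions)
    if sx == 0:
        return 0.0  # confidence is undefined when X never occurs; report 0 here
    return support(x | y, transactions) / sx


# Hypothetical toy transactions, invented only to exercise the functions.
transactions = [
    frozenset({"milk", "coke"}),
    frozenset({"milk", "tea", "coke"}),
    frozenset({"tea"}),
    frozenset({"milk"}),
]

print(support({"milk", "coke"}, transactions))       # support of {milk, coke}
print(confidence({"milk"}, {"coke"}, transactions))  # confidence of milk -> coke
```

Here {milk, coke} appears in 2 of 4 transactions (support 0.5) and milk in 3 of 4, so the confidence of milk → coke is (2/4) / (3/4) = 2/3. An itemset X is then frequent exactly when `support(X, transactions) >= minsup`.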

Steps in Pattern Discovery
o Pattern discovery finds sets of items that appear “frequently” in the baskets.
o The problem of pattern discovery can be decomposed into two steps:
– Finding frequent patterns (frequent, or “large”, itemsets)
• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
– Generating association rules from these itemsets.
o Association rules are statements of the form {X1, X2, …, Xn} → Y, which means that Y is likely to be present in a transaction if X1, X2, …, Xn are all in the transaction.
o Example: rules discovered
{Milk} → {Coke}
{Tea, Milk} → {Coke}
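The two steps can be sketched as two functions. The slides do not prescribe an algorithm, so this is a simple level-wise search written for clarity, not efficiency (real miners such as Apriori or FP-growth prune much more aggressively). It relies on the standard fact that every subset of a frequent itemset is itself frequent. The function names and the baskets in the usage lines are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: map every itemset whose support (fraction of transactions
    containing it) is at least min_support to its transaction count.
    `transactions` is a list of sets of items."""
    n = len(transactions)
    if n == 0:
        return {}
    level = {frozenset([item]) for t in transactions for item in t}
    counts = {}
    while level:
        # Count each candidate and keep the ones that reach the threshold.
        kept = {}
        for cand in level:
            c = sum(1 for t in transactions if cand <= t)
            if c / n >= min_support:
                kept[cand] = c
        counts.update(kept)
        # Join frequent k-itemsets into (k+1)-item candidates; safe because
        # every subset of a frequent itemset is itself frequent.
        level = {a | b for a, b in combinations(kept, 2) if len(a | b) == len(a) + 1}
    return counts


def association_rules(counts, n, min_confidence):
    """Step 2: split each frequent itemset Z into X -> Z - X and keep rules
    whose confidence count(Z) / count(X) is at least min_confidence."""
    rules = []
    for z, z_count in counts.items():
        for k in range(1, len(z)):  # every non-empty proper subset X of Z
            for x in map(frozenset, combinations(z, k)):
                conf = z_count / counts[x]  # X is frequent because Z is
                if conf >= min_confidence:
                    rules.append((set(x), set(z - x), z_count / n, conf))
    return rules


# Hypothetical baskets, invented only to exercise the two steps.
baskets = [{"milk", "coke"}, {"milk", "tea", "coke"}, {"tea", "milk", "coke"}, {"tea"}]
counts = frequent_itemsets(baskets, min_support=0.5)
rules = association_rules(counts, len(baskets), min_confidence=0.7)
for lhs, rhs, supp, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```

With these hypothetical baskets the run keeps four rules, among them {milk} → {coke} and {milk, tea} → {coke}, the same shape as the rules listed above. Confidence is computed from integer counts so that exact ratios such as 3/4 are not disturbed by floating-point rounding of the supports.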
Example: Finding frequent itemsets
o Given a support threshold s, sets of items that appear in at least s baskets are called frequent itemsets.

o Example: Frequent Itemsets

– Items = {milk (m), coke (c), Pepsi (p), biscuit (b), juice (j)}.

– Support threshold s = 4 baskets.

B1 = {m, c, b} B2 = {m, p, j}

B3 = {m, b} B4 = {c, j}

B5 = {m, p, b} B6 = {m, c, b, j}

B7 = {c, b, j} B8 = {b, c}

– Frequent itemsets: {m}, {c}, {b}, {j}, {m, b}, {b, c}.
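With only five items, the frequent itemsets of this example can be verified by brute force: count every possible itemset and keep those found in at least 4 baskets. This sketch is an exhaustive check, not an efficient mining algorithm; the letter-to-item mapping follows the item list above.

```python
from itertools import combinations

# Baskets B1..B8 from the slide (m = milk, c = coke, p = Pepsi, b = biscuit, j = juice).
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]
min_count = 4  # support threshold s, as an absolute number of baskets

items = sorted(set().union(*baskets))
frequent = {}
# Exhaustive check of every itemset size; fine for 5 items (31 candidates).
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        count = sum(1 for b in baskets if set(cand) <= b)
        if count >= min_count:
            frequent[cand] = count

for itemset, count in frequent.items():
    print("{" + ", ".join(itemset) + "}:", count)
```

The result matches the slide: {m}, {c}, {b}, {j}, {m, b} and {b, c}. The nearest miss is {c, j}, which occurs in only 3 baskets, and p appears in just 2.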


Association Rules
o Find all rules on itemsets of the form X → Y with minimum support and confidence
– If-then rules about the contents of baskets.
• {i1, i2,…,ik} → j means: “if a basket contains all of i1,…,ik then it is likely
to contain j.”
o A typical question: “find all association rules with support ≥ s and confidence ≥
c.” Note: “support” of an association rule is the support of the set of items it
mentions.
– The confidence of this association rule is the probability of j given i1,…,ik: the fraction of the transactions containing i1,…,ik that also contain j.
– Example: Confidence
B1 = {m, c, b} B2 = {m, p, j} B3 = {m, b}
B4 = {c, j} B5 = {m, p, b} B6 = {m, c, b, j}
B7 = {c, b, j} B8 = {b, c}
o An association rule: {m, b} → c (with confidence = 2/4 = 50%).
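The confidence calculation above can be reproduced directly: filter the baskets containing {m, b}, then count how many of those also contain c. This sketch uses Python's `fractions.Fraction` so the ratios stay exact; the helper name `rule_stats` is hypothetical. It also reports the rule's support, which by the note above is the support of {m, b, c}.

```python
from fractions import Fraction

# Baskets B1..B8 from the slide.
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]


def rule_stats(lhs, rhs, baskets):
    """Return (support, confidence) of the rule lhs -> rhs as exact fractions."""
    with_lhs = [b for b in baskets if lhs <= b]          # baskets containing i1..ik
    if not with_lhs:
        raise ValueError("lhs never occurs, so confidence is undefined")
    with_both = [b for b in with_lhs if rhs <= b]        # ...that also contain j
    support = Fraction(len(with_both), len(baskets))     # support of lhs ∪ rhs
    confidence = Fraction(len(with_both), len(with_lhs))
    return support, confidence


supp, conf = rule_stats({"m", "b"}, {"c"}, baskets)
print(f"support = {supp}, confidence = {conf} = {float(conf):.0%}")
```

{m, b} occurs in B1, B3, B5 and B6, and only B1 and B6 also contain c, giving confidence 2/4 = 50% as on the slide; the rule's support is 2/8 = 25%.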

Example: Association Rules
o Let min_support = 50% and min_confidence = 50%. Identify the frequent item pairs and derive the association rules.

Tid   Items bought
10    Coke, Nuts, Tea
20    Coke, Coffee, Tea
30    Coke, Tea, Eggs
40    Nuts, Eggs, Milk
50    Coffee, Tea, Eggs, Milk

[Figure: Venn diagram with regions “Customer buys Coke”, “Customer buys Tea”, and their overlap “Customer buys both”]
o Frequent patterns:
– Coke: 3, Tea: 4, Eggs: 3, {Coke, Tea}: 3
o Association rules (support, confidence):
– Coke → Tea (60%, 100%)
– Tea → Coke (60%, 75%)

