Data Science with R
Lesson 13—Association
© Copyright 2015, Simplilearn. All rights reserved.
Objectives
After completing this lesson, you will be able to:
• Explain association rule mining and parameters of interesting relationships
• Explain the Apriori algorithm and steps to find frequent item sets
Topic 1: Association Rule Mining
Association Rules
An association rule is a pattern that states when X occurs, Y occurs with a certain probability. A
transaction t contains X, a set of items (item set) in I, if X is a subset of t.
An association rule is an implication of the form:
X ➞ Y
where X, Y ⊆ I and X ∩ Y = ∅
Association Rule Mining
This is a classical Data Mining technique that:
• Finds out interesting patterns in a dataset
• Assumes all data elements as categorical
• Is not suitable for numeric data
Brute-force solutions cannot find all interesting combinations of items in reasonable time and computing power, because the number of possible item sets grows exponentially with the number of items.
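The blow-up is easy to quantify: n distinct items admit 2^n − 1 non-empty item sets. A small Python sketch (Python is used here purely for illustration):

```python
from itertools import combinations

def count_itemsets(items):
    """Count all non-empty item sets by brute-force enumeration."""
    return sum(1 for k in range(1, len(items) + 1)
               for _ in combinations(items, k))

# The count matches the closed form 2^n - 1 for small n...
for n in (3, 5, 10):
    assert count_itemsets(range(n)) == 2**n - 1

# ...and is already astronomical for a modest 100-item inventory:
print(2**100 - 1)
```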
Application Areas of Association Rule Mining
Some examples are:
Market Basket Data Analysis
Purchase Data Analysis
Website Traffic Analysis
Parameters of Interesting Relationships
Interesting relationships have two parameters:
• Frequent item sets: Collection of items occurring together frequently
• Association rules: Indicators of a strong relationship between two items
Example:
In the “Items” table below, {wine, diapers, soy milk} is a frequent item set,
and diapers ➞ wine is an association rule:
Association Rule Strength Measures
The measures of the strength of association rules are explained below:
Support
For an item set, it is the percentage of the dataset that contains this item set.
The rule holds with support sup in T if sup% of transactions contain X ∪ Y.
sup = Pr(X ∪ Y)
Example: In the “Items” table, the support of {soy milk} is 4/5 and of {soy milk, diapers} is 3/5.

Confidence
The confidence for the rule {diapers} ➞ {wine} is defined as support({diapers, wine})/support({diapers}).
The rule holds in T with confidence conf if conf% of transactions that contain X also contain Y.
conf = Pr(Y | X)
Example: In the “Items” table, the confidence for diapers ➞ wine is (3/5)/(4/5) = 3/4 = 0.75.
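Both measures are easy to compute directly. Since the “Items” table itself is not reproduced here, the Python sketch below uses a hypothetical five-transaction dataset chosen to be consistent with the figures above (the transaction contents are an assumption for illustration):

```python
# Hypothetical five-transaction dataset consistent with the figures above.
transactions = [
    {"soy milk", "lettuce"},
    {"lettuce", "diapers", "wine", "beets"},
    {"soy milk", "diapers", "wine", "orange juice"},
    {"lettuce", "soy milk", "diapers", "wine"},
    {"lettuce", "soy milk", "diapers", "ice cream"},
]

def support(itemset):
    """sup = Pr(X ∪ Y): fraction of transactions containing every item."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """conf = Pr(Y | X) = support(X ∪ Y) / support(X)."""
    return support(set(lhs) | set(rhs)) / support(lhs)

print(support({"soy milk"}))              # 4/5
print(support({"soy milk", "diapers"}))   # 3/5
print(confidence({"diapers"}, {"wine"}))  # ≈ 0.75
```

In an R workflow the arules package provides these measures; the sketch above just makes the definitions concrete.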
Limitations of Support and Confidence
While support and confidence can help you quantify the success of
association analysis, for thousands of sale items the process of computing
them for every possible item set can be very slow.
In such cases, you can use algorithms such as Apriori.
Topic 2: Apriori Algorithm
Apriori Algorithm: Meaning
This algorithm:
• Helps reduce the number of possible interesting item sets
• Assumes that if an item set is frequent, all of its subsets are also frequent; equivalently, if an item set is infrequent, all of its supersets are infrequent and can be pruned
[Figure: all possible item sets from the set {1, 2, 3}, with infrequent item sets highlighted]
Apriori Algorithm: Example
To understand its application, consider the “Shopping Baskets” transaction set below, which ignores some
important parameters, such as item quantities and prices paid:
t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes
Applying Apriori Algorithm: Steps
It includes two steps:
Mine all frequent item sets
Generate rules from frequent item sets
Assume:
• minsup = 30%
• minconf = 80%
An example frequent item set:
{Chicken, Clothes, Milk} [sup = 3/7]
Association rules from the item set:
Clothes ➞ Milk, Chicken [sup = 3/7, conf = 3/3]
… …
Clothes, Chicken ➞ Milk [sup = 3/7, conf = 3/3]
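These figures can be verified directly against the seven baskets; a minimal Python sketch:

```python
# The seven "Shopping Baskets" transactions from the example.
baskets = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]

def support_count(itemset):
    """Number of baskets containing every item in the item set."""
    return sum(set(itemset) <= t for t in baskets)

# {Chicken, Clothes, Milk} is frequent at minsup = 30%: sup = 3/7
assert support_count({"Chicken", "Clothes", "Milk"}) == 3

# Rule Clothes ➞ Milk, Chicken: conf = 3/3, above minconf = 80%
conf = support_count({"Chicken", "Clothes", "Milk"}) / support_count({"Clothes"})
print(conf)  # 1.0
```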
Step 1: Mine All Frequent Item Sets
A frequent item set is:
• The one with sup ≥ minsup
• Any subset of a frequent item set
Algorithm to Find Frequent Item Set
Also called level-wise search, it includes the following steps:
Find all 1-item frequent item sets, then all 2-item frequent item sets, and so on
In each iteration k, consider only item sets that contain some frequent (k-1)-item set
Find frequent item sets of size 1: F1
For each k ≥ 2: Ck = candidate item sets of size k that could be frequent given Fk-1, and Fk = those candidates that are actually frequent (Fk ⊆ Ck)
Finding Frequent Item Set—Example
Consider the below dataset T with minsup = 0.5:
TID Items
T100 1, 3, 4
T200 2, 3, 5
T300 1, 2, 3, 5
T400 2, 5
item set : count
1. scan T → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
   → F1: {1}:2, {2}:3, {3}:3, {5}:3
   → C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
2. scan T → C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
   → F2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
   → C3: {2,3,5}
3. scan T → C3: {2,3,5}:2 → F3: {2,3,5}
Ordering Items
The items in I are sorted in lexicographic order (a total order).
• This order is applied within every item set and is used throughout the algorithm.
• {w[1], w[2], …, w[k]} represents a k-item set w consisting of items w[1], w[2], …, w[k], where
w[1] < w[2] < … < w[k].
Ordering Items (contd.)
Using this ordering, the main Apriori algorithm is:
C1 ← init-pass(T);
F1 ← {f | f ∈ C1, f.count/n ≥ minsup};   // n: no. of transactions in T
for (k = 2; Fk-1 ≠ ∅; k++) do
  Ck ← candidate-gen(Fk-1);
  for each transaction t ∈ T do
    for each candidate c ∈ Ck do
      if c is contained in t then
        c.count++;
    end
  end
  Fk ← {c ∈ Ck | c.count/n ≥ minsup}
end
return F ← ∪k Fk;
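The loop above can be sketched in runnable form. The Python version below (an illustration, not the course's R tooling) simplifies candidate generation by joining any two (k-1)-item sets whose union has k items; combined with the prune step, this yields the same candidates as the ordered join:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise search: find F1, then build Fk from candidates over Fk-1."""
    n = len(transactions)
    # First pass: count 1-item sets and keep the frequent ones (F1).
    counts = {}
    for t in transactions:
        for item in t:
            s = frozenset([item])
            counts[s] = counts.get(s, 0) + 1
    Fk = {s for s, c in counts.items() if c / n >= minsup}
    frequent = set(Fk)
    k = 2
    while Fk:
        # Join: any two frequent (k-1)-item sets whose union has size k
        # (a simplification of the ordered prefix join in the pseudocode).
        Ck = {f1 | f2 for f1 in Fk for f2 in Fk if len(f1 | f2) == k}
        # Prune: drop candidates with an infrequent (k-1)-subset.
        Ck = {c for c in Ck
              if all(frozenset(s) in Fk for s in combinations(c, k - 1))}
        # Scan: keep candidates meeting minsup.
        Fk = {c for c in Ck
              if sum(c <= t for t in transactions) / n >= minsup}
        frequent |= Fk
        k += 1
    return frequent

# Dataset T from the worked example, minsup = 0.5.
T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
F = apriori(T, minsup=0.5)
print(frozenset({2, 3, 5}) in F)  # True: matches F3 in the worked example
```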
Candidate Generation
The candidate-gen function takes Fk-1 and returns a superset of the set of all frequent k-item sets. It includes two steps:
1. Join: Generate all possible candidate item sets Ck of length k
2. Prune: Remove the candidates in Ck that cannot be frequent
Candidate Generation (contd.)
The algorithm for candidate generation is:
Function candidate-gen(Fk-1)
  Ck ← ∅;
  forall f1, f2 ∈ Fk-1
    with f1 = {i1, …, ik-2, ik-1}
    and f2 = {i1, …, ik-2, i′k-1}
    and ik-1 < i′k-1 do
      c ← {i1, …, ik-1, i′k-1};   // join f1 and f2
      Ck ← Ck ∪ {c};
      for each (k-1)-subset s of c do
        if (s ∉ Fk-1) then
          delete c from Ck;   // prune
      end
  end
  return Ck;
Candidate Generation: Example
Assume F3 = {{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}}, then:
After join: C4 = {{1, 2, 3, 4}, {1, 3, 4, 5}}
After prune: C4 = {{1, 2, 3, 4}}
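A minimal Python sketch of candidate-gen that reproduces this result; it joins any pair of (k-1)-item sets whose union has k items (a simplification of the ordered prefix join), relying on the prune step to discard the extras:

```python
from itertools import combinations

def candidate_gen(F_prev, k):
    """Join-and-prune sketch producing k-item candidates from F_{k-1}."""
    F_prev = {frozenset(f) for f in F_prev}
    # Join: union any two (k-1)-item sets that differ in exactly one item.
    Ck = {f1 | f2 for f1 in F_prev for f2 in F_prev if len(f1 | f2) == k}
    # Prune: every (k-1)-subset of a surviving candidate must be frequent.
    return {c for c in Ck
            if all(frozenset(s) in F_prev for s in combinations(c, k - 1))}

F3 = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}]
print(candidate_gen(F3, 4))  # {frozenset({1, 2, 3, 4})}
```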
Step 2—Generate Rules from Frequent Item Sets
For each frequent item set X and each proper nonempty subset A of X, let B = X − A.
A ➞ B is an association rule if:
confidence(A ➞ B) ≥ minconf
support(A ➞ B) = support(A ∪ B) = support(X)
confidence(A ➞ B) = support(A ∪ B) / support(A)
Generate Rules from Frequent Item Sets—Example
Assume {2,3,4} is frequent with sup = 50% and proper nonempty subsets: {2,3}, {2,4}, {3,4}, {2}, {3}, {4},
with sup = 50%, 50%, 75%, 75%, 75%, 75%, respectively.
Association rules:
2,3 ➞ 4, confidence = 100%
2,4 ➞ 3, confidence = 100%
3,4 ➞ 2, confidence = 67%
2 ➞ 3,4, confidence = 67%
3 ➞ 2,4, confidence = 67%
4 ➞ 2,3, confidence = 67%
Support of all rules = 50%
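The confidences follow mechanically from the supports; a Python sketch with the supports above hard-coded:

```python
from itertools import combinations

# Supports from the example above, as fractions of transactions.
support = {
    frozenset({2, 3, 4}): 0.50,
    frozenset({2, 3}): 0.50, frozenset({2, 4}): 0.50, frozenset({3, 4}): 0.75,
    frozenset({2}): 0.75, frozenset({3}): 0.75, frozenset({4}): 0.75,
}

def rules_from(X, minconf):
    """Emit A ➞ B for every proper nonempty subset A of X, with B = X - A."""
    X = frozenset(X)
    out = []
    for r in range(1, len(X)):
        for A in map(frozenset, combinations(X, r)):
            conf = support[X] / support[A]
            if conf >= minconf:
                out.append((sorted(A), sorted(X - A), round(conf, 2)))
    return out

# Only 2,3 ➞ 4 and 2,4 ➞ 3 survive at minconf = 80%.
for A, B, conf in rules_from({2, 3, 4}, minconf=0.8):
    print(A, "➞", B, "conf =", conf)
```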
Demo—Perform Association Using the Apriori Algorithm
This demo will show the steps to perform association using the Apriori algorithm.
Demo—Perform Visualization on Association Rules
This demo will show the steps to visualize association rules.
Problems with Association Mining
Some problems related to association mining are:
• Single minsup: It assumes that all items have similar frequencies and/or are of the same nature.
• Item frequency variation: In practice, some items appear very frequently, whereas others appear rarely.
• Choice of minsup: If minsup is set high, rules with rare items are not found; if minsup is set low, it may cause a combinatorial explosion.
Quiz
QUIZ
Association rules are interesting:
1
a. if they satisfy both minimum and maximum iterations.
b. if they satisfy both minimum support and minimum confidence
thresholds.
c. if they satisfy both association correlations.
d. if they satisfy Apriori constants.
QUIZ
Association rules are interesting:
1
a. if they satisfy both minimum and maximum iterations.
b. if they satisfy both minimum support and minimum confidence
thresholds.
c. if they satisfy both association correlations.
d. if they satisfy Apriori constants.
The correct answer is b.
Explanation: Association rules are interesting if they satisfy both minimum support and
minimum confidence thresholds.
QUIZ
What is the formula to calculate support?
2
a. Pr(X | Y)
b. Pr(X ∪ Y)
c. Pr(X * Y)
d. Pr(X / Y)
QUIZ
What is the formula to calculate support?
2
a. Pr(X | Y)
b. Pr(X ∪ Y)
c. Pr(X * Y)
d. Pr(X / Y)
The correct answer is b.
Explanation: The formula to calculate support is Pr(X ∪ Y).
QUIZ Which of the following algorithms can be used to solve the problem of support and
3 confidence?
a. Candidate generation
b. Classification
c. Apriori
d. Item set
QUIZ Which of the following algorithms can be used to solve the problem of support and
3 confidence?
a. Candidate generation
b. Classification
c. Apriori
d. Item set
The correct answer is c.
Explanation: The Apriori algorithm can be used to solve the problem of support and
confidence.
QUIZ
Which of the following conditions is true for mining frequent item sets?
4
a. sup < minsup
b. sup > minsup
c. sup = minsup
d. sup ≥ minsup
QUIZ
Which of the following conditions is true for mining frequent item sets?
4
a. sup < minsup
b. sup > minsup
c. sup = minsup
d. sup ≥ minsup
The correct answer is d.
Explanation: sup ≥ minsup is true for mining frequent item sets.
Summary
Let us summarize the topics covered in this lesson:
• Association rule mining finds out interesting patterns in a dataset.
• Interesting relationships have two parameters: frequent item sets and association rules.
• An association rule is a pattern that states when X occurs, Y occurs with a certain probability.
• The measures of the strength of association rules are support and confidence.
• While support and confidence can help quantify the success of association analysis, for thousands of sale items the process can be very slow; this is solved by algorithms such as Apriori.
• The Apriori algorithm includes two steps: mining all frequent item sets and generating rules from frequent item sets.
This concludes “Association.”
This is the last lesson of the course.