Data Science and Big Data Analytics
Chapter 5: Advanced Analytical Theory and
Methods: Association Rules
Chapter Sections
5.1 Overview
5.2 Apriori Algorithm
5.3 Evaluation of Candidate Rules
5.4 Applications of Association Rules
5.5 Example: Transactions in a Grocery Store
5.6 Validation and Testing
5.7 Diagnostics
5.1 Overview
Association rules method
Unsupervised learning method
Descriptive (not predictive) method
Used to find hidden relationships in data
The relationships are represented as rules
Questions association rules might answer
Which products tend to be purchased together
What products do similar customers tend to buy
5.1 Overview
Example – general logic of association rules
5.1 Overview
Rules have the form X -> Y
When X is observed, Y is also observed
Itemset
Collection of items or entities
k-itemset = {item 1, item 2,…,item k}
Examples
Items purchased in one transaction
Set of hyperlinks clicked by a user in one session
5.1 Overview – Apriori Algorithm
Apriori is the most fundamental algorithm for mining frequent itemsets
Given itemset L, support of L is the percent of
transactions that contain L
Frequent itemset – items appear together “often
enough”
Minimum support defines “often enough” (% transactions)
If an itemset is frequent, then any subset is frequent
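The support definition above can be sketched in a few lines of Python. This is an illustrative example with a made-up transaction database; the function and item names are mine, not from the book:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

# Toy transaction database (hypothetical items)
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "butter"},
]

print(support({"milk"}, transactions))           # 0.75
print(support({"milk", "bread"}, transactions))  # 0.5
```

With a minimum support of, say, 0.6, {milk} would be frequent but {milk, bread} would not.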
5.1 Overview – Apriori Algorithm
If {B,C,D} frequent, then all subsets frequent
5.2 Apriori Algorithm
Frequent = meets the minimum support threshold
Bottom-up iterative algorithm
Identify the frequent (min support) 1-itemsets
Frequent 1-itemsets are paired into 2-itemsets,
and the frequent 2-itemsets are identified, etc.
Definitions for next slide
D = transaction database
d = minimum support threshold
N = maximum length of itemset (optional parameter)
Ck = set of candidate k-itemsets
Lk = set of k-itemsets with minimum support
5.2 Apriori Algorithm
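Using the definitions above (D, d, Ck, Lk), the bottom-up iteration can be sketched in Python. This is a minimal, unoptimized illustration under my own naming, not the book's implementation:

```python
from itertools import combinations

def apriori(D, d):
    """Bottom-up Apriori sketch.
    D = transaction database (list of sets), d = minimum support (fraction)."""
    n = len(D)

    def sup(itemset):
        # Support: fraction of transactions containing the itemset
        return sum(1 for t in D if itemset <= t) / n

    # L1: frequent 1-itemsets
    Lk = {frozenset([i]) for t in D for i in t if sup(frozenset([i])) >= d}
    frequent = set(Lk)
    k = 1
    while Lk:
        k += 1
        # Ck: candidate k-itemsets from unions of frequent (k-1)-itemsets
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Apriori property: keep candidates whose (k-1)-subsets are all
        # frequent, then check minimum support
        Lk = {c for c in Ck
              if all(frozenset(s) in frequent for s in combinations(c, k - 1))
              and sup(c) >= d}
        frequent |= Lk
    return frequent

# Toy database (hypothetical items)
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(D, d=0.6)
# {a}, {b}, {c}, {a,b}, {a,c}, {b,c} are frequent; {a,b,c} (support 0.4) is not
```

The subset check is where the Apriori property pays off: candidates with any infrequent subset are pruned before their support is ever counted.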
5.3 Evaluation of Candidate Rules
Confidence
Frequent itemsets can form candidate rules
Confidence measures the certainty of a rule
Minimum confidence – predefined threshold
Problem with confidence
Given a rule X->Y, confidence considers only the
antecedent (X) and the co-occurrence of X and Y
It ignores how common Y is on its own, so it cannot tell
whether the rule reflects a true implication or merely a
popular consequent
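Confidence of a rule X -> Y is support(X ∪ Y) / support(X). A minimal sketch with toy data and my own helper names:

```python
def support(itemset, transactions):
    # Fraction of transactions containing every item in the itemset
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(X, Y, transactions):
    # Confidence(X -> Y) = support(X ∪ Y) / support(X)
    X, Y = frozenset(X), frozenset(Y)
    return support(X | Y, transactions) / support(X, transactions)

# Toy transaction database (hypothetical items)
transactions = [
    {"milk", "bread"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread"},
]
# {milk} in 3 of 4 transactions; {milk, bread} in 2 of 4
print(round(confidence({"milk"}, {"bread"}, transactions), 4))  # 0.6667
```

Note that confidence({milk} -> {bread}) never looks at how often bread sells on its own, which is exactly the weakness described above.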
5.3 Evaluation of Candidate Rules
Lift
Lift measures how much more often X and Y
occur together than expected if statistically
independent
Lift = 1 if X and Y are statistically independent
Lift > 1 indicates the degree of usefulness of the rule
Example – in 1000 transactions,
If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in
400, then Lift(milk->eggs) = 0.3/(0.5*0.4) = 1.5
If {milk, bread} appears in 400, {milk} in 500, and {bread}
in 400, then Lift(milk->bread) = 0.4/(0.5*0.4) = 2.0
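The slide's lift numbers can be checked directly from the formula Lift(X->Y) = support(X ∪ Y) / (support(X) * support(Y)):

```python
def lift(sup_xy, sup_x, sup_y):
    # Lift(X -> Y) = support(X ∪ Y) / (support(X) * support(Y))
    return sup_xy / (sup_x * sup_y)

# Figures from the slide: 1000 transactions
print(round(lift(0.3, 0.5, 0.4), 2))  # 1.5  (milk -> eggs)
print(round(lift(0.4, 0.5, 0.4), 2))  # 2.0  (milk -> bread)
```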
5.3 Evaluation of Candidate Rules
Leverage
Leverage measures the difference in the
probability of X and Y appearing together
compared to statistical independence
Leverage = 0 if X and Y are statistically independent
Leverage > 0 indicates degree of usefulness of rule
Example – in 1000 transactions,
If {milk, eggs} appears in 300, {milk} in 500, and {eggs} in
400, then Leverage(milk->eggs) = 0.3 - 0.5*0.4 = 0.1
If {milk, bread} appears in 400, {milk} in 500, and {bread}
in 400, then Leverage (milk->bread) = 0.4 - 0.5*0.4 = 0.2
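Likewise, the leverage numbers follow from Leverage(X->Y) = support(X ∪ Y) - support(X) * support(Y):

```python
def leverage(sup_xy, sup_x, sup_y):
    # Leverage(X -> Y) = support(X ∪ Y) - support(X) * support(Y)
    return sup_xy - sup_x * sup_y

# Figures from the slide: 1000 transactions
print(round(leverage(0.3, 0.5, 0.4), 2))  # 0.1  (milk -> eggs)
print(round(leverage(0.4, 0.5, 0.4), 2))  # 0.2  (milk -> bread)
```

Both rules have positive leverage, but leverage ranks milk -> bread higher, agreeing with lift on this data.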
5.4 Applications of Association Rules
The term market basket analysis refers to a
specific implementation of association rules
For better merchandising – products to
include/exclude from inventory each month
Placement of products within related products
Association rules also used for
Recommender systems – Amazon, Netflix
Clickstream analysis from web usage log files
Website visitors to page X click on links A,B,C more than on
links D,E,F
5.6 Validation and Testing
Frequent itemsets and high-confidence rules are found using
pre-specified minimum support and minimum confidence levels
Measures like lift and leverage then help ensure that
interesting rules are identified rather than coincidental ones
However, some of the remaining rules may be considered
subjectively uninteresting because they don’t yield
unexpected profitable actions
E.g., a rule like {paper} -> {pencil} is too obvious to be interesting or actionable
Incorporating subjective knowledge requires domain experts
Good rules provide valuable insights for institutions to
improve their business operations
5.7 Diagnostics
Although minimum support is pre-specified in phases 3 and 4
(model planning and model building), the level can be adjusted
to target a desired number of rules – variants and
improvements of Apriori are available
For large datasets the Apriori algorithm can be
computationally expensive – common efficiency improvements:
Partitioning
Sampling
Transaction reduction
Hash-based itemset counting
Dynamic itemset counting
arules in R
https://rpubs.com/emzak208/281776
https://rpubs.com/aru0511/GroceriesDatasetAssociationAnalysis