Data Science with R
Unit V (Part-1) : Association Rules
M. Narasimha Raju
Asst Professor, Dept. of Computer Science &
Engineering
Association Rules:
Overview
Apriori Algorithm
Evaluation of Candidate Rules
Applications of Association Rules
Example
Validation and Testing
Regression:
Linear Regression
Logistic Regression
Reasons to Choose and Cautions
Overview
Given a large collection of transactions, in which each
transaction consists of one or more items, association rules
examine the items being purchased to see which items are
frequently bought together and to discover a list of rules that
describe the purchasing behavior.
The goal with association rules is to discover interesting
relationships among the items.
The relationships that are interesting depend both on the
business context and the nature of the algorithm being used
for the discovery.
(Figure: The general logic behind association rules)
Association
Each of the uncovered rules is in the form X → Y, meaning
that when item X is observed, item Y is also observed. In
this case, the left-hand side (LHS) of the rule is X, and the
right-hand side (RHS) of the rule is Y.
Association rule algorithms can discover patterns in the data
and disclose rules about which products are purchased
together.
Market basket analysis.
Each transaction can be viewed as the shopping basket of a
customer that contains one or more items. This is also known as an
itemset.
The term itemset refers to a collection of items or individual
entities that contain some kind of relationship.
This could be a set of retail items purchased together in one
transaction, a set of hyperlinks clicked on by one user in a single
session, or a set of tasks done in one day.
An itemset containing k items is called a k-itemset and is
denoted {item 1, item 2, . . . , item k}.
Computation of the association rules is typically based on itemsets.
Apriori - Support
The Apriori algorithm pioneered the use of support for
pruning the itemsets and controlling the exponential growth
of candidate itemsets.
Given an itemset L, the support of L is the percentage of
transactions that contain L.
For example, if 80% of all transactions contain itemset
{bread}, then the support of {bread} is 0.8.
Similarly, if 60% of all transactions contain itemset
{bread, butter}, then the support of {bread, butter} is
0.6.
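Support can be computed directly in R. The following is a minimal sketch assuming the arules package is installed; the five baskets are hypothetical:

library(arules)

# Five hypothetical shopping baskets
baskets <- list(
  c("bread", "butter"),
  c("bread", "jam"),
  c("bread", "butter", "milk"),
  c("bread"),
  c("butter", "milk")
)
trans <- as(baskets, "transactions")

# Support of each 1-itemset: the fraction of baskets containing the item
itemFrequency(trans)   # bread appears in 4 of 5 baskets, so support({bread}) = 0.8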
Minimum support
A frequent itemset has items that appear together often
enough.
If the minimum support is set at 0.5, any itemset can be
considered a frequent itemset if at least 50% of the
transactions contain this itemset.
The support of a frequent itemset should be greater than or
equal to the minimum support.
If an itemset is considered frequent, then any subset of the
frequent itemset must also be frequent.
This is referred to as the Apriori property (or downward
closure property).
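Continuing the sketch above, the frequent itemsets at a minimum support of 0.5 can be listed with arules; every subset of a reported itemset is itself reported, which is the Apriori property in action:

# Frequent itemsets in the toy transactions at minimum support 0.5
fis <- apriori(trans, parameter = list(support = 0.5,
                                       target = "frequent itemsets"))
inspect(fis)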
Frequent itemsets
If 60% of the transactions
contain {bread,jam}, then at
least 60% of all the transactions
will contain {bread} or {jam}.
In other words, when the
support of {bread,jam} is 0.6,
the support of {bread} or
{jam} is at least 0.6.
If itemset {B,C,D} is frequent,
then all the subsets of this
itemset (shaded in the
accompanying figure) must also
be frequent itemsets.
Apriori Algorithm
The Apriori algorithm takes a bottom-up iterative approach to uncovering
the frequent itemsets by first determining all the possible items (or 1-
itemsets, for example {bread}, {eggs}, {milk}, …) and then identifying
which among them are frequent.
Assuming the minimum support threshold (or the minimum support
criterion) is set at 0.5, the algorithm identifies and retains those itemsets
that appear in at least 50% of all transactions and discards (or “prunes
away”) the itemsets that have a support less than 0.5 or appear in fewer
than 50% of the transactions.
The word prune is used like it would be in gardening, where unwanted
branches of a bush are clipped away.
Apriori algorithm
In the next iteration of the Apriori algorithm, the identified frequent 1-itemsets are
paired into 2-itemsets (for example, {bread,eggs}, {bread,milk}, {eggs,milk}, …)
and again evaluated to identify the frequent 2-itemsets among them.
At each iteration, the algorithm checks whether the support criterion can be met;
if it can, the algorithm grows the itemset, repeating the process until it runs out of
support or until the itemsets reach a predefined length.
Let variable Ck be the set of candidate k-itemsets and variable Lk be the set of k-
itemsets that satisfy the minimum support. Given a transaction database D, a
minimum support threshold δ, and an optional parameter N indicating the
maximum length an itemset could reach, Apriori iteratively computes frequent
itemsets Lk+1 based on Lk.
Apriori algorithm
(Figure: pseudocode of the Apriori algorithm)
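The following is a minimal, unoptimized R sketch of the iteration just described, assuming D is a list of character-vector transactions, delta is the minimum support threshold, and N is the optional maximum itemset length. Candidate generation is simplified here (each frequent k-itemset is grown by one frequent item, rather than joining Lk with itself):

apriori_sketch <- function(D, delta, N = Inf) {
  # Support of itemset s: fraction of transactions containing all items in s
  support <- function(s) mean(vapply(D, function(t) all(s %in% t), logical(1)))
  # L1: frequent 1-itemsets
  Lk <- Filter(function(s) support(s) >= delta,
               as.list(sort(unique(unlist(D)))))
  result <- Lk
  k <- 1
  while (length(Lk) > 0 && k < N) {
    singles <- unique(unlist(Lk))
    # C(k+1): grow each frequent k-itemset by one item appearing in Lk
    Ck <- unique(unlist(lapply(Lk, function(s) {
      lapply(setdiff(singles, s), function(i) sort(c(s, i)))
    }), recursive = FALSE))
    # Prune: keep only candidates meeting the minimum support delta
    Lk <- Filter(function(s) support(s) >= delta, Ck)
    result <- c(result, Lk)
    k <- k + 1
  }
  result  # all frequent itemsets found
}

For the toy baskets above, apriori_sketch(baskets, delta = 0.5) should return the same frequent itemsets as the arules call.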
Apriori algorithm
The first step of the Apriori algorithm is to identify the frequent
itemsets by starting with each item in the transactions that meets the
predefined minimum support threshold δ.
These itemsets are 1-itemsets denoted as L1, as each 1-itemset
contains only one item.
Next, the algorithm grows the itemsets by joining L1 onto itself to
form new, grown 2-itemsets denoted as L2 and determines the
support of each 2-itemset in L2.
Those itemsets that do not meet the minimum support threshold δ are
pruned away.
The growing and pruning process is repeated until no itemsets meet
the minimum support threshold.
Once completed, the output of the Apriori algorithm is the
collection of all the frequent k-itemsets.
Evaluation of Candidate Rules
Confidence is defined as the measure of certainty or
trustworthiness associated with each discovered rule.
Confidence is the percentage of transactions that contain both X
and Y out of all the transactions that contain X:
Confidence(X → Y) = Support(X ∪ Y) / Support(X).
For example, if {bread, eggs, milk} has a support of 0.15 and
{bread, eggs} also has a support of 0.15, the confidence of
rule {bread, eggs}→{milk} is 1, which means 100% of the
time a customer buys bread and eggs, milk is bought as well.
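As a sketch, rules and their confidence can be mined from the toy transactions above; the thresholds here are illustrative:

# Mine rules meeting minimum support 0.2 and minimum confidence 0.8
rules_toy <- apriori(trans, parameter = list(support = 0.2,
                                             confidence = 0.8,
                                             target = "rules"))
inspect(sort(rules_toy, by = "confidence"))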
Evaluation of Candidate Rules
A relationship may be thought of as interesting when the algorithm
identifies the relationship with a measure of confidence greater
than or equal to a predefined threshold.
This predefined threshold is called the minimum confidence.
Lift measures how many times more often X and Y occur together
than expected if they were statistically independent of each other:
Lift(X → Y) = Support(X ∪ Y) / (Support(X) * Support(Y)).
Lift is a measure of how X and Y are really related rather than
coincidentally happening together.
Lift is 1 if X and Y are statistically independent of each other. In
contrast, a lift of X → Y greater than 1 indicates that there is some
usefulness to the rule. A larger value of lift suggests a greater
strength of the association between X and Y.
Evaluation of Candidate Rules
Assuming 1,000 transactions, with {milk, eggs} appearing in
300 of them, {milk} appearing in 500, and {eggs} appearing
in 400, then Lift(milk→eggs) = 0.3 / (0.5 * 0.4) = 1.5.
If {bread} appears in 400 transactions and {milk, bread}
appears in 400, then Lift(milk→bread) = 0.4 / (0.5 * 0.4) = 2.
Therefore it can be concluded that milk and bread have a
stronger association than milk and eggs.
Evaluation of Candidate Rules
Leverage measures the difference in the probability of X and Y appearing
together in the dataset compared to what would be expected if X and Y were
statistically independent of each other:
Leverage(X → Y) = Support(X ∪ Y) − Support(X) * Support(Y).
Leverage is 0 when X and Y are statistically independent of each other.
If X and Y have some kind of relationship, the leverage would be greater than
zero.
A larger leverage value indicates a stronger relationship between X and Y.
For the previous example, Leverage(milk→eggs) = 0.3 − (0.5 * 0.4) = 0.1 and
Leverage(milk→bread) = 0.4 − (0.5 * 0.4) = 0.2.
It again confirms that milk and bread have a stronger association than milk
and eggs.
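With arules, lift and leverage can be computed for mined rules via interestMeasure(); a sketch using the toy rules from earlier:

# Lift and leverage for the toy rules (measure names per arules)
interestMeasure(rules_toy, measure = c("lift", "leverage"),
                transactions = trans)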
Applications of Association Rules
Broad-scale approaches to better merchandising—what
products should be included in or excluded from the inventory
each month
Cross-merchandising between products and high-margin or
high-ticket items
Physical or logical placement of product within related
categories of products
Promotional programs—multiple product purchase
incentives managed through a loyalty card program
Recommendation Systems
Many online service providers such as Amazon and Netflix use
recommender systems.
Recommender systems can use association rules to discover
related products or identify customers who have similar
interests.
For example, association rules may suggest that customers
who have bought product A have also bought product B, or
that customers who have bought products A, B, and C have
interests similar to those of a given customer.
These findings provide opportunities for retailers to cross-sell
their products.
Clickstream analysis
Clickstream analysis refers to the analytics on data related to
web browsing and user clicks, which is stored on the client or
the server side.
Web usage log files generated on web servers contain huge
amounts of information, and association rules can potentially
give useful knowledge to web usage data analysts.
For example, association rules may suggest that website
visitors who land on page X click on links A, B, and C much
more often than links D, E, and F.
This observation provides valuable insight on how to better
personalize and recommend the content to site visitors.
An Example: Transactions in a Grocery Store
Using R and the arules and arulesViz packages
The Groceries Dataset
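A minimal sketch of loading the data; the Groceries dataset ships with the arules package:

library(arules)
data(Groceries)     # load the bundled grocery transactions
summary(Groceries)  # 9,835 transactions across 169 items
class(Groceries)    # "transactions"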
The class of the dataset is
transactions, as defined by the arules
package. The transactions class
contains three slots:
transactionInfo: A data frame with
vectors of the same length as the
number of transactions
itemInfo: A data frame to store item
labels
data: A binary incidence matrix that
indicates which item labels appear in
each transaction
Frequent Itemset Generation
The apriori() function from the arules package implements the Apriori
algorithm to create frequent itemsets.
Note that, by default, the apriori() function executes all the iterations at once.
Assume that the minimum support threshold is set to 0.02 based on management
discretion.
Because the dataset contains 9,835 transactions, an itemset should appear at
least 197 times to be considered a frequent itemset.
The first iteration of the Apriori algorithm computes the support of each product in
the dataset and retains those products that satisfy the minimum support.
The following code identifies 59 frequent 1-itemsets that satisfy the minimum
support.
The parameters of apriori() specify the minimum and maximum lengths of the
itemsets, the minimum support threshold, and the target indicating the type of
association mined.
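The code listing referred to above did not survive extraction; the following reconstruction is consistent with the text (parameter names per the arules documentation):

# Frequent 1-itemsets at minimum support 0.02
itemsets <- apriori(Groceries,
                    parameter = list(minlen = 1, maxlen = 1,
                                     support = 0.02,
                                     target = "frequent itemsets"))
summary(itemsets)  # 59 frequent 1-itemsets at this threshold
inspect(head(sort(itemsets, by = "support"), 10))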
Rule Generation and Visualization
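The rule-mining call is not shown on the slide; the following reconstruction assumes the thresholds implied by the 2,918 rules discussed below, and loads arulesViz, which supplies plot() for rules:

library(arulesViz)
# Mine rules at minimum support 0.001 and minimum confidence 0.6
rules <- apriori(Groceries,
                 parameter = list(support = 0.001,
                                  confidence = 0.6,
                                  target = "rules"))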
plot(rules)
The scatterplot shows that, of the 2,918 rules generated from
the Groceries dataset, the highest lift occurs at a low support
and a low confidence.
Entering plot(rules@quality) displays a scatterplot matrix
(Figure 5-4) to compare the support, confidence, and lift of
the 2,918 rules. In the matrix, lift is proportional to
confidence and illustrates several linear groupings.
Lift = Confidence / Support(Y).
When the support of Y remains the same, lift is proportional to
confidence, and the slope of the linear trend is the reciprocal
of Support(Y).
Validation and Testing
After gathering the output rules, it may become
necessary to use one or more methods to validate the
results in the business context for the sample dataset.
The first approach can be established through statistical
measures such as confidence, lift, and leverage.
Rules that involve mutually independent items or cover
few transactions are considered uninteresting because
they may capture spurious relationships.
Confidence measures the chance that X and Y appear together in
relation to the chance X appears.
Confidence can be used to identify the interestingness of the rules.
Lift and leverage both compare the support of X and Y against their
individual support.
While mining data with association rules, some rules generated could be
purely coincidental.
For example, if 95% of customers buy X and 90% of customers buy Y,
then X and Y would occur together at least 85% of the time, even if there
is no relationship between the two (by inclusion-exclusion,
Support(X ∪ Y) ≥ 0.95 + 0.90 − 1 = 0.85).
Measures like lift and leverage ensure that interesting rules are
identified rather than coincidental ones.
Diagnostics
Although the Apriori algorithm is easy to understand and
implement, some of the rules generated are uninteresting or
practically useless.
Additionally, some of the rules may be generated due to coincidental
relationships between the variables.
Measures like confidence, lift, and leverage should be used along
with human insights to address this problem.
The Apriori algorithm reduces the computational workload by only
examining itemsets that meet the specified minimum threshold.
However, depending on the size of the dataset, the Apriori algorithm
can be computationally expensive.
For each level of support, the algorithm requires a scan of the entire
transaction database.
Approaches to improve Apriori’s efficiency:
Partitioning: Any itemset that is potentially frequent in a transaction
database must be frequent in at least one of the partitions of the
transaction database.
Sampling: This extracts a subset of the data with a lower support
threshold and uses the subset to perform association rule mining.
Transaction reduction: A transaction that does not contain frequent k-
itemsets is useless in subsequent scans and therefore can be ignored.
Hash-based itemset counting: If the corresponding hashing bucket
count of a k-itemset is below a certain threshold, the k-itemset cannot
be frequent.
Dynamic itemset counting: Only add new candidate itemsets when all
of their subsets are estimated to be frequent.
Data Science with R
Unit V (Part-2) : Association Rules With R Programming
Thank You
M. Narasimha Raju
Asst Professor, Dept. of Computer Science &
Engineering