
MODULE 3

MINING FREQUENT PATTERNS


Frequent patterns are patterns (e.g., itemsets, subsequences, or substructures) that appear
frequently in a data set. For example, a set of items, such as milk and bread, that appear
frequently together in a transaction data set is a frequent itemset. A subsequence, such as buying
first a PC, then a digital camera, and then a memory card, if it occurs frequently in a shopping
history database, is a (frequent) sequential pattern. A substructure can refer to different structural
forms, such as subgraphs, subtrees, or sublattices, which may be combined with itemsets or
subsequences. If a substructure occurs frequently, it is called a (frequent) structured pattern.
Finding frequent patterns plays an essential role in mining associations, correlations, and many
other interesting relationships among data.
Market Basket Analysis: A Motivating Example
A typical example of frequent itemset mining is market basket analysis. This process analyzes
customer buying habits by finding associations between the different items that customers place
in their “shopping baskets” (Figure 6.1). The discovery of these associations can help retailers
develop marketing strategies by gaining insight into which items are frequently purchased
together by customers. For instance, if customers are buying milk, how likely are they to also
buy bread (and what kind of bread) on the same trip to the supermarket? This information can
lead to increased sales by helping retailers do selective marketing and plan their shelf space.
Rule support and confidence are two measures of rule interestingness. They respectively reflect
the usefulness and certainty of discovered rules. For example, for the rule computer ⇒ antivirus_software,
a support of 2% means that 2% of all the transactions under analysis show that computer and
antivirus software are purchased together. A confidence of 60% means that 60% of the customers
who purchased a computer also bought the software. Typically, association rules are considered
interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
These thresholds can be set by users or domain experts.
Frequent Itemsets, Closed Itemsets, and Association Rules
support(A ⇒ B) = P(A ∪ B)
confidence(A ⇒ B) = P(B|A) = support(A ∪ B) / support(A)
Rules that satisfy both a minimum support threshold (min sup) and a minimum confidence
threshold (min conf) are called strong. A set of items is referred to as an itemset. An itemset that
contains k items is a k-itemset. The set {computer, antivirus software} is a 2-itemset. The
occurrence frequency of an itemset is the number of transactions that contain the itemset. This is
also known, simply, as the frequency, support count, or count of the itemset.
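To make these measures concrete, here is a minimal Python sketch that computes the support and confidence of a rule A ⇒ B from a list of transactions; the transactions themselves are made-up values for illustration, not data from the text.

```python
# Minimal sketch: computing support and confidence of a rule A => B.
# The transactions below are made up for illustration.
transactions = [
    {"computer", "antivirus_software"},
    {"computer", "printer"},
    {"computer", "antivirus_software", "memory_card"},
    {"printer", "scanner"},
    {"computer"},
]

A = {"computer"}
B = {"antivirus_software"}

n = len(transactions)
count_A = sum(1 for t in transactions if A <= t)          # transactions containing A
count_AB = sum(1 for t in transactions if (A | B) <= t)   # transactions containing both A and B

support = count_AB / n            # support(A => B) = P(A ∪ B)
confidence = count_AB / count_A   # confidence(A => B) = P(B|A) = support(A ∪ B) / support(A)

print(f"support = {support:.2f}, confidence = {confidence:.2f}")
```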

In general, association rule mining can be viewed as a two-step process:


1. Find all frequent itemsets: By definition, each of these itemsets will occur at least as
frequently as a predetermined minimum support count, min sup.
2. Generate strong association rules from the frequent itemsets: By definition, these rules must
satisfy minimum support and minimum confidence.
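As an illustration of step 2, the following sketch derives strong rules from a small table of frequent itemsets and their support counts. The itemsets, counts, and thresholds are illustrative assumptions standing in for the output of step 1.

```python
from itertools import combinations

# A few hypothetical frequent itemsets with their support counts (step 1 output).
support_count = {
    frozenset(["I1"]): 6, frozenset(["I2"]): 7, frozenset(["I5"]): 2,
    frozenset(["I1", "I2"]): 4, frozenset(["I1", "I5"]): 2,
    frozenset(["I2", "I5"]): 2, frozenset(["I1", "I2", "I5"]): 2,
}
num_transactions = 9
min_sup, min_conf = 0.2, 0.6

# Step 2: for every frequent itemset l and every nonempty proper subset s,
# output the rule s => (l - s) if it meets both thresholds.
for l in support_count:
    if len(l) < 2:
        continue
    for size in range(1, len(l)):
        for s in map(frozenset, combinations(l, size)):
            sup = support_count[l] / num_transactions
            conf = support_count[l] / support_count[s]
            if sup >= min_sup and conf >= min_conf:
                print(f"{set(s)} => {set(l - s)}  (support={sup:.2f}, confidence={conf:.2f})")
```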

Frequent Itemset Mining Methods


Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation
Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent
itemsets for Boolean association rules [AS94b]. The name of the algorithm is based on the fact that the
algorithm uses prior knowledge of frequent itemset properties, as we shall see later. Apriori employs an
iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets.
First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each
item, and collecting those items that satisfy minimum support. The resulting set is denoted by L1. Next,
L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more
frequent k-itemsets can be found. The finding of each Lk requires one full scan of the database. To
improve the efficiency of the level-wise generation of frequent itemsets, an important property called the
Apriori property is used to reduce the search space: all nonempty subsets of a frequent itemset must also
be frequent, so any candidate with an infrequent subset can be pruned without counting it.
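The join and prune steps that exploit this property might be coded as follows. This is a simplified sketch, assuming frequent itemsets are represented as sorted tuples, rather than the textbook pseudocode; the L2 used to exercise it is the set of frequent 2-itemsets from the AllElectronics example discussed below.

```python
from itertools import combinations

def apriori_gen(L_k):
    """Generate candidate (k+1)-itemsets from the frequent k-itemsets L_k (sorted tuples)."""
    L_k = set(L_k)
    k = len(next(iter(L_k)))
    candidates = set()
    # Join step: merge pairs of k-itemsets whose first k-1 items agree.
    for a in L_k:
        for b in L_k:
            if a[:k - 1] == b[:k - 1] and a[k - 1] < b[k - 1]:
                candidates.add(a + (b[k - 1],))
    # Prune step (Apriori property): drop candidates that have an infrequent k-subset.
    return {c for c in candidates
            if all(s in L_k for s in combinations(c, k))}

# Frequent 2-itemsets from the AllElectronics example below (assumed for illustration).
L2 = {("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")}
print(sorted(apriori_gen(L2)))  # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```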

Let’s look at a concrete example, based on the AllElectronics transaction database, D. There are
nine transactions in this database, that is, |D| = 9. We use Figure 6.2 to illustrate the Apriori algorithm for
finding frequent itemsets in D.
1. In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. The
algorithm simply scans all of the transactions to count the number of occurrences of each item.
2. Suppose that the minimum support count required is 2, that is, min sup = 2. (Here, we are referring to
absolute support because we are using a support count. The corresponding relative support is 2/9 = 22%.)
The set of frequent 1-itemsets, L1, can then be determined. It consists of the candidate 1-itemsets
satisfying minimum support. In our example, all of the candidates in C1 satisfy minimum support.
3. To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 ⋈ L1 to generate a
candidate set of 2-itemsets, C2. C2 consists of |L1| choose 2 (i.e., |L1|(|L1| − 1)/2) 2-itemsets. Note that no
candidates are removed from C2 during the prune step because each subset of the candidates is also frequent.
4. Next, the transactions in D are scanned and the support count of each candidate itemset in C2 is
accumulated, as shown in the middle table of the second row in Figure.
5. The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2
having minimum support.
6. The generation of the set of the candidate 3-itemsets, C3, is detailed in Figure. From the join step, we
first get C3 = L2 ⋈ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
Based on the Apriori property, the last four of these candidates are pruned because each has a 2-item
subset that is not frequent, leaving C3 = {{I1, I2, I3}, {I1, I2, I5}}.
7. The transactions in D are scanned to determine L3, consisting of those candidate 3-itemsets in C3
having minimum support (Figure).
8. The algorithm uses L3 ⋈ L3 to generate a candidate set of 4-itemsets, C4. Although the join results in
{{I1, I2, I3, I5}}, itemset {I1, I2, I3, I5} is pruned because its subset {I2, I3, I5} is not frequent. Thus, C4
= ∅, and the algorithm terminates, having found all of the frequent itemsets.
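The level-wise search traced above can be assembled into a small driver loop. The sketch below reuses the apriori_gen helper sketched earlier; the nine transactions follow the standard textbook version of the AllElectronics table and should be treated as assumed data if your copy differs.

```python
def apriori(transactions, min_sup_count):
    """Return a dict mapping each frequent itemset (as a sorted tuple) to its support count."""
    # L1: frequent 1-itemsets from one scan of the database.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= min_sup_count}
    all_frequent = dict(frequent)

    # Level-wise search: use L_k to generate and count the candidates C_{k+1}.
    while frequent:
        candidates = apriori_gen(frequent.keys())   # join + prune (defined above)
        counts = {c: 0 for c in candidates}
        for t in transactions:                      # one full scan per level
            for c in candidates:
                if set(c) <= set(t):
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_sup_count}
        all_frequent.update(frequent)
    return all_frequent

# The nine AllElectronics transactions (assumed), mined with min_sup = 2.
D = [["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"], ["I1", "I2", "I4"],
     ["I1", "I3"], ["I2", "I3"], ["I1", "I3"], ["I1", "I2", "I3", "I5"],
     ["I1", "I2", "I3"]]
print(apriori(D, min_sup_count=2))  # includes the frequent 3-itemsets {I1,I2,I3} and {I1,I2,I5}
```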
FP-Growth Algorithm
The FP-Growth algorithm is an alternative way to find frequent itemsets without candidate
generation, thus improving performance. To do so, it uses a divide-and-conquer
strategy. The core of this method is a special data structure named the frequent-pattern
tree (FP-tree), which retains the itemset association information.
Using this strategy, FP-Growth reduces the search cost by recursively looking for short
patterns and then concatenating them into longer frequent patterns.

In large databases, the FP-tree may not fit in main memory. A strategy to cope with
this problem is to partition the database into a set of smaller databases (called projected
databases) and then construct an FP-tree from each of these smaller databases.

FP-Tree

The frequent-pattern tree (FP-tree) is a compact data structure that stores quantitative
information about frequent patterns in a database. Each transaction is read and then mapped onto
a path in the FP-tree. This is done until all transactions have been read. Different transactions
with common subsets allow the tree to remain compact because their paths overlap.

A frequent-pattern tree is built from the initial itemsets of the database. The purpose of the FP-tree
is to mine the most frequent patterns. Each node of the FP-tree represents an item of an itemset.

The root node represents null, while the lower nodes represent the item sets. The associations of
the nodes with the lower nodes, that is, the item sets with the other item sets, are maintained
while forming the tree.

Han defines the FP-tree as the tree structure given below:

1. One root is labelled as "null" with a set of item-prefix subtrees as children and a frequent-
item-header table.
2. Each node in the item-prefix subtree consists of three fields:
o Item-name: registers which item is represented by the node;
o Count: the number of transactions represented by the portion of the path reaching
the node;
o Node-link: links to the next node in the FP-tree carrying the same item name or
null if there is none.
3. Each entry in the frequent-item-header table consists of two fields:
o Item-name: the same as in the corresponding node;
o Head of node-link: a pointer to the first node in the FP-tree carrying the item
name.
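A minimal Python sketch of this node structure is given below; the class and field names are illustrative choices, not part of Han's definition.

```python
class FPNode:
    """One node of an FP-tree: item-name, count, parent/children links, and node-link."""
    def __init__(self, item, parent=None):
        self.item = item          # item-name ("null" for the root)
        self.count = 0            # transactions represented by the path down to this node
        self.parent = parent      # parent node (None for the root)
        self.children = {}        # item-name -> child FPNode
        self.node_link = None     # next node in the tree carrying the same item-name, or None

# Frequent-item-header table: item-name -> head of that item's node-link chain
# (it can also record the item's support count).
header_table = {}
```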

Additionally, the frequent-item-header table can store the support count for each item. The best-case
scenario occurs when all transactions have the same itemset: the FP-tree is then only a single branch
of nodes.

The worst-case scenario occurs when every transaction has a unique itemset. In that case the space
needed to store the tree is greater than the space used to store the original data set, because the
FP-tree requires additional space for the pointers between nodes and the counters for each item.
The tree's complexity grows with the uniqueness of the transactions.
Using this algorithm, the FP-tree is constructed in two database scans. The first scan collects and
sorts the set of frequent items, and the second constructs the FP-Tree.

Example

Support threshold=50%, Confidence= 60%

Table 1:

Transaction   List of items
T1            I1, I2, I3
T2            I2, I3, I4
T3            I4, I5
T4            I1, I2, I4
T5            I1, I2, I3, I5
T6            I1, I2, I3, I4

Solution: Support threshold=50% => 0.5*6= 3 => min_sup=3

Table 2: Count of each item

Item   Count
I1     4
I2     5
I3     4
I4     4
I5     2
Table 3: Frequent items sorted in descending order of count (I5 is dropped because its count, 2, is
below min_sup = 3).

Item   Count
I2     5
I1     4
I3     4
I4     4

Build FP Tree

Let's build the FP-tree in the following steps:

1. Consider the root node, labelled null.
2. The first transaction, T1: I1, I2, I3, is inserted in sorted order as I2, I1, I3: I2 is linked as a child
of the root, I1 is linked to I2, and I3 is linked to I1, giving {I2:1}, {I1:1}, {I3:1}.
3. T2: I2, I3, I4 is inserted in sorted order as I2, I3, I4: I2 is linked to the root, I3 is linked to I2, and
I4 is linked to I3. This branch shares the I2 node with T1, since I2 is already a child of the root.
4. So the count of I2 is incremented by 1, I3 is linked as a new child of I2, and I4 is linked as a child
of that I3. The counts are {I2:2}, {I3:1}, {I4:1}.
5. T3: I4, I5. Since I5 is not frequent (its count, 2, is below min_sup = 3), only I4 is inserted, creating
a new branch with I4 linked directly to the root: {I4:1}.
6. T4: I1, I2, I4. The sorted sequence is I2, I1, I4. I2 is already linked to the root, so its count is
incremented by 1; similarly, I1 is incremented by 1 because it is already linked to I2 from T1, and
I4 is added as a new child of I1. Thus {I2:3}, {I1:2}, {I4:1}.
7. T5: I1, I2, I3, I5. With the infrequent item I5 dropped, the sorted sequence is I2, I1, I3. Thus
{I2:4}, {I1:3}, {I3:2}.
8. T6: I1, I2, I3, I4. The sorted sequence is I2, I1, I3, I4. Thus {I2:5}, {I1:4}, {I3:3}, {I4:1}.
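Putting these steps together, the sketch below builds the FP-tree for the six transactions of Table 1, reusing the FPNode class sketched earlier. Node-links and the header table are omitted for brevity, and the infrequent item I5 is dropped before insertion, as in the steps above.

```python
from collections import Counter

# The six transactions of Table 1, with min_sup = 3 as computed above.
transactions = [["I1", "I2", "I3"], ["I2", "I3", "I4"], ["I4", "I5"],
                ["I1", "I2", "I4"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3", "I4"]]
min_sup = 3

# First scan: count items, keep the frequent ones, and rank them by descending count.
counts = Counter(item for t in transactions for item in t)
frequent = {item: c for item, c in counts.items() if c >= min_sup}
rank = {item: i for i, item in enumerate(sorted(frequent, key=lambda x: (-frequent[x], x)))}

# Second scan: insert each transaction's frequent items, in rank order, into the tree.
root = FPNode("null")
for t in transactions:
    node = root
    for item in sorted((i for i in t if i in frequent), key=lambda x: rank[x]):
        if item not in node.children:
            node.children[item] = FPNode(item, parent=node)
        node = node.children[item]
        node.count += 1

def show(node, depth=0):
    """Print each branch of the tree, one node per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

show(root)  # main branch I2:5, I1:4, I3:3, I4:1; side branches I1-I4:1, I2-I3:1-I4:1, and I4:1
```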
MINING ASSOCIATION RULES

Association rule learning is an unsupervised learning technique that checks for the
dependency of one data item on another and maps the items accordingly so that the result can be
exploited profitably. It tries to find interesting relations or associations among the variables of a
dataset, expressed as rules that describe how the occurrence of one set of items relates to another.

Association rule learning is one of the important concepts of machine learning, and it is
employed in market basket analysis, web usage mining, continuous production, and so on. Market
basket analysis is a technique used by large retailers to discover associations between items. We can
understand it with the example of a supermarket, where products that are frequently purchased
together are placed together.
Types of association rules in data mining

There are multiple types of association rules in data mining. They include the following:

- Generalized. Rules in this category are general examples of association rules that provide a
high-level overview of what these associations of data points look like.

- Multilevel. Multilevel association rules separate data points into different levels of importance,
known as levels of abstraction, and distinguish between associations of more important data
points and ones of lower importance.

- Quantitative. This type of association rule describes associations made between numerical
data points.

- Multirelational. This type goes beyond traditional association rules, which consider
relationships between single data points; multirelational rules are mined across multiple
or multidimensional databases.
