
UNIT-5:

Association rule mining: Antecedent, consequent, multi-relational association rules,
ECLAT, case study on market basket analysis.
Cluster Analysis: Cluster analysis, partitioning methods, hierarchical methods, density-based
methods (DBSCAN).
Association Rule Mining: Antecedent
Association Rule Mining is a fundamental data mining technique used to discover interesting relationships
(associations) between variables in large datasets. It is commonly applied in market basket analysis to find
patterns in customer purchase behavior.

Key Concepts:
1. Association Rules: An association rule is an implication of the form A⇒B where:
o A is the antecedent (or left-hand side) of the rule.
o B is the consequent (or right-hand side) of the rule.
o The rule suggests that if A occurs, then B is likely to occur as well.

2. Antecedent: The antecedent is the set of items whose presence in a transaction leads to (predicts) the
consequent. In the context of market basket analysis, it represents the items already in a customer's
basket.
o Example: In the rule {Milk}⇒{Bread}, "Milk" is the antecedent, and "Bread" is the consequent.
This rule indicates that customers who buy milk are likely to also buy bread.

Importance of Antecedents
1. Identifying Purchase Patterns: Understanding the antecedents helps businesses identify patterns in
customer purchasing behavior, allowing them to optimize marketing strategies and product
placement.

2. Targeted Marketing: By analyzing antecedents, businesses can develop targeted marketing


campaigns that promote products frequently purchased together.
For example, if the antecedent frequently includes "diapers," retailers might promote "baby wipes"
alongside.
3. Inventory Management: Retailers can manage inventory more effectively by understanding which
items are frequently bought together, leading to better stock management and reducing stockouts of
related products.
4. Cross-Selling Opportunities: Identifying strong antecedents enables companies to create cross-
selling opportunities.
For example, if data shows that customers who purchase "laptops" also often buy "laptop bags," the
retailer can promote laptop bags alongside laptops.
Measuring Association Rules
To evaluate the strength and relevance of association rules, several metrics are used:
1. Support:
o Support is the proportion of transactions in the dataset that contain both the antecedent and
the consequent.
o Formula: Support(A⇒B) = frequency(A, B) / N, where N is the total number of transactions.
o High support indicates that the rule is applicable to a large portion of the dataset.

2. Confidence:
o Confidence measures how often the consequent is found in transactions that contain the
antecedent.
o Formula: Confidence(A⇒B) = frequency(A, B) / frequency(A)
o High confidence indicates a strong likelihood that if A occurs, B will also occur.
3. Lift:
o Lift evaluates the strength of the association rule relative to how often A and B would be
expected to co-occur if they were independent.
o Formula: Lift(A⇒B) = Support(A⇒B) / (Support(A) × Support(B))
o A lift value greater than 1 indicates a positive association between A and B, meaning that the
presence of A increases the likelihood of B; a value below 1 indicates a negative association.
Example
Consider a dataset containing transactions from a grocery store:

Transaction ID Items

1 Milk, Bread

2 Milk, Diapers

3 Bread, Diapers

4 Milk, Bread, Diapers

5 Bread, Milk

From this data, we can generate the following association rule:


 Rule: {Milk}⇒{Bread}
Calculating Support, Confidence, and Lift
1. Support:
o Total transactions: 5
o Transactions with both Milk and Bread: 3 (Transactions 1, 4, and 5)
o Support = 3/5 = 0.6
2. Confidence:
o Transactions with Milk: 4 (Transactions 1, 2, 4, and 5)
o Confidence = 3/4 = 0.75
3. Lift:
o Transactions with Bread: 4 (Transactions 1, 3, 4, and 5), so Support(Milk) = 4/5 = 0.8 and
Support(Bread) = 4/5 = 0.8
o Lift = 0.6 / (0.8 × 0.8) ≈ 0.94
o Because the lift is slightly below 1, buying Milk does not by itself increase the likelihood of
buying Bread in this small dataset; both items are simply very common.
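The same calculations can be checked with a short Python sketch (standard library only; the helper function and variable names are illustrative):

```python
# Transactions from the grocery-store example above.
transactions = [
    {"Milk", "Bread"},
    {"Milk", "Diapers"},
    {"Bread", "Diapers"},
    {"Milk", "Bread", "Diapers"},
    {"Bread", "Milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"Milk"}, {"Bread"}

sup = support(antecedent | consequent, transactions)            # 0.6
conf = sup / support(antecedent, transactions)                  # 0.75
lift = sup / (support(antecedent, transactions) *
              support(consequent, transactions))                # ~0.94

print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```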

Consequent
Consequent: The consequent is the item or set of items that are predicted to occur as a result of the
antecedent. It represents the outcome of the association rule.
 Example: In the rule {Milk}⇒{Bread}

o Antecedent: Milk
o Consequent: Bread
o This rule suggests that if a customer buys milk, they are likely to also buy bread.
Importance of the Consequent
 Targeting Marketing Efforts: Understanding the consequent helps businesses design targeted
promotions. If data shows that customers who buy "Diapers" often buy "Wipes," retailers can create
bundles or offers to encourage purchasing both.
 Inventory Management: Identifying the consequent of popular antecedents allows businesses to
stock related items together, ensuring that customers can easily find products they are likely to
purchase together.
 Customer Recommendations: E-commerce platforms can use association rules to generate product
recommendations based on items that customers frequently buy together.

Multi-relational Association Rules


Multi-Relational Association Rule Mining
Definition:
Multi-relational association rule mining extends traditional association rule mining to discover patterns and
relationships across multiple tables or relations in a relational database rather than a single flat dataset.
This technique is essential when the data is stored in normalized forms, such as relational databases with
multiple tables linked by foreign keys.
Key Components of Multi-Relational Association Rules
1. Multiple Tables: Rules are mined from datasets with multiple interconnected tables.
o Example: A database with tables for Customers, Orders, and Products.
2. Join Operations: Relationships between tables are established through joins.
3. Rules with Multiple Attributes: The antecedents and consequents can involve attributes from
different tables.
o Example Rule:
 "If a customer is aged 25-35 and buys electronics, they are likely to also purchase
warranties."
Approaches to Multi-Relational Association Rule Mining
1. Tuple ID Propagation:
o Each record (tuple) is assigned a unique ID. These IDs are propagated across tables to link
relevant records.

o Useful for maintaining relationships between tuples without requiring full joins repeatedly.
2. Flattening Approach:
o Multiple tables are transformed into a single flat table by performing joins beforehand.
o While simpler, this approach can lead to data explosion for large datasets.
3. Apriori-based Extensions:

o Adapt the traditional Apriori algorithm to mine rules across tables by extending candidate
generation and support counting to relational data.
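As a rough illustration of the flattening approach, the pandas sketch below joins toy versions of the Customers and Orders tables from the example that follows into a single flat table that a standard rule-mining algorithm could then process (column and variable names are assumptions for illustration):

```python
import pandas as pd

# Toy versions of the Customers and Orders tables from the example below.
customers = pd.DataFrame({
    "CustomerID": [1, 2],
    "Age": [30, 25],
    "City": ["NYC", "LA"],
})
orders = pd.DataFrame({
    "OrderID": [101, 102],
    "CustomerID": [1, 2],
    "Product": ["Laptop", "Smartphone"],
})

# Flattening: join the two relations on the foreign key CustomerID,
# producing one row per (customer, order) pair.
flat = orders.merge(customers, on="CustomerID", how="inner")

# Derive an attribute used in the example rule below (age band 25-35).
flat["AgeBand25_35"] = flat["Age"].between(25, 35)

print(flat[["CustomerID", "AgeBand25_35", "City", "Product"]])
# Each row can now be treated as a "transaction" of attribute=value items
# and fed to Apriori or ECLAT.
```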

Example
Dataset Structure:
 Customers Table:

CustomerID Age City

1 30 NYC

2 25 LA

 Orders Table:

OrderID CustomerID Product

101 1 Laptop

102 2 Smartphone

Rule:

 Antecedent: "Customers aged 25-35 who live in NYC."


 Consequent: "They are likely to buy a Laptop."
Challenges
1. Data Explosion: Joins across tables can create excessively large intermediate datasets.
2. Complexity: Multi-relational mining involves computationally expensive operations.
3. Support Counting: Accurately counting supports in a multi-relational context is more challenging than
in single tables.
Applications
 E-commerce: Finding purchase patterns across users, orders, and products.
 Healthcare: Mining relationships between patients, treatments, and outcomes.
 Banking: Analyzing transactional and customer demographic data.

Multi-relational association rule mining enables deeper insights by exploring data's structural relationships,
providing richer and more actionable patterns compared to flat datasets.

Case Study: Market Basket Analysis


Market Basket Analysis Case Study
Introduction
Market Basket Analysis (MBA) is a data mining technique used to uncover relationships between items
purchased together. It uses association rule mining to find patterns, helping businesses improve sales,
optimize inventory, and design promotional strategies.
Objective
To analyze sales transaction data and identify frequently purchased itemsets to:

1. Recommend products to customers.


2. Create targeted marketing strategies (e.g., for certain age groups).
3. Optimize store layout (strategic placement of frequently bought items) and inventory.
Dataset

Transaction ID Items Bought

1 Bread, Milk, Butter

2 Bread, Butter

3 Milk, Butter, Cheese

4 Bread, Milk

5 Milk, Butter, Bread

Steps in Analysis
Step 1: Data Preprocessing
 Convert the transactional data into a format suitable for association rule mining.
 Binary Matrix Representation: Each row represents a transaction, and each column represents an
item.

Transaction Bread Milk Butter Cheese

1 1 1 1 0

2 1 0 1 0

3 0 1 1 1

4 1 1 0 0

5 1 1 1 0
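A minimal pandas sketch of this preprocessing step, assuming the transactions are given as Python lists, is shown below; it reproduces the 0/1 matrix above:

```python
import pandas as pd

transactions = [
    ["Bread", "Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Butter", "Cheese"],
    ["Bread", "Milk"],
    ["Milk", "Butter", "Bread"],
]

items = sorted({item for t in transactions for item in t})

# One row per transaction, one 0/1 column per item.
binary_matrix = pd.DataFrame(
    [[int(item in t) for item in items] for t in transactions],
    columns=items,
    index=range(1, len(transactions) + 1),
)
print(binary_matrix)
```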

Step 2: Apply Association Rule Mining

 Use algorithms like Apriori or ECLAT to generate frequent itemsets and association rules.
Step 3: Example Rules
 Minimum Support: 60% (itemset appears in 3 or more transactions).
 Minimum Confidence: 70% (likelihood of a consequent given the antecedent).
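Step 2 can be sketched as follows, assuming the third-party mlxtend library is installed and reusing the binary_matrix from the preprocessing sketch above; the manual confidence check mirrors the rules tabulated below:

```python
# Assumes the `binary_matrix` DataFrame built in the preprocessing sketch above
# and that the third-party mlxtend library is installed (pip install mlxtend).
from mlxtend.frequent_patterns import apriori

# Frequent itemsets at the 60% minimum support from Step 3.
frequent_itemsets = apriori(binary_matrix.astype(bool), min_support=0.6, use_colnames=True)
print(frequent_itemsets)

# Derive rule confidences manually: confidence(A -> B) = support(A ∪ B) / support(A).
def support(items):
    return binary_matrix[list(items)].all(axis=1).mean()

for antecedent, consequent in [({"Bread"}, {"Milk"}),
                               ({"Butter"}, {"Bread"}),
                               ({"Bread"}, {"Butter"})]:
    conf = support(antecedent | consequent) / support(antecedent)
    print(antecedent, "->", consequent, f"confidence={conf:.2f}")
```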
Frequent Itemsets:

Itemset Support

{Bread, Milk} 3/5 (60%)

{Milk, Butter} 3/5 (60%)

{Bread, Butter} 3/5 (60%)

Association Rules:

Rule Support Confidence

{Bread} → {Milk} 60% 75%

{Butter} → {Bread} 60% 75%

{Bread} → {Butter} 60% 75%

Insights from Rules


1. Rule {Bread} → {Milk}:
o Customers who buy Bread are 75% likely to buy Milk.
o Suggestion: Bundle Bread and Milk in promotions.

2. Rule {Butter} → {Bread}:
o Customers purchasing Butter are 75% likely to buy Bread.
o Suggestion: Position Bread near Butter in the store.
3. Rule {Bread} → {Butter}:
o Customers buying Bread are 75% likely to buy Butter.

o Suggestion: Offer discounts for buying Bread and Butter together.


Business Applications
1. Recommendation Systems:
o Recommending items based on customer purchase patterns (e.g., online shopping platforms).
2. Store Layout Optimization:
o Placing frequently bought-together items closer to each other in stores.
3. Targeted Marketing:

o Designing combo offers or personalized discounts.


Conclusion
Market Basket Analysis provides actionable insights for improving sales and customer satisfaction. By
understanding buying behavior through association rules, businesses can make data-driven decisions for
better profitability and customer engagement.

ECLAT Algorithm
ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal)

ECLAT is a popular algorithm in data mining for frequent itemset mining, focusing on efficiency and memory
usage. Unlike the Apriori algorithm, which generates candidate itemsets level-by-level, ECLAT works by
depth-first search and uses a vertical data format for computations.
Key Concepts
 Vertical Data Format (TID-lists): Each item is stored with the set of transaction IDs (TIDs) in which it
occurs, instead of scanning transactions horizontally.
 Support from TID-lists: The support of an itemset is the size of the intersection of the TID-lists of its
items.
 Depth-First Search: The itemset lattice is explored depth-first, extending an itemset by intersecting
TID-lists rather than re-scanning the dataset.

Algorithm Steps
1. Input: A dataset D and a minimum support threshold min_sup.
2. Transform Dataset: Convert the transactional dataset into a vertical format with TID-lists for each
item.
3. Mine Frequent Itemsets:
o Start with single items.

o Generate larger itemsets by recursively intersecting TID-lists of smaller itemsets.


o Prune itemsets that do not meet min_sup.
4. Output: All frequent itemsets.
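A compact Python sketch of these steps (illustrative function and variable names, not a standard API) is given below; it is run here on the market basket dataset from the case study above:

```python
from collections import defaultdict

def eclat(transactions, min_sup):
    """Return all frequent itemsets as {frozenset(items): support_count}."""
    # Step 2: vertical format -- map each item to its TID-list (here a set of TIDs).
    tidlists = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidlists[item].add(tid)

    frequent = {}

    def dfs(prefix, prefix_tids, candidates):
        # Step 3: extend the current itemset depth-first by intersecting TID-lists.
        for i, (item, tids) in enumerate(candidates):
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) >= min_sup:               # prune infrequent extensions
                itemset = prefix | {item}
                frequent[frozenset(itemset)] = len(new_tids)
                dfs(itemset, new_tids, candidates[i + 1:])

    dfs(set(), set(), sorted(tidlists.items()))
    return frequent

# Example: the market basket dataset from the case study, min_sup = 3 (60% of 5).
transactions = [
    {"Bread", "Milk", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Butter", "Cheese"},
    {"Bread", "Milk"},
    {"Milk", "Butter", "Bread"},
]
for itemset, count in sorted(eclat(transactions, 3).items(), key=lambda kv: -kv[1]):
    print(set(itemset), count)
```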
Advantages
 Memory Efficiency: Uses the vertical data format, reducing memory overhead for large datasets.

 Speed: Depth-first traversal reduces candidate generation compared to level-wise approaches like
Apriori.

Example
Running ECLAT on the market basket dataset from the case study above with min_sup = 3 (60%) yields:
 Frequent single items: {Bread}, {Milk}, {Butter}
 Frequent pairs: {Bread, Milk}, {Milk, Butter}, {Bread, Butter}
Applications

 Market Basket Analysis: Discovering products frequently purchased together.


 Text Mining: Finding co-occurring words or phrases.
 Bioinformatics: Mining gene expression patterns.
ECLAT is highly efficient for dense datasets and when dealing with many small transactions, making it a
practical choice for many real-world applications.

Cluster Analysis
Cluster Analysis is an unsupervised machine learning technique used to group similar data points into
clusters based on their characteristics. The main goal is to identify inherent structures within the data without
pre-existing labels. Cluster analysis is widely used in various fields such as marketing, biology, image
processing, and social science.
Objectives of Cluster Analysis:

 Discover natural groupings in data.


 Reduce data dimensionality.
 Identify outliers or anomalies.
 Enhance data interpretation and visualization.
Types of Cluster Analysis Methods

There are several methods for performing cluster analysis, each with its own strengths and weaknesses. The
primary categories include partitioning methods, hierarchical methods, and density-based methods.

Partitioning Methods
Partitioning Methods divide the data into distinct non-overlapping groups, where each data point belongs
to exactly one cluster. One of the most commonly used partitioning methods is the K-Means algorithm.
K-Means Clustering:
 Procedure:

1. Choose the number of clusters k.


2. Initialize k centroids randomly.
3. Assign each data point to the nearest centroid.
4. Update the centroids by calculating the mean of the assigned points.
5. Repeat steps 3 and 4 until convergence (when assignments no longer change).
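A minimal NumPy sketch of this procedure (illustrative only; a library implementation such as scikit-learn's KMeans would normally be used, and this sketch does not handle empty clusters):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-Means: returns (labels, centroids) for data matrix X (n_samples x n_features)."""
    rng = np.random.default_rng(seed)
    # Step 2: initialize k centroids by picking k random data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: assign each point to the nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: update each centroid to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop when the centroids (and hence assignments) no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy example: two obvious groups of 2-D points (Step 1: choose k = 2).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids, sep="\n")
```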
 Advantages:
o Simple to understand and implement.
o Efficient for large datasets.
 Disadvantages:
o Requires specifying the number of clusters k in advance.

o Sensitive to outliers.
o May converge to local minima.

Hierarchical Methods
Hierarchical Methods create a hierarchy of clusters, which can be visualized as a dendrogram. These
methods can be divided into two categories: Agglomerative and Divisive.

 Agglomerative Clustering:
o Starts with each data point as its own cluster.
o Iteratively merges the closest pairs of clusters until a single cluster remains or a stopping
criterion is met.
 Divisive Clustering:
o Starts with all data points in one cluster.

o Iteratively splits clusters until each data point is in its own cluster or a stopping criterion is
met.

 Dendrogram: A tree-like diagram that represents the hierarchy of clusters, showing how clusters are
merged or split at various distances.
 Advantages:
o No need to specify the number of clusters in advance.
o Provides a comprehensive view of the data structure.
 Disadvantages:
o Computationally expensive for large datasets.
o Sensitive to noise and outliers.
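Assuming SciPy is available, agglomerative clustering and the dendrogram described above can be sketched as follows (the data and parameter choices are purely illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D data: two compact groups.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Agglomerative clustering: start from single points and merge the closest
# clusters; 'ward' merges the pair that minimizes within-cluster variance.
Z = linkage(X, method="ward")

# Cut the hierarchy at a chosen number of clusters (here 2) after inspecting it.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# Optional: visualize the merge hierarchy as a dendrogram.
# from scipy.cluster.hierarchy import dendrogram
# import matplotlib.pyplot as plt
# dendrogram(Z); plt.show()
```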

Density-Based Methods
Density-Based Methods cluster data points based on the density of data points in a region. One popular
density-based clustering algorithm is DBSCAN (Density-Based Spatial Clustering of Applications with Noise).

DBSCAN:
 Procedure:
1. Define two parameters: epsilon (ε), the maximum distance for points to be considered
neighbors, and minPts, the minimum number of points required to form a dense region.
2. Start with an unvisited point and retrieve its neighbors within ε.
3. If the number of neighbors is greater than or equal to minPts, a new cluster is formed.
4. Expand the cluster by recursively retrieving neighbors and adding them to the cluster.
5. Repeat the process until all points are visited.
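Assuming scikit-learn is available, this procedure can be applied without implementing it by hand; the parameter values below are illustrative and would normally be tuned:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy 2-D data: two dense groups plus one isolated point (noise).
X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.1], [8.0, 8.2],
              [50.0, 50.0]])

# eps (ε): neighborhood radius; min_samples (minPts): points needed for a dense region.
db = DBSCAN(eps=0.5, min_samples=3).fit(X)

# Cluster labels; -1 marks noise points that belong to no cluster.
print(db.labels_)
```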
 Advantages:

o Can identify clusters of arbitrary shapes.


o Effectively handles noise and outliers.
 Disadvantages:
o Choosing appropriate ε and minPts can be challenging.
o Not well-suited for clusters of varying densities.

2. Comparison of Clustering Methods

Method Strengths Weaknesses

K-Means Simple, efficient for large datasets Requires k, sensitive to outliers

Hierarchical No need to pre-specify clusters, comprehensive view Computationally expensive, sensitive to noise

DBSCAN Handles noise well, arbitrary-shaped clusters Parameter selection can be tricky

3. Conclusion
Cluster analysis is a vital technique in data analysis, enabling the grouping of similar data points into clusters.
The choice of clustering method depends on the nature of the data, the specific application, and the desired
outcomes. Understanding the strengths and weaknesses of various clustering techniques is essential for
effective data analysis and interpretation.

Practical Example
To illustrate how these clustering methods work in practice, let's consider a dataset of customer purchase
behavior in a retail environment.
 Dataset: Contains information about customer purchases, such as the total amount spent, the
number of items purchased, and frequency of visits.
 Goal: To segment customers into distinct groups based on their purchasing behavior.
1. K-Means: Group customers into k clusters based on their spending patterns.
2. Hierarchical Clustering: Create a dendrogram to visualize customer segments and understand the
relationship between different clusters.
3. DBSCAN: Identify customers who exhibit unusual purchasing behavior (outliers) while grouping
regular customers based on their purchasing density.
By applying these clustering techniques, the retail store can tailor its marketing strategies, improve
customer engagement, and optimize inventory based on customer segments.
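A rough end-to-end sketch of this scenario, assuming scikit-learn is available and using invented customer numbers purely for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN

# Hypothetical customer features: [total amount spent, items purchased, visits per month].
customers = np.array([
    [120.0, 10, 2], [150.0, 12, 3], [130.0, 11, 2],    # moderate shoppers
    [900.0, 60, 8], [950.0, 65, 9], [880.0, 58, 8],    # heavy shoppers
    [5000.0, 300, 1],                                   # unusual one-off bulk buyer
])

# Features have very different scales, so standardize them first.
X = StandardScaler().fit_transform(customers)

# K-Means: partition customers into k segments (k is chosen by the analyst).
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# DBSCAN: dense groups form clusters; isolated customers should be labelled -1 (noise).
dbscan_labels = DBSCAN(eps=1.0, min_samples=3).fit_predict(X)

print("K-Means segments:", kmeans_labels)
print("DBSCAN segments :", dbscan_labels)
```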
