
Data Mining

Module II
Prepared by:
Prof. Aparna Baboo
Data Mining
• Data Mining is the process of discovering patterns, trends, correlations, or useful
information from large amounts of data using statistical, machine learning, and
database techniques.
• It is a crucial step of Knowledge Discovery in Databases (KDD).
• KDD: the overall process of discovering useful knowledge from data, which includes
several steps such as data preprocessing, mining, and interpreting patterns.
• KDD = Data Preparation + Data Mining + Result Evaluation
• Steps in KDD:
• Data Cleaning, Data Integration, Data Selection, Data Transformation, Data
Mining, Pattern Evaluation, Knowledge Presentation
Data Mining

Step                        Description
1. Data Cleaning            Remove noise, errors, and inconsistent data.
2. Data Integration         Combine data from multiple sources into a coherent dataset.
3. Data Selection           Choose relevant data for the mining task.
4. Data Transformation      Convert data into formats suitable for mining (e.g., normalization).
5. Data Mining              Apply algorithms to extract patterns, trends, or relationships.
6. Pattern Evaluation       Identify interesting, valid, and useful patterns from results.
7. Knowledge Presentation   Visualize or present the knowledge in understandable formats.
Data Mining
• Example of KDD steps:
• Data Cleaning removes records with missing product names.
• Integration combines data from online and offline stores.
• Selection picks data from the last 6 months.
• Mining applies association rules to find items frequently bought
together.
• Evaluation filters out unimportant results.
• Presentation shows frequent item sets in a graph/chart.
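A minimal pandas sketch of the first few KDD steps for this retail example (all column names and values are illustrative, not from a real dataset):

```python
import pandas as pd

# Hypothetical transactions from two channels (illustrative data only)
online = pd.DataFrame({"product": ["Bread", None, "Milk"],
                       "date": pd.to_datetime(["2024-01-05", "2024-03-10", "2024-06-01"])})
offline = pd.DataFrame({"product": ["Butter", "Bread"],
                        "date": pd.to_datetime(["2024-05-20", "2024-06-15"])})

# Data Cleaning: drop records with missing product names
online = online.dropna(subset=["product"])

# Data Integration: combine the online and offline stores
sales = pd.concat([online, offline], ignore_index=True)

# Data Selection: keep only the last 6 months
cutoff = sales["date"].max() - pd.DateOffset(months=6)
recent = sales[sales["date"] >= cutoff]
print(recent)  # now ready for the mining step (e.g., association rules)
```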
Tasks and Functionalities of Data Mining
• Data mining functionalities are used to specify the kinds of trends or correlations to
be found. These activities can be divided into two categories:
1) Descriptive Data Mining: This type of data mining focuses on uncovering
patterns and relationships within the data that help reveal its underlying structure.
Descriptive data mining is commonly used to explore and summarize data,
helping to answer questions like: What are the most frequent patterns or
associations? Are there clusters or groups with similar characteristics? Which data
points are outliers, and what do they indicate?
2) Predictive Data Mining: This type of data mining focuses on creating models
that can forecast future outcomes or behaviors using past data. Predictive data
mining is commonly applied in classification and regression tasks, helping to
answer questions like: What is the chance of a customer churning? How much
revenue is expected from a new product launch? What is the probability of a loan
default?
Descriptive Data Mining
• Cluster Analysis: Groups data objects into clusters so that items in the
same group are similar and items in different groups are dissimilar.
• Example: Customer segmentation (grouping customers into budget
buyers, luxury shoppers).
• Algorithms: K-Means, K-Medoids, Hierarchical Clustering
(Agglomerative, Divisive), DBSCAN (Density-Based Spatial Clustering),
OPTICS (Ordering Points To Identify the Clustering Structure)

• Association Rule Mining: Finds relations between items in large datasets.
• Example: In supermarket basket analysis, if a customer buys bread and
butter, they are likely to buy milk.
• Key points:
• Support: how often the rule appears in the dataset.
• Confidence: how often the rule is correct.
• Lift: how much more often the items occur together than expected.
• Algorithms: Apriori, FP-Growth, Eclat
Descriptive Data Mining
• Summarization: Provides a compact description of the dataset.
• Example: Generating statistical summaries like mean, median, and
variance, or creating a text summary from a large document.
• Algorithms: OLAP, Data Cube Aggregation, Principal Component
Analysis, t-SNE, Latent Semantic Analysis

• Sequence Pattern Mining: Discovers patterns where events occur in a
specific order.
• Example: In web clickstreams, users who visit page A, then B, often
visit C next.
• Algorithms: GSP, SPADE, PrefixSpan, BIDE
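A quick illustration of statistical summarization with pandas (toy numbers, chosen only to show the calls):

```python
import pandas as pd

sales = pd.Series([120, 95, 130, 110, 500, 105])  # toy daily sales figures

# Compact statistical description of the dataset
print(sales.mean())      # mean
print(sales.median())    # median
print(sales.var())       # variance
print(sales.describe())  # count, mean, std, min, quartiles, max in one call
```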
Predictive Data Mining
• Classification Algorithms: Predict a categorical label.
• Example: Whether an email is spam or not spam.
• Algorithms: Decision Tree (ID3, C4.5, CART), Naïve Bayes, K-NN
(K-Nearest Neighbours), SVM (Support Vector Machines), Random
Forest, Logistic Regression, Neural Networks (ANN, CNN, RNN)

• Regression Algorithms: Predict a continuous numeric value.
• Example: Predicting house prices.
• Algorithms: Linear Regression, Polynomial Regression, Decision Tree
Regression, Random Forest Regression, Support Vector Regression,
Neural Network Regression
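A minimal sketch contrasting the two tasks, assuming scikit-learn is available (the features and labels are invented for illustration):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: predict a categorical label (spam = 1, not spam = 0)
X_cls = [[0, 1], [1, 3], [0, 0], [1, 5]]  # toy features, e.g. [has_link, num_exclamations]
y_cls = [0, 1, 0, 1]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[1, 4]]))              # -> a class label, e.g. [1]

# Regression: predict a continuous numeric value (house price)
X_reg = [[800], [1000], [1200], [1500]]   # toy feature: area in sq. ft.
y_reg = [80.0, 100.0, 121.0, 150.0]       # price (illustrative units)
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[1100]]))              # -> a continuous estimate, roughly 110
```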
Functionalities
1. Characterization: Summarizes the general features of a set of data.
• Example: Profiling the spending behavior of premium customers.
2. Discrimination: Compares the features of one group with another.
• Examples: Comparing the buying habits of male vs. female
customers.
3. Association Analysis: Finds relationships between variables.
• Example: If a customer buys bread, they are likely to buy butter.
4. Classification: Assigns items into predefined categories.
• Example: Classifying emails as spam or not spam.
Functionalities
5. Prediction: Forecasts future values based on current/past data.
• Example: Predicting next month’s sales.
6. Clustering: Groups similar data objects without predefined
categories.
• Example: Grouping customers into market segments.
7. Outlier Detection: Identifies data points that deviate significantly
from the rest.
• Example: Detecting fraudulent credit card transactions.
8. Evolution & Trend Analysis: Analyzes changes and patterns over
time.
• Example: Tracking stock price trends over months/years.
Data Preprocessing
Data preprocessing is the process of preparing raw data for analysis by
cleaning and transforming it into a usable format.
• Goal is to improve the quality of the data.
• Helps in handling missing values, removing duplicates, and normalizing
data.
• Ensures the accuracy and consistency of the dataset.
Steps in Preprocessing:
Data cleaning, data integration, data transformation, data reduction
Data Preprocessing
Data cleaning: It is the process of identifying and correcting errors or inconsistencies
in the dataset. It involves handling missing values, removing duplicates, and correcting
incorrect or outlier data to ensure the dataset is accurate and reliable.
Missing Values: These occur when data is absent from a dataset. You can either ignore
the rows with missing data or fill the gaps manually, with the attribute mean, or with
the most probable value.
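A small pandas sketch of these options (column names and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"age": [25, None, 40, 35],
                   "city": ["Pune", "Delhi", None, "Pune"]})

dropped = df.dropna()                                # option 1: ignore rows with missing data
df["age"] = df["age"].fillna(df["age"].mean())       # option 2: fill with the attribute mean
df["city"] = df["city"].fillna(df["city"].mode()[0]) # option 3: fill with the most probable value
print(df)
```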
Noisy Data refers to inaccurate, irrelevant, or inconsistent data that machines find
difficult to interpret, often resulting from errors during data collection or entry. It can
be managed using several techniques:
• Binning: The sorted data is divided into intervals (bins), and values within each
interval are smoothed by replacing them with the bin mean or boundary values (see
the sketch after this list).
• Regression: Data is smoothed by fitting it to a regression model (linear or multiple)
that predicts values more accurately.
• Clustering: Similar data points are grouped together; values that fall outside the
clusters can be treated as outliers (noise).
• These methods help minimize noise and enhance the overall quality of data.
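A minimal sketch of smoothing by bin means, assuming equal-depth bins over sorted values (the numbers are a toy example in the classic textbook style):

```python
import numpy as np

data = np.sort(np.array([4, 8, 15, 21, 21, 24, 25, 28, 34]))  # sorted values
bins = np.split(data, 3)   # 3 equal-depth bins of 3 values each

# Smoothing by bin means: replace every value in a bin with that bin's mean
smoothed = np.concatenate([np.full(len(b), b.mean()) for b in bins])
print(smoothed)  # [ 9.  9.  9. 22. 22. 22. 29. 29. 29.]
```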
Data Preprocessing
Removing Duplicates: It involves identifying and eliminating repeated
data entries to ensure accuracy and consistency in the dataset.
Data Integration: It involves merging data from various sources into a
single, unified dataset. It can be challenging due to differences in data
formats, structures, and meanings.
• Record Linkage is the process of identifying and matching records from
different datasets that refer to the same entity, even if they are
represented differently.
• Data Fusion involves combining data from multiple sources to create a
more comprehensive and accurate dataset. It integrates information that
may be inconsistent or incomplete across different sources.
Data Preprocessing
Data Transformation: It involves converting data into a format suitable
for analysis. Common techniques include normalization, which scales data
to a common range; standardization, which adjusts data to have zero mean
and unit variance; and discretization, which converts continuous data into
discrete categories.
• Data Normalization: The process of scaling data to a common range to
ensure consistency across variables.
• Discretization: Converting continuous data into discrete categories for
easier analysis.
• Data Aggregation: Combining multiple data points into a summary
form, such as averages or totals, to simplify analysis.
• Concept Hierarchy Generation: Organizing data into a hierarchy of
concepts to provide a higher-level view for better understanding and
analysis.
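A minimal sketch of the first three transformations described above on a toy numeric column (assuming NumPy and pandas):

```python
import numpy as np
import pandas as pd

x = np.array([10.0, 20.0, 30.0, 60.0, 80.0])

normalized = (x - x.min()) / (x.max() - x.min())   # min-max scaling to [0, 1]
standardized = (x - x.mean()) / x.std()            # zero mean, unit variance
categories = pd.cut(x, bins=3, labels=["low", "medium", "high"])  # discretization

print(normalized, standardized, list(categories), sep="\n")
```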
Data Preprocessing
Data Reduction: It reduces the dataset's size while maintaining key
information. This can be done through feature selection, which chooses
the most relevant features, and feature extraction, which transforms the
data into a lower-dimensional space while preserving important details.
• Dimensionality Reduction (e.g., Principal Component Analysis): A
technique that reduces the number of variables in a dataset while
retaining its essential information.
• Numerosity Reduction: Reducing the number of data points by
methods like sampling to simplify the dataset without losing critical
patterns.
• Data Compression: Reducing the size of data by encoding it in a more
compact form, making it easier to store and process.
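A minimal dimensionality-reduction sketch, assuming scikit-learn (the data is random, purely to show the shape change):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 10)                       # 100 records, 10 features
X_reduced = PCA(n_components=2).fit_transform(X)  # project onto 2 principal components
print(X_reduced.shape)                            # (100, 2)
```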
Data Discretization
Discretization is the process of converting continuous data or numerical
values into discrete categories or bins. This technique is often used in
data analysis and machine learning to simplify complex data and make it
easier to analyze and work with.
Types of Discretization:
1. Equal Width Binning- This technique divides the entire range of data
into equal-sized intervals. Each bin has an equal width, determined by
dividing the range of the data into n intervals.
Formula:
• Bin Width = (Max Value − Min Value) / n
For example, if you have data from 1 to 100, you can divide it into 5
intervals: 1-20, 21-40, 41-60, 61-80, and 81-100.
Data Discretization
2. Equal Frequency Binning- This method divides the data so that each
interval has the same number of data points. For example, if you have 100
data points, you might divide them into 5 intervals, each containing 20
data points.
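A minimal sketch of both techniques with pandas (toy values):

```python
import pandas as pd

values = pd.Series([1, 7, 12, 25, 40, 55, 63, 78, 90, 100])

equal_width = pd.cut(values, bins=5)  # 5 intervals of equal width over [1, 100]
equal_freq = pd.qcut(values, q=5)     # 5 intervals with 2 data points each

print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())
```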
3. K-Means Clustering- This technique uses clustering algorithms to
group data into clusters based on similarity. The data points in each cluster
are treated as a single category.
Data Discretization
4. Decision Tree discretization- This method uses decision trees to split
the data based on feature values, turning continuous variables into discrete
categories that help in prediction.
Data Discretization
5. Custom Binning- In this method, you define your own bin edges based
on domain knowledge or specific needs. For example, in age data, you
might want to manually set ranges like "0-18," "19-40," and "41+".
Advantages of Discretization:
1. Simplifies analysis
2. Improves model performance
3. Reduces noise
4. Enhances data compatibility
** Discretization and binning are related but different concepts.
Discretization refers to converting continuous data into discrete
categories for analysis whereas binning is a specific technique used
within discretization to group data into intervals (bins).
Concept Hierarchy in Data Mining
• In data mining, the concept of a concept hierarchy refers to the
organization of data into a tree-like structure, where each level
of the hierarchy represents a concept that is more general than
the level below it.

• The main idea behind the concept of hierarchy is that the same
data can have different levels of granularity or levels of detail
and that by organizing the data in a hierarchical fashion, it is
easier to understand and perform analysis.
Concept Hierarchy in Data Mining
Types of concept Hierarchy:-
1. Schema Hierarchy: Schema Hierarchy is a type of concept
hierarchy that is used to organize the schema of a database in a
logical and meaningful way, grouping similar objects together.
2. Set-Grouping Hierarchy: Set-Grouping Hierarchy is a type of
concept hierarchy that is based on set theory, where each set in the
hierarchy is defined in terms of its membership in other sets.
3. Operation-Derived Hierarchy: An Operation-Derived Hierarchy is
a type of concept hierarchy that is used to organize data by applying
a series of operations or transformations to the data. The operations
are applied in a top-down fashion, with each level of the hierarchy
representing a more general or abstract view of the data than the
level below it.
Concept Hierarchy in Data Mining
4. Rule-based Hierarchy: Rule-based Hierarchy is a type of concept
hierarchy that is used to organize data by applying a set of rules or
conditions to the data.
Use of Hierarchy-
Schema Hierarchy- Used to organize different types of data such as
table, attributes and relationships in a logical & meaningful manner.
Set Grouping- Used for data cleaning, data-preprocessing &
integration.
Operation- Derived- Used in data mining tasks such as clustering,
dimensionality reduction.
Rule-based- Used in classification, decision-making, data exploration.
Concept Hierarchy in Data Mining
Advantages:
Improved Data analysis
Improved data visualization
Improved algorithm Performance
Data cleaning and preprocessing
Application of Concept Hierarchy-
Data Warehousing
Business Intelligence
Online retail
Healthcare
NLP
Fraud detections
Architecture of Data Mining System
Contd.
As we know, data mining means "detection and extraction of new
patterns from already collected data."
Working of the System:
• The process begins when a user submits specific data mining requests,
which are forwarded to data mining engines for pattern evaluation.
These applications attempt to address the query by utilizing the
existing database.
• The extracted metadata is then analyzed by the data mining engine,
which may also interact with pattern evaluation modules to derive
accurate results. Finally, the results are presented to the user at the
front end in a clear and understandable format through an appropriate
interface.
Contd.
Description about the components:
Data Sources- Database, WWW, Data warehouse are parts of data
sources. The data in these sources may be in the form of plain text,
spreadsheets, or other forms of media like photos or videos. WWW is
one of the biggest sources of data.
Database Server-The database server contains the actual data ready to
be processed. It performs the task of handling data retrieval as per the
request of the user.
Data Mining Engine- It is one of the core components of the data
mining architecture that performs all kinds of data mining techniques
like association, classification, characterization, clustering, prediction,
etc.
Description about the components: (contd.)
Pattern Evaluation Modules: They are responsible for finding
interesting patterns in the data and sometimes they also interact with the
database servers for producing the result of the user requests.
Graphical User Interface: Since the user cannot fully understand the
complexity of the data mining process, the graphical user interface helps
the user communicate effectively with the data mining system.
Knowledge Base: Knowledge Base is an important part of the data
mining engine that is quite beneficial in guiding the search for the result
patterns. Data mining engines may also sometimes get inputs from the
knowledge base. This knowledge base may contain data from user
experiences. The objective of the knowledge base is to make the result
more accurate and reliable.
Real-time example for better understanding of the system-
1. User interface:
•A customer logs into the website and searches for “sports shoes.”
•This request acts as the query that initiates the data mining process.

2. Database/Data warehouse:
• The system retrieves stored data such as:
• Purchase history of customers
• Product details (brand, price, reviews)
• Customer demographics (age, location, preferences)
3. Data Mining Engine:
•The data mining engine processes the request and applies
algorithms (e.g., association rule mining, clustering,
classification).
•Example: It analyzes that “70% of customers who bought sports
shoes also bought fitness watches.”

4. Pattern Evaluation:
•It evaluates the discovered patterns for usefulness.
•Example: Out of many associations, the system identifies that the
link between sports shoes and fitness watches is the most relevant
and strong.
5. Knowledge Base (Metadata):
•The metadata stores rules, constraints, and past knowledge about
customer behavior.
•Example: The rule “If a customer buys sports shoes, recommend
fitness watches” is stored in the knowledge base.

6. Result Presentation (Interface Layer):
•The recommendation "Customers who bought this item also viewed
fitness watches" is displayed to the user in an understandable format.
Association Rule Mining
• Finds relationships among items in datasets
• Example: {Bread, Butter} ⇒ {Milk}
• Applications: Market Basket, Healthcare, IoT, Web Mining

Key Metrics:
• Support(X): percentage of transactions containing X
• Confidence(X⇒Y): probability of Y given that X occurs
• Lift(X⇒Y): strength of the rule compared with random chance
Frequent Itemset Mining
• Goal: Find itemsets with high support
• Challenge: Search space grows exponentially (2ⁿ)
•Number of possible itemsets = 2ⁿ (exponential growth).
•For 1,000 items → more than 2^1000 possible
combinations.
•Makes mining computationally expensive.
• So efficient methods are needed.
Efficient Methods
• Efficient methods are algorithms and techniques designed
to quickly find frequent patterns/itemsets in large datasets
without wasting time and resources.
• They are smart algorithms that reduce computation, avoid
unnecessary checks, and make frequent itemset mining
scalable for big data.
• Example of Efficient Methods:
•Apriori – Uses property: If itemset is frequent → all subsets
are frequent.
•FP-Growth – Uses FP-Tree to compress data, avoids
candidate explosion.
•ECLAT – Uses transaction ID lists, mines via intersections.
Apriori Algorithm
• Property: If set is frequent → subsets must be frequent.
• Iterative: Generate → Test → Prune.
• Pros: Simple.
• Cons: Many DB scans, candidate explosion.
• Visual Flow:
Transactions → Candidate Generation → Pruning →
Frequent Itemset
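A compact plain-Python sketch of this generate → test → prune loop (a teaching illustration, not an optimized miner; the toy transactions match the Quick Example a few slides ahead):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return every frequent itemset (frozenset) with its support."""
    n = len(transactions)
    def sup(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Frequent 1-itemsets
    level = {frozenset([i]) for t in transactions for i in t}
    level = {c for c in level if sup(c) >= min_support}
    frequent, k = {}, 1
    while level:
        frequent.update({c: sup(c) for c in level})
        # Generate: join frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # Test: keep candidates that meet the support threshold
        level = {c for c in candidates if sup(c) >= min_support}
        k += 1
    return frequent

T = [frozenset(t) for t in [{"Bread", "Milk"}, {"Bread", "Diaper", "Beer"},
                            {"Bread", "Milk", "Diaper"}, {"Milk", "Diaper"},
                            {"Bread", "Milk", "Diaper", "Beer"}]]
for itemset, s in apriori(T, min_support=0.6).items():
    print(set(itemset), round(s, 2))  # singles at 0.8, pairs at 0.6
```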
FP-Growth Algorithm
• Builds FP-Tree (compressed DB).
• No candidate generation.
• Only 2 scans of DB.
• Much faster than Apriori.
• Visual: Tree diagram showing items branching (Bread →
Milk → Diaper).
ECLAT Algorithm
• Uses vertical data (TID sets).
• Finds frequent itemset via set intersections.
• Works well for dense datasets.
• Visual: Table with Item → {T1,T3,T4} sets
intersecting.
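A tiny sketch of the vertical (TID-set) representation, using the toy transactions from the Quick Example on the next slide:

```python
# ECLAT-style idea: map each item to the set of transaction IDs (TIDs)
# containing it; itemset support comes from TID-set intersections.
tidsets = {
    "Bread":  {1, 2, 3, 5},
    "Milk":   {1, 3, 4, 5},
    "Diaper": {2, 3, 4, 5},
}
n = 5  # total transactions

both = tidsets["Bread"] & tidsets["Milk"]  # intersection -> {1, 3, 5}
print(both, len(both) / n)                 # support({Bread, Milk}) = 0.6
```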
Quick Example
•Transactions:
T1 = {Bread, Milk}, T2 = {Bread, Diaper, Beer},
T3 = {Bread, Milk, Diaper}, T4 = {Milk, Diaper},
T5 = {Bread, Milk, Diaper, Beer}
• Support({Bread, Milk}) = 3/5 = 0.6 = 60%
• Confidence({Bread, Milk} ⇒ Diaper) = 2/3 ≈ 0.67 = 67%
Meaning: If someone buys Bread & Milk, there's a 67%
chance they also buy Diaper.
Quick Example
Lift(X ⇒ Y) = Confidence(X ⇒ Y) / Support(Y)
• Example: Lift({Bread, Milk} ⇒ Diaper)
• Confidence({Bread, Milk} ⇒ Diaper) = 0.67
• Support({Diaper}) → Appears in T2, T3, T4, T5 → 4/5 = 0.8
• Lift = 0.67 / 0.8 ≈ 0.84 (0.83 if confidence is kept as the exact fraction 2/3)
• Meaning:
• If Lift = 1 → Independent (no strong association).
• If Lift > 1 → Positive association (buying X increases chance of Y).
• If Lift < 1 → Negative association (buying X reduces chance of Y).
• Here 0.84 < 1, so Bread & Milk together actually reduce the chance of
buying Diaper slightly.
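These three measures are easy to verify in code; a minimal sketch over the same five transactions:

```python
def support(itemset, transactions):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

T = [frozenset(t) for t in [{"Bread", "Milk"}, {"Bread", "Diaper", "Beer"},
                            {"Bread", "Milk", "Diaper"}, {"Milk", "Diaper"},
                            {"Bread", "Milk", "Diaper", "Beer"}]]
A, C = frozenset({"Bread", "Milk"}), frozenset({"Diaper"})
print(support(A, T))        # 0.6
print(confidence(A, C, T))  # 0.666...
print(lift(A, C, T))        # 0.833... (the slide's 0.84 rounds confidence to 0.67 first)
```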
Question
• T1 = {Milk, Bread, Butter}
• T2 = {Milk, Diaper, Beer}
• T3 = {Bread, Butter, Diaper}
• T4 = {Milk, Bread, Diaper, Butter}
• T5 = {Bread, Butter}
• T6 = {Milk, Bread, Beer}
• For the rule:
{Milk, Bread} ⇒ Butter
• Find
• Support({Milk, Bread})
• Confidence({Milk, Bread} ⇒ Butter)
• Lift({Milk, Bread} ⇒ Butter)
• Find the most frequent items bought together.
Mining Various Kinds of Rules
It refers to extending the basic concept of association rule mining (like
market basket analysis) into different types of rules beyond the simple
frequent itemset → association type.
Association rule mining usually gives rules like:
{Milk} → {Bread} (if you buy milk, you also buy bread).
But we can extend it into different kinds:
• 1. Multilevel Association Rules
• Rules at different concept levels (from general to specific).
• Example:
• High-level: {Milk} → {Bread}
• Low-level: {Amul Milk} → {Whole Wheat Bread}
Mining Various Kinds of Rules
• 2. Multidimensional Association Rules
• Use more than one dimension/attribute.
• Example:
• {Age = 20–30, Income = High} → {Buys Laptop}
• 3. Quantitative Association Rules
• Handle numeric values by ranges.
• Example:
• {Age ∈ [18–25]} → {Buys Sports Shoes}
Mining Various Kinds of Rules
• 4. Correlation-Based Association Rules
• Check if items are really correlated (not just frequent).
• Example:
• {Diaper} and {Beer} appear together often → correlation analysis
confirms whether they are truly linked.
• 5. Constraint-Based Association Rules
• Mine rules under user-specified constraints.
• Example:
• Only find rules where consequent is “Buys Laptop.”
• Result: {Student, High Income} → {Buys Laptop}
Association Mining to Correlation Analysis
Correlation Analysis:
• Goes beyond support and confidence.
• Uses statistical measures like lift, chi-square, or correlation
coefficient.
• Helps test if items are positively, negatively, or not correlated.
• Key Measure:
Lift(X→Y) = Support(X∪Y) / (Support(X) × Support(Y))

•If Lift > 1 → Positive correlation (X and Y occur together more than
expected).
•If Lift = 1 → No correlation (independent).
•If Lift < 1 → Negative correlation (X and Y rarely occur together).
Association Mining to Correlation Analysis

Transaction ID   Items Purchased
T1               {Milk, Bread, Butter}
T2               {Milk, Bread}
T3               {Bread, Butter}
T4               {Milk, Bread, Butter, Eggs}
T5               {Milk, Eggs}
Association Mining to Correlation Analysis (Contd.)
• Step 1: Generate Itemsets and Support
• Support = (Number of transactions containing the itemset) ÷ (Total
number of transactions)
• Total transactions = 5
• Single items support:
• Milk → 4/5 = 0.8
• Bread → 4/5 = 0.8
• Butter → 3/5 = 0.6
• Eggs → 2/5 = 0.4
• Pairs support:
• {Milk, Bread} → 3/5 = 0.6
• {Milk, Butter} → 2/5 = 0.4
• {Bread, Butter} → 3/5 = 0.6
• {Milk, Eggs} → 2/5 = 0.4
• {Bread, Eggs} → 1/5 = 0.2
• {Butter, Eggs} → 1/5 = 0.2
• Step 2: Generate Rules and Confidence
• Confidence(A ⇒ B) = Support(A ∪ B) ÷ Support(A)
• Confidence(Milk ⇒ Bread) = 0.6 ÷ 0.8 = 0.75
• Confidence(Bread ⇒ Butter) = 0.6 ÷ 0.8 = 0.75
Association Mining to Correlation Analysis (Contd.)
• Step 3: Correlation Analysis (Lift)
• Lift(A ⇒ B) = Confidence(A ⇒ B) ÷ Support(B)
• For Milk ⇒ Bread:
• Lift = 0.75 ÷ 0.8 = 0.9375 (< 1 means negative correlation → Milk
slightly reduces the chance of Bread).
• For Bread ⇒ Butter:
• Lift = 0.75 ÷ 0.6 = 1.25 (> 1 means positive correlation → Bread
increases the chance of Butter).
•Buying Milk does not strongly imply buying Bread (negative correlation).
•Buying Bread strongly correlates with Butter (positive correlation).
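A quick check of these lift values in code, using the support-ratio form of the same formula:

```python
T = [frozenset(t) for t in [{"Milk", "Bread", "Butter"}, {"Milk", "Bread"},
                            {"Bread", "Butter"}, {"Milk", "Bread", "Butter", "Eggs"},
                            {"Milk", "Eggs"}]]

def support(itemset):
    return sum(itemset <= t for t in T) / len(T)

def lift(a, b):
    # Equivalent to Confidence(a => b) / Support(b)
    return support(a | b) / (support(a) * support(b))

print(lift(frozenset({"Milk"}), frozenset({"Bread"})))    # 0.9375 -> negative correlation
print(lift(frozenset({"Bread"}), frozenset({"Butter"})))  # 1.25   -> positive correlation
```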
Association Mining Vs Correlation Analysis
• ARM is about finding patterns in the form of rules: If A → B.
•It mainly uses Support (how often the items occur together) and
Confidence (given A, how often B occurs).
Example: Rule: Milk ⇒ Bread
•Support = 0.6
•Confidence = 0.75
•This means: In 60% of all transactions, Milk and Bread occur
together. If Milk is bought, there is a 75% chance that Bread is also
bought.
Association Mining Vs Correlation Analysis
• Correlation Analysis
• ARM may produce many rules, but not all are truly interesting.
• Correlation (measured with Lift) checks whether items are actually dependent or
just appear together by chance.
• Lift(A → B) = Confidence(A → B) / Support(B)
•Lift > 1 → Positive correlation (A increases chance of B).
•Lift = 1 → Independent (A does not affect B).
•Lift < 1 → Negative correlation (A reduces chance of B).
•Example from above:
•Milk ⇒ Bread → Lift = 0.9375 (<1)
•Even though Confidence was 75%, correlation says Milk actually reduces
the chance of Bread.
•Bread ⇒ Butter → Lift = 1.25 (>1)
•Bread positively correlates with Butter (stronger relationship).
Constraint-Based Association Mining
• Constraint-Based Association Mining is an extension of association
rule mining where rules are generated only if they satisfy user-
specified constraints.
This helps in:
• Reducing the search space (fewer unnecessary rules).
• Making rules more relevant to user needs.
• Types of Constraints:
•Knowledge-based constraints
•Restrict rules based on user interest.
•Example: Only rules involving “Milk” should be mined.
•Data constraints
•Restrict items or attributes from being included.
•Example: Ignore “Butter” in rules.
Constraint-Based Association Mining
•Rule constraints
•Apply conditions on Support, Confidence, or other interestingness
measures.
•Example: Only rules with Support > 40% and Confidence > 70%.
•Boolean constraints
•Users specify presence/absence of items in rules.
•Example: Rules must contain “Milk” AND “Bread”, but NOT “Eggs.”
•Aggregate constraints
•Apply mathematical functions like SUM, AVG, MIN, MAX.
•Example: The total price of items in the rule must be > ₹200.
Constraint-Based Association Mining

Transaction ID   Items Purchased
T1               {Milk, Bread, Butter}
T2               {Milk, Bread}
T3               {Bread, Butter}
T4               {Milk, Bread, Butter, Eggs}
T5               {Milk, Eggs}

With Constraints:
1. Constraint: Only rules containing Milk.
   Output: {Milk ⇒ Bread}, {Milk ⇒ Eggs}, {Milk ⇒ Butter}.
2. Constraint: Support ≥ 0.5, Confidence ≥ 0.7.
   Output: {Milk ⇒ Bread} (Support = 0.6, Confidence = 0.75).
3. Constraint: Exclude "Eggs."
   Output: {Milk ⇒ Bread}, {Bread ⇒ Butter}, {Milk ⇒ Butter}.
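A small sketch of how such constraints act as filters over candidate rules (the rule list and helper are illustrative; supports and confidences come from the worked example above):

```python
# Each rule is (antecedent, consequent, support, confidence)
rules = [
    ({"Milk"}, {"Bread"}, 0.6, 0.75),
    ({"Bread"}, {"Butter"}, 0.6, 0.75),
    ({"Milk"}, {"Eggs"}, 0.4, 0.50),
    ({"Milk"}, {"Butter"}, 0.4, 0.50),
]

def satisfies(rule, must_contain=frozenset(), exclude=frozenset(),
              min_sup=0.0, min_conf=0.0):
    antecedent, consequent, sup, conf = rule
    items = antecedent | consequent
    return (must_contain <= items and not (exclude & items)
            and sup >= min_sup and conf >= min_conf)

print([r for r in rules if satisfies(r, must_contain={"Milk"})])      # constraint 1
print([r for r in rules if satisfies(r, min_sup=0.5, min_conf=0.7)])  # constraint 2
print([r for r in rules if satisfies(r, exclude={"Eggs"})])           # constraint 3
```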
Constraint-Based Association Mining
• Advantages:
• Reduces computation cost
• Produces fewer, more relevant rules
• Matches the user's domain knowledge
• In one line: Constraint-Based Association Mining = ARM + User Constraints →
focused, meaningful, and efficient rules.
