Data Mining Techniques (DMT)
By Kushal Anjaria
Session-2
• Next, we will focus on data transformation and pattern mining. So, the first pattern we will consider is something known as association rules.
• This association pattern was one of the earliest uses of data mining, in retail shops. Say, for example, you go to a supermarket or a mall and buy some items. For each such purchase, the store records the bill for the basket of items the person has bought. For each transaction or purchase by a customer, there is a row describing that customer's basket of items. You can see a table where these rows describe the different transactions. So, TID 1 is transaction Id 1, for the first customer's transaction. The next row is the subsequent customer's transaction, and so on. Along with the transaction Id, the items purchased by that customer are noted. So, you can see that in this table, customer one has bought bread and milk, customer two has bought bread, diaper, beer and eggs, customer three has bought milk, diaper, beer and coke, and so on.
• These types of transactions are called market basket transactions. Such a transaction consists of two parts: the first is the Id of the transaction of a particular customer, and the second is the list of items purchased by that customer. Suppose every day thousands of people come to the supermarket and make this kind of transaction. If you look over, say, 1 or 2 years, there will be an enormous amount of data. IBM was the first company to analyze these data types and come up with the association rule generation and mining technique.
Let's observe the following table and find out what IBM discovered from the data.

TID  ITEMS
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

The table shows that people who buy bread and milk are most likely to buy diapers, and people who buy diapers are most likely to buy beer. Now, this kind of pattern has commercial significance. For example, if you buy diapers, I could give you a discount so that you can buy beer at a discounted rate. I can also arrange the placement of the items in the store accordingly. Now the question is: from the vast amount of data, how do we calculate association rules?

For association rules, the following terminologies are useful:
1. Itemset: a collection of one or more items, e.g. {Bread, Milk, Diaper, Coke}. A k-itemset is an itemset that contains k items.
2. Support count (σ): the frequency of occurrence of an itemset. E.g., σ({Bread, Milk, Diaper}) = 2.
3. Support (s): the fraction of transactions that contain an itemset. E.g., s({Bread, Milk, Diaper}) = 2/5.
4. Frequent itemset: an itemset whose support is greater than or equal to some minimum support threshold.
5. Association rule: represented using the form X → Y, where X and Y are itemsets. Example: {Milk, Diaper} → {Beer}. The support (s) of a rule X → Y is the fraction of transactions that contain both X and Y.
6. Confidence (c): how often the items in Y appear in transactions that contain X.
7. Example: X → Y = {Milk, Diaper} → {Beer}
   s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
   c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67

In simple words, support suggests whether an itemset is popular or not, and confidence suggests whether the items are purchased together or not. An association rule requires both: the itemset should be popular, and the rule should be confident.
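To make these definitions concrete, the following minimal Python sketch computes the support and confidence values from the worked example over the five transactions in the table (the function names are our own, chosen for illustration):

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset):
    # sigma(itemset): number of transactions containing the itemset
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    # s(itemset) = sigma(itemset) / |T|
    return support_count(itemset) / len(transactions)

def confidence(X, Y):
    # c(X -> Y) = sigma(X u Y) / sigma(X)
    return support_count(X | Y) / support_count(X)

print(support({"Milk", "Diaper", "Beer"}))       # 0.4
print(confidence({"Milk", "Diaper"}, {"Beer"}))  # 0.666...
```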
From the above concepts and understanding, the following questions may come to the reader's mind:
❑ How to find association rules?
❑ How to scan millions of transactions and check which itemsets satisfy the support and confidence criteria?
❑ Which mathematical concept is applicable in the case of association rule mining?
❑ How to visualize and represent huge data and the associations among data points?

General steps to generate association rules:
❑ Suppose you have the form X → Y.
❑ First of all, you consider all items and all possible values of X and Y.
❑ Based on these X and Y, try to make rules using support and confidence.
❑ The initial forms of such rules are known as candidate rules.
❑ Next, you decide the thresholds for the support and confidence values.
❑ If, for some pair of X and Y in the candidate rules, the support and confidence values are above the thresholds, then they are rules. Example: {Milk, Diaper} → {Beer}. A sketch of these steps is given below.

The above approach definitely gives us the required result. However, it is computationally prohibitive: suppose there are 100 items; then 2^100 candidate itemsets would appear. Thus, this brute-force approach will not work.
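Before turning to Apriori, here is a hedged sketch of the brute-force general steps above, reusing `transactions`, `support` and `confidence` from the previous sketch (the function name is our own). It enumerates every candidate rule X → Y and keeps those that clear both thresholds, which is exactly what becomes infeasible at scale:

```python
from itertools import combinations

def brute_force_rules(items, minsup, minconf):
    rules = []
    # Enumerate every itemset of size >= 2 ...
    for k in range(2, len(items) + 1):
        for itemset in map(set, combinations(items, k)):
            if support(itemset) < minsup:
                continue
            # ... and every binary partition of it into X -> Y
            for j in range(1, k):
                for X in map(set, combinations(itemset, j)):
                    Y = itemset - X
                    if confidence(X, Y) >= minconf:
                        rules.append((X, Y))
    return rules

items = sorted({i for t in transactions for i in t})
print(brute_force_rules(items, minsup=0.4, minconf=0.6))

# Why this cannot scale: with d = 100 items there are 2**100
# candidate itemsets alone, before partitioning each into X and Y.
print(2 ** 100)  # 1267650600228229401496703205376
```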
❑ To form the association rules, we use the Apriori algorithm. Just like the general steps, the Apriori technique is also based on two pillar elements:
1. Frequent itemset generation: generate all itemsets whose support ≥ minsup.
2. Rule generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset.
❑ We understand the Apriori technique using a lattice diagram.
Figure-1 Lattice diagram to understand the Apriori Technique
Figure-2 Example of the Apriori Algorithm
The lattice theory can be related to the Apriori technique in the following ways:
❑ If there are d items, then 2^d candidate itemsets are possible.
❑ The lattice starts with the null set and ends with the full itemset.
❑ With each level, the itemsets grow uniformly by one item.
❑ With each level, the number of sets an itemset can generate is reduced by one. E.g., A can generate 4 sets, AB can generate 3 sets, ABC can generate 2 sets, and ABCD can generate 1 set.
❑ In the lattice, one can check each and every member of the graph, as they are the candidate itemsets, check whether they appear frequently or not, and then decide the association rules. This is basically a brute-force approach.
❑ For the computational complexity, we also have to consider the length of the itemsets.
❑ If I know that ABCD appears frequently, can I say something about its upper layer? Or, if I know that ABCD is not frequent, can I say something about the upper layer?
❑ For example, if I know that people do not buy milk and bread frequently, can I answer the question whether people buy milk, bread and beer frequently or not?
❑ If I know that people do not buy milk and bread frequently, then people will not buy milk, bread and beer frequently.
❑ This intuition leads to the Apriori principle. The Apriori principle states that:

Apriori Principle: If an itemset is frequent, then all of its subsets must also be frequent.

❑ As per the Apriori principle, the support of an itemset never exceeds the support of its subsets.
❑ This is known as the anti-monotone property of support. The contrapositive is equally useful: if an itemset is not frequent, then none of its supersets can be frequent.
❑ In the lattice, if AB is not frequent, then ABC cannot be frequent. How many other sets cannot be frequent?
❑ The lattice can be helpful in finding this pattern. From the Apriori principle, we can prune candidate itemsets.

The Apriori algorithm says: start with the 1-itemsets, check whether they are frequent or not, and extend only the frequent ones level by level. An example of the Apriori algorithm is shown in Figure-2, and a sketch is given below.
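The frequent-itemset generation pillar can be sketched as the level-wise search just described: keep only the itemsets that meet minsup at each level, and use the Apriori principle to prune any candidate with an infrequent subset. This is a minimal illustrative implementation (reusing `transactions` from the earlier sketch), not the textbook pseudocode:

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, minsup):
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent 1-itemsets
    items = {i for t in transactions for i in t}
    current = {s for s in (frozenset([i]) for i in items)
               if support(s) >= minsup}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-candidates
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset of a
        # candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= minsup}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

frequent = apriori_frequent_itemsets(transactions, minsup=0.4)
```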
Rule generation using the Apriori Principle
❑ Now we have the frequent itemsets. From these itemsets, how do we generate rules? Note that confidence does not follow the Apriori property in general, i.e., C(ABC → D) can be larger or smaller than C(AB → D).
❑ However, the confidence of rules generated from the same itemset does follow the Apriori property. In other words, confidence follows the Apriori principle with respect to the number of items on the RHS of the rule. E.g., if {A, B, C, D} is the frequent itemset, then C(ABC → D) ≥ C(AB → CD) ≥ C(A → BCD).
❑ The rule generation is shown in Figure-3. Examples of the Apriori algorithm are shown in Figure-4 and Figure-5, and a sketch of this rule-generation step is given after the figures.
Figure-3 Apriori Algorithm for association rule generation
Figure-4 Apriori Algorithm Example
Figure-5 Apriori Algorithm Example
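To close the loop, here is a hedged sketch of the rule-generation pillar, under the assumption that `frequent` comes from the previous sketch. It grows the consequent (RHS) one item at a time and abandons a branch as soon as confidence drops below minconf, exploiting exactly the RHS ordering C(ABC → D) ≥ C(AB → CD) ≥ C(A → BCD) noted above:

```python
def generate_rules(frequent, minconf):
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        # Start with 1-item consequents and grow the RHS level by level
        consequents = [frozenset([i]) for i in itemset]
        while consequents:
            survivors = []
            for Y in consequents:
                X = itemset - Y
                if not X:
                    continue
                # c(X -> Y) = s(X u Y) / s(X); X is frequent by the
                # Apriori principle, so the lookup always succeeds
                conf = sup / frequent[X]
                if conf >= minconf:
                    rules.append((X, Y, conf))
                    survivors.append(Y)
                # else: any larger RHS containing Y has even lower
                # confidence, so that branch is pruned
            consequents = list({Y | frozenset([i])
                                for Y in survivors
                                for i in itemset - Y})
    return rules

for X, Y, conf in generate_rules(frequent, minconf=0.6):
    print(set(X), "->", set(Y), round(conf, 2))
```

With minsup = 0.4 and minconf = 0.6 on the five-transaction table, the output includes the running example {Milk, Diaper} → {Beer} with confidence 2/3 ≈ 0.67.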