Unsupervised Learning
Clustering
Unsupervised classification, that is, without
the class attribute
Want to discover the classes
Association Rule Discovery
Discover correlations
The Clustering Process
Pattern representation
Definition of pattern proximity measure
Clustering
Data abstraction
Cluster validation
Pattern Representation
Number of classes
Number of available patterns
Circles, ellipses, squares, etc.
Feature selection
Can we use wrappers and filters?
Feature extraction
Produce new features
E.g., principal component analysis (PCA)
Pattern Proximity
Want clusters of instances that are similar
to each other but dissimilar to others
Need a similarity measure
Continuous case
Euclidean measure (compact isolated clusters)
The squared Mahalanobis distance
d_M(x_i, x_j) = (x_i − x_j)^T Σ^(−1) (x_i − x_j)
(Σ here is the sample covariance matrix) alleviates problems with correlation
Many more measures
Pattern Proximity
Nominal attributes
d(x_i, x_j) = (n − x) / n
n: number of attributes
x: number of attributes with the same value
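These measures are simple to compute directly. A minimal sketch (our own helper functions, assuming NumPy):

    import numpy as np

    def euclidean(xi, xj):
        # Favors compact, isolated clusters
        return float(np.sqrt(((xi - xj) ** 2).sum()))

    def sq_mahalanobis(xi, xj, cov):
        # cov: sample covariance matrix of the data; weighting by its
        # inverse alleviates problems with correlated attributes
        d = xi - xj
        return float(d @ np.linalg.inv(cov) @ d)

    def nominal_distance(xi, xj):
        # (n - x) / n: fraction of nominal attributes that differ
        n = len(xi)
        x = sum(a == b for a, b in zip(xi, xj))
        return (n - x) / n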
Clustering Techniques
Clustering
  Hierarchical
    Single Link
    Complete Link
    CobWeb
  Partitional
    Square Error: K-means
    Mixture Maximization: Expectation Maximization
Technique Characteristics
Agglomerative vs Divisive
Agglomerative: each instance starts as its own cluster
and the algorithm merges clusters
Divisive: begins with all instances in one cluster
and divides it up
Hard vs Fuzzy
Hard clustering assigns each instance to exactly one
cluster, whereas fuzzy clustering assigns each instance
a degree of membership in every cluster
More Characteristics
Monothetic vs Polythetic
Polythetic: all attributes are used simultaneously, e.g., to
calculate distance (most algorithms)
Monothetic: attributes are considered one at a time
Incremental vs Non-Incremental
With large data sets it may be necessary to consider only
part of the data at a time (data mining)
Incremental algorithms work instance by instance
Hierarchical Clustering
Dendrogram
[Figure: dendrogram over instances A–G with a vertical similarity axis; pairs such as {F, G} and {D, E} merge first and clusters merge as similarity decreases]
Hierarchical Algorithms
Single-link
Distance between two clusters set equal to the minimum
distance between any two instances in the clusters
More versatile
Produces (sometimes too) elongated clusters
Complete-link
Distance between two clusters set equal to the maximum
distance between any two instances in the clusters
Tightly bound, compact clusters
Often more useful in practice
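Both linkage strategies are available in standard libraries. A small sketch, assuming SciPy's scipy.cluster.hierarchy and illustrative random data:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.random.rand(20, 2)                  # 20 instances, 2 numeric attributes
    Z_single = linkage(X, method='single')     # min inter-cluster distance
    Z_complete = linkage(X, method='complete') # max inter-cluster distance

    # Cut each dendrogram into two clusters and compare the assignments
    print(fcluster(Z_single, t=2, criterion='maxclust'))
    print(fcluster(Z_complete, t=2, criterion='maxclust'))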
Example: Clusters Found
[Figure: the same point pattern twice — two groups of points labeled 1 and 2, bridged by a chain of noise points (*). Single-link follows the chain and produces elongated clusters; complete-link splits the chain and produces compact clusters]
Partitional Clustering
Output a single partition of the data
into clusters
Good for large data sets
Determining the number of clusters is a
major challenge
K-Means
Predetermined number of clusters
Start with seed clusters of one element
[Figure: instances in the plane with the initial seeds marked]
Assign Instances to Clusters
Find New Centroids
New Clusters
Discussion: k-means
Applicable to fairly large data sets
Sensitive to initial centers
Use other heuristics to find good initial
centers
Converges to a local optimum
Specifying the number of centers is very subjective
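A minimal k-means sketch (our own code, assuming NumPy) following the steps illustrated above — seed the centers, assign instances, find new centroids, and repeat until the centers stop moving:

    import numpy as np

    def kmeans(X, k, max_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # Seed clusters: k randomly chosen instances are the initial centers
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Assign each instance to its nearest center
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Find new centroids (keep the old one if a cluster went empty)
            new_centers = np.array(
                [X[labels == j].mean(axis=0) if np.any(labels == j)
                 else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):  # converged (local optimum)
                break
            centers = new_centers
        return labels, centers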
Clustering in Weka
Clustering algorithms in Weka
K-Means
Expectation Maximization (EM)
Cobweb
hierarchical, incremental, and
agglomerative
CobWeb
Algorithm (main) characteristics:
Hierarchical and incremental
Uses category utility:
the improvement in the probability estimates for the k clusters
that results from the instance–cluster assignment
CU(C_1, C_2, ..., C_k) = (1/k) Σ_l Pr[C_l] Σ_i Σ_j ( Pr[a_i = v_ij | C_l]^2 − Pr[a_i = v_ij]^2 )
where j runs over all possible values v_ij of attribute a_i
Why divide by k?
Category Utility
If each instance is in its own cluster:
Pr[a_i = v_ij | C_l] = 1 if v_ij is the actual value of the instance, 0 otherwise
The category utility function then becomes
CU(C_1, C_2, ..., C_k) = ( n − Σ_i Σ_j Pr[a_i = v_ij]^2 ) / k
Without k it would always be best for each
instance to have its own cluster: overfitting!
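A sketch of category utility for nominal data (our own helper, directly following the formula above; clusters are lists of instances given as tuples of attribute values):

    from collections import Counter

    def category_utility(clusters, n_attrs):
        data = [x for c in clusters for x in c]
        n_total, k = len(data), len(clusters)

        def sum_sq_probs(instances):
            # sum over attributes i and values v of Pr[a_i = v]^2
            total = 0.0
            for i in range(n_attrs):
                counts = Counter(x[i] for x in instances)
                total += sum((c / len(instances)) ** 2
                             for c in counts.values())
            return total

        base = sum_sq_probs(data)
        cu = sum((len(c) / n_total) * (sum_sq_probs(c) - base)
                 for c in clusters)
        return cu / k  # dividing by k penalizes one instance per cluster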
The Weather Problem
Outlook Temp. Humidity Windy Play
Sunny Hot High FALSE No
Sunny Hot High TRUE No
Overcast Hot High FALSE Yes
Rainy Mild High FALSE Yes
Rainy Cool Normal FALSE Yes
Rainy Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Sunny Mild High FALSE No
Sunny Cool Normal FALSE Yes
Rainy Mild Normal FALSE Yes
Sunny Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Rainy Mild High TRUE No
Weather Data (without Play)
Label the instances a, b, ..., n
Start by putting the first instance in its own cluster: {a}
Add the next instance in its own cluster: {a} {b}
Adding the Third Instance
Evaluate the category utility of adding the instance to one
of the two clusters versus adding it as its own cluster
[Figure: the three candidate hierarchies — c added to a's cluster, c added to b's cluster, or c in its own cluster; the last has the highest utility]
Adding Instance f
First instance not to get
its own cluster:
[Figure: hierarchy with clusters {a}, {b}, {c}, {d}, and {e, f}]
Look at the instances:
Rainy Cool Normal FALSE
Rainy Cool Normal TRUE
Quite similar!
Add Instance g
Look at the instances:
E) Rainy Cool Normal FALSE
F) Rainy Cool Normal TRUE
G) Overcast Cool Normal TRUE
[Figure: hierarchy with clusters {a}, {b}, {c}, {d}, and {e, f, g}]
Add Instance h
Look at the instances: Runner up
A) Sunny Hot High FALSE
D) Rainy Mild High FALSE
H) Sunny Mild High FALSE
Rearrange: the best matching node and the runner-up are
merged into a single cluster before h is added
[Figure: hierarchy with clusters {b}, {c}, {a, d, h}, and {e, f, g}]
(Splitting is also possible)
Final Hierarchy
[Figure: final CobWeb hierarchy over all 14 instances a–n]
What next?
Dendrogram Clusters
[Figure: the final hierarchy with one large cluster highlighted]
What do a, b, c, d, h, k, and l have in common?
Numerical Attributes
Assume normal distribution
CU(C_1, C_2, ..., C_k) = (1/k) Σ_l Pr[C_l] (1/(2√π)) Σ_i ( 1/σ_il − 1/σ_i )
where σ_il is the standard deviation of attribute a_i in cluster C_l
and σ_i is its standard deviation over all the data
Problems with zero variance!
The acuity parameter imposes a minimum
variance
Hierarchy Size (Scalability)
May create very large hierarchy
The cutoff parameter is used to suppress growth:
if CU(C_1, C_2, ..., C_k) < cutoff, the node is cut off
Discussion
Advantages
Incremental: scales to a large number of instances
Cutoff limits size of hierarchy
Handles mixed attributes
Disadvantages
Incremental: sensitive to the order of instances?
Arbitrary choice of parameters:
divide by k,
artificial minimum value for variance of numeric attributes,
ad hoc cutoff value
Probabilistic Perspective
Most likely set of clusters given data
Probability of each instance belonging to a
cluster
Assumption: instances are drawn from one of
several distributions
Goal: estimate the parameters of these
distributions
Usually: assume distributions are normal
Mixture Resolution
Mixture: set of k probability distributions
Represent the k clusters
Probabilities that an instance takes certain
attribute values given it is in the cluster
What is the probability an instance belongs to
a cluster (or a distribution)
One Numeric Attribute
Two cluster mixture model:
[Figure: two overlapping normal densities over a single attribute, Cluster A and Cluster B]
Given some data, how can you determine the parameters?
μ_A: mean of Cluster A
σ_A: standard deviation of Cluster A
μ_B: mean of Cluster B
σ_B: standard deviation of Cluster B
p_A: probability of being in Cluster A
Problems
If we knew which instance came from each
cluster we could estimate these values
If we knew the parameters we could calculate
the probability that an instance belongs to
each cluster
Pr[A | x] = Pr[x | A] Pr[A] / Pr[x] = f(x; μ_A, σ_A) p_A / Pr[x]
where the normal density is
f(x; μ, σ) = ( 1 / (√(2π) σ) ) e^( −(x − μ)^2 / (2σ^2) )
EM Algorithm
Expectation Maximization (EM)
Start with initial values for the parameters
Calculate the cluster probabilities for each instance
Re-estimate the values for the parameters
Repeat
General-purpose maximum likelihood
estimation algorithm for missing data
Can also be used to train Bayesian networks
(later)
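A sketch of EM for the two-cluster, one-attribute model above (our own code, assuming NumPy; x is a 1-D array of attribute values). The E-step computes the cluster probabilities Pr[A | x]; the M-step re-estimates μ_A, σ_A, μ_B, σ_B, and p_A:

    import numpy as np

    def em_two_gaussians(x, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        mu_a, mu_b = rng.choice(x, size=2, replace=False)  # initial guesses
        sd_a = sd_b = x.std()
        p_a = 0.5

        def f(x, mu, sd):  # normal density f(x; mu, sd)
            return (np.exp(-(x - mu) ** 2 / (2 * sd ** 2))
                    / (np.sqrt(2 * np.pi) * sd))

        for _ in range(iters):
            # E-step: cluster probabilities Pr[A | x] for every instance
            wa = f(x, mu_a, sd_a) * p_a
            wb = f(x, mu_b, sd_b) * (1 - p_a)
            w = wa / (wa + wb)
            # M-step: re-estimate the parameters from the weighted instances
            mu_a, mu_b = np.average(x, weights=w), np.average(x, weights=1 - w)
            sd_a = np.sqrt(np.average((x - mu_a) ** 2, weights=w))
            sd_b = np.sqrt(np.average((x - mu_b) ** 2, weights=1 - w))
            p_a = w.mean()
        return mu_a, sd_a, mu_b, sd_b, p_a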
Beyond Normal Models
More than one class:
Straightforward
More than one numeric attribute
Easy if assume attributes independent
If dependent attributes, treat them jointly
using the bivariate normal
Nominal attributes
No more normal distribution!
EM using Weka
Options
numClusters: number of clusters (default −1 selects it automatically)
maxIterations: maximum number of iterations
seed: random number seed
minStdDev: minimum allowable standard deviation
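For example, from the command line (a sketch assuming a local weather.arff file; the single-letter flags correspond to the options above):

    java weka.clusterers.EM -t weather.arff -N -1 -I 100 -S 100 -M 1e-6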
Other Clustering
Artificial Neural Networks (ANN)
Random search
Genetic Algorithms (GA)
GA used to find initial centroids for k-means
Simulated Annealing (SA)
Tabu Search (TS)
Support Vector Machines (SVM)
Will discuss GA and SVM later
Applications
Image segmentation
Object and Character Recognition
Data Mining:
Stand-alone to gain insight into the data
Preprocessing step before classification, which
then operates on the detected clusters
DM Clustering Challenges
Data mining deals with large databases
Scalability with respect to the number of instances
Use a random sample (possible bias)
Dealing with mixed data
Many algorithms only make sense for numeric
data
High dimensional problems
Can the algorithm handle many attributes?
How do we interpret a cluster in high dimensions?
Other (General) Challenges
Shape of clusters
Minimum domain knowledge (e.g.,
knowing the number of clusters)
Noisy data
Insensitivity to instance order
Interpretability and usability
Clustering for DM
Main issue is scalability to large databases
Many algorithms have been developed for
scalable clustering:
Partitional methods: CLARA, CLARANS
Hierarchical methods: AGNES, DIANA, BIRCH,
CURE, Chameleon
Practical Partitional Clustering
Algorithms
Classic k-Means (1967)
Work from 1990 and later:
k-Medoids
Uses the medoid instead of the centroid
Less sensitive to outliers and noise
Computations more costly
PAM (Partitioning Around Medoids) algorithm
Large-Scale Problems
CLARA: Clustering LARge Applications
Select several random samples of instances
Apply PAM to each
Return the best clusters
CLARANS:
Similar to CLARA
Draws samples randomly while searching
More effective than PAM and CLARA
Hierarchical Methods
BIRCH: Balanced Iterative Reducing and
Clustering using Hierarchies
Clustering feature: triplet summarizing
information about subclusters
Clustering feature tree: height-balanced
tree that stores the clustering features
BIRCH Mechanism
Phase I:
Scan database to build an initial CF tree
Multilevel compression of the data
Phase II:
Apply a selected clustering algorithm to the
leaf nodes of the CF tree
Has been found to be very scalable
Conclusion
The use of clustering in data mining
practice seems to be somewhat limited
due to scalability problems
More commonly used unsupervised
learning:
Association Rule Discovery
Association Rule Discovery
Aims to discover interesting correlations or
other relationships in large databases
Finds a rule of the form
if A and B then C and D
Which attributes will be included in the
relation is unknown
Mining Association Rules
Similar to classification rules
Use same procedure?
Every attribute is treated the same
Would have to apply the procedure to every possible
expression on the right-hand side
Huge number of rules ⇒ infeasible
Only want rules with high coverage/support
Market Basket Analysis
Basket data: items purchased on per-
transaction basis (not cumulative, etc)
How do you boost the sales of a given product?
What other products does discontinuing a product
impact?
Which products should be shelved together?
Terminology (market basket analysis):
Item - an attribute/value pair
Item set - combination of items with min. coverage
How Many k-Item Sets Have
Minimum Coverage?
Outlook Temp. Humidity Windy Play
Sunny Hot High FALSE No
Sunny Hot High TRUE No
Overcast Hot High FALSE Yes
Rainy Mild High FALSE Yes
Rainy Cool Normal FALSE Yes
Rainy Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Sunny Mild High FALSE No
Sunny Cool Normal FALSE Yes
Rainy Mild Normal FALSE Yes
Sunny Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Rainy Mild High TRUE No
Item Sets
1-Item | 2-Item | 3-Item | 4-Item
Outlook=sunny (5) | Outlook=sunny temp=mild (2) | Outlook=sunny temp=hot humidity=high (2) | Outlook=sunny temp=hot humidity=high play=no (2)
Outlook=overcast (4) | Outlook=sunny temp=hot (2) | Outlook=sunny temp=hot play=no (2) | Outlook=sunny humidity=high windy=false play=no (2)
Outlook=rainy (5) | Outlook=sunny humidity=normal (2) | Outlook=sunny humidity=normal play=yes (2) | Outlook=overcast temp=hot windy=false play=yes (2)
Temp=cool (4) | Outlook=sunny humidity=high (3) | Outlook=sunny humidity=high windy=false (2) | Outlook=rainy temp=mild windy=false play=yes (2)
Temp=mild (6) | Outlook=sunny windy=true (2) | Outlook=sunny humidity=high play=no (3) | Outlook=rainy humidity=normal windy=false play=yes (2)
(the first five item sets of each size)
From Sets to Rules
3-Item Set w/coverage 4:
Humidity = normal, windy = false, play = yes
Association Rules: Accuracy
If humidity = normal and windy = false then play = yes 4/4
If humidity = normal and play = yes then windy = false 4/6
If windy = false and play = yes then humidity = normal 4/6
If humidity = normal then windy = false and play = yes 4/7
If windy = false then humidity = normal and play = yes 4/8
If play = yes then humidity = normal and windy = false 4/9
If – then humidity = normal and windy = false and play = yes 4/12 (empty antecedent)
From Sets to Rules
(continued)
4-Item Set w/coverage 2:
Temperature = cool, humidity = normal,
windy = false, play = yes
Association Rules: Accuracy
If temperature = cool, windy = false then humidity = normal, play = yes 2/2
If temperature = cool, humidity = normal, windy = false then play = yes 2/2
If temperature = cool, windy = false, play = yes then humidity = normal 2/2
Overall
Minimum coverage (2):
12 1-item sets, 47 2-item sets, 39 3-item sets, 6 4-item
sets
Minimum accuracy (100%):
58 association rules
“Best” Rules (Coverage = 4, Accuracy = 100%)
If humidity = normal and windy = false then play = yes
If temperature = cool then humidity = normal
If outlook = overcast then play = yes
Association Rule Mining
STEP 1: Find all item sets that meet
minimum coverage
STEP 2: Find all rules that meet minimum
accuracy
STEP 3: Prune
Generating Item Sets
How do we generate minimum coverage item
sets in a scalable manner?
Total number of item sets is huge
Grows exponentially in the number of attributes
Need an efficient algorithm:
Start by generating minimum coverage 1-item sets
Use those to generate 2-item sets, etc
Why do we only need to consider minimum
coverage 1-item sets?
Justification
Item Set 1: {Humidity = high}
Coverage(1) = Number of times humidity is high
Item Set 2: {Windy = false}
Coverage (2) = Number of times windy is false
Item Set 3: {Humidity = high, Windy = false}
Coverage (3) = Number of times humidity is high and
windy is false
Coverage(3) ≤ Coverage(1) and Coverage(3) ≤ Coverage(2):
if Item Sets 1 and 2 do not both meet minimum coverage,
Item Set 3 cannot either
Generating Item Sets
Start with all 3-item sets that meet minimum coverage:
(A B C)
(A B D)
(A C D)
(A C E)
Merge to generate 4-item sets; consider only sets that
start with the same two attributes:
(A B C D)
(A C D E)
These are the only two candidate 4-item sets that could
possibly meet minimum coverage (they must still be checked)
Algorithm for Generating Item
Sets
Build up from 1-item sets so that we only consider
item sets found by merging two minimum coverage sets
Only merge sets that have all but one item in common
Computational efficiency further
improved using hash tables
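A sketch of this level-wise generation (our own code; transactions are represented as Python sets of items):

    def apriori_itemsets(transactions, min_cov):
        def coverage(s):
            return sum(s <= t for t in transactions)

        # 1-item sets meeting minimum coverage
        items = {i for t in transactions for i in t}
        current = {frozenset([i]) for i in items
                   if coverage(frozenset([i])) >= min_cov}
        frequent = set(current)
        while current:
            # Merge pairs sharing all but one item, then re-check coverage
            candidates = {a | b for a in current for b in current
                          if len(a | b) == len(a) + 1}
            current = {c for c in candidates if coverage(c) >= min_cov}
            frequent |= current
        return frequent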
Generating Rules
If windy = false and play = no then outlook = sunny and humidity = high
(meets min. coverage and accuracy)
If windy = false and play = no then outlook = sunny
(meets min. coverage and accuracy)
If windy = false and play = no then humidity = high
(meets min. coverage and accuracy)
How Many Rules?
Want to consider every possible subset
of attributes as consequent
Have 4 attributes:
Four single consequent rules
Six double consequent rules
Two triple consequent rules
Twelve possible rules for single 4-item set!
Exponential explosion of possible rules
Must We Check All?
If A and B then C and D
Coverage = number of times A, B, C, and D are true
Accuracy = (number of times A, B, C, and D are true) / (number of times A and B are true)
If A, B, and C then D
Coverage = number of times A, B, C, and D are true
Accuracy = (number of times A, B, C, and D are true) / (number of times A, B, and C are true)
Efficiency Improvement
A double consequent rule can only be OK if
both single consequent rules are OK
Procedure:
Start with single consequent rules
Build up double consequent rules, etc., as candidate rules
Check the candidate rules for accuracy
In practice: need to check far fewer rules
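For illustration, a brute-force sketch (our own code) that enumerates every possible consequent of a single item set and keeps the rules meeting minimum accuracy; the candidate-based procedure above avoids most of these checks:

    from itertools import combinations

    def rules_from_itemset(itemset, transactions, min_acc):
        itemset = frozenset(itemset)
        # Coverage of the full item set is the same for every split
        cov = sum(itemset <= t for t in transactions)
        rules = []
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                acc = cov / sum(ante <= t for t in transactions)
                if acc >= min_acc:
                    rules.append((set(ante), set(itemset - ante), acc))
        return rules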
Apriori Algorithm
This is a simplified description of the
Apriori algorithm
Developed in the early 1990s and the most
commonly used approach
New developments focus on
Generating item sets more efficiently
Generating rules from item sets more
efficiently
Association Rule Discovery
using Weka
Parameters to be specified in Apriori:
upperBoundMinSupport: start with this value of minimum support
delta: decrease the minimum support by this value in each step
lowerBoundMinSupport: final minimum support
numRules: how many rules to generate
metricType: confidence, lift, leverage, or conviction
minMetric: smallest acceptable metric value for a rule
Handles only nominal attributes
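A command-line sketch (assuming a local weather.nominal.arff file; the single-letter flags are Weka's equivalents of the options above):

    java weka.associations.Apriori -t weather.nominal.arff -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1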
Difficulties
Apriori algorithm improves performance
by using candidate item sets
Still some problems …
Costly to generate large number of item sets
To generate a frequent pattern of size 100, need
more than 2^100 ≈ 10^30 candidates!
Requires repeated scans of database to check
candidates
Again, most problematic for long patterns
Solution?
Can candidate generation be avoided?
New approach:
Create a frequent pattern tree (FP-tree)
stores information on frequent patterns
Use the FP-tree for mining frequent
patterns
partitioning-based
divide-and-conquer
(as opposed to bottom-up generation)
Database → FP-Tree
TID | Items | Frequent Items
100 | F,A,C,D,G,I,M,P | F,C,A,M,P
200 | A,B,C,F,L,M,O | F,C,A,B,M
300 | B,F,H,J,O | F,B
400 | B,C,K,S,P | C,B,P
500 | A,F,C,E,L,P,M,N | F,C,A,M,P
(Min. support = 3)
[Figure: FP-tree with a header table of node links for items F, C, A, B, M, P. Paths from the root: F:4 → C:3 → A:3 → M:2 → P:2, with B:1 → M:1 branching off A:3; F:4 → B:1; and C:1 → B:1 → P:1]
Computational Effort
Each node has three fields
item name
count
node link
Also a header table with
item name
head of node link
Need two scans of the database
Collect set of frequent items
Construct the FP-tree
Comments
The FP-tree is a compact data structure
The FP-tree contains all the information
related to mining frequent patterns (given the
support)
The size of the tree is bounded by the
occurrences of frequent items
The height of the tree is bounded by the
maximum number of items in a transaction
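A sketch of the two-scan construction (our own code; each node stores item name, count, parent, and children, and a header table keeps the node links):

    from collections import defaultdict

    class FPNode:
        def __init__(self, item, parent):
            self.item, self.count, self.parent = item, 1, parent
            self.children = {}

    def build_fp_tree(transactions, min_support):
        # Scan 1: collect the set of frequent items
        freq = defaultdict(int)
        for t in transactions:
            for item in t:
                freq[item] += 1
        freq = {i: c for i, c in freq.items() if c >= min_support}
        root = FPNode(None, None)
        header = defaultdict(list)  # item name -> head of node links
        # Scan 2: insert each transaction, keeping only frequent items,
        # in descending order of frequency
        for t in transactions:
            node = root
            for item in sorted((i for i in t if i in freq),
                               key=lambda i: (-freq[i], i)):
                if item in node.children:
                    node.children[item].count += 1
                else:
                    node.children[item] = FPNode(item, node)
                    header[item].append(node.children[item])
                node = node.children[item]
        return root, header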
Mining Patterns
Mine complete set of frequent patterns
For any frequent item A, all possible
patterns containing A can be obtained
by following A’s node links starting from
A’s head of node links
Example
[Figure: the same FP-tree, following P's node links from the header table]
Paths containing P:
<F:4, C:3, A:3, M:2, P:2> — P occurs twice on this path
<C:1, B:1, P:1> — P occurs once on this path
Frequent pattern: (P:3)
Rule Generation
Mining complete set of association rules
has some problems
May be a large number of frequent item
sets
May be a huge number of association rules
One potential solution is to look at
closed item sets only
Frequent Closed Item Sets
An item set X is a closed item set if there is
no item set X′ such that X ⊂ X′ and every
transaction containing X also contains X′
A rule X ⇒ Y is an association rule on a
frequent closed item set if
both X and X∪Y are frequent closed item sets, and
there does not exist a frequent closed item set Z
such that X ⊂ Z ⊂ X∪Y
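A direct, non-scalable sketch of the closedness test (our own code): X is closed exactly when the transactions containing X share no item beyond X itself:

    def is_closed(itemset, transactions):
        # transactions: list of sets of items; itemset: set of items
        covering = [t for t in transactions if itemset <= t]
        if not covering:
            return False
        # Items common to every covering transaction
        return set.intersection(*map(set, covering)) == set(itemset)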
Example
ID | Items
10 | A,C,D,E,F
20 | A,B,E
30 | C,E,F
40 | A,C,D,F
50 | C,E,F
Frequent item sets (min. support = 2):
A (3), E (4), AE (2), ACDF (2), CF (4), CEF (3) — all the closed sets
D (2), AC (2) — not closed! Why?
+ 12 more
Mining Frequent Closed Item Sets (CLOSET)
TDB with items listed in frequency order (C:4, E:4, F:4, A:3, D:2):
CEFAD, EA, CEF, CFAD, CEF
Build a conditional DB for each item, least frequent first:
D-cond DB (D:2): CEFA, CFA → Output: CFAD:2
A-cond DB (A:3): CEF, E, CF → Output: A:3
EA-cond DB (EA:2): C → Output: EA:2
F-cond DB (F:4): CE:3, C:4 → Output: CF:4, CEF:3
E-cond DB (E:4): C:3 → Output: E:4
Mining with Taxonomies
Taxonomy:
Clothes
  Outerwear
    Jackets
    Ski Pants
  Shirts
Footwear
  Shoes
  Hiking Boots
Generalized association rule: X ⇒ Y, where no item
in Y is an ancestor of an item in X
Why Taxonomy?
The ‘classic’ association rule mining restricts
rules to the leaf nodes of the taxonomy
However:
Rules at lower levels may not have minimum
support, and thus interesting associations may go
undiscovered
Taxonomies can be used to prune uninteresting
and redundant rules
Example
ID | Items
10 | Shirt
20 | Jacket, Hiking Boots
30 | Ski Pants, Hiking Boots
40 | Shoes
50 | Shoes
60 | Jacket

Item Set | Support
{Jacket} | 2
{Outerwear} | 3
{Clothes} | 4
{Shoes} | 2
{Hiking Boots} | 2
{Footwear} | 4
{Outerwear, Hiking Boots} | 2
{Clothes, Hiking Boots} | 2
{Outerwear, Footwear} | 2
{Clothes, Footwear} | 2

Rule | Support | Confidence
Outerwear ⇒ Hiking Boots | 2 | 2/3
Outerwear ⇒ Footwear | 2 | 2/3
Hiking Boots ⇒ Outerwear | 2 | 2/2
Hiking Boots ⇒ Clothes | 2 | 2/2
Interesting Rules
Many ways in which the interestingness of a rule can be
evaluated based on its ancestors
For example:
A rule with no ancestors is interesting
A rule with ancestor(s) is interesting only if it has enough
‘relative support’

Rule ID | Rule | Support | Item | Item Support
1 | Clothes ⇒ Footwear | 10 | Clothes | 5
2 | Outerwear ⇒ Footwear | 8 | Outerwear | 2
3 | Jackets ⇒ Footwear | 4 | Jackets | 1

Which rules are interesting?
Discussion
Association rule mining finds expressions of
the form X ⇒ Y from large data sets
One of the most popular data mining tasks
Originates in market basket analysis
Key measures of performance
Support
Confidence (or accuracy)
Are support and confidence enough?
Type of Rules Discovered
‘Classic’ association rule problem
All rules satisfying minimum threshold of
support and confidence
Focus on a subset of rules, e.g.:
Optimized rules
Maximal frequent item sets
Closed item sets
What makes for an interesting rule?
Algorithm Construction
Determine frequent item sets (all or
part)
Takes by far the most computational time
Variations focus on this part
Generate rules from frequent item sets
Generating Item Sets
Search space traversed: bottom-up or top-down
Support determined by: counting or intersecting
Bottom-up, counting: Apriori*, and Apriori-like algorithms (AprioriTID, DIC)
Bottom-up, intersecting: Partition
Top-down, counting: FP-Growth*
Top-down, intersecting: Eclat
No algorithm dominates the others!
* Discussed above
Applications
Market basket analysis
Classic marketing application
Applications to recommender systems
Recommender
Customized goods and services
Recommend products
Collaborative filtering
similarities among users’ tastes
recommend based on other users
many on-line systems
simple algorithms
Classification Approach
View as classification problem
Product either of interest or not
Induce a model, e.g., a decision tree
Classify a new product as either interesting
or not interesting
Difficulty in this approach?
Association Rule Approach
Product associations
90% of users who like product A and product B also
like product C
A and B ⇒ C (90%)
User associations
90% of products liked by user A and user B are also
liked by user C
Use combination of product and user
associations
Advantages
‘Classic’ collaborative filtering must identify
users with similar tastes
This approach uses the overlap of other users’
tastes to match a given user’s taste
Can be applied to users whose tastes don’t
correlate strongly with those of other users
Can take advantage of information from, say user
A, for a recommendation to user B, even if they do
not correlate
What’s Different Here?
Is this really a ‘classic’ association rule
problem?
Want to learn what products are liked by
what users
‘Semi-supervised’
Target item
User (for user associations)
Product (for product associations)
Single-Consequent Rules
Only a single (target) item in the
consequent
Go through all such items
Association rules: all possible item combinations as consequent
Associations for a recommender: in between, with only the target items as consequents
Classification: one single item as consequent