Unsupervised Learning
Clustering
Unsupervised classification, that is, without
the class attribute
Want to discover the classes
Association Rule Discovery
Discover correlations
The Clustering Process
Pattern representation
Definition of pattern proximity measure
Clustering
Data abstraction
Cluster validation
Pattern Representation
Number of classes
Number of available patterns
Circles, ellipses, squares, etc.
Feature selection
Can we use wrappers and filters?
Feature extraction
Produce new features
E.g., principal component analysis (PCA)
Pattern Proximity
Want clusters of instances that are similar
to each other but dissimilar to others
Need a similarity measure
Continuous case
Euclidean measure (compact isolated clusters)
The squared Mahalanobis distance
d_M(x_i, x_j) = (x_i − x_j)^T Σ^(−1) (x_i − x_j)
(Σ here is the sample covariance matrix) alleviates problems with correlation
Many more measures
Pattern Proximity
Nominal attributes
d(x_i, x_j) = (n − x) / n
n: number of attributes
x: number of attributes with the same value
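These measures are simple to compute directly. A minimal sketch (our own helper functions, assuming NumPy):

    import numpy as np

    def euclidean(xi, xj):
        # Favors compact, isolated clusters
        return float(np.sqrt(((xi - xj) ** 2).sum()))

    def sq_mahalanobis(xi, xj, cov):
        # cov: sample covariance matrix of the data; weighting by its
        # inverse alleviates problems with correlated attributes
        d = xi - xj
        return float(d @ np.linalg.inv(cov) @ d)

    def nominal_distance(xi, xj):
        # (n - x) / n: fraction of nominal attributes that differ
        n = len(xi)
        x = sum(a == b for a, b in zip(xi, xj))
        return (n - x) / n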
Clustering Techniques
Clustering
  Hierarchical
    Single Link
    Complete Link
    CobWeb
  Partitional
    Square Error: K-means
    Mixture Maximization: Expectation Maximization
Technique Characteristics
Agglomerative vs Divisive
Agglomerative: each instance starts as its own cluster
and the algorithm merges clusters
Divisive: begins with all instances in one cluster
and divides it up
Hard vs Fuzzy
Hard clustering assigns each instance to exactly one
cluster, whereas fuzzy clustering assigns each instance
a degree of membership in every cluster
More Characteristics
Monothetic vs Polythetic
Polythetic: all attributes are used simultaneously, e.g., to
calculate distance (most algorithms)
Monothetic: attributes are considered one at a time
Incremental vs Non-Incremental
With large data sets it may be necessary to consider only
part of the data at a time (data mining)
Incremental algorithms work instance by instance
Hierarchical Clustering
Dendrogram
[Figure: dendrogram over instances A–G with a vertical similarity axis; pairs such as {F, G} and {D, E} merge first and clusters merge as similarity decreases]
Hierarchical Algorithms
Single-link
Distance between two clusters set equal to the minimum
distance between any two instances in the clusters
More versatile
Produces (sometimes too) elongated clusters
Complete-link
Distance between two clusters set equal to the maximum
distance between any two instances in the clusters
Tightly bound, compact clusters
Often more useful in practice
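Both linkage strategies are available in standard libraries. A small sketch, assuming SciPy's scipy.cluster.hierarchy and illustrative random data:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.random.rand(20, 2)                  # 20 instances, 2 numeric attributes
    Z_single = linkage(X, method='single')     # min inter-cluster distance
    Z_complete = linkage(X, method='complete') # max inter-cluster distance

    # Cut each dendrogram into two clusters and compare the assignments
    print(fcluster(Z_single, t=2, criterion='maxclust'))
    print(fcluster(Z_complete, t=2, criterion='maxclust'))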
Example: Clusters Found
[Figure: the same point pattern twice — two groups of points labeled 1 and 2, bridged by a chain of noise points (*). Single-link follows the chain and produces elongated clusters; complete-link splits the chain and produces compact clusters]
Partitional Clustering
Output a single partition of the data
into clusters
Good for large data sets
Determining the number of clusters is a
major challenge
K-Means
Predetermined number of clusters
Start with seed clusters of one element
[Figure: instances in the plane with the initial seeds marked]
Assign Instances to Clusters
Find New Centroids
New Clusters
Discussion: k-means
Applicable to fairly large data sets
Sensitive to initial centers
Use other heuristics to find good initial
centers
Converges to a local optimum
Specifying the number of centers is very subjective
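A minimal k-means sketch (our own code, assuming NumPy) following the steps illustrated above — seed the centers, assign instances, find new centroids, and repeat until the centers stop moving:

    import numpy as np

    def kmeans(X, k, max_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # Seed clusters: k randomly chosen instances are the initial centers
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Assign each instance to its nearest center
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Find new centroids (keep the old one if a cluster went empty)
            new_centers = np.array(
                [X[labels == j].mean(axis=0) if np.any(labels == j)
                 else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):  # converged (local optimum)
                break
            centers = new_centers
        return labels, centers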
Clustering in Weka
Clustering algorithms in Weka
K-Means
Expectation Maximization (EM)
Cobweb
hierarchical, incremental, and
agglomerative
CobWeb
Algorithm (main) characteristics:
Hierarchical and incremental
Uses category utility:
the improvement in the probability estimates for the k clusters
that results from the instance–cluster assignment
CU(C_1, C_2, ..., C_k) = (1/k) Σ_l Pr[C_l] Σ_i Σ_j ( Pr[a_i = v_ij | C_l]^2 − Pr[a_i = v_ij]^2 )
where j runs over all possible values v_ij of attribute a_i
Why divide by k?
Category Utility
If each instance is in its own cluster:
Pr[a_i = v_ij | C_l] = 1 if v_ij is the actual value of the instance, 0 otherwise
The category utility function then becomes
CU(C_1, C_2, ..., C_k) = ( n − Σ_i Σ_j Pr[a_i = v_ij]^2 ) / k
Without k it would always be best for each
instance to have its own cluster: overfitting!
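A sketch of category utility for nominal data (our own helper, directly following the formula above; clusters are lists of instances given as tuples of attribute values):

    from collections import Counter

    def category_utility(clusters, n_attrs):
        data = [x for c in clusters for x in c]
        n_total, k = len(data), len(clusters)

        def sum_sq_probs(instances):
            # sum over attributes i and values v of Pr[a_i = v]^2
            total = 0.0
            for i in range(n_attrs):
                counts = Counter(x[i] for x in instances)
                total += sum((c / len(instances)) ** 2
                             for c in counts.values())
            return total

        base = sum_sq_probs(data)
        cu = sum((len(c) / n_total) * (sum_sq_probs(c) - base)
                 for c in clusters)
        return cu / k  # dividing by k penalizes one instance per cluster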
The Weather Problem
Outlook Temp. Humidity Windy Play
Sunny Hot High FALSE No
Sunny Hot High TRUE No
Overcast Hot High FALSE Yes
Rainy Mild High FALSE Yes
Rainy Cool Normal FALSE Yes
Rainy Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Sunny Mild High FALSE No
Sunny Cool Normal FALSE Yes
Rainy Mild Normal FALSE Yes
Sunny Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Rainy Mild High TRUE No
Weather Data (without Play)
Label the instances a, b, ..., n
Start by putting the first instance in its own cluster: {a}
Add the next instance in its own cluster: {a} {b}
Adding the Third Instance
Evaluate the category utility of adding the instance to one
of the two clusters versus adding it as its own cluster
[Figure: the three candidate hierarchies — c added to a's cluster, c added to b's cluster, or c in its own cluster; the last has the highest utility]
Adding Instance f
First instance not to get
its own cluster:
[Figure: hierarchy with clusters {a}, {b}, {c}, {d}, and {e, f}]
Look at the instances:
Rainy Cool Normal FALSE
Rainy Cool Normal TRUE
Quite similar!
Add Instance g
Look at the instances:
E) Rainy Cool Normal FALSE
F) Rainy Cool Normal TRUE
G) Overcast Cool Normal TRUE
[Figure: hierarchy with clusters {a}, {b}, {c}, {d}, and {e, f, g}]
Add Instance h
Look at the instances: Runner up
A) Sunny Hot High FALSE
D) Rainy Mild High FALSE
H) Sunny Mild High FALSE
Rearrange: the best matching node and the runner-up are
merged into a single cluster before h is added
[Figure: hierarchy with clusters {b}, {c}, {a, d, h}, and {e, f, g}]
(Splitting is also possible)
Final Hierarchy
[Figure: final CobWeb hierarchy over all 14 instances a–n]
What next?
Dendrogram Clusters
[Figure: the final hierarchy with one large cluster highlighted]
What do a, b, c, d, h, k, and l have in common?
Numerical Attributes
Assume normal distribution
CU(C_1, C_2, ..., C_k) = (1/k) Σ_l Pr[C_l] (1/(2√π)) Σ_i ( 1/σ_il − 1/σ_i )
where σ_il is the standard deviation of attribute a_i in cluster C_l
and σ_i is its standard deviation over all the data
Problems with zero variance!
The acuity parameter imposes a minimum
variance
Hierarchy Size (Scalability)
May create very large hierarchy
The cutoff parameter is used to suppress growth:
if CU(C_1, C_2, ..., C_k) < cutoff, the node is cut off
Discussion
Advantages
Incremental: scales to a large number of instances
Cutoff limits size of hierarchy
Handles mixed attributes
Disadvantages
Incremental: sensitive to the order of instances?
Arbitrary choice of parameters:
divide by k,
artificial minimum value for variance of numeric attributes,
ad hoc cutoff value
Probabilistic Perspective
Most likely set of clusters given data
Probability of each instance belonging to a
cluster
Assumption: instances are drawn from one of
several distributions
Goal: estimate the parameters of these
distributions
Usually: assume distributions are normal
Mixture Resolution
Mixture: set of k probability distributions
Represent the k clusters
Probabilities that an instance takes certain
attribute values given it is in the cluster
What is the probability an instance belongs to
a cluster (or a distribution)
One Numeric Attribute
Two cluster mixture model:
[Figure: two overlapping normal densities over a single attribute, Cluster A and Cluster B]
Given some data, how can you determine the parameters?
μ_A: mean of Cluster A
σ_A: standard deviation of Cluster A
μ_B: mean of Cluster B
σ_B: standard deviation of Cluster B
p_A: probability of being in Cluster A
Problems
If we knew which instance came from each
cluster we could estimate these values
If we knew the parameters we could calculate
the probability that an instance belongs to
each cluster
Pr[A | x] = Pr[x | A] Pr[A] / Pr[x] = f(x; μ_A, σ_A) p_A / Pr[x]
where the normal density is
f(x; μ, σ) = ( 1 / (√(2π) σ) ) e^( −(x − μ)^2 / (2σ^2) )
EM Algorithm
Expectation Maximization (EM)
Start with initial values for the parameters
Calculate the cluster probabilities for each instance
Re-estimate the values for the parameters
Repeat
General-purpose maximum likelihood
estimation algorithm for missing data
Can also be used to train Bayesian networks
(later)
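A sketch of EM for the two-cluster, one-attribute model above (our own code, assuming NumPy; x is a 1-D array of attribute values). The E-step computes the cluster probabilities Pr[A | x]; the M-step re-estimates μ_A, σ_A, μ_B, σ_B, and p_A:

    import numpy as np

    def em_two_gaussians(x, iters=100, seed=0):
        rng = np.random.default_rng(seed)
        mu_a, mu_b = rng.choice(x, size=2, replace=False)  # initial guesses
        sd_a = sd_b = x.std()
        p_a = 0.5

        def f(x, mu, sd):  # normal density f(x; mu, sd)
            return (np.exp(-(x - mu) ** 2 / (2 * sd ** 2))
                    / (np.sqrt(2 * np.pi) * sd))

        for _ in range(iters):
            # E-step: cluster probabilities Pr[A | x] for every instance
            wa = f(x, mu_a, sd_a) * p_a
            wb = f(x, mu_b, sd_b) * (1 - p_a)
            w = wa / (wa + wb)
            # M-step: re-estimate the parameters from the weighted instances
            mu_a, mu_b = np.average(x, weights=w), np.average(x, weights=1 - w)
            sd_a = np.sqrt(np.average((x - mu_a) ** 2, weights=w))
            sd_b = np.sqrt(np.average((x - mu_b) ** 2, weights=1 - w))
            p_a = w.mean()
        return mu_a, sd_a, mu_b, sd_b, p_a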
Beyond Normal Models
More than one class:
Straightforward
More than one numeric attribute
Easy if assume attributes independent
If dependent attributes, treat them jointly
using the bivariate normal
Nominal attributes
No more normal distribution!
EM using Weka
Options
numClusters: number of clusters (default −1 selects it automatically)
maxIterations: maximum number of iterations
seed: random number seed
minStdDev: minimum allowable standard deviation
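For example, from the command line (a sketch assuming a local weather.arff file; the single-letter flags correspond to the options above):

    java weka.clusterers.EM -t weather.arff -N -1 -I 100 -S 100 -M 1e-6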
Other Clustering
Artificial Neural Networks (ANN)
Random search
Genetic Algorithms (GA)
GA used to find initial centroids for k-means
Simulated Annealing (SA)
Tabu Search (TS)
Support Vector Machines (SVM)
Will discuss GA and SVM later
Applications
Image segmentation
Object and Character Recognition
Data Mining:
Stand-alone to gain insight into the data
Preprocessing step before classification, which
then operates on the detected clusters
DM Clustering Challenges
Data mining deals with large databases
Scalability with respect to the number of instances
Use a random sample (possible bias)
Dealing with mixed data
Many algorithms only make sense for numeric
data
High dimensional problems
Can the algorithm handle many attributes?
How do we interpret a cluster in high dimensions?
Other (General) Challenges
Shape of clusters
Minimum domain knowledge (e.g.,
knowing the number of clusters)
Noisy data
Insensitivity to instance order
Interpretability and usability
Clustering for DM
Main issue is scalability to large databases
Many algorithms have been developed for
scalable clustering:
Partitional methods: CLARA, CLARANS
Hierarchical methods: AGNES, DIANA, BIRCH,
CURE, Chameleon
Practical Partitional Clustering
Algorithms
Classic k-Means (1967)
Work from 1990 and later:
k-Medoids
Uses the medoid instead of the centroid
Less sensitive to outliers and noise
Computations more costly
PAM (Partitioning Around Medoids) algorithm
Large-Scale Problems
CLARA: Clustering LARge Applications
Select several random samples of instances
Apply PAM to each
Return the best clusters
CLARANS:
Similar to CLARA
Draws samples randomly while searching
More effective than PAM and CLARA
Hierarchical Methods
BIRCH: Balanced Iterative Reducing and
Clustering using Hierarchies
Clustering feature: triplet summarizing
information about subclusters
Clustering feature tree: height-balanced
tree that stores the clustering features
BIRCH Mechanism
Phase I:
Scan database to build an initial CF tree
Multilevel compression of the data
Phase II:
Apply a selected clustering algorithm to the
leaf nodes of the CF tree
Has been found to be very scalable
Conclusion
The use of clustering in data mining
practice seems to be somewhat limited
due to scalability problems
More commonly used unsupervised
learning:
Association Rule Discovery
Association Rule Discovery
Aims to discover interesting correlations or
other relationships in large databases
Finds a rule of the form
if A and B then C and D
Which attributes will be included in the
relation is unknown
Mining Association Rules
Similar to classification rules
Use same procedure?
Every attribute is treated the same
Would have to apply the procedure to every possible
expression on the right-hand side
Huge number of rules ⇒ infeasible
Only want rules with high coverage/support
Market Basket Analysis
Basket data: items purchased on per-
transaction basis (not cumulative, etc)
How do you boost the sales of a given product?
What other products does discontinuing a product
impact?
Which products should be shelved together?
Terminology (market basket analysis):
Item - an attribute/value pair
Item set - combination of items with min. coverage
How Many k-Item Sets Have
Minimum Coverage?
Outlook Temp. Humidity Windy Play
Sunny Hot High FALSE No
Sunny Hot High TRUE No
Overcast Hot High FALSE Yes
Rainy Mild High FALSE Yes
Rainy Cool Normal FALSE Yes
Rainy Cool Normal TRUE No
Overcast Cool Normal TRUE Yes
Sunny Mild High FALSE No
Sunny Cool Normal FALSE Yes
Rainy Mild Normal FALSE Yes
Sunny Mild Normal TRUE Yes
Overcast Mild High TRUE Yes
Overcast Hot Normal FALSE Yes
Rainy Mild High TRUE No
Item Sets
1-Item | 2-Item | 3-Item | 4-Item
Outlook=sunny (5) | Outlook=sunny temp=mild (2) | Outlook=sunny temp=hot humidity=high (2) | Outlook=sunny temp=hot humidity=high play=no (2)
Outlook=overcast (4) | Outlook=sunny temp=hot (2) | Outlook=sunny temp=hot play=no (2) | Outlook=sunny humidity=high windy=false play=no (2)
Outlook=rainy (5) | Outlook=sunny humidity=normal (2) | Outlook=sunny humidity=normal play=yes (2) | Outlook=overcast temp=hot windy=false play=yes (2)
Temp=cool (4) | Outlook=sunny humidity=high (3) | Outlook=sunny humidity=high windy=false (2) | Outlook=rainy temp=mild windy=false play=yes (2)
Temp=mild (6) | Outlook=sunny windy=true (2) | Outlook=sunny humidity=high play=no (3) | Outlook=rainy humidity=normal windy=false play=yes (2)
(the first five item sets of each size)
From Sets to Rules
3-Item Set w/coverage 4:
Humidity = normal, windy = false, play = yes
Association Rules: Accuracy
If humidity = normal and windy = false then play = yes 4/4
If humidity = normal and play = yes then windy = false 4/6
If windy = false and play = yes then humidity = normal 4/6
If humidity = normal then windy = false and play = yes 4/7
If windy = false then humidity = normal and play = yes 4/8
If play = yes then humidity = normal and windy = false 4/9
If – then humidity = normal and windy = false and play = yes 4/12 (empty antecedent)
From Sets to Rules
(continued)
4-Item Set w/coverage 2:
Temperature = cool, humidity = normal,
windy = false, play = yes
Association Rules: Accuracy
If temperature = cool, windy = false then humidity = normal, play = yes 2/2
If temperature = cool, humidity = normal, windy = false then play = yes 2/2
If temperature = cool, windy = false, play = yes then humidity = normal 2/2
Overall
Minimum coverage (2):
12 1-item sets, 47 2-item sets, 39 3-item sets, 6 4-item
sets
Minimum accuracy (100%):
58 association rules
“Best” Rules (Coverage = 4, Accuracy = 100%)
If humidity = normal and windy = false then play = yes
If temperature = cool then humidity = normal
If outlook = overcast then play = yes
Association Rule Mining
STEP 1: Find all item sets that meet
minimum coverage
STEP 2: Find all rules that meet minimum
accuracy
STEP 3: Prune
Generating Item Sets
How do we generate minimum coverage item
sets in a scalable manner?
Total number of item sets is huge
Grows exponentially in the number of attributes
Need an efficient algorithm:
Start by generating minimum coverage 1-item sets
Use those to generate 2-item sets, etc
Why do we only need to consider minimum
coverage 1-item sets?
Justification
Item Set 1: {Humidity = high}
Coverage(1) = Number of times humidity is high
Item Set 2: {Windy = false}
Coverage (2) = Number of times windy is false
Item Set 3: {Humidity = high, Windy = false}
Coverage (3) = Number of times humidity is high and
windy is false
Coverage(3) ≤ Coverage(1) and Coverage(3) ≤ Coverage(2):
if Item Sets 1 and 2 do not both meet minimum coverage,
Item Set 3 cannot either
Generating Item Sets
Start with all 3-item sets that meet minimum coverage:
(A B C)
(A B D)
(A C D)
(A C E)
Merge to generate 4-item sets; consider only sets that
start with the same two attributes:
(A B C D)
(A C D E)
These are the only two candidate 4-item sets that could
possibly meet minimum coverage (they must still be checked)
Algorithm for Generating Item
Sets
Build up from 1-item sets so that we only consider
item sets found by merging two minimum coverage sets
Only merge sets that have all but one item in common
Computational efficiency further
improved using hash tables
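A sketch of this level-wise generation (our own code; transactions are represented as Python sets of items):

    def apriori_itemsets(transactions, min_cov):
        def coverage(s):
            return sum(s <= t for t in transactions)

        # 1-item sets meeting minimum coverage
        items = {i for t in transactions for i in t}
        current = {frozenset([i]) for i in items
                   if coverage(frozenset([i])) >= min_cov}
        frequent = set(current)
        while current:
            # Merge pairs sharing all but one item, then re-check coverage
            candidates = {a | b for a in current for b in current
                          if len(a | b) == len(a) + 1}
            current = {c for c in candidates if coverage(c) >= min_cov}
            frequent |= current
        return frequent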
Generating Rules
If windy = false and play = no then outlook = sunny and humidity = high
(meets min. coverage and accuracy)
If windy = false and play = no then outlook = sunny
(meets min. coverage and accuracy)
If windy = false and play = no then humidity = high
(meets min. coverage and accuracy)
How Many Rules?
Want to consider every possible subset
of attributes as consequent
Have 4 attributes:
Four single consequent rules
Six double consequent rules
Two triple consequent rules
Twelve possible rules for single 4-item set!
Exponential explosion of possible rules
Must We Check All?
If A and B then C and D
Coverage = number of times A, B, C, and D are true
Accuracy = (number of times A, B, C, and D are true) / (number of times A and B are true)
If A, B, and C then D
Coverage = number of times A, B, C, and D are true
Accuracy = (number of times A, B, C, and D are true) / (number of times A, B, and C are true)
Efficiency Improvement
A double consequent rule can only be OK if
both single consequent rules are OK
Procedure:
Start with single consequent rules
Build up double consequent rules, etc., as candidate rules
Check the candidate rules for accuracy
In practice: need to check far fewer rules
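For illustration, a brute-force sketch (our own code) that enumerates every possible consequent of a single item set and keeps the rules meeting minimum accuracy; the candidate-based procedure above avoids most of these checks:

    from itertools import combinations

    def rules_from_itemset(itemset, transactions, min_acc):
        itemset = frozenset(itemset)
        # Coverage of the full item set is the same for every split
        cov = sum(itemset <= t for t in transactions)
        rules = []
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                acc = cov / sum(ante <= t for t in transactions)
                if acc >= min_acc:
                    rules.append((set(ante), set(itemset - ante), acc))
        return rules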
Apriori Algorithm
This is a simplified description of the
Apriori algorithm
Developed in the early 1990s and the most
commonly used approach
New developments focus on
Generating item sets more efficiently
Generating rules from item sets more
efficiently
Association Rule Discovery
using Weka
Parameters to be specified in Apriori:
upperBoundMinSupport: start with this value of minimum support
delta: decrease the minimum support by this value in each step
lowerBoundMinSupport: final minimum support
numRules: how many rules to generate
metricType: confidence, lift, leverage, or conviction
minMetric: smallest acceptable metric value for a rule
Handles only nominal attributes
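A command-line sketch (assuming a local weather.nominal.arff file; the single-letter flags are Weka's equivalents of the options above):

    java weka.associations.Apriori -t weather.nominal.arff -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1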
Difficulties
Apriori algorithm improves performance
by using candidate item sets
Still some problems …
Costly to generate large number of item sets
To generate a frequent pattern of size 100, need
more than 2^100 ≈ 10^30 candidates!
Requires repeated scans of database to check
candidates
Again, most problematic for long patterns
Solution?
Can candidate generation be avoided?
New approach:
Create a frequent pattern tree (FP-tree)
stores information on frequent patterns
Use the FP-tree for mining frequent
patterns
partitioning-based
divide-and-conquer
(as opposed to bottom-up generation)
Database → FP-Tree
TID | Items | Frequent Items
100 | F,A,C,D,G,I,M,P | F,C,A,M,P
200 | A,B,C,F,L,M,O | F,C,A,B,M
300 | B,F,H,J,O | F,B
400 | B,C,K,S,P | C,B,P
500 | A,F,C,E,L,P,M,N | F,C,A,M,P
(Min. support = 3)
[Figure: FP-tree with a header table of node links for items F, C, A, B, M, P. Paths from the root: F:4 → C:3 → A:3 → M:2 → P:2, with B:1 → M:1 branching off A:3; F:4 → B:1; and C:1 → B:1 → P:1]
Computational Effort
Each node has three fields
item name
count
node link
Also a header table with
item name
head of node link
Need two scans of the database
Collect set of frequent items
Construct the FP-tree
Comments
The FP-tree is a compact data structure
The FP-tree contains all the information
related to mining frequent patterns (given the
support)
The size of the tree is bounded by the
occurrences of frequent items
The height of the tree is bounded by the
maximum number of items in a transaction
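A sketch of the two-scan construction (our own code; each node stores item name, count, parent, and children, and a header table keeps the node links):

    from collections import defaultdict

    class FPNode:
        def __init__(self, item, parent):
            self.item, self.count, self.parent = item, 1, parent
            self.children = {}

    def build_fp_tree(transactions, min_support):
        # Scan 1: collect the set of frequent items
        freq = defaultdict(int)
        for t in transactions:
            for item in t:
                freq[item] += 1
        freq = {i: c for i, c in freq.items() if c >= min_support}
        root = FPNode(None, None)
        header = defaultdict(list)  # item name -> head of node links
        # Scan 2: insert each transaction, keeping only frequent items,
        # in descending order of frequency
        for t in transactions:
            node = root
            for item in sorted((i for i in t if i in freq),
                               key=lambda i: (-freq[i], i)):
                if item in node.children:
                    node.children[item].count += 1
                else:
                    node.children[item] = FPNode(item, node)
                    header[item].append(node.children[item])
                node = node.children[item]
        return root, header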
Mining Patterns
Mine complete set of frequent patterns
For any frequent item A, all possible
patterns containing A can be obtained
by following A’s node links starting from
A’s head of node links
Example
[Figure: the same FP-tree, following P's node links from the header table]
Paths containing P:
<F:4, C:3, A:3, M:2, P:2> — P occurs twice on this path
<C:1, B:1, P:1> — P occurs once on this path
Frequent pattern: (P:3)
Rule Generation
Mining complete set of association rules
has some problems
May be a large number of frequent item
sets
May be a huge number of association rules
One potential solution is to look at
closed item sets only
Frequent Closed Item Sets
An item set X is a closed item set if there is
no item set X′ such that X ⊂ X′ and every
transaction containing X also contains X′
A rule X ⇒ Y is an association rule on a
frequent closed item set if
both X and X∪Y are frequent closed item sets, and
there does not exist a frequent closed item set Z
such that X ⊂ Z ⊂ X∪Y
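A direct, non-scalable sketch of the closedness test (our own code): X is closed exactly when the transactions containing X share no item beyond X itself:

    def is_closed(itemset, transactions):
        # transactions: list of sets of items; itemset: set of items
        covering = [t for t in transactions if itemset <= t]
        if not covering:
            return False
        # Items common to every covering transaction
        return set.intersection(*map(set, covering)) == set(itemset)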
Example
ID | Items
10 | A,C,D,E,F
20 | A,B,E
30 | C,E,F
40 | A,C,D,F
50 | C,E,F
Frequent item sets (min. support = 2):
A (3), E (4), AE (2), ACDF (2), CF (4), CEF (3) — all the closed sets
D (2), AC (2) — not closed! Why?
+ 12 more
Mining Frequent Closed Item Sets (CLOSET)
TDB with items listed in frequency order (C:4, E:4, F:4, A:3, D:2):
CEFAD, EA, CEF, CFAD, CEF
Build a conditional DB for each item, least frequent first:
D-cond DB (D:2): CEFA, CFA → Output: CFAD:2
A-cond DB (A:3): CEF, E, CF → Output: A:3
EA-cond DB (EA:2): C → Output: EA:2
F-cond DB (F:4): CE:3, C:4 → Output: CF:4, CEF:3
E-cond DB (E:4): C:3 → Output: E:4
Mining with Taxonomies
Taxonomy:
Clothes
  Outerwear
    Jackets
    Ski Pants
  Shirts
Footwear
  Shoes
  Hiking Boots
Generalized association rule: X ⇒ Y, where no item
in Y is an ancestor of an item in X
Why Taxonomy?
The ‘classic’ association rule mining restricts
rules to the leaf nodes of the taxonomy
However:
Rules at lower levels may not have minimum
support, and thus interesting associations may go
undiscovered
Taxonomies can be used to prune uninteresting
and redundant rules
Example
ID | Items
10 | Shirt
20 | Jacket, Hiking Boots
30 | Ski Pants, Hiking Boots
40 | Shoes
50 | Shoes
60 | Jacket

Item Set | Support
{Jacket} | 2
{Outerwear} | 3
{Clothes} | 4
{Shoes} | 2
{Hiking Boots} | 2
{Footwear} | 4
{Outerwear, Hiking Boots} | 2
{Clothes, Hiking Boots} | 2
{Outerwear, Footwear} | 2
{Clothes, Footwear} | 2

Rule | Support | Confidence
Outerwear ⇒ Hiking Boots | 2 | 2/3
Outerwear ⇒ Footwear | 2 | 2/3
Hiking Boots ⇒ Outerwear | 2 | 2/2
Hiking Boots ⇒ Clothes | 2 | 2/2
Interesting Rules
Many ways in which the interestingness of a rule can be
evaluated based on its ancestors
For example:
A rule with no ancestors is interesting
A rule with ancestor(s) is interesting only if it has enough
‘relative support’

Rule ID | Rule | Support | Item | Item Support
1 | Clothes ⇒ Footwear | 10 | Clothes | 5
2 | Outerwear ⇒ Footwear | 8 | Outerwear | 2
3 | Jackets ⇒ Footwear | 4 | Jackets | 1

Which rules are interesting?
Discussion
Association rule mining finds expressions of
the form X ⇒ Y from large data sets
One of the most popular data mining tasks
Originates in market basket analysis
Key measures of performance
Support
Confidence (or accuracy)
Are support and confidence enough?
Type of Rules Discovered
‘Classic’ association rule problem
All rules satisfying minimum threshold of
support and confidence
Focus on a subset of rules, e.g.:
Optimized rules
Maximal frequent item sets
Closed item sets
What makes for an interesting rule?
Algorithm Construction
Determine frequent item sets (all or
part)
Takes by far the most computational time
Variations focus on this part
Generate rules from frequent item sets
Generating Item Sets
Search space traversed: bottom-up or top-down
Support determined by: counting or intersecting
Bottom-up, counting: Apriori*, and Apriori-like algorithms (AprioriTID, DIC)
Bottom-up, intersecting: Partition
Top-down, counting: FP-Growth*
Top-down, intersecting: Eclat
No algorithm dominates the others!
* Discussed above
Applications
Market basket analysis
Classic marketing application
Applications to recommender systems
Recommender
Customized goods and services
Recommend products
Collaborative filtering
similarities among users’ tastes
recommend based on other users
many on-line systems
simple algorithms
Classification Approach
View as classification problem
Product either of interest or not
Induce a model, e.g., a decision tree
Classify a new product as either interesting
or not interesting
Difficulty in this approach?
Association Rule Approach
Product associations
90% of users who like product A and product B also
like product C
A and B ⇒ C (90%)
User associations
90% of products liked by user A and user B are also
liked by user C
Use combination of product and user
associations
Advantages
‘Classic’ collaborative filtering must identify
users with similar tastes
This approach uses the overlap of other users’
tastes to match a given user’s taste
Can be applied to users whose tastes don’t
correlate strongly with those of other users
Can take advantage of information from, say user
A, for a recommendation to user B, even if they do
not correlate
What’s Different Here?
Is this really a ‘classic’ association rule
problem?
Want to learn what products are liked by
what users
‘Semi-supervised’
Target item
User (for user associations)
Product (for product associations)
Single-Consequent Rules
Only a single (target) item in the
consequent
Go through all such items
Association rules: all possible item combinations as consequent
Associations for a recommender: in between, with only the target items as consequents
Classification: one single item as consequent