The document describes CURE, a clustering algorithm for large datasets. CURE uses multiple representative points to model each cluster, capturing varying shapes and sizes. It is robust to outliers and can identify non-spherical clusters unlike previous approaches. The algorithm and enhancements for scaling to large data are explained.

CURE: An Efficient Clustering
Algorithm for Large Databases


Authors: Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
Presentation by: Vuk Malbasa
For CIS664, Prof. Vasilis Megalooekonomou
Overview
• Introduction
• Previous Approaches
• Drawbacks of previous approaches
• CURE: Approach
• Enhancements for Large Datasets
• Conclusions
Introduction
• Clustering problem: given a set of points,
separate them into clusters so that data points
within a cluster are more similar to each other
than to points in different clusters.
• Traditional clustering techniques either favor
clusters with spherical shapes and similar sizes
or are fragile to the presence of outliers.
• CURE is robust to outliers and identifies clusters
with non-spherical shapes, and wide variances
in size.
• Each cluster is represented by a fixed number of
well scattered points.
Introduction
• CURE is a hierarchical clustering
technique where each partition is nested
into the next partition in the sequence.
• CURE is an agglomerative algorithm
where disjoint clusters are successively
merged until the desired number of
clusters is reached.
Previous Approaches
• At each step in agglomerative clustering, the
clusters merged are the ones for which some
distance metric is minimized.
• This distance metric can be:
– Distance between the means of the clusters, dmean
– Average distance between all points in the clusters, dave
– Maximal distance between points in the clusters, dmax
– Minimal distance between points in the clusters, dmin
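The four metrics above can be sketched as follows for clusters given as lists of 1-D points. This is an illustrative sketch (the function names are ours, not from the paper):

```python
# The four inter-cluster distance metrics, for 1-D point clusters.
from itertools import product

def d_mean(a, b):
    # Distance between the cluster means.
    return abs(sum(a) / len(a) - sum(b) / len(b))

def d_ave(a, b):
    # Average distance over all cross-cluster point pairs.
    return sum(abs(p - q) for p, q in product(a, b)) / (len(a) * len(b))

def d_max(a, b):
    # Largest cross-cluster distance (complete link).
    return max(abs(p - q) for p, q in product(a, b))

def d_min(a, b):
    # Smallest cross-cluster distance (single link).
    return min(abs(p - q) for p, q in product(a, b))

c1, c2 = [0.0, 1.0, 2.0], [5.0, 6.0]
print(d_mean(c1, c2), d_ave(c1, c2), d_max(c1, c2), d_min(c1, c2))
# 4.5 4.5 6.0 3.0
```

An agglomerative algorithm repeatedly merges the cluster pair minimizing whichever of these metrics it uses.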
Drawbacks of previous approaches
• In situations where clusters vary in size,
the dave, dmax and dmean distance metrics
will split large clusters into parts.
• Non-spherical clusters will be split by dmean.
• Clusters connected by outliers will be
merged if the dmin metric is used.
• None of the stated approaches works well
in the presence of non-spherical clusters
or outliers.
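The dmin chaining drawback can be shown with a toy sketch (our own illustrative code, not from the paper): a line of outliers between two well-separated clusters fuses them under single-link merging.

```python
# Toy demo of the chaining drawback of d_min (single link) on 1-D points.
def single_link_merge(points, threshold):
    # Start with singleton clusters; merge any two clusters whose
    # closest points are within `threshold`, until no merge applies.
    clusters = [[p] for p in sorted(points)]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dmin = min(abs(p - q) for p in clusters[i] for q in clusters[j])
                if dmin <= threshold:
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

a, b = [0.0, 1.0, 2.0], [10.0, 11.0, 12.0]
bridge = [4.0, 6.0, 8.0]  # outliers lying between the two clusters
print(len(single_link_merge(a + b, 2.0)))           # 2 clusters
print(len(single_link_merge(a + b + bridge, 2.0)))  # 1 cluster: chained together
```

Without the bridge the two clusters stay apart; with it, single link walks across the outliers and merges everything.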
CURE: Approach
• CURE is positioned between the centroid-
based (dave) and all-points (dmin) extremes.
• A constant number of well-scattered
points is used to capture the shape and
extent of a cluster.
• The points are shrunk towards the centroid
of the cluster by a factor α.
• These well-scattered and shrunk points
serve as the representatives of the cluster.
CURE: Approach
• The scattered-points approach alleviates the
shortcomings of dave and dmin:
– Since multiple representatives are used, the
splitting of large clusters is avoided.
– Multiple representatives allow for the discovery of
non-spherical clusters.
– The shrinking phase affects outliers more
than other points, since their distance from the
centroid is decreased more than that of
regular points.
CURE: Approach
• Initially, since all points are in separate clusters, each cluster is
defined by the single point it contains.
• Clusters are merged until they contain at least c points.
• The first scattered point in a cluster is the one farthest away
from the cluster's centroid.
• Each further scattered point is chosen so that its distance from the
previously chosen scattered points is maximal.
• Once the c well-scattered points are found, they are shrunk by
the factor α (r = p + α*(mean − p)).
• After clusters have c representatives, the distance between two
clusters is the distance between the closest pair of representatives,
one from each cluster.
• Every time two clusters are merged, their representatives are re-
calculated.
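The representative-point step above can be sketched in 2-D: pick the scattered points farthest-first, then shrink each toward the centroid using the slide's formula r = p + α*(mean − p). A minimal sketch with illustrative names (not the paper's code):

```python
# Sketch of CURE's representative selection and shrinking for a 2-D cluster.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def representatives(cluster, c, alpha):
    # Centroid (mean) of the cluster.
    mean = (sum(p[0] for p in cluster) / len(cluster),
            sum(p[1] for p in cluster) / len(cluster))
    # First scattered point: farthest from the centroid.
    scattered = [max(cluster, key=lambda p: dist(p, mean))]
    # Further points: maximize distance to the already-chosen set.
    while len(scattered) < min(c, len(cluster)):
        scattered.append(max((p for p in cluster if p not in scattered),
                             key=lambda p: min(dist(p, s) for s in scattered)))
    # Shrink each scattered point toward the centroid: r = p + alpha*(mean - p).
    return [(p[0] + alpha * (mean[0] - p[0]),
             p[1] + alpha * (mean[1] - p[1])) for p in scattered]

cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
print(representatives(cluster, 2, 0.5))  # [(0.5, 0.0), (1.5, 0.0)]
```

With α = 0.5 the two extreme points are pulled halfway to the centroid (1, 0), which is what dampens the influence of outliers on merge decisions.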
Enhancements for Large Datasets
• Random sampling
– Filters outliers and allows the dataset to fit into
memory
• Partitioning
– First cluster within each partition, then merge the
partially clustered partitions
• Labeling Data on Disk
– The final labeling phase can be done by nearest-
neighbor assignment to the already chosen cluster
representatives
• Handling outliers
– Outliers are partially eliminated and spread out by
random sampling; the remainder are identified because
they belong to small clusters that grow slowly
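The labeling phase above amounts to a nearest-neighbor lookup against the representatives only, so the full dataset never needs to be clustered in memory. A minimal sketch in 1-D, with made-up cluster names and values:

```python
# Sketch of the final labeling phase: each on-disk point gets the label
# of the cluster owning its nearest representative point.
reps = {"A": [1.0, 2.0], "B": [10.0, 11.0]}  # shrunk representatives per cluster

def label(point):
    # Nearest-neighbor search over all representatives of all clusters.
    return min(reps, key=lambda c: min(abs(point - r) for r in reps[c]))

labels = [label(p) for p in [0.5, 3.0, 9.0, 12.0]]
print(labels)  # ['A', 'A', 'B', 'B']
```

Because each cluster keeps several representatives rather than one centroid, this assignment follows non-spherical cluster boundaries better than a nearest-centroid rule would.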
Conclusions
• CURE can identify clusters that are not
only spherical but also ellipsoidal
• CURE is robust to outliers
• CURE correctly clusters data with large
differences in cluster size
• Running time for a low-dimensional
dataset with s points is O(s²)
• Using partitioning and sampling, CURE can
be applied to large datasets
Thanks!