Presentation on
Clustering Algorithms
23 May 2024
Presented to:
Dr. Tushar Kanti Saha
Professor
Dept. of Computer Science and Engineering
Jatiya Kabi Kazi Nazrul Islam University
Presented by:
A.K.M Mahfuzur Rahman
Roll: 231020104
Reg: 7724
Session: MS:2022-2023
Dept. of Computer Science and Engineering
Jatiya Kabi Kazi Nazrul Islam University
Overview
BIRCH
CURE
CHAMELEON
1. BIRCH
Agendas:
• What is BIRCH?
• Data Clustering
• BIRCH Goal
• Examples 1, 2, 3, 4
BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies
What is BIRCH?
BIRCH is based on the notion of a Clustering Feature (CF) tree.
A CF tree is a height-balanced tree that stores the clustering features for a hierarchical clustering.
Data Clustering
It is the partitioning of a database into clusters: closely packed groups, i.e., collections of data objects that are similar to one another and treated collectively as a group.
BIRCH Goals
• Clustering decisions are made without scanning the whole data.
• Treat dense areas as one cluster and reduce noise.
• Minimize running time and data scans.
• Exploit the non-uniformity of the data.
In the tree, a cluster of data points is summarized by a CF, represented by three numbers (N, LS, SS), where
• N = number of items in the sub-cluster,
• LS = vector sum of the data points,
• SS = sum of the squared data points.
Algorithm
Phase 1: Scan all data and build an initial in-memory CF tree, using the given amount of memory and recycling space on disk.
Phase 2: Condense the tree to a desirable size by building a smaller CF tree.
Phase 3: Global clustering.
Phase 4: Cluster refining. This phase is optional and requires more passes over the data to refine the results.
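For orientation, the sketch below runs these four phases end to end using scikit-learn's Birch estimator on the small 2-D dataset of Example-4 further below. The parameter values are illustrative choices, not prescribed by these slides, and scikit-learn applies the threshold to a scalar sub-cluster radius, so its tree can differ slightly from the hand-worked example.

```python
# Minimal sketch using scikit-learn's Birch (assumes scikit-learn is installed).
# threshold        ~ limit on the size of sub-clusters stored in the leaf nodes
# branching_factor ~ maximum number of CF entries per node
# n_clusters       ~ number of clusters produced by the global clustering phase
import numpy as np
from sklearn.cluster import Birch

X = np.array([(3, 4), (2, 6), (4, 5), (4, 7), (3, 8),
              (6, 2), (7, 2), (7, 4), (8, 4), (7, 9)], dtype=float)

model = Birch(threshold=1.5, branching_factor=2, n_clusters=3)
labels = model.fit_predict(X)
print(labels)   # cluster label assigned to each of the ten points
```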
LS = Σᵢ xᵢ and SS = Σᵢ xᵢ² (sums over the n points in the cluster)

Example-1
Consider a cluster C1 = {3, 5, 2, 8, 9, 1}. Then
CF(C1) = (6, 28, 184),
where n = 6,
LS = 3+5+2+8+9+1 = 28 and
SS = 3²+5²+2²+8²+9²+1² = 184.
Example-2
Another example, with 2-D objects. Given that
C2 = {(1,1), (2,1), (3,2)},
CF(C2) = (3, (6,4), (14,6)),
where n = 3,
LS = (1+2+3, 1+1+2) = (6, 4) and
SS = (1²+2²+3², 1²+1²+2²) = (14, 6).
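A minimal sketch, assuming NumPy, that computes a CF triple directly from a list of points and reproduces the numbers in Examples 1 and 2 (the helper name clustering_feature is illustrative):

```python
import numpy as np

def clustering_feature(points):
    """Return the CF triple (N, LS, SS) of a cluster of scalar or d-dimensional points."""
    pts = np.asarray(points, dtype=float)
    pts = pts.reshape(len(pts), -1)          # treat scalars as 1-D points
    return len(pts), pts.sum(axis=0), (pts ** 2).sum(axis=0)

print(clustering_feature([3, 5, 2, 8, 9, 1]))        # Example-1: (6, [28.], [184.])
print(clustering_feature([(1, 1), (2, 1), (3, 2)]))  # Example-2: (3, [6. 4.], [14. 6.])
```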
Another important property of CFs is that they are additive. That is, for two disjoint clusters C1 and C2 with CFs CF1 = (n1, LS1, SS1) and CF2 = (n2, LS2, SS2) respectively, the CF of the cluster formed by merging C1 and C2 is given as
CF1 + CF2 = (n1+n2, LS1+LS2, SS1+SS2).
Example-3
C1 = {(2,5), (3,2), (4,3)} and
C2 = {(1,1), (2,1), (3,1)}.
Then
CF1 = (3, (2+3+4, 5+2+3), (2²+3²+4², 5²+2²+3²)) = (3, (9,10), (29,38))
and CF2 = (3, (1+2+3, 1+1+1), (1²+2²+3², 1²+1²+1²)) = (3, (6,3), (14,3)).
Now, if C3 = C1 ∪ C2, then
CF3 = CF1 + CF2 = (6, (15,13), (43,41)).
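A small self-contained sketch, assuming NumPy, that verifies the additivity property on Example-3 (the helper name cf is illustrative):

```python
import numpy as np

def cf(points):
    """CF triple (N, LS, SS) of a cluster of 2-D points."""
    pts = np.asarray(points, dtype=float)
    return len(pts), pts.sum(axis=0), (pts ** 2).sum(axis=0)

C1 = [(2, 5), (3, 2), (4, 3)]
C2 = [(1, 1), (2, 1), (3, 1)]

n1, ls1, ss1 = cf(C1)                      # (3, [9. 10.], [29. 38.])
n2, ls2, ss2 = cf(C2)                      # (3, [6. 3.], [14. 3.])

# Additivity: the CF of C3 = C1 U C2 is the component-wise sum of CF1 and CF2 ...
print((n1 + n2, ls1 + ls2, ss1 + ss2))     # (6, [15. 13.], [43. 41.])
# ... and it matches the CF computed directly from all six points.
print(cf(C1 + C2))
```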
Formula
Cluster's Centroid: X0 = LS / n
Cluster's Radius: R = √( Σᵢ (xᵢ - X0)² / n ) = √( (n·SS - LS²) / n² )
Cluster's Diameter: D = √( Σᵢ Σⱼ (xᵢ - xⱼ)² / (n(n-1)) ) = √( (2n·SS - 2·LS²) / (n(n-1)) )
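A short sketch, assuming NumPy, of these three formulas applied to the CF of Example-1 (the function names are illustrative):

```python
import numpy as np

def centroid(n, ls, ss):
    return ls / n

def radius(n, ls, ss):
    # R = sqrt(SS/n - (LS/n)^2), evaluated per dimension for vector-valued LS, SS
    return np.sqrt(ss / n - (ls / n) ** 2)

def diameter(n, ls, ss):
    # D = sqrt((2n*SS - 2*LS^2) / (n(n-1)))
    return np.sqrt((2 * n * ss - 2 * ls ** 2) / (n * (n - 1)))

n, ls, ss = 6, np.array([28.0]), np.array([184.0])   # CF(C1) from Example-1
print(centroid(n, ls, ss), radius(n, ls, ss), diameter(n, ls, ss))
# approximately [4.67], [2.98], [4.62]
```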
Example-4
Apply BIRCH to cluster the given dataset
D = {(3,4), (2,6), (4,5), (4,7), (3,8), (6,2), (7,2), (7,4), (8,4), (7,9)}.
The branching factor is B = 2, the maximum number of sub-clusters at each leaf node is L = 2, and the threshold on the diameter of sub-clusters stored in the leaf nodes is T = 1.5.
• For each data point, find the Radius and CF.
• Consider data point x1 = (3,4).
• It is alone in the feature space, so Radius = 0.
• Cluster Feature CF1<N, LS, SS> = <1, (3,4), (9,16)>.
• Create a leaf node with data point x1 = (3,4) and branch CF1.

CF1<1, (3,4), (9,16)>: Leaf {x1=(3,4)}
For each data point, find the Radius and CF.
Consider data point x2 = (2,6):
1. Linear Sum, LS = (3,4) + (2,6) = (5,10)
2. Squared Sum, SS = (2²+9, 6²+16) = (13,52); N = 2
• Radius = √(SS/N - (LS/N)²) = √((13,52)/2 - ((5,10)/2)²) = (0.5, 1)
• R = (0.5, 1) < (T, T) → True
• So, x2 = (2,6) will be clustered into the leaf with x1 = (3,4).
3. Cluster Feature CF1<N, LS, SS> = <2, (5,10), (13,52)>

CF1<2, (5,10), (13,52)>: Leaf {x1=(3,4), x2=(2,6)}
For each data point, find the Radius and CF.
Consider data point x3 = (4,5):
1. Linear Sum, LS = (5,10) + (4,5) = (9,15)
2. Squared Sum, SS = (4²+13, 5²+52) = (29,77); N = 3
• Radius = √(SS/N - (LS/N)²) = √((29,77)/3 - ((9,15)/3)²) = (0.82, 0.82)
• R = (0.82, 0.82) < (T, T) → True
• So, x3 = (4,5) will be clustered into the same leaf as x1 and x2.
3. Cluster Feature CF1<N, LS, SS> = <3, (9,15), (29,77)>

CF1<3, (9,15), (29,77)>: Leaf {x1=(3,4), x2=(2,6), x3=(4,5)}
Similarly, after inserting x4 = (4,7) and x5 = (3,8):

CF1<5, (16,30), (54,190)>: Leaf {x1=(3,4), x2=(2,6), x3=(4,5), x4=(4,7), x5=(3,8)}
Consider data point x6 = (6,2):
1. Linear Sum, LS = (16,30) + (6,2) = (22,32)
2. Squared Sum, SS = (6²+54, 2²+190) = (90,194); N = 6
• Radius = √(SS/N - (LS/N)²) = √((90,194)/6 - ((22,32)/6)²) = (1.24, 1.97)
• R = (1.24, 1.97) < (T, T) → False
• So, x6 = (6,2) is placed in a different branch.
3. Cluster Feature CF2<N, LS, SS> = <1, (6,2), (36,4)>

CF1<5, (16,30), (54,190)>: Leaf {x1=(3,4), x2=(2,6), x3=(4,5), x4=(4,7), x5=(3,8)}
CF2<1, (6,2), (36,4)>: Leaf {x6=(6,2)}
For data point x7 = (7,2), two branches exist: B1 for CF1 and B2 for CF2. Find whether x7 is closer to the centroid of CF1 or of CF2; then find the Radius.
Centroid of CF1 = LS/N = (16,30)/5 = (3.2, 6); centroid of CF2 = LS/N = (6,2)/1 = (6, 2) → closer to x7.
So CF2 is considered.
1. Linear Sum, LS = (6,2) + (7,2) = (13,4)
2. Squared Sum, SS = (7²+36, 2²+4) = (85,8); N = 2
• Radius = √(SS/N - (LS/N)²) = √((85,8)/2 - ((13,4)/2)²) = (0.5, 0)
• R = (0.5, 0) < (T, T) → True
• So, x7 = (7,2) will be clustered with x6.
3. Cluster Feature CF2<N, LS, SS> = <2, (13,4), (85,8)>

CF1<5, (16,30), (54,190)>: Leaf {x1=(3,4), x2=(2,6), x3=(4,5), x4=(4,7), x5=(3,8)}
CF2<2, (13,4), (85,8)>: Leaf {x6=(6,2), x7=(7,2)}
Similarly, after inserting x8 = (7,4) and x9 = (8,4):

CF1<5, (16,30), (54,190)>: Leaf {x1=(3,4), x2=(2,6), x3=(4,5), x4=(4,7), x5=(3,8)}
CF2<4, (28,12), (198,40)>: Leaf {x6=(6,2), x7=(7,2), x8=(7,4), x9=(8,4)}
For data point x10 = (7,9), two branches exist: B1 for CF1 and B2 for CF2. Find whether x10 is closer to the centroid of CF1 or of CF2; then find the Radius.
Centroid of CF1 = LS/N = (16,30)/5 = (3.2, 6); centroid of CF2 = LS/N = (28,12)/4 = (7, 3).
CF1 is closer, so CF1 is considered.
1. Linear Sum, LS = (16,30) + (7,9) = (23,39)
2. Squared Sum, SS = (7²+54, 9²+190) = (103,271); N = 6
• Radius = √(SS/N - (LS/N)²) = √((103,271)/6 - ((23,39)/6)²) = (1.57, 1.7)
• R = (1.57, 1.7) < (T, T) → False, and the leaf already holds 5 points.
• So, x10 = (7,9) cannot be clustered with CF1.
As the branching factor is 2, another branch cannot be added at this level, so a new parent level is introduced.

Root: CF12<9, (44,42), (252,230)>, CF3<1, (7,9), (49,81)>
CF12: CF1<5, (16,30), (54,190)>, CF2<4, (28,12), (198,40)>
CF1: Leaf {x1=(3,4), x2=(2,6), x3=(4,5), x4=(4,7), x5=(3,8)}
CF2: Leaf {x6=(6,2), x7=(7,2), x8=(7,4), x9=(8,4)}
CF3: Leaf {x10=(7,9)}
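The sketch below, assuming NumPy, replays the insertion logic of this worked example: each point is absorbed into the closest leaf CF if the resulting per-dimension radius stays below the threshold T = 1.5, otherwise a new leaf CF is started. Node splitting and the branching-factor bookkeeping of a full CF tree are omitted, so this is a simplification rather than the complete BIRCH Phase 1.

```python
import numpy as np

T = 1.5  # threshold used in Example-4

class CF:
    """Clustering Feature: (N, LS, SS) with per-dimension sums."""
    def __init__(self, point):
        p = np.asarray(point, dtype=float)
        self.n, self.ls, self.ss = 1, p.copy(), p ** 2

    def radius_if_added(self, point):
        """Per-dimension radius sqrt(SS/N - (LS/N)^2) after absorbing `point`."""
        p = np.asarray(point, dtype=float)
        n, ls, ss = self.n + 1, self.ls + p, self.ss + p ** 2
        return np.sqrt(ss / n - (ls / n) ** 2)

    def add(self, point):
        p = np.asarray(point, dtype=float)
        self.n += 1
        self.ls += p
        self.ss += p ** 2

data = [(3, 4), (2, 6), (4, 5), (4, 7), (3, 8),
        (6, 2), (7, 2), (7, 4), (8, 4), (7, 9)]

leaves = [CF(data[0])]
for point in data[1:]:
    # choose the leaf whose centroid LS/N is closest to the point
    closest = min(leaves, key=lambda cf: np.linalg.norm(cf.ls / cf.n - point))
    if np.all(closest.radius_if_added(point) < T):
        closest.add(point)          # absorb into the existing sub-cluster
    else:
        leaves.append(CF(point))    # start a new sub-cluster

for i, cf in enumerate(leaves, 1):
    print(f"CF{i}: N={cf.n}, LS={cf.ls}, SS={cf.ss}")
```

Running it reproduces the three sub-clusters found above: CF1 holding x1 to x5, CF2 holding x6 to x9, and CF3 holding x10.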
2. CURE
Agendas:
• What is CURE?
• CURE Structure
• Algorithm
• Example
(Figure: clusters and outliers)
CURE: Clustering Using REpresentatives
What is CURE?
It is a hierarchical clustering technique that adopts a middle ground between the centroid-based and the all-point extremes.
It is useful for discovering groups and identifying interesting distributions in the underlying data.
Instead of using a single centroid point, as most data mining algorithms do, CURE uses a set of well-scattered representative points to efficiently handle the clusters and eliminate the outliers.
Structure
Data → Draw random sample → Partition sample → Partially cluster partitions → Eliminate outliers → Cluster partial clusters → Label data on disk → Clusters
Algorithm
Phase 1: Begin with a large dataset D consisting of n data points.
Phase 2: Randomly select a sample of c points from the dataset D, where c << n. The sample should be representative of the entire dataset.
Phase 3: Use a hierarchical clustering method (e.g., single-link, complete-link, or average-link) on the sample to form an initial set of clusters. This is typically done until a desired number of clusters k is reached.
Phase 4: For each cluster obtained, select a fixed number of representative points r. These points are chosen to be as far apart as possible to capture the shape and extent of the cluster.
Phase 5: For each cluster, move the representative points towards the mean of the cluster by a fraction α. This step helps to reduce the influence of outliers.
Phase 6: Repeat the merging process for the remaining clusters until the desired number of clusters is achieved.
Phase 7: Assign the remaining non-sampled points in D to the nearest cluster using the representative points (see the sketch below).
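A minimal sketch of Phases 3 to 7 on synthetic data, assuming NumPy and SciPy. Partitioning of the sample, outlier elimination, and the disk-based labelling pass are omitted, and the farthest-point heuristic used to pick representatives is one reasonable reading of Phase 4 rather than the paper's exact procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
sample = rng.normal(size=(60, 2))          # stand-in for the random sample of c points
k, r, alpha = 3, 4, 0.5                    # clusters, representatives per cluster, shrink factor

# Phase 3: hierarchical clustering of the sample down to k clusters
labels = fcluster(linkage(sample, method="average"), t=k, criterion="maxclust")

representatives = {}
for c in np.unique(labels):
    pts = sample[labels == c]
    mean = pts.mean(axis=0)
    # Phase 4: pick up to r well-scattered points (farthest-point heuristic)
    reps = [pts[np.argmax(np.linalg.norm(pts - mean, axis=1))]]
    while len(reps) < min(r, len(pts)):
        dists = np.min([np.linalg.norm(pts - q, axis=1) for q in reps], axis=0)
        reps.append(pts[np.argmax(dists)])
    # Phase 5: shrink the representatives towards the cluster mean by fraction alpha
    representatives[c] = [q + alpha * (mean - q) for q in reps]

# Phase 7: assign any point to the cluster with the nearest representative
def assign(point):
    return min(representatives,
               key=lambda c: min(np.linalg.norm(point - q) for q in representatives[c]))

print(assign(np.array([0.2, -0.1])))
```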
Example
Advantages
• CURE is designed to efficiently process large datasets.
• CURE reduces the computational complexity without significantly compromising the quality of the clustering.
• CURE is relatively straightforward to implement.
• Flexibility in cluster shapes.
Disadvantages
• Although subsampling helps reduce complexity, the initial phase of clustering a large sample can still be computationally intensive, especially for very large datasets.
• Too few representative points may not capture cluster shape accurately, while too many points can increase computational costs.
• Although CURE is designed to handle large datasets, extremely large-scale applications might still face scalability issues.
3. CHAMELEON
Agendas:
• What is CHAMELEON?
• Framework of CHAMELEON
• Phases of CHAMELEON
• Advantages
What is Chameleon?
Chameleon is a hierarchical clustering algorithm that uses dynamic modeling to decide the similarity among pairs of clusters.
It was developed based on the observed weaknesses of two hierarchical clustering algorithms, ROCK and CURE.
In Chameleon, cluster similarity is assessed based on how well-connected objects are inside a cluster and on the proximity of clusters.
Framework
Data Set → Construct a sparse k-NN graph → Partition the graph → Final Clusters
Fig: Overall framework of CHAMELEON.
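A minimal sketch of the first framework step, constructing the sparse k-NN graph, assuming scikit-learn; the graph-partitioning step itself (CHAMELEON uses a multilevel partitioner such as hMETIS) is not shown.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # stand-in data set

# Sparse k-NN graph; mode="distance" stores edge lengths, which are converted to
# similarity weights so that heavier edges connect closer points.
knn = kneighbors_graph(X, n_neighbors=5, mode="distance")
knn.data = 1.0 / (knn.data + 1e-12)
print(knn.shape, knn.nnz)              # (100, 100) sparse matrix with 100*5 weighted edges
```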
A Two-Phase Clustering Algorithm
Phase 1: Finding Initial Sub-clusters. The first phase is graph partitioning, which allows data items to be clustered into a large number of sub-clusters.
Fig: An example of the bisections produced by multilevel graph partitioning algorithms on two spatial data sets (panels a and b).
Phase 2: Merging Sub-clusters using a Dynamic Framework. It employs an agglomerative hierarchical clustering method to look for genuine clusters that can be formed by merging the generated sub-clusters.
Two different schemes have been implemented in CHAMELEON to employ the agglomerative hierarchical clustering method:
1. Merge those pairs of clusters whose relative inter-connectivity and relative closeness are both above some user-specified thresholds.
2. Combine the relative inter-connectivity and relative closeness into a single function, then select and merge the pair of clusters that maximizes this function.
Relative Inter-Connectivity
Let Ci and Cj be two clusters. Then
RI(Ci, Cj) = Absolute IC(Ci, Cj) / ((Internal IC(Ci) + Internal IC(Cj)) / 2),
where Absolute IC(Ci, Cj) = sum of the weights of the edges that connect Ci with Cj, and
Internal IC(Ci) = weighted sum of the edges that partition the cluster into two roughly equal parts.
Relative Closeness
Absolute closeness normalized with respect to the internal closeness of the two clusters.
Absolute closeness is obtained as the average similarity between the points in Ci that are connected to the points in Cj.
Internal Closeness
Internal closeness of a cluster is obtained as the average of the weights of the edges in the cluster.
Using these two measures, Chameleon decides which pairs of clusters to merge.
Merging the Clusters
If the relative inter-connectivity measure and the relative closeness measure are the same, choose inter-connectivity.
It can also use the threshold criterion
RI(Ci, Cj) ≥ T(RI) and RC(Ci, Cj) ≥ T(RC),
as sketched below.
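A minimal sketch of this threshold test, assuming the relevant edge-weight sums have already been extracted from the k-NN graph. The plain averages used for normalization simplify the size-weighted averages of the original CHAMELEON paper, and all numbers below are hypothetical.

```python
def merge_test(abs_ic, int_ic_i, int_ic_j,
               abs_close, int_close_i, int_close_j,
               t_ri=1.0, t_rc=1.0):
    """Return True if clusters Ci and Cj pass the RI/RC merge thresholds."""
    ri = abs_ic / ((int_ic_i + int_ic_j) / 2.0)            # relative inter-connectivity
    rc = abs_close / ((int_close_i + int_close_j) / 2.0)   # relative closeness
    return ri >= t_ri and rc >= t_rc

# Hypothetical edge-weight statistics for two sub-clusters:
print(merge_test(abs_ic=12.0, int_ic_i=10.0, int_ic_j=14.0,
                 abs_close=0.8, int_close_i=0.7, int_close_j=0.9))
```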
Advantages
• Dynamic modeling allows Chameleon to adapt to the natural shapes and densities of clusters in the data.
• It can handle large datasets effectively.
• The merging and refinement processes enhance the quality of the clusters.
Any Questions?
Thank You