Lecture 6

The document discusses clustering as an unsupervised learning method that groups similar data points into clusters, highlighting its goal of organizing data for better insights. It differentiates clustering from classification, outlines types of clustering methods, and details hierarchical clustering approaches, particularly agglomerative clustering. Additionally, it explains the Silhouette Score as a metric for evaluating clustering performance and introduces the R-index for comparing distances within and between clusters.

AIML

Dr. Nitin A. Shelke


Clustering
• Clustering (an unsupervised learning method): a technique in which a set of
objects or points with similar characteristics are grouped together into
clusters.
• Clustering is the task of dividing the population or data points into a number
of groups such that data points in the same group are more similar to each
other than to data points in other groups.
• The aim of cluster analysis is to organize observed data into meaningful
structures in order to gain further insight from it.
Difference between Clustering and Classification
• Clustering uses unsupervised machine learning (classification is supervised)
• Clustering uses unlabeled data as input
• The output (the groups) is not known in advance
• There is no target variable in clustering
Goal of Clustering
Types of Clustering Methods
• Centroid-based Clustering (Partitioning methods) (Already Covered in SML)
• Connectivity-based Clustering (Hierarchical clustering)
• Density-based Clustering (e.g., DBSCAN)
What is Hierarchical Clustering?
Hierarchical clustering is another unsupervised learning
algorithm that is used to group together unlabeled data
points that have similar characteristics.
Hierarchical Clustering Approaches
• Agglomerative hierarchical algorithms − In agglomerative hierarchical
algorithms, each data point starts as its own cluster, and pairs of clusters are
then successively merged or agglomerated (bottom-up approach). The
hierarchy of the clusters is represented as a dendrogram or tree structure
(a minimal sketch follows this list).
• Divisive hierarchical algorithms − On the other hand, in divisive hierarchical
algorithms, all the data points start as one big cluster, and clustering proceeds
by dividing (top-down approach) that one big cluster into successively smaller
clusters. Divisive clustering is mainly of theoretical use.
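Since the bottom-up merge process is easiest to see in code, here is a minimal
sketch using SciPy's hierarchical clustering utilities; the six 2-D points are
invented purely for illustration.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six toy 2-D points forming two visually obvious groups
X = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
              [5.0, 5.0], [5.3, 4.8], [4.8, 5.2]])

# Each point starts as its own cluster; linkage() records the merge order
Z = linkage(X, method='single')      # bottom-up merges with single linkage
print(Z)                             # each row: cluster i, cluster j, distance, new size

# Cut the dendrogram to obtain a flat clustering with 2 clusters
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)                        # e.g. [1 1 1 2 2 2]

# scipy.cluster.hierarchy.dendrogram(Z) can be plotted to visualise the tree.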
Agglomerative Clustering Algorithm
Typical Alternatives to Calculate the
Distance Between Clusters
Example of Agglomerative Clustering with Single Linkage Method
Example of Agglomerative Clustering with Complete Linkage Method
Comparison
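To make the comparison concrete, the sketch below runs single and complete
linkage on the same toy one-dimensional points and prints the resulting flat
cluster labels; the data and the choice of two clusters are assumptions made
purely for illustration, not values from the slides.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# 1-D toy points with steadily increasing gaps between neighbours
X = np.array([[0.0], [1.0], [2.2], [3.6], [5.2], [7.0], [9.0]])

for method in ('single', 'complete'):
    Z = linkage(X, method=method)                    # merge history for this linkage
    labels = fcluster(Z, t=2, criterion='maxclust')  # cut the tree at 2 clusters
    print(method, labels)

# Single linkage chains through the small gaps and only splits off the last
# point, while complete linkage (farthest-pair distance) splits the data into
# two more balanced groups.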
Linkage Criteria Supported in Sklearn
• The AgglomerativeClustering object performs a hierarchical clustering using a bottom-up approach: each
observation starts in its own cluster, and clusters are successively merged together. The linkage criterion
determines the metric used for the merge strategy (a usage sketch follows this list):
• Maximum or complete linkage minimizes the maximum distance between observations of pairs of
clusters.
• Single linkage minimizes the distance between the closest observations of pairs of clusters.
• Average linkage minimizes the average of the distances between all observations of pairs of clusters.
• Ward minimizes the sum of squared differences within all clusters. It is a variance-minimizing approach
and in this sense is similar to the k-means objective function, but tackled with an agglomerative hierarchical
approach.
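A minimal usage sketch of the scikit-learn API described above; the toy data
and the choice of two clusters are illustrative assumptions, not taken from the
slides.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two well-separated toy groups
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

for linkage in ('ward', 'complete', 'average', 'single'):
    model = AgglomerativeClustering(n_clusters=2, linkage=linkage)
    labels = model.fit_predict(X)              # cluster label for each observation
    print(linkage, labels)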
What is the Silhouette Score?
• The silhouette coefficient is a metric that measures how well each
data point fits into its assigned cluster.
• It combines information about both the cohesion (how close a data
point is to other points in its own cluster) and the separation (how far
a data point is from points in other clusters) of the data point.
Silhouette Score

Silhouette Coefficient = (b - a) / max(a, b)

• a denotes the mean intra-cluster distance
• b denotes the mean nearest-cluster distance for each sample
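For example (with invented numbers): if a point's mean intra-cluster distance
is a = 2 and its mean nearest-cluster distance is b = 5, its silhouette
coefficient is (5 - 2) / max(2, 5) = 3 / 5 = 0.6.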
What is the Silhouette Score?
• The Silhouette Score evaluates clustering performance by measuring
how similar a sample is to its own cluster (cohesion) compared to
other clusters (separation). It ranges from -1 to 1:
• 1 → Perfectly clustered (well-separated clusters).
• 0 → Overlapping clusters (not well-defined).
• Negative → Incorrect clustering (samples assigned to the wrong cluster).
Calculating the Silhouette Coefficient
1. For each data point, calculate two values:
— Average distance to all other data points within the same cluster
(cohesion).
— Average distance to all data points in the nearest neighboring cluster
(separation).
2. Compute the silhouette coefficient for each data point using the formula:
silhouette coefficient = (separation - cohesion) / max(separation, cohesion)
3. Calculate the average silhouette coefficient across all data points to obtain
the overall silhouette score for the clustering result (a minimal sketch of this
procedure follows below).
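A minimal sketch of this procedure, first computed by hand with NumPy and then
checked against scikit-learn's silhouette_score; the toy data and labels are
invented for illustration.

import numpy as np
from sklearn.metrics import silhouette_score, pairwise_distances

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [8.0, 8.0], [8.0, 9.0], [9.0, 8.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

D = pairwise_distances(X)                  # all pairwise Euclidean distances
scores = []
for i in range(len(X)):
    same = (labels == labels[i])
    same[i] = False                        # exclude the point itself
    a = D[i, same].mean()                  # cohesion: mean intra-cluster distance
    b = min(D[i, labels == c].mean()       # separation: mean distance to the
            for c in set(labels) if c != labels[i])  # nearest other cluster
    scores.append((b - a) / max(a, b))

print(np.mean(scores))                     # overall silhouette score
print(silhouette_score(X, labels))         # should match (up to rounding)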
R Index
The R-index is calculated by comparing the average distance between points in
the same cluster to the average distance to points in the nearest other
cluster.
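The slides do not give an exact formula, so the sketch below is one plausible
reading of this description: the ratio of the mean within-cluster distance to
the mean distance to the nearest other cluster, averaged over all points (a
smaller value would then indicate tighter, better-separated clusters). The
function name and the direction of the ratio are assumptions, not from the
source.

import numpy as np
from sklearn.metrics import pairwise_distances

def r_index(X, labels):                              # hypothetical helper name
    D = pairwise_distances(X)
    within, nearest = [], []
    for i in range(len(X)):
        own = (labels == labels[i])
        own[i] = False                               # ignore the point itself
        within.append(D[i, own].mean())              # distance to own cluster
        nearest.append(min(D[i, labels == c].mean()  # distance to nearest other cluster
                           for c in set(labels) if c != labels[i]))
    return np.mean(within) / np.mean(nearest)        # assumed ratio: within / between

# Same toy data as the silhouette sketch above
X = np.array([[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])
print(r_index(X, labels))                            # small value -> compact, well-separated clusters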
