Unsupervised Learning
Shabbeer Basha
Unsupervised Learning
• Unsupervised learning is a type of machine learning that learns
from data without human supervision.
• Unlike supervised learning, unsupervised machine learning
models are given unlabeled data and allowed to discover
patterns and insights.
• Unsupervised learning models are useful for i) organizing large datasets into clusters, ii) identifying common relationships between samples, and iii) dimensionality reduction.
Unsupervised Learning Problems
• Clustering
• K-Means
• Hierarchical Clustering, etc.
• Association Rule Mining
• Apriori algorithm, FP-Growth algorithm, etc.
• Dimensionality reduction
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Autoencoders, etc.
Clustering
• Clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to objects in other groups.
• Useful for
• Organizing the data
• Understanding the hidden structure in data
• Preprocessing for further analysis
Applications
• Marketing: Customer segmentation
• Recognize communities in social networks
• Document Clustering
• Many more
Clustering Methods
• Partitioning methods, e.g., K-Means clustering
• Hierarchical clustering methods
K-Means Clustering
• The K-Means clustering algorithm was proposed by MacQueen (1967) [1].
• Given k,
1. Randomly choose k data points (seeds) to be cluster centres.
2. Assign each data point to the closest cluster centre.
3. Re-compute the cluster centres using the current cluster members.
4. If a convergence criterion is not met, go to step 2.
[1] MacQueen, James. "Some methods for classification and analysis of multivariate observations." Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Vol. 1, No. 14, 1967.
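The four steps above translate directly into code. Below is a minimal NumPy sketch (function and variable names are our own, and it assumes no cluster ever becomes empty); its stopping test corresponds to the convergence criteria discussed next.

```python
import numpy as np

def k_means(points, k, seeds=None, max_iters=100):
    """Minimal K-Means sketch: assign points to the nearest centre, re-compute centres, repeat."""
    points = np.asarray(points, dtype=float)
    # Step 1: choose k data points (seeds) as the initial cluster centres.
    if seeds is None:
        centres = points[np.random.choice(len(points), k, replace=False)]
    else:
        centres = np.asarray(seeds, dtype=float)
    for _ in range(max_iters):
        # Step 2: assign each point to the closest centre (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: re-compute each centre as the mean of its current members
        # (assumes every cluster stays non-empty).
        new_centres = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop once the centres no longer change; otherwise repeat from step 2.
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels
```

Run with the seeds of the first numerical problem below, k_means(pts, 2, seeds=[(1, 1), (3, 4)]) reproduces the centres (1.25, 1.5) and (3.9, 5.1).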
Stopping/Convergence Criterion
• No re-assignments of data points to different clusters.
OR
• No (or minimum) change of cluster centroids.
OR
• Minimum decrease in the sum of squared error
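Concretely, the sum of squared error (SSE) here is the within-cluster sum of squared distances,

$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,$$

where $C_j$ is the $j$-th cluster and $\mu_j$ its centroid. A K-Means iteration never increases this quantity, so stopping when the decrease falls below a small threshold is well defined.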
Demonstration of K-Means algorithm
Image source: Christopher Bishop, Pattern Recognition and Machine Learning.
Similarity/distance metrics
• K-Means typically uses the Euclidean distance d(p, q) = sqrt(Σᵢ (pᵢ − qᵢ)²); this is the metric used in all of the worked examples below.
• Other metrics (e.g., Manhattan distance, cosine similarity) can be used depending on the data.
K-Means Clustering Contd.
• Visualization: http://alekseynp.com/viz/k-means.html
Numerical Problem
Group the given data points P1 (1,1), P2 (1.5,2), P3 (3,4), P4 (5,7), P5 (3.5,5), P6 (4.5,5), P7 (3.5,4.5) into two clusters.
Reference: Lecture slides of Prof. Ramesh A., Lecture 54, Machine Learning, IIT Roorkee.
First Iteration:
Randomly choose two cluster centres, say C1 = (1.0, 1.0) and C2 = (3.0, 4.0).
Points        Distance from C1 (1,1)   Distance from C2 (3,4)   Assigned Cluster
P1 (1,1)      0                        3.60555                  C1
P2 (1.5,2)    1.118034                 2.5                      C1
P3 (3,4)      3.60555                  0                        C2
P4 (5,7)      7.2111                   3.60555                  C2
P5 (3.5,5)    4.71699                  1.118034                 C2
P6 (4.5,5)    5.31507                  1.802776                 C2
P7 (3.5,4.5)  4.30116                  0.707107                 C2
This completes the first iteration of the K-Means clustering algorithm.
Second Iteration:
Find new cluster centres:
Cluster 1 has two points, so its centre is (1.25, 1.5). Similarly, Cluster 2 has five points, so its centre is (3.9, 5.1).
Points        Distance from C1 (1.25,1.5)   Distance from C2 (3.9,5.1)   Assigned Cluster
P1 (1,1)      0.559017                      5.02195                      C1
P2 (1.5,2)    0.559017                      3.92046                      C1
P3 (3,4)      3.05164                       1.421267                     C2
P4 (5,7)      6.65676                       2.19545                      C2
P5 (3.5,5)    4.16083                       0.4123106                    C2
P6 (4.5,5)    4.77624                       0.608276                     C2
P7 (3.5,4.5)  3.75                          0.72111                      C2
This completes the second iteration of the K-Means clustering algorithm.
• The K-Means algorithm stops here since no point changed its cluster assignment compared to the previous iteration.
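The arithmetic in the two tables above can be checked with a few lines of NumPy (a standalone check, independent of any library K-Means):

```python
import numpy as np

pts = np.array([[1, 1], [1.5, 2], [3, 4], [5, 7], [3.5, 5], [4.5, 5], [3.5, 4.5]])
centres = np.array([[1.0, 1.0], [3.0, 4.0]])  # initial seeds C1, C2

# Iteration 1: Euclidean distance of every point to every centre.
d = np.linalg.norm(pts[:, None] - centres[None], axis=2)
print(d.round(6))                  # matches the first table
labels = d.argmin(axis=1)          # [0 0 1 1 1 1 1] -> C1, C1, C2, C2, C2, C2, C2

# Re-computed centres match the second table: (1.25, 1.5) and (3.9, 5.1).
print([pts[labels == j].mean(axis=0) for j in (0, 1)])
```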
Second Problem
Point   Coordinates
A1      (2,10)
A2      (2,6)
A3      (11,11)
A4      (6,9)
A5      (6,4)
A6      (1,2)
A7      (5,10)
A8      (4,9)
A9      (10,12)
A10     (7,5)
A11     (9,11)
A12     (4,6)
A13     (3,10)
A14     (3,8)
A15     (6,11)

We are also given the information that we need to make 3 clusters, i.e., K = 3. First, we randomly choose 3 centroids from the given data: let us consider A2 (2,6), A7 (5,10), and A15 (6,11) as the centroids of the initial clusters.
Problem source: https://codinginfinite.com/k-means-clustering-explained-with-numerical-example/
• First Iteration:
Initial cluster centers A2 (2,6), A7 (5,10), and A15 (6,11)
Points       Distance from C1 (2,6)   Distance from C2 (5,10)   Distance from C3 (6,11)   Assigned Cluster
P1 (2,10)    4                        3                         4.123106                  C2
P2 (2,6)     0                        5                         6.40312                   C1
P3 (11,11)   10.29563                 6.08276                   5                         C3
P4 (6,9)     5                        1.414214                  2                         C2
P5 (6,4)     4.47214                  6.08276                   7                         C1
P6 (1,2)     4.123106                 8.94427                   10.29563                  C1
P7 (5,10)    5                        0                         1.414214                  C2
P8 (4,9)     3.60555                  1.414214                  2.828427                  C2
P9 (10,12)   10                       5.38516                   4.123106                  C3
P10 (7,5)    5.09902                  5.38516                   6.08276                   C1
P11 (9,11)   8.60233                  4.123106                  3                         C3
P12 (4,6)    2                        4.123106                  5.38516                   C1
P13 (3,10)   4.123106                 2                         3.16228                   C2
P14 (3,8)    2.23607                  2.828427                  4.24264                   C1
P15 (6,11)   6.40312                  1.414214                  0                         C3

This completes the first iteration of the K-Means clustering algorithm.
Second Iteration:
• Cluster 1 has 6 points, Cluster 2 has 5 points, and Cluster 3 has 4 points.
• Find cluster centroid for each cluster
• New cluster centre for cluster 1: (3.833, 5.167)
• New cluster centre for cluster 2: (4, 9.6)
• New cluster centre for cluster 3: (9, 11.25)
• Second Iteration:
Points       Distance from C1 (3.833, 5.167)   Distance from C2 (4, 9.6)   Distance from C3 (9, 11.25)   Assigned Cluster   Previous Cluster
P1 (2,10)    5.16986                           2.040                       7.111                         C2                 C2
P2 (2,6)     2.013                             4.118                       8.750                         C1                 C1
P3 (11,11)   9.241                             7.139                       2.016                         C3                 C3
P4 (6,9)     4.403                             2.088                       3.750                         C2                 C2
P5 (6,4)     2.461                             5.946                       7.846                         C1                 C1
P6 (1,2)     4.249                             8.171                       12.230                        C1                 C1
P7 (5,10)    4.972                             1.077                       4.191                         C2                 C2
P8 (4,9)     3.837                             0.600                       5.483                         C2                 C2
P9 (10,12)   9.204                             6.462                       1.250                         C3                 C3
P10 (7,5)    3.171                             5.492                       6.562                         C1                 C1
P11 (9,11)   7.792                             5.192                       0.250                         C3                 C3
P12 (4,6)    0.850                             3.600                       7.250                         C1                 C1
P13 (3,10)   4.904                             1.077                       6.129                         C2                 C2
P14 (3,8)    2.953                             1.887                       6.824                         C2                 C1
P15 (6,11)   6.223                             2.441                       3.010                         C2                 C3

This completes the second iteration of the K-Means clustering algorithm. Note that P14 moves from C1 to C2 and P15 moves from C3 to C2, so the algorithm continues.
Third Iteration:
• Cluster 1 has 5 points, Cluster 2 has 7 points, and Cluster 3 has 3 points.
• Find cluster centroid for each cluster
• New cluster centre for cluster 1: (4,4.6)
• New cluster centre for cluster 2: (4.143, 9.571)
• New cluster centre for cluster 3: (10,11.333)
• Third Iteration:
Points       Distance from C1 (4, 4.6)   Distance from C2 (4.143, 9.571)   Distance from C3 (10, 11.333)   Assigned Cluster   Previous Cluster
P1 (2,10)    5.758                       2.186                             8.110                           C2                 C2
P2 (2,6)     2.441                       4.165                             9.615                           C1                 C1
P3 (11,11)   9.485                       7.004                             1.054                           C3                 C3
P4 (6,9)     4.833                       1.943                             4.631                           C2                 C2
P5 (6,4)     2.088                       5.872                             8.353                           C1                 C1
P6 (1,2)     3.970                       8.197                             12.966                          C1                 C1
P7 (5,10)    5.492                       0.958                             5.175                           C2                 C2
P8 (4,9)     4.400                       0.589                             6.438                           C2                 C2
P9 (10,12)   9.527                       6.341                             0.667                           C3                 C3
P10 (7,5)    3.027                       5.390                             7.008                           C1                 C1
P11 (9,11)   8.122                       5.063                             1.054                           C3                 C3
P12 (4,6)    1.400                       3.574                             8.028                           C1                 C1
P13 (3,10)   5.492                       1.221                             7.126                           C2                 C2
P14 (3,8)    3.544                       1.943                             7.753                           C2                 C2
P15 (6,11)   6.705                       2.343                             4.014                           C2                 C2

This completes the third iteration of the K-Means clustering algorithm.
• The K-Means algorithm stops since no point changed its cluster assignment compared to the previous iteration.
Exercise Question 01: Consider the following dataset
consisting of points in a 2-dimensional space:
A(2,3),B(4,5),C(6,7),D(1,8)
Apply the K-means algorithm with k=2 to cluster these
points.
Exercise Question 02: Consider the following dataset
consisting of points in a 2-dimensional space:
A(1,2),B(3,4),C(5,6),D(3,5),E(9,10), F(1,5)
Apply the K-means algorithm with k=3 to cluster these
points.
Elbow method
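The elbow method chooses k by running K-Means for several values of k, plotting the SSE against k, and picking the k at the "elbow" of the curve, beyond which adding clusters yields only a small decrease in SSE. Below is a minimal sketch (scikit-learn's KMeans is an assumption here; any implementation that reports the SSE works, and the data reused are the 15 points from the second problem):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# The 15 points A1..A15 from the second numerical problem above.
X = np.array([[2, 10], [2, 6], [11, 11], [6, 9], [6, 4], [1, 2], [5, 10], [4, 9],
              [10, 12], [7, 5], [9, 11], [4, 6], [3, 10], [3, 8], [6, 11]])

# SSE (scikit-learn calls it inertia_) for k = 1..9.
ks = range(1, 10)
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, sse, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("SSE (within-cluster sum of squares)")
plt.title("Elbow method")
plt.show()
```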
Dimensionality Reduction
High Dimensional data
• High dimension = a large number of features
• Document classification
  • Features per document: thousands of words, contextual information, etc.
• Surveys, e.g., Netflix: 480,189 users × 17,770 movies
Image source: https://www.cs.cmu.edu/~mgormley/courses/10701-f16/slides/lecture14-pca.pdf
Principal Component Analysis (PCA)
• Principal Component Analysis (PCA) is a popular unsupervised learning technique for extracting hidden (potentially lower-dimensional) structure from high-dimensional datasets.
Useful for
• Visualization
• Compressing data
• Noise reduction
Principal Component Analysis (PCA)
• PCA originated from the work of Pearson (1901).
• Its purpose is to derive new features (variables) in decreasing order of importance.
• Dimensionality can be reduced without losing much of the information and structure present in the data.
Pearson, K. (1901). "On Lines and Planes of Closest Fit to Systems of Points in Space." Philosophical Magazine. 2 (11): 559–572. doi:10.1080/14786440109462720.
Principal Component Analysis (PCA) example
Are all pixels important for representing a number?
Image source: https://medium.com/fenwicks/tutorial-1-mnist-the-hello-world-of-deep-learning-abd252c47709
Principal Component Analysis (PCA)
• PCA finds an alternative representation of the data in a lower-dimensional space without losing much information.
• Method (intuitive understanding):
  • Find the directions in which the data has maximum variance.
  • Equivalently, minimize the sum of squared distances between the points and their projections.
• The principal components are the eigenvectors of the dataset's covariance matrix.
Problem Statement
Apply PCA to reduce the following two-feature dataset to one dimension: x1 = 2, 3, 4, 5, 6, 7 and x2 = 1, 5, 3, 6, 7, 8.
PCA Algorithm
• Normalize the data
• Compute the covariance matrix
• Calculate the eigenvalues and eigenvectors
• Sort the eigenvalues in descending order and compute the principal components
• Reduce the dimensions of the dataset
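These five steps can be sketched in a dozen lines of NumPy (names are our own). Note that NumPy returns unit-length eigenvectors, so the projected values differ from the worked example in the following slides, which uses an unnormalized eigenvector, by a constant scale and shift (and possibly sign); the recovered structure is the same.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix."""
    X = np.asarray(X, dtype=float)
    # Step 1: centre the data (full normalization would also scale each feature).
    Xc = X - X.mean(axis=0)
    # Step 2: covariance matrix of the features (bias=True divides by n, as in the example).
    C = np.cov(Xc, rowvar=False, bias=True)
    # Step 3: eigenvalues and eigenvectors (eigh, since C is symmetric).
    eigvals, eigvecs = np.linalg.eigh(C)
    # Step 4: sort eigenvalues in descending order; the top eigenvectors are the PCs.
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:n_components]]
    # Step 5: reduce dimensionality by projecting onto the principal components.
    return Xc @ components

# Worked example from the following slides: x1 = 2..7, x2 = 1, 5, 3, 6, 7, 8.
X = [[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]]
print(pca(X, 1))  # one-dimensional representation
```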
Normalizing the data
• Min-max normalization
• Z-score normalization
• We have seen these feature scaling methods in CO2.
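For reference, the two scalings are

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \quad \text{(min-max)}, \qquad x' = \frac{x - \mu}{\sigma} \quad \text{(z-score)},$$

where $\mu$ and $\sigma$ are the feature's mean and standard deviation.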
Covariance matrix
• Covariance can be measured between two variables or features.
• It measures the directional relationship between two features.
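Concretely, for two features $x_1$ and $x_2$ with $n$ samples (using the population form, dividing by $n$, which is what the example below uses):

$$\mathrm{cov}(x_1, x_2) = \frac{1}{n}\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)$$

A positive covariance means the two features tend to increase together; a negative one means one increases as the other decreases.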
Example
x1   x2   x1 - mean(x1)   x2 - mean(x2)   (x1 - mean(x1)) × (x2 - mean(x2))
2    1    -2.5            -4              10
3    5    -1.5            0               0
4    3    -0.5            -2              1
5    6    0.5             1               0.5
6    7    1.5             2               3
7    8    2.5             3               7.5

Mean(x1) = 4.5, Mean(x2) = 5, Cov(x1,x2) = 22/6 ≈ 3.667
Covariance matrix
• Find the remaining entries of the matrix
Example
x1   x2   x1 - mean(x1)   x2 - mean(x2)   (x1 - mean(x1)) × (x2 - mean(x2))   (x1 - mean(x1))²   (x2 - mean(x2))²
2    1    -2.5            -4              10                                  6.25               16
3    5    -1.5            0               0                                   2.25               0
4    3    -0.5            -2              1                                   0.25               4
5    6    0.5             1               0.5                                 0.25               1
6    7    1.5             2               3                                   2.25               4
7    8    2.5             3               7.5                                 6.25               9

Mean(x1) = 4.5, Mean(x2) = 5
Cov(x1,x2) = 22/6 ≈ 3.667, Cov(x1,x1) = 17.5/6 ≈ 2.917, Cov(x2,x2) = 34/6 ≈ 5.667
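Collecting these entries gives the (symmetric) covariance matrix used in the next step:

$$C = \begin{pmatrix} \mathrm{cov}(x_1,x_1) & \mathrm{cov}(x_1,x_2) \\ \mathrm{cov}(x_2,x_1) & \mathrm{cov}(x_2,x_2) \end{pmatrix} \approx \begin{pmatrix} 2.917 & 3.667 \\ 3.667 & 5.667 \end{pmatrix}$$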
Find Eigenvalues and Eigenvectors
• The eigenvalues and eigenvectors of the covariance matrix are required to find the principal components.
• The eigenvectors give the directions of the new representation of the data, i.e., the directions in which the variance is maximum.
Compute Eigenvalues and Eigenvectors
• The eigenvalues are the roots of the characteristic equation det(C - λI) = 0.
• For the covariance matrix above, this gives λ² - 8.583λ + 3.083 = 0, with roots λ1 ≈ 8.208 and λ2 ≈ 0.376.
Eigenvectors
• Each eigenvector v solves (C - λI)v = 0 for its eigenvalue λ.
• For λ1 ≈ 8.208: (2.917 - 8.208)·v1 + 3.667·v2 = 0, giving (up to scale) v ≈ (0.692, 1). This eigenvector, corresponding to the largest eigenvalue, is principal component 1 (PC1).
Reduce the dimensions of the dataset
• To obtain the one-dimensional representation, multiply each original data point by principal component 1, i.e., take its dot product with PC1 ≈ (0.692, 1).

x1   x2   Resultant feature (0.692·x1 + 1·x2)
2    1    2.384
3    5    7.076
4    3    5.768
5    6    9.46
6    7    11.152
7    8    12.844
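The resultant-feature column can be reproduced in two lines (a check on the arithmetic above, projecting the uncentred data onto the unnormalized PC1, exactly as in the table):

```python
import numpy as np

X = np.array([[2, 1], [3, 5], [4, 3], [5, 6], [6, 7], [7, 8]])
pc1 = np.array([0.692, 1.0])   # unnormalized first principal component from above
print(X @ pc1)                 # [ 2.384  7.076  5.768  9.46  11.152 12.844]
```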