
Unsupervised Learning

AIcademy Summer Camp – Day 2


Learning Outcomes

01. Unsupervised Learning
02. Supervised vs Unsupervised Learning
03. Case Study
04. Benefits and Challenges
05. K-Means Clustering
06. Elbow Method
07. Principal Component Analysis
Supervised vs Unsupervised Learning

Supervised Learning:
• Labeled training data to teach the model
• Desired output values to define correct answers
• Learning algorithm to map inputs to outputs
• Labeled validation data to test model accuracy

Unsupervised Learning:
• Uses input data without predefined outputs
• No labeled data or training context provided
• Employs algorithms to discover inherent data patterns

Reinforcement Learning:
• The model learns by interacting with an environment
• Desired behaviors are reinforced through rewards
• Undesired behaviors may result in penalties
• Develops a policy that maps states to actions
Unsupervised Learning
• Algorithms that learn from unlabeled data
• Used for exploratory analysis, image processing, and identifying key data structures
• Applications: object recognition, medical imaging, anomaly detection, recommendations
• Key Methods:
  • Clustering (e.g., k-means)
  • Dimensionality Reduction (e.g., Principal Component Analysis)
Case Study
Imagine you have a large farm where animals of various species end up mixed together in one pen, and you need an automated sorter. In this scenario, you can use unsupervised learning techniques to group similar animals together automatically.
Benefits and Challenges

Benefits:
• Reduced manual data preparation - no need for labeled data
• Ability to discover unknown patterns in data
• Simpler algorithms

Challenges:
• High computational complexity, especially with large datasets
• Increased risk of inaccurate results
• Potential need for human intervention to validate groupings
K-Means Clustering

Imagine you had some data that you could plot on a line, and you knew you needed to put it into 3 clusters.
K-Means Clustering

[Figure: data on a line forming three clusters: Cluster 1, Cluster 2, Cluster 3]

In this case the data make three relatively obvious clusters. But rather than rely on our eye, let's see if we can get a computer to identify the same 3 clusters.
K-Means Clustering

Step I: Select the number of clusters you want to identify in your data. This is the "K" in "K-means clustering".

In this case, we will select K=3. That is to say, we want to identify 3 clusters.
K-Means Clustering

Step II: Randomly select three distinct data points. These are the initial clusters.
K-Means Clustering

Step III: Measure the distance between the 1st point and each of the three initial clusters: the distance to the blue cluster, to the green cluster, and to the orange cluster.
K-Means Clustering

Step IV: Assign the 1st point to the nearest cluster. In this case the nearest cluster is the blue cluster.
K-Means Clustering

Now we do the same thing for the next point: measure the distances, then assign the point to the nearest cluster (the green one).
K-Means Clustering

Now figure out which cluster the 3rd point belongs to: measure the distances, then assign the point to the nearest cluster (the orange one).
K-Means Clustering

The rest of these points are closest to the orange cluster.
K-Means Clustering

Step V: Calculate the mean of each cluster.

Then we repeat what we just did: measure and re-cluster, now using each cluster's mean as its center.
K-Means Clustering

[Figure: the K-means clusters vs. the clusters we picked by eye: Cluster 1, Cluster 2, Cluster 3]

These K-means clusters are terrible compared to what we did by eye.
K-Means Clustering

[Figure: the total variation within Cluster 1, Cluster 2, and Cluster 3]

Since K-means clustering can't "see" the best clustering, its only option is to keep track of these clusters and their total variation, and do the whole thing over again with different starting points.
K-Means Clustering

• Pick three initial random clusters
• Cluster all the remaining points based on the closest cluster
• Calculate the mean of each cluster, then re-cluster based on the new means
• Repeat until the clusters no longer change
K-Means Clustering

[Figure: the total variation within Cluster 1, Cluster 2, and Cluster 3]

At this point, K-means clustering knows that the 2nd clustering is the best clustering so far. But it doesn't know if it's the best overall, so it will try a few more restarts (it does as many as you tell it to) and then return this clustering if it is still the best. The sketch below puts the whole procedure together.
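A minimal sketch of this whole procedure in Python with NumPy, on toy 1-D data like the examples above (the function name and the data values are illustrative, not from the slides):

import numpy as np

def kmeans_1d(points, k, n_restarts=10, max_iters=100, seed=0):
    # Toy K-means for 1-D data with random restarts; keeps the run
    # with the lowest total variation within the clusters.
    rng = np.random.default_rng(seed)
    best = (None, None, np.inf)  # (centers, labels, total variation)
    for _ in range(n_restarts):
        # Step II: randomly pick k distinct data points as initial clusters
        centers = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(max_iters):
            # Steps III-IV: assign every point to its nearest cluster
            labels = np.abs(points[:, None] - centers[None, :]).argmin(axis=1)
            # Step V: recompute each cluster's mean (keep old center if a cluster is empty)
            new_centers = np.array([points[labels == j].mean() if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):  # clusters no longer change
                break
            centers = new_centers
        # total variation within the clusters, used to rank the restarts
        total = sum(((points[labels == j] - centers[j]) ** 2).sum() for j in range(k))
        if total < best[2]:
            best = (centers, labels, total)
    return best

data = np.array([1.0, 1.2, 1.5, 5.0, 5.3, 5.1, 9.0, 9.4, 9.2])
centers, labels, total_variation = kmeans_1d(data, k=3)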
K-Means Clustering

What if our data is plotted in 2 dimensions? Just like before, we pick three random points, and we use the Euclidean distance:

$d = \sqrt{x^2 + y^2}$

where x and y are the horizontal and vertical distances between a point and a cluster center.
K-Means Clustering

[Figure: 2-D data grouped into Cluster 1, Cluster 2, and Cluster 3]
K-Means Clustering

• Groups the data into 'K' groups based on similarities (or distance) between the features of the items in the data
• Finds K cluster centers that best split the data
• Minimizes the variance within each cluster
• When testing a new item, it's placed in the group it's most similar to
Let's group this point together

Given the point S with coordinates S(0.4, 0.4) and two clusters, X and O: to which cluster should this new point be assigned? Justify your answer!

Euclidean Distance Reminder: $d(P, Q) = \sqrt{(x_P - x_Q)^2 + (y_P - y_Q)^2}$

Let's group this point together

Given the point S with coordinates S(0.4, 0.4) and the two cluster centers X(0.25, 0.72) and O(0.7, 0.31):

$d(S, X) = \sqrt{(0.4 - 0.25)^2 + (0.4 - 0.72)^2} = \sqrt{0.1249} \approx 0.353$

$d(S, O) = \sqrt{(0.4 - 0.7)^2 + (0.4 - 0.31)^2} = \sqrt{0.0981} \approx 0.313$

$d(S, O) < d(S, X)$

Our new point S belongs to cluster O.
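A quick check of this arithmetic in plain Python (the helper name dist is just illustrative):

import math

S, X, O = (0.4, 0.4), (0.25, 0.72), (0.7, 0.31)

def dist(p, q):
    # Euclidean distance between two 2-D points
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

print(dist(S, X))  # ~0.353
print(dist(S, O))  # ~0.313, so S is assigned to cluster O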


How do we specify the number of clusters K?

Choosing the right number of groups (k) can be tricky! What happens if we pick k=2? k=4? k=5?

There are many ways to solve this:
• Elbow method (we will learn this)
• Silhouette method
• Gap statistics
Elbow Method

1. We try k-means for different values of k (like k=1, 2, ..., 10).
2. For each k, we calculate how far each point is from the center of its group.
3. We plot these distances against k.
4. The "elbow" point, where the plot bends, shows the best number of groups (see the sketch below).
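A minimal sketch with scikit-learn (the data array is a random placeholder; inertia_ is scikit-learn's name for the total within-cluster sum of squared distances):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

data = np.random.rand(200, 2)  # placeholder dataset

ks = range(1, 11)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    inertias.append(km.inertia_)  # total within-cluster squared distance

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("total within-cluster distance")
plt.show()  # the 'elbow' where the curve bends suggests the best k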
Test Your Knowledge

True or False: In k-means clustering, the clusters are defined by boundaries.
Test Your Knowledge

False. Clusters are defined by their centroids.
Principal Component Analysis
Taking a Picture of a Teapot

How do you take a picture that captures the most information about the teapot?
Which angle is the best?

[Figure: four candidate angles: A, B, C, and D]
Best position for a teapot snapshot?

Why this position? Because it provides the most visual information.

How do we find this position? Rotate the teapot according to the PCA algorithm.
Finding the longest axis
Finding the second-longest axis while keeping the first axis fixed
How does PCA work?

Rotate the object around its center to find the best orientation:
• First, find the axis along which the object has the largest average extent.
• Then rotate the object around the first axis to find a second axis, perpendicular to the first, along which the object has the largest average extent.

The two axes found are the first and second principal components. The average extents along these axes are called the eigenvalues.
PCA

• PCA is a technique that allows the extraction of the most important trends in the data
• It helps reveal the underlying trends by constructing a new coordinate system by rotating the axes
• The first direction is the one along which the data varies the most, the second is the one along which it varies second most, and so on
• In summary, it learns a few principal components that are representative of the whole dataset, from which any element of the dataset can be reconstructed (a minimal numerical sketch follows)
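A minimal sketch of this idea in NumPy, treating the principal axes as eigenvectors of the covariance matrix (a simplified view; production libraries typically use the SVD, and the dataset here is a random placeholder):

import numpy as np

X = np.random.rand(100, 3)      # placeholder dataset (samples x features)
Xc = X - X.mean(axis=0)         # center the data at the origin

cov = np.cov(Xc, rowvar=False)  # feature covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# sort by decreasing eigenvalue: PC1 is the direction of largest variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs[:, :2]    # project the data onto the first two PCs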
Popular Applications

• Visualization of high-dimensional data
• Finding essential attributes and variables
• Dictionary Learning
• Dimensionality Reduction
• Filtering of data
Partner & a Business Idea

Capital required = $4M
Expected contribution per partner = $4M / 4 = $1M

Who is more important?

Partner                1      2      3      4
Actual contribution    $1.8M  $1.2M  $0.6M  $0.4M
Proportion             45%    30%    15%    10%
Cumulative             45%    75%    90%    100%

(The cumulative row adds up the proportions: 45 + 30 = 75, and so on.)
Conclusion

• Principal Components are the partners (Eigenvectors)
• Each has its own contribution (Eigenvalues)
• Keeping only 90% of the contributions results in removing partner 4 from the equation
• Maybe the company is better off with 3 partners: the top 3 principal components!
• Note: data should be free of outliers and should be on the same scale
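The partner table maps directly onto PCA: here is a sketch of the same bookkeeping in NumPy, with the contributions standing in for eigenvalues:

import numpy as np

contributions = np.array([1.8, 1.2, 0.6, 0.4])    # the partners' contributions ($M)
proportion = contributions / contributions.sum()   # 45%, 30%, 15%, 10%
cumulative = np.cumsum(proportion)                 # 45%, 75%, 90%, 100%

# smallest number of partners (PCs) covering 90% of the total
# (the tiny tolerance guards against floating-point rounding)
n_keep = int(np.argmax(cumulative >= 0.90 - 1e-9) + 1)
print(n_keep)  # 3 -> partner 4 is dropped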
Test Your Knowledge

True or False: We can say that PCA is a compression technique.

Test Your Knowledge

True. Keeping only the top principal components stores an approximation of the data far more compactly, at the cost of some reconstruction error.
Eigenfaces using PCA

• When PCA is applied on face images, the Eigenvectors extracted are called Eigenfaces
• Each person's face has unique features that distinguish them from others
• Is it possible to identify some facial features that can represent all the faces in the world?
• Examples: normal ears, pointy ears, round eyes, almond-shaped eyes, hair, chin shapes, ...
Eigenfaces using PCA

• Consider the faces in the figure
• We want to learn the basis features of these faces using PCA
• The most common features are represented in the form of Eigenface 1 (PC1), the second most common features in the form of Eigenface 2 (PC2), etc.
Eigenfaces

• Applying PCA on the faces dataset, we extract, say, 2000 Eigenfaces
• The two dominant Eigenfaces (Eigenface 1 and Eigenface 2) are shown
• Every face can now be reconstructed using these Eigenfaces
• Example:

Original Face = -1.3 × Eigenface 1 + 2.3 × Eigenface 2 + ... + 0.02 × Eigenface 2000
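A sketch of that reconstruction in NumPy (the shapes and the PCA basis are placeholders; in practice the Eigenfaces and mean face come from running PCA on the face dataset, and adding the mean face back is a step the slides leave implicit):

import numpy as np

n_pixels, n_components = 64 * 64, 2000
eigenfaces = np.random.rand(n_components, n_pixels)  # placeholder PCA basis
mean_face = np.zeros(n_pixels)                       # placeholder mean of the training faces

weights = np.zeros(n_components)
weights[0], weights[1], weights[-1] = -1.3, 2.3, 0.02  # coefficients from the example above

# a face = mean face + weighted sum of Eigenfaces
reconstructed = mean_face + weights @ eigenfaces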


How many Eigenfaces should we consider?

[Figure: an original face next to faces reconstructed from increasing numbers of PCs]

With 400 PCs, the reconstructed face starts to look like the original.
Selecting the optimal number of Principal Components

• A well-known technique for selecting the optimal number of components is to choose the number of PCs that express 95% of the variance (see the sketch below)
• This assumes that the remaining 5% is noise
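scikit-learn supports this selection rule directly: passing a float between 0 and 1 as n_components keeps the smallest number of PCs that explain that fraction of the variance (the dataset here is a random placeholder):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(500, 50)   # placeholder dataset

pca = PCA(n_components=0.95)  # keep enough PCs to explain 95% of the variance
X_reduced = pca.fit_transform(X)

print(pca.n_components_)                    # number of PCs actually kept
print(pca.explained_variance_ratio_.sum())  # >= 0.95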
Test Your Knowledge

What are two examples of unsupervised machine learning methods?

Test Your Knowledge

Clustering and Dimensionality Reduction


Test Your Knowledge: Applying PCA
THANK YOU!
