Unsupervised Learning

Unsupervised learning is a machine learning approach that uses unlabeled datasets to identify hidden patterns without human supervision, focusing on tasks like clustering, association, and dimensionality reduction. Clustering techniques, such as hierarchical and K means clustering, group data points based on similarity, with hierarchical clustering forming a tree structure and K means optimizing cluster centers. Key methods include hard and soft clustering, various linkage criteria for hierarchical clustering, and determining the optimal number of clusters using WCSS values.

Uploaded by

musicwithrishi

Unsupervised Learning
Agenda

● What is Unsupervised Learning?
● Clustering Techniques
● Hierarchical Clustering
● K Means Clustering
Unsupervised Learning

● Unsupervised learning is a type of machine learning in which models are
trained on unlabeled datasets and act on that data without
any supervision
● These algorithms discover hidden patterns or data groupings without the
need for human intervention
● Unsupervised learning models are used for three main tasks: clustering,
association, and dimensionality reduction
● Its ability to discover similarities and differences in information makes it
well suited to exploratory data analysis, cross-selling strategies, customer
segmentation, and image recognition
Clustering Technique

● Grouping unlabeled examples is called clustering
● Clustering is the task of dividing the population or data points into a number
of groups such that data points in the same group are more similar to each
other than to data points in other groups
● Hard Clustering
○ Each data point belongs to exactly one cluster, completely or not at all
● Soft Clustering
○ Instead of assigning each data point to a single cluster, each point is given a
probability or likelihood of belonging to each cluster
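The difference between hard and soft clustering can be illustrated with a small sketch. This is not from the slides: the function names, the one-dimensional points, and the two cluster centers are hypothetical, and the soft weights are computed as a softmax over negative squared distances, which is only one of several possible choices.

```python
import math

def hard_assign(point, centers):
    """Hard clustering: return the index of the single nearest center."""
    return min(range(len(centers)), key=lambda i: abs(point - centers[i]))

def soft_assign(point, centers):
    """Soft clustering: return one membership weight per center.

    Assumption: weights are a softmax over negative squared distances,
    so they are positive and sum to 1.
    """
    scores = [-(point - c) ** 2 for c in centers]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

centers = [0.0, 4.0]  # hypothetical cluster centers
for x in (0.5, 2.0, 3.9):
    weights = [round(w, 3) for w in soft_assign(x, centers)]
    print(f"x={x}: hard -> cluster {hard_assign(x, centers)}, soft -> {weights}")
```

A point halfway between the two centers is forced into one cluster by the hard rule but receives equal weights under the soft rule.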
Hierarchical Clustering

● We build the hierarchy of clusters in the form of a tree, and this tree-shaped structure is
known as the dendrogram
● The dendrogram is a tree-like structure that records each step (merge or split) that the
hierarchical clustering (HC) algorithm performs
● The hierarchical clustering technique has two approaches:
○ Agglomerative is a bottom-up approach, in which the algorithm starts by treating every data point as
its own cluster and keeps merging the closest clusters until one cluster is left.
○ Divisive is the reverse of agglomerative: a top-down approach that starts with all data points in one
cluster and splits it repeatedly
● Linkage Criteria:
○ Single Linkage: The shortest distance between the closest points of two clusters
○ Complete Linkage: The greatest distance between any two points of two different clusters. It is a
popular linkage method because it tends to form tighter clusters than single linkage
○ Average Linkage: The distances between every pair of data points, one from each cluster, are
added up and then divided by the number of pairs to give the average distance between
the two clusters.
○ Centroid Linkage: The distance between the centroids of the two clusters
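The four linkage criteria and the bottom-up merging process can be sketched in plain Python. All function names and sample points here are hypothetical, and `agglomerate` is a naive loop meant only to show the idea, not an efficient implementation. The merge history it returns is the information a dendrogram would draw.

```python
import math
from itertools import product

def single_linkage(a, b):
    """Shortest distance between any point in a and any point in b."""
    return min(math.dist(p, q) for p, q in product(a, b))

def complete_linkage(a, b):
    """Greatest distance between any point in a and any point in b."""
    return max(math.dist(p, q) for p, q in product(a, b))

def average_linkage(a, b):
    """Mean distance over all cross-cluster pairs of points."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def centroid_linkage(a, b):
    """Distance between the centroids of the two clusters."""
    return math.dist(centroid(a), centroid(b))

def agglomerate(points, linkage):
    """Naive agglomerative clustering; returns the list of merges."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        distance = linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

a = [(0, 0), (0, 1)]  # hypothetical cluster A
b = [(3, 0), (4, 0)]  # hypothetical cluster B
for name, fn in [("single", single_linkage), ("complete", complete_linkage),
                 ("average", average_linkage), ("centroid", centroid_linkage)]:
    print(f"{name:8s} {fn(a, b):.3f}")

# Each entry is (left cluster, right cluster, distance at which they merged).
for left, right, distance in agglomerate(a + b, single_linkage):
    print(left, "+", right, "at distance", round(distance, 3))
```

Passing a different linkage function to `agglomerate` changes which clusters are considered closest, and therefore the order and height of the merges.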
K Means Clustering

● It is an iterative algorithm that divides the unlabeled dataset into k different
clusters in such a way that each data point belongs to only one group of points with
similar properties
● It is a centroid-based algorithm, where each cluster is associated with a
centroid. The main aim of this algorithm is to minimize the sum of squared distances
between the data points and the centroids of their assigned clusters
● Cluster centers are initialized randomly, so different runs can give different results
and you might need to run the algorithm several times to get the best possible clusters
● How to find the best (optimum) number of clusters:
○ Plot a curve of the calculated WCSS (Within-Cluster Sum of Squares) values against
the number of clusters K
○ The point where the curve bends sharply, like the elbow of an arm, is
taken as the best value of K
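The points above can be put together in a minimal sketch, assuming plain Python and a small made-up dataset. The function names (`kmeans`, `wcss`, `best_of`), the sample points, and the number of restarts are all hypothetical. WCSS is computed here as the sum of squared distances from each point to the centroid of its assigned cluster.

```python
import math
import random

def wcss(points, centers, labels):
    """Within-Cluster Sum of Squares for a given assignment."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))

def kmeans(points, k, n_iter=100, seed=0):
    """Basic k-means with random initialization from the data points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels

def best_of(points, k, n_init=10):
    """Rerun k-means with different seeds and keep the lowest-WCSS result."""
    results = []
    for seed in range(n_init):
        centers, labels = kmeans(points, k, seed=seed)
        results.append((wcss(points, centers, labels), centers, labels))
    return min(results, key=lambda r: r[0])

# Hypothetical data: three small, well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

# WCSS for each K; plotting these values against K gives the elbow curve.
for k in range(1, 6):
    score, _, _ = best_of(points, k)
    print(f"K={k}  WCSS={score:.2f}")
```

Because the data contains three clear groups, the WCSS values would be expected to fall steeply up to K=3 and flatten afterwards, which is the bend the elbow method looks for. The restarts in `best_of` address the random initialization mentioned above.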
