
Unsupervised Learning

5.1 Supervised vs. Unsupervised Learning


5.2 Applications of unsupervised learning
5.3 Clustering
5.3.1 K-means clustering Algorithm
5.4 Finding Patterns using Association Rules
5.4.1 Apriori Algorithm
Question Bank

5.1 Supervised vs. Unsupervised Learning


Criteria | Supervised learning | Unsupervised learning
Input Data | Algorithms are trained using labeled data. | Algorithms are used against data that is unlabeled.
Computational Complexity | Simpler method | Computationally complex
Accuracy | Highly accurate | Less accurate
No. of classes | No. of classes is known | No. of classes is not known
Data Analysis | Uses offline analysis | Uses real-time analysis of data
Algorithms used | Linear and Logistic regression, Random Forest, Support Vector Machine, Neural Network, etc. | K-Means clustering, Hierarchical clustering, Apriori algorithm, etc.
Output | Desired output is given. | Desired output is not given.
Training data | Uses training data to infer the model. | No training data is used.
Complex model | It is not possible to learn larger and more complex models than with unsupervised learning. | It is possible to learn larger and more complex models with unsupervised learning.
Model | We can test our model. | We cannot test our model.
Example | Optical character recognition. | Finding a face in an image.
Unsupervised Learning:
Unsupervised learning is a type of machine learning algorithm where the input data is not labeled and the algorithm must find patterns or structure within the data on its own. Unlike supervised learning, there are no target variables to predict, and the algorithm is left to discover patterns and relationships on its own.

5.2 Applications of unsupervised learning

1. Clustering: Unsupervised learning algorithms like k-means and hierarchical clustering are widely used in customer segmentation, image segmentation, document clustering, and social network analysis.
For example, a clothing retailer may use unsupervised learning to identify three distinct customer segments: young fashion-conscious shoppers, budget-conscious families, and outdoor enthusiasts. They can then tailor their advertising and product offerings to each segment, improving customer engagement and increasing sales.
2. Anomaly detection: Unsupervised learning algorithms like isolation forest and one-class SVM can be used to detect anomalies in data such as fraud detection, intrusion detection, and manufacturing quality control.
For example, a bank may use unsupervised learning to detect credit card fraud by identifying transactions that are significantly different from a customer's usual spending pattern. They can then flag these transactions for further investigation or automatically block them to prevent fraudulent activity.
3. Dimensionality reduction: Techniques like principal component analysis (PCA), t-SNE, and autoencoders are used for reducing the complexity of high-dimensional data to improve visualization, compression, and feature selection.
4. Association rule learning: Unsupervised learning algorithms like Apriori and FP-Growth can be used for market basket analysis, product recommendation, and customer behavior analysis.
5. Topic modeling: Unsupervised learning algorithms like latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF) can be used to identify topics in large document collections and social media data.
6. Generative models: Unsupervised learning algorithms like variational autoencoders (VAEs) and generative adversarial networks (GANs) are used for generating synthetic data, image synthesis, and style transfer.
7. Recommendation Systems: Collaborative filtering algorithms like matrix factorization or nearest-neighbor methods can be used to make personalized recommendations to users based on their past behavior or preferences. This is commonly used in e-commerce, music or video streaming, and social media platforms.
8. Natural Language Processing: Unsupervised learning algorithms like topic modeling or word embeddings can be used to identify patterns and relationships in large text datasets, improving text classification, sentiment analysis, and language translation.
9. Bioinformatics: Unsupervised learning algorithms like clustering or principal component analysis can be used to identify patterns in gene expression data, improve drug discovery, or predict disease risk.
10. Outlier Detection: Clustering or density-based algorithms can be used to identify outliers or anomalies in data that may indicate errors.

This is used in finance, manufacturing, and healthcare applications.

5.3 Clustering

Clustering is a type of unsupervised learning algorithm in machine learning that involves grouping similar data points together into clusters based on their features or attributes. The main objective of clustering is to partition the data in such a way that the points within each cluster are similar to each other and different from points in other clusters. The choice of clustering algorithm depends on the characteristics of the data and the specific requirements of the application. Clustering algorithms may also require preprocessing steps like normalization or feature scaling to ensure that the features are comparable across different data points.
Overall, clustering is a powerful technique in machine learning that can help uncover hidden patterns in data and facilitate decision-making in various applications. The following figure shows the steps of the clustering process:

[Figure: Steps of the clustering process — feature selection, clustering algorithm selection, validation of results, and interpretation of results turn the raw data into final clusters and knowledge.]
Clustering is very important as it determines the intrinsic grouping among the unlabelled data present. There are no absolute criteria for good clustering; it depends on the user and on whatever criteria satisfy their need.
Let's see how clustering differs from classification.
Clustering is a method of unsupervised learning, while classification is a method of supervised learning.

Classification | Clustering
Uses labelled data as the input | Uses unlabelled data as the input
The output is known | The output is unknown
Uses supervised machine learning | Uses unsupervised machine learning
A training data set is provided and used to produce classifications | A training data set is provided and used to produce clusters
Examples of algorithms: decision trees, Bayesian classifiers and Support Vector Machines (SVM) | Examples of algorithms: partition-based clustering (k-means), hierarchical clustering (agglomerative & divisive) and DBSCAN
Can be more complex than clustering | Can be less complex than classification
Does not specify areas for improvement | Specifies areas for improvement
Two-phase | Single-phase
Boundary conditions must be specified | Boundary conditions do not always need to be specified

[Figure: Classification vs. Clustering — two weight-vs-height scatter plots; in classification the points carry known labels (adult vs. children characteristics), while in clustering the same points are grouped into Cluster 1 and Cluster 2 without labels.]

5.3.1 K-means clustering Algorithm

K-means clustering is a popular unsupervised machine learning algorithm used for clustering data points into K clusters. The algorithm starts by randomly selecting K centroids, and then assigns each data point to the nearest centroid. It then calculates the new centroid of each cluster and repeats the process until convergence. The main objective of the algorithm is to minimize the sum of squared distances between data points and their assigned centroids.
The steps of the k-means clustering algorithm are as follows:
1. Initialize the algorithm by selecting k random points from the dataset as the initial centroids.
2. Assign each data point to the nearest centroid, based on the Euclidean distance.
3. Calculate the new centroid of each cluster by taking the mean of all data points assigned to that cluster.
4. Repeat steps 2 and 3 until the centroids no longer change or a specified number of iterations is reached.
5. The resulting clusters are the groups of data points that are closest to their respective centroids.
Pseudo code for the k-means clustering algorithm is:

Algorithm 1: k-means algorithm
1: Specify the number k of clusters to assign.
2: Randomly initialize k centroids.
3: repeat
4:   expectation: Assign each point to its closest centroid.
5:   maximization: Compute the new centroid (mean) of each cluster.
6: until the centroid positions do not change.
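This expectation-maximization loop translates almost line for line into Python. The following is a minimal NumPy sketch, not a production implementation; the function name k_means and its parameters are illustrative, not from the text:

import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    """Minimal k-means: the expectation-maximization loop of Algorithm 1."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Expectation: assign each point to its closest centroid
        # (Euclidean distance to every centroid, then argmin).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Maximization: new centroid = mean of the points in each cluster
        # (this sketch assumes no cluster ends up empty).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop when the centroid positions no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Example usage with the eight points of the worked example below:
X = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
              [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
centroids, labels = k_means(X, k=3)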
Example: Cluster the following eight points (with (x, y) representing locations) into three clusters:

A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)

Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as

ρ(a, b) = |x2 - x1| + |y2 - y1|

Use the K-Means Algorithm to find the three cluster centers after the first iteration.

Iteration-01:
- We calculate the distance of each point from each of the centers of the three clusters.
- The distance is calculated by using the given distance function.

The following illustration shows the calculation of the distance between point A1(2, 10) and each of the centers of the three clusters.

Calculating the distance between A1(2, 10) and C1(2, 10):
ρ(A1, C1) = |2 - 2| + |10 - 10| = 0

Calculating the distance between A1(2, 10) and C2(5, 8):
ρ(A1, C2) = |5 - 2| + |8 - 10| = 3 + 2 = 5

Calculating the distance between A1(2, 10) and C3(1, 2):
ρ(A1, C3) = |1 - 2| + |2 - 10| = 1 + 8 = 9

In a similar manner, we calculate the distance of the other points from each of the centers of the three clusters.
Next, we draw a table showing all the results. Using the table, we decide which point belongs to which cluster.

The given point belongs to that cluster whose center is nearest to it.

Given Points | Distance from center (2, 10) of Cluster-01 | Distance from center (5, 8) of Cluster-02 | Distance from center (1, 2) of Cluster-03 | Point belongs to Cluster
A1(2, 10) | 0 | 5 | 9 | C1
A2(2, 5) | 5 | 6 | 4 | C3
A3(8, 4) | 12 | 7 | 9 | C2
A4(5, 8) | 5 | 0 | 10 | C2
A5(7, 5) | 10 | 5 | 9 | C2
A6(6, 4) | 10 | 5 | 7 | C2
A7(1, 2) | 9 | 10 | 0 | C3
A8(4, 9) | 3 | 2 | 10 | C2
From here, the new clusters are:

Cluster-01: The first cluster contains point A1(2, 10).
Cluster-02: The second cluster contains points A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4) and A8(4, 9).
Cluster-03: The third cluster contains points A2(2, 5) and A7(1, 2).

Taking the mean of the points in each cluster gives the three cluster centers after the first iteration: C1 = (2, 10), C2 = (6, 6) and C3 = (1.5, 3.5). Similarly, we can apply this for iterations 2, 3 and so on. The short script below reproduces this first iteration.
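A minimal Python check of the iteration, using the Manhattan distance function ρ given in the example (the variable names are illustrative):

points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
centers = {"C1": (2, 10), "C2": (5, 8), "C3": (1, 2)}

def manhattan(a, b):
    # The distance function from the example: |x2 - x1| + |y2 - y1|
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# Assign each point to the cluster whose center is nearest to it.
clusters = {c: [] for c in centers}
for name, p in points.items():
    nearest = min(centers, key=lambda c: manhattan(p, centers[c]))
    clusters[nearest].append(name)
print(clusters)
# {'C1': ['A1'], 'C2': ['A3', 'A4', 'A5', 'A6', 'A8'], 'C3': ['A2', 'A7']}

# New centers: the mean of the points in each cluster.
for c, members in clusters.items():
    xs = [points[m][0] for m in members]
    ys = [points[m][1] for m in members]
    print(c, (sum(xs) / len(xs), sum(ys) / len(ys)))
# C1 (2.0, 10.0)   C2 (6.0, 6.0)   C3 (1.5, 3.5)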
Pros:
- K-means is fast and scalable, making it ideal for large datasets.
- It is relatively simple to implement and can work well with high-dimensional data.
- K-means is efficient and can handle noisy data.
- It can be used for a wide range of applications, including customer segmentation, image analysis, and text mining.

Cons:
- The performance of K-means depends on the initial placement of centroids, which can lead to suboptimal solutions.
- The algorithm may converge to local optima, which may not be the global optimum.
- K-means assumes that all clusters have the same variance, which may not be the case in some datasets.
- It may not work well with non-linearly separable data.

Applications:
- Customer Segmentation: K-means clustering can be used to group customers based on their behavior, preferences, or needs, allowing businesses to tailor their marketing strategies and offers to specific customer segments.

- Image Segmentation: K-means can be used to segment images by grouping similar pixels together into clusters, allowing for image compression or object recognition.
- Text Clustering: K-means can be used to group similar text documents together based on their content, improving search results and recommendation systems.
- Anomaly Detection: K-means can be used to identify outliers or anomalies in data that may indicate fraud, intrusion, or equipment malfunction.
- Bioinformatics: K-means can be used to identify patterns in gene expression data or to cluster proteins based on their properties, improving drug discovery or disease diagnosis.
5.4 Finding Patterns using Association Rules

The Association Rule is a rule-based machine learning method for identifying associations between seemingly unrelated elements using pattern recognition.
Support and confidence are two important measures used in association rule mining to evaluate the significance of frequent itemsets and association rules.
Support refers to the frequency of occurrence of an itemset in a dataset, expressed as a percentage or proportion of the total transactions in the dataset that contain the itemset.

Support(A) = (Number of transactions containing A) / (Total number of transactions)

It measures the degree of association between items and is used to identify frequent itemsets that occur frequently enough to be considered interesting or significant.
For example, if a transaction dataset contains 100 transactions and a particular itemset occurs in 20 of them, the support of the itemset would be 20%.
Confidence, on the other hand, refers to the conditional probability that a transaction containing one set of items will also contain another set of items. It measures the strength of the association between items and is used to generate association rules that express the conditional relationships between items.

Confidence(X => Y) = (Number of transactions containing X and Y) / (Number of transactions containing X)

For example, if a transaction dataset contains 100 transactions and a particular association rule has a confidence of 80%, it means that in 80% of the transactions that contain the antecedent (left-hand side) of the rule, the consequent (right-hand side) also occurs.
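Both formulas translate into a few lines of Python. A small sketch with illustrative helper names (support, confidence) and a toy dataset:

def support(transactions, itemset):
    # Fraction of transactions that contain every item in the itemset.
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, X, Y):
    # Conditional frequency of Y among the transactions that contain X.
    return support(transactions, set(X) | set(Y)) / support(transactions, X)

transactions = [{"bread", "milk"}, {"bread"}, {"milk"}, {"bread", "milk"}]
print(support(transactions, {"bread", "milk"}))       # 0.5   (2 of 4 transactions)
print(confidence(transactions, {"bread"}, {"milk"}))  # 0.666... (2 of the 3 with bread)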
5.4.1 Apriori Algorithm

Purpose: The Apriori Algorithm is an influential algorithm for mining frequent itemsets for Boolean association rules.

Key Concepts:
Frequent Itemsets: The sets of items which have minimum support (denoted by Lk for the k-itemsets).
Apriori Property: Any subset of a frequent itemset must be frequent.
Join Operation: To find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself.

Find the frequent itemsets: the sets of items that have minimum support. A subset of a frequent itemset must also be a frequent itemset (Apriori Property); i.e., if {A, B} is a frequent itemset, both {A} and {B} should be frequent itemsets.
Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets), then use the frequent itemsets to generate association rules.

The Apriori Algorithm:
The steps of the Apriori algorithm are as follows:
1. Set the minimum support threshold to a desired value.
2. Generate all frequent 1-itemsets by scanning the dataset and counting the support of each item.
3. Repeat the following steps until no more frequent itemsets can be generated:
   a. Generate candidate itemsets by joining pairs of frequent (k-1)-itemsets.
   b. Prune the candidate itemsets that contain infrequent (k-1)-itemsets.
   c. Count the support of each candidate itemset by scanning the dataset.
   d. Keep only the frequent itemsets that meet the minimum support threshold.
4. Generate association rules from the frequent itemsets by applying a minimum confidence threshold.

Pseudo code:
Join Step: Ck is generated by joining Lk-1 with itself.
Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
Algorithm: Apriori algorithm
Input:  D: input dataset
        minSup: minimum support threshold
Output: all 2- to k-frequent itemsets

1.  L1 = {1-frequent itemsets}          // found separately
2.  for (k = 2; Lk-1 != {}; k++)
3.      Ck = apriori_gen(Lk-1)          // finds k-candidate itemsets by joining and pruning Lk-1 with itself
4.      for each transaction t in D
5.          Ct = subset(Ck, t)          // finds candidate itemsets contained in t
6.          for each c in Ct
7.              c.count++
8.          end for each
9.      end for each
10.     Lk = {c in Ck | c.count >= minSup}
11. end for
12. Return the union of all Lk
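The same join/prune loop can be written as a compact, self-contained Python sketch. The function name apriori and the use of an absolute support count (rather than a percentage) are assumptions for illustration:

from itertools import combinations

def apriori(transactions, min_sup):
    """Return {frequent itemset: support count} for all cardinalities k."""
    transactions = [frozenset(t) for t in transactions]
    count = lambda s: sum(1 for t in transactions if s <= t)
    # L1: frequent 1-itemsets, found by a first scan of the dataset.
    items = {i for t in transactions for i in t}
    L = {frozenset([i]) for i in items if count(frozenset([i])) >= min_sup}
    frequent = {s: count(s) for s in L}
    k = 2
    while L:
        # Join step: build Ck from unions of frequent (k-1)-itemsets.
        C = {a | b for a in L for b in L if len(a | b) == k}
        # Prune step: a candidate with an infrequent (k-1)-subset cannot be frequent.
        C = {c for c in C if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # Scan the dataset and keep the candidates that meet min_sup.
        L = {c for c in C if count(c) >= min_sup}
        frequent.update({s: count(s) for s in L})
        k += 1
    return frequent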

Let's see an example of the Apriori Algorithm.

Transaction ID | Items
T1 | Hot Dogs, Buns, Ketchup
T2 | Hot Dogs, Buns
T3 | Hot Dogs, Coke, Chips
T4 | Chips, Coke
T5 | Chips, Ketchup
T6 | Hot Dogs, Coke, Chips

Find the frequent itemsets and generate association rules on this dataset. Assume a minimum support threshold (s = 33.33%) and a minimum confidence threshold (c = 60%).
Let's start.

minimum support count = (33.33 × 6) / 100 = 2

Candidate 1-itemsets (C1):

Item set | Sup-count
Hot Dogs | 4
Buns | 2
Ketchup | 2
Coke | 3
Chips | 4

All five 1-itemsets meet the minimum support count of 2, so L1 = C1.

Candidate 2-itemsets (C2):

Item set | Sup-count
Hot Dogs, Buns | 2
Hot Dogs, Ketchup | 1
Hot Dogs, Coke | 2
Hot Dogs, Chips | 2
Buns, Ketchup | 1
Buns, Coke | 0
Buns, Chips | 0
Ketchup, Coke | 0
Ketchup, Chips | 1
Coke, Chips | 3

Frequent 2-itemsets (L2):

Item set | Sup-count
Hot Dogs, Buns | 2
Hot Dogs, Coke | 2
Hot Dogs, Chips | 2
Coke, Chips | 3

Candidate 3-itemsets (C3):

Item set | Sup-count
Hot Dogs, Buns, Coke | 0
Hot Dogs, Buns, Chips | 0
Hot Dogs, Coke, Chips | 2

Frequent 3-itemsets (L3):

Item set | Sup-count
Hot Dogs, Coke, Chips | 2

There is only one 3-itemset with minimum support 2, so only one 3-itemset is frequent.

Frequent Itemset (I) = {Hot Dogs, Coke, Chips}

Association rules:

[Hot Dogs^Coke] => [Chips] // confidence = sup(Hot Dogs^Coke^Chips) / sup(Hot Dogs^Coke) = 2/2 × 100 = 100% // Selected
[Hot Dogs^Chips] => [Coke] // confidence = sup(Hot Dogs^Coke^Chips) / sup(Hot Dogs^Chips) = 2/2 × 100 = 100% // Selected
[Coke^Chips] => [Hot Dogs] // confidence = sup(Hot Dogs^Coke^Chips) / sup(Coke^Chips) = 2/3 × 100 = 66.67% // Selected
[Hot Dogs] => [Coke^Chips] // confidence = sup(Hot Dogs^Coke^Chips) / sup(Hot Dogs) = 2/4 × 100 = 50% // Rejected
[Coke] => [Hot Dogs^Chips] // confidence = sup(Hot Dogs^Coke^Chips) / sup(Coke) = 2/3 × 100 = 66.67% // Selected
[Chips] => [Hot Dogs^Coke] // confidence = sup(Hot Dogs^Coke^Chips) / sup(Chips) = 2/4 × 100 = 50% // Rejected

There are four strong rules (minimum confidence greater than 60%).
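Assuming the apriori sketch shown earlier is in scope, the whole worked example can be reproduced mechanically (illustrative code, not from the text):

from itertools import combinations

transactions = [
    {"Hot Dogs", "Buns", "Ketchup"},   # T1
    {"Hot Dogs", "Buns"},              # T2
    {"Hot Dogs", "Coke", "Chips"},     # T3
    {"Chips", "Coke"},                 # T4
    {"Chips", "Ketchup"},              # T5
    {"Hot Dogs", "Coke", "Chips"},     # T6
]
freq = apriori(transactions, min_sup=2)
target = frozenset({"Hot Dogs", "Coke", "Chips"})
print(freq[target])  # 2

# Try every antecedent => consequent split of the frequent 3-itemset
# and keep the rules that meet the 60% confidence threshold.
for r in (1, 2):
    for X in combinations(target, r):
        conf = freq[target] / freq[frozenset(X)]
        verdict = "Selected" if conf >= 0.60 else "Rejected"
        print(set(X), "=>", set(target - frozenset(X)), f"{conf:.2%}", verdict)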

Advantages of Apriori Algorithm:
- Scalability: The Apriori algorithm is scalable and can handle large datasets efficiently.
- Interpretability: The Apriori algorithm generates frequent itemsets and association rules that are easy to understand and interpret.
- Flexibility: The algorithm can be customized by adjusting the minimum support and confidence thresholds to suit the specific requirements of the user.
- Applicability: The Apriori algorithm can be used to mine association rules from various types of data, including market basket transactions, web logs, and biological sequences.

Disadvantages of Apriori Algorithm:
- Computational Complexity: The Apriori algorithm has a high computational complexity and may take a long time to generate frequent itemsets and association rules from large datasets.
- Memory Requirements: The algorithm requires a significant amount of memory to store the candidate itemsets and their support counts.
- Curse of Dimensionality: The performance of the algorithm may degrade rapidly as the number of items or attributes in the data increases.

Applications of Apriori Algorithm:
- Market Basket Analysis: The Apriori algorithm is commonly used in market basket analysis to identify the co-occurrence of items in customer transactions and generate recommendations for cross-selling and up-selling.
- Web Usage Mining: The algorithm can be used to mine patterns and trends in web logs, such as frequently visited pages and clickstreams, to improve website design and user experience.
- Bioinformatics: The Apriori algorithm can be applied to biological sequences, such as DNA or protein sequences, to discover frequent patterns or motifs that may be related to gene expression or function.
- Fraud Detection: The algorithm can be used to detect fraudulent behavior in financial transactions by identifying patterns of unusual or suspicious activities.
- Social Network Analysis: The algorithm can be applied to social network data to identify groups or communities of individuals with similar interests or behaviors.
