Unsupervised Unit 1

The document provides an overview of unsupervised learning, focusing on clustering techniques such as K-Means. It explains the importance of unsupervised learning in automating data grouping and its applications in various fields. Additionally, it details the K-Means algorithm, its advantages and disadvantages, and introduces Bisecting K-Means as an improvement for handling non-spherical clusters.

Unit 1 - K-Means

Introduction to Unsupervised Learning


Unsupervised learning is a machine learning (ML) technique that finds patterns in
unlabeled data. It contrasts with supervised learning, which relies on labeled data.
It is useful for pattern detection, clustering, association rule mining, and
dimensionality reduction.
Why Unsupervised Learning is Important
- Saves time by automating the grouping of data
- Used in network traffic analysis (NTA)
- Helps in threat detection and dimensionality reduction
- Simplifies datasets by removing irrelevant features
Clustering Analysis
Clustering groups similar data points into clusters, so that points within a cluster
are more alike than points in different clusters.
It enables better data profiling, customer segmentation, and dimensionality
reduction.
Common methods include K-Means, hierarchical clustering, DBSCAN, and Gaussian
Mixture Models (GMM).
K-Means Clustering
K-Means partitions data into K clusters based on similarity, where K is the number
of clusters chosen in advance.
Example: K = 5 creates 5 clusters.
Key concepts: squared Euclidean distance and cluster inertia (the within-cluster sum
of squared distances to the centroids).
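To make these two concepts concrete, the short R sketch below (the toy data and variable names are assumptions, not part of the slides) computes a squared Euclidean distance by hand and reads the cluster inertia that base R's kmeans() reports as tot.withinss.

# Squared Euclidean distance between a point and a centroid (toy values)
point    <- c(2, 3)
centroid <- c(5, 7)
sum((point - centroid)^2)                # (2-5)^2 + (3-7)^2 = 25

# Cluster inertia: total within-cluster sum of squared distances to the centroids
set.seed(42)
x  <- matrix(rnorm(100), ncol = 2)       # assumed toy 2-D data
km <- kmeans(x, centers = 3)
km$tot.withinss                          # inertia reported by kmeans()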
K-Means Algorithm Steps
1. Choose the number of clusters K (e.g., using the elbow method)
2. Randomly select K centroids
3. Assign each data point to the nearest centroid
4. Recalculate centroids
5. Repeat until convergence
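As a minimal sketch of these steps (the function simple_kmeans, the toy data, and the convergence tolerance are illustrative assumptions; empty-cluster handling is omitted), the loop below follows steps 2 to 5 directly:

simple_kmeans <- function(x, k, max_iter = 100) {
  # Step 2: randomly pick K rows of x as the initial centroids
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  for (i in seq_len(max_iter)) {
    # Step 3: assign every point to its nearest centroid
    d <- as.matrix(dist(rbind(centroids, x)))[-(1:k), 1:k]
    assignment <- max.col(-d)                       # column index of the closest centroid
    # Step 4: recalculate each centroid as the mean of its assigned points
    new_centroids <- t(sapply(seq_len(k), function(j)
      colMeans(x[assignment == j, , drop = FALSE])))
    # Step 5: stop when the centroids no longer move (convergence)
    if (all(abs(new_centroids - centroids) < 1e-8)) break
    centroids <- new_centroids
  }
  list(cluster = assignment, centers = centroids)
}

# Example call on assumed toy data
simple_kmeans(matrix(rnorm(200), ncol = 2), k = 3)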
K-Means: Pros and Cons
Advantages:
- Efficient computation
- Easy to implement

Disadvantages:
- Poor for non-spherical clusters
- Sensitive to initial centroids
Hierarchical Clustering
Builds clusters by progressively merging smaller clusters (agglomerative approach)
No need to specify K in advance
Uses dendrograms to visualize the merge hierarchy
Advantages:
- No preset K
- Good for hierarchy

Disadvantages:
- Sensitive to outliers
- Computationally expensive
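For illustration, a minimal agglomerative example with base R's hclust() (the toy data and the choice of complete linkage are assumptions):

set.seed(1)
x  <- matrix(rnorm(60), ncol = 2)            # assumed toy 2-D data
hc <- hclust(dist(x), method = "complete")   # merge clusters bottom-up
plot(hc)                                     # dendrogram of the merge hierarchy
cutree(hc, k = 3)                            # K chosen after the fact by cutting the tree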
Stopping Criteria for K-Means
- No change in centroids
- Points stay in the same cluster
- Max iterations reached
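In base R's kmeans(), the maximum-iterations criterion is controlled by the iter.max argument, and the number of iterations actually used is returned as $iter; a small sketch on assumed toy data:

set.seed(7)
x  <- matrix(rnorm(200), ncol = 2)            # assumed toy 2-D data
km <- kmeans(x, centers = 3, iter.max = 25)   # stop after at most 25 iterations
km$iter                                       # iterations actually run before stopping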
Bisecting K-Means
Improves on K-Means:
- Works with non-spherical clusters
- More efficient for large K
Uses a hybrid of partitional and hierarchical approaches
Bisecting K-Means Algorithm
1. Start with all points as one cluster
2. Bisect the cluster with the largest SSE (sum of squared errors) using K-Means with K = 2
3. Repeat until K clusters are formed
The cluster to split can be chosen by largest SSE or by largest size
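A rough sketch of this procedure (the function name bisecting_kmeans and the largest-SSE split rule are illustrative assumptions; very small clusters are not handled):

bisecting_kmeans <- function(x, k) {
  clusters <- list(x)                                  # Step 1: all points in one cluster
  sse <- function(m) sum(scale(m, scale = FALSE)^2)    # within-cluster sum of squared errors
  while (length(clusters) < k) {                       # Step 3: repeat until K clusters exist
    worst  <- which.max(sapply(clusters, sse))         # pick the cluster with the largest SSE
    km     <- kmeans(clusters[[worst]], centers = 2)   # Step 2: bisect it with 2-means
    halves <- lapply(1:2, function(j)
      clusters[[worst]][km$cluster == j, , drop = FALSE])
    clusters <- c(clusters[-worst], halves)
  }
  clusters                                             # list of K point matrices
}

# Example call on assumed toy data
bisecting_kmeans(matrix(rnorm(200), ncol = 2), k = 4)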
Map Clustering with R
Use R packages such as 'factoextra' to visualize K-Means clusters on city location data.
Plot clusters with and without predefined centers.
Code uses:
kmeans(), fviz_cluster(), and coord_flip()
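A hedged sketch of what such code could look like (the cities data frame with longitude/latitude columns and the starting centroids are assumptions; fviz_cluster() comes from the factoextra package and coord_flip() from ggplot2, which factoextra loads):

library(factoextra)                              # attaches ggplot2 as well

# Assumed toy data standing in for real city coordinates
set.seed(123)
cities <- data.frame(lon = runif(30, -10, 10),
                     lat = runif(30,  40, 55))

# K-Means without predefined centers
km <- kmeans(cities, centers = 4, nstart = 25)

# K-Means with predefined centers: pass a matrix of starting centroids
start_centers <- matrix(c(-5, 45,  0, 50,  5, 42,  8, 53),
                        ncol = 2, byrow = TRUE)
km2 <- kmeans(cities, centers = start_centers)

# Visualize the clusters; coord_flip() swaps the plot axes
fviz_cluster(km, data = cities) + coord_flip()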
