Data Science

The document provides an overview of clustering in data science, detailing its importance as a machine learning technique for grouping data points based on similarity. It discusses various types of clustering, methods, algorithms, and applications, highlighting its role in fields such as marketing and medical imaging. Additionally, it mentions the potential of clustering to enhance supervised learning algorithms by using cluster labels as independent variables.

Uploaded by

mujjuh308
Copyright
© All Rights Reserved

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/343059183

Clustering in Data Science

Presentation · July 2020

Citations: 0 · Reads: 150

1 author:

Nilu Singh
Koneru Lakshmaiah Education Foundation
121 PUBLICATIONS 383 CITATIONS


All content following this page was uploaded by Nilu Singh on 04 May 2023.



Clustering in Data Science

Dr. Nilu Singh


School of Computer Applications
Babu Banarasi Das University
Lucknow-UP
Content

• Introduction of Clustering
• Clustering in Machine Learning
• Need of Clustering
• Types of Clustering
• Clustering Methods
• Types of clustering algorithms
• Applications of Clustering
• References
Clustering

• Clustering is a machine learning technique that involves grouping data points.
• Given a set of data points, a clustering algorithm assigns each data point to a specific group.
Clustering in Machine Learning

• It is basically a type of unsupervised learning method.
• Clustering is the task of dividing the population or data points into a number of groups.
• Data points in the same group are more similar to each other than to data points in other groups.
Cont...
• A cluster is basically a collection of objects grouped on the basis of the similarity and dissimilarity between them.
Need of Clustering
• Clustering is important because it determines the intrinsic grouping among unlabeled data.
• There is no single universal criterion for a good clustering.
• It depends on the user, who chooses whichever criteria satisfy their needs.
Types of Clustering

Clustering can be divided into two subgroups:
Hard Clustering: Each data point either belongs to a cluster completely or does not belong to it at all.
Soft Clustering: Instead of putting each data point into exactly one cluster, each data point is assigned a probability or likelihood of belonging to each cluster.
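The difference between the two subgroups can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two cluster centers, the sample point, the function names, and the choice of a softmax over negative squared distances as the soft membership score are all hypothetical and do not come from the presentation.

```python
import math

def hard_assign(point, centers):
    """Hard clustering: return the index of the single nearest center."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))

def soft_assign(point, centers, stiffness=1.0):
    """Soft clustering: return one membership weight per center (weights sum to 1).

    The weights are a softmax over negative squared distances, which is
    an illustrative choice and not the only possible one.
    """
    scores = [-stiffness * math.dist(point, c) ** 2 for c in centers]
    top = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical centers and sample point
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

label = hard_assign(point, centers)            # one cluster index
memberships = soft_assign(point, centers)      # one weight per cluster
```

The hard assignment returns a single cluster index, while the soft assignment returns a weight for every cluster, so a point lying between two clusters can be described as partly belonging to both.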
Clustering Methods
➢ Density-Based Methods
➢ Hierarchical-Based Methods
➢ Partitioning Methods
➢ Grid-Based Methods
Types of clustering algorithms

• More than 100 clustering algorithms are known, but only a few are in popular use. They fall into model families such as:
➢ Connectivity models
➢ Centroid models
➢ Distribution models
➢ Density models
Cont...
Connectivity models:
• These models are based on the notion that data points closer together in the data space are more similar to each other than data points lying farther apart.
• These models are very easy to interpret but lack the scalability needed to handle big datasets.
• Examples of these models are the hierarchical clustering algorithm and its variants.
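A minimal sketch of one connectivity model, agglomerative (bottom-up) hierarchical clustering with single linkage, is given here. The function name, the sample points, and the choice of single linkage are assumptions made for illustration; other linkage rules exist.

```python
import math

def single_linkage(points, n_clusters):
    """Start with one cluster per point and repeatedly merge the two
    closest clusters until only n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance = closest pair of members
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])        # merge cluster b into cluster a
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical 2-D points forming two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
groups = single_linkage(points, 2)
```

Every merge step compares all pairs of clusters, so the work grows quickly with the number of points, which is consistent with the scalability limitation noted above.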
Cont...
Centroid models:
• These are iterative clustering algorithms in which the notion of similarity is derived from the closeness of a data point to the centroid of each cluster.
• Ex: K-Means clustering algorithm.
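The iterative idea behind K-Means can be sketched as follows: assign each point to its nearest centroid, recompute each centroid as the mean of its points, and repeat until nothing changes. The function name, starting centroids, and sample data are assumptions chosen for illustration; practical implementations usually also pick the starting centroids more carefully.

```python
import math

def kmeans(points, centroids, n_iter=10):
    """Plain k-means from given starting centroids.
    Returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # empty cluster keeps its centroid
        if new_centroids == centroids:              # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two compact groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```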
Cont...
Distribution models:
• These clustering models are based on the notion of how probable it is that all data points in a cluster belong to the same distribution.
• An example of these models is the Expectation-maximization algorithm, which uses multivariate normal distributions.
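The Expectation-maximization idea can be sketched for a deliberately simplified case. The code below assumes one-dimensional data and a mixture of univariate normal distributions rather than the multivariate case named above; the function names, starting means, sample data, and variance floor are all hypothetical choices made to keep the example short.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means.
    Returns (weights, means, variances, responsibilities)."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each point came from each component
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)      # floor avoids a zero variance
    return weights, means, variances, resp

# Hypothetical data drawn around two values
data = [0.8, 1.0, 1.2, 0.9, 1.1, 4.9, 5.0, 5.2, 5.1, 4.8]
weights, means, variances, resp = em_gmm_1d(data, means=[0.0, 6.0])
```

Each row of `resp` holds the membership probabilities of one data point, which makes this a soft clustering in the sense described earlier.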
Cont...
Density Models:
• These models search the data space for regions of differing data-point density and assign the points within each dense region to the same cluster.
• Examples of density models are DBSCAN and OPTICS.
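A compact, unoptimized sketch of the DBSCAN idea is shown here: a point with enough neighbors within a radius starts a cluster, the cluster grows through other such points, and points in sparse regions are marked as noise. The parameter names `eps` and `min_pts`, the sample points, and the use of -1 as the noise label are conventions assumed for this illustration.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself)
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE                  # may later become a border point
            continue
        cluster += 1                           # i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster            # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:           # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical data: two dense squares and one isolated point
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (4, 20)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Unlike K-Means, the number of clusters is not given in advance here; it follows from the density parameters.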
Applications of Clustering

Some of the most popular applications of clustering are:
➢ Recommendation engines
➢ Market segmentation
➢ Social network analysis
➢ Search result grouping
➢ Medical imaging
➢ Image segmentation
➢ Anomaly detection
Cont...
Marketing: It can be used to discover and characterize customer segments for marketing purposes.
Libraries: It is used to cluster books on the basis of their topics and information.
Cont...
City Planning: It is used to group houses and to study their values based on their geographical locations and other factors.
Earthquake studies: By studying earthquake-affected areas, we can determine the dangerous zones.
Improving Supervised Learning Algorithms with Clustering

• Clustering is an unsupervised machine learning approach.
• However, it can also be used to improve the accuracy of supervised machine learning algorithms: the data points are clustered into similar groups, and the resulting cluster labels are used as independent variables in the supervised machine learning algorithm.
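The idea of using cluster labels as independent variables can be sketched as a small feature-engineering step. The feature rows, the centroids (assumed here to have been produced by an earlier clustering step), and the function names are all hypothetical.

```python
import math

def nearest_centroid(row, centroids):
    """Cluster label of a row = index of its nearest centroid."""
    dists = [math.dist(row, c) for c in centroids]
    return dists.index(min(dists))

def add_cluster_feature(rows, centroids):
    """Return copies of the rows with the cluster label appended as an
    extra column, ready to be passed to a supervised learner."""
    return [list(row) + [nearest_centroid(row, centroids)] for row in rows]

# Hypothetical feature rows (age, yearly spend) and centroids
# assumed to come from a previous clustering run
rows = [(25.0, 3.0), (27.0, 4.0), (52.0, 20.0), (55.0, 22.0)]
centroids = [(26.0, 3.5), (53.5, 21.0)]

augmented = add_cluster_feature(rows, centroids)
```

Because the label is a category rather than a quantity, learners that treat inputs as numbers may need it one-hot encoded before training.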
References
➢ https://www.dummies.com/programming/big-data/data-science/clustering-algorithms-used-in-data-science/
➢ https://www.geeksforgeeks.org/clustering-in-machine-learning/
➢ https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/
➢ https://medium.com/cracking-the-data-science-interview/an-introduction-to-big-data-clustering-1a911b83e590