Data Science

The document provides an overview of clustering in data science, detailing its importance as a machine learning technique for grouping data points based on similarity. It discusses various types of clustering, methods, algorithms, and applications, highlighting its role in fields such as marketing and medical imaging. Additionally, it mentions the potential of clustering to enhance supervised learning algorithms by using cluster labels as independent variables.

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/343059183

Clustering in Data Science

Presentation · July 2020

1 author:

Nilu Singh
Koneru Lakshmaiah Education Foundation
All content following this page was uploaded by Nilu Singh on 04 May 2023.

Clustering in Data Science

Dr. Nilu Singh

School of Computer Applications
Babu Banarasi Das University
Lucknow-UP
Content

• Introduction of Clustering
• Clustering in Machine Learning
• Need of Clustering
• Types of Clustering
• Clustering Methods
• Types of clustering algorithms
• Applications of Clustering
• References
Clustering

• Clustering is a machine learning technique that groups data points.
• Given a set of data points, a clustering algorithm assigns each data point to a specific group.
Clustering in Machine Learning

• It is a type of unsupervised learning method.
• Clustering is the task of dividing the population or data points into a number of groups.
• Data points in the same group are more similar to one another than to data points in other groups.
• A cluster is therefore a collection of objects grouped on the basis of their similarity to each other and their dissimilarity to objects in other clusters.
Need of Clustering
• Clustering is important because it determines the intrinsic grouping among unlabeled data.
• There is no single universal criterion for a good clustering.
• It depends on the user and on which criteria satisfy their need.
Types of Clustering

Clustering can be divided into two subgroups:
Hard Clustering: Each data point either belongs to a cluster completely or does not belong to it at all.
Soft Clustering: Instead of assigning each data point to exactly one cluster, each data point is given a probability or likelihood of belonging to each cluster.
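
The difference between the two subgroups can be sketched in a few lines of Python. This is a minimal illustration, not a full algorithm: the data values, the two fixed centres, and the `scale` width parameter are all assumptions made for the example, and the Gaussian-shaped weighting is just one possible way to produce soft memberships.

```python
import math

# Hypothetical 1-D data and two fixed cluster centres (illustrative values only).
points = [1.0, 1.5, 5.0, 9.0]
centres = [1.0, 9.0]

def hard_assign(x, centres):
    """Hard clustering: return the index of the single nearest centre."""
    return min(range(len(centres)), key=lambda k: abs(x - centres[k]))

def soft_assign(x, centres, scale=2.0):
    """Soft clustering: return one membership weight per centre, summing to 1.

    The weights come from a Gaussian-shaped kernel of the distance;
    `scale` is an assumed width parameter chosen for this sketch.
    """
    weights = [math.exp(-((x - c) ** 2) / (2 * scale ** 2)) for c in centres]
    total = sum(weights)
    return [w / total for w in weights]

for x in points:
    memberships = [round(p, 3) for p in soft_assign(x, centres)]
    print(x, hard_assign(x, centres), memberships)
```

The point at 5.0 lies midway between the two centres: the hard assignment still has to pick one cluster, while the soft assignment reports equal membership in both.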
Clustering Methods
• Density-Based Methods
• Hierarchical Methods
• Partitioning Methods
• Grid-Based Methods
Types of clustering algorithms

• More than 100 clustering algorithms are known, but only a few are in popular use. They fall into model families such as:
  • Connectivity models
  • Centroid models
  • Distribution models
  • Density models
Connectivity models:
• These models are based on the notion that data points closer together in data space are more similar to each other than data points lying farther apart.
• These models are very easy to interpret but lack scalability for handling big datasets.
• Examples of these models are the hierarchical clustering algorithm and its variants.
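
As an illustration of a connectivity model, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering with single linkage. The function name, the 1-D data, and the choice of single linkage are assumptions for the example; other variants use different linkage rules.

```python
def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain (1-D points)."""
    clusters = [[p] for p in points]  # start with every point in its own cluster

    def distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical data: two tight groups and one isolated point.
result = single_linkage([1.0, 1.2, 1.1, 8.0, 8.3, 15.0], n_clusters=3)
print(result)
```

Every merge step compares all pairs of clusters, which is why this naive form becomes slow on large datasets, in line with the scalability limitation noted above.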
Centroid models:
• These are iterative clustering algorithms in which similarity is defined by how close a data point is to the centroid of a cluster.
• Ex: the K-Means clustering algorithm.
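
The iterative idea behind K-Means can be sketched as follows. This is a simplified version under stated assumptions: the points are 2-D tuples, the starting centroids are supplied by the caller rather than chosen randomly, and the data values are made up for the example.

```python
def kmeans(points, centroids, n_iter=10):
    """Lloyd-style k-means on 2-D points with caller-supplied starting centroids."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2 + (p[1] - centroids[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

The result depends on the starting centroids, so practical implementations usually run the algorithm from several initializations and keep the best outcome.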
Distribution models:
• These clustering models are based on how probable it is that all data points in a cluster belong to the same distribution.
• An example is the Expectation-Maximization algorithm, which uses multivariate normal distributions.
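
A rough sketch of Expectation-Maximization is shown below. To keep it short it is deliberately simplified to a mixture of two one-dimensional normal distributions rather than multivariate ones; the function names, starting means, variance floor, and data are all assumptions for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, means, n_iter=50):
    """EM for a mixture of two 1-D Gaussians; returns (weights, means, variances)."""
    weights = [0.5, 0.5]
    variances = [1.0, 1.0]
    means = list(means)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the responsibility-weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6,  # floor to avoid a collapsed, zero-variance component
            )
    return weights, means, variances

# Hypothetical data drawn around two values, 1.0 and 5.0.
data = [0.8, 1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_two_gaussians(data, means=[0.0, 6.0])
print(weights, means, variances)
```

The responsibilities computed in the E-step are soft cluster memberships, so this kind of model is a natural fit for soft clustering.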
Density models:
• These models search the data space for areas of varying density of data points and treat the dense regions as clusters.
• Examples of density models are DBSCAN and OPTICS.
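
The following is a minimal sketch of the DBSCAN idea on 1-D data, written from the general description of the algorithm rather than from any particular library. The parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a core point) follow common usage, and the data values are made up for the example.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN on 1-D points; returns one label per point (-1 = noise)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points)) if abs(points[i] - points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels

# Hypothetical data: two dense groups and one isolated point.
labels = dbscan([1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 20.0], eps=0.15, min_pts=2)
print(labels)
```

Unlike K-Means, the number of clusters is not given in advance, and points in sparse regions are left unassigned as noise.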
Applications of Clustering

Some of the most popular applications of clustering are:
• Recommendation engines
• Market segmentation
• Social network analysis
• Search result grouping
• Medical imaging
• Image segmentation
• Anomaly detection
Marketing: It can be used to discover and characterize customer segments for marketing purposes.
Libraries: It is used to cluster books on the basis of topics and information.
City Planning: It is used to group houses and to study their values based on their geographical locations and other factors.
Earthquake studies: By learning the earthquake-affected areas, we can determine the dangerous zones.
Improving Supervised Learning Algorithms with Clustering

• Clustering is an unsupervised machine learning approach.
• However, it can also be used to improve the accuracy of supervised machine learning algorithms: the data points are clustered into similar groups, and the cluster labels are then used as independent variables in the supervised algorithm.
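
One way to turn cluster labels into independent variables is sketched below. The helper names, the feature rows, and the centroid values are hypothetical; in practice the centroids would come from a clustering algorithm fitted on the training data, and the augmented rows would then be passed to whichever supervised model is in use.

```python
def nearest_centroid(row, centroids):
    """Index of the centroid closest to `row` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(row, centroids[k])),
    )

def add_cluster_features(rows, centroids):
    """Append a one-hot encoding of each row's cluster label to its features."""
    augmented = []
    for row in rows:
        label = nearest_centroid(row, centroids)
        one_hot = [1.0 if k == label else 0.0 for k in range(len(centroids))]
        augmented.append(list(row) + one_hot)
    return augmented

# Hypothetical feature rows and centroids (illustrative values only).
X = [[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.5]]
centroids = [[1.2, 1.9], [8.5, 8.8]]
X_augmented = add_cluster_features(X, centroids)
print(X_augmented)
```

The label is one-hot encoded here because cluster numbers are arbitrary identifiers with no meaningful order; whether the extra columns actually improve accuracy has to be checked on held-out data.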
References
• https://www.dummies.com/programming/big-data/data-science/clustering-algorithms-used-in-data-science/
• https://www.geeksforgeeks.org/clustering-in-machine-learning/
• https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/
• https://medium.com/cracking-the-data-science-interview/an-introduction-to-big-data-clustering-1a911b83e590