Clustering Algorithm and Analysis

This presentation covers an information technology topic.

Uploaded by

Sidra n

Clustering Algorithm and Its Applications in Data Mining
Presented by
Sidra Siddiqa BIT-21-04
Tasneem Gull BIT-21-10
Amna Noor BIT-21-04
Irsa Malik BIT-21-32
UmeTehreem BIT-21-82
Maryam Iqbal BIT-21-86
Mahnoor BIT-21-34
Clustering (Introduction)
• Clustering is a type of unsupervised machine learning.
• Cluster analysis is one of the most important research fields in data mining.
It can uncover useful information and is widely used in fields such as
market research, pattern recognition, image processing, artificial
intelligence, and Web document classification.
• It is distinguished from supervised learning by the fact that there is no a priori
output (i.e. no labels).
• The task is to learn the classification/grouping from the data.
• A cluster is a collection of objects that are similar in some way.
• Example: a group of people clustered based on their height and weight.
• Normally, clusters are created using distance measures.
• Two or more objects belong to the same cluster if they are "close" according
to a given distance (in this case a geometrical distance such as Euclidean or
Manhattan).
• For example, Bread, Eggs, Butter, and Milk may be grouped into one cluster.
• Some possible applications of clustering:
  - business intelligence applications
  - biological applications
  - data reduction: reduce data that are homogeneous (similar)
  - find "natural clusters" and describe their unknown properties
  - find useful and suitable groupings
Clustering Analysis of the Basic Concepts
• Cluster analysis is a method of studying individuals based on
their own characteristics, with the purpose of grouping similar
things together. Its principle is that individuals in the same
category have high similarity, while individuals in different
categories have low similarity (that is, the difference between
them is greater).
• Assume the data set contains n data objects, each described by p attributes.
• These objects form an n × p data matrix:

$$
\begin{pmatrix}
x_{11} & \cdots & x_{1f} & \cdots & x_{1p} \\
\vdots &        & \vdots &        & \vdots \\
x_{i1} & \cdots & x_{if} & \cdots & x_{ip} \\
\vdots &        & \vdots &        & \vdots \\
x_{n1} & \cdots & x_{nf} & \cdots & x_{np}
\end{pmatrix}
$$

where $x_{if}$ represents the value of the f-th attribute of the i-th object in
the data set. The matrix collects the attribute records of every
object in the data set, one row per object.
 Calculate the average of the absolute deviation
$$ s_f = \frac{1}{n}\left( |x_{1f} - m_f| + |x_{2f} - m_f| + \cdots + |x_{nf} - m_f| \right) $$
where $m_f$ is the mean of attribute f:
$$ m_f = \frac{1}{n}\left( x_{1f} + x_{2f} + \cdots + x_{nf} \right) $$
• The normalized measure is:
$$ z_{if} = \frac{x_{if} - m_f}{s_f} $$
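The normalization above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `standardize` and the sample heights are assumptions made up for the example.

```python
def standardize(column):
    """Normalize one attribute column using the mean absolute deviation.

    Implements m_f, s_f and z_if from the formulas above.
    Assumes the column is not constant (otherwise s_f is 0).
    """
    n = len(column)
    m_f = sum(column) / n                          # mean of attribute f
    s_f = sum(abs(x - m_f) for x in column) / n    # mean absolute deviation
    return [(x - m_f) / s_f for x in column]       # z_if for every object i


# Hypothetical heights in centimetres (illustrative data only).
heights = [150, 160, 170, 180, 190]
z = standardize(heights)
print([round(v, 3) for v in z])
```

Here the mean is 170 and the mean absolute deviation is 12, so the smallest height maps to about -1.667 and the middle one to 0. Normalizing each attribute this way keeps attributes with large numeric ranges from dominating the distance computation.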
• The corresponding distance has the following common forms for two
objects $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$.
• Euclidean distance formula:
$$ d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2} $$
• Manhattan distance formula:
$$ d(x, y) = |x_1 - y_1| + |x_2 - y_2| + \cdots + |x_n - y_n| $$
• Minkowski distance formula:
$$ d(x, y) = \left( |x_1 - y_1|^q + |x_2 - y_2|^q + \cdots + |x_n - y_n|^q \right)^{1/q} $$
• where q is a positive integer. When q = 1, it is the Manhattan
distance; when q = 2, it is the Euclidean distance.
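As a small sketch of how the three formulas relate, the Minkowski distance can be written once and specialized by the choice of q. The function name `minkowski` and the sample points are illustrative assumptions, not taken from the slides.

```python
def minkowski(x, y, q):
    """Minkowski distance between two equal-length numeric vectors."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)


x, y = (1, 2), (4, 6)        # illustrative points
print(minkowski(x, y, 1))    # q = 1: Manhattan distance, |1-4| + |2-6| = 7
print(minkowski(x, y, 2))    # q = 2: Euclidean distance, sqrt(9 + 16) = 5
```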
• For two objects described entirely by discrete variables, the
dissimilarity can be calculated by the simple matching method
as follows:
$$ d(x, y) = \frac{p - m}{p} $$
where m is the number of attributes on which objects x and y have the
same value, and p is the total number of attributes.
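The simple matching dissimilarity is easy to check by hand. The sketch below assumes two objects given as equal-length tuples of categorical values; the function name and the attribute values are hypothetical.

```python
def simple_matching(x, y):
    """Dissimilarity d = (p - m) / p for objects with discrete attributes."""
    p = len(x)                                # total number of attributes
    m = sum(a == b for a, b in zip(x, y))     # attributes with matching values
    return (p - m) / p


# Illustrative objects with three discrete attributes each.
a = ("red", "round", "small")
b = ("red", "square", "small")
print(simple_matching(a, b))   # two of three attributes match, so d = 1/3
```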
Cluster Analysis Algorithm
• First, feature selection. Features must be chosen appropriately
to include as much of the task-related information as possible.
• Second, the similarity measure, which is used to quantify
how "similar" or "dissimilar" two feature vectors are.
• Third, the clustering algorithm. Having chosen an appropriate
similarity measure, this step involves selecting a particular
clustering algorithm to reveal the clustering structure in the
data set.
• Fourth, result verification. Once the result is obtained
using the clustering algorithm, its validity needs to be verified.
• Fifth, result determination. In many cases, experts in the field of
application must use other experimental data and analysis to
interpret the clustering results and finally draw the correct
conclusions.
 Given the number of clusters k and the objective function F
Clustering Algorithm Process
• Clustering algorithm process: Feature Selection → Similarity Measure →
Clustering Algorithm → Result Verification → Result Determination
K-means Algorithm Implementation Process
• The K-means algorithm is a widely used, fast clustering analysis
method. It has high execution efficiency and can handle large
volumes of sample data.
• However, the sample size of the research design is not large, and
processing time is not the primary consideration in dealing with
this type of problem. K-means clustering can still be considered:
it provides a cluster analysis function that can perform cluster
analysis of samples or variables on a variety of data types.
Given K, the K-means algorithm is implemented in four steps:
1. Choose K points at random as cluster centres (centroids).
2. Assign each instance to its closest cluster centre using a chosen
distance measure (usually Euclidean or Manhattan).
3. Calculate the centroid of each cluster and use it as the new cluster
centre (one measure of centroid is the mean).
4. Go back to Step 2; stop when the cluster centres no longer
change.
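The four steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `k_means`, `manhattan`, `seed`, and `max_iter` are introduced here, points are tuples of numbers, ties go to the lower-numbered cluster, and an empty cluster simply keeps its previous centre.

```python
import random


def manhattan(a, b):
    """Manhattan distance between two points given as tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_means(points, k, distance=manhattan, seed=None, max_iter=100):
    rng = random.Random(seed)
    # Step 1: choose K points at random as the initial cluster centres.
    centres = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each instance to its closest cluster centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: distance(p, centres[j]))
            clusters[nearest].append(p)
        # Step 3: the mean of each cluster becomes the new centre
        # (assumption: an empty cluster keeps its old centre).
        new_centres = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centres[j]
            for j, members in enumerate(clusters)
        ]
        # Step 4: stop when the centres no longer change.
        if new_centres == centres:
            break
        centres = new_centres
    return centres, clusters


# Illustrative data: two well-separated groups of 2-D points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, clusters = k_means(pts, 2, seed=0)
print(centres)
print(clusters)
```

The `max_iter` cap is a safety guard added for the sketch; the steps as stated in the slides stop only when the centres stop changing.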
K-means algorithm: an example
• Say we have the data {20, 3, 9, 10, 9, 3, 1, 8, 5, 3,
24, 2, 14, 7, 8, 23, 6, 12, 18} and we are asked to
use K-means to cluster these data into 3 groups.
• Assume we use Manhattan distance.
• Step one: Choose K points at random to be cluster centres.
• Say 6, 12, 18 are chosen.
• Step two: Assign each instance to its closest cluster
centre using Manhattan distance. For instance:
  - 20 is assigned to cluster 3
  - 3 is assigned to cluster 1
• Step two continued: 9 is equally close to the centres of
clusters 1 and 2, so it could be assigned to either; let us say
it is arbitrarily assigned to cluster 2.
• Repeat for all the rest of the instances.
• Step three: Calculate the centroid (i.e. mean)
of each cluster and use it as the new cluster
centre.
• End of iteration 1.
• Step four: Iterate (repeat steps 2 and 3) until
the cluster centres no longer change.
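The example can be run to completion with a short one-dimensional sketch. The function name `k_means_1d` is an assumption introduced for illustration. One difference from the slides is that ties here go to the lower-numbered cluster, so 9 lands in cluster 1 on the first pass rather than cluster 2; for this data set both choices appear to lead to the same final clusters.

```python
def k_means_1d(data, centres):
    """K-means on plain numbers, starting from the given centres."""
    centres = list(centres)
    while True:
        # Step two: assign each value to the nearest centre
        # (in one dimension, Manhattan distance is |x - c|).
        clusters = [[] for _ in centres]
        for x in data:
            j = min(range(len(centres)), key=lambda i: abs(x - centres[i]))
            clusters[j].append(x)
        # Step three: the mean of each cluster becomes the new centre.
        new_centres = [
            sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)
        ]
        # Step four: stop once the centres no longer change.
        if new_centres == centres:
            return centres, clusters
        centres = new_centres


data = [20, 3, 9, 10, 9, 3, 1, 8, 5, 3, 24, 2, 14, 7, 8, 23, 6, 12, 18]
centres, clusters = k_means_1d(data, [6, 12, 18])
for centre, members in zip(centres, clusters):
    print(round(centre, 3), sorted(members))
# 3.286 [1, 2, 3, 3, 3, 5, 6]
# 9.625 [7, 8, 8, 9, 9, 10, 12, 14]
# 21.25 [18, 20, 23, 24]
```

After the first iteration the centres in this sketch move from 6, 12, 18 to about 5.333, 12, and 21.25, and they stop changing after a few more passes.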
Conclusion
• With the development of society, science,
and technology, big data has received more
and more attention, and the amount of
information that people can use keeps
increasing. However, users' ability to process
and understand this information remains
the same. How to accurately find the parts
of interest in such huge volumes of data,
and how to classify this information,
involves a new direction: data mining
research. The text proposes
a method of research and analysis using
