2021 International Conference on Electronics, Communications and Information Technology (ICECIT), 14-16 September 2021, Khulna, Bangladesh.
K-Cosine-Means Clustering Algorithm
Md. Kafi Khan¹, Sakil Sarker², Syed Mahmud Ahmed³, Mozammel H A Khan⁴
Department of Computer Science and Engineering
East West University
Aftabnagar, Dhaka-1212, Bangladesh
¹[email protected], ²[email protected], ³mahmudewu17@gmail.com, ⁴[email protected]
Abstract-The K-means algorithm is one of the most widely used unsupervised clustering techniques in data mining. This paper presents an extension of the K-means algorithm named the K-cosine-means algorithm. While the K-means algorithm initializes the centroids randomly and uses the Euclidean distance measure to assign data points to clusters, our proposed algorithm inherits a systematic approach from K-means++ to initialize the centroids and utilizes cosine similarity to assign data points to clusters. We have performed experiments on both homogeneous datasets (the Iris and Seeds datasets) and a heterogeneous dataset (the Hepatitis dataset). From the experimental results, we have observed better clustering accuracy on the homogeneous datasets compared to other variants of the K-means algorithm, namely, K-means, iK-means, K-means++, WK-means, MWK-means, iWK-means, and iMWK-means. However, for the heterogeneous dataset, we have observed better clustering accuracy compared to the standard K-means, K-means++, and iK-means algorithms.

Index Terms-Centroid initialization, Cosine similarity, K-Cosine-Means clustering, K-Means clustering, Unsupervised learning

I. INTRODUCTION

Due to the astounding proliferation of data caused by modern technologies such as the internet, human limitations make it improbable to manually analyze such vast amounts of information in any meaningful way. Consequently, data mining has gained major traction in recent decades as a means of detecting patterns in such large collections of data [1]. Clustering is an unsupervised data mining method that classifies raw data reasonably and searches for the underlying patterns that exist in datasets [2]. In other words, clustering is the division of datasets into groups based on the mutual similarity among instances; in doing so, it simplifies data and increases comprehensibility. Clustering has been effectively applied to many diverse problems such as anomaly detection [3], data summarization [4], and malware detection [5]. While some information may be lost in the process [6], clustering still remains an important method for gaining knowledge by determining patterns within large amounts of data.

In the data mining literature, clustering algorithms are usually divided into two groups, namely, hierarchical and partitional. Hierarchical algorithms generate dendrograms by continuously merging or dividing clusters, whereas partitional clustering requires prior knowledge about the number of clusters within a dataset before assigning data instances to their corresponding clusters through iterations. Among the partitional algorithms, K-means [7], [8] is the simplest clustering algorithm in terms of the number of criteria it takes into account while clustering. As a result, it is bound to have some shortcomings; the most pre-eminent ones are (i) the user has to make preconceived assumptions about the number of clusters, (ii) it does not always yield optimum results, (iii) the quality of clustering is predetermined by the initial guess of the centroids, (iv) the presence of outliers tends to drag the centroids towards a suboptimal position, and (v) it assumes all the features carry the same weight [9]. Consequently, there has been considerable research centered around addressing some of these shortcomings of the K-means algorithm.

In the works of [10]-[12], the authors focused solely on solving the fifth issue of the K-means algorithm and, in turn, proposed the Weighted K-means (WK-means) algorithm. This algorithm assigns weights to each of the features based on the significance of their impact while clustering. The work of [13] is a further extension of the WK-means algorithm proposed by Huang et al. [10], where cluster-independent weights are used instead of cluster-specific weights. The authors have done extensive research on the subject and coined algorithms such as MWK-means [13] and CMWK-means [9].

The first drawback of the K-means algorithm is tackled by the Intelligent K-means (iK-means) algorithm introduced in [3]. The algorithm acts upon normalized data and selects the data instances farthest away from the center as initial seeds. Furthermore, the clusters are formed by taking into account the distances between entities as well as their distances from the center.

There is still scope for improving K-means and its variants to increase the clustering accuracy. In this paper, we present an improvement of the standard K-means algorithm named the K-cosine-means algorithm. The difference between the two algorithms is that ours uses cosine similarity instead of Euclidean distance to assign data points to clusters. To evaluate the performance of our algorithm, we have applied it to multiple datasets, namely, the Iris, Seeds, and Hepatitis datasets. In doing so, we have witnessed an increase in performance in terms of accuracy compared to K-means and other similar algorithms.

The rest of the paper is organized as follows: In Section II, we discuss our proposed methodology. We present our experimental results with analysis in comparison to previously published results in Section III. Finally, we conclude the paper in Section IV with directions for future work.

978-1-6654-2363-2/21/$31.00 ©2021 IEEE
Algorithm 1 Method for centroid initialization
1: Choose one centroid randomly from the data instances.
2: Calculate the Euclidean distances of all data points that were not chosen as centroids from all of the chosen centroids, and select the minimum distance for each point.
3: Using a weighted probability distribution, choose the next centroid from these data points with a probability proportional to the minimum distance calculated in step 2.
4: Repeat steps 2 and 3 until the required number of centroids are initialized.
II. PROPOSED METHODOLOGY
Fig. 1: Example Dataset (scatter plot of the six data points A-F of Table I on the X and Y axes).
Our proposed algorithm can be divided into two segments, specifically, (A) initializing centroids and (B) clustering.

A. Initializing Centroids

One of the major drawbacks of the K-means algorithm resides in the selection of the initial centroids. We have found that the quality of the initial centroids can deteriorate the accuracy of the clusters up to 47.33% for the Iris dataset. Theoretically, if we select the initial centroids algorithmically rather than randomly, the quality of the clusters has the potential to improve. The K-means++ algorithm proposed by Arthur et al. [14] presents a novel approach for selecting the initial centroids. By maximizing the probability that the initial centroids come from different clusters, the algorithm increases the probability of generating final clusters with comparatively better fitness. The process of selecting the initial centroids is shown in Algorithm 1.

TABLE I: Example Dataset
Data Point   X   Y
A            1   2
B            2   1
C            3   4
D            4   5
E            6   6
F            7   5

TABLE II: Distance of all the data points from {B}
Data Point   B       Minimum
A            1.414   1.414
B            0       0
C            3.162   3.162
D            4.472   4.472
E            6.403   6.403
F            6.403   6.403
TABLE III: Distance of all the data points from {B, E}
Data Point   B       E       Minimum
A            1.414   6.403   1.414
B            0       6.403   0
C            3.162   3.605   3.162
D            4.472   2.236   2.236
E            6.403   0       0
F            6.403   1.414   1.414

Let us consider Fig. 1 and the corresponding Table I as an example of a two-dimensional dataset having six data points and three clusters. So, we need to initialize three centroids, one for each cluster. First, we randomly choose data point B as a centroid. In Table II, we calculate the distances of all the data points from B. In doing so, we get E and F with the highest distance from B. As a result, we can randomly choose either E or F as the second centroid; here, we have chosen E. Finally, in Table III, we calculate the distances of all the data points from the previously selected centroids {B, E} and report the minimum distance for each point. According to the minimum distances, C yields the highest value, so C becomes the third centroid. Thus, after applying this procedure, the initial centroids are {B, C, E}, which clearly belong to the three separate clusters in Fig. 1. Before generating clusters, we have incorporated this method for initializing the centroids.
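The following Python snippet is a minimal sketch of this initialization step under our reading of Algorithm 1; it is illustrative only (the paper does not publish code), and the function name initialize_centroids is our own. Note that the next centroid is sampled with probability proportional to the minimum distance, as Algorithm 1 states, whereas the walk-through above simply picks the farthest point outright.

```python
import numpy as np

def initialize_centroids(points, k, rng=None):
    """Sketch of Algorithm 1: pick k initial centroids from `points` (an (n, d) array)."""
    rng = np.random.default_rng(rng)
    n = len(points)
    chosen = [rng.integers(n)]                      # step 1: first centroid at random
    while len(chosen) < k:
        # step 2: minimum Euclidean distance from every point to the chosen centroids
        diffs = points[:, None, :] - points[chosen][None, :, :]
        min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
        # step 3: sample the next centroid with probability proportional to that distance
        chosen.append(rng.choice(n, p=min_dist / min_dist.sum()))
    return points[chosen]                           # step 4 is handled by the loop

# Example dataset from Table I (points A-F).
data = np.array([[1, 2], [2, 1], [3, 4], [4, 5], [6, 6], [7, 5]], dtype=float)
print(initialize_centroids(data, k=3, rng=1))
```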
B. Clustering

Our K-cosine-means algorithm inherits a few characteristics from the K-means algorithm. However, instead of using the Euclidean distance, we have implemented a cosine-similarity-based measure. Cosine similarity measures the cosine of the angle between two non-zero vectors to determine the similarity between them. Therefore, it can also be defined as the inner product of the two vectors normalized to have a length of 1. Given two non-zero vectors a and b such that $\|a\| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}$ and $\|b\| = \sqrt{b_1^2 + b_2^2 + \cdots + b_n^2}$, the cosine similarity between these two vectors is formulated as

$$\cos\theta = \frac{a \cdot b}{\|a\|\,\|b\|}$$
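As a quick numerical check of this formula (our own illustration, not code from the paper), the cosine similarity of two small vectors, here the points A and B from Table I, can be computed directly with NumPy:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||), defined for non-zero vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0])  # point A from Table I
b = np.array([2.0, 1.0])  # point B from Table I
print(cosine_similarity(a, b))  # 0.8
```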
Algorithm 2 Proposed clustering algorithm
1: Define the number of clusters, denoted by K.
2: Using the method for choosing initial centroids shown in Algorithm 1, designate K data points as the initial centroids C1, C2, ..., CK.
3: For each data point x, calculate the cosine similarity between x and all centroids C1, C2, ..., CK.
4: From step 3, assign each data point x to the cluster with centroid Ci such that x has the maximum similarity with Ci amongst all the centroids C1, C2, ..., CK.
5: Average the coordinates of all data points in each cluster and select the averaged coordinates as the new centroid.
6: Repeat steps 3 to 5 until the centroids can no longer be updated.

A higher value of cos θ is desirable since it indicates closeness of the two data points. Our proposed algorithm is shown in Algorithm 2.
Our proposed algorithm is shown in Algorithm 2.
However, we can theoretically predict that the algorithm
may perform poorly in terms of clustering datasets with hetero- different variations of K-means algorithms from [15] and [13].
geneous data. Heterogeneous datasets have different attribute To evaluate the performance of our clustering model, we have
types, e.g. integer, real, and categorical. When categorical data conducted our experiment on real datasets, namely, Iris, Seeds,
are arbitrarily assigned numerical values, it causes a loss of and Hepatitis datasets downloaded from the UCI Machine
information as the randomly assigned numerical values have Learning Repository [16].
no correlation with the original essence of the data. This
problem becomes more prominent when a dataset contains A. Iris Dataset
both real and categorical features. Even though categorical The Iris dataset [17] holds the measurements of petals
data poses a problem while representing in vector space, it and sepals of three different Iris species, namely, Setosa,
is not that pronounced when a dataset contains only categor- Versicolor, and Virginica. This dataset contains 3 classes (K
ical data as all the data were converted following the same = 3), 150 data instances, and 4 real-valued features. Upon
convention. However, when both of the data types are present applying our K-cosine-means algorithm, we have achieved
in the same dataset, as the converted values do not hold the an accuracy of 97.33%. The confusion matrix is displayed
essence of the categorical data, the cosine similarity becomes in Table IV, and comparison between variants of K-means
less meaningful. As a result, data points are not accurately is tabulated in Table V. Compared to these algorithms, our
assigned to their proper clusters. algorithm has produced the maximum accuracy when applied
to the Iris datasets.
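As a toy illustration of this effect (our own example, not taken from the paper), the same pair of records yields different cosine similarities under two equally arbitrary integer encodings of a categorical feature:

```python
import numpy as np

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Two hypothetical records: one real feature (age) and one categorical feature (blood type).
# Encoding 1 maps {A: 1, O: 4}; encoding 2 maps {A: 4, O: 3}. Both choices are arbitrary.
rec1_enc1, rec2_enc1 = np.array([30.0, 1.0]), np.array([32.0, 4.0])
rec1_enc2, rec2_enc2 = np.array([30.0, 4.0]), np.array([32.0, 3.0])

print(cos_sim(rec1_enc1, rec2_enc1))  # ~0.996 under the first encoding
print(cos_sim(rec1_enc2, rec2_enc2))  # ~0.999 under the second encoding
```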
C. Experimental Settings

Our proposed K-cosine-means algorithm was run on a Windows PC with an Intel 8th-generation Core i3 CPU (2 cores with a base frequency of 2.20 GHz), 8 GB of RAM, and a 128 GB NVMe SSD. We have coded the K-cosine-means algorithm in Python with NumPy and Pandas. The seed for the NumPy random number generator was set to 1.
III. RESULT ANALYSIS

In this paper, the performance of the different models is evaluated with the accuracy measure and the confusion matrix. The accuracy measure is defined as

$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions Made}} \times 100\% \quad (1)$$

The confusion matrix demonstrates the performance of a model by tabulating its successes and failures in classifying all the classes. For result analysis, we have collected the accuracies of different variants of the K-means algorithm from [15] and [13]. To evaluate the performance of our clustering model, we have conducted our experiments on real datasets, namely, the Iris, Seeds, and Hepatitis datasets downloaded from the UCI Machine Learning Repository [16].

A. Iris Dataset

The Iris dataset [17] holds the measurements of petals and sepals of three different Iris species, namely, Setosa, Versicolor, and Virginica. This dataset contains 3 classes (K = 3), 150 data instances, and 4 real-valued features. Upon applying our K-cosine-means algorithm, we have achieved an accuracy of 97.33%. The confusion matrix is displayed in Table IV, and the comparison with variants of K-means is tabulated in Table V. Compared to these algorithms, our algorithm has produced the maximum accuracy when applied to the Iris dataset.

TABLE IV: Confusion Matrix of Iris Dataset
Actual/Predicted   Setosa   Versicolor   Virginica
Setosa             50       0            0
Versicolor         0        46           4
Virginica          0        0            50

TABLE V: Comparative Accuracy on Iris Dataset
Algorithm          Accuracy
K-Cosine-means     97.33%
WK-means [9]       96.0%
MWK-means [9]      96.7%
iWK-means [13]     96.7%
iMWK-means [13]    96.7%
K-means [9]        89.3%
iK-means [13]      88.7%

B. Hepatitis Dataset

The Hepatitis dataset [18] contains 155 data instances, 18 features, and 2 classes (K = 2). Out of the 18 features of the dataset, 12 are categorical, that is, their values are binary encoded. The confusion matrix and the performance comparison with other algorithms are presented in Tables VI and VII, respectively. While our algorithm did perform better than the standard K-means, K-means++, and iK-means algorithms, it was not able to outperform some of the other K-means variants. This is in line with our earlier prediction that, when applied to datasets that contain a mixture of features such as categorical values and real values, our K-cosine-means approach will display decreased performance.

TABLE VI: Confusion Matrix of Hepatitis Dataset
Actual/Predicted   True   False
(numeric cell values are not legible in the source)

TABLE VII: Comparative Accuracy on Hepatitis Dataset
Algorithm          Accuracy
iMWK-means [13]    84.52%
MWK-means [9]      80.0%
WK-means [9]       80.0%
iWK-means [13]     78.71%
K-Cosine-means     77.50%
iK-means [13]      72.26%
K-means [9]        72.26%

C. Seeds Dataset

The Seeds dataset [19] consists of measurements of three different wheat kernels, namely, Kama, Rosa, and Canadian. This dataset contains 210 data instances, 7 real-valued features, and 3 classes (K = 3).
TABLE VIII: Confusion Matrix of Seeds Dataset
Actual/Predicted   Kama   Rosa   Canadian
Kama               61     4      5
Rosa               2      67     1
Canadian           8      0      62
TABLE IX: Comparative Accuracy on Seeds Dataset
Algorithm          Accuracy
K-Cosine-means     90.47%
K-means [15]       89.20%
K-means++ [15]     89.0%
iK-means [15]      87.10%

We found only a limited number of works in the literature that report their algorithm's accuracy on the Seeds dataset. The confusion matrix and the performance comparison are displayed in Tables VIII and IX, respectively. Our K-cosine-means algorithm has produced better accuracy than the K-means, K-means++, and iK-means algorithms.

Datasets that have a mixture of both real and categorical features are called heterogeneous datasets, and they are susceptible to information loss when their categorical values are converted to numerical values. As we previously predicted, one of the major drawbacks of our algorithm is the absence of a method that can handle such a mixture of data types properly. In our experiment, the Hepatitis dataset was heterogeneous, while the Iris and Seeds datasets were homogeneous. As is evident from our results, our algorithm only falls short when applied to the Hepatitis dataset due to the presence of heterogeneous data. On the other hand, our algorithm performs considerably well when applied to homogeneous datasets.
IV. CONCLUSION

From the empirical analysis gained from our experiments, we can conclude that using the cosine similarity measure to assign data points to clusters does improve the accuracy for some datasets when compared to the standard K-means and other K-means-like algorithms. More specifically, our cosine-similarity-based approach has consistently outperformed these algorithms when applied to homogeneous datasets, namely the Iris and Seeds datasets. These datasets exclusively contain features with the same type of values.

Our K-cosine-means algorithm falls short when the dataset is heterogeneous, i.e., when there are different types of values for different features. This became evident when our algorithm was applied to the heterogeneous Hepatitis dataset; while our cosine-similarity-based approach outperformed the standard K-means and the Intelligent K-means (iK-means) algorithms, it showed poorer performance relative to the other algorithms, namely WK-means, MWK-means, iWK-means, and iMWK-means. Therefore, one potential direction for future work is to increase the accuracy of our algorithm for heterogeneous datasets. Feature selection might improve the accuracy of our algorithm in such cases by pruning certain categorical features. From our preliminary observations, pruning two specific real attributes from the Hepatitis dataset improved the clustering accuracy from 76.25% to 77.5%.

Outlier detection and handling may also be incorporated to improve the clustering accuracy. Furthermore, using an approach similar to K-medians to update the cluster centers might help with arbitrarily shaped clusters.

REFERENCES
[1] O. A. Abbas, "Comparisons between data clustering algorithms," International Arab Journal of Information Technology (IAJIT), vol. 5, no. 3, 2008.
[2] Z. Huang, "Extensions to the k-means algorithm for clustering large data sets with categorical values," Data Mining and Knowledge Discovery, vol. 2, no. 3, pp. 283-304, 1998.
[3] B. Mirkin, Clustering for Data Mining: A Data Recovery Approach. New York, 2005.
[4] A. K. Jain, "Data clustering: 50 years beyond k-means," Pattern Recognition Letters, vol. 31, no. 8, pp. 651-666, 2010.
[5] R. Cordeiro de Amorim and P. Komisarczuk, "On partitional clustering of malware," 2012.
[6] J. C. Zak and M. R. Willig, "Fungal biodiversity patterns," Biodiversity of Fungi: Inventory and Monitoring Methods, pp. 59-75, 2004.
[7] J. MacQueen et al., "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, no. 14. Oakland, CA, USA, 1967, pp. 281-297.
[8] G. H. Ball and D. J. Hall, "A clustering technique for summarizing multivariate data," Behavioral Science, vol. 12, no. 2, pp. 153-155, 1967.
[9] R. C. de Amorim, "Constrained clustering with Minkowski weighted k-means," in 2012 IEEE 13th International Symposium on Computational Intelligence and Informatics (CINTI). IEEE, 2012, pp. 13-17.
[10] J. Z. Huang, J. Xu, M. Ng, and Y. Ye, "Weighting method for feature selection in k-means," in Computational Methods of Feature Selection. Chapman and Hall/CRC, 2007, pp. 209-226.
[11] E. Y. Chan, W. K. Ching, M. K. Ng, and J. Z. Huang, "An optimization algorithm for clustering using weighted dissimilarity measures," Pattern Recognition, vol. 37, no. 5, pp. 943-952, 2004.
[12] J. Z. Huang, M. K. Ng, H. Rong, and Z. Li, "Automated variable weighting in k-means type clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 657-668, 2005.
[13] R. C. de Amorim and B. Mirkin, "Minkowski metric, feature weighting and anomalous cluster initializing in k-means clustering," Pattern Recognition, vol. 45, no. 3, pp. 1061-1075, 2012.
[14] D. Arthur and S. Vassilvitskii, "K-means++: The advantages of careful seeding," in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '07). Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2007, pp. 1027-1035.
[15] M. A. Masud, M. M. Rahman, S. Bhadra, and S. Saha, "Improved k-means algorithm using density estimation," in 2019 International Conference on Sustainable Technologies for Industry 4.0 (STI). IEEE, 2019, pp. 1-6.
[16] "UC Irvine Machine Learning Repository," https://archive.ics.uci.edu/ml/index.php.
[17] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, no. 2, pp. 179-188, 1936.
[18] P. Diaconis and B. Efron, "Computer-intensive methods in statistics," Scientific American, vol. 248, no. 5, pp. 116-131, 1983.
[19] M. Charytanowicz, J. Niewczas, P. Kulczycki, P. A. Kowalski, S. Lukasik, and S. Zak, "Complete gradient clustering algorithm for features analysis of x-ray images," in Information Technologies in Biomedicine. Springer, 2010, pp. 15-24.