0% found this document useful (0 votes)

15 views27 pages

A Parameter-Free Nearest Neighbor Algorithm With R

This research article introduces a new parameter-free variant of the K-nearest neighbor (KNN) algorithm that reduces prediction time and improves performance by utilizing a binary search tree and injecting randomness through ensemble methods. The proposed KNN variant demonstrates superior prediction accuracy and reduced training time compared to traditional KNN and other machine learning algorithms. Experiments conducted on 26 datasets validate the effectiveness of the proposed approach in overcoming the limitations of existing KNN variants.

Uploaded by

amit Ji

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views27 pages

A Parameter-Free Nearest Neighbor Algorithm With R

Uploaded by

amit Ji

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 27

Neural Computing and Applications

https://doi.org/10.1007/s00521-024-10565-9 (0123456789().,-volV)(0123456789().
,- volV)

S.I.: TIMELY ADVANCES OF DEEP LEARNING WITH APPLICATIONS AND DATA

DRIVEN MODELING

A parameter-free nearest neighbor algorithm with reduced prediction

time and improved performance through injected randomness
Manpreet Singh1 • Jitender Kumar Chhabra1

Received: 23 February 2024 / Accepted: 1 October 2024

Ó The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024

Abstract
K-nearest neighbor is considered in top machine learning algorithms because of its effectiveness in pattern classification
and simple implementation. However, usage of KNN is limited due to its larger prediction time than model-based machine
learning algorithms, its sensitivity to the existing outliers in the training dataset, and tuning parameter neighborhood size
(k). Therefore, this research article proposes a new variant of the KNN to reduce the training and prediction time with
improved performance. The prediction time of the KNN is reduced by making a binary search tree (BST) using the divide-
and-conquer strategy, and prediction performance is improved using ensembling by injecting randomness such as bootstrap
aggregation, random subspace, and random node splitting. The proposed KNN variant is parameter-free and, hence, not
sensitive to the hyperparameter neighborhood size. Finally, three experiments have been performed based on 26 selected
datasets to show the prediction time and prediction power superiority of the proposed KNN over random forest and six
selected KNN variants. Results prove that the proposed KNN variant gives better prediction results with reduced prediction
and training time.

Keywords K-means clustering Nearest neighbors Injected randomness Binary search tree

1 Introduction samples with minimum Euclidean distance where k is the

neighborhood size (k is the only tuning parameter of
K-nearest neighbor (KNN) is a simple yet effective KNN). Finally, the class label is assigned to the query
machine learning algorithm and is hence considered one of sample based on the majority voting rule of the identified
the top 10 machine learning algorithms [1, 2]. Unlike nearest neighbors’ class labels. It has many applications in
model-based machine learning algorithms, it consists of no pattern classification, such as defect prediction, video
training phase, and all the computations are delayed to the semantics detection, image classification, and remote
prediction phase [3]. Therefore, it is also called a lazy sensing, because of its effectiveness and simple imple-
learner. It calculates the similarity of the training samples mentation [7–11].
from the query sample based on Euclidean distance [4] Researchers are attracted to KNN due to its simplicity
(Minkowski distance [5], Hamming distance for categori- and effectiveness in pattern classification, and much
cal attributes [6], and Manhattan distance [6] as some other research has been done on this classifier [12]. Hence, many
alternates) and identifies the top k most similar training improved variants of the original KNN are available [13].
Improved versions of KNN are divided into two major
& Manpreet Singh
categories based on the working. The first category utilizes
[email protected] the local mean vectors of the selected nearest neighbors,
Jitender Kumar Chhabra
and the second category utilizes the nearest centroid
[email protected] neighbors to improve the performance [14]. These methods
are developed to improve the performance of the nearest
1
Computer Engineering Department, National Institute of neighbors-based classifiers by reducing the sensitivity to
Technology, Kurukshetra, India

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

the existing outliers in the training dataset, especially in helps to improve the prediction power of the proposed
small-scale datasets. classifier.
A significant improvement has been made by the This research article is organized into four sections after
researchers in the prediction performance of neighborhood- the introduction. Section 2 summarizes the previous
based classification. However, neighborhood-based classi- research on KNN and classifiers used to compare the
fiers have drawbacks limiting their use in pattern classifi- proposed variant. Section 3 explains the drawbacks of
cation. The first drawback of the neighborhood-based KNN and its motivation to develop the proposed KNN
classifiers is tuning parameter k (neighborhood size). All variant. Section 4 gives the in-depth detail of the proposed
variants of KNN are sensitive to neighborhood size. So, we algorithm. Section 5 compares the proposed KNN variant
need to tune it properly to make neighborhood-based with the latest KNN variants based on prediction accuracy,
learners effective. Generally, k-fold cross-validation is parameter tuning time, and prediction time and discusses
used to tune the hyperparameters of the classifiers, which is the results, followed by a conclusion.
a time-consuming task. The second drawback is its pre-
diction time, which is Oðnd þ nkÞ based on brute-force
neighborhood search, far more than that of model-based 2 Related works
machine learning algorithms. Algorithms such as KD-tree
and ball tree have been developed to reduce the prediction The KNN algorithm has a long history, dating back to 1967
time of the neighborhood-based classifiers [15, 16]. How- when it was first discovered by Cover and Hart [17]. In this
ever, these algorithms do not simulate the working of the section, we briefly review the evolution of this algorithm.
brute-force method-based neighborhood search and hence Original KNN predicts the class label of the query
fail to find the proper set of nearest neighbors, which leads instance based on majority voting of the selected nearest
to a loss of generality of the original working of the KNN neighbors using Euclidean distance [17]. This algorithm
and makes it inconsistent. Therefore, developing a has some drawbacks, such as being sensitive to outliers in
parameter-free variant of the KNN with reduced prediction the training dataset and the size of the neighborhood (hy-
time without degrading the performance is necessary. perparameter of the algorithm k, number of nearest
This research article proposes a new variant of the tra- neighbors). Its prediction time is also higher than model-
ditional KNN algorithm to improve prediction accuracy based machine learning algorithms. Many efforts have
with reduced prediction and training (hyperparameter tun- been made to improve the performance of the original
ing) time. We add a training phase to generate a tree-like KNN.
structure by recursively dividing the d-dimensional search The concept of weighted voting was introduced to
space into two parts using the k-means clustering algorithm improve the effectiveness of the original KNN, and many
until each tree node contains the same type of instances (all weighted voting-based KNN algorithms are available
node instances have the same class labels). This tree [18–20]. Out of all weighted KNN algorithms, distance-
structure provides two benefits; first, it reduces the pre- weighted k-nearest neighbor (WKNN) is the most effective
diction time of the KNN from Oðnd þ nkÞ to Oðd:log2 nÞ algorithm proposed by Dudani in 1976 [21] and then
because half of the instances are eliminated with one revised by Bicego and Loog [22]. Distance metric learning
comparison in prediction phase while searching in tree-like is also proposed to improve the effectiveness of the tradi-
structure; second, it makes KNN parameter-free because all tional KNN [23, 24] that learns the most suitable distance
the instances of each leaf node are similar which are con- metric to make the KNN robust. However, KNN is still
sidered as nearest neighbors of the query sample and hence sensitive to existing outliers in the training dataset and the
no need to explicitly decide the size of neighborhood based size of the neighborhood. Many variants of the original
on k-fold cross-validation. However, it slightly degrades KNN have been developed recently to remove the effect of
the prediction power of the KNN. Therefore, we propose the existing outliers and make the algorithm less sensitive
an ensemble method based on three types of injected ran- to the value of hyperparameter k (neighborhood size),
domness to improve the prediction power of the proposed which are divided into two categories explained below.
algorithm. First is bootstrap aggregation, second is random The first category is based on the concept of local mean
subspace of features, and third is random node splitting vectors and LMKNN (local mean k-nearest neighbor),
(based on a random subset of features) where the proposed PNN (pseudo-nearest neighbors), LMPNN (local mean
BST structure is used as a component learner of the pro- pseudo-nearest neighbor), MLMKHNN (multi-local mean
posed ensemble method. As our proposed tree structure k-harmonic distance nearest neighbor), GMDKNN (gen-
(component learner) contains both stable and unstable leaf eralized mean distance-based k-nearest neighbors), and
nodes, ensembling works fine on unstable nodes and hence LMRKNN (local mean representation-based k-nearest
neighbors) are some latest improved variants of this

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

category. Mitani and Hamamoto [25] proposed LMKNN to Beyond KNN, many other latest machine learning
reduce the impact of existing outliers in the training data- algorithms have been proposed by the researchers to
set. Instead of selecting k-nearest neighbors from the whole improve the prediction performance. Radhika et al. [33]
dataset, it selects k-nearest neighbors per class from the investigated the stochastic Cohen–Grossberg bidirectional
training datasets and then finds the local mean vector associative memory-based neural network based on input-
(centroid) of each class based on selected nearest neigh- to-state stability theory. Markovian jump parameters are
bors. Finally, the class label is assigned to the query sample also considered in the investigation of the model to deter-
based on the nearest local mean vector. Zeng et al. [26] mine the continuous time. Finally, a numerical example is
proposed another variant called PNN, which finds a given to show the superiority of the proposed mode. Cao
pseudo-nearest neighbor using the weighted distance of et al. [34] developed a new genetic network with temporal
selected nearest neighbors per class, and the class label to delays for genetic regularization necessary for slow bio-
the query point is assigned based on the nearest pseudo- chemical processes such as gene transcription and trans-
neighbor. Gou et al. [27] proposed a hybrid method of PNN lation. The efficiency of the proposed model is
and LMKNN, called LMPNN, which calculated multi-local demonstrated based on numerical examples. Aslam et al.
mean vectors based on pseudo-nearest neighbor and then [35] examined a T-S fuzzy-model-based networked control
assigned the class label to the query sample based on the system (NCS) to address the output tracking control
nearest local mean vector. Pan et al. [28] proposed an problem. A new strategy based on event-triggering has
improved variant of LMPNN called MLMKHNN; it also been proposed to reduce the bandwidth utilization in NCS.
utilizes multi-local mean vectors like LMPNN, but it uses Finally, three results have been obtained based on the
k-harmonic mean distance to find the nearest mean vector Lyapunov–Krasovskii function to compare the proposed
instead of Euclidean distance. K-harmonic mean distance model with other models. Duraipandian [36] applied the
gives more weightage to local mean vectors with smaller self-organizing network in LTE to minimize call termina-
distances, improving prediction accuracy. Gou et al. [29] tion and improve the quality of voice calls. Paul et al. [37]
proposed a new distance metric called generalized mean did a comparative analysis of stacking ensemble methods
distance to improve the results of MLMKHNN by giving, with benchmark machine learning classifiers and proved
even more, weightage to local mean vectors with smaller that stacking is better than benchmark machine learning
distances. Arithmetic mean distance and harmonic mean classifiers for disease detection which gives 97.5% average
distance are special cases of generalized mean distance. accuracy.
Gou et al. [30] proposed LMRKNN, which uses ridge Out of all the existing variants of KNN, we selected six
regression (linear combination of the points) to find the studies to compare the performance of our proposed vari-
nearest neighbors per class instead of Euclidean distance, ant, summarized in Table 1.
and the final class label is assigned to query instance based
on novel weighted voting proposed in their article.
The second category is based on nearest centroid-based 3 Motivation
neighbors and KNCN (K-nearest centroid-based neigh-
bors), LMKNCN (local mean vector of k-nearest centroid As we already discussed in the introduction of this research
neighbors), and RCKNCN (representation coefficient- article, the prediction cost of the original KNN is higher
based k-nearest centroid neighbors) are some variants than that of model-based machine learning algorithms
available in the literature based on this concept. Sánchez because it does not have any training phase and does all the
et al. [31] proposed the first variant of KNN belonging to computation in the prediction phase. The prediction cost of
this category, called KNCN, which finds neighbors whose the original KNN is Oðnd þ nkÞ, where n is the number of
centroid is nearest to the query sample instead of nearest training instances, d is the number of dimensions of the
neighbors. This strategy considers the neighbors’ special dataset (number of features), and k is neighborhood size
locality (how well the nearest neighbors surround the query (hyperparameter of the algorithm). The prediction cost of
sample) to increase the prediction power by mitigating the improved variants of the original KNN is even higher,
effect of existing outliers. Gou et al. [32] proposed an which makes KNN impractical for large-scale datasets.
improved version of KNCN, called LMKNCN, which finds Algorithms are available in the literature, such as KD-tree
the nearest centroid neighbors per class and then assigns and Ball-tree, to make the prediction of the KNN faster,
the class label to the query instance whose centroid is Oðd:log2 nÞ in the average case and OðndÞ in the worst case
nearest. Gou et al. [14] proposed the latest variant of this when the tree is unbalanced, by finding the nearest
category called RCKNCN. This algorithm considers both neighbors in less time. However, the KD-tree and Ball-tree
special locality and representation coefficients for pre- make it inefficient because they cannot simulate the
dicting the class label of the query instance. working of the brute-force method of finding nearest

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Table 1 Summary of existing studies used for comparison

Method Authors Methodology and strength Weaknesses
name and year

KNN Cover The k-nearest neighbors (KNN) algorithm finds similar It is sensitive to neighborhood size and existing outliers in
and instances from the training dataset based on the the training dataset. The prediction cost of KNN is far
Hart Euclidean distance and assigns the label to the query more than that of model-based machine learning
[17] sample based on the majority voting of the class labels of algorithms. It also has one hyperparameter k
selected nearest neighbors. It is a simple machine (neighborhood size), which must be tuned using k-fold
learning algorithm and is still effective in pattern cross-validation
classification
WKNN Dudani The weighted k-nearest neighbors (WKNN) algorithm is It is also sensitive to neighborhood size and existing
et al. an improved variant of the traditional KNN, which outliers in the training dataset. The prediction cost of
[21] assigns the weights to each selected nearest neighbor KNN is far more than that of model-based machine
based on the Euclidean distance from the query sample. learning algorithms. It also has one hyperparameter k
Finally, the class label is assigned to the query sample (neighborhood size), which must be tuned using k-fold
based on weighted voting instead of the majority voting cross-validation
rule
KNCN Sánchez The k-nearest centroid neighbors (KNCN) algorithm finds It has a high prediction cost, which is OðndÞ. It also has
et al. the nearest neighbors whose centroid is nearest to the one tuning parameter (neighborhood size), which must
[31] query sample. It also looks for special locality instead of be tuned using k-fold cross-validation. Prediction power
just nearest neighbors, improving the KNN classifier’s is better than KNN and WKNN but still can be improved
prediction power. The final class label is assigned based
on the majority voting rule of the nearest neighbors
LMKNCN Gou The local mean k-nearest centroid neighbors (LMKNCN) Robust to outliers but still sensitive to neighborhood size.
et al. algorithm finds k-nearest centroid neighbors of each It also has a high prediction cost: OðndÞ. It has better
[32] class available in the training dataset from the query prediction power than KNCN, but there is still room for
sample instead of finding k-nearest centroid neighbors improvement
from whole dataset. Finally, the class label is assigned to
the query sample whose neighbors’ centroid is nearest to
the query sample. It further improves the prediction
power of the learner by reducing the effect of the
existing outliers
LMRKNN Gou The local mean representation-based k-nearest neighbor It is robust to outliers existing in training data and less
et al. (LMRKNN) algorithm finds the k-nearest neighbors sensitive to neighborhood size. However, it has a high
[30] from each class based on Euclidean distance and then prediction cost, Oðnd þ nk þ k3 dÞ, and has two tuning
finds the local mean vectors of selected nearest parameters, neighborhood size and ridge regression
neighbors of each class. Finally, it utilizes the concept of coefficient, which is even more time-consuming
ridge regression to assign the weights to all local mean
vectors and assign the class label to the query sample
based on weighted voting by considering the sum of
weights of the local mean vectors of all classes available
in the training dataset. It reduces the sensitivity of the
neighborhood size and existing outliers in the dataset
RCKNCK Gou The representation coefficient k-nearest centroid neighbor It is also robust to outliers in training data but still has a
et al. (RCKNCN) is the latest version of KNN, which high prediction cost, Oðnd þ k3 dÞ, based on the closed
[14] considers both special locality and representation form of ridge regression. Like LMRKNN, it also has two
coefficients of the nearest neighbors to fully discover the hyperparameters, neighborhood size and regression
pattern discrimination from the nearest centroid coefficient, which must be tuned based on k-fold cross-
neighbors. First, it finds the nearest centroid neighbors validation. There is still room for improvement in
from each class based on Euclidean distance and then prediction power
assigns weights to all nearest neighbors of each class
based on ridge regression. Finally, the class label is
assigned to the query sample based on weighted voting
of the selected nearest neighbors. It is an improved
version of KNN, which combines the benefits of both
LMKNCN and LMRKNN

neighbors (unable to find correct neighbors) and hence It is also important to find the efficient neighborhood
degrade the prediction power (accuracy or AUC). size (value of hyperparameter k), generally done based on
grid search using k-fold cross-validation and the time

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

needed to decide the value of neighborhood size is which needs to be tuned properly to make it effective. To
Oðnd þ nkðNN Þ Þ kð foldÞ s, where Oðnd þ nkNN Þ is the mitigate all these issues, a training phase in KNN is added
time complexity to find the k nearest neighbors, kð foldÞ is in this research article in which a multi-dimensional binary
the total number of folds, and s is the size of the array that search tree (BST) is generated based on the divide-and-
conquer strategy, as shown in Fig. 1, to reduce the pre-
consists of different values of kðNN Þ , which is a time-con-
diction cost and make it hyperparameter-free. As the BST
suming task.
is generated based on the divide-and-conquer strategy, the
Therefore, reducing the prediction time and hyperpa-
divide-and-conquer strategy recursively divides the train-
rameter tuning time of the KNN without degrading the
ing data into two approximately equal parts for the gen-
prediction power is important to make it competitive with
eration of BST until the termination condition is met. This
other machine learning algorithms. Hence, a new KNN
research article utilizes a k-means clustering algorithm to
variant is proposed in this research article to reduce the
divide the training data into approximately two equal parts
training time and prediction time while increasing the
recursively until all the points of the cluster consist of the
prediction accuracy. The main contributions of the research
same labels. The k-means clustering algorithm can divide
article are as follows:
the data into two clusters in linear time OðndÞ if the
1. This research article proposes a new multi-dimensional number of clusters and number of iterations are fixed. In
binary search tree (BST) based on a divide-and- our case, the number of clusters is 2 and the number of
conquer strategy to make the nearest neighbor search iterations is 100.
faster. The proposed KNN variant reduces the nearest Figure 1 is divided into three parts where Fig. 1a shows
neighbor search cost to Oðd:log2 nÞ from Oðnd þ nkÞ, the overall structure of the generated BST where RðnÞ
leading to a reduction in prediction time. shows the root node of the tree consists of n training
2. The proposed algorithm utilizes a dynamic number of ðmÞ
instances (n ¼ 15 in Fig. 1), I i represents ith internal
nearest neighbors for predicting the class label of the
node of the tree consists of m training instances (m ¼ n2
query samples, which is desirable for better perfor- ð lÞ
mance of nearest neighbor-based classifiers. approximately), and Lj represents jth leaf node of the tree
3. The proposed algorithm eliminates the time taken by with l training instances. Figure 1b represents the node
the standard KNN to decide the optimal neighborhood structure of the internal nodes, which consists of pointers to
size because the proposed model is free to hyperpa- !
left and right children, and a vector C stores the centroid of
rameter neighborhood size. the cluster of points. Figure 1c represents the node structure
4. The proposed KNN variant is robust to noise, gives of the leaf node where pointers to left and right children are
better prediction accuracy, and prevents overfitting by !
null (/), a vector C to store the node’s centroid, and an
utilizing bootstrap aggregation, random subspace, and extra field to store the node’s label.
random node splitting. This article proposes a recursive algorithm to generate
The complete details of the working of our proposed the BST shown in Fig. 1a, where the K-means clustering
variant are described in the methodology. algorithm is used as the node splitting method (K-mode for
categorical attributes and K-prototypes for mixed attributes
can be used). Suppose we have a dataset X 2 Rnd with
4 Methodology labels y 2 fc1 ; c2 ; . . .; cm gn where ci represents the ith class
label, and there are total m classes. The main steps to
This section of the article presents the overall structure of generate multi-dimensional BST based on a divide-and-
the proposed KNN variant and steps of the algorithm to conquer strategy are as follows:
generate it.
1. Initially, the algorithm considers the whole dataset as a
single cluster of instances and calculates the centroid of
4.1 Binary search tree structure
the whole dataset to store it in the root node of the
BST.
This section explains the proposed binary search tree
2. In the second step, the proposed algorithm divides the
(BST) structure and the process to generate it based on a
root node into two approximately equal parts based on
divide-and-conquer strategy to reduce the prediction and
K-means clustering.
parameter tuning time. Original KNN has no training
3. In the third step, the algorithm finds the centroids of
phase, and all the computations are done in the prediction
both sub-clusters and stores them in child nodes of the
phase, which is time-consuming and makes it infeasible for
BST.
real-time usage. It is also sensitive to hyperparameter k,

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

4. In the fourth step, the algorithm checks whether all the Algorithm 1 presents the procedure to generate the BST.
points of the generated sub-clusters consist of the same Function generate TreeðÞ consists of three arguments: X is
class label and if any sub-cluster consists of the same the training dataset containing n instances of d dimensions,
class labels for all points belonging to that cluster then y contains the labels of training instances, and R is the root
the algorithm marks its leaf node, and stores the class node of BST, which is the initial null ;.
label and centroid on the leaf node. All these points The first step calculates the centroid of the cluster of
will be considered nearest neighbors of the query point points (instances) shown in line 1 of Algorithm 1, where
if the centroid of this leaf node is nearest to the query !
jXj is the cardinality of the set of points and C is the
point in the prediction phase. cluster center. The second step allocates the memory to the
5. In the fifth step, if any of the generated sub-clusters root node using the new nodeðÞ function shown in line 2 of
contains instances belonging to different classes the the algorithm.
algorithm calls itself recursively for the further division Lines 3 to 8 of Algorithm 1 are used to terminate the
of that node by considering its next root. Steps 2 to 5 recursion and generate leaf nodes of the BST. It checks
are repeated until all the instances of the clusters whether all cluster instances have the same labels in line 3.
belong to the same class. If all the cluster instances have the same labels, it makes
6. Finally, the algorithm returns the generated BST. The both left and right pointers of the node null ;, saves the
internal nodes, including the root of the BST, contain centroid and class label in the node, and simply returns it.
only centroids of the intermediate partitions which help Lines 9 to 13 represent the fourth step of the algorithm,
to find the nearest leaf node in logarithmic time in the where it saves the cluster’s centroid in the node and then
prediction phase and leaf nodes of the tree consist of divides the cluster of points into two sub-clusters using the
centroid as well as class labels because all the instances k-means clustering algorithm. C1 contains the indices of
of the leaf nodes consist of same class labels. the instances belonging to cluster-1, and C2 contains the
A cluster of instances is called atomic if all the cluster indices of instances belonging to cluster-2. Function
instances have the same labels. Algorithm 1 shows the BST generate TreeðÞ is called for both clusters, and links are
generation process. assigned to the left and right pointers of the root node. The
Algorithm 1: Recursive procedure to generate BST last step returns the root node T, shown in line 14.
The proposed BST reduces the prediction time of the
original KNN by making nearest neighbors search from

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

linear to logarithmic and makes it parameter-free because 5. Bootstrap aggregation also makes the model robust to
all the instances of the leaf node are considered nearest outliers in the training data.
neighbors of the query instance. However, it slightly
The next section explains the injected randomness, the
degrades the prediction power (accuracy and AUC) of the
process of generating the ensemble method by injecting
original KNN. Therefore, we propose an ensemble method
randomness, and why injected randomness works on the
by injecting randomness to improve the prediction power
proposed BST.
where the proposed BST is used as a component learner of
the proposed ensemble method.
4.2 Randomness injection
The standard KNN is a stable learner and is not affected
by the perturbation in the dataset such as bootstrap
We propose an ensemble method by injecting three types
aggregation. Further, the standard KNN is sensitive to the
of randomness to improve the prediction power of the
value of hyperparameter k. If the value of hyperparameter k
algorithm. Original KNN is a stable learner, and injected
is too small then it misses important nearest neighbors
randomness does not work well on stable learners [5].
which are useful for the correct prediction. Conversely, too
However, the proposed variant of KNN consists of both
large a value of hyperparameter k can include superfluous
stable and unstable leaf nodes, and unstable nodes are
instances, leading to wrong prediction. This fact is proved
sensitive to the variation in the dataset. The leaf node that
by Pan et al. [28] and Gou et al. [29]. Efforts have been
consists of only one training instance is the most unsta-
made by the researchers to reduce the sensitiveness of the
ble node, L5 in Fig. 1a, and the injected randomness works
algorithm toward hyperparameter k [25–29], leading to an
well for these nodes because these nodes are most sensitive
increase in prediction time which is already higher than
to the perturbation in the training data. Moreover, leaf
other machine learning algorithms. Zhang et al. [3] sug-
nodes with several training instances become more stable,
gested that the nearest neighbor for each query sample
L1 and L4 are the most stable nodes in Fig. 1a, and injected
should be variable to reduce the sensitivity of the hyper-
randomness does not work well for stable nodes. But, with
parameter. So, this research article proposes a BST to find
an increase in the number of instances in the leaf node, the
the variable number of nearest neighbors in logarithmic
probability of correct prediction increases because all the
time. However, the proposed BST is no longer stable be-
instances of the leaf node belong to the same class, which
cause it consists of unstable leaf nodes. In the case of
represents a locality of that class in d-dimensional space,
unstable base learners, ensembling methods, such as
and the query point belonging to that locality must be of
bootstrap aggregation and random subspace, work fine to
that class except outliers. All three types of injected ran-
increase the prediction performance as proved by Breiman
domness to improve the prediction power of the proposed
[38]. Bootstrap aggregation is preferred over boosting in
KNN variant are explained below:
this article due to parallel execution to make it time effi-
cient. The main advantages of ensembling by injecting
4.2.1 Bootstrap aggregation
randomness are as follows:
1. Component learners developed based on bootstrap Breiman [38] proposed the concept of bootstrap aggrega-
aggregation and random subspace sampling are diverse tion, which is widely used to develop diverse component
and less likely to overfit to noise or specific patterns in learners with high accuracy from unstable base learners
training data. such as decision trees [39]. Original KNN is a
2. Random subspace sampling and random node splitting stable learner, and bootstrap aggregation alone does not
help to generate diverse and strong component learners work well on stable learners. However, the proposed KNN
by utilizing different subsets of features. Diversity is variant is an unstable variant of the original KNN; hence,
important to increase the generalization of the model bootstrap aggregation can be successfully applied to the
because diverse component learners can capture a wide proposed KNN variant.
range of patterns. Bootstrap aggregation creates simulated datasets by
3. Random subspace and random node splitting also selecting n random instances with replacements from the
increase the robustness of the model by training on original dataset, as shown in Eq. (1).
different subsets of the features. It minimizes the effect ðiÞ
n ði Þ on
of irrelevant and redundant features on the final output. X 0 ¼ X 0 [ xrandð1;nÞ ð1Þ
j¼1
4. Bootstrap aggregation reduces the variance of the
learner by utilizing a majority voting strategy, leading X 0ðiÞ represents the ith simulated datasets developed by
to an increase in generalizability and reduction in selecting random instances n time from the original data-
overfitting. sets X. j ¼ 1 to n is a loop that runs n times and each time

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Fig. 1 Proposed component

learner (K-Tree)

adds a randomly selected instance from the original dataset then that dimension does not contribute to the distance
X to the ith simulated dataset X 0ðiÞ . Repetition of the calculation.
instances is allowed in the simulated datasets.
4.2.3 Random node splitting
4.2.2 Random subspace
Random node splitting is proposed to make the component
Random subspace works by selecting a random subset of learners even more diverse. Each component learner con-
features/dimensions by making the values of the unselected sists of approximately d=2 dimensions after random sub-
features zero in the dataset and projecting the dataset in space sampling. At each node split, only 1=3 random
selected dimensions. Different subsets of features/dimen- dimensions out of available d=2 dimensions are used, and a
!
sions may provide different views on the data, and hence, vector R 2 f0; 1gd is stored at each node of the BST to
trained component learners based on random subspace are track the randomly selected dimensions for node splitting.
quite diverse. The concept of random subspace is well So, a new term is added to Eq. (2) to calculate the Eucli-
exploited by Tin Kam [40] to develop a decision forest and dean distance, as shown in Eq. (3).
is also applied to the KNN classifier [41]. vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
u d
In this article, each feature from the available features is u X
Dðp; qÞ ¼ t ðp q Þ2 s r
2
i i i i ð3Þ
selected with 0.5 probability for all component learners.
i¼1
So, all the component learners are trained with approxi-
mately d=2 features/dimensions. Euclidean distance is In Eq. (3), r i represents the ith element of the random
calculated based on Eq. (2) with selected dimensions. !
split vector R 2 f0,1gd , whose only 1=3 elements are 1.
vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
u d So, the probability of selecting each dimension from the
u X
Dðp; qÞ ¼ t
2
ðp q Þ2 s
i i i ð2Þ available d=2 dimensions is 0.33. It means the k-means
i¼1 clustering algorithm uses only 1=6 randomly selected
dimensions for a node split out of available d dimensions.
Here, Dð p; qÞ represents the Euclidean distance between
any two points p; q 2 X in d-dimensional space, and si
!
represents the ith element of vector S 2 f0,1gd . If si ¼ 0,

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

4.3 Training of the proposed algorithm 4. While generating BST from the simulated dataset, the
algorithm selects only 1=3 of features randomly from
The main steps to generate the K-Forest (proposed KNN the available d=2 features for each node split and keeps
variant) are shown below: track of the selected features.
5. In the prediction phase, only selected features for each
1. In the first step, the algorithm generates simulated
node split are used to find the Euclidean distance of the
datasets from the original dataset using bootstrap
query point from the node centroid.
aggregation.
2. In the second step, the algorithm chooses d=2 random Algorithm 2 explains the training steps of the proposed
features for each simulated dataset and replaces the ensemble learner (K-Forest).
unselected d=2 columns with 0 s in each simulated Algorithm 2: Algorithm to generate K-Forest
dataset to remove their contribution.
3. In the third step, the algorithm generates BST for each
simulated dataset based on the procedure explained in
Sect. 4.1.

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Algorithm 2 consists of two functions, out of which In Algorithm 3, Lines 1 and 2 are the termination con-
generate_forest() is used to train our proposed ensemble ditions of the search, which returns the class label of the
learner, and generate_tree() is used to train each compo- leaf node. Lines 4 to 8 are used to reach the nearest cen-
nent learner of the proposed ensemble method. !
troid leaf node. Line 4 gets the stored vector R 2 f0,1gd
Line 1 of function generate_forest() initializes the forest on the internal node, which consists of value 1 for selected
F as an empty set ;. Lines 2 to 5 train component learners features/dimensions. Line 5 compares the distances of both
based on generated simulated datasets by injecting ran- children from the query point based on selected dimensions
domness where t is the ensemble size of the proposed KNN !
using vector R . It may be noted that only dimensions with
variant. Line 3 generates simulated dataset X0 with labels !
! one value in R contribute to the distance finding. Lines 6
y0. Line 4 generates a vector S 2 f0,1gd where 1 repre-
to 8 chose the correct branch based on the nearest centroid
sents selected features, and 0 represents unselected fea-
child node to reach the leaf node of the tree.
tures. Line 5 calls the generate_tree() method to build a
new component learner based on the simulated dataset and
4.5 Time complexity
add the built tree to forest F. Finally, line 6 returns the
forest.
The standard KNN has no training phase, and its time
The process of building BST is already explained in
complexity is Oð1Þ because it performs all computations at
Algorithm 1. However, the original process is a little bit
! the time of prediction. So, the prediction time of the
modified for ensemble learners. A new vector R 2 f0,1gd standard KNN is Oðnd þ nkÞ. Further, the standard KNN
is introduced, which consists of selected features for each consists of hyperparameter k, which should be tuned
!
node split. Selected features are 1/3 of the S . This vector is properly to achieve the best performance, which is also an
stored at every internal and leaf node of the tree to use at overhead. This article added a training phase to reduce the
the prediction time while finding the nearest centroid node. prediction time of the standard KNN. In the training phase,
a BST is generated, where the time complexity of node
4.4 Prediction splitting is Oðc:n:d Þ and algorithm is called recursively for
both branches. So, the time complexity of the BST gen-
Algorithm 3 explains the prediction process of the com- eration can be calculated based on the recurrence relation
ponent learner. After getting predicted value of all com- shown in Eq. (4).
ponent learners, final prediction is given based on majority n
T ðnÞ ¼ 2T þ c:d:n ð4Þ
voting strategy. 2
Algorithm 3: Prediction process of each component learner Here, c:d:n is the time complex of the k-means clus-
tering algorithm to divide the dataset into two clusters

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

(node splitting of BST). The time complexity of the k- neighborhood size. So, no hyperparameter tuning is needed
means clustering algorithm is Oðn:k:d:tÞ, where n repre- for the proposed KNN variant. All the experiments are
sents the total number of instances, d is the dimensions of performed on a machine with 8 GB RAM and a Core i5
the dataset, k is the number of clusters, which is 2 in our processor. Six of the eight cores are used to parallelize our
case, and t is the number of iterations fixed to 100 for our proposed algorithm.
algorithm. So, both t and k are constants represented by c in
Eq. (4). K-means clustering splits the node into two 5.1 Benchmark datasets
approximately equal-sized children consisting of n2
instances each, and we call the algorithm recursively for This section of the research article presents the details of
both branches. So, both a and b in the recurrence relation the benchmark datasets used for experimentation. The
are 2. After solving the recurrence relation shown in datasets used in this article are divided into two categories.

Eq. (4), we get O d6 :n:log2 n Oðd:n:log2 nÞ. Table 2 presents small- to medium-scale datasets, Table 3
Further, an ensemble method is proposed based on presents large-scale datasets, and Table 4 presents the
bootstrap aggregation, random subspace sampling, and noisy datasets. All the datasets used in this article are
random node splitting. The time complexity of bootstrap openly available in the KEEL repository [42]. All the
aggregation and random subspace is OðE:n:dÞ. So, the time datasets are divided into a 70–30 ratio after proper shuf-
complexity of the proposed ensemble method is fling, out of which 70% of data is used to train the model
OðE:d:n:log2 n þ E:n:d Þ for serial implementation, where E and 30% is used for testing the performance of the
is the ensemble size. However, the time complexity of the algorithms.
proposed ensemble method reduces to Oðd:n:log2 nÞ in Table 2 presents small- to medium-scale datasets. These
the parallel environment, which is the actual training time datasets are used to compare the performance of the pro-
complexity of the proposed KNN variant. posed KNN variant with existing KNN variants based on
In the prediction phase, generated BSTs of the proposed accuracy and AUC because prediction and parameter tun-
ensemble method are traversed until the leaf node is ing time reduction at the cost of accuracy are not accept-
encountered. At each internal node, only one branch is able. So, all the experiments on these datasets are
selected based on the nearest centroid child and only d=6 performed five times, and an average accuracy and AUC is
features are used for the selection of the branch. So, the taken.
recurrence relation of search in the proposed BST is shown Table 3 presents large-scale datasets to show the actual
in Eq. (5). prediction and parameter tuning time difference between
n the proposed KNN variant and existing KNN variants. As
T ðnÞ ¼ T þ c:d ð5Þ the main focus of the experiments based on Table 3 data-
2
sets is to show the time difference, all the experiments on
So, the nearest neighbor search cost of the proposed these datasets are done only once.

BST is O d6 :log2 n Oðd:log2 nÞ. Further, the prediction Table 4 presents the noisy datasets used to check the
time complexity of the proposed ensemble method is robustness of the proposed KNN variants as suggested by
OðE:d:log2 nÞ for serial implementation but becomes Gou et al. [29] because it is more difficult to get good
Oðd:log2 nÞ in the parallel environment which is the actual results in noisy datasets. Five percent attribute noise is
prediction time of the proposed algorithm. inserted in each dataset of Table 4 based on the procedure
The space complexity of the proposed KNN variant is followed by Zhu et al. [43]. In this scheme, 5% of the
OðE:n:dÞ where Oðn:dÞ is the dataset size and E is the dataset samples is selected and the value of one attribute of
ensemble size. the selected samples is replaced by a random number
between the minimum and maximum value of that attri-
bute. The name of each noisy dataset in Table 4 is
5 Results and discussion appended with the letter ‘n.’

This section of the article shows the prediction power 5.2 Evaluation metric
(accuracy and AUC) and time comparison of the proposed
KNN variant with existing selected KNN variants based on This research article uses four evaluation metrics, such as
experimentation on selected benchmark classification accuracy, AUC, training time, and prediction time to
datasets. To perform the experimentation, the neighbor- compare the proposed KNN variant with existing KNN
hood size (hyperparameters) of the existing KNN variants variants. The accuracy of the model can be calculated
is selected based on fivefold cross-validation. On the other based on Eq. (6) [9].
hand, the proposed KNN variant is free to hyperparameter

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

TP þ TN performance metrics using datasets of Table 2 because

Accuracy ¼ ð6Þ
TP þ TN þ FP þ FN without achieving equivalent accuracy, reduction in pre-
diction and parameter tuning time is useless. All the
Accuracy is the widely used performance metric to
existing variants of the original KNN consist of a hyper-
compare the performance of the models. However, AUC is
parameter. So, we use fivefold cross-validation to tune the
more important for imbalanced datasets but only applicable
hyperparameters of these algorithms. KNN, WKNN,
to binary classification problems. So, the AUC perfor-
KNCN, and LMKNCN consist of one hyperparameter,
mance metric as shown in Eq. (7) [9] is used to compare
neighborhood size (k), and all the odd values from 1 to 15
the selected binary classification datasets.
are tested to select the best neighborhood size for each run.
k1
X
TPRiþ1 þ TPRi LMRKNN and RCKNCN consist of two hyperparameters,
AUC ¼ ðFPRiþ1 FPRi Þ ð7Þ
2 which are neighborhood size (k) and L2-regularization (k)
i¼1
because these both algorithms are based on representation
In Eq. (7), TPR represents the true positive rate, FPR coefficients so all the odd values of neighborhood size (k)
represents the false positive rate, and k is the total number from 1 to 15 and 0.1, 1.0, 10, and 100 values of L2-regu-
of threshold values after sorting the instances in ascending larization (k) are tested for best hyperparameter configu-
order of prediction probabilities. ration. The proposed algorithm consists of no
The prediction and training time of the proposed variant hyperparameter to tune but consists of a training phase.
and existing variants are calculated by tracking the start The ensemble size of the proposed KNN variant is set to
and end time of the prediction and training phase of the 100 based on the literature [5]. However, efficient ensem-
algorithm using the inbuilt Python’s time function. ble size can be set based on k-fold cross-validation on
training data by testing different values of ensemble size
5.3 Experiment – 1 using grid search.
Table 5 compares the proposed KNN variant with
The main focus of the first experiment is to compare the existing ones based on accuracy, and the best results are
prediction power (accuracy and AUC) of the proposed shown in bold letters. All the results shown in Table 5 are
KNN variant with selected KNN variants. So, this section collected by taking the average accuracy of five runs after
compares the proposed KNN variant (K-Forest) with proper shuffle of training data. It may be noted that after
existing KNN variants based on accuracy and AUC each shuffle, the results of all algorithms are collected, and

Table 2 Small- to medium-scale datasets used for experimentation

Datasets Name Instances Dimensions Classes Distribution

Tae 151 5 3 (49, 50, 52)

Wine 178 13 3 (59, 71, 48)
Glass 214 9 6 (70, 76, 17, 13, 9, 29)
Dermatology 366 34 6 (112, 61, 72, 49, 52, 20)
Image 210 19 7 (30, 30, 30, 30, 30, 30, 30)
Ionosphere 351 33 2 (126, 225)
Newthyroid 215 5 3 (150, 35, 30)
Libras 360 90 15 (24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24)
Lsvt 126 310 2 (42, 84)
Bands 539 19 2 (227, 312)
Vehicle 846 18 4 (195, 309, 311, 31)
Pima 768 8 2 (500, 268)
Vowel 990 13 11 (90, 90, 90, 90, 90, 90, 90, 90, 90, 90, 90)
qsar-biodeg 1055 41 2 (699, 356)
Musk 476 168 2 (269, 207)
Segment 2310 19 7 (330, 330, 330, 330, 330, 330, 330)
Cnae 240 856 2 (120, 120)
Parkinson 756 752 2 (192, 564)

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Table 3 Large-scale datasets

Datasets Instances Dimensions Classes Distribution

Abalone 4177 8 3 (1307, 1342, 1528)

Optdigits 5620 64 10 (554, 571, 557, 572, 568, 558, 558, 566, 554, 562)
Penbased 10,992 16 10 (1143, 1143, 1144, 1055, 1144, 1055, 1056, 1142, 1055, 1055)
Texture 5500 40 11 (500, 500, 500, 500, 500, 500, 500, 500, 500, 500, 500)
Letter 20,000 16 26 (789, 766, 736, 805, 768, 775, 773, 734, 755, 747, 739, 761, 792, 783, 753, 803, 783, 758, 748,
796, 813, 764, 752, 787, 786, 734)
Phoneme 5404 5 2 (3818,1586)
Spambase 4597 57 2 (2785, 1812)
Ring 7400 20 2 (3664, 3736)

Table 4 Noisy datasets

Datasets Instances Dimensions Classes Distribution

Iris_n 150 4 3 (50, 50, 50)

Wine_n 176 13 3 (59, 71, 48)
Glass_n 214 9 6 (70, 76, 17, 13, 9, 29)
Heart_n 270 13 2 (150, 120)
Sonar_n 208 60 2 (111, 97)
Wdbc_n 569 30 2 (357, 212)
Pima_n 768 8 2 (500, 268)
Yeast_n 1484 8 10 (463, 5, 35, 44, 51, 163, 244, 429, 20, 30)
Ionosphere_n 351 33 2 (126, 225)
Page_Blocks_n 5472 10 5 (4913, 329, 28, 87, 115)

then training data is reshuffled for the next run. In the case slightly less than LMKNCN, but the difference is negligi-
of 12/18 datasets, the proposed KNN variant gives the best ble (only 0.00186).
results; for 6/18 datasets, LMKNCN gives the best results; Figure 2 compares the proposed variant with existing
and for one dataset, KNCN gives the best results. Let us KNN variants using box plots. Figure 2a shows the com-
compare the average accuracy of all 18 datasets. The parison in terms of accuracy, and Fig. 2b shows the com-
proposed algorithm gives approximately 1% better results parison in terms of AUC. In Fig. 2a, all box plots consist of
than LMKNCN, which is the second-best algorithm based one outlier, shown as a small circle, except LMRKNN
on experiments performed on selected small- to medium- because the median accuracy of LMRKNN is too low
scale datasets. Further, Table 1 consists of both binary- and compared to other selected variants. The median accuracy
multi-class classification datasets. In the case of binary- of the proposed KNN variant is more than 0.9, which is
class classification datasets, the proposed variant gives the better than other KNN variants, and the box plot has no
best accuracy on 3/8 datasets, and in the case of multi-class skewness. In Fig. 2b, there are no outliers, and the median
classification, the proposed variant gives the best accuracy AUC value of the proposed KNN variant is slightly lower
on 9/10 datasets. It means the proposed KNN variant per- than LMKNCN, which gives the best median value of
forms better for multi-class than binary-class classification. AUC for selected binary-class classification datasets.
As we already discussed, Table 1 consists of binary- and However, we cannot say that LMKNCN is better than the
multi-class classification datasets. For binary-class classi- proposed KNN variant because the box plot of LMKNCN
fication datasets, accuracy is not a good performance is negatively skewed. On the other hand, the box plot of
metric for comparing the performance of the algorithms. K-Forest is evenly distributed. So, we compare the mean
So, Table 6 compares the proposed variant with existing value of both algorithms on selected binary-class classifi-
KNN variants based on the AUC value. In the case of 4/8 cation datasets, and the difference between the two is much
datasets, the proposed KNN variant gives better results less, which is 0.00186. Hence, we can say that the proposed
than existing KNN variants. However, the proposed variant algorithm is as good as LMKNCN for binary-class
is second best if we compare the average results, which are classification.

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Table 5 Accuracy comparison

Datasets KNN WKNN KNCN LMKNCN LMRKNN RCKNCN K-Forest
of K-Forest with existing KNN
variants Tae 0.5 0.48696 0.51739 0.52174 0.45652 0.53043 0.57826
± 0.0514 ± 0.0353 ± 0.0841 ± 0.0389 ± 0.0275 ± 0.0832 ± 0.0876
Wine 0.96296 0.94815 0.95926 0.97778 0.94815 0.96296 0.98518
± 0.0166 ± 0.0319 ± 0.0181 ± 0.0139 ± 0.0378 ± 0.0331 ± 0.0138
Glass 0.72308 0.73231 0.72615 0.74769 0.62769 0.67692 0.76307
± 0.0218 ± 0.0372 ± 0.0282 ± 0.0473 ± 0.0724 ± 0.0435 ± 0.0156
Dermatology 0.96909 0.96727 0.96545 0.97273 0.89091 0.96364 0.97272
± 0.022 ± 0.0169 ± 0.021 ± 0.0152 ± 0.0163 ± 0.0141 ± 0.0152
Image 0.86349 0.86349 0.85397 0.88254 0.59048 0.86667 0.89841
± 0.0327 ± 0.0327 ± 0.0211 ± 0.0277 ± 0.0442 ± 0.0238 ± 0.0294
Ionosphere 0.89245 0.78868 0.89245 0.89057 0.73774 0.85094 0.90754
± 0.0284 ± 0.0194 ± 0.0277 ± 0.0175 ± 0.0375 ± 0.044 ± 0.0162
Newthyroid 0.96308 0.95692 0.96308 0.97538 0.77846 0.83692 0.96307
± 0.0185 ± 0.0179 ± 0.0209 ± 0.0123 ± 0.0431 ± 0.0564 ± 0.0208
Libras 0.78889 0.75926 0.79074 0.81481 0.41296 0.77778 0.84259
± 0.0358 ± 0.0287 ± 0.0339 ± 0.0465 ± 0.0378 ± 0.0203 ± 0.0356
Lsvt 0.82105 0.83684 0.82105 0.81579 0.63684 0.76842 0.83684
± 0.0562 ± 0.0609 ± 0.051 ± 0.0499 ± 0.0421 ± 0.0258 ± 0.0609
Bands 0.65185 0.62716 0.65062 0.67654 0.57654 0.67531 0.70864
± 0.0291 ± 0.0139 ± 0.0401 ± 0.0335 ± 0.0342 ± 0.023 ± 0.0162
Vehicle 0.78189 0.80236 0.78583 0.78898 0.65906 0.77559 0.82992
± 0.0168 ± 0.0233 ± 0.0101 ± 0.0064 ± 0.0123 ± 0.0216 ± 0.0156
Pima 0.75238 0.74286 0.75671 0.74719 0.62251 0.72727 0.75324
± 0.0335 ± 0.0191 ± 0.03 ± 0.0217 ± 0.0192 ± 0.0252 ± 0.0287
Vowel 0.9064 0.92795 0.9064 0.96229 0.79327 0.86801 0.96430
± 0.0113 ± 0.0159 ± 0.0113 ± 0.0107 ± 0.0081 ± 0.013 ± 0.0075
qsar-biodeg 0.86309 0.84669 0.85931 0.86562 0.54322 0.84606 0.86309
± 0.0187 ± 0.0122 ± 0.0181 ± 0.0157 ± 0.0512 ± 0.0144 ± 0.0100
Musk 0.91469 0.85594 0.91888 0.93287 0.78182 0.91329 0.91748
± 0.0267 ± 0.021 ± 0.0163 ± 0.0169 ± 0.0454 ± 0.0186 ± 0.0213
Segment 0.95873 0.95786 0.95469 0.96941 0.49177 0.94372 0.97402
± 0.005 ± 0.007 ± 0.003 ± 0.0032 ± 0.0139 ± 0.0075 ± 0.0049
Cnae 0.99167 0.96944 0.98611 0.99722 0.98333 0.99167 0.98333
± 0.0068 ± 0.0222 ± 0.0 ± 0.0056 ± 0.0162 ± 0.0111 ± 0.0103
Parkinson 0.89515 0.89956 0.90044 0.93744 0.82819 0.92952 0.90572
± 0.0254 ± 0.0175 ± 0.025 ± 0.0106 ± 0.0092 ± 0.0263 ± 0.0318
Average 0.84444 0.83165 0.84491 0.85981 0.68663 0.82806 0.86930

Table 7 presents the efficient values of the hyperpa- Table 2. Table 8 compares the time difference of the pro-
rameters for each run, tuned based on fivefold cross-vali- posed KNN variant with existing KNN variants based on
dation. The proposed KNN variant has no tuning small- to medium-scale datasets. Datasets in Table 8 are
parameter, so the last column of Table 7 is empty. sorted based on size, and the best results are shown in bold
The proposed KNN variant is parameter-free and hence letters. The training time of the proposed KNN variant is
saves the parameter tuning time. However, it consists of a better than all compared KNN variants except LMRKNN.
training phase because it generates BST in the training, and The theoretical time complexity of the proposed KNN
both parameter tuning time and training time are part of the variant is better than all compared KNN variants, but
model’s training process. So, we compare the parameter practically (Based on Table 8), LMRKNN performs better
tuning time of the existing KNN variants with the training than the proposed KNN variant (K-Forest) for all datasets
time of the proposed KNN variant using datasets shown in except Parkinson (largest dataset of Table 2), which

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Table 6 AUC value comparison

Datasets KNN WKNN KNCN LMKNCN LMRKNN RCKNCN K-Forest
of K-Forest with existing KNN
variants Tae – – – – – – –
Wine – – –– – – – –
Glass – – – – – – –
Dermatology – – – – – – –
Image – – – – – – –
Ionosphere 0.8749 0.71582 0.87795 0.87467 0.69439 0.86611 0.87985
± 0.0264 ± 0.0087 ± 0.03 ± 0.0199 ± 0.0287 ± 0.0396 ± 0.0174
Newthyroid – – – – – – –
Libras - – –– - – – –
Lsvt 0.7783 0.79283 0.77857 0.779 0.55061 0.72736 0.81957
± 0.064 ± 0.0918 ± 0.0465 ± 0.0578 ± 0.0794 ± 0.0328 ± 0.0667
Bands 0.6351 0.62103 0.63316 0.65497 0.58213 0.64585 0.68237
± 0.0277 ± 0.0123 ± 0.037 ± 0.0332 ± 0.0353 ± 0.0212 ± 0.0185
Vehicle – – – – – – –
Pima 0.71448 0.70139 0.71568 0.70515 0.58745 0.69075 0.69011
± 0.0335 ± 0.0314 ± 0.0327 ± 0.0211 ± 0.016 ± 0.0268 ± 0.0296
Vowel – - – – – – –
qsar-biodeg 0.83554 0.8325 0.8307 0.83363 0.55786 0.81304 0.83923
± 0.0249 ± 0.0194 ± 0.0238 ± 0.0207 ± 0.045 ± 0.017 ± 0.0134
Musk 0.91186 0.86056 0.9172 0.92764 0.77818 0.91274 0.92103
± 0.0291 ± 0.0262 ± 0.0197 ± 0.0186 ± 0.0527 ± 0.0174 ± 0.0204
Segment – – – – – – –
Cnae 0.99166 0.97025 0.9857 0.99737 0.98291 0.99157 0.98299
± 0.0068 ± 0.0217 ± 0.0012 ± 0.0053 ± 0.0166 ± 0.0112 ± 0.0113
Parkinson 0.83148 0.83714 0.83889 0.89344 0.76689 0.89793 0.83582
± 0.0396 ± 0.0231 ± 0.0315 ± 0.0057 ± 0.0137 ± 0.0434 ± 0.0501
Average 0.82166 0.79143 0.82223 0.83323 0.68755 0.81816 0.83137

Fig. 2 Performance comparison of K-Forest with existing KNN variants

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Table 7 Values of hyperparameter of all KNN variants tuned based on fivefold cross-validation
Datasets KNN WKNN KNCN LMKNCN LMRNN RCKNCN K-
Forest

Tae K = ( 7, 13, 3, 7, K = ( 3, 7, 13, K = (5, 7, 9, 7, 3) K = ( 5, 5, 7, 5, K = (7, 3, 3, 3, 3) K = ( 5, 5, 5, 3. 13.) –

7) 13, 5) 11) k=1 k = (10, 10, 10, 10, 100)
Wine K = (11, 5, 5, 5, K = (3, 3, 13, 7, K = (5, 5, 5, 7, 3) K = (3, 7, 3, 3, K = (3, 3, 3, 5, 5) K = (7, 13, 3, 9, 11) –
11) 11) 7) k=1 k = (1, 1, 1, 10, 1)
Glass K = (7, 3, 7, 3, 3) K = (5, 3, 5, 5, 3) K = (5, 3, 5, 5, 3) K = (3, 3, 5, 3, K = (3, 3, 3, 3, 3) K = (3, 5, 3, 3, 3) –
3) k=1 k = (10, 10, 10, 10, 10)
Dermatology K = (13, 9, 11, 9, K = (3, 3, 3, 5, 7) K = (13, 9, 7, 7, K = (7, 3, 9, 5, K = (13, 3, 11, 13, 13) K = ( 9, 5, 11, 5, 7) –
9) 11) 7) k=1 k = (10, 10, 10, 10, 10)
Image K = (5, 3, 3, 7, 5) K = (5, 3, 5, 3, 3) K = (5, 5, 3, 5, 3) K = (3, 3, 3, 3, K = (3, 3, 3, 3, 3) K = (5, 7, 5, 9, 7) –
3) k=1 k = (10, 10, 10, 10, 10)
Ionosphere K = (7, 7, 5, 5, 5) K = (3, 3, 3, 3, 3) K = (9, 7, 5, 7, 5) K = (5, 3, 3, 3, K = (3, 3, 3, 3, 3) K = (5, 9, 7, 5, 7) –
5) k=1 k = ( 10, 10, 10, 100,
10)
Newthyroid K = (3, 13, 3, 5, 5) K = (5, 3, 5, 3, 3) K = (3, 3, 7, 5, 5) K = (3, 3, 3, 3, K = (7, 5, 5, 5, 5) K = (5, 5, 5, 7, 5) –
3) k=1 k = (1, 1, 1, 1, 1)
Libras K = (3, 3, 3, 3, 3) K = (3, 3, 3, 3, 3) K = (3, 3, 5, 3, 3) K = (3, 3, 3, 3, K = (3, 3, 3, 3, 3) K = (3, 3, 3, 3, 3) –
3) k=1 k = (10, 10, 10, 10, 10)
Lsvt K = (5, 13, 5, 11, K = (13, 5, 13, 9, K = ( 5, 7, 11, 13, K = (3, 5, 3, 7, K = (13, 13, 13, 13, 13) K = (5, 7, 5, 9, 5) –
7) 9) 7) 5) k=1 k = (100, 100, 100, 100,
100)
Bands K = (5, 9, 11, 9, 3) K = (5, 5, 13, 7, K = (3, 5, 7, 3, 9) K = (5, 5, 9, 7, K = (9, 3, 3, 3, 3) K = (3, 3, 3, 7, 5) –
13) 5) k=1 k = (100, 100, 10, 10,
100)
Vehicle K = (7, 3, 3, 5, 7) K = (7, 7, 11, 7, K = (7, 7, 7, 7, 13) K = (3, 3, 7, 3, K = (3, 3, 3, 3, 3) K = ( 9, 9, 13, 9, 11) –
5) 5) k=1 k = (10, 10, 10, 10, 10)
Pima K = (13, 13, 9, 13, K = (11, 7, 3, 7, K = (13, 13, 13, K = (11, 9, 5, K = (3, 5, 3, 3, 5) K = (13, 13, 11, 11, 9) –
11) 7) 13, 9) 11, 7) k=1 k = (10, 10, 10, 10, 10)
Vowel K = (3, 3, 3, 3, 3) K = (3, 3, 3, 3, 3) K = (3, 3, 3, 3, 3) K = (3, 3, 3, 3, K = (3, 3, 3, 3, 3) K = (3, 3, 3, 3, 3) –
3) k=1 k = (10, 10, 10, 10, 10)
qsar-biodeg K = (9, 11, 13, 9, K = (9, 5, 11, 9, K = (13, 7, 9, 11, K = [ 9, 13, 5, 9, K = (3, 3, 3, 3, 3) K = (3, 13, 13, 13, 13) –
13) 11) 13) 5) k=1 k = (10, 100, 100, 100,
100)
Musk K = (5, 9, 7, 5, 3) K = (3, 3, 3, 3, 3) K = (3, 3, 3, 5, 5) K = (5, 3, 5, 5, K = (3, 3, 3, 5, 3) K = (5, 5, 7, 5, 7) –
5) k=1 k = (100, 100, 100, 100,
100)
Segment K = (5, 5, 3, 3, 3) K = (3, 3, 3, 3, 3) K = (5, 7, 3, 3, 5) K = (3, 3, 3, 3, K = (3, 3, 3, 3, 3) K = (5, 5, 7, 5, 7) –
3) k=1 k = (10, 10, 10, 10, 10)
Cnae K = (5, 9, 7, 9, 3) K = (3, 3, 3, 7, 3) K = (7, 7, 5, 11, K = (3, 5, 5, 9, K = (7, 11, 7, 9, 5) K = (5, 5, 5, 5, 3) –
11) 5) k=1 k = (10, 1, 10, 10, 1)
Parkinson K = (3, 3, 5, 3, 3) K = (3, 3, 3, 3, 3) K = (3, 3, 3, 3, 5) K = (3, 3, 3, 3, K = (3, 3, 3, 3, 3) K = (3, 3, 3, 3, 3) –
3) k=1 k = (10, 100, 100, 100,
100)

indicates that the prediction time of the proposed KNN variant with the original KNN, Fig. 3b with WKNN,
variant becomes better than compared variants with an Fig. 3c with KNCN, Fig. 3d with LMKNCN, Fig. 3e with
increase in dataset size. We will prove this fact in the LMRKNN, and Fig. 3f with RCKNCN. The Y-axis of all
second experiment held on large-scale datasets shown in graphs presents the time in seconds, and the X-axis pre-
Table 3. sents the datasets used for comparison. The orange line in
Figure 3 presents the pairwise comparison of the pro- Fig. 3 presents the time consumed by the proposed variant,
posed KNN variant with existing selected KNN variants. and the blue line presents the time consumed by existing
Figure 3a compares the training time of the proposed KNN KNN variants. Datasets on the X-axis are sorted from left

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Table 8 Training time comparison of the proposed algorithm with existing variants on small- to medium-scale datasets
Datasets KNN WKNN KNCN LMKNCN LMKRNN RCKNCN K-Forest

Tae 2.29039 0.35977 2.2593 2.43956 0.68551 13.31694 42.22578

Wine 3.76721 1.04537 3.78274 3.89582 0.95996 18.39449 28.90680
Glass 4.82404 1.09266 4.77903 5.24331 1.62333 25.60004 57.98701
Dermatology 26.38598 10.5554 26.26918 26.46129 5.21869 125.46255 39.50817
Image 6.66747 2.15319 6.34437 6.68066 2.3141 33.35251 43.14326
Ionosphere 22.7572 9.86165 22.88534 22.64098 3.56608 111.62772 53.24531
Newthyroid 4.3717 0.72551 4.36492 4.59087 1.11401 23.7478 39.70813
Libras 46.07229 27.50583 47.09319 43.20761 10.88454 195.58477 86.90704
Lsvt 15.38939 11.54187 15.13559 14.19217 2.94688 103.00105 32.38561
Bands 41.66095 13.51786 42.73716 41.7952 6.25266 205.6281 136.33328
Vehicle 96.08662 31.88656 96.14304 97.37422 15.3667 561.04186 180.44173
Pima 59.795 12.83227 59.91675 60.78164 8.9401 293.21398 188.45820
Vowel 122.07242 33.0581 119.9129 126.65112 25.064 588.21801 261.05899
qsar-biodeg 258.36583 104.21271 256.49703 251.74054 33.86361 1186.49633 200.46501
Musk 141.85224 89.48409 143.93033 136.93586 22.95488 840.41369 88.00781
Segment 753.41671 246.11982 761.15199 766.30598 112.08945 3584.57597 300.87514
Cnae 156.7608 109.20967 155.90158 147.04062 30.58961 1268.90637 45.81643
Parkinson 1446.48016 994.83914 1435.16684 1394.83888 243.14081 8583.9246 195.52188

Fig. 3 Training time comparison of proposed KNN variant with parameter tuning time comparison of existing selected KNN variants

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Table 9 Prediction time

Datasets KNN WKNN KNCN LMKNCN LMRKNN RCKNCN K-Forest
comparison of proposed
algorithm with existing variants Tae 0.18391 0.03098 0.15922 0.18504 0.04824 0.22873 4.94411
on small- to medium-scale
datasets Wine 0.33477 0.09708 0.22231 0.2269 0.06417 0.46658 4.94520
Glass 0.26653 0.09394 0.24417 0.24728 0.11583 0.31916 5.15649
Dermatology 3.01064 0.9204 2.78237 2.08878 0.46075 2.48419 5.46476
Image 0.34785 0.19232 0.31284 0.26681 0.14173 0.68321 5.62717
Ionosphere 1.48214 0.86101 1.69947 1.00071 0.28754 2.10962 6.39348
Newthyroid 0.28422 0.0627 0.22495 0.17034 0.07833 0.37514 6.43717
Libras 1.58225 2.43809 1.91127 1.79122 0.74757 2.13354 6.69370
Lsvt 1.50677 1.10538 1.56529 0.84762 0.28164 1.79714 6.77099
Bands 3.46894 1.21267 2.55434 2.94964 0.5278 2.40198 7.37003
Vehicle 5.43567 2.82466 8.89432 4.67849 1.22572 15.59038 7.67329
Pima 7.95297 1.14898 8.27489 5.89439 0.74429 9.03926 8.02000
Vowel 4.13585 2.98717 4.0289 4.45503 1.77313 5.89462 8.34027
qsar-biodeg 31.8508 9.38025 29.91429 26.2639 2.93497 35.96401 9.71159
Musk 9.7892 7.95733 6.4717 7.4893 1.70644 10.85987 10.06364
Segment 31.76889 21.91051 38.79166 25.9576 9.08528 59.4475 13.24362
Cnae 12.85554 10.15246 15.08008 9.90386 2.71811 18.2476 18.33445
Parkinson 56.67243 88.40936 56.10176 47.8575 23.16717 86.40967 38.08839

Fig. 4 Prediction time comparison based on small- to medium-scale datasets

to right based on the size. Clearly, the gap between the variants. It means the proposed variant becomes better with
orange and blue lines reduces as we traverse from left to an increase in dataset size.
right on the X-axis, and for the last three datasets, the Table 9 compares the prediction time of the proposed
proposed variant takes less time than the existing selected KNN variant with existing selected KNN variants based on

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

small- to medium-scale datasets, and the best results are all datasets based on each selected KNN variant. Table 10
shown in bold letters. LMRKNN gives prediction results in compares the proposed KNN variant with selected KNN
minimum time for most datasets shown in Table 9. In most variants based on training time using large-scale datasets
cases, the proposed KNN variant takes more time than shown in Table 3, and the best results are shown in bold
other selected KNN variants, but it may be noted that the letters. Datasets are sorted in ascending order based on the
time difference decreases with an increase in the size of the size of the datasets.
datasets. Datasets shown in Table 9 are sorted in ascending Figure 5 compares the proposed KNN variant with
order based on the size of the dataset. existing KNN variants based on training time using
Figure 4 compares the proposed KNN variant with selected large-scale datasets shown in Table 3. Figure 5a
existing KNN variants based on prediction time. Figure 4a compares the training time of the proposed KNN variant
compares the prediction time of the proposed KNN variant with the original KNN, Fig. 5b with WKNN, Fig. 5c with
with the original KNN, Fig. 4b with WKNN, Fig. 4c with KNCN, Fig. 5d with LMKNCN, Fig. 5e with LMRKNN,
KNCN, Fig. 4d with LMKNCN, Fig. 4e with LMRKNN, and Fig. 5f with RCKNCN. Clearly, the proposed KNN
and Fig. 4f with RCKNCN. The gap between the orange variant outperforms all selected variants regarding training
and blue lines decreases with an increase in dataset size, time, and the proposed variant becomes much better with
and K-Forest becomes better for the last three datasets in increased dataset size. LMRKNN gives better performance
comparison to all selected KNN variants except than the proposed KNN variant in some cases, but results
LMRKNN. However, the gap reduces with an increase in of LMRKNN are collected based on only one hyperpa-
dataset size for LMRKNN, too, but datasets are not big rameter tuning, which is neighborhood size (k), and the
enough to show the prediction time superiority of our other hyperparameter is fixed to a constant value (k = 1). If
proposed KNN variant over LMRKNN. both hyperparameters are tuned based on fivefold cross-
Conclusion: Based on Experiment 1, it can be con- validation, then the proposed variant will completely out-
cluded that the proposed KNN variant has better prediction perform LMRKNN, too. One more thing should be noted
power than existing selected KNN variants. In the case of here: the Y-axis shows the time in thousands of seconds,
multi-class classification, proposed KNN variants perform and after conversion, the difference will be in hours and
much better compared to binary-class classification. days, depending upon the size of the datasets.
However, the datasets selected for Experiment 1 are not big Table 11 compares the proposed KNN variant with
enough to show the training and prediction time superiority selected KNN variants in terms of prediction time on
of our proposed KNN variant. So, we perform Experiment selected large-scale datasets shown in Table 3, and the best
2 based on large-scale datasets to show the actual training results are shown in bold letters. The proposed KNN
and prediction time difference. variant outperforms all selected KNN variants by a large
margin except LMRKNN. LMRKNN takes less time than
5.4 Experiment – 2 the proposed KNN variant for two datasets out of selected
eight large-scale datasets, and for the rest, the proposed
Our second experiment is performed on large-scale data- KNN variant outperforms LMRKNN too.
sets, as shown in Table 3. The main focus of the second Figure 6 presents the pairwise comparison of the pre-
experiment is to show the reduction in prediction and diction time of the proposed KNN variant with other
training time. This experiment is performed only once for selected KNN variants using selected large-scale datasets.

Table 10 Training time comparison of the proposed algorithm with existing variants on large-scale datasets
Datasets KNN WKNN KNCN LMKNCN LMRKNN RCKNCN K-Forest

Abalone 1781.0531 372.94143 1765.42376 1767.32249 233.20295 7271.71004 1247.70338

Phoneme 3186.98207 511.87044 3165.82589 3133.40069 405.4972 12,550.22518 1023.90311
Texture 6300.80723 2859.38085 7180.68941 6384.93338 865.90428 25,868.09518 615.77348
Ring 8350.77559 2861.26735 8462.94101 9218.51091 1051.98956 35,089.7651 1483.00125
Optdigits 10,787.4637 5213.18171 10,793.92185 10,397.69518 1257.98861 62,004.53259 910.50841
Penbased 17,501.6925 5295.41999 17,170.05464 16,633.12813 2314.79894 67,731.34338 1318.50149
Spambase 6744.84803 2865.00501 6492.95354 6281.3419 768.72692 27,696.30097 1127.1675
Letter 52,070.0506 16,982.53836 58,832.6932 61,840.85772 8064.95872 263,929.35059 3442.83986

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Fig. 5 Training time comparison of proposed KNN variant with selected KNN variants using large-scale datasets

Table 11 Prediction time comparison of the proposed algorithm with existing variants on large-scale datasets
Datasets KNN WKNN KNCN LMKNCN LMRKNN RCKNCN K-Forest

Abalone 216.53545 33.11493 255.7155 220.56411 20.71738 119.60065 23.09049

Phoneme 108.52195 44.43948 103.81626 109.2683 38.57058 424.89377 24.05136
Texture 647.59239 254.50435 749.87979 211.35066 73.62361 891.20676 31.39713
Ring 276.80448 255.52376 289.47458 299.79811 92.35418 309.19055 41.92938
Optdigits 1084.79626 463.92642 1075.1574 346.50661 109.58096 873.53247 43.34405
Penbased 1334.94919 472.81082 1722.67197 526.88914 202.1015 878.8941 50.98553
Spambase 1003.43658 246.62441 359.81095 634.12201 68.0484 819.21726 111.60316
Letter 5256.69914 1473.09252 4072.49621 1951.75219 736.43846 7547.68256 117.14194

Figure 6a compares the prediction time of the proposed algorithm on selected large-scale datasets in Table 12. We
KNN variant with the original KNN, Fig. 6b with WKNN, cannot identify the best algorithm for selected datasets
Fig. 6c with KNCN, Fig. 6d with LMKNCN, Fig. 6e with based on only one run, but it can be concluded that, on
LMRKNN, and Fig. 6f with RCKNCN. The Y-axis of all average, the proposed algorithm is slightly better than
graphs shows the prediction time in thousands of seconds. existing algorithms.
So, we can say that the proposed KNN variant is far better Figure 7 compares the average prediction accuracy of
than the selected KNN variants in terms of prediction time. the proposed KNN variant with selected KNN variants
The main focus of Experiment 2 is to show the superi- using bar graphs, which shows that the proposed KNN
ority of the proposed KNN variant in terms of prediction variant is slightly better than selected KNN variants.
and training time. However, we also show the accuracy Conclusion: The main focus of Experiment 2 is to show
along with tuned hyperparameters for a single run of each the training and prediction time superiority of the proposed

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Fig. 6 Prediction time comparison using large-scale datasets

Table 12 Performance
Datasets KNN WKNN KNCN LMKNCN LMRNN RCKNCN K-Forest
comparison of K-Forest with
existing variants based on large- Abalone 0.55502 0.53987 0.54785 0.52791 0.47448 0.45375 0.5319
scale datasets using an accuracy
performance metric K = 11 K = 13 K = 13 K = 11 K = 7, k = 1 K = 5, k = 1
Phoneme 0.89766 0.88286 0.89766 0.85943 0.72072 0.82984 0.90012
K=3 K=3 K=3 K=3 K = 3, k = 1 K = 13, k = 10
Texture 0.99515 0.98485 0.99515 0.99455 0.85818 0.94303 0.97697
K=9 K=3 K=9 K=3 K = 3, k = 1 K = 13, k = 10
Ring 0.87342 0.65405 0.87342 0.7455 0.50631 0.5536 0.93243
K=3 K=3 K=3 K=3 K = 3, k = 1 K = 3, k = 10
Optdigits 0.98695 0.98754 0.98695 0.99229 0.87841 0.99424 0.9828
K=9 K=3 K=9 K=3 K = 3, k = 1 K = 5, k = 10
Penbased 0.99606 0.99363 0.99515 0.99606 0.83475 0.99545 0.99121
K=7 K=3 K=9 K=3 K = 3, k = 1 K = 5, k = 10
Spambase 0.93841 0.92174 0.93986 0.94275 0.75072 0.92319 0.92899
K = 13 K=5 K=5 K=9 K = 3, k = 1 K = 11, k = 100
Letter 0.94183 0.94667 0.94383 0.96483 0.715 0.88467 0.9535
K=9 K=3 K=7 K=3 K = 3, k = 1 K = 13, k = 10
Average 0.89806 0.86390 0.89748 0.87790 0.71732 0.82222 0.89974

KNN variant. So, the comparison is made on eight large- prediction accuracy, and the proposed KNN variant gives
scale datasets, and the experiment proves that the proposed slightly better performance than the existing selected KNN
KNN variant is much better than selected KNN variants, variants.
also supported by theoretical proof. However, we also
compare the proposed KNN variant based on average

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

random subspace. So, we compare the proposed KNN

variant with random forest based on accuracy and AUC
performance metrics using small- to medium-scale datasets
shown in Table 2. The ensemble size of both algorithms is
set to 100 based on the literature [5]. Component learners
of both algorithms are trained with randomly selected d=2
features, and no pruning is done.
Table 13 compares the proposed KNN variant with
random forest based on average accuracy and average
AUC value of 5 runs on small- to medium-scale datasets
shown in Table 2, and the best results are shown in bold
letters. The proposed KNN variant gives better accuracy
than random forest for 15/18 selected small- to medium-
scale datasets, both binary- and multi-class classification
datasets. In the case of multi-class classification datasets,
the proposed KNN variant gives better results on 10/10
Fig. 7 Comparison of proposed KNN variant with existing KNN
variants based on average accuracy datasets. In the case of binary classification, the proposed
KNN variant gives better accuracy for 4/8 datasets. How-
Table 13 Comparison of K-Forest with random forest based on
ever, AUC is more important than accuracy for binary
accuracy and AUC classification, and the proposed KNN variant gives better
AUC values for 4/8 datasets. On average, the proposed
Datasets Accuracy AUC
KNN variant achieves 1.4% better accuracy but 0.002%
K-Forest RF K-Forest RF less AUC.
Tae 0.57826 0.53478 – – Figure 8 compares the K-Forest with the random forest
Wine 0.98518 0.97778 – – based on accuracy and AUC using bar graphs where blue
Glass 0.76307 0.71385 – –
bars represent the outcome of the random forest and orange
Dermatology 0.97272 0.97091 – –
bars represent the outcome of the proposed KNN variant.
The exact value of each bar is written on the top of that bar
Image 0.89841 0.89206 – –
up to two decimal places. Figure 8a shows the comparison
Ionosphere 0.90754 0.92453 0.87985 0.91809
based on the accuracy performance metric, and Fig. 8b
Newthyroid 0.96307 0.93538 – –
shows the comparison based on the AUC performance
Libras 0.84259 0.75741 – –
metric.
Lsvt 0.83684 0.82105 0.81957 0.76765
Conclusion: Based on Table 13 and Fig. 8, it can be
Bands 0.70864 0.71481 0.68237 0.69972
concluded that the proposed KNN variant is better than the
Vehicle 0.82992 0.81181 – –
random forest for multi-class classification. Regarding the
Pima 0.75324 0.74286 0.69011 0.67598
binary-class classification proposed, KNN gives a 0.02%
Vowel 0.96430 0.92593 – –
less AUC value. In the case of the Musk dataset, random
Qsar-biodeg 0.86309 0.87823 0.83923 0.85365
forest gives 6% better results, which makes the difference.
Musk 0.91748 0.98042 0.92103 0.98086
If we exclude the Musk dataset, the proposed KNN gives
Segment 0.97402 0.96652 – – better results than random forest for binary-class
Cnae 0.98333 0.96944 0.98299 0.96927 classification.
Parkinson 0.90572 0.88899 0.83582 0.80261
Average 0.86930 0.85593 0.83137 0.83347 5.6 Experiment – 4

Experiment 4 is performed to show the robustness of the

proposed KNN variant in the case of noisy datasets. All the
5.5 Experiment – 3 datasets used for this experiment consist of 5% attribute
noise. In this experiment, the hyperparameter values of the
The comparison of the proposed KNN variant with a ran- selected KNN variants are tuned based on fivefold cross-
dom forest is worth showing because the technique to build validation on training data. The possible values of k from 1
a random forest is also based on bootstrap aggregation and to 15 are used for hyperparameter tuning.

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Fig. 8 Comparison of proposed KNN variant with random forest

Table 14 presents the obtained results on noisy datasets 6 Conclusion

by applying proposed and existing KNN variants where the
best results are boldfaced. In the case of 6/10 datasets, the This article proposes a new variant of the original KNN
proposed KNN variant achieves the best results. The pro- algorithm to improve performance with reduced prediction
posed algorithm gives slightly less accuracy than the best- and training time. A binary search tree (BST) structure is
performing algorithm for the remaining datasets. On developed based on a divide-and-conquer strategy to
average, the proposed algorithm gives 1.3% better results reduce the prediction time, where leaf nodes represent the
than KNCN, which is the second-best KNN variant based nearest neighbors of the query point. As the entire leaf
on performed experimentation on noisy datasets. node instances are considered nearest neighbors, there is no
Figure 9 compares the proposed KNN variant with need to tune the hyperparameter (k); hence, the proposed
existing KNN variants based on accuracy performance KNN variant is parameter-free. However, the prediction
metric using box plots. The horizontal line in the middle of power of the proposed BST-based learner is slightly lower
each box plot is the median of the accuracy achieved by the than the original KNN. Therefore, this research article
learners, and the red diamond shows the mean accuracy proposed an ensemble method called K-Forest by injecting
achieved by the KNN variants. The brown-colored box plot randomness such as random subspace, random node
shows the accuracy achieved by the proposed KNN vari- splitting, and bootstrap aggregation, where the proposed
ants, which is better than all other selected KNN variants as BST structure is used as a component learner. Finally, the
median and mean accuracy is approximately 1% better than proposed KNN variant is compared with six existing KNN
existing KNN variants. variants and random forest based on accuracy, AUC,
Conclusion: The main focus of Experiment 4 is to show training time, and prediction time using 18 small- to
the robustness of the proposed KNN variant in the case of a medium-scale, eight large-scale, and ten noisy datasets.
noisy environment. So, this experiment is performed on ten Results prove that the proposed KNN variant gives a 1.1%
noisy datasets shown in Table 4. Based on Fig. 9 and performance improvement over the best-selected KNN
Table 14, it can be concluded that the proposed KNN variant and a 1.5% improvement over the random forest
variants perform equally well for noisy datasets as the with reduced prediction and training time. Further, the
mean accuracy of the proposed KNN variant is 1% better proposed KNN variant gives a 1.3% improvement on noisy
than the compared KNN variants and consists of no datasets, which proves that the proposed KNN variant is
outliers. robust to noise.
In the future, we will further improve the performance
of the proposed KNN variant using boosting techniques.

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

Table 14 Performance
Datasets KNN WKNN KNCN LMKNCN LMRKNN RCKNCN K-Forest
comparison of K-Forest with
selected KNN variants on noisy Iris_n 0.88222 0.88667 0.88 0.87111 0.71111 0.88222 0.90222
datasets
± 0.0373 ± 0.0351 ± 0.0301 ± 0.0239 ± 0.0243 ± 0.041 ± 0.0109
(7) (7) (8) (3) (11) (3)
Wine_n 0.94074 0.94444 0.9437 0.95 0.64815 0.93148 0.95185
± 0.0246 ± 0.0234 ± 0.0207 ± 0.0287 ± 0.0642 ± 0.0446 ± 0.0343
(13) (10) (5) (5) (3) (3)
Glass_n 0.67231 0.69077 0.66462 0.64308 0.54 0.67231 0.66462
± 0.0681 ± 0.0847 ± 0.0626 ± 0.063 ± 0.0597 ± 0.0506 ± 0.0369
(3) (3) (3) (4) (3) (3)
Heart_n 0.7037 0.69877 0.7321 0.73951 0.50123 0.70988 0.80247
± 0.0579 ± 0.0566 ± 0.0445 ± 0.0507 ± 0.0418 ± 0.0453 ± 0.0366
(13) (13) (9) (9) (3) (15)
Sonar_n 0.82698 0.83175 0.82222 0.83968 0.60317 0.83492 0.84127
± 0.027 ± 0.0258 ± 0.0361 ± 0.018 ± 0.0224 ± 0.286 ± 0.0362
(3) (4) (3) (3) (3) (6)
Wdbc_n 0.94854 0.94854 0.94971 0.9438 0.6386 0.92456 0.95205
± 0.0101 ± 0.0101 ± 0.0108 ± 0.0076 ± 0.0275 ± 0.0137 ± 0.0167
(5) (5) (12) (8) (3) (4)
Pima_n 0.72641 0.72424 0.73506 0.73593 0.57792 0.71645 0.74632
± 0.0318 ± 0.279 ± 0.0222 ± 0.0211 ± 0.02 ± 0.0348 ± 0.0272
(12) (12) (13) (14) (14) (13)
Yeast_n 0.51928 0.52377 0.51614 0.50852 0.31143 0.49193 0.51166
± 0.0156 ± 0.0186 ± 0.0246 ± 0.0227 ± 0.0148 ± 0.0139 ± 0.0267
(14) (14) (13) (13) (3) (15)
Ionosphere_n 0.79057 0.76604 0.88 0.85245 0.61698 0.86226 0.86038
± 0.0139 ± 0.0183 ± 0.0228 ± 0.0096 ± 0.0613 ± 0.0194 ± 0.034
(4) (4) (4) (4) (9) (5)
Page_Blocks_n 0.95371 0.95371 0.95554 0.95128 0.77832 0.94896 0.94654
± 0.0028 ± 0.0024 ± 0.0023 ± 0.0037 ± 0.0477 ± 0.0025 ± 0.0041
(5) (5) (5) (6) (3) (15)
Average 0.79644 0.79687 0.80790 0.80453 0.59269 0.79749 0.81793

Funding No funding information is available.

Data availability Data will be made available on reasonable request.

Declarations

Conflict of interest We wish to confirm that there are no known

conflicts of interest associated with this publication and there has
been no significant financial support for this work that could have
influenced its outcome.

Ethical approval and consent to participate We confirm that the

manuscript has been read and approved by all named authors and that
there are no other persons who satisfied the criteria for authorship but
are not listed. We further confirm that the order of authors listed in the
Fig. 9 Accuracy comparison based on noisy datasets manuscript has been approved by all of us.
We understand that the corresponding author is the sole contact for

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

the Editorial process (including Editorial Manager and direct com- 15. Bentley JL (1975) Multidimensional binary search trees used for
munications with the office). He/she is responsible for communicat- associative searching. Commun ACM 18(9):509–517
ing with the other authors about progress, submissions of revisions, 16. Liu T, Andrew WM, Alexander G, Claire C (2006) New algo-
and final approval of proofs. We confirm that we have provided a rithms for efficient high-dimensional nonparametric classifica-
current, correct email address which is accessible by the corre- tion. J Mach Learn Res 7(6):75–102
sponding author. Signed by author as follows. 17. Cover T, Hart P (1967) Nearest neighbor pattern classification.
IEEE Trans Inf Theory 13(1):21–27
Consent to Publication We confirm that we have given due consid- 18. Bo C, Huchuan Lu, Wang D (2017) Weighted generalized nearest
eration to the protection of intellectual property associated with this neighbor for hyperspectral image classification. IEEE Access
work and that there are no impediments to publication, including the 5:1496–1509
timing of publication, with respect to intellectual property. In so doing 19. Gou J, Xiong T, Kuang Y (2011) A novel weighted voting for
we confirm that we have followed the regulations of our institutions K-nearest neighbor rule. J Comput 6(5):833–840
concerning intellectual property. 20. Gou J, Lan Du, Zhang Y, Xiong T (2012) A new distance-
weighted k-nearest neighbor classifier. J Inf Comput Sci
9(6):1429–1436
References 21. Dudani SA (1976) The distance-weighted k-nearest-neighbor
rule. IEEE Trans Syst Man Cybern 4:325–327
22. Bicego M, Marco L (2016) Weighted K-nearest neighbor revis-
1. Wu X, Vipin Kumar J, Quinlan R, Ghosh J, Yang Q, Motoda H, ited. In 2016 23rd International Conference on Pattern Recog-
McLachlan GJ et al (2008) Top 10 algorithms in data mining. nition (ICPR), pp. 1642–1647. IEEE.
Knowl Inform Syst 14:1–37 23. Domeniconi C, Peng J, Gunopulos D (2002) Locally adaptive
2. Zhang J, Qi H, Ji Y, Ren Y, He M, Mingxu Su, Cai X (2021) metric nearest-neighbor classification. IEEE Trans Pattern Anal
Nonlinear acoustic tomography for measuring the temperature Mach Intell 24(9):1281–1285
and velocity fields by using the covariance matrix adaptation 24. Weinberger KQ, Lawrence KS (2009) Distance metric learning
evolution strategy algorithm. IEEE Trans Instrum Meas 71:1–14 for large margin nearest neighbor classification. J Mach Learn
3. Zhang S, Li X, Zong M, Zhu X, Wang R (2017) Efficient KNN Res 10(2):207–244
classification with different numbers of nearest neighbors. IEEE 25. Mitani Y, Hamamoto Y (2006) A local mean-based nonpara-
Trans Neural Netw Learn Syst 29(5):1774–1785 metric classifier. Pattern Recogn Lett 27(10):1151–1159
4. Arif Ridho L, Muharman L (2020) Optimization of distance 26. Zeng Y, Yang Y, Zhao L (2009) Pseudo nearest neighbor rule for
formula in K-nearest neighbor method. Bull Electr Eng Inform pattern classification. Expert Syst Appl 36(2):3587–3595
9(1):326–338 27. Gou J, Zhan Y, Rao Y, Shen X, Wang X, He Wu (2014)
5. Zhi-Hua Z, Yang Y (2005) Ensembling local learners through- Improved pseudo nearest neighbor classification. Knowl-Based
multimodal perturbation. IEEE Trans Syst Man Cybern Part B Syst 70:361–375
(Cybern) 35(4):725–735 28. Pan Z, Wang Y, Weiping Ku (2017) A new k-harmonic nearest
6. Chomboon K, Pasapitch C, Pongsakorn T, Kittisak K, Nittaya K neighbor classifier based on the multi-local means. Expert Syst
(2015) An empirical study of distance metrics for k-nearest Appl 67:115–125
neighbor algorithm. In Proceedings of the 3rd international 29. Gou J, Ma H, Weihua Ou, Zeng S, Rao Y, Yang H (2019) A
conference on industrial application engineering, vol. 2. generalized mean distance-based k-nearest neighbor classifier.
7. Gweon H, Hao Yu (2021) A nearest neighbor-based active Expert Syst Appl 115:356–372
learning method and its application to time series classification. 30. Gou J, Qiu W, Yi Z, Yong Xu, Mao Q, Zhan Y (2019) A local
Pattern Recogn Lett 146:230–236 mean representation-based K-nearest neighbor classifier. ACM
8. Tran TM, Le X-MT, Nguyen HT, Huynh V-N (2019) A novel Trans Intel Syst Technol (TIST) 10(3):1–25
non-parametric method for time series classification based on 31. Sánchez JS, Filiberto P, Francesc JF (1997) On the use of
k-Nearest Neighbors and dynamic time warping barycenter neighbourhood-based non-parametric classifiers. Pattern Recogn
averaging. Eng Appl Artif Intel 78:173–185 Lett 18(11–13):1179–1186
9. Singh M, Jitender KC (2024) Improved software fault prediction 32. Gou J, Yi Z, Lan Du, Xiong T (2012) A local mean-based
using new code metrics and machine learning algorithms. k-nearest centroid neighbor classifier. Comput J 55(9):1058–1071
J Comput Lang 78:101253 33. Radhika T, Chandrasekar A, Vijayakumar V, Zhu Quanxin
10. Singh M, Jitender KC (2023) A hybrid approach based on (2023) Analysis of Markovian jump stochastic Cohen-Grossberg
k-nearest neighbors and decision tree for software fault predic- BAM neural networks with time delays for exponential input-to-
tion. Kuwait J Sci 50(2A):18331 state stability. Neural Process Lett 55(8):11055–11072
11. Zhan Y, Liu J, Gou J, Wang M (2016) A video semantic detection 34. Cao Yang, Chandrasekar A, Radhika T, Vijayakumar V (2024)
method based on locality-sensitive discriminant sparse repre- Input-to-state stability of stochastic Markovian jump genetic
sentation and weighted KNN. J Vis Commun Image Represent regulatory networks. Math Comput Simul 222:174–187
41:65–73 35. Aslam MS, Radhika T, Chandrasekar A, Zhu Q (2024) Improved
12. Uddin S, Ibtisham H, Haohui L, Mohammad AM, Ergun G event-triggered-based output tracking for a class of delayed net-
(2022) Comparative performance analysis of K-nearest neighbour worked T-S fuzzy systems. Int J Fuzzy Syst. https://doi.org/10.
(KNN) algorithm and its different variants for disease prediction. 1007/s40815-023-01664-1
Sci Rep 12(1):6256 36. Duraipandian M (2020) Long term evolution-self organizing
13. Taunk K, Sanjukta D, Srishti V, Aleena S (2019) A brief review network for minimization of sudden call termination in mobile
of nearest neighbor algorithm for learning and classification. radio access networks. J Trends Comput Sci Smart Technol
In 2019 international conference on intelligent computing and (TCSST) 2(02):89–97
control systems (ICCS), pp. 1255–1260. IEEE. 37. Paul A, Tejaswini K, Sasmita P, Priya CS, Biswaranjan B (2024)
14. Gou J, Sun L, Lan Du, Ma H, Xiong T, Weihua Ou, Zhan Y Performance comparison of different disease detection using
(2022) A representation coefficient-based k-nearest centroid stacked ensemble learning model. J Soft Comput Paradigm
neighbor classifier. Expert Syst Appl 194:116529 6(1):26–39

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Neural Computing and Applications

38. Breiman L (1996) Bagging predictors. Mach Learn 24:123–140 43. Zhu X, Wu X, Yang Y (2004) Error detection and impact-sen-
39. Krogh A, Jesper V (1994) Neural network ensembles, cross sitive instance ranking in noisy datasets. In Proceedings of the
validation, and active learning. Adv Neural Inform Process Syst. national conference on artificial intelligence (pp. 378–384).
40. Ho TK (1998) The random subspace method for constructing Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT
decision forests. IEEE Trans Pattern Anal Mach Intell Press; 1999.
20(8):832–844
41. Ho TK (1998) Nearest neighbors in random subspaces. In Ad- Publisher’s Note Springer Nature remains neutral with regard to
vances in pattern recognition: Joint IAPR International Work- jurisdictional claims in published maps and institutional affiliations.
shops SSPR’98 and SPR’98 Sydney, Australia, August 11–13,
1998 Proceedings, pp. 640–648. Springer Berlin Heidelberg.
Springer Nature or its licensor (e.g. a society or other partner) holds
42. Derrac J, Garcia S, Sanchez L, Herrera F (2015) Keel data-
exclusive rights to this article under a publishing agreement with the
mining software tool: data set repository, integration of algo-
author(s) or other rightsholder(s); author self-archiving of the
rithms and experimental analysis framework. J Mult Valued Log
accepted manuscript version of this article is solely governed by the
Soft Comput 17:255–287
terms of such publishing agreement and applicable law.

123

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:

1. use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
2. use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
3. falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
4. use bots or other automated methods to access the content or redirect messages
5. override any security feature or exclusionary protocol; or
6. share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at

[email protected]

Statistical Prediction and Machine Learning
100% (4)
Statistical Prediction and Machine Learning
314 pages
04 KNN
No ratings yet
04 KNN
60 pages
K-Nearest Neighbors (KNN) Algorithm: Dr. Nagaraju K, CSE
No ratings yet
K-Nearest Neighbors (KNN) Algorithm: Dr. Nagaraju K, CSE
24 pages
Azure Custom Vision Guide
No ratings yet
Azure Custom Vision Guide
14 pages
Challenges in KNN Classification: Shichao Zhang
No ratings yet
Challenges in KNN Classification: Shichao Zhang
13 pages
ML 4
No ratings yet
ML 4
33 pages
Machine Learning Clustering Quiz
No ratings yet
Machine Learning Clustering Quiz
8 pages
Unit 2
No ratings yet
Unit 2
30 pages
K-Nearest Neighbors (KNN) Algorithm
No ratings yet
K-Nearest Neighbors (KNN) Algorithm
26 pages
Lect 06
No ratings yet
Lect 06
26 pages
Efficient KNN Classification With Different Numbers of Nearest Neighbors
No ratings yet
Efficient KNN Classification With Different Numbers of Nearest Neighbors
12 pages
Lec 11,12
No ratings yet
Lec 11,12
14 pages
637227449508725497DataMining (Chapter8)
No ratings yet
637227449508725497DataMining (Chapter8)
8 pages
Topic 7.7 K-Nearest Neighbor Analysis
No ratings yet
Topic 7.7 K-Nearest Neighbor Analysis
5 pages
Machine Learning (BTCOC603 - Y23) Supplementary December 2024
No ratings yet
Machine Learning (BTCOC603 - Y23) Supplementary December 2024
4 pages
Dynamic KNNF
No ratings yet
Dynamic KNNF
3 pages
Data Science Student Projects
No ratings yet
Data Science Student Projects
1 page
Clarification Notional Increament
No ratings yet
Clarification Notional Increament
2 pages
Unit 5 ML
No ratings yet
Unit 5 ML
13 pages
Instance Based Learning: Vibhav Gogate The University of Texas at Dallas
No ratings yet
Instance Based Learning: Vibhav Gogate The University of Texas at Dallas
25 pages
A New Locally Adaptive K Nearest Neighbor Algorithm Based On Discrimination Class 2020
No ratings yet
A New Locally Adaptive K Nearest Neighbor Algorithm Based On Discrimination Class 2020
13 pages
Office of The Principal
No ratings yet
Office of The Principal
1 page
Improved Segmentation Model For Melanoma Lesion de
No ratings yet
Improved Segmentation Model For Melanoma Lesion de
14 pages
07 Boosting Notes
No ratings yet
07 Boosting Notes
10 pages
445 Lecture 5
No ratings yet
445 Lecture 5
28 pages
K - Nearest Neighbor
No ratings yet
K - Nearest Neighbor
22 pages
Amrendra
No ratings yet
Amrendra
9 pages
Thesis Presentation
No ratings yet
Thesis Presentation
22 pages
Machine Learning Lecture 02
No ratings yet
Machine Learning Lecture 02
25 pages
DAmb 2009 A
No ratings yet
DAmb 2009 A
16 pages
Decision Tree Classification Algorithm
No ratings yet
Decision Tree Classification Algorithm
4 pages
02-knn Slides
No ratings yet
02-knn Slides
57 pages
Enhancing K-Nearest Neighbor Algorithm: A Comprehensive Review and Performance Analysis of Modifications
No ratings yet
Enhancing K-Nearest Neighbor Algorithm: A Comprehensive Review and Performance Analysis of Modifications
55 pages
K Nearest Neighbour Classifier
No ratings yet
K Nearest Neighbour Classifier
24 pages
19-K-Nearest Neighbor Learning.-22-08-2024
No ratings yet
19-K-Nearest Neighbor Learning.-22-08-2024
25 pages
Customer Churn Prediction in Telecom Sector Using Machine Learning Techniques
No ratings yet
Customer Churn Prediction in Telecom Sector Using Machine Learning Techniques
16 pages
ML Unit 5
No ratings yet
ML Unit 5
33 pages
1ST Ai ML
No ratings yet
1ST Ai ML
21 pages
CP - Theory - ML
No ratings yet
CP - Theory - ML
6 pages
Shubh
No ratings yet
Shubh
10 pages
K - Nearest Neighbor
No ratings yet
K - Nearest Neighbor
13 pages
K-Nearest Neighbors Algorithm
No ratings yet
K-Nearest Neighbors Algorithm
11 pages
Asthma Diagnosis via Machine Learning
No ratings yet
Asthma Diagnosis via Machine Learning
7 pages
K-Nearest Neighbor
No ratings yet
K-Nearest Neighbor
22 pages
Unit V Non Parametric Machine Learning
No ratings yet
Unit V Non Parametric Machine Learning
47 pages
KNN Report
No ratings yet
KNN Report
28 pages
Mathematics
No ratings yet
Mathematics
12 pages
Enthought Python Machine Learning SciKit Learn Cheat Sheets 1 3 v1.0
No ratings yet
Enthought Python Machine Learning SciKit Learn Cheat Sheets 1 3 v1.0
3 pages
Week 7 Nearest Neighbours
No ratings yet
Week 7 Nearest Neighbours
21 pages
KNN Algorithm
No ratings yet
KNN Algorithm
9 pages
Face Recognition Based On Machine Learning
No ratings yet
Face Recognition Based On Machine Learning
6 pages
Variational Ising Classifier (VIC) CIPHER: Salahaddin Univercity-Erbil Collage of Science and IT Department
No ratings yet
Variational Ising Classifier (VIC) CIPHER: Salahaddin Univercity-Erbil Collage of Science and IT Department
4 pages
Improving Time-Complexity of K Nearest Neighbors Classifier: A Systematic Review
No ratings yet
Improving Time-Complexity of K Nearest Neighbors Classifier: A Systematic Review
6 pages
IMAJOR DS Vol5 4
No ratings yet
IMAJOR DS Vol5 4
14 pages
Analysisand Modellingof Agricultural Landuseusing RSGIS
No ratings yet
Analysisand Modellingof Agricultural Landuseusing RSGIS
12 pages
Neural Network Activation Functions
No ratings yet
Neural Network Activation Functions
7 pages
KNN HMM
No ratings yet
KNN HMM
51 pages
3.1 K Nearest Neighbour Classifier
No ratings yet
3.1 K Nearest Neighbour Classifier
24 pages
Aplikasi KNN
No ratings yet
Aplikasi KNN
5 pages
Research Paper
No ratings yet
Research Paper
6 pages
K Nearest Neighbor: Presented by
No ratings yet
K Nearest Neighbor: Presented by
29 pages
Thesis - Aru Omarali
No ratings yet
Thesis - Aru Omarali
34 pages
02-knn Notes
No ratings yet
02-knn Notes
23 pages
Week 07
No ratings yet
Week 07
24 pages
KNN & Decision Tree Basics
No ratings yet
KNN & Decision Tree Basics
9 pages
K-Nearest Neighbour Classifiers
No ratings yet
K-Nearest Neighbour Classifiers
18 pages
ML Assignment No. 3: 3.1 Title
No ratings yet
ML Assignment No. 3: 3.1 Title
6 pages
k-Nearest Neighbors Lecture Slides
No ratings yet
k-Nearest Neighbors Lecture Slides
57 pages
MKNN Modified K Nearest Neighbor
No ratings yet
MKNN Modified K Nearest Neighbor
4 pages
ML Lec07 KNN
100% (2)
ML Lec07 KNN
37 pages
Enhanced K-Nearest Neighbor Algorithm: Dalvinder Singh Dhaliwal, Parvinder S. Sandhu, S. N. Panda
No ratings yet
Enhanced K-Nearest Neighbor Algorithm: Dalvinder Singh Dhaliwal, Parvinder S. Sandhu, S. N. Panda
5 pages
Intro to Machine Learning Basics
No ratings yet
Intro to Machine Learning Basics
38 pages
ML BIT Ans
No ratings yet
ML BIT Ans
5 pages
STAT 451: Introduction To Machine Learning Lecture Notes
No ratings yet
STAT 451: Introduction To Machine Learning Lecture Notes
22 pages
STAT 451: Introduction To Machine Learning Lecture Notes
No ratings yet
STAT 451: Introduction To Machine Learning Lecture Notes
22 pages
KNN PDF
No ratings yet
KNN PDF
30 pages
Lib Book Requirement
No ratings yet
Lib Book Requirement
9 pages
Lecture 07 KNN 14112022 034756pm
100% (1)
Lecture 07 KNN 14112022 034756pm
24 pages
Intro to KNN for Data Science
No ratings yet
Intro to KNN for Data Science
37 pages
Dr. S. Vairachilai Department of CSE CVR College of Engineering Mangalpalli Telangana
No ratings yet
Dr. S. Vairachilai Department of CSE CVR College of Engineering Mangalpalli Telangana
18 pages
Machine Learning Algorithms For GeoSpatial Data - Applications and Software Tools
No ratings yet
Machine Learning Algorithms For GeoSpatial Data - Applications and Software Tools
9 pages
ML Assignment No. 3: 3.1 Title
No ratings yet
ML Assignment No. 3: 3.1 Title
6 pages
Coding and Classification
No ratings yet
Coding and Classification
2 pages
k-Nearest Neighbors Lecture Notes
No ratings yet
k-Nearest Neighbors Lecture Notes
23 pages
Part A 3. KNN Classification
No ratings yet
Part A 3. KNN Classification
35 pages
Adaptive Learning-Based K-Nearest Neighbor Classifiers With Resilience To Class Imbalance
No ratings yet
Adaptive Learning-Based K-Nearest Neighbor Classifiers With Resilience To Class Imbalance
17 pages
Data Science and Machine Learning - MCQ
No ratings yet
Data Science and Machine Learning - MCQ
19 pages
Teacher and Taught
No ratings yet
Teacher and Taught
1 page
Bayesian Classifier Implementation Using MATLAB
No ratings yet
Bayesian Classifier Implementation Using MATLAB
21 pages
Instance Based Learning
No ratings yet
Instance Based Learning
20 pages
Strategy For Complete Discriminant Analysis
No ratings yet
Strategy For Complete Discriminant Analysis
87 pages
Velasco Et Al. (2000) - Manual of Crime Analysis Map Production
No ratings yet
Velasco Et Al. (2000) - Manual of Crime Analysis Map Production
36 pages
Offline Chinese Handwriting Recognition PDF
100% (1)
Offline Chinese Handwriting Recognition PDF
32 pages
Intro to k-Nearest Neighbor Algorithm
No ratings yet
Intro to k-Nearest Neighbor Algorithm
3 pages

A Parameter-Free Nearest Neighbor Algorithm With R

Uploaded by

A Parameter-Free Nearest Neighbor Algorithm With R

Uploaded by

Neural Computing and Applications

S.I.: TIMELY ADVANCES OF DEEP LEARNING WITH APPLICATIONS AND DATA

A parameter-free nearest neighbor algorithm with reduced prediction

Received: 23 February 2024 / Accepted: 1 October 2024

1 Introduction samples with minimum Euclidean distance where k is the

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Table 1 Summary of existing studies used for comparison

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Fig. 1 Proposed component

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

TP þ TN performance metrics using datasets of Table 2 because

Table 2 Small- to medium-scale datasets used for experimentation

Tae 151 5 3 (49, 50, 52)

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Table 3 Large-scale datasets

Abalone 4177 8 3 (1307, 1342, 1528)

Table 4 Noisy datasets

Iris_n 150 4 3 (50, 50, 50)

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Table 5 Accuracy comparison

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Table 6 AUC value comparison

Fig. 2 Performance comparison of K-Forest with existing KNN variants

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Tae K = ( 7, 13, 3, 7, K = ( 3, 7, 13, K = (5, 7, 9, 7, 3) K = ( 5, 5, 7, 5, K = (7, 3, 3, 3, 3) K = ( 5, 5, 5, 3. 13.) –

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Tae 2.29039 0.35977 2.2593 2.43956 0.68551 13.31694 42.22578

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Table 9 Prediction time

Fig. 4 Prediction time comparison based on small- to medium-scale datasets

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Abalone 1781.0531 372.94143 1765.42376 1767.32249 233.20295 7271.71004 1247.70338

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Abalone 216.53545 33.11493 255.7155 220.56411 20.71738 119.60065 23.09049

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Fig. 6 Prediction time comparison using large-scale datasets

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

random subspace. So, we compare the proposed KNN

Experiment 4 is performed to show the robustness of the

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Fig. 8 Comparison of proposed KNN variant with random forest

Table 14 presents the obtained results on noisy datasets 6 Conclusion

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Funding No funding information is available.

Data availability Data will be made available on reasonable request.

Conflict of interest We wish to confirm that there are no known

Ethical approval and consent to participate We confirm that the

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

Content courtesy of Springer Nature, terms of use apply. Rights reserved.

You might also like