A Parameter-Free Nearest Neighbor Algorithm With R
https://doi.org/10.1007/s00521-024-10565-9
Abstract
K-nearest neighbor (KNN) is considered among the top machine learning algorithms because of its effectiveness in pattern classification and its simple implementation. However, the usage of KNN is limited by its larger prediction time compared with model-based machine learning algorithms, its sensitivity to outliers in the training dataset, and the need to tune the neighborhood size (k). This research article therefore proposes a new variant of KNN that reduces training and prediction time with improved performance. The prediction time of KNN is reduced by building a binary search tree (BST) with a divide-and-conquer strategy, and prediction performance is improved by ensembling with injected randomness: bootstrap aggregation, random subspace, and random node splitting. The proposed KNN variant is parameter-free and hence not sensitive to the neighborhood-size hyperparameter. Finally, three experiments on 26 selected datasets demonstrate the prediction-time and prediction-power superiority of the proposed KNN over random forest and six selected KNN variants. The results show that the proposed KNN variant gives better prediction results with reduced prediction and training time.
Keywords K-means clustering · Nearest neighbors · Injected randomness · Binary search tree
KNN is also sensitive to the existing outliers in the training dataset, especially in small-scale datasets.

A significant improvement has been made by researchers in the prediction performance of neighborhood-based classification. However, neighborhood-based classifiers have drawbacks limiting their use in pattern classification. The first drawback of neighborhood-based classifiers is the tuning parameter k (neighborhood size). All variants of KNN are sensitive to the neighborhood size, so it must be tuned properly to make neighborhood-based learners effective. Generally, k-fold cross-validation is used to tune the hyperparameters of the classifiers, which is a time-consuming task. The second drawback is the prediction time, which is O(nd + nk) for a brute-force neighborhood search, far more than that of model-based machine learning algorithms. Algorithms such as the KD-tree and ball tree have been developed to reduce the prediction time of neighborhood-based classifiers [15, 16]. However, these algorithms do not simulate the working of the brute-force neighborhood search and hence fail to find the proper set of nearest neighbors, which leads to a loss of generality of the original working of KNN and makes it inconsistent. Therefore, developing a parameter-free variant of KNN with reduced prediction time and without degrading performance is necessary.

This research article proposes a new variant of the traditional KNN algorithm to improve prediction accuracy with reduced prediction and training (hyperparameter tuning) time. We add a training phase that generates a tree-like structure by recursively dividing the d-dimensional search space into two parts using the k-means clustering algorithm until each tree node contains instances of a single class. This tree structure provides two benefits. First, it reduces the prediction time of KNN from O(nd + nk) to O(d·log₂ n), because half of the remaining instances are eliminated with each comparison while searching the tree in the prediction phase. Second, it makes KNN parameter-free, because all the instances of a leaf node are similar and are treated as the nearest neighbors of the query sample, so the neighborhood size no longer has to be chosen explicitly by k-fold cross-validation. However, this slightly degrades the prediction power of KNN. Therefore, we propose an ensemble method based on three types of injected randomness to improve the prediction power of the proposed algorithm: bootstrap aggregation, random subspace of features, and random node splitting (based on a random subset of features), where the proposed BST structure is used as a component learner of the proposed ensemble method. As the proposed tree structure (component learner) contains both stable and unstable leaf nodes, ensembling works well on the unstable nodes and hence helps to improve the prediction power of the proposed classifier.

This research article is organized into four sections after the introduction. Section 2 summarizes the previous research on KNN and the classifiers used for comparison with the proposed variant. Section 3 explains the drawbacks of KNN and the motivation to develop the proposed KNN variant. Section 4 gives the in-depth details of the proposed algorithm. Section 5 compares the proposed KNN variant with the latest KNN variants based on prediction accuracy, parameter tuning time, and prediction time and discusses the results, followed by a conclusion.

2 Related works

The KNN algorithm has a long history, dating back to 1967 when it was first proposed by Cover and Hart [17]. In this section, we briefly review the evolution of this algorithm. The original KNN predicts the class label of a query instance by majority voting over the selected nearest neighbors using the Euclidean distance [17]. This algorithm has some drawbacks, such as being sensitive to outliers in the training dataset and to the size of the neighborhood (the hyperparameter k, the number of nearest neighbors). Its prediction time is also higher than that of model-based machine learning algorithms. Many efforts have been made to improve the performance of the original KNN.

The concept of weighted voting was introduced to improve the effectiveness of the original KNN, and many weighted-voting-based KNN algorithms are available [18–20]. Among them, the distance-weighted k-nearest neighbor (WKNN) rule, proposed by Dudani in 1976 [21] and later revisited by Bicego and Loog [22], is the most effective. Distance metric learning has also been proposed to improve the effectiveness of the traditional KNN [23, 24]; it learns the most suitable distance metric to make KNN robust. However, KNN is still sensitive to outliers in the training dataset and to the size of the neighborhood. Many variants of the original KNN have been developed recently to remove the effect of outliers and make the algorithm less sensitive to the value of the hyperparameter k (neighborhood size); they can be divided into the two categories explained below.

The first category is based on the concept of local mean vectors; LMKNN (local mean k-nearest neighbor), PNN (pseudo-nearest neighbors), LMPNN (local mean pseudo-nearest neighbor), MLMKHNN (multi-local mean k-harmonic distance nearest neighbor), GMDKNN (generalized mean distance-based k-nearest neighbors), and LMRKNN (local mean representation-based k-nearest neighbors) are some of the latest improved variants of this category.
Mitani and Hamamoto [25] proposed LMKNN to reduce the impact of existing outliers in the training dataset. Instead of selecting the k nearest neighbors from the whole dataset, it selects the k nearest neighbors per class from the training dataset and then finds the local mean vector (centroid) of each class based on the selected nearest neighbors. Finally, the class label is assigned to the query sample based on the nearest local mean vector. Zeng et al. [26] proposed another variant called PNN, which finds a pseudo-nearest neighbor using the weighted distance of the selected nearest neighbors per class, and the class label of the query point is assigned based on the nearest pseudo-neighbor. Gou et al. [27] proposed a hybrid of PNN and LMKNN, called LMPNN, which calculates multi-local mean vectors based on pseudo-nearest neighbors and then assigns the class label to the query sample based on the nearest local mean vector. Pan et al. [28] proposed an improved variant of LMPNN called MLMKHNN; it also utilizes multi-local mean vectors like LMPNN, but it uses the k-harmonic mean distance to find the nearest mean vector instead of the Euclidean distance. The k-harmonic mean distance gives more weight to local mean vectors with smaller distances, improving prediction accuracy. Gou et al. [29] proposed a new distance metric called the generalized mean distance to improve the results of MLMKHNN by giving even more weight to local mean vectors with smaller distances; the arithmetic mean distance and harmonic mean distance are special cases of the generalized mean distance. Gou et al. [30] proposed LMRKNN, which uses ridge regression (a linear combination of the points) to find the nearest neighbors per class instead of the Euclidean distance, and the final class label is assigned to the query instance based on the novel weighted voting proposed in their article.

The second category is based on nearest centroid-based neighbors; KNCN (k-nearest centroid neighbors), LMKNCN (local mean vector of k-nearest centroid neighbors), and RCKNCN (representation coefficient-based k-nearest centroid neighbors) are some variants available in the literature based on this concept. Sánchez et al. [31] proposed the first variant of KNN belonging to this category, called KNCN, which finds the neighbors whose centroid is nearest to the query sample instead of the nearest neighbors. This strategy considers the neighbors' spatial locality (how well the nearest neighbors surround the query sample) to increase the prediction power by mitigating the effect of existing outliers. Gou et al. [32] proposed an improved version of KNCN, called LMKNCN, which finds the nearest centroid neighbors per class and then assigns to the query instance the class label whose centroid is nearest. Gou et al. [14] proposed the latest variant of this category, called RCKNCN. This algorithm considers both spatial locality and representation coefficients for predicting the class label of the query instance.

Beyond KNN, many other recent machine learning algorithms have been proposed by researchers to improve prediction performance. Radhika et al. [33] investigated a stochastic Cohen–Grossberg bidirectional associative memory-based neural network based on input-to-state stability theory. Markovian jump parameters are also considered in the investigation of the model to determine the continuous time. Finally, a numerical example is given to show the superiority of the proposed model. Cao et al. [34] developed a new genetic network with temporal delays for the genetic regularization necessary for slow biochemical processes such as gene transcription and translation. The efficiency of the proposed model is demonstrated with numerical examples. Aslam et al. [35] examined a T-S fuzzy-model-based networked control system (NCS) to address the output tracking control problem. A new event-triggering strategy was proposed to reduce the bandwidth utilization in the NCS. Finally, three results were obtained based on the Lyapunov–Krasovskii function to compare the proposed model with other models. Duraipandian [36] applied a self-organizing network in LTE to minimize call termination and improve the quality of voice calls. Paul et al. [37] performed a comparative analysis of stacking ensemble methods with benchmark machine learning classifiers and showed that stacking is better than the benchmark classifiers for disease detection, giving 97.5% average accuracy.

Out of all the existing variants of KNN, we selected six studies against which to compare the performance of our proposed variant; they are summarized in Table 1.

Table 1 Summary of the selected KNN variants and their drawbacks

KNN (Cover and Hart [17]): The k-nearest neighbors (KNN) algorithm finds similar instances in the training dataset based on the Euclidean distance and assigns the label to the query sample by majority voting over the class labels of the selected nearest neighbors. It is a simple machine learning algorithm and is still effective in pattern classification. Drawbacks: it is sensitive to the neighborhood size and to existing outliers in the training dataset; its prediction cost is far more than that of model-based machine learning algorithms; and its hyperparameter k (neighborhood size) must be tuned using k-fold cross-validation.

WKNN (Dudani [21]): The weighted k-nearest neighbors (WKNN) algorithm is an improved variant of the traditional KNN, which assigns a weight to each selected nearest neighbor based on its Euclidean distance from the query sample. Finally, the class label is assigned to the query sample by weighted voting instead of the majority-voting rule. Drawbacks: it is also sensitive to the neighborhood size and to existing outliers in the training dataset; its prediction cost is far more than that of model-based machine learning algorithms; and its hyperparameter k (neighborhood size) must be tuned using k-fold cross-validation.

KNCN (Sánchez et al. [31]): The k-nearest centroid neighbors (KNCN) algorithm finds the nearest neighbors whose centroid is nearest to the query sample. It also looks for spatial locality instead of just the nearest neighbors, improving the KNN classifier's prediction power. The final class label is assigned by the majority-voting rule over the nearest neighbors. Drawbacks: it has a high prediction cost, O(nd); it has one tuning parameter (neighborhood size), which must be tuned using k-fold cross-validation; and its prediction power, although better than KNN and WKNN, can still be improved.

LMKNCN (Gou et al. [32]): The local mean k-nearest centroid neighbors (LMKNCN) algorithm finds the k nearest centroid neighbors of each class available in the training dataset, instead of finding the k nearest centroid neighbors from the whole dataset. Finally, the query sample is assigned the class label whose neighbors' centroid is nearest to it. It further improves the prediction power of the learner by reducing the effect of existing outliers. Drawbacks: robust to outliers but still sensitive to the neighborhood size; it also has a high prediction cost, O(nd); and although its prediction power is better than KNCN, there is still room for improvement.

LMRKNN (Gou et al. [30]): The local mean representation-based k-nearest neighbor (LMRKNN) algorithm finds the k nearest neighbors from each class based on the Euclidean distance and then finds the local mean vectors of the selected nearest neighbors of each class. Finally, it utilizes ridge regression to assign weights to all local mean vectors and assigns the class label to the query sample by weighted voting, considering the sum of the weights of the local mean vectors of all classes available in the training dataset. It reduces the sensitivity to the neighborhood size and to existing outliers in the dataset. Drawbacks: it is robust to outliers in the training data and less sensitive to the neighborhood size, but it has a high prediction cost, O(nd + nk + k³d), and two tuning parameters (neighborhood size and ridge regression coefficient), which makes tuning even more time-consuming.

RCKNCN (Gou et al. [14]): The representation coefficient k-nearest centroid neighbor (RCKNCN) is the latest version of KNN, which considers both spatial locality and the representation coefficients of the nearest neighbors to fully discover the pattern discrimination from the nearest centroid neighbors. First, it finds the nearest centroid neighbors from each class based on the Euclidean distance and then assigns weights to all nearest neighbors of each class based on ridge regression. Finally, the class label is assigned to the query sample by weighted voting over the selected nearest neighbors. It is an improved version of KNN that combines the benefits of both LMKNCN and LMRKNN. Drawbacks: it is also robust to outliers in the training data but still has a high prediction cost, O(nd + k³d), based on the closed form of ridge regression; like LMRKNN, it has two hyperparameters (neighborhood size and regression coefficient), which must be tuned based on k-fold cross-validation; and there is still room for improvement in prediction power.

3 Motivation

As already discussed in the introduction of this research article, the prediction cost of the original KNN is higher than that of model-based machine learning algorithms because it does not have a training phase and does all the computation in the prediction phase. The prediction cost of the original KNN is O(nd + nk), where n is the number of training instances, d is the number of dimensions of the dataset (number of features), and k is the neighborhood size (the hyperparameter of the algorithm). The prediction cost of improved variants of the original KNN is even higher, which makes KNN impractical for large-scale datasets. Algorithms are available in the literature, such as the KD-tree and Ball-tree, to make the prediction of KNN faster, O(d·log₂ n) in the average case and O(nd) in the worst case when the tree is unbalanced, by finding the nearest neighbors in less time. However, the KD-tree and Ball-tree cannot simulate the working of the brute-force method of finding the nearest neighbors (they are unable to find the correct neighbors) and hence degrade the prediction power (accuracy or AUC).
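To make the O(nd + nk) brute-force cost concrete, the following is a minimal R sketch of standard majority-voting KNN; it is not the paper's implementation, and the function name knn_predict and the iris example are illustrative only. One full pass over the n training rows computes the d-dimensional distances, and the k closest labels are then voted on.

```r
# Minimal brute-force KNN with majority voting (illustrative sketch).
# Computing all n distances costs O(nd); selecting and voting over k neighbors adds the nk-type term.
knn_predict <- function(X, y, query, k = 5) {
  d2 <- rowSums(sweep(X, 2, query)^2)   # squared Euclidean distance to every training row
  nn <- order(d2)[seq_len(k)]           # indices of the k nearest neighbors
  names(which.max(table(y[nn])))        # majority vote over their class labels
}

X <- as.matrix(iris[, 1:4]); y <- iris$Species
knn_predict(X[-1, ], y[-1], X[1, ], k = 5)   # predict the class of the held-out first row
```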
It is also important to find an efficient neighborhood size (the value of the hyperparameter k). This is generally done by grid search using k-fold cross-validation, and the time needed to decide the value of the neighborhood size is O((nd + n·k_NN) · k_fold · s), where O(nd + n·k_NN) is the time complexity of finding the k nearest neighbors, k_fold is the total number of folds, and s is the size of the array of candidate values of k_NN; this is a time-consuming task.

Therefore, reducing the prediction time and hyperparameter tuning time of KNN without degrading the prediction power is important to make it competitive with other machine learning algorithms. Hence, a new KNN variant is proposed in this research article to reduce the training time and prediction time while increasing the prediction accuracy. The main contributions of the research article are as follows:

1. This research article proposes a new multi-dimensional binary search tree (BST) based on a divide-and-conquer strategy to make the nearest neighbor search faster. The proposed KNN variant reduces the nearest neighbor search cost from O(nd + nk) to O(d·log₂ n), leading to a reduction in prediction time.
2. The proposed algorithm utilizes a dynamic number of nearest neighbors for predicting the class label of the query samples, which is desirable for better performance of nearest neighbor-based classifiers.
3. The proposed algorithm eliminates the time taken by the standard KNN to decide the optimal neighborhood size because the proposed model is free of the neighborhood-size hyperparameter.
4. The proposed KNN variant is robust to noise, gives better prediction accuracy, and prevents overfitting by utilizing bootstrap aggregation, random subspace, and random node splitting.

The complete details of the working of our proposed variant are described in the methodology.

4 Methodology

This section of the article presents the overall structure of the proposed KNN variant and the steps of the algorithm used to generate it.

4.1 Binary search tree structure

This section explains the proposed binary search tree (BST) structure and the process of generating it based on a divide-and-conquer strategy to reduce the prediction and parameter tuning time. The original KNN has no training phase, and all the computations are done in the prediction phase, which is time-consuming and makes it infeasible for real-time usage. It is also sensitive to the hyperparameter k, which needs to be tuned properly to make it effective. To mitigate all these issues, a training phase is added to KNN in this research article, in which a multi-dimensional binary search tree (BST) is generated based on the divide-and-conquer strategy, as shown in Fig. 1, to reduce the prediction cost and make the method hyperparameter-free. The divide-and-conquer strategy recursively divides the training data into two approximately equal parts for the generation of the BST until the termination condition is met. This research article utilizes the k-means clustering algorithm to divide the training data into approximately two equal parts recursively until all the points of a cluster have the same label. The k-means clustering algorithm can divide the data into two clusters in linear time O(nd) if the number of clusters and the number of iterations are fixed; in our case, the number of clusters is 2 and the number of iterations is 100.

Figure 1 is divided into three parts. Figure 1a shows the overall structure of the generated BST, where R(n) is the root node of the tree containing n training instances (n = 15 in Fig. 1), I_i^(m) is the ith internal node of the tree containing m training instances (m ≈ n/2), and L_j^(l) is the jth leaf node of the tree with l training instances. Figure 1b shows the node structure of the internal nodes, which consists of pointers to the left and right children and a vector C that stores the centroid of the node's cluster of points. Figure 1c shows the node structure of a leaf node, where the pointers to the left and right children are null (∅), a vector C stores the node's centroid, and an extra field stores the node's class label.
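The node layouts of Fig. 1b and 1c can be mirrored directly in R; a minimal sketch follows. The constructor name new_node and its field names are assumptions, not taken from the paper.

```r
# Illustrative node record for the proposed BST.
# Internal node (Fig. 1b): centroid C plus left/right child pointers, no label.
# Leaf node (Fig. 1c): centroid C, a class label, and NULL child pointers.
new_node <- function(centroid, label = NULL, left = NULL, right = NULL) {
  list(centroid = centroid,   # vector C: mean of the node's cluster of points
       label    = label,      # class label, set only for leaf (atomic) nodes
       left     = left,       # left child, NULL for leaves
       right    = right)      # right child, NULL for leaves
}
```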
This article proposes a recursive algorithm to generate the BST shown in Fig. 1a, where the k-means clustering algorithm is used as the node-splitting method (k-modes for categorical attributes and k-prototypes for mixed attributes can be used instead). Suppose we have a dataset X ∈ R^(n×d) with labels y ∈ {c_1, c_2, ..., c_m}^n, where c_i represents the ith class label and there are m classes in total. The main steps to generate the multi-dimensional BST based on a divide-and-conquer strategy are as follows:

1. Initially, the algorithm considers the whole dataset as a single cluster of instances, calculates the centroid of the whole dataset, and stores it in the root node of the BST.
2. In the second step, the proposed algorithm divides the root node into two approximately equal parts based on k-means clustering.
3. In the third step, the algorithm finds the centroids of both sub-clusters and stores them in the child nodes of the BST.
4. In the fourth step, the algorithm checks whether all the points of each generated sub-cluster have the same class label. If a sub-cluster contains only points of one class, the algorithm marks its node as a leaf and stores the class label and centroid in that leaf. All of these points will be considered nearest neighbors of a query point if the centroid of this leaf node is nearest to the query point in the prediction phase.
5. In the fifth step, if a generated sub-cluster contains instances belonging to different classes, the algorithm calls itself recursively for the further division of that node, treating it as the next root. Steps 2 to 5 are repeated until all the instances of every cluster belong to the same class.
6. Finally, the algorithm returns the generated BST. The internal nodes, including the root of the BST, contain only the centroids of the intermediate partitions, which help to find the nearest leaf node in logarithmic time in the prediction phase; the leaf nodes of the tree contain a centroid as well as a class label because all the instances of a leaf node have the same class label.

A cluster of instances is called atomic if all the cluster instances have the same label. Algorithm 1 shows the BST generation process.

Algorithm 1: Recursive procedure to generate BST

Algorithm 1 presents the procedure to generate the BST. The function generate_tree() has three arguments: X is the training dataset containing n instances of d dimensions, y contains the labels of the training instances, and R is the root node of the BST, which is initially null (∅).

The first step calculates the centroid of the cluster of points (instances), shown in line 1 of Algorithm 1, where |X| is the cardinality of the set of points and C is the cluster center. The second step allocates memory for the root node using the new_node() function, shown in line 2 of the algorithm.

Lines 3 to 8 of Algorithm 1 terminate the recursion and generate the leaf nodes of the BST. Line 3 checks whether all cluster instances have the same label. If they do, the algorithm makes both the left and right pointers of the node null (∅), saves the centroid and class label in the node, and simply returns it.

Lines 9 to 13 represent the fourth step of the algorithm, where it saves the cluster's centroid in the node and then divides the cluster of points into two sub-clusters using the k-means clustering algorithm. C1 contains the indices of the instances belonging to cluster 1, and C2 contains the indices of the instances belonging to cluster 2. The function generate_tree() is called for both clusters, and the results are assigned to the left and right pointers of the root node. The last step returns the root node T, shown in line 14.
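The listing of Algorithm 1 itself is not reproduced above, so the following R sketch shows one plausible form of the recursive procedure under the assumptions stated in the text (2-means node splitting with iter.max fixed at 100 and recursion stopping at atomic clusters). It reuses the new_node() helper sketched earlier, and generate_tree is an illustrative name, not the paper's code.

```r
# Hedged sketch of Algorithm 1: recursive BST generation via 2-means splitting.
generate_tree <- function(X, y) {
  centroid <- colMeans(X)                          # centroid of the current cluster
  if (length(unique(y)) == 1 || nrow(X) < 2)       # atomic cluster (or singleton): make a leaf
    return(new_node(centroid, label = as.character(y[1])))
  km  <- kmeans(X, centers = 2, iter.max = 100)    # split into two sub-clusters
  idx <- km$cluster == 1
  node <- new_node(centroid)                       # internal node stores only the centroid
  node$left  <- generate_tree(X[idx,  , drop = FALSE], y[idx])
  node$right <- generate_tree(X[!idx, , drop = FALSE], y[!idx])
  node                                             # return the (sub)tree rooted here
}
# Degenerate splits (e.g., duplicated rows with mixed labels) would need extra
# handling in a real implementation; they are ignored in this sketch.
```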
The proposed BST reduces the prediction time of the original KNN by turning the nearest neighbor search from linear to logarithmic, and it makes KNN parameter-free because all the instances of the reached leaf node are considered nearest neighbors of the query instance. However, it slightly degrades the prediction power (accuracy and AUC) of the original KNN. Therefore, we propose an ensemble method based on injected randomness to improve the prediction power, where the proposed BST is used as a component learner of the proposed ensemble method.

The standard KNN is a stable learner and is not affected by perturbations of the dataset such as bootstrap aggregation. Further, the standard KNN is sensitive to the value of the hyperparameter k. If the value of k is too small, it misses important nearest neighbors that are useful for a correct prediction; conversely, too large a value of k can include superfluous instances, leading to a wrong prediction. This fact is proved by Pan et al. [28] and Gou et al. [29]. Efforts have been made by researchers to reduce the sensitivity of the algorithm to the hyperparameter k [25–29], but they lead to an increase in prediction time, which is already higher than that of other machine learning algorithms. Zhang et al. [3] suggested that the number of nearest neighbors should be variable for each query sample to reduce the sensitivity to the hyperparameter. So, this research article proposes a BST to find a variable number of nearest neighbors in logarithmic time. However, the proposed BST is no longer stable because it contains unstable leaf nodes. In the case of unstable base learners, ensembling methods such as bootstrap aggregation and random subspace work well to increase the prediction performance, as proved by Breiman [38]. Bootstrap aggregation is preferred over boosting in this article because its parallel execution makes it time-efficient. The main advantages of ensembling by injecting randomness are as follows:

1. Component learners developed based on bootstrap aggregation and random subspace sampling are diverse and less likely to overfit to noise or specific patterns in the training data.
2. Random subspace sampling and random node splitting help to generate diverse and strong component learners by utilizing different subsets of features. Diversity is important for increasing the generalization of the model because diverse component learners can capture a wide range of patterns.
3. Random subspace and random node splitting also increase the robustness of the model by training on different subsets of the features. This minimizes the effect of irrelevant and redundant features on the final output.
4. Bootstrap aggregation reduces the variance of the learner by utilizing a majority-voting strategy, leading to an increase in generalizability and a reduction in overfitting.
5. Bootstrap aggregation also makes the model robust to outliers in the training data.

The next section explains the injected randomness, the process of generating the ensemble method by injecting randomness, and why injected randomness works on the proposed BST.

4.2 Randomness injection

We propose an ensemble method based on three types of injected randomness to improve the prediction power of the algorithm. The original KNN is a stable learner, and injected randomness does not work well on stable learners [5]. However, the proposed variant of KNN consists of both stable and unstable leaf nodes, and unstable nodes are sensitive to variation in the dataset. A leaf node that consists of only one training instance is the most unstable node (L5 in Fig. 1a), and the injected randomness works well for such nodes because they are the most sensitive to perturbations in the training data. Leaf nodes with several training instances become more stable (L1 and L4 are the most stable nodes in Fig. 1a), and injected randomness does not work well for stable nodes. But, with an increase in the number of instances in a leaf node, the probability of a correct prediction increases because all the instances of the leaf node belong to the same class, which represents a locality of that class in d-dimensional space, and a query point belonging to that locality must be of that class, except for outliers. The three types of injected randomness used to improve the prediction power of the proposed KNN variant are explained below.

4.2.1 Bootstrap aggregation

Breiman [38] proposed the concept of bootstrap aggregation, which is widely used to develop diverse component learners with high accuracy from unstable base learners such as decision trees [39]. The original KNN is a stable learner, and bootstrap aggregation alone does not work well on stable learners. However, the proposed KNN variant is an unstable variant of the original KNN; hence, bootstrap aggregation can be successfully applied to it.

Bootstrap aggregation creates simulated datasets by selecting n random instances with replacement from the original dataset, as shown in Eq. (1):

$X'^{(i)} = \bigcup_{j=1}^{n} \{\, x_{\mathrm{rand}(1,n)} \,\}$    (1)
X'^(i) represents the ith simulated dataset, developed by selecting random instances n times from the original dataset X; the index j = 1 to n denotes a loop that runs n times and each time adds a randomly selected instance from the original dataset X to the ith simulated dataset X'^(i). Repetition of instances is allowed in the simulated datasets.

4.2.2 Random subspace

Random subspace works by selecting a random subset of features/dimensions, setting the values of the unselected features to zero in the dataset, and thus projecting the dataset onto the selected dimensions. Different subsets of features/dimensions may provide different views of the data, and hence component learners trained on random subspaces are quite diverse. The concept of random subspace was well exploited by Tin Kam Ho [40] to develop a decision forest and has also been applied to the KNN classifier [41].

In this article, each feature is selected with probability 0.5 for every component learner. So, all the component learners are trained with approximately d/2 features/dimensions. The Euclidean distance over the selected dimensions is calculated based on Eq. (2):

$D(p, q) = \sqrt{\sum_{i=1}^{d} (p_i - q_i)^2 \, s_i}$    (2)

Here, D(p, q) represents the Euclidean distance between any two points p, q ∈ X in d-dimensional space, and s_i represents the ith element of the vector S ∈ {0,1}^d. If s_i = 0, then that dimension does not contribute to the distance calculation.

4.2.3 Random node splitting

Random node splitting is proposed to make the component learners even more diverse. Each component learner uses approximately d/2 dimensions after random subspace sampling. At each node split, only 1/3 of the available d/2 dimensions, chosen at random, are used, and a vector R ∈ {0,1}^d is stored at each node of the BST to track the dimensions randomly selected for that node split. So, a new term is added to Eq. (2), giving the distance shown in Eq. (3):

$D(p, q) = \sqrt{\sum_{i=1}^{d} (p_i - q_i)^2 \, s_i \, r_i}$    (3)

In Eq. (3), r_i represents the ith element of the random split vector R ∈ {0,1}^d, of which only 1/3 of the elements are 1. So, the probability of selecting each dimension from the available d/2 dimensions is 0.33, which means the k-means clustering algorithm uses only about 1/6 of the d available dimensions, selected at random, for each node split.
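A compact R sketch of the three randomness sources of Sect. 4.2 follows, covering Eqs. (1) to (3). All names are illustrative, the iris data is only an example, and the probabilities simply mirror the values stated in the text.

```r
# Hedged sketch: bootstrap sample, subspace mask S, split mask R, masked distance.
set.seed(1)
X <- as.matrix(iris[, 1:4]); y <- iris$Species     # example data
n <- nrow(X); d <- ncol(X)

boot_idx <- sample(n, n, replace = TRUE)           # Eq. (1): n draws with replacement
Xb <- X[boot_idx, ]; yb <- y[boot_idx]

S <- rbinom(d, 1, 0.5)                             # subspace mask: keep each feature w.p. 0.5
R <- ifelse(S == 1 & runif(d) < 1/3, 1, 0)         # split mask: roughly 1/3 of the surviving features

# Eq. (3); with r_i = 1 everywhere it reduces to the subspace distance of Eq. (2).
masked_dist <- function(p, q, s, r) sqrt(sum((p - q)^2 * s * r))
masked_dist(X[1, ], X[2, ], S, R)
```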
4.3 Training of the proposed algorithm

The main steps to generate the K-Forest (the proposed KNN variant) are shown below:

1. In the first step, the algorithm generates simulated datasets from the original dataset using bootstrap aggregation.
2. In the second step, the algorithm chooses d/2 random features for each simulated dataset and replaces the unselected d/2 columns with 0s in each simulated dataset to remove their contribution.
3. In the third step, the algorithm generates a BST for each simulated dataset based on the procedure explained in Sect. 4.1.
4. While generating the BST from a simulated dataset, the algorithm selects only 1/3 of the features randomly from the available d/2 features for each node split and keeps track of the selected features.
5. In the prediction phase, only the features selected for each node split are used to find the Euclidean distance of the query point from the node centroid.

Algorithm 2 explains the training steps of the proposed ensemble learner (K-Forest).

Algorithm 2: Algorithm to generate K-Forest
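Since the listing of Algorithm 2 is not reproduced above, a hedged R sketch of the forest construction is given below. It reuses generate_tree() from the earlier sketch, applies the subspace mask by zeroing columns as described in step 2, and omits the per-node 1/3 split mask for brevity; the function and argument names are assumptions.

```r
# Hedged sketch of Algorithm 2: building the K-Forest ensemble.
generate_forest <- function(X, y, ensemble_size = 100) {
  n <- nrow(X); d <- ncol(X)
  lapply(seq_len(ensemble_size), function(i) {
    idx <- sample(n, n, replace = TRUE)            # bootstrap aggregation (Eq. 1)
    S   <- rbinom(d, 1, 0.5)                       # random subspace mask
    Xb  <- X[idx, , drop = FALSE]
    Xb[, S == 0] <- 0                              # zero out unselected features
    list(tree = generate_tree(Xb, y[idx]), S = S)  # component learner plus its mask
  })
}
```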
Algorithm 2 consists of two functions: generate_forest() is used to train the proposed ensemble learner, and generate_tree() is used to train each component learner of the proposed ensemble method.

Line 1 of the function generate_forest() initializes the forest F as an empty set ∅. Lines 2 to 5 train the component learners on the generated simulated datasets by injecting randomness, where t is the ensemble size of the proposed KNN variant. Line 3 generates a simulated dataset X' with labels y'. Line 4 generates a vector S ∈ {0,1}^d, where 1 represents a selected feature and 0 represents an unselected feature. Line 5 calls the generate_tree() method to build a new component learner from the simulated dataset and adds the built tree to the forest F. Finally, line 6 returns the forest.

The process of building the BST was already explained in Algorithm 1; however, the original process is slightly modified for the ensemble learner. A new vector R ∈ {0,1}^d is introduced, which contains the features selected for each node split. The selected features are 1/3 of those in S. This vector is stored at every internal and leaf node of the tree and is used at prediction time while finding the nearest centroid node.

4.4 Prediction

Algorithm 3 explains the prediction process of a component learner. After obtaining the predicted values of all component learners, the final prediction is given by a majority-voting strategy.

Algorithm 3: Prediction process of each component learner

In Algorithm 3, lines 1 and 2 are the termination conditions of the search, which return the class label of the leaf node. Lines 4 to 8 are used to reach the leaf node with the nearest centroid. Line 4 gets the vector R ∈ {0,1}^d stored on the internal node, which has the value 1 for the selected features/dimensions. Line 5 compares the distances of both children from the query point based on the selected dimensions using the vector R; note that only dimensions with the value 1 in R contribute to the distance. Lines 6 to 8 choose the correct branch based on the nearest centroid child node in order to reach a leaf node of the tree.
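The listing of Algorithm 3 is likewise not reproduced above; the following sketch shows the corresponding traversal and the forest-level majority vote, reusing the structures built by the sketches of Algorithms 1 and 2. Per-node split vectors are again omitted, so the subspace mask S plays the role of the stored vector R; all names are illustrative.

```r
# Hedged sketch of Algorithm 3: descend to the nearest-centroid leaf, then vote.
predict_tree <- function(node, query, S) {
  if (is.null(node$left)) return(node$label)               # leaf reached: return its label
  dl <- sum((query - node$left$centroid)^2  * S)           # distance over selected features
  dr <- sum((query - node$right$centroid)^2 * S)
  if (dl <= dr) predict_tree(node$left,  query, S)
  else          predict_tree(node$right, query, S)
}

predict_forest <- function(forest, query) {
  votes <- vapply(forest, function(m) predict_tree(m$tree, query * m$S, m$S), character(1))
  names(which.max(table(votes)))                           # majority vote across learners
}

# Example (illustrative): forest <- generate_forest(X, y, 25); predict_forest(forest, X[1, ])
```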
4.5 Time complexity

The standard KNN has no training phase, so its training time complexity is O(1) because it performs all computations at prediction time; the prediction time of the standard KNN is O(nd + nk). Further, the standard KNN has the hyperparameter k, which should be tuned properly to achieve the best performance, which is an additional overhead. This article adds a training phase to reduce the prediction time of the standard KNN. In the training phase, a BST is generated, where the time complexity of a node split is O(c·n·d) and the algorithm is called recursively for both branches. So, the time complexity of the BST generation can be calculated from the recurrence relation shown in Eq. (4):

$T(n) = 2\,T(n/2) + c \cdot d \cdot n$    (4)

Here, c·d·n is the time complexity of the k-means clustering algorithm used to divide the dataset into two clusters (the node splitting of the BST). The time complexity of the k-means clustering algorithm is O(n·k·d·t), where n is the total number of instances, d is the number of dimensions of the dataset, k is the number of clusters (2 in our case), and t is the number of iterations (fixed to 100 for our algorithm). So, both t and k are constants, represented by c in Eq. (4). K-means clustering splits a node into two approximately equal-sized children of n/2 instances each, and the algorithm is called recursively for both branches, so both a and b in the recurrence relation are 2. Solving the recurrence relation in Eq. (4) gives O((d/6)·n·log₂ n) ≈ O(d·n·log₂ n).

Further, an ensemble method is proposed based on bootstrap aggregation, random subspace sampling, and random node splitting. The time complexity of bootstrap aggregation and random subspace is O(E·n·d). So, the time complexity of the proposed ensemble method is O(E·d·n·log₂ n + E·n·d) for a serial implementation, where E is the ensemble size. However, the time complexity of the proposed ensemble method reduces to O(d·n·log₂ n) in a parallel environment, which is the actual training time complexity of the proposed KNN variant.

In the prediction phase, the generated BSTs of the proposed ensemble method are traversed until a leaf node is encountered. At each internal node, only one branch is selected based on the nearest centroid child, and only d/6 features are used for the selection of the branch. So, the recurrence relation of the search in the proposed BST is shown in Eq. (5):

$T(n) = T(n/2) + c \cdot d$    (5)

So, the nearest neighbor search cost of the proposed BST is O((d/6)·log₂ n) ≈ O(d·log₂ n). Further, the prediction time complexity of the proposed ensemble method is O(E·d·log₂ n) for a serial implementation but becomes O(d·log₂ n) in a parallel environment, which is the actual prediction time of the proposed algorithm.

The space complexity of the proposed KNN variant is O(E·n·d), where O(n·d) is the dataset size and E is the ensemble size.

5 Results and discussion

This section of the article shows the prediction power (accuracy and AUC) and time comparison of the proposed KNN variant with the existing selected KNN variants based on experiments on selected benchmark classification datasets. To perform the experiments, the neighborhood size (hyperparameter) of the existing KNN variants is selected based on fivefold cross-validation. On the other hand, the proposed KNN variant is free of the neighborhood-size hyperparameter, so no hyperparameter tuning is needed for it. All the experiments are performed on a machine with 8 GB RAM and a Core i5 processor. Six of the eight cores are used to parallelize our proposed algorithm.

5.1 Benchmark datasets

This section of the research article presents the details of the benchmark datasets used for experimentation. The datasets used in this article are divided into two categories. Table 2 presents small- to medium-scale datasets, Table 3 presents large-scale datasets, and Table 4 presents the noisy datasets. All the datasets used in this article are openly available in the KEEL repository [42]. All the datasets are divided in a 70–30 ratio after proper shuffling, of which 70% of the data is used to train the model and 30% is used for testing the performance of the algorithms.

Table 2 presents small- to medium-scale datasets. These datasets are used to compare the performance of the proposed KNN variant with existing KNN variants based on accuracy and AUC, because prediction and parameter tuning time reduction at the cost of accuracy is not acceptable. So, all the experiments on these datasets are performed five times, and the average accuracy and AUC are taken.

Table 3 presents large-scale datasets used to show the actual prediction and parameter tuning time difference between the proposed KNN variant and existing KNN variants. As the main focus of the experiments on the Table 3 datasets is to show the time difference, all the experiments on these datasets are performed only once.

Table 4 presents the noisy datasets used to check the robustness of the proposed KNN variant, as suggested by Gou et al. [29], because it is more difficult to get good results on noisy datasets. Five percent attribute noise is inserted into each dataset of Table 4 based on the procedure followed by Zhu et al. [43]. In this scheme, 5% of the dataset samples are selected, and the value of one attribute of each selected sample is replaced by a random number between the minimum and maximum value of that attribute. The name of each noisy dataset in Table 4 is appended with the letter 'n.'

5.2 Evaluation metrics

This research article uses four evaluation metrics (accuracy, AUC, training time, and prediction time) to compare the proposed KNN variant with existing KNN variants. The accuracy of the model can be calculated based on Eq. (6) [9].
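Eq. (6) itself does not survive in the extracted text; accuracy here is the standard fraction of correctly classified test samples. The 5% attribute-noise scheme of Zhu et al. [43], as described above, can be sketched in R as follows; the function names are illustrative.

```r
# Accuracy as the fraction of correct predictions (standard definition).
accuracy <- function(truth, pred) mean(truth == pred)

# 5% attribute noise: for 5% of the rows, replace one randomly chosen attribute
# with a uniform draw between that attribute's minimum and maximum.
add_attribute_noise <- function(X, rate = 0.05) {
  rows <- sample(nrow(X), ceiling(rate * nrow(X)))
  for (i in rows) {
    j <- sample(ncol(X), 1)
    X[i, j] <- runif(1, min(X[, j]), max(X[, j]))
  }
  X
}
```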
The training data is reshuffled for each run. In the case of 12/18 datasets, the proposed KNN variant gives the best results; for 6/18 datasets, LMKNCN gives the best results; and for one dataset, KNCN gives the best results. Comparing the average accuracy over all 18 datasets, the proposed algorithm gives approximately 1% better results than LMKNCN, which is the second-best algorithm in the experiments performed on the selected small- to medium-scale datasets. Further, Table 2 contains both binary- and multi-class classification datasets. For the binary-class classification datasets, the proposed variant gives the best accuracy on 3/8 datasets, and for multi-class classification, the proposed variant gives the best accuracy on 9/10 datasets. This means the proposed KNN variant performs better for multi-class than for binary-class classification.

As already discussed, Table 2 contains binary- and multi-class classification datasets. For binary-class classification datasets, accuracy is not a good performance metric for comparing the algorithms. So, Table 6 compares the proposed variant with existing KNN variants based on the AUC value. In the case of 4/8 datasets, the proposed KNN variant gives better results than the existing KNN variants. However, the proposed variant is second best if we compare the average results, which are slightly less than those of LMKNCN, although the difference is negligible (only 0.00186).

Figure 2 compares the proposed variant with existing KNN variants using box plots. Figure 2a shows the comparison in terms of accuracy, and Fig. 2b shows the comparison in terms of AUC. In Fig. 2a, all box plots contain one outlier, shown as a small circle, except LMRKNN, because the median accuracy of LMRKNN is too low compared to the other selected variants. The median accuracy of the proposed KNN variant is more than 0.9, which is better than the other KNN variants, and its box plot has no skewness. In Fig. 2b, there are no outliers, and the median AUC value of the proposed KNN variant is slightly lower than that of LMKNCN, which gives the best median AUC for the selected binary-class classification datasets. However, we cannot say that LMKNCN is better than the proposed KNN variant, because the box plot of LMKNCN is negatively skewed, whereas the box plot of K-Forest is evenly distributed. So, we compare the mean values of both algorithms on the selected binary-class classification datasets, and the difference between the two is very small (0.00186). Hence, we can say that the proposed algorithm is as good as LMKNCN for binary-class classification.
Table 7 presents the efficient values of the hyperparameters for each run, tuned based on fivefold cross-validation. The proposed KNN variant has no tuning parameter, so the last column of Table 7 is empty.

Table 7 Values of the hyperparameters of all KNN variants tuned based on fivefold cross-validation
Datasets  KNN  WKNN  KNCN  LMKNCN  LMRKNN  RCKNCN  K-Forest

The proposed KNN variant is parameter-free and hence saves the parameter tuning time. However, it has a training phase because it generates the BSTs during training, and both parameter tuning time and training time are part of the model's training process. So, we compare the parameter tuning time of the existing KNN variants with the training time of the proposed KNN variant using the datasets shown in Table 2. Table 8 compares the time difference between the proposed KNN variant and existing KNN variants on small- to medium-scale datasets. The datasets in Table 8 are sorted by size, and the best results are shown in bold. The training time of the proposed KNN variant is better than that of all compared KNN variants except LMRKNN. The theoretical time complexity of the proposed KNN variant is better than that of all compared KNN variants, but practically (based on Table 8) LMRKNN performs better than the proposed KNN variant (K-Forest) for all datasets except Parkinson (the largest dataset of Table 2), which indicates that the prediction time of the proposed KNN variant becomes better than that of the compared variants with an increase in dataset size. We will prove this fact in the second experiment, held on the large-scale datasets shown in Table 3.

Table 8 Training time comparison of the proposed algorithm with existing variants on small- to medium-scale datasets
Datasets  KNN  WKNN  KNCN  LMKNCN  LMRKNN  RCKNCN  K-Forest

Figure 3 presents the pairwise training time comparison of the proposed KNN variant with the existing selected KNN variants. Figure 3a compares the training time of the proposed KNN variant with the original KNN, Fig. 3b with WKNN, Fig. 3c with KNCN, Fig. 3d with LMKNCN, Fig. 3e with LMRKNN, and Fig. 3f with RCKNCN. The Y-axis of all graphs presents the time in seconds, and the X-axis presents the datasets used for comparison. The orange line in Fig. 3 presents the time consumed by the proposed variant, and the blue line presents the time consumed by the existing KNN variants. Datasets on the X-axis are sorted from left to right based on size. Clearly, the gap between the orange and blue lines reduces as we traverse from left to right on the X-axis, and for the last three datasets, the proposed variant takes less time than the existing selected variants. This means the proposed variant becomes better with an increase in dataset size.

Fig. 3 Training time comparison of the proposed KNN variant with the parameter tuning time of the existing selected KNN variants
Table 9 compares the prediction time of the proposed KNN variant with the existing selected KNN variants on small- to medium-scale datasets; the best results are shown in bold. LMRKNN gives prediction results in the minimum time for most of the datasets shown in Table 9. In most cases, the proposed KNN variant takes more time than the other selected KNN variants, but it may be noted that the time difference decreases with an increase in the size of the datasets. The datasets shown in Table 9 are sorted in ascending order of size.

Figure 4 compares the proposed KNN variant with existing KNN variants based on prediction time. Figure 4a compares the prediction time of the proposed KNN variant with the original KNN, Fig. 4b with WKNN, Fig. 4c with KNCN, Fig. 4d with LMKNCN, Fig. 4e with LMRKNN, and Fig. 4f with RCKNCN. The gap between the orange and blue lines decreases with an increase in dataset size, and K-Forest becomes better for the last three datasets in comparison to all selected KNN variants except LMRKNN. However, the gap also reduces with an increase in dataset size for LMRKNN, but the datasets are not big enough to show the prediction time superiority of our proposed KNN variant over LMRKNN.

Conclusion: Based on Experiment 1, it can be concluded that the proposed KNN variant has better prediction power than the existing selected KNN variants. In the case of multi-class classification, the proposed KNN variant performs much better than in binary-class classification. However, the datasets selected for Experiment 1 are not big enough to show the training and prediction time superiority of our proposed KNN variant. So, we perform Experiment 2 on large-scale datasets to show the actual training and prediction time difference.

5.4 Experiment – 2

Our second experiment is performed on the large-scale datasets shown in Table 3. The main focus of the second experiment is to show the reduction in prediction and training time. This experiment is performed only once for all datasets for each selected KNN variant. Table 10 compares the proposed KNN variant with the selected KNN variants based on training time using the large-scale datasets shown in Table 3, and the best results are shown in bold. The datasets are sorted in ascending order of size.

Table 10 Training time comparison of the proposed algorithm with existing variants on large-scale datasets
Datasets  KNN  WKNN  KNCN  LMKNCN  LMRKNN  RCKNCN  K-Forest

Figure 5 compares the proposed KNN variant with existing KNN variants based on training time using the selected large-scale datasets shown in Table 3. Figure 5a compares the training time of the proposed KNN variant with the original KNN, Fig. 5b with WKNN, Fig. 5c with KNCN, Fig. 5d with LMKNCN, Fig. 5e with LMRKNN, and Fig. 5f with RCKNCN. Clearly, the proposed KNN variant outperforms all selected variants regarding training time, and the proposed variant becomes much better with increased dataset size. LMRKNN gives better performance than the proposed KNN variant in some cases, but the results of LMRKNN are collected by tuning only one hyperparameter, the neighborhood size (K), with the other hyperparameter fixed to a constant value (k = 1). If both hyperparameters were tuned based on fivefold cross-validation, then the proposed variant would completely outperform LMRKNN, too. One more thing should be noted here: the Y-axis shows the time in thousands of seconds, and after conversion, the difference will be in hours or days, depending upon the size of the datasets.

Fig. 5 Training time comparison of the proposed KNN variant with selected KNN variants using large-scale datasets

Table 11 compares the proposed KNN variant with the selected KNN variants in terms of prediction time on the selected large-scale datasets shown in Table 3, and the best results are shown in bold. The proposed KNN variant outperforms all selected KNN variants by a large margin except LMRKNN. LMRKNN takes less time than the proposed KNN variant for two of the eight selected large-scale datasets, and for the rest, the proposed KNN variant outperforms LMRKNN too.

Table 11 Prediction time comparison of the proposed algorithm with existing variants on large-scale datasets
Datasets  KNN  WKNN  KNCN  LMKNCN  LMRKNN  RCKNCN  K-Forest

Figure 6 presents the pairwise comparison of the prediction time of the proposed KNN variant with the other selected KNN variants using the selected large-scale datasets.
Figure 6a compares the prediction time of the proposed KNN variant with the original KNN, Fig. 6b with WKNN, Fig. 6c with KNCN, Fig. 6d with LMKNCN, Fig. 6e with LMRKNN, and Fig. 6f with RCKNCN. The Y-axis of all graphs shows the prediction time in thousands of seconds. So, we can say that the proposed KNN variant is far better than the selected KNN variants in terms of prediction time.

The main focus of Experiment 2 is to show the superiority of the proposed KNN variant in terms of prediction and training time. However, we also show the accuracy along with the tuned hyperparameters for a single run of each algorithm on the selected large-scale datasets in Table 12. We cannot identify the best algorithm for the selected datasets based on only one run, but it can be concluded that, on average, the proposed algorithm is slightly better than the existing algorithms.

Figure 7 compares the average prediction accuracy of the proposed KNN variant with the selected KNN variants using bar graphs, which shows that the proposed KNN variant is slightly better than the selected KNN variants.
Table 12 Performance comparison of K-Forest with existing variants on large-scale datasets using an accuracy performance metric (the tuned neighborhood size K, and the second hyperparameter k where applicable, are given below each value)

Datasets    KNN       WKNN      KNCN      LMKNCN    LMRKNN        RCKNCN          K-Forest
Abalone     0.55502   0.53987   0.54785   0.52791   0.47448       0.45375         0.5319
            K = 11    K = 13    K = 13    K = 11    K = 7, k = 1  K = 5, k = 1
Phoneme     0.89766   0.88286   0.89766   0.85943   0.72072       0.82984         0.90012
            K = 3     K = 3     K = 3     K = 3     K = 3, k = 1  K = 13, k = 10
Texture     0.99515   0.98485   0.99515   0.99455   0.85818       0.94303         0.97697
            K = 9     K = 3     K = 9     K = 3     K = 3, k = 1  K = 13, k = 10
Ring        0.87342   0.65405   0.87342   0.7455    0.50631       0.5536          0.93243
            K = 3     K = 3     K = 3     K = 3     K = 3, k = 1  K = 3, k = 10
Optdigits   0.98695   0.98754   0.98695   0.99229   0.87841       0.99424         0.9828
            K = 9     K = 3     K = 9     K = 3     K = 3, k = 1  K = 5, k = 10
Penbased    0.99606   0.99363   0.99515   0.99606   0.83475       0.99545         0.99121
            K = 7     K = 3     K = 9     K = 3     K = 3, k = 1  K = 5, k = 10
Spambase    0.93841   0.92174   0.93986   0.94275   0.75072       0.92319         0.92899
            K = 13    K = 5     K = 5     K = 9     K = 3, k = 1  K = 11, k = 100
Letter      0.94183   0.94667   0.94383   0.96483   0.715         0.88467         0.9535
            K = 9     K = 3     K = 7     K = 3     K = 3, k = 1  K = 13, k = 10
Average     0.89806   0.86390   0.89748   0.87790   0.71732       0.82222         0.89974
Conclusion: The main focus of Experiment 2 is to show the training and prediction time superiority of the proposed KNN variant. So, the comparison is made on eight large-scale datasets, and the experiment proves that the proposed KNN variant is much better than the selected KNN variants, also supported by theoretical proof. However, we also compare the proposed KNN variant based on average prediction accuracy, and the proposed KNN variant gives slightly better performance than the existing selected KNN variants.
Table 14 Performance comparison of K-Forest with selected KNN variants on noisy datasets (mean accuracy ± standard deviation; the tuned neighborhood size is given in parentheses)

Datasets        KNN                     WKNN                    KNCN                    LMKNCN                  LMRKNN                  RCKNCN                  K-Forest
Iris_n          0.88222 ± 0.0373 (7)    0.88667 ± 0.0351 (7)    0.88 ± 0.0301 (8)       0.87111 ± 0.0239 (3)    0.71111 ± 0.0243 (11)   0.88222 ± 0.041 (3)     0.90222 ± 0.0109
Wine_n          0.94074 ± 0.0246 (13)   0.94444 ± 0.0234 (10)   0.9437 ± 0.0207 (5)     0.95 ± 0.0287 (5)       0.64815 ± 0.0642 (3)    0.93148 ± 0.0446 (3)    0.95185 ± 0.0343
Glass_n         0.67231 ± 0.0681 (3)    0.69077 ± 0.0847 (3)    0.66462 ± 0.0626 (3)    0.64308 ± 0.063 (4)     0.54 ± 0.0597 (3)       0.67231 ± 0.0506 (3)    0.66462 ± 0.0369
Heart_n         0.7037 ± 0.0579 (13)    0.69877 ± 0.0566 (13)   0.7321 ± 0.0445 (9)     0.73951 ± 0.0507 (9)    0.50123 ± 0.0418 (3)    0.70988 ± 0.0453 (15)   0.80247 ± 0.0366
Sonar_n         0.82698 ± 0.027 (3)     0.83175 ± 0.0258 (4)    0.82222 ± 0.0361 (3)    0.83968 ± 0.018 (3)     0.60317 ± 0.0224 (3)    0.83492 ± 0.286 (6)     0.84127 ± 0.0362
Wdbc_n          0.94854 ± 0.0101 (5)    0.94854 ± 0.0101 (5)    0.94971 ± 0.0108 (12)   0.9438 ± 0.0076 (8)     0.6386 ± 0.0275 (3)     0.92456 ± 0.0137 (4)    0.95205 ± 0.0167
Pima_n          0.72641 ± 0.0318 (12)   0.72424 ± 0.279 (12)    0.73506 ± 0.0222 (13)   0.73593 ± 0.0211 (14)   0.57792 ± 0.02 (14)     0.71645 ± 0.0348 (13)   0.74632 ± 0.0272
Yeast_n         0.51928 ± 0.0156 (14)   0.52377 ± 0.0186 (14)   0.51614 ± 0.0246 (13)   0.50852 ± 0.0227 (13)   0.31143 ± 0.0148 (3)    0.49193 ± 0.0139 (15)   0.51166 ± 0.0267
Ionosphere_n    0.79057 ± 0.0139 (4)    0.76604 ± 0.0183 (4)    0.88 ± 0.0228 (4)       0.85245 ± 0.0096 (4)    0.61698 ± 0.0613 (9)    0.86226 ± 0.0194 (5)    0.86038 ± 0.034
Page_Blocks_n   0.95371 ± 0.0028 (5)    0.95371 ± 0.0024 (5)    0.95554 ± 0.0023 (5)    0.95128 ± 0.0037 (6)    0.77832 ± 0.0477 (3)    0.94896 ± 0.0025 (15)   0.94654 ± 0.0041
Average         0.79644                 0.79687                 0.80790                 0.80453                 0.59269                 0.79749                 0.81793
Declarations
the Editorial process (including Editorial Manager and direct communications with the office). He/she is responsible for communicating with the other authors about progress, submissions of revisions, and final approval of proofs. We confirm that we have provided a current, correct email address that is accessible by the corresponding author. Signed by the author as follows.

Consent to Publication  We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing, we confirm that we have followed the regulations of our institutions concerning intellectual property.