IOP Conference Series: Materials Science and Engineering
IMMAEE 2019

PAPER • OPEN ACCESS

To cite this article: Lishan Wang 2019 IOP Conf. Ser.: Mater. Sci. Eng. 677 052038
doi:10.1088/1757-899X/677/5/052038
Research and Implementation of Machine Learning Classifier
Based on KNN
Lishan Wang
Fujian Normal University, Fuzhou 350108, China
Abstract. The machine learning classifier is an important part of a pattern recognition system and an important research field of machine learning. The main research object of this paper is the K Nearest Neighbor (KNN) classification method: KNN is used to classify data, and the classification results are compared. The work mainly covers the theoretical analysis of the KNN method, the implementation of the algorithm, and the construction of a KNN-based machine learning classifier.
1. Introduction
A machine learning classifier can be defined as follows: the input data contain thousands of records, each record has many attributes, and one special attribute is called the class (such as high, medium, or low credit). The purpose of the machine learning classifier is to analyze the input data, build a model, and use this model to classify future data. Machine learning classifiers are widely used in data classification tasks such as credit card approval, target marketing, medical diagnosis, fault detection, effectiveness analysis, graphics processing, and insurance fraud analysis.
The data used for classification are a set of samples of known categories, each sample containing the same set of attributes. According to their role in classification, attributes can be divided into conditional attributes and a target attribute. Thus, a sample can be expressed in the form (X1, X2, …, Xm, Y), where the Xi are conditional attributes and Y is the target attribute. The purpose of classification is to discover the dependency between X1, X2, …, Xm and Y; this dependency is also called a classification model or machine learning classifier. The machine learning classifier can therefore be regarded as a function whose input is a sample of unknown category and whose output is the category of that sample.
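In code terms, this sample representation and the classifier-as-function view can be sketched as follows (the paper's program is written in C#, but no source is shown, so the type and member names here are illustrative assumptions):

```csharp
// A labeled sample: conditional attributes (X1, ..., Xm) plus a target attribute Y.
// The names are illustrative; the paper does not prescribe a concrete type.
public record Sample(double[] Attributes, string Label);

// A classifier is a function from an unlabeled attribute vector to a category.
public delegate string Classifier(double[] attributes);
```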
2. The KNN concept
KNN stands for k nearest neighbor classification: a new record is identified from the combination of the K historical records closest to it. KNN is a well-known statistical method that has been studied intensively in pattern recognition over the past 40 years. KNN was applied to text categorization in early research and is one of the best-performing methods on the benchmark Reuters corpus, alongside other methods such as LLSF, decision trees, and neural networks.
The idea of KNN is as follows. First, calculate the distance between the new sample and the training samples and find the K nearest neighbors. Then determine the category of the new sample according to the categories to which those neighbors belong: if they all belong to the same category, the new sample also falls into that category; otherwise, each candidate category is scored and the category of the new sample is determined according to certain rules.
Take the K neighbors of an unknown sample X, examine which categories those K neighbors belong to, and classify X into the category that dominates among them. In other words, the neighborhood grows outward from the test sample X, continuously expanding until it contains K training samples, and X is classified into the most frequently occurring category among these K nearest training samples. For example, in the case of K = 6 in Figure 1, the test sample X is classified into the black category according to this decision rule.
Figure 1. K Nearest Neighbor
Nearest neighbor classification is an instance-based lazy learning method: it stores all the training samples and postpones building a classifier until a new sample needs to be classified. This is in stark contrast to decision trees and backpropagation algorithms, which construct a general model before accepting new samples to be classified. Lazy learning is faster than eager learning in training, but slower in classification, because all computation is postponed until classification time.
3. Mathematical model of the KNN algorithm
Prediction with the nearest neighbor method rests on the assumption that neighboring objects have similar prediction values. The basic idea of the nearest neighbor algorithm is to find the k points nearest to the unknown sample in the multidimensional space Rn and to judge the class of the unknown sample from the categories of those k points; these k points are the k nearest neighbors of the unknown sample. The algorithm assumes that all instances correspond to points in n-dimensional space, and the nearest neighbors of an instance are defined according to the standard Euclidean distance. Let the feature vector of an instance x be:
<a1(x), a2(x), …, an(x)>
where ar(x) denotes the r-th attribute value of instance x. The distance between two instances xi and xj is defined as

d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \big(a_r(x_i) - a_r(x_j)\big)^2}
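A direct transcription of this distance into code might look as follows (a minimal C# sketch; the paper's program is written in C#, but its source is not shown):

```csharp
using System;

static class Distance
{
    // Standard Euclidean distance between two instances represented as
    // attribute vectors, following the formula above.
    public static double Euclidean(double[] xi, double[] xj)
    {
        double sum = 0.0;
        for (int r = 0; r < xi.Length; r++)
        {
            double diff = xi[r] - xj[r];
            sum += diff * diff;
        }
        return Math.Sqrt(sum);
    }
}
```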
In nearest neighbor learning, the discrete classification function is f: Rn -> V, where V is the finite set of categories {v1, v2, …, vs}. The value of k is selected according to the number and dispersion of the samples in each class, and different k values can be chosen for different applications.
If the number of sample points around an unknown sample is small, the area covered by its k nearest neighbors will be large, and vice versa. The nearest neighbor algorithm is therefore susceptible to noisy data, and especially to the effects of isolated points in the sample space. The root cause lies in the basic KNN algorithm, in which the k nearest neighbors of the sample to be predicted all carry equal weight. In nature and society, however, an object is usually influenced most by its nearest neighbors: the closer a neighbor is, the greater its influence.
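This observation motivates distance-weighted voting, in which each of the k neighbors contributes a vote that decreases with its distance. The sketch below weights each vote by the inverse of the distance; this is one common choice, and the "weighted" option in Table 1 suggests the program offers something similar, though the paper does not give the exact formula:

```csharp
using System.Collections.Generic;
using System.Linq;

static class WeightedKnn
{
    // Distance-weighted vote over the k nearest neighbors: each neighbor's
    // vote counts 1/d, with a small epsilon guarding against zero distance.
    public static string Vote(IEnumerable<(double Distance, string Label)> neighbors)
    {
        const double epsilon = 1e-9;
        return neighbors
            .GroupBy(n => n.Label)
            .OrderByDescending(g => g.Sum(n => 1.0 / (n.Distance + epsilon)))
            .First()
            .Key;
    }
}
```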
4. KNN research method
The algorithm has no training process; at classification time it predicts the category of a new sample directly from the samples with known categories, so it belongs to the instance-based reasoning methods. If K equals 1, the category of the sample to be classified is simply the category of its nearest neighbor, and the method is called the NN algorithm.
As long as there are enough training samples, the NN algorithm can achieve a good classification effect. As the number of training samples approaches ∞, the classification error of the NN algorithm is at most twice the optimal Bayesian error; furthermore, when K also approaches ∞ (while remaining a vanishing fraction of the sample size), the classification error of the KNN algorithm converges to the optimal Bayesian error. The KNN algorithm is described as follows:
Input: training data set D = {(Xi, Yi), 1 ≤ i ≤ N}, where Xi is the conditional attribute vector of the i-th sample and Yi is its category; a new sample X; a distance function d.
Output: category Y of X.

For i = 1 to N do
    compute the distance d(Xi, X) between X and Xi;
End for
Sort the distances to obtain d(X, Xi1) ≤ d(X, Xi2) ≤ … ≤ d(X, XiN);
Select the first K samples: S = {(Xi1, Yi1), …, (XiK, YiK)};
Count the occurrences of each category in S and return the most frequent one as the category Y of X.
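A complete, runnable transcription of this procedure might look like the following (a minimal C# sketch assuming numeric attributes and the Euclidean distance from section 3; the paper's actual implementation is not shown, and the Sample record from the introduction sketch is restated here for completeness):

```csharp
using System;
using System.Linq;

public record Sample(double[] Attributes, string Label);

public static class Knn
{
    // Classify a new sample x by majority vote among its k nearest
    // neighbors in the training set, following the algorithm above.
    public static string Classify(Sample[] training, double[] x, int k)
    {
        return training
            // Step 1: compute the distance from x to every training sample.
            .Select(s => (Distance: Euclidean(s.Attributes, x), s.Label))
            // Step 2: sort by distance and keep the first k samples.
            .OrderBy(t => t.Distance)
            .Take(k)
            // Step 3: count category occurrences and take the most frequent.
            .GroupBy(t => t.Label)
            .OrderByDescending(g => g.Count())
            .First()
            .Key;
    }

    static double Euclidean(double[] a, double[] b) =>
        Math.Sqrt(a.Zip(b, (p, q) => (p - q) * (p - q)).Sum());
}
```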
5. Program interface design
In the C# integrated development environment, the application interface is created with the form designer, the control toolbox, and the properties window.

The property settings required for each control are as follows. The form contains 4 GroupBox controls, 6 TextBox controls, 2 ListBox controls, 3 Button controls, 8 Label controls, 5 RadioButton controls, and 1 CheckBox control. The GroupBox, TextBox, Label, and ListBox controls keep their default names; the Name attribute values of the remaining controls are shown in Table 1.
Table 1. Name attribute value of each control
Control Name attribute value
determine ok
next next
calculation solve
numerical numeric
type value category
normalization normalization
total summation
Euclid euclidean
weights weighted
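In code terms, these designer settings correspond to assignments of the following shape (an illustrative Windows Forms sketch; the paper does not show its designer-generated code, and the Text captions here are guesses based on the table's left column):

```csharp
using System.Windows.Forms;

// Illustrative excerpt: Name values from Table 1 applied to controls.
var ok = new Button { Name = "ok", Text = "Determine" };
var next = new Button { Name = "next", Text = "Next" };
var solve = new Button { Name = "solve", Text = "Calculation" };
var euclidean = new RadioButton { Name = "euclidean", Text = "Euclid" };
var weighted = new RadioButton { Name = "weighted", Text = "Weights" };
```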
6. Database linkage
This article uses data provided by the UCI machine learning repository to test the program. The letter dataset is used, with 20,000 rows of data, 16 attributes, and 26 classification labels.
Create a database in SQL Server named "datamin_problem", then import the letter dataset downloaded from UCI (in text form) into SQL Server with the database import function. After importing, name the table "problem". The design view of the table is shown in Table 2.
Table 2. Training set data design table
Field name (attribute) Type
col000 varchar
col001 varchar
col002 varchar
col003 varchar
… …
col015 varchar
col016 varchar
col017 varchar
Since the UCI dataset is large, 1000 records are selected from it as the training set for this program, and another 100 records from the letter data are selected for testing.
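Loading the training set from the imported table could then be done along the following lines (a hedged ADO.NET sketch reusing the Sample record from section 4; the connection string, the column-to-attribute mapping, and the varchar-to-numeric parsing are assumptions, since the paper does not show its data-access code):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

static class ProblemTable
{
    // Read training samples from the "problem" table in the
    // "datamin_problem" database (Table 2 layout, all columns varchar).
    // Assumed layout: col001..col016 are attributes, col017 is the label.
    public static List<Sample> LoadTrainingSet(int limit)
    {
        var samples = new List<Sample>();
        const string connStr =
            "Server=localhost;Database=datamin_problem;Integrated Security=true";
        using var conn = new SqlConnection(connStr);
        conn.Open();
        using var cmd = new SqlCommand($"SELECT TOP {limit} * FROM problem", conn);
        using var reader = cmd.ExecuteReader();
        while (reader.Read())
        {
            var attrs = new double[16];
            for (int i = 0; i < 16; i++)
                attrs[i] = double.Parse(reader.GetString(i + 1));
            samples.Add(new Sample(attrs, reader.GetString(17)));
        }
        return samples;
    }
}
```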
7. Program operation and debugging
Press the F5 key to run the program and enter the following values in each input box:
(a) Enter col000 in the "Property Name" input box;
(b) In the "Classification Properties" group box, enter col017 in the "Name" box and A, B, C, …, Z in the "Value" box, then click the "OK" button;
(c) In the "Attribute Data" group box, select the "Value" radio button; enter the values of the new record under "Name" and "New Record", clicking the "Next" button after each attribute name and its data, until a full test record has been entered;
(d) In the "Enter K Value" text box, enter 30;
(e) Select the "Euclidean" radio button as the solution method and click the "Calculate" button.
The result of the operation is shown in Figure 2.
Figure 2. Program running result graph
This result is the probability with which the program judges the data to belong to each classification label A, B, C, …; the label with the highest probability is the category to which the program assigns the data.
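Per-label probabilities of this kind can be obtained by normalizing the neighbor counts (a sketch consistent with the majority-vote classifier in section 4; the paper's output code is not shown):

```csharp
using System.Collections.Generic;
using System.Linq;

static class Probabilities
{
    // Fraction of the k nearest neighbors carrying each label, which the
    // program reports as the probability of each category.
    public static Dictionary<string, double> PerLabel(
        IEnumerable<string> neighborLabels, int k)
    {
        return neighborLabels
            .GroupBy(label => label)
            .ToDictionary(g => g.Key, g => (double)g.Count() / k);
    }
}
```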
100 further records from the letter data set were then used for testing. Comparing the program's output for these 100 test records with the true labels in the letter data set, the prediction differed from the true label in 28 of the tests, giving an accuracy of 72%. The design requirements are therefore basically met, and a KNN machine learning classifier has been realized; the program can be used to classify data of this kind.
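The reported accuracy corresponds to an evaluation loop of the following shape (a sketch reusing the Knn.Classify method from section 4 and assuming the 100 test records are held out from the imported letter data; the paper performs this comparison through the GUI rather than in code):

```csharp
static class Evaluation
{
    // Classify each held-out test sample and report the fraction correct.
    public static double Accuracy(Sample[] training, Sample[] test, int k)
    {
        int correct = 0;
        foreach (var sample in test)
        {
            if (Knn.Classify(training, sample.Attributes, k) == sample.Label)
                correct++;
        }
        return (double)correct / test.Length; // e.g. 72 correct of 100 => 0.72
    }
}
```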
8. Conclusion
This paper implements a KNN machine learning classifier, and the results of the data test show that the basic goal is achieved and a working classification effect is obtained. The KNN classification algorithm has a subjective element, because a distance measure must be defined, and since the understanding of distance is not profound, the result of the classification depends entirely on the distance used. Thus, on the same set of data, two different classification algorithms can produce two completely different classification results, which usually require experts to evaluate whether they are valid. Since the recognition of results is often empirical, this limits the use of the various distances.