Lab4 KNN

The document outlines a step-by-step guide for implementing a K-Nearest Neighbors (KNN) algorithm using the IRIS dataset. It covers data loading, exploratory data analysis, distance calculation, neighbor finding, voting on labels, and model evaluation. The document includes code snippets and explanations for each part of the process.

Uploaded by

ammarkusow2

Imports

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

Load IRIS dataset

iris = datasets.load_iris()

print(iris)

As you can see, the dataset is in the form of a dictionary. What are the keys of the
dictionary?

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

What is the value of the key data? Assign the value to a variable X

What is the shape of X?

What is the value of the key target? Assign the value to a variable y

What is the shape of y?

What is the value of the key target_names? Assign the value to a variable
target_names

What is the value of the key feature_names? Assign the value to a variable
feature_names

#Solution
X = iris['data']
y = iris['target']
feature_names = iris['feature_names']
target_names = iris['target_names']

# Note: you can also access the elements with the dot (.) access operator, e.g.,
# X = iris.data

print(type(X))
print(type(y))
print(X.shape)
print(y.shape)
print(feature_names)
print(target_names)

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
(150, 4)
(150,)
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
['setosa' 'versicolor' 'virginica']

The figure below illustrates the features and target labels of the iris
dataset.

Print the 5th datapoint in your dataset X

Print the features and target label of flower 1 to 5.

Iterate over all datapoints in X and calculate the area of Sepal and Petal for each
flower in the dataset.
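One possible way to approach the three tasks above is sketched here. It assumes that "5th datapoint" means index 4 (Python counts from zero), that "flower 1 to 5" means indices 0 to 4, and that "area" is approximated as length times width; the names sepal_areas and petal_areas are not part of the lab.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

# 5th datapoint, assuming zero-based indexing (index 4)
print(X[4])

# Features and target label of flowers 1 to 5 (indices 0..4)
for features, label in zip(X[:5], y[:5]):
    print(features, iris.target_names[label])

# Area approximated as length * width (an assumption, not a botanical definition)
sepal_areas = []
petal_areas = []
for row in X:
    sepal_areas.append(row[0] * row[1])  # sepal length * sepal width
    petal_areas.append(row[2] * row[3])  # petal length * petal width

sepal_areas = np.array(sepal_areas)
petal_areas = np.array(petal_areas)
print(sepal_areas[:5])
print(petal_areas[:5])
```

The explicit loop follows the wording of the task; the same result can be obtained without a loop as X[:, 0] * X[:, 1] and X[:, 2] * X[:, 3].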

Exploratory Data Analysis

Box plot of all features

plt.figure()
plt.boxplot(X)
plt.ylabel("[cm]")
plt.xlabel(feature_names)
plt.show()

Scatter plot for each pair of features

Plot the scatter plot for the pair of first and second features

(X[:,0], X[:,1])

Don't forget to label your axes.

hint: use c=y inside the scatter plot to color the points based on the
target labels.

#your code here
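A minimal sketch of this plot, following the hint about c=y, could look like the following. It reloads the data so that it runs on its own; the variable name points is only there to keep a handle on the plotted collection.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target
feature_names = iris.feature_names

plt.figure()
# c=y colors every point by its integer class label (0, 1 or 2)
points = plt.scatter(X[:, 0], X[:, 1], c=y)
plt.xlabel(feature_names[0])
plt.ylabel(feature_names[1])
plt.show()
```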

Write a function called plot_pair that takes a pair of features, their axis labels,
and the target labels y, and draws the scatter plot.

def plot_pair(X1, X2, x1_label, x2_label, y):
    ...

Use the plot_pair function to draw the scatter plot for all pairs of features.

X[:,0], X[:,1], 'Sepal Length', 'Sepal Width'
X[:,0], X[:,2], 'Sepal Length', 'Petal Length'
X[:,0], X[:,3], 'Sepal Length', 'Petal Width'
X[:,1], X[:,2], 'Sepal Width', 'Petal Length'
X[:,1], X[:,3], 'Sepal Width', 'Petal Width'
X[:,2], X[:,3], 'Petal Length', 'Petal Width'

#your code here
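As a rough sketch, plot_pair and the loop over all six pairs might be written as below. Using itertools.combinations to generate the pairs and taking the axis labels from iris.feature_names (instead of the hand-written labels listed above) are choices made for this example only.

```python
from itertools import combinations

import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target
feature_names = iris.feature_names


def plot_pair(X1, X2, x1_label, x2_label, y):
    """Scatter plot of one feature against another, colored by class label."""
    plt.figure()
    plt.scatter(X1, X2, c=y)
    plt.xlabel(x1_label)
    plt.ylabel(x2_label)
    plt.show()


# All unordered pairs of the four feature columns: (0, 1), (0, 2), ..., (2, 3)
pairs = list(combinations(range(X.shape[1]), 2))
for i, j in pairs:
    plot_pair(X[:, i], X[:, j], feature_names[i], feature_names[j], y)
```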

(Optional) The plots shown above do not have a legend. To add a legend to
the plot, you can use the following code snippet.

def plot_pair_with_legend(x1, x2, x1_label, x2_label, y):
    plt.figure()
    # Draw one scatter series per class so that each class gets its own legend entry
    for i, target_name in enumerate(iris.target_names):
        plt.scatter(x1[y == i], x2[y == i], label=target_name)
    plt.xlabel(x1_label)
    plt.ylabel(x2_label)
    plt.legend()
    plt.show()

plot_pair_with_legend(X[:,0], X[:,1], feature_names[0], feature_names[1], y)

Histogram of each feature

Plot the histogram of each feature.

#your code here
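One way to draw the four histograms is sketched below, with one figure per feature. The choice of bins=20 is arbitrary and only an assumption of this example.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
feature_names = iris.feature_names

for j, name in enumerate(feature_names):
    plt.figure()
    # plt.hist returns the bin counts, the bin edges and the drawn patches
    counts, bin_edges, _ = plt.hist(X[:, j], bins=20)
    plt.xlabel(name)
    plt.ylabel("count")
    plt.show()
```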

K Nearest Neighbors (KNN)

Euclidean Distance (2D)

In geometry, the Euclidean distance is the straight-line distance between two points.

Given two points $ P(x_1, y_1) $ and $ Q(x_2, y_2)$ in a 2D plane, the
Euclidean distance between them is calculated as follows:

$ d(P, Q) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} $

Example (2D)

Let's say we have two points:

- $ P(2, 2) $
- $ Q(5, 5) $

$ d(P, Q) = \sqrt{(5 - 2)^2 + (5 - 2)^2} = \sqrt{18} \approx 4.24 $

We can calculate the distance between these two points with NumPy.

P = np.array([2, 2])
Q = np.array([5, 5])
distance = np.sqrt(np.sum((P - Q)**2))
distance

np.float64(4.242640687119285)

Example (3 Dimensions)

Consider two points in 3D space:

- $ P_1(1, 2, 3) $
- $ P_2(4, 0, 8) $

We can calculate the Euclidean distance as follows:

$ d(P_1, P_2) = \sqrt{(4 - 1)^2 + (0 - 2)^2 + (8 - 3)^2} $

$ d(P_1, P_2) = \sqrt{3^2 + (-2)^2 + 5^2} = \sqrt{9 + 4 + 25} = \sqrt{38} \approx 6.16 $

# Define two points in 3D space

P1 = np.array([1, 2, 3])
P2 = np.array([4, 0, 8])

# Calculate the Euclidean distance

distance = np.sqrt(np.sum((P2 - P1)**2))

print(f'The Euclidean distance between P1 and P2 is: {distance:.2f}')

The Euclidean distance between P1 and P2 is: 6.16

Write a function that takes two NumPy arrays P and Q and returns the Euclidean distance
between them.

def straight_line_distance(P, Q):
    ...
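A possible implementation is sketched below. It reuses the expression from the 2D and 3D examples, which works for any number of dimensions as long as P and Q have the same length; the conversion with np.asarray is an addition so that plain lists are accepted too.

```python
import numpy as np


def straight_line_distance(P, Q):
    """Return the Euclidean distance between two points of equal length."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Square the coordinate differences, sum them, take the square root
    return np.sqrt(np.sum((P - Q) ** 2))


d_2d = straight_line_distance(np.array([2, 2]), np.array([5, 5]))
d_3d = straight_line_distance(np.array([1, 2, 3]), np.array([4, 0, 8]))
print(f"{d_2d:.2f}")
print(f"{d_3d:.2f}")
```

The two calls repeat the worked examples above, so the printed values can be checked against $ \sqrt{18} $ and $ \sqrt{38} $.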

KNN Algorithm

KNN from scratch

0 - Look at the data

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

Explain each term in the cell above. X_train, X_test, y_train, y_test?

????
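Printing the shapes of the four arrays can help when answering this question. The sketch below reloads the data so that it runs on its own; with test_size=0.5 the 150 flowers are divided evenly between the training and the test set.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

# Features and labels used to "train" the model (for KNN: simply stored)
print(X_train.shape, y_train.shape)
# Features and labels held back to check the predictions
print(X_test.shape, y_test.shape)
```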

1 - Calculate distances

Take one sample from the test set and find the distance between this sample and all
samples in the training set. In addition to the distance, you need to store the
index of the sample in the training set.

So, for example, if the distance between the test sample and the 5th sample in the
training set is 3.5, you need to store (5, 3.5).

test_instance = X_test[0]

distances = [] # append the (index, distance) tuples to this list

# your code here

Write a function called calculate_distances that takes the test sample and the
training set and returns the distances together with the indices of the training samples.

def calculate_distances(test_instance, X_train):
    # return distances
    ...
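A minimal sketch of such a function is given below, assuming the output format described above: a list of (index, distance) tuples, one per training sample. The toy arrays X_train_toy and test_toy are made up for illustration and are not part of the lab data.

```python
import numpy as np


def calculate_distances(test_instance, X_train):
    """Return a list of (index, distance) tuples, one per training sample."""
    distances = []
    for index, train_instance in enumerate(X_train):
        # Euclidean distance between the test sample and this training sample
        distance = np.sqrt(np.sum((test_instance - train_instance) ** 2))
        distances.append((index, distance))
    return distances


# Hypothetical toy data: three training points in 2D and one test point
X_train_toy = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
test_toy = np.array([0.0, 0.0])

toy_distances = calculate_distances(test_toy, X_train_toy)
print(toy_distances)
```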

What do you pass as input to the function calculate_distances? What do you get as output
when you call this function?

????

What is the shape of the input arrays to the function calculate_distances? What is the
shape of the output?

???

2 - Find neighbors

Step 1: Sort the (index, distance) tuples based on the distance value in ascending order.

distances = calculate_distances(test_instance, X_train)

distances.sort(key=lambda x: x[1])
distances

[(34, np.float64(0.22360679774997896)),
(45, np.float64(0.30000000000000027)),
(28, np.float64(0.5099019513592785)),
(35, np.float64(0.5099019513592788)),
(66, np.float64(0.5196152422706639)),
(47, np.float64(0.5291502622129183)),
(17, np.float64(0.5830951894845297)),
(36, np.float64(0.6164414002968978)),
(65, np.float64(0.6244997998398398)),
(41, np.float64(0.6480740698407859)),
(48, np.float64(0.6999999999999995)),
(70, np.float64(0.7071067811865478)),
(63, np.float64(0.728010988928052)),
(23, np.float64(0.741619848709566)),
(14, np.float64(0.754983443527075)),
(68, np.float64(0.774596669241483)),
(73, np.float64(0.7874007874011811)),
(0, np.float64(0.8124038404635955)),
(50, np.float64(0.8124038404635965)),
(9, np.float64(0.8602325267042631)),
(60, np.float64(0.9273618495495711)),
(18, np.float64(0.9433981132056598)),
(67, np.float64(0.9643650760992956)),
(20, np.float64(0.9746794344808962)),
(5, np.float64(0.9746794344808963)),
(37, np.float64(1.0049875621120894)),
(42, np.float64(1.0440306508910553)),
(2, np.float64(1.0535653752852738)),
(64, np.float64(1.0954451150103324)),
(62, np.float64(1.1045361017187258)),
(8, np.float64(1.1575836902790226)),
(44, np.float64(1.224744871391589)),
(43, np.float64(1.296148139681572)),
(11, np.float64(1.2999999999999998)),
(71, np.float64(1.3490737563232036)),
(38, np.float64(1.3490737563232043)),
(31, np.float64(1.407124727947029)),
(40, np.float64(1.4247806848775015)),
(1, np.float64(1.438749456993816)),
(52, np.float64(1.5556349186104048)),
(56, np.float64(1.6186414056238647)),
(29, np.float64(1.6278820596099706)),
(58, np.float64(1.6431676725154982)),
(16, np.float64(1.7349351572897476)),
(74, np.float64(1.8138357147217057)),
(55, np.float64(1.8165902124584952)),
(24, np.float64(1.8493242008906932)),
(4, np.float64(1.8601075237738276)),
(54, np.float64(1.8973665961010275)),
(32, np.float64(1.9157244060668017)),
(15, np.float64(1.997498435543818)),
(61, np.float64(2.0346989949375804)),
(51, np.float64(2.090454496036687)),
(19, np.float64(2.4020824298928627)),
(69, np.float64(3.2939338184001206)),
(3, np.float64(3.3674916480965473)),
(13, np.float64(3.4161381705077445)),
(39, np.float64(3.551056180912941)),
(49, np.float64(3.5623026261113755)),
(53, np.float64(3.5623026261113755)),
(10, np.float64(3.5735136770411273)),
(12, np.float64(3.5791060336346563)),
(26, np.float64(3.6318039594669758)),
(6, np.float64(3.6537651812890224)),
(59, np.float64(3.6565010597564442)),
(25, np.float64(3.685105154537656)),
(57, np.float64(3.765634076752546)),
(30, np.float64(3.782856063875548)),
(7, np.float64(3.823610858861032)),
(33, np.float64(3.8314488121336034)),
(72, np.float64(3.844476557348217)),
(21, np.float64(3.845776904605882)),
(46, np.float64(3.8961519477556315)),
(27, np.float64(3.9357337308308855)),
(22, np.float64(4.177319714841085))]

Step 2: Select the first k elements of the sorted list and store the
indices of these k elements in a list.

k = 5
distances[:k]

[(34, np.float64(0.22360679774997896)),
(45, np.float64(0.30000000000000027)),
(28, np.float64(0.5099019513592785)),
(35, np.float64(0.5099019513592788)),
(66, np.float64(0.5196152422706639))]

Extract the index of the k nearest neighbors from (index, distance) tuples.

neighbor_index =[]
# your code here

Step 3: Find the labels of these top k samples from y_train array.

neighbor_label = []
#your code here

Now write a function find_neighbors to do all the steps above from 1 to 3.

def find_neighbors(test_instance, X_train, y_train, k):
    """
    Inputs
    test_instance: one data point from the test set
    X_train: training dataset
    y_train: training labels
    k: number of neighbours

    Output
    neighbor_label: list of the labels of the k nearest neighbours
    """
    #your code here
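Steps 1 to 3 can be combined roughly as follows. This is a sketch, not the only possible solution: it computes the distances inline rather than calling calculate_distances so that the block runs on its own, and the toy arrays are hypothetical.

```python
import numpy as np


def find_neighbors(test_instance, X_train, y_train, k):
    """Return the labels of the k training samples closest to test_instance."""
    # Step 1: build (index, distance) tuples and sort them by distance
    distances = [
        (index, np.sqrt(np.sum((test_instance - train_instance) ** 2)))
        for index, train_instance in enumerate(X_train)
    ]
    distances.sort(key=lambda pair: pair[1])

    # Step 2: keep the training-set indices of the k closest samples
    neighbor_index = [index for index, _ in distances[:k]]

    # Step 3: look up the labels of those samples in y_train
    neighbor_label = [y_train[index] for index in neighbor_index]
    return neighbor_label


# Hypothetical toy data: two points of class 0 and two points of class 1
X_train_toy = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train_toy = np.array([0, 0, 1, 1])

toy_labels = find_neighbors(np.array([5.0, 4.0]), X_train_toy, y_train_toy, k=3)
print(toy_labels)
```

In the toy call the two class-1 points are the closest neighbours, followed by one class-0 point, so the returned list contains two labels of class 1 and one of class 0.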

What do you pass as input to the function find_neighbors? What do you get as output when
you call this function?

???

What is the shape of the input arrays to the function find_neighbors? What is the shape of
the output?

???

Explain which operations are performed inside the function find_neighbors to obtain
the labels of the k nearest neighbors.

???

3 - Vote on labels

You have this function to vote on labels of the k nearest neighbors.

def vote_on_labels(neighbor_label):
    prediction_dict = {}
    # Count how many times each label occurs among the neighbors
    for label in neighbor_label:
        if label in prediction_dict:
            prediction_dict[label] += 1
        else:
            prediction_dict[label] = 1
    # The label with the highest count wins
    prediction = max(prediction_dict, key=prediction_dict.get)
    return prediction

y_pred = vote_on_labels(neighbor_label)
y_pred

np.int64(1)

What do you pass as input to the function vote_on_labels? What do you get as output when
you call this function?

????

What is the shape of the input arrays to the function vote_on_labels? What is the shape of
the output?

???
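For comparison, the same majority vote can be written with collections.Counter from the standard library. The helper name vote_with_counter is hypothetical, and the int() conversion assumes integer class labels as in the iris targets. When two labels are tied, most_common returns the one that was encountered first, so the order of the neighbors matters.

```python
from collections import Counter


def vote_with_counter(neighbor_label):
    """Majority vote over a list of integer labels (hypothetical helper)."""
    # Convert NumPy integers to plain ints so the result prints cleanly
    counts = Counter(int(label) for label in neighbor_label)
    label, _ = counts.most_common(1)[0]
    return label


clear_winner = vote_with_counter([1, 1, 2, 1, 0])
tie_case = vote_with_counter([2, 1, 2, 1])  # labels 1 and 2 both occur twice
print(clear_winner, tie_case)
```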

4 - Put it all together

Now iterate over all datapoints of X_test and calculate their label.

y_pred = []
#your code here

Turn your code into a function KNN that takes the training set, the target labels of the
training set, the test set, and the value of k, and returns the predicted labels of
the test set.

def KNN(X_train, y_train, X_test, k):
    ...
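A compact sketch of such a function is shown below. It departs from the step-by-step version in two ways that are choices of this example: the distances to all training samples are computed at once with NumPy broadcasting, and the vote uses collections.Counter. Ties in distance are resolved in favour of the lower training index (stable sort), which is an assumption and not necessarily what other implementations do.

```python
from collections import Counter

import numpy as np


def KNN(X_train, y_train, X_test, k):
    """Predict a label for every row of X_test by majority vote of k neighbors."""
    y_pred = []
    for test_instance in X_test:
        # Distance from this test sample to every training sample
        distances = np.sqrt(np.sum((X_train - test_instance) ** 2, axis=1))
        # Indices of the k smallest distances
        nearest = np.argsort(distances, kind="stable")[:k]
        # Majority vote over the labels of those neighbors
        votes = Counter(y_train[nearest].tolist())
        y_pred.append(votes.most_common(1)[0][0])
    return np.array(y_pred)


# Hypothetical toy data: two points of class 0 and two points of class 1
X_train_toy = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train_toy = np.array([0, 0, 1, 1])
X_test_toy = np.array([[0.2, 0.1], [5.5, 5.0]])

toy_pred = KNN(X_train_toy, y_train_toy, X_test_toy, k=3)
print(toy_pred)
```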

5 - Evaluate the model

Finally, calculate the accuracy of the KNN algorithm.

y_test == y_pred

array([ True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, False, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, False, True, True, True, True, True, True, True,
True, True, True, True, True, True, False, True, True,
True, True, True, True, True, True, True, True, True,
False, True, True])

accuracy = sum(y_test == y_pred) / len(y_test) #takes True as 1 and False as 0

print(f"accuracy: {accuracy * 100} %")

accuracy: 94.66666666666667 %

Turn your code into a function evaluate that takes the true labels and the
predicted labels and returns the accuracy of the model.

def evaluate(y_test, y_pred):
    # your code here
    ...
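One way to write evaluate is sketched below. Taking the mean of the boolean comparison is equivalent to the sum(...) / len(...) expression above, because True counts as 1 and False as 0; the example labels are made up.

```python
import numpy as np


def evaluate(y_test, y_pred):
    """Return the fraction of predictions that match the true labels."""
    y_test = np.asarray(y_test)
    y_pred = np.asarray(y_pred)
    # Element-wise comparison gives booleans; their mean is the accuracy
    return np.mean(y_test == y_pred)


# Hypothetical labels: three of the four predictions are correct
toy_accuracy = evaluate([0, 1, 2, 2], [0, 1, 1, 2])
print(f"accuracy: {toy_accuracy * 100:.2f} %")
```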

KNN in Scikit-Learn

knn_model = KNeighborsClassifier(n_neighbors=4)  # You can change the value of 'k' as needed.
knn_model.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")

Accuracy: 93.33%

(Optional) 6 - Hyperparameter tuning

So far we have used a fixed value of k (k = 5 in the from-scratch steps and
n_neighbors=4 in scikit-learn). Now, we are going to find the best value of k
for the KNN algorithm.

K = [1, 2, 3, 4, 5, 6, 7, 8]
my_accs = []
# your code here

Plot the accuracy of the model for different values of k with
scikit-learn and compare the results with the results from the scratch
implementation.

K = [1, 2, 3, 4, 5, 6, 7, 8]
sklearn_accs = []
#your code here
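The scikit-learn side of the comparison might be sketched as follows. The block reloads and splits the data with the same test_size and random_state as above so that it runs on its own. The accuracies of the from-scratch implementation (my_accs) can be collected with the same kind of loop and drawn as a second line in the same figure.

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)

K = [1, 2, 3, 4, 5, 6, 7, 8]
sklearn_accs = []
for k in K:
    # Fit one classifier per value of k and score it on the test set
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    sklearn_accs.append(accuracy_score(y_test, model.predict(X_test)))

plt.figure()
plt.plot(K, sklearn_accs, marker="o", label="scikit-learn")
plt.xlabel("k")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```

Picking k by test-set accuracy is fine for this exercise, but the chosen value is then tuned to this particular split; a separate validation set or cross-validation would give a less optimistic estimate.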

Can you justify the difference between the results of the two
implementations?
