KNN Algorithm
VARSHA M S
2021503057
K-Nearest Neighbors Algorithm
KNN is a simple, supervised machine learning algorithm.
It can be used for classification or regression tasks.
Classifies a data point based on the majority class of its k nearest neighbors.
It is non-parametric, meaning no assumptions are made about the data distribution.
How KNN Works
1. Select a value of k (number of neighbors).
2. Calculate the distance between the new data point and all points in the dataset.
3. Select the k nearest data points (neighbors).
4. Assign the majority class of those neighbors (classification) or their average value (regression), as sketched below.
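A minimal from-scratch sketch of these steps in Python (the function and variable names are illustrative, assuming the data is held in NumPy arrays):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k):
    # Step 1: k is chosen by the caller.
    # Step 2: Euclidean distance from x_new to every training point.
    dists = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # Step 3: indices of the k nearest points.
    nearest = np.argsort(dists)[:k]
    # Step 4: majority vote among the neighbors' labels.
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Example: knn_classify(np.array([[1, 2], [3, 3]]), ["A", "B"], np.array([4, 4]), k=1) -> "B"
```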
Distance Metrics
KNN uses distance metrics to determine neighbors:
Euclidean Distance: Most common for continuous data.
Manhattan Distance: Sum of absolute coordinate differences; less sensitive to outliers than Euclidean.
Minkowski Distance: Generalization of both (p = 1 gives Manhattan, p = 2 gives Euclidean).
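The three metrics written out as a small NumPy sketch (the sample points are made up for illustration):

```python
import numpy as np

def euclidean(a, b):
    # Straight-line distance: square root of summed squared differences.
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # Grid distance: sum of absolute differences.
    return np.sum(np.abs(a - b))

def minkowski(a, b, p):
    # p = 1 -> Manhattan, p = 2 -> Euclidean.
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

a, b = np.array([4.0, 4.0]), np.array([1.0, 2.0])
print(euclidean(a, b))     # 3.605...
print(manhattan(a, b))     # 5.0
print(minkowski(a, b, 2))  # 3.605...
```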
Choosing the Value of k
Small k: High variance; predictions are more sensitive to noise.
Large k: Higher bias; the decision boundary is smoother but less flexible.
The optimal k is typically found through cross-validation.
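One common way to tune k, sketched with scikit-learn (the iris dataset and the odd-valued search range are illustrative assumptions, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validated accuracy for each candidate k
# (odd values of k help avoid tied votes).
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in range(1, 16, 2)
}
best_k = max(scores, key=scores.get)
print(f"best k = {best_k}, accuracy = {scores[best_k]:.3f}")
```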
Example Problem
A new point (4, 4) is provided, and the goal is to classify this point using the KNN algorithm with k = 3 and Euclidean distance.

Dataset:
x1   x2   Class
1    2    A
2    3    A
3    3    B
6    5    B
7    8    B

Step 1: The Euclidean distance between the new point (4, 4) and all points in the dataset is calculated.

Distance calculations:
To point (1, 2): √((4−1)² + (4−2)²) = √13 ≈ 3.61
To point (2, 3): √((4−2)² + (4−3)²) = √5 ≈ 2.24
To point (3, 3): √((4−3)² + (4−3)²) = √2 ≈ 1.41
To point (6, 5): √((4−6)² + (4−5)²) = √5 ≈ 2.24
To point (7, 8): √((4−7)² + (4−8)²) = √25 = 5.00

Step 2: The distances are then sorted from smallest to largest:

Point (x1, x2)   Distance   Class
(3, 3)           1.41       B
(2, 3)           2.24       A
(6, 5)           2.24       B
(1, 2)           3.61       A
(7, 8)           5.00       B

Step 3: The nearest 3 neighbors (since k = 3) are selected:
• (3, 3) → Class B
• (2, 3) → Class A
• (6, 5) → Class B

Step 4: The new point is classified based on the majority class among these 3 neighbors:
• There are 2 points classified as Class B and 1 point as Class A.
Thus, the new point (4, 4) is classified as Class B.
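The same result can be reproduced with a short Python sketch (reusing the dataset from the table above; variable names are illustrative):

```python
import numpy as np
from collections import Counter

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8]])
y = ["A", "A", "B", "B", "B"]
new = np.array([4, 4])

dists = np.sqrt(np.sum((X - new) ** 2, axis=1))  # Step 1: Euclidean distances
order = np.argsort(dists)                        # Step 2: sort by distance
nearest3 = order[:3]                             # Step 3: k = 3 neighbors
votes = Counter(y[i] for i in nearest3)          # Step 4: majority vote
print(votes.most_common(1)[0][0])                # -> B
```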
Pros and Cons of KNN
Pros:
Simple and easy to implement.
No training phase (lazy learning).
Flexible with multi-class classification.
Cons:
Computationally expensive for large datasets, since every prediction scans all training points.
Sensitive to irrelevant features.
Requires feature scaling, since unscaled features with large ranges dominate the distance.
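A minimal sketch of scaling before KNN with scikit-learn (the pipeline approach and k = 5 are illustrative choices, not prescribed by the slides):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize features to zero mean and unit variance before
# computing distances, so no single feature dominates.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
# model.fit(X_train, y_train); model.predict(X_test)
```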
Applications of KNN
Image Recognition: Classifying images based on similarity to labeled examples.
Recommendation Systems: Recommending items similar to those a user already likes.
Medical Diagnosis: Classifying patients based on symptoms.
Anomaly Detection: Identifying outliers in data.
Thank You