KNN Algorithm

The K-Nearest Neighbour (KNN) algorithm is a supervised machine learning method used for classification and regression by determining the majority class of k nearest neighbors. It employs various distance metrics, such as Euclidean and Manhattan distances, to classify new data points based on their proximity to existing data. While KNN is simple and flexible, it can be computationally intensive and sensitive to irrelevant features, making it suitable for applications like image recognition and medical diagnosis.

Uploaded by

script1712
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
41 views11 pages

KNN Algorithm

The K-Nearest Neighbour (KNN) algorithm is a supervised machine learning method used for classification and regression by determining the majority class of k nearest neighbors. It employs various distance metrics, such as Euclidean and Manhattan distances, to classify new data points based on their proximity to existing data. While KNN is simple and flexible, it can be computationally intensive and sensitive to irrelevant features, making it suitable for applications like image recognition and medical diagnosis.

Uploaded by

script1712
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 11

KNN Algorithm

VARSHA M S
2021503057
K-Nearest Neighbour Algorithm
 KNN is a simple, supervised machine learning algorithm.
 It can be used for classification or regression tasks.
 Classifies a data point based on the majority class of its k nearest neighbors.
 It is non-parametric, meaning no assumptions are made about the data distribution.
How KNN Works
1. Select a value of k (number of neighbors).
2. Calculate the distance between the new data point and all points in the dataset.
3. Select the k nearest data points (neighbors).
4. Predict by majority vote of the neighbors' classes (classification) or by averaging their values (regression).
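The four steps above can be sketched as a short Python function; `knn_classify` is an illustrative name, not something from the slides, and points are assumed to be numeric tuples:

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; distances are Euclidean.
    """
    # Step 2: distance from the new point to every point in the dataset
    dists = [(math.dist(p, query), label) for p, label in train]
    # Step 3: keep the k nearest neighbours
    nearest = sorted(dists)[:k]
    # Step 4: majority vote over their labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Step 1 (choosing k) is left to the caller; a later slide discusses how to pick it.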
Distance Metrics
KNN uses distance metrics to determine neighbors:
 Euclidean Distance: Most common for continuous data.

 Manhattan Distance: Sum of the absolute differences along each dimension.

 Minkowski Distance: Generalization of both; its order parameter p gives Manhattan at p = 1 and Euclidean at p = 2.
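The three metrics can be written as plain Python functions (illustrative names, assuming two equal-length numeric vectors):

```python
def euclidean(p, q):
    # Straight-line distance: square root of summed squared differences
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    # Sum of absolute differences along each dimension
    return sum(abs(a - b) for a, b in zip(p, q))

def minkowski(p, q, r):
    # Generalization: r = 1 gives Manhattan, r = 2 gives Euclidean
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)
```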


Choosing the Value of k
 Small k: High variance, more sensitive to noise.
 Large k: Higher bias; smoother, less flexible decision boundary.
 Optimal k is found through cross-validation.
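One minimal way to pick k by validation, sketched here with leave-one-out cross-validation (the function names `loo_accuracy` and `best_k` are illustrative):

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    nearest = sorted((math.dist(p, query), lbl) for p, lbl in train)[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    # Leave-one-out: predict each point from all the others
    hits = sum(
        knn_classify(data[:i] + data[i + 1:], p, k) == lbl
        for i, (p, lbl) in enumerate(data)
    )
    return hits / len(data)

def best_k(data, candidates):
    # Pick the k with the highest held-out accuracy
    return max(candidates, key=lambda k: loo_accuracy(data, k))
```

In practice, a library routine such as scikit-learn's cross-validation utilities would be used instead of this hand-rolled loop.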
Example Problem

A new point (4, 4) is provided, and the goal is to classify this point using the KNN algorithm with k = 3 and Euclidean distance.

x1  x2  Class
1   2   A
2   3   A
3   3   B
6   5   B
7   8   B

Step 1: The Euclidean distance between the new point (4, 4) and all points in the dataset is calculated.

Distance calculations:
 To point (1, 2): √((4−1)² + (4−2)²) = √13 ≈ 3.61
 To point (2, 3): √((4−2)² + (4−3)²) = √5 ≈ 2.24
 To point (3, 3): √((4−3)² + (4−3)²) = √2 ≈ 1.41
 To point (6, 5): √((4−6)² + (4−5)²) = √5 ≈ 2.24
 To point (7, 8): √((4−7)² + (4−8)²) = √25 = 5.00
Example Problem

Step 2: The distances are then sorted from smallest to largest:

Point (x1, x2)  Distance  Class
(3, 3)          1.41      B
(2, 3)          2.24      A
(6, 5)          2.24      B
(1, 2)          3.61      A
(7, 8)          5.00      B
Example Problem

Step 3: The nearest 3 neighbors (since k = 3) are selected:

• (3, 3) → Class B
• (2, 3) → Class A
• (6, 5) → Class B

Step 4: The new point is classified based on the majority class among these 3 neighbors:

• There are 2 points classified as Class B and 1 point as Class A.

Thus, the new point (4, 4) is classified as Class B.
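The worked example can be checked end to end with a short script, reusing the slide's dataset and k = 3:

```python
import math

dataset = [((1, 2), "A"), ((2, 3), "A"), ((3, 3), "B"),
           ((6, 5), "B"), ((7, 8), "B")]
query = (4, 4)

# Steps 1-2: Euclidean distances, rounded and sorted ascending
ranked = sorted((round(math.dist(p, query), 2), lbl) for p, lbl in dataset)
print(ranked)  # [(1.41, 'B'), (2.24, 'A'), (2.24, 'B'), (3.61, 'A'), (5.0, 'B')]

# Steps 3-4: majority vote among the k = 3 nearest neighbors
k = 3
votes = [lbl for _, lbl in ranked[:k]]
prediction = max(set(votes), key=votes.count)
print(prediction)  # B
```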


Pros and Cons of KNN
Pros:
 Simple and easy to implement.
 No training phase (lazy learning).
 Flexible with multi-class classification.
Cons:
 Computationally expensive for large datasets.
 Sensitive to irrelevant features.
 Requires feature scaling (normalization), since distance metrics are dominated by large-valued features.
Applications of KNN
 Image Recognition: KNN is used to classify images.
 Recommendation Systems: Similarity-based recommendation.
 Medical Diagnosis: Classifying patients based on symptoms.
 Anomaly Detection: Identifying outliers in data.
Thank You
