Machine Learning Algorithms (Classification)

Q4: Explain the K-Nearest Neighbors (K-NN) classification algorithm with the help
of a suitable example. Also, discuss its advantages and disadvantages.

A4: K-Nearest Neighbors (K-NN) Classification Algorithm:

K-Nearest Neighbors (K-NN) is a non-parametric, lazy learning algorithm used for
classification and regression. In K-NN classification, the output is a class
membership. An object is classified by a majority vote of its neighbors, with the
object being assigned to the class most common among its K nearest neighbors (K is
a positive integer, typically small).

How it works:

Training Phase: There is essentially no "training" phase in the traditional sense.
The algorithm simply stores all available training data points and their
corresponding class labels. This is why it's called a "lazy" algorithm; it defers
computation until classification time.

Prediction Phase (for a new, unseen data point; a minimal code sketch of these steps follows the list):

Choose K: Select a positive integer K, which represents the number of nearest
neighbors to consider.

Calculate Distance: Calculate the distance (e.g., Euclidean distance, Manhattan
distance) between the new data point and all training data points.

Find K-Nearest Neighbors: Identify the K training data points that are closest to
the new data point based on the calculated distances.

Vote for Class: For classification, count the number of data points in each class
among these K neighbors.

Assign Class: Assign the new data point to the class that has the majority vote
among the K nearest neighbors. In case of a tie, various tie-breaking rules can be
used (e.g., choose randomly, choose the class of the closest neighbor).
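
A minimal sketch of these steps in plain Python (the function name knn_classify and the data layout are illustrative assumptions, not part of the question):

import math
from collections import Counter

def knn_classify(training_data, new_point, k=3):
    """Classify new_point by a majority vote of its k nearest neighbors.

    training_data is a list of (features, label) pairs, where features is a
    tuple of numbers and label is the class name.
    """
    # Calculate Distance: Euclidean distance from new_point to every training point
    distances = [(math.dist(features, new_point), label)
                 for features, label in training_data]
    # Find K-Nearest Neighbors: keep the k closest training points
    nearest = sorted(distances)[:k]
    # Vote for Class / Assign Class: the majority label among the k neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]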

Suitable Example:

Imagine you have a dataset of fruits, classified as either "Apple" or "Orange,"


based on their "Sweetness" and "Crunchiness" scores (on a scale of 1-10).

Sweetness   Crunchiness   Fruit Type
7           8             Apple
6           7             Apple
3           2             Orange
4           3             Orange
8           6             Apple
2           4             Orange
Now, a new fruit arrives with Sweetness = 5 and Crunchiness = 5. We want to
classify it using K-NN. Let's choose K = 3.

Steps:

Calculate Distance (Euclidean Distance):

Distance = √((x2 − x1)² + (y2 − y1)²)

New Fruit (5, 5) to Apple (7, 8): √((7−5)² + (8−5)²) = √(4 + 9) = √13 ≈ 3.61

New Fruit (5, 5) to Apple (6, 7): √((6−5)² + (7−5)²) = √(1 + 4) = √5 ≈ 2.24

New Fruit (5, 5) to Orange (3, 2): √((3−5)² + (2−5)²) = √(4 + 9) = √13 ≈ 3.61

New Fruit (5, 5) to Orange (4, 3): √((4−5)² + (3−5)²) = √(1 + 4) = √5 ≈ 2.24

New Fruit (5, 5) to Apple (8, 6): √((8−5)² + (6−5)²) = √(9 + 1) = √10 ≈ 3.16

New Fruit (5, 5) to Orange (2, 4): √((2−5)² + (4−5)²) = √(9 + 1) = √10 ≈ 3.16

Find 3-Nearest Neighbors (ordered by distance):

Apple (6, 7) - Distance ≈ 2.24

Orange (4, 3) - Distance ≈ 2.24

Apple (8, 6) - Distance ≈ 3.16 (Orange (2, 4) is also at distance ≈ 3.16; for this
example we break the tie in favor of Apple (8, 6))

Vote for Class:

Among the 3 nearest neighbors: 2 are "Apple" and 1 is "Orange".

Assign Class:

The new fruit is classified as an Apple.
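
To check the worked example, the same distances and vote can be reproduced with a short, self-contained Python snippet (the data and K = 3 come from the table above; the sort-based tie-break at distance √10 happens to match the choice made in the example):

import math
from collections import Counter

fruits = [((7, 8), "Apple"), ((6, 7), "Apple"), ((3, 2), "Orange"),
          ((4, 3), "Orange"), ((8, 6), "Apple"), ((2, 4), "Orange")]
new_fruit = (5, 5)   # Sweetness = 5, Crunchiness = 5

# Euclidean distance from the new fruit to every training point, sorted ascending
distances = sorted((math.dist(point, new_fruit), label) for point, label in fruits)
print([(round(d, 2), label) for d, label in distances[:3]])
# -> [(2.24, 'Apple'), (2.24, 'Orange'), (3.16, 'Apple')]

# Majority vote among the 3 nearest neighbors
votes = Counter(label for _, label in distances[:3])
print(votes.most_common(1))   # -> [('Apple', 2)], so the new fruit is an Apple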

Advantages of K-NN:

Simple and Intuitive: Easy to understand and implement.

No Training Phase: As a lazy learner, there's no explicit training, making it quick
to update the model with new data.

Non-parametric: Makes no assumptions about the underlying data distribution, which
can be useful for complex datasets.

Flexible: Can be used for both classification and regression.

Effective for Small Datasets: Can perform well on small datasets, especially when
the decision boundary is irregular.

Disadvantages of K-NN:

Computationally Expensive at Prediction Time: For large datasets, calculating the
distance to every training data point for each new prediction can be very slow.

Sensitive to the Choice of K: The performance of K-NN is highly dependent on the
value of K. A small K can be noisy, while a large K can blur decision boundaries.

Sensitive to Feature Scaling: Features with larger ranges will have a
disproportionate impact on distance calculations. Feature scaling
(normalization/standardization) is crucial; a short sketch addressing this and the
choice of K follows this list.

Curse of Dimensionality: Performance degrades significantly with a high number of
features (dimensions), as distances become less meaningful in high-dimensional
spaces.

Storage Requirement: Needs to store the entire training dataset.

Imbalanced Data: Can be biased towards the majority class if the dataset is
imbalanced.
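
Two of these issues, sensitivity to the choice of K and to feature scaling, are commonly handled by standardizing the features and selecting K with cross-validation. A minimal sketch, assuming scikit-learn is available (the toy data values are invented purely for illustration):

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Features on very different scales: column 0 is in the tens of thousands,
# column 1 is in the tens, so column 0 would dominate raw distance calculations.
X = np.array([[52000, 25], [61000, 47], [23000, 31], [30000, 62],
              [48000, 29], [27000, 55], [58000, 41], [33000, 60]])
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])

# Standardize each feature, then choose K by cross-validation instead of guessing it.
model = Pipeline([("scale", StandardScaler()),
                  ("knn", KNeighborsClassifier())])
search = GridSearchCV(model, {"knn__n_neighbors": [1, 3]}, cv=2)
search.fit(X, y)
print(search.best_params_)             # e.g. {'knn__n_neighbors': 3}
print(search.predict([[45000, 30]]))   # prediction for a new, unseen point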
