Experiment – 9
AIM: Apply the KNN algorithm for classification and regression
Description :
K-Nearest Neighbors (KNN) is a supervised learning algorithm used for both classification
and regression problems.
• For classification, KNN predicts the label of a new data point by majority vote among
its k nearest neighbors in the training data.
• For regression, KNN predicts the value of a new data point by averaging the target
values of its k nearest neighbors.
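The two prediction rules above can be sketched from scratch in a few lines. This is a minimal illustration, assuming Euclidean distance and unweighted neighbors; the helper names (k_nearest_indices, knn_classify, knn_regress) and the toy data are made up for this example and are not part of the experiment's program.

```python
import numpy as np
from collections import Counter

def k_nearest_indices(X_train, x_new, k):
    """Return indices of the k training points closest to x_new (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    return np.argsort(distances)[:k]

def knn_classify(X_train, y_train, x_new, k=3):
    """Predict a class label by majority vote among the k nearest neighbors."""
    idx = k_nearest_indices(X_train, x_new, k)
    votes = Counter(y_train[idx].tolist())
    return votes.most_common(1)[0][0]

def knn_regress(X_train, y_train, x_new, k=3):
    """Predict a value as the mean of the k nearest neighbors' targets."""
    idx = k_nearest_indices(X_train, x_new, k)
    return float(np.mean(y_train[idx]))

# Toy data (hypothetical): two well-separated clusters in 2-D
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_labels = np.array([0, 0, 0, 1, 1, 1])                 # class labels
y_values = np.array([10.0, 12.0, 11.0, 50.0, 52.0, 48.0])  # numeric targets

query = np.array([1.1, 1.0])
print(knn_classify(X_toy, y_labels, query, k=3))  # 0
print(knn_regress(X_toy, y_values, query, k=3))   # 11.0
```

The only difference between the two functions is the last step: a vote for classification, a mean for regression. The neighbor search is shared.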
This program demonstrates:
• Classification on the Iris dataset using KNeighborsClassifier
• Regression on the California Housing dataset using KNeighborsRegressor
Step-by-Step Execution:
Step 1: Import Required Libraries
import numpy as np
import pandas as pd
from sklearn... (modules)
• Required for data handling, model building, and evaluation.
Step 2: KNN Classification using Iris Dataset
Load the Dataset
iris = load_iris()
X_cls = iris.data
y_cls = iris.target
Split Data
train_test_split(...test_size=0.2)
Train Classifier
knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(...)
Make Predictions and Evaluate
accuracy_score(...), print predicted vs actual
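The value n_neighbors=3 is fixed in this step. One common way to choose it, shown here only as a sketch, is to compare several candidate values with k-fold cross-validation. The candidate list and the 5-fold setting are assumptions for illustration, and the example cross-validates on the full Iris data for brevity; in the experiment's workflow it would be applied to the training split only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_cls, y_cls = iris.data, iris.target

candidate_ks = [1, 3, 5, 7, 9, 11]  # illustrative choices, not from the experiment
mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validated accuracy for this value of k
    scores = cross_val_score(model, X_cls, y_cls, cv=5)
    mean_scores[k] = scores.mean()
    print(f"k={k:2d}  mean CV accuracy={mean_scores[k]:.3f}")

# Pick the k with the highest mean accuracy across folds
best_k = max(mean_scores, key=mean_scores.get)
print("Best k by cross-validation:", best_k)
```

Small k follows the training data closely and is sensitive to noise; larger k smooths the decision boundary. Cross-validation gives a rough way to balance the two.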
Step 3: KNN Regression using California Housing Dataset
Load the Dataset
housing = fetch_california_housing()
X_reg = housing.data
y_reg = housing.target
Split Data
train_test_split(...test_size=0.2)
Train Regressor
knn_regressor = KNeighborsRegressor(n_neighbors=5)
knn_regressor.fit(...)
Predict and Evaluate
mean_squared_error(...), r2_score(...)
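To make the two regression metrics concrete, the sketch below computes them by hand and checks the results against scikit-learn. The y_true and y_pred arrays are made-up values, not outputs of the experiment.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # hypothetical actual values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # hypothetical predictions

# Mean Squared Error: average of the squared prediction errors
mse_manual = np.mean((y_true - y_pred) ** 2)

# R² = 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print("MSE manual :", mse_manual, " sklearn:", mean_squared_error(y_true, y_pred))
print("R2  manual :", r2_manual, " sklearn:", r2_score(y_true, y_pred))
```

An R² of 1 means perfect predictions, 0 means the model does no better than always predicting the mean of the targets, and negative values mean it does worse than that.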
Feature             KNN Classification                             KNN Regression
Output Type         Discrete (Class Labels)                        Continuous (Numerical Values)
Decision Criteria   Majority Voting                                Average of Neighbors
Use Case            Handwritten digit recognition, spam filtering  House price prediction, stock price forecasting
Computational Cost  High (for large datasets)                      High (for large datasets)
Source Code:
# Import required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score
from sklearn.datasets import load_iris, fetch_california_housing
# ---------------------------------------------
# KNN CLASSIFICATION - Using Iris Dataset
# ---------------------------------------------
print("===== KNN Classification: Iris Dataset =====\n")
# Load dataset
iris = load_iris()
X_cls = iris.data
y_cls = iris.target
# Split data (80% training, 20% testing)
X_train_cls, X_test_cls, y_train_cls, y_test_cls = train_test_split(
    X_cls, y_cls, test_size=0.2, random_state=42
)
# Initialize and train the classifier
knn_classifier = KNeighborsClassifier(n_neighbors=3)
knn_classifier.fit(X_train_cls, y_train_cls)
# Predict and evaluate
y_pred_cls = knn_classifier.predict(X_test_cls)
accuracy = accuracy_score(y_test_cls, y_pred_cls)
# Output classification results
print("Predicted Labels:", y_pred_cls)
print("Actual Labels :", y_test_cls)
print(f"Classification Accuracy: {accuracy * 100:.2f}%\n")
# ---------------------------------------------
# KNN REGRESSION - Using California Housing
# ---------------------------------------------
print("===== KNN Regression: California Housing Dataset =====\n")
# Load dataset
housing = fetch_california_housing()
X_reg = housing.data
y_reg = housing.target
# Split data (80% training, 20% testing)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)
# Initialize and train the regressor
knn_regressor = KNeighborsRegressor(n_neighbors=5)
knn_regressor.fit(X_train_reg, y_train_reg)
# Predict and evaluate
y_pred_reg = knn_regressor.predict(X_test_reg)
mse = mean_squared_error(y_test_reg, y_pred_reg)
r2 = r2_score(y_test_reg, y_pred_reg)
# Output regression results
print("Sample Predicted Prices:", y_pred_reg[:5])
print("Sample Actual Prices :", y_test_reg[:5])
print(f"Mean Squared Error : {mse:.2f}")
print(f"R² Score : {r2:.2f}")
Output:
===== KNN Classification: Iris Dataset =====
Predicted Labels: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]
Actual Labels : [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]
Classification Accuracy: 100.00%
===== KNN Regression: California Housing Dataset =====
Sample Predicted Prices: [1.623 1.0822 2.8924 2.2456 1.669 ]
Sample Actual Prices : [0.477 0.458 5.00001 2.186 2.78 ]
Mean Squared Error : 1.12
R² Score : 0.15
Result:
The program successfully demonstrates the implementation of:
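• KNN Classification on the Iris dataset, reaching 100% accuracy on the 30-sample test split.
• KNN Regression on the California Housing dataset, where prediction performance was weak:
a Mean Squared Error of 1.12 and an R² Score of 0.15 mean the model explains only about
15% of the variance in the test targets. KNN is distance-based, so the unscaled features
of this dataset likely contribute to the low score.
This shows that KNN is an easy-to-use machine learning algorithm that can be applied to
both classification and regression tasks, although the regression result here leaves clear
room for improvement.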
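Because KNN ranks neighbors by distance, a feature measured on a large scale can dominate the comparison. The sketch below illustrates this effect on synthetic data (not the California Housing dataset): one small-scale feature drives the target and one large-scale feature is pure noise. Wrapping the regressor in a pipeline with StandardScaler is an assumption about a possible remedy, not something the experiment's program does, and all variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
n = 500
informative = rng.uniform(0, 1, n)     # small-scale feature that drives the target
noise_feat = rng.uniform(0, 1000, n)   # large-scale feature unrelated to the target
X = np.column_stack([informative, noise_feat])
y = 10 * informative + rng.normal(0, 0.1, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same regressor, with and without standardizing the features first
unscaled = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
scaled = make_pipeline(
    StandardScaler(), KNeighborsRegressor(n_neighbors=5)
).fit(X_train, y_train)

r2_unscaled = r2_score(y_test, unscaled.predict(X_test))
r2_scaled = r2_score(y_test, scaled.predict(X_test))
print(f"R² without scaling: {r2_unscaled:.2f}")
print(f"R² with scaling   : {r2_scaled:.2f}")
```

Using a pipeline means the scaler is fitted on the training data only and then reused on the test data, which avoids leaking test-set statistics into the model. Whether scaling helps on a given real dataset, and by how much, has to be checked empirically.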