MACHINE LEARNING LAB MANUAL
Experiment - 1 : Write a Python program to compute the central tendency measures
Mean, Median and Mode, and the measures of dispersion Variance and
Standard Deviation
import pandas as pd
# Build a small DataFrame of names, ages and ratings
data = {
    'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                       'Lee', 'Chanchal', 'Gasper', 'Naviya', 'Andres']),
    'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
    'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80,
                         4.10, 3.65])
}
df = pd.DataFrame(data)
print(df)
# Calculating mean, median, and mode for Age and Rating
age_mean = df['Age'].mean()
age_median = df['Age'].median()
age_mode = df['Age'].mode()  # mode() returns a Series because there can be several modes
rating_mean = df['Rating'].mean()
rating_median = df['Rating'].median()
rating_mode = df['Rating'].mode()
print("Mean Age:", age_mean)
print("Median Age:", age_median)
print("Mode Age:", age_mode.tolist())
print("Mean Rating:", rating_mean)
print("Median Rating:", rating_median)
print("Mode Rating:", rating_mode.tolist())
# Calculating variance and standard deviation
# pandas var() and std() use the sample formula (ddof=1) by default
age_variance = df['Age'].var()
age_standard_deviation = df['Age'].std()
rating_variance = df['Rating'].var()
rating_standard_deviation = df['Rating'].std()
print("Variance...Age:", age_variance)
print("Standard deviation...Age:", age_standard_deviation)
print("Variance...Rating:", rating_variance)
print("Standard deviation...Rating:", rating_standard_deviation)
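The pandas calls above hide the arithmetic behind these measures. This minimal sketch recomputes them for the Age column with plain NumPy. The twelve ages are copied from the table above so the snippet runs on its own. It also shows the difference between the sample formula (divide by n - 1, which pandas .var() and .std() use by default) and the population formula (divide by n).

```python
import numpy as np

# Age values copied from the Experiment 1 table
ages = np.array([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])
n = len(ages)
mean = ages.sum() / n
squared_dev = (ages - mean) ** 2
# Sample variance divides by n - 1 (the pandas default, ddof=1)
sample_var = squared_dev.sum() / (n - 1)
# Population variance divides by n
population_var = squared_dev.sum() / n
sample_std = np.sqrt(sample_var)
print("Mean:", mean)
print("Sample variance:", sample_var)
print("Population variance:", population_var)
print("Sample standard deviation:", sample_std)
```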
Experiment - 2 : Study of basic Python libraries such as Statistics, Math,
NumPy and SciPy.
1. Statistics Library
• The statistics module provides functions for statistical computations like mean,
median, mode, and standard deviation.
• It is built into Python, so no extra installation is required.
• Useful for simple data analysis and numerical summaries.
• Supports working with lists, tuples, and other iterable data structures.
• Ideal for beginners to perform basic statistical calculations.
Example Program:
import statistics
data = [1, 2, 3, 4, 5]
print("Mean:", statistics.mean(data))
print("Median:", statistics.median(data))
# Every value occurs once, so on Python 3.8+ mode() returns the first value
print("Mode:", statistics.mode(data))
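The bullet list above mentions standard deviation, but the example does not compute it. The sketch below uses the module's variance, stdev, pvariance and pstdev functions. The list of numbers is made up and chosen so that it has a clear mode and whole-number population results.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
print("Mode:", statistics.mode(data))
# Sample statistics divide by n - 1
print("Sample variance:", statistics.variance(data))
print("Sample std dev:", statistics.stdev(data))
# Population statistics ("p" prefix) divide by n
print("Population variance:", statistics.pvariance(data))
print("Population std dev:", statistics.pstdev(data))
```

The plain functions match the pandas default from Experiment 1, and the p-prefixed ones match NumPy's default.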
2. Math Library
• The math module provides mathematical functions such as square root, power,
trigonometry, and logarithms.
• It is built-in and does not require installation.
• Contains constants like pi and e.
• Performs these operations efficiently on single numbers (NumPy is used for whole arrays).
• Ideal for engineering, physics, and general numerical computations.
Example Program:
import math
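a = 16
b = 4
# Basic arithmetic uses Python's built-in operators
print("a+b=", a + b)
print("a-b=", a - b)
print("a*b=", a * b)
print("a/b=", a / b)
print("a%b=", a % b)
# The math module supplies functions such as sqrt()
print("Square root:", math.sqrt(a))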
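The bullets above mention constants, trigonometry and logarithms, which the example does not use. The short sketch below exercises those functions. The trigonometric functions in math work in radians, so the 30 degree angle is written as pi / 6.

```python
import math

angle = math.pi / 6  # 30 degrees expressed in radians
print("pi:", math.pi, "e:", math.e)
print("sin(30 deg):", math.sin(angle))
print("Back to degrees:", math.degrees(angle))
print("log10(1000):", math.log10(1000))
print("Natural log of e:", math.log(math.e))
print("2 to the power 10:", math.pow(2, 10))
print("Factorial of 5:", math.factorial(5))
```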
3. NumPy Library
• NumPy (Numerical Python) is used for array manipulations and numerical computations.
• Provides multi-dimensional array objects (ndarray) with fast operations.
• Supports mathematical functions for linear algebra, statistics and more.
• Requires installation using pip install numpy.
• Widely used in scientific computing and machine learning.
Example Program:
import numpy as np
data = np.array([1, 2, 3, 4, 5])
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Number of dimensions:", data.ndim)
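The bullets describe multi-dimensional arrays with fast operations and linear algebra. This sketch uses a small 2 x 3 array to show array shape, element-wise arithmetic without a Python loop, aggregation along an axis, and a matrix product. The values are arbitrary.

```python
import numpy as np

# A 2 x 3 array (two rows, three columns)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print("Shape:", matrix.shape, "Dimensions:", matrix.ndim)
# Element-wise arithmetic applies to every entry at once
print("Doubled:\n", matrix * 2)
# axis=0 aggregates down the columns, axis=1 across the rows
print("Column means:", matrix.mean(axis=0))
print("Row sums:", matrix.sum(axis=1))
# Matrix product with the transpose: (2 x 3) @ (3 x 2) gives 2 x 2
print("matrix @ matrix.T:\n", matrix @ matrix.T)
```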
4. SciPy Library
• SciPy (Scientific Python) builds on NumPy and adds routines for advanced mathematical operations.
• Contains modules for optimization, integration, interpolation, and statistics.
• Useful for scientific and engineering applications.
• Requires installation using pip install scipy.
• Provides specialized modules such as scipy.signal for signal processing and scipy.ndimage for image processing.
Example Program:
from scipy.special import comb
# Number of ways to choose 2 items out of 5 (exact=True gives an integer)
print("5 choose 2:", comb(5, 2, exact=True))
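The bullets list optimization, integration and statistics modules. This minimal sketch calls one function from each and assumes SciPy is installed. The integrand, the function being minimised and the sample data are arbitrary choices for illustration.

```python
import numpy as np
from scipy import integrate, optimize, stats

# Integration: area under sin(x) from 0 to pi
area, error_estimate = integrate.quad(np.sin, 0, np.pi)
print("Integral of sin(x) on [0, pi]:", area)

# Optimization: minimum of (x - 3)^2 + 1
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)
print("Minimum at x =", result.x, "value =", result.fun)

# Statistics: summary of a small sample (variance uses ddof=1)
summary = stats.describe([2, 4, 4, 4, 5, 5, 7, 9])
print("Mean:", summary.mean, "Variance:", summary.variance)
```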
Experiment - 3 : Study of basic Python libraries for ML application such as
Pandas and Matplotlib.
Pandas :
Pandas is a powerful Python library for data manipulation and analysis.
It provides data structures like Series (1D) and DataFrame (2D) for handling
structured data.
Pandas supports data cleaning, filtering, aggregation, and visualization with built-in
functions.
It efficiently handles CSV, Excel, SQL, JSON, and other file formats.
Pandas is widely used in data science, finance, and machine learning for preprocessing
data.
Example Program :
import pandas as pd
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
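The example above only builds a DataFrame. The sketch below shows the filtering and aggregation features described earlier on a made-up sales table; the column names and values are hypothetical. Boolean indexing keeps the rows that satisfy a condition, and groupby() followed by sum() aggregates the rows within each group.

```python
import pandas as pd

# Hypothetical sales records used only for this illustration
sales = pd.DataFrame({
    'city': ['Pune', 'Mumbai', 'Pune', 'Delhi', 'Mumbai'],
    'units': [10, 15, 7, 20, 5],
    'price': [100.0, 80.0, 100.0, 90.0, 80.0],
})
# Derived column computed from two existing columns
sales['revenue'] = sales['units'] * sales['price']
# Filtering: keep only rows where units >= 10
big_orders = sales[sales['units'] >= 10]
print(big_orders)
# Aggregation: total units and revenue per city
summary = sales.groupby('city')[['units', 'revenue']].sum()
print(summary)
```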
Matplotlib :
Matplotlib is a Python library for creating static, animated, and interactive visualizations.
It provides the pyplot module, which offers a MATLAB-like interface for easy plotting.
Matplotlib supports line plots, bar charts, histograms, scatter plots, and more.
It allows customization of axes, labels, legends, colors, and styles for detailed
visualization.
Widely used in data science, machine learning, and engineering for data representation.
Example Program :
# importing libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(500, 4),
                  columns=['a', 'b', 'c', 'd'])
df.plot.scatter(x='a', y='b')
plt.show()
Output :
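The Matplotlib description mentions line plots and customisation of labels, legends, colours and styles, which the scatter example does not cover. The sketch below draws two curves with different colours and line styles, then labels the axes and adds a legend. It selects the non-interactive "Agg" backend and writes to a hypothetical file name, sin_cos.png, so it also runs on a machine without a display. On a desktop you can drop the matplotlib.use line and call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file instead of a window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots(figsize=(6, 4))
# Two line plots with custom colours and styles
ax.plot(x, np.sin(x), color='tab:blue', label='sin(x)')
ax.plot(x, np.cos(x), color='tab:orange', linestyle='--', label='cos(x)')
ax.set_xlabel('x (radians)')
ax.set_ylabel('value')
ax.set_title('Line plots with labels and a legend')
ax.legend()
fig.savefig('sin_cos.png')  # hypothetical output file name
```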
Experiment - 4 : Write a Python program to implement Simple Linear Regression
import numpy as np
import matplotlib.pyplot as plt
# Function to implement simple linear regression
def simple_linear_regression(X, y):
    # Add a column of ones to X for the bias term (intercept)
    X = np.c_[np.ones(X.shape[0]), X]
    # Normal equation: theta = (X^T * X)^-1 * X^T * y
    theta = np.linalg.inv(X.T @ X) @ X.T @ y
    return theta
# Function to predict using the learned model
def predict(X, theta):
    # Add a column of ones to X for the bias term (intercept)
    X = np.c_[np.ones(X.shape[0]), X]
    return X @ theta
# Generating some example data (linear relationship)
np.random.seed(0)
X = 2 * np.random.rand(100, 1)  # 100 data points with 1 feature
y = 4 + 3 * X + np.random.randn(100, 1)  # Linear equation: y = 4 + 3*X + noise
# Applying simple linear regression to find theta (parameters)
theta = simple_linear_regression(X, y)
print(f"Learned Parameters (theta): \n{theta}")
# Predict using the learned model
y_pred = predict(X, theta)
# Plotting the results
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X, y_pred, color='red', label='Regression Line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Simple Linear Regression')
plt.legend()
plt.show()
Output :
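With only one feature, the normal equation reduces to a well-known closed form. The slope is the sum of (x - mean_x)(y - mean_y) divided by the sum of (x - mean_x)^2, and the intercept is mean_y - slope * mean_x. The sketch below draws data in the same way as the program above (seed 0, y = 4 + 3x + noise), so the two numbers should agree with the printed theta. It then cross-checks them against NumPy's degree-1 polynomial fit.

```python
import numpy as np

# Data drawn the same way as in the experiment
np.random.seed(0)
x = 2 * np.random.rand(100)
y = 4 + 3 * x + np.random.randn(100)

# Closed-form least squares for a single feature
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean
print("Intercept:", intercept, "Slope:", slope)

# Cross-check: np.polyfit returns coefficients highest degree first
poly_slope, poly_intercept = np.polyfit(x, y, 1)
print("np.polyfit intercept and slope:", poly_intercept, poly_slope)
```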
Experiment - 5 : Implementation of Multiple Linear Regression for House Price
Prediction using sklearn
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
# Example: Hypothetical dataset (Replace this with actual data)
# Create a pandas DataFrame with house features and prices
data = {
    'Square_Feet': [1500, 1800, 2400, 3000, 3500],
    'Num_Bedrooms': [3, 4, 3, 5, 4],
    'Num_Bathrooms': [2, 3, 2, 3, 3],
    'Age_of_House': [10, 15, 20, 5, 8],
    'Price': [400000, 500000, 600000, 650000, 700000]  # Target variable
}
# Create a pandas DataFrame from the data
df = pd.DataFrame(data)
# Features (X) and target variable (y)
X = df[['Square_Feet', 'Num_Bedrooms', 'Num_Bathrooms', 'Age_of_House']]  # Independent variables
y = df['Price']  # Dependent variable (house price)
# Split the dataset into training and testing sets (80% training, 20% testing)
# With only 5 rows, the test set contains a single house
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
# Initialize the Multiple Linear Regression model
model = LinearRegression()
# Train the model on the training data
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model performance
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"Mean Squared Error: {mse}")
print(f"Root Mean Squared Error: {rmse}")
print(f"Model Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")
# Visualize the true vs predicted prices (optional; works well for smaller datasets)
plt.scatter(y_test, y_pred)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red',
         linestyle='--')
plt.xlabel('True Prices')
plt.ylabel('Predicted Prices')
plt.title('True vs Predicted House Prices')
plt.show()
Output :
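The printed coefficients and intercept define the whole model. A multiple linear regression prediction is intercept + coef_1 * x_1 + ... + coef_n * x_n. The sketch below fits the same hypothetical house data on all five rows and predicts the price of a made-up house (2000 sq ft, 3 bedrooms, 2 bathrooms, 12 years old) both by hand and with model.predict().

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same hypothetical house data as the experiment, fitted on all five rows
df = pd.DataFrame({
    'Square_Feet': [1500, 1800, 2400, 3000, 3500],
    'Num_Bedrooms': [3, 4, 3, 5, 4],
    'Num_Bathrooms': [2, 3, 2, 3, 3],
    'Age_of_House': [10, 15, 20, 5, 8],
    'Price': [400000, 500000, 600000, 650000, 700000],
})
X = df[['Square_Feet', 'Num_Bedrooms', 'Num_Bathrooms', 'Age_of_House']]
y = df['Price']
model = LinearRegression().fit(X, y)

# Hypothetical new house with the same four features
new_house = pd.DataFrame([[2000, 3, 2, 12]], columns=X.columns)
# Prediction = intercept + dot product of features and coefficients
manual = model.intercept_ + (new_house.values @ model.coef_)[0]
print("Manual prediction:", manual)
print("model.predict():", model.predict(new_house)[0])
```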
Experiment - 6 : Implementation of Decision Tree using sklearn and its parameter tuning
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, accuracy_score
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
# 1. Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# 2. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)
# 3. Create a Decision Tree classifier
dt = DecisionTreeClassifier(random_state=42)
# 4. Define parameter grid for tuning
param_grid = {
    'criterion': ['gini', 'entropy'],  # or 'log_loss' for newer versions
    'max_depth': [None, 2, 3, 4, 5],
    'min_samples_split': [2, 3, 4],
    'min_samples_leaf': [1, 2, 3]
}
# 5. Use GridSearchCV to find the best parameters
grid_search = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5,
                           scoring='accuracy')
grid_search.fit(X_train, y_train)
# 6. Print the best parameters
print("Best Parameters:", grid_search.best_params_)
# 7. Evaluate the best model
best_dt = grid_search.best_estimator_
y_pred = best_dt.predict(X_test)
# 8. Print evaluation metrics
print("\nAccuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
# 9. Visualize the decision tree
plt.figure(figsize=(12, 8))
plot_tree(best_dt,
          feature_names=iris.feature_names,
          class_names=iris.target_names,
          filled=True,
          rounded=True)
plt.title("Decision Tree Visualization (Best Estimator)")
plt.show()
Output
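The 'criterion' parameter tuned above chooses how the tree scores a split. Gini impurity is 1 - sum(p_k^2), and entropy is -sum(p_k * log2(p_k)), where p_k is the fraction of samples in class k. This sketch computes both at the root of an Iris tree (50 samples of each of the 3 classes). It then scores one candidate split that separates class 0 from classes 1 and 2. The helpers gini() and entropy() are written here for illustration; they are not scikit-learn functions.

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum over classes of p_k^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy in bits: -sum over classes of p_k * log2(p_k)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Iris has 50 samples of each of its 3 classes
root = np.repeat([0, 1, 2], 50)
print("Root Gini:", gini(root))        # 1 - 3 * (1/3)^2
print("Root entropy:", entropy(root))  # log2(3)

# Candidate split: class 0 on the left, classes 1 and 2 on the right
left, right = root[root == 0], root[root != 0]
weighted_gini = (len(left) * gini(left) + len(right) * gini(right)) / len(root)
print("Weighted Gini after split:", weighted_gini)
print("Gini decrease:", gini(root) - weighted_gini)
```

The tree prefers the split with the largest impurity decrease. Here the pure left child contributes zero impurity.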
Experiment - 7 : Implementation of KNN using sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
# 1. Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# 2. Split into training and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)
# 3. Create and train the KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)  # Using k=3
knn.fit(X_train, y_train)
# 4. Predict the test set results
y_pred = knn.predict(X_test)
# 5. Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
Output
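KNeighborsClassifier hides a simple idea: measure the distance from a test point to every training point, take the k closest, and let them vote. The sketch below writes that out by hand with Euclidean distance and k=3 on the same train/test split. The function knn_predict is a hypothetical helper for illustration. Its tie-breaking may differ from scikit-learn's, so individual predictions can occasionally differ.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def knn_predict(X_train, y_train, X_query, k=3):
    predictions = []
    for x in X_query:
        # Euclidean distance from the query point to every training point
        distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
        # Indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        # Majority vote among their labels
        vote = Counter(y_train[nearest]).most_common(1)[0][0]
        predictions.append(vote)
    return np.array(predictions)

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)
y_pred = knn_predict(X_train, y_train, X_test, k=3)
accuracy = np.mean(y_pred == y_test)
print("From-scratch KNN accuracy:", accuracy)
```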
Experiment - 8 : Implementation of Logistic Regression using sklearn
# Step 1: Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
# Step 2: Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# For binary classification, use only two classes
X = X[y != 2]
y = y[y != 2]
# Step 3: Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
# Step 4: Create logistic regression model
model = LogisticRegression()
# Step 5: Train the model
model.fit(X_train, y_train)
# Step 6: Predict on test set
y_pred = model.predict(X_test)
# Step 7: Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
Output :
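Logistic regression computes a linear score z = w . x + b. It then passes that score through the sigmoid function 1 / (1 + e^(-z)) to get the probability of class 1. The sketch below refits the same two-class Iris model and rebuilds that probability from model.coef_ and model.intercept_. It then checks the result against predict_proba() and predict(). The helper sigmoid() is written here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def sigmoid(z):
    # Logistic function: maps any real score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

iris = load_iris()
X, y = iris.data[iris.target != 2], iris.target[iris.target != 2]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
model = LogisticRegression().fit(X_train, y_train)

# Linear score z = w . x + b for each test sample
z = X_test @ model.coef_.ravel() + model.intercept_[0]
manual_proba = sigmoid(z)
sklearn_proba = model.predict_proba(X_test)[:, 1]
print("Largest probability difference:", np.max(np.abs(manual_proba - sklearn_proba)))
# Class 1 is predicted when z > 0, i.e. when the probability exceeds 0.5
manual_pred = (z > 0).astype(int)
print("Predictions agree:", np.array_equal(manual_pred, model.predict(X_test)))
```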
Experiment - 9 : Implementation of K-Means Clustering
# Step 1: Import necessary libraries
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
# Step 2: Create synthetic dataset
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.60,
                       random_state=0)
# Step 3: Apply KMeans
kmeans = KMeans(n_clusters=3, random_state=0)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
# Step 4: Plot the results
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, marker='X')
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
Output :
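KMeans repeats two steps. First it assigns every point to its nearest centre, then it moves each centre to the mean of its assigned points (Lloyd's algorithm). The sketch below writes this out with NumPy on the same make_blobs data and records the inertia (sum of squared distances to the assigned centres), which never increases from one iteration to the next. The function kmeans_lloyd is a hypothetical helper for illustration. It uses plain random initialisation, whereas scikit-learn's KMeans defaults to the k-means++ initialisation, so results may differ.

```python
import numpy as np
from sklearn.datasets import make_blobs

def kmeans_lloyd(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    inertia_history = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Inertia: sum of squared distances to the assigned centres
        inertia_history.append(np.sum(dists[np.arange(len(X)), labels] ** 2))
        # Update step: move each centre to the mean of its points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, labels, inertia_history

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)
centers, labels, history = kmeans_lloyd(X, k=3)
print("Centres:\n", centers)
print("Inertia per iteration:", np.round(history, 2))
```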
Experiment - 10 : Performance analysis of Classification Algorithms on a specific dataset
(Mini Project)
Note : Download the “Credit Card Fraud Detection” dataset from the Kaggle website
using this link : https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud and place creditcard.csv in the same folder as the program.
Program :
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score, precision_recall_curve, auc)
# Load the dataset
data = pd.read_csv('creditcard.csv')
# Separate features and target
X = data.drop('Class', axis=1)
y = data['Class']
# Scale the 'Amount' and 'Time' features
scaler = StandardScaler()
X['Amount'] = scaler.fit_transform(X['Amount'].values.reshape(-1, 1)).ravel()
X['Time'] = scaler.fit_transform(X['Time'].values.reshape(-1, 1)).ravel()
# Split the data (stratify keeps the fraud ratio equal in both sets)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42, stratify=y)
# Initialize classifiers
lr_model = LogisticRegression(max_iter=1000, class_weight='balanced',
                              random_state=42)
rf_model = RandomForestClassifier(n_estimators=100,
                                  class_weight='balanced', random_state=42)
# Train classifiers
lr_model.fit(X_train, y_train)
rf_model.fit(X_train, y_train)
# Predict and evaluate
models = {'Logistic Regression': lr_model, 'Random Forest': rf_model}
for name, model in models.items():
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]
    print(f"\n{name} Evaluation")
    print("Confusion Matrix:")
    print(confusion_matrix(y_test, y_pred))
    print("Classification Report:")
    print(classification_report(y_test, y_pred))
    print("ROC AUC Score:", roc_auc_score(y_test, y_score))
    precision, recall, _ = precision_recall_curve(y_test, y_score)
    pr_auc = auc(recall, precision)
    print("Precision-Recall AUC:", pr_auc)
# Note: Add model saving, further tuning, or use of other classifiers like
# XGBoost if required.
Output :
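The program reports ROC AUC and precision-recall AUC instead of plain accuracy because fraud cases are a tiny fraction of all transactions. The sketch below illustrates why without the Kaggle file. As a stand-in, it uses a synthetic, heavily imbalanced dataset from make_classification (about 1% positives before label noise); this data is an assumption, not the real dataset. A model that always predicts "not fraud" already scores very high accuracy, while the ranking metrics show how well real positives are found. average_precision_score is printed alongside auc(recall, precision) as a commonly used, non-interpolated alternative summary of the same curve.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, auc, average_precision_score,
                             precision_recall_curve, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fraud data: roughly 1% positive class
X, y = make_classification(n_samples=20000, n_features=10, weights=[0.99],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Baseline that always predicts class 0 ("not fraud")
baseline_acc = accuracy_score(y_test, np.zeros_like(y_test))
print("Always-0 accuracy:", baseline_acc)

model = LogisticRegression(max_iter=1000, class_weight='balanced')
model.fit(X_train, y_train)
y_score = model.predict_proba(X_test)[:, 1]
precision, recall, _ = precision_recall_curve(y_test, y_score)
print("ROC AUC:", roc_auc_score(y_test, y_score))
print("PR AUC (trapezoidal):", auc(recall, precision))
print("Average precision:", average_precision_score(y_test, y_score))
```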