ROCHIS VALLEY, MANIKBANDAR
NIZAMABAD - 503 003
CERTIFICATE
NAME OF THE LABORATORY________________________________
ACADEMIC YEAR:20___-20___
Certified that this is the bonafide record of work done in the_______________
____________laboratory by Mr/Ms _______________________________
__________________________of____Year B.Tech, ___SEM____Branch
with HALLTICKET NO:_____________________during Academic year
20___-20___ and has performed_____no. of experiments out of_____no.
of experiments under my supervision.
LECTURER INCHARGE HEAD OF THE DEPARTMENT
WITH SEAL
DATE EXTERNAL EXAMINER
VIJAY RURAL ENGINEERING COLLEGE
MACHINE LEARNING LAB INDEX
S.NO | NAME OF THE EXPERIMENT | DATE OF EXPERIMENT PERFORMED | PAGE NO. | DATE OF SUBMISSION | LECTURER SIGN | REMARK
1 | Write a python program to compute Central Tendency Measures: Mean, Median, Mode Measure of Dispersion: Variance, Standard Deviation | | | | |
2 | Study of Python Basic Libraries such as Statistics, Math, Numpy and Scipy | | | | |
3 | Study of Python Libraries for ML application such as Pandas and Matplotlib | | | | |
4 | Write a Python program to implement Simple Linear Regression | | | | |
5 | Implementation of Multiple Linear Regression for House Price Prediction using sklearn | | | | |
6 | Implementation of Decision tree using sklearn and its parameter tuning | | | | |
7 | Implementation of KNN using sklearn | | | | |
8 | Implementation of Logistic Regression using sklearn | | | | |
9 | Implementation of K-Means Clustering | | | | |
10 | Performance analysis of Classification Algorithms on a specific dataset (Mini Project) | | | | |
MACHINE LEARNING
LAB MANUAL
III - II
1) Write a Python program to compute Central Tendency Measures (Mean, Median,
Mode) and Measures of Dispersion (Variance, Standard Deviation)
Program:
import statistics

def ctm(data):
    """Print central tendency and dispersion measures for a list of numbers."""
    if not data:
        print("no data found")
        return
    mean = statistics.mean(data)
    median = statistics.median(data)
    try:
        mode = statistics.mode(data)
    except statistics.StatisticsError:
        # raised on Python versions before 3.8 when there is no unique mode
        mode = "No unique mode found"
    variance = statistics.variance(data)  # sample variance (divides by n - 1)
    sd = statistics.stdev(data)           # sample standard deviation
    print(f"mean: {mean}")
    print(f"median: {median}")
    print(f"mode: {mode}")
    print(f"variance: {variance}")
    print(f"standard deviation: {sd}")

if __name__ == "__main__":
    data = [10, 20, 30, 40, 40, 50, 60, 70, 80, 90]
    ctm(data)
Output:
mean: 49
median: 45.0
mode: 40
variance: 676.6666666666666
standard deviation: 26.01281735350223
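The variance and standard deviation computed above are the sample versions, which divide by n - 1. The following is a minimal sketch, assuming the same data list, of the population versions (which divide by n) and of statistics.multimode() (available from Python 3.8), which lists every most-common value when several values tie. The list named tied is a made-up example.

```python
import statistics

data = [10, 20, 30, 40, 40, 50, 60, 70, 80, 90]

# Population variance and standard deviation divide by n instead of n - 1
pop_variance = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)
print(f"population variance: {pop_variance}")
print(f"population standard deviation: {pop_sd}")

# multimode() returns all most-common values, not only the first one found
tied = [1, 1, 2, 2, 3]  # hypothetical list in which two values tie
modes = statistics.multimode(tied)
print(f"modes of {tied}: {modes}")
```

The population values are smaller than the sample values because the same sum of squared deviations is divided by a larger number.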
2) Study of Python Basic Libraries such as Statistics, Math, Numpy and Scipy
1. Statistics Library
The statistics module in Python provides functions for calculating mathematical statistics of
numeric data.
Key Features:
Central Tendency Measures:
o mean(data): Arithmetic mean.
o median(data): Median value.
o mode(data): Most common value.
Spread Measures:
o variance(data): Variance of the data.
o stdev(data): Standard deviation.
Additional Functions:
o median_low(data): Low median of data.
o median_high(data): High median of data.
o harmonic_mean(data): Harmonic mean of data.
Example:
import statistics as stats
data = [1, 2, 2, 3, 4]
print("Mean:", stats.mean(data))
print("Median:", stats.median(data))
print("Mode:", stats.mode(data))
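The additional functions listed above can be tried in the same way. This is a small sketch with made-up lists: when the number of values is even, median() averages the two middle values, while median_low() and median_high() return the lower and the higher of them. harmonic_mean() is suited to averaging rates such as speeds.

```python
import statistics as stats

even_data = [1, 2, 3, 4]  # hypothetical list with an even number of values
med = stats.median(even_data)        # average of the two middle values
low = stats.median_low(even_data)    # lower of the two middle values
high = stats.median_high(even_data)  # higher of the two middle values
print("Median:", med)
print("Low median:", low)
print("High median:", high)

speeds = [40, 60]  # hypothetical speeds over two equal distances
hmean = stats.harmonic_mean(speeds)  # about 48, lower than the arithmetic mean 50
print("Harmonic mean:", hmean)
```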
2. Math Library
The math module provides access to mathematical functions defined by the C standard.
Key Features:
Basic Math Operations:
o sqrt(x): Square root.
o pow(x, y): x raised to the power y.
o factorial(x): Factorial of x.
Trigonometric Functions:
o sin(x), cos(x), tan(x): Trigonometric functions.
Logarithmic Functions:
o log(x, base): Logarithm of x to the specified base.
o log10(x): Base-10 logarithm.
Constants:
o pi: Mathematical constant π.
o e: Euler's number.
Example:
import math
print("Square root of 16:", math.sqrt(16))
print("Value of Pi:", math.pi)
print("Sine of 90 degrees:", math.sin(math.radians(90)))
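The remaining functions from the list above can be sketched in the same style. The input values are chosen only for illustration. Note that the trigonometric functions expect angles in radians, which is why math.radians() is used.

```python
import math

power = math.pow(2, 10)          # 2 raised to the power 10, returned as a float
fact = math.factorial(5)         # 5 * 4 * 3 * 2 * 1
log2_of_8 = math.log(8, 2)       # logarithm of 8 to base 2
log10_of_1000 = math.log10(1000)
natural_log = math.log(math.e)   # with one argument, log() uses base e
cos_60 = math.cos(math.radians(60))

print("2 to the power 10:", power)
print("5 factorial:", fact)
print("Log base 2 of 8:", log2_of_8)
print("Log base 10 of 1000:", log10_of_1000)
print("Natural log of e:", natural_log)
print("Cosine of 60 degrees:", cos_60)
```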
3. NumPy
NumPy is a powerful library for numerical computations.
Key Features:
Arrays:
o numpy.array(): Create arrays.
o numpy.zeros(shape): Create an array of zeros.
o numpy.ones(shape): Create an array of ones.
Mathematical Operations:
o Element-wise operations on arrays (+, -, *, /).
o Linear algebra functions (dot, cross, linalg).
Statistical Functions:
o numpy.mean(): Mean of array elements.
o numpy.std(): Standard deviation.
o numpy.median(): Median value.
Indexing and Slicing:
o Access specific elements or subarrays.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4])
print("Array:", arr)
print("Mean:", np.mean(arr))
print("Standard Deviation:", np.std(arr))
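The other features listed above (array creation, element-wise operations, linear algebra, and slicing) can be sketched as follows. The arrays are made-up examples. One point worth noting: numpy.std() computes the population standard deviation by default, so it differs from statistics.stdev() unless ddof=1 is passed.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

zeros = np.zeros((2, 3))   # 2 rows and 3 columns of zeros
ones = np.ones(3)          # three ones

total = a + b              # element-wise addition
product = a * b            # element-wise multiplication
dot = np.dot(a, b)         # 1*4 + 2*5 + 3*6
cross = np.cross(a, b)     # vector cross product

arr = np.array([1, 2, 3, 4])
middle = arr[1:3]                  # slicing: elements at index 1 and 2
pop_std = np.std(arr)              # population standard deviation (ddof=0)
sample_std = np.std(arr, ddof=1)   # sample standard deviation

print("Sum:", total)
print("Product:", product)
print("Dot product:", dot)
print("Cross product:", cross)
print("Slice:", middle)
print("Population std:", pop_std, "Sample std:", sample_std)
```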
4. SciPy
SciPy builds on NumPy and provides additional functionality for scientific computing.
Key Features:
Optimization:
o scipy.optimize.minimize(): Minimize a scalar function.
Integration:
o scipy.integrate.quad(): Numerical integration.
Linear Algebra:
o scipy.linalg.solve(): Solve linear systems.
Statistics:
o scipy.stats: Statistical distributions and functions.
Signal Processing:
o scipy.signal: Signal processing utilities.
Example:
from scipy import stats
data = [1, 2, 2, 3, 4]
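# In recent SciPy versions (1.11 and later) .mode is a single value for 1-D data;
# older versions return a one-element array instead
print("Mode:", stats.mode(data).mode)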
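The optimization, integration, and linear algebra functions listed above can be sketched with small problems whose answers are known in advance. The function f, the integrand, and the linear system below are made-up examples chosen so that the results are easy to check by hand.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Optimization: minimize (x - 3)^2, whose minimum is at x = 3
def f(x):
    return (x[0] - 3) ** 2

result = optimize.minimize(f, x0=[0.0])
print("Minimum found at x =", result.x[0])

# Integration: integrate x^2 from 0 to 1 (the exact answer is 1/3)
area, abs_error = integrate.quad(lambda x: x ** 2, 0, 1)
print("Integral of x^2 from 0 to 1:", area)

# Linear algebra: solve 2x + y = 3 and x + 3y = 5
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
solution = linalg.solve(A, b)
print("Solution of the linear system:", solution)
```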
Applications:
1. Statistics: Data analysis and summarization.
2. Math: Solving equations, trigonometry, and logarithmic calculations.
3. NumPy: High-performance array manipulation.
4. SciPy: Advanced computation in engineering, machine learning, and scientific domains.
3) Study of Python Libraries for ML application such as Pandas and Matplotlib
Python libraries like Pandas and Matplotlib are essential for Machine Learning (ML)
applications, as they help with data manipulation, analysis, and visualization. Here’s a detailed
overview:
1. Pandas
Pandas is a powerful library for data manipulation and analysis. It provides data structures like
Series and DataFrame, which are widely used in ML for preprocessing and exploration.
Key Features:
Data Structures:
o Series: One-dimensional labeled array.
o DataFrame: Two-dimensional labeled data structure (like a table).
Data Manipulation:
o read_csv(), read_excel(): Load datasets from files.
o to_csv(), to_excel(): Save datasets to files.
o Filtering and indexing using .loc[] and .iloc[].
Data Cleaning:
o Handling missing values: dropna(), fillna().
o Duplicates: drop_duplicates().
Data Analysis:
o Aggregation: groupby(), pivot_table().
o Statistical methods: .mean(), .std(), .describe().
Integration with ML Libraries:
o Easily convert DataFrames to NumPy arrays or directly use them in ML libraries
like Scikit-learn.
Example:
import pandas as pd
# Load data
data = pd.read_csv('sample.csv')
# Display first few rows
print(data.head())
# Basic statistics
print(data.describe())
# Handling missing values
data.fillna(0, inplace=True)
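The example above needs a file named sample.csv. The following is a self-contained sketch of the cleaning, filtering, and aggregation features listed earlier, using a small DataFrame built in memory. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical student records with one duplicate row and one missing mark
df = pd.DataFrame({
    "name": ["A", "B", "C", "C"],
    "dept": ["cse", "ece", "cse", "cse"],
    "marks": [80.0, None, 70.0, 70.0],
})

# Data cleaning: drop the duplicate row, then fill the missing mark with the mean
df = df.drop_duplicates().reset_index(drop=True)
df["marks"] = df["marks"].fillna(df["marks"].mean())

# Filtering: .loc[] selects by condition or label, .iloc[] selects by position
above_72 = df.loc[df["marks"] > 72, "name"].tolist()
first_row = df.iloc[0]

# Aggregation: mean marks per department
dept_mean = df.groupby("dept")["marks"].mean()

print(df)
print("Students above 72 marks:", above_72)
print("First row name:", first_row["name"])
print(dept_mean)
```

Filling with the column mean is only one option. fillna(0), as used above, or dropna() may suit other datasets better.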
2. Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations.
It is often used to visualize data trends and patterns in ML.
Key Features:
Basic Plotting:
o plot(): Line plots.
o scatter(): Scatter plots.
o bar(): Bar charts.
Customizations:
o Title, labels, and legends: title(), xlabel(), ylabel(), legend().
o Colors, markers, and styles.
Subplots:
o Multiple plots in one figure: subplot().
Integration:
o Works seamlessly with Pandas: Direct plotting from DataFrames.
Example:
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Line plot
plt.plot(x, y, label='Line Plot')
plt.title('Basic Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()
Combined Use of Pandas and Matplotlib in ML:
The combination of Pandas and Matplotlib is particularly useful in the exploratory data analysis
(EDA) phase of ML, where you examine your dataset to identify trends, correlations, and
anomalies.
Example: Data Analysis and Visualization
import pandas as pd
import matplotlib.pyplot as plt
# Load dataset
data = pd.read_csv('sample.csv')
# Check missing values
print(data.isnull().sum())
# Visualize a specific column
data['Age'].hist(bins=10)
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()
# Scatter plot
plt.scatter(data['Height'], data['Weight'], color='blue', alpha=0.5)
plt.title('Height vs. Weight')
plt.xlabel('Height')
plt.ylabel('Weight')
plt.show()
Applications in ML:
1. Pandas:
o Preprocessing: Data cleaning, normalization, and transformation.
o Feature engineering: Creating new features from existing ones.
o Handling time-series data.
2. Matplotlib:
o Visualizing data distributions, trends, and outliers.
o Understanding relationships between features.
o Plotting model performance (e.g., training and validation loss curves)
4) Write a Python program to implement Simple Linear Regression
PROGRAM
Example: predicting the price of a pizza from its size
import statistics

# read the pizza sizes (x)
data_items_collected = int(input("enter the number of data items: "))
pizza_sizes = []
for i in range(data_items_collected):
    size_of_pizza = int(input("enter the pizza size: "))
    pizza_sizes.append(size_of_pizza)
print("list of pizza sizes collected are:")
for size in pizza_sizes:
    print(size)

# read the pizza prices (y)
prices = []
for i in range(data_items_collected):
    price_of_pizza = int(input("enter the pizza price: "))
    prices.append(price_of_pizza)
print("list of pizza prices collected are:")
for price in prices:
    print(price)

# mean of sizes (x)
x_mean = statistics.mean(pizza_sizes)
print(f"mean of sizes: {x_mean}")

# mean of prices (y)
y_mean = statistics.mean(prices)
print(f"mean of prices: {y_mean}")

# deviation of each size from the mean size
deviation_sizes = []
for i in range(data_items_collected):
    dev_x = pizza_sizes[i] - x_mean
    deviation_sizes.append(dev_x)
print("deviation of x:")
for dev_s in deviation_sizes:
    print(dev_s)

# deviation of each price from the mean price
deviation_prices = []
for i in range(data_items_collected):
    dev_y = prices[i] - y_mean
    deviation_prices.append(dev_y)
print("deviation of y:")
for dev_p in deviation_prices:
    print(dev_p)

# product of deviations
product_of_deviation = []
for i in range(data_items_collected):
    p_o_d = deviation_sizes[i] * deviation_prices[i]
    product_of_deviation.append(p_o_d)
print("product of deviations:")
for pod in product_of_deviation:
    print(pod)

# sum of the products of deviations
sum_of_product_of_deviation = sum(product_of_deviation)
print(f"sum of the products: {sum_of_product_of_deviation}")

# square of the deviations of sizes
square_of_deviation_of_sizes = []
for i in range(data_items_collected):
    sod = deviation_sizes[i] ** 2
    square_of_deviation_of_sizes.append(sod)
print("square of deviations of x:")
for sodos in square_of_deviation_of_sizes:
    print(sodos)

# sum of the squared deviations of sizes
sum_of_square_of_deviation_of_sizes = sum(square_of_deviation_of_sizes)
print(f"sum of the squares: {sum_of_square_of_deviation_of_sizes}")

# slope m = sum of products of deviations / sum of squared deviations of x
slope_m = sum_of_product_of_deviation / sum_of_square_of_deviation_of_sizes
print(f"m: {slope_m}")

# intercept c = (mean of y) - m * (mean of x)
intercept_c = y_mean - slope_m * x_mean
print(f"c: {intercept_c}")

# making a prediction with y = m * x + c
new_size = int(input("enter the size of the new pizza: "))
price_prediction = slope_m * new_size + intercept_c
print(f"predicted price of a size {new_size} pizza: {price_prediction}")
OUTPUT
enter the number of data items: 3
enter the pizza size: 8
enter the pizza size: 10
enter the pizza size: 12
list of pizza sizes collected are:
8
10
12
enter the pizza price: 10
enter the pizza price: 13
enter the pizza price: 16
list of pizza prices collected are:
10
13
16
mean of sizes: 10
mean of prices: 13
deviation of x:
-2
0
2
deviation of y:
-3
0
3
product of deviations:
6
0
6
sum of the products: 12
square of deviations of x:
4
0
4
sum of the squares: 8
m: 1.5
c: -2.0
enter the size of the new pizza: 20
predicted price of a size 20 pizza: 28.0
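The same least-squares steps can be written compactly with NumPy. The sketch below uses the three pizza sizes and prices from the sample run, wraps the calculation in a helper named fit_line (a name introduced here only for illustration), and cross-checks the result against numpy.polyfit, which fits a straight line when deg=1.

```python
import numpy as np

def fit_line(x, y):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()                      # deviations of x from its mean
    dy = y - y.mean()                      # deviations of y from its mean
    m = np.sum(dx * dy) / np.sum(dx ** 2)  # slope
    c = y.mean() - m * x.mean()            # intercept
    return m, c

sizes = [8, 10, 12]    # pizza sizes from the sample run
prices = [10, 13, 16]  # pizza prices from the sample run

m, c = fit_line(sizes, prices)
print(f"m: {m}, c: {c}")
print(f"predicted price for size 20: {m * 20 + c}")

# Cross-check with NumPy's built-in polynomial fit of degree 1
m_check, c_check = np.polyfit(sizes, prices, deg=1)
print(f"polyfit m: {m_check}, c: {c_check}")
```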
5. Implementation of Multiple Linear Regression for House Price Prediction using sklearn
Program:
import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample Data
data = {
    'square_feets': [1000, 1500, 2000, 2500, 3000],
    'bedrooms': [1, 2, 3, 4, 5],
    'bathrooms': [1, 2, 2, 2.5, 3],
    'price': [5000, 8000, 12000, 18000, 25000]
}
df = pd.DataFrame(data)
print(df)

# Features and Target Variable
X = df[['square_feets', 'bedrooms', 'bathrooms']]
y = df['price']

# Train the Model
model = LinearRegression()
model.fit(X, y)

# User Input
sq_feet = float(input("Enter the area of the plot: "))
bdrooms = float(input("Enter the number of bedrooms: "))
btrooms = float(input("Enter the number of bathrooms: "))

# Predict Price (a DataFrame with the training column names avoids a feature-name warning)
new_house = pd.DataFrame([[sq_feet, bdrooms, btrooms]], columns=X.columns)
predicted_price = model.predict(new_house)

# Print Prediction
print(f"Predicted price for {sq_feet} square feet, {bdrooms} bedrooms, and {btrooms} bathrooms is: ₹{predicted_price[0]:,.2f}")
Output:
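A fitted LinearRegression model can also be inspected without user input. The sketch below, assuming the same five sample houses, prints the coefficient learned for each feature, the intercept, and the R² score on the training data. The house described in new_house is a made-up example.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'square_feets': [1000, 1500, 2000, 2500, 3000],
    'bedrooms': [1, 2, 3, 4, 5],
    'bathrooms': [1, 2, 2, 2.5, 3],
    'price': [5000, 8000, 12000, 18000, 25000],
})
X = df[['square_feets', 'bedrooms', 'bathrooms']]
y = df['price']

model = LinearRegression().fit(X, y)

# One coefficient per feature, plus the intercept
for name, coef in zip(X.columns, model.coef_):
    print(f"coefficient for {name}: {coef:.4f}")
print(f"intercept: {model.intercept_:.4f}")

# R^2 on the training data (1.0 would be a perfect fit)
r2 = model.score(X, y)
print(f"R^2 on training data: {r2:.4f}")

# Prediction for a hypothetical house: 1800 sq. ft., 3 bedrooms, 2 bathrooms
new_house = pd.DataFrame([[1800, 3, 2]], columns=X.columns)
predicted = model.predict(new_house)[0]
print(f"predicted price: {predicted:,.2f}")
```

With only five rows, the training R² says little about how the model would perform on unseen houses. A larger dataset with a separate test split would be needed for that.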
6. Implementation of Decision tree using sklearn and its parameter tuning
Program
from sklearn.tree import DecisionTreeClassifier

b_tech = ["cse", "aiml", "ece", "eee", "mech", "civil"]
bse = ["mpcs", "mecs", "ba", "ca", "bba"]

# Each training row is [rank obtained (1 = yes, 0 = no), score]
X = [
    [1, 450],
    [1, 800],
    [0, 0],
    [1, 250],
    [1, 600],
]
# Classes: 0 = second half of b_tech, 1 = first half of b_tech,
# 2 = bse courses, 3 = not eligible
y = [0, 1, 2, 3, 1]

clf = DecisionTreeClassifier()
clf.fit(X, y)

emcet_rank = int(input("did you get a rank (yes(1)/no(0)): "))
if emcet_rank == 1 or emcet_rank == 0:
    score_1 = int(input("enter the score you got in emcet: "))
    prediction = clf.predict([[emcet_rank, score_1]])[0]
    if prediction == 0:
        print("You can apply for these courses:", b_tech[len(b_tech)//2:])
    elif prediction == 1:
        print("You can apply for these courses:", b_tech[:len(b_tech)//2])
    elif prediction == 2:
        print("You can apply for these courses:", bse)
    elif prediction == 3:
        print("You are not eligible for application")
    else:
        print("Invalid decision")
else:
    print("Invalid input")
Output:
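For the parameter tuning part of this experiment, one common approach is a grid search with cross-validation. The sketch below is an illustration under stated assumptions: it uses the Iris dataset bundled with sklearn rather than the course data above (which has too few rows to cross-validate), and the candidate values in param_grid are arbitrary choices, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate values for each parameter (illustrative choices)
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 3, 4, None],
    "min_samples_split": [2, 5, 10],
}

# Try every combination with 5-fold cross-validation and keep the best one
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")

# best_estimator_ is a tree refitted on all the data with the best parameters
best_tree = search.best_estimator_
print("Depth of the tuned tree:", best_tree.get_depth())
```

Limiting max_depth or raising min_samples_split makes the tree simpler, which usually reduces overfitting at the cost of some training accuracy.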
7. Implementation of KNN using sklearn
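Program:
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

marks = [95, 75, 69, 51, 39, 21, 0]
grade = ["a1", "a", "b", "c", "d", "e", "f"]
testing_score = int(input("enter your score: "))

# sklearn expects a 2-D feature array: one row per sample, one column per feature
marks_array = np.array(marks).reshape(-1, 1)

k = 3
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(marks_array, grade)

# distances and positions of the k nearest training samples, closest first
distances, indices = knn.kneighbors([[testing_score]])
print(f"Distances: {distances[0]}")
print(f"Indices of the nearest neighbors: {indices[0]}")

# every grade has only one training sample, so a majority vote among the
# k = 3 neighbours would always be a tie; report the closest neighbour's grade
neighbor_grades = [grade[i] for i in indices[0]]
print(f"The grade for {testing_score} marks is: {neighbor_grades[0]}")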
Output:
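The program above reports the grade of the single closest neighbour. When each class has several training samples, knn.predict() can take a majority vote among the k nearest neighbours instead. The sketch below shows this with made-up marks and three made-up class labels.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: three samples for each class
marks = np.array([92, 88, 95, 70, 65, 72, 30, 25, 35]).reshape(-1, 1)
labels = ["high", "high", "high",
          "medium", "medium", "medium",
          "low", "low", "low"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(marks, labels)

# predict() returns the most common label among the 3 nearest neighbours
pred_68 = knn.predict([[68]])[0]   # nearest marks are 70, 65, 72
pred_90 = knn.predict([[90]])[0]   # nearest marks are 92, 88, 95
pred_28 = knn.predict([[28]])[0]   # nearest marks are 30, 25, 35

print("68 marks ->", pred_68)
print("90 marks ->", pred_90)
print("28 marks ->", pred_28)
```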
8. Implementation of Logistic Regression using sklearn
Program:
from sklearn.linear_model import LogisticRegression
import pandas as pd

data = {
    'in_study_hours': [2, 4, 6, 8, 10, 12],
    'y_n': [0, 0, 1, 1, 1, 1]  # 1 = pass, 0 = fail
}
df = pd.DataFrame(data)
print(df)

x = df[['in_study_hours']]
y = df['y_n']
model = LogisticRegression()
model.fit(x, y)

study_hours = float(input("enter the no of study hours: "))
# a DataFrame with the training column name avoids a feature-name warning
new_sample = pd.DataFrame({'in_study_hours': [study_hours]})
predict = model.predict(new_sample)
if predict[0] == 1:
    print("pass")
else:
    print("fail")
Output:
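Logistic regression produces a probability before it produces a label. The sketch below, assuming the same six training points, uses predict_proba() to show the estimated probability of passing for a few study-hour values chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

hours = np.array([2, 4, 6, 8, 10, 12]).reshape(-1, 1)
passed = np.array([0, 0, 1, 1, 1, 1])  # 1 = pass, 0 = fail

model = LogisticRegression()
model.fit(hours, passed)

# Column 1 of predict_proba() is the estimated probability of class 1 (pass)
new_hours = np.array([[2], [5], [12]])  # illustrative inputs
probabilities = model.predict_proba(new_hours)[:, 1]
labels = model.predict(new_hours)

for h, p, label in zip(new_hours.ravel(), probabilities, labels):
    result = "pass" if label == 1 else "fail"
    print(f"{h} hours -> probability of passing {p:.2f} -> {result}")
```

predict() labels a sample as pass when this probability is above 0.5.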
9. Implementation of K-Means Clustering
Program:
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

data = {
    'sr_no': ["cr1", "cr2", "cr3", "cr4", "cr5", "cr6", "cr7"],
    'age': [20, 40, 30, 18, 28, 35, 45],
    'amount': [500, 1000, 800, 300, 1200, 1400, 1800]
}
df = pd.DataFrame(data)
print(df)

x = df[['age', 'amount']].values
# n_init and random_state are set explicitly so that repeated runs give the same clusters
nc = KMeans(n_clusters=3, n_init=10, random_state=42)
nc.fit(x)

test_data = np.array([[13, 750]])
predicted_cluster = nc.predict(test_data)
print(f"the closest cluster is: {predicted_cluster[0]}")
Output:
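K-Means groups points by distance, so a feature with a large range (amount) can dominate one with a small range (age). The sketch below, assuming the same seven records, standardizes both features first and then prints the cluster label of each record, the cluster centres converted back to the original units, and the inertia (the within-cluster sum of squared distances). The choice of StandardScaler is an assumption made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# [age, amount] for the seven records
x = np.array([[20, 500], [40, 1000], [30, 800], [18, 300],
              [28, 1200], [35, 1400], [45, 1800]], dtype=float)

# Put both features on a comparable scale (mean 0, standard deviation 1)
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)

km = KMeans(n_clusters=3, n_init=10, random_state=42)
km.fit(x_scaled)

labels = km.labels_                                       # cluster of each record
centers = scaler.inverse_transform(km.cluster_centers_)   # centres in original units
inertia = km.inertia_                                     # lower means tighter clusters

print("Cluster labels:", labels)
print("Cluster centres (age, amount):")
print(centers)
print(f"Inertia: {inertia:.3f}")

# A new point must be scaled with the same scaler before prediction
new_customer = scaler.transform(np.array([[13, 750]], dtype=float))
cluster = km.predict(new_customer)[0]
print("Cluster for the new point:", cluster)
```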
10. Performance analysis of Classification Algorithms on a specific dataset (Mini Project)
Program:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features (important for some models like SVM, KNN)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize classifiers
models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier()
}

# Function to evaluate models
def evaluate_model(model, X_train, X_test, y_train, y_test):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    cm = confusion_matrix(y_test, y_pred)
    return accuracy, precision, recall, f1, cm

# Create a list to store results
results = []

# Loop through each model, evaluate and store results
for model_name, model in models.items():
    accuracy, precision, recall, f1, cm = evaluate_model(model, X_train, X_test, y_train, y_test)
    results.append([model_name, accuracy, precision, recall, f1, cm])

# Convert the results into a DataFrame for easy display
results_df = pd.DataFrame(results, columns=["Model", "Accuracy", "Precision", "Recall", "F1-Score", "Confusion Matrix"])

# Print the results
print(results_df)

# Plot the performance comparison
metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score']
for metric in metrics:
    plt.figure(figsize=(10, 6))
    sns.barplot(x="Model", y=metric, data=results_df)
    plt.title(f'Comparison of {metric} across Models')
    plt.show()

# Plot confusion matrix for the best model (based on accuracy)
best_model_index = results_df['Accuracy'].idxmax()
best_model_cm = results_df.iloc[best_model_index]['Confusion Matrix']
plt.figure(figsize=(6, 6))
sns.heatmap(best_model_cm, annot=True, fmt="d", cmap='Blues',
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.title(f'Confusion Matrix of {results_df.iloc[best_model_index]["Model"]}')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
Output
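A single train/test split can favour one model by chance, especially on a dataset as small as Iris. As a complementary check, the sketch below compares three of the classifiers with 5-fold cross-validation. Placing the scaler inside a pipeline is an assumption made here so that scaling is fitted only on the training folds. The number of folds and the choice of models are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

models = {
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}

cv_scores = {}
for name, model in models.items():
    # Scaling is part of the pipeline, so it is refitted on each training fold
    pipeline = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    cv_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

The standard deviation across folds indicates how much each model's accuracy depends on which samples it was tested on.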