
Machine Learning Lab Manual
(CSL 313)

Department of Computer Science and Engineering

The NorthCap University, Gurugram-122017, India

Session 2024-2025

Published by:

School of Engineering and Technology

Department of Computer Science & Engineering

The NorthCap University Gurugram

• Laboratory Manual is for Internal Circulation only

© Copyright Reserved

No part of this Practical Record Book may be reproduced, used, or stored without prior permission of The NorthCap University.

Copying or facilitating copying of lab work constitutes cheating and is considered use of unfair means. Students indulging in copying or facilitating copying shall be awarded zero marks for that particular experiment. Frequent cases of copying may lead to disciplinary action. Attendance in lab classes is mandatory.

Labs are open up to 7 PM upon request. Students are encouraged to make full use of the labs beyond normal lab hours.

PREFACE
The Machine Learning Lab Manual is designed to meet the course and program requirements of the NCU curriculum for B.Tech III year students of the CSE branch. The aim of the lab work is to give students brief practical experience of basic lab skills. It provides the space and scope for self-study so that students can come up with new and creative ideas.

The lab manual is written on a “teach yourself” pattern, and students who come with proper preparation should be able to perform the experiments without any difficulty. A brief introduction to each experiment, with information about self-study material, is provided. The prerequisite is a basic working knowledge of Python. The laboratory exercises include familiarization with unsupervised ML techniques such as clustering and association rule mining, an introduction to reinforcement learning, and feature selection and dimensionality reduction. Students will learn the corresponding algorithms and implement them in a high-level language, i.e. Python. Students are expected to come thoroughly prepared for the lab. General discipline, safety guidelines and report writing are also discussed.

The lab manual is part of the curriculum of The NorthCap University, Gurugram. A teacher's copy of the experimental results and answers to the questions is available as sample guidelines.

We hope that the lab manual will be useful to students of the CSE, IT, ECE and BSc branches, and the authors request readers to kindly forward their suggestions and constructive criticism for further improvement of the workbook.

The authors express deep gratitude to the Members, Governing Body, NCU, for their encouragement and motivation.

Authors
The NorthCap University
Gurugram, India

CONTENTS

Syllabus

1. Introduction

2. Lab Requirements

3. General Instructions

4. List of Experiments

5. List of Flip Experiments

6. List of Projects

7. Rubrics

8. Annexure 1 (Format of Lab Report)

SYLLABUS
1. Department: Department of Computer Science and Engineering

2. Course Name: Machine Learning

3. Course Code: CSL313

4. L-T-P: 2-0-4

5. Credits: 4

6. Type of Course (check one): Programme Core | Programme Elective ✔ | Open Elective

7. Pre-requisite(s), if any: Introduction to AI and ML

8. Frequency of offering (check one): Odd | Even ✔ | Either semester | Every semester

9. Brief Syllabus:

Overview of machine learning and pre-processing concepts, Model Selection, XGBoost. Feature Selection: Filter and Wrapper, Dimensionality Reduction, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Kernel PCA, Introduction to Self-Organizing Maps (SOM), Building a Self-Organizing Map. Overview of clustering in machine learning, different categories of clustering algorithms, similarity/distance measures, K-means algorithm, Hierarchical, DBSCAN, Fuzzy C-Means, Agglomerative clustering algorithm, Expectation Maximization (EM) for soft clustering. Semi-supervised learning with EM using labeled and unlabeled data, evaluation methods, a case study with clustering implementation, Eclat, Recommendation Systems: User Based Recommendation System, Item Based Recommendation System, Matrix-Factorization Recommendation System.

Total lecture, tutorial and practical hours for this course (taking 15 teaching weeks per semester): 90 hours

Lectures: 30 hours | Tutorials: 0 hours | Lab Work: 60 hours

The class size is a maximum of 30 learners.

10. Course Outcomes (COs)

On successful completion of this course students will be able to:

CO 1: Understand the difference between supervised and unsupervised approaches and design models with no training data.

CO 2: Implement methods to find frequent patterns and associations in the patterns.

CO 3: Implement recommendation systems with real-world data.

CO 4: Implement dimensionality reduction techniques to improve the efficiency of models.

CO 5: Implement real-world problems with model selection and optimized feature selection for further processing of the data.

11. UNIT WISE DETAILS No. of Units: 5

Unit Number: 1 Title: Clustering No. of hours: 4

Content Summary:

A quick recap to introduction to machine learning, Overview of clustering in machine learning, Different categories
of clustering algorithms, similarity/distance measures, K Means algorithm, Hierarchical Clustering, Advantages,
limitations and comparison.

Unit Number: 2 Title: Association Rule Mining No. of hours: 6



Content Summary:

Basic concepts of frequent pattern mining, Frequent Itemset mining methods, Apriori algorithm, Eclat algorithm,
Vertical Data format method, pattern evaluation methods, association analysis to correlation analysis.

Unit Number: 3 Title: Recommendation System No. of hours: 6

Content Summary: Introduction to Recommendation System, User Based Recommendation System, Item Based
Recommendation System, Matrix-Factorization Recommendation System.

Unit Number: 4 Title: Dimensionality Reduction No. of hours: 6

Content Summary:

Introduction to dimensionality reduction, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA),
Kernel PCA.

Unit Number: 5 Title: Model Selection and Boosting No. of hours: 8

Content Summary: Pre-processing concepts, Model Selection, XGBoost, Feature Selection (Filter and Wrapper), Introduction to Self-Organizing Maps (SOM), Building a Self-Organizing Map.

12. Brief Description of Self-learning components by students (through books/resource material etc.):
Data preprocessing techniques

13. Course/Reference Books:



1. U Dinesh Kumar and Manaranjan Pradhan, "Machine Learning using Python", Wiley, 2019. https://www.amazon.in/Machine-Learning-Python-Manaranjan-Pradhan/dp/8126579900

2. Matthew Kirk, "Thoughtful Machine Learning: A Test-Driven Approach", First Edition, O'Reilly Publications, 2014.

3. Tom Mitchell, "Machine Learning", McGraw Hill, 1997.

Reference Websites: (nptel, swayam, coursera, edx, udemy, lms, official documentation weblink)

● https://www.coursera.org/learn/practical-rl#syllabus

● https://nptel.ac.in/courses/106/106/106106139/

● https://www.coursera.org/learn/machine-learning

Project (To be done as individual/in group): Individual



1. INTRODUCTION

That ‘learning is a continuous process’ cannot be overemphasized. The theoretical knowledge gained during lecture sessions needs to be strengthened through practical experimentation. Thus, practical work forms an integral part of the learning process.

OBJECTIVE:
The purpose of conducting experiments can be stated as follows:

● To familiarize the students with the basic concepts of Machine Learning like
supervised, unsupervised and reinforcement learning.
● The lab sessions will be based on exploring the concepts discussed in class.

● Learning and understanding Clustering.

● Learning and understanding Association Rule Mining algorithms.

● Learning and understanding Recommendation System.

● Learning and understanding Feature selection and Dimensionality Reduction.

● Hands-on experience.

2. LAB REQUIREMENTS

S.No. | Requirements | Details

1 | Software Requirements | Python 3

2 | Operating System | Windows (64-bit), Linux

3 | Hardware Requirements | 8 GB RAM (recommended), 2.60 GHz processor (recommended)

4 | Required Bandwidth | NA

3. GENERAL INSTRUCTIONS

3.1 General discipline in the lab

● Students must turn up on time and contact the concerned faculty for the experiment they are supposed to perform.

● Students will not be allowed to enter the lab late.

● Students may not leave the class till the period is over.

● Students should come prepared for their experiment.

● Experimental results should be entered in the lab report format and certified/signed by the concerned faculty/lab instructor.

● Students must get the connection of the hardware setup verified before switching on the power supply.

● Students should maintain silence while performing the experiments. If any necessity arises for discussion amongst them, they should discuss in a very low pitch without disturbing the adjacent groups.
● Violating the above code of conduct may attract disciplinary action.

● Damaging lab equipment or removing any component from the lab may invite
penalties and strict disciplinary action.

3.2 Attendance

● Attendance in the lab class is compulsory.

● Students should not attend a lab group/section other than the one assigned at the beginning of the session.

● If a student misses a lab class on account of illness or family problems, he/she may be assigned to a different group to make up the loss, in consultation with the concerned faculty/lab instructor, or he/she may work in the lab during spare/extra hours to complete the experiment. No attendance will be granted in such cases.

3.3 Preparation and Performance

● Students should come to the lab thoroughly prepared for the experiments they are assigned to perform that day. A brief introduction to each experiment, with self-study references, is provided on the LMS.

● Students must bring the lab report to each practical class, with written records of the last experiments performed, complete in all respects.

● Each student is required to write a complete report of the experiment he/she has performed and bring it to the lab class for evaluation in the next working lab. Sufficient space is provided in the workbook for independent writing of theory, observations, calculations and conclusions.

● A zero-tolerance policy applies to copying/plagiarism. Zero marks will be awarded if work is found copied; repeat offences will lead to disciplinary action.

● Refer to Annexure 1 for the lab report format.

4. LIST OF EXPERIMENTS
(All experiments are implemented in Python using Jupyter.)

1. Perform the K-means clustering algorithm. Write Python code to find the inter-cluster and intra-cluster distances. (Unit 1, CO1)

2. Perform K-means clustering on the given dataset. Identify the number of clusters using the elbow method. (Unit 1, CO1)

3. Perform hierarchical clustering on the given dataset. Identify the number of clusters using dendrograms. (Unit 1, CO1)

4. Compare K-means and hierarchical clustering for any one dataset. Calculate the Silhouette Score. (Unit 1, CO1)

5. Apply the Apriori algorithm to identify frequent itemsets from the bakery dataset. (Unit 2, CO2)

6. Apply the ECLAT algorithm to identify frequent itemsets from the bakery dataset. (Unit 2, CO2)

7. Apply the FP-Growth algorithm to identify frequent itemsets from the bakery dataset. (Unit 2, CO2)

8. Apply a User Based Recommendation System. (Unit 3, CO3)

9. Apply an Item Based Recommendation System. (Unit 3, CO3)

10. Apply a Matrix-Factorization Recommendation System. (Unit 3, CO3)

11. Reduce the dimensionality of the Iris dataset using PCA. (Unit 4, CO4)

12. Reduce the dimensionality of the Iris dataset using LDA. (Unit 4, CO4)

13. Apply five filter feature selection techniques. (Unit 5, CO5)

14. Apply Recursive Feature Elimination. (Unit 5, CO5)

15. Apply XGBoost on two datasets. (Unit 5, CO5)

16. Construct Self-Organizing Maps on the Iris dataset. (Unit 5, CO5)

17. Build an ML model from scratch using and comparing supervised, unsupervised and ensemble ML approaches. (Units 1-4, CO1-CO4)

18. Build an ML model from scratch applying and comparing filter, wrapper, embedded and hybrid feature selection approaches. (Units 1-4, CO1-CO4)

5. LIST OF FLIP EXPERIMENTS

5.1 Project – supervised and unsupervised learning based model on market basket
analysis
5.2 Competition on Kaggle

6. LIST OF PROJECTS

1. Titanic Challenge: The sinking of the Titanic is one of the most infamous
shipwrecks in history. On April 15, 1912, during her maiden voyage, the widely
considered “unsinkable” RMS Titanic sank after colliding with an iceberg.
Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in
the death of 1502 out of 2224 passengers and crew. While there was some
element of luck involved in surviving, it seems some groups of people were more
likely to survive than others. In this project, the students need to build a predictive
model that answers the question: “what sorts of people were more likely to
survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).
2. House Price Prediction Using Advanced Regression Techniques: Ask a home buyer
to describe their dream house, and they probably won't begin with the height of
the basement ceiling or the proximity to an east-west railroad. But Kaggle’s
advanced house price prediction dataset proves that much more influences price
negotiations than the number of bedrooms or a white-picket fence. With 79
explanatory variables describing (almost) every aspect of residential homes in
Ames, Iowa, this dataset can be used to predict the final price of each home.
3. Mechanism of Action (MoA) Prediction: Mechanism of action means the biochemical interactions through which a drug generates its pharmacological effect. If we know a disease affects some particular receptor or downstream set of cell activity, we can develop drugs faster if we can predict how cells and genes affect various receptor sites. The project uses a dataset that combines gene expression and cell viability data with the MoA annotations of more than 5,000 drugs. Each drug was tested under two doses (cp_dose) and three treatment times (cp_time), so six samples basically correspond to one drug. We need to train a model that classifies drugs based on their biological activity. This problem is a multi-label classification, which means we have multiple targets (not multiple classes). In this project, perform exploratory data analysis and then train a model using deep neural networks with Keras.

7. RUBRICS

Marks Distribution

Continuous Evaluation (15 Marks):

Each experiment shall be evaluated for 10 marks, and at the end of the semester proportional marks shall be awarded out of a total of 15. The breakup of the 10 marks for each experiment is:

4 marks: Observation and conduct of the experiment. The teacher may ask questions about the experiment.

3 marks: Report writing.

3 marks: A 15-minute quiz conducted in every lab.

Project Evaluations (80 Marks):

Both projects shall be evaluated for 30 marks each. At the end of the semester, a viva will be conducted related to the projects as well as the concepts learned in the labs; this component carries 20 marks.

Annexure 1

Machine Learning
(CSL 313)

Lab Practical Report

Faculty name:

Student name: Deepanshu

Roll No.: 22csu504

Semester: 6th

Group: DS-B

Department of Computer Science and Engineering



NorthCap University, Gurugram- 122001, India


Session 2024-2025
INDEX
S.No | Experiment | Page No. | Date of Experiment | Date of Submission | Marks | CO Covered | Sign

EXPERIMENT NO. 1

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Understand what unsupervised learning is.

● Study about clustering and different clustering approaches.

● Implement k-means clustering for solving a real-world problem.



Outcome:

Students would be familiarized with unsupervised learning and clustering in particular.

Problem Statement:

Perform the K-means clustering algorithm. Write Python code to find the inter-cluster and intra-cluster distances.

Background Study: K-means clustering is a method of vector quantization, originally from signal
processing, that aims to partition n observations into k clusters in which each observation belongs
to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of
the cluster.

Question Bank:

1. What is the difference between supervised and unsupervised learning?

2. What is clustering?

3. What are the different types of clustering approaches?

4. How to find the optimal value of k in k-means clustering?

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.decomposition import PCA

# Load the Titanic dataset
file_path = "/tested.csv"  # Update with the correct path
df = pd.read_csv(file_path)

# Select relevant numerical and categorical features
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
df = df[features].copy()  # .copy() avoids chained-assignment warnings

# Handle missing values
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Fare'] = df['Fare'].fillna(df['Fare'].median())

# Encode categorical variables
df['Sex'] = LabelEncoder().fit_transform(df['Sex'])

# Normalize the data
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)

# Find the optimal number of clusters using the elbow method
wcss = []  # Within-Cluster Sum of Squares
k_range = range(1, 11)  # Test k = 1 to 10
for k in k_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(df_scaled)
    wcss.append(kmeans.inertia_)  # Inertia = WCSS

# Plot the elbow curve
plt.figure(figsize=(8, 5))
plt.plot(k_range, wcss, marker='o', linestyle='--')
plt.xlabel("Number of Clusters (k)")
plt.ylabel("WCSS (Within-Cluster Sum of Squares)")
plt.title("Elbow Method for Optimal k")
plt.show()

# Choose the optimal k (e.g., from the elbow point)
optimal_k = 3  # Adjust based on the elbow plot

# Apply K-means with the optimal number of clusters
kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
df['Cluster'] = kmeans.fit_predict(df_scaled)

# Apply PCA for visualization (reduce dimensions to 2D)
pca = PCA(n_components=2)
df_pca = pca.fit_transform(df_scaled)
df['PCA1'] = df_pca[:, 0]
df['PCA2'] = df_pca[:, 1]

# Plot the clusters with their centroids (projected into PCA space)
plt.figure(figsize=(10, 6))
sns.scatterplot(x=df['PCA1'], y=df['PCA2'], hue=df['Cluster'], palette="viridis", s=70)
plt.scatter(pca.transform(kmeans.cluster_centers_)[:, 0],
            pca.transform(kmeans.cluster_centers_)[:, 1],
            color='red', marker='X', s=200, label='Centroids')
plt.title(f"K-Means Clustering (k={optimal_k}) - Titanic Data (2D PCA)")
plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.legend()
plt.show()
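The problem statement also asks for inter-cluster and intra-cluster distances, which the listing above stops short of computing. A minimal sketch, assuming the df_scaled, kmeans and optimal_k variables defined above:

from scipy.spatial.distance import cdist, pdist

# Inter-cluster distance: pairwise Euclidean distances between the k centroids
inter_cluster = pdist(kmeans.cluster_centers_)
print("Pairwise inter-cluster distances:", inter_cluster)
print("Minimum inter-cluster distance:", inter_cluster.min())

# Intra-cluster distance: mean distance of each point to its own centroid
labels = kmeans.labels_
for c in range(optimal_k):
    points = df_scaled[labels == c]
    dists = cdist(points, kmeans.cluster_centers_[c].reshape(1, -1))
    print(f"Cluster {c}: mean intra-cluster distance = {dists.mean():.4f}")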

EXPERIMENT NO. 2

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):
● Understand supervised and unsupervised learning.

● Study about clustering and different clustering approaches.



● Study how to find the number of clusters in the K-means clustering approach.

Outcome:

Students would be familiarized with unsupervised learning and K-means clustering in particular.

Problem Statement:

Perform K-means clustering on the given dataset. Identify the number of clusters using the elbow method.

Background Study:

K-means clustering is a method of vector quantization, originally from signal processing, that aims
to partition n observations into k clusters in which each observation belongs to the cluster with the
nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster.
In cluster analysis, the elbow method is a heuristic used in determining the number of clusters
in a data set. The method consists of plotting the explained variation as a function of the number
of clusters, and picking the elbow of the curve as the number of clusters to use.

Question Bank:

1. What is the difference between supervised and unsupervised learning?

2. What is K-means clustering?

3. What are the different types of clustering approaches?

4. How to find the optimal value of k in k-means clustering?

Student Work Area



Algorithm/Flowchart/Code/Sample Outputs
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.decomposition import PCA

# Load the Titanic dataset
file_path = "/tested.csv"  # Update with the correct path
df = pd.read_csv(file_path)

# Select relevant numerical and categorical features
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
df = df[features].copy()  # .copy() avoids chained-assignment warnings

# Handle missing values
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Fare'] = df['Fare'].fillna(df['Fare'].median())

# Encode categorical variables
df['Sex'] = LabelEncoder().fit_transform(df['Sex'])

# Normalize the data
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)

# Find the optimal number of clusters using the elbow method
wcss = []  # Within-Cluster Sum of Squares
k_range = range(1, 11)  # Test k = 1 to 10
for k in k_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(df_scaled)
    wcss.append(kmeans.inertia_)  # Inertia = WCSS

# Plot the elbow curve
plt.figure(figsize=(8, 5))
plt.plot(k_range, wcss, marker='o', linestyle='--')
plt.xlabel("Number of Clusters (k)")
plt.ylabel("WCSS (Within-Cluster Sum of Squares)")
plt.title("Elbow Method for Optimal k")
plt.show()

# Choose the optimal k (e.g., from the elbow point)
optimal_k = 3  # Adjust based on the elbow plot

# Apply K-means with the optimal number of clusters
kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
df['Cluster'] = kmeans.fit_predict(df_scaled)

# Apply PCA for visualization (reduce dimensions to 2D)
pca = PCA(n_components=2)
df_pca = pca.fit_transform(df_scaled)
df['PCA1'] = df_pca[:, 0]
df['PCA2'] = df_pca[:, 1]

# Plot the clusters with their centroids (projected into PCA space)
plt.figure(figsize=(10, 6))
sns.scatterplot(x=df['PCA1'], y=df['PCA2'], hue=df['Cluster'], palette="viridis", s=70)
plt.scatter(pca.transform(kmeans.cluster_centers_)[:, 0],
            pca.transform(kmeans.cluster_centers_)[:, 1],
            color='red', marker='X', s=200, label='Centroids')
plt.title(f"K-Means Clustering (k={optimal_k}) - Titanic Data (2D PCA)")
plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.legend()
plt.show()

EXPERIMENT NO. 3

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Understand supervised and unsupervised learning.

● Study about clustering and different clustering approaches.

● Implement the hierarchical clustering algorithm on any one dataset.

Outcome:

Students will be familiarized with the different types of clustering algorithms.

Problem Statement:

Perform hierarchical clustering on the given dataset. Identify the number of clusters using dendrograms.

Background Study:
In data mining and statistics, hierarchical clustering (also called hierarchical cluster
analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of clusters.

Strategies for hierarchical clustering generally fall into two types:

● Agglomerative: This is a "bottom-up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

● Divisive: This is a "top-down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
In general, the merges and splits are determined in a greedy manner. The results of hierarchical
clustering are usually presented in a dendrogram.

Question Bank:

1. What is a dendrogram?

2. What is Hierarchical Agglomerative clustering?

3. What is the linkage criteria used in Hierarchical clustering?

4. What are the different distance metrics that can be used for Hierarchical clustering?

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
from sklearn.metrics import pairwise_distances
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Step 1: Load the Titanic dataset
file_path = "/tested.csv"  # Update with actual path
titanic_data = pd.read_csv(file_path)

# Step 2: Preprocess the data
# Select relevant columns
columns_to_use = ['Pclass', 'Sex', 'Age', 'Fare', 'Embarked']
data = titanic_data[columns_to_use].copy()

# Check for missing values
print("Missing Values Before Handling:")
print(data.isnull().sum())

# Handle missing values using appropriate methods
data['Age'] = data['Age'].fillna(data['Age'].mean())  # Replace NaN in Age with mean
data['Fare'] = data['Fare'].fillna(data['Fare'].mean())  # Replace NaN in Fare with mean
data['Embarked'] = data['Embarked'].fillna(data['Embarked'].mode()[0])  # Replace NaN in Embarked with mode

# Re-check for missing values
print("Missing Values After Handling:")
print(data.isnull().sum())

# Encode categorical variables
label_encoder = LabelEncoder()
data['Sex'] = label_encoder.fit_transform(data['Sex'])
data['Embarked'] = label_encoder.fit_transform(data['Embarked'])

# Ensure there are no NaN values
if data.isnull().sum().any():
    print("There are still missing values!")
else:
    print("No missing values remain in the dataset.")

# Standardize the data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

# Step 3: Calculate the Euclidean distance matrix
euclidean_distances = pairwise_distances(scaled_data, metric='euclidean')
print("Euclidean Distance Matrix (first 5 rows):")
print(euclidean_distances[:5, :5])

# Step 4: Elbow method for the optimal number of clusters
sse = []  # Sum of squared errors
K = range(1, 11)
for k in K:
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(scaled_data)
    sse.append(kmeans.inertia_)

# Plot the elbow diagram
plt.figure(figsize=(8, 5))
plt.plot(K, sse, marker='o', linestyle='--', color='b', markersize=8, linewidth=2)
plt.title("Elbow Method for Optimal Clusters")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Sum of Squared Errors (SSE)")
plt.xticks(K)
plt.grid(True, linestyle="--", alpha=0.6)
plt.show()

# Step 5: Perform hierarchical clustering
linkage_matrix = linkage(scaled_data, method='ward')

# Step 6: Visualize the dendrogram
plt.figure(figsize=(10, 6))
dendrogram(linkage_matrix, truncate_mode='lastp', p=50)
plt.title("Hierarchical Clustering Dendrogram")
plt.xlabel("Data Points (Truncated)")
plt.ylabel("Distance")
plt.show()

# Step 7: Form clusters
k = 3  # Choose the number of clusters based on the elbow method or dendrogram
clusters = fcluster(linkage_matrix, k, criterion='maxclust')

# Add cluster labels to the original dataset
titanic_data['Cluster'] = clusters
print("Cluster Assignments (first 10 passengers):")
print(titanic_data[['Name', 'Cluster']].head(10))

# Step 8: Visualize clusters using PCA (optional)
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)

plt.figure(figsize=(8, 6))
for cluster_id in np.unique(clusters):
    cluster_points = pca_data[clusters == cluster_id]
    plt.scatter(cluster_points[:, 0], cluster_points[:, 1], label=f"Cluster {cluster_id}")
plt.title("Titanic Clusters (PCA Visualization)")
plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.legend()
plt.show()

EXPERIMENT NO. 4

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Understand supervised and unsupervised learning.

● Study about clustering and different clustering approaches.

● Implement both the hierarchical and K-means clustering algorithms on any one dataset.

● Compare and contrast the hierarchical and K-means clustering algorithms.


Outcome:

Students will be familiarized with clustering algorithms.

Problem Statement:

Compare K-means and Hierarchical clustering for any one dataset. Calculate the Silhouette Score.

Background Study:

K-means is a method of cluster analysis using a pre-specified number of clusters; it requires advance knowledge of K. Hierarchical clustering, also known as hierarchical cluster analysis (HCA), is a method of cluster analysis that seeks to build a hierarchy of clusters without a fixed number of clusters. In hierarchical clustering one can stop at any number of clusters and find appropriate clusters by interpreting the dendrogram.

Question Bank:

1. What are the advantages and disadvantages of K-means vs hierarchical clustering?

2. Compare the time complexities of K-means and hierarchical clustering.

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import silhouette_score
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.decomposition import PCA

# Load the Titanic dataset
file_path = "/tested.csv"  # Update with the correct path
df = pd.read_csv(file_path)

# Select relevant numerical and categorical features
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
df = df[features].copy()  # Avoid chained-assignment warning

# Handle missing values
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Fare'] = df['Fare'].fillna(df['Fare'].median())

# Encode categorical variables
df['Sex'] = LabelEncoder().fit_transform(df['Sex'])

# Normalize the data
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)

# Apply K-means clustering
k = 3  # Number of clusters
kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
df['KMeans_Cluster'] = kmeans.fit_predict(df_scaled)

# Calculate the Silhouette Score for K-means
kmeans_silhouette = silhouette_score(df_scaled, df['KMeans_Cluster'])
print(f"Silhouette Score for K-Means: {kmeans_silhouette:.4f}")

# Apply hierarchical clustering (note: the deprecated `affinity` argument is not used)
hierarchical = AgglomerativeClustering(n_clusters=k, linkage='ward')
df['Hierarchical_Cluster'] = hierarchical.fit_predict(df_scaled)

# Calculate the Silhouette Score for hierarchical clustering
hierarchical_silhouette = silhouette_score(df_scaled, df['Hierarchical_Cluster'])
print(f"Silhouette Score for Hierarchical Clustering: {hierarchical_silhouette:.4f}")

# Plot the dendrogram
plt.figure(figsize=(10, 5))
linked = linkage(df_scaled, method='ward')
dendrogram(linked)
plt.title("Hierarchical Clustering Dendrogram")
plt.xlabel("Data Points")
plt.ylabel("Distance")
plt.show()

# Apply PCA for visualization
pca = PCA(n_components=2)
df_pca = pca.fit_transform(df_scaled)
df['PCA1'] = df_pca[:, 0]
df['PCA2'] = df_pca[:, 1]

# Plot clusters for K-means
plt.figure(figsize=(10, 5))
sns.scatterplot(x=df['PCA1'], y=df['PCA2'], hue=df['KMeans_Cluster'], palette="viridis", s=70)
plt.scatter(pca.transform(kmeans.cluster_centers_)[:, 0],
            pca.transform(kmeans.cluster_centers_)[:, 1],
            color='red', marker='X', s=200, label='Centroids')
plt.title("K-Means Clustering (PCA Reduced)")
plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.legend()
plt.show()

# Plot clusters for hierarchical clustering
plt.figure(figsize=(10, 5))
sns.scatterplot(x=df['PCA1'], y=df['PCA2'], hue=df['Hierarchical_Cluster'], palette="coolwarm", s=70)
plt.title("Hierarchical Clustering (PCA Reduced)")
plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.show()

EXPERIMENT NO. 5

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Understand and study Association Rule Mining.

● Study the Apriori property and Apriori Rule in Association Rule Mining.

● Implement the Apriori algorithm in Association Rule Mining.

Outcome:

Students will be familiarized with Association Rule Mining.

Problem Statement:

Apply Apriori algorithm to identify frequent itemsets from the bakery dataset.

Background Study:

Apriori is an algorithm for frequent item set mining and association rule learning over relational
databases. It proceeds by identifying the frequent individual items in the database and extending
them to larger and larger item sets as long as those item sets appear sufficiently often in the
database.

Question Bank:

1. What is the Association Rule Mining?

2. What is the Apriori property?

3. Does Association Rule Mining fall under supervised or unsupervised ML?

4. List the applications of Association Rule Mining.

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mlxtend.frequent_patterns import apriori, association_rules

# Load dataset
df = pd.read_csv('/content/Bakery.csv')

# Use correct column names
transaction_col = 'TransactionNo'  # Transaction ID column
item_col = 'Items'  # Items column

# Convert transactions into a format suitable for Apriori (one-hot encoding)
basket = df.groupby([transaction_col, item_col])[item_col].count().unstack().reset_index().fillna(0)
basket.set_index(transaction_col, inplace=True)
basket = basket > 0  # True if the item appears in the transaction, else False

# Apply the Apriori algorithm
frequent_itemsets = apriori(basket, min_support=0.05, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.1)

# Display frequent itemsets and rules
print("Frequent Itemsets:\n", frequent_itemsets)
print("\nAssociation Rules:\n", rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

# Visualization: confidence of each rule
plt.figure(figsize=(10, 5))
sns.barplot(x=rules['confidence'], y=rules['antecedents'].apply(lambda x: ', '.join(list(x))), palette="viridis")
plt.xlabel("Confidence")
plt.ylabel("Antecedents (Items Bought)")
plt.title("Confidence of Association Rules")
plt.show()

EXPERIMENT NO. 6

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Understand and study Association Rule Mining.

● Study the ECLAT algorithm in Association Rule Mining.

● Implement the ECLAT algorithm in Association Rule Mining.

Outcome:

Students will be familiarized with Association Rule Mining.

Problem Statement:

Apply ECLAT algorithm to identify frequent itemsets from the bakery dataset.

Background Study:

The ECLAT algorithm stands for Equivalence Class Clustering and bottom-up Lattice Traversal. It
is one of the popular methods of Association Rule mining. It is a more efficient and scalable
version of the Apriori algorithm.

Question Bank:

1. What is the ECLAT algorithm?

2. What are the advantages of ECLAT algorithm over Apriori?

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

import pandas as pd
import matplotlib.pyplot as plt
from itertools import combinations

# Load dataset
df = pd.read_csv('/content/Bakery.csv')

# Define correct column names
transaction_col = 'TransactionNo'
item_col = 'Items'

# Convert transactions into a list of sets
transactions = df.groupby(transaction_col)[item_col].apply(set).tolist()

# Step 1: Generate 1-itemsets in vertical format (item -> set of transaction ids)
itemsets = {}
for transaction_idx, transaction in enumerate(transactions):
    for item in transaction:
        itemsets.setdefault(item, set()).add(transaction_idx)

# Minimum support threshold (adjustable)
min_support = 0.02 * len(transactions)  # 2% threshold

# Step 2: Filter 1-itemsets based on support
filtered_itemsets = {item: tids for item, tids in itemsets.items() if len(tids) >= min_support}
single_items = sorted(filtered_itemsets.keys())  # frequent 1-itemsets only

# Step 3: Generate k-itemsets by intersecting the transaction-id sets of single items
for k in range(2, 4):  # generates up to 3-itemsets
    new_itemsets = {}
    for itemset in combinations(single_items, k):
        common_transactions = set.intersection(*[filtered_itemsets[i] for i in itemset])
        if len(common_transactions) >= min_support:
            new_itemsets[itemset] = common_transactions
    if not new_itemsets:
        break  # stop if no new frequent itemsets were found
    filtered_itemsets.update(new_itemsets)

# Step 4: Convert results to a DataFrame
eclat_results = pd.DataFrame([(key, len(value)) for key, value in filtered_itemsets.items()],
                             columns=['Itemset', 'Support'])
eclat_results.sort_values(by='Support', ascending=False, inplace=True)

# Helper: 1-itemsets are keyed by the bare item, larger itemsets by tuples
def support_count_of(itemset):
    key = itemset[0] if len(itemset) == 1 else itemset
    return len(filtered_itemsets[key]) if key in filtered_itemsets else None

# Step 5: Generate association rules and calculate confidence
association_rules = []
confidence_values = []
for itemset, tids in filtered_itemsets.items():
    if isinstance(itemset, tuple):  # only multi-item sets yield rules
        for i in range(1, len(itemset)):
            for antecedent in combinations(itemset, i):
                consequent = tuple(set(itemset) - set(antecedent))
                support_antecedent = support_count_of(antecedent)
                if support_antecedent:
                    confidence = len(tids) / support_antecedent * 100
                    association_rules.append((antecedent, consequent, confidence))
                    confidence_values.append((f"{antecedent} -> {consequent}", confidence))

# Step 6: Convert confidence values to a DataFrame
confidence_df = pd.DataFrame(confidence_values, columns=['Rule', 'Confidence'])
confidence_df.sort_values(by='Confidence', ascending=False, inplace=True)

# Step 7: Plot the confidence graph
plt.figure(figsize=(12, 6))
plt.barh(confidence_df['Rule'], confidence_df['Confidence'], color='skyblue')
plt.xlabel('Confidence (%)')
plt.ylabel('Association Rule')
plt.title('Confidence of Association Rules (ECLAT)')
plt.gca().invert_yaxis()  # invert for better readability
plt.show()

# Display results
print("Frequent Itemsets from ECLAT Algorithm:\n", eclat_results)
print("\nTop Association Rules:\n", confidence_df)



EXPERIMENT NO. 7

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Understand and study Association Rule Mining.

● Study the FP-growth algorithm in Association Rule Mining.

● Implement the FP-growth algorithm in Association Rule Mining.

Outcome:

Students will be familiarized with Association Rule Mining.

Problem Statement:

Apply FP-growth algorithm to identify frequent itemsets from the bakery dataset.

Background Study:

FP-Growth (Frequent Pattern Growth) is an improvement over the Apriori algorithm. It finds frequent itemsets in a transaction database without candidate generation, representing the frequent items in a frequent-pattern tree (FP-tree).

Question Bank:

1. What is a FP-tree?

2. What is the FP-growth algorithm?

3. List out the advantages and disadvantages of FP-growth algorithm.

4. Compare and contrast Apriori, ECLAT and FP-Growth algorithm.

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

import pandas as pd
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import fpgrowth, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Load dataset
df = pd.read_csv('/content/Bakery.csv')

# Define correct column names
transaction_col = 'TransactionNo'
item_col = 'Items'

# Step 1: Convert dataset into a list of transactions
transactions = df.groupby(transaction_col)[item_col].apply(list).tolist()

# Step 2: One-hot encode transactions for FP-Growth
te = TransactionEncoder()
encoded_data = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(encoded_data, columns=te.columns_)

# Step 3: Apply the FP-Growth algorithm (min support = 2%)
min_support = 0.02  # Adjust as needed
frequent_itemsets = fpgrowth(df_encoded, min_support=min_support, use_colnames=True)

# Step 4: Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

# Step 5: Extract relevant columns and sort by confidence
rules = rules[['antecedents', 'consequents', 'support', 'confidence']]
rules = rules.sort_values(by='confidence', ascending=False)

# Step 6: Convert antecedents and consequents to strings for visualization
rules['Rule'] = (rules['antecedents'].apply(lambda x: ', '.join(list(x)))
                 + " -> " + rules['consequents'].apply(lambda x: ', '.join(list(x))))

# Step 7: Plot the confidence graph
plt.figure(figsize=(12, 6))
plt.barh(rules['Rule'], rules['confidence'] * 100, color='skyblue')
plt.xlabel('Confidence (%)')
plt.ylabel('Association Rule')
plt.title('Confidence of Association Rules (FP-Growth)')
plt.gca().invert_yaxis()  # invert for better readability
plt.show()

# Display results
print("Frequent Itemsets from FP-Growth Algorithm:\n", frequent_itemsets)
print("\nTop Association Rules:\n", rules[['Rule', 'support', 'confidence']])

EXPERIMENT NO. 08

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Understand and study Recommendation Systems.

● Study User Based Recommendation System.

Outcome:

Students will be familiarized with User Based Recommendation System.

Problem Statement:

Implement User Based Recommendation System.

Background Study:
Recommender systems are information filtering systems that address information overload by filtering and segregating large amounts of dynamically generated information according to a user's preferences, interests, or observed behaviour towards a particular item or items.

User-Based Collaborative Filtering is a technique used to predict the items that a user might like on the basis of the ratings given to those items by other users who have similar tastes to the target user.
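As a minimal sketch of the idea, the example below predicts a rating as the similarity-weighted average of the most similar users who rated the item. The toy ratings matrix and the predict_rating helper are illustrative assumptions, not part of the course material:

import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user-item ratings matrix (rows: users, columns: items; 0 = unrated)
ratings = pd.DataFrame(
    [[5, 4, 0, 1], [4, 5, 1, 0], [1, 0, 5, 4], [0, 1, 4, 5]],
    index=['u1', 'u2', 'u3', 'u4'], columns=['i1', 'i2', 'i3', 'i4'])

# User-user similarity computed from the rating vectors
user_sim = pd.DataFrame(cosine_similarity(ratings),
                        index=ratings.index, columns=ratings.index)

def predict_rating(user, item, k=2):
    # Consider only other users who actually rated the item
    rated = ratings[item] > 0
    rated[user] = False
    # Similarity-weighted average over the k most similar raters
    neighbours = user_sim.loc[user, ratings.index[rated]].nlargest(k)
    return np.average(ratings.loc[neighbours.index, item], weights=neighbours)

print(predict_rating('u1', 'i3'))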

Question Bank:

1. What is Recommendation System?

2. Illustrate User Based Recommendation System.

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

EXPERIMENT NO. 09

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Understand and study Recommendation System.

● Introduction to the Item Based Recommendation System.

Outcome:

Students will be familiarized with Item Based Recommendation System.

Problem Statement:

Implement Item Based Recommendation System.

Background Study:

Item-item collaborative filtering (also called item-based or item-to-item) is a form of collaborative filtering for recommender systems based on the similarity between items, calculated using people's ratings of those items.
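A minimal sketch of the idea: score each unrated item by a similarity-weighted sum of the user's own ratings. The toy ratings matrix and the recommend helper are illustrative assumptions:

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user-item ratings matrix (rows: users, columns: items; 0 = unrated)
ratings = pd.DataFrame(
    [[5, 4, 0, 1], [4, 5, 1, 0], [1, 0, 5, 4], [0, 1, 4, 5]],
    index=['u1', 'u2', 'u3', 'u4'], columns=['i1', 'i2', 'i3', 'i4'])

# Item-item similarity: compare item columns instead of user rows
item_sim = pd.DataFrame(cosine_similarity(ratings.T),
                        index=ratings.columns, columns=ratings.columns)

def recommend(user, n=2):
    user_ratings = ratings.loc[user]
    # Weighted score for item j: sum over i of sim(i, j) * rating(user, i), normalised
    weighted = item_sim.mul(user_ratings, axis=0).sum()
    norm = item_sim.mul(user_ratings > 0, axis=0).sum()
    scores = weighted / norm
    return scores[user_ratings == 0].nlargest(n)  # only unrated items

print(recommend('u1'))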

Question Bank:

1. What is Recommendation System?

2. Illustrate Item Based Recommendation System.

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

EXPERIMENT NO. 10

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Understand and study Recommendation System.

● Introduction to the Matrix-Factorization Recommendation System.

Outcome:

Students will be familiarized with Matrix-Factorization Recommendation System.

Problem Statement:

Implement Matrix-Factorization Recommendation System.

Background Study:

Matrix factorization is a way to generate latent features by factoring the interaction matrix between two different kinds of entities. In collaborative filtering, matrix factorization is applied to the user-item rating matrix to identify the relationships between user and item entities.
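A minimal sketch of matrix factorization trained with stochastic gradient descent on a toy ratings matrix; the matrix, the latent dimension k and the hyperparameters are illustrative assumptions:

import numpy as np

# Hypothetical ratings matrix (0 = unobserved)
R = np.array([[5, 4, 0, 1], [4, 5, 1, 0], [1, 0, 5, 4], [0, 1, 4, 5]], dtype=float)
num_users, num_items = R.shape
k = 2  # number of latent features

rng = np.random.default_rng(42)
P = rng.normal(scale=0.1, size=(num_users, k))  # user latent factors
Q = rng.normal(scale=0.1, size=(num_items, k))  # item latent factors

lr, reg = 0.01, 0.02  # learning rate, L2 regularisation
for epoch in range(2000):
    for u, i in zip(*R.nonzero()):  # update on observed ratings only
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

# The reconstructed matrix fills the zeros with predicted ratings
print(np.round(P @ Q.T, 2))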

Question Bank:

1. What is Matrix-Factorization Recommendation System?

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

EXPERIMENT NO. 11

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Study Dimensionality Reduction.

● Understand the basic principle behind Principal Component Analysis.

Outcome:

Students will be familiarized with Dimensionality Reduction, especially Principal Component Analysis (PCA).

Problem Statement:

Reduce dimensionality of Iris dataset using Principal Component Analysis.

Background Study:

Principal component analysis is a statistical technique that is used to analyze the interrelationships
among a large number of variables and to explain these variables in terms of a smaller number of
variables, called principal components, with a minimum loss of information.
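A minimal sketch of PCA on the Iris dataset using scikit-learn; the choice of two components is illustrative:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load Iris (4 features) and standardize before PCA
X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Reduced shape:", X_pca.shape)  # (150, 2)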

Question Bank:

1. What is dimensionality reduction?

2. Differentiate between Feature Selection, Feature Engineering and Dimensionality Reduction.

3. What are principal components?

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

EXPERIMENT NO. 12

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Study Dimensionality Reduction.

● Understand the basic principle behind Linear Discriminant Analysis.

Outcome:

Students will be familiarized with Dimensionality Reduction, especially Linear Discriminant Analysis (LDA).

Problem Statement:

Reduce dimensionality of Iris dataset using Linear Discriminant Analysis.

Background Study:

Linear Discriminant Analysis (also called Normal Discriminant Analysis or Discriminant Function Analysis) is a dimensionality reduction technique commonly used for supervised classification problems. It models the differences between groups, i.e., separates two or more classes, by projecting features from a higher-dimensional space into a lower-dimensional space.
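A minimal sketch of LDA on the Iris dataset using scikit-learn; with three classes, LDA can produce at most two components:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# LDA is supervised: it uses the class labels to find discriminative axes
X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print("Explained variance ratio:", lda.explained_variance_ratio_)
print("Reduced shape:", X_lda.shape)  # (150, 2)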

Question Bank:

1. What is dimensionality reduction?

2. Compare and contrast PCA and LDA for dimensionality reduction.

3. What are the QDA, FDA and RDA extensions to LDA?

4. List the applications of LDA.

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

EXPERIMENT NO. 13

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● To understand the importance of feature selection

● To differentiate between different types of feature selection.

● Build a model using feature selection techniques.


Outcome:

Students will be familiarized with model building using feature selection techniques and
optimization.

Problem Statement:

Write a program to apply any five filter feature selection techniques.

Background Study:

Feature selection is the process of reducing the number of input variables when developing a
predictive model. It is desirable to reduce the number of input variables to both reduce the
computational cost of modeling and, in some cases, to improve the performance of the model.
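A minimal sketch of five filter-style techniques on the Iris dataset; the variance threshold and k=2 are illustrative choices:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import (SelectKBest, VarianceThreshold, chi2,
                                       f_classif, mutual_info_classif)

X, y = load_iris(return_X_y=True, as_frame=True)

# 1. Variance threshold: drop near-constant features
vt = VarianceThreshold(threshold=0.2).fit(X)
print("Kept by variance:", list(X.columns[vt.get_support()]))

# 2-4. Univariate scores: chi-squared, ANOVA F-test, mutual information
for name, score_fn in [("chi2", chi2), ("ANOVA F", f_classif),
                       ("mutual info", mutual_info_classif)]:
    kb = SelectKBest(score_fn, k=2).fit(X, y)
    print(f"Top 2 by {name}:", list(X.columns[kb.get_support()]))

# 5. Correlation with the target, as a quick filter-style ranking
print(X.corrwith(y).abs().sort_values(ascending=False))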

Question Bank:

1. Differentiate between Feature Selection and Dimensionality Reduction.

2. What are the advantages of Wrapper methods over filter methods for feature selection?

3. Explain Regularization methods for Feature Selection.

4. What are Embedded feature selection methods.

Student Work Area



Algorithm/Flowchart/Code/Sample Outputs

EXPERIMENT NO. 14

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● To understand the importance of feature selection

● To differentiate between different types of feature selection.

● To understand the importance of Hybrid Feature Selection Approaches.

● Build a model using feature selection techniques.

Outcome:

Students will be familiarized with model building using feature selection techniques and
optimization.

Problem Statement:

Write a program to perform Recursive Feature Elimination.

Background Study: Recursive Feature Elimination (RFE) is a popular feature selection method because it is easy to configure and use, and because it is effective at selecting those features (columns) in a training dataset that are more or most relevant in predicting the target variable. There are two important configuration options when using RFE: the number of features to select and the algorithm used to help choose features. Both of these hyperparameters can be explored, although the performance of the method is not strongly dependent on their being configured well.
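A minimal sketch of RFE with a logistic-regression estimator on the breast cancer dataset; the estimator and n_features_to_select=5 are illustrative choices:

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Scale features so the logistic-regression coefficients are comparable
X_scaled = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)

# RFE repeatedly fits the estimator and drops the weakest feature until 5 remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5, step=1)
rfe.fit(X_scaled, y)

print("Selected features:", list(X.columns[rfe.support_]))
print("Ranking (1 = selected):", dict(zip(X.columns, rfe.ranking_)))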

Question Bank:

1. What is Recursive Feature Elimination?

2. What are Hybrid Feature Selection Techniques?

3. Differentiate between Recursive Feature Elimination and Step Backward Feature Selection.

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

EXPERIMENT NO. 15

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● To study Ensemble ML algorithms.

● To build an efficient model using Ensemble Machine Learning algorithms.

● To understand and implement boosting.

Outcome:

Students will be able to understand how to build a highly efficient ML model (with high
performance), faster.

Problem Statement:

To apply the XGBoost algorithm on two datasets.

Background Study: XGBoost stands for "Extreme Gradient Boosting". XGBoost is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. It is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the gradient boosting framework and provides parallel tree boosting to solve many data science problems in a fast and accurate way.
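A minimal sketch that fits the same boosted-tree classifier on two datasets and compares test accuracy; the datasets and hyperparameters are illustrative choices:

from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

for name, loader in [("iris", load_iris), ("breast cancer", load_breast_cancer)]:
    X, y = loader(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
    # Gradient-boosted decision trees; depth and learning rate kept small
    model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                          eval_metric="logloss")
    model.fit(X_tr, y_tr)
    print(f"{name}: accuracy = {accuracy_score(y_te, model.predict(X_te)):.3f}")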

Question Bank:

1. What are Ensemble ML algorithms? What is their advantage?

2. Differentiate between bagging and boosting.

3. Differentiate between Boosting, Gradient Boosting and Extreme Gradient Boosting.

4. What are the advantages offered by the XGBoost algorithm that make it superior to most
existing ML algorithms?

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

EXPERIMENT NO. 16

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Understand Self-Organizing Maps (SOM).

● Construct a Self-Organizing Map on the Iris dataset.

Outcome:

Students will be familiarized with Self-Organizing Maps.

Problem Statement:

Construct Self-Organizing Maps on the Iris dataset.

Student Work Area

Algorithm/Flowchart/Code/Sample Outputs
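A minimal sketch using the third-party MiniSom library (pip install minisom); the 7x7 grid size and training parameters are illustrative choices:

import numpy as np
from minisom import MiniSom
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

X, y = load_iris(return_X_y=True)
X = MinMaxScaler().fit_transform(X)  # SOMs work best with scaled inputs

# A 7x7 grid of neurons, each holding a 4-dimensional weight vector
som = MiniSom(7, 7, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=42)
som.random_weights_init(X)
som.train_random(X, 1000)  # 1000 training iterations

# Each sample maps to its best-matching unit (BMU) on the grid
for cls in np.unique(y):
    bmus = {som.winner(x) for x in X[y == cls]}
    print(f"Class {cls} occupies {len(bmus)} map cells")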

EXPERIMENT NO. 17

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Revise the concepts of supervised and unsupervised learning.

● Use supervised, unsupervised, ensemble ML approaches to solve a real-world problem.

● Build a ML model from scratch and understand the issues involved.

Outcome:

Students will be able to apply the model building concepts learned during ML and use to them
solve a real-world problem.

Problem Statement:

Build an ML model from scratch using and comparing supervised, unsupervised and ensemble ML approaches.

Background Study:

In machine learning, algorithms are trained according to the data available and the research
question at hand. But if researchers fail to identify the objective of the machine learning algorithm,
they will not be able to build an accurate model. Ultimately, the researchers need to identify the
problem at hand and which approaches would be more suitable to address the problem.

Question Bank:

1. What is the difference between supervised and unsupervised learning?

2. What is the bias-variance tradeoff in ML?

3. List some unsupervised and supervised ML approaches.

4. In what situations should you use supervised vs unsupervised ML?

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

EXPERIMENT NO. 18

Student Name and Roll Number:

Semester /Section:

Link to Code:

Date:

Faculty Signature:

Grade:

Objective(s):

● Revise the concepts of supervised and unsupervised learning.

● Use feature selection to aid the solution to a real-world problem.

● Build a ML model from scratch and understand the issues involved.

● Perform extensive feature selection and dimensionality reduction.

Outcome:

Students will be able to apply the feature selection and dimensionality reduction techniques learned in ML to solve a real-world problem.

Problem Statement:

Build a ML model from scratch applying and comparing filter, wrapper, embedded and hybrid
feature selection approaches and dimensionality reduction.

Background Study:

In machine learning, algorithms are trained according to the data available and the research question at hand. If researchers fail to separate the useful features from the not-so-useful ones, this process can take a lot of time and resources. To optimize model building and selection, students should be able to understand and incorporate feature selection and dimensionality reduction.

Question Bank:

1. What is the need of feature selection and dimensionality reduction?

2. What are the embedded methods of feature selection in ML?

Student Work Area


Algorithm/Flowchart/Code/Sample Outputs

Annexure 2

Machine Learning
CSL313

Project Report

Faculty name:

Student name:

Roll No.:

Semester:

Group:

Department of Computer Science and Engineering

The NorthCap University, Gurugram- 122001, India

Session 2024-2025

Table of Contents

1. Project Description

2. Problem Statement

3. Analysis

3.1 Hardware Requirements

3.2 Software Requirements

4. Design

4.1 Data/Input Output Description

4.2 Algorithmic Approach / Algorithm / DFD / ER Diagram / Program Steps

5. Implementation and Testing (stage/module wise)

6. Output (Screenshots)

7. Conclusion and Future Scope
