Machine Learning
Lab Manual
Department of Computer Science and Engineering
The NorthCap University, Gurugram
Session 2024-2025
Published by:
© Copyright Reserved
Copying or facilitating copying of lab work constitutes cheating and is considered use of
unfair means. Students indulging in copying or facilitating copying shall be awarded zero
marks for that particular experiment. Frequent cases of copying may lead to disciplinary
action. Attendance in lab classes is mandatory.
Labs are open up to 7 PM upon request. Students are encouraged to make full use of labs
beyond normal lab hours.
PREFACE
This Machine Learning Lab Manual is designed to meet the course and program requirements of
the NCU curriculum for B.Tech III year students of the CSE branch. The aim of the lab work is
to give students brief practical experience of basic lab skills. It provides the space and
scope for self-study so that students can come up with new and creative ideas.
The lab manual is written on a “teach yourself” pattern, and it is expected that students
who come with proper preparation will be able to perform the experiments without
difficulty. A brief introduction to each experiment, with information about self-study
material, is provided. The prerequisite is a basic working knowledge of Python. The
laboratory exercises include familiarization with unsupervised ML techniques such as
clustering and association rule mining, an introduction to reinforcement learning, and
feature selection and dimensionality reduction. Students learn the algorithms pertaining to
these topics and implement them using a high-level language, i.e. Python. Students are
expected to come thoroughly prepared for the lab. General discipline, safety guidelines
and report writing are also discussed.
The lab manual is a part of the curriculum of The NorthCap University, Gurugram.
A teacher's copy of the experimental results and answers to the questions is available as
a sample guideline.
We hope that the lab manual will be useful to students of the CSE, IT, ECE and BSc branches,
and the authors request readers to kindly forward their suggestions and constructive
criticism for further improvement of the workbook.
The authors express deep gratitude to the Members of the Governing Body, NCU, for their
encouragement and motivation.
Authors
The NorthCap University
Gurugram, India
CONTENTS

S.No.   Details                 Page No.
        Syllabus                6
1       Introduction            9
2       Lab Requirement         10
3       General Instructions    11
4       List of Experiments     13
6       List of Projects        16
7       Rubrics                 17
SYLLABUS
1. Department: Department of Computer Science and Engineering
6. Type of Course
(Check one): Programme Core Programme Elective ✔ Open Elective
8. Frequency of offering (check one): Odd Even ✔ Either semester Every semester
9. Brief Syllabus:
Overview of machine learning and pre-processing concepts, Model Selection, XGBoost. Feature Selection: Filter and
Wrapper, Dimensionality Reduction, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Kernel
PCA, Introduction to Self-Organizing Maps (SOM), Building a Self-Organizing Map. Overview of clustering in machine
learning, Different categories of clustering algorithms, similarity/distance measures, K-Means algorithm,
Hierarchical, DBSCAN, Fuzzy C-Means, Agglomerative clustering algorithm, Expectation Maximization (EM) for soft
clustering, Semi-supervised learning with EM using labeled and unlabeled data, Evaluation methods, A case study with
clustering implementation, Eclat, Recommendation Systems: User-Based, Item-Based and Matrix-Factorization
Recommendation Systems.
Total Lecture, Tutorial and Practical hours for this course (taking 15 teaching weeks per semester): 90 hours
CO 1   Understand the difference between supervised and unsupervised approaches and design the model
       with no training data.
CO 2   Implement the methods to find frequent patterns and associations in the patterns.
CO 5   Implement real-world problems with model selection and optimized feature selection for further
       processing of the data.
Content Summary:
A quick recap of the introduction to machine learning, Overview of clustering in machine learning, Different
categories of clustering algorithms, similarity/distance measures, K-Means algorithm, Hierarchical Clustering,
Advantages, limitations and comparison.
Content Summary:
Basic concepts of frequent pattern mining, Frequent Itemset mining methods, Apriori algorithm, Eclat algorithm,
Vertical Data format method, pattern evaluation methods, association analysis to correlation analysis.
Content Summary: Introduction to Recommendation System, User Based Recommendation System, Item Based
Recommendation System, Matrix-Factorization Recommendation System.
Content Summary:
Introduction to dimensionality reduction, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA),
Kernel PCA.
Content Summary: Pre-processing concepts, Model Selection, XGBoost. Feature Selection: Filter and Wrapper,
Introduction to Self-Organizing Maps (SOM), Building a Self-Organizing Map.
12. Brief Description of Self-learning components by students (through books/resource material etc.):
Data preprocessing techniques
1. U Dinesh Kumar and Manaranjan Pradhan, “Machine Learning using Python”, Wiley, 2019,
https://www.amazon.in/Machine-Learning-Python-Manaranjan-Pradhan/dp/8126579900.
2. Matthew Kirk, “Thoughtful Machine Learning: A Test-Driven Approach”, First Edition, O'Reilly Publications, 2014.
Reference Websites: (nptel, swayam, coursera, edx, udemy, lms, official documentation weblink)
● https://www.coursera.org/learn/practical-rl#syllabus
● https://nptel.ac.in/courses/106/106/106106139/
● https://www.coursera.org/learn/machine-learning
1. INTRODUCTION
OBJECTIVE:
The purpose of conducting experiments can be stated as follows:
● To familiarize the students with the basic concepts of machine learning such as
supervised, unsupervised and reinforcement learning.
● To explore, in lab sessions, the concepts discussed in class.
● To give hands-on experience of implementing these concepts in Python.
2. LAB REQUIREMENTS
S.No.   Requirement               Details
1       Software Requirements     Python 3
4       Required Bandwidth        NA
3. GENERAL INSTRUCTIONS
● Students must turn up on time and contact the concerned faculty for the experiment
they are supposed to perform.
● Students will not be allowed to enter the lab late.
● Students will not leave the class till the period is over.
● Damaging lab equipment or removing any component from the lab may invite
penalties and strict disciplinary action.
3.2 Attendance
● Students should not attend a lab group/section other than the one assigned at the
beginning of the session.
● If a student misses his/her lab classes on account of illness or family problems,
he/she may be assigned to a different group to make up the loss, in consultation
with the concerned faculty / lab instructor, or he/she may work in the lab during
spare/extra hours to complete the experiment. No attendance will be granted in
such cases.
● Students should come to the lab thoroughly prepared for the experiments they are
assigned to perform on that day. A brief introduction to each experiment, with
information about self-study references, is provided on the LMS.
● Students must bring the lab report to each practical class, with written records
of the last experiments performed, complete in all respects.
● Each student is required to write a complete report of the experiment he/she has
performed and bring it to the lab class for evaluation in the next working lab.
Sufficient space is provided in the workbook for independent writing of theory,
observations, calculations and conclusions.
● A zero-tolerance policy applies to copying / plagiarism. Zero marks will be
awarded if work is found to be copied; repeated cases will lead to disciplinary
action.
● Refer to Annexure 1 for the lab report format.
4. LIST OF EXPERIMENTS

Sr. No.   Title of the Experiment                                                Software used      Unit Covered   CO Covered
11.       Reduce dimensionality of Iris dataset using PCA.                       Python (Jupyter)   4              CO4
12.       Reduce dimensionality of Iris dataset using LDA.                       Python (Jupyter)   4              CO4
13.       To apply five Filter feature selection techniques.                     Python (Jupyter)   5              CO5
14.       To apply Recursive Feature Elimination.                                Python (Jupyter)   5              CO5
15.       To apply XGBoost on two datasets.                                      Python (Jupyter)   5              CO5
16.       Construct Self-Organizing Maps on Iris dataset.                        Python (Jupyter)   5              CO5
17.       Build a ML model from scratch using and comparing supervised,          Python (Jupyter)   1,2,3,4        CO1, CO2, CO3, CO4
          unsupervised and ensemble ML approaches.
18.       Build a ML model from scratch applying and comparing filter,           Python (Jupyter)   1,2,3,4        CO1, CO2, CO3, CO4
          wrapper, embedded and hybrid feature selection approaches.
5.1 Project – supervised and unsupervised learning based model on market basket
analysis
5.2 Competition on Kaggle
6. LIST OF PROJECTS
1. Titanic Challenge: The sinking of the Titanic is one of the most infamous
shipwrecks in history. On April 15, 1912, during her maiden voyage, the widely
considered “unsinkable” RMS Titanic sank after colliding with an iceberg.
Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in
the death of 1502 out of 2224 passengers and crew. While there was some
element of luck involved in surviving, it seems some groups of people were more
likely to survive than others. In this project, students need to build a predictive
model that answers the question “what sorts of people were more likely to
survive?” using passenger data (i.e. name, age, gender, socio-economic class, etc.).
2. House Price Prediction Using Advanced Regression Techniques: Ask a home buyer
to describe their dream house, and they probably won't begin with the height of
the basement ceiling or the proximity to an east-west railroad. But Kaggle’s
advanced house price prediction dataset proves that much more influences price
negotiations than the number of bedrooms or a white-picket fence. With 79
explanatory variables describing (almost) every aspect of residential homes in
Ames, Iowa, this dataset can be used to predict the final price of each home.
3. Mechanism of Action (MoA) Prediction: Mechanism of action means the
biochemical interactions through which a drug generates its pharmacological
effect. If we know a disease affects some particular receptor or downstream set of
cell activity, we can develop drugs faster if we can predict how cells and genes
affect various receptor sites. The dataset combines gene expression and cell viability
data with the MoA annotations of more than 5,000 drugs. Each drug was tested at two
doses (cp_dose) and three time points (cp_time), so six samples correspond to one drug.
Students need to train a model that classifies drugs based on their biological activity.
This is a multi-label classification problem, which means there are multiple targets
(not multiple classes). In this project, perform exploratory data analysis and then
train a model using deep neural networks with Keras.
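As a starting point, here is a minimal multi-label Keras sketch; the layer sizes and the
feature/label counts are illustrative assumptions, not values taken from the dataset:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative shapes (assumed): 875 input features, 206 target labels
n_features, n_labels = 875, 206

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(256, activation="relu"),
    # Sigmoid, not softmax: each label is an independent yes/no decision
    layers.Dense(n_labels, activation="sigmoid"),
])

# Binary cross-entropy treats every label as its own binary problem,
# which is what multi-label (as opposed to multi-class) classification needs
model.compile(optimizer="adam", loss="binary_crossentropy")

# Dummy arrays just to show the expected input/output shapes
X = np.random.rand(32, n_features).astype("float32")
y = (np.random.rand(32, n_labels) > 0.99).astype("float32")
model.fit(X, y, epochs=1, batch_size=16)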
7. RUBRICS
Marks Distribution

Experiments: Each experiment shall be evaluated for 10 marks, and at the end of the
semester proportional marks shall be awarded out of a total of 15. Following is the
breakup of the 10 marks for each experiment:
4 Marks: Observation and conduct of the experiment. The teacher may ask questions
about the experiment.
3 Marks: For report writing.
3 Marks: For the 15-minute quiz to be conducted in every lab.

Projects: Both projects shall be evaluated for 30 marks each, and at the end of the
semester a viva will be conducted related to the projects as well as the concepts
learned in the labs; this component carries 20 marks.
Annexure 1
Machine Learning
(CSL 313)
Roll No.:
Semester:
Group:
EXPERIMENT NO. 1
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Problem Statement:
Perform the K-means clustering algorithm. Write Python code to find the inter-cluster
and intra-cluster distances.
Background Study: K-means clustering is a method of vector quantization, originally from signal
processing, that aims to partition n observations into k clusters in which each observation belongs
to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of
the cluster.
Question Bank:
2. What is clustering?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Load the dataset (the Titanic dataset is assumed; adjust the path as needed)
file_path = 'titanic.csv'
features = ['Age', 'Fare', 'Sex']
df = pd.read_csv(file_path)
df = df[features]

# Impute missing values and encode the categorical column
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Fare'] = df['Fare'].fillna(df['Fare'].median())
df['Sex'] = LabelEncoder().fit_transform(df['Sex'])

# Standardize features before clustering
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)

# Elbow curve: within-cluster SSE (inertia) for k = 1..10
k_range = range(1, 11)
inertias = []
for k in k_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(df_scaled)
    inertias.append(kmeans.inertia_)
plt.figure(figsize=(8, 5))
plt.plot(k_range, inertias, marker='o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Inertia')
plt.show()

# Fit the final model and visualize the clusters in 2-D PCA space
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
df['Cluster'] = kmeans.fit_predict(df_scaled)
pca = PCA(n_components=2)
df_pca = pca.fit_transform(df_scaled)
df['PCA1'] = df_pca[:, 0]
df['PCA2'] = df_pca[:, 1]
plt.figure(figsize=(10, 6))
plt.scatter(df['PCA1'], df['PCA2'], c=df['Cluster'], cmap='viridis', s=40)
plt.scatter(pca.transform(kmeans.cluster_centers_)[:, 0],
            pca.transform(kmeans.cluster_centers_)[:, 1],
            c='red', marker='X', s=200, label='Centroids')
plt.legend()
plt.show()
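The problem statement also asks for inter-cluster and intra-cluster distances. A minimal
sketch, continuing from the fitted kmeans and df_scaled above and using the usual
definitions (centroid-to-centroid distance, and mean distance of points to their own
centroid):

import numpy as np
from scipy.spatial.distance import cdist

# Inter-cluster distance: pairwise Euclidean distances between centroids
centers = kmeans.cluster_centers_
inter_cluster = cdist(centers, centers)
print("Inter-cluster distance matrix:\n", inter_cluster)

# Intra-cluster distance: mean distance of each point to its own centroid
labels = kmeans.labels_
intra_cluster = [np.mean(np.linalg.norm(df_scaled[labels == c] - centers[c], axis=1))
                 for c in range(len(centers))]
print("Intra-cluster distances:", intra_cluster)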
EXPERIMENT NO. 2
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
● Understand supervised and unsupervised learning.
● Study how to find the number of clusters in the K-means clustering approach.
Outcome:
Students would be familiarized with unsupervised learning and K-means clustering in particular.
Problem Statement:
Perform K-means clustering on the given dataset. Identify the number of clusters using the elbow method.
Background Study:
K-means clustering is a method of vector quantization, originally from signal processing, that aims
to partition n observations into k clusters in which each observation belongs to the cluster with the
nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster.
In cluster analysis, the elbow method is a heuristic used in determining the number of clusters
in a data set. The method consists of plotting the explained variation as a function of the number
of clusters, and picking the elbow of the curve as the number of clusters to use.
Question Bank:
Algorithm/Flowchart/Code/Sample Outputs
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Load and prepare the dataset (Titanic assumed, as in Experiment 1)
file_path = 'titanic.csv'
features = ['Age', 'Fare', 'Sex']
df = pd.read_csv(file_path)
df = df[features]
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Fare'] = df['Fare'].fillna(df['Fare'].median())
df['Sex'] = LabelEncoder().fit_transform(df['Sex'])
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)

# Elbow method: plot inertia against k and look for the "elbow" in the curve
k_range = range(1, 11)
inertias = []
for k in k_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(df_scaled)
    inertias.append(kmeans.inertia_)
plt.figure(figsize=(8, 5))
plt.plot(k_range, inertias, marker='o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Inertia (within-cluster SSE)')
plt.show()

# Refit with the k read off the elbow plot and visualize via PCA
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
df['Cluster'] = kmeans.fit_predict(df_scaled)
pca = PCA(n_components=2)
df_pca = pca.fit_transform(df_scaled)
df['PCA1'] = df_pca[:, 0]
df['PCA2'] = df_pca[:, 1]
plt.figure(figsize=(10, 6))
plt.scatter(df['PCA1'], df['PCA2'], c=df['Cluster'], cmap='viridis', s=40)
plt.scatter(pca.transform(kmeans.cluster_centers_)[:, 0],
            pca.transform(kmeans.cluster_centers_)[:, 1],
            c='red', marker='X', s=200, label='Centroids')
plt.legend()
plt.show()
EXPERIMENT NO. 3
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Problem Statement:
Perform hierarchical clustering on the given dataset. Identify the number of clusters using dendrograms.
Background Study:
In data mining and statistics, hierarchical clustering (also called hierarchical cluster
analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of clusters.
Question Bank:
1. What is a dendrogram?
4. What are the different distance metrics that can be used for Hierarchical clustering?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Load the Titanic dataset (path and feature subset assumed)
file_path = 'titanic.csv'
columns_to_use = ['Age', 'Fare', 'Sex', 'Embarked']
titanic_data = pd.read_csv(file_path)
data = titanic_data[columns_to_use].copy()
print(data.isnull().sum())

# Impute missing values
data['Age'] = data['Age'].fillna(data['Age'].median())
data['Embarked'] = data['Embarked'].fillna(data['Embarked'].mode()[0])  # replace NaN in Embarked with mode
print(data.isnull().sum())

# Encode categorical columns
label_encoder = LabelEncoder()
data['Sex'] = label_encoder.fit_transform(data['Sex'])
data['Embarked'] = label_encoder.fit_transform(data['Embarked'])

if data.isnull().sum().any():
    print("Warning: missing values remain")
else:
    print("No missing values remain")

# Standardize the features
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

# Pairwise Euclidean distances (first 5x5 block shown)
distances = euclidean_distances(scaled_data)
print(distances[:5, :5])

# K-means elbow curve, for comparison with the dendrogram
sse = []
K = range(1, 11)
for k in K:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(scaled_data)
    sse.append(kmeans.inertia_)
plt.figure(figsize=(8, 5))
plt.plot(K, sse, marker='o')
plt.xticks(K)
plt.show()

# Dendrogram from Ward linkage; cut it to obtain cluster labels
linked = linkage(scaled_data, method='ward')
plt.figure(figsize=(10, 6))
dendrogram(linked)
plt.ylabel("Distance")
plt.show()
clusters = fcluster(linked, t=3, criterion='maxclust')
titanic_data['Cluster'] = clusters
print(titanic_data[['Name', 'Cluster']].head(10))

# 2-D PCA scatter of the clusters
pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)
plt.figure(figsize=(8, 6))
plt.scatter(pca_data[:, 0], pca_data[:, 1], c=clusters, cmap='viridis', label='Points')
plt.legend()
plt.show()
EXPERIMENT NO. 4
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
● Implement both hierarchical and K-means clustering algorithms on any one dataset.
Problem Statement:
Compare K-means and Hierarchical clustering for any one dataset. Calculate the Silhouette Score.
Background Study:
K-means is a method of cluster analysis using a pre-specified number of clusters; it requires
advance knowledge of ‘K’. Hierarchical clustering, also known as hierarchical cluster analysis
(HCA), is a method of cluster analysis which seeks to build a hierarchy of clusters without a
fixed number of clusters. In hierarchical clustering one can stop at any number of clusters and
find appropriate clusters by interpreting the dendrogram.
Question Bank:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, dendrogram

# Load and prepare the Titanic dataset (path and feature subset assumed)
file_path = 'titanic.csv'
df = pd.read_csv(file_path)
df = df[['Age', 'Fare', 'Sex']]
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Fare'] = df['Fare'].fillna(df['Fare'].median())
df['Sex'] = LabelEncoder().fit_transform(df['Sex'])
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)

# Fit both clustering models with the same number of clusters
k = 3  # Number of clusters
kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
df['KMeans_Cluster'] = kmeans.fit_predict(df_scaled)
hierarchical = AgglomerativeClustering(n_clusters=k)
df['Hierarchical_Cluster'] = hierarchical.fit_predict(df_scaled)

# Compare silhouette scores (higher is better)
kmeans_silhouette = silhouette_score(df_scaled, df['KMeans_Cluster'])
hierarchical_silhouette = silhouette_score(df_scaled,
                                           df['Hierarchical_Cluster'])
print(f"K-means silhouette: {kmeans_silhouette:.3f}")
print(f"Hierarchical silhouette: {hierarchical_silhouette:.3f}")

# Plot Dendrogram
linked = linkage(df_scaled, method='ward')
plt.figure(figsize=(10, 5))
dendrogram(linked)
plt.xlabel("Data Points")
plt.ylabel("Distance")
plt.show()

# Visualize both clusterings in 2-D PCA space
pca = PCA(n_components=2)
df_pca = pca.fit_transform(df_scaled)
df['PCA1'] = df_pca[:, 0]
df['PCA2'] = df_pca[:, 1]
plt.figure(figsize=(10, 5))
sns.scatterplot(x=df['PCA1'], y=df['PCA2'],
                hue=df['KMeans_Cluster'], palette="viridis", s=70)
plt.scatter(pca.transform(kmeans.cluster_centers_)[:, 0],
            pca.transform(kmeans.cluster_centers_)[:, 1],
            c='red', marker='X', s=200, label='Centroids')
plt.legend()
plt.show()
plt.figure(figsize=(10, 5))
sns.scatterplot(x=df['PCA1'], y=df['PCA2'],
                hue=df['Hierarchical_Cluster'], palette="coolwarm", s=70)
plt.show()
EXPERIMENT NO. 5
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
● Study the Apriori property and Apriori Rule in Association Rule Mining.
Outcome:
Problem Statement:
Apply Apriori algorithm to identify frequent itemsets from the bakery dataset.
Background Study:
Apriori is an algorithm for frequent item set mining and association rule learning over relational
databases. It proceeds by identifying the frequent individual items in the database and extending
them to larger and larger item sets as long as those item sets appear sufficiently often in the
database.
Question Bank:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import apriori, association_rules

# Load dataset
df = pd.read_csv('/content/Bakery.csv')

# One-hot encode items per transaction (basket format)
basket = pd.crosstab(df['TransactionNo'], df['Items']).astype(bool)

# Mine frequent itemsets and derive association rules (thresholds are illustrative)
frequent_itemsets = apriori(basket, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.3)

plt.figure(figsize=(10, 5))
sns.barplot(x=rules['confidence'], y=rules['antecedents'].apply(lambda x:
    ', '.join(list(x))), palette="viridis")
plt.xlabel("Confidence")
plt.show()
EXPERIMENT NO. 6
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Problem Statement:
Apply ECLAT algorithm to identify frequent itemsets from the bakery dataset.
Background Study:
The ECLAT algorithm stands for Equivalence Class Clustering and bottom-up Lattice Traversal. It
is one of the popular methods of Association Rule mining. It is a more efficient and scalable
version of the Apriori algorithm.
Question Bank:
import pandas as pd
import matplotlib.pyplot as plt
from itertools import combinations

# Load dataset
df = pd.read_csv('/content/Bakery.csv')
transaction_col = 'TransactionNo'
item_col = 'Items'

# Vertical data format: each transaction becomes a set of items
transactions = df.groupby(transaction_col)[item_col].apply(set).tolist()

# Build TID-sets: item -> set of transaction indices containing it
itemsets = {}
for transaction_idx, transaction in enumerate(transactions):
    for item in transaction:
        if item in itemsets:
            itemsets[item].add(transaction_idx)
        else:
            itemsets[item] = {transaction_idx}

# Keep only items meeting the minimum support count (threshold is illustrative)
min_support_count = 50
filtered_itemsets = {item: tids for item, tids in itemsets.items()
                     if len(tids) >= min_support_count}
valid_items = set(filtered_itemsets.keys())

# ECLAT step: grow to 2-itemsets by intersecting TID-sets
# (a full implementation would repeat this for 3-itemsets and beyond)
new_itemsets = {}
for itemset in combinations(sorted(valid_items), 2):
    common_transactions = set.intersection(*[filtered_itemsets[i]
                                             for i in itemset])
    if len(common_transactions) >= min_support_count:
        new_itemsets[itemset] = common_transactions
if not new_itemsets:
    print("No frequent pairs at this support threshold")
filtered_itemsets.update(new_itemsets)

# Tabulate the support of every frequent itemset
support_df = pd.DataFrame(
    [(itemset, len(tids) / len(transactions))
     for itemset, tids in filtered_itemsets.items()],
    columns=['Itemset', 'Support'])

# Derive rules A -> B from frequent pairs; confidence = support(AB) / support(A)
association_rules = []
confidence_values = []
for itemset, tids in new_itemsets.items():
    for antecedent in itemset:
        consequent = tuple(i for i in itemset if i != antecedent)
        if antecedent in filtered_itemsets:
            support_antecedent = len(filtered_itemsets[antecedent])
            confidence = 100 * len(tids) / support_antecedent
            association_rules.append((antecedent, consequent,
                                      confidence))
            confidence_values.append((f"{antecedent} -> {', '.join(consequent)}",
                                      confidence))
confidence_df = pd.DataFrame(confidence_values, columns=['Rule', 'Confidence'])

plt.figure(figsize=(12, 6))
plt.barh(confidence_df['Rule'], confidence_df['Confidence'],
         color='skyblue')
plt.xlabel('Confidence (%)')
plt.ylabel('Association Rule')
plt.show()

# Display Results
print(support_df.sort_values('Support', ascending=False).head(10))
EXPERIMENT NO. 7
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Problem Statement:
Apply FP-growth algorithm to identify frequent itemsets from the bakery dataset.
Background Study:
FP-growth is an efficient method of mining frequent itemsets that, unlike Apriori, works without
candidate generation. FP-growth represents frequent items in frequent pattern trees, or FP-trees.
Question Bank:
1. What is a FP-tree?
import pandas as pd
import matplotlib.pyplot as plt
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

# Load dataset
df = pd.read_csv('/content/Bakery.csv')
transaction_col = 'TransactionNo'
item_col = 'Items'
transactions = df.groupby(transaction_col)[item_col].apply(list).tolist()

# One-hot encode the transactions
te = TransactionEncoder()
encoded_data = te.fit(transactions).transform(transactions)
basket = pd.DataFrame(encoded_data, columns=te.columns_)

# Mine frequent itemsets with FP-growth and derive rules (thresholds are illustrative)
frequent_itemsets = fpgrowth(basket, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.3)

plt.figure(figsize=(12, 6))
plt.barh(rules['antecedents'].apply(lambda x: ', '.join(x)),
         rules['confidence'] * 100)
plt.xlabel('Confidence (%)')
plt.ylabel('Association Rule')
plt.show()

# Display Results
print(rules.head(10))
EXPERIMENT NO. 08
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Problem Statement:
Background Study:
Recommender systems are information filtering systems that help deal with the problem of
information overload by filtering and segregating information and creating fragments out of large
amounts of dynamically generated information according to the user's preferences, interests, or
observed behaviour.
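No starter code is given for this experiment. A minimal user-based collaborative filtering
sketch (the ratings matrix is a made-up toy example; cosine similarity and a similarity-weighted
average are one common choice):

import numpy as np
import pandas as pd

# Toy user-item ratings matrix (0 = not rated); all values are illustrative
ratings = pd.DataFrame(
    [[5, 4, 0, 1], [4, 5, 1, 0], [1, 0, 5, 4], [0, 1, 4, 5]],
    index=['u1', 'u2', 'u3', 'u4'], columns=['i1', 'i2', 'i3', 'i4'])

# Cosine similarity between user rows
norms = np.linalg.norm(ratings.values, axis=1, keepdims=True)
user_sim = (ratings.values @ ratings.values.T) / (norms @ norms.T)

# Predict u1's rating of i3 as a similarity-weighted average of the
# other users' ratings for that item
target_user, target_item = 0, 2
weights = user_sim[target_user].copy()
weights[target_user] = 0.0  # exclude the user themself
rated = ratings.values[:, target_item] > 0
pred = np.dot(weights[rated], ratings.values[rated, target_item]) / weights[rated].sum()
print(f"Predicted rating of u1 for i3: {pred:.2f}")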
Question Bank:
EXPERIMENT NO. 09
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Problem Statement:
Background Study:
Question Bank:
EXPERIMENT NO. 10
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Problem Statement:
Background Study:
Matrix factorization is a way to generate latent features by multiplying two different kinds of
entities. Collaborative filtering is the application of matrix factorization to identify the
relationship between item and user entities.
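A minimal sketch of matrix factorization trained by gradient descent on a toy ratings matrix
(all values and hyperparameters are illustrative):

import numpy as np

# Toy ratings matrix; 0 means "unknown"
R = np.array([[5, 4, 0, 1],
              [4, 0, 1, 1],
              [1, 1, 0, 4],
              [0, 1, 5, 4]], dtype=float)
n_users, n_items = R.shape
k = 2  # number of latent features

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))   # user factors
Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors

lr, reg = 0.01, 0.02
mask = R > 0
for epoch in range(2000):
    E = mask * (R - P @ Q.T)       # error on observed entries only
    P += lr * (E @ Q - reg * P)    # gradient step for user factors
    Q += lr * (E.T @ P - reg * Q)  # gradient step for item factors

# The reconstructed matrix fills in the zeros with predicted ratings
print(np.round(P @ Q.T, 2))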
Question Bank:
EXPERIMENT NO. 11
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Problem Statement:
Background Study:
Principal component analysis is a statistical technique that is used to analyze the interrelationships
among a large number of variables and to explain these variables in terms of a smaller number of
variables, called principal components, with a minimum loss of information.
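Since this experiment reduces the Iris dataset with PCA, a minimal scikit-learn sketch
(two components, chosen here simply for easy plotting):

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize the four Iris features, then project onto two components
X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Fraction of the total variance captured by each component
print("Explained variance ratio:", pca.explained_variance_ratio_)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.show()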
Question Bank:
EXPERIMENT NO. 12
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Students will be familiarized with Dimensionality Reduction especially Linear Discriminant Analysis
(LDA).
Problem Statement:
Background Study:
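The background and code are left for the student to fill in. As a starting point, a minimal
LDA sketch on Iris with scikit-learn (unlike PCA, LDA is supervised and uses the class labels;
with three classes, at most two discriminant components exist):

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# LDA finds directions that best separate the known classes
X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

plt.scatter(X_lda[:, 0], X_lda[:, 1], c=y, cmap='viridis')
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.show()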
Question Bank:
EXPERIMENT NO. 13
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Students will be familiarized with model building using feature selection techniques and
optimization.
Problem Statement:
Background Study:
Feature selection is the process of reducing the number of input variables when developing a
predictive model. It is desirable to reduce the number of input variables to both reduce the
computational cost of modeling and, in some cases, to improve the performance of the model.
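A minimal sketch of one filter method, univariate selection with chi-squared scores via
SelectKBest; the other filter techniques the experiment asks for (e.g. variance threshold,
mutual information, ANOVA F-test, correlation) follow the same fit/transform pattern:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Score each feature against the target and keep the best two
X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print("Chi-squared scores:", selector.scores_)
print("Selected feature indices:", selector.get_support(indices=True))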
Question Bank:
2. What are the advantages of Wrapper methods over filter methods for feature selection?
Algorithm/Flowchart/Code/Sample Outputs
EXPERIMENT NO. 14
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Students will be familiarized with model building using feature selection techniques and
optimization.
Problem Statement:
Background Study: Recursive Feature Elimination (RFE) is a popular feature selection method
because it is easy to configure and use, and because it is effective at selecting those features
(columns) in a training dataset that are more or most relevant in predicting the target variable.
There are two important configuration options when using RFE: the choice of the number of
features to select and the choice of the algorithm used to help choose features. Both of these
hyperparameters can be explored, although the performance of the method is not strongly
dependent on them being configured well.
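A minimal RFE sketch with scikit-learn; logistic regression as the wrapped estimator and
two retained features are illustrative choices:

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Recursively drop the weakest feature until two remain
X, y = load_iris(return_X_y=True)
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)

print("Selected features:", rfe.support_)
print("Feature ranking (1 = selected):", rfe.ranking_)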
Question Bank:
3. Differentiate between Recursive Feature Elimination and Step Backward Feature Selection.
EXPERIMENT NO. 15
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Students will be able to understand how to build a highly efficient, high-performance ML
model quickly.
Problem Statement:
Background Study: XGBoost stands for “Extreme Gradient Boosting”. XGBoost is a decision-
tree-based ensemble machine learning algorithm that uses a gradient boosting framework.
It is an optimized distributed gradient boosting library designed to be highly efficient,
flexible and portable, and it provides parallel tree boosting to solve many data science
problems in a fast and accurate way.
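A minimal XGBoost sketch, assuming the xgboost package; the breast cancer dataset and the
parameter values are illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Train/test split on a built-in binary classification dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gradient-boosted trees; n_estimators and learning_rate are typical starting values
model = XGBClassifier(n_estimators=200, learning_rate=0.1, eval_metric='logloss')
model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))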
Question Bank:
4. What are the advantages offered by the XGBoost algorithm that make it superior to most
existing ML algorithms?
EXPERIMENT NO. 16
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
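The template for this experiment is otherwise blank; per the List of Experiments, Experiment 16
constructs a Self-Organizing Map on the Iris dataset. A minimal sketch, assuming the MiniSom
package:

from minisom import MiniSom
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

# Scale Iris features to [0, 1] and train a 7x7 SOM grid
X, y = load_iris(return_X_y=True)
X = MinMaxScaler().fit_transform(X)
som = MiniSom(7, 7, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=42)
som.random_weights_init(X)
som.train_random(X, 1000)

# Each sample maps to its best-matching unit (BMU) on the grid
for sample, label in zip(X[:5], y[:5]):
    print("BMU:", som.winner(sample), "class:", label)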
EXPERIMENT NO. 17
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Students will be able to apply the model building concepts learned during ML and use them
to solve a real-world problem.
Problem Statement:
Build a ML model from scratch using and comparing supervised, unsupervised and ensemble ML approaches.
Background Study:
In machine learning, algorithms are trained according to the data available and the research
question at hand. But if researchers fail to identify the objective of the machine learning algorithm,
they will not be able to build an accurate model. Ultimately, the researchers need to identify the
problem at hand and which approaches would be more suitable to address the problem.
Question Bank:
EXPERIMENT NO. 18
Semester /Section:
Link to Code:
Date:
Faculty Signature:
Grade:
Objective(s):
Outcome:
Students will be able to apply the feature selection and dimensionality reduction techniques
learned in ML to solve a real-world problem.
Problem Statement:
Build a ML model from scratch applying and comparing filter, wrapper, embedded and hybrid
feature selection approaches and dimensionality reduction.
Background Study:
In machine learning, algorithms are trained according to the data available and the research
question at hand. But if researchers fail to identify the useful features from the not so useful
features, this process can take a lot of time and resources. To optimize model building and
selection, students should be able to understand and incorporate feature selection and
dimensionality reduction.
Question Bank:
Annexure 2
Machine Learning
CSL313
Project Report
Roll No.:
Semester:
Group:
Session 2024-2025
Table of Contents
S.No   Details                Page No.
1.     Project Description
2.     Problem Statement
3.     Analysis
4.     Design
6.     Output (Screenshots)