Module 1 ML 2025
Summative Assessment (35 Marks)
Instructions: Attempt any 2 questions each from the 2-mark, 5-mark, and 10-mark
questions. One mark shall be granted for neat work (handwriting).
Sr. No. Questions RBT Marks
1 Explain the key idea behind Machine Learning and how it differs from traditional
programming. U 2
2 List down the steps in Data Cleaning. R 2
3 State examples of categorical and numerical features. R 2
4 What can be said about the value of r if the points on the scatter diagram indicate that as
one variable increases the other variable tends to decrease? Relate your answer. U 2
5 You want to develop a machine learning algorithm that predicts the number of views on
articles. Your analysis is based on features such as author name, number of articles, and a
few other features. Relate which evaluation metric you would choose in that case and
why. U 2
6 Explain an example of a visual or statistical technique used in EDA to identify outliers in a
dataset. U 2
7 Suppose you are given 7 scatter plots, 1-7 (left to right), and you want to compare the Pearson
correlation coefficients between the variables of each scatter plot. Demonstrate the right order. U 2
1. 1 < 2 < 3 < 4
2. 1 > 2 > 3 > 4
3. 7 < 6 < 5 < 4
4. 7 > 6 > 5 > 4
8 Compare and contrast the approaches to Machine Learning in the early days with the
current state of the field. U 5
9 Explain how reinforcement learning aligns with the idea of learning from trial and error,
using real-world examples. U 5
10 Illustrate which technique is applicable to which application. U 5
Technique:
a Supervised classification
b Supervised regression
c Unsupervised learning
d Outlier analysis
e Reinforcement learning
Application:
i Credit card fraud detection
ii Word frequency of a featured article
iii Identifying whether a mail is spam or not
iv Predicting the price of stock
v Stock price prediction
11 Explain ML algorithmic tradeoff with diagram of accuracy vs interpretability. U 5
12 Explain how data visualization tools aid in understanding the distribution of data points
during Exploratory Data Analysis. U 5
13 For the given data points, illustrate the quartile values (Q1, Q3), median, min and max
values for creating a box plot: 3, 3, 6, 7, 9, 10, 12, 11, 8 U 5
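A minimal sketch of how the box-plot values asked for in Q13 could be computed. It assumes the median-of-halves convention for quartiles (the median is excluded from both halves when the count is odd); other quartile conventions can give different Q1 and Q3. The function name five_number_summary is illustrative only.

```python
from statistics import median


def five_number_summary(values):
    """Return min, Q1, median, Q3 and max using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # values above it (median skipped if n is odd)
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }


summary = five_number_summary([3, 3, 6, 7, 9, 10, 12, 11, 8])
print(summary)
```

Under this convention the sorted data 3, 3, 6, 7, 8, 9, 10, 11, 12 gives min = 3, Q1 = 4.5, median = 8, Q3 = 10.5, max = 12.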
14 Interpret the fundamental differences between supervised and unsupervised learning with a
diagram. How do these approaches handle data, and what are the typical use cases where
each type of learning is applied? U 5
15 Explain essential Python libraries for data visualization. U 10
16 Explain any five univariate or bivariate plots with example. U 10
17 Explain the intricate relationship between AI and ML. Explain how machine learning
techniques enable the development of intelligent AI systems. Discuss the challenges faced
when integrating AI and ML and propose potential solutions. U 10
18 Outline a note on the following use cases of ML in real life: U 10
Smart phones
Transportation
Web Services
Sales and Marketing
Financial Domain
19 How is a missing value represented? Discuss the types and ways of dealing with missing
values. U 10
20 Compare and contrast supervised, unsupervised, and reinforcement learning approaches in
machine learning. Use concrete examples to illustrate the differences and showcase
scenarios where one approach outperforms the others. U 10
21 Discuss the role of data visualization in EDA. How do different types of visualizations,
such as histograms, scatter plots, box plots, heatmaps, and bar plots, contribute to uncovering
patterns, outliers, and relationships within a dataset? U 10
Module 2 ML 2025
Summative Assessment (35 Marks)
Instructions: Attempt any 2 questions each from the 2-mark, 5-mark, and 10-mark
questions. One mark shall be granted for neat work (handwriting).
22 Explain the concept of feature engineering in machine learning. U 2
23 Discuss why feature engineering is important for building machine learning
models. U 2
24 How would you approach feature engineering for a dataset with missing values? AN 2
25 Define feature transformation in the context of machine learning. R 2
26 Explain the concept of filter-based feature selection methods. U 2
27 Describe the concept of dimensionality reduction in feature selection. U 2
28 What is automated feature engineering, and how does it differ from manual
feature engineering? R 2
29 Evaluate the performance of a machine learning model before and after applying
feature selection techniques. AN 5
30 Compare and contrast filter, wrapper, and embedded feature selection
methods. U 5
31 How would you use a tool to automate feature engineering for a dataset? AN 5
32 A researcher wants to determine if there is a significant difference between the
expected and observed frequencies in a categorical dataset. The expected
frequencies are [20, 30, 50] and the observed frequencies are [22, 29, 49].
Calculate the Chi-Square statistic. A 5
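A minimal sketch of the calculation in Q32, using the standard goodness-of-fit formula chi-square = sum of (O - E)^2 / E over all categories. The helper name chi_square_statistic is illustrative; it uses plain Python rather than any statistics library.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over paired observed and expected frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [20, 30, 50]
observed = [22, 29, 49]

# Per-category terms: 4/20, 1/30 and 1/50.
chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 4))
```

The three terms are 0.2, 0.0333 and 0.02, so the statistic is approximately 0.2533.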
33 Explain the concept of the Fisher Score and its importance in feature selection. U 5
34 Describe the role of automated machine learning (AutoML) tools like EvalML in
feature engineering and model selection. U 5
35 Analyze the trade-offs between manual and automated feature engineering. What
are the key considerations? AN 5
36 Differentiate between Standardisation and Normalization with the help of an example. AN 10
37 A study was conducted to see if there is an association between gender (male,
female) and preference for a new product (like, dislike). The observed frequencies
are given in the table below:
Perform a chi-square test, given that the critical value from the chi-square distribution
table at the 0.05 significance level is 5.991. AN 10
38 Describe three common feature transformation techniques and provide a use case
for each. U 10
39 A researcher wants to determine if there is a significant difference in the number of
defective products produced by three different machines. The observed frequencies
of defects are given below:
Perform a chi-square test, given that the critical value from the chi-square distribution table
at the 0.05 significance level is 5.991. AN 10
40 Analyze the impact of different feature transformation techniques on a dataset's
distribution and model performance. How do these techniques influence the
outcome of a machine learning model? AN 10
41 Compare and contrast filter methods, wrapper methods, and embedded methods
for feature selection. Provide examples of each method. U 10
42 Inspect how feature engineering can be applied to Numeric and Categorical data. AN 10
Module 3 ML 2025
Summative Assessment (35 Marks)
Instructions: Attempt any 2 questions each from the 2-mark, 5-mark, and 10-mark
questions. One mark shall be granted for neat work (handwriting).
Question RBT Marks
43 Explain Naïve Bayes assumptions. U 2
44 Explain residual with example. U 2
45 Explain coefficient of determination with equation. U 2
46 Explain what a confusion matrix represents in the context of classification. U 2
47 State Bayes' theorem. R 2
48 Define Bias and Variance. R 2
49 Describe assumptions of Linear Regression. U 2
50 Explain SSE, MSE, MAE with mathematical equations. U 5
51 Compare ridge and LASSO regression. U 5
52 Explain the odds ratio and the logit transformation with appropriate mathematical
equations and ranges. U 5
53 Discuss the steps involved in time series analysis, from data collection to
forecasting. Illustrate each step with a brief explanation and an example. U 5
54 Define overfitting and underfitting in the context of machine learning models. R 5
55 Discuss the challenges posed by multicollinearity in multiple linear regression.
How can multicollinearity affect the stability of coefficient estimates and the
interpretability of the model? What techniques can be employed to mitigate
multicollinearity's impact? U 5
56 A company has two machines, A and B. Machine A produces 60% of the total
products, and Machine B produces 40% of the total products. The probability that a
product produced by Machine A is defective is 3%, while the probability that a
product produced by Machine B is defective is 5%. If a randomly selected product
is found to be defective, identify the probability that it was produced by Machine
A. A 5
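A minimal sketch of the Bayes' theorem calculation behind Q56: P(A | defective) = P(defective | A) P(A) / [P(defective | A) P(A) + P(defective | B) P(B)]. Exact fractions are used so that no rounding enters the result; the variable names are illustrative.

```python
from fractions import Fraction

# Share of total production per machine (prior probabilities).
prior = {"A": Fraction(60, 100), "B": Fraction(40, 100)}
# Probability that a product from each machine is defective (likelihoods).
defect_rate = {"A": Fraction(3, 100), "B": Fraction(5, 100)}

# Total probability that a randomly selected product is defective.
p_defective = sum(prior[m] * defect_rate[m] for m in prior)

# Posterior probability that a defective product came from Machine A.
posterior_a = prior["A"] * defect_rate["A"] / p_defective

print(p_defective, posterior_a, round(float(posterior_a), 4))
```

With the figures in the question, P(defective) = 0.018 + 0.020 = 0.038, so P(A | defective) = 0.018 / 0.038 = 9/19, roughly 0.4737.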
57 Identify the relationship between the height and weight of students by creating a relationship
model for the given data. Compute the Karl Pearson correlation coefficient and the coefficient of
determination. A 10
58 Explain the following use cases for linear regression in detail: U 10
a) Healthcare
b) Demand forecasting
59 Make use of the Logistic Regression cost function and apply Gradient Descent to it.
Show the mathematical steps. A 10
60 A company wants to classify emails as "Spam" or "Not Spam" based on the
occurrence of certain words. They have the following training data. Using the
Naive Bayes classifier, predict whether a new email containing both the words
"Offer" and "Win" is spam. A 10
Email   Contains “offer”   Contains “win”   Spam/Not Spam
1       Y                  Y                Spam
2       Y                  N                Not Spam
3       N                  Y                Spam
4       Y                  Y                Spam
5       N                  N                Not Spam
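A minimal sketch of the Naive Bayes calculation for the training data in Q60. It assumes plain frequency estimates with no Laplace smoothing, so a word never seen in a class gives that class a score of zero; a smoothed variant would give different numbers. The function name naive_bayes_scores is illustrative.

```python
from fractions import Fraction

# (contains "offer", contains "win", label) for emails 1 to 5.
training = [
    ("Y", "Y", "Spam"),
    ("Y", "N", "Not Spam"),
    ("N", "Y", "Spam"),
    ("Y", "Y", "Spam"),
    ("N", "N", "Not Spam"),
]


def naive_bayes_scores(rows, offer, win):
    """Unnormalised score per class: P(class) * P(offer|class) * P(win|class)."""
    scores = {}
    for label in sorted({row[2] for row in rows}):
        subset = [row for row in rows if row[2] == label]
        prior = Fraction(len(subset), len(rows))
        p_offer = Fraction(sum(row[0] == offer for row in subset), len(subset))
        p_win = Fraction(sum(row[1] == win for row in subset), len(subset))
        scores[label] = prior * p_offer * p_win
    return scores


scores = naive_bayes_scores(training, offer="Y", win="Y")
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

Here the Spam score is (3/5)(2/3)(3/3) = 2/5 and the Not Spam score is (2/5)(1/2)(0/2) = 0, so under these assumptions the new email is classified as Spam.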
61 Discuss the Naive Bayes algorithm in detail, including its underlying principles,
key assumptions, types of Naive Bayes classifiers, advantages, disadvantages, and
common applications. Illustrate with examples where necessary. U 10
62 Explain the following aspects of logistic regression: Model and Hypothesis, Sigmoid
Function, Cost Function, Parameter Estimation and Use Cases. U 10
63 The following table shows the midterm and final exam grades for students in a database
course. Use the method of least squares regression to identify the final exam
grade of a student who received 86 on the midterm exam. A 10
Formative Assessment (MCQ 35 Marks) attached as Annexure
Note: Average of Formative and Summative Assessment shall be taken module wise.