UNIT I
Q.1 Describe Machine Learning and Compare Machine Learning with Traditional
programming.[5][6]
Ans:- Machine Learning (ML): A branch of AI that focuses on creating systems that learn
from data and improve over time without being explicitly programmed.
Definition (Tom Mitchell): A program is said to learn from experience (E) with respect to a task (T) and a performance measure (P) if its performance on T, as measured by P, improves with experience E.
Why ML is needed:
Some problems (like face recognition) are hard to solve by writing traditional code.
ML can use data to create programs that can adapt and improve over time.
Goal of ML:
Develop systems that learn automatically.
Build programs that adapt and work in new situations.
Applications: ML is used in fields like computer vision, speech recognition, and robotics
using statistical methods to make decisions based on data.
| Aspect | Machine Learning | Traditional Programming |
|---|---|---|
| Logic Creation | Learns logic automatically from data. | Logic is written manually by the programmer. |
| Input | Data is the input, and the system learns from it. | Rules (program) and data are both inputs. |
| Output | Produces a program/model that can make predictions. | Produces output based on fixed logic. |
| Process | The system improves its performance over time by learning. | No learning; it always follows the same rules. |
| Automation | Automates learning from data. | Depends entirely on manually written rules. |
| Efficiency | More efficient for tasks like image recognition or prediction. | Efficient when the problem is clearly defined by rules. |
(Diagram: in traditional programming, data + program produce an output; in machine learning, data + desired outputs produce a program/model.)
Q.2 What is Dimensionality Reduction, Explain any one Dimensionality Reduction
technique.[6]
Q.2.1 Explain Principal Component Analysis used in Machine Learning. [5]
Q.2.2 Explain Linear Discriminant Analysis (LDA) used in Machine Learning.[5]
Ans:- Dimensionality Reduction is the process of reducing the number of input variables
(features) in a dataset, while retaining as much important information as possible.
It transforms the original high-dimensional data into a lower-dimensional space (fewer
features) while keeping the essential structure or patterns.
Why is Dimensionality Reduction important?
Real-world datasets often have too many features (also called high dimensionality),
which can:
o Make models slower and more complex
o Cause overfitting (model learns noise instead of useful patterns)
o Make visualization difficult
Dimensionality Reduction techniques:-
Q.2.1 Explain Principal Component Analysis used in Machine Learning. [5]
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a dimensionality reduction technique used in machine
learning to simplify large datasets by reducing the number of features (variables), while
keeping the most important information.
PCA is an unsupervised method (doesn't need output labels).
It helps remove noise and redundancy in the data.
It improves efficiency and accuracy of models in many cases.
Commonly used in image compression, pattern recognition, and data visualization.
Why PCA is Used:
To reduce the size of data without losing much information
To make models faster and easier to train
To help in visualizing high-dimensional data in 2D or 3D
How PCA Works (in simple steps):
1. Standardize the data – Make sure each feature has the same scale.
2. Find correlations – PCA finds patterns or directions (called components) in the data
that capture the most variation (spread).
3. Create new features – It creates new features (called principal components) that are
combinations of old ones.
4. Keep top components – We select the top few components that capture most of the
information and drop the rest.
Example:
Suppose we have 100 features. PCA may reduce them to just 2 or 3 new features that still
explain most of the data's behavior.
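A minimal sketch of these steps, assuming scikit-learn is available; the synthetic 100-feature data (generated from 3 hidden factors) mirrors the example above and is invented purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))             # 3 hidden factors
X = latent @ rng.normal(size=(3, 100))         # 200 samples, 100 correlated features

X_scaled = StandardScaler().fit_transform(X)   # step 1: standardize
pca = PCA(n_components=3)                      # step 4: keep top 3 components
X_reduced = pca.fit_transform(X_scaled)        # steps 2-3: find and apply components

print(X_reduced.shape)                         # (200, 3)
print(pca.explained_variance_ratio_.sum())     # close to 1.0: little information lost
```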
Q.2.2 Explain Linear Discriminant Analysis (LDA) used in Machine Learning.[5]
Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) is a supervised machine learning technique used for
classification and dimensionality reduction. It reduces the number of input features while
keeping the class-discriminating information.
LDA is supervised (uses class labels).
It is useful when the output has multiple classes (e.g., Class A, B, C).
It works well when the data is normally distributed.
LDA improves classification accuracy by reducing noise and redundancy.
Purpose of LDA:
To separate different classes in a dataset as much as possible
To reduce dimensions while keeping the data well-separated based on class labels
How LDA Works (in simple steps):
1. Calculate the mean of each class.
2. Measure how spread out the data is within and between classes.
3. Find a new axis (direction) that maximizes separation between the classes.
4. Project the data onto this new axis with fewer dimensions.
Example Use Cases:
Face recognition
Medical diagnosis
Text classification
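A minimal LDA sketch, assuming scikit-learn; the Iris dataset simply stands in for any labeled multi-class data. Note that LDA can keep at most (number of classes − 1) new axes.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)        # 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)          # supervised: class labels y are required

print(X_lda.shape)                       # (150, 2): at most (classes - 1) axes
```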
Q.3 Write a note on Reinforcement Learning.[4]
Ans:-
RL is a type of machine learning focused on decision making.
It learns how to take actions in an environment to get maximum rewards.
It works by trial and error and learning from feedback (reward or penalty).
RL is different from supervised learning—it doesn’t use labeled input-output data.
Key Points:
The agent learns by doing, not by being told.
It’s self-learning and autonomous.
Works well in tasks where continuous decisions are needed (like games, robotics,
etc.).
How It Works:
An agent takes an action in an environment.
The environment gives feedback in the form of a reward.
Based on the reward, the agent learns which actions are better for the future.
Elements of Reinforcement Learning – Simplified
RL consists of four main elements:
1. Policy:
The strategy used by the agent to decide actions based on the current state.
It’s like a map from “situation” to “what action to take.”
2. Reward Function:
Gives the agent feedback for each action.
Helps the agent know if it did well or poorly.
It guides the agent to achieve the goal.
3. Value Function:
Tells the agent how good a state is in the long run.
It’s the total expected future reward from a given state.
4. Model of the Environment:
Used for planning.
It helps the agent predict the next state and reward based on current action.
Example:
A robot learning to walk: it tries steps, falls, gets feedback, adjusts, and tries again.
Over time, it learns how to walk by improving based on rewards (positive/negative).
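A toy sketch of this agent-environment loop using only the Python standard library: a tabular Q-learning agent on an invented 5-state "walk to the goal" environment. The environment, rewards, and constants are illustrative assumptions, not from the text.

```python
import random

n_states = 5                      # states 0..4; state 4 is the goal
actions = [0, 1]                  # 0 = step left, 1 = step right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # policy: epsilon-greedy choice of action for the current state
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = Q[state].index(max(Q[state]))
        next_state = max(0, state - 1) if a == 0 else state + 1
        # reward function: +1 at the goal, small penalty per step
        reward = 1.0 if next_state == n_states - 1 else -0.01
        # value update: estimate the long-run worth of (state, action)
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

print([round(max(q), 2) for q in Q])  # state values rise toward the goal
```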
Q.4 Explain parametric & nonparametric models in machine learning.[5]
Ans:- What are Parametric Models?
Parametric models use a fixed number of parameters (like mean, variance).
We assume a specific form for the function/model, then estimate its parameters
using data.
Once parameters are known, we can describe the entire distribution.
Likelihood is a function that tells us how likely our data is, given a certain parameter.
Examples:
Logistic Regression
Linear Discriminant Analysis
Naive Bayes
Perceptron
Simple Neural Networks
Advantages:
1. Simple and easy to understand.
2. Learn quickly from small datasets.
3. Use less training data.
4. Good for simple problems.
Maximum Likelihood Estimation (MLE)
What is MLE?
A method to estimate parameters of a model that make the observed data most
probable.
It tries to find the value of parameter (θ) that maximizes the likelihood of the data.
Example:
Tossing a coin and estimating the probability of heads or tails using observed
outcomes.
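A minimal sketch of the coin example, using only the Python standard library; the toss data is invented for illustration.

```python
import math

tosses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # 1 = heads, 0 = tails
heads, n = sum(tosses), len(tosses)

p_hat = heads / n                          # analytic MLE for a Bernoulli parameter
print(p_hat)                               # 0.7

# Numeric check: scan candidate values of p and keep the one giving
# the highest log-likelihood of the observed data.
best = max((p / 100 for p in range(1, 100)),
           key=lambda p: heads * math.log(p) + (n - heads) * math.log(1 - p))
print(best)                                # also 0.7
```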
Non-Parametric Methods
These do not assume a fixed form for the model.
They adapt based on the amount and nature of data.
Often used when we don’t know the exact distribution of the data.
How It Works:
It uses techniques like density estimation (e.g., histograms, kernel methods).
Divides data into bins and counts observations in each bin to estimate the
distribution.
Examples:
k-Nearest Neighbors (k-NN)
Decision Trees
Support Vector Machines (SVM)
Random Forest
Advantages:
1. No assumption about data shape.
2. Can learn complex functions.
3. More flexible and powerful for large data.
Limitations:
1. Need more training data.
2. Slower and computationally expensive.
3. Risk of overfitting.
Q.5 Differentiate supervised and unsupervised learning techniques.[5]
Ans:-
| Sr. No. | Supervised Learning | Unsupervised Learning |
|---|---|---|
| 1 | Output is known and given. | Output is unknown or not given. |
| 2 | Hard to learn very complex patterns. | Can discover complex patterns. |
| 3 | Uses labeled training data. | No labeled data is used. |
| 4 | Every input has a matching output. | Outputs (labels) are not shown to the system. |
| 5 | Goal is to predict output. | Goal is to find patterns or groups in data. |
| 6 | Needs a clear and defined target/output. | Target may be missing or unclear. |
| 7 | Example: OCR (Optical Character Recognition). | Example: Finding faces in images. |
| 8 | Model can be tested for accuracy. | Model cannot be directly tested. |
| 9 | Also called classification or regression. | Also called clustering. |
Q.6 Elaborate grouping and grading models and Differentiate Grouping and Grading
models of Machine Learning.[5][4]
Ans: 1. Grouping Model (Clustering / Segmentation)
Definition:
Grouping models aim to divide a dataset into clusters or groups such that items in the same
group are more similar to each other than to those in other groups.
Key Points:
It is a type of unsupervised learning.
The model does not use labeled output data.
It looks for natural structures or hidden patterns in the data.
Groups are formed based on distance, density, or similarity.
Techniques Used:
K-Means Clustering
Hierarchical Clustering
DBSCAN
Applications:
Customer segmentation in marketing
Grouping similar news articles
Market basket analysis
Medical diagnosis (grouping patients with similar symptoms)
Example:
Given customer data (age, income, purchase history), a grouping model can automatically
divide them into segments like budget shoppers, premium buyers, or occasional buyers.
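A minimal sketch of a grouping model, assuming scikit-learn is available; the customer features (age, income in thousands) are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# columns: age, annual income (in thousands)
customers = np.array([[22, 20], [25, 24], [47, 90], [52, 95], [33, 48], [36, 52]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)   # no labels given: unsupervised
print(segments)                            # cluster id for each customer
```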
2. Grading Model (Classification / Ranking)
Definition:
Grading models predict the category, rank, or score of an input based on past labeled
examples.
Key Points:
It is a type of supervised learning.
The model uses input-output pairs for training.
The goal is to assign a grade/class to new, unseen data.
It can also involve ordinal values (ordered categories like low, medium, high).
Techniques Used:
Decision Trees
Logistic Regression
Support Vector Machines
Naive Bayes
Neural Networks
Applications:
Credit score prediction
Exam paper grading (A, B, C...)
Spam detection
Disease classification
Product quality rating
Example:
A grading model trained on past student exam scores can predict the grade (A/B/C) for a
new student based on their test performance.
| Aspect | Grouping Model | Grading Model |
|---|---|---|
| Learning Type | Unsupervised Learning | Supervised Learning |
| Goal | Group similar data points | Assign a level, grade, or class to data |
| Output Labels | Not given during training | Given during training |
| Basis | Similarity or distance between data points | Predefined rules or labeled data |
| Example | Customer segmentation, face clustering | Exam scoring, credit risk prediction |
| Also Known As | Clustering | Classification or Ranking |
Q.7 Explain the relationship between Artificial Intelligence, Machine Learning
and data science. [4]
Ans:-
| Aspect | Machine Learning (ML) | Artificial Intelligence (AI) | Data Science |
|---|---|---|---|
| Focus | Learning from data to improve performance over time. | Making machines act like humans using intelligence. | Extracting useful insights from data. |
| Uses | Statistical models and data patterns. | Logic, decision trees, and intelligent behavior. | Structured data, analytics, and visualizations. |
| Definition | Software learns patterns in data to make decisions. | Simulates human thinking and decision-making. | Uses data analysis to find useful information. |
| Main Goal | Maximize accuracy. | Maximize success rate. | Gain insights for decision-making. |
| Learning Type | Supervised, unsupervised, reinforcement learning. | Includes planning, prediction, perception. | Uses ML, stats, big data tools. |
| Concerned With | Learning and improving knowledge. | Acting smart and making decisions. | Managing and analyzing data effectively. |
Q.8 Explain types of Machine Learning. [6]
Ans:- Types of Machine Learning (ML)
Machine Learning is categorized into 3 main types:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
1. Supervised Learning
Definition: The algorithm learns from a labeled dataset (both input and correct
output are given).
Goal: Predict output for new data based on what it learned.
Example: Predicting whether an image is of a dog or cat after training with labeled
examples.
Types:
Classification: Predicts categories (e.g., spam or not spam).
o Algorithms: Logistic Regression, Decision Tree, KNN, Naive Bayes, SVM
Regression: Predicts numerical values (e.g., house price).
o Algorithms: Linear Regression, Polynomial Regression, Random Forest, Ridge
Advantages:
High accuracy possible
Decision-making is explainable
Can reuse pre-trained models
Disadvantages:
Needs labeled data (costly and time-consuming)
May not work well on new or unexpected data
Applications:
Image & speech recognition
Fraud detection
Customer churn prediction
2. Unsupervised Learning
Definition: Algorithm works on unlabeled data and finds hidden patterns.
Goal: Explore and group data without knowing the outcome.
Example: Grouping customers with similar buying behavior (clustering).
Types:
Clustering: Groups similar data (e.g., K-Means, DBSCAN)
Association: Finds relationships (e.g., Apriori, FP-Growth)
Advantages:
No labeled data needed
Good for exploring unknown data
Helps in pattern discovery and data reduction
Disadvantages:
Hard to evaluate accuracy
Results may be difficult to interpret
Applications:
Customer segmentation
Market basket analysis
Anomaly detection
Image compression
Topic modeling (in NLP)
3. Reinforcement Learning
Definition: An agent learns by trial and error, taking actions in an environment and receiving rewards or penalties as feedback (explained in detail in Q.3 of this unit).
Example: A robot learning to walk by adjusting its steps based on rewards.
Q.9 Models of Machine learning: Geometric model, Probabilistic Models, Logical Models
Ans:-
Machine Learning models can be grouped based on how they represent and learn from
data. The main types are:
1. Geometric Models
Idea: These models represent data as points in a high-dimensional space and try to
draw boundaries or fit lines/curves between different categories or values.
Use: Mostly in classification and regression tasks.
Goal: Separate or relate data using geometrical shapes like lines, planes, or curves.
Examples:
Linear Regression – fits a straight line to data points.
Support Vector Machines (SVM) – finds the best boundary (hyperplane) between
classes.
K-Nearest Neighbors (KNN) – uses distance (Euclidean) to find nearby points and
classify.
2. Probabilistic Models
Idea: These models use probability and statistics to model the uncertainty in data.
Use: Handle noisy, uncertain, or incomplete data well.
Goal: Predict the probability of outcomes and make decisions based on likelihood.
Examples:
Naive Bayes Classifier – uses Bayes’ theorem to classify.
Hidden Markov Models (HMM) – used in speech and sequence modeling.
Gaussian Mixture Models (GMM) – models data as a mixture of normal distributions.
3. Logical Models
Idea: These models use rules, logic, and decision structures to learn patterns.
Use: Interpretable models for decision-making.
Goal: Learn if-else type rules or structured logical expressions.
Examples:
Decision Trees – splits data based on questions (rules).
Rule-based Learning – generates logical rules from data.
Inductive Logic Programming – uses logic programming to learn structured
knowledge.
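To make the three families concrete, a minimal sketch assuming scikit-learn; Iris is just a stand-in labeled dataset, and training accuracy is printed only to show that each model fits.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC                      # geometric: boundary in feature space
from sklearn.naive_bayes import GaussianNB       # probabilistic: Bayes' theorem
from sklearn.tree import DecisionTreeClassifier  # logical: learned if-else rules

X, y = load_iris(return_X_y=True)
for model in [SVC(), GaussianNB(), DecisionTreeClassifier()]:
    print(type(model).__name__, model.fit(X, y).score(X, y))
```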
UNIT II
Q.2 Elaborate random forest regression. [5][5]
Ans: Random Forest Regression
It is a supervised learning method used for classification and regression tasks.
Works by combining multiple decision trees (ensemble learning) to solve complex
problems.
Improves accuracy and reduces overfitting by using multiple classifiers.
How does the Random Forest Algorithm work?
Steps:
Step 1: Select Random Samples
Randomly select K subsets from the dataset.
Step 2: Build Decision Trees
Use each subset to build one decision tree.
Step 3: Choose Number of Trees
Decide how many trees (N) you want in the forest.
Step 4: Repeat
Repeat steps 1 and 2 to build N trees.
Step 5: Make Predictions
For new data, each tree predicts an output.
The final result is based on majority voting (classification) or average (regression).
Example:
Suppose you want to classify fruit photos.
Dataset is divided and given to each decision tree.
Each tree gives a prediction; the majority vote determines the final class.
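A minimal regression-flavored sketch, assuming scikit-learn; the sine-shaped data is invented. The forest averages the trees' outputs, matching Step 5 above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=100, random_state=0)  # N trees
forest.fit(X, y)                         # each tree sees a bootstrap sample
print(forest.predict([[5.0]]))           # final output = average of tree outputs
```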
Applications of Random Forest
1. Banking: Identify loan risks and defaults.
2. Medicine: Diagnose diseases and assess risk factors.
3. Land Use: Analyze patterns in land use.
4. Marketing: Predict market trends.
Advantages of Random Forest
Suitable for both classification and regression tasks.
Handles large datasets with high dimensionality well.
Increases accuracy and reduces overfitting.
Disadvantages of Random Forest
Can be complex and difficult to interpret.
Requires more computational power and resources.
Q.2 Differentiate multivariate regression and univariate regression. [4]
Ans:
| Sr. No. | Univariate | Multivariate |
|---|---|---|
| 1 | Univariate analysis refers to the analysis of one variable. | Multivariate analysis refers to the analysis of more than one variable. |
| 2 | It does not deal with causes and relationships. | It deals with causes and relationships. |
| 3 | It does not contain any dependent variable. | It contains more than one dependent variable. |
| 4 | Equation: Y = A + BX | Equation: Y = A + B₁X₁ + B₂X₂ + … |
Q.3 Define Regression. Explain types of regression. [6][6]
Ans: Regression:
Regression analysis is a statistical method used to find the relationship between a
dependent variable and one or more independent variables.
It helps to measure how variables are related and to predict future outcomes.
Types of Regression
1. Simple Linear Regression
Uses one independent variable to predict one dependent variable.
Assumes a straight-line relationship.
Example: Predicting house price based on size.
2. Multiple Linear Regression
Uses multiple independent variables to predict one dependent variable.
Example: Predicting house price based on size, location, number of rooms, etc.
3. Polynomial Regression
Used when the relationship is non-linear.
Adds polynomial terms (like x², x³) to model curved trends.
Example: Predicting population growth over time.
4. Support Vector Regression (SVR)
Based on Support Vector Machines (SVM).
Tries to find a line (hyperplane) that best predicts values with minimum error.
Works for both linear and non-linear relationships.
5. Decision Tree Regression
Uses a tree-like structure to make predictions.
Each decision splits the data based on a feature.
Example: Predicting customer behavior based on age, income, etc.
6. Random Forest Regression
Ensemble method using many decision trees.
Combines multiple tree predictions for better accuracy.
Example: Predicting sales or customer churn.
Advantages of Regression
Easy to understand and interpret.
Handles linear relationships easily.
Fast to train and easy to apply.
Disadvantages of Regression
Assumes linearity (in its basic form).
Sensitive to outliers.
May not be suitable for highly complex relationships.
Q.4 What is underfitting and overfitting in machine Learning explain the techniques to
reduce overfitting? [5]
Ans:
Underfitting
Underfitting happens when a machine learning model is too simple and cannot capture the
underlying patterns in the data.
As a result:
The model performs poorly on both training and testing data.
It has high bias and low variance.
Example: Using a straight line to fit data that clearly follows a curve.
Overfitting
Overfitting happens when a model learns not only the patterns but also the noise in the
training data.
As a result:
The model performs well on training data but poorly on unseen test data.
It has low bias but very high variance.
Example: A model that is too complex and tries to perfectly fit every point in the training
data.
Techniques to Reduce Overfitting
1. Limit Model Complexity
o Reduce the number of hidden nodes or layers in neural networks.
o Use simpler models to prevent capturing noise.
2. Early Stopping
o Stop training the model before it starts memorizing the training data.
3. Regularization
o Apply techniques like weight decay to limit large weights in the model.
o Use Ridge Regression or Lasso Regression in linear models (see the sketch after this list).
4. Use More Data
o A larger training set helps the model generalize better.
5. Feature Selection
o Remove irrelevant or highly correlated features to avoid confusing the model.
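A minimal sketch of technique 3 (regularization), assuming scikit-learn; the noisy quadratic data and the degree-12 polynomial are invented to deliberately provoke overfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = X.ravel() ** 2 + rng.normal(scale=2.0, size=40)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, reg in [("no penalty", LinearRegression()), ("ridge", Ridge(alpha=10.0))]:
    model = make_pipeline(PolynomialFeatures(degree=12), reg)
    model.fit(X_tr, y_tr)
    # the penalized model typically trades a little training score
    # for a noticeably better test score
    print(name, model.score(X_tr, y_tr), model.score(X_te, y_te))
```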
Common Reasons for Overfitting
Noisy data
Small training set
Large number of features
Q.5 Explain any two Evaluation Metrics for regression/ Explain three evaluation metrics
used for regression model.[5][6]
Ans:
In regression models, evaluation metrics help measure how well the model predicts
continuous values. They compare the actual target values with the predicted values from the
model.
Here are three important evaluation metrics:
1. Mean Squared Error (MSE)
Definition:
MSE is the average of the squares of the differences between actual and predicted
values.
Formula:
MSE = (1/n) × Σ (yᵢ − ŷᵢ)², where yᵢ is the actual value, ŷᵢ the predicted value, and n the number of samples.
Explanation:
o MSE shows how much error, on average, the model makes.
o Squaring the differences penalizes larger errors more heavily.
o A lower MSE indicates a more accurate model.
Example:
Used in predicting house prices, where large errors are costly and should be
penalized.
2. Mean Absolute Error (MAE)
Definition:
MAE is the average of the absolute differences between the actual and predicted
values.
Formula:
MAE = (1/n) × Σ |yᵢ − ŷᵢ|
Explanation:
o MAE provides a straightforward interpretation: average error in prediction.
o It treats all errors equally and is not sensitive to outliers.
o The smaller the MAE, the better the model’s performance.
Example:
Suitable when outliers are present, such as in forecasting sales data.
3. R-squared (R²) — Coefficient of Determination
Definition:
R² measures the proportion of the variance in the dependent variable that is
predictable from the independent variables.
Formula:
R² = 1 − [Σ (yᵢ − ŷᵢ)²] / [Σ (yᵢ − ȳ)²], where ȳ is the mean of the actual values.
Explanation:
o R² typically ranges between 0 and 1 (it can be negative for models that fit worse than simply predicting the mean).
o An R² value close to 1 indicates that the model explains most of the variance.
o An R² of 0 means the model does not explain any variance.
Example:
Used in evaluating models like stock price prediction to see how well the model
explains price movements.
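A minimal sketch computing all three metrics, assuming scikit-learn; the actual and predicted values are invented for illustration.

```python
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

print(mean_squared_error(y_true, y_pred))   # MSE: squaring penalizes big errors
print(mean_absolute_error(y_true, y_pred))  # MAE: average absolute error
print(r2_score(y_true, y_pred))             # R²: share of variance explained
```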
Q.6 Explain Elastic Net regression in Machine Learning. [5]
Ans: ElasticNet Regression
Definition:
o ElasticNet Regression combines Ridge (L2 penalty) and Lasso (L1 penalty)
techniques in linear regression.
o It balances feature selection and feature preservation by blending both
approaches.
Use Case:
o Especially useful when there are more features than observations.
o Helps when features are correlated:
Lasso may remove most correlated features.
Ridge may keep all features.
ElasticNet selects a subset of correlated features while maintaining
stability.
Advantages:
Reduces model complexity:
o Effectively eliminates irrelevant features, better than Ridge regression.
Better bias-variance trade-off:
o By tuning regularization parameters, achieves a better balance between bias
and variance than Lasso or Ridge.
Versatile:
o Applicable to various regression models: Linear, Logistic, Cox models, etc.
Disadvantages:
Higher computational cost:
o Requires more resources and time due to two regularization parameters and
cross-validation.
Interpretability issues:
o May become less interpretable when dealing with a large number of features
or large coefficients.
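A minimal ElasticNet sketch, assuming scikit-learn; the synthetic data (20 features, only 2 informative) is invented. The l1_ratio parameter blends the two penalties.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))              # 20 features, only a few matter
y = 3 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.5, size=50)

# l1_ratio blends the penalties: 0 = pure Ridge (L2), 1 = pure Lasso (L1)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)
print(enet.coef_)   # a subset of coefficients shrinks toward or to zero
```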
Q.7 Differentiate between Regression and Correlation. [4]
Ans:
| Aspect | Correlation | Regression |
|---|---|---|
| Definition | Measures the strength of the relationship between two or more variables. | Explains how one variable affects another. |
| Range | Values range between -1 and +1. | Coefficients can be positive or negative (slope & intercept). |
| Variables | No distinction; all variables are treated equally. | Distinction exists: independent and dependent variables. |
| Purpose | Shows the degree of association between variables. | Predicts the value of one variable based on another. |
| Symmetry | Symmetrical; correlation between A and B is the same as between B and A. | Not symmetrical; regression of Y on X ≠ regression of X on Y. |
| Coefficient Type | Provides a relative measure (correlation coefficient). | Provides an absolute measure (regression equation). |
Q.8 Explain Bias-Variance Trade-off with respect to Machine Learning. [5]
Ans:
Bias-Variance Tradeoff in Machine Learning
In machine learning, the bias-variance tradeoff explains how model complexity affects
prediction errors and generalization.
Bias
Error due to simplifying assumptions in the model.
High bias → model is too simple → underfitting.
Leads to poor performance on both training and test data.
Variance
Error from model’s sensitivity to small changes in training data.
High variance → model is too complex → overfitting.
Performs well on training data but poorly on new data.
Trade-off
Underfitting (High Bias, Low Variance):
o Model fails to capture data patterns.
o Caused by using an overly simple model or too little data.
o Fix: Use a more complex model or add data.
Overfitting (High Variance, Low Bias):
o Model captures both patterns and noise.
o Caused by overly complex models or too many features.
o Fix: Simplify model, reduce features, or apply regularization.
Examples
Underfitting: Linear model on curved data.
Overfitting: 10th-degree polynomial on noisy data.
Practical Techniques
Use cross-validation to evaluate model generalization.
Apply regularization (L1/L2) to control complexity.
Increase training data to reduce overfitting.
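A minimal sketch of the cross-validation technique above, assuming scikit-learn; polynomial degree stands in for model complexity on invented sine-shaped data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)

for degree in [1, 4, 15]:                      # underfit, balanced, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    score = cross_val_score(model, X, y, cv=5).mean()
    print(degree, round(score, 3))             # the middle degree typically scores best
```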
Q.9 Differentiate Ridge and Lasso Regression techniques. [4]
Ans:
| Characteristic | Ridge Regression | Lasso Regression |
|---|---|---|
| Regularization Type | Uses L2 regularization (penalty = square of coefficients). | Uses L1 regularization (penalty = absolute value of coefficients). |
| Use Case | Best when all features are important and you want to reduce overfitting. | Best when only some features are important and others can be ignored. |
| Model Simplicity | Includes all features with smaller weights. | Results in a simpler model with only key features. |
| Effect on Coefficients | Shrinks them close to zero, but not exactly zero. | Shrinks some to exactly zero, removing them from the model. |
| Computation Speed | Usually faster, as it doesn't remove variables. | Slightly slower due to the feature selection process. |
| Example | Predicting house prices using many factors (size, location, amenities). | Genetic analysis where only a few genes affect the outcome. |
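A minimal sketch of the "Effect on Coefficients" row, assuming scikit-learn; the data is synthetic, with only two informative features out of six.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.3, size=100)  # only 2 features matter

print(Ridge(alpha=1.0).fit(X, y).coef_)   # all six coefficients small but nonzero
print(Lasso(alpha=0.5).fit(X, y).coef_)   # irrelevant coefficients driven to exactly zero
```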
Q.10 Regression techniques:
Ans:
Regression Techniques
1. Polynomial Regression
Extends linear regression by fitting a curved line (polynomial equation) to the data.
Suitable when the relationship between variables is non-linear.
Example: Fitting a curve for house price vs size where the price increases more
sharply at higher sizes.
2. Decision Tree Regression
Uses a tree-like structure to model decisions and outcomes.
Splits the data based on feature values into branches until a prediction is made.
Easy to interpret and handles non-linear data well.
3. Random Forest Regression
An ensemble method that builds multiple decision trees and averages their results.
More accurate and stable than a single decision tree.
Reduces overfitting and improves prediction performance.
4. Support Vector Regression (SVR)
Based on Support Vector Machines, it tries to fit the best line within a margin of
tolerance (epsilon).
Works well for high-dimensional or non-linear data using kernel functions.
Good at handling outliers and complex patterns.
5. Ridge Regression
A type of linear regression that uses L2 regularization (squares of coefficients).
Helps reduce overfitting by shrinking large coefficients.
Keeps all features but with smaller weights.
6. Lasso Regression
Uses L1 regularization (absolute values of coefficients).
Can shrink some coefficients to zero, effectively performing feature selection.
Useful when we want a simpler model with fewer predictors.
7. ElasticNet Regression
Combines L1 (Lasso) and L2 (Ridge) regularization.
Balances between feature selection and coefficient shrinkage.
Best when features are highly correlated or when there are more features than
observations.
8. Bayesian Linear Regression
Applies Bayesian probability to linear regression.
Provides probabilistic predictions and measures of uncertainty.
Useful when we need confidence intervals or prior knowledge included in the model.