UNIT-III: STATISTICAL LEARNING
1. Explain Bayesian reasoning in Machine Learning with an example. (10 marks)
Bayesian reasoning is a probabilistic approach in machine learning based on Bayes’ Theorem, which
describes how to update the probability of a hypothesis as more evidence or information becomes available.
Bayes’ Theorem:
P(H | E) = [P(E | H) · P(H)] / P(E)
Where:
P(H | E): Posterior probability (probability of hypothesis H given evidence E)
P(E | H): Likelihood (probability of evidence given hypothesis)
P(H): Prior probability (initial belief before evidence)
P(E): Marginal probability of evidence
Bayesian Reasoning in ML:
Used in Bayesian classification, Naïve Bayes algorithms, Bayesian networks, and probabilistic
inference.
Example: Email Spam Classification
Let’s say:
H : Email is spam
E : Email contains the word “lottery”
We want to compute P(spam | lottery).
Assume:
P(spam) = 0.4
P(lottery | spam) = 0.8
P(lottery) = 0.5
Apply Bayes' Theorem:
P(spam | lottery) = (0.8 × 0.4) / 0.5 = 0.64
So, there’s a 64% chance that the email is spam if it contains “lottery”.
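A minimal sketch of this calculation in Python (the probability values are the assumed figures from the example above):

```python
def posterior(prior_h, likelihood_e_given_h, marginal_e):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood_e_given_h * prior_h / marginal_e

# Spam example: H = "email is spam", E = "email contains 'lottery'"
p_spam_given_lottery = posterior(prior_h=0.4,               # P(spam)
                                 likelihood_e_given_h=0.8,  # P(lottery | spam)
                                 marginal_e=0.5)            # P(lottery)
print(p_spam_given_lottery)  # 0.64
```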
Applications:
Naïve Bayes Classifier (text classification, sentiment analysis)
Bayesian Optimization (hyperparameter tuning)
Bayesian Networks (modeling uncertainty in medical diagnosis)
Advantages:
Handles uncertainty and prior knowledge.
Works well with small datasets.
Computationally efficient.
Limitations:
Requires strong assumptions (e.g., feature independence in Naïve Bayes).
Computing priors can be difficult in complex domains.
Conclusion:
Bayesian reasoning provides a powerful mathematical framework for dealing with uncertainty, making it
widely applicable in many ML tasks involving probability-based decision-making.
2. Describe the K-Nearest Neighbor (KNN) classifier. What are its advantages and
limitations? (10 marks)
K-Nearest Neighbor (KNN) is a non-parametric, instance-based supervised learning algorithm used for
classification and regression.
How KNN Works:
1. Choose K (number of neighbors).
2. Calculate distance (usually Euclidean) between the test point and all training points.
3. Select the K closest neighbors.
4. Vote for classification (majority class) or average for regression.
Formula (Euclidean Distance):
d(x, y) = √( Σᵢ₌₁ⁿ (xᵢ − yᵢ)² )
Where x and y are data points in n-dimensional space.
Example:
Suppose we want to classify a fruit as apple or orange based on features like weight and color. If K = 3 and
among the 3 closest fruits, 2 are apples and 1 is orange, the new fruit is classified as apple.
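A minimal KNN sketch in Python for this fruit example, with hypothetical (weight, colour-score) features and K = 3:

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify x_test by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x_test, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(distances)[:k]                   # indices of the k closest neighbours
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                      # majority class among the neighbours

# Hypothetical fruit data: [weight in grams, colour score]
X_train = np.array([[150, 0.80], [160, 0.70], [120, 0.30], [155, 0.75], [115, 0.35]])
y_train = np.array(["apple", "apple", "orange", "apple", "orange"])
print(knn_predict(X_train, y_train, np.array([152, 0.72])))  # -> "apple"
```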
Advantages:
Simple and easy to implement
No explicit training phase (lazy learning); new training data can be added at any time
Naturally handles multi-class classification
Adaptable to non-linear decision boundaries
Limitations:
Slow for large datasets (needs to compute distance to all points)
Sensitive to irrelevant or unscaled features
Memory-intensive — stores entire dataset
Choice of K and distance metric affects performance
Applications:
Handwritten digit recognition (e.g., MNIST)
Recommendation systems
Medical diagnosis (predicting disease type)
Conclusion:
KNN is a powerful yet simple algorithm for classification and regression, best used when the dataset is small
and well-preprocessed. Its performance depends heavily on feature scaling and choice of K.
3. Derive the Least Square Error Criterion for Linear Regression. (10 marks)
Linear Regression is a supervised learning algorithm used to model the relationship between an input
variable X and an output variable Y by fitting a linear equation:
ŷ = w₀ + w₁x
Where:
ŷ: predicted output
w₀: intercept (bias)
w₁: slope (weight)
Objective:
To find the best-fitting line by minimizing the error between predicted and actual values using the Least
Squares Error (LSE) criterion.
1. Define Error (Residual):
For each data point i:
eᵢ = yᵢ − ŷᵢ = yᵢ − (w₀ + w₁xᵢ)
2. Define the Cost Function (Sum of Squared Errors):
J(w₀, w₁) = Σᵢ₌₁ⁿ (yᵢ − (w₀ + w₁xᵢ))²
This function measures how well the model fits the data. The goal is to minimize J(w₀, w₁).
3. Derive with Respect to Parameters:
To find the optimal values of w₀ and w₁, we use calculus to minimize the cost function by setting its partial derivatives to zero.
Partial Derivatives:
∂J/∂w₀ = −2 Σᵢ₌₁ⁿ (yᵢ − w₀ − w₁xᵢ)
∂J/∂w₁ = −2 Σᵢ₌₁ⁿ xᵢ(yᵢ − w₀ − w₁xᵢ)
4. Solve the Normal Equations:
Using the above derivatives, solve the following system of equations:
Σyᵢ = n·w₀ + w₁·Σxᵢ          (1)
Σxᵢyᵢ = w₀·Σxᵢ + w₁·Σxᵢ²     (2)
Solving these gives the optimal parameters:
w₁ = ( n·Σxᵢyᵢ − Σxᵢ·Σyᵢ ) / ( n·Σxᵢ² − (Σxᵢ)² )
w₀ = ȳ − w₁·x̄
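A minimal sketch of these closed-form estimates in Python, on a small hypothetical dataset generated roughly as y = 1 + 2x:

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form simple linear regression; returns (w0, w1)."""
    n = len(x)
    w1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
    w0 = np.mean(y) - w1 * np.mean(x)
    return w0, w1

# Hypothetical data roughly following y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])
w0, w1 = least_squares_fit(x, y)
print(w0, w1)  # intercept ≈ 1, slope ≈ 2
```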
Conclusion:
Least Squares Error criterion helps determine the best-fitting line by minimizing the total squared
differences between actual and predicted outputs.
4. Explain Logistic Regression for classification tasks. How does it differ from Linear
Regression? (10 marks)
Logistic Regression is a supervised learning algorithm used for binary (or multi-class) classification, not
regression, despite its name.
Purpose:
To predict the probability that a given input belongs to a certain class (e.g., yes/no, spam/ham).
1. Problem with Linear Regression in Classification:
Linear regression can output any real number, but classification problems need probabilities in the range
[0, 1].
2. Logistic Function (Sigmoid):
To map output to a probability, logistic regression uses the sigmoid function:
σ(z) = 1 / (1 + e^(−z)), where z = w₀ + w₁x
⇒ P(y = 1 | x) = 1 / (1 + e^(−(w₀ + w₁x)))
3. Classification Rule:
ŷ = 1 if P(y = 1 | x) > 0.5, otherwise ŷ = 0
4. Cost Function (Cross-Entropy Loss):
Since squared error is not suitable for classification, logistic regression uses log-loss:
J(w) = −(1/n) Σᵢ₌₁ⁿ [ yᵢ·log(ŷᵢ) + (1 − yᵢ)·log(1 − ŷᵢ) ]
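A minimal sketch of the sigmoid, the 0.5-threshold rule, and the cross-entropy loss in Python; the weights and data points below are hypothetical illustrations, not a fitted model:

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w0, w1):
    """P(y = 1 | x) for one-feature logistic regression."""
    return sigmoid(w0 + w1 * x)

def cross_entropy(y_true, y_prob):
    """Average log-loss over a set of predictions."""
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Hypothetical weights and labelled points
w0, w1 = -4.0, 1.5
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
p = predict_proba(x, w0, w1)
y_hat = (p > 0.5).astype(int)  # classification rule with threshold 0.5
print(y_hat, cross_entropy(y, p))
```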
Differences Between Logistic and Linear Regression:
| Feature           | Linear Regression   | Logistic Regression               |
|-------------------|---------------------|-----------------------------------|
| Output            | Continuous values   | Probabilities in [0, 1]           |
| Use Case          | Regression problems | Classification problems           |
| Activation        | None                | Sigmoid function                  |
| Cost Function     | Mean squared error  | Cross-entropy loss                |
| Decision Boundary | Not defined         | Defined via threshold (e.g., 0.5) |
Applications:
Spam detection
Credit risk prediction
Medical diagnosis (e.g., predicting disease presence)
Conclusion:
Logistic regression is a robust, interpretable method for classification. It differs from linear regression by
outputting probabilities and using a sigmoid activation and cross-entropy loss.
5. Discuss Fisher’s Linear Discriminant and its role in classification. (10 marks)
What is Fisher’s Linear Discriminant?
Fisher’s Linear Discriminant (FLD) is a supervised dimensionality reduction technique used in
classification problems. It projects high-dimensional data onto a line in such a way that class separability is
maximized.
Unlike Principal Component Analysis (PCA), which finds directions of maximum variance, FLD finds the direction that maximizes class separation.
Objective:
To find a projection vector w such that:
y = wᵀx
where y is the 1D projection and x is a data point.
The goal is to maximize the distance between class means while minimizing the variance within each
class.
Fisher’s Criterion:
J(w) = (μ₁ − μ₂)² / (s₁² + s₂²)
Where:
μ₁, μ₂ are the projected class means,
s₁², s₂² are the variances of each class after projection.
This can be expressed in matrix form as:
J(w) = (wᵀS_B w) / (wᵀS_W w)
Where:
S_B: Between-class scatter matrix
S_W: Within-class scatter matrix
Solution:
The optimal projection vector w* is obtained by:
w* = S_W⁻¹ (μ₁ − μ₂)
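A minimal sketch of this computation in Python, using small hypothetical 2-D samples for the two classes:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Compute the FLD projection vector w* = S_W^{-1} (mu1 - mu2)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the two classes' scatter matrices
    S1 = (X1 - mu1).T @ (X1 - mu1)
    S2 = (X2 - mu2).T @ (X2 - mu2)
    S_W = S1 + S2
    return np.linalg.solve(S_W, mu1 - mu2)  # same as S_W^{-1} (mu1 - mu2)

# Hypothetical 2-D features (e.g., petal length, petal width) for two classes
X1 = np.array([[1.4, 0.20], [1.3, 0.20], [1.5, 0.30], [1.4, 0.25]])
X2 = np.array([[4.5, 1.50], [4.7, 1.40], [4.2, 1.30], [4.6, 1.45]])
w = fisher_direction(X1, X2)
y1, y2 = X1 @ w, X2 @ w  # 1-D projections y = w^T x; the classes separate along this line
print(w)
```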
Role in Classification:
Projects data to 1D (or low-dimensional) space for better class separation.
Especially useful in binary classification.
Used as a preprocessing step before applying a classifier like logistic regression, SVM, or KNN.
Example:
Classifying two types of flowers (e.g., Setosa vs. Versicolor) based on features like petal width and length.
FLD finds the line along which the classes are most separated.
Advantages:
Improves class separability.
Works well with small datasets.
Easy to implement and interpret.
Limitations:
Assumes a linear projection is sufficient to separate the classes.
Works best with normally distributed classes with equal covariance.
Conclusion:
Fisher’s Linear Discriminant helps reduce dimensionality while preserving class discrimination, making it
an effective tool for classification and preprocessing in ML pipelines.
6. What is the Minimum Description Length (MDL) principle? How is it used in model
selection? (10 marks)
What is MDL Principle?
The Minimum Description Length (MDL) principle is a formalization of Occam’s Razor in information
theory. It states:
"The best model is the one that compresses the data most effectively."
In other words, the optimal model is the one that minimizes the total description length of:
1. The model itself, and
2. The data given the model (i.e., the errors or residuals).
Mathematically:
L(D, M) = L(M) + L(D | M)
Where:
L(M): Length (complexity) of the model,
L(D | M): Length of the data when encoded with the model (i.e., how well the model explains the data),
L(D, M): Total length of the description.
Interpretation:
Simple models have short L(M) but may fit poorly (long L(D | M)).
Complex models fit well (short L(D | M)) but are hard to describe (long L(M)).
MDL aims to balance model simplicity and data fit.
Use in Model Selection:
1. Compare multiple models (e.g., polynomial regression of degree 1, 2, 3...).
2. Calculate L(M) + L(D | M) for each.
3. Select the model with the smallest total length.
Example:
Choosing between:
A linear regression model with fewer coefficients,
A higher-degree polynomial that fits training data better.
MDL may favor the simpler model if the added complexity of the polynomial does not justify the
improvement in fit (to avoid overfitting).
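A minimal sketch of such a comparison in Python, using a crude two-part score (a hypothetical 32 bits per parameter for L(M) plus a Gaussian-style code length for the residuals as L(D | M)); real MDL codings are more careful, so this only illustrates the trade-off:

```python
import numpy as np

def description_length(x, y, degree, bits_per_param=32):
    """Crude two-part MDL score L(M) + L(D | M), in bits (illustrative only)."""
    coeffs = np.polyfit(x, y, degree)             # candidate model M: polynomial of given degree
    rss = np.sum((y - np.polyval(coeffs, x))**2) + 1e-12
    n = len(x)
    L_M = bits_per_param * (degree + 1)           # cost of encoding the model parameters
    L_D_given_M = 0.5 * n * np.log2(rss / n)      # cost of encoding residuals (up to constants)
    return L_M + L_D_given_M

# Hypothetical noisy linear data: MDL should favour the low-degree model
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
scores = {d: description_length(x, y, d) for d in (1, 2, 3, 5)}
print(min(scores, key=scores.get), scores)        # smallest total length wins
```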
Applications:
Model selection in regression and classification.
Decision tree pruning.
Feature selection.
Comparing probabilistic models.
Advantages:
Theoretically sound and general.
Prevents overfitting by penalizing model complexity.
Limitations:
Requires a way to quantify model and data length (encoding scheme).
May be computationally intensive for large models.
Conclusion:
The MDL principle provides a principled way to select models that generalize well by trading off accuracy
and complexity, aligning closely with the goals of machine learning.