Machine Learning and Deep Learning
with R
Instructor: Babu Adhimoolam
Learning objectives: Subset Selection Methods

Why additional linear methods?

To improve prediction accuracy on the test dataset: when the number of observations (n) is not much larger than the number of predictors (p), the least squares estimates have high variance.

By constraining or shrinking the coefficients associated with the p predictors, we can substantially reduce the variance associated with the test error.
To improve model interpretability: including variables that are not associated with the response adds unnecessary complexity to a model.

By setting the coefficients of variables that do not contribute to the response to zero, we obtain more interpretable models.
Extensions of linear methods

Subset Selection Methods
• Best Subset Selection
• Forward Stepwise Selection
• Backward Stepwise Selection

Shrinkage Methods
• Ridge Regression
• Lasso Regression

Dimensionality Reduction Methods
• Principal Components Regression
• Partial Least Squares
The Best Subset Selection Method
• We fit a least squares regression for each possible combination of the p predictors.
• The total number of possible models with p predictors is 2^p.
• We first start with the null model (M0) containing no predictors, and then compute Mk for each value of k:
for k = 1, 2, …, p:
- fit all models that contain exactly k predictors.
- choose the best of these models (lowest RSS or highest R²) and call it Mk.
• We finally choose the single best model from the list of available models M0, …, Mp.
Application of Best Subset Selection to the Credit data set
Models M1 to Mp
Response: Balance
Predictors: Income, Limit, Rating, Cards, Age, Education, Own, Student, Married, and Region
(Figure: RSS and R² for all subset models; the red line traces the best model within each subset size.)
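A minimal sketch of how this could be run in R, assuming the Credit data from the ISLR2 package and the regsubsets() function from the leaps package (both package choices are assumptions, not stated in the slides):

```r
# Best subset selection on the Credit data (sketch).
library(ISLR2)   # provides the Credit data set (assumed source)
library(leaps)   # provides regsubsets()

# Fit all subsets for the response Balance.
# Region contributes two dummy variables, so there are 11 predictor columns.
best_fit <- regsubsets(Balance ~ ., data = Credit, nvmax = 11)
best_summary <- summary(best_fit)

# RSS and R^2 of the best model of each size (the "red line" in the figure).
best_summary$rss
best_summary$rsq

# Predictors included in the best four-variable model.
coef(best_fit, 4)
```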
Limitations of best subset selection

• Suffers heavily from computational limitations as p becomes larger. Recall that the total number of possible models with p predictors is 2^p.
• If p = 10, 2^10 ≈ 1,000 models to evaluate.
• If p = 20, 2^20 ≈ 1,000,000 models to evaluate.
• p > 40 is computationally infeasible!
• In addition, the large model space invites overfitting: models that fit the training data well may not generalize to high accuracy on test data.
Forward Stepwise Selection
• Forward stepwise selection is a computationally feasible and efficient alternative to best subset selection, as it considers far fewer models than the 2^p models of best subset selection.
• We begin with a model containing no predictors (M0), and then add predictors to the model, one at a time, until all predictors are included.
for k = 0, 1, …, p-1:
- consider all (p − k) models that augment the predictors in Mk with one additional predictor.
- choose the best among these (p − k) models (lowest RSS or highest R²) and call it Mk+1.
• We finally choose the single best model among the list of available models M0, …, Mp.
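A minimal sketch of forward stepwise selection in R, again assuming the leaps package and the Credit data from ISLR2:

```r
library(ISLR2)
library(leaps)

# Forward stepwise selection: method = "forward" adds one predictor at a time.
fwd_fit <- regsubsets(Balance ~ ., data = Credit, nvmax = 11, method = "forward")
fwd_summary <- summary(fwd_fit)

# Best k-variable model found at each step.
fwd_summary$outmat

# Compare the four-variable models chosen by best subset (best_fit, from the
# earlier sketch) and by forward stepwise selection.
coef(best_fit, 4)
coef(fwd_fit, 4)
```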
Computational feasibility of forward stepwise selection

Unlike best subset selection, which involves selecting among 2^p models (with p predictors), forward stepwise selection fits only

1 + p(p + 1)/2 models (the null model plus (p − k) candidate models at each of the p steps).

So, if p = 20, best subset selection must fit approximately 1,048,576 models. In contrast, forward stepwise selection fits only 211 models.
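A quick check of these counts in R:

```r
p <- 20
2^p                  # models fit by best subset selection: 1,048,576
1 + p * (p + 1) / 2  # models fit by forward stepwise selection: 211
```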
Forward stepwise selection does not always find the best model

Note that the best four-variable models differ between best subset selection and forward stepwise selection.
Backward Stepwise Selection
• Unlike best subset selection and forward stepwise selection, here we start with the full model (Mp) containing all the predictors.
• We then iteratively remove the least useful predictor, one at a time.
for k = p, p-1, …, 1:
- consider all k models that contain all but one of the predictors in Mk, for a total of k − 1 predictors.
- choose the best among these k models (lowest RSS or highest R²) and call it Mk-1.
• We then choose the single best model out of M0, …, Mp.
• Backward stepwise selection is computationally similar to forward stepwise selection.
• It requires n > p (so that the full model can be fit by least squares), unlike forward stepwise selection.
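The same regsubsets() sketch works for backward stepwise selection (again assuming the leaps package and the Credit data):

```r
library(ISLR2)
library(leaps)

# Backward stepwise selection: start from the full model and drop one predictor at a time.
# Note: this requires n > p so the full least squares model can be fit.
bwd_fit <- regsubsets(Balance ~ ., data = Credit, nvmax = 11, method = "backward")
summary(bwd_fit)$outmat
```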
Choosing the optimal model among M0, …, Mp

R² and RSS are not good metrics for selecting the best model, because the model containing all the predictors will always have the highest R² and the lowest RSS.

Indirect methods of test error estimation adjust the training error rate to account for the bias due to overfitting.

Direct methods of test error estimation use validation or cross-validation.
Indirect methods for adjusting training error rates
• Cp
• Akaike Information Criterion (AIC)
• Bayesian Information Criterion (BIC)
• Adjusted R²
Cp

For a fitted least squares model with d predictors, the Cp estimate of the test MSE is:

Cp = (1/n)(RSS + 2 d σ̂²), where σ̂² is an estimate of the variance of the error term.

Cp adds a penalty that is proportional to the number of predictors in the model: a model with more predictors incurs a larger penalty.

Cp tends to take a small value for models with low test error, so the model with the lowest Cp is the best candidate.
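A small sketch of computing Cp by hand for a least squares fit in R (the helper name cp_stat and the use of the full model's residual variance as σ̂² are illustrative assumptions):

```r
# Cp = (1/n) * (RSS + 2 * d * sigma2_hat), where sigma2_hat estimates Var(error).
library(ISLR2)

full_fit   <- lm(Balance ~ ., data = Credit)
sigma2_hat <- summary(full_fit)$sigma^2   # error variance estimated from the full model

cp_stat <- function(fit, sigma2_hat) {
  n   <- length(residuals(fit))
  d   <- length(coef(fit)) - 1            # number of predictors (excluding intercept)
  rss <- sum(residuals(fit)^2)
  (rss + 2 * d * sigma2_hat) / n
}

small_fit <- lm(Balance ~ Income + Limit + Cards + Student, data = Credit)
cp_stat(small_fit, sigma2_hat)
```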
Akaike Information Criterion (AIC)

AIC is defined for a large class of models fit by the maximum likelihood approach.

In the case of least squares models it is given (up to irrelevant constants) by:

AIC = (1/(n σ̂²))(RSS + 2 d σ̂²)

AIC and Cp measure the same thing for least squares models and are proportional to each other.

AIC takes smaller values for better models.
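In R, AIC is available directly for models fit by maximum likelihood via the base AIC() function (base R's definition keeps the constants that the proportionality statement above ignores, so the values differ but the ranking of models does not):

```r
# AIC for two candidate least squares models; smaller is better.
library(ISLR2)

fit_small <- lm(Balance ~ Income + Limit + Cards + Student, data = Credit)
fit_full  <- lm(Balance ~ ., data = Credit)

AIC(fit_small)
AIC(fit_full)
```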
Bayesian Information Criterion (BIC)

Like Cp and AIC, the BIC takes a small value for models with low test error and, for a least squares model with d predictors, is given by:

BIC = (1/n)(RSS + log(n) d σ̂²)

Because log(n) > 2 whenever n > 7, the BIC places a heavier penalty than Cp or AIC on models with many variables.
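Base R likewise provides BIC() for fitted models, which applies the log(n) penalty (reusing the fits from the AIC sketch above):

```r
# BIC penalizes model size by log(n); smaller is better.
BIC(fit_small)
BIC(fit_full)
```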
Adjusted R²

For a least squares model with d predictors:

Adjusted R² = 1 − [RSS/(n − d − 1)] / [TSS/(n − 1)]

Maximizing the adjusted R² is equivalent to minimizing RSS/(n − d − 1).

Adding noise variables produces only a very small decrease in RSS while increasing d, so RSS/(n − d − 1) increases and the adjusted R² decreases.

In comparison with the regular R², the adjusted R² therefore accounts for nuisance variables (the model with the largest adjusted R² is preferred).
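A sketch comparing the manual computation with the value reported by lm() (variable names are illustrative):

```r
# Adjusted R^2 = 1 - (RSS / (n - d - 1)) / (TSS / (n - 1))
library(ISLR2)

fit <- lm(Balance ~ Income + Limit + Cards + Student, data = Credit)

n   <- nrow(Credit)
d   <- length(coef(fit)) - 1
rss <- sum(residuals(fit)^2)
tss <- sum((Credit$Balance - mean(Credit$Balance))^2)

1 - (rss / (n - d - 1)) / (tss / (n - 1))   # manual adjusted R^2
summary(fit)$adj.r.squared                  # matches lm's reported value
```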
Optimal model selection in the Credit data set

(Figure: indirect criteria computed for the best model of each size, M1, …, Mp, on the Credit data.)

Low values of Cp, AIC, and BIC, and high values of adjusted R², reveal models with low test error rates.
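With regsubsets() from the earlier sketches, the summary object already carries Cp, BIC, and adjusted R² for the best model of each size, so the selection can look like this (reusing best_fit from the best subset example):

```r
best_summary <- summary(best_fit)

# Model sizes favored by each indirect criterion.
which.min(best_summary$cp)     # lowest Cp
which.min(best_summary$bic)    # lowest BIC
which.max(best_summary$adjr2)  # highest adjusted R^2

# Coefficients of, e.g., the model chosen by BIC.
coef(best_fit, which.min(best_summary$bic))
```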
Choosing the optimal model with validation and cross-validation

Validation and cross-validation methods estimate the test error directly and are generally preferred for model selection over the indirect methods above.
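A minimal sketch of the validation-set approach with regsubsets() (the predict_regsubsets helper is not part of the leaps package; it is written here purely for illustration):

```r
library(ISLR2)
library(leaps)

set.seed(1)
train <- sample(c(TRUE, FALSE), nrow(Credit), replace = TRUE)

# Fit best subset selection on the training half only.
fit_train <- regsubsets(Balance ~ ., data = Credit[train, ], nvmax = 11)

# Helper: predictions from the model of a given size (illustrative, not from leaps).
predict_regsubsets <- function(object, newdata, id, formula) {
  mat  <- model.matrix(formula, newdata)
  cofs <- coef(object, id = id)
  mat[, names(cofs)] %*% cofs
}

# Validation-set MSE for each model size.
val_errors <- sapply(1:11, function(k) {
  pred <- predict_regsubsets(fit_train, Credit[!train, ], k, Balance ~ .)
  mean((Credit$Balance[!train] - pred)^2)
})

which.min(val_errors)   # model size with the lowest validation error
```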