
6.8 CONCEPTUAL EXERCISES
1. We perform best subset, forward stepwise, and backward stepwise selection on a single data set. For
each approach, we obtain p + 1 models, containing 0, 1, 2,...,p predictors. Explain your answers:

(a) Which of the three models with k predictors has the smallest training RSS?

If all three models have k predictors, the model selected by best subset selection will have the smallest training RSS. Best subset selection searches over all possible combinations of k predictors and then chooses the one with the lowest training RSS. In forward stepwise selection, the model with k features must contain the k-1 features of the previous model, so it does not search over all combinations; similarly, backward stepwise selection only considers a restricted subset of the combinations.
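To get a sense of how much larger the best subset search is, here is a small counting sketch (plain Python; the function names are just for illustration): best subset fits all 2^p possible models, while forward or backward stepwise fits only about 1 + p(p+1)/2.

```python
from math import comb

def best_subset_count(p):
    # Best subset fits C(p, k) models for each size k = 0, 1, ..., p,
    # i.e. 2^p models in total.
    return sum(comb(p, k) for k in range(p + 1))

def stepwise_count(p):
    # Forward (or backward) stepwise fits the null (or full) model once,
    # then p - k candidate models at step k, for 1 + p(p + 1)/2 models in total.
    return 1 + p * (p + 1) // 2

for p in (5, 10, 20):
    print(p, best_subset_count(p), stepwise_count(p))
# p = 20: best subset fits 1,048,576 models; stepwise fits only 211.
```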

(b) Which of the three models with k predictors has the smallest test RSS?

We can be certain about which approach yields the model with the smallest training RSS, but this does not carry over to the test data. Best subset selection gives the model with the lowest training error, but it can overfit the training data and may not perform as well on the test data, so any of the three models could end up with the smallest test RSS.

(c) True or False:

i. The predictors in the k-variable model identified by forward stepwise are a subset of the predictors in
the (k+1)-variable model identified by forward stepwise selection.

TRUE. Forward stepwise builds the (k+1)-variable model by adding one predictor to the k-variable model, so the containment holds by construction.

ii. The predictors in the k-variable model identified by backward stepwise are a subset of the predictors
in the (k + 1)- variable model identified by backward stepwise selection.

TRUE. Backward stepwise obtains the k-variable model by removing one predictor from the (k+1)-variable model, so the smaller model's predictors are always contained in the larger one.

iii. The predictors in the k-variable model identified by backward stepwise are a subset of the predictors
in the (k + 1)- variable model identified by forward stepwise selection.

FALSE. The models chosen by forward stepwise selection and backward stepwise selection are built independently of each other, so there is no such containment guarantee.

iv. The predictors in the k-variable model identified by forward stepwise are a subset of the predictors
in the (k+1)-variable model identified by backward stepwise selection.

FALSE. The same reasoning as in iii applies: the two procedures build their models independently.
v. The predictors in the k-variable model identified by best subset are a subset of the predictors in the (k
+ 1)-variable model identified by best subset selection.
FALSE. Best subset selection searches over all possible combinations, so there is no certainty that the predictors in the k-variable model will be a subset of the predictors in the (k+1)-variable model.
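The nesting claims in i. and ii. can also be checked numerically. Below is a minimal sketch of greedy forward stepwise selection (NumPy least squares on toy data; the helper names are made up for illustration), showing that each selected predictor set extends the previous one by exactly one variable.

```python
import numpy as np

def rss(X, y, cols):
    # Least-squares fit on the chosen columns (intercept included), returning training RSS.
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.sum((y - Z @ beta) ** 2)

def forward_stepwise(X, y):
    p = X.shape[1]
    selected = []
    path = [tuple(selected)]
    for _ in range(p):
        # Add the single predictor that reduces training RSS the most.
        best = min((j for j in range(p) if j not in selected),
                   key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        path.append(tuple(selected))
    return path

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=100)
path = forward_stepwise(X, y)
# Each model's predictors are a subset of the next model's predictors by construction.
print(path)
print(all(set(a) <= set(b) for a, b in zip(path, path[1:])))
```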

2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer.

(a) The lasso, relative to least squares, is:

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is
less than its decrease in variance.
Since the lasso shrinks coefficients and forces some of them to be exactly zero, it is less flexible than least squares. The lasso decreases the variance at the cost of a slight increase in bias.

(b) Repeat (a) for ridge regression relative to least squares

iii. Less flexible and hence will give improved prediction accuracy when its increase in bias is
less than its decrease in variance.
The reason is the same: ridge regression also decreases the variance at the cost of a slight increase in bias.

(c) Repeat (a) for non-linear methods relative to least squares.

ii. More flexible and hence will give improved prediction accuracy when its increase in
variance is less than its decrease in bias.
Non-linear methods are more flexible than least squares, which is a linear method. As flexibility increases, variance increases and bias decreases, so this option is the correct one.
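A quick scikit-learn sketch on toy data (the regularization strengths are arbitrary, chosen only for illustration) makes the flexibility point above concrete: the lasso sets some coefficients exactly to zero, while ridge only shrinks them toward zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 10))
# Only the first three predictors actually matter.
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + rng.normal(size=80)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("OLS   non-zero coefs:", np.sum(ols.coef_ != 0))    # typically all 10
print("Ridge non-zero coefs:", np.sum(ridge.coef_ != 0))  # still 10, but shrunken
print("Lasso non-zero coefs:", np.sum(lasso.coef_ != 0))  # usually only a few
```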

3. Suppose we estimate the regression coefficients in a linear regression model by minimizing


$$\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^2 \quad \text{subject to} \quad \sum_{j=1}^{p}|\beta_j| \le s \qquad \text{(eq. 6.8 in the text)}$$
for a particular value of s. For parts (a) through (e), indicate which of i. through v. is correct. Justify your
answer.
We can think of s as inversely proportional to λ; Figure 6.9 in the text may help with this answer. As s increases, follow the curves in that figure from right to left.
(a) As we increase s from 0, the training RSS will:

iv. Steadily decrease


As we increase s, the effect of the regularization decreases; we are loosening the constraint that we imposed on the coefficients. The model behaves more and more like least squares, and the training RSS keeps decreasing as we increase s.

(b) Repeat (a) for test RSS.

ii. Decrease initially, and then eventually start increasing in a U shape.


As we can see from the graph (Fig. 6.9 in the chapter), going from right to left the test error first decreases, reaches a minimum, and then increases.

(c) Repeat (a) for variance.

iii. Steadily increase.


As we increase s, the model fits the training data more and more closely; as the training error decreases, the variance increases. We can also confirm this from the graph: going from right to left, the variance increases.

(d) Repeat (a) for (squared) bias.

iv. Steadily decrease.


As s increases, the bias decreases steadily. This can be confirmed from the graph: going from right to left, the black curve showing the squared bias decreases steadily.

(e) Repeat (a) for the irreducible error

v. Remain constant.
The irreducible error, as the name suggests, cannot be reduced; it remains constant and is independent of the method used for fitting.
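The answers to (a) and (b) can be reproduced with a small scikit-learn sweep on toy data (arbitrary grid of λ values; the variable names are just for illustration). Since s is inversely related to λ, decreasing λ plays the role of increasing s: the training RSS falls monotonically while the test RSS traces out a U shape.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 20))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Decreasing lambda (alpha) corresponds to increasing the budget s.
for alpha in [2.0, 1.0, 0.5, 0.1, 0.01, 0.001]:
    model = Lasso(alpha=alpha, max_iter=10000).fit(X_tr, y_tr)
    rss_train = np.sum((y_tr - model.predict(X_tr)) ** 2)
    rss_test = np.sum((y_te - model.predict(X_te)) ** 2)
    print(f"alpha={alpha:6.3f}  train RSS={rss_train:8.1f}  test RSS={rss_test:8.1f}")
# Training RSS keeps falling; test RSS typically falls and then rises again.
```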

4. Suppose we estimate the regression coefficients in a linear regression model by minimizing


$$\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^2 + \lambda\sum_{j=1}^{p}|\beta_j| \qquad \text{(eq. 6.7 in the text)}$$
for a particular value of λ. For parts (a) through (e), indicate which of i. through v. is correct. Justify your
answer.

Be careful: this time it is λ, while in the previous question it was s. They are inversely related to each other.

(a) As we increase λ from 0, the training RSS will:

iii. Steadily increase.


As we increase λ, the regularization effect in the model increases. The flexibility of the model decreases and the training RSS increases.
(b) Repeat (a) for test RSS.

ii. Decrease initially, and then eventually start increasing in a U shape.


We can see this from the graph (Fig. 6.9): as λ increases and the flexibility decreases, the test error (purple curve) first decreases, reaches a minimum, and then increases.

(c) Repeat (a) for variance.

iv. Steadily decrease.


Increasing the value of λ makes the model less and less flexible. As flexibility decreases, variance decreases.

(d) Repeat (a) for (squared) bias.

iii. Steadily increases


The (squared) bias increases with λ. From the graph we can see that increasing λ results in decreasing variance and increasing bias.

(e) Repeat (a) for the irreducible error.

v. Remains constant
The irreducible error is independent of the model and remains constant.
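Parts (c), (d), and (e) all follow from the standard decomposition of the expected test error at a point x0, which is exactly what the curves in the figure are plotting:

```latex
E\bigl[(y_0 - \hat f(x_0))^2\bigr]
  = \mathrm{Var}\bigl(\hat f(x_0)\bigr)
  + \bigl[\mathrm{Bias}\bigl(\hat f(x_0)\bigr)\bigr]^2
  + \mathrm{Var}(\varepsilon)
```

Only the first two terms depend on λ; Var(ε) is the irreducible error and stays fixed no matter how strongly the model is regularized.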

5. It is well-known that ridge regression tends to give similar coefficient values to correlated variables,
whereas the lasso may give quite different coefficient values to correlated variables.
We will now explore this property in a very simple setting. Suppose that n = 2, p = 2, x11 = x12, x21 = x22. Furthermore, suppose that y1 + y2 = 0 and x11 + x21 = 0 and x12 + x22 = 0, so that the estimate for the intercept in a least squares, ridge regression, or lasso model is zero: β̂0 = 0.

(a) Write out the ridge regression optimization problem in this setting.
(b) Argue that in this setting, the ridge coefficient estimates satisfy βˆ1 = βˆ2.
(c) Write out the lasso optimization problem in this setting.

(d) Argue that in this setting, the lasso coefficients βˆ1 and βˆ2 are not unique—in other words, there
are many possible solutions to the optimization problem in (c). Describe these solutions.
Will add full worked answers in the future; a rough sketch of the setup is below.
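As a rough sketch in the meantime, parts (a)–(d) can be written directly from the setting stated above (nothing beyond the given conditions is assumed):

```latex
% (a) Ridge regression in this setting (with \hat\beta_0 = 0):
\min_{\beta_1,\beta_2}\;
  (y_1 - \beta_1 x_{11} - \beta_2 x_{12})^2
+ (y_2 - \beta_1 x_{21} - \beta_2 x_{22})^2
+ \lambda\,(\beta_1^2 + \beta_2^2)

% (b) Because x_{11} = x_{12} and x_{21} = x_{22}, the RSS depends on \beta_1
% and \beta_2 only through the sum \beta_1 + \beta_2. For a fixed sum, the
% penalty \lambda(\beta_1^2 + \beta_2^2) is strictly minimized at
% \beta_1 = \beta_2, so the ridge estimates satisfy \hat\beta_1 = \hat\beta_2.

% (c) The lasso problem is the same with the \ell_1 penalty:
\min_{\beta_1,\beta_2}\;
  (y_1 - \beta_1 x_{11} - \beta_2 x_{12})^2
+ (y_2 - \beta_1 x_{21} - \beta_2 x_{22})^2
+ \lambda\,(|\beta_1| + |\beta_2|)

% (d) For the lasso, |\beta_1| + |\beta_2| = |\beta_1 + \beta_2| whenever the
% two coefficients have the same sign, so every pair with the optimal sum
% (and matching signs) attains the same objective value: the solutions form a
% whole line segment rather than a single point.
```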

6. We will now explore (6.12) and (6.13) further.

(a) Consider (6.12) with p = 1. For some choice of y1 and λ > 0, plot (6.12) as a function of β1. Your plot
should confirm that (6.12) is solved by (6.14).
(b) Consider (6.13) with p = 1. For some choice of y1 and λ > 0, plot (6.13) as a function of β1. Your plot
should confirm that (6.13) is solved by (6.15).
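The document does not include an answer here, but a short matplotlib sketch produces the required plots. It assumes the text's simple special case: (6.12) with p = 1 is the one-parameter ridge criterion (y1 − β1)² + λβ1², whose minimizer (6.14) is β1 = y1/(1 + λ), and (6.13) is the lasso analogue (y1 − β1)² + λ|β1|, whose minimizer (6.15) is the soft-thresholded value of y1.

```python
import numpy as np
import matplotlib.pyplot as plt

y1, lam = 2.0, 3.0          # any choice with lambda > 0 works
beta = np.linspace(-4, 4, 400)

ridge = (y1 - beta) ** 2 + lam * beta ** 2       # (6.12) with p = 1
lasso = (y1 - beta) ** 2 + lam * np.abs(beta)    # (6.13) with p = 1

# Closed-form minimizers: (6.14) and the soft-thresholding rule (6.15).
beta_ridge = y1 / (1 + lam)
beta_lasso = np.sign(y1) * max(abs(y1) - lam / 2, 0)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].plot(beta, ridge); axes[0].axvline(beta_ridge, ls="--")
axes[0].set_title("Ridge criterion, min at y1/(1+lambda)")
axes[1].plot(beta, lasso); axes[1].axvline(beta_lasso, ls="--")
axes[1].set_title("Lasso criterion, min at soft-threshold of y1")
plt.show()
```

The dashed vertical lines fall exactly at the minima of the plotted curves, which is the confirmation the exercise asks for.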
7. We will now derive the Bayesian connection to the lasso and ridge regression discussed in Section
6.2.2.
(a) Suppose that $y_i = \beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j + \epsilon_i$, where $\epsilon_1, \ldots, \epsilon_n$ are independent and identically distributed from a $N(0, \sigma^2)$ distribution. Write out the likelihood for the data.
(b) Assume the following prior for β: $\beta_1, \ldots, \beta_p$ are independent and identically distributed according to a double-exponential distribution with mean 0 and common scale parameter b: i.e. $p(\beta) = \frac{1}{2b}\exp(-|\beta|/b)$. Write out the posterior for β in this setting.
(c) Argue that the lasso estimate is the mode for β under this posterior distribution.
(d) Now assume the following prior for β: β1,...,βp are independent and identically distributed according
to a normal distribution with mean zero and variance c. Write out the posterior for β in this setting.
(e) Argue that the ridge regression estimate is both the mode and the mean for β under this posterior
distribution.
Will update in the future; a brief sketch is below.
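A brief sketch of (a)–(e), using only the normal error model and the priors stated in the exercise:

```latex
% (a) Likelihood of the data under the normal error model:
f(y \mid X, \beta)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{\bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\bigr)^2}{2\sigma^2}\right)

% (b) With the double-exponential prior p(\beta_j) = \frac{1}{2b} e^{-|\beta_j|/b},
% the posterior is proportional to likelihood times prior:
p(\beta \mid X, y)
  \propto f(y \mid X, \beta)\,\prod_{j=1}^{p}\frac{1}{2b}\exp\!\left(-\frac{|\beta_j|}{b}\right)

% (c) Taking logs, maximizing this posterior is equivalent to minimizing
% \sum_i \bigl(y_i - \beta_0 - \sum_j x_{ij}\beta_j\bigr)^2 + \frac{2\sigma^2}{b}\sum_j |\beta_j|,
% which is the lasso criterion with \lambda = 2\sigma^2/b, so the lasso estimate
% is the posterior mode.

% (d) With the N(0, c) prior, the posterior is instead proportional to
p(\beta \mid X, y)
  \propto f(y \mid X, \beta)\,\prod_{j=1}^{p}\frac{1}{\sqrt{2\pi c}}\exp\!\left(-\frac{\beta_j^2}{2c}\right)

% (e) The corresponding negative log-posterior is the ridge criterion with
% \lambda = \sigma^2/c; since this posterior is Gaussian, its mode and mean
% coincide, and both equal the ridge regression estimate.
```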
