Stat 422/722, Class 2

Richard Waterman

The Wharton School

Fall 2019

Table of Contents

1 Last time

2 Today’s class

3 Review

4 JMP tasks

Last time

What is a linear regression? A linear model for the conditional mean.


The least squares criteria.
The standard regression assumptions.
Predictions can be viewed as weighted linear combinations of the y_i.
Leverage as measured by the hats (h_ii).
Hypothesis testing.

Today’s class

Introduce stepwise regression.


Illustrate the JMP platform.
Identify the BIG problem – overfitting.
Alternative stopping rules (in addition to p-values)
1 RMSE
2 Adjusted R²
3 Mallows’ Cp
4 Akaike Information Criterion (AIC)
5 Bayesian Information Criterion (BIC)
Comparing the RMSE in and out-of-sample.

The Apple data set

The goal: based on data available today, predict Apple's return tomorrow.
We have 106 data points, 42 are held out, October onward.
Variables: 10 other stocks (GOOG, INTC, etc.).
These features provide 41 variables to choose from: price, volume,
number of trades and returns.
Even just considering main effects (no interactions or squares) there are 2^41 possible models. That is 2,199,023,255,552 (two trillion) different models to explore.
Including all possible interactions and squares there are

    41 (main effects) + 41 (squares) + (1/2) × 41 × 40 (interactions) = 902 terms,

and hence 2^902 ≈ 3.381 × 10^271 possible models. (For scale, the number of atoms in the universe is estimated at 10^80.)
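A quick way to double-check these counts (a small Python sketch; every number is taken from the slide above):

    from math import comb, log10

    p = 41                                  # candidate main effects
    print(2 ** p)                           # 2199023255552 models using main effects only

    terms = p + p + comb(p, 2)              # main effects + squares + pairwise interactions
    print(terms)                            # 902

    print(902 * log10(2))                   # ~271.5, so 2**902 is roughly 3.4 x 10**271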
Heuristic

So it is a big problem. A big space of models to explore – the so-called curse of dimensionality.
We bring a heuristic to the problem.
Merriam-Webster: involving or serving as an aid to learning, discovery, or problem-solving by experimental and especially trial-and-error methods.
Then follow our natural instinct: iterative model fitting. Find the
single best variable. Given this variable, find the second best. Given
these two, find the third best and so on.
The essence of stepwise regression, the original automated model
selection tool.
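A minimal sketch of that add-one-variable-at-a-time heuristic with a p-value threshold (Python with statsmodels; a toy illustration, not the JMP implementation; X is assumed to be a pandas DataFrame of candidate predictors and y the response):

    import statsmodels.api as sm

    def forward_stepwise(X, y, p_enter=0.05):
        """Greedy forward selection: at each step add the candidate whose
        t-test p-value is smallest, and stop when no candidate beats p_enter."""
        selected, candidates = [], list(X.columns)
        while candidates:
            pvals = {}
            for var in candidates:
                design = sm.add_constant(X[selected + [var]])
                pvals[var] = sm.OLS(y, design).fit().pvalues[var]
            best = min(pvals, key=pvals.get)
            if pvals[best] > p_enter:       # stopping rule: p-value threshold
                break
            selected.append(best)
            candidates.remove(best)
        return selected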

Elements of stepwise

Choose a direction to make a path through the big model space:


1 Forward selection.
2 Backwards elimination.
3 Forwards and backwards = mixed.
Note: when there is collinearity these three approaches will not necessarily identify the same model.
A rule for variable selection.
1 Add step: add the variable with the lowest p-value (equivalently, the one that moves R² up the most).
2 Removal step: remove the variable with the highest p-value (the one that moves R² down the least).
A rule for stopping.
1 P-Value Threshold
2 AICc
3 BIC
Details: rules for treating categorical variables and interactions.

Choosing the variables to offer to stepwise

The stepwise dialog

The stepwise elements

Variable selection: the variable that makes R² go up the most (or down the least).
Direction: mixed
Stopping rule: P-Value threshold
Rules: No rules!

Output from the chosen model

All variables highly significant. R² = 54%. RMSE = 0.012573. The initial raw standard deviation of the returns was 0.0178048. So RMSE is

    0.012573 / 0.0178048 = 70.616%

of the initial unexplained variation. The model looks good by most standards.
Out-of-sample prediction

Disaster strikes! A plot of the absolute forecast error, both in- and out-of-sample.

Overfitting and the in and out-of-sample RMSE

This is overfitting. The big danger of greedy algorithms run amok.


The model actually performs much worse out-of-sample than the
in-sample summaries suggest.
We will see how to mitigate this in the next class.
In-sample RMSE = 0.012573.
Out-of-sample RMSE = 0.0623.
A 500% inflation factor.
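A small sketch of the comparison (Python; the two RMSE values are from the slides, the helper is generic):

    import numpy as np

    def rmse(y_true, y_pred):
        """Root mean squared prediction error."""
        err = np.asarray(y_true) - np.asarray(y_pred)
        return float(np.sqrt(np.mean(err ** 2)))

    rmse_in, rmse_out = 0.012573, 0.0623    # values from the slide
    print(rmse_out / rmse_in)               # about 5x, i.e. the ~500% inflation factor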

Comments on stepwise regression

You can use stepwise after a hand-crafted model has been made to
make sure nothing has been overlooked.
Stepwise can’t find variables unless you offer them to it!
Stepwise can’t think about transformations and normalization.
Stepwise can’t help in interpretation.
Stepwise looks one step ahead. It is a greedy algorithm; that is, one
that makes locally optimal decisions in the hope that it comes close
to a globally optimal one. You could imagine looking over pairs of
variables or triplets, rather than one at a time. Kasparov looked 3-5
moves ahead in chess and sometimes as many as 12. Stepwise looks
one step ahead!
Use the Center Polynomials option to reduce collinearity.

Use stepwise as a validation/exploratory tool, not as the only approach.

Stopping criteria other than the p-value cut-off

K.I.S.S. = Occam's razor = Parsimony

Among competing theories that equally well explain the observations, choose the one that is simplest.
Comparing R² (the same as minimizing the Sum of Squared Errors [SSE]) across models doesn't capture the idea of simplicity.
Neither does RMSE. Two models can have the same RMSE, but that doesn't distinguish between the complexity of the models.
Unlike regular R², Adjusted R² doesn't have to increase with additional variables, so it looks like a better choice, but

    Adjusted R² = 1 − RMSE² / s_y²,

so maximizing Adjusted R² is equivalent to minimizing RMSE.
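A quick numerical check of that identity (Python/statsmodels on simulated data, not the Apple set):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n, k = 200, 3
    X = rng.normal(size=(n, k))
    y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

    fit = sm.OLS(y, sm.add_constant(X)).fit()
    rmse = np.sqrt(fit.ssr / (n - k - 1))   # RMSE with the degrees-of-freedom adjustment
    s_y2 = y.var(ddof=1)                    # sample variance of y
    print(1 - rmse**2 / s_y2)               # agrees with...
    print(fit.rsquared_adj)                 # ...the reported Adjusted R²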


We need a new idea.
Explicitly incorporating complexity in the model selection
criterion
Rather than choosing the model with the smallest sum of squares error (SSE), you can penalize more complex models directly through the number of variables included (k is the number of variables in the model and σ̂² is an estimate of the variance of the errors ε_i).
1 Mallows' Cp = SSE_k/σ̂² − n + 2k.
2 Akaike Information Criterion: AIC(k) ∝ SSE_k/σ̂² + 2k.
3 Bayesian Information Criterion: BIC(k) ∝ SSE_k/σ̂² + log(n) k (all three are sketched in code below).
With normally distributed error terms Cp and AIC are equivalent.
BIC penalizes complexity more than AIC (when log(n) > 2) so prefers
smaller models.
These are the other Stopping Rules in the stepwise dialog.
These stopping rules are more appropriate when the goal is model
selection for prediction.
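A small sketch of the three penalties (Python; it assumes σ̂² comes from some full-model fit, one common convention):

    import numpy as np

    def selection_criteria(sse_k, k, n, sigma2_hat):
        """Complexity-penalized fit measures; smaller is better for each."""
        cp  = sse_k / sigma2_hat - n + 2 * k
        aic = sse_k / sigma2_hat + 2 * k             # up to an additive constant
        bic = sse_k / sigma2_hat + np.log(n) * k     # heavier penalty once log(n) > 2
        return cp, aic, bic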

Interpreting the model selection criteria through t-stats

We add a variable to the model if the increased complexity (k goes up) is appropriately offset by a smaller unexplained variation (SSE).
One can show that a new variable is added to an existing model if:

Criterion       Approx |t| cut-off    Equiv p-value   Goal
Adjusted R²     |t| > 1               0.33            Minimize RMSE
Cp / AIC        |t| > √2              0.16            Achieve an unbiased estimate of prediction accuracy
BIC             |t| > √(log(n))       Depends on n    Something Bayesian!

Recall that in standard hypothesis testing a significant t is one such that |t| > 2. The value 2 is arbitrary and is chosen to control the Type I error rate at α = 0.05.
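The equivalent p-values in the table are two-sided tail areas at each cut-off; a quick check with the normal approximation (Python/scipy) gives roughly 0.32 and 0.16, close to the table's values:

    from math import sqrt, log
    from scipy.stats import norm

    for label, cutoff in [("Adjusted R^2", 1.0),
                          ("Cp / AIC", sqrt(2)),
                          ("BIC, n = 100", sqrt(log(100)))]:
        p = 2 * (1 - norm.cdf(cutoff))      # two-sided tail probability
        print(f"{label:14s} |t| > {cutoff:.2f} -> p ~ {p:.2f}")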

The reason to adjust for complexity
Statistics estimates parameters through optimization – typically by
making something as small as possible. In particular, in regression, by
making the Sums of Squares Error (SSE) as small as possible.
But we end up using the data twice: once to estimate the parameters and then again to judge the quality of fit (RMSE = √(SSE/(n − p))).
Hence it provides an over-optimistic view of what happens in practice.
Penalizing the fitting criterion by the number of parameters in the model is an explicit way to mitigate this over-optimism.
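A minimal simulation of that over-optimism (Python/numpy; a pure-noise response regressed on many junk predictors, so any apparent fit is spurious):

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 100, 40
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)                  # true model: pure noise, sigma = 1

    Xc = np.column_stack([np.ones(n), X])
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
    resid = y - Xc @ beta
    print(np.sqrt(np.mean(resid ** 2)))     # clearly below 1: in-sample fit looks better than it is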
Recommendation: use AIC when you are looking for a predictive
model that is trying to get close to an unobtainable complex truth.
That is, you see your final model as an approximation to the truth.
Choose the model with the lowest AIC. But AIC will only be reliable if
n >> k.
Recommendation: use BIC, when you are searching for the right
model, within a set of predefined models, and you believe that the
truth lies in the set under consideration.
Categorical variables in stepwise

JMP creates two-level contrasts of the categorical variables. That is, it buckets the multi-level categorical variable into a set of two-level comparisons.
There is no reason why these two-level comparisons should be easily
interpretable.
Example: After fitting a stepwise model for GP1000M City with a
categorical variable (Transmission) and weight and horsepower,
run the model and then look at the prediction formula to see the
coding.
If a categorical is selected by stepwise, you could choose to put the
entire variable into the final model, or you could create an
interpretable recoding or even use the contrasts chosen by stepwise
itself.

Illustration of the coding for the Transmission variable

JMP will use a {+1,-1,0} coding by default, and not the {+1,0} dummy
variable coding scheme we saw in Stat 613/621.

Transmission is a four-level categorical with levels {A, AS, AV, M}.

Interpretation of the categorical variable parameters

JMP notation: “&” means to combine the categories and “-” means to
compare/contrast them.

Contrast        A    AS   AV    M
AV&M - A&AS    -1    -1   +1   +1
AV - M          0     0   +1   -1
A - AS         +1    -1    0    0

Representation of the categorical variable in the data table

Adding the variable Transmission{AV - M} to the model adds a parameter estimate. Its value is -4.37. So the AV's forecast changes by -4.37, the M's changes by +4.37, and the A's and AS's change by 0.

Stepwise regression review

The need for tools like stepwise – the space of all models is typically
too big to exhaustively explore.
The mechanics of stepwise: stopping rules, variable selection criteria.
The big issue with stepwise: over-fitting.
The information criteria that penalize complexity.
How JMP treats categoricals in stepwise.

JMP tasks

Make sure you can:


1 Use the model dialog to include all interactions and squares as
potential model effects.
2 Run the stepwise tool.

Creating all interactions and squared terms

In the Fit Model dialog:

Select the X-variables of interest.
In Model Effects, go to Macros.
Choose Response Surface.

Stepwise dialog

JMP stepwise

After having selected the X-variables in the Fit Model dialog:
In the Personality menu choose Stepwise.
Check Keep dialog open and click Run.
For Stopping Rule choose P-value Threshold.
Enter Prob To Enter and Prob to Leave.
For Direction choose Mixed.
For Rules choose No rules.
Click Step to step through the variables.

