Shrinkage techniques
Presentation Capita Selecta
28-10-2014
Prediction model
Aim:
• Predict the outcome for new subjects
Overfitting:
• The data in the sample are described well, but the model is not valid for new subjects
• For new subjects, high predictions turn out too high and low predictions too low
Steyerberg EW. Clinical prediction models, chapter 5 . Springer, 2009
Overfitting
Good performance in sample
Bad performance in new subjects
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/AlexZhao/Overfitting_Cancer_workshop_04162010.pdf
What you see is not what you get
Overfitting
Fitting a model with too many degrees of freedom in the modeling process.
E.g. univariable selection, interactions, transformations, etc.
Solutions:
• Use fewer degrees of freedom
• Increase the degrees of freedom you can use
Steyerberg EW. Clinical prediction models, chapter 5 . Springer, 2009
How many degrees of freedom?
Candidate predictors : events    Reducing overfitting
> 1:10                           Necessary
1:10 – 1:20                      Advisable
1:20 – 1:50                      Not necessary
< 1:50                           Not necessary
http://painconsortium.nih.gov/symptomresearch/chapter_8/sec8/cess8pg2.htm
What is shrinkage?
• Shrinking the regression coefficients towards less extreme values
[Figure: predicted probabilities, no shrinkage vs. after shrinkage]
Moons KGM e.a. J Clin Epidemiol. 2004;57(12):1262-70
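For illustration (hypothetical numbers): with a shrinkage factor of s = 0.89, a regression coefficient of 1.20 becomes 0.89 × 1.20 ≈ 1.07, so the predicted probabilities move towards the average risk.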
How to shrink?
• 4 methods
• Applied during or after estimation of the betas
• Ranging from simple to more difficult
Shrinkage after estimation
• Apply a shrinkage factor to the regression coefficients: β_shrunken = s × β
How to determine s?
• Formula
• Bootstrap
Steyerberg EW. Clinical prediction models, chapter 13. Springer, 2009
Shrinkage after estimation
Formula (van Houwelingen, Le Cessie)
s = (model χ² − df) / model χ² = 0.89
Example is based on dataset with 24 outcomes
van Houwelingen JC, Le Cessie S. Stat Med 1990;9(11):1303-25
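A minimal R sketch of this formula, assuming a logistic regression fitted with glm() on a hypothetical data frame dat (variable names are illustrative):

```r
# Heuristic shrinkage factor of van Houwelingen & Le Cessie:
#   s = (model chi-square - df) / model chi-square
fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = dat)  # hypothetical model
model_chi2 <- fit$null.deviance - fit$deviance  # likelihood-ratio model chi-square
df <- length(coef(fit)) - 1                     # honestly count ALL candidate df
s <- (model_chi2 - df) / model_chi2
s
```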
Shrinkage by formula
Increased shrinkage with:
↓ sample size (= ↓ model χ²)
↑ predictors
• 3 predictors: 0.92
• 5 predictors: 0.89
• 8 predictors: 0.83
Note: estimate the df honestly (count all candidate predictors, including those examined but dropped)!
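For illustration (hypothetical model χ² of 45): with 3 df, s = (45 − 3) / 45 ≈ 0.93, but with 8 df, s = (45 − 8) / 45 ≈ 0.82; every extra degree of freedom spent forces more shrinkage.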
Shrinkage by formula
New intercept
• β_shrunken = s × β
• Calculate the LP: β_s1·x1 + … + β_sn·xn
• Calculate the predicted probabilities
• Re-estimate the intercept so that the sum of the predictions equals the observed # of events
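A minimal sketch in R of the intercept step, assuming the glm fit fit and the shrinkage factor s from the earlier sketch:

```r
# Re-estimate the intercept after shrinking the coefficients, so that the
# sum of the predicted probabilities equals the observed number of events.
b <- coef(fit)
lp_shrunken <- s * (predict(fit, type = "link") - b[1])  # shrunken LP without intercept
refit <- glm(fit$y ~ 1, family = binomial, offset = lp_shrunken)
new_intercept <- coef(refit)[1]
sum(fitted(refit))  # equals the observed number of events
```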
Questions?
Shrinkage by bootstrap
1. Take a bootstrap sample (size = n, with
replacement)
Steyerberg EW. Clinical prediction models, chapter 13. Springer, 2009
Shrinkage by bootstrap
1. Take a bootstrap sample
2. Estimate the regression coefficients in the
bootstrap sample (same selection & estimation strategy)
Steyerberg EW. Clinical prediction models, chapter 13. Springer, 2009
Shrinkage by bootstrap
1. Take a bootstrap sample (size n, drawn with replacement)
2. Estimate the regression coefficients (same
selection & estimation strategy)
3. Calculate linear predictor (β1*x1+…+βn*xn) in original
sample with bootstrapped coefficients.
Steyerberg EW. Clinical prediction models, chapter 13. Springer, 2009
Shrinkage by bootstrap
1. Take a bootstrap sample
2. Estimate the regression coefficients (same selection & estimation strategy)
3. Calculate linear predictor (β1*x1+β2*x2 etc) in original sample with bootstrapped coefficients.
4. Slope of LP: regression with outcome of
patients in original sample and LP as covariable.
[Figure: observed vs. predicted probability in the original sample; slope of the LP = 0.9]
Shrinkage by bootstrap
1. Take a bootstrap sample
2. Estimate the regression coefficients (same selection &
estimation strategy)
3. Calculate linear predictor (β1*x1+β2*x2 etc) in original
sample with bootstrapped coefficients.
4. Slope of LP: regression with outcome of patients in
original sample and LP as covariable.
Repeat steps 1 – 4 (e.g. 200x)
Steyerberg EW. Clinical prediction models, chapter 13. Springer, 2009
Shrinkage by bootstrap
Shrinkage factor: the average slope of the LP over the bootstrap repetitions
Intercept: re-estimated so that the sum of the predictions equals the observed # of events
R output
Example: 24 outcomes, 8 predictors, backward selection.
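For orientation, a minimal base-R sketch of steps 1–4 (the data frame dat, the formula and B = 200 repetitions are hypothetical; the backward selection used in the slide's example is omitted for brevity):

```r
# Bootstrap estimate of the shrinkage factor (calibration slope of the LP).
set.seed(1)
form <- y ~ x1 + x2 + x3            # hypothetical model formula
B <- 200
slopes <- numeric(B)
for (b in 1:B) {
  boot  <- dat[sample(nrow(dat), replace = TRUE), ]     # 1. bootstrap sample (size n)
  fit_b <- glm(form, family = binomial, data = boot)    # 2. estimate betas (repeat full strategy!)
  lp    <- predict(fit_b, newdata = dat, type = "link") # 3. LP in the original sample
  slopes[b] <- coef(glm(dat$y ~ lp, family = binomial))[2]  # 4. slope of the LP
}
s <- mean(slopes)   # shrinkage factor = average slope
```

In practice the validate() function of the R package rms reports this calibration slope directly and can repeat a backward selection in every bootstrap sample.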
Questions?
Shrinkage during estimation
Penalized maximum likelihood
- Normally: maximum likelihood estimation
- Penalized ML: maximize log L − 0.5 · penalty factor · Σ (β_scaled)²
- Penalty factor chosen by optimizing the AIC
(related to the model χ² & # of predictors)
- Trial & error
Moons KGM e.a. J Clin Epidemiol. 2004;57(12):1262-70
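A minimal sketch with the R package rms (data frame dat and formula are hypothetical); lrm() maximizes this penalized log-likelihood, and pentrace() tries a grid of penalty factors and reports the one with the best (corrected) AIC:

```r
library(rms)
fit <- lrm(y ~ x1 + x2 + x3, data = dat, x = TRUE, y = TRUE)  # ordinary ML fit
ptr <- pentrace(fit, seq(0, 20, by = 1))   # trial & error over a grid of penalty factors
ptr$penalty                                # penalty factor with the best corrected AIC
fit_pen <- update(fit, penalty = ptr$penalty)  # refit with penalized ML
```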
Penalty factor
[Figure: AIC as a function of the penalty factor; values 8 and 24 marked in the plot]
Steyerberg EW. Clinical prediction models, chapter 13. Springer, 2009
Questions?
Shrinkage for selection
Lasso
- Shrinkage + selection
- Some predictors are set to zero
- Constraint: the sum of |β_standardized| is kept below a maximum
- How to determine this maximum?
Trial & error (e.g. cross validation/AIC)
Steyerberg EW e.a. Stat Neerl 2001;55:76-88
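A minimal sketch with the R package glmnet (predictor matrix x and 0/1 outcome y are hypothetical); cv.glmnet() chooses the amount of shrinkage by cross-validation:

```r
library(glmnet)
# Lasso (alpha = 1): shrinkage + selection; predictors are standardized internally.
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1)
coef(cvfit, s = "lambda.min")  # shrunken coefficients; some are exactly zero (deselected)
```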
Questions?
Result of shrinkage
Steyerberg EW. Clinical prediction models, chapter 13. Springer, 2009
Differences in amount of shrinkage
Shrinking afterwards
• Same shrinkage factor for every predictor
PML
• Different shrinkage for different predictors
• Unstable predictors are shrunk the most
Lasso
• Also selection of predictors (less is more)
Differences in performance
Case study by Steyerberg et al.
• GUSTO-I study
• Mortality at 30 days
• 8 predictor model
• Subsample: 336 patients, 20 cases
• Total sample: 20512 patients, 1435 cases
Steyerberg EW e.a. Stat Neerl 2001;55:76-88
Differences in obtained betas
Steyerberg EW e.a. Stat Neerl 2001;55:76-88
[Figure: differences in obtained betas; values 1.09, 1.01, 0.90 and 0.68 marked in the plot]
Steyerberg EW e.a. Stat Neerl 2001;55:76-88
Differences in performance
Steyerberg EW e.a. Stat Neerl 2001;55:76-88
Question
Can you explain why shrinkage methods make predictions more reliable, improving calibration but hardly affecting discrimination?
Conclusion case study
Shrinkage in small datasets
• No shrinkage: poor performance
• No/minor advantage on discrimination
• Major improvement of calibration
• No major differences between methods
Steyerberg EW e.a. Stat Neerl 2001;55:76-88
Software differences
Shrinking afterwards (relatively easy)
• Formula: all software
• Bootstrapping: SAS/Stata/R
PML & Lasso (advanced)
• Only in R
Questions?
Conclusion
Shrinkage is necessary in small datasets
or with many predictors
(like studies on molecular markers)
But…
Clinical practice
Vickers e.a. Cancer 2008;112(8):1862-8
Conclusion
Shrinkage is necessary in small datasets
• When? More than 1 candidate predictor per 20 events
• How? Different methods available
• All result in better performance
• Techniques: simple to difficult
Reminder: make fewer data-driven decisions
Questions?
Differences in performance
Simulation study by Vach et al.
Lasso
• Overcorrects larger effects
• Cautious in removing variables
Formula
• Good estimation of shrinkage
• No variable selection
Vach K e.a. Stat Neerl 2001;55:53-75