Shrinkage Techniques
Shrinkage Techniques

The document discusses techniques for shrinkage in statistical modeling to prevent overfitting, including shrinking regression coefficients after estimation, penalized maximum likelihood estimation, and lasso regression. By reducing overfitting, shrinkage makes predictions more reliable and improves calibration, while typically leaving the model's discrimination between outcomes unaffected.

Shrinkage techniques

Presentation Capita Selecta


28-10-2014
Prediction model
Aim:
• Predict the outcome for new subjects

Overfitting:
• Data in the sample are described well, but the
model is not valid for new subjects
• High predictions will turn out too high,
low predictions too low
Steyerberg EW. Clinical prediction models, chapter 5. Springer, 2009
Overfitting

Good performance in sample


Bad performance in new subjects

http://biostat.mc.vanderbilt.edu/wiki/pub/Main/AlexZhao/Overfitting_Cancer_workshop_04162010.pdf
What you see is not what you get
Overfitting

Fitting a model with too many degrees of
freedom in the modeling process.
E.g. univariable selection, interactions, transformations, etc.

Solutions:
• Use fewer degrees of freedom
• Increase the degrees of freedom you can use

Steyerberg EW. Clinical prediction models, chapter 5. Springer, 2009


How many degrees of freedom?

Candidate predictors per event    Reducing overfitting
>1:10 events                      Necessary
1:10 – 1:20 events                Advisable
1:20 – 1:50 events                Not necessary
<1:50 events                      Not necessary

http://painconsortium.nih.gov/symptomresearch/chapter_8/sec8/cess8pg2.htm
What is shrinkage?
• Regression coefficients are shrunk to less extreme values

[Figure: distributions of predicted probabilities with no
shrinkage and after shrinkage]
Moons KGM e.a. J Clin Epidemiol. 2004;57(12):1262-70
How to shrink?
• 4 methods
• Applied during or after estimation of the betas
• From simple to more difficult
Shrinkage after estimation
• Apply a shrinkage factor to the regression
coefficients (β_shrunken = s × β)

How to determine s?
• Formula
• Bootstrap

Steyerberg EW. Clinical prediction models, chapter 13. Springer, 2009


Shrinkage after estimation
Formula (van Houwelingen, Le Cessie)

s = (model χ² – df) / model χ² = 0.89

The example is based on a dataset with 24 outcomes.


van Houwelingen JC, Le Cessie S. Stat Med 1990;9(11):1303-25
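The heuristic formula above can be computed directly. A minimal Python sketch; the model χ² of 45.5 is an illustrative value chosen to reproduce the slides' factor of 0.89 for 5 predictors (the slides do not report the χ² itself):

```python
def heuristic_shrinkage(model_chi2, df):
    """Van Houwelingen / Le Cessie heuristic shrinkage factor:
    s = (model chi-square - df) / model chi-square."""
    return (model_chi2 - df) / model_chi2

# An illustrative model chi-square of 45.5 with df = 5 (5 predictors)
# reproduces the slide's shrinkage factor of 0.89.
s = heuristic_shrinkage(45.5, 5)
print(round(s, 2))  # 0.89
```

Note how, for a fixed χ², spending more degrees of freedom (more candidate predictors) pushes s further below 1, matching the pattern on the next slide.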
Shrinkage by formula
Increased shrinkage with:
↓ sample size (= ↓ model χ²)
↑ number of predictors
• 3 predictors: s = 0.92
• 5 predictors: s = 0.89
• 8 predictors: s = 0.83

Note: estimate the df honestly!


Shrinkage by formula
New intercept
• β_shrunken = s × β
• Calculate the LP: β_s1*x1 + … + β_sn*xn
• Calculate the predicted probabilities
• Choose the intercept so that the sum of the predictions
  equals the observed # of events
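Finding the new intercept can be done numerically: after shrinking the betas, solve for the intercept that makes the predictions sum to the observed number of events. A minimal sketch with invented linear predictors (the slides' data are not available); `recalibrate_intercept` is a hypothetical helper that uses bisection:

```python
import math

def recalibrate_intercept(lp, n_events, lo=-10.0, hi=10.0, tol=1e-8):
    """Find the intercept a such that sum_i 1/(1+exp(-(a + lp_i)))
    equals the observed number of events. Bisection works because
    the sum of predictions is monotone increasing in a."""
    def total(a):
        return sum(1.0 / (1.0 + math.exp(-(a + x))) for x in lp)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if total(mid) < n_events:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Invented linear predictors (without intercept), shrunk by s = 0.89;
# suppose 3 events were observed among these 6 subjects.
s = 0.89
lp = [s * b for b in [-1.2, -0.4, 0.1, 0.6, 1.3, 2.0]]
a = recalibrate_intercept(lp, 3)
probs = [1.0 / (1.0 + math.exp(-(a + x))) for x in lp]
print(round(sum(probs), 4))  # 3.0
```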
Questions?
Shrinkage by bootstrap
1. Take a bootstrap sample (size = n, drawn with replacement)
2. Estimate the regression coefficients in the bootstrap sample
   (same selection & estimation strategy as for the original model)
3. Calculate the linear predictor (β1*x1 + … + βn*xn) in the
   ORIGINAL sample with the bootstrapped coefficients
4. Slope of the LP: regression with the outcome of the patients in
   the original sample and the LP as covariable

[Figure: calibration plot of observed vs. predicted probability
(both axes 0 – 0.5), illustrating a slope of the LP of 0.9]

Repeat steps 1 – 4 (e.g. 200x)

Steyerberg EW. Clinical prediction models, chapter 13. Springer, 2009


Shrinkage by bootstrap
Shrinkage factor: average slope of the LP over the replicates
Intercept: chosen so that the sum of predictions = observed # events
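The four bootstrap steps can be sketched end-to-end. A pure-Python illustration on simulated stand-in data (the slides' dataset is not available); `logit_fit` is a minimal Newton-Raphson logistic fit for one predictor, and since no variable selection is used here, step 2's "same selection strategy" reduces to simply refitting:

```python
import math
import random

def logit_fit(x, y, n_iter=25):
    """Minimal two-parameter logistic regression (intercept a, slope b)
    fitted by Newton-Raphson."""
    a = b = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = max(-30.0, min(30.0, a + b * xi))  # guard against overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def bootstrap_shrinkage(x, y, n_boot=200, seed=1):
    """Average calibration slope of the LP over bootstrap replicates."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # 1. bootstrap sample
        a_b, b_b = logit_fit([x[i] for i in idx],
                             [y[i] for i in idx])    # 2. refit coefficients
        lp = [a_b + b_b * xi for xi in x]            # 3. LP in ORIGINAL sample
        _, slope = logit_fit(lp, y)                  # 4. slope of the LP
        slopes.append(slope)
    return sum(slopes) / len(slopes)

# Simulated stand-in data: one predictor, true model logit(p) = -1 + 1.2*x.
rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(100)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(1.0 - 1.2 * xi)) else 0
     for xi in x]
shrink = bootstrap_shrinkage(x, y)
print(round(shrink, 2))
```

With only one predictor and 100 subjects, overfitting is mild, so the resulting factor lands close to 1; with many candidate predictors and data-driven selection it would drop well below 1, as in the slides' example.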

[R output: example with 24 outcomes, 8 predictors, backward selection]


Questions?
Shrinkage during estimation
Penalized maximum likelihood
- Normally: maximum likelihood (ML) estimation
- Penalized ML: maximize log L – 0.5 * penalty * Σ(β_scaled)²
- Penalty factor chosen by optimizing AIC
  (related to model χ² & # predictors)
- Trial & error

Moons KGM e.a. J Clin Epidemiol. 2004;57(12):1262-70
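A sketch of penalized ML for a single already-scaled predictor, maximizing log L − 0.5 · penalty · b² with the intercept left unpenalized. The data are invented; in practice the penalty factor would be tuned by optimizing AIC over a grid of candidate values, which is not shown here:

```python
import math

def penalized_logit_fit(x, y, penalty, n_iter=50):
    """Two-parameter logistic model fitted by penalized ML:
    maximize log L - 0.5 * penalty * b**2 (intercept unpenalized).
    Minimal Newton-Raphson sketch for one scaled predictor."""
    a = b = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = max(-30.0, min(30.0, a + b * xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        g1 -= penalty * b        # the penalty pulls the slope toward zero
        h11 += penalty
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Invented data with overlapping classes.
x = [-2, -1, 0, 1, -1, 0, 1, 2]
y = [0, 0, 0, 0, 1, 1, 1, 1]
_, b_ml = penalized_logit_fit(x, y, penalty=0.0)   # ordinary ML
_, b_pen = penalized_logit_fit(x, y, penalty=5.0)  # penalized ML
print(round(b_ml, 2), round(b_pen, 2))  # penalized slope is closer to zero
```

Unlike the after-the-fact shrinkage factor, the penalty acts during estimation, so each coefficient is shrunk by its own amount, with unstable coefficients shrunk the most.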


Penalty factor

[Figure: AIC as a function of the penalty factor for the example
model (8 predictors, 24 outcomes)]
Steyerberg EW. Clinical prediction models, chapter 13. Springer, 2009


Questions?
Shrinkage for selection
Lasso
- Shrinkage + selection
- Some predictors are set to zero
- Constraint: a maximum on the sum of |β_standardized|
- How to determine this maximum?
  Trial & error (e.g. cross-validation / AIC)

Steyerberg EW e.a. Stat Neerl 2001;55:76-88
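The lasso can be illustrated in its penalized (Lagrangian) form, which is equivalent to putting a maximum on Σ|β_standardized|. A minimal coordinate-descent sketch for a linear model on invented, standardized data (the slides' case study uses logistic models on real data):

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: shrinks z toward zero, and returns
    exactly zero when |z| <= t — this is where the lasso's variable
    selection comes from."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso for linear regression by coordinate descent.
    Columns of X are assumed standardized (mean 0, variance 1)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # residual leaving out predictor j
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            z = sum(X[i][j] * r[i] for i in range(n)) / n
            beta[j] = soft_threshold(z, lam)
    return beta

# Invented standardized data: the first predictor carries the signal,
# the second is mostly noise; the lasso zeroes the weak coefficient.
X = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
y = [1.2, -1.2, 0.8, -0.8]
beta = lasso_cd(X, y, lam=0.3)
print(beta)  # the second coefficient is exactly 0.0
```

Large coefficients are shrunk by the same amount lam while small ones are removed entirely, which is why the lasso combines shrinkage with selection.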


Questions?
Result of shrinkage

Steyerberg EW. Clinical prediction models, chapter 13. Springer, 2009


Differences in amount of shrinkage
Shrinking afterwards
• Same shrinkage factor for every predictor

PML
• Different shrinkage for different predictors
• Unstable predictors are shrunk the most

Lasso
• Also selection of predictors (less is more)
Differences in performance
Case study by Steyerberg et al.
• GUSTO-I study
• Mortality at 30 days
• 8-predictor model
• Subsample: 336 patients, 20 cases
• Total sample: 20512 patients, 1435 cases

Steyerberg EW e.a. Stat Neerl 2001;55:76-88


Differences in obtained betas

[Figure: betas obtained with the different shrinkage methods;
values shown in the original slides: 1.09, 1.01, 0.90, 0.68]

Steyerberg EW e.a. Stat Neerl 2001;55:76-88


Differences in performance

Steyerberg EW e.a. Stat Neerl 2001;55:76-88


Question

Can you explain why shrinkage methods make predictions more reliable,
and affect calibration, but not discrimination?
Conclusion case study
Shrinkage in small datasets
• No shrinkage: poor performance
• No/minor advantage on discrimination
• Major improvement of calibration
• No major differences between methods

Steyerberg EW e.a. Stat Neerl 2001;55:76-88


Software differences
Shrinking afterwards (relatively easy)
• Formula: all software
• Bootstrapping: SAS/Stata/R

PML & Lasso (advanced)
• Only in R
Questions?
Conclusion
Shrinkage is necessary in small datasets
or with many predictors

(like studies on molecular markers)

But…
Clinical practice

Vickers e.a. Cancer 2008;112(8):1862-8


Conclusion
Shrinkage is necessary in small datasets
• When? More than 1 candidate predictor per 20 events
• How? Different methods available
• All result in better performance
• Techniques: simple to difficult

Reminder: make fewer data-driven decisions


Questions?
Differences in performance
Simulation study by Vach et al.
Lasso
• Overcorrects larger effects
• Cautious in removing variables

Formula
• Good estimation of shrinkage
• No variable selection
Vach K e.a. Stat Neerl 2001;55:53-75
