BDA Lecture 11a


Variable selection with projpred

• In your project it is sufficient to compare 2–3 models
• ...but if you are interested in variable selection, then the number
  of potential models is 2^p, where p is the number of variables
• ...in such a case I recommend using brms + projpred
• projpred avoids overfitting in model selection

Use of reference models in model selection

• Background
• First example
• Bayesian and decision theoretical justification
• More examples

Not a novel idea

• Lindley (1968): The choice of variables in multiple regression
  • Bayesian and decision theoretical justification, but simplified
    model and computation
• Goutis & Robert (1998): Model choice in generalised linear
  models: a Bayesian approach via Kullback-Leibler projections
  • one key part for practical computation
• Related approaches
  • gold standard, preconditioning, teacher and student, distilling, . . .
• Motivation in these
  • measurement cost in covariates
  • running cost of the predictive model
  • easier explanation / learning from the model

Example: Simulated regression

f ∼ N(0, 1),       xj | f ∼ N(√ρ f, 1 − ρ),   j = 1, . . . , 150,
y | f ∼ N(f, 1)    xj | f ∼ N(0, 1),          j = 151, . . . , 500.

[Figure: a scatter plot of y against the latent f, followed by scatter plots of
y against individual covariates x[, ?].]

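A minimal R sketch of how this data could be simulated (assuming ρ = 0.5 and
n = 50, the values stated on the later Lasso-comparison slide):

# simulate latent f, target y, 150 relevant and 350 irrelevant covariates
set.seed(1)
n <- 50; p <- 500; p_rel <- 150; rho <- 0.5
f <- rnorm(n)                                   # f ~ N(0, 1)
y <- f + rnorm(n)                               # y | f ~ N(f, 1)
x_rel <- sapply(1:p_rel, function(j) sqrt(rho) * f + sqrt(1 - rho) * rnorm(n))
x_irr <- matrix(rnorm(n * (p - p_rel)), n, p - p_rel)
x <- cbind(x_rel, x_irr)                        # n x p covariate matrix
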
Example: Individual correlations

f ∼ N(0, 1),       xj | f ∼ N(√ρ f, 1 − ρ),   j = 1, . . . , 150,
y | f ∼ N(f, 1)    xj | f ∼ N(0, 1),          j = 151, . . . , 500.

[Figures: |R(xj, y)| plotted against the variable index j and against a
randomized variable index; |R(xj, f)| against a randomized variable index; and
|R(xj, f∗)| against a randomized variable index, where f∗ is a PCA + linear
regression reference fit.]

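A sketch of how these correlations could be computed in R using the simulated
data above; the reference fit f∗ follows the description on the last panel
(3 principal components + linear regression), and the variable names are
illustrative:

r_y <- abs(cor(x, y))                      # |R(xj, y)| for each covariate
r_f <- abs(cor(x, f))                      # |R(xj, f)|, known only in simulation
z <- prcomp(x, scale. = TRUE)$x[, 1:3]     # first 3 principal components
fstar <- fitted(lm(y ~ z))                 # reference predictions f*
r_fstar <- abs(cor(x, fstar))              # |R(xj, f*)|
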
Knowing the latent values would help

[Figure A: sample correlation with y, |R(xj, y)|, vs. sample correlation with
f, |R(xj, f)|; irrelevant and relevant xj shown separately.]

Estimating the latent values with a reference model helps

[Figure: A) sample correlation with y vs. sample correlation with f;
B) sample correlation with y vs. sample correlation with f∗, where f∗ is a
linear regression fit with 3 principal components; irrelevant and relevant xj
shown separately.]

Bayesian justification

• Theory says to integrate over all the uncertainties
  • build a rich model
  • do model checking etc.
  • this model can be the reference model
• Consider model selection as a decision problem
• Replace the full posterior p(θ | D) with some constrained q(θ) so
  that the predictive distribution changes as little as possible
• Example constraints
  • q(θ) can have only a point mass at some θ0
    ⇒ “Optimal point estimates”
  • Some covariates must have exactly zero regression coefficient
    ⇒ “Which covariates can be discarded”
  • Much simpler model
    ⇒ “Easier explanation”

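As a sketch of the underlying objective (following the Kullback-Leibler
projection of Goutis & Robert; the notation here is illustrative): for each
posterior draw θ^(s) of the reference model, the projected draw θ⊥^(s) of the
constrained submodel is, in LaTeX notation,

\theta_\perp^{(s)} = \underset{\theta_\perp}{\arg\min} \;
  \frac{1}{n} \sum_{i=1}^{n}
  \mathrm{KL}\!\left( p(\tilde{y}_i \mid \theta^{(s)}) \,\|\, q(\tilde{y}_i \mid \theta_\perp) \right),

and the collection of projected draws forms the projected posterior q(θ⊥).
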
Logistic regression with two covariates

[Figure sequence: each panel shows the posterior of (β1, β2) on the left and
contours of the predicted class probability over (x1, x2) on the right, for:
the full posterior for β1 and β2; projected point estimates for β1 and β2;
projected point estimates with the constraint β1 = 0; projected point estimates
with the constraint β2 = 0; draw-by-draw projection with the constraint β1 = 0;
and draw-by-draw projection with the constraint β2 = 0.]

Predictive projection

• Replace the full posterior p(θ | D) with some constrained q(θ) so
  that the predictive distribution changes as little as possible
• As the full posterior p(θ | D) is projected to q(θ)
  • the prior is also projected and there is no need to define priors
    for submodels separately
  • even if we constrain some coefficients to be 0, the predictive
    inference is conditioned on the information the related features
    contributed to the reference model
  • solves the problem of how to do the inference after the model
    selection

Projective selection

• How to select a feature combination?
• For a given model size, choose the feature combination with
  minimal projective loss
• Search heuristics, e.g.
  • Monte Carlo search
  • Forward search
  • L1-penalization (as in Lasso)
• Use cross-validation to select the appropriate model size
  • need to cross-validate over the search paths

Projective selection vs. Lasso

Same simulated regression data as before,
n = 50, p = 500, prel = 150, ρ = 0.5

[Figure: mean squared error and log predictive density as a function of the
number of covariates, for Lasso, relaxed Lasso, and the projection, with the
reference model shown for comparison.]

Bodyfat: small p example of projection predictive variable selection

Predict bodyfat percentage. The reference value is obtained by immersing the
person in water. n = 251.

[Figure: correlation matrix of siri (the bodyfat percentage to be predicted)
and the 13 candidate covariates: age, weight, height, neck, chest, abdomen,
hip, thigh, knee, ankle, biceps, forearm, wrist.]

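A sketch of how such a reference model and the variable selection could be set
up in R with rstanarm + projpred (the data frame df and the default priors are
assumptions for illustration, not part of the slides):

library(rstanarm)
library(projpred)
# reference model: all 13 measurements as predictors of bodyfat (siri)
fit <- stan_glm(siri ~ age + weight + height + neck + chest + abdomen + hip +
                  thigh + knee + ankle + biceps + forearm + wrist,
                data = df, family = gaussian(), seed = 1)
# projection predictive variable selection with LOO-CV
vsel <- cv_varsel(fit, method = 'forward', cv_method = 'loo')
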
Bodyfat

Marginal posteriors of coefficients

[Figure: marginal posteriors of the regression coefficients for age, weight,
height, neck, chest, abdomen, hip, thigh, knee, ankle, biceps, forearm, and
wrist.]

Bodyfat

Bivariate marginal of weight and height

[Figure: bivariate marginal posterior of the weight and height coefficients.]

Bodyfat

The predictive performance of the full model and the submodels

[Figure: difference to the baseline in elpd and rmse as a function of the
number of variables in the submodel (0 to 15).]

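With projpred, a plot like this and a suggested submodel size can be obtained
roughly as follows (a sketch using the vsel object from the earlier code):

plot(vsel, stats = c('elpd', 'rmse'), deltas = TRUE)  # difference to reference
suggest_size(vsel)                                    # heuristic size suggestion
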
Bodyfat

Marginals of the reference and projected posterior

[Figure: marginal posteriors of all coefficients in the reference model (left)
and the projected posterior of the selected submodel with abdomen and weight
(right).]

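The projected posterior on the right can be obtained by projecting the
reference model's draws onto the selected submodel, roughly as below (a sketch;
two terms matches the abdomen + weight submodel shown on the slide):

prj <- project(vsel, nterms = 2)       # project onto the 2-term submodel
prj_draws <- as.matrix(prj)            # matrix of projected posterior draws
bayesplot::mcmc_areas(prj_draws)       # marginals of the projected posterior
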
Predictive performance vs. selected variables

• The initial aim: find the minimal set of variables providing similar
  predictive performance as the reference model
• Some keep asking whether it can find the true variables
• What do you mean by true variables?

[Figure: the bodyfat correlation matrix shown next to the projected marginals
of the abdomen and weight coefficients.]

Variability under data perturbation

Comparing projection predictive variable selection (projpred) and stepwise
maximum likelihood (steplm) over bootstrapped datasets

[Figure: bar chart showing, for each variable (abdomen, weight, wrist, height,
age, neck, chest, biceps, thigh, ankle, forearm, hip, knee), how often it was
selected across the bootstrapped datasets, for projpred and steplm.]

• Reduced variability, but in case of noisy finite data, there will be
  some variability under data perturbation
• projpred uses
  • Bayesian inference for the reference
  • The reference model
  • Projection for submodel inference

Multilevel regression and GAMMs

• projpred also supports hierarchical and additive models in brms

Catalina, Bürkner, and Vehtari (2022). Projection predictive inference
for generalized linear and additive multilevel models. Proceedings of
the 24th International Conference on Artificial Intelligence and
Statistics (AISTATS), PMLR 151:4446–4461.
https://proceedings.mlr.press/v151/catalina22a.html

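A sketch of what the hierarchical case could look like with brms + projpred
(the formula, data frame, and grouping variable are illustrative assumptions):

library(brms)
library(projpred)
# reference model with population-level effects and a group-level intercept
fit <- brm(y ~ x1 + x2 + x3 + (1 | group), data = df, family = gaussian())
vsel <- cv_varsel(fit, method = 'forward', cv_method = 'loo')
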
Scaling

• So far the biggest number of variables we’ve tested is 22K
  • 96s for creating a reference model
  • 14s for projection predictive variable selection

Intro paper and brms and rstanarm + projpred examples

• McLatchie, Rögnvaldsson, Weber, and Vehtari (2024). Advances in
  projection predictive inference. Statistical Science.
  https://arxiv.org/abs/2306.15581
• https://mc-stan.org/projpred/articles/projpred.html
• https://users.aalto.fi/~ave/casestudies.html
• Fast and often sufficient if n ≫ p
    varsel <- cv_varsel(fit, method='forward', cv_method='loo',
                        validate_search=FALSE)
• Slower but needed if not n ≫ p
    varsel <- cv_varsel(fit, method='forward', cv_method='kfold', K=10,
                        validate_search=TRUE)
• If p is very big
    varsel <- cv_varsel(fit, method='L1', cv_method='kfold', K=5,
                        validate_search=TRUE)
