
Gibbs Variable Selection Using BUGS

Ioannis Ntzoufras∗
Department of Business Administration
University of the Aegean, Chios, Greece
e-mail: [email protected]

Abstract

In this paper we discuss and present in detail the implementation of Gibbs variable selection as defined by Dellaportas et al. (2000, 2002) using the BUGS software (Spiegelhalter et al., 1996a,b,c). The specification of the likelihood, prior and pseudo-prior distributions of the parameters as well as the prior term and model probabilities are described in detail. Guidance is also provided for the calculation of the posterior probabilities within the BUGS environment when the number of models is limited. We illustrate the application of this methodology in a variety of problems including linear regression, log-linear and binomial response models.

Keywords: Logistic regression; Linear regression; MCMC; Model selection.

1 Introduction

In Bayesian model averaging or model selection we focus on the calculation of posterior model probabilities, which involve integrals that are analytically tractable only in certain restricted cases. This obstacle has been overcome via the construction of efficient MCMC algorithms for model and variable selection problems.
A variety of MCMC methods have been proposed for variable selection, including the Stochastic Search Variable Selection (SSVS) of George and McCulloch (1993), the reversible jump Metropolis by Green (1995), the model selection approach of Carlin and Chib (1995), the variable selection sampler of Kuo and Mallick (1998) and the Gibbs variable selection (GVS) by Dellaportas et al. (2000, 2002).
The primary aim of this paper is to clearly illustrate how we can utilize BUGS (Spiegelhalter et al., 1996a, see also www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml) for the implementation of variable selection methods. We concentrate on Gibbs variable selection introduced by Dellaportas et al. (2000, 2002) with independent prior distributions. Extension to other Gibbs samplers such as the George and McCulloch (1993) SSVS and the Kuo and Mallick (1998) sampler is straightforward; see, for example, Dellaportas et al. (2000). Finally, application of the Carlin and Chib (1995) algorithm is also illustrated using BUGS by Spiegelhalter et al. (1996c).
∗ Journal of Statistical Software, Volume 7, Issue 7, available from www.jstatsoft.org

The paper is organised into three sections in addition to this introductory one. Section 2 briefly describes the general Gibbs variable selection algorithm as introduced by Dellaportas et al. (2002), Section 3 provides detailed guidance for implementation in BUGS, and Section 4 presents three illustrative examples.

2 Gibbs Variable Selection

Many statistical models may be represented naturally as (s, γ) ∈ S × {0, 1}^p, where the indicator vector γ identifies which of the p possible sets of covariates are present in the model and s denotes other structural properties of the model. For example, for a generalised linear model, s may describe the distribution, link function and variance function, and the linear predictor may be written as

η = Σ_{j=1}^p γj Xj βj    (1)

where Xj is the design matrix and βj the parameter vector related to the jth term. In the following, we restrict attention to variable selection aspects assuming that s is known, and we concentrate on the estimation of the posterior distribution of γ.
We denote the likelihood of each model by f(y|β, γ) and the prior by f(β, γ) = f(β|γ)f(γ), where f(β|γ) is the prior of the parameter vector β conditional on the model structure γ and f(γ) is the prior of the corresponding model. Moreover, β can be partitioned into two vectors βγ and β\γ corresponding to the parameters of variables included in or excluded from the model. Under this approach the prior can be rewritten as

f(β, γ) = f(βγ|γ) f(β\γ|βγ, γ) f(γ)

while, since we are using the linear predictor (1), the likelihood simplifies to

f(y|β, γ) = f(y|βγ, γ).

From the above it is clear that the components of the vector β\γ do not affect the model likelihood, and hence the posterior distribution within each model γ is given by

f(β|γ, y) = f(βγ|γ, y) × f(β\γ|βγ, γ)

where f(βγ|γ, y) is the actual posterior of the parameters of model γ and f(β\γ|βγ, γ) is the conditional prior distribution of the parameters not included in the model γ. We may now interpret f(βγ|γ) as the actual prior of the model, while the distribution f(β\γ|βγ, γ) may be called the 'pseudoprior', since the parameter vector β\γ does not gain any information from the data and does not influence the actual posterior of the parameters of each model, f(βγ|γ, y). Although this pseudoprior does not influence the posterior distributions of interest, it influences the performance of the MCMC algorithm and hence it should be specified with caution.
The sampling procedure is summarised by the following steps:

1. Sample the parameters included in the model from the posterior

f(βγ|β\γ, γ, y) ∝ f(y|β, γ) f(βγ|γ) f(β\γ|βγ, γ)    (2)

2. Sample the parameters excluded from the model from the pseudoprior

f(β\γ|βγ, γ, y) ∝ f(β\γ|βγ, γ)    (3)

3. Sample each variable indicator γj from a Bernoulli distribution with success probability Oj/(1 + Oj), where Oj is given by

Oj = [f(y|β, γj = 1, γ\j) f(β|γj = 1, γ\j) f(γj = 1, γ\j)] / [f(y|β, γj = 0, γ\j) f(β|γj = 0, γ\j) f(γj = 0, γ\j)].    (4)

The selection of priors and pseudopriors is a very important aspect of model selection. Here we briefly present the simplest approach, where f(β|γ) is given as a product of independent prior and pseudoprior densities: f(β|γ) = Π_{j=1}^p f(βj|γj). In such a case, a usual and simple choice of f(βj|γj) is given by

f(βj|γj) = (1 − γj) f(βj|γj = 0) + γj f(βj|γj = 1)    (5)

resulting in the actual prior distribution f(βγ|γ) = Π_{j: γj=1} f(βj|γj) and pseudoprior f(β\γ|βγ, γ) = Π_{j: γj=0} f(βj|γj).
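Under the independent prior (5), the odds (4) simplify considerably, since every factor f(βk|γk) with k ≠ j appears identically in the numerator and denominator of the prior ratio and cancels:

Oj = [f(y|β, γj = 1, γ\j) / f(y|β, γj = 0, γ\j)] × [f(βj|γj = 1) / f(βj|γj = 0)] × [f(γj = 1, γ\j) / f(γj = 0, γ\j)].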
Note that the above prior can be efficiently used in any model selection problem if we orthogonalize the data matrix and then perform model choice using the new transformed data (see Clyde et al., 1996). If orthogonalization is undesirable then we may need to construct more sophisticated and flexible algorithms such as reversible jump MCMC; see Green (1995) and Dellaportas et al. (2002).
The simplified prior (5) and a model formulation such as (1) result in the following full conditional posterior

f(βj|γ, β\j, y) ∝ f(y|βγ, γ) Π_{k=1}^p f(βk|γk) ∝ f(y|γ, β) f(βj|γj = 1) if γj = 1, and ∝ f(βj|γj = 0) if γj = 0,    (6)

indicating that the pseudoprior f(βj|γj = 0) does not affect the posterior distribution of each model coefficient.
Similarly to George and McCulloch (1993), we use a mixture of normal distributions for the model parameters:

f(βj|γj = 1) ≡ N(0, Σj)  and  f(βj|γj = 0) ≡ N(µ̄j, Sj).    (7)

The hyperparameters µ̄j and Sj are parameters of the pseudoprior distribution; therefore their choice is only relevant to the behaviour of the MCMC chain and does not affect the posterior distribution. Ideal choices of these parameters are the maximum likelihood or pilot run estimators of the full model (as, for example, in Dellaportas and Forster, 1999). However, in the experimental process, we noted that the automatic selection µ̄j = 0 and Sj = Σj/k² with k = 10 has also proved to be an adequate choice; for more details see Ntzoufras (1999). This 'automatic selection' uses the properties of the prior distributions with 'large' and 'small' variance introduced in SSVS by George and McCulloch (1993). The parameter k is now only a pseudoprior parameter and therefore it does not affect the posterior distribution. Suitable calibration of this parameter assists the chain to move better (or worse) between different models.
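In BUGS notation (anticipating Section 3, where t[j] stands for the prior precision Σj^{-1} and g[j] for the indicator γj), this automatic pseudoprior amounts to the one-line sketch

tprior[j] <- t[j]*pow(100, 1-g[j])   # k^2 = 100: precision t[j] if g[j]=1, 100*t[j] if g[j]=0

so that an excluded coefficient is sampled around zero with variance Σj/100; this is the form used in the Appendix codes.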

The prior proposed by Dellaportas and Forster (1999) for contingency tables is also adopted here for logistic regression models with categorical explanatory variables (see Dellaportas et al., 2000). Alternatively, for generalized linear models, Raftery (1996) has proposed selecting the prior covariance matrix using elements of the data matrix multiplied by a hyperparameter. The latter is selected in such a way that the effect of the prior distribution on the posterior odds becomes minimal.
When no restrictions on the model space are imposed, a common prior for the term indicators γj is f(γj) = Bernoulli(1/2), whereas in other cases (for example, hierarchical or graphical log-linear models) it is required that f(γj|γ\j) depends on γ\j; for more details see Chipman (1996) and Section 3.4.
Other Gibbs samplers for model selection have also been proposed by George and McCulloch (1993), Carlin and Chib (1995) and Kuo and Mallick (1998). Detailed comparison and discussion of the above methods is given by Dellaportas et al. (2000, 2002). Implementation of the Carlin and Chib methodology in BUGS is illustrated by Spiegelhalter et al. (1996c, page 47), while an additional simple example of Gibbs variable selection methods is provided by Dellaportas et al. (2000).

3 Applying Gibbs Variable Selection Using BUGS

In this section we provide detailed guidance for implementing Gibbs variable selection using the BUGS software. It is divided into five sub-sections covering the definition of the model likelihood f(y|β, γ), the specification of the prior distributions f(β|γ) and f(γ), the implementation of related Gibbs samplers, and, finally, the direct calculation of posterior model probabilities using BUGS.

3.1 Definition of Likelihood

The linear predictor of type (1) used in Gibbs variable selection and in the Kuo and Mallick sampler can easily be incorporated in BUGS using the following code

for (i in 1:N) { for (j in 1:p) { z[i,j] <- x[i,j]*b[j]*g[j] } }
for (i in 1:N) {
    eta[i] <- sum(z[i,]);
    y[i] ~ distribution[parameter1, parameter2] }

where

• N denotes the sample size,

• p the total number of variables under consideration,

• x[i,j] is the (i, j) element of the data or design matrix X,

• y[i] is the ith element of the response vector y,

• b[j] is the jth element of the parameter vector β,

• g[j] is the inclusion indicator for the jth term, i.e. the jth element of γ,

• z[i,j] is a matrix used to simplify calculations,

• eta[i] is the ith element of the linear predictor vector η and should be substituted by the corresponding link function, for example logit(p[i]) in binomial logistic regression,

• distribution should be substituted by the appropriate BUGS command for the distribution that the user prefers (for example dnorm for the normal distribution),

• parameter1, parameter2 should be substituted according to the distribution chosen; for example, for the normal distribution with mean µi and variance τ^{-1} we may use mu[i], tau.

For the usual normal, binomial and Poisson models the model formulations are given by the following lines of BUGS code

Normal: for (i in 1:N) { mu[i] <- sum(z[i,]);
                         y[i] ~ dnorm(mu[i], tau) }
    where mu[i] is the expected value for the ith observation and tau is the precision of the regression model.

Poisson: for (i in 1:N) { log(lambda[i]) <- sum(z[i,]);
                          y[i] ~ dpois(lambda[i]) }
    where lambda[i] is the Poisson mean for the ith observation.

Binomial: for (i in 1:N) { logit(p[i]) <- sum(z[i,]);
                           y[i] ~ dbin(p[i], n[i]) }
    where p[i] is the probability of success and n[i] is the total number of Bernoulli trials for the ith binomial experiment. Alternative link functions may be used by substituting logit(p[i]) with probit(p[i]) or cloglog(p[i]), for Φ^{-1}(p) and log(−log(1 − p)) respectively, where Φ is the standard normal cumulative distribution function.
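For concreteness, the pieces above can be assembled into a complete program. The following is only a sketch for a normal linear regression with p = 3 candidate covariates: the data file names and the prior precision 0.001 for an included term are hypothetical, the intercept is omitted for brevity, and the pseudoprior is the automatic choice with k = 10 discussed in Section 2:

model gvslinreg;
const
  N = 21,   # sample size
  p = 3;    # number of candidate covariates
var
  y[N], x[N,p], z[N,p], mu[N], b[p], g[p], tprior[p], tau;
data y, x in "data.dat";   # hypothetical file names
inits in "data.in";
{
  for (i in 1:N) {
    for (j in 1:p) { z[i,j] <- x[i,j]*b[j]*g[j] }   # g[j]=0 drops term j
    mu[i] <- sum(z[i,]);
    y[i] ~ dnorm(mu[i], tau);
  }
  for (j in 1:p) {
    tprior[j] <- 0.001*pow(100, 1-g[j]);   # automatic pseudoprior with k=10
    b[j] ~ dnorm(0.0, tprior[j]);
    g[j] ~ dbern(0.5);                     # uniform prior over the 2^p models
  }
  tau ~ dgamma(1.0E-3, 1.0E-3);
}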

3.2 Definition of Prior Distribution of Parameter Vector

When we use independent priors as given by (5) and each covariate parameter vector is univariate, the definition of the prior is straightforward. Our prior is a mixture of independent normal distributions

βj ∼ γj N(0, Σj) + (1 − γj) N(µ̄j, Sj),  j = 1, 2, . . . , p,    (8)

where µ̄j and Sj are the mean and variance, respectively, of the corresponding pseudoprior distribution and Σj is the prior variance when the jth term is included in the model. In order to use (8) in BUGS we write

• b[j] ~ dnorm(bpriorm[j], tprior[j]) denoting βj ∼ N(mj, τj^{-1}),

• bpriorm[j] <- (1-g[j])*mean[j] denoting mj = (1 − γj) µ̄j,

• tprior[j] <- g[j]*t[j]+(1-g[j])*pow(se[j],-2) denoting τj = γj Σj^{-1} + (1 − γj) Sj^{-1},

for j = 1, 2, . . . , p; where mj and τj are the prior mean and precision for βj depending on γj, and t[j], se[j], mean[j], bpriorm[j], tprior[j] are the BUGS variables for Σj^{-1}, √Sj, µ̄j, mj and τj, respectively.
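In a complete program these statements are simply placed inside a loop over the p terms; a sketch:

for (j in 1:p) {
    bpriorm[j] <- (1-g[j])*mean[j];                  # pseudoprior mean when g[j]=0
    tprior[j] <- g[j]*t[j]+(1-g[j])*pow(se[j],-2);   # prior or pseudoprior precision
    b[j] ~ dnorm(bpriorm[j], tprior[j]);
}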
When we consider a categorical explanatory variable j with J > 2 categories, the corresponding parameter vector βj will be multivariate with dimension dj = J − 1. In such cases we denote by p and d (> p) the dimensions of γ and of the full parameter vector β, respectively. Therefore, we need one variable to facilitate the association between these two vectors. This vector is denoted by the BUGS variable pos. The pos vector, which has dimension equal to the dimension of β, takes values in 1, 2, . . . , p and denotes that the kth element of the parameter vector β is associated with the binary indicator γpos[k], for all k = 1, 2, . . . , d.
For illustration, let us consider an ANOVA model with two categorical variables X1 and X2 with 3 and 4 categories respectively. Then the terms under consideration are X0, X1, X2 and X12, where X0 denotes the constant term and X12 the interaction between the terms X1 and X2. The corresponding dimensions are dX0 = 1, dX1 = 2, dX2 = 3 and dX12 = dX1 × dX2 = 6. Then we set the pos vector equal to

pos <- c( 1, 2,2, 3,3,3, 4,4,4,4,4,4 )

to state that the first parameter corresponds to the first term (X0), parameters 2-3 correspond to the second term (X1), parameters 4-6 correspond to the third term (X2) and parameters 7-12 correspond to the fourth term (X12). Finally, we use another vector called gtemp of dimension d which is given by

gtemp[i] <- g[ pos[i] ]

for all i = 1, . . . , d. The vector gtemp is used in the likelihood instead of the g vector, as sketched below. For details see example 1 and the associated BUGS code in the Appendix.
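For the ANOVA illustration above (d = 12, p = 4), the mapping and its use in the likelihood would read, in sketch form:

for (i in 1:12) { gtemp[i] <- g[pos[i]] }   # expand 4 term indicators to 12 coefficients
for (i in 1:N) {
    for (j in 1:12) { z[i,j] <- x[i,j]*b[j]*gtemp[j] }   # as in Section 3.1, with gtemp in place of g
}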
Moreover, the definition of the prior distribution when factors or terms with many parameters are considered is more complicated. For example, a mixture of multivariate normal prior distributions as given by (5) can be expressed as a multivariate normal distribution on the 'full' parameter vector β. Therefore we may write in BUGS

• b[ ] ~ dmnorm(bpriorm[ ], Tau[ , ]) denoting β ∼ Nd(m, T^{-1}),

• bpriorm[k] <- (1-g[pos[k]])*mean[k] denoting mk = (1 − γpos[k]) µ̄k,

• Tau[k,l] <- g[pos[k]]*g[pos[l]]*t[k,l] + (1-g[pos[k]]*g[pos[l]])*equals(k,l)*pow(se[k],-2) denoting that

  Tkl = [Σ^{-1}]kl   when γpos[k] = γpos[l] = 1,
  Tkl = se_k^{-2}    when k = l and γpos[k] = 0,
  Tkl = 0            otherwise,

for k, l = 1, 2, . . . , d; where Nd is the d-dimensional normal distribution; m^T = (m1, m2, . . . , md) and T are the prior mean vector and precision matrix depending on γ; µ̄k is the corresponding pilot run estimate for the kth element of the model parameter vector β; Σ is the prior covariance matrix constructed for the whole parameter vector β when we use for each βj the multivariate extension of the prior distribution (8); Tkl and [Σ^{-1}]kl are the (k, l) elements of the matrices T and Σ^{-1}, respectively; and Tau[,], t[,] are the BUGS matrices for T and Σ^{-1}, respectively. An illustration of the use of such a prior distribution is given in example 1.

3.3 Implementation of Other Gibbs Samplers for Variable Selection

SSVS and the Kuo and Mallick sampler can easily be applied with minor modifications of the above code. In SSVS the prior (8) is used with µ̄j = 0 and Sj = Σj/kj², where kj² should be large enough so that βj will be close to zero when γj = 0. For the selection of the prior parameters in SSVS see the semiautomatic prior selection of George and McCulloch (1993). The above restriction can easily be applied in BUGS by

bpriorm[j] <- 0
tprior[j] <- t[j]*g[j]+(1-g[j])*t[j]*pow(k[j],2)

Moreover, the likelihood in SSVS should be slightly modified by substituting the first line of the code in Section 3.1 with

for (i in 1:N) { for (j in 1:p) { z[i,j] <- x[i,j]*b[j] } }

The Kuo and Mallick sampler uses a prior on β that does not depend on the model indicator γ. Therefore the specification of the prior is the same as in standard modelling with BUGS, while the likelihood definition is the same as in Gibbs variable selection, described in Section 3.1.
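A sketch of the corresponding prior specification, where the precision 1.0E-3 is an assumed 'non-informative' value:

for (j in 1:p) {
    b[j] ~ dnorm(0.0, 1.0E-3);   # prior free of g[j]: no pseudoprior parameters are needed
}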

3.4 Definition of Prior Term Probabilities

In order to apply any variable selection method in BUGS we need to define the prior probabilities f(γ). When we are vague about models we may set f(γ) = 1/M, where M is the number of all models under consideration. When the explanatory variables do not involve interactions (for example in linear regression), the number of models under consideration is 2^p. In these situations the latent variables γj can be treated as a priori independent and therefore we set in BUGS

• g[j] ~ dbern(0.5) denoting that γj ∼ Bernoulli(0.5)

for all j = 1, 2, . . . , p. This prior results in f(γ) = 2^{-p} for all γ ∈ {0, 1}^p. When we are dealing with models using categorical explanatory variables with interaction terms, such as ANOVA or log-linear models, we usually want to restrict attention to hierarchical models. The conditional distributions f(γj|γ\j) need to be specified in such a way that f(γ) = 1/M when γ corresponds to a hierarchical model and f(γ) = 0 otherwise.
For example, in a two-way ANOVA we have three terms under consideration: the main effects A, B and the interaction AB. There are eight possible models, of which only five are hierarchical (constant, [A], [B], [A][B] and [AB]). Therefore, we wish to specify f(γ) = 0.20 for the above five models and f(γ) = 0 for the rest. This can be achieved by setting in BUGS

• g[3] ~ dbern(0.2) denoting that γAB ∼ Bernoulli(0.2),

• pi <- g[3]+0.5*(1-g[3]) denoting that π = γAB + 0.5 (1 − γAB),

• for (i in 1:2) { g[i] ~ dbern(pi) } denoting that γi|γAB ∼ Bernoulli(π) for all i ∈ {A, B}.

From the above it is evident that

f([AB]) = f(γAB = 1) f(γA = 1|γAB = 1) f(γB = 1|γAB = 1) = 0.2 × 1 × 1 = 0.2,
f([A][B]) = f(γAB = 0) f(γA = 1|γAB = 0) f(γB = 1|γAB = 0) = 0.8 × 0.5 × 0.5 = 0.2.

Using similar calculations we find that f(γ) = 0.2 for all five models under consideration. For further relevant discussion and application see Chipman (1996). For implementation in BUGS see examples 1 and 3.

3.5 Calculating Model Probabilities in BUGS

In order to directly calculate the posterior model probabilities in BUGS and avoid saving large MCMC output, we may use a vector variable with dimension equal to the number of models. Using a simple coding such as mdl = 1 + Σ_{j=1}^p γj 2^{j−1}, we transform the vector γ into a unique model index (denoted by mdl) for which pmdl[mdl] = 1 and pmdl[j] = 0 for all j ≠ mdl. For example, with p = 3 the model containing the first and third covariates (γ = (1, 0, 1)) gets index mdl = 1 + 1 + 4 = 6. The above statements can be written in BUGS with the code

for (j in 1:p) { index[j] <- pow(2,j-1) }
mdl <- 1+inprod(g[ ], index[ ])
for (m in 1:models) { pmdl[m] <- equals(m,mdl) }

where models is the total number of models, 2^p. Then, using the command stats(pmdl) in the BUGS environment (or cmd file), we can monitor the posterior model probabilities. This is feasible only if the number of models is limited and therefore applicable only in some simple problems.

4 Examples

Three illustrative examples are briefly presented. The first example is a 3 × 2 × 4 contingency table used to illustrate how to handle factors with more than two levels. Example 2 provides model selection details in a regression-type problem involving several different error distributions, while example 3 is a simple logistic regression problem with random effects. In all examples posterior probabilities are presented, while the associated BUGS codes are provided in the appendix. Additional details (for example, convergence plots) are omitted since our aim is simply to illustrate how to use BUGS for variable selection.

4.1 Example 1: 3 × 2 × 4 Contingency Table

This example is a 3 × 2 × 4 contingency table presented by Knuiman and Speed (1988) in which 491 individuals are classified by three categorical variables: obesity (O: low, average, high), hypertension (H: yes, no) and alcohol consumption (A: 0, 1–2, 3–5, 6+ drinks per day); see Table 1.

Alcohol Intake
Obesity High BP 0 1-2 3-5 6+
Low Yes 5 9 8 10
No 40 36 33 24
Average Yes 6 9 11 14
No 33 23 35 30
High Yes 9 12 19 19
No 24 25 28 29

Table 1: 3 × 2 × 4 Contingency Table: Knuiman and Speed (1988) Dataset.

The full model is given by

nilk ∼ Poisson(λilk),  log(λilk) = m + oi + hl + ak + ohil + oaik + halk + ohailk,

for i = 1, 2, 3, l = 1, 2, k = 1, 2, 3, 4. The above model can be rewritten with likelihood given by (1), where β can be divided into sub-vectors βj with j ∈ {∅, O, H, OH, A, OA, HA, OHA}; here β∅ = m, βO^T = [o2, o3], βH = h2, βOH^T = [oh22, oh32], βA^T = [a2, a3, a4], βOA^T = [oa22, oa23, oa24, oa32, oa33, oa34], βHA^T = [ha22, ha23, ha24] and βOHA^T = [oha222, oha223, oha224, oha322, oha323, oha324]. Each βj is a multivariate vector and therefore each prior distribution involves a mixture of multivariate normal distributions. We use sum-to-zero constraints and prior variance Σj as in Dellaportas and Forster (1999). We restrict attention to hierarchical models always including the main effects, since we are mainly interested in the relationships between the categorical factors. Under these restrictions, nine models are under consideration. In order to forbid moves to non-hierarchical models we use the following BUGS code to define the prior model probabilities:

• g[8] ~ dbern(0.1111) for γOHA ∼ Bernoulli(1/9),

• pi <- g[8]+0.5*(1-g[8]) for π = γOHA + 0.5 (1 − γOHA),

• for (i in 5:7) { g[i] ~ dbern(pi) } for γi|γOHA ∼ Bernoulli(π), for all i ∈ {OH, OA, HA},

• for (j in 1:4) { g[j] ~ dbern(1) } for γj ∼ Bernoulli(1), for all j ∈ {constant, O, H, A}.

These priors result in a prior probability of 1/9 for each hierarchical model and zero otherwise.
Results using both the pilot run pseudoprior and the automatic pseudoprior with k = 10 are summarised in Table 2. The data give 'strong' evidence in favour of the model of independence. Model [OH][A], in which obesity and hypertension are dependent on each other given the level of alcohol consumption, is the model with the second highest posterior probability. All the other models have probability lower than 1%.

                    Posterior Model Probabilities (%)
Pseudopriors           k=10                   Pilot Run
Burn-in           1,000    10,000        1,000    10,000
Iterations        1,000    10 × 10,000   1,000    10 × 10,000
Models
[O][H][A]         62.80    68.87         65.20    67.80
[OH][A]           36.90    30.53         34.40    31.63
[O][HA]            0.20     0.40          0.10     0.43
[OH][HA]           0.10     0.20          0.30     0.14
Terms
γOH = 1           37.00    30.63         34.70    31.77
γHA = 1            0.30     0.20          0.40     0.57

Table 2: 3 × 2 × 4 Contingency Table: Posterior Model Probabilities Using BUGS.

4.2 Example 2: Stacks Dataset

This example involves the stack-loss data analysed by Spiegelhalter et al. (1996b, page 27) using Gibbs sampling. The dataset features 21 daily responses of stack loss (y), which measures the amount of ammonia escaping, with covariates the air flow (x1), temperature (x2) and acid concentration (x3). Spiegelhalter et al. (1996b) consider regression models with four different error structures (normal, double exponential, logistic and Student's t4 distributions). They also consider the cases of ridge and simple independent regression models. We extend their work by applying Gibbs variable selection in all these eight cases.
The full model is

yi ∼ D(µi, τ),  µi = β0 + β1 zi1 + β2 zi2 + β3 zi3,  i = 1, . . . , 21,

where D(µi, τ) is the distribution of the errors with mean µi and variance τ^{-1}, here assumed to be normal, double exponential, logistic or t4, and zij = (xij − x̄j)/sd(xj) are the standardised covariates. The ridge regression approach assumes the further restriction that the βj for j = 1, 2, 3 are exchangeable (Lindley and Smith, 1972) and therefore βj ∼ N(0, φ^{-1}). We use 'non-informative' priors with prior precision equal to 10^{-3} for the independent regression, and for φ in ridge regression we use a gamma prior with both parameters equal to 10^{-3}. Since we do not wish to impose any restriction on the model space, we use the prior probabilities γj ∼ Bernoulli(1/2) for j = 1, 2, 3, which results in a prior probability of 1/8 for each possible model. For the pilot run pseudoprior parameters we use the posterior values given by Spiegelhalter et al. (1996b).
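For instance, the GVS prior for the ridge case is coded in the Appendix (Example 2) essentially as follows, where mean[j] and se[j] hold the pilot run pseudoprior parameters:

phi ~ dgamma(1.0E-3, 1.0E-3);
for (j in 1:p) {
    bprior[j] <- (1-g[j])*mean[j];                  # pseudoprior mean when excluded
    tprior[j] <- g[j]*phi+(1-g[j])/(se[j]*se[j]);   # common 'ridge' precision phi when included
    beta[j] ~ dnorm(bprior[j], tprior[j]);
}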
Tables 3 and 4 provide the results for all eight distinct cases using pilot run pseudopriors. In all cases the air flow (z1) has a posterior probability of inclusion higher than 99%. The temperature (z2) also seems to be an important term, with a posterior probability of inclusion varying from 39% to 96%. The last term (z3), which measures the acid concentration in the air, has low posterior probabilities of inclusion: less than 5% for the simple independence models and less than 20% for the 'ridge' regression models.

Independence Regression
Models Normal D.Exp. Logistic t4
Constant 0.00 0.00 0.00 0.00
z1 14.12 58.48 41.19 56.46
z2 0.56 0.01 0.02 0.00
z1 + z2 81.25 38.64 55.25 40.46
z3 0.00 0.00 0.00 0.00
z1 + z3 0.63 1.75 1.35 1.82
z2 + z3 0.05 0.00 0.00 0.00
z1 + z2 + z3 3.39 1.11 2.18 1.26
Terms
γz1 = 1 99.30 99.98 99.97 100.00
γz2 = 1 84.90 39.76 57.45 41.72
γz3 = 1 4.30 2.86 3.53 3.08

Table 3: Stacks Dataset: Posterior Model Probabilities in Independence Regression (burn-in 10,000, samples of 10 × 10,000, with pilot run pseudopriors).

                Ridge Regression
Models Normal D.Exp. Logistic t4
Constant 0.00 0.00 0.00 0.00
z1 3.26 22.54 14.42 13.30
z2 0.05 0.00 0.00 0.00
z1 + z2 79.79 65.00 73.32 70.92
z3 0.00 0.00 0.00 0.00
z1 + z3 0.44 1.74 1.32 1.86
z2 + z3 0.00 0.00 0.00 0.00
z1 + z2 + z3 16.46 10.72 11.01 13.92
Terms
γz1 = 1 100.00 100.00 100.00 100.00
γz2 = 1 96.50 75.72 84.33 84.84
γz3 = 1 16.10 12.46 12.33 15.78

Table 4: Stacks Dataset: Posterior Model Probabilities in Ridge Regression (burn-in 10,000, samples of 10 × 10,000, with pilot run pseudopriors).

4.3 Example 3: Seeds Dataset, Logistic Regression with Random Effects

This example involves the examination of the proportion of seeds that germinated on each of 21 plates. For each plate we have recorded the seed type (bean or cucumber) and the type of root extract. This data set is analysed by Spiegelhalter et al. (1996b, page 10) using BUGS; for more details see the references therein. The model is a logistic regression with 2 categorical explanatory variables and random effects. The full model is

yilk ∼ Bin(nilk, pilk),  log( pilk / (1 − pilk) ) = m + ai + bl + abil + wk,

for i, l = 1, 2 and k = 1, . . . , 21; where yilk and nilk are the number of seeds germinated and the total number of seeds, respectively, for seed type i, root extract type l and plate k; and wk is the random effect for the kth plate.
We use sum-to-zero constraints for both fixed and random effects. Following Dellaportas and Forster (1999) we use prior variance Σ = 4 × 2 = 8 for the fixed effects. The prior for the precision of the random effects is a gamma distribution with both parameters equal to 10^{-3}. The pseudoprior parameters were taken from a pilot chain of the saturated model. Ten models are under consideration. The prior term probabilities for the fixed effects are assigned as in the two-way ANOVA example of Section 3.4, while for the random effects term indicator we have γw ∼ Bernoulli(0.5).

            Fixed Effects       Random Effects
Models       k=10     Pilot     k=10     Pilot
Constant      0.00     0.00      1.21     0.99
[A]           0.00     0.00      0.22     0.07
[B]          32.34    32.07     50.61    50.75
[A][B]        3.78     3.84      7.24     7.60
[AB]          2.80     2.83      1.80     1.85
Total        38.92    38.74     61.08    61.26

Table 5: Seeds Dataset: Posterior Model Probabilities Using BUGS (burn-in 10,000, samples of 10 × 10,000).

Table 5 provides the calculated posterior model probabilities. We used both pilot run proposals and the automatic pseudoprior with k = 10. Both chains gave the same results, as expected, and the type of root extract (B) is the only factor that influences the proportion of germinated seeds. The corresponding models with random and fixed effects have posterior probabilities equal to 51% and 32%, respectively. The marginal posterior probability of the random effects term is 61%, which is about 56% higher than the posterior probability of the fixed-effects-only models.

5 Appendix: BUGS Codes

BUGS code and all associated data files are freely available in electronic form at the Journal of Statistical Software web site www.jstatsoft.org/v07/i07/ or by electronic mail request.

5.1 Example 1
model log-linear;
#
# 3x2x4 LOG-LINEAR MODEL SELECTION WITH BUGS (GVS)
# (c) OCTOBER 1996
# (c) REVISED OCTOBER 1997
#
const
terms=8, # number of terms
N = 24; # number of Poisson cells
var
include, # conditional prior probability for gi
pmdl[9], # model indicator vector
mdl, # code of model
b[N], # model coefficients
mean[N], # proposal mean used in pseudoprior
se[N], # proposal standard deviation used in
# pseudoprior
bpriorm[N], # prior mean for b depending on g
Tau[N,N], # model coefficients precision
tprior[N,N],# prior value for Tau when all terms
# are included in model
x[N,N], # design matrix
z[N,N], # matrix with z_ij=x_ij b_j g_j, used in
# likelihood
n[N], # Poisson cells
pos[N], # position of each parameter
lambda[N], # Poisson mean for each cell
gtemp[N], # temporary term indicator vector
g[terms]; # term indicator vector
data pos,n in "ex2.dat", x in "ex2des.dat",
mean, se in "prop2.dat", tprior in "cov.dat";
inits in "ex2.in";
{
#
# associate g[i] with coefficients.
#
for (i in 1:N) {
gtemp[i]<-g[pos[i]];
}
#
# calculation of the z matrix used in likelihood
#
for (i in 1:N) {
for (j in 1:N) {
z[i,j]<-x[i,j]*b[j]*gtemp[j]
}
}
#
# model configuration
for (i in 1:N) {
log(lambda[i])<-sum(z[i,])
n[i]~dpois(lambda[i]);
}
# defining model code
# 0 for independence model [A][B][C], 1 for [AB][C],

# 2 for [AC][B], 3 for [AB][AC], 4 for [BC][A],
# 5 for [AB][BC], 6 for [AC][BC], 7 for [AB][AC][BC],
# 15 for [ABC].
#
mdl<-g[5]+2*g[6]+4*g[7]+8*g[8];
for (i in 0:7) {
pmdl[i+1]<-equals(mdl,i)
}
pmdl[9]<-equals(mdl,15)
#
# Prior for b model coefficient
# Mixture normal depending on current status of g[i]
#
for (i in 1:N) { for (j in 1:N) {
#
# GVS using se,mean from pilot run
# ********************************
#
Tau[i,j]<-0+tprior[i,j]*(gtemp[i]*gtemp[j])+
(1-gtemp[i]*gtemp[j])*equals(i,j)/(se[i]*se[i]);
#
# Automatic proposal using prior similar to SSVS
# with k=10
# ************************************************
# Tau[i,j]<-tprior[i,j]*pow(100,1-gtemp[i]*gtemp[j]);
#
# Kuo and Mallick proposal is independent of g[i]
# [tau[i]=1/2 and bpriorm[i]=0]
# ***********************************************
#
#
# Tau[i,j]<-tprior[i,j];
#
}
#
# GVS PRIOR M FROM PILOT RUN
# **************************
bpriorm[i]<-mean[i]*(1-gtemp[i]);
#
# PRIOR MEAN THAT DOES NOT DEPEND ON G.
# *************************************
# bpriorm[i]<-0.0;
}
b[]~dmnorm(bpriorm[],Tau[,]);
#
# defining prior information for gi to
# allow only hierarchical models with equal probability.
# We also ignore models nested to the model of
# independence [A][B][C] since we are interested in
# associations between factors.
g[8]~dbern(0.1111111);
include<-(1-g[8])*0.5+g[8]*1.0;
g[7]~dbern(include);
g[6]~dbern(include);
g[5]~dbern(include);
for (i in 1:4) {
g[i]~dbern(1.0);
}
}

5.2 Example 2

model stacks;
#
# LINEAR REGRESSION VARIABLE SELECTION WITH BUGS (GVS)
# BUGS EXAMPLE: STACKS, see BUGS examples vol.1
#
# (c) OCTOBER 1997
#
const
p = 3, # number of covariates
N = 21, # number of observations
models=8, # number of models under consideration, 2^3
PI = 3.141593;
var
x[N,p], # raw covariates
z[N,p] , # standardised covariates
Y[N],mu[N], # data and expectations
stres[N], # standardised residuals
outlier[N], # indicator if |stan res| > 2.5
beta0,beta[p], # standardised intercept, coefficients
b0,b[p], # unstandardised intercept, coefficients
phi, # prior precision of standardised coef.
tau,sigma,d, # precision, sd and d.f. of t distribution
g[p], # variable indicators
mdl, # Model index
pmdl[models], # Vector with model indicators
mean[p],se[p], # pseudoprior mean and se error
bprior[p], # Conditional to model Prior prior mean
tprior[p]; # Conditional to model Prior prior precision
data Y,x in "STACKS.DAT",
# files with proposed values
mean,se in "pnorm.dat"; # Normal distribution
#mean,se in "pdexp.dat"; # Double exponential distribution
#mean,se in "plogist.dat";# Logistic distribution
#mean,se in "pt4.dat"; # Student(4) distribution
inits in "STACKS.IN";
{
# Standardise x’s and coefficients
for (j in 1:p) {
b[j] <- beta[j]/sd(x[,j]) ;
for (i in 1:N) {
z[i,j] <- (x[i,j] - mean(x[,j]))/sd(x[,j]) ;
}
}
b0<-beta0-b[1]*mean(x[,1])-b[2]*mean(x[,2])-b[3]*mean(x[,3]);
# Model
d <- 4; # degrees of freedom for t
for (i in 1:N) {
#
# Normal Distribution
# -------------------
Y[i] ~ dnorm(mu[i],tau);
#
# Double Exponential Distribution
# -------------------------------
# Y[i] ~ ddexp(mu[i],tau);
#
# Logistic Distribution
# ----------------------
# Y[i] ~ dlogis(mu[i],tau);
#
# Student t4 Distribution
# -----------------------

# Y[i] ~ dt(mu[i],tau,d);
#
mu[i] <- beta0 + g[1]*beta[1]*z[i,1]+g[2]*beta[2]*z[i,2]
+ g[3]*beta[3]*z[i,3];
stres[i] <- (Y[i] - mu[i])/sigma;
#
# if standardised residual is greater than 2.5 then outlier
outlier[i]<-step(stres[i] -2.5) + step(-(stres[i]+2.5) );
}
#
# Defining Model Code
mdl<- 1+g[1]*1+g[2]*2+g[3]*4
#
# defining vector with model indicators
for (j in 1:models){
pmdl[j]<-equals(mdl,j);}
# Priors
beta0 ~ dnorm(0,.00001);
for (j in 1:p) {
#
# ******** GVS PRIORS FOR INDEPENDENCE REGRESSION ********
#
# GVS priors with proposals from pilot run
# bprior[j]<-(1-g[j])*mean[j];
# tprior[j] <-g[j]*0.001+(1-g[j])/(se[j]*se[j]);
#
# GVS priors with proposals a mixture of Normals(0,c^2t^2)
bprior[j]<-0.0;
tprior[j] <-pow(100,1-g[j])*0.001;
#
# ******** GVS PRIORS FOR RIDGE REGRESSION ********
#
# GVS priors with proposals from pilot run
# bprior[j]<-(1-g[j])*mean[j];
# tprior[j] <-g[j]*phi+(1-g[j])/(se[j]*se[j]);
#
# GVS priors with proposals a mixture of Normals(0,c^2t^2)
# bprior[j]<-0.0;
# tprior[j] <-pow(100,1-g[j])*phi;
beta[j] ~ dnorm(bprior[j],tprior[j]); # coefs independent
}
tau ~ dgamma(1.0E-3,1.0E-3);
#
# phi ~ dgamma(1.0E-3,1.0E-3);
#
# when we use pilot run based pseudopriors bugs was unable
# to select update method. Therefore we use an upper limit
# which makes bugs work with Metropolis instead of Gibbs
#
# phi ~ dgamma(1.0E-3,1.0E-3)I(0,10000);
# standard deviation of error distribution
sigma <- sqrt(1/tau); # normal errors
# sigma <- sqrt(2)/tau; # double exponential errors
# sigma <- sqrt(pow(PI,2)/3)/tau ; # logistic errors
# sigma <- sqrt(d/(tau*(d-2))); # errors of t with d d.f.
#
#
# Priors for variable indicators
for (j in 1:p) { g[j]~ dbern(0.5);}
}

5.3 Example 3
model seedszrogvs;
#
# LOGISTIC REGRESSION VARIABLE AND
# RANDOM EFFECTS SELECTION WITH BUGS (GVS)
#
# BUGS EXAMPLE: SEEDS, see BUGS examples vol.1
#
# (c) OCTOBER 1997
#
const
terms=4, # Number of terms under consideration
models=16,# number of models under consideration 2^4
N = 21; # number of samples
var
alpha0, alpha1, alpha2, alpha12, # model coefficients
tau, sigma, # precision and sd of random effects (tau=1/sigma^2)
x1[N], x2[N], # Design Column for factor a1 and a2
# here we used the STZ constraints
p[N], # Success probability for Binomial
n[N], # Total number of trials for Binomial
r[N], # Binomial data
b[N], # Random effects (standardised)
c[N], # Random effects c[i] (unstandardised)
include, # conditional model probability for
# main effects
g[terms], # terms indicator vector
mdl, # model index
pmdl[models], # model indicator
mean[terms-1], # proposal mean
se[terms-1], # proposal se
bprior[terms-1],# prior mean for model coefficients
tprior[terms-1];# prior precision for model coefficients
data r,n,x1,x2 in "seeds.dat", mean,se in "prop6.dat";
inits in "seeds.in";
{
alpha0 ~ dnorm(0.0,1.0E-6); # intercept
for (j in 1:(terms-1)) {
# ******** GVS PRIORS ***********
#
# GVS priors with proposals from pilot run
bprior[j]<-(1-g[j])*mean[j];
tprior[j] <-g[j]/8+(1-g[j])/(se[j]*se[j]);
#
# GVS priors with proposals a mixture of Normals(0,c^2t^2)
# bprior[j]<-0.0;
# tprior[j] <-pow(100,1-g[j])/8;
}
#
#
alpha1 ~ dnorm(bprior[1],tprior[1]); # seed coeff
alpha2 ~ dnorm(bprior[2],tprior[2]); # extract coeff
alpha12 ~ dnorm(bprior[3],tprior[3]);
tau ~ dgamma(1.0E-3,1.0E-3); # 1/sigma^2
for (i in 1:N) {
c[i] ~ dnorm(0.0,tau);
b[i] <- c[i] - mean(c[]); # make sure b’s add to zero
logit(p[i]) <-alpha0+g[1]*alpha1*x1[i]+g[2]*alpha2*x2[i]
+g[3]*alpha12*x1[i]*x2[i]+g[4]*b[i];
r[i] ~ dbin(p[i],n[i]);
}

sigma <- 1.0/sqrt(tau);
#
# Defining Model Code
mdl<- 1+g[1]*1+g[2]*2+g[3]*4+g[4]*8
#
# defining vector with model indicators
for (j in 1:models){
pmdl[j]<-equals(mdl,j);}
# Priors for variable indicators
g[4]~ dbern(0.50);
g[3]~ dbern(0.20);
include<-g[3]+(1-g[3])*0.5
g[2]~ dbern(include);
g[1]~ dbern(include);
}

References
Carlin, B.P. and Chib, S. (1995). 'Bayesian Model Choice via Markov Chain Monte Carlo Methods', Journal of the Royal Statistical Society B, 57, 473–484.
Chipman, H. (1996). ‘Bayesian Variable Selection with Related Predictors’, Cana-
dian Journal of Statistics, 24, 17–36.
Clyde, M., DeSimone, H. and Parmigiani, G. (1996). ‘Prediction via Orthogonalized
Model Mixing’, Journal of the American Statistical Association, 91, 1197–1208.
Dellaportas, P. and Forster, J.J. (1999). ‘Markov Chain Monte Carlo Model Deter-
mination for Hierarchical and Graphical Log-linear Models’, Biometrika, 86,
615–633.
Dellaportas, P., Forster, J.J. and Ntzoufras, I. (2000). ‘Bayesian Variable Selection
Using the Gibbs Sampler’, Generalized Linear Models: A Bayesian Perspective
(D. K. Dey, S. Ghosh, and B. Mallick, eds.). New York: Marcel Dekker, 271–
286.
Dellaportas, P., Forster, J.J. and Ntzoufras, I. (2002). ‘On Bayesian Model and
Variable Selection Using MCMC’, Statistics and Computing, 12, 27–36.
George, E.I. and McCulloch, R.E. (1993). ‘Variable Selection via Gibbs Sampling’,
Journal of the American Statistical Association, 88, 881–889.
Green, P. (1995). ‘Reversible Jump Markov Chain Monte Carlo Computation and
Bayesian Model Determination’, Biometrika, 82, 711–732.
Kuo, L. and Mallick, B. (1998). ‘Variable Selection for Regression Models’,
Sankhya B, 60, 65–81.
Knuiman, M.W. and Speed, T.P. (1988). ‘Incorporating Prior Information Into the
Analysis of Contingency Tables’, Biometrics, 44, 1061–1071.
Lindley, D.V. and Smith, A.F.M. (1972). ‘Bayes Estimates for the Linear Model’
(with discussion). Journal of the Royal Statistical Society B, 34, 1–41.
Ntzoufras, I. (1999). ‘Aspects of Bayesian Model and Variable Selection Using
MCMC’, Unpublished Ph.D. Thesis, Department of Statistics, Athens Uni-
versity of Economics and Business, Athens, Greece.
Raftery, A.E. (1996). ‘Approximate Bayes Factors and Accounting for Model Un-
certainty in Generalized Linear Models’, Biometrika, 83, 251–266.
Spiegelhalter, D., Thomas, A., Best, N. and Gilks, W. (1996a). BUGS 0.5: Bayesian Inference Using Gibbs Sampling Manual, MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK. Available from www.mrc-bsu.cam.ac.uk/bugs/documentation/bugs05/manual05.html.
Spiegelhalter, D., Thomas, A., Best, N. and Gilks, W. (1996b). BUGS 0.5: Examples Volume 1, MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK. Available from www.mrc-bsu.cam.ac.uk/bugs/documentation/exampVol1/bugs.html.
Spiegelhalter, D., Thomas, A., Best, N. and Gilks, W. (1996c). BUGS 0.5: Examples Volume 2, MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK. Available from www.mrc-bsu.cam.ac.uk/bugs/documentation/exampVol2/vol 2.html.
