Gibbs Variable Selection Using BUGS
Ioannis Ntzoufras∗
Department of Business Administration
University of the Aegean, Chios, Greece
e-mail: [email protected]
Abstract
1 Introduction
The paper is organised into three sections in addition to this introductory one. Section 2 briefly describes the general Gibbs variable selection algorithm as introduced by Dellaportas et al. (2002), Section 3 provides detailed guidance for implementation in BUGS and, finally, Section 4 presents three illustrative examples.
2 Gibbs Variable Selection

Many statistical models may be represented naturally as (s, γ) ∈ S × {0, 1}^p, where the indicator vector γ identifies which of the p possible sets of covariates are present in the model and s denotes other structural properties of the model. For example, for a generalised linear model, s may describe the distribution, link function and variance function, and the linear predictor may be written as

η = Σ_{j=1}^{p} γ_j X_j β_j,    (1)
where X_j is the design matrix and β_j the parameter vector related to the jth term. In the following we restrict attention to variable selection aspects, assuming that s is known, and we concentrate on the estimation of the posterior distribution of γ.
We denote the likelihood of each model by f(y|β, γ) and the prior by f(β, γ) = f(β|γ)f(γ), where f(β|γ) is the prior of the parameter vector β conditional on the model structure γ and f(γ) is the prior of the corresponding model. Moreover, β can be partitioned into two vectors β_γ and β_\γ corresponding to the parameters of variables included in or excluded from the model. Under this approach the prior can be rewritten as

f(β, γ) = f(β_γ|γ) f(β_\γ|β_γ, γ) f(γ),
while, since we are using the linear predictor (1), the likelihood simplifies to f(y|β, γ) = f(y|β_γ, γ).
From the above it is clear that the components of the vector β_\γ do not affect the model likelihood, and hence the posterior distribution within each model γ is given by

f(β|γ, y) = f(β_γ|γ, y) × f(β_\γ|β_γ, γ),
where f(β_γ|γ, y) is the actual posterior of the parameters of model γ, and f(β_\γ|β_γ, γ, y) = f(β_\γ|β_γ, γ) is simply the conditional prior of the parameters not included in model γ. We may therefore interpret f(β_γ|γ) as the actual prior of the parameters of model γ, while the distribution f(β_\γ|β_γ, γ) may be called a 'pseudoprior', since the parameter vector β_\γ gains no information from the data and does not influence the actual posterior of the parameters of each model, f(β_γ|γ, y). Although this pseudoprior does not influence the posterior distributions of interest, it does influence the performance of the MCMC algorithm, and hence it should be specified with caution.
The sampling procedure is summarised by the following steps:

1. Sample the parameters included in the model from the full conditional posterior

f(β_γ|β_\γ, γ, y) ∝ f(y|β_γ, γ) f(β_γ|γ) f(β_\γ|β_γ, γ).    (2)

2. Sample the parameters excluded from the model from the pseudoprior

f(β_\γ|β_γ, γ, y) ∝ f(β_\γ|β_γ, γ).    (3)

3. Sample each term indicator γ_j from its full conditional distribution

f(γ_j|β, γ_\j, y) ∝ f(y|β, γ) f(β|γ) f(γ),    (4)

which is a Bernoulli distribution since γ_j ∈ {0, 1}.
The selection of priors and pseudopriors is a very important aspect of model selection. Here we briefly present the simplest approach, in which f(β|γ) is given by a product of independent prior and pseudoprior densities,

f(β|γ) = ∏_{j=1}^{p} f(β_j|γ_j).    (5)

In such a case, a usual and simple choice of f(β_j|γ_j) is a mixture of an actual prior density, used when the jth term is included in the model (γ_j = 1), and a pseudoprior density, used when it is excluded (γ_j = 0); see (8) in Section 3.2.
The prior proposed by Dellaportas and Forster (1999) for contingency tables is also adopted here for logistic regression models with categorical explanatory variables (see Dellaportas et al., 2000). Alternatively, for generalised linear models, Raftery (1996) proposed selecting the prior covariance matrix using elements of the data matrix multiplied by a hyperparameter, chosen in such a way that the effect of the prior distribution on the posterior odds is minimal.
When no restrictions are imposed on the model space, a common prior for the term indicators is γ_j ~ Bernoulli(1/2), whereas in other cases (for example, hierarchical or graphical log-linear models) f(γ_j|γ_\j) must depend on γ_\j; for more details see Chipman (1996) and Section 3.4.
Other Gibbs samplers for model selection have been proposed by George and McCulloch (1993), Carlin and Chib (1995) and Kuo and Mallick (1998). A detailed comparison and discussion of these methods is given by Dellaportas et al. (2000, 2002). Implementation of the Carlin and Chib methodology in BUGS is illustrated by Spiegelhalter et al. (1996c, page 47), while an additional simple example of Gibbs variable selection is provided by Dellaportas et al. (2000).
3 Implementation in BUGS

In this section we provide detailed guidance for implementing Gibbs variable selection using the BUGS software. It is divided into sub-sections involving the definition of the model likelihood f(y|β, γ), the specification of the prior distributions f(β|γ) and f(γ), and, finally, the direct calculation of posterior model probabilities using BUGS.
3.1 Definition of the Likelihood

The linear predictor of type (1), used in Gibbs variable selection and in the Kuo and Mallick sampler, can easily be incorporated in BUGS.
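A minimal sketch of the required code, consistent with the Appendix listings (here N and p denote the numbers of observations and terms, eta[i] is the linear predictor and z[i,j] a working matrix), is:

for (i in 1:N) {
   for (j in 1:p) {
      z[i,j] <- x[i,j]*b[j]*g[j];  # z_ij = gamma_j * x_ij * beta_j
   }
   eta[i] <- sum(z[i,]);           # linear predictor (1)
}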
where

• g[j] is the inclusion indicator for the jth element of γ,
• b[j] is the corresponding model coefficient β_j, and
• x[i,j] is the (i, j) element of the design matrix.
For the usual normal, binomial and Poisson models, the model formulations are given by lines of BUGS code of the following form.
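The lines below are a sketch assuming identity, logit and log links, respectively; the names y, n, p, lambda and tau are illustrative, with tau denoting the normal precision:

# normal model: identity link, precision tau
y[i] ~ dnorm(eta[i],tau)
# binomial model: logit link
y[i] ~ dbin(p[i],n[i])
logit(p[i]) <- eta[i]
# Poisson model: log link
y[i] ~ dpois(lambda[i])
log(lambda[i]) <- eta[i]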
3.2 Specification of the Prior f(β|γ)

When we use independent priors as given by (5) and each covariate parameter vector is univariate, the definition of the prior is straightforward. Our prior is a mixture of independent normal distributions,

f(β_j|γ_j) = γ_j N(0, Σ_j) + (1 − γ_j) N(μ̄_j, S_j),    (8)

where μ̄_j and S_j are the mean and variance used in the corresponding pseudoprior distribution, and Σ_j is the prior variance when the jth term is included in the model. In order to use (8) in BUGS we write
• bpriorm[j] <- (1-g[j])*mean[j], denoting m_j = (1 − γ_j) μ̄_j,
• tprior[j] <- g[j]*t[j]+(1-g[j])*pow(se[j],-2), denoting τ_j = γ_j Σ_j^{-1} + (1 − γ_j) S_j^{-1},
• b[j] ~ dnorm(bpriorm[j],tprior[j]),

for j = 1, 2, ..., p; where m_j and τ_j are the prior mean and precision for β_j depending on γ_j, and t[j], se[j], mean[j], bpriorm[j], tprior[j] are the BUGS variables for Σ_j^{-1}, S_j^{1/2}, μ̄_j, m_j and τ_j, respectively.
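Putting these pieces together, a sketch of the complete univariate GVS prior block, following the Appendix codes of Examples 2 and 3 (the Bernoulli prior on g[j] is discussed in Section 3.4), is:

for (j in 1:p) {
   bpriorm[j] <- (1-g[j])*mean[j];                 # pseudoprior mean when term excluded
   tprior[j] <- g[j]*t[j]+(1-g[j])*pow(se[j],-2);  # prior or pseudoprior precision
   b[j] ~ dnorm(bpriorm[j],tprior[j]);             # mixture prior (8)
   g[j] ~ dbern(0.5);                              # term indicator prior
}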
When we consider a categorical explanatory variable j with J > 2 categories, the corresponding parameter vector β_j will be multivariate with dimension d_j = J − 1. In such cases we denote by p and d (> p) the dimensions of γ and of the full parameter vector β, respectively. Therefore, we need one variable to facilitate the association between these two vectors. This vector is denoted by the BUGS variable pos. The pos vector, which has dimension equal to the dimension of β, takes values in 1, 2, ..., p and denotes that the kth element of the parameter vector β is associated with the binary indicator γ_{pos_k}, for all k = 1, 2, ..., d.
For illustration, let us consider an ANOVA model with two categorical variables X_1 and X_2 with 3 and 4 categories, respectively. Then the terms under consideration are X_0, X_1, X_2 and X_12, where X_0 denotes the constant term and X_12 the interaction between X_1 and X_2. The corresponding dimensions are d_{X_0} = 1, d_{X_1} = 2, d_{X_2} = 3 and d_{X_12} = d_{X_1} × d_{X_2} = 6. Then we set the pos vector equal to
pos <- c ( 1, 2,2, 3,3,3, 4,4,4,4,4,4 )
to state that the first parameter corresponds to the first term (X_0), parameters 2-3 correspond to the second term (X_1), parameters 4-6 correspond to the third term (X_2) and parameters 7-12 correspond to the fourth term (X_12). Finally, we use another vector called gtemp of dimension d, which is given by
gtemp[i] <- g[ pos[i] ]
for all i = 1, ..., d. The vector gtemp is used in the likelihood instead of the g vector. For details see Example 1 and the associated BUGS code in the Appendix.
Moreover, the definition of the prior distribution is more complicated when factors or terms with many parameters are considered. For example, a mixture of multivariate normal prior distributions as given by (5) can be expressed as a single multivariate normal distribution on the 'full' parameter vector β. Therefore we may write in BUGS
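the following sketch, which is based on the Appendix code of Example 1 (d is the dimension of the full parameter vector, and gtemp, se, t and mean are as defined above):

for (k in 1:d) {
   for (l in 1:d) {
      Tau[k,l] <- t[k,l]*gtemp[k]*gtemp[l]
                  + (1-gtemp[k]*gtemp[l])*equals(k,l)*pow(se[k],-2);
   }
   bpriorm[k] <- (1-gtemp[k])*mean[k];  # pseudoprior mean
}
b[] ~ dmnorm(bpriorm[],Tau[,]);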
where Tau[k,l] and t[k,l] contain the kth row, lth column elements of the precision matrices T and Σ^{-1}, respectively; that is, Tau[,] and t[,] are the BUGS matrices for T and Σ^{-1}. An illustration of the usage of such a prior distribution is given in Example 1.
3.3 SSVS and the Kuo and Mallick Sampler

SSVS and the Kuo and Mallick sampler can easily be applied with minor modifications of the above code. In SSVS the prior (8) is used with μ̄_j = 0 and S_j = Σ_j/k_j^2, where k_j^2 should be large enough that β_j is close to zero when γ_j = 0. For the selection of the prior parameters in SSVS, see the semiautomatic prior selection of George and McCulloch (1993). The above restriction can easily be applied in BUGS by

bpriorm[j] <- 0
tprior[j] <- t[j]*g[j]+(1-g[j])*t[j]*pow(k[j],2)
The Kuo and Mallick sampler uses a prior on β that does not depend on the model indicator γ. Therefore the specification of the prior is the same as in ordinary modelling with BUGS, while the likelihood is defined exactly as in the Gibbs variable selection setup of Section 3.1.
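For completeness, a sketch of the corresponding prior lines for the Kuo and Mallick sampler, with the prior mean and precision simply unlinked from g[j]:

bpriorm[j] <- 0    # prior mean does not depend on g[j]
tprior[j] <- t[j]  # prior precision does not depend on g[j]
b[j] ~ dnorm(bpriorm[j],tprior[j]);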
3.4 Specification of the Prior on the Model Space f(γ)

In order to apply any variable selection method in BUGS we need to define the prior probabilities f(γ). When we are vague about the models we may set f(γ) = 1/M, where M is the number of models under consideration. When the explanatory variables do not involve interactions (for example, in linear regression), the number of models under consideration is 2^p. In these situations the latent variables γ_j can be treated as a priori independent, and we therefore set in BUGS

g[j] ~ dbern(0.5)

for all j = 1, 2, ..., p. This prior results in f(γ) = 2^{-p} for all γ ∈ {0, 1}^p. When we are dealing with models with categorical explanatory variables and interaction terms, such as ANOVA or log-linear models, we usually want to restrict attention to hierarchical models. The conditional distributions f(γ_j|γ_\j) then need to be specified in such a way that f(γ) = 1/M when γ corresponds to a hierarchical model and f(γ) = 0 otherwise.
For example, in a two-way ANOVA we have three terms under consideration: the main effects A and B and the interaction AB. There are eight possible models, of which only five are hierarchical (constant, [A], [B], [A][B] and [AB]). Therefore, we wish to specify f(γ) = 0.20 for the above five models and f(γ) = 0 for the rest. This can be achieved by setting in BUGS
• g[3] ~ dbern(0.2), denoting that γ_AB ~ Bernoulli(0.2);
• pi <- g[3]+(1-g[3])*0.5, so that π = 1 when γ_AB = 1 and π = 0.5 otherwise;
• for (i in 1:2) { g[i] ~ dbern(pi) }, denoting that γ_A|γ_AB ~ Bernoulli(π) and γ_B|γ_AB ~ Bernoulli(π).
Simple calculations confirm that f(γ) = 0.2 for each of the five models under consideration; for example, the constant model has probability (1 − 0.2) × 0.5 × 0.5 = 0.2. For further relevant discussion and applications see Chipman (1996); for the implementation in BUGS see Examples 1 and 3.
3.5 Direct Calculation of Posterior Model Probabilities

In order to calculate the posterior model probabilities directly in BUGS, and to avoid saving a large MCMC output, we may use a vector-valued variable with dimension equal to the number of models. Using a simple coding such as mdl = 1 + Σ_{j=1}^{p} γ_j 2^{j−1}, we transform the vector γ into a unique model index (denoted by mdl), for which pmdl[mdl] = 1 and pmdl[j] = 0 for all j ≠ mdl. The above statements can be written in BUGS with the code

for (j in 1:p) { index[j] <- pow(2,j-1) }
mdl <- 1+inprod(g[ ], index[ ])
for (m in 1:M) { pmdl[m] <- equals(m,mdl) }

where M is the (constant) total number of models, 2^p.
Then, using the command stats(pmdl) in the BUGS environment (or in a cmd file), we can monitor the posterior model probabilities. This is feasible only when the number of models is limited, and it is therefore applicable only in fairly simple problems.
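For illustration, a hypothetical cmd file along these lines might read as follows; the file name model.bug is illustrative and the command set follows the BUGS 0.5 manual conventions (Spiegelhalter et al., 1996a):

compile("model.bug") # parse and compile the model
update(1000)         # burn-in iterations
monitor(pmdl)        # monitor the model indicator vector
update(10000)        # sampling iterations
stats(pmdl)          # print posterior model probabilities
q()                  # quit BUGS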
4 Examples
We briefly present the implementation of three illustrative examples. The first example is a 3 × 2 × 4 contingency table used to illustrate how to handle factors with more than two levels. Example 2 provides model selection details in a regression-type problem involving several different error distributions, while Example 3 is a simple logistic regression problem with random effects. In all examples posterior probabilities are presented, while the associated BUGS codes are provided in the Appendix. Additional details (for example, convergence plots) are omitted, since our aim is simply to illustrate how to use BUGS for variable selection.
4.1 Example 1: 3 × 2 × 4 Contingency Table
Table 1: Example 1 data: cross-classification of obesity (O), high blood pressure (H) and alcohol intake (A); see Knuiman and Speed (1988).

                          Alcohol Intake
Obesity    High BP      0    1-2    3-5    6+
Low        Yes          5      9      8    10
           No          40     36     33    24
Average    Yes          6      9     11    14
           No          33     23     35    30
High       Yes          9     12     19    19
           No          24     25     28    29
The priors placed on the term indicators (see the Appendix code) result in a prior probability equal to 1/9 for every hierarchical model and zero otherwise.

Results using both the pilot run pseudoprior and the automatic pseudoprior with k = 10 are summarised in Table 2. The data give 'strong' evidence in favour of the independence model [O][H][A].
Table 2: Example 1: posterior model probabilities (%) using BUGS.

                    Pseudopriors: k = 10         Pilot Run
Burn-in             1,000     10,000             1,000     10,000
Iterations          1,000     10 × 10,000        1,000     10 × 10,000
Models
[O][H][A]           62.80     68.87              65.20     67.80
[OH][A]             36.90     30.53              34.40     31.63
[O][HA]              0.20      0.40               0.10      0.43
[OH][HA]             0.10      0.20               0.30      0.14
Terms
γ_OH = 1            37.00     30.63              34.70     31.77
γ_HA = 1             0.30      0.20               0.40      0.57
4.2 Example 2: Stack Loss Data

This example involves the stack loss data analysed by Spiegelhalter et al. (1996b, page 27) using Gibbs sampling. The dataset features 21 daily responses of stack loss (y), the amount of ammonia escaping, with covariates the air flow (x_1), the temperature (x_2) and the acid concentration (x_3). Spiegelhalter et al. (1996b) consider regression models with four different error structures (normal, double exponential, logistic and Student t_4 distributions). They also consider the cases of ridge and simple independent regression models. We extend their work by applying Gibbs variable selection to all eight cases.
The full model is

y_i ~ D_i(μ_i, τ),   μ_i = β_0 + Σ_{j=1}^{3} γ_j β_j z_{ij},

where D_i(μ_i, τ) is the distribution of the errors with mean μ_i and variance τ^{-1}, here assumed to be normal, double exponential, logistic or t_4, and z_{ij} = (x_{ij} − x̄_j)/sd(x_j) are the standardised covariates. The ridge regression approach assumes the further restriction that the β_j for j = 1, 2, 3 are exchangeable (Lindley and Smith, 1972), and therefore β_j ~ N(0, φ^{-1}). We use 'non-informative' priors with prior precision equal to 10^{-3} for the independent regression, and for φ in ridge regression we use a gamma prior with both parameters equal to 10^{-3}. Since we do not wish to impose any restriction on the model space, we use the prior γ_j ~ Bernoulli(1/2) for j = 1, 2, 3, which results in a prior probability of 1/8 for every possible model. For the pilot run pseudoprior parameters we use the posterior values given by Spiegelhalter et al. (1996b).
Tables 3 and 4 provide the results for all eight distinct cases using pilot run pseudopriors. In all cases the air flow (z_1) has a posterior probability of inclusion higher than 99%. The temperature (z_2) also seems to be an important term, with posterior probability of inclusion varying from 39% to 96%. The last term (z_3), which measures the acid concentration, has low posterior probabilities of inclusion: less than 5% for the simple independence models and less than 20% for the 'ridge' regression models.
Table 3: Example 2: posterior model probabilities (%) for the independence regression models using pilot run pseudopriors.

Models            Normal    D.Exp.    Logistic      t_4
Constant            0.00      0.00        0.00     0.00
z_1                14.12     58.48       41.19    56.46
z_2                 0.56      0.01        0.02     0.00
z_1 + z_2          81.25     38.64       55.25    40.46
z_3                 0.00      0.00        0.00     0.00
z_1 + z_3           0.63      1.75        1.35     1.82
z_2 + z_3           0.05      0.00        0.00     0.00
z_1 + z_2 + z_3     3.39      1.11        2.18     1.26
Terms
γ_{z_1} = 1        99.30     99.98       99.97   100.00
γ_{z_2} = 1        84.90     39.76       57.45    41.72
γ_{z_3} = 1         4.30      2.86        3.53     3.08
4.3 Example 3: Seeds Data

The final example, the seeds dataset analysed by Spiegelhalter et al. (1996b), is a logistic regression with two categorical explanatory variables and random effects. The full model is written

y_ilk ~ Bin(n_ilk, p_ilk),   log( p_ilk/(1 − p_ilk) ) = m + a_i + b_l + ab_il + w_k,

for i, l = 1, 2 and k = 1, ..., 21; where y_ilk and n_ilk are the number of germinated seeds and the total number of seeds, respectively, for seed type i, root extract type l and plate k; and w_k is the random effect for the kth plate.
We use sum-to-zero constraints for both fixed and random effects. Following Dellaportas and Forster (1999), we use prior variance Σ = 4 × 2 = 8 for the fixed effects. The prior for the precision of the random effects is a gamma distribution with both parameters equal to 10^{-3}. The pseudoprior parameters were taken from a pilot chain of the saturated model. Ten models are under consideration. The prior term probabilities for the fixed effects are assigned as in the two-way ANOVA example of Section 3.4, while for the random effects term indicator we take γ_w ~ Bernoulli(0.5).
Table 5: Seeds Dataset: Posterior Model Probabilities Using BUGS (burn-in 10,000,
samples of 10 × 10, 000).
Table 5 provides the calculated posterior model probabilities. We used both pilot run proposals and the automatic pseudoprior with k = 10. Both chains gave, as expected, the same results: the type of root extract (B) is the only factor that influences the proportion of germinated seeds. The corresponding models with random and fixed effects have posterior probabilities equal to 51% and 32%, respectively. The marginal posterior probability of the random effects term is 61%, which is about 56% higher than the posterior probability of the fixed effects models.
5 Appendix: BUGS Codes
BUGS code and all associated data files are freely available in electronic form at the Journal of Statistical Software web site, www.jstatsoft.org/v07/i07/, or by electronic mail request.
5.1 Example 1
model log-linear;
#
# 3x2x4 LOG-LINEAR MODEL SELECTION WITH BUGS (GVS)
# (c) OCTOBER 1996
# (c) REVISED OCTOBER 1997
#
const
terms=8, # number of terms
N = 24; # number of Poisson cells
var
include, # conditional prior probability for gi
pmdl[9], # model indicator vector
mdl, # code of model
b[N], # model coefficients
mean[N], # proposal mean used in pseudoprior
se[N], # proposal standard deviation used in
# pseudoprior
bpriorm[N], # prior mean for b depending on g
Tau[N,N], # model coefficients precision
tprior[N,N],# prior value for Tau when all terms
# are included in model
x[N,N], # design matrix
z[N,N], # matrix with z_ij=x_ij b_j g_j, used in
# likelihood
n[N], # Poisson cells
pos[N], # position of each parameter
lambda[N], # Poisson mean for each cell
gtemp[N], # temporary term indicator vector
g[terms]; # term indicator vector
data pos,n in "ex2.dat", x in "ex2des.dat",
mean, se in "prop2.dat", tprior in "cov.dat";
inits in "ex2.in";
{
#
# associate g[i] with coefficients.
#
for (i in 1:N) {
gtemp[i]<-g[pos[i]];
}
#
# calculation of the z matrix used in likelihood
#
for (i in 1:N) {
for (j in 1:N) {
z[i,j]<-x[i,j]*b[j]*gtemp[j]
}
}
#
# model configuration
for (i in 1:N) {
log(lambda[i])<-sum(z[i,])
n[i]~dpois(lambda[i]);
}
# defining model code:
# 0 for independence model [A][B][C], 1 for [AB][C],
# 2 for [AC][B], 3 for [AB][AC], 4 for [BC][A],
# 5 for [AB][BC], 6 for [AC][BC], 7 for [AB][AC][BC],
# 15 for [ABC].
#
#
mdl<-g[5]+2*g[6]+4*g[7]+8*g[8];
for (i in 0:7) {
pmdl[i+1]<-equals(mdl,i)
}
pmdl[9]<-equals(mdl,15)
#
# Prior for b model coefficient
# Mixture normal depending on current status of g[i]
#
for (i in 1:N) { for (j in 1:N) {
#
# GVS using se,mean from pilot run
# ********************************
#
Tau[i,j]<-0+tprior[i,j]*(gtemp[i]*gtemp[j])+
(1-gtemp[i]*gtemp[j])*equals(i,j)/(se[i]*se[i]);
#
# Automatic proposal using prior similar to SSVS
# with k=10
# ************************************************
# Tau[i,j]<-tprior[i,j]*pow(100,1-gtemp[i]*gtemp[j]);
#
# Kuo and Mallick proposal is independent of g[i]
# [tau[i]=1/2 and bpriorm[i]=0]
# ***********************************************
#
#
# Tau[i,j]<-tprior[i,j];
#
}
#
# GVS PRIOR M FROM PILOT RUN
# **************************
bpriorm[i]<-mean[i]*(1-gtemp[i]);
#
# PRIOR MEAN THAT DOES NOT DEPEND ON G.
# *************************************
# bpriorm[i]<-0.0;
}
b[]~dmnorm(bpriorm[],Tau[,]);
#
# defining prior information for gi to
# allow only hierarchical models with equal probability.
# We also ignore models nested to the model of
# independence [A][B][C] since we are interested in
# associations between factors.
g[8]~dbern(0.1111111);
include<-(1-g[8])*0.5+g[8]*1.0;
g[7]~dbern(include);
g[6]~dbern(include);
g[5]~dbern(include);
for (i in 1:4) {
g[i]~dbern(1.0);
}
}
5.2 Example 2
model stacks;
#
# LINEAR REGRESSION VARIABLE SELECTION WITH BUGS (GVS)
# BUGS EXAMPLE: STACKS, see BUGS examples vol.1
#
# (c) OCTOBER 1997
#
const
p = 3, # number of covariates
N = 21, # number of observations
models=8, # number of models under consideration (2^3)
PI = 3.141593;
var
x[N,p], # raw covariates
z[N,p] , # standardised covariates
Y[N],mu[N], # data and expectations
stres[N], # standardised residuals
outlier[N], # indicator if |stan res| > 2.5
beta0,beta[p], # standardised intercept, coefficients
b0,b[p], # unstandardised intercept, coefficients
phi, # prior precision of standardised coef.
tau,sigma,d, # precision, sd and d.f. of t distribution
g[p], # variable indicators
mdl, # Model index
pmdl[models], # Vector with model indicators
mean[p],se[p], # pseudoprior mean and se error
bprior[p], # prior mean conditional on the model
tprior[p]; # prior precision conditional on the model
data Y,x in "STACKS.DAT",
# files with proposed values
mean,se in "pnorm.dat"; # Normal distribution
#mean,se in "pdexp.dat"; # Double exponential distribution
#mean,se in "plogist.dat";# Logistic distribution
#mean,se in "pt4.dat"; # Student(4) distribution
inits in "STACKS.IN";
{
# Standardise x’s and coefficients
for (j in 1:p) {
b[j] <- beta[j]/sd(x[,j]) ;
for (i in 1:N) {
z[i,j] <- (x[i,j] - mean(x[,j]))/sd(x[,j]) ;
}
}
b0<-beta0-b[1]*mean(x[,1])-b[2]*mean(x[,2])-b[3]*mean(x[,3]);
# Model
d <- 4; # degrees of freedom for t
for (i in 1:N) {
#
# Normal Distribution
# -------------------
Y[i] ~ dnorm(mu[i],tau);
#
# Double Exponential Distribution
# -------------------------------
# Y[i] ~ ddexp(mu[i],tau);
#
# Logistic Distribution
# ----------------------
# Y[i] ~ dlogis(mu[i],tau);
#
# Student t4 Distribution
# -----------------------
# Y[i] ~ dt(mu[i],tau,d);
#
mu[i] <- beta0 + g[1]*beta[1]*z[i,1]+g[2]*beta[2]*z[i,2]
+ g[3]*beta[3]*z[i,3];
stres[i] <- (Y[i] - mu[i])/sigma;
#
# if standardised residual is greater than 2.5 then outlier
outlier[i]<-step(stres[i] -2.5) + step(-(stres[i]+2.5) );
}
#
# Defining Model Code
mdl<- 1+g[1]*1+g[2]*2+g[3]*4
#
# defining vector with model indicators
for (j in 1:models){
pmdl[j]<-equals(mdl,j);}
# Priors
beta0 ~ dnorm(0,.00001);
for (j in 1:p) {
#
# ******** GVS PRIORS FOR INDEPENDENCE REGRESSION ********
#
# GVS priors with proposals from pilot run
# bprior[j]<-(1-g[j])*mean[j];
# tprior[j] <-g[j]*0.001+(1-g[j])/(se[j]*se[j]);
#
# GVS priors with proposals a mixture of Normals(0,c^2t^2)
bprior[j]<-0.0;
tprior[j] <-pow(100,1-g[j])*0.001;
#
# ******** GVS PRIORS FOR RIDGE REGRESSION ********
#
# GVS priors with proposals from pilot run
# bprior[j]<-(1-g[j])*mean[j];
# tprior[j] <-g[j]*phi+(1-g[j])/(se[j]*se[j]);
#
# GVS priors with proposals a mixture of Normals(0,c^2t^2)
# bprior[j]<-0.0;
# tprior[j] <-pow(100,1-g[j])*phi;
beta[j] ~ dnorm(bprior[j],tprior[j]); # coefs independent
}
tau ~ dgamma(1.0E-3,1.0E-3);
#
# phi ~ dgamma(1.0E-3,1.0E-3);
#
# When pilot-run-based pseudopriors are used, BUGS was unable
# to select an updating method. Therefore we use an upper limit,
# which makes BUGS use Metropolis instead of Gibbs sampling.
#
# phi ~ dgamma(1.0E-3,1.0E-3)I(0,10000);
# standard deviation of error distribution
sigma <- sqrt(1/tau); # normal errors
# sigma <- sqrt(2)/tau; # double exponential errors
# sigma <- sqrt(pow(PI,2)/3)/tau ; # logistic errors
# sigma <- sqrt(d/(tau*(d-2))); # errors of t with d d.f.
#
#
# Priors for variable indicators
for (j in 1:p) { g[j]~ dbern(0.5);}
}
5.3 Example 3
model seedszrogvs;
#
# LOGISTIC REGRESSION VARIABLE AND
# RANDOM EFFECTS SELECTION WITH BUGS (GVS)
#
# BUGS EXAMPLE: SEEDS, see BUGS examples vol.1
#
# (c) OCTOBER 1997
#
const
terms=4, # Number of terms under consideration
models=16,# number of models under consideration 2^4
N = 21; # number of samples
var
alpha0, alpha1, alpha2, alpha12, # model coefficients
tau, sigma, # precision and sd of random effects (tau=1/sigma^2)
x1[N], x2[N], # Design Column for factor a1 and a2
# here we used the STZ constraints
p[N], # Success probability for Binomial
n[N], # Total number of trials for Binomial
r[N], # Binomial data
b[N], # Random effects (standardised)
c[N], # Random effects c[i] (unstandardised)
include, # conditional model probability for
# main effects
g[terms], # terms indicator vector
mdl, # model index
pmdl[models], # model indicator
mean[terms-1], # proposal mean
se[terms-1], # proposal se
bprior[terms-1],# prior mean for model coefficients
tprior[terms-1];# prior precision for model coefficients
data r,n,x1,x2 in "seeds.dat", mean,se in "prop6.dat";
inits in "seeds.in";
{
alpha0 ~ dnorm(0.0,1.0E-6); # intercept
for (j in 1:(terms-1)) {
# ******** GVS PRIORS ***********
#
# GVS priors with proposals from pilot run
bprior[j]<-(1-g[j])*mean[j];
tprior[j] <-g[j]/8+(1-g[j])/(se[j]*se[j]);
#
# GVS priors with proposals a mixture of Normals(0,c^2t^2)
# bprior[j]<-0.0;
# tprior[j] <-pow(100,1-g[j])/8;
}
#
#
alpha1 ~ dnorm(bprior[1],tprior[1]); # seed coeff
alpha2 ~ dnorm(bprior[2],tprior[2]); # extract coeff
alpha12 ~ dnorm(bprior[3],tprior[3]);
tau ~ dgamma(1.0E-3,1.0E-3); # 1/sigma^2
for (i in 1:N) {
c[i] ~ dnorm(0.0,tau);
b[i] <- c[i] - mean(c[]); # make sure b’s add to zero
logit(p[i]) <-alpha0+g[1]*alpha1*x1[i]+g[2]*alpha2*x2[i]
+g[3]*alpha12*x1[i]*x2[i]+g[4]*b[i];
r[i] ~ dbin(p[i],n[i]);
}
sigma <- 1.0/sqrt(tau);
#
# Defining Model Code
mdl<- 1+g[1]*1+g[2]*2+g[3]*4+g[4]*8
#
# defining vector with model indicators
for (j in 1:models){
pmdl[j]<-equals(mdl,j);}
# Priors for variable indicators
g[4]~ dbern(0.50);
g[3]~ dbern(0.20);
include<-g[3]+(1-g[3])*0.5
g[2]~ dbern(include);
g[1]~ dbern(include);
}
References
Carlin, B.P. and Chib, S. (1995). 'Bayesian Model Choice via Markov Chain Monte
Carlo Methods', Journal of the Royal Statistical Society B, 57, 473–484.
Chipman, H. (1996). ‘Bayesian Variable Selection with Related Predictors’, Cana-
dian Journal of Statistics, 24, 17–36.
Clyde, M., DeSimone, H. and Parmigiani, G. (1996). ‘Prediction via Orthogonalized
Model Mixing’, Journal of the American Statistical Association, 91, 1197–1208.
Dellaportas, P. and Forster, J.J. (1999). ‘Markov Chain Monte Carlo Model Deter-
mination for Hierarchical and Graphical Log-linear Models’, Biometrika, 86,
615–633.
Dellaportas, P., Forster, J.J. and Ntzoufras, I. (2000). ‘Bayesian Variable Selection
Using the Gibbs Sampler’, Generalized Linear Models: A Bayesian Perspective
(D. K. Dey, S. Ghosh, and B. Mallick, eds.). New York: Marcel Dekker, 271–
286.
Dellaportas, P., Forster, J.J. and Ntzoufras, I. (2002). ‘On Bayesian Model and
Variable Selection Using MCMC’, Statistics and Computing, 12, 27–36.
George, E.I. and McCulloch, R.E. (1993). ‘Variable Selection via Gibbs Sampling’,
Journal of the American Statistical Association, 88, 881–889.
Green, P. (1995). ‘Reversible Jump Markov Chain Monte Carlo Computation and
Bayesian Model Determination’, Biometrika, 82, 711–732.
Kuo, L. and Mallick, B. (1998). ‘Variable Selection for Regression Models’,
Sankhya B, 60, 65–81.
Knuiman, M.W. and Speed, T.P. (1988). ‘Incorporating Prior Information Into the
Analysis of Contingency Tables’, Biometrics, 44, 1061–1071.
Lindley, D.V. and Smith, A.F.M. (1972). ‘Bayes Estimates for the Linear Model’
(with discussion). Journal of the Royal Statistical Society B, 34, 1–41.
Ntzoufras, I. (1999). ‘Aspects of Bayesian Model and Variable Selection Using
MCMC’, Unpublished Ph.D. Thesis, Department of Statistics, Athens Uni-
versity of Economics and Business, Athens, Greece.
Raftery, A.E. (1996). ‘Approximate Bayes Factors and Accounting for Model Un-
certainty in Generalized Linear Models’, Biometrika, 83, 251–266.
Spiegelhalter, D., Thomas, A., Best, N. and Gilks, W. (1996a). BUGS 0.5:
Bayesian Inference Using Gibbs Sampling Manual, MRC Biostatistics
Unit, Institute of Public Health, Cambridge, UK. Available from
www.mrc-bsu.cam.ac.uk/bugs/documentation/bugs05/manual05.html.
Spiegelhalter, D., Thomas, A., Best, N. and Gilks, W. (1996b). BUGS 0.5:
Examples Volume 1, MRC Biostatistics Unit, Institute of Public Health,
Cambridge, UK. Available from
www.mrc-bsu.cam.ac.uk/bugs/documentation/exampVol1/bugs.html.
Spiegelhalter, D., Thomas, A., Best, N. and Gilks, W. (1996c). BUGS 0.5:
Examples Volume 2, MRC Biostatistics Unit, Institute of Public Health,
Cambridge, UK. Available from
www.mrc-bsu.cam.ac.uk/bugs/documentation/exampVol2/vol_2.html.