hdbayes: Bayesian Analysis of Generalized Linear Models with Historical Data

The goal of hdbayes is to make it easier for users to conduct Bayesian analyses that leverage historical data.

Installation

Using the hdbayes package requires installing the R package cmdstanr (not available on CRAN) and the command-line interface to Stan, CmdStan. You may follow the instructions in Getting started with CmdStanR to install both.

You can install the released version of hdbayes from CRAN with:

install.packages("hdbayes", type = "source")

Please note that it is important to set type = "source". Otherwise, the ‘CmdStan’ models in the package may not be compiled during installation.

Alternatively, you can install the development version from GitHub with:

remotes::install_github("ethan-alt/hdbayes")

Example

We now demonstrate how to use hdbayes via a simple simulation study.

Setting up MCMC parameters

We begin by setting up parameters for the MCMC: we detect the number of cores on our machine and set the number of chains, the warmup iterations per chain, and the number of post-warmup samples per chain.

## obtain number of cores
ncores        = max(1, parallel::detectCores() - 1)
chains        = 4      ## number of Markov chains to run
iter_warmup   = 1000   ## warmup per chain for MCMC sampling
iter_sampling = 2000   ## number of samples post warmup per chain
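
The posterior summaries shown later use summarise_draws() from the posterior package and the pipe, mutate(), across(), and select() verbs from dplyr. The following library calls are an assumption about the setup (the packages may be attached elsewhere in practice), but they keep the later snippets self-contained.

library(hdbayes)
library(posterior)   ## summarise_draws()
library(dplyr)       ## %>%, mutate(), across(), select()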

Simulating logistic regression data

Then we simulate some logistic regression data. For simplicity, we generate one current and one historical data set. Note that the package allows for using multiple historical data sets. Let the current data set be denoted by D = \{ (y_i, x_i), i = 1, \ldots, n \}, where n denotes the sample size of the current data. Suppose we have H historical data sets, and let the h^{th} historical data set, h \in \{1, \ldots, H\}, be denoted by D_{0h} = \{ (y_{0hi}, x_{0hi}), i = 1, \ldots, n_{0h} \}, where n_{0h} denotes the sample size of the h^{th} historical data set. Let D_0 = \{ D_{01}, \ldots, D_{0H} \} denote all the historical data.

## simulate logistic regression data
set.seed(391)
n  = 200
n0 = 100
N  = n + n0

X    = cbind(1, 'z' = rbinom(N, 1, 0.5), 'x' = rnorm(N, mean = 1, sd = 1) )
beta = c(1, 0.5, -1)
mean = binomial('logit')$linkinv(X %*% beta)
y    = rbinom(N, 1, mean)

## create current and historical data sets
data      = data.frame('y' = y, 'z' = X[, 'z'], 'x' = X[, 'x'])
histdata  = data[1:n0, ]
data      = data[-(1:n0), ]
data.list = list(data, histdata)

Frequentist analysis of the data sets

We use the stats::glm function to conduct a frequentist GLM analysis of each data set:

formula = y ~ z + x
family  = binomial('logit')
fit.mle.cur  = glm(formula, family, data)
fit.mle.hist = glm(formula, family, histdata)
pars         = names(coefficients(fit.mle.cur))
summary(fit.mle.cur)
#> 
#> Call:
#> glm(formula = formula, family = family, data = data)
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)   0.7600     0.2744   2.769  0.00562 ** 
#> z             0.6769     0.3078   2.199  0.02787 *  
#> x            -0.7500     0.1814  -4.134 3.56e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 273.33  on 199  degrees of freedom
#> Residual deviance: 250.60  on 197  degrees of freedom
#> AIC: 256.6
#> 
#> Number of Fisher Scoring iterations: 4

Bayesian analysis methods

We now utilize the functions in this package.

Bayesian hierarchical model

The Bayesian hierarchical model (BHM) may be expressed hierarchically as

\begin{align*}
y_i | x_i, \beta &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right), \quad i = 1, \ldots, n \\
y_{0hi} | x_{0hi}, \beta_{0h} &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_{0hi}'\beta_{0h}) \right), \quad h = 1, \ldots, H, \; i = 1, \ldots, n_{0h} \\
\beta, \beta_{01}, \ldots, \beta_{0H} &\sim N_p(\mu, \Sigma), \text{ where } \Sigma = \text{diag}(\sigma_1^2, \ldots, \sigma_p^2) \\
\mu &\sim N_p(\mu_0, \Sigma_{0}), \text{ where } \Sigma_{0} \text{ is a diagonal matrix} \\
\sigma_j &\sim N^{+}(\nu_{0,j}, \psi_{0,j}^2), \quad j = 1, \ldots, p
\end{align*}

where \beta is the vector of regression coefficients for the current data set, \beta_{0h} is the vector of regression coefficients for historical data set D_{0h}, and \mu is the common prior mean of \beta and the \beta_{0h}'s. The mean \mu is treated as random with a normal hyperprior having mean \mu_0 and a diagonal covariance matrix \Sigma_{0}. \Sigma is also treated as random and is assumed to be a diagonal matrix, with half-normal hyperpriors elicited on the \sigma_j's.

The defaults in hdbayes are

  • \mu_0 = \textbf{0}_p, where \textbf{0}_p denotes a p-dimensional vector of 0s
  • \Sigma_{0} = 100 \times I_p
  • \nu_{0,j} = 0 for j = 1, \ldots, p
  • \psi_{0,j} = 1 for j = 1, \ldots, p

Here, p is the number of predictors (including the intercept, if applicable).

We fit this model as follows:

## Bayesian Hierarchical Model
fit.bhm = glm.bhm(
  formula = formula, family = family, data.list = data.list,
  iter_warmup = iter_warmup, iter_sampling = iter_sampling, 
  chains = chains, adapt_delta = 0.98,
  parallel_chains = ncores,
  refresh = 0
)
#> Running MCMC with 4 chains, at most 15 in parallel...
#> 
#> Chain 2 finished in 7.9 seconds.
#> Chain 1 finished in 8.6 seconds.
#> Chain 3 finished in 8.9 seconds.
#> Chain 4 finished in 9.4 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 8.7 seconds.
#> Total execution time: 9.6 seconds.

suppressWarnings(
  fit.bhm[, 2:7] %>% 
    summarise_draws() %>% 
    mutate(across(where(is.numeric), round, 3))
)
#> # A tibble: 6 × 10
#>   variable         mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>           <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept)     0.842  0.84  0.246 0.246  0.438  1.25      1    9912.    7113.
#> 2 z               0.619  0.615 0.286 0.283  0.151  1.1       1    8977.    6988.
#> 3 x              -0.788 -0.786 0.166 0.159 -1.07  -0.519     1    9548.    6723.
#> 4 (Intercept)_h…  0.937  0.926 0.302 0.293  0.458  1.44      1    8523.    6766.
#> 5 z_hist_1        0.46   0.461 0.368 0.362 -0.159  1.05      1    8395.    6735.
#> 6 x_hist_1       -0.842 -0.836 0.215 0.207 -1.21  -0.497     1    8746.    7471.

Commensurate prior

The commensurate prior assumes the following hierarchical model

\begin{align*}
y_i | x_i, \beta &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right) \\
y_{0hi} | x_{0hi}, \beta_{0} &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_{0hi}'\beta_{0}) \right) \\
\beta_{0} &\sim N_p(\mu_0, \Sigma_{0}), \text{ where } \Sigma_{0} \text{ is a diagonal matrix} \\
\beta_j &\sim N_1\left( \beta_{0j}, \tau_j^{-1} \right), \quad j = 1, \ldots, p \\
\tau_j &\overset{\text{i.i.d.}}{\sim} p_{\text{spike}} \cdot N^{+}(\mu_{\text{spike}}, \sigma_{\text{spike}}^2) + (1 - p_{\text{spike}}) \cdot N^{+}(\mu_{\text{slab}}, \sigma_{\text{slab}}^2), \quad j = 1, \ldots, p
\end{align*}

where \beta = (\beta_1, \ldots, \beta_p)' and \beta_{0} = (\beta_{01}, \ldots, \beta_{0p})'. The commensurability parameters (i.e., the \tau_j's) are treated as random with a spike-and-slab prior, specified as a mixture of two half-normal priors; a small simulation sketch of this prior follows the list of defaults below. The defaults in hdbayes are

  • \mu_0 = \textbf{0}_p
  • \Sigma_{0} = 100 \times I_p
  • p_{\text{spike}}  = 0.1
  • \mu_{\text{spike}} = 200
  • \sigma_{\text{spike}} = 0.1
  • \mu_{\text{slab}} = 0
  • \sigma_{\text{slab}} = 5
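
To give a sense of what the default spike-and-slab prior on each \tau_j looks like, here is a minimal base-R simulation sketch under those defaults (the helper rhalfnorm is illustrative and not part of hdbayes):

## draw from N^+(mu, sd^2), i.e., a normal truncated to (0, Inf), via simple rejection
rhalfnorm = function(n, mu, sd) {
  out = numeric(0)
  while (length(out) < n) {
    x   = rnorm(n, mu, sd)
    out = c(out, x[x > 0])
  }
  out[1:n]
}

p.spike  = 0.1
is.spike = runif(10000) < p.spike
tau      = ifelse(is.spike,
                  rhalfnorm(10000, mu = 200, sd = 0.1),  ## spike: large precision => beta_j tightly centered at beta_{0j}
                  rhalfnorm(10000, mu = 0,   sd = 5))    ## slab: small precision => weak borrowing
summary(tau)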

This method can be fit as follows:

fit.commensurate = glm.commensurate(
  formula = formula, family = family, data.list = data.list,
  p.spike = 0.1,
  iter_warmup = iter_warmup, iter_sampling = iter_sampling, 
  chains = chains, parallel_chains = ncores,
  refresh = 0
)
#> Running MCMC with 4 chains, at most 15 in parallel...
#> 
#> Chain 1 finished in 1.2 seconds.
#> Chain 2 finished in 1.2 seconds.
#> Chain 3 finished in 1.2 seconds.
#> Chain 4 finished in 1.2 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.2 seconds.
#> Total execution time: 1.4 seconds.

suppressWarnings(
  fit.commensurate[, 2:10] %>% 
    summarise_draws() %>% 
    mutate(across(where(is.numeric), round, 3))
)
#> # A tibble: 9 × 10
#>   variable         mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>           <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept)     0.838  0.837 0.251 0.251  0.435  1.26   1       5711.    5176.
#> 2 z               0.625  0.624 0.28  0.285  0.168  1.08   1       6389.    5850.
#> 3 x              -0.786 -0.784 0.17  0.167 -1.07  -0.509  1.00    6249.    5381.
#> 4 (Intercept)_h…  0.942  0.933 0.306 0.301  0.443  1.46   1       5755.    5623.
#> 5 z_hist          0.468  0.471 0.357 0.351 -0.124  1.04   1       6387.    5579.
#> 6 x_hist         -0.854 -0.849 0.229 0.226 -1.24  -0.486  1       6247.    5721.
#> 7 comm_prec[1]    4.73   4.16  3.06  3.03   0.724 10.5    1       5532.    3831.
#> 8 comm_prec[2]    4.62   4.11  2.98  2.94   0.736 10.2    1       5644.    3474.
#> 9 comm_prec[3]    4.93   4.42  3.08  3.08   0.834 10.7    1       5757.    3410.

Robust meta-analytic predictive (MAP) prior

The Robust MAP prior is a generalization of the Bayesian Hierarchical Model (BHM), and takes the form

\begin{align*}
y_i | x_i, \beta &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right) \\
y_{0hi} | x_{0hi}, \beta_{0h} &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_{0hi}'\beta_{0h}) \right) \\
\beta_{0h} &\sim N_p(\mu, \Sigma), \text{ where } \Sigma = \text{diag}(\sigma_1^2, \ldots, \sigma_p^2) \\
\beta &\sim w \cdot N_p(\mu, \Sigma) + (1 - w) \cdot N_p(\mu_v, \Sigma_v) \\
\mu &\sim N_p(\mu_0, \Sigma_{0}), \text{ where } \Sigma_{0} \text{ is a diagonal matrix} \\
\sigma_j &\sim N^{+}(\nu_{0,j}, \psi_{0,j}^2), \quad j = 1, \ldots, p
\end{align*}

where w \in (0,1) controls the level of borrowing from the historical data. Note that when w = 1, the robust MAP prior effectively becomes the BHM. The defaults are the same as in the BHM, except that the default value for w is 0.1 and the default vague/non-informative prior N_p(\mu_v, \Sigma_v) is specified as:

  • \mu_v = \textbf{0}_p
  • \Sigma_v = 100 \times I_p

The posterior samples under the Robust MAP prior are obtained using the marginal likelihoods of the vague and meta-analytic predictive (MAP) priors. Specifically, note that the posterior density under the Robust MAP prior can be written as

\begin{align*}
p_{\text{RMAP}}(\beta | D, D_0, w) &= \frac{L(\beta | D) \left[ w \, \pi_I(\beta | D_0) + (1 - w) \, \pi_V(\beta) \right]}{\int L(\beta^* | D) \left[ w \, \pi_I( \beta^* | D_0) + (1 - w) \, \pi_V( \beta^* ) \right] d\beta^*} \\
&= \tilde{w} \, p_I(\beta | D, D_0) + (1 - \tilde{w}) \, p_V(\beta | D),
\end{align*}

where p_I(\beta | D, D_0) = L(\beta | D) \pi_I(\beta | D_0) / Z_I(D, D_0) is the posterior density under the prior induced by the BHM (referred to as the meta-analytic predictive (MAP) prior), p_V(\beta | D) = L(\beta | D) \pi_V(\beta) / Z_V(D) is the posterior density under the vague prior, and

\tilde{w} = \frac{w \, Z_I(D, D_0)}{w \, Z_I(D, D_0) + (1 - w) \, Z_V(D)}

is the updated mixture weight. The normalizing constants Z_I(D, D_0) and Z_V(D) are estimated via the R package bridgesampling in hdbayes.
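
To make the mixture-weight update concrete, here is a minimal base-R sketch; the function update_mixture_weight is illustrative (not part of hdbayes), and the log normalizing constants in the call are placeholder values rather than quantities computed from this example:

## compute the updated mixture weight on the log scale for numerical stability;
## logZ.I and logZ.V are log marginal likelihoods under the MAP and vague priors
## (e.g., as estimated via bridgesampling)
update_mixture_weight = function(w, logZ.I, logZ.V) {
  a = log(w) + logZ.I       ## log( w * Z_I(D, D_0) )
  b = log(1 - w) + logZ.V   ## log( (1 - w) * Z_V(D) )
  m = max(a, b)             ## log-sum-exp trick
  exp(a - (m + log(exp(a - m) + exp(b - m))))
}

update_mixture_weight(w = 0.1, logZ.I = -120.3, logZ.V = -125.8)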

res.rmap = glm.rmap(
  formula = formula, family = family, data.list = data.list,
  w = 0.1,
  bridge.args = list(silent = T),
  iter_warmup = iter_warmup, iter_sampling = iter_sampling, 
  chains = chains, parallel_chains = ncores, adapt_delta = 0.98,
  refresh = 0
)
#> Running MCMC with 4 chains, at most 15 in parallel...
#> 
#> Chain 2 finished in 7.7 seconds.
#> Chain 4 finished in 9.2 seconds.
#> Chain 3 finished in 9.3 seconds.
#> Chain 1 finished in 11.6 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 9.4 seconds.
#> Total execution time: 11.7 seconds.
#> Warning: 3 of 8000 (0.0%) transitions ended with a divergence.
#> See https://mc-stan.org/misc/warnings for details.
#> Running MCMC with 4 chains, at most 15 in parallel...
#> 
#> Chain 3 finished in 3.5 seconds.
#> Chain 2 finished in 4.1 seconds.
#> Chain 4 finished in 4.1 seconds.
#> Chain 1 finished in 4.3 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 4.0 seconds.
#> Total execution time: 4.5 seconds.
#> 
#> Running MCMC with 4 chains, at most 15 in parallel...
#> 
#> Chain 1 finished in 0.8 seconds.
#> Chain 2 finished in 0.8 seconds.
#> Chain 3 finished in 0.8 seconds.
#> Chain 4 finished in 0.9 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.8 seconds.
#> Total execution time: 1.1 seconds.
fit.rmap = res.rmap$post.samples
fit.rmap[, -1] %>% 
    summarise_draws() %>% 
    mutate(across(where(is.numeric), round, 3))
#> # A tibble: 3 × 10
#>   variable      mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>        <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept)  0.849  0.85  0.253 0.251  0.438  1.26      1    4111.    3289.
#> 2 z            0.617  0.616 0.28  0.28   0.161  1.08      1    3921.    3705.
#> 3 x           -0.791 -0.787 0.169 0.17  -1.08  -0.521     1    4315.    3679.

Power prior

With H historical data sets, the power prior takes the form

\begin{align*}
&y_i | x_i, \beta \sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right), \quad i = 1, \ldots, n \\
&\pi_{\text{PP}}(\beta | D_0, a_{0h}) \propto \pi_0(\beta) \prod_{h=1}^H L(\beta | D_{0h})^{a_{0h}}
\end{align*}

where L(\beta | D) is the likelihood of the GLM based on data set D, a_{0h} \in (0,1) is a fixed hyperparameter controlling the effective sample size contributed by historical data set D_{0h}, and \pi_0(\beta) is an “initial prior” for \beta.

The default in hdbayes is a (non-informative) normal initial prior on \beta:

\beta \sim N_p(\textbf{0}_p, 100 \times I_p)
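
To make the role of a_{0h} concrete, below is a minimal sketch of the (unnormalized) log power prior kernel for this logistic example with one historical data set, using the default N_p(\textbf{0}_p, 100 \times I_p) initial prior; the function log_pp_kernel is illustrative and not part of hdbayes.

## unnormalized log power prior: a0 discounts the historical log-likelihood
log_pp_kernel = function(beta, a0, X0, y0, prior.sd = 10) {
  eta       = drop(X0 %*% beta)                            ## linear predictor
  loglik0   = sum(dbinom(y0, 1, plogis(eta), log = TRUE))  ## historical log-likelihood
  logprior0 = sum(dnorm(beta, 0, prior.sd, log = TRUE))    ## initial prior (sd = 10 <=> var = 100)
  a0 * loglik0 + logprior0
}

## evaluate at the historical MLE with a0 = 0.5 (illustration only)
X0 = model.matrix(formula, histdata)
log_pp_kernel(coef(fit.mle.hist), a0 = 0.5, X0 = X0, y0 = histdata$y)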

With H = 1, the power prior (with a_{01} = 0.5) may be fit as follows:

fit.pp = glm.pp(
  formula = formula, family = family, data.list = data.list,
  a0.vals = 0.5,
  iter_warmup = iter_warmup, iter_sampling = iter_sampling, 
  chains = chains, parallel_chains = ncores,
  refresh = 0
)
#> Running MCMC with 4 chains, at most 15 in parallel...
#> 
#> Chain 1 finished in 0.7 seconds.
#> Chain 2 finished in 0.8 seconds.
#> Chain 3 finished in 0.8 seconds.
#> Chain 4 finished in 0.8 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.8 seconds.
#> Total execution time: 1.0 seconds.

fit.pp[, -1] %>% 
    summarise_draws() %>% 
    mutate(across(where(is.numeric), round, 3))
#> # A tibble: 3 × 10
#>   variable      mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>        <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept)  0.834  0.828 0.251 0.249  0.43   1.26   1       3788.    3746.
#> 2 z            0.613  0.616 0.28  0.28   0.155  1.07   1.00    4106.    4060.
#> 3 x           -0.788 -0.785 0.165 0.165 -1.06  -0.526  1.00    4306.    4689.

Normalized power prior (NPP)

The NPP treats the hyperparameter a_{0h} as random, allowing the data to determine how much historical information to borrow. For non-Gaussian models, this requires estimating the normalizing constant Z(a_{0h} | D_{0h}) = \int L(\beta | D_{0h})^{a_{0h}} \pi_0(\beta)^{1/H} d\beta.

In hdbayes, there is one function to estimate the normalizing constant across a grid of values for a_{0h} and another to obtain posterior samples of the normalized power prior.

The posterior under the NPP may be summarized as

\begin{align*}
&y_i | x_i, \beta \sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right), \quad i = 1, \ldots, n \\
&\pi_{\text{NPP}}(\beta, a_0 | D_0) = \pi_0(\beta) \prod_{h=1}^H \frac{ L(\beta | D_{0h})^{a_{0h}} }{Z(a_{0h} | D_{0h})} \pi(a_{0h})
\end{align*}

The default initial prior on \beta for NPP is the same as that for PP:

\beta \sim N_p(\textbf{0}_p, 100 \times I_p).

The default prior on each a_{0h} in hdbayes is:

\pi(a_{0h}) \propto a_{0h}^{\alpha_0 - 1} (1 - a_{0h})^{\gamma_0 - 1},

where \alpha_{0} = \gamma_{0} = 1. In this case, the prior on a_{0h} is a U(0,1) prior.

Estimating the normalizing constant

We begin by estimating the normalizing constant over a fine grid of values. On machines with multiple cores, this may be accomplished more efficiently using parallel computing.

## parallelize estimation of log normalizing constant
library(parallel)
a0     = seq(0, 1, length.out = ncores * 5)

## wrapper to obtain log normalizing constant in parallel package
logncfun = function(a0, ...){
  hdbayes::glm.npp.lognc(
    formula = formula, family = family, histdata = histdata, a0 = a0, ...
  )
}

cl = makeCluster(ncores)
clusterSetRNGStream(cl, 123)
clusterExport(cl, varlist = c('formula', 'family', 'histdata'))

## call created function
a0.lognc = parLapply(
  cl = cl, X = a0, fun = logncfun, iter_warmup = iter_warmup, iter_sampling = 5000, 
  chains = chains, refresh = 0
)
stopCluster(cl)

a0.lognc = data.frame( do.call(rbind, a0.lognc) )
head(
  a0.lognc %>% 
    mutate(across(where(is.numeric), round, 3))
)
#>      a0   lognc min_ess_bulk max_Rhat
#> 1 0.000   0.000     9947.309    1.000
#> 2 0.014  -4.230     6370.440    1.001
#> 3 0.027  -6.285     6449.416    1.001
#> 4 0.041  -7.859     6204.290    1.001
#> 5 0.054  -9.212     6420.296    1.000
#> 6 0.068 -10.439     6630.803    1.001

The provided function glm.npp.lognc estimates the logarithm of the normalizing constant, \log Z(a_{0h} | D_{0h}), for one specific value of a_{0h} and one historical data set D_{0h}. We created the function logncfun so that the first argument would be a_{0h}, allowing us to use the parLapply function in the parallel package.

The hdbayes function glm.npp.lognc outputs a_{0h}, \log Z(a_{0h} | D_{0h}), and the minimum bulk effective sample size and maximum R-hat value from the MCMC sampling of the power prior. It is a good idea to check that the minimum bulk effective sample size is at least 1,000 and that the maximum R-hat value is less than 1.10.

min(a0.lognc$min_ess_bulk) ## lowest bulk effective sample size
#> [1] 6059.067
max(a0.lognc$max_Rhat)  ## highest R-hat value
#> [1] 1.001153

We can then plot the logarithm of the normalizing constant

plot(a0.lognc$a0, a0.lognc$lognc)

Sampling from the posterior distribution

We can now sample from the posterior distribution. The function glm.npp takes, as input, values of a_{0h} and the estimated logarithm of the normalizing constant. Linear interpolation is used to estimate Z(a_{0h} | D_{0h}) for values not in the fine grid. Thus, it may be a good idea to smooth the estimated curve first, for example using LOESS, but we ignore that here.
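
For illustration, such a LOESS smoothing of the estimated log normalizing constants might look like the following sketch (not used in the fit below):

## smooth the estimated log normalizing constants over the a0 grid (illustration only)
lognc.loess  = loess(lognc ~ a0, data = a0.lognc)
lognc.smooth = predict(lognc.loess, newdata = a0.lognc)
plot(a0.lognc$a0, a0.lognc$lognc)
lines(a0.lognc$a0, lognc.smooth)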

fit.npp = glm.npp(
  formula = formula, family = family, data.list = data.list,
  a0.lognc = a0.lognc$a0,
  lognc = matrix(a0.lognc$lognc, ncol = 1),
  iter_warmup = iter_warmup, iter_sampling = iter_sampling, 
  chains = chains, parallel_chains = ncores,
  refresh = 0
)
#> Running MCMC with 4 chains, at most 15 in parallel...
#> 
#> Chain 2 finished in 1.1 seconds.
#> Chain 3 finished in 1.1 seconds.
#> Chain 4 finished in 1.1 seconds.
#> Chain 1 finished in 1.2 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 1.1 seconds.
#> Total execution time: 1.3 seconds.
fit.npp[, -c(1, 6)] %>% 
    summarise_draws() %>% 
    mutate(across(where(is.numeric), round, 3))
#> # A tibble: 4 × 10
#>   variable      mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>        <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept)  0.855  0.851 0.239 0.244  0.463  1.25   1       4680.    5065.
#> 2 z            0.596  0.591 0.27  0.271  0.16   1.05   1.00    5529.    4746.
#> 3 x           -0.797 -0.793 0.164 0.162 -1.07  -0.536  1.00    5059.    4883.
#> 4 a0_hist_1    0.677  0.714 0.226 0.255  0.254  0.976  1       4909.    3632.

Normalized asymptotic power prior (NAPP)

NAPP uses a large sample theory argument to formulate a normal approximation to the power prior, i.e., the prior is given by

\beta | a_{0h} \sim N(\hat{\beta}_0, a_{0h}^{-1} [I_n(\beta)]^{-1}),

where \hat{\beta}_0 is the maximum likelihood estimate (MLE) of \beta based on the historical data and I_n(\beta) is the associated information matrix (negative Hessian).
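
As an illustration of these quantities, the historical MLE and an estimate of the information matrix can be recovered from the frequentist fit above (a sketch only; hdbayes computes what it needs internally):

## historical MLE and (approximate) observed information at the MLE
beta0.hat = coef(fit.mle.hist)
info.hat  = solve(vcov(fit.mle.hist))   ## inverse of the estimated covariance matrix
round(beta0.hat, 3)
round(info.hat, 3)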

In this case, the normalizing constant is known in closed form, so it does not need to be estimated before sampling.

The following code analyzes the data set using the NAPP:

fit.napp = glm.napp(
  formula = formula, family = family, data.list = data.list,
  iter_warmup = iter_warmup, iter_sampling = iter_sampling, 
  chains = chains, parallel_chains = ncores,
  refresh = 0
)
#> Running MCMC with 4 chains, at most 15 in parallel...
#> Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 3 Exception: multi_normal_lpdf: Covariance matrix is not symmetric. Covariance matrix[1,2] = -inf, but Covariance matrix[2,1] = -inf (in '/var/folders/_g/wwn3pnf56mx5q1kv0t5574y80000gn/T/RtmpFSR3a9/model-fe8850461191.stan', line 129, column 6 to column 89)
#> Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 3
#> Chain 1 finished in 0.7 seconds.
#> Chain 2 finished in 0.7 seconds.
#> Chain 3 finished in 0.7 seconds.
#> Chain 4 finished in 0.8 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.7 seconds.
#> Total execution time: 0.9 seconds.
fit.napp[, -1] %>% 
    summarise_draws() %>% 
    mutate(across(where(is.numeric), round, 3))
#> # A tibble: 5 × 10
#>   variable       mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>         <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept)   0.839  0.837 0.233 0.233  0.466  1.23   1.00    4035.    4757.
#> 2 z             0.6    0.599 0.268 0.263  0.162  1.04   1.00    4589.    3985.
#> 3 x            -0.787 -0.786 0.162 0.159 -1.06  -0.519  1.00    4323.    4214.
#> 4 a0_hist_1     0.672  0.707 0.227 0.258  0.254  0.972  1       5792.    4259.
#> 5 logit_a0s[1]  1.01   0.883 1.43  1.36  -1.08   3.54   1       5792.    4259.

Latent exchangeability prior (LEAP)

The LEAP assumes that the historical data are generated from a finite mixture model consisting of K \ge 2 components, with the current data generated from the first component of this mixture. For settings with a single historical data set, the posterior under the LEAP may be expressed hierarchically as

\begin{align*}
y_i | x_i, \beta &\sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right), \quad i = 1, \ldots, n \\
y_{0i} | x_{0i}, \beta, \beta_{0k}, \gamma &\sim \gamma_1 \text{Bernoulli}\left( \text{logit}^{-1}(x_{0i}'\beta) \right) + \sum_{k=2}^K \gamma_k \text{Bernoulli}\left( \text{logit}^{-1}(x_{0i}'\beta_{0k}) \right), \quad i = 1, \ldots, n_0 \\
\beta, \beta_{0k} &\overset{\text{i.i.d.}}{\sim} N(\mu_0, \Sigma_{0}), \quad k = 2, \ldots, K, \text{ where } \Sigma_{0} = \text{diag}(\sigma_{01}^2, \ldots, \sigma_{0p}^2) \\
\gamma &\sim \text{Dirichlet}(\alpha_{0})
\end{align*}

where \gamma = (\gamma_1, \ldots, \gamma_K)' is a vector of mixing probabilities, \alpha_{0} = (\alpha_1, \ldots, \alpha_K)' is a vector of concentration parameters, and \mu_0 and \Sigma_{0} are, respectively, the prior mean vector and covariance matrix for the K regression coefficient vectors. The defaults in hdbayes are

  • \mu_0 = \textbf{0}_p
  • \Sigma_{0} = 100 \times I_p
  • \alpha_{0} = \textbf{1}_K, where \textbf{1}_K denotes a K-dimensional vector of 1s
  • K = 2

For multiple historical data sets, hdbayes assumes that all historical data sets come from a single finite mixture model.
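
For intuition about the default Dirichlet(\textbf{1}_K) prior on the mixing probabilities, the following base-R sketch draws from it via normalized Gamma variates (the helper rdirichlet1 is illustrative only); with the default K = 2, this prior is equivalent to a uniform prior on \gamma_1.

## draw from a Dirichlet(1, ..., 1) prior via normalized Gamma(1, 1) variates
rdirichlet1 = function(n, K) {
  g = matrix(rgamma(n * K, shape = 1), nrow = n)
  g / rowSums(g)   ## each row sums to 1
}
round(rdirichlet1(5, K = 2), 3)  ## with K = 2, the first column is U(0, 1)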

The LEAP can be fit as follows:

fit.leap = glm.leap(
  formula = formula, family = family, data.list = data.list,
  iter_warmup = iter_warmup, iter_sampling = iter_sampling, 
  chains = chains, parallel_chains = ncores,
  refresh = 0
)
#> Running MCMC with 4 chains, at most 15 in parallel...
#> 
#> Chain 2 finished in 4.9 seconds.
#> Chain 3 finished in 5.6 seconds.
#> Chain 4 finished in 5.8 seconds.
#> Chain 1 finished in 6.9 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 5.8 seconds.
#> Total execution time: 7.0 seconds.

suppressWarnings(
 fit.leap[, 2:6] %>% 
    summarise_draws() %>% 
    mutate(across(where(is.numeric), round, 3)) 
)
#> # A tibble: 5 × 10
#>   variable      mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>        <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept)  0.85   0.847 0.245 0.245  0.465  1.26      1    2920.    3210.
#> 2 z            0.602  0.599 0.277 0.277  0.155  1.06      1    3595.    3336.
#> 3 x           -0.789 -0.785 0.17  0.172 -1.07  -0.518     1    2849.    2393.
#> 4 probs[1]     0.863  0.889 0.108 0.101  0.647  0.99      1    1395.    2503.
#> 5 probs[2]     0.137  0.111 0.108 0.101  0.01   0.353     1    1395.    2503.

Normal/half-normal prior

We also include a normal/half-normal prior, where the regression coefficients are assigned independent normal priors, and, if applicable, the dispersion parameter is assigned a half-normal prior. For this example, the normal/half-normal prior takes the form

\begin{align*}
&y_i | x_i, \beta \sim \text{Bernoulli}\left( \text{logit}^{-1}(x_i'\beta) \right), \quad i = 1, \ldots, n \\
&\beta \sim N_p(\mu_0, \Sigma_0), \text{ where } \Sigma_0 = \text{diag}(\sigma_{01}^2, \ldots, \sigma_{0p}^2).
\end{align*}

The defaults in hdbayes are

  • \mu_0 = \textbf{0}_p
  • \Sigma_{0} = 100 \times I_p

The normal/half-normal prior can be fit as follows:

fit.post = glm.post(
  formula = formula, family = family, data.list = data.list,
  iter_warmup = iter_warmup, iter_sampling = iter_sampling, 
  chains = chains, parallel_chains = ncores,
  refresh = 0
)
#> Running MCMC with 4 chains, at most 15 in parallel...
#> 
#> Chain 1 finished in 0.5 seconds.
#> Chain 2 finished in 0.5 seconds.
#> Chain 3 finished in 0.5 seconds.
#> Chain 4 finished in 0.5 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.5 seconds.
#> Total execution time: 0.6 seconds.

fit.post[, -1] %>% 
    summarise_draws() %>% 
    mutate(across(where(is.numeric), round, 3))
#> # A tibble: 3 × 10
#>   variable      mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>        <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 (Intercept)  0.778  0.772 0.279 0.272  0.328  1.25   1       3631.    3699.
#> 2 z            0.693  0.697 0.304 0.301  0.19   1.20   1.00    4608.    4387.
#> 3 x           -0.772 -0.767 0.186 0.182 -1.08  -0.471  1.00    3524.    3683.

Comparison of methods

We can now compare the point estimates (MLE / posterior mean) and uncertainty quantification (SE / posterior standard deviation) across the methods.

##
## COMPARE METHODS
##
fit.list = list('bhm' = fit.bhm, 'commensurate' = fit.commensurate,
                'robustmap' = fit.rmap, 'napp' = fit.napp,
                'npp' = fit.npp, 'pp' = fit.pp, 'leap' = fit.leap,
                'normal/half-normal' = fit.post)

post.mean = suppressWarnings(
  sapply(
    fit.list, function(x){
      d = x %>% 
        select(names(coef(fit.mle.cur))) %>% 
        summarise_draws()
      d$mean
    } 
  )
)
post.sd = suppressWarnings(
  sapply(
    fit.list, function(x){
      d = x %>% 
        select(names(coef(fit.mle.cur))) %>% 
        summarise_draws()
      d$sd
    } 
  )
)

post.mean = cbind(
  'truth' = beta,
  'mle.cur' = coef(fit.mle.cur),
  'mle.hist' = coef(fit.mle.hist),
  post.mean
)

post.sd = cbind(
  'mle.cur'  = summary(fit.mle.cur)$coefficients[, 'Std. Error'],
  'mle.hist' = summary(fit.mle.hist)$coefficients[, 'Std. Error'],
  post.sd
)

## posterior means
round( post.mean, 3 )
#>             truth mle.cur mle.hist    bhm commensurate robustmap   napp    npp
#> (Intercept)   1.0   0.760    1.036  0.842        0.838     0.849  0.839  0.855
#> z             0.5   0.677    0.313  0.619        0.625     0.617  0.600  0.596
#> x            -1.0  -0.750   -0.856 -0.788       -0.786    -0.791 -0.787 -0.797
#>                 pp   leap normal/half-normal
#> (Intercept)  0.834  0.850              0.778
#> z            0.613  0.602              0.693
#> x           -0.788 -0.789             -0.772

## posterior std dev.
round( post.sd, 3 )
#>             mle.cur mle.hist   bhm commensurate robustmap  napp   npp    pp
#> (Intercept)   0.274    0.377 0.246        0.251     0.253 0.233 0.239 0.251
#> z             0.308    0.442 0.286        0.280     0.280 0.268 0.270 0.280
#> x             0.181    0.266 0.166        0.170     0.169 0.162 0.164 0.165
#>              leap normal/half-normal
#> (Intercept) 0.245              0.279
#> z           0.277              0.304
#> x           0.170              0.186
