Similar to rstantools for
rstan, the instantiate package builds
pre-compiled CmdStan
models into CRAN-ready statistical modeling R packages. The models
compile once during installation, the executables live inside the file
systems of their respective packages, and users have the full power and
convenience of CmdStanR without any
additional compilation after package installation. This approach saves
time and helps R package developers migrate from
rstan to the more modern
CmdStanR.
The website at https://wlandau.github.io/instantiate/ includes a function reference and other documentation.
The instantiate package depends on the R package
CmdStanR and the command line tool
CmdStan, so it is
important to follow these stages in order:
- Install the R package CmdStanR. CmdStanR is not on CRAN, so the recommended way to install it is install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos"))).
- Optional: set environment variables CMDSTAN_INSTALL and/or CMDSTAN to manage the CmdStan installation. See the “Administering CmdStan” section below for details.
- Install instantiate using one of the R commands below.
| Type | Source | Command |
|---|---|---|
| Release | CRAN | install.packages("instantiate") |
| Development | GitHub | remotes::install_github("wlandau/instantiate") |
| Development | R-universe | install.packages("instantiate", repos = "https://wlandau.r-universe.dev") |
Packages that use instantiate may be published on CRAN. CRAN does not
have CmdStan, so the models are not pre-compiled in the macOS and
Windows binaries. If you install such a package from CRAN, please
install it from source. For example:
install.packages("hdbayes", type = "source")

The instantiate package uses environment variables to manage the
installation of
CmdStan. An
environment variable is an operating system setting with a name and a
value (both text strings). In R, there are two ways to set environment
variables:
- Sys.setenv(), which sets environment variables temporarily for the current R session.
- The .Renviron text file in your home directory, which passes environment variables to all new R sessions. The edit_r_environ() function from the usethis package helps.
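
As a minimal sketch of both approaches: the CmdStan path below is a hypothetical placeholder, and the temporary file only stands in for the real .Renviron file in your home directory.

```r
# 1. Sys.setenv() affects only the current R session.
Sys.setenv(CMDSTAN_INSTALL = "implicit")
Sys.getenv("CMDSTAN_INSTALL")

# 2. An .Renviron file holds one NAME=value pair per line.
#    A temporary file stands in for ~/.Renviron here, and
#    "/path/to/cmdstan" is a hypothetical placeholder path.
renviron <- tempfile(fileext = ".Renviron")
writeLines("CMDSTAN=/path/to/cmdstan", renviron)

# R reads ~/.Renviron automatically at startup. readRenviron()
# loads a file like it into the session that is already running.
readRenviron(renviron)
Sys.getenv("CMDSTAN")
```

Calling Sys.unsetenv("CMDSTAN") afterwards removes the placeholder value from the session.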
By default, instantiate looks for the copy of
CmdStan located at
cmdstanr::cmdstan_path(). If you upgrade
CmdStan, then the path
returned by cmdstanr::cmdstan_path() will change, which may not be
desirable in some cases. To permanently lock the path that instantiate
uses, follow these steps:
1. Set the CMDSTAN environment variable to the desired path to CmdStan.
2. Set the CMDSTAN_INSTALL environment variable to "fixed".
3. Install instantiate.
Henceforth, instantiate will automatically use the
CmdStan path from (1),
regardless of the value of CMDSTAN after (3). To prefer
cmdstanr::cmdstan_path() instead, you could do one of the following:
- Reinstall instantiate with CMDSTAN_INSTALL not equal to "fixed", or
- Set CMDSTAN_INSTALL to "implicit" at runtime, or
- Set the cmdstan_install argument to "implicit" for the current instantiate package function you are using.
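
The two runtime options might look like the sketch below. It assumes that stan_cmdstan_exists() accepts the cmdstan_install argument like the other instantiate functions. The requireNamespace() guard is only there so the sketch also runs where instantiate is not installed.

```r
# Runtime option: applies to later instantiate calls in this session.
Sys.setenv(CMDSTAN_INSTALL = "implicit")

# Per-call option: override the setting for a single function call.
# Assumption: stan_cmdstan_exists() takes a cmdstan_install argument.
if (requireNamespace("instantiate", quietly = TRUE)) {
  found <- instantiate::stan_cmdstan_exists(cmdstan_install = "implicit")
  message("CmdStan found through cmdstanr: ", found)
}
```

The per-call form leaves the environment variable untouched, which may be preferable inside package code that should not change session-wide settings.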
The following section explains how to create an R package with
pre-compiled Stan models. This stage of the development workflow is
considered “runtime” for the purposes of administering
CmdStan as described
previously.
Begin with an R package with one or more Stan model files inside the
src/stan/ directory. stan_package_create() is a convenient way to
start.
stan_package_create(path = "package_folder")
#> Example package named "example" created at "package_folder". Run stan_package_configure(path = "package_folder") so that the built-in Stan model will compile when the package installs.

At minimum, the package file structure should look something like this:
fs::dir_tree("package_folder")
#> package_folder
#> ├── DESCRIPTION
#> └── src
#>     └── stan
#>         └── bernoulli.stan

Configure the package so the Stan models compile during installation.
stan_package_configure() writes the required scripts cleanup,
cleanup.win, src/Makevars, src/Makevars.win, and
src/install.libs.R. Inside src/install.libs.R is a call to
instantiate::stan_package_compile(), which you can manually edit to
control how your models are compiled. For example, different calls to
stan_package_compile() can compile different groups of models using
different C++ compiler flags.
fs::dir_tree("package_folder")
#> package_folder
#> ├── DESCRIPTION
#> ├── cleanup
#> ├── cleanup.win
#> └── src
#>     ├── Makevars
#>     ├── Makevars.win
#>     ├── install.libs.R
#>     └── stan
#>         └── bernoulli.stan

Install the package just like you would any other R package. To install
it from your local copy of package_folder, open R and run:
install.packages(pkgs = "package_folder", type = "source", repos = NULL)

A user can now run a model from the package without any additional
compilation. See the documentation of
CmdStanR to learn how to
use CmdStanR model objects.
library(example)
model <- stan_package_model(name = "bernoulli", package = "example")
print(model) # CmdStanR model object
#> data {
#>   int<lower=0> N;
#>   array[N] int<lower=0,upper=1> y;
#> }
#> parameters {
#>   real<lower=0,upper=1> theta;
#> }
#> model {
#>   theta ~ beta(1,1);  // uniform prior on interval 0,1
#>   y ~ bernoulli(theta);
#> }
fit <- model$sample(
  data = list(N = 10, y = c(1, 0, 1, 0, 1, 0, 0, 0, 0, 0)),
  refresh = 0,
  iter_warmup = 2000,
  iter_sampling = 4000
)
#> Running MCMC with 4 sequential chains...
#>
#> Chain 1 finished in 0.0 seconds.
#> Chain 2 finished in 0.0 seconds.
#> Chain 3 finished in 0.0 seconds.
#> Chain 4 finished in 0.0 seconds.
#>
#> All 4 chains finished successfully.
#> Mean chain execution time: 0.0 seconds.
#> Total execution time: 0.6 seconds.
fit$summary()
#> # A tibble: 2 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
#> <chr> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1 lp__ -8.15 -7.87 0.725 0.317 -9.60 -7.64 1.00 7365. 8498.
#> 2 theta 0.333 0.324 0.130 0.134 0.137 0.563 1.00 6229. 7560.

You can write an exported user-side function in your R package to access
the model. For example, you might store this code in an R/model.R file
in the package:
#' @title Fit the Bernoulli model.
#' @export
#' @family models
#' @description Fit the Bernoulli Stan model and return posterior summaries.
#' @return A data frame of posterior summaries.
#' @param y Numeric vector of Bernoulli observations (zeroes and ones).
#' @param `...` Named arguments to the `sample()` method of CmdStan model
#' objects: <https://mc-stan.org/cmdstanr/reference/model-method-sample.html>
#' @examples
#' if (instantiate::stan_cmdstan_exists()) {
#' run_bernoulli_model(y = c(1, 0, 1, 0, 1, 0, 0, 0, 0, 0))
#' }
run_bernoulli_model <- function(y, ...) {
  stopifnot(is.numeric(y) && all(y >= 0 & y <= 1))
  model <- stan_package_model(name = "bernoulli", package = "mypackage")
  fit <- model$sample(data = list(N = length(y), y = y), ...)
  fit$summary()
}

- In your package DESCRIPTION file, list https://mc-stan.org/r-packages/ in the Additional_repositories: field (example in brms). This step is only necessary while cmdstanr is not yet on CRAN.
Additional_repositories:
    https://mc-stan.org/r-packages/
- In your package DESCRIPTION and NAMESPACE files, import the instantiate package and function stan_package_model().
- Write user-side statistical modeling functions which call the models in your package as mentioned above.
- CmdStan is too big for CRAN, so instantiate will not be able to access it there. If you plan to submit your package to CRAN, please skip the appropriate code in your examples, vignettes, and tests when instantiate::stan_cmdstan_exists() is FALSE. Explicit if() statements like the one above in the roxygen2 @examples work for examples and vignettes. For tests, it is convenient to use testthat::skip_if_not(), e.g. skip_if_not(stan_cmdstan_exists()).
- pkgload::load_all() is not compatible with instantiate. This is because instantiate relies on a custom src/install.libs.R script to compile the models, and load_all() does not pick up custom binaries compiled this way. That means devtools::test() may not work as expected. Please install your package the standard way, then test with alternative means such as devtools::check(), R CMD check, or tinytest.
- For version control, it is best practice to commit only source code files and documentation. Please do not commit any compiled executable Stan model files to your repository. If you do commit them, then other users with different machines will have trouble installing your package, and your commit history will consume too much storage. For Git, you may add the following lines to the .gitignore file at the root of your package:
src/stan/**
!src/stan/**/*.*
src/stan/**/*.exe
src/stan/**/*.EXE
- For continuous integration (e.g. on GitHub Actions), please use cmdstanr-based installation as explained above, and tweak your workflow YAML files as explained in that section.
- For general information on R package development, please consult the free online book R Packages (2e) by Hadley Wickham and Jennifer Bryan, as well as the official manual on Writing R Extensions by the R Core Team.
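
The skipping advice for tests could be sketched as a testthat file like the one below. It assumes the example package created earlier (named "example", with its bernoulli model) is installed. The two skip_if_not_installed() guards are extra additions that keep the sketch runnable on its own, and a real package test would not need the second one.

```r
library(testthat)

test_that("the pre-compiled bernoulli model samples", {
  # Skip when the prerequisites are missing, e.g. on CRAN machines.
  skip_if_not_installed("instantiate")
  skip_if_not_installed("example") # the example package created earlier
  skip_if_not(instantiate::stan_cmdstan_exists())

  # Load the pre-compiled model and draw a small posterior sample.
  model <- instantiate::stan_package_model(
    name = "bernoulli",
    package = "example"
  )
  fit <- model$sample(
    data = list(N = 3L, y = c(1L, 0L, 1L)),
    refresh = 0
  )
  expect_s3_class(fit$summary(), "data.frame")
})
```

Because pkgload::load_all() does not pick up the compiled models, a test like this is best run against the installed package, for example through R CMD check.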
Please note that the instantiate project is released with a
Contributor Code of
Conduct.
By contributing to this project, you agree to abide by its terms.
To cite package ‘instantiate’ in publications use:
Landau WM (2023). _instantiate: A Minimal CmdStan Client for R Packages_.
https://wlandau.github.io/instantiate/, https://github.com/wlandau/instantiate.
A BibTeX entry for LaTeX users is
  @Manual{,
    title = {instantiate: A Minimal CmdStan Client for R Packages},
    author = {William Michael Landau},
    year = {2023},
    note = {https://wlandau.github.io/instantiate/,
      https://github.com/wlandau/instantiate},
  }