Bayesian Modelling With Stan

DOI 10.3758/s13428-016-0746-9

Abstract  When evaluating cognitive models based on fits to observed data (or, really, any model that has free parameters), parameter estimation is critically important. Traditional techniques like hill climbing by minimizing or maximizing a fit statistic often result in point estimates. Bayesian approaches instead estimate parameters as posterior probability distributions, and thus naturally account for the uncertainty associated with parameter estimation; Bayesian approaches also offer powerful and principled methods for model comparison. Although software applications such as WinBUGS (Lunn, Thomas, Best, & Spiegelhalter, Statistics and Computing, 10, 325–337, 2000) and JAGS (Plummer, 2003) provide "turnkey"-style packages for Bayesian inference, they can be inefficient when dealing with models whose parameters are correlated, which is often the case for cognitive models, and they can impose significant technical barriers to adding custom distributions, which is often necessary when implementing cognitive models within a Bayesian framework. A recently developed software package called Stan (Stan Development Team, 2015) can solve both problems, as well as provide a turnkey solution to Bayesian inference. We present a tutorial on how to use Stan and how to add custom distributions to it, with an example using the linear ballistic accumulator model (Brown & Heathcote, Cognitive Psychology, 57, 153–178. doi:10.1016/j.cogpsych.2007.12.002, 2008).

Keywords  Bayesian inference . Stan . Linear ballistic accumulator . Probabilistic programming

Electronic supplementary material  The online version of this article (doi:10.3758/s13428-016-0746-9) contains supplementary material, which is available to authorized users.

* Jeffrey Annis
[email protected]

1 Vanderbilt University, 111 21st Ave S., 301 Wilson Hall, Nashville, TN 37240, USA

The development and application of formal cognitive models in psychology has played a crucial role in theory development. Consider, for example, the near ubiquitous applications of accumulator models of decision making, such as the diffusion model (see Ratcliff & McKoon, 2008, for a review) and the linear ballistic accumulator model (LBA; Brown & Heathcote, 2008). These models have provided theoretical understanding of such constructs as aging and intelligence (e.g., Ratcliff, Thapar, & McKoon, 2010) and have been used to understand and interpret data from functional magnetic resonance imaging (Turner, Van Maanen, & Forstmann, 2015; Van Maanen et al., 2011), electroencephalography (Ratcliff, Philiastides, & Sajda, 2009), and neurophysiology (Palmeri, Schall, & Logan, 2015; Purcell, Schall, Logan, & Palmeri, 2012). Nearly all cognitive models have free parameters. In the case of accumulator models, these include the rate of evidence accumulation, the threshold level of evidence required to make a response, and the time for mental processes not involved in making the decision. Unlike general statistical models of observed data, the parameters of cognitive models usually have well-defined psychological interpretations. This makes it particularly important that the parameters be estimated properly, including not just their most likely value, but also the uncertainty in their estimation.

Traditional methods of parameter estimation minimize or maximize a fit statistic (e.g., SSE, χ2, ln L) using various hill-climbing methods (e.g., simplex or Hooke and Jeeves). The result is usually point estimates of parameter values, possibly followed by techniques such as parametric or nonparametric bootstrapping to obtain indices of the uncertainty of those estimates (Lewandowsky & Farrell, 2011). By contrast,
Bayesian approaches to parameter estimation naturally treat model parameters as full probability distributions (Gelman, Carlin, Stern, Dunson, Vehtari, & Rubin, 2013; Kruschke, 2011; Lee & Wagenmakers, 2014). By so doing, the uncertainty over the range of potential parameter values is also estimated, rather than a single point estimate.

Whereas a traditional parameter estimation method might find some vector of parameters, θ, that maximizes the likelihood of the data, D, given those parameters [P(D | θ)], a Bayesian method will find the entire posterior probability distribution of the parameters given the data, P(θ | D), by a conceptually straightforward application of Bayes's rule: P(θ | D) = P(D | θ) P(θ) / P(D). A virtue—though some argue it is a curse—of Bayesian methods is that they allow the researcher to express their a priori beliefs (or lack thereof) about the parameter values, as a prior distribution P(θ). If a researcher thinks all values are equally likely, they might choose a uniform or otherwise flat distribution to represent that belief; alternatively, if a researcher has reason to believe that some parameters might be more likely than others, that knowledge can be embodied in the prior as well. Bayes provides the mechanism to combine the prior on parameter values, P(θ), with the likelihood of the data given certain parameter values, P(D | θ), resulting in the posterior distribution of the parameters given the data, P(θ | D).

Bayes is completely generic. It could be used with a model having one parameter or one having dozens or hundreds of parameters. It could rely on a likelihood based on a well-known probability distribution, like a normal or a Gamma distribution, or it could rely on a likelihood of response times predicted by a cognitive model like the LBA.

Although the application of Bayes is conceptually straightforward, its application to real data and real models is anything but. For one thing, the calculation of the probability of the data term in the denominator, P(D), involves a multivariate integral, which can be next to impossible to solve using traditional techniques for all but the simplest models. For small models with only one or two parameters, the posterior distribution can sometimes be calculated directly using calculus or can be reasonably estimated using numerical methods. However, as the number of parameters in the model grows, direct mathematical solutions using calculus become scarce, and traditional numerical methods quickly become intractable. For more sophisticated models, a technique called Markov chain Monte Carlo (MCMC) was developed (Brooks, Gelman, Jones, & Meng, 2011; Gelman et al., 2013; Gilks, Richardson, & Spiegelhalter, 1996; Robert & Casella, 2004). MCMC is a class of algorithms that utilize Markov chains to allow one to approximate the posterior distribution. In short, a given MCMC algorithm can take the prior distribution and likelihood as input and generate random samples from the posterior distribution without having to have a closed-form solution or numeric estimate for the desired posterior distribution.

The first MCMC algorithm was the Metropolis–Hastings algorithm (Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, 1953; Hastings, 1970), and it is still popular today as a default MCMC method. On each step of the algorithm a proposal sample is generated. If the proposal sample has a higher probability than the current sample, then the proposal is accepted as the next sample; otherwise, the acceptance rate is dependent upon the ratio of the posterior probabilities of the proposal sample and the current sample. The "magic" of MCMC algorithms like Metropolis–Hastings is that they do not require calculating the nasty P(D) term in Bayes's rule. Instead, by relying on ratios of the posterior probabilities, the P(D) term cancels out, so the decision to accept or reject a new sample is based solely on the prior and the likelihood, which are given. The proposal step is generated via a random process that must be "tuned" so that the algorithm efficiently samples from the posterior. If the proposals are wildly different from or too similar to the current sample, the sampling process can become very inefficient. Poorly tuned MCMC algorithms can lead to samples that fail to meet minimum standards for approximating the posterior distribution or to Markov chain lengths that become computationally intractable on even the most powerful computer workstations.

A different type of MCMC algorithm that largely does away with the difficulty of sampler tuning is Gibbs sampling. Several software applications have been built around this algorithm (WinBUGS: Lunn, Thomas, Best, & Spiegelhalter, 2000; JAGS: Plummer, 2003; OpenBUGS: Thomas, O'Hara, Ligges, & Sturtz, 2006). These applications allow the user to easily define their model in a specification language and then generate the posterior distributions for the respective model parameters. The fact that the Gibbs sampler does not require tuning makes these applications effectively "turnkey" methods for Bayesian inference. These applications can be used for a wide variety of problems and include a number of built-in distributions; one must only specify, for example, that the prior on a certain parameter is distributed uniformly or that the likelihood of the data given the parameter is normally distributed. Although these programs provide dozens of built-in distributions, researchers inevitably will discover that some particular distribution they are interested in is not built into the application. This will often be the case for specialized cognitive models whose distributions are not part of the standard suites of built-in distributions that come with these applications. Thus, it is necessary for the researcher who wishes to use one of these Bayesian inference applications to add a custom distribution to the application's distribution library. This process, however, can be technically challenging using most of the applications listed above (see Wabersich & Vandekerckhove, 2014, for a recent tutorial with JAGS).
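To make the proposal-and-accept cycle concrete, here is a minimal R sketch of a Metropolis–Hastings sampler for a single rate parameter with an exponential likelihood and a Gamma(1, 1) prior. It is purely illustrative and is not the authors' own benchmark implementation; the simulated data, the starting value, and the proposal standard deviation are arbitrary choices.

# Illustrative Metropolis-Hastings sampler for the rate of an exponential
# model with a Gamma(1, 1) prior (R)
set.seed(1)
y <- rexp(500, rate = 1)                    # simulated data
log_post <- function(lambda) {              # log prior + log likelihood
  if (lambda <= 0) return(-Inf)             # respect the lower bound on lambda
  dgamma(lambda, 1, 1, log = TRUE) + sum(dexp(y, lambda, log = TRUE))
}
n_iter <- 5000
chain <- numeric(n_iter)
chain[1] <- 0.5                             # arbitrary starting value
for (i in 2:n_iter) {
  proposal <- rnorm(1, chain[i - 1], 0.05)  # proposal SD is the tuning parameter
  # Accept with probability given by the ratio of unnormalized posteriors;
  # the P(D) term cancels, so only prior and likelihood are needed.
  if (log(runif(1)) < log_post(proposal) - log_post(chain[i - 1])) {
    chain[i] <- proposal
  } else {
    chain[i] <- chain[i - 1]
  }
}
mean(chain)                                 # should be close to the generating value of 1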
In addition to the technical challenges of adding custom distributions, both the Gibbs and Metropolis–Hastings algorithms often do not sample efficiently from posterior distributions with correlated parameters (Hoffman & Gelman, 2014; Turner, Sederberg, Brown, & Steyvers, 2013). Some MCMC algorithms (e.g., MCMC-DE; Turner et al., 2013) are designed to solve this problem, but these algorithms often require careful tuning of the MCMC algorithm parameters to ensure efficient sampling of the posterior. In addition, implementing models that use such algorithms can be more difficult than implementing models in turnkey applications, because the user must work at the implementation level of the MCMC algorithm.

Recently, a new type of MCMC application has emerged, called Stan (Hoffman & Gelman, 2014; Stan Development Team, 2015). Stan uses the No-U-Turn Sampler (NUTS; Hoffman & Gelman, 2014), which extends a type of MCMC algorithm known as Hamiltonian Monte Carlo (HMC; Duane, Kennedy, Pendleton, & Roweth, 1987; Neal, 2011). NUTS requires no tuning parameters and can efficiently sample from posterior distributions with correlated parameters. It is therefore an example of a turnkey Bayesian inference application that allows the user to work at the level of the model without having to worry about the implementation of the MCMC algorithm itself. In this article, we provide a brief tutorial on how to use the Stan modeling language to implement Bayesian models in Stan using both built-in and user-defined distributions; we do assume that readers have some prior programming experience and some knowledge of probability theory.

Our first example is the exponential distribution. The exponential is built into the Stan application. We will first define the model statistically, and then outline how to implement a Bayesian model based on the exponential distribution using Stan. Following this implementation, we will show how to run the model and collect samples from the posterior. As we will see, one way that this can be done is by interfacing with Stan via another programming language, such as R. The command to run the Stan model is sent from R, and then the samples are sent back to the R workspace for further analysis.

Our second example will again consider the exponential distribution, but this time instead of using the built-in exponential distribution, we will explicitly define the likelihood function of the exponential distribution using the tools and techniques that allow the user to define distributions in Stan.

Our third example will illustrate how to implement a more complicated user-defined function in Stan—the LBA model (Brown & Heathcote, 2008). We will then show how to extend this model to situations with multiple participants and multiple conditions.

Throughout the tutorial, we will benchmark the results from Stan against a conventional Metropolis–Hastings algorithm. As we will see, Stan performs as well as Metropolis–Hastings for the simple exponential model, but much better for more complex models with correlated dimensions, such as the LBA. We are quite certain that suitably tuned versions of MCMC-DE (Turner et al., 2013) and other more sophisticated methods would perform at least as well as Stan. The goal here was not to make fine distinctions between alternative successful applications, but to illustrate how to use Stan as an application that perhaps may be adopted more easily by some researchers.

Built-in distributions in Stan

In Stan, a Bayesian model is implemented by defining its likelihood and priors. This is accomplished in a Stan program with a set of variable declarations and program statements that are displayed in this article using Courier font. Stan supports a range of standard variable types, including integers, real numbers, vectors, and matrices. Stan statements are processed sequentially and allow for standard control flow elements, such as for and while loops, and conditionals such as if-then and if-then-else.

Variable definitions and program statements are placed within what are referred to in Stan as code blocks. Each code block has a particular function within a Stan program. For example, there is a code block for user-defined functions, and others for data, parameters, model definitions, and generated quantities. Our tutorial will introduce each of these code blocks in turn.

To make the most out of this tutorial, it will be necessary to install both Stan (http://mc-stan.org/) and R (https://cran.r-project.org/), as well as the RStan package (http://mc-stan.org/interfaces/rstan.html) so that R can interface with Stan. Step-by-step instructions for how to do all of this can be found online (http://mc-stan.org/interfaces/rstan.html).

An example with the exponential distribution

In this section, we provide a simple example of how to use Stan to implement a Bayesian model using built-in distributions. For simplicity, we will use a one-parameter distribution: the exponential. To begin, suppose that we have some data (y) that appear to be exponentially distributed. We can write this in more formal terms with the following definition:

y ~ Exponential(λ).   (1)

This asserts that the data points (y) are assumed to come from an exponential distribution, which has a single parameter called the rate parameter, λ. Using traditional parameter-fitting methods, we might find the value of λ that maximized the likelihood of observed data that we thought followed an exponential. Because here we are using a Bayesian approach, we can conceive of our parameters as probability distributions. What distribution should we choose as the prior on λ? The rate parameter of the exponential distribution is bounded between zero and infinity, so we should choose a distribution with the same bounds. One distribution that fits this criterion is the Gamma distribution. Formally, then, we can write

λ ~ Gamma(α, β).   (2)
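As an aside (a standard conjugacy result, not part of the original tutorial), this particular combination of likelihood and prior has a closed-form posterior, which provides a convenient check on any MCMC output: P(λ | y) ∝ λ^(α+N−1) e^(−(β+Σy)λ), so that λ | y ~ Gamma(α + N, β + Σy), where N is the number of data points and Σy their sum.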
The Gamma distribution has two parameters, referred to as the shape (α) and rate (β) parameters; to represent our prior beliefs about what parameter values are more likely than others, we chose weakly informative shape and rate parameters of 1. So, Eq. 1 specifies our likelihood, and Eq. 2 specifies our prior. This completes the specification of the Bayesian model in mathematical terms. The next section shows how to easily implement the exponential model in the Stan modeling language.

Stan code  To implement this model in Stan, we first open a new text file in any basic text editor. Note that the line numbers in the code-text boxes in this article are for reference and are not part of the actual Stan code. The body of every code block is delimited using curly braces {}; Stan programming statements are placed within these curly braces. All statements must be followed by a semicolon.

In the text file, we first declare a data block as is shown in Box 1. The data code block stores all of the to-be-modeled variables containing the user's data. In this example, we can see that the data we will pass to the Stan program are contained in a vector of size LENGTH. It is important to note that the data are not explicitly defined in the Stan program itself. Rather, the Stan program is interfaced via the command line or an alternative method (like RStan), and the data are passed to the Stan program in that way. We will describe this procedure in a later section.

Box 1  Stan code for the exponential model (in all Boxes, line numbers are included for illustration only)

The second block of code is the parameters block, in which the model parameter lambda is declared. Because the rate parameter is bounded between zero and infinity, we must constrain the variable within that range in Stan. We do this by adding the <lower=0> constraint as part of its definition.

The third block of code is the model block, in which the Bayesian model is defined. The model described by Eqs. 1 and 2 is easily implemented, as is shown in Box 1. First, the variables of the Gamma prior, alpha and beta, are defined as real numbers of type real, and both are assigned our chosen values of 1.0. Note that unlike the variables in the data and parameters blocks, variables defined in the model block are local variables. This means that their scope does not extend beyond the block in which they are defined; in less technical terms, other blocks do not "know" about variables initialized in the model block.

After having defined these local variables, the next part of the model block defines a sampling statement. The sampling statement lambda ~ gamma(alpha,beta) indicates that the prior on lambda is sampled from a Gamma distribution with shape and rate parameters alpha and beta, respectively. Note that sampling statements contain a tilde character (~), distinct from the assignment character (<-) in Stan. The next statement, Y ~ exponential(lambda), is also a sampling statement and indicates that the data Y are exponentially distributed with rate parameter lambda.

The final block of code in the Stan file is the generated quantities block. This block can be used, for example, to perform what is referred to as a posterior predictive check. The purpose of the check is to determine whether the model accurately fits the data in question; in other words, this lets us compare model predictions with the observed data. Box 1 shows how this is accomplished in Stan. First, a real-number variable named pred is created. This variable will contain the predictions of the model. Next, the exponential random number generator (RNG) function exponential_rng takes as input the posterior samples of lambda and outputs the posterior prediction. The posterior prediction is returned from Stan and can be used outside of Stan—for example, to compare the predictions to the actual data in order to assess visually how well the model fits the data.

This completes the Stan model. When all the code has been entered into the text file, we save the file as exponential.stan. Stan requires that the extension .stan be used for all Stan model files.
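The listing in Box 1 itself is not reproduced above. The following is a minimal sketch, consistent with the description just given, of what such an exponential.stan file looks like; the line numbering and exact declarations in the original Box 1 may differ.

data {
  int<lower=1> LENGTH;          // number of data points
  vector[LENGTH] Y;             // the to-be-modeled data
}
parameters {
  real<lower=0> lambda;         // exponential rate, constrained to (0, infinity)
}
model {
  real alpha;                   // local variables holding the Gamma prior's
  real beta;                    // shape and rate
  alpha <- 1.0;
  beta <- 1.0;
  lambda ~ gamma(alpha, beta);  // prior
  Y ~ exponential(lambda);      // likelihood
}
generated quantities {
  real pred;                    // posterior predictive sample
  pred <- exponential_rng(lambda);
}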
In this section, we will be first simulating data and then fitting the model to those simulated data. This is in contrast to most real-world applications, in which models are fit to the actual observed data from an experiment. It is good practice before fitting a model to real data to fit the model to simulated data with known parameter values and try to recover those values. If the model cannot recover the known parameter values of the model that generated the simulated data, then it will never be able to be fitted with any confidence to real observed data. This type of exercise is usually referred to as parameter recovery. Here, this also serves us well in a tutorial capacity.

Box 2 shows the R code that will run the parameter recovery example. The first three lines of the R code clear the workspace (line 1), set the working directory (line 2), and load the RStan library (line 3). Then we generate some simulated data, drawing 500 exponentially distributed samples (line 5) assuming a rate parameter, lambda, equal to 1. These simulated data, dat, will then be fed into Stan to obtain parameter estimates for lambda. If the Stan implementation is working correctly, we should obtain a posterior distribution of λ that is centered over 1. So far, all of this is just standard R code.

The Stan model described earlier (exponential.stan) is run via the stan function. This is the way that R "talks" to Stan, tells it what to run, and gets back the results of the Stan run. The first argument of the function, file, is a character string that defines the location and name of the Stan model file. This is simply the Stan file from Box 1. The data argument is a list containing the data to be passed to the Stan program. Stan will be expecting a variable named Y to be holding the data (see line 3 of the Stan code in Box 1). We assign the variable dat (in this case, our simulated draws from an exponential) the name Y in line 8 so that Stan knows that these are the data. Stan also expects a variable named LENGTH to be holding the length of the data vector Y. We assign the variable len (the length of dat computed in line 6) the name LENGTH. This is the way that R feeds data into Stan. Next, the warmup argument defines the number of steps used to automatically tune the sampler in which Stan optimizes the HMC algorithm. These samples can be discarded afterward and are referred to as warmup samples. The iter argument defines the total number of iterations the algorithm
will run. Choosing the number of iterations and warmup steps usually proceeds by starting with relatively small numbers and then doubling them, each time checking for convergence (discussed below). It is recommended that warmup be half of iter. The chains argument defines the number of independent chains that will be consecutively run. Usually, at least three chains are run. After running the model, the samples are returned and assigned to the fit object.

A summary of the parameter distributions can be obtained by using print(fit)² (line 10), which provides posterior estimates for each of the parameters in the model. Before any inferences can be made, however, it is critically important to determine whether the sampling process has converged to the posterior distribution. Convergence can be diagnosed in several different ways. One way is to look at convergence statistics such as the potential scale reduction factor, R̂ (Gelman & Rubin, 1992), and the effective number of samples, Neff (Gelman et al., 2013), both of which are output in the summary statistics with print(fit). A rule of thumb is that when R̂ is less than 1.1, convergence has been achieved; otherwise, the chains need to be run longer. The Neff statistic gives the number of independent samples represented in the chain. For example, a chain may contain 1,000 samples, but this may be equivalent to having drawn very few independent samples from the posterior. The larger the effective sample size, the greater the precision of the MCMC estimate. To give an estimate of an acceptable effective sample size, Gelman et al. (2013) recommended an Neff of 100 for most applications. Of course, the target Neff can be set higher if greater precision is desired.

² The print function behaves differently given different classes of objects in R. For the Stan fit object, it prints a summary table. The RStan library must be loaded for this behavior to occur (the library defines the fit object in R).

Both the R̂ and Neff statistics are influenced by what is referred to as autocorrelation. To give an example, adjacent samples usually have some amount of correlation, due to the way that MCMC algorithms work. However, as the samples become more distant from each other in the chain, this correlation should decrease quickly. The distance between successive samples is usually referred to as the lag. The autocorrelation function (ACF) relates correlation and lag. The values of the ACF should quickly decrease with increasing lag; ACFs that do not decrease quickly with lag often indicate that the sampler is not exploring the posterior distribution efficiently and result in increased R̂ values and decreased Neff values.

The ACF can easily be plotted in R on lines 12 and 14. The separate chains are first collapsed into a single chain with as.matrix(fit) (the as.matrix function is part of the base package in R), and the ACF of lambda is plotted with acf(mcmc_chain[,'lambda']), shown in Fig. 1. The acf function is part of the stats package, a base package in R that is loaded automatically when R is opened. The left panel of Fig. 1 shows that the autocorrelation drops to values close to zero at around lags of six for the samples returned by Stan. The Metropolis–Hastings algorithm has slightly higher autocorrelation but is still reasonable in this example.

High autocorrelation indicates that the sampler is not efficiently exploring the posterior distribution. This can be overcome by simply running longer chains. By running longer chains, the sampler is given the chance to explore more of the distribution. The technique of running longer chains, however, is sometimes limited by memory and data storage constraints. One way to run very long chains and reduce memory overhead is to use a technique called thinning, which is done by saving every nth posterior sample from the chain and discarding the rest. Increasing n reduces autocorrelation as well as the resulting size of the chain. Although thinning can reduce autocorrelation and chain length, it leads to a linear increase in computational cost with increases in n. For example, if one could only save 1,000 samples, but needed to run a chain of 10,000 samples to effectively explore the posterior, one could thin by ten steps, and those 1,000 samples would have lower autocorrelation than if 1,000 samples were generated without thinning. If memory constraints are not an issue, however, it is advised to save the entire chain (Link & Eaton, 2011).

Another diagnostic test that should always be performed is to plot the chains themselves (i.e., the posterior sample at each iteration of the MCMC). This can be used to determine whether the sampling process has converged to the posterior distribution; it is easily performed in R using the traceplot function (part of the RStan package) on line 16. The left panel of Fig. 2 shows the samples from Stan, and the right panel shows the samples from Metropolis–Hastings. The researcher can use a few criteria to diagnose convergence. First, as an initial visual diagnostic, one can determine whether the chains look "like a fuzzy caterpillar" (Lee & Wagenmakers, 2014)—do they have a strong central tendency with evenly distributed aberrations? This indicates that the samples are not highly correlated with one another and that the sampling algorithm is exploring the posterior distribution efficiently. Second, the chains should also not drift up or down, but should remain around a constant value. Lastly, it should be difficult to distinguish between individual chains. Both panels of Fig. 2 clearly demonstrate all of these criteria, suggesting convergence to the posterior distribution.

Once it is determined that the sampling process has converged to the posterior, we can then move on to analyzing the parameter estimates themselves and determining whether the
model can fit the observed (in this case, simulated) data. Lines 18 through 20 of the R code (Box 2) show how the posterior predictive can be obtained by plotting the data (dat) as a histogram and then overlaying the density of predicted values, pred. The left panel of Fig. 3 shows that the posterior predictive density (solid line) fits the data (histogram bars) quite well. Lastly, line 22 of the R code shows how to plot the posterior distribution of λ with the following command: plot(density(mcmc_chain[,'lambda'])). There are two steps to this command. First, the density function, a base function in R, is called. This function estimates the density of the posterior distribution of λ from the MCMC samples held in mcmc_chain[,'lambda']. Second, the plot function, also part of the base R distribution, is called, which outputs a plot of that estimated density. It is also possible to call the histogram function, hist, and plot the samples as a histogram.

The right panel of Fig. 3 shows the posterior distribution of λ. The 95 % highest density interval (HDI) is also depicted. The HDI is the smallest interval that can be obtained in which 95 % of the mass of the distribution rests; this interval can be obtained from the summary statistics output by print(fit). The HDI is different from a confidence interval because values closer to the center of the HDI are "more credible" than values farther from the center (e.g., Kruschke, 2011). As the HDI increases, uncertainty about the parameter value also increases. As the HDI decreases, the range of credible values also decreases, thereby decreasing the uncertainty. Figure 3 shows that 95 % of the mass of λ is between 0.92 and 1.09, indicating that parameter recovery was successful, since the simulated data were generated with λ = 1.
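The R script in Box 2 is not reproduced above. The following sketch matches the line-by-line description in this section; the exact line numbering, the working-directory path, and the warmup, iter, and chains values (not stated for this example) are assumptions modeled on the LBA example in Box 6.

rm(list=ls())                         # clear the workspace
setwd("~/exponential/")               # set the working directory (path is arbitrary)
library(rstan)                        # load the RStan library
# simulate 500 exponentially distributed data points with rate lambda = 1
dat <- rexp(500, rate = 1)
len <- length(dat)
# run the Stan model in exponential.stan
fit <- stan(file = 'exponential.stan',
            data = list(Y = dat, LENGTH = len),
            warmup = 500,
            iter = 1000,
            chains = 3)
print(fit)                            # summary table, including Rhat and n_eff
mcmc_chain <- as.matrix(fit)          # collapse the chains into a single matrix
acf(mcmc_chain[,'lambda'])            # autocorrelation function of lambda
traceplot(fit)                        # trace plots of the chains
# posterior predictive check: data as a histogram, predictions as a density
hist(dat, freq = FALSE)
lines(density(mcmc_chain[,'pred']))
# posterior distribution of lambda
plot(density(mcmc_chain[,'lambda']))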
As we have just seen, the Stan model successfully recovered the single parameter value of λ that was used to generate the exponentially distributed data. Oftentimes, parameter recovery is more rigorous, testing recovery over a range of parameter values. The parameter recovery process can be repeated many times, each time storing the actual and recovered parameter values. A plot can then be made of the actual parameter values as a function of the recovered parameter values. The values should fall close to the diagonal (i.e., the recovered parameters should be close to the actual parameters). This also lets us explore how well Stan does over a range of parameterizations of the exponential.

To better test the Stan model in this way, we simulated 200 sets of data over a range of values of λ. The λ parameter values were drawn from a truncated normal distribution with mean 2.5 and standard deviation 0.25. Each data set contained 500 data points. The Stan model was fit to each data set, saving the mean of the posterior of lambda for each fit. The left panel of Fig. 4 shows the parameter recovery for the exponential distribution implemented in Stan. We can see that the parameter recovery was successful, since most values fall close to the diagonal. If we use the classic Metropolis–Hastings algorithm, we can see in the right panel of Fig. 4 that its performance is very similar to our parameter recovery in Stan.

User-defined distributions in Stan

Thus far we have implemented an exponential model in Stan using built-in probability distributions for the likelihood and the prior. Although there are dozens of built-in probability distributions in Stan (as in other Bayesian applications), sometimes the user requires a distribution that might not already be implemented. This will often be the case for specialized distributions of the kind assumed in many cognitive models. But before moving on to complicated cognitive models, we first want to present an example using the exponential model, but without the benefit of using Stan's built-in probability distribution function.
An example with the exponential distribution, redux

The exponential distribution is a built-in distribution in Stan, and therefore it is not necessary to implement it as a user-defined function. We do so here for tutorial purposes.

To begin, the likelihood function of the exponential distribution is

P(y | λ) = ∏_{i=1}^{N} λ e^(−λ y_i),   (3)

where λ is the rate parameter of the exponential, y is the vector of data points, each data point y_i ∈ [0, ∞), and N is the number of data points in y. Stan requires the log likelihood, so we simply take the log of Eq. 3 in our Stan implementation.

Having mathematically defined the (log) likelihood function, we can now implement it in Stan. Once we implement the user-defined function, we can then call it just as we would call a built-in function. In this example, we will replace the built-in distribution, exponential, in the sampling statement Y ~ exponential(lambda) in line 14 of Box 1 with a user-defined exponential distribution.

Box 3 shows how this is accomplished. To add a user-defined function, it is first necessary to define a functions code block. The functions block must come before all other blocks of Stan code, but when there are no user-defined functions this block may be omitted entirely.
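Box 3 itself is not reproduced here. A minimal sketch of the functions block it describes (using the newexp_log name and the x, lam, prob, and lprob variables discussed below) might look like the following, with the remaining blocks identical to Box 1 except that the likelihood statement becomes Y ~ newexp(lambda); the exact listing in the original Box may differ.

functions {
  real newexp_log(vector x, real lam) {
    vector[num_elements(x)] prob;       // densities for each data point (Eq. 3)
    real lprob;
    for (i in 1:num_elements(x))
      prob[i] <- lam * exp(-lam * x[i]);
    lprob <- sum(log(prob));            // log likelihood: sum of log densities
    return lprob;
  }
}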
When dealing with functions that implement probability distributions, three important rules must be considered. First, Stan requires the name of any function that implements a probability distribution to end with _log;³ the _log suffix permits access to the increment_log_prob function (an internal function that can be ignored for the purposes of this tutorial), where the total log probability of the model is computed. Second, when calling such defined functions, the _log suffix must be dropped. Lastly, when naming a user-defined function, the name must be different from any built-in function when defining it in the functions block, and it must be different from any built-in function when the _log suffix is dropped. For example, suppose that we named our user-defined function exp_log. When called, this would be different from the built-in exponential() function, but unfortunately it would conflict with another built-in function, exp, and result in an error. With these rules in mind, we can now properly name our user-defined exponential likelihood function. Line 2 of Box 3 shows that we have named the exponential likelihood function newexp_log. This name works because there is no built-in function called newexp and no built-in distribution newexp_log.

³ This naming convention holds for user-defined as well as built-in functions. For example, in line 14 of Box 1, in the sampling statement Y ~ exponential(lambda) we are actually calling the built-in function exponential_log by dropping the _log suffix.

The newexp_log function returns a real number of type real. The first argument of the function is the data vector x, and the second argument is the rate parameter lam. Note that the scope of the variables within each function is local. Within the function itself, another local variable is defined called prob, which is a vector of the same length as the data vector x. For each element in the data vector, a probability density will be computed and stored in prob, implementing the elements in Eq. 3. As we noted previously, Stan requires the log likelihood, so instead of multiplying the probability densities, we take the natural logs and sum them. The sum of the log densities will be assigned to the variable lprob.⁴ The lprob value—representing the log likelihood of the exponential distribution (log of Eq. 3)—is then returned by the function. After this, lines 12 through 30 are identical to the code shown in Box 1 lines 1 through 19, with the exception of the call to newexp (Box 3 line 25) instead of the built-in Stan exponential distribution (Box 1 line 14).

⁴ Stan includes C++ libraries designed for efficient vector and matrix operations, and therefore it is often more efficient to use the vectorized form of a function. For example, the log likelihood can be computed in a single line with lprob <- sum(log(lam) - x*lam);. For simplicity, we do not consider vectorization any further, and instead refer readers to the Stan manual.

We found that this implementation produced exactly the same results as the implementation using the built-in distribution (Box 1). This is not surprising, given that the built-in and user-defined exponential distributions are mathematically equivalent.

An example with the LBA model

In this section, we briefly describe the LBA model and how it can be utilized in a Bayesian framework, before describing how it can be implemented in Stan. Accumulator models attempt to describe how the evidence for one or more decision alternatives is accumulated over time. LBA predicts response probabilities as well as distributions of the response times, much like other accumulator models. Unlike some models, which assume a noisy accumulation of evidence to threshold within a trial, LBA instead assumes a linear and continuous accumulation to threshold—hence, the "ballistic" in LBA. LBA assumes that the variabilities in response probabilities and response times are determined by between-trial variability in the accumulation rate and other parameters.

Fig. 5  Graphical depiction of the linear ballistic accumulator (LBA) model (Brown & Heathcote, 2008)

LBA assumes a separate accumulator for each response alternative i. A response is made when the evidence accrued for one of these alternatives exceeds some predetermined threshold, b. The rate of accumulation of evidence is referred to as the drift rate. The LBA model assumes that the drift rate, di, is sampled on each trial from a normal distribution with mean vi and standard deviation s. Figure 5 illustrates an example in which the drift for response m1 is greater than that
for response m2. In this example trial, the participant will make response m1 because that accumulator reaches its threshold, b, before the other accumulator.

Each accumulator starts with some a priori amount of evidence. This start point is assumed to vary across trials. The start-point variability is assumed to be uniformly distributed between 0 and A (A must be less than the threshold b).

Like other accumulator models, LBA also makes the assumption that there is a period of nondecision time, τ, that occurs before evidence begins to accumulate (as well as after, leading to whatever motor response is required). In this implementation, as in some other LBA implementations, we assume that the nondecision time is fixed across trials.

The following equations are a formalization of these processing assumptions, showing the likelihood function for the LBA (see Brown & Heathcote, 2008, for derivations). Given the processing assumptions of the LBA, the response time, t, on trial j is given by

t_j = τ + min_i [(b − a_i) / d_i].   (4)

Let us assume that θ is the full set of LBA parameters θ = {v1, v2, b, A, s, τ}. Then the joint probability density function of making response m1 at time t (referred to as the defective PDF) is

LBA(m1, t | θ) = f(t − τ | v1, b, A, s) [1 − F(t − τ | v2, b, A, s)],   (5)

and the joint density for making response m2 at time t is

LBA(m2, t | θ) = f(t − τ | v2, b, A, s) [1 − F(t − τ | v1, b, A, s)].   (6)

… Stan. If we consider the vector of binary responses, R, and response times, T, for each trial i and a total of N trials, the likelihood function is given by

P(T, R | θ) = ∏_{i=1}^{N} LBA_trunc(T_i, R_i | θ).   (8)

To implement the model in a Bayesian framework, priors are placed on each of the parameters of the LBA model. We chose priors that one might encounter in real-world applications and based them on Turner et al. (2013). First, to make the model identifiable, we set s to a constant value (Donkin, Brown, & Heathcote, 2009). Here, we fix s at 1. We then assume that the priors for the drift rates are truncated normal distributions:

v_i ~ Normal(2, 1) ∈ (0, ∞).   (9)

We assume a uniform prior on nondecision time:

τ ~ Uniform(0, 1).   (10)

The prior for the maximum starting evidence A is a truncated normal distribution:

A ~ Normal(.5, 1) ∈ (0, ∞).   (11)

To ensure that the threshold, b, is always greater than the starting point a, we reparameterize the model by shifting b by k units away from A. We refer to k as the relative threshold. Thus, we do not model b directly, but as the sum of k and A, and assume that the prior for k is a truncated normal:

k ~ Normal(.5, 1) ∈ (0, ∞).   (12)
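In Stan, truncated-normal priors such as Eqs. 9, 11, and 12 are expressed by combining a lower-bound declaration with a truncated sampling statement. A minimal illustration follows (the full LBA model file is discussed in the next section):

parameters {
  real<lower=0> k;        // relative threshold (Eq. 12)
  real<lower=0> A;        // maximum starting evidence (Eq. 11)
}
model {
  k ~ normal(.5, 1)T[0,];
  A ~ normal(.5, 1)T[0,];
}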
First, the local variables to be used in the function are defined (lines 105–111 in Box 4). Then, to obtain the decision threshold b, k is added to A. On each iteration of the for loop, the decision time t is obtained by subtracting the nondecision time tau from the response time RT. If the decision time is greater than zero, then the defective PDF is computed as in Eqs. 5 and 6, and the CDF and PDF functions described earlier accordingly are called on lines 120 and 122 (see the Appendix for the Stan implementation of each). The defective PDF associated with each row in RT is stored in the prob array. If the value of the defective PDF is less than 1 × 10⁻¹⁰, then the value stored in prob is set to 1 × 10⁻¹⁰; this is to avoid underflow problems arising from taking the natural logarithm of extremely small values of the defective PDF. Once all of the densities are computed, the likelihood is obtained by taking the sum of the natural logarithms of the densities in prob and returning the result.

Box 5 continues the code from Box 4 and, as in our earlier example, shows the data block defining the data variables that are to be modeled. The LENGTH variable defines the number of rows in RT, whose first column contains response times and whose second column contains responses. A response coded as 1 corresponds to the first accumulator finishing first, and a response coded as 2 corresponds to the second accumulator finishing first. One of the advantages of the LBA is that it can be applied to tasks with more than two choices. The NUM_CHOICES variable defines the number of choices in the task and must be equal to the length of the drift rate vector defined in the parameters block.

The parameters block shows that the parameters are all real numbers of type real and include the relative threshold k, the maximum starting evidence A, the nondecision time tau, and the vector of drift rates v. All parameters have normal priors truncated at zero, and therefore are constrained with <lower=0>.

The Bayesian LBA model is implemented in the model block, which shows that the priors for the relative threshold k and the maximum starting evidence A are both assumed to be normally distributed with a mean of .5 and standard deviation 1. The prior for nondecision time is assumed to be normally distributed with a mean of .5 and standard deviation .5, and the priors for drift rates are distributed normally with means of 2 and standard deviations of 1. The data, RT, are assumed to be distributed according to the LBA distribution, lba.

The implementation of the generated quantities block for the LBA uses a user-defined function called lba_rng, which generates random simulated samples from the LBA model given the posterior parameter estimates. The function is also defined in the functions block, along with all of the other user-defined functions, but has been omitted in Box 4 for brevity. The code and explanation for this function can be found in the Appendix. This code is based on the "rtdists" package for R (Singmann et al., 2016), which can be found online (https://cran.r-project.org/web/packages/rtdists/index.html). We note that porting code from R to Stan is relatively straightforward, as they both are geared toward vector and matrix operations and transformations.

R code  Box 6 shows the R code that runs the LBA model in Stan. This should look very similar to the code we used for the simple exponential example earlier. We again begin by clearing the workspace, setting the working directory, and loading the RStan library. After simulating the data from the LBA distribution using a file called "lba.r" (see the website listed above for the code), which contains the rlba function that generates random samples drawn from the LBA distribution, the model is then run on line 10.
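Boxes 4 and 5 themselves are not reproduced here. The following sketch of the lba.stan file is consistent with the description above; the bodies of the lba_log and lba_rng functions are omitted (see the Appendix), and the exact declarations in the original Boxes may differ.

functions {
  // lba_log(RT, k, A, v, s, tau) and lba_rng(k, A, v, s, tau) are defined here
  // (implementations omitted; see Box 4 and the Appendix)
}
data {
  int<lower=1> LENGTH;                 // number of trials (rows of RT)
  int<lower=2> NUM_CHOICES;            // number of response alternatives
  matrix[LENGTH, 2] RT;                // column 1: response times; column 2: responses (1 or 2)
}
parameters {
  real<lower=0> k;                     // relative threshold
  real<lower=0> A;                     // maximum starting evidence
  real<lower=0> tau;                   // nondecision time
  vector<lower=0>[NUM_CHOICES] v;      // mean drift rates
}
model {
  k ~ normal(.5, 1)T[0,];
  A ~ normal(.5, 1)T[0,];
  tau ~ normal(.5, .5)T[0,];
  for (n in 1:NUM_CHOICES)
    v[n] ~ normal(2, 1)T[0,];
  RT ~ lba(k, A, v, 1, tau);           // s fixed at 1
}
generated quantities {
  vector[2] pred;                      // simulated response and response time
  pred <- lba_rng(k, A, v, 1, tau);
}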
1 rm(list=ls())
2 setwd("~/LBA/")
3 source('lba.r')
4 library(rstan)
5 #make simulated data
6 out = rlba(500,1,1,c(2,1),1,.5)
7 rt = cbind(out$rt,out$resp)
8 len = length(rt[,1])
9 #run the Stan model
10 fit <- stan(file = 'lba.stan',
data = list(RT=rt,LENGTH=len,NUM_CHOICES=2),
warmup = 750,
iter = 1500,
chains = 3)
As we noted in our earlier example, in real-world applications of the model, the data would not be simulated but would be collected from a behavioral experiment. We use simulated data here for convenience of the tutorial and because we are interested in determining whether the Bayesian model can recover the known parameters used to generate the simulated data (parameter recovery). With just some minor modification, the code we provide using simulated data can be generalized to an application to real data. For example, real data stored in a text file or spreadsheet can be read into R and then formatted and coded in the same way as the simulated data.

The Bayesian LBA model can be validated in a fashion similar to that for the Bayesian exponential model. Figure 6 shows the autocorrelation function for each parameter. For Stan, autocorrelation across all parameters became undetectable after approximately 15 iterations. The right panels show that the Metropolis–Hastings algorithm had high autocorrelation for long lags, indicating that the sampler was not taking independent samples from the posterior distribution. This high autocorrelation leads to lower numbers of effective samples and longer convergence times. The Neff values returned by Metropolis–Hastings across all chains were on average 27 for each parameter. This means that running 4,500 iterations (three chains of 1,500 samples) is equivalent to drawing only 27 independent samples. On the other hand, Stan returned on average 575 effective samples for each parameter after 4,500 iterations. In addition, R̂ for all parameters was above 1.1 for Metropolis–Hastings, and below 1.1 for Stan, indicating that the chains converged for Stan but not for Metropolis–Hastings.

The deleterious effect of the high autocorrelation of Metropolis–Hastings in comparison to the low autocorrelation of Stan is apparent in Fig. 7. The left panels show the chains produced by Stan, and the right panels show the chains produced by Metropolis–Hastings. In the left panel, the Stan chains show good convergence: They look like "fuzzy caterpillars," it is difficult to distinguish one chain from the others, and the chains do not drift up and down. In the right panels of Fig. 7, the Metropolis–Hastings chains clearly do not meet any of the necessary criteria for convergence. The only way we found to correct for this was to thin by at least 50 or more steps.

Figure 8 shows the results of a larger parameter recovery study for Stan and the Metropolis–Hastings algorithm. In this exploration, 200 simulated data sets containing 500 data points each were generated, each with a different set of parameter values. The parameter values were drawn randomly from a truncated normal with a lower bound of 0, a mean of 1, and a standard deviation of 1. The Stan model was fit to each data set, and the resulting mean of the posterior distribution for each parameter was saved. Figure 8 shows the actual parameter values plotted against the recovered parameter values. For Stan, most of the points fall along the diagonal, indicating that parameter recovery was successful. For Metropolis–Hastings, it is clear visually that parameter recovery is poorer—this is due to the aforementioned difficulty this algorithm has with the inherent correlations between parameters in the LBA model.

We note that parameter recovery in sequential-sampling models is often difficult if the experimental design is unconstrained, like the one we present here, which benefited from a large number of data points for each data set as well as priors that were similar to the actual parameters that had generated the data. We present this parameter recovery as a sanity check to ensure that Stan can recover sensible parameter values under optimal conditions. Obviously, this will not be the case in real-world applications, and therefore, great care must be taken when designing experiments to test sequential-sampling models like the LBA.
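For readers who want to inspect the diagnostics reported here for their own fits, one way (assuming the fit object returned by the stan call in Box 6) is to pull them out of the RStan summary table:

fit_summary <- summary(fit)$summary        # matrix of posterior summaries
fit_summary[, c("n_eff", "Rhat")]          # effective sample size and R-hat per parameter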
[Fig. 6 appears here: autocorrelation functions (ACF) for each LBA parameter (v1, v2, b, A, τ) plotted against lag, with panels for the Stan and Metropolis–Hastings samplers]
Better Metropolis–Hastings sampling might be achieved by careful adjustment and experimentation with the proposal step process. Here, the proposal step was generated by sampling from a normal distribution with a mean equal to the current sample and a standard deviation of .05. Increasing the standard deviation increases the average distance between the current sample and the proposal, but decreases the probability of accepting the proposal. We found that different settings of the standard deviation largely led to autocorrelations similar to those we have presented here. The only thing we found that led to improvements in autocorrelation was thinning. Thinning by 50 steps led to autocorrelation dropping to nonsignificant values at around lags of 40. At 75 steps, Metropolis–Hastings's performance was similar to that of Stan, resulting in similar autocorrelation, Neff, and R̂ values.

A reason behind the poor sampling of Metropolis–Hastings is the correlated parameters of the LBA. The half below the diagonal of Fig. 9 shows the joint posterior distribution for each parameter pair of the LBA for a given set of simulated data, and the half above the diagonal gives the corresponding correlations. Each point in each panel in the lower half of the grid represents a posterior sample from the joint posterior probability distribution of a particular parameter pair for the LBA model. For example, the bottom left corner panel shows the joint posterior probability distribution between τ and v1. We can see that this distribution has negatively correlated parameters. The upper right corner panel of the grid confirms this, showing the correlation between τ and v1 to be –.45. Five joint distributions have correlations with absolute values well above .50 (v1–v2, v1–b, v2–b, v2–τ, and A–τ).
[Fig. 7 appears here: trace plots of the MCMC chains for each LBA parameter (v1, v2, b, A, τ) across 1,500 iterations, for the Stan sampler (left panels) and the Metropolis–Hastings sampler (right panels)]
These correlations in parameter values cause some MCMC algorithms, such as Gibbs sampling and Metropolis–Hastings, to perform poorly. On the other hand, Stan does not drastically suffer from the model's correlated parameters.

In summary, the Stan implementation of the LBA model shows successful parameter recovery and efficient sampling of the posterior distribution when compared to Metropolis–Hastings, due in large part to the correlated parameters of the LBA model. Whereas Stan was designed with the intention to handle these situations properly, standard MCMC techniques such as the Metropolis–Hastings algorithm were not, and they do not converge to the posterior distribution in any sort of reliable manner.

Fitting multiple subjects in multiple conditions: a hierarchical extension of the LBA model

The simple LBA model just described was designed for a single subject in a single condition. This is never the case in any real-world application of the LBA model. In this section, we describe and implement an LBA model that is designed for multiple subjects in multiple conditions. The model will include parameters that model performance at both the group and individual levels. The Bayesian approach allows both the group- and individual-level parameters to be estimated simultaneously. This type of model is called a Bayesian hierarchical model (e.g., Kruschke, 2011; Lee & Wagenmakers, 2014). In a Bayesian hierarchical model, the parameters for individual participants are informed by the group parameter estimates. This reduces the potential for the individual parameter estimates to be sensitive to outliers, and decreases the overall number of participants necessary to achieve reliable parameter estimates.

The model we consider assumes that the vector of response times for each participant i in condition j is distributed according to the LBA:

RT_i,j ~ LBA(k_i, A_i, v1_i,j, v2_i,j, s, τ_i),   (13)

where, as before, the responses are coded as 1 and 2, corresponding to each accumulator; k_i is the relative threshold; A_i is the maximum starting evidence; v1_i,j and v2_i,j are the mean drift rates for each accumulator; s is the standard deviation, held constant across participants and conditions; and τ_i is the nondecision time. As before, we assume that s is fixed at 1.0 and that the prior on each parameter follows a truncated normal distribution with its own group mean μ and standard deviation σ:

k_i ~ Normal(μ_k, σ_k) ∈ (0, ∞)   (14)

A_i ~ Normal(μ_A, σ_A) ∈ (0, ∞)   (15)

v1_i,j ~ Normal(μ_v1,j, σ_v1,j) ∈ (0, ∞)   (16)

v2_i,j ~ Normal(μ_v2,j, σ_v2,j) ∈ (0, ∞)   (17)
1  model {
2
3    k_mu ~ normal(.5,1)T[0,];
4    A_mu ~ normal(.5,1)T[0,];
5    tau_mu ~ normal(.5,.5)T[0,];
6
7    k_sigma ~ gamma(1,1);
8    A_sigma ~ gamma(1,1);
9    tau_sigma ~ gamma(1,1);
10
11   for (j in 1:NUM_COND){
12     for (n in 1:NUM_CHOICES){
13       v_mu[j,n] ~ normal(2,1)T[0,];
14       v_sigma[j,n] ~ gamma(1,1);
15     }
16   }
17
18   for (i in 1:NUM_SUBJ){
19     k[i] ~ normal(k_mu,k_sigma)T[0,];
20     A[i] ~ normal(A_mu,A_sigma)T[0,];
21     tau[i] ~ normal(tau_mu,tau_sigma)T[0,];
22     for(j in 1:NUM_COND){
23       for(n in 1:NUM_CHOICES){
24         v[i,j,n] ~ normal(v_mu[j,n],v_sigma[j,n])T[0,];
25       }
26       RT[i,j] ~ lba(k[i],A[i],v[i,j],1,tau[i]);
27     }
28   }
29 }
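The data and parameters blocks that accompany this model block are not shown in the text. The following declarations are a sketch consistent with how the variables are used above; the array shapes, and the assumption of an equal number of trials per subject and condition, are ours.

data {
  int<lower=1> NUM_SUBJ;                          // number of participants
  int<lower=1> NUM_COND;                          // number of conditions
  int<lower=2> NUM_CHOICES;                       // number of response alternatives
  int<lower=1> LENGTH;                            // trials per participant and condition
  matrix[LENGTH, 2] RT[NUM_SUBJ, NUM_COND];       // response times and responses
}
parameters {
  real<lower=0> k_mu;                             // group-level means
  real<lower=0> A_mu;
  real<lower=0> tau_mu;
  real<lower=0> k_sigma;                          // group-level standard deviations
  real<lower=0> A_sigma;
  real<lower=0> tau_sigma;
  real<lower=0> v_mu[NUM_COND, NUM_CHOICES];
  real<lower=0> v_sigma[NUM_COND, NUM_CHOICES];
  real<lower=0> k[NUM_SUBJ];                      // subject-level parameters
  real<lower=0> A[NUM_SUBJ];
  real<lower=0> tau[NUM_SUBJ];
  vector<lower=0>[NUM_CHOICES] v[NUM_SUBJ, NUM_COND];
}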
To test whether the model could successfully recover the parameters, we simulated 20 subjects, each with 100 responses and response times. Each simulated subject's parameters were drawn from a group-level distribution. Specifically, the maximum starting evidence parameter, A, relative threshold parameter, k, and nondecision time parameter, τ, were drawn from a truncated normal distribution with a mean of .5, standard deviation of .5, and lower bound of 0. We then varied the drift rates across three conditions. The drift rates of the first accumulator were drawn from a truncated normal with means of 2 (Condition 1), 3 (Condition 2), and 4 (Condition 3), respectively, all with standard deviations of 1 and lower bounds of 0. The mean drift rate of the second accumulator for all three conditions was drawn from a truncated normal with a mean of 2 and standard deviation of 1. In applications to real data, the distribution of the drift rates corresponding to the incorrect choice will usually have a lower mean and larger standard deviation than the distribution of the drift rates corresponding to the correct choice.

We then fit the hierarchical LBA model to the simulated data. The group-level parameter estimates are shown in Fig. 10. For the panels plotting v1 and v2, solid lines indicate Condition 1, dotted lines indicate Condition 2, and dashed lines indicate Condition 3. All other parameters were held constant across conditions. The group-level parameter estimates for the hierarchical model shown in Fig. 10 closely align with the group-level distribution parameters used to generate the simulated data.

To further illustrate the advantages of the hierarchical LBA model, we also fit the nonhierarchical LBA model shown in Box 5 to the same set of simulated data. The nonhierarchical model assumed that for each subject, the priors on k and A were normally distributed with a mean of .5 and standard deviation of 1. The prior on τ for each subject was normal with mean .5 and standard deviation .5. Lastly, the prior on the drift rate for each accumulator was drawn from a normal distribution with a mean of 2 and standard deviation 1. Thus, the priors on the parameters for each subject in the nonhierarchical model mirrored the group-level priors in the hierarchical model.
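A sketch in R of the subject-level parameter draws just described (the rtnorm helper, which draws from a normal truncated below at zero by simple rejection, is our own and is not part of the original code):

# draw parameters for 20 simulated subjects from the group-level distributions
rtnorm <- function(n, mean, sd) {                 # normal truncated below at 0
  x <- rnorm(n, mean, sd)
  while (any(x <= 0)) x[x <= 0] <- rnorm(sum(x <= 0), mean, sd)
  x
}
n_subj <- 20
A_i   <- rtnorm(n_subj, .5, .5)                   # maximum starting evidence
k_i   <- rtnorm(n_subj, .5, .5)                   # relative threshold
tau_i <- rtnorm(n_subj, .5, .5)                   # nondecision time
v1_ij <- sapply(c(2, 3, 4), function(m) rtnorm(n_subj, m, 1))  # drift 1, Conditions 1-3
v2_ij <- matrix(rtnorm(3 * n_subj, 2, 1), n_subj, 3)           # drift 2, all conditions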
The computation involved in NUTS is fairly expensive and can be slow for complex models. It should be noted that this lowered speed is, by design, traded off for greater effective sample rates. Other techniques, such as MCMC-DE (Turner et al., 2013), which approximate some of the more expensive computations involved in NUTS, may offer an alternative if the sampling rate becomes an issue.

Although Bayesian parameter estimation has many advantages over traditional methods, implementing the MCMC algorithm can be technically challenging. Turnkey Bayesian inference applications allow the researcher to work at the level of the model and not of the sampler, but they are likewise not without issues. Stan is a viable alternative to other applications that do automatic Bayesian inference, especially when the researcher is interested in distributions that are uncommon and require user implementation or when the model's parameters are correlated.

Author note  This work was supported by Grant Nos. NEI R01-EY021833 and NSF SBE-1257098, the Temporal Dynamics of Learning Center (NSF SMA-1041755), and the Vanderbilt Vision Research Center (NEI P30-EY008126).

Appendix

The Stan implementations of the PDF and CDF of the LBA are given in Boxes A1 and A2, respectively. These functions are used in the calculation of the likelihood function of the LBA given in Box 4 of the main text and are nothing more than implementations of the equations that Brown and Heathcote (2008) provided. Here, we simply note some implementation details of each.
The first thing to note is that both are real-valued functions of type real. They both take as arguments the decision time, t, the decision threshold, b, the maximum starting evidence, A, the drift rate, v, and the standard deviation, s. Lines 4 through 10 in Box A1 and lines 24 through 31 in Box A2 define all local variables that will be used in each computation. After defining the local variables, the PDF or CDF is computed and the result is returned. Some built-in functions allow for an easier computation of the PDF and CDF. The Phi function is a built-in Stan function that implements the normal cumulative distribution function. The exp function is the exponential function, and the normal_log function is the natural logarithm of the PDF of the normal distribution, where the last two arguments are the mean and standard deviation, respectively.

Box A3 implements the LBA model in Stan. This code is based on the "rtdists" package for R (Singmann et al., 2015), which can be found online (https://cran.r-project.org/web/packages/rtdists/index.html). All Stan functions that generate samples from a given distribution are called random number generators (RNGs). To distinguish between functions, RNGs must contain the _rng suffix. For example, the RNG for the exponential distribution is called exponential_rng. Here, we name the function that generates samples from the LBA model lba_rng. After defining the local variables, the drift rates for each accumulator are drawn from the normal distribution. Negative drift rates result in negative response times, and drift rates of zero result in undefined response times. The LBA model that we implement assumes that at least one accumulator has a positive drift rate, and therefore no negative or undefined response times. This is achieved in the while loop beginning on line 64, in which drift rates are drawn from the normal distribution until at least one drift rate is positive. The loop terminates after a maximum of 1,000 iterations if at least one positive drift rate has not been drawn. If this is the case, a negative value is returned, denoting an undefined response time (lines 79 and 80). In practice, we have found this works very well and will not return negative or undefined response times given a reasonable model and data. After drawing the drift rates, the start points for each accumulator are drawn (line 84). The finishing times of each accumulator are computed according to the processing assumptions of the LBA on line 85. Lastly, the response alternative and lowest positive response time are stored in the pred vector and returned.
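To make the sampling scheme of lba_rng (and of the rlba function used in Box 6) concrete, here is a small R sketch of the same logic for a single trial; it is illustrative only and is not the original Stan or rtdists code.

# simulate one LBA trial: returns c(response, response time)
lba_trial <- function(b, A, v, s, tau) {
  d <- rnorm(length(v), v, s)              # drift rates for each accumulator
  iter <- 0
  while (all(d <= 0) && iter < 1000) {     # redraw until at least one is positive
    d <- rnorm(length(v), v, s)
    iter <- iter + 1
  }
  if (all(d <= 0)) return(c(NA, -1))       # give up: undefined response time
  start  <- runif(length(v), 0, A)         # uniform start points
  finish <- (b - start) / d                # linear rise to threshold b
  finish[d <= 0] <- Inf                    # accumulators with negative drift never finish
  resp <- which.min(finish)                # first accumulator to reach threshold
  c(resp, tau + finish[resp])              # add nondecision time
}
lba_trial(b = 1.5, A = 1, v = c(2, 1), s = 1, tau = .5)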