0% found this document useful (0 votes)

4 views15 pages

The Metropolis-Hastings Algorithm: C.P. Robert

Uploaded by

imgilbertmarthin

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

4 views15 pages

The Metropolis-Hastings Algorithm: C.P. Robert

Uploaded by

imgilbertmarthin

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

The Metropolis–Hastings

algorithm
C.P. Robert1,2,3
1
Université Paris-Dauphine, 2 University of Warwick, and 3 CREST

Abstract. This article is a self-contained introduction to the Metropolis-

arXiv:1504.01896v3 [stat.CO] 27 Jan 2016

Hastings algorithm, this ubiquitous tool for producing dependent simula-

tions from an arbitrary distribution. The document illustrates the principles
of the methodology on simple examples with R codes and provides entries
to the recent extensions of the method.
Key words and phrases: Bayesian inference, Markov chains, MCMC meth-
ods, Metropolis–Hastings algorithm, intractable density, Gibbs sampler,
Langevin diffusion, Hamiltonian Monte Carlo.

1. INTRODUCTION
There are many reasons why computing an integral like
Z
I(h) = h(x)dπ(x),
X

where dπ is a probability measure, may prove intractable, from the shape of

the domain X to the dimension of X (and x), to the complexity of one of the
functions h or π. Standard numerical methods may be hindered by the same
reasons. Similar difficulties (may) occur when attempting to find the extrema
of π over the domain X . This is why the recourse to Monte Carlo methods
may prove unavoidable: exploiting the probabilistic nature of π and its weighting
of the domain X is often the most natural and most efficient way to produce
approximations to integrals connected with π and to determine the regions of
the domain X that are more heavily weighted by π. The Monte Carlo approach
(Hammersley and Handscomb, 1964; Rubinstein, 1981) emerged with computers,
at the end of WWII, as it relies on the ability of producing a large number of
realisations of a random variable distributed according to a given distribution,
taking advantage of the stabilisation of the empirical average predicted by the Law
of Large Numbers. However, producing simulations from a specific distribution
may prove near impossible or quite costly and therefore the (standard) Monte
Carlo may also face intractable situations.
An indirect approach to the simulation of complex distributions and in par-
ticular to the curse of dimensionality met by regular Monte Carlo methods is to
use a Markov chain associated with this target distribution, using Markov chain
theory to validate the convergence of the chain to the distribution of interest and
the stabilisation of empirical averages (Meyn and Tweedie, 1994). It is thus little
surprise that Markov chain Monte Carlo (MCMC) methods have been used for
almost as long as the original Monte Carlo techniques, even though their impact
1
2

on Statistics has not been truly felt until the very early 1990s. A comprehensive
entry about the history of MCMC methods can be found in Robert and Casella
(2010).
The paper1 is organised as follows: in Section 2, we define and justify the
Metropolis–Hastings algorithm, along historical notes about its origin. In Section
3, we provide details on the implementation and calibration of the algorithm.
A mixture example is processed in Section 4. Section 5 includes recent exten-
sions of the standard Metropolis–Hastings algorithm, while Section 6 concludes
about further directions for Markov chain Monte Carlo methods when faced with
complex models and huge datasets.

2. THE ALGORITHM
2.1 Motivations
Given a probability density π called
the target, defined on a state space
X , and computable up to a mul-
tiplying constant, π(x) ∝ π̃(x),
0.6
the Metropolis–Hastings algorithm,
named after Metropolis et al. (1953)
0.4
Density

and Hastings (1970), proposes a

generic way to construct a Markov
chain on X that is ergodic and sta-
0.2

tionary with respect to π—meaning

that, if X (t) ∼ π(x), then X (t+1) ∼
0.0

π(x)—and that therefore converges −3 −2 −1 0 1 2 3

in distribution to π. While there x

are other generic ways of delivering Fig 1. Fit of the histogram of a Metropolis–
Markov chains associated with an ar- Hastings sample to its target, for T = 104 it-
bitrary stationary distribution, see, erations, a scale α = 1, and a starting value
x(1) = 3.14.
e.g., Barker (1965), the Metropolis–
Hastings algorithm is the workhorse of MCMC methods, both for its simplicity
and its versatility, and hence the first solution to consider in intractable situa-
tions. The main motivation for using Markov chains is that they provide shortcuts
in cases where generic sampling requires too much effort from the experimenter.
Rather than aiming at the “big picture” immediately, as an accept-reject algo-
rithm would do (Robert and Casella, 2009), Markov chains construct a progres-
sive picture of the target distribution, proceeding by local exploration of the state
space X until all the regions of interest have been uncovered. An analogy for the
method is the case of a visitor to a museum forced by a general blackout to watch
a painting with a small torch. Due to the narrow beam of the torch, the person
cannot get a global view of the painting but can proceed along this painting until
all parts have been seen.2
Before describing the algorithm itself, let us stress the probabilistic foundations
of Markov chain Monte Carlo (MCMC) algorithms: the Markov chain returned
1
I am most grateful to Alexander Ly, Department of Psychological Methods, University of
Amsterdam, for pointing out mistakes in the R code of an earlier version of this paper.
2
Obviously, this is only an analogy in that a painting is more than the sum of its parts!
3

by the method, X (1) , X (2) , . . . , X (t) , . . . is such that X (t) is converging to π. This
means that the chain can be considered as a sample, albeit a dependent sam-
ple, and approximately distributed from π. Due to the Markovian nature of the
simulation, the first values are highly dependent on the starting value X (1) and
usually removed from the sample as burn-in or warm-up. While there are very few
settings where the time when the chain reaches stationarity can be determined,
see, e.g., Hobert and Robert (2004), there is no need to look for such an instant
since the empirical average
T
1X
(1) ÎT (h) = h(X (t) )
T
t=1

converges almost surely to I(h), no matter what the starting value, if the Markov
chain is ergodic, i.e., forgets about its starting value. This implies that, in theory,
simulating a Markov chain is intrinsically equivalent to a standard i.i.d. simulation
from the target, the difference being in a loss of efficiency, i.e., in the necessity to
simulate more terms to achieve a given variance for the above Monte Carlo esti-
mator. The foundational principle for MCMC algorithms is thus straightforward,
even though the practical implementation of the method may prove delicate or
in cases impossible.
Without proceeding much further
into Markov chain theory, we stress
2.0

that the existence of a stationary dis-

tribution for a chain implies this chain
1.5

automatically enjoys a strong stability

called irreducibility. Namely, the chain
Density

can move all over the state space, i.e.,

1.0

can eventually reach any region of the

state space, no matter its initial value.
0.5

2.2 The algorithm

0.0

The Metropolis–Hastings algorithm 3.5 4.0 4.5

associated with a target density π re- x

quires the choice of a conditional den- Fig 2. Fit of the histogram of a Metropolis–
sity q also called proposal or candidate Hastings sample to its target, for T = 104 it-
kernel. The transition from the value erations, a scale α = 0.1, and a starting value
of the Markov chain (X (t) ) at time t x(1) = 3.14.
and its value at time t + 1 proceeds
via the following transition step:

Algorithm 1. Metropolis–Hastings
Given X (t) = x(t) ,
1. Generate Yt ∼ q(y|x(t) ).
2. Take (
(t+1) Yt with probability ρ(x(t) , Yt ),
X =
x(t) with probability 1 − ρ(x(t) , Yt ),
4

where
π̃(y) q(x|y)
ρ(x, y) = min ,1 .
π̃(x) q(y|x)

Then, as shown in Metropolis et al.

(1953), this transition preserves the
stationary density π if the chain is irre-

1.2
ducible, that is, if q has a wide enough

1.0
support to eventually reach any region
of the state space X with positive mass

0.8
Density
under π. A sufficient condition is that

0.6
q is positive everywhere. The very na-

0.4
ture of accept-reject step introduced
by those authors is therefore sufficient

0.2
to turn a simulation from an almost ar-

0.0
bitrary proposal density q into a gen- 0 1 2 3 4

eration that preserves π as the sta- x

tionary distribution. This sounds both Fig 3. Fit of the histogram of a Metropolis–
amazing and too good to be true! Hastings sample to its target, for T = 105 it-
But it is true, in the theoretical sense erations, a scale α = 0.2, and a starting value
x(1) = 3.14.
drafted above. In practice, the perfor-
mances of the algorithm are obviously highly dependent on the choice of the
transition q, since some choices see the chain unable to converge in a manageable
time.
2.3 An experiment with the algorithm
To capture the mechanism behind the algorithm, let us consider an elementary
example:

Example 1. Our target density is a perturbed version of the normal N (0, 1)

density, ϕ(·),
π̃(x) = sin2 (x) × sin2 (2x) × ϕ(x) .
And our proposal is a uniform U(x − α, x + α) kernel,
1
q(y|x) = I (y) .
2α (x−α,x+α)
Implementing this algorithm is straightforward: two functions to define are the
target and the transition

target=function(x){
sin(x)^2*sin(2*x)^2*dnorm(x)}

metropolis=function(x,alpha=1){
y=runif(1,x-alpha,x+alpha)
if (runif(1)>target(y)/target(x)) y=x
return(y)}
5

and all we need is a starting value

T=10^4
x=rep(3.14,T)
for (t in 2:T) x[t]=metropolis(x[t-1])

which results in the histogram of Figure 1, where the target density is properly
normalised by a numerical integration. If we look at the sequence (x(t) ) returned
by the algorithm, it changes values around 5000 times. This means that one
proposal out of two is rejected. If we now change the scale of the uniform to
α = 0.1, the chain (x(t) ) takes more than 9000 different values, however the
histogram in Figure 2 shows a poor fit to the target in that only one mode is
properly explored. The proposal lacks the power to move the chain far enough to
reach the other parts of the support of π(·). A similar behaviour occurs when we
start at 0. A last illustration of the possible drawbacks of using this algorithm is
shown on Figure 3: when using the scale α = 0.2 the chain is slow in exploring the
support, hence does not reproduce the correct shape of π after T = 105 iterations.
J

From this example, we learned that some choices of proposal kernels work well
to recover the shape of the target density, while others are poorer, and may even
fail altogether to converge. Details about the implementation of the algorithm
and the calibration of the proposal q are detailed in Section 3.
2.4 Historical interlude
The initial geographical localisation of the MCMC algorithms is the nuclear
research laboratory in Los Alamos, New Mexico, which work on the hydrogen
bomb eventually led to the derivation Metropolis algorithm in the early 1950s.
What can be reasonably seen as the first MCMC algorithm is indeed the Metropo-
lis algorithm, published by Metropolis et al. (1953). Those algorithms are thus
contemporary with the standard Monte Carlo method, developped by Ulam and
von Neumann in the late 1940s. (Nicolas Metropolis is also credited with sug-
gesting the name “Monte Carlo“, see Eckhardt, 1987, and published the very
first Monte Carlo paper, see Metropolis and Ulam, 1949.) This Metropolis algo-
rithm, while used in physics, was only generalized by Hastings (1970) and Peskun
(1973, 1981) towards statistical applications, as a method apt to overcome the
curse of dimensionality penalising regular Monte Carlo methods. Even those later
generalisations and the work of Hammersley, Clifford, and Besag in the 1970’s
did not truly impact the statistical community until Geman and Geman (1984)
experimented with the Gibbs sampler for image processing, Tanner and Wong
(1987) created a form of Gibbs sampler for latent variable models and Gelfand
and Smith (1990) extracted the quintessential aspects of Gibbs sampler to turn
it into a universal principle and rekindle the appeal of the Metropolis–Hastings
algorithm for Bayesian computation and beyond.

3. IMPLEMENTATION DETAILS
When working with a Metropolis–Hastings algorithm, the generic nature of
Algorithm 1 is as much an hindrance as a blessing in that the principle remains
6

valid for almost every choice of the proposal q. It thus does not give indications
about the calibration of this proposal. For instance, in Example 1, the method
is valid for all choices of α but the comparison of the histogram of the outcome
with the true density shows that α has a practical impact on the convergence
of the algorithm and hence on the number of iterations it requires. Figure 4
illustrates this divergence in performances via the autocorrelation graphs of three
chains produced by the R code in Example 1 for three highly different values of
α = 0.3, 3, 30. It shows why α = 3 should be prefered to the other two values
in that each value of the Markov chain contains “more” information in that
case. The fundamental difficulty when using the Metropolis–Hastings algorithm
is in uncovering which calibration is appropriate without engaging into much
experimentation or, in other words, in an as automated manner as possible.
A (the?) generic version of the 0.3 3 30

Metropolis–Hastings algorithm is the

1.0

1.0
0.8
0.8

0.8
random walk Metropolis–Hastings al-

0.6
0.6

0.6
ACF

ACF

ACF
0.4
0.4

0.4
gorithm, which exploits as little as pos-

0.2
0.2

0.2
sible knowledge about the target dis-

0.0
0.0

0.0
0 10 20 30 40 0 10 20 30 40 0 10 20 30 40

tribution, proceeding instead in a lo- Lag Lag Lag

cal if often myopic manner. To achieve

this, the proposal distribution q aims Fig 4. Comparison of the acf of three Markov
chains corresponding to scales α = 0.3, 3, 30, for
at a local exploration of the neigh- T = 104 iterations, and a starting value x(1) =
borhood of the current value of the 3.14.
Markov chain, i.e., simulating the pro-
posed value Yt as
Yt = X (t) + εt ,
where εt is a random perturbation with distribution g, for instance a uniform
distribution as in Example 1 or a normal distribution. If we call random walk
Metropolis–Hastings algorithms all the cases when g is symmetric, the acceptance
probability in Algorithm 1 gets simplified into

π̃(y)
ρ(x, y) = min ,1 .
π̃(x)
While this probability is independent of the scale of the proposal g, we just saw
that the performances of the algorithm are quite dependent on such quantities. In
order to achieve an higher degree of efficiency, i.e., towards a decrease of the Monte
Carlo variance, Roberts et al. (1997) studied a formal Gaussian setting aiming at
the ideal acceptance rate. Indeed, Example 1 showed that acceptance rates that
are either “too high” or “too low” slow down the convergence of the Markov chain.
They then showed that the ideal variance in the proposal is twice the variance of
the target or, equivalently, that the acceptance rate should be close to 1/4. While
this rule is only an indication (in the sense that it was primarily designed for a
specific and asymptotic Gaussian environment), it provides a golden rule for the
default calibration of random walk Metropolis–Hastings algorithms.
3.1 Effective sample size
We now consider the alternative of the effective sample size for comparing
and calibrating MCMC algorithms. Even for a stationary Markov chain, using T
iterations does not amount to simulating an iid sample from π that would lead to
7

the same variability. Indeed, the empirical average (1) cannot be associated with
the standard variance estimator
T
1 X 2
σ̂T2 = h(X (t) ) − ÎT (h)T
T −1
t=1

due to the correlations amongst the X (t) ’s. In this setting, the effective sample
size is defined as the correction factor τT such that σ̂T2 /τT is the variance of the
empirical average (1)). This quantity can be computed as in Geweke (1992) and
Heidelberger and Welch (1983) by

τT = T /κ(h) ,

where κ(h) is the autocorrelation associated with the sequence h(X (t) ),
∞
X
κ(h) = 1 + 2 corr h(X (0) ), h(X (t) ) ,
t=1

estimated by spectrum0 and effectiveSize from the R library coda, via the
spectral density at zero. A rough alternative is to rely on subsampling, as in
Robert and Casella (2009), so that X (t+G) is approximately independent from
X (t) . The lag G is possibly determined in R via the autocorrelation function
autocorr.

Example 2. (Example 1 continued) We can compare the Markov chains

obtained with α = 0.3, 3, 30 against those two criteria:

> autocor(mcmc(x))
[,1] [,2] [,3]
Lag 0 1.0000000 1.0000000 1.0000000
Lag 1 0.9672805 0.9661440 0.9661440
Lag 5 0.8809364 0.2383277 0.8396924
Lag 10 0.8292220 0.0707092 0.7010028
Lag 50 0.7037832 -0.033926 0.1223127
> effectiveSize(x)
[,1] [,2] [,3]
33.45704 1465.66551 172.17784

This shows how much comparative improvement is brought by the value α = 3,

but also that even this quasi-optimal case is far from an i.i.d. setting. J

3.2 In practice
In practice, the above tools of ideal acceptance rate and of higher effective
sample size give goals for calibrating Metropolis–Hastings algorithms. This means
comparing a range of values of the parameters involved in the proposal and se-
lecting the value that achieves the highest target for the adopted goal. For a
multidimensional parameter, global maximisation run afoul of the curse of di-
mensionality as exploring a grid of possible values quickly becomes impossible.
8

The solution to this difficulty stands in running partial optimisations, with sim-
ulated (hence controlled) data, for instance setting all parameters but one fixed
to the values used for the simulated data. If this is not possible, optimisation of
the proposal parameters can be embedded in a Metropolis-within-Gibbs3 since
for each step several values of the corresponding parameter can be compared via
the Metropolis–Hastings acceptance probability.
We refer the reader to Chapter 8
of Robert and Casella (2009) for more
detailed descriptions of the calibra-
tion of MCMC algorithms, including
the use of adaptive mecchanisms. In-
deed, calibration is normaly operated
in a warm-up stage since, otherwise,
if one continuously tune an MCMC al-
gorithm according to its past outcome,
the algorithm stops being Markovian.
In order to preserve convergence in an
adaptive MCMC algorithm, the solu-
tion found in the literature for this
difficulty is to progressively tone/tune Fig 5. Output of a two-dimensional ran-
down the adaptive aspect. For in- dom walk Metropolis–Hastings algorithm for 123
stance, Roberts and Rosenthal (2009) observations from a Poisson distribution with
propose a diminishing adaptation con- mean 1, under the assumed model of a mixture
between Poisson and Geometric distributions.
dition that states that the distance be-
tween two consecutive Markov kernels must uniformly decrease to zero. For in-
stance, a random walk proposal that relies on the empirical variance of the past
sample as suggested in Haario et al. (1999) does satisfy this condition. An alter-
native proposed by Roberts and Rosenthal (2009) proceeds by tuning the scale
of a random walk for each component against the acceptance rate, which is the
solution implemented in the amcmc package developed by Rosenthal (2007).

4. ILLUSTRATION
Kamary et al. (2014) consider the special case of a mixture of a Poisson and
of a Geometric distributions with the same mean parameter λ:

αP(λ) + (1 − α)Geo(1/1+λ) ,

where λ > 0 and 0 ≤ α ≤ 1. Given n observations (x1 , . . . , xn ) and a prior

decomposed into π(λ) ∝ 1/λ and π(α) ∝ [α(1 − α)]a0 −1 , a0 = 0.5 being the
default value, the likelihood function is available in closed form as
n
λxi
Y
xi −xi −1
α exp{−λ} + (1 − α)λ (1 + λ)
xi !
i=1

where sn = x1 + . . . + xn . In R code, this translates as

3
The Metropolis-within-Gibbs algorithm aims at simulating a multidimensional distribution
by successively simulating from some of the associated conditional distributions—this is the
Gibbs part—and by using one Metropolis–Hastings step instead of an exact simulation scheme
from this conditional, with the same validation as the original Gibbs sampler.
9

likelihood=function(x,lam,alp){
prod(alp*dpois(x,lam)+(1-alp)*dgeom(x,lam/(1+lam)))}
posterior=function(x,lam,alp){
sum(log(alp*dpois(x,lam)+(1-alp)*dgeom(x,1/(1+lam))))-
log(lam)+dbeta(alp,.5,.5,log=TRUE)}

If we want to build a Metropolis–Hastings algorithm that simulates from the

associated posterior, the proposal can proceed by either proposing a joint move on
(α, λ) or moving one parameter at a time in a Metropolis-within-Gibbs fashion.
In the first case, we can imagine a random walk in two dimensions,

α0 ∼ E(α, (1 − α)) , λ0 ∼ LN (log(λ), δ(1 + log(λ)2 )) , ,δ > 0

with an acceptance probability

π(α0 , λ0 |x)q(α, λ|α0 , λ0 )
∧ 1.
π(α, λ|x)q(α, λ|α0 , λ0 )
The Metropolis–Hastings R code would then be

metropolis=function(x,lam,alp,eps=1,del=1){
prop=c(exp(rnorm(1,log(lam),sqrt(del*(1+log(lam)^2)))),
rbeta(1,1+eps*alp,1+eps*(1-alp)))
rat=posterior(x,prop[1],prop[2])-posterior(x,lam,alp)+
dbeta(alp,1+eps*prop[2],1+eps*(1-prop[2]),log=TRUE)-
dbeta(prop[2],1+eps*alp,1+eps*(1-alp),log=TRUE)+
dnorm(log(lam),log(prop[1]),
sqrt(del*(1+log(prop[1])^2)),log=TRUE)-
dnorm(log(prop[1]),log(lam),
sqrt(del*(1+log(lam)^2)),log=TRUE)+
log(prop[1]/lam)
if (log(runif(1))>rat) prop=c(lam,alp)
return(prop)}

where the ratio prop[1]/lam in the acceptance probability is just the Jacobian
for the log-normal transform. Running the following R code

T=1e4
x=rpois(123,lambda=1)
para=matrix(c(mean(x),runif(1)),nrow=2,ncol=T)
like=rep(0,T)
for (t in 2:T){
para[,t]=metropolis(x,para[1,t-1],para[2,t-1],eps=.1,del=.1)
like[t]=posterior(x,para[1,t],para[2,t])}

then produced the histograms of Figure 5, after toying with the values of and
δ to achieve a large enough average acceptance probability, which is provided by
length(unique(para[1,]))/T The second version of the Metropolis–Hastings
10

algorithm we can test is to separately modify λ by a random walk proposal, test

whether or not it is acceptable, and repeat with α: the R code is then very similar
to the above one:

metropolis=function(x,lam,alp,eps=1,del=1){
prop=exp(rnorm(1,log(lam),sqrt(del*(1+log(lam)^2))))
rat=posterior(x,prop,alp)-posterior(x,lam,alp)+
dnorm(log(lam),log(prop[1]),
sqrt(del*(1+log(prop[1])^2)),log=TRUE)-
dnorm(log(prop[1]),log(lam),
sqrt(del*(1+log(lam)^2)),log=TRUE)+
log(prop/lam)
if (log(runif(1))>rat) prop=lam
qrop=rbeta(1,1+eps*alp,1+eps*(1-alp))
rat=posterior(x,prop,qrop)-posterior(x,prop,alp)+
dbeta(alp,1+eps*qrop,1+eps*(1-qrop),log=TRUE)-
dbeta(qrop,1+eps*alp,1+eps*(1-alp),log=TRUE)
if (log(runif(1))>rat) qrop=alp
return(c(prop,qrop))}

In this special case, both algo-

rithms thus return mostly equivalent
outcomes, with a slightly more dis-
persed output in the case of the
Metropolis-within-Gibbs version (Fig-
ure 7). In a more general perspective,
calibrating random walks in multiple
dimensions may prove unwieldly, es-
pecially with large dimensions, while
the Metropolis-within-Gibbs remains
manageable. One drawback of the later
is common to all Gibbs implementa-
tions, namely that it induces higher
correlations between the components, Fig 6. Output of a Metropolis-within-Gibbs ran-
which means a slow convergence of the dom walk Metropolis–Hastings algorithm for 123
chain (and in extreme cases no conver- observations from a Poisson distribution with
mean 1, under the assumed model of a mixture
gence at all).
between Poisson and Geometric distributions.

5. EXTENSIONS
5.1 Langevin algorithms
An extension of the random walk Metropolis–Hastings algorithm is based on
the Langevin diffusion solving

dXt = 1/2∇ log π(Xt )dt + dBt ,

where Bt is the standard Brownian motion and ∇f denotes the gradient of f ,

since this diffusion has π as its stationary and limiting distribution. The algorithm
11

is based on a discretised version of the above, namely

Yn+1 |Xn ∼ N x + h/2∇ log π(x), h /2 Id ,
1

for a discretisation step h, which is used as a proposed value for Xn+1 , and ac-
cepted with the standard Metropolis–Hastings probability (Roberts and Tweedie,
1995). This new proposal took the name of Metropolis adjusted Langevin al-
gorithms (hence MALA). While computing (twice) the gradient of π at each
iteration requires extra time, there is strong support for doing so, as MALA algo-
rithms do provide noticeable speed-ups in convergence for most problems. Note
that π(·) only needs to be known up to a multiplicative constant because of the
log transform.
5.2 Particle MCMC
Another extension of the Metropolis–
Hastings algorithm is the particle
MCMC (or pMCMC), developed by
Andrieu et al. (2011). While we can-
not provide an introduction to par-
ticle filters here, see, e.g., Del Moral
et al. (2006), we want to point out
the appeal of this approach in state
space models like hidden Markov mod-
els (HMM). This innovation is similar
to the pseudo-marginal algorithm ap-
proach of Beaumont (2003); Andrieu
and Roberts (2009), taking advantage
of the auxiliary variables exploited by Fig 7. Output of a Metropolis-within-Gibbs
particle filters. (blue) and of a two-dimensional (gold) ran-
In the case of an HMM, i.e., where a dom walk Metropolis–Hastings algorithm for 123
observations from a Poisson distribution with
latent Markov chain x0:T with density
mean 1, under the assumed model of a mixture
p0 (x0 |θ)p1 (x1 |x0 , θ) · · · pT (xT |xT −1 , θ) , between Poisson and Geometric distributions.

is associated with an observed sequence y1+T such that

T
Y
y1+T |X1:T , θ ∼ qi (yi |xi , θ) ,
i=1

pMCMC applies as follows. At every iteration t, a value θ0 of the parameter

θ ∼ h(θ|θ(t) ) is proposed, followed by a new value of the latent series x00:T gen-
erated from a particle filter approximation of p(x0:T |θ0 , y1:T ). As the particle fil-
ter produces in addition (Del Moral et al., 2006) an unbiased estimator of the
marginal posterior of y1:T , q̂(y1:T |θ0 ), this estimator can be directly included in
the Metropolis–Hastings ratio
q̂(y1:T |θ0 )π(θ0 )h(θ(t) |θ0 )
∧ 1.
q̂(y1:T |θ)π(θ(t) )h(θ0 |θ(t) )
The validation of this substitution follows from the general argument of Andrieu
and Roberts (2009) for pseudo-marginal techniques, even though additional ar-
guments are required to establish that all random variables used therein are
12

accounted for (see Andrieu et al., 2011 and Wilkinson, 2011). We however stress
that the general validation of those algorithm as converging to the joint poste-
rior does not proceed from pseudo-marginal arguments. An extension of pMCMC
called SMC2 that approximates the sequential filtering distribution is proposed
in Chopin et al. (2013).
5.3 Pseudo-marginals
As illustrated by the previous section, there are many settings where com-
puting the target density π(·) is impossible. Another example is made of doubly
intractable likelihoods (Murray et al., 2006a), when the likelihood function con-
tains a term that is intractable, for instance `(θ|x) ∝ g(x|θ) with an intractable
normalising constant Z
Z(θ) = g(x|θ) dx .
X
This phenomenon is quite common in graphical models, as for instance for the
Ising model (Murray et al., 2006b; Møller et al., 2006). Solutions based on aux-
iliary variables have been proposed (see, e.g., Murray et al., 2006a; Møller et al.,
2006), but they may prove difficult to calibrate.
In such settings, Andrieu and Roberts (2009) developped an approach based
on an idea of Beaumont (2003), designing a valid Metropolis–Hastings algorithm
that substitutes the intractable target π(·|x) with an unbiased estimator. A slight
change to the Metropolis–Hastings acceptance ratio ensures that the stationary
density of the corresponding Markov chain is still equal to the target π. Indeed,
provided π̂(θ|z) is an unbiased estimator of π(θ) when z ∼ q(·|θ), it is rather
straightforward to check that the acceptance ratio

π̂(θ∗ |z ∗ ) q(θ∗ , θ)q(z|θ)

π̂(θ|z) q(θ, θ∗ )q(z ∗ |θ∗ )

preserves stationarity with respect to an extended target (see Andrieu and Roberts
(2009)) for details) when z ∗ ∼ q(·|θ), and θ∗ |θ ∼ q(θ, θ∗ ). Andrieu and Vihola
(2015) propose an alternative validation via auxiliary weights used in the un-
biased estimation, assuming the unbiased estimator (or the weight) is generated
conditional on the proposed value in the original Markov chain. The performances
of pseudo-marginal solutions depend on the quality of the estimators π̂ and are
always poorer than when using the exact target π. In particular, improvements
can be found by using multiple samples of z to estimate π (Andrieu and Vihola,
2015).

6. CONCLUSION AND NEW DIRECTIONS

The Metropolis–Hastings algorithm is to be understood as a default or off-the-
shelf solution, meaning that (a) it rarely achieves optimal rates of convergence
(Mengersen and Tweedie, 1996) and may get into convergence difficulties if im-
properly calibrated but (b) it can be combined with other solutions as a baseline
solution, offering further local or more rarely global exploration to a taylored
algorithm. Provided reversibility is preserved, it is indeed valid to mix several
MCMC algorithms together, for instance picking one of the kernels at random or
following a cycle (Tierney, 1994; Robert and Casella, 2004). Unless a proposal is
13

relatively expensive to compute or to implement, it rarely hurts to add an extra

kernel into the MCMC machinery.
This is not to state that the Metropolis–Hastings algorithm is the ultimate
solution to all simulation and stochastic evaluation problems. For one thing, there
exist settings where the intractability of the target is such that no practical
MCMC solution is available. For another thing, there exist non reversible versions
of those algorithms, like Hamiltonian (or hybrid) Monte Carlo (HMC) (Duane
et al., 1987; Neal, 2013; Betancourt et al., 2014). This method starts by creating
a completly artificial variable p, inspired by the momentum in physics, and a
joint distribution on (q, p) which energy—minus the log-density—is defined by
the Hamiltonian
H(q, p) = − log π(q) + pT M −1 p/2 ,
where M is the so-called mass matrix. The second part in the target is called the
kinetic energy, still by analogy with mechanics. When the joint vector (q, p) is
driven by Hamilton’s equations
dq ∂H ∂H
= = = M −1 p
dt ∂p ∂p
dp ∂H ∂ log π
=− =
dt ∂q ∂q

this dynamics preserves the joint distribution with density exp −H(p, q). If we
could simulate exactly from this joint distribution of (q, p), a sample from π(q)
would be a by-product. In practice, the equation is only solved approximately and
hence requires a Metropolis–Hastings correction. Its practical implementation is
called the leapfrog approximation (Neal, 2013; Girolami and Calderhead, 2011) as
it relies on a small discretisation step , updating p and q via a modified Euler’s
method called the leapfrog that is reversible and preserves volume as well. This
discretised update can be repeated for an arbitrary number of steps.
The appeal of HMC against other MCMC solutions is that the value of the
Hamiltonian changes very little during the Metropolis step, while possibly pro-
ducing a very different value of q. Intuitively, moving along level sets in the
augmented space is almost energy-free, but if those move proceeds far enough,
the Markov chain on q can reach distant regions, thus avoid the typical local
nature of regular MCMC algorithms. This strengthe explains in part why a sta-
tistical software like STAN (Stan Development Team, 2014) is mostly based on
HMC moves.
As a last direction for new MCMC solutions, let us point out the require-
ments set by Big Data, i.e., in settings where the likelihood function cannot be
cheaply evaluated for the entire dataset. See, e.g., Scott et al. (2013); Wang and
Dunson (2013), for recent entries on different parallel ways of handling massive
datasets, and Brockwell (2006); Strid (2010); Banterle et al. (2015) for delayed
and prefetching MCMC techniques that avoid considering the entire likelihood
at once.

REFERENCES
Andrieu, C., Doucet, A., and Holenstein, R. (2011). “Particle Markov chain Monte Carlo (with
discussion).” J. Royal Statist. Society Series B , 72 (2): 269–342.
14

Andrieu, C. and Roberts, G. (2009). “The pseudo-marginal approach for efficient Monte Carlo
computations.” Ann. Statist., 37(2): 697–725.
Andrieu, C. and Vihola, M. (2015). “Convergence properties of pseudo-marginal Markov chain
Monte Carlo algorithms.” Ann. Applied Probab., 25(2): 1030–1077.
Banterle, M., Grazian, C., Lee, A., and Robert, C. P. (2015). “Accelerating Metropolis-Hastings
algorithms by Delayed Acceptance.” ArXiv e-prints.
Barker, A. (1965). “Monte Carlo calculations of the radial distribution functions for a proton
electron plasma.” Aust. J. Physics, 18: 119–133.
Beaumont, M. (2003). “Estimation of population growth or decline in genetically monitored
populations.” Genetics, 164: 1139–1160.
Betancourt, M. J., Byrne, S., Livingstone, S., and Girolami, M. (2014). “The Geometric Foun-
dations of Hamiltonian Monte Carlo.” ArXiv e-prints.
Brockwell, A. (2006). “Parallel Markov chain Monte Carlo Simulation by Pre-Fetching.” J.
Comput. Graphical Stat., 15(1): 246–261.
Chopin, N., Jacob, P. E., and Papaspiliopoulos, O. (2013). “SMC2: an efficient algorithm for
sequential analysis of state space models.” J. Royal Statist. Society Series B , 75(3): 397–426.
Del Moral, P., Doucet, A., and Jasra, A. (2006). “Sequential Monte Carlo samplers.” J. Royal
Statist. Society Series B , 68(3): 411–436.
Duane, S., Kennedy, A. D., Pendleton, B. J., , and Roweth, D. (1987). “Hybrid Monte Carlo.”
Phys. Lett. B , 195: 216–222.
Eckhardt, R. (1987). “Stan Ulam, John Von Neumann, and the Monte Carlo Method.” Los
Alamos Science, Special Issue, 131–141.
Gelfand, A. and Smith, A. (1990). “Sampling based approaches to calculating marginal densi-
ties.” J. American Statist. Assoc., 85: 398–409.
Geman, S. and Geman, D. (1984). “Stochastic relaxation, Gibbs distributions and the Bayesian
restoration of images.” IEEE Trans. Pattern Anal. Mach. Intell., 6: 721–741.
Geweke, J. (1992). “Evaluating the accuracy of sampling-based approaches to the calculation
of posterior moments (with discussion).” In Bernardo, J., Berger, J., Dawid, A., and Smith,
A. (eds.), Bayesian Statistics 4 , 169–193. Oxford: Oxford University Press.
Girolami, M. and Calderhead, B. (2011). “Riemann manifold Langevin and Hamiltonian Monte
Carlo methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology),
73: 123–214.
Haario, H., Saksman, E., and Tamminen, J. (1999). “Adaptive Proposal Distribution for Random
Walk Metropolis Algorithm.” Computational Statistics, 14(3): 375–395.
Hammersley, J. and Handscomb, D. (1964). Monte Carlo Methods. New York: John Wiley.
Hastings, W. (1970). “Monte Carlo sampling methods using Markov chains and their applica-
tion.” Biometrika, 57: 97–109.
Heidelberger, P. and Welch, P. (1983). “A spectral method for confidence interval generation
and run length control in simulations.” Comm. Assoc. Comput. Machinery, 24: 233–245.
Hobert, J. and Robert, C. (2004). “A mixture representation of π with applications in Markov
chain Monte Carlo and perfect sampling.” Ann. Applied Prob., 14: 1295–1305.
Kamary, K., Mengersen, K., Robert, C., and Rousseau, J. (2014). “Testing hypotheses as a
mixture estimation model.” arxiv:1214.2044 .
Mengersen, K. and Tweedie, R. (1996). “Rates of convergence of the Hastings and Metropolis
algorithms.” Ann. Statist., 24: 101–121.
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., and Teller, E. (1953). “Equations
of state calculations by fast computing machines.” J. Chem. Phys., 21(6): 1087–1092.
Metropolis, N. and Ulam, S. (1949). “The Monte Carlo method.” J. American Statist. Assoc.,
44: 335–341.
Meyn, S. and Tweedie, R. (1994). “Computable bounds for convergence rates of Markov chains.”
Ann. Appl. Probab., 4: 981–1011.
Møller, J., Pettitt, A. N., Reeves, R., and Berthelsen, K. K. (2006). “An efficient Markov chain
Monte Carlo method for distributions with intractable normalising constants.” Biometrika,
93(2): 451–458.
Murray, I., Ghahramani, Z., , and MacKay, D. (2006a). “MCMC for doubly-intractable distri-
butions.” In Uncertainty in Artificial Intelligence. UAI-2006.
Murray, I., MacKay, D. J., Ghahramani, Z., and Skilling, J. (2006b). “Nested sampling for Potts
models.” In Weiss, Y., Schölkopf, B., and Platt, J. (eds.), Advances in Neural Information
Processing Systems 18 , 947–954. Cambridge, MA: MIT Press.
15

Neal, R. (2013). “MCMC using Hamiltonian dynamics.” In Brooks, S., Gelman, A., Jones,
G., and Meng, X.-L. (eds.), Handbook of Markov Chain Monte Carlo, 113–162. Chapman &
Hall/CRC Press.
Peskun, P. (1973). “Optimum Monte Carlo sampling using Markov chains.” Biometrika, 60:
607–612.
— (1981). “Guidelines for chosing the transition matrix in Monte Carlo methods using Markov
chains.” Journal of Computational Physics, 40: 327–344.
Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer-Verlag, New York,
second edition.
— (2009). Introducing Monte Carlo Methods with R. Springer-Verlag, New York.
— (2010). “A history of Markov Chain Monte Carlo-Subjective recollections from incomplete
data.” In Brooks, S., Gelman, A., Meng, X., and Jones, G. (eds.), Handbook of Markov Chain
Monte Carlo, 49–66. Chapman and Hall, New York. ArXiv0808.2902.
Roberts, G., Gelman, A., and Gilks, W. (1997). “Weak convergence and optimal scaling of
random walk Metropolis algorithms.” Ann. Applied Prob., 7: 110–120.
Roberts, G. and Rosenthal, J. (2009). “Examples of Adaptive MCMC.” J. Comp. Graph. Stat.,
18: 349–367.
Roberts, G. and Tweedie, R. (1995). “Exponential convergence for Langevin diffusions and their
discrete approximations.” Technical report, Statistics Laboratory, Univ. of Cambridge.
Rosenthal, J. (2007). “AMCM: An R interface for adaptive MCMC.” Comput. Statist. Data
Analysis, 51: 5467–5470.
Rubinstein, R. (1981). Simulation and the Monte Carlo Method . New York: John Wiley.
Scott, S., Blocker, A., Bonassi, F., Chipman, H., George, E., and McCulloch, R. (2013). “Bayes
and big data: The consensus Monte Carlo algorithm.” EFaBBayes 250 conference, 16.
Stan Development Team (2014). “STAN: A C++ Library for Probability and Sampling, Version
2.5.0, http://mc-stan.org/.”
Strid, I. (2010). “Efficient parallelisation of Metropolis–Hastings algorithms using a prefetching
approach.” Computational Statistics & Data Analysis, 54(11): 2814–2835.
Tanner, M. and Wong, W. (1987). “The calculation of posterior distributions by data augmen-
tation.” J. American Statist. Assoc., 82: 528–550.
Tierney, L. (1994). “Markov chains for exploring posterior distributions (with discussion).” Ann.
Statist., 22: 1701–1786.
Wang, X. and Dunson, D. (2013). “Parallizing MCMC via Weierstrass Sampler.” arXiv preprint
arXiv:1312.4605 .
Wilkinson, D. (2011). “The particle marginal Metropolis–Hastings (PMMH) particle MCMC
algorithm.” https://darrenjw.wordpress.com/2011/05/17/the-particle-marginal-metropolis-
hastings-pmmh-particle-mcmc-algorithm/. Darren Wilkinson’s research blog.

Revision Sheet of GCSE Trig Transformations
No ratings yet
Revision Sheet of GCSE Trig Transformations
3 pages
Cra I U Rosenthal Ann Rev
No ratings yet
Cra I U Rosenthal Ann Rev
40 pages
General State Space Markov Chains and MCMC Algorithms - Gareth O. Roberts, Jeffrey S. Rosenthal
No ratings yet
General State Space Markov Chains and MCMC Algorithms - Gareth O. Roberts, Jeffrey S. Rosenthal
64 pages
Bayesian Inference
No ratings yet
Bayesian Inference
28 pages
Chib Greenberg 1995
No ratings yet
Chib Greenberg 1995
12 pages
Markov Chain Monte Carlo and Gibbs Sampling
No ratings yet
Markov Chain Monte Carlo and Gibbs Sampling
24 pages
Chib UnderstandingMetropolisHastingsAlgorithm 1995
No ratings yet
Chib UnderstandingMetropolisHastingsAlgorithm 1995
10 pages
An Introduction To MCMC For Machine Learning
No ratings yet
An Introduction To MCMC For Machine Learning
39 pages
An Introduction To MCMC For Machine Learning: Abstract
No ratings yet
An Introduction To MCMC For Machine Learning: Abstract
39 pages
Markov Chain Monte Carlo
No ratings yet
Markov Chain Monte Carlo
13 pages
CSE291D Lecture 6: Monte Carlo Methods 2: Markov Chain Monte Carlo
No ratings yet
CSE291D Lecture 6: Monte Carlo Methods 2: Markov Chain Monte Carlo
66 pages
Annurev Statistics 022513 115540
No ratings yet
Annurev Statistics 022513 115540
26 pages
Hitchcock HistoryMetropolisHastingsAlgorithm 2003
No ratings yet
Hitchcock HistoryMetropolisHastingsAlgorithm 2003
5 pages
PDF Sampling: Markov Chain Monte Carlo: X N I I
No ratings yet
PDF Sampling: Markov Chain Monte Carlo: X N I I
13 pages
Bayesian Modelling Tuts-12-15
No ratings yet
Bayesian Modelling Tuts-12-15
4 pages
MCMC Final Edition
No ratings yet
MCMC Final Edition
17 pages
WIREs Computational Stats - 2018 - Robert - Accelerating MCMC Algorithms
No ratings yet
WIREs Computational Stats - 2018 - Robert - Accelerating MCMC Algorithms
14 pages
Hidden Markov
No ratings yet
Hidden Markov
42 pages
Mcmc-A Comparative Study
No ratings yet
Mcmc-A Comparative Study
29 pages
03 Markov Chain Monte Carlo
No ratings yet
03 Markov Chain Monte Carlo
4 pages
Lectures 6
No ratings yet
Lectures 6
17 pages
Markov Chain Monte Carlo Explained
100% (1)
Markov Chain Monte Carlo Explained
31 pages
MCMC Notes
No ratings yet
MCMC Notes
77 pages
Annurev Statistics 031219 041300
No ratings yet
Annurev Statistics 031219 041300
26 pages
MCMC
No ratings yet
MCMC
46 pages
Monte Carlo Simulation Technique
No ratings yet
Monte Carlo Simulation Technique
48 pages
Bayesian - Lec - 4
No ratings yet
Bayesian - Lec - 4
25 pages
CPSC 440: Advanced Machine Learning: Markov Chain Monte Carlo
No ratings yet
CPSC 440: Advanced Machine Learning: Markov Chain Monte Carlo
27 pages
MCMC Tips for Statisticians
No ratings yet
MCMC Tips for Statisticians
8 pages
5d MCMC
No ratings yet
5d MCMC
9 pages
Adaptive MCMC For Everyone
No ratings yet
Adaptive MCMC For Everyone
13 pages
This Content Downloaded From 47.39.198.204 On Wed, 06 Oct 2021 13:46:12 UTC
No ratings yet
This Content Downloaded From 47.39.198.204 On Wed, 06 Oct 2021 13:46:12 UTC
18 pages
MCMC
No ratings yet
MCMC
70 pages
Markov Chain Monte Carlo Guide
No ratings yet
Markov Chain Monte Carlo Guide
6 pages
Metropolis-Hastings Algorithm - Wikipedia
No ratings yet
Metropolis-Hastings Algorithm - Wikipedia
10 pages
Lecture 3
No ratings yet
Lecture 3
21 pages
Markov Chain Monte Carlo Sampling Using A Reservoir Method
No ratings yet
Markov Chain Monte Carlo Sampling Using A Reservoir Method
11 pages
Markov Chain Monte Carlo:: A Workhorse For Modern Scientific Computation
No ratings yet
Markov Chain Monte Carlo:: A Workhorse For Modern Scientific Computation
22 pages
Markov Chain Monte Carlo in R
No ratings yet
Markov Chain Monte Carlo in R
14 pages
Markov Chain Monte Carlo: A Method of Simulation
No ratings yet
Markov Chain Monte Carlo: A Method of Simulation
10 pages
MCMC Convergence Diagnostics Review
No ratings yet
MCMC Convergence Diagnostics Review
52 pages
1 s2.0 S0304414912002621 Main
No ratings yet
1 s2.0 S0304414912002621 Main
39 pages
Markov Chain Monte Carlo
No ratings yet
Markov Chain Monte Carlo
29 pages
Metropolis Hastings Explained
No ratings yet
Metropolis Hastings Explained
2 pages
Monte Carlo
No ratings yet
Monte Carlo
59 pages
ML - Unit-V-1
No ratings yet
ML - Unit-V-1
42 pages
CPSC 540: Machine Learning: Monte Carlo Methods
No ratings yet
CPSC 540: Machine Learning: Monte Carlo Methods
32 pages
AnthonyTrubiano WrittenAndOral
No ratings yet
AnthonyTrubiano WrittenAndOral
8 pages
On The Markov Chain Monte Carlo (MCMC) Method: Rajeeva L Karandikar
No ratings yet
On The Markov Chain Monte Carlo (MCMC) Method: Rajeeva L Karandikar
24 pages
Bayesian Nonlinear Regression Guide
No ratings yet
Bayesian Nonlinear Regression Guide
33 pages
MCMC
No ratings yet
MCMC
7 pages
Markov Chain Monte Carlo
No ratings yet
Markov Chain Monte Carlo
51 pages
3-MS2 (MCMC)
No ratings yet
3-MS2 (MCMC)
32 pages
Chapter 11 Stochastic Methods Rooted in Statistical Mechanics
No ratings yet
Chapter 11 Stochastic Methods Rooted in Statistical Mechanics
24 pages
Computation
No ratings yet
Computation
11 pages
MCMC
No ratings yet
MCMC
71 pages
Scalable Monte Carlo For Bayesian Learning: Paul Fearnhead, Christopher Nemeth, Chris J. Oates and Chris Sherlock
No ratings yet
Scalable Monte Carlo For Bayesian Learning: Paul Fearnhead, Christopher Nemeth, Chris J. Oates and Chris Sherlock
244 pages
Walter R. Gilks, Sylvia Richardson (Auth.), Walter R. Gilks, Sylvia Richardson, David J. Spiegelhalter (Eds.) - Markov Chain Monte Carlo in Practice-Springer US (1996)
No ratings yet
Walter R. Gilks, Sylvia Richardson (Auth.), Walter R. Gilks, Sylvia Richardson, David J. Spiegelhalter (Eds.) - Markov Chain Monte Carlo in Practice-Springer US (1996)
487 pages
Discrete-Time Sequences in MATLAB
0% (1)
Discrete-Time Sequences in MATLAB
4 pages
Control System Question Bank
No ratings yet
Control System Question Bank
3 pages
10 Linear PDE With CC L1
No ratings yet
10 Linear PDE With CC L1
11 pages
Continuation of Random Variable
No ratings yet
Continuation of Random Variable
11 pages
Igcse p2 Hot Topics June 2025 Set C
No ratings yet
Igcse p2 Hot Topics June 2025 Set C
22 pages
Answer Key - CK-12 Chapter 07 Middle School Math Concepts - Grade 7 (Revised PDF
No ratings yet
Answer Key - CK-12 Chapter 07 Middle School Math Concepts - Grade 7 (Revised PDF
14 pages
Optimization With R
100% (1)
Optimization With R
54 pages
Lathi 3ed-719
No ratings yet
Lathi 3ed-719
1 page
Relation and Function
No ratings yet
Relation and Function
6 pages
Critical Phenomenon
100% (2)
Critical Phenomenon
701 pages
Introduction To Tensors: Contravariant and Covariant Vectors
No ratings yet
Introduction To Tensors: Contravariant and Covariant Vectors
18 pages
Rational Inequalities Sample Problems: Lecture Notes
100% (1)
Rational Inequalities Sample Problems: Lecture Notes
6 pages
Lecture On Helix PDF
No ratings yet
Lecture On Helix PDF
5 pages
Fundamentals of Physics Sixth Edition: Halliday Resnick Walker
No ratings yet
Fundamentals of Physics Sixth Edition: Halliday Resnick Walker
5 pages
TD 6 Correction
No ratings yet
TD 6 Correction
9 pages
Calc1 Chapter 1
No ratings yet
Calc1 Chapter 1
104 pages
Deep Learning Assignment3 Solution
No ratings yet
Deep Learning Assignment3 Solution
9 pages
M53 Lec1.4 Continuity On An Interval IVT and Squeeze Theorem PDF
No ratings yet
M53 Lec1.4 Continuity On An Interval IVT and Squeeze Theorem PDF
142 pages
Appendixc
No ratings yet
Appendixc
144 pages
Functions and Relations: Relation
No ratings yet
Functions and Relations: Relation
4 pages
Whittaker E.T. - On An Expression of The Electromagnetic Field Due To Electrons by Means of Two Scalar Potential Functions
No ratings yet
Whittaker E.T. - On An Expression of The Electromagnetic Field Due To Electrons by Means of Two Scalar Potential Functions
6 pages
Maths II Tutoral 3
No ratings yet
Maths II Tutoral 3
2 pages
VTAMPS 11.0 Senior Secondary Set 1
No ratings yet
VTAMPS 11.0 Senior Secondary Set 1
11 pages
Efficient Power Flow via Newton's Method
No ratings yet
Efficient Power Flow via Newton's Method
12 pages
Class-Ix Mathematics Assignment-2 - 2 Polynomials
0% (1)
Class-Ix Mathematics Assignment-2 - 2 Polynomials
2 pages
The Feynman Path Integral Approach To Atomic Interferometry. A Tutorial
No ratings yet
The Feynman Path Integral Approach To Atomic Interferometry. A Tutorial
30 pages
2015 Regional Algebra II Exam Final
No ratings yet
2015 Regional Algebra II Exam Final
14 pages
Rutgers: 332:322 Principles of Communications Systems Spring 2004
No ratings yet
Rutgers: 332:322 Principles of Communications Systems Spring 2004
8 pages
Grade 9 Math Worksheeet
No ratings yet
Grade 9 Math Worksheeet
4 pages

The Metropolis-Hastings Algorithm: C.P. Robert

Uploaded by

The Metropolis-Hastings Algorithm: C.P. Robert

Uploaded by

The Metropolis–Hastings

Abstract. This article is a self-contained introduction to the Metropolis-

Hastings algorithm, this ubiquitous tool for producing dependent simula-

where dπ is a probability measure, may prove intractable, from the shape of

and Hastings (1970), proposes a

tionary with respect to π—meaning

π(x)—and that therefore converges −3 −2 −1 0 1 2 3

in distribution to π. While there x

that the existence of a stationary dis-

automatically enjoys a strong stability

can move all over the state space, i.e.,

can eventually reach any region of the

2.2 The algorithm

The Metropolis–Hastings algorithm 3.5 4.0 4.5

associated with a target density π re- x

Then, as shown in Metropolis et al.

eration that preserves π as the sta- x

Example 1. Our target density is a perturbed version of the normal N (0, 1)

and all we need is a starting value

Metropolis–Hastings algorithm is the

tribution, proceeding instead in a lo- Lag Lag Lag

cal if often myopic manner. To achieve

Example 2. (Example 1 continued) We can compare the Markov chains

This shows how much comparative improvement is brought by the value α = 3,

where λ > 0 and 0 ≤ α ≤ 1. Given n observations (x1 , . . . , xn ) and a prior

where sn = x1 + . . . + xn . In R code, this translates as

If we want to build a Metropolis–Hastings algorithm that simulates from the

α0 ∼ E(α, (1 − α)) , λ0 ∼ LN (log(λ), δ(1 + log(λ)2 )) , ,δ > 0

with an acceptance probability

algorithm we can test is to separately modify λ by a random walk proposal, test

In this special case, both algo-

dXt = 1/2∇ log π(Xt )dt + dBt ,

where Bt is the standard Brownian motion and ∇f denotes the gradient of f ,

is based on a discretised version of the above, namely

is associated with an observed sequence y1+T such that

pMCMC applies as follows. At every iteration t, a value θ0 of the parameter

π̂(θ∗ |z ∗ ) q(θ∗ , θ)q(z|θ)

6. CONCLUSION AND NEW DIRECTIONS

relatively expensive to compute or to implement, it rarely hurts to add an extra

You might also like

α0 ∼ E(α, (1 − α)) , λ0 ∼ LN (log(λ), δ(1 + log(λ)2 )) , ,δ > 0