MCMC for DSGE Models – Metropolis-Hastings Algorithm
Frank Schorfheide
University of Pennsylvania, CEPR, NBER
December 2012
Posterior Inference
• We discussed how to solve a DSGE model
• and how to compute the likelihood function p(Y|θ) for a DSGE model.
• Bayesian inference requires us to specify a prior p(θ) (more on that later).
• According to Bayes' Theorem,
  p(θ|Y) = \frac{p(Y|θ)\, p(θ)}{\int p(Y|θ)\, p(θ)\, dθ}.
• We want to generate draws from the posterior...
Random-Walk Metropolis (RWM) Algorithm for DSGE Model
1. Use a numerical optimization routine to maximize the log posterior, which up to a constant is given by ln p(Y|θ) + ln p(θ). Denote the posterior mode by θ̃.
2. Let Σ̃ be the inverse of the (negative) Hessian computed at the posterior mode θ̃, which can be computed numerically.
3. Draw θ^(0) from N(θ̃, c₀² Σ̃) or directly specify a starting value.
4. For s = 1, ..., n_sim:
   • Draw ϑ from the proposal distribution N(θ^(s−1), c² Σ̃).
   • Let
     r(θ^(s−1), ϑ | Y) = \frac{p(Y|ϑ)\, p(ϑ)}{p(Y|θ^(s−1))\, p(θ^(s−1))}.
   • Let
     θ^(s) = ϑ with probability min{1, r(θ^(s−1), ϑ | Y)}, and θ^(s) = θ^(s−1) otherwise.
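A minimal Python sketch of this loop (hypothetical names: `log_posterior` stands for ln p(Y|θ) + ln p(θ) up to a constant, `Sigma_tilde` comes from step 2, and c = 0.25 is only an illustrative scaling):

```python
import numpy as np

def rwm(log_posterior, theta0, Sigma_tilde, c=0.25, n_sim=100_000, seed=0):
    """Random-Walk Metropolis: log_posterior(theta) returns
    ln p(Y|theta) + ln p(theta) up to a constant."""
    rng = np.random.default_rng(seed)
    d = theta0.size
    chol = np.linalg.cholesky(c**2 * Sigma_tilde)   # proposal covariance factor
    draws = np.empty((n_sim, d))
    theta, logpost = theta0, log_posterior(theta0)
    n_accept = 0
    for s in range(n_sim):
        # Step 4a: draw vartheta ~ N(theta^(s-1), c^2 * Sigma_tilde)
        prop = theta + chol @ rng.standard_normal(d)
        logpost_prop = log_posterior(prop)
        # Steps 4b-4c: accept with probability min{1, r}; compare in logs
        if np.log(rng.uniform()) < logpost_prop - logpost:
            theta, logpost = prop, logpost_prop
            n_accept += 1
        draws[s] = theta
    return draws, n_accept / n_sim
```

Note that the normalizing constant of the posterior cancels in the ratio r, so only the posterior kernel is ever evaluated.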
Prior-Posterior Draws
• Generated 100,000 draws, discarded first 10,000 draws (burn-in)
• For the posterior, every 500th draw is plotted.
500 Posterior Draws
• First 500 draws after initial 10,000 burn-in draws.
• Posterior mean (Blue) and 90% credible set (Green) are based on
the 90,000 posterior draws.
50,000 Posterior Draws
• First 50,000 draws after initial 10,000 burn-in draws.
• Posterior mean (Blue) and 90% credible set (Green) are based on
the 90,000 posterior draws.
Recursive Means Based on 500 Draws
• Based on first 500 draws after initial 10,000 burn-in draws.
• Posterior mean (Blue) and 90% credible set (Green) are based on
the 90,000 posterior draws.
Recursive Means Based on 50,000 Draws
• Based on first 50,000 draws after initial 10,000 burn-in draws.
• Posterior mean (Blue) and 90% credible set (Green) are based on
the 90,000 posterior draws.
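The recursive means plotted here are just running averages of the chain; a one-function sketch (assuming `draws` is the (n_sim × d) array of post-burn-in draws from the sampler above):

```python
import numpy as np

def recursive_means(draws):
    """Running average: row s holds the mean of draws[:s+1] for each parameter."""
    n = draws.shape[0]
    return np.cumsum(draws, axis=0) / np.arange(1, n + 1)[:, None]
```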
Why Does This Algorithm Work (In Principle...)?
• Suppose the parameter vector θ is scalar and takes only two values: Θ = {0, 1}.
• The posterior distribution p(θ|Y) can be represented by a set of probabilities collected in the vector π, say π = [1/4, 3/4].
• Goal: we want to generate a sequence of draws {θ^(s)}_{s=1}^{n_sim} from a discrete Markov process such that P{θ^(s) = 0} −→ 1/4.
Background
• Consider a 2-state Markov process with transition probabilities
  P = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix},
  where p_{ij} is the probability of moving from state i to state j.
• Let π^s = [π_1^s, π_2^s] be the 1 × 2 vector of probabilities of being in state i in iteration s.
• The corresponding probabilities for period s + 1 are π^{s+1} = π^s P.
• Definition: π is an equilibrium distribution if π = πP.
Discrete MH Algorithm: Idea
• In our RWM Algorithm we generated draws from a Normal
distribution and they magically turned into draws from some
complicated posterior distribution...
• Idea of Metropolis algorithm: provide a general way of constructing
a transition matrix P to generate a chain with (pre-specified)
equilibrium distribution π, which is the posterior of interest.
• In our discrete example, we use as proposal distribution a Markov chain with transition matrix
  Q = \begin{pmatrix} λ & 1−λ \\ 1−λ & λ \end{pmatrix}.
Excursion: Equilibrium distribution of Q
• The equilibrium distribution has to satisfy
  [ω, 1−ω] = [ω, 1−ω] \begin{pmatrix} λ & 1−λ \\ 1−λ & λ \end{pmatrix}.
• Thus,
  ω = ωλ + (1 − ω)(1 − λ).
• This leads to
  ω = \frac{1 − λ}{2(1 − λ)} = \frac{1}{2}.
• Thus, the equilibrium distribution associated with Q does NOT
equal the targeted posterior distribution!
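A quick numerical confirmation (λ = 0.7 is an arbitrary example): the equilibrium distribution of Q, i.e. its left eigenvector for eigenvalue 1, is [1/2, 1/2] for any λ ∈ [0, 1), not the target [1/4, 3/4]:

```python
import numpy as np

lam = 0.7  # arbitrary choice of lambda
Q = np.array([[lam, 1 - lam],
              [1 - lam, lam]])

# Left eigenvectors of Q are right eigenvectors of Q.T
eigvals, eigvecs = np.linalg.eig(Q.T)
stat = eigvecs[:, np.argmax(eigvals.real)].real
stat /= stat.sum()                 # normalize to a probability vector
print(stat)                        # [0.5, 0.5] -- not [0.25, 0.75]
```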
Back to the Algorithm
• Iteration s: suppose that θ^(s−1) = θ_i. Based on the transition matrix
  Q = \begin{pmatrix} λ & 1−λ \\ 1−λ & λ \end{pmatrix},
  determine a proposed state ϑ (which is either 0 or 1 in our example).
• With probability α_ij the proposed state is accepted: set θ^(s) = ϑ.
• With probability 1 − α_ij stay in the old state and set θ^(s) = θ^(s−1).
• Choose α_ij = min[1, π_j/π_i].
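A short simulation sketch of this two-state chain (λ = 1/4 and the target π = [1/4, 3/4] follow the example):

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.25, 0.75])        # target posterior over states {0, 1}
lam, n_sim = 0.25, 100_000

state, visits = 0, np.zeros(2)
for s in range(n_sim):
    # Proposal from Q: stay with probability lam, switch otherwise
    prop = state if rng.uniform() < lam else 1 - state
    # Accept with probability alpha_ij = min(1, pi_j / pi_i)
    if rng.uniform() < min(1.0, pi[prop] / pi[state]):
        state = prop
    visits[state] += 1

print(visits / n_sim)              # approximately [0.25, 0.75]
```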
Discrete MH Algorithm: Implementation
• Why choose α_ij = min[1, π_j/π_i], where π = [1/4, 3/4] corresponds to the targeted posterior distribution?
• The resulting chain is reversible:
  π_i p_ij = π_i min[1, π_j/π_i] q_ij
           = min[π_i, π_j] q_ij
           = min[π_i, π_j] q_ji          (since Q is symmetric: q_ij = q_ji)
           = π_j min[1, π_i/π_j] q_ji
           = π_j p_ji.
• In turn, π = [1/4, 3/4] is an equilibrium distribution. Suppose that at iteration s, π^s = π. Then
  π_j^{s+1} = (π^s P)_j = \sum_{i=1}^m π_i^s p_ij = \sum_{i=1}^m π_j^s p_ji = π_j^s \sum_{i=1}^m p_ji = π_j^s,
  where the second-to-last step uses reversibility and the last step uses \sum_{i=1}^m p_ji = 1.
• For λ ∈ [0, 1) the chain is also irreducible and the equilibrium
distribution is unique.
Discrete MH Algorithm
• The chain's transition matrix is
  P = \begin{pmatrix} λ & 1−λ \\ \frac{1}{3}(1−λ) & λ + \frac{2}{3}(1−λ) \end{pmatrix}.
  You can verify that π = [1/4, 3/4] is indeed an equilibrium distribution (see the sketch after this list).
• The persistence of the chain is determined by the second-largest eigenvalue of P:
  ev(λ) = \frac{4}{3}λ − \frac{1}{3}.
• For λ = 1/4 the second eigenvalue is zero and the chain delivers iid
draws.
• A discrete Markov chain is irreducible if all states communicate with each other, that is, the expected time of traveling from state i to state j is finite.
• For λ = 1 the second eigenvalue is one, the chain is NOT irreducible
and the equilibrium distribution is not unique.
• These ideas can be generalized to the continuous case... (we’ll do so
tomorrow)
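A numerical check of the transition matrix, its equilibrium distribution, and the second eigenvalue (λ values chosen for illustration):

```python
import numpy as np

def P_matrix(lam):
    """Transition matrix of the discrete MH chain targeting pi = [1/4, 3/4]."""
    return np.array([[lam, 1 - lam],
                     [(1 - lam) / 3, lam + 2 * (1 - lam) / 3]])

pi = np.array([0.25, 0.75])
for lam in [0.25, 0.9, 1.0]:
    P = P_matrix(lam)
    second_ev = np.sort(np.linalg.eigvals(P).real)[0]   # equals 4/3*lam - 1/3
    print(lam, np.allclose(pi @ P, pi), round(second_ev, 4))
```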
MCMC: What works and what doesn’t
• State-space representation:
  y_t = [1 \; 1] s_t,  s_t = \begin{pmatrix} φ_1 & 0 \\ φ_3 & φ_2 \end{pmatrix} s_{t−1} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} ε_t.
• The state-space model can be re-written as an ARMA(2,1) process:
  (1 − φ_1 L)(1 − φ_2 L) y_t = (1 − (φ_2 − φ_3)L) ε_t.
• Relationship between state-space parameters φ and structural parameters θ:
  φ_1 = θ_1², φ_2 = (1 − θ_1²), φ_3 − φ_2 = −θ_1 θ_2.
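The ARMA(2,1) representation can be verified by simulation; a sketch with arbitrary parameter values (the same shocks are fed through both representations, with zero initial conditions):

```python
import numpy as np

rng = np.random.default_rng(0)
phi1, phi2, phi3 = 0.8, 0.2, 0.1   # arbitrary example values
T = 200
eps = rng.standard_normal(T)

# State-space form: s_t = F s_{t-1} + [1, 0]' eps_t,  y_t = [1, 1] s_t
F = np.array([[phi1, 0.0], [phi3, phi2]])
s, y_ss = np.zeros(2), np.zeros(T)
for t in range(T):
    s = F @ s + np.array([eps[t], 0.0])
    y_ss[t] = s.sum()

# ARMA(2,1) form: y_t = (phi1+phi2) y_{t-1} - phi1 phi2 y_{t-2}
#                       + eps_t - (phi2 - phi3) eps_{t-1}
y_arma = np.zeros(T)
for t in range(T):
    y_arma[t] = ((phi1 + phi2) * (y_arma[t-1] if t >= 1 else 0.0)
                 - phi1 * phi2 * (y_arma[t-2] if t >= 2 else 0.0)
                 + eps[t] - (phi2 - phi3) * (eps[t-1] if t >= 1 else 0.0))

print(np.allclose(y_ss, y_arma))   # True: identical sample paths
```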
Stylized Example
Model
Reduced form: (1 − φ_1 L)(1 − φ_2 L) y_t = (1 − (φ_2 − φ_3)L) ε_t.
Relationship of φ and θ: φ_1 = θ_1², φ_2 = (1 − θ_1²), φ_3 − φ_2 = −θ_1 θ_2.
• A local identification problem arises as θ_1 −→ 0.
• Global identification problem, p(Y|θ) = p(Y|θ̃):
  θ_1² = ρ, (1 − θ_1²) = θ_1 θ_2
  versus
  θ̃_1² = 1 − ρ, θ̃_1² = θ̃_1 θ̃_2.
  (In both cases the MA root cancels one of the AR roots, leaving the same AR(1) process with coefficient ρ; see the sketch below.)
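A sketch of the global identification problem for the example value ρ = 0.8: both parameterizations imply the same AR roots and the same MA coefficient, hence the same likelihood.

```python
import numpy as np

rho = 0.8                           # arbitrary example value

# Parameterization 1: theta1^2 = rho, (1 - theta1^2) = theta1 * theta2
th1 = np.sqrt(rho)
th2 = (1 - rho) / th1

# Parameterization 2: theta1~^2 = 1 - rho, theta1~^2 = theta1~ * theta2~
tt1 = np.sqrt(1 - rho)
tt2 = tt1

def reduced_form(t1, t2):
    """Map theta to the AR roots {phi1, phi2} and MA coefficient phi2 - phi3."""
    phi1, phi2 = t1**2, 1 - t1**2
    return sorted([phi1, phi2]), t1 * t2

print(reduced_form(th1, th2))       # ([0.2, 0.8], 0.2) up to floating point
print(reduced_form(tt1, tt2))       # same reduced form -- observationally equivalent
```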
Bayesian Inference
• Global identification problem: difficult to draw from multi-modal posteriors.
• Local identification problem: less problematic, because it is fairly straightforward to generate draws from the prior.
Bayesian Inference
• Bayesian inference with proper priors does not require identifiability
as a regularity condition.
• If θ = [θ_1′, θ_2′]′ and p(Y|θ) = p(Y|θ_1), then
  p(θ_1, θ_2|Y) = p(θ_1|Y) p(θ_2|θ_1)
  (see the derivation after this list).
• If you don’t like priors in identified models, you won’t like them in
partially/weakly identified models...
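A short derivation of the factorization above, filling in the step the slide leaves implicit:

```latex
\begin{align*}
p(\theta_1,\theta_2|Y)
  &= \frac{p(Y|\theta_1,\theta_2)\, p(\theta_1,\theta_2)}{p(Y)}
   = \frac{p(Y|\theta_1)\, p(\theta_1)}{p(Y)}\, p(\theta_2|\theta_1)
   = p(\theta_1|Y)\, p(\theta_2|\theta_1),
\end{align*}
% using p(Y) = \int p(Y|\theta_1)\, p(\theta_1)\, d\theta_1, which holds because
% the likelihood does not depend on \theta_2 and \int p(\theta_2|\theta_1)\, d\theta_2 = 1.
```

So the data update the marginal of θ_1, while beliefs about θ_2 change only through its prior dependence on θ_1.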
Blocking
• In high-dimensional parameter spaces the RWM algorithm generates
highly persistent Markov chains.
• What’s bad about persistence?
  \sqrt{n}(X̄ − E[X̄]) \Longrightarrow N\!\left(0, \; \frac{1}{n}\sum_{i=1}^n V[X_i] + \frac{1}{n}\sum_{i=1}^n \sum_{j \neq i} COV(X_i, X_j)\right).
• Potential remedy (see the sketch after this list):
  • Partition θ = [θ_1, ..., θ_K].
  • Iterate over the conditional posteriors p(θ_k | Y, θ_{−k}).
  • To reduce the persistence of the chain, try to find partitions such that parameters are strongly correlated within blocks and weakly correlated across blocks.
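A minimal Block-MH (Metropolis-within-Gibbs) sketch under stated assumptions: fixed blocks, independent random-walk proposals within each block, and the same hypothetical `log_posterior` as before:

```python
import numpy as np

def block_rwm(log_posterior, theta0, blocks, scales, n_sim=50_000, seed=0):
    """Cycle through parameter blocks, updating each block with a
    random-walk MH step while holding the remaining parameters fixed.
    blocks: list of index arrays partitioning theta; scales: per-block step sizes."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    logpost = log_posterior(theta)
    draws = np.empty((n_sim, theta.size))
    for s in range(n_sim):
        for idx, scale in zip(blocks, scales):
            prop = theta.copy()
            prop[idx] += scale * rng.standard_normal(len(idx))  # move this block only
            logpost_prop = log_posterior(prop)
            # Accept/reject targets the conditional posterior p(theta_k | Y, theta_-k)
            if np.log(rng.uniform()) < logpost_prop - logpost:
                theta, logpost = prop, logpost_prop
        draws[s] = theta
    return draws
```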
Blocking
• Chib and Ramamurthy (2010, JoE):
  • Use randomized partitions.
  • Use simulated annealing to find the mode of p(θ_k | Y, θ_{−k}). Then construct the Hessian to obtain a covariance matrix for the proposal density.
• Herbst (2011, Penn Dissertation):
  • Utilize analytical derivatives.
  • Use information in the Hessian (evaluated at an earlier parameter draw) to construct parameter blocks. For non-elliptical distributions, the partitions change as the sampler moves through the parameter space.
  • Use a Gauss-Newton step to construct proposal densities.
• Performance measure: CPU time to generate, say, 1,000 "independent" draws (see the sketch below).
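One way to make "independent" draws operational is the effective sample size; a sketch for a scalar chain (truncating the autocorrelation sum at the first non-positive value is one common convention among several):

```python
import numpy as np

def effective_sample_size(x):
    """ESS = n / (1 + 2 * sum_k rho_k), truncating the sum at the
    first non-positive autocorrelation rho_k."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    acov = np.correlate(xc, xc, mode="full")[n - 1:] / n   # autocovariances
    rho = acov / acov[0]                                   # autocorrelations
    tau = 1.0
    for k in range(1, n):
        if rho[k] <= 0:
            break
        tau += 2.0 * rho[k]
    return n / tau
```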