Note 6: Multivariate Time Series
In this section we extend our discussion to vector-valued time series. We will mostly be interested
in vector autoregressions (VARs), which are relatively easy to estimate in applications.
We first introduce the properties and basic tools for analyzing stationary VAR processes, and then
move on to estimation and inference for the VAR model.
where E(ε_{1t} ε_{2s}) = σ_{12} for t = s and zero for t ≠ s. We could rewrite it as
\[
\begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix}
= \begin{pmatrix} \phi_{11} & \phi_{12} \\ 0 & \phi_{21} \end{pmatrix}
\begin{pmatrix} x_{1,t-1} \\ x_{2,t-1} \end{pmatrix}
+ \begin{pmatrix} 0 & 0 \\ 0 & \phi_{22} \end{pmatrix}
\begin{pmatrix} x_{1,t-2} \\ x_{2,t-2} \end{pmatrix}
+ \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix},
\]
or just
\[
x_t = \Phi_1 x_{t-1} + \Phi_2 x_{t-2} + \epsilon_t, \tag{1}
\]
with E(ε_t) = 0, E(ε_t ε_s') = 0 for s ≠ t, and
\[
E(\epsilon_t \epsilon_t') = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}.
\]
As you can see, in this example, the vector-valued random variable xt follows a VAR(2) process.
A general VAR(p) process with white noise errors can be written as
\[
x_t = \Phi_1 x_{t-1} + \Phi_2 x_{t-2} + \cdots + \Phi_p x_{t-p} + \epsilon_t
    = \sum_{j=1}^{p} \Phi_j x_{t-j} + \epsilon_t,
\]
or, in lag-operator form, Φ(L)x_t = ε_t, where
\[
\Phi(L) = I_k - \Phi_1 L - \cdots - \Phi_p L^p.
\]
The error terms follow a vector white noise process, i.e., E(ε_t) = 0 and
\[
E(\epsilon_t \epsilon_s') = \begin{cases} \Omega & \text{for } t = s, \\ 0 & \text{otherwise.} \end{cases}
\]
Recall that for a scalar AR(p) process φ(L)x_t = ε_t, the process {x_t} is covariance-stationary as long as all the roots of
\[
1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0 \tag{2}
\]
lie outside the unit circle. Similarly, for the VAR(p) process to be stationary, all the roots of the equation
\[
|I_k - \Phi_1 z - \cdots - \Phi_p z^p| = 0
\]
must lie outside the unit circle.
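As a quick numerical illustration (a minimal NumPy sketch; the coefficient matrix below is made up for this example and is not part of the notes), for a bivariate VAR(1) the roots of |I − Φ_1 z| = 0 are the reciprocals of the eigenvalues of Φ_1, so the stationarity condition can be checked as follows:

```python
import numpy as np

# Illustrative (made-up) bivariate VAR(1) coefficient matrix.
Phi1 = np.array([[0.5, 0.2],
                 [0.3, 0.4]])

# For a VAR(1), the roots of |I - Phi1 z| = 0 are z = 1/lambda for each
# eigenvalue lambda of Phi1, so "all roots outside the unit circle" is the
# same as "all eigenvalues of Phi1 strictly inside the unit circle".
lam = np.linalg.eigvals(Phi1)
print(lam)                       # eigenvalues (0.7 and 0.2 for this choice)
print(np.all(np.abs(lam) < 1))   # True -> covariance-stationary
```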
When the process is stationary, it can be inverted to an MA(∞) representation x_t = Ψ(L)ε_t, where the Ψ_j matrices satisfy
\[
(I_k - \Phi_1 L - \Phi_2 L^2 - \cdots - \Phi_p L^p)(I_k + \Psi_1 L + \Psi_2 L^2 + \cdots) = I_k.
\]
Recall that a scalar AR(p) process x_t = φ_1 x_{t-1} + ⋯ + φ_p x_{t-p} + ε_t, where ε_t ∼ N(0, σ²), could equivalently be written as
\[
\begin{pmatrix} x_t \\ x_{t-1} \\ \vdots \\ x_{t-p+1} \end{pmatrix}
=
\begin{pmatrix}
\phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p \\
1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}
\begin{pmatrix} x_{t-1} \\ x_{t-2} \\ \vdots \\ x_{t-p} \end{pmatrix}
+
\begin{pmatrix} \epsilon_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
\]
If we let ξ_t = (x_t, x_{t-1}, ..., x_{t-p+1})', v_t = (ε_t, 0, ..., 0)', and let F
denote the coefficient matrix above, then we can write the process as
\[
\xi_t = F \xi_{t-1} + v_t.
\]
Similarly, for the VAR(p) process, let
\[
\xi_t = \begin{pmatrix} x_t \\ x_{t-1} \\ \vdots \\ x_{t-p+1} \end{pmatrix}, \qquad
F = \begin{pmatrix}
\Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\
I_k & 0 & \cdots & 0 & 0 \\
0 & I_k & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & I_k & 0
\end{pmatrix}, \qquad
v_t = \begin{pmatrix} \epsilon_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
\]
Then we can rewrite the VAR(p) process in state-space (companion) form,
\[
\xi_t = F \xi_{t-1} + v_t. \tag{3}
\]
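For a general VAR(p) the same stationarity check can be carried out through the companion matrix F. Below is a minimal NumPy sketch; the helper `build_companion` and the particular Φ_j matrices are illustrative assumptions, not something defined in the notes.

```python
import numpy as np

def build_companion(Phis):
    """Stack [Phi_1, ..., Phi_p] into the kp x kp companion matrix F of (3)."""
    p = len(Phis)
    k = Phis[0].shape[0]
    F = np.zeros((k * p, k * p))
    F[:k, :] = np.hstack(Phis)                  # first block row: [Phi_1 ... Phi_p]
    F[k:, :k * (p - 1)] = np.eye(k * (p - 1))   # identity blocks below
    return F

# Made-up VAR(2) coefficients for illustration.
Phi1 = np.array([[0.5, 0.1], [0.2, 0.3]])
Phi2 = np.array([[0.1, 0.0], [0.0, 0.1]])
F = build_companion([Phi1, Phi2])

# The VAR(p) is stationary iff all eigenvalues of F lie inside the unit circle.
print(np.all(np.abs(np.linalg.eigvals(F)) < 1))
```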
1.3 The autocovariance matrix
1.3.1 VAR process
For a covariance-stationary k-dimensional vector process {x_t} with E(x_t) = µ, the autocovariance is defined to be the following k × k matrix:
\[
\Gamma(h) = E[(x_t - \mu)(x_{t-h} - \mu)'].
\]
For simplicity, assume that µ = 0. Then we have Γ(h) = E(x_t x_{t-h}'). Because of lead-lag effects,
we may not have Γ(h) = Γ(−h), but we do have Γ(h)' = Γ(−h). To show this, take the transpose and use stationarity to shift the time index by h:
\[
\Gamma(h)' = E(x_{t-h} x_t') = E(x_t x_{t+h}') = \Gamma(-h).
\]
Similar to the scalar case, we define the autocovariance-generating function of the process x as
\[
G_x(z) = \sum_{h=-\infty}^{\infty} \Gamma(h) z^h.
\]
For the stacked vector ξ_t from the companion form (3), the variance is
\[
\Sigma = E(\xi_t \xi_t')
= E\!\left[ \begin{pmatrix} x_t \\ x_{t-1} \\ \vdots \\ x_{t-p+1} \end{pmatrix}
\begin{pmatrix} x_t' & x_{t-1}' & \cdots & x_{t-p+1}' \end{pmatrix} \right]
= \begin{pmatrix}
\Gamma(0) & \Gamma(1) & \cdots & \Gamma(p-1) \\
\Gamma(1)' & \Gamma(0) & \cdots & \Gamma(p-2) \\
\vdots & \vdots & \ddots & \vdots \\
\Gamma(p-1)' & \Gamma(p-2)' & \cdots & \Gamma(0)
\end{pmatrix}.
\]
From (3),
\[
E(\xi_t \xi_t') = E[(F\xi_{t-1} + v_t)(F\xi_{t-1} + v_t)'] = F\,E(\xi_{t-1}\xi_{t-1}')F' + E(v_t v_t'),
\]
or
\[
\Sigma = F \Sigma F' + Q, \tag{4}
\]
where Q = E(v_t v_t').
To solve for Σ, we need to use the Kronecker product, and the following result: let A, B, C be
matrices whose dimensions are such that the product ABC exists. Then
\[
\mathrm{vec}(ABC) = (C' \otimes A)\,\mathrm{vec}(B),
\]
where vec is the operator that stacks the columns of a k × k matrix into a k²-dimensional vector; for example,
\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad
\mathrm{vec}(A) = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{pmatrix}.
\]
Applying the vec operator to both sides of (4), we get
\[
\mathrm{vec}(\Sigma) = (F \otimes F)\,\mathrm{vec}(\Sigma) + \mathrm{vec}(Q),
\]
which gives
\[
\mathrm{vec}(\Sigma) = (I_m - F \otimes F)^{-1}\mathrm{vec}(Q),
\]
where m = k²p². We can use this equation to solve for the first p autocovariances of x,
Γ(0), ..., Γ(p−1). To derive the hth autocovariance of ξ, denoted by Σ(h), postmultiply
(3) by ξ_{t-h}' and take expectations; then
\[
\Sigma(h) = F\,\Sigma(h-1), \quad \text{or} \quad \Sigma(h) = F^h \Sigma.
\]
Therefore we have the following relationship for Γ(h): the first block row of Σ(h) = FΣ(h−1) gives
\[
\Gamma(h) = \Phi_1 \Gamma(h-1) + \Phi_2 \Gamma(h-2) + \cdots + \Phi_p \Gamma(h-p) \quad \text{for } h \ge p.
\]
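A minimal NumPy sketch of this computation (the Φ_1, Φ_2, and Ω used here are made up for illustration):

```python
import numpy as np

# Made-up VAR(2) for illustration.
Phi1  = np.array([[0.5, 0.1], [0.2, 0.3]])
Phi2  = np.array([[0.1, 0.0], [0.0, 0.1]])
Omega = np.array([[1.0, 0.3], [0.3, 1.0]])   # E(eps_t eps_t')
k, p = 2, 2

# Companion matrix F and state innovation variance Q (Omega in the top-left block).
F = np.zeros((k * p, k * p))
F[:k, :k], F[:k, k:] = Phi1, Phi2
F[k:, :k] = np.eye(k)
Q = np.zeros((k * p, k * p))
Q[:k, :k] = Omega

# vec(Sigma) = (I - F kron F)^{-1} vec(Q); vec stacks columns, i.e. order='F'.
m = (k * p) ** 2
vec_Sigma = np.linalg.solve(np.eye(m) - np.kron(F, F), Q.flatten(order="F"))
Sigma = vec_Sigma.reshape((k * p, k * p), order="F")

Gamma0 = Sigma[:k, :k]        # Gamma(0)
Gamma1 = Sigma[:k, k:]        # Gamma(1)
# Higher orders from Gamma(h) = Phi1 Gamma(h-1) + Phi2 Gamma(h-2):
Gamma2 = Phi1 @ Gamma1 + Phi2 @ Gamma0
print(Gamma0, Gamma1, Gamma2, sep="\n\n")
```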
As in the scalar case, any vector MA(q) process is stationary. Next consider the MA(∞) process
\[
x_t = \sum_{s=0}^{\infty} \Psi_s \epsilon_{t-s},
\]
where {ε_t} is vector white noise with variance Ω.
A sequence of matrices {Ψ_s}_{s=0}^{∞} is absolutely summable if each of its elements forms an absolutely
summable scalar sequence, i.e.,
\[
\sum_{s=0}^{\infty} |\psi_{ij}^{(s)}| < \infty \quad \text{for } i, j = 1, 2, \ldots, k,
\]
where ψ_{ij}^{(s)} is the row i, column j element (ijth for short) of Ψ_s. Some important results
about the MA(∞) process are summarized as follows:
(a) The autocovariance between the ith variable at time t and the jth variable s periods earlier,
E(x_{it} x_{j,t-s}), exists and is given by the ijth element of
\[
\Gamma(s) = \sum_{v=0}^{\infty} \Psi_{s+v}\,\Omega\,\Psi_v' \quad \text{for } s = 0, 1, 2, \ldots;
\]
(b) {Γ(h)}_{h=0}^{∞} is absolutely summable.
If {ε_t}_{t=-∞}^{∞} is i.i.d. with E|ε_{i1,t} ε_{i2,t} ε_{i3,t} ε_{i4,t}| < ∞ for i1, i2, i3, i4 = 1, 2, ..., k, then we also have
(c) E|x_{i1,t1} x_{i2,t2} x_{i3,t3} x_{i4,t4}| < ∞ for i1, i2, i3, i4 = 1, 2, ..., k and all t1, t2, t3, t4;
(d) n^{-1} Σ_{t=1}^{n} x_{it} x_{j,t-s} →_p E(x_{it} x_{j,t-s}) for i, j = 1, 2, ..., k and for all s.
All of these results can be viewed as extensions from the scalar case to the vector case; their proofs
can be found on pages 286-288 of Hamilton's book.
Consider now the sample mean x̄_n = n^{-1} Σ_{t=1}^{n} x_t of a zero-mean stationary vector process. Its variance is
\[
E(\bar{x}_n \bar{x}_n')
= \frac{1}{n^2} E[(x_1 + \cdots + x_n)(x_1 + \cdots + x_n)']
= \frac{1}{n^2} \sum_{i,j=1}^{n} E(x_i x_j')
= \frac{1}{n^2} \sum_{h=-(n-1)}^{n-1} (n - |h|)\,\Gamma(h)
= \frac{1}{n} \sum_{h=-(n-1)}^{n-1} \left(1 - \frac{|h|}{n}\right) \Gamma(h).
\]
Then
\[
n\,E(\bar{x}_n \bar{x}_n')
= \sum_{h=-(n-1)}^{n-1} \left(1 - \frac{|h|}{n}\right) \Gamma(h)
= \Gamma(0) + \left(1 - \frac{1}{n}\right)(\Gamma(1) + \Gamma(-1)) + \left(1 - \frac{2}{n}\right)(\Gamma(2) + \Gamma(-2)) + \cdots
\longrightarrow \sum_{h=-\infty}^{\infty} \Gamma(h)
\]
as n → ∞, provided {Γ(h)} is absolutely summable.
This is very similar to what we did in the scalar case. We then have the following proposition:
Proposition 2 Let x_t be a stationary process with E(x_t) = 0 and E(x_t x_{t-h}') = Γ(h),
where {Γ(h)} is absolutely summable. Then the sample mean satisfies
(a) x̄_n →_p 0;
(b) lim_{n→∞} [n E(x̄_n x̄_n')] = Σ_{h=-∞}^{∞} Γ(h).
Let S denote the limit in (b), S = lim_{n→∞} n E(x̄_n x̄_n'). If the data are generated by an MA(q) process,
then result (b) implies that
\[
S = \sum_{h=-q}^{q} \Gamma(h).
\]
This suggests estimating S by
\[
\hat{S} = \hat{\Gamma}(0) + \sum_{h=1}^{q} \left( \hat{\Gamma}(h) + \hat{\Gamma}(h)' \right), \tag{5}
\]
where
\[
\hat{\Gamma}(h) = \frac{1}{n} \sum_{t=h+1}^{n} (x_t - \bar{x}_n)(x_{t-h} - \bar{x}_n)'.
\]
Ŝ defined in (5) provides a consistent estimator for a large class of stationary processes. Even
when the process has time-varying second moments, Ŝ remains consistent as long as
\[
\frac{1}{n} \sum_{t=h+1}^{n} (x_t - \bar{x}_n)(x_{t-h} - \bar{x}_n)'
\]
converges in probability to
\[
\frac{1}{n} \sum_{t=h+1}^{n} E(x_t x_{t-h}').
\]
A weighted version of Ŝ (the Newey–West estimator) is positive semidefinite by construction and has the same
consistency properties as Ŝ when q, n → ∞ with q/n^{1/4} → 0.
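A minimal NumPy sketch of these estimators (the function names and the simulated data are assumptions made for illustration; the weights 1 − h/(q+1) are the Newey–West downweighting mentioned above):

```python
import numpy as np

def sample_autocov(x, h):
    """Gamma_hat(h) = (1/n) sum_{t=h+1}^n (x_t - xbar)(x_{t-h} - xbar)'."""
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    return xc[h:].T @ xc[:n - h] / n

def long_run_variance(x, q, newey_west=True):
    """Estimate S = sum_h Gamma(h); with newey_west=True the autocovariances are
    downweighted by (1 - h/(q+1)), which keeps the estimate positive semidefinite."""
    S = sample_autocov(x, 0)
    for h in range(1, q + 1):
        w = 1 - h / (q + 1) if newey_west else 1.0
        G = sample_autocov(x, h)
        S += w * (G + G.T)
    return S

# Illustration with simulated white noise (true S is the identity matrix).
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 2))
print(long_run_variance(x, q=4))
```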
1.5.2 Orthogonalization and model specification
In economic modeling, we compute impulse-response dynamics because we are interested in how economic
variables respond to particular sources of shocks. If the shocks are correlated, it is hard
to identify the response to any one shock. From this point of view, we may want to choose
Q to make u_t = Qε_t orthonormal, that is, uncorrelated with each other and with unit variance, i.e.,
E(u_t u_t') = I. To do so, we need a Q such that
\[
Q^{-1}\,(Q^{-1})' = \Omega,
\]
so that E(u_t u_t') = E(Qε_t ε_t' Q') = QΩQ' = I_k. We can therefore use the Cholesky decomposition to find Q.
However, Q is still not unique, since other valid Qs can be formed by multiplying by an orthogonal matrix.
Sims (1980) proposes specifying the model by choosing a particular leading term A_0 in the
coefficients. In (6), we saw that Ψ_0 = I_k. In (7), however, A_0 = Q^{-1} cannot be the identity
matrix unless Ω is diagonal. In our example, we choose the Q that makes A_0 = Q^{-1}
a lower triangular matrix. That means that after this transformation the shock u_{2t} has no instantaneous effect on x_{1t}.
Conveniently, the Cholesky decomposition itself produces a triangular matrix.
Consider, for example, a bivariate VAR(1) process x_t = Φ_1 x_{t-1} + ε_t with
\[
\Phi_1 = \begin{pmatrix} 0.5 & 0.2 \\ 0.3 & 0.4 \end{pmatrix}, \qquad
\Omega = E(\epsilon_t \epsilon_t') = \begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix}.
\]
First we verify that this process is stationary:
\[
\left| \lambda I_2 - \begin{pmatrix} 0.5 & 0.2 \\ 0.3 & 0.4 \end{pmatrix} \right| = 0
\]
gives λ_1 = 0.7 and λ_2 = 0.2, and both eigenvalues lie inside the unit circle. Invert it to a moving-average
process,
\[
x_t = \Psi(L)\epsilon_t.
\]
We know that Ψ_0 = I_2, Ψ_1 = Φ_1, etc. Then we find Q from the Cholesky decomposition of Ω, which
gives
\[
Q = \begin{pmatrix} 0.70 & 0 \\ -0.27 & 0.53 \end{pmatrix}, \qquad
Q^{-1} = \begin{pmatrix} 1.41 & 0 \\ 0.70 & 1.87 \end{pmatrix}.
\]
Then we can write
\[
x_t = \Psi(L) Q^{-1} Q \epsilon_t = \Psi(L) Q^{-1} u_t,
\]
where we define u_t = Qε_t. Written out, we have
\[
\begin{pmatrix} x_{1t} \\ x_{2t} \end{pmatrix}
= \begin{pmatrix} 1.41 & 0 \\ 0.70 & 1.87 \end{pmatrix}
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix}
+ \begin{pmatrix} 0.85 & 0.37 \\ 0.70 & 0.75 \end{pmatrix}
\begin{pmatrix} u_{1,t-1} \\ u_{2,t-1} \end{pmatrix} + \cdots
\]
In this example you see that we have found a unique MA representation that is a linear combination of
uncorrelated errors (E(u_t u_t') = I_2), in which the second source of shocks has no instantaneous
effect on x_{1t}. We can then use this representation to compute the impulse responses.
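A minimal NumPy sketch that reproduces these numbers (up to rounding):

```python
import numpy as np

Phi1  = np.array([[0.5, 0.2], [0.3, 0.4]])
Omega = np.array([[2.0, 1.0], [1.0, 4.0]])

# Cholesky: Omega = P P' with P lower triangular; take Q = P^{-1}, so that
# Q Omega Q' = I and u_t = Q eps_t has identity variance.
P = np.linalg.cholesky(Omega)    # this is Q^{-1}
Q = np.linalg.inv(P)
print(np.round(P, 2))            # Q^{-1}, compare with the matrix above
print(np.round(Q, 2))            # Q

# Orthogonalized MA coefficients Psi_j Q^{-1}, where Psi_j = Phi1^j (matrix power)
# for a VAR(1).
A0 = P                           # Psi_0 Q^{-1} = Q^{-1}
A1 = Phi1 @ P                    # Psi_1 Q^{-1}, compare with the second matrix above
print(np.round(A1, 2))
```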
There are also other ways to specify the representation, depending on the problem of interest.
For example, Quah (1988) suggests finding a Q such that the long-run response of one variable to
another variable's shock is zero.
Consider now estimation of the VAR(p) with a constant term,
\[
y_t = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p} + \epsilon_t,
\]
where for now the innovations are assumed to be Gaussian, ε_t ∼ i.i.d. N(0, Ω). Collect the regressors and coefficients as x_t' = [1, y_{t-1}', ..., y_{t-p}'] and Π' = [c, Φ_1, ..., Φ_p], so that the model is y_t = Π'x_t + ε_t.
The log likelihood function of the observations (y_1, ..., y_n) is (constant omitted)
\[
l(y, x; \theta) = \frac{n}{2}\log|\Omega^{-1}| - \frac{1}{2}\sum_{t=1}^{n} (y_t - \Pi' x_t)'\,\Omega^{-1}\,(y_t - \Pi' x_t). \tag{8}
\]
Maximizing (8) with respect to Π gives, for the jth equation,
\[
\hat{\pi}_{j,n} = \left[\sum_{t=1}^{n} x_t x_t'\right]^{-1}\left[\sum_{t=1}^{n} x_t y_{jt}\right],
\]
which is the estimated coefficient vector from an OLS regression of y_jt on x_t. So the MLE
estimates of the coefficients for the jth equation of a VAR are found by an OLS regression of y_jt
on a constant term and p lags of all of the variables in the system.
The MLE estimate of Ω is
\[
\hat{\Omega}_n = \frac{1}{n} \sum_{t=1}^{n} \hat{\epsilon}_t \hat{\epsilon}_t',
\]
where
\[
\hat{\epsilon}_t = y_t - \hat{\Pi}_n' x_t.
\]
The details of the derivations can be found on pages 292-296 of Hamilton's book. The MLE
estimates Π̂ and Ω̂ are consistent even if the true innovations are non-Gaussian. In the next
subsection, we consider the regression with non-Gaussian errors, and we use the LS approach
to derive the asymptotics.
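As an illustration of equation-by-equation OLS, here is a minimal NumPy sketch (the helper `fit_var` and the simulated data are assumptions made for this example, not part of the notes):

```python
import numpy as np

def fit_var(y, p):
    """Equation-by-equation OLS for a VAR(p) with a constant term.
    y is an (n, k) array; returns Pi_hat ((kp+1) x k) and Omega_hat (k x k)."""
    n, k = y.shape
    # x_t' = [1, y_{t-1}', ..., y_{t-p}'] for t = p+1, ..., n
    X = np.hstack([np.ones((n - p, 1))] +
                  [y[p - j:n - j] for j in range(1, p + 1)])
    Y = y[p:]
    Pi_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # OLS of each y_jt on x_t
    resid = Y - X @ Pi_hat
    Omega_hat = resid.T @ resid / (n - p)
    return Pi_hat, Omega_hat

# Simulate a made-up bivariate VAR(1) and estimate it.
rng = np.random.default_rng(1)
Phi1 = np.array([[0.5, 0.2], [0.3, 0.4]])
eps = rng.multivariate_normal([0, 0], [[2, 1], [1, 4]], size=1000)
y = np.zeros((1000, 2))
for t in range(1, 1000):
    y[t] = Phi1 @ y[t - 1] + eps[t]

Pi_hat, Omega_hat = fit_var(y, p=1)
print(Pi_hat)       # first row ~ c', next k rows ~ Phi1' (since y_t' = x_t' Pi)
print(Omega_hat)    # ~ [[2, 1], [1, 4]]
```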
Proposition 3 Let y_t follow the VAR(p) above, where ε_t is i.i.d. (0, Ω) with E(ε_{it} ε_{jt} ε_{lt} ε_{mt}) < ∞ for all i, j, l, and m, and where the roots of
\[
|I_k - \Phi_1 z - \cdots - \Phi_p z^p| = 0
\]
lie outside the unit circle. Let
\[
x_t' = [\,1 \;\; y_{t-1}' \;\; y_{t-2}' \;\; \cdots \;\; y_{t-p}'\,],
\]
so x_t is an m-dimensional vector with m = kp + 1. Let π̂_n = vec(Π̂_n) denote the km × 1 vector of coefficients resulting
from OLS regressions of each of the elements of y_t on x_t for a sample of size n,
where
\[
\hat{\pi}_{i,n} = \left[ \sum_{t=1}^{n} x_t x_t' \right]^{-1} \left[ \sum_{t=1}^{n} x_t y_{it} \right],
\]
and let π denote the km × 1 vector of true parameters. Finally, let
\[
\hat{\Omega}_n = n^{-1} \sum_{t=1}^{n} \hat{\epsilon}_t \hat{\epsilon}_t',
\]
where ε̂_t = y_t − Π̂_n' x_t.
Then
(a) n^{-1} Σ_{t=1}^{n} x_t x_t' →_p Q, where Q = E(x_t x_t');
(b) π̂_n →_p π;
(c) Ω̂_n →_p Ω;
(d) √n(π̂_n − π) →_d N(0, Ω ⊗ Q^{-1}).
Result (a) is a vector version of the result that sample second moments converge to population moments;
it follows from the facts that the MA coefficient matrices are absolutely summable and that the process has finite fourth moments. Results
(b) and (c) are similar to the derivations for a single OLS regression in case 3 of lecture 5. To show
result (d), let
\[
Q_n = n^{-1} \sum_{t=1}^{n} x_t x_t',
\]
and
\[
\sqrt{n}(\hat{\pi}_n - \pi) =
\begin{pmatrix}
Q_n^{-1}\, n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{1t} \\
Q_n^{-1}\, n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{2t} \\
\vdots \\
Q_n^{-1}\, n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{kt}
\end{pmatrix}. \tag{9}
\]
Define ξ_t to be the km × 1 vector
\[
\xi_t = \begin{pmatrix} x_t \epsilon_{1t} \\ x_t \epsilon_{2t} \\ \vdots \\ x_t \epsilon_{kt} \end{pmatrix}.
\]
Note that ξ_t is an mds (martingale difference sequence) with finite fourth moments and variance
\[
E(\xi_t \xi_t') =
\begin{pmatrix}
E(\epsilon_{1t}^2) & E(\epsilon_{1t}\epsilon_{2t}) & \cdots & E(\epsilon_{1t}\epsilon_{kt}) \\
E(\epsilon_{2t}\epsilon_{1t}) & E(\epsilon_{2t}^2) & \cdots & E(\epsilon_{2t}\epsilon_{kt}) \\
\vdots & \vdots & \ddots & \vdots \\
E(\epsilon_{kt}\epsilon_{1t}) & E(\epsilon_{kt}\epsilon_{2t}) & \cdots & E(\epsilon_{kt}^2)
\end{pmatrix} \otimes E(x_t x_t')
= \Omega \otimes Q.
\]
From (10) we know that this has a limiting distribution that is Gaussian with mean 0 and variance
\[
(I_k \otimes Q^{-1})(\Omega \otimes Q)(I_k \otimes Q^{-1}) = (I_k \Omega I_k) \otimes (Q^{-1} Q Q^{-1}) = \Omega \otimes Q^{-1}.
\]
Hence we obtain result (d). Each π̂_i has the limiting distribution
\[
\sqrt{n}(\hat{\pi}_{i,n} - \pi_i) \rightarrow_d N(0, \sigma_i^2 Q^{-1}).
\]
Given that the estimators are asymptotically normal, we can use them to test linear or nonlinear
restrictions on the coefficients with Wald statistics.
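For instance, a Wald test of the linear restriction H_0: Rπ = r can be sketched as follows (the function and the toy numbers are assumptions for illustration; in practice the variance estimate would be (Ω̂ ⊗ Q_n^{-1})/n):

```python
import numpy as np

def wald_stat(pi_hat, pi_avar, R, r):
    """Wald statistic for H0: R pi = r, where pi_avar estimates Var(pi_hat),
    e.g. np.kron(Omega_hat, np.linalg.inv(Qn)) / n.  Chi-square with rows(R) dof."""
    diff = R @ pi_hat - r
    V = R @ pi_avar @ R.T
    return float(diff @ np.linalg.solve(V, diff))

# Toy numbers: test whether the 2nd and 3rd coefficients are jointly zero.
pi_hat  = np.array([0.8, 0.05, -0.02, 0.3])
pi_avar = 0.01 * np.eye(4)
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
r = np.zeros(2)
print(wald_stat(pi_hat, pi_avar, R, r))
```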
We know that vec is an operator that stacks the columns of a k × k matrix into one k² × 1 vector.
A similar operator, vech, stacks only the elements on or below the principal diagonal (so it transforms a
k × k matrix into a k(k+1)/2 × 1 vector). For example,
\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad
\mathrm{vech}(A) = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{22} \end{pmatrix}.
\]
We will apply this operator to the variance matrix, which is symmetric. The joint distribution
of π̂_n and Ω̂_n is given in the following proposition.
Proposition 4 Suppose the data are generated by the VAR(p) above with ε_t i.i.d. Gaussian, ε_t ∼ N(0, Ω), and suppose the roots of
\[
|I_k - \Phi_1 z - \cdots - \Phi_p z^p| = 0
\]
lie outside the unit circle. Let π̂_n, Ω̂_n, and Q be as defined in Proposition 3. Then
\[
\begin{pmatrix} \sqrt{n}(\hat{\pi}_n - \pi) \\ \sqrt{n}\,[\mathrm{vech}(\hat{\Omega}_n) - \mathrm{vech}(\Omega)] \end{pmatrix}
\rightarrow_d N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \Omega \otimes Q^{-1} & 0 \\ 0 & \Sigma_{22} \end{pmatrix} \right).
\]
Let σ_ij denote the ijth element of Ω; then the element of Σ_22 corresponding to the covariance between
σ̂_ij and σ̂_lm is given by (σ_il σ_jm + σ_im σ_jl) for all i, j, l, m = 1, ..., k.
The detailed proof can be found on pages 341-342 of Hamilton's book. Basically there are three
steps. First, we show that Ω̂_n = n^{-1} Σ_{t=1}^{n} ε̂_t ε̂_t' has the same asymptotic distribution as Ω̂*_n =
n^{-1} Σ_{t=1}^{n} ε_t ε_t'. In the second step, write
\[
\sqrt{n}\,[\mathrm{vech}(\hat{\Omega}_n^{*}) - \mathrm{vech}(\Omega)] = n^{-1/2} \sum_{t=1}^{n} \lambda_t,
\]
where
\[
\lambda_t = \mathrm{vech}
\begin{pmatrix}
\epsilon_{1t}^2 - \sigma_{11} & \cdots & \epsilon_{1t}\epsilon_{kt} - \sigma_{1k} \\
\vdots & \ddots & \vdots \\
\epsilon_{kt}\epsilon_{1t} - \sigma_{k1} & \cdots & \epsilon_{kt}^2 - \sigma_{kk}
\end{pmatrix}.
\]
Now (ξ_t', λ_t')' is an mds, and we apply the CLT for mds to get (with a few more computations)
\[
\begin{pmatrix} n^{-1/2} \sum_{t=1}^{n} \xi_t \\ n^{-1/2} \sum_{t=1}^{n} \lambda_t \end{pmatrix}
\rightarrow_d N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \Omega \otimes Q & 0 \\ 0 & \Sigma_{22} \end{pmatrix} \right).
\]
The final step of the proof is to show that E(λ_t λ_t') equals the matrix Σ_22 described in the
proposition, which can be proved with a constructed error sequence that is uncorrelated Gaussian
with zero mean and unit variance (see Hamilton's book for details).
With the asymptotic variance of Ω̂_n, we can then test whether two errors are correlated. For example,
for k = 2,
\[
\sqrt{n} \begin{pmatrix} \hat{\sigma}_{11,n} - \sigma_{11} \\ \hat{\sigma}_{12,n} - \sigma_{12} \\ \hat{\sigma}_{22,n} - \sigma_{22} \end{pmatrix}
\rightarrow_d N\!\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix}
2\sigma_{11}^2 & 2\sigma_{11}\sigma_{12} & 2\sigma_{12}^2 \\
2\sigma_{11}\sigma_{12} & \sigma_{11}\sigma_{22} + \sigma_{12}^2 & 2\sigma_{12}\sigma_{22} \\
2\sigma_{12}^2 & 2\sigma_{12}\sigma_{22} & 2\sigma_{22}^2
\end{pmatrix} \right).
\]
Then a Wald test of the null hypothesis that there is no covariance between ε_{1t} and ε_{2t} is given
by
\[
\frac{\sqrt{n}\,\hat{\sigma}_{12}}{(\hat{\sigma}_{11}\hat{\sigma}_{22} + \hat{\sigma}_{12}^2)^{1/2}} \approx N(0, 1).
\]
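A minimal NumPy sketch of this test (the residual matrix is simulated here just for illustration):

```python
import numpy as np

def test_no_covariance(resid):
    """sqrt(n) * s12 / sqrt(s11*s22 + s12^2), approximately N(0,1) under
    H0: sigma_12 = 0, for the residuals of a two-variable VAR."""
    n = resid.shape[0]
    S = resid.T @ resid / n                    # Omega_hat
    s11, s12, s22 = S[0, 0], S[0, 1], S[1, 1]
    return np.sqrt(n) * s12 / np.sqrt(s11 * s22 + s12 ** 2)

# With uncorrelated simulated errors the statistic should behave like a N(0,1) draw.
rng = np.random.default_rng(2)
print(test_no_covariance(rng.standard_normal((1000, 2))))
```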
The matrix Σ_22 can be expressed more compactly using the duplication matrix. The duplication
matrix D_k is a k² × k(k+1)/2 matrix that transforms vech(Ω) into vec(Ω), i.e.,
\[
D_k\,\mathrm{vech}(\Omega) = \mathrm{vec}(\Omega).
\]
For example,
\[
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{22} \end{pmatrix}
= \begin{pmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{12} \\ \sigma_{22} \end{pmatrix}.
\]
Define
\[
D_k^{+} \equiv (D_k' D_k)^{-1} D_k'.
\]
Note that D_k^+ D_k = I_{k(k+1)/2}. D_k^+ is like the 'reverse' of D_k, as it transforms vec(Ω) into vech(Ω):
\[
\mathrm{vech}(\Omega) = D_k^{+}\,\mathrm{vec}(\Omega).
\]
With D_k and D_k^+ we can write
\[
\Sigma_{22} = 2 D_k^{+} (\Omega \otimes \Omega)(D_k^{+})'.
\]
3 Granger Causality
In most regressions in econometrics, it is very hard to discuss causality. For instance, the significance
of the coefficient β in the regression y_i = βx_i + ε_i only tells us about the 'co-occurrence' of x and y,
not that x causes y. In other words, the regression usually only tells us that there is some 'relationship'
between x and y, and does not tell us the nature of the relationship, such as whether x causes y or y causes x.
One good thing about time-series vector autoregressions is that we can test 'causality' in a certain
sense. This test was first proposed by Granger (1969), and we therefore refer to it as Granger causality.
We will restrict our discussion to a system of two variables, x and y. y is said to Granger-cause
x if current or lagged values of y help to predict future values of x. On the other hand, y fails to
Granger-cause x if, for all s > 0, the mean squared error of a forecast of x_{t+s} based on (x_t, x_{t-1}, ...)
is the same as that based on (y_t, y_{t-1}, ...) and (x_t, x_{t-1}, ...) together. If we restrict ourselves to linear
functions, y fails to Granger-cause x if
\[
\mathrm{MSE}\big[\hat{E}(x_{t+s} \mid x_t, x_{t-1}, \ldots)\big]
= \mathrm{MSE}\big[\hat{E}(x_{t+s} \mid x_t, x_{t-1}, \ldots, y_t, y_{t-1}, \ldots)\big].
\]
Equivalently, we can say that x is exogenous in the time-series sense with respect to y, or that y is not
linearly informative about future x.
In the VAR equations, the example we proposed above implies a lower-triangular coefficient
matrix:
\[
\begin{pmatrix} x_t \\ y_t \end{pmatrix}
= \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}
+ \begin{pmatrix} \phi_{11}^{1} & 0 \\ \phi_{21}^{1} & \phi_{22}^{1} \end{pmatrix}
\begin{pmatrix} x_{t-1} \\ y_{t-1} \end{pmatrix}
+ \cdots
+ \begin{pmatrix} \phi_{11}^{p} & 0 \\ \phi_{21}^{p} & \phi_{22}^{p} \end{pmatrix}
\begin{pmatrix} x_{t-p} \\ y_{t-p} \end{pmatrix}
+ \begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix}. \tag{11}
\]
Or, if we use the MA representation,
\[
\begin{pmatrix} x_t \\ y_t \end{pmatrix}
= \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}
+ \begin{pmatrix} \phi_{11}(L) & 0 \\ \phi_{21}(L) & \phi_{22}(L) \end{pmatrix}
\begin{pmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{pmatrix}, \tag{12}
\]
where
\[
\phi_{ij}(L) = \phi_{ij}^{0} + \phi_{ij}^{1} L + \phi_{ij}^{2} L^2 + \cdots
\]
with φ_{11}^0 = φ_{22}^0 = 1 and φ_{21}^0 = 0. Another implication of Granger causality is stressed by Sims
(1972).
Consider a projection of y_t on past, present, and future values of x,
\[
y_t = c + \sum_{j=0}^{\infty} b_j x_{t-j} + \sum_{j=1}^{\infty} d_j x_{t+j} + \eta_t, \tag{13}
\]
where E(η_t x_τ) = 0 for all t and τ. Then y fails to Granger-cause x iff d_j = 0 for j = 1, 2, ....
Econometric tests of whether the series y Granger-causes x can be based on any of the three
implications (11), (12), or (13). The simplest test is to estimate, based on (11), the regression
\[
x_t = c_1 + \sum_{i=1}^{p} \alpha_i x_{t-i} + \sum_{j=1}^{p} \beta_j y_{t-j} + u_t
\]
and test the null hypothesis
\[
H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0.
\]
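A minimal NumPy sketch of this test (the helper `granger_test` and the simulated series are assumptions made for illustration; the statistic is the usual F-type comparison of restricted and unrestricted sums of squared residuals):

```python
import numpy as np

def granger_test(x, y, p):
    """F statistic for H0: y does not Granger-cause x, from the regression
    x_t = c + sum_i alpha_i x_{t-i} + sum_j beta_j y_{t-j} + u_t."""
    n = len(x)
    X_full = np.column_stack(
        [np.ones(n - p)] +
        [x[p - i:n - i] for i in range(1, p + 1)] +
        [y[p - j:n - j] for j in range(1, p + 1)])
    X_rest = X_full[:, :p + 1]            # drop the lags of y
    target = x[p:]

    def ssr(X):
        b, *_ = np.linalg.lstsq(X, target, rcond=None)
        e = target - X @ b
        return e @ e

    ssr_u, ssr_r = ssr(X_full), ssr(X_rest)
    dof = (n - p) - X_full.shape[1]
    F = ((ssr_r - ssr_u) / p) / (ssr_u / dof)
    return F, dof                         # compare F with an F(p, dof) critical value

# Illustration: y is pure noise, so it should not Granger-cause x.
rng = np.random.default_rng(3)
y = rng.standard_normal(500)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()
print(granger_test(x, y, p=4))
```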
Note: we have to be aware that Granger causality is not the same as what we usually mean
by causality. For instance, even if x1 does not cause x2, it may still help to predict x2, and thus
Granger-cause x2, if changes in x1 precede changes in x2 for some reason. A simple example: dragonflies
fly much lower before a rainstorm because of the lower air pressure. Dragonflies do not cause
rainstorms, but their behavior does help to predict a rainstorm, and thus Granger-causes it.
Reading: Hamilton Ch. 10, 11, 14.