
Lecture 6: Vector Autoregression∗

∗ Copyright 2002-2006 by Ling Hu.

In this section, we extend our discussion to vector-valued time series. We will be mostly
interested in vector autoregression (VAR), which is much easier to estimate in applications.
We first introduce the properties of and basic tools for analyzing stationary VAR processes, and then
move on to estimation and inference for the VAR model.

1 Covariance-stationary VAR(p) process


1.1 Introduction to stationary vector ARMA processes
1.1.1 VAR processes
A VAR model applies when each variable in the system depends not only on its own lags, but
also on the lags of the other variables. A simple VAR example is:

x1t = φ11 x1,t−1 + φ12 x2,t−1 + ε1t
x2t = φ21 x2,t−1 + φ22 x2,t−2 + ε2t

where E(ε1t ε2s) = σ12 for t = s and zero for t ≠ s. We could rewrite it as
\[
\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}
= \begin{bmatrix} \phi_{11} & \phi_{12} \\ 0 & \phi_{21} \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix}
+ \begin{bmatrix} 0 & 0 \\ 0 & \phi_{22} \end{bmatrix}
\begin{bmatrix} x_{1,t-2} \\ x_{2,t-2} \end{bmatrix}
+ \begin{bmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{bmatrix},
\]
or just
\[
x_t = \Phi_1 x_{t-1} + \Phi_2 x_{t-2} + \epsilon_t \tag{1}
\]
with E(εt) = 0, E(εt εs′) = 0 for s ≠ t, and
\[
E(\epsilon_t \epsilon_t') = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix}.
\]

As you can see, in this example, the vector-valued random variable xt follows a VAR(2) process.
A general VAR(p) process with white noise can be written as

\[
x_t = \Phi_1 x_{t-1} + \Phi_2 x_{t-2} + \dots + \Phi_p x_{t-p} + \epsilon_t = \sum_{j=1}^{p} \Phi_j x_{t-j} + \epsilon_t,
\]

or, if we make use of the lag operator,


Φ(L)xt = εt,

where

Φ(L) = Ik − Φ1 L − . . . − Φp L^p.
The error terms εt follow a vector white noise, i.e., E(εt) = 0 and
\[
E(\epsilon_t \epsilon_s') = \begin{cases} \Omega & \text{for } t = s, \\ 0 & \text{otherwise,} \end{cases}
\]
with Ω a (k × k) symmetric positive definite matrix.


Recall that in studying the scalar AR(p) process,

φ(L)xt = εt,

we have the result that the process {xt} is covariance-stationary as long as all roots of

1 − φ1 z − φ2 z^2 − . . . − φp z^p = 0                                    (2)

lie outside the unit circle. Similarly, for the VAR(p) process to be stationary, all roots of the equation

|Ik − Φ1 z − . . . − Φp z^p| = 0

must lie outside the unit circle.
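As a quick numerical check of this condition, here is a minimal Python sketch (assuming numpy is available; the function name and the numerical coefficient matrices are made up for illustration). It verifies stationarity by checking that the eigenvalues of the companion matrix built from Φ1, . . . , Φp lie strictly inside the unit circle, which is equivalent to the roots of |Ik − Φ1 z − . . . − Φp z^p| = 0 lying outside it:

import numpy as np

def var_is_stationary(Phis):
    # Phis = [Phi_1, ..., Phi_p], each a (k x k) numpy array.
    # The VAR(p) is covariance-stationary iff the eigenvalues of the
    # companion matrix F lie strictly inside the unit circle, which is
    # equivalent to the roots of |I_k - Phi_1 z - ... - Phi_p z^p| = 0
    # lying outside it.
    k, p = Phis[0].shape[0], len(Phis)
    F = np.zeros((k * p, k * p))
    F[:k, :] = np.hstack(Phis)           # first block row: [Phi_1 ... Phi_p]
    F[k:, :-k] = np.eye(k * (p - 1))     # identity blocks below the first row
    return bool(np.all(np.abs(np.linalg.eigvals(F)) < 1))

# Hypothetical numerical values for a bivariate VAR(2) of the form above.
Phi1 = np.array([[0.5, 0.2], [0.0, 0.3]])
Phi2 = np.array([[0.0, 0.0], [0.0, 0.2]])
print(var_is_stationary([Phi1, Phi2]))   # True for these made-up coefficients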

1.1.2 Vector moving average processes


Recall that we could invert a scalar stationary AR(p) process, φ(L)xt = εt, to an MA(∞) process,
xt = θ(L)εt, where θ(L) = φ(L)−1. The same is true for a covariance-stationary VAR(p) process,
Φ(L)xt = εt. We could invert it to

xt = Ψ(L)εt

where

Ψ(L) = Φ(L)−1.

The coefficients of Ψ can be solved in the same way as in the scalar case, i.e., if Φ−1(L) = Ψ(L),
then Φ(L)Ψ(L) = Ik:

(Ik − Φ1 L − Φ2 L^2 − . . . − Φp L^p)(Ik + Ψ1 L + Ψ2 L^2 + . . .) = Ik.

Equating the coefficients of L^j, we have Ψ0 = Ik, Ψ1 = Φ1, Ψ2 = Φ1 Ψ1 + Φ2, and in general,

Ψs = Φ1 Ψs−1 + Φ2 Ψs−2 + . . . + Φp Ψs−p.
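This recursion is straightforward to implement. Below is a small Python sketch (numpy assumed; var_ma_coefficients is a hypothetical helper name) that returns Ψ0, . . . , Ψ_{s_max} given the autoregressive matrices:

import numpy as np

def var_ma_coefficients(Phis, s_max):
    # Recursion Psi_s = Phi_1 Psi_{s-1} + ... + Phi_p Psi_{s-p},
    # with Psi_0 = I_k and Psi_s = 0 for s < 0.
    k, p = Phis[0].shape[0], len(Phis)
    Psis = [np.eye(k)]
    for s in range(1, s_max + 1):
        Psis.append(sum(Phis[j] @ Psis[s - j - 1] for j in range(min(p, s))))
    return Psis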

1.2 Transforming to a state space representation


Sometimes it is more convenient to write a scalar-valued time series, say an AR(p) process, in vector
form. For example, consider
\[
x_t = \sum_{j=1}^{p} \phi_j x_{t-j} + \epsilon_t,
\]
where εt ∼ N(0, σ²). We could equivalently write it as
\[
\begin{bmatrix} x_t \\ x_{t-1} \\ \vdots \\ x_{t-p+1} \end{bmatrix}
=
\begin{bmatrix}
\phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p \\
1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & \cdots & \cdots & 1 & 0
\end{bmatrix}
\begin{bmatrix} x_{t-1} \\ x_{t-2} \\ \vdots \\ x_{t-p} \end{bmatrix}
+
\begin{bmatrix} \epsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
\]

If we let ξt = (xt, xt−1, . . . , xt−p+1)′, ξt−1 = (xt−1, xt−2, . . . , xt−p)′, εt = (εt, 0, . . . , 0)′, and let F
denote the parameter matrix above, then we can write the process as

ξt = F ξt−1 + εt,

where the stacked error εt has mean zero and a covariance matrix with σ² in the (1, 1) position and zeros
elsewhere. So we have rewritten a scalar AR(p) process as a vector autoregression of
order one, denoted VAR(1).
Similarly, we could also transform a VAR(p) process to a VAR(1) process. For the process

xt = Φ1 xt−1 + Φ2 xt−2 + . . . + Φp xt−p + εt,

let
\[
\xi_t = \begin{bmatrix} x_t \\ x_{t-1} \\ \vdots \\ x_{t-p+1} \end{bmatrix}, \qquad
F = \begin{bmatrix}
\Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\
I_k & 0 & \cdots & 0 & 0 \\
0 & I_k & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & I_k & 0
\end{bmatrix}, \qquad
v_t = \begin{bmatrix} \epsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
\]
Then we could rewrite the VAR(p) process in state space notation,

ξt = F ξt−1 + vt,                                   (3)

where E(vt vs′) equals Q for t = s and equals zero otherwise, and
\[
Q = \begin{bmatrix}
\Omega & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}.
\]
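A hedged Python sketch of this stacking (numpy assumed; companion_form is a hypothetical name): it builds F and Q from the Φj and Ω exactly as in the display above.

import numpy as np

def companion_form(Phis, Omega):
    # Returns F (kp x kp) and Q = E(v_t v_t') for the stacked VAR(1)
    # representation xi_t = F xi_{t-1} + v_t of a VAR(p).
    k, p = Phis[0].shape[0], len(Phis)
    F = np.zeros((k * p, k * p))
    F[:k, :] = np.hstack(Phis)           # [Phi_1 ... Phi_p] in the first block row
    F[k:, :-k] = np.eye(k * (p - 1))     # shifts x_{t-1}, ..., x_{t-p+1} downwards
    Q = np.zeros((k * p, k * p))
    Q[:k, :k] = Omega                    # only the leading block of Q is nonzero
    return F, Q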

1.3 The autocovariance matrix
1.3.1 VAR process
For a covariance-stationary k-dimensional vector process {xt}, let E(xt) = µ; then the autocovariance
is defined to be the following k × k matrix:

Γ(h) = E[(xt − µ)(xt−h − µ)′].

For simplicity, assume that µ = 0. Then we have Γ(h) = E(xt xt−h′). Because of the lead-lag effect,
we may not have Γ(h) = Γ(−h), but we do have Γ(h)′ = Γ(−h). To show this, write

Γ(h) = E(xt+h xt+h−h′) = E(xt+h xt′),

and taking the transpose,

Γ(h)′ = E(xt xt+h′) = Γ(−h).
As in the scalar case, we define the autocovariance generating function of the process x as
\[
G_x(z) = \sum_{h=-\infty}^{\infty} \Gamma(h) z^h,
\]
where z is again a complex scalar.


Let ξt be as defined in (3). Assume that ξ and x are stationary, and let Σ denote the variance of
ξ,
\[
\Sigma = E(\xi_t \xi_t')
= E\left( \begin{bmatrix} x_t \\ x_{t-1} \\ \vdots \\ x_{t-p+1} \end{bmatrix}
\begin{bmatrix} x_t' & x_{t-1}' & \cdots & x_{t-p+1}' \end{bmatrix} \right)
= \begin{bmatrix}
\Gamma(0) & \Gamma(1) & \cdots & \Gamma(p-1) \\
\Gamma(1)' & \Gamma(0) & \cdots & \Gamma(p-2) \\
\vdots & \vdots & \ddots & \vdots \\
\Gamma(p-1)' & \Gamma(p-2)' & \cdots & \Gamma(0)
\end{bmatrix}.
\]

Postmultiplying (3) by its transpose and taking expectations gives

E(ξt ξt′) = E[(F ξt−1 + vt)(F ξt−1 + vt)′] = F E(ξt−1 ξt−1′)F′ + E(vt vt′),

or

Σ = F ΣF′ + Q.                                    (4)

To solve for Σ, we need the Kronecker product and the following result: let A, B, C be
matrices whose dimensions are such that the product ABC exists. Then

vec(ABC) = (C′ ⊗ A) · vec(B),

where vec is the operator that stacks the columns of a (k × k) matrix into a k²-dimensional vector,
for example,
\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad
\mathrm{vec}(A) = \begin{bmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{bmatrix}.
\]
Applying the vec operator to both sides of (4), we get

vec(Σ) = (F ⊗ F) · vec(Σ) + vec(Q),

which gives

vec(Σ) = (Im − F ⊗ F)−1 vec(Q),

where m = k²p². We can use this equation to solve for the autocovariances of x contained in Σ,
Γ(0), . . . , Γ(p − 1). To derive the hth autocovariance of ξ, denoted by Σ(h), we can postmultiply
(3) by ξt−h′ and take expectations,

E(ξt ξt−h′) = F E(ξt−1 ξt−h′) + E(vt ξt−h′),

so that

Σ(h) = F Σ(h − 1), or Σ(h) = F^h Σ.

Therefore we have the following recursion for Γ(h):

Γ(h) = Φ1 Γ(h − 1) + Φ2 Γ(h − 2) + . . . + Φp Γ(h − p).
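Putting the last two steps together, here is a minimal Python sketch (numpy assumed; names hypothetical) that computes Γ(0), . . . , Γ(h_max) by solving vec(Σ) = (Im − F ⊗ F)−1 vec(Q) for the companion variance and then applying the recursion:

import numpy as np

def var_autocovariances(Phis, Omega, h_max):
    # Gamma(0), ..., Gamma(h_max) for a stationary VAR(p):
    # solve vec(Sigma) = (I_m - F kron F)^{-1} vec(Q) for the companion
    # variance Sigma, read Gamma(0), ..., Gamma(p-1) off its first block
    # row, then apply Gamma(h) = Phi_1 Gamma(h-1) + ... + Phi_p Gamma(h-p).
    k, p = Phis[0].shape[0], len(Phis)
    F = np.zeros((k * p, k * p))
    F[:k, :] = np.hstack(Phis)
    F[k:, :-k] = np.eye(k * (p - 1))
    Q = np.zeros((k * p, k * p))
    Q[:k, :k] = Omega
    m = (k * p) ** 2
    vec_Sigma = np.linalg.solve(np.eye(m) - np.kron(F, F), Q.flatten(order="F"))
    Sigma = vec_Sigma.reshape((k * p, k * p), order="F")
    Gammas = [Sigma[:k, j * k:(j + 1) * k] for j in range(p)]   # Gamma(0..p-1)
    for h in range(p, h_max + 1):
        Gammas.append(sum(Phis[j] @ Gammas[h - j - 1] for j in range(p)))
    return Gammas[:h_max + 1]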

1.3.2 Vector MA processes


We first consider the MA(q) process,

xt = εt + Ψ1 εt−1 + Ψ2 εt−2 + . . . + Ψq εt−q.

The variance of xt is then

Γ(0) = E(xt xt′) = E(εt εt′) + Ψ1 E(εt−1 εt−1′)Ψ1′ + . . . + Ψq E(εt−q εt−q′)Ψq′ = Ω + Ψ1 ΩΨ1′ + Ψ2 ΩΨ2′ + . . . + Ψq ΩΨq′,

and the autocovariances are
\[
\Gamma(h) = \begin{cases}
\Psi_h \Omega + \Psi_{h+1} \Omega \Psi_1' + \Psi_{h+2} \Omega \Psi_2' + \dots + \Psi_q \Omega \Psi_{q-h}' & \text{for } h = 1, \dots, q, \\
\Omega \Psi_{-h}' + \Psi_1 \Omega \Psi_{-h+1}' + \Psi_2 \Omega \Psi_{-h+2}' + \dots + \Psi_{q+h} \Omega \Psi_q' & \text{for } h = -1, \dots, -q, \\
0 & \text{for } |h| > q.
\end{cases}
\]

As in the scalar case, any vector MA(q) process is stationary. Next consider the MA(∞) process

xt = εt + Ψ1 εt−1 + Ψ2 εt−2 + . . . = Ψ(L)εt.

A sequence of matrices {Ψs} is absolutely summable if each of its elements forms an absolutely
summable scalar sequence, i.e.,
\[
\sum_{s=0}^{\infty} |\psi_{ij}^{(s)}| < \infty \quad \text{for } i, j = 1, 2, \dots, k,
\]
where ψij(s) is the row i, column j element (ijth element for short) of Ψs. Some important results
about the MA(∞) process are summarized as follows:

Proposition 1 Let xt be a k × 1 vector satisfying
\[
x_t = \sum_{j=0}^{\infty} \Psi_j \epsilon_{t-j},
\]
where εt is vector white noise and {Ψj} is absolutely summable. Then

(a) The autocovariance between the ith variable at time t and the jth variable s periods earlier,
E(xit xj,t−s), exists and is given by the ijth element of
\[
\Gamma(s) = \sum_{v=0}^{\infty} \Psi_{s+v} \Omega \Psi_v' \quad \text{for } s = 0, 1, 2, \dots;
\]

(b) {Γ(h)}h=0∞ is absolutely summable.

If {εt}t=−∞∞ is i.i.d. with E|εi1,t εi2,t εi3,t εi4,t| < ∞ for i1, i2, i3, i4 = 1, 2, . . . , k, then we also have

(c) E|xi1,t1 xi2,t2 xi3,t3 xi4,t4| < ∞ for i1, i2, i3, i4 = 1, 2, . . . , k and all t1, t2, t3, t4;

(d) n−1 Σ_{t=1}^{n} xit xj,t−s →p E(xit xj,t−s) for i, j = 1, 2, . . . , k and for all s.

All of these results can be viewed as extensions from the scalar case to the vector case, and their proofs
can be found on pages 286-288 of Hamilton's book.

1.4 The Sample Mean of a Vector Process


Let xt be a stationary process with E(xt) = 0 and E(xt xt−h′) = Γ(h), where Γ(h) is absolutely
summable. Then we consider the properties of the sample mean
\[
\bar{x}_n = \frac{1}{n} \sum_{t=1}^{n} x_t.
\]

We have
\[
\begin{aligned}
E(\bar{x}_n \bar{x}_n')
&= \frac{1}{n^2} E[(x_1 + \dots + x_n)(x_1 + \dots + x_n)'] \\
&= \frac{1}{n^2} \sum_{i,j=1}^{n} E(x_i x_j') \\
&= \frac{1}{n} \sum_{h=-n+1}^{n-1} \left(1 - \frac{|h|}{n}\right) \Gamma(h).
\end{aligned}
\]
Then
\[
\begin{aligned}
n E(\bar{x}_n \bar{x}_n')
&= \sum_{h=-n+1}^{n-1} \left(1 - \frac{|h|}{n}\right) \Gamma(h) \\
&= \Gamma(0) + \left(1 - \frac{1}{n}\right)(\Gamma(1) + \Gamma(-1)) + \left(1 - \frac{2}{n}\right)(\Gamma(2) + \Gamma(-2)) + \dots \\
&\to \sum_{h=-\infty}^{\infty} \Gamma(h).
\end{aligned}
\]

This is very similar to what we did in the scalar case. We then have the following proposition:

Proposition 2 Let xt be a zero-mean stationary process with E(xt xt−h′) = Γ(h), where Γ(h) is
absolutely summable. Then the sample mean satisfies

(a) x̄n →p 0;

(b) limn→∞ [n E(x̄n x̄n′)] = Σ_{h=−∞}^{∞} Γ(h).

Let S denote the limit of n E(x̄n x̄n′). If the data are generated by an MA(q) process,
then result (b) implies that
\[
S = \sum_{h=-q}^{q} \Gamma(h).
\]
A natural estimate for S is then
\[
\hat{S} = \hat{\Gamma}(0) + \sum_{h=1}^{q} \left( \hat{\Gamma}(h) + \hat{\Gamma}(h)' \right), \tag{5}
\]
where
\[
\hat{\Gamma}(h) = \frac{1}{n} \sum_{t=h+1}^{n} (x_t - \bar{x}_n)(x_{t-h} - \bar{x}_n)'.
\]

Ŝ defined in (5) provides a consistent estimator for a large class of stationary processes. Even
when the process has time-varying second moments, as long as
\[
\frac{1}{n} \sum_{t=h+1}^{n} (x_t - \bar{x}_n)(x_{t-h} - \bar{x}_n)'
\]
converges in probability to
\[
\frac{1}{n} \sum_{t=h+1}^{n} E(x_t x_{t-h}'),
\]
Ŝ is a consistent estimate of the limit of n E(x̄n x̄n′). Nor is it useful only for MA(q) processes:
writing the autocovariance as E(xt xs′), even if it is nonzero for all t and s, as long as this matrix
goes to zero sufficiently fast as |t − s| → ∞ and q grows with the sample size n, we still have Ŝ → S.
However, a problem with Ŝ is that it may not be positive semidefinite in small samples. There-
fore, we can use the Newey and West estimate
\[
\tilde{S} = \hat{\Gamma}(0) + \sum_{h=1}^{q} \left(1 - \frac{h}{q+1}\right) \left( \hat{\Gamma}(h) + \hat{\Gamma}(h)' \right),
\]
which is positive semidefinite and has the same consistency properties as Ŝ when q, n → ∞ with
q/n^{1/4} → 0.
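A minimal Python sketch of S̃ (numpy assumed; newey_west is a hypothetical name), taking an n × k array of observations and a truncation lag q:

import numpy as np

def newey_west(x, q):
    # Newey-West estimate of the long-run variance of an (n x k) array x:
    # sample autocovariances weighted by 1 - h/(q+1), which keeps the
    # estimate positive semidefinite.
    n, k = x.shape
    xc = x - x.mean(axis=0)                 # demean
    def gamma_hat(h):
        # (1/n) sum_{t=h+1}^{n} (x_t - xbar)(x_{t-h} - xbar)'
        return xc[h:].T @ xc[:n - h] / n
    S = gamma_hat(0)
    for h in range(1, q + 1):
        w = 1.0 - h / (q + 1.0)
        G = gamma_hat(h)
        S = S + w * (G + G.T)
    return S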

1.5 Impulse-response Function and Orthogonalization


1.5.1 Impulse-response function
The impulse-response function describes how a time series is affected by a shock at time t. Recall
that for a scalar process, say an AR(1) process xt = φxt−1 + εt with |φ| < 1, we can
invert it to the MA process xt = (1 + φL + φ²L² + . . .)εt, and the effects of ε on x are:

ε : 0 1 0 0 . . .
x : 0 1 φ φ² . . .

In other words, after we invert φ(L)xt = εt to xt = θ(L)εt, the θ(L) function tells us how x
responds to a unit shock from εt.
We can do a similar thing with a VAR process. In our earlier example, we had the VAR(2) system

xt = Φ1 xt−1 + Φ2 xt−2 + εt

with εt ∼ WN(0, Ω), where
\[
\Omega = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix}.
\]
After we invert it to the MA(∞) representation

xt = Ψ(L)εt,                                    (6)

where Ψ(L) = (Ik − Φ1 L − Φ2 L²)−1, we see that in this representation the observation xt is a linear
combination of the shocks εt. However, suppose we are interested in another form of shocks, say

ut = Qεt,

where Q is an arbitrary square matrix (in this example, 2 × 2). Then we have

xt = Ψ(L)Q−1 Qεt = A(L)ut,                      (7)

where we let A(L) = Ψ(L)Q−1. Since Q is arbitrary, we can form many different linear
combinations of shocks, and hence many response functions. Which combination shall we use?

1.5.2 Orthogonalization and model specification
In economic modeling, we compute impulse-response dynamics because we are interested in how eco-
nomic variables respond to particular sources of shocks. If the shocks are correlated, then it is hard
to identify the response to any one particular shock. From that point of view, we may want to choose
Q to make ut = Qεt orthonormal, that is, uncorrelated across components and with unit variance, i.e.,
E(ut ut′) = I. To do so, we need a Q such that

Q−1 (Q−1)′ = Ω,

so that E(ut ut′) = E(Qεt εt′Q′) = QΩQ′ = Ik. We can therefore use the Cholesky decomposition to find Q.
However, Q is still not unique, as other valid Qs can be formed by multiplying by an orthogonal matrix.

Sims (1980) proposes that we specify the model by choosing a particular leading coefficient
matrix, A0. In (6), we see that Ψ0 = Ik. However, in (7), A0 = Q−1 cannot be the identity
matrix unless Ω is diagonal. In our example, we would choose the Q that makes A0 = Q−1 a
lower triangular matrix. That means that after this transformation, the shock u2t has no contemporaneous
effect on x1t. The nice thing is that the Cholesky decomposition itself produces a triangular matrix.

Example 1 Consider an AR(1) process for a 2-dimensional vector,
\[
\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}
= \begin{bmatrix} 0.5 & 0.2 \\ 0.3 & 0.4 \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix}
+ \begin{bmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{bmatrix},
\]
where
\[
\Omega = E(\epsilon_t \epsilon_t') = \begin{bmatrix} 2 & 1 \\ 1 & 4 \end{bmatrix}.
\]
First we verify that this process is stationary: solving
\[
\left| \lambda I_2 - \begin{bmatrix} 0.5 & 0.2 \\ 0.3 & 0.4 \end{bmatrix} \right| = 0
\]
gives λ1 = 0.7 and λ2 = 0.2, both of which lie inside the unit circle. Invert it to a moving average
process,

xt = Ψ(L)εt.

We know that Ψ0 = I2, Ψ1 = Φ1, etc. Then we find Q by the Cholesky decomposition of Ω, which
gives
\[
Q = \begin{bmatrix} 0.70 & 0 \\ -0.27 & 0.53 \end{bmatrix}
\qquad \text{and} \qquad
Q^{-1} = \begin{bmatrix} 1.41 & 0 \\ 0.70 & 1.87 \end{bmatrix}.
\]
Then we can write

xt = Ψ(L)Q−1 Qεt = Ψ(L)Q−1 ut,

where we define ut = Qεt. Then we have

xt = Ψ0 Q−1 ut + Ψ1 Q−1 ut−1 + . . . ,

or
\[
\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}
= \begin{bmatrix} 1.41 & 0 \\ 0.70 & 1.87 \end{bmatrix}
\begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix}
+ \begin{bmatrix} 0.85 & 0.37 \\ 0.70 & 0.75 \end{bmatrix}
\begin{bmatrix} u_{1,t-1} \\ u_{2,t-1} \end{bmatrix}
+ \dots
\]

In this example you see that we have found a unique MA representation which is a linear combination of
uncorrelated errors (E(ut ut′) = I2), and in which the second source of shocks has no instantaneous
effect on x1t. We can then use this representation to compute the impulse-responses.
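The numbers in Example 1 can be reproduced with a few lines of Python (numpy assumed); up to rounding, this recovers the Cholesky factor Q−1 and the orthogonalized coefficient Ψ1 Q−1 shown above:

import numpy as np

Phi1 = np.array([[0.5, 0.2], [0.3, 0.4]])
Omega = np.array([[2.0, 1.0], [1.0, 4.0]])

# Stationarity: the eigenvalues of Phi1 must lie inside the unit circle.
print(np.linalg.eigvals(Phi1))            # approximately 0.7 and 0.2

# Cholesky factor: Q^{-1} is lower triangular with Q^{-1} (Q^{-1})' = Omega.
Q_inv = np.linalg.cholesky(Omega)         # approx [[1.41, 0.00], [0.71, 1.87]]
Q = np.linalg.inv(Q_inv)                  # approx [[0.71, 0.00], [-0.27, 0.53]]

# Orthogonalized MA coefficients A_s = Psi_s Q^{-1}; for a VAR(1), Psi_s = Phi1^s.
A = [np.linalg.matrix_power(Phi1, s) @ Q_inv for s in range(5)]
print(A[0])                               # approx [[1.41, 0.00], [0.71, 1.87]]
print(A[1])                               # approx [[0.85, 0.37], [0.71, 0.75]]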
There are also other ways to specify the representation, depending on the problem of interest.
For example, Quah (1988) suggests finding a Q such that the long-run response of one variable to
the other shock is zero.

1.5.3 Variance decomposition


Now let's consider how we can decompose the variance of the forecast errors. Write xt = Ψ(L)εt =
A(L)ut, where A(L) = Ψ(L)Q−1, ut = Qεt and E(ut ut′) = I. For simplicity, let xt = (x1t, x2t)′.
Suppose we produce a one-period-ahead forecast, and let yt+1 denote the forecast error,
\[
y_{t+1} = x_{t+1} - E_t(x_{t+1}) = A_0 u_{t+1} =
\begin{bmatrix} A_0^{11} & A_0^{12} \\ A_0^{21} & A_0^{22} \end{bmatrix}
\begin{bmatrix} u_{1,t+1} \\ u_{2,t+1} \end{bmatrix}.
\]
Since E(u1t u2t) = 0 and E(uit²) = 1, the variance of the forecast error is given by E(yt+1 yt+1′) =
A0 A0′. So the variance of the forecast error for the first variable is (A0^{11})² + (A0^{12})². We can interpret
(A0^{11})² as the amount of the one-step-ahead forecast error variance due to shock u1, and
(A0^{12})² as the amount due to shock u2. Similarly, the variance of the forecast error for the second variable is
(A0^{21})² + (A0^{22})², which we can decompose into amounts due to shocks u1 and u2 respectively. The
variance of the s-period-ahead forecast error can be computed in a similar way.
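A minimal Python sketch of this decomposition (numpy assumed; names hypothetical): it accumulates the squared elements of Aj = Ψj Q−1 over horizons j = 0, . . . , s − 1 and reports, for each variable, the share of forecast error variance due to each orthogonalized shock:

import numpy as np

def variance_decomposition(Phis, Omega, s):
    # Share of the s-step-ahead forecast error variance of each variable
    # attributable to each orthogonalized shock u_m, using A_j = Psi_j Q^{-1}
    # with Q^{-1} the lower-triangular Cholesky factor of Omega.
    k, p = Phis[0].shape[0], len(Phis)
    Q_inv = np.linalg.cholesky(Omega)
    # MA coefficients Psi_0, ..., Psi_{s-1} by the recursion of Section 1.1.2.
    Psis = [np.eye(k)]
    for j in range(1, s):
        Psis.append(sum(Phis[i] @ Psis[j - i - 1] for i in range(min(p, j))))
    A = [Psi @ Q_inv for Psi in Psis]
    contrib = sum(Aj ** 2 for Aj in A)       # entry (i, m): variance of y_i due to u_m
    return contrib / contrib.sum(axis=1, keepdims=True)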

2 Estimation of VAR(p) process


2.1 Maximum Likelihood Estimation
Usually we use conditional likelihood in VAR estimation (recall that conditional likelihood functions
are much easier to work with than unconditional likelihood functions).
Given a k-vector VAR(p) process,

yt = c + Φ1 yt−1 + Φ2 yt−2 + . . . + Φp yt−p + εt,

we could rewrite it more concisely as

yt = Π′xt + εt,

where
\[
\Pi = \begin{bmatrix} c' \\ \Phi_1' \\ \Phi_2' \\ \vdots \\ \Phi_p' \end{bmatrix}
\qquad \text{and} \qquad
x_t = \begin{bmatrix} 1 \\ y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p} \end{bmatrix}.
\]
If we assume that εt ∼ i.i.d. N(0, Ω), then we can use MLE to estimate the parameters θ =
(c, Π, Ω). Proceeding as in the scalar case, assume that we have observed (y−p+1, . . . , y0);
then the likelihood contribution of yt is
\[
L(y_t, x_t; \theta) = (2\pi)^{-k/2} |\Omega^{-1}|^{1/2} \exp\left[ -\tfrac{1}{2} (y_t - \Pi' x_t)' \Omega^{-1} (y_t - \Pi' x_t) \right].
\]
The log likelihood of the observations (y1, . . . , yn) is (constant omitted)
\[
l(y, x; \theta) = \frac{n}{2} \log|\Omega^{-1}| - \frac{1}{2} \sum_{t=1}^{n} (y_t - \Pi' x_t)' \Omega^{-1} (y_t - \Pi' x_t). \tag{8}
\]

Taking first derivatives with respect to Π and Ω, we find that
\[
\hat{\Pi}_n' = \left[ \sum_{t=1}^{n} y_t x_t' \right] \left[ \sum_{t=1}^{n} x_t x_t' \right]^{-1}.
\]
The jth row of Π̂n′ is
\[
\hat{\pi}_j' = \left[ \sum_{t=1}^{n} y_{jt} x_t' \right] \left[ \sum_{t=1}^{n} x_t x_t' \right]^{-1},
\]
which is the estimated coefficient vector from an OLS regression of yjt on xt. So the MLE
estimates of the coefficients for the jth equation of a VAR are found by an OLS regression of yjt
on a constant term and p lags of all of the variables in the system.

The MLE estimate of Ω is
\[
\hat{\Omega}_n = \frac{1}{n} \sum_{t=1}^{n} \hat{\epsilon}_t \hat{\epsilon}_t',
\]
where

ε̂t = yt − Π̂n′ xt.
The details of the derivations can be found on pages 292-296 of Hamilton's book. The MLE
estimates Π̂ and Ω̂ are consistent even if the true innovations are non-Gaussian. In the next
subsection, we consider regression with non-Gaussian errors, and we use the LS approach
to derive the asymptotics.
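The equation-by-equation OLS characterization makes estimation straightforward to code. Here is a minimal Python sketch (numpy assumed; estimate_var is a hypothetical name) that returns Π̂ and Ω̂ from an n × k data array:

import numpy as np

def estimate_var(y, p):
    # Equation-by-equation OLS (= conditional MLE) for a VAR(p).
    # y is (n_obs x k); returns Pi_hat of shape ((kp+1) x k), with the first
    # row the intercepts, and the residual covariance Omega_hat.
    n_obs, k = y.shape
    rows = []
    for t in range(p, n_obs):
        lags = [y[t - j] for j in range(1, p + 1)]       # y_{t-1}, ..., y_{t-p}
        rows.append(np.concatenate(([1.0], *lags)))
    X = np.array(rows)                                   # rows are x_t'
    Y = y[p:]
    Pi_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)       # OLS column by column
    resid = Y - X @ Pi_hat
    Omega_hat = resid.T @ resid / len(Y)
    return Pi_hat, Omega_hat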

2.2 LS estimation and asymptotics


The asymptotic distribution of Π̂ is summarized in the following proposition.

Proposition 3 Let

yt = c + Φ1 yt−1 + Φ2 yt−2 + . . . + Φp yt−p + εt,

where εt is i.i.d. (0, Ω) with E(εit εjt εlt εmt) < ∞ for all i, j, l, and m, and where the roots of

|Ik − Φ1 z − . . . − Φp z^p| = 0

lie outside the unit circle. Let m = kp + 1 and let

xt′ = [ 1  yt−1′  yt−2′  . . .  yt−p′ ],

so that xt is an m-dimensional vector. Let π̂n = vec(Π̂n) denote the km × 1 vector of coefficients resulting
from OLS regressions of each of the elements of yt on xt for a sample of size n:

π̂n′ = [ π̂1,n′  π̂2,n′  . . .  π̂k,n′ ],

where
\[
\hat{\pi}_{i,n} = \left[ \sum_{t=1}^{n} x_t x_t' \right]^{-1} \left[ \sum_{t=1}^{n} x_t y_{it} \right],
\]

and let π denote the corresponding km × 1 vector of true parameters. Finally, let
\[
\hat{\Omega}_n = n^{-1} \sum_{t=1}^{n} \hat{\epsilon}_t \hat{\epsilon}_t',
\]
where

ε̂t′ = [ ε̂1t  ε̂2t  . . .  ε̂kt ],     ε̂it = yit − xt′ π̂i,n.

Then

(a) n−1 Σ_{t=1}^{n} xt xt′ →p Q, where Q = E(xt xt′);

(b) π̂n →p π;

(c) Ω̂n →p Ω;

(d) √n (π̂n − π) →d N(0, Ω ⊗ Q−1).

Result (a) is a vector version of the result that sample second moments converge to population moments;
it follows because the MA coefficients are absolutely summable and the process has finite fourth moments.
Results (b) and (c) are similar to the derivations for a single OLS regression in case 3 of lecture 5. To show
result (d), let
\[
Q_n = n^{-1} \sum_{t=1}^{n} x_t x_t',
\]

then we could write
\[
\sqrt{n}(\hat{\pi}_{i,n} - \pi_i) = Q_n^{-1} \left[ n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{it} \right]
\]
and
\[
\sqrt{n}(\hat{\pi}_n - \pi) =
\begin{bmatrix}
Q_n^{-1} n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{1t} \\
Q_n^{-1} n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{2t} \\
\vdots \\
Q_n^{-1} n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{kt}
\end{bmatrix}. \tag{9}
\]
Define ξt to be the km × 1 vector
\[
\xi_t = \begin{bmatrix} x_t \epsilon_{1t} \\ x_t \epsilon_{2t} \\ \vdots \\ x_t \epsilon_{kt} \end{bmatrix}.
\]
Note that ξt is a martingale difference sequence with finite fourth moments and variance
\[
E(\xi_t \xi_t') =
\begin{bmatrix}
E(\epsilon_{1t}^2) & E(\epsilon_{1t}\epsilon_{2t}) & \cdots & E(\epsilon_{1t}\epsilon_{kt}) \\
E(\epsilon_{2t}\epsilon_{1t}) & E(\epsilon_{2t}^2) & \cdots & E(\epsilon_{2t}\epsilon_{kt}) \\
\vdots & \vdots & \ddots & \vdots \\
E(\epsilon_{kt}\epsilon_{1t}) & E(\epsilon_{kt}\epsilon_{2t}) & \cdots & E(\epsilon_{kt}^2)
\end{bmatrix}
\otimes E(x_t x_t') = \Omega \otimes Q.
\]

We can also show that
\[
n^{-1} \sum_{t=1}^{n} \xi_t \xi_t' \to_p \Omega \otimes Q.
\]
Applying the CLT for vector martingale difference sequences, we have
\[
n^{-1/2} \sum_{t=1}^{n} \xi_t \to_d N(0, \Omega \otimes Q). \tag{10}
\]

Now rewrite (9) as
\[
\sqrt{n}(\hat{\pi}_n - \pi) =
\begin{bmatrix}
Q_n^{-1} & 0 & \cdots & 0 \\
0 & Q_n^{-1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & Q_n^{-1}
\end{bmatrix}
\begin{bmatrix}
n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{1t} \\
n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{2t} \\
\vdots \\
n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{kt}
\end{bmatrix}
= (I_k \otimes Q_n^{-1}) \, n^{-1/2} \sum_{t=1}^{n} \xi_t.
\]
By result (a) we have Qn−1 →p Q−1. Thus n^{1/2}(π̂n − π) has the same limiting distribution as
\[
(I_k \otimes Q^{-1}) \, n^{-1/2} \sum_{t=1}^{n} \xi_t.
\]
From (10) we know that this is Gaussian with mean 0 and variance
\[
(I_k \otimes Q^{-1})(\Omega \otimes Q)(I_k \otimes Q^{-1}) = (I_k \Omega I_k) \otimes (Q^{-1} Q Q^{-1}) = \Omega \otimes Q^{-1}.
\]
Hence we obtain result (d). Each π̂i,n has the marginal distribution

√n (π̂i,n − πi) →d N(0, σi² Q−1).

Given that the estimators are asymptotically normal, we can test linear or nonlinear
restrictions on the coefficients with Wald statistics.
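For example, standard errors for the stacked coefficients can be read off an estimate of this asymptotic variance. A minimal Python sketch (numpy assumed; names hypothetical), where X is the matrix with rows xt′ and Omega_hat the residual covariance from the estimation sketch above:

import numpy as np

def coefficient_standard_errors(X, Omega_hat):
    # The asymptotic covariance (1/n)(Omega kron Q^{-1}) of pi_hat is
    # estimated by Omega_hat kron (X'X)^{-1}, since Q_n = X'X / n.
    # Returns standard errors stacked equation by equation, like vec(Pi_hat).
    XtX_inv = np.linalg.inv(X.T @ X)
    V = np.kron(Omega_hat, XtX_inv)          # km x km covariance estimate
    return np.sqrt(np.diag(V))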
We know that vec is an operator that stacks the columns of a k × k matrix into one k² × 1 vector.
A similar operator, vech, stacks the elements on and below the principal diagonal (so it transforms a
k × k matrix into one k(k + 1)/2 × 1 vector). For example,
\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad
\mathrm{vech}(A) = \begin{bmatrix} a_{11} \\ a_{21} \\ a_{22} \end{bmatrix}.
\]
We will apply this operator to the variance matrix, which is symmetric. The joint distribution
of π̂n and Ω̂n is given in the following proposition.
Proposition 4 Let

yt = c + Φ1 yt−1 + Φ2 yt−2 + . . . + Φp yt−p + εt,

where εt is i.i.d. N(0, Ω) and the roots of

|Ik − Φ1 z − . . . − Φp z^p| = 0

lie outside the unit circle. Let π̂n, Ω̂n, and Q be as defined in Proposition 3. Then
\[
\begin{bmatrix}
n^{1/2} [\hat{\pi}_n - \pi] \\
n^{1/2} [\mathrm{vech}(\hat{\Omega}_n) - \mathrm{vech}(\Omega)]
\end{bmatrix}
\to_d N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \Omega \otimes Q^{-1} & 0 \\ 0 & \Sigma_{22} \end{bmatrix} \right).
\]
Let σij denote the ijth element of Ω; then the element of Σ22 corresponding to the covariance between
σ̂ij and σ̂lm is given by (σil σjm + σim σjl) for all i, j, l, m = 1, . . . , k.

The detailed proof can be found on pages 341-342 of Hamilton's book. Basically there are three
steps. First, we show that Ω̂n = n−1 Σ_{t=1}^{n} ε̂t ε̂t′ has the same asymptotic distribution as Ω̂n* =
n−1 Σ_{t=1}^{n} εt εt′. In the second step, write
\[
\begin{bmatrix}
n^{1/2} [\hat{\pi}_n - \pi] \\
n^{1/2} [\mathrm{vech}(\hat{\Omega}_n) - \mathrm{vech}(\Omega)]
\end{bmatrix}
\to_d
\begin{bmatrix}
(I_k \otimes Q^{-1}) \, n^{-1/2} \sum_{t=1}^{n} \xi_t \\
n^{-1/2} \sum_{t=1}^{n} \lambda_t
\end{bmatrix}
\]
where
\[
\lambda_t = \mathrm{vech}
\begin{bmatrix}
\epsilon_{1t}^2 - \sigma_{11} & \cdots & \epsilon_{1t}\epsilon_{kt} - \sigma_{1k} \\
\vdots & \ddots & \vdots \\
\epsilon_{kt}\epsilon_{1t} - \sigma_{k1} & \cdots & \epsilon_{kt}^2 - \sigma_{kk}
\end{bmatrix}.
\]
Now, (ξt′, λt′)′ is a martingale difference sequence, and we apply the CLT for mds to get (with a few more
computations)
\[
\begin{bmatrix}
n^{-1/2} \sum_{t=1}^{n} \xi_t \\
n^{-1/2} \sum_{t=1}^{n} \lambda_t
\end{bmatrix}
\to_d N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \Omega \otimes Q & 0 \\ 0 & \Sigma_{22} \end{bmatrix} \right).
\]
The final step in the proof is to show that E(λt λt′) is given by the matrix Σ22 described in the
proposition, which can be shown using a constructed error sequence that is uncorrelated Gaussian
with zero mean and unit variance (see Hamilton's book for details).
With the asymptotic variance of Ω̂n, we can then test whether two errors are correlated. For example,
for k = 2,
\[
\sqrt{n}
\begin{bmatrix}
\hat{\sigma}_{11,n} - \sigma_{11} \\
\hat{\sigma}_{12,n} - \sigma_{12} \\
\hat{\sigma}_{22,n} - \sigma_{22}
\end{bmatrix}
\to_d N\left(
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix}
2\sigma_{11}^2 & 2\sigma_{11}\sigma_{12} & 2\sigma_{12}^2 \\
2\sigma_{11}\sigma_{12} & \sigma_{11}\sigma_{22} + \sigma_{12}^2 & 2\sigma_{12}\sigma_{22} \\
2\sigma_{12}^2 & 2\sigma_{12}\sigma_{22} & 2\sigma_{22}^2
\end{bmatrix}
\right).
\]
Then a Wald test of the null hypothesis that there is no covariance between ε1t and ε2t is given by
\[
\frac{\sqrt{n}\, \hat{\sigma}_{12}}{(\hat{\sigma}_{11}\hat{\sigma}_{22} + \hat{\sigma}_{12}^2)^{1/2}} \approx N(0, 1).
\]
The matrix Σ22 can be expressed more compactly using the duplication matrix. The duplication
matrix Dk is a k² × k(k + 1)/2 matrix that transforms vech(Ω) into vec(Ω), i.e.,

Dk vech(Ω) = vec(Ω).

For example, for k = 2,
\[
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{22} \end{bmatrix}
=
\begin{bmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{12} \\ \sigma_{22} \end{bmatrix}.
\]
Define

Dk+ ≡ (Dk′ Dk)−1 Dk′.

Note that Dk+ Dk = Ik(k+1)/2. Dk+ is like the 'reverse' of Dk, as it transforms vec(Ω) into vech(Ω),

vech(Ω) = Dk+ vec(Ω).

For example, when k = 2, we have
\[
\begin{bmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{22} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1/2 & 1/2 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{12} \\ \sigma_{22} \end{bmatrix}.
\]
With Dk and Dk+ we can write

Σ22 = 2 Dk+ (Ω ⊗ Ω)(Dk+)′.
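A small Python sketch (numpy assumed; duplication_matrix is a hypothetical name) that constructs Dk, forms Dk+ = (Dk′Dk)−1 Dk′, and evaluates Σ22 = 2Dk+(Ω ⊗ Ω)(Dk+)′; with the Ω from Example 1 as a test value, the result agrees with the k = 2 covariance matrix displayed above:

import numpy as np

def duplication_matrix(k):
    # D_k of size k^2 x k(k+1)/2 with D_k vech(A) = vec(A) for symmetric A;
    # vec stacks columns, vech stacks the on-and-below-diagonal elements
    # column by column.
    cols = {}
    idx = 0
    for j in range(k):                 # vech ordering: (i, j) with i >= j
        for i in range(j, k):
            cols[(i, j)] = idx
            idx += 1
    D = np.zeros((k * k, k * (k + 1) // 2))
    for j in range(k):                 # vec ordering: column j, then row i
        for i in range(k):
            D[j * k + i, cols[(max(i, j), min(i, j))]] = 1.0
    return D

Dk = duplication_matrix(2)
Dk_plus = np.linalg.solve(Dk.T @ Dk, Dk.T)     # (D_k' D_k)^{-1} D_k'
Omega = np.array([[2.0, 1.0], [1.0, 4.0]])     # example value from Example 1
Sigma22 = 2 * Dk_plus @ np.kron(Omega, Omega) @ Dk_plus.T
print(Sigma22)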

3 Granger Causality
In most regressions in econometrics, it is very hard to discuss causality. For instance, the significance
of the coefficient β in the regression

yi = βxi + εi

only tells us about the 'co-occurrence' of x and y, not that x causes y. In other words, the regression
usually only tells us that there is some 'relationship' between x and y, and does not tell us the nature of the
relationship, such as whether x causes y or y causes x.
One good thing about time series vector autoregressions is that we can test 'causality' in some
sense. This test was first proposed by Granger (1969), and we therefore refer to it as Granger causality.
We will restrict our discussion to a system of two variables, x and y. We say that y Granger-causes
x if current or lagged values of y help to predict future values of x. Conversely, y fails to
Granger-cause x if, for all s > 0, the mean squared error of a forecast of xt+s based on (xt, xt−1, . . .)
is the same as that based on (xt, xt−1, . . .) and (yt, yt−1, . . .). If we restrict ourselves to linear
functions, y fails to Granger-cause x if

MSE[Ê(xt+s | xt, xt−1, . . .)] = MSE[Ê(xt+s | xt, xt−1, . . . , yt, yt−1, . . .)].

Equivalently, we can say that x is exogenous in the time series sense with respect to y, or y is not
linearly informative about future x.

In the VAR equations, the situation described above implies a lower triangular coefficient
matrix:
\[
\begin{bmatrix} x_t \\ y_t \end{bmatrix}
= \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}
+ \begin{bmatrix} \phi_{11}^{1} & 0 \\ \phi_{21}^{1} & \phi_{22}^{1} \end{bmatrix}
\begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix}
+ \dots
+ \begin{bmatrix} \phi_{11}^{p} & 0 \\ \phi_{21}^{p} & \phi_{22}^{p} \end{bmatrix}
\begin{bmatrix} x_{t-p} \\ y_{t-p} \end{bmatrix}
+ \begin{bmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{bmatrix}. \tag{11}
\]
Or, if we use the MA representation,
\[
\begin{bmatrix} x_t \\ y_t \end{bmatrix}
= \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
+ \begin{bmatrix} \phi_{11}(L) & 0 \\ \phi_{21}(L) & \phi_{22}(L) \end{bmatrix}
\begin{bmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{bmatrix}, \tag{12}
\]
where
\[
\phi_{ij}(L) = \phi_{ij}^{0} + \phi_{ij}^{1} L + \phi_{ij}^{2} L^2 + \dots
\]
with $\phi_{11}^0 = \phi_{22}^0 = 1$ and $\phi_{21}^0 = 0$. Another implication of Granger causality is stressed by Sims
(1972).

Proposition 5 Consider a linear projection of yt on past, present and future x's,
\[
y_t = c + \sum_{j=0}^{\infty} b_j x_{t-j} + \sum_{j=1}^{\infty} d_j x_{t+j} + \eta_t, \tag{13}
\]
where E(ηt xτ) = 0 for all t and τ. Then y fails to Granger-cause x iff dj = 0 for j = 1, 2, . . . .

Econometric tests of whether the series y Granger-causes x can be based on any of the three
implications (11), (12), or (13). The simplest test is to estimate, by OLS, the regression based on
(11),
\[
x_t = c_1 + \sum_{i=1}^{p} \alpha_i x_{t-i} + \sum_{j=1}^{p} \beta_j y_{t-j} + u_t,
\]
and then conduct an F-test of the null hypothesis

H0 : β1 = β2 = . . . = βp = 0.
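A minimal Python sketch of this test (numpy assumed; granger_f_stat is a hypothetical name): it compares the residual sums of squares from the restricted and unrestricted regressions and returns the F statistic, to be compared with the F(p, T − 2p − 1) distribution:

import numpy as np

def granger_f_stat(x, y, p):
    # F statistic for H0: beta_1 = ... = beta_p = 0 in the regression of
    # x_t on a constant, p lags of x, and p lags of y, using the standard
    # restricted-vs-unrestricted residual sum of squares comparison.
    n = len(x)
    rows_u, rows_r, target = [], [], []
    for t in range(p, n):
        xlags = [x[t - i] for i in range(1, p + 1)]
        ylags = [y[t - j] for j in range(1, p + 1)]
        rows_u.append([1.0] + xlags + ylags)      # unrestricted regressors
        rows_r.append([1.0] + xlags)              # restricted: y lags excluded
        target.append(x[t])
    Xu, Xr, xt = np.array(rows_u), np.array(rows_r), np.array(target)
    rss = lambda X: np.sum((xt - X @ np.linalg.lstsq(X, xt, rcond=None)[0]) ** 2)
    T = len(xt)
    return ((rss(Xr) - rss(Xu)) / p) / (rss(Xu) / (T - 2 * p - 1))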

Note: we have to be aware that Granger causality is not the same as what we usually mean
by causality. For instance, even if x1 does not cause x2, it may still help to predict x2, and thus
Granger-cause x2, if changes in x1 precede those of x2 for some reason. A naive example: we observe that
dragonflies fly much lower before a rainstorm, due to the lower air pressure.
We know that dragonflies do not cause rainstorms, but their behavior does help to predict a rainstorm, and thus
Granger-causes the rainstorm.
Reading: Hamilton Ch. 10, 11, 14.
