
Lecture 6: Vector Autoregression∗

∗ Copyright 2002-2006 by Ling Hu.

In this section, we extend our discussion to vector-valued time series. We will be mostly
interested in vector autoregression (VAR), which is much easier to estimate in applications.
We first introduce the properties of and basic tools for analyzing stationary VAR processes, and then
move on to estimation and inference for the VAR model.

1 Covariance-stationary VAR(p) process


1.1 Introduction to stationary vector ARMA processes
1.1.1 VAR processes
A VAR model applies when each variable in the system depends not only on its own lags, but
also on the lags of the other variables. A simple VAR example is:

x1t = φ11 x1,t−1 + φ12 x2,t−1 + ε1t
x2t = φ21 x2,t−1 + φ22 x2,t−2 + ε2t

where E(ε1t ε2s) = σ12 for t = s and zero for t ≠ s. We could rewrite it as
\[
\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}
= \begin{bmatrix} \phi_{11} & \phi_{12} \\ 0 & \phi_{21} \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix}
+ \begin{bmatrix} 0 & 0 \\ 0 & \phi_{22} \end{bmatrix}
\begin{bmatrix} x_{1,t-2} \\ x_{2,t-2} \end{bmatrix}
+ \begin{bmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{bmatrix},
\]
or just
\[
x_t = \Phi_1 x_{t-1} + \Phi_2 x_{t-2} + \epsilon_t \tag{1}
\]
with E(εt) = 0, E(εt εs′) = 0 for s ≠ t, and
\[
E(\epsilon_t \epsilon_t') = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix}.
\]

As you can see, in this example, the vector-valued random variable xt follows a VAR(2) process.
A general VAR(p) process with white noise can be written as

\[
x_t = \Phi_1 x_{t-1} + \Phi_2 x_{t-2} + \dots + \Phi_p x_{t-p} + \epsilon_t = \sum_{j=1}^{p} \Phi_j x_{t-j} + \epsilon_t,
\]

or, if we make use of the lag operator,


Φ(L)xt = εt,

where

Φ(L) = Ik − Φ1 L − . . . − Φp L^p.
The error terms εt follow a vector white noise, i.e., E(εt) = 0 and
\[
E(\epsilon_t \epsilon_s') = \begin{cases} \Omega & \text{for } t = s, \\ 0 & \text{otherwise,} \end{cases}
\]
with Ω a (k × k) symmetric positive definite matrix.


Recall that in studying the scalar AR(p) process,

φ(L)xt = εt,

we have the result that the process {xt} is covariance-stationary as long as all roots of

1 − φ1 z − φ2 z^2 − . . . − φp z^p = 0                                    (2)

lie outside the unit circle. Similarly, for the VAR(p) process to be stationary, all roots of the equation

|Ik − Φ1 z − . . . − Φp z^p| = 0

must lie outside the unit circle.
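As a quick numerical check of this condition, here is a minimal Python sketch (assuming numpy is available; the function name and the numerical coefficient matrices are made up for illustration). It verifies stationarity by checking that the eigenvalues of the companion matrix built from Φ1, . . . , Φp lie strictly inside the unit circle, which is equivalent to the roots of |Ik − Φ1 z − . . . − Φp z^p| = 0 lying outside it:

import numpy as np

def var_is_stationary(Phis):
    # Phis = [Phi_1, ..., Phi_p], each a (k x k) numpy array.
    # The VAR(p) is covariance-stationary iff the eigenvalues of the
    # companion matrix F lie strictly inside the unit circle, which is
    # equivalent to the roots of |I_k - Phi_1 z - ... - Phi_p z^p| = 0
    # lying outside it.
    k, p = Phis[0].shape[0], len(Phis)
    F = np.zeros((k * p, k * p))
    F[:k, :] = np.hstack(Phis)           # first block row: [Phi_1 ... Phi_p]
    F[k:, :-k] = np.eye(k * (p - 1))     # identity blocks below the first row
    return bool(np.all(np.abs(np.linalg.eigvals(F)) < 1))

# Hypothetical numerical values for a bivariate VAR(2) of the form above.
Phi1 = np.array([[0.5, 0.2], [0.0, 0.3]])
Phi2 = np.array([[0.0, 0.0], [0.0, 0.2]])
print(var_is_stationary([Phi1, Phi2]))   # True for these made-up coefficients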

1.1.2 Vector moving average processes


Recall that we could invert a scalar stationary AR(p) process, φ(L)xt = εt, to an MA(∞) process,
xt = θ(L)εt, where θ(L) = φ(L)−1. The same is true for a covariance-stationary VAR(p) process,
Φ(L)xt = εt. We could invert it to

xt = Ψ(L)εt

where

Ψ(L) = Φ(L)−1.

The coefficients of Ψ can be solved in the same way as in the scalar case, i.e., if Φ−1(L) = Ψ(L),
then Φ(L)Ψ(L) = Ik:

(Ik − Φ1 L − Φ2 L^2 − . . . − Φp L^p)(Ik + Ψ1 L + Ψ2 L^2 + . . .) = Ik.

Equating the coefficients of L^j, we have Ψ0 = Ik, Ψ1 = Φ1, Ψ2 = Φ1 Ψ1 + Φ2, and in general,

Ψs = Φ1 Ψs−1 + Φ2 Ψs−2 + . . . + Φp Ψs−p.
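This recursion is straightforward to implement. Below is a small Python sketch (numpy assumed; var_ma_coefficients is a hypothetical helper name) that returns Ψ0, . . . , Ψ_{s_max} given the autoregressive matrices:

import numpy as np

def var_ma_coefficients(Phis, s_max):
    # Recursion Psi_s = Phi_1 Psi_{s-1} + ... + Phi_p Psi_{s-p},
    # with Psi_0 = I_k and Psi_s = 0 for s < 0.
    k, p = Phis[0].shape[0], len(Phis)
    Psis = [np.eye(k)]
    for s in range(1, s_max + 1):
        Psis.append(sum(Phis[j] @ Psis[s - j - 1] for j in range(min(p, s))))
    return Psis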

1.2 Transforming to a state space representation


Sometimes it is more convenient to write a scalar-valued time series, say an AR(p) process, in vector
form. For example, consider
\[
x_t = \sum_{j=1}^{p} \phi_j x_{t-j} + \epsilon_t,
\]
where εt ∼ N(0, σ²). We could equivalently write it as
\[
\begin{bmatrix} x_t \\ x_{t-1} \\ \vdots \\ x_{t-p+1} \end{bmatrix}
=
\begin{bmatrix}
\phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p \\
1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & \cdots & \cdots & 1 & 0
\end{bmatrix}
\begin{bmatrix} x_{t-1} \\ x_{t-2} \\ \vdots \\ x_{t-p} \end{bmatrix}
+
\begin{bmatrix} \epsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
\]

If we let ξt = (xt, xt−1, . . . , xt−p+1)′, ξt−1 = (xt−1, xt−2, . . . , xt−p)′, εt = (εt, 0, . . . , 0)′, and let F
denote the parameter matrix above, then we can write the process as

ξt = F ξt−1 + εt,

where the stacked error εt has mean zero and a covariance matrix with σ² in the (1, 1) position and zeros
elsewhere. So we have rewritten a scalar AR(p) process as a vector autoregression of
order one, denoted VAR(1).
Similarly, we could also transform a VAR(p) process to a VAR(1) process. For the process

xt = Φ1 xt−1 + Φ2 xt−2 + . . . + Φp xt−p + εt,

let
\[
\xi_t = \begin{bmatrix} x_t \\ x_{t-1} \\ \vdots \\ x_{t-p+1} \end{bmatrix}, \qquad
F = \begin{bmatrix}
\Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\
I_k & 0 & \cdots & 0 & 0 \\
0 & I_k & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & I_k & 0
\end{bmatrix}, \qquad
v_t = \begin{bmatrix} \epsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
\]
Then we could rewrite the VAR(p) process in state space notation,

ξt = F ξt−1 + vt,                                   (3)

where E(vt vs′) equals Q for t = s and equals zero otherwise, and
\[
Q = \begin{bmatrix}
\Omega & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}.
\]
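A hedged Python sketch of this stacking (numpy assumed; companion_form is a hypothetical name): it builds F and Q from the Φj and Ω exactly as in the display above.

import numpy as np

def companion_form(Phis, Omega):
    # Returns F (kp x kp) and Q = E(v_t v_t') for the stacked VAR(1)
    # representation xi_t = F xi_{t-1} + v_t of a VAR(p).
    k, p = Phis[0].shape[0], len(Phis)
    F = np.zeros((k * p, k * p))
    F[:k, :] = np.hstack(Phis)           # [Phi_1 ... Phi_p] in the first block row
    F[k:, :-k] = np.eye(k * (p - 1))     # shifts x_{t-1}, ..., x_{t-p+1} downwards
    Q = np.zeros((k * p, k * p))
    Q[:k, :k] = Omega                    # only the leading block of Q is nonzero
    return F, Q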

1.3 The autocovariance matrix
1.3.1 VAR process
For a covariance-stationary k-dimensional vector process {xt}, let E(xt) = µ; then the autocovariance
is defined to be the following k × k matrix:

Γ(h) = E[(xt − µ)(xt−h − µ)′].

For simplicity, assume that µ = 0. Then we have Γ(h) = E(xt xt−h′). Because of the lead-lag effect,
we may not have Γ(h) = Γ(−h), but we do have Γ(h)′ = Γ(−h). To show this, write

Γ(h) = E(xt+h xt+h−h′) = E(xt+h xt′),

and taking the transpose,

Γ(h)′ = E(xt xt+h′) = Γ(−h).
As in the scalar case, we define the autocovariance generating function of the process x as
\[
G_x(z) = \sum_{h=-\infty}^{\infty} \Gamma(h) z^h,
\]
where z is again a complex scalar.


Let ξt be as defined in (3). Assume that ξ and x are stationary, and let Σ denote the variance of
ξ,
\[
\Sigma = E(\xi_t \xi_t')
= E\left( \begin{bmatrix} x_t \\ x_{t-1} \\ \vdots \\ x_{t-p+1} \end{bmatrix}
\begin{bmatrix} x_t' & x_{t-1}' & \cdots & x_{t-p+1}' \end{bmatrix} \right)
= \begin{bmatrix}
\Gamma(0) & \Gamma(1) & \cdots & \Gamma(p-1) \\
\Gamma(1)' & \Gamma(0) & \cdots & \Gamma(p-2) \\
\vdots & \vdots & \ddots & \vdots \\
\Gamma(p-1)' & \Gamma(p-2)' & \cdots & \Gamma(0)
\end{bmatrix}.
\]

Postmultiplying (3) by its transpose and taking expectations gives

E(ξt ξt′) = E[(F ξt−1 + vt)(F ξt−1 + vt)′] = F E(ξt−1 ξt−1′)F′ + E(vt vt′),

or

Σ = F ΣF′ + Q.                                    (4)

To solve for Σ, we need the Kronecker product and the following result: let A, B, C be
matrices whose dimensions are such that the product ABC exists. Then

vec(ABC) = (C′ ⊗ A) · vec(B),

where vec is the operator that stacks the columns of a (k × k) matrix into a k²-dimensional vector,
for example,
\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad
\mathrm{vec}(A) = \begin{bmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{bmatrix}.
\]
Applying the vec operator to both sides of (4), we get

vec(Σ) = (F ⊗ F) · vec(Σ) + vec(Q),

which gives

vec(Σ) = (Im − F ⊗ F)−1 vec(Q),

where m = k²p². We can use this equation to solve for the autocovariances of x contained in Σ,
Γ(0), . . . , Γ(p − 1). To derive the hth autocovariance of ξ, denoted by Σ(h), we can postmultiply
(3) by ξt−h′ and take expectations,

E(ξt ξt−h′) = F E(ξt−1 ξt−h′) + E(vt ξt−h′),

so that

Σ(h) = F Σ(h − 1), or Σ(h) = F^h Σ.

Therefore we have the following recursion for Γ(h):

Γ(h) = Φ1 Γ(h − 1) + Φ2 Γ(h − 2) + . . . + Φp Γ(h − p).
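Putting the last two steps together, here is a minimal Python sketch (numpy assumed; names hypothetical) that computes Γ(0), . . . , Γ(h_max) by solving vec(Σ) = (Im − F ⊗ F)−1 vec(Q) for the companion variance and then applying the recursion:

import numpy as np

def var_autocovariances(Phis, Omega, h_max):
    # Gamma(0), ..., Gamma(h_max) for a stationary VAR(p):
    # solve vec(Sigma) = (I_m - F kron F)^{-1} vec(Q) for the companion
    # variance Sigma, read Gamma(0), ..., Gamma(p-1) off its first block
    # row, then apply Gamma(h) = Phi_1 Gamma(h-1) + ... + Phi_p Gamma(h-p).
    k, p = Phis[0].shape[0], len(Phis)
    F = np.zeros((k * p, k * p))
    F[:k, :] = np.hstack(Phis)
    F[k:, :-k] = np.eye(k * (p - 1))
    Q = np.zeros((k * p, k * p))
    Q[:k, :k] = Omega
    m = (k * p) ** 2
    vec_Sigma = np.linalg.solve(np.eye(m) - np.kron(F, F), Q.flatten(order="F"))
    Sigma = vec_Sigma.reshape((k * p, k * p), order="F")
    Gammas = [Sigma[:k, j * k:(j + 1) * k] for j in range(p)]   # Gamma(0..p-1)
    for h in range(p, h_max + 1):
        Gammas.append(sum(Phis[j] @ Gammas[h - j - 1] for j in range(p)))
    return Gammas[:h_max + 1]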

1.3.2 Vector MA processes


We first consider the MA(q) process,

xt = εt + Ψ1 εt−1 + Ψ2 εt−2 + . . . + Ψq εt−q.

The variance of xt is then

Γ(0) = E(xt xt′) = E(εt εt′) + Ψ1 E(εt−1 εt−1′)Ψ1′ + . . . + Ψq E(εt−q εt−q′)Ψq′ = Ω + Ψ1 ΩΨ1′ + Ψ2 ΩΨ2′ + . . . + Ψq ΩΨq′,

and the autocovariances are
\[
\Gamma(h) = \begin{cases}
\Psi_h \Omega + \Psi_{h+1} \Omega \Psi_1' + \Psi_{h+2} \Omega \Psi_2' + \dots + \Psi_q \Omega \Psi_{q-h}' & \text{for } h = 1, \dots, q, \\
\Omega \Psi_{-h}' + \Psi_1 \Omega \Psi_{-h+1}' + \Psi_2 \Omega \Psi_{-h+2}' + \dots + \Psi_{q+h} \Omega \Psi_q' & \text{for } h = -1, \dots, -q, \\
0 & \text{for } |h| > q.
\end{cases}
\]

As in the scalar case, any vector MA(q) process is stationary. Next consider the MA(∞) process

xt = εt + Ψ1 εt−1 + Ψ2 εt−2 + . . . = Ψ(L)εt.

A sequence of matrices {Ψs} is absolutely summable if each of its elements forms an absolutely
summable scalar sequence, i.e.,
\[
\sum_{s=0}^{\infty} |\psi_{ij}^{(s)}| < \infty \quad \text{for } i, j = 1, 2, \dots, k,
\]
where ψij(s) is the row i, column j element (ijth element for short) of Ψs. Some important results
about the MA(∞) process are summarized as follows:

Proposition 1 Let xt be a k × 1 vector satisfying
\[
x_t = \sum_{j=0}^{\infty} \Psi_j \epsilon_{t-j},
\]
where εt is vector white noise and {Ψj} is absolutely summable. Then

(a) The autocovariance between the ith variable at time t and the jth variable s periods earlier,
E(xit xj,t−s), exists and is given by the ijth element of
\[
\Gamma(s) = \sum_{v=0}^{\infty} \Psi_{s+v} \Omega \Psi_v' \quad \text{for } s = 0, 1, 2, \dots;
\]

(b) {Γ(h)}h=0∞ is absolutely summable.

If {εt}t=−∞∞ is i.i.d. with E|εi1,t εi2,t εi3,t εi4,t| < ∞ for i1, i2, i3, i4 = 1, 2, . . . , k, then we also have

(c) E|xi1,t1 xi2,t2 xi3,t3 xi4,t4| < ∞ for i1, i2, i3, i4 = 1, 2, . . . , k and all t1, t2, t3, t4;

(d) n−1 Σ_{t=1}^{n} xit xj,t−s →p E(xit xj,t−s) for i, j = 1, 2, . . . , k and for all s.

All of these results can be viewed as extensions from the scalar case to the vector case, and their proofs
can be found on pages 286-288 of Hamilton's book.

1.4 The Sample Mean of a Vector Process


Let xt be a stationary process with E(xt) = 0 and E(xt xt−h′) = Γ(h), where Γ(h) is absolutely
summable. Then we consider the properties of the sample mean
\[
\bar{x}_n = \frac{1}{n} \sum_{t=1}^{n} x_t.
\]

We have
\[
\begin{aligned}
E(\bar{x}_n \bar{x}_n')
&= \frac{1}{n^2} E[(x_1 + \dots + x_n)(x_1 + \dots + x_n)'] \\
&= \frac{1}{n^2} \sum_{i,j=1}^{n} E(x_i x_j') \\
&= \frac{1}{n} \sum_{h=-n+1}^{n-1} \left(1 - \frac{|h|}{n}\right) \Gamma(h).
\end{aligned}
\]
Then
\[
\begin{aligned}
n E(\bar{x}_n \bar{x}_n')
&= \sum_{h=-n+1}^{n-1} \left(1 - \frac{|h|}{n}\right) \Gamma(h) \\
&= \Gamma(0) + \left(1 - \frac{1}{n}\right)(\Gamma(1) + \Gamma(-1)) + \left(1 - \frac{2}{n}\right)(\Gamma(2) + \Gamma(-2)) + \dots \\
&\to \sum_{h=-\infty}^{\infty} \Gamma(h).
\end{aligned}
\]

This is very similar to what we did in the scalar case. We then have the following proposition:

Proposition 2 Let xt be a zero-mean stationary process with E(xt xt−h′) = Γ(h), where Γ(h) is
absolutely summable. Then the sample mean satisfies

(a) x̄n →p 0;

(b) limn→∞ [n E(x̄n x̄n′)] = Σ_{h=−∞}^{∞} Γ(h).

Let S denote the limit of n E(x̄n x̄n′). If the data are generated by an MA(q) process,
then result (b) implies that
\[
S = \sum_{h=-q}^{q} \Gamma(h).
\]
A natural estimate for S is then
\[
\hat{S} = \hat{\Gamma}(0) + \sum_{h=1}^{q} \left( \hat{\Gamma}(h) + \hat{\Gamma}(h)' \right), \tag{5}
\]
where
\[
\hat{\Gamma}(h) = \frac{1}{n} \sum_{t=h+1}^{n} (x_t - \bar{x}_n)(x_{t-h} - \bar{x}_n)'.
\]

Ŝ defined in (5) provides a consistent estimator for a large class of stationary processes. Even
when the process has time-varying second moments, as long as
\[
\frac{1}{n} \sum_{t=h+1}^{n} (x_t - \bar{x}_n)(x_{t-h} - \bar{x}_n)'
\]
converges in probability to
\[
\frac{1}{n} \sum_{t=h+1}^{n} E(x_t x_{t-h}'),
\]
Ŝ is a consistent estimate of the limit of n E(x̄n x̄n′). Nor is it useful only for MA(q) processes:
writing the autocovariance as E(xt xs′), even if it is nonzero for all t and s, as long as this matrix
goes to zero sufficiently fast as |t − s| → ∞ and q grows with the sample size n, we still have Ŝ → S.
However, a problem with Ŝ is that it may not be positive semidefinite in small samples. There-
fore, we can use the Newey and West estimate
\[
\tilde{S} = \hat{\Gamma}(0) + \sum_{h=1}^{q} \left(1 - \frac{h}{q+1}\right) \left( \hat{\Gamma}(h) + \hat{\Gamma}(h)' \right),
\]
which is positive semidefinite and has the same consistency properties as Ŝ when q, n → ∞ with
q/n^{1/4} → 0.
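A minimal Python sketch of S̃ (numpy assumed; newey_west is a hypothetical name), taking an n × k array of observations and a truncation lag q:

import numpy as np

def newey_west(x, q):
    # Newey-West estimate of the long-run variance of an (n x k) array x:
    # sample autocovariances weighted by 1 - h/(q+1), which keeps the
    # estimate positive semidefinite.
    n, k = x.shape
    xc = x - x.mean(axis=0)                 # demean
    def gamma_hat(h):
        # (1/n) sum_{t=h+1}^{n} (x_t - xbar)(x_{t-h} - xbar)'
        return xc[h:].T @ xc[:n - h] / n
    S = gamma_hat(0)
    for h in range(1, q + 1):
        w = 1.0 - h / (q + 1.0)
        G = gamma_hat(h)
        S = S + w * (G + G.T)
    return S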

1.5 Impulse-response Function and Orthogonalization


1.5.1 Impulse-response function
The impulse-response function describes how a time series is affected by a shock at time t. Recall
that for a scalar process, say an AR(1) process xt = φxt−1 + εt with |φ| < 1, we can
invert it to the MA process xt = (1 + φL + φ²L² + . . .)εt, and the effects of ε on x are:

ε : 0 1 0 0 . . .
x : 0 1 φ φ² . . .

In other words, after we invert φ(L)xt = εt to xt = θ(L)εt, the θ(L) function tells us how x
responds to a unit shock from εt.
We can do a similar thing with a VAR process. In our earlier example, we had the VAR(2) system

xt = Φ1 xt−1 + Φ2 xt−2 + εt

with εt ∼ WN(0, Ω), where
\[
\Omega = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix}.
\]
After we invert it to the MA(∞) representation

xt = Ψ(L)εt,                                    (6)

where Ψ(L) = (Ik − Φ1 L − Φ2 L²)−1, we see that in this representation the observation xt is a linear
combination of the shocks εt. However, suppose we are interested in another form of shocks, say

ut = Qεt,

where Q is an arbitrary square matrix (in this example, 2 × 2). Then we have

xt = Ψ(L)Q−1 Qεt = A(L)ut,                      (7)

where we let A(L) = Ψ(L)Q−1. Since Q is arbitrary, we can form many different linear
combinations of shocks, and hence many response functions. Which combination shall we use?

1.5.2 Orthogonalization and model specification
In economic modeling, we compute impulse-response dynamics because we are interested in how eco-
nomic variables respond to particular sources of shocks. If the shocks are correlated, then it is hard
to identify the response to any one particular shock. From that point of view, we may want to choose
Q to make ut = Qεt orthonormal, that is, uncorrelated across components and with unit variance, i.e.,
E(ut ut′) = I. To do so, we need a Q such that

Q−1 (Q−1)′ = Ω,

so that E(ut ut′) = E(Qεt εt′Q′) = QΩQ′ = Ik. We can therefore use the Cholesky decomposition to find Q.
However, Q is still not unique, as other valid Qs can be formed by multiplying by an orthogonal matrix.

Sims (1980) proposes that we specify the model by choosing a particular leading coefficient
matrix, A0. In (6), we see that Ψ0 = Ik. However, in (7), A0 = Q−1 cannot be the identity
matrix unless Ω is diagonal. In our example, we would choose the Q that makes A0 = Q−1 a
lower triangular matrix. That means that after this transformation, the shock u2t has no contemporaneous
effect on x1t. The nice thing is that the Cholesky decomposition itself produces a triangular matrix.

Example 1 Consider an AR(1) process for a 2-dimensional vector,
\[
\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}
= \begin{bmatrix} 0.5 & 0.2 \\ 0.3 & 0.4 \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix}
+ \begin{bmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{bmatrix},
\]
where
\[
\Omega = E(\epsilon_t \epsilon_t') = \begin{bmatrix} 2 & 1 \\ 1 & 4 \end{bmatrix}.
\]
First we verify that this process is stationary: solving
\[
\left| \lambda I_2 - \begin{bmatrix} 0.5 & 0.2 \\ 0.3 & 0.4 \end{bmatrix} \right| = 0
\]
gives λ1 = 0.7 and λ2 = 0.2, both of which lie inside the unit circle. Invert it to a moving average
process,

xt = Ψ(L)εt.

We know that Ψ0 = I2, Ψ1 = Φ1, etc. Then we find Q by the Cholesky decomposition of Ω, which
gives
\[
Q = \begin{bmatrix} 0.70 & 0 \\ -0.27 & 0.53 \end{bmatrix}
\qquad \text{and} \qquad
Q^{-1} = \begin{bmatrix} 1.41 & 0 \\ 0.70 & 1.87 \end{bmatrix}.
\]
Then we can write

xt = Ψ(L)Q−1 Qεt = Ψ(L)Q−1 ut,

where we define ut = Qεt. Then we have

xt = Ψ0 Q−1 ut + Ψ1 Q−1 ut−1 + . . . ,

or
\[
\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix}
= \begin{bmatrix} 1.41 & 0 \\ 0.70 & 1.87 \end{bmatrix}
\begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix}
+ \begin{bmatrix} 0.85 & 0.37 \\ 0.70 & 0.75 \end{bmatrix}
\begin{bmatrix} u_{1,t-1} \\ u_{2,t-1} \end{bmatrix}
+ \dots
\]

In this example you see that we have found a unique MA representation which is a linear combination of
uncorrelated errors (E(ut ut′) = I2), and in which the second source of shocks has no instantaneous
effect on x1t. We can then use this representation to compute the impulse-responses.
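The numbers in Example 1 can be reproduced with a few lines of Python (numpy assumed); up to rounding, this recovers the Cholesky factor Q−1 and the orthogonalized coefficient Ψ1 Q−1 shown above:

import numpy as np

Phi1 = np.array([[0.5, 0.2], [0.3, 0.4]])
Omega = np.array([[2.0, 1.0], [1.0, 4.0]])

# Stationarity: the eigenvalues of Phi1 must lie inside the unit circle.
print(np.linalg.eigvals(Phi1))            # approximately 0.7 and 0.2

# Cholesky factor: Q^{-1} is lower triangular with Q^{-1} (Q^{-1})' = Omega.
Q_inv = np.linalg.cholesky(Omega)         # approx [[1.41, 0.00], [0.71, 1.87]]
Q = np.linalg.inv(Q_inv)                  # approx [[0.71, 0.00], [-0.27, 0.53]]

# Orthogonalized MA coefficients A_s = Psi_s Q^{-1}; for a VAR(1), Psi_s = Phi1^s.
A = [np.linalg.matrix_power(Phi1, s) @ Q_inv for s in range(5)]
print(A[0])                               # approx [[1.41, 0.00], [0.71, 1.87]]
print(A[1])                               # approx [[0.85, 0.37], [0.71, 0.75]]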
There are also other ways to specify the representation, depending on the problem of interest.
For example, Quah (1988) suggests finding a Q such that the long-run response of one variable to
the other shock is zero.

1.5.3 Variance decomposition


Now let's consider how we can decompose the variance of the forecast errors. Write xt = Ψ(L)εt =
A(L)ut, where A(L) = Ψ(L)Q−1, ut = Qεt and E(ut ut′) = I. For simplicity, let xt = (x1t, x2t)′.
Suppose we produce a one-period-ahead forecast, and let yt+1 denote the forecast error,
\[
y_{t+1} = x_{t+1} - E_t(x_{t+1}) = A_0 u_{t+1} =
\begin{bmatrix} A_0^{11} & A_0^{12} \\ A_0^{21} & A_0^{22} \end{bmatrix}
\begin{bmatrix} u_{1,t+1} \\ u_{2,t+1} \end{bmatrix}.
\]
Since E(u1t u2t) = 0 and E(uit²) = 1, the variance of the forecast error is given by E(yt+1 yt+1′) =
A0 A0′. So the variance of the forecast error for the first variable is (A0^{11})² + (A0^{12})². We can interpret
(A0^{11})² as the amount of the one-step-ahead forecast error variance due to shock u1, and
(A0^{12})² as the amount due to shock u2. Similarly, the variance of the forecast error for the second variable is
(A0^{21})² + (A0^{22})², which we can decompose into amounts due to shocks u1 and u2 respectively. The
variance of the s-period-ahead forecast error can be computed in a similar way.
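A minimal Python sketch of this decomposition (numpy assumed; names hypothetical): it accumulates the squared elements of Aj = Ψj Q−1 over horizons j = 0, . . . , s − 1 and reports, for each variable, the share of forecast error variance due to each orthogonalized shock:

import numpy as np

def variance_decomposition(Phis, Omega, s):
    # Share of the s-step-ahead forecast error variance of each variable
    # attributable to each orthogonalized shock u_m, using A_j = Psi_j Q^{-1}
    # with Q^{-1} the lower-triangular Cholesky factor of Omega.
    k, p = Phis[0].shape[0], len(Phis)
    Q_inv = np.linalg.cholesky(Omega)
    # MA coefficients Psi_0, ..., Psi_{s-1} by the recursion of Section 1.1.2.
    Psis = [np.eye(k)]
    for j in range(1, s):
        Psis.append(sum(Phis[i] @ Psis[j - i - 1] for i in range(min(p, j))))
    A = [Psi @ Q_inv for Psi in Psis]
    contrib = sum(Aj ** 2 for Aj in A)       # entry (i, m): variance of y_i due to u_m
    return contrib / contrib.sum(axis=1, keepdims=True)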

2 Estimation of VAR(p) process


2.1 Maximum Likelihood Estimation
Usually we use conditional likelihood in VAR estimation (recall that conditional likelihood functions
are much easier to work with than unconditional likelihood functions).
Given a k-vector VAR(p) process,

yt = c + Φ1 yt−1 + Φ2 yt−2 + . . . + Φp yt−p + εt,

we could rewrite it more concisely as

yt = Π′xt + εt,

where
\[
\Pi = \begin{bmatrix} c' \\ \Phi_1' \\ \Phi_2' \\ \vdots \\ \Phi_p' \end{bmatrix}
\qquad \text{and} \qquad
x_t = \begin{bmatrix} 1 \\ y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p} \end{bmatrix}.
\]
If we assume that εt ∼ i.i.d. N(0, Ω), then we can use MLE to estimate the parameters θ =
(c, Π, Ω). Proceeding as in the scalar case, assume that we have observed (y−p+1, . . . , y0);
then the likelihood contribution of yt is
\[
L(y_t, x_t; \theta) = (2\pi)^{-k/2} |\Omega^{-1}|^{1/2} \exp\left[ -\tfrac{1}{2} (y_t - \Pi' x_t)' \Omega^{-1} (y_t - \Pi' x_t) \right].
\]
The log likelihood of the observations (y1, . . . , yn) is (constant omitted)
\[
l(y, x; \theta) = \frac{n}{2} \log|\Omega^{-1}| - \frac{1}{2} \sum_{t=1}^{n} (y_t - \Pi' x_t)' \Omega^{-1} (y_t - \Pi' x_t). \tag{8}
\]

Taking first derivatives with respect to Π and Ω, we find that
\[
\hat{\Pi}_n' = \left[ \sum_{t=1}^{n} y_t x_t' \right] \left[ \sum_{t=1}^{n} x_t x_t' \right]^{-1}.
\]
The jth row of Π̂n′ is
\[
\hat{\pi}_j' = \left[ \sum_{t=1}^{n} y_{jt} x_t' \right] \left[ \sum_{t=1}^{n} x_t x_t' \right]^{-1},
\]
which is the estimated coefficient vector from an OLS regression of yjt on xt. So the MLE
estimates of the coefficients for the jth equation of a VAR are found by an OLS regression of yjt
on a constant term and p lags of all of the variables in the system.

The MLE estimate of Ω is
\[
\hat{\Omega}_n = \frac{1}{n} \sum_{t=1}^{n} \hat{\epsilon}_t \hat{\epsilon}_t',
\]
where

ε̂t = yt − Π̂n′ xt.
The details of the derivations can be found on pages 292-296 of Hamilton's book. The MLE
estimates Π̂ and Ω̂ are consistent even if the true innovations are non-Gaussian. In the next
subsection, we consider regression with non-Gaussian errors, and we use the LS approach
to derive the asymptotics.
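The equation-by-equation OLS characterization makes estimation straightforward to code. Here is a minimal Python sketch (numpy assumed; estimate_var is a hypothetical name) that returns Π̂ and Ω̂ from an n × k data array:

import numpy as np

def estimate_var(y, p):
    # Equation-by-equation OLS (= conditional MLE) for a VAR(p).
    # y is (n_obs x k); returns Pi_hat of shape ((kp+1) x k), with the first
    # row the intercepts, and the residual covariance Omega_hat.
    n_obs, k = y.shape
    rows = []
    for t in range(p, n_obs):
        lags = [y[t - j] for j in range(1, p + 1)]       # y_{t-1}, ..., y_{t-p}
        rows.append(np.concatenate(([1.0], *lags)))
    X = np.array(rows)                                   # rows are x_t'
    Y = y[p:]
    Pi_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)       # OLS column by column
    resid = Y - X @ Pi_hat
    Omega_hat = resid.T @ resid / len(Y)
    return Pi_hat, Omega_hat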

2.2 LS estimation and asymptotics


The asymptotic distribution of Π̂ is summarized in the following proposition.

Proposition 3 Let

yt = c + Φ1 yt−1 + Φ2 yt−2 + . . . + Φp yt−p + εt,

where εt is i.i.d. (0, Ω) with E(εit εjt εlt εmt) < ∞ for all i, j, l, and m, and where the roots of

|Ik − Φ1 z − . . . − Φp z^p| = 0

lie outside the unit circle. Let m = kp + 1 and let

xt′ = [ 1  yt−1′  yt−2′  . . .  yt−p′ ],

so that xt is an m-dimensional vector. Let π̂n = vec(Π̂n) denote the km × 1 vector of coefficients resulting
from OLS regressions of each of the elements of yt on xt for a sample of size n:

π̂n′ = [ π̂1,n′  π̂2,n′  . . .  π̂k,n′ ],

where
\[
\hat{\pi}_{i,n} = \left[ \sum_{t=1}^{n} x_t x_t' \right]^{-1} \left[ \sum_{t=1}^{n} x_t y_{it} \right],
\]

and let π denote the corresponding km × 1 vector of true parameters. Finally, let
\[
\hat{\Omega}_n = n^{-1} \sum_{t=1}^{n} \hat{\epsilon}_t \hat{\epsilon}_t',
\]
where

ε̂t′ = [ ε̂1t  ε̂2t  . . .  ε̂kt ],     ε̂it = yit − xt′ π̂i,n.

Then

(a) n−1 Σ_{t=1}^{n} xt xt′ →p Q, where Q = E(xt xt′);

(b) π̂n →p π;

(c) Ω̂n →p Ω;

(d) √n (π̂n − π) →d N(0, Ω ⊗ Q−1).

Result (a) is a vector version of the result that sample second moments converge to population moments;
it follows because the MA coefficients are absolutely summable and the process has finite fourth moments.
Results (b) and (c) are similar to the derivations for a single OLS regression in case 3 of lecture 5. To show
result (d), let
\[
Q_n = n^{-1} \sum_{t=1}^{n} x_t x_t',
\]

then we could write
\[
\sqrt{n}(\hat{\pi}_{i,n} - \pi_i) = Q_n^{-1} \left[ n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{it} \right]
\]
and
\[
\sqrt{n}(\hat{\pi}_n - \pi) =
\begin{bmatrix}
Q_n^{-1} n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{1t} \\
Q_n^{-1} n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{2t} \\
\vdots \\
Q_n^{-1} n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{kt}
\end{bmatrix}. \tag{9}
\]
Define ξt to be the km × 1 vector
\[
\xi_t = \begin{bmatrix} x_t \epsilon_{1t} \\ x_t \epsilon_{2t} \\ \vdots \\ x_t \epsilon_{kt} \end{bmatrix}.
\]
Note that ξt is a martingale difference sequence with finite fourth moments and variance
\[
E(\xi_t \xi_t') =
\begin{bmatrix}
E(\epsilon_{1t}^2) & E(\epsilon_{1t}\epsilon_{2t}) & \cdots & E(\epsilon_{1t}\epsilon_{kt}) \\
E(\epsilon_{2t}\epsilon_{1t}) & E(\epsilon_{2t}^2) & \cdots & E(\epsilon_{2t}\epsilon_{kt}) \\
\vdots & \vdots & \ddots & \vdots \\
E(\epsilon_{kt}\epsilon_{1t}) & E(\epsilon_{kt}\epsilon_{2t}) & \cdots & E(\epsilon_{kt}^2)
\end{bmatrix}
\otimes E(x_t x_t') = \Omega \otimes Q.
\]

We can also show that
\[
n^{-1} \sum_{t=1}^{n} \xi_t \xi_t' \to_p \Omega \otimes Q.
\]
Applying the CLT for vector martingale difference sequences, we have
\[
n^{-1/2} \sum_{t=1}^{n} \xi_t \to_d N(0, \Omega \otimes Q). \tag{10}
\]

Now rewrite (9) as
\[
\sqrt{n}(\hat{\pi}_n - \pi) =
\begin{bmatrix}
Q_n^{-1} & 0 & \cdots & 0 \\
0 & Q_n^{-1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & Q_n^{-1}
\end{bmatrix}
\begin{bmatrix}
n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{1t} \\
n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{2t} \\
\vdots \\
n^{-1/2} \sum_{t=1}^{n} x_t \epsilon_{kt}
\end{bmatrix}
= (I_k \otimes Q_n^{-1}) \, n^{-1/2} \sum_{t=1}^{n} \xi_t.
\]
By result (a) we have Qn−1 →p Q−1. Thus n^{1/2}(π̂n − π) has the same limiting distribution as
\[
(I_k \otimes Q^{-1}) \, n^{-1/2} \sum_{t=1}^{n} \xi_t.
\]
From (10) we know that this is Gaussian with mean 0 and variance
\[
(I_k \otimes Q^{-1})(\Omega \otimes Q)(I_k \otimes Q^{-1}) = (I_k \Omega I_k) \otimes (Q^{-1} Q Q^{-1}) = \Omega \otimes Q^{-1}.
\]
Hence we obtain result (d). Each π̂i,n has the marginal distribution

√n (π̂i,n − πi) →d N(0, σi² Q−1).

Given that the estimators are asymptotically normal, we can test linear or nonlinear
restrictions on the coefficients with Wald statistics.
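For example, standard errors for the stacked coefficients can be read off an estimate of this asymptotic variance. A minimal Python sketch (numpy assumed; names hypothetical), where X is the matrix with rows xt′ and Omega_hat the residual covariance from the estimation sketch above:

import numpy as np

def coefficient_standard_errors(X, Omega_hat):
    # The asymptotic covariance (1/n)(Omega kron Q^{-1}) of pi_hat is
    # estimated by Omega_hat kron (X'X)^{-1}, since Q_n = X'X / n.
    # Returns standard errors stacked equation by equation, like vec(Pi_hat).
    XtX_inv = np.linalg.inv(X.T @ X)
    V = np.kron(Omega_hat, XtX_inv)          # km x km covariance estimate
    return np.sqrt(np.diag(V))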
We know that vec is an operator that stacks the columns of a k × k matrix into one k² × 1 vector.
A similar operator, vech, stacks the elements on and below the principal diagonal (so it transforms a
k × k matrix into one k(k + 1)/2 × 1 vector). For example,
\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad
\mathrm{vech}(A) = \begin{bmatrix} a_{11} \\ a_{21} \\ a_{22} \end{bmatrix}.
\]
We will apply this operator to the variance matrix, which is symmetric. The joint distribution
of π̂n and Ω̂n is given in the following proposition.
Proposition 4 Let

yt = c + Φ1 yt−1 + Φ2 yt−2 + . . . + Φp yt−p + εt,

where εt is i.i.d. N(0, Ω) and the roots of

|Ik − Φ1 z − . . . − Φp z^p| = 0

lie outside the unit circle. Let π̂n, Ω̂n, and Q be as defined in Proposition 3. Then
\[
\begin{bmatrix}
n^{1/2} [\hat{\pi}_n - \pi] \\
n^{1/2} [\mathrm{vech}(\hat{\Omega}_n) - \mathrm{vech}(\Omega)]
\end{bmatrix}
\to_d N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \Omega \otimes Q^{-1} & 0 \\ 0 & \Sigma_{22} \end{bmatrix} \right).
\]
Let σij denote the ijth element of Ω; then the element of Σ22 corresponding to the covariance between
σ̂ij and σ̂lm is given by (σil σjm + σim σjl) for all i, j, l, m = 1, . . . , k.

The detailed proof can be found on pages 341-342 of Hamilton's book. Basically there are three
steps. First, we show that Ω̂n = n−1 Σ_{t=1}^{n} ε̂t ε̂t′ has the same asymptotic distribution as Ω̂n* =
n−1 Σ_{t=1}^{n} εt εt′. In the second step, write
\[
\begin{bmatrix}
n^{1/2} [\hat{\pi}_n - \pi] \\
n^{1/2} [\mathrm{vech}(\hat{\Omega}_n) - \mathrm{vech}(\Omega)]
\end{bmatrix}
\to_d
\begin{bmatrix}
(I_k \otimes Q^{-1}) \, n^{-1/2} \sum_{t=1}^{n} \xi_t \\
n^{-1/2} \sum_{t=1}^{n} \lambda_t
\end{bmatrix}
\]
where
\[
\lambda_t = \mathrm{vech}
\begin{bmatrix}
\epsilon_{1t}^2 - \sigma_{11} & \cdots & \epsilon_{1t}\epsilon_{kt} - \sigma_{1k} \\
\vdots & \ddots & \vdots \\
\epsilon_{kt}\epsilon_{1t} - \sigma_{k1} & \cdots & \epsilon_{kt}^2 - \sigma_{kk}
\end{bmatrix}.
\]
Now, (ξt′, λt′)′ is a martingale difference sequence, and we apply the CLT for mds to get (with a few more
computations)
\[
\begin{bmatrix}
n^{-1/2} \sum_{t=1}^{n} \xi_t \\
n^{-1/2} \sum_{t=1}^{n} \lambda_t
\end{bmatrix}
\to_d N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\begin{bmatrix} \Omega \otimes Q & 0 \\ 0 & \Sigma_{22} \end{bmatrix} \right).
\]
The final step in the proof is to show that E(λt λt′) is given by the matrix Σ22 described in the
proposition, which can be shown using a constructed error sequence that is uncorrelated Gaussian
with zero mean and unit variance (see Hamilton's book for details).
With the asymptotic variance of Ω̂n, we can then test whether two errors are correlated. For example,
for k = 2,
\[
\sqrt{n}
\begin{bmatrix}
\hat{\sigma}_{11,n} - \sigma_{11} \\
\hat{\sigma}_{12,n} - \sigma_{12} \\
\hat{\sigma}_{22,n} - \sigma_{22}
\end{bmatrix}
\to_d N\left(
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix}
2\sigma_{11}^2 & 2\sigma_{11}\sigma_{12} & 2\sigma_{12}^2 \\
2\sigma_{11}\sigma_{12} & \sigma_{11}\sigma_{22} + \sigma_{12}^2 & 2\sigma_{12}\sigma_{22} \\
2\sigma_{12}^2 & 2\sigma_{12}\sigma_{22} & 2\sigma_{22}^2
\end{bmatrix}
\right).
\]
Then a Wald test of the null hypothesis that there is no covariance between ε1t and ε2t is given by
\[
\frac{\sqrt{n}\, \hat{\sigma}_{12}}{(\hat{\sigma}_{11}\hat{\sigma}_{22} + \hat{\sigma}_{12}^2)^{1/2}} \approx N(0, 1).
\]
The matrix Σ22 can be expressed more compactly using the duplication matrix. The duplication
matrix Dk is a k² × k(k + 1)/2 matrix that transforms vech(Ω) into vec(Ω), i.e.,

Dk vech(Ω) = vec(Ω).

For example, for k = 2,
\[
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{22} \end{bmatrix}
=
\begin{bmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{12} \\ \sigma_{22} \end{bmatrix}.
\]
Define

Dk+ ≡ (Dk′ Dk)−1 Dk′.

Note that Dk+ Dk = Ik(k+1)/2. Dk+ is like the 'reverse' of Dk, as it transforms vec(Ω) into vech(Ω),

vech(Ω) = Dk+ vec(Ω).

For example, when k = 2, we have
\[
\begin{bmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{22} \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1/2 & 1/2 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \sigma_{11} \\ \sigma_{21} \\ \sigma_{12} \\ \sigma_{22} \end{bmatrix}.
\]
With Dk and Dk+ we can write

Σ22 = 2 Dk+ (Ω ⊗ Ω)(Dk+)′.
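A small Python sketch (numpy assumed; duplication_matrix is a hypothetical name) that constructs Dk, forms Dk+ = (Dk′Dk)−1 Dk′, and evaluates Σ22 = 2Dk+(Ω ⊗ Ω)(Dk+)′; with the Ω from Example 1 as a test value, the result agrees with the k = 2 covariance matrix displayed above:

import numpy as np

def duplication_matrix(k):
    # D_k of size k^2 x k(k+1)/2 with D_k vech(A) = vec(A) for symmetric A;
    # vec stacks columns, vech stacks the on-and-below-diagonal elements
    # column by column.
    cols = {}
    idx = 0
    for j in range(k):                 # vech ordering: (i, j) with i >= j
        for i in range(j, k):
            cols[(i, j)] = idx
            idx += 1
    D = np.zeros((k * k, k * (k + 1) // 2))
    for j in range(k):                 # vec ordering: column j, then row i
        for i in range(k):
            D[j * k + i, cols[(max(i, j), min(i, j))]] = 1.0
    return D

Dk = duplication_matrix(2)
Dk_plus = np.linalg.solve(Dk.T @ Dk, Dk.T)     # (D_k' D_k)^{-1} D_k'
Omega = np.array([[2.0, 1.0], [1.0, 4.0]])     # example value from Example 1
Sigma22 = 2 * Dk_plus @ np.kron(Omega, Omega) @ Dk_plus.T
print(Sigma22)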

3 Granger Causality
In most regressions in econometrics, it is very hard to discuss causality. For instance, the significance
of the coefficient β in the regression

yi = βxi + εi

only tells us about the 'co-occurrence' of x and y, not that x causes y. In other words, the regression
usually only tells us that there is some 'relationship' between x and y, and does not tell us the nature of the
relationship, such as whether x causes y or y causes x.
One good thing about time series vector autoregressions is that we can test 'causality' in some
sense. This test was first proposed by Granger (1969), and we therefore refer to it as Granger causality.
We will restrict our discussion to a system of two variables, x and y. We say that y Granger-causes
x if current or lagged values of y help to predict future values of x. Conversely, y fails to
Granger-cause x if, for all s > 0, the mean squared error of a forecast of xt+s based on (xt, xt−1, . . .)
is the same as that based on (xt, xt−1, . . .) and (yt, yt−1, . . .). If we restrict ourselves to linear
functions, y fails to Granger-cause x if

MSE[Ê(xt+s | xt, xt−1, . . .)] = MSE[Ê(xt+s | xt, xt−1, . . . , yt, yt−1, . . .)].

Equivalently, we can say that x is exogenous in the time series sense with respect to y, or y is not
linearly informative about future x.

In the VAR equations, the situation described above implies a lower triangular coefficient
matrix:
\[
\begin{bmatrix} x_t \\ y_t \end{bmatrix}
= \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}
+ \begin{bmatrix} \phi_{11}^{1} & 0 \\ \phi_{21}^{1} & \phi_{22}^{1} \end{bmatrix}
\begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix}
+ \dots
+ \begin{bmatrix} \phi_{11}^{p} & 0 \\ \phi_{21}^{p} & \phi_{22}^{p} \end{bmatrix}
\begin{bmatrix} x_{t-p} \\ y_{t-p} \end{bmatrix}
+ \begin{bmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{bmatrix}. \tag{11}
\]
Or, if we use the MA representation,
\[
\begin{bmatrix} x_t \\ y_t \end{bmatrix}
= \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
+ \begin{bmatrix} \phi_{11}(L) & 0 \\ \phi_{21}(L) & \phi_{22}(L) \end{bmatrix}
\begin{bmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{bmatrix}, \tag{12}
\]
where
\[
\phi_{ij}(L) = \phi_{ij}^{0} + \phi_{ij}^{1} L + \phi_{ij}^{2} L^2 + \dots
\]
with $\phi_{11}^0 = \phi_{22}^0 = 1$ and $\phi_{21}^0 = 0$. Another implication of Granger causality is stressed by Sims
(1972).

Proposition 5 Consider a linear projection of yt on past, present and future x's,
\[
y_t = c + \sum_{j=0}^{\infty} b_j x_{t-j} + \sum_{j=1}^{\infty} d_j x_{t+j} + \eta_t, \tag{13}
\]
where E(ηt xτ) = 0 for all t and τ. Then y fails to Granger-cause x iff dj = 0 for j = 1, 2, . . . .

Econometric tests of whether the series y Granger-causes x can be based on any of the three
implications (11), (12), or (13). The simplest test is to estimate, by OLS, the regression based on
(11),
\[
x_t = c_1 + \sum_{i=1}^{p} \alpha_i x_{t-i} + \sum_{j=1}^{p} \beta_j y_{t-j} + u_t,
\]
and then conduct an F-test of the null hypothesis

H0 : β1 = β2 = . . . = βp = 0.
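A minimal Python sketch of this test (numpy assumed; granger_f_stat is a hypothetical name): it compares the residual sums of squares from the restricted and unrestricted regressions and returns the F statistic, to be compared with the F(p, T − 2p − 1) distribution:

import numpy as np

def granger_f_stat(x, y, p):
    # F statistic for H0: beta_1 = ... = beta_p = 0 in the regression of
    # x_t on a constant, p lags of x, and p lags of y, using the standard
    # restricted-vs-unrestricted residual sum of squares comparison.
    n = len(x)
    rows_u, rows_r, target = [], [], []
    for t in range(p, n):
        xlags = [x[t - i] for i in range(1, p + 1)]
        ylags = [y[t - j] for j in range(1, p + 1)]
        rows_u.append([1.0] + xlags + ylags)      # unrestricted regressors
        rows_r.append([1.0] + xlags)              # restricted: y lags excluded
        target.append(x[t])
    Xu, Xr, xt = np.array(rows_u), np.array(rows_r), np.array(target)
    rss = lambda X: np.sum((xt - X @ np.linalg.lstsq(X, xt, rcond=None)[0]) ** 2)
    T = len(xt)
    return ((rss(Xr) - rss(Xu)) / p) / (rss(Xu) / (T - 2 * p - 1))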

Note: we have to be aware that Granger causality is not the same as what we usually mean
by causality. For instance, even if x1 does not cause x2, it may still help to predict x2, and thus
Granger-cause x2, if changes in x1 precede those of x2 for some reason. A naive example: we observe that
dragonflies fly much lower before a rainstorm, due to the lower air pressure.
We know that dragonflies do not cause rainstorms, but their behavior does help to predict a rainstorm, and thus
Granger-causes the rainstorm.
Reading: Hamilton Ch. 10, 11, 14.
