Estimation
Prerequisites
• Linear regression.
Objectives
• To derive the sample autocovariance of a time series, and show that this is a positive
definite sequence.
• To show that the variance of the sample covariance involves fourth order cumulants,
which can be unwieldy to estimate in practice. But under linearity the expression
for the variance greatly simplifies.
• To show that under linearity the correlation does not involve the fourth order cumulant.
This is the Bartlett formula.
• To use the above results to construct a test for uncorrelatedness of a time series (the
Portmanteau test), and understand how this test may be useful for testing for inde-
pendence in various different settings. Also understand situations where the test may
fail.
• To be able to derive the Yule-Walker and least squares estimator of the AR parameters.
• To understand what the quasi-Gaussian likelihood for the estimation of ARMA models
is, and how the Durbin-Levinson algorithm is useful in obtaining this likelihood (in
practice). Also how we can approximate it by using approximations of the predictions.
• Understand that there exist alternative methods for estimating the ARMA parame-
ters, which exploit the fact that the ARMA process can be written as an AR(∞).
We now consider various methods for estimating the parameters in a stationary time
series. We first consider estimation of the mean and covariance and then look at estimation
of the parameters of an AR and ARMA process.
The eagle-eyed amongst you may wonder why we don't use $\frac{1}{n-|k|}\sum_{t=1}^{n-|k|} X_t X_{t+|k|}$, and indeed
$\hat c_n(k)$ is more biased than $\frac{1}{n-|k|}\sum_{t=1}^{n-|k|} X_t X_{t+|k|}$. However $\hat c_n(k)$ has some very nice properties
which are discussed in the remark below.
then {ĉn (k)} is a positive definite sequence. Therefore, using Lemma 1.1.1 there exists a
stationary time series {Zt } which has the covariance ĉn (k).
There are various ways to show that $\{\hat c_n(k)\}$ is a positive definite sequence. One method
uses the fact that the corresponding spectral density is non-negative; we give this proof in Section 5.3.1.
An alternative proof uses the definition of positive definiteness: for any vector $a = (a_1,\ldots,a_n)'$
we have
$$a'\begin{pmatrix}
\hat c_n(0) & \hat c_n(1) & \hat c_n(2) & \cdots & \hat c_n(n-1)\\
\hat c_n(1) & \hat c_n(0) & \hat c_n(1) & \cdots & \hat c_n(n-2)\\
\vdots & \vdots & \ddots & & \vdots\\
\hat c_n(n-1) & \hat c_n(n-2) & \cdots & \cdots & \hat c_n(0)
\end{pmatrix} a \ge 0,$$
noting that $\hat c_n(k) = \frac{1}{n}\sum_{t=1}^{n-k} X_t X_{t+k}$. However, $\hat c_n(k)$ has a very interesting
construction: it can be shown that the above covariance matrix satisfies $\hat C_n = \frac{1}{n}\mathcal{X}_n\mathcal{X}_n'$, where $\mathcal{X}_n$ is
the $n \times (2n-1)$ matrix
$$\mathcal{X}_n = \begin{pmatrix}
0 & 0 & \cdots & 0 & X_1 & X_2 & \cdots & X_{n-1} & X_n\\
0 & 0 & \cdots & X_1 & X_2 & \cdots & X_{n-1} & X_n & 0\\
\vdots & & & & & & & & \vdots\\
X_1 & X_2 & \cdots & X_{n-1} & X_n & 0 & \cdots & \cdots & 0
\end{pmatrix},$$
hence $a'\hat C_n a = \frac{1}{n}\|\mathcal{X}_n' a\|^2 \ge 0$.
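To make this construction concrete, here is a small numerical sketch (Python with numpy; the function and variable names are our own, and checking positive definiteness via the eigenvalues is just an illustration):

```python
# A minimal numerical check of the construction above.
import numpy as np

def sample_acov(x, k):
    """hat c_n(k) = (1/n) * sum_{t=1}^{n-|k|} x_t x_{t+|k|} (mean assumed zero)."""
    n = len(x)
    k = abs(k)
    return np.sum(x[:n - k] * x[k:]) / n

rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=n)

# Toeplitz matrix of sample autocovariances, (C)_{ij} = hat c_n(i - j)
C = np.array([[sample_acov(x, i - j) for j in range(n)] for i in range(n)])

# The shifted-data matrix: row i has (n-1-i) leading zeros, then x, then i trailing zeros
X = np.zeros((n, 2 * n - 1))
for i in range(n):
    X[i, n - 1 - i : 2 * n - 1 - i] = x

print(np.allclose(C, X @ X.T / n))            # True: hat C_n = (1/n) X_n X_n'
print(np.linalg.eigvalsh(C).min() >= -1e-12)  # all eigenvalues are (numerically) non-negative
```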
this will not be the case, and the variance will involve a large (indeed infinite) number of parameters
which are not straightforward to estimate. Later in this section we show that the variance
of the sample covariance can be extremely complicated; however, the variance
of the sample correlation (under linearity) is, in comparison, extremely simple. This is
known as Bartlett's formula (you may have come across Maurice Bartlett before; besides his
fundamental contributions to time series he is well known for proposing the famous Bartlett
correction). This example demonstrates how the assumption of linearity can really simplify
problems in time series, and also how we can circumvent certain problems which arise in an
estimator by making slight modifications to it, such as going from the covariance to the correlation.
The following theorem gives the asymptotic sampling properties of the covariance esti-
mator (4.1). The proof of the result can be found in Brockwell and Davis (1998), Chapter 8,
and in Fuller (1995), but it goes back to Bartlett (indeed it is called Bartlett's formula). We prove
the result in Section 4.1.4.
where $\sum_j |\psi_j| < \infty$ and $\{\varepsilon_t\}$ are iid random variables with $E(\varepsilon_t^4) < \infty$. Suppose we observe
$\{X_t : t = 1,\ldots,n\}$ and use (4.1) as an estimator of the covariance $c(k) = \mathrm{cov}(X_0, X_k)$.
Define $\hat\rho_n(r) = \hat c_n(r)/\hat c_n(0)$ as the sample correlation. Then for each $h \in \{1,\ldots,n\}$
$$\sqrt{n}\big(\hat{\boldsymbol\rho}_n(h) - \boldsymbol\rho(h)\big) \overset{D}{\to} N(0, W_h) \qquad (4.3)$$
where $\hat{\boldsymbol\rho}_n(h) = (\hat\rho_n(1),\ldots,\hat\rho_n(h))$, $\boldsymbol\rho(h) = (\rho(1),\ldots,\rho(h))$ and
$$(W_h)_{ij} = \sum_{k=-\infty}^{\infty}\{\rho(k+i)+\rho(k-i)-2\rho(i)\rho(k)\}\{\rho(k+j)+\rho(k-j)-2\rho(j)\rho(k)\}.$$
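To see what these asymptotic variances look like in practice, the following sketch (Python/numpy; the AR(1) correlation $\rho(k) = \phi^{|k|}$ and the truncation point of the infinite sum are our own illustrative choices) evaluates $W_h$ numerically and converts its diagonal into approximate standard errors for the sample correlations:

```python
# Sketch: evaluate the Bartlett matrix W_h for an AR(1) with rho(k) = phi^|k|.
import numpy as np

def bartlett_W(rho, h, K=1000):
    """(W_h)_{ij}, with the sum over k truncated to |k| <= K."""
    W = np.zeros((h, h))
    ks = np.arange(-K, K + 1)
    for i in range(1, h + 1):
        for j in range(1, h + 1):
            a = rho(ks + i) + rho(ks - i) - 2 * rho(i) * rho(ks)
            b = rho(ks + j) + rho(ks - j) - 2 * rho(j) * rho(ks)
            W[i - 1, j - 1] = np.sum(a * b)
    return W

phi = 0.5
rho = lambda k: phi ** np.abs(k)
W = bartlett_W(rho, h=5)

# Approximate standard error of hat rho_n(r) for sample size n: sqrt((W_h)_{rr} / n)
n = 200
print(np.sqrt(np.diag(W) / n))
```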
• Given a data set, we need to check whether there is dependence; if there is, we need to
analyse it in a different way.
• Suppose we fit a linear regression. We may need to check whether the residuals are actu-
ally uncorrelated, else the standard errors based on the assumption of uncorrelatedness
would be unreliable.
• We need to check whether a time series model is the appropriate model. To do this
we fit the model and estimate the residuals; if the residuals appear to be uncorrelated, it
would seem likely that the model is correct. If they are correlated, then the model is
inappropriate. For example, we may fit an AR(1) to the data and estimate the residuals $\varepsilon_t$;
if there is still correlation in the residuals, then the AR(1) was not the correct model,
since $X_t - \phi X_{t-1}$ still contains information about the other residuals.
Suppose $\{X_t\}$ are iid random variables, and we use (4.1) as an estimator of the autocovari-
ances. Recalling that if $\{X_t\}$ are iid then $\rho(k) = 0$ for $k \ne 0$, using this and (4.3) we see that the
asymptotic distribution of $\hat{\boldsymbol\rho}_n(h)$ in this case is
$$\sqrt{n}\big(\hat{\boldsymbol\rho}_n(h) - \boldsymbol\rho(h)\big) \overset{D}{\to} N(0, W_h)$$
where
$$(W_h)_{ij} = \begin{cases} 1 & i = j\\ 0 & i \ne j.\end{cases}$$
In other words $\sqrt{n}(\hat{\boldsymbol\rho}_n(h) - \boldsymbol\rho(h)) \overset{D}{\to} N(0, I)$. Hence the sample autocorrelations at different
lags are asymptotically uncorrelated. This allows us to easily construct confidence intervals for the auto-
correlations under the assumption that the observations are iid. If the vast majority of the sample
autocorrelations lie inside the confidence interval, there is not enough evidence to reject the hypothesis that the data
is a realisation of iid random variables (often called a white noise process). An example of
the empirical ACF and the CI constructed under the assumption of independence is given
in Figure 4.1. We see that the empirical autocorrelations of the realisation from iid random
variables all lie within the CI. The same cannot be said for the empirical correlations of a
dependent time series. Of course, doing the check by eye means that we may encounter
multiple testing problems, since even under independence, some sample correlations may
lie above the line. To counter this problem, we should construct a test statistic for testing
uncorrelatedness. Since under the null $\sqrt{n}(\hat{\boldsymbol\rho}_n(h) - \boldsymbol\rho(h)) \overset{D}{\to} N(0, I)$, one method of testing
is to use the squared correlations
$$S_h = n\sum_{r=1}^{h}|\hat\rho_n(r)|^2;$$
under the null it will asymptotically have a $\chi^2$-distribution with $h$ degrees of freedom, while under the alternative
it will be a non-central (generalised) chi-squared. The non-centrality is what makes us reject
the null if the alternative of correlatedness is true. This is known as a Portmanteau test.
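A minimal implementation of this test is sketched below (Python with numpy and scipy; this is the classical Box-Pierce form of the statistic, without the finite-sample Ljung-Box correction, and the simulated examples are our own):

```python
# Sketch of the Portmanteau (Box-Pierce) test: S_h = n * sum_{r=1}^h hat rho_n(r)^2,
# compared against a chi-squared distribution with h degrees of freedom.
import numpy as np
from scipy import stats

def sample_acf(x, h):
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    c0 = np.sum(x * x) / n
    return np.array([np.sum(x[:n - r] * x[r:]) / (n * c0) for r in range(1, h + 1)])

def portmanteau_test(x, h):
    n = len(x)
    S_h = n * np.sum(sample_acf(x, h) ** 2)
    p_value = stats.chi2.sf(S_h, df=h)   # survival function = 1 - CDF
    return S_h, p_value

rng = np.random.default_rng(1)
print(portmanteau_test(rng.normal(size=500), h=10))   # iid data: large p-value expected
ar = rng.normal(size=500)
for t in range(1, 500):                               # introduce AR(1)-type dependence
    ar[t] += 0.6 * ar[t - 1]
print(portmanteau_test(ar, h=10))                     # dependent data: tiny p-value expected
```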
Remark 4.1.2 (Long range dependence versus changes in the mean) We first note
that a process is said to have long range dependence if the covariances are not absolutely
summable, that is $\sum_k |c(k)| = \infty$. From a practical point of view, data is said to exhibit long range dependence
[Figure 4.1 appears here: two empirical ACF plots (Series ACF1, top; Series ACF2, bottom), ACF plotted against lags 0 to 20.]
Figure 4.1: The top plot is the empirical ACF of iid data and the lower plot is the
empirical ACF of a realisation from the AR(2) model defined in (2.19).
if the autocovariances do not decay very fast to zero as the lag increases. We now demonstrate
that one must be careful in the diagnosis of long range dependence, because a slow decay of
the autocovariance could also be due to a change in the mean which has not been corrected for. This
was shown in Bhattacharya et al. (1983), and applied to econometric data in Mikosch and
Stărică (2000) and Mikosch and Stărică (2003). A test for distinguishing between long range
dependence and change points is proposed in Berkes et al. (2006).
Suppose that $Y_t$ satisfies
$$Y_t = \mu_t + \varepsilon_t,$$
where $\{\varepsilon_t\}$ are iid random variables and the mean $\mu_t$ depends on $t$. We observe $\{Y_t\}$ but
do not know that the mean is changing. We want to evaluate the autocovariance function, hence
we estimate the autocovariance at lag $k$ using
$$\hat c_n(k) = \frac{1}{n}\sum_{t=1}^{n-|k|}(Y_t - \bar Y_n)(Y_{t+|k|} - \bar Y_n).$$
Observe that $\bar Y_n$ is not really estimating the mean but the average mean! If we plotted the
empirical ACF $\{\hat c_n(k)\}$ we would see that the covariances do not decay with the lag. However
the true ACF would be zero at all lags but lag zero. The reason the empirical ACF does not
decay to zero is because we have not corrected for the correct mean. Indeed it can be shown that
for large lags $\hat c_n(k) \approx \frac{1}{n^2}\sum_{s<t}(\mu_s - \mu_t)^2$. Hence, because we are not correcting for the time-varying mean,
this term remains in the autocovariance.
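This effect is easy to reproduce by simulation. The sketch below (Python/numpy; the mean shift of size 2 and the sample size are arbitrary choices of ours) generates independent noise with a single change in mean and computes the empirical autocorrelations, which stay large at long lags even though the true ACF is zero at all non-zero lags:

```python
# Sketch: iid noise with a mean shift looks like "long memory" in the empirical ACF.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
eps = rng.normal(size=n)
mu = np.where(np.arange(n) < n // 2, 0.0, 2.0)   # mean jumps from 0 to 2 halfway through
y = mu + eps

def sample_acf(x, h):
    x = np.asarray(x, dtype=float) - np.mean(x)  # centres at the *average* mean
    n = len(x)
    c0 = np.sum(x * x) / n
    return np.array([np.sum(x[:n - r] * x[r:]) / (n * c0) for r in range(1, h + 1)])

print(np.round(sample_acf(y, 20), 2))
# The autocorrelations stay close to 0.5 out to lag 20, despite the data being independent.
```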
It should be noted that if you study a realisation of a time series with a large amount of depen-
dence, it is unclear whether what you see is actually a stochastic time series or an underlying
trend. This makes disentangling a trend from data with a large amount of correlation ex-
tremely difficult.
To prove the result we first evaluate the variance of $\hat c_n(r)$. We start by obtaining an expression
under strict stationarity of the time series, and then simplify it under linearity. A simple
expansion shows that
$$\begin{aligned}
\mathrm{var}(\hat c_n(r)) &= \frac{1}{n^2}\sum_{t,\tau=1}^{n-|r|}\mathrm{cov}(X_t X_{t+r}, X_\tau X_{\tau+r})\\
&= \frac{1}{n^2}\sum_{t,\tau=1}^{n-|r|}\big[\mathrm{cov}(X_t, X_\tau)\mathrm{cov}(X_{t+r}, X_{\tau+r}) + \mathrm{cov}(X_t, X_{\tau+r})\mathrm{cov}(X_{t+r}, X_\tau) + \mathrm{cum}(X_t, X_{t+r}, X_\tau, X_{\tau+r})\big]\\
&= \frac{1}{n^2}\sum_{t,\tau=1}^{n-|r|}\big[c(t-\tau)^2 + c(t-\tau-r)c(t+r-\tau) + \kappa_4(r, \tau-t, \tau+r-t)\big]\\
&:= I + II + III,
\end{aligned}$$
where the above is due to the strict stationarity of the time series. We analyse the above term
by term. By changing variables and the limits of summation, under the assumption that $\sum_r |r\,c(r)| < \infty$
it can be shown that
$$I = \frac{1}{n}\sum_{k=-\infty}^{\infty} c(k)^2 + o\Big(\frac{1}{n}\Big),$$
similarly
$$II = \frac{1}{n}\sum_{k=-\infty}^{\infty} c(k)c(k-r) + o\Big(\frac{1}{n}\Big).$$
To deal with the fourth order cumulant term we use the condition that $\sum_{t_1,t_2,t_3}|t_i\,\kappa_4(t_1,t_2,t_3)| <
\infty$ (which is not as strong a condition as you may think: if the time series is linear, mixing, etc., then
under certain conditions on these rates we obtain this bound). Using this gives
$$III = \frac{1}{n}\sum_{k}\kappa_4(r, k, k+r) + o\Big(\frac{1}{n}\Big).$$
Therefore altogether we have
$$\mathrm{var}(\hat c_n(r)) = \frac{1}{n}\sum_{k=-\infty}^{\infty} c(k)^2 + \frac{1}{n}\sum_{k=-\infty}^{\infty} c(k)c(k-r) + \frac{1}{n}\sum_{k}\kappa_4(r, k, k+r) + o\Big(\frac{1}{n}\Big).$$
We observe that the variance of the covariance estimator contains both covariance and fourth order
cumulant terms. Thus if we need to estimate them, for example to construct confidence
intervals, this can be extremely difficult. However, under linearity the above fourth order
cumulant term has a simpler form.
where $\{\varepsilon_t\}$ are iid, $E(\varepsilon_t) = 0$, $\mathrm{var}(\varepsilon_t) = 1$ and $\kappa_4 = \mathrm{cum}_4(\varepsilon_t)$. Then the third (fourth order
cumulant) term III reduces to
$$III = \frac{1}{n}\sum_{k=-\infty}^{\infty}\mathrm{cum}\Big(\sum_{j_1=-\infty}^{\infty}\psi_{j_1}\varepsilon_{-j_1},\; \sum_{j_2=-\infty}^{\infty}\psi_{j_2}\varepsilon_{r_1-j_2},\; \sum_{j_3=-\infty}^{\infty}\psi_{j_3}\varepsilon_{k-j_3},\; \sum_{j_4=-\infty}^{\infty}\psi_{j_4}\varepsilon_{k+r_2-j_4}\Big) + o\Big(\frac{1}{n}\Big).$$
Noting that if $\{\varepsilon_t\}$ are iid, then the fourth order cumulant will be zero unless all four of its
arguments are the same, the above reduces to
$$\begin{aligned}
III &= \frac{\kappa_4}{n}\sum_{k=-\infty}^{\infty}\sum_{j=-\infty}^{\infty}\psi_j\psi_{j-r_1}\psi_{j-k}\psi_{j-r_2-k} + o\Big(\frac{1}{n}\Big)\\
&= \frac{\kappa_4}{n}\Big(\sum_{j=-\infty}^{\infty}\psi_j\psi_{j-r_1}\Big)\Big(\sum_{i}\psi_i\psi_{i-r_2}\Big) + o\Big(\frac{1}{n}\Big) = \frac{1}{n}\kappa_4 c(r_1)c(r_2) + o\Big(\frac{1}{n}\Big),
\end{aligned}$$
Thus in the case of linearity our expression for the variance is much nicer, and the only difficult
parameter to estimate is $\kappa_4$, which can be done by various means (see later in this course).
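As a quick sanity check of this expression, the following Monte Carlo sketch (Python/numpy; the Gaussian AR(1) model, for which $\kappa_4 = 0$, and the truncation of the infinite sums are our own choices) compares the simulated variance of $\hat c_n(r)$ with $\frac{1}{n}\sum_k c(k)^2 + \frac{1}{n}\sum_k c(k)c(k-r)$:

```python
# Monte Carlo check of var(hat c_n(r)) for a Gaussian AR(1) (so kappa_4 = 0).
import numpy as np

rng = np.random.default_rng(3)
phi, n, r, reps = 0.5, 400, 2, 2000

def acov_ar1(k):
    return phi ** np.abs(k) / (1 - phi ** 2)   # true autocovariance of the AR(1)

def chat(x, r):
    return np.sum(x[:len(x) - r] * x[r:]) / len(x)   # no mean correction: mean known to be 0

sims = []
for _ in range(reps):
    eps = rng.normal(size=n + 200)
    x = np.zeros(n + 200)
    for t in range(1, n + 200):
        x[t] = phi * x[t - 1] + eps[t]
    sims.append(chat(x[200:], r))                    # discard burn-in

ks = np.arange(-200, 201)
theory = (np.sum(acov_ar1(ks) ** 2) + np.sum(acov_ar1(ks) * acov_ar1(ks - r))) / n
print(np.var(sims), theory)                          # the two should be close
```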
The variance of the correlation under linearity
There is a surprising trick which completely removes the cumulant term: consider the
correlation rather than the covariance. The sample correlation is
$$\hat\rho_n(r) = \frac{\hat c_n(r)}{\hat c_n(0)}.$$
Lemma 4.1.1 (Bartlett's formula) Suppose $\{X_t\}$ is a linear time series. Then the vari-
ance of the asymptotic distribution of $\sqrt{n}(\hat\rho_n(r) - \rho(r))$ is
$$\sum_{k=-\infty}^{\infty}\{\rho(k+r)+\rho(k-r)-2\rho(r)\rho(k)\}^2.$$
PROOF. Expanding the ratio $\hat c_n(r)/\hat c_n(0)$ about $(c(r), c(0))$ gives
$$\hat\rho_n(r) - \rho(r) = \frac{1}{c(0)}\big(\hat c_n(r) - c(r)\big) - \frac{c(r)}{c(0)^2}\big(\hat c_n(0) - c(0)\big) + O_p\Big(\frac{1}{n}\Big).$$
Therefore the variance of $\hat\rho_n(r)$ is, to leading order,
$$\frac{1}{c(0)^2}\mathrm{var}(\hat c_n(r)) - 2\frac{c(r)}{c(0)^3}\mathrm{cov}(\hat c_n(r), \hat c_n(0)) + \frac{c(r)^2}{c(0)^4}\mathrm{var}(\hat c_n(0)) + O\Big(\frac{1}{n^2}\Big).$$
Now substituting (4.4) in the above gives the result, and importantly you will see that the
fourth order cumulants cancel. □
The proof of Theorem 4.1.1 is identical to the above (though we still need to show
asymptotic normality).
Remark 4.1.3 The above would appear to be a nice trick, but there are two major factors
that lead to the cancellation of the fourth order cumulant term:
• Linearity.
• The ratio.
Indeed this is not a chance result; in fact there is a logical reason why this result is true (and
it is true for many statistics which have a similar form, commonly called ratio statistics). It
is most easily explained in the Fourier domain. If the estimator can be written as
$$\frac{\frac{1}{n}\sum_{k=1}^{n}\phi(\omega_k)I_n(\omega_k)}{\frac{1}{n}\sum_{k=1}^{n}I_n(\omega_k)},$$
where $I_n(\omega)$ is the periodogram and $\{X_t\}$ is a linear time series, then we will show later
that the asymptotic distribution of the above has a variance which is in terms of the
covariances only, not higher order cumulants.
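To make the form of such a ratio statistic concrete, the sketch below (Python/numpy) computes the periodogram with the FFT and evaluates the ratio with $\phi(\omega) = \cos(r\omega)$; this particular choice of $\phi$ is ours, and (up to an end effect) it reproduces the sample autocorrelation at lag $r$:

```python
# Sketch of a ratio statistic  sum_k phi(omega_k) I_n(omega_k) / sum_k I_n(omega_k),
# with the periodogram I_n(omega_k) = (1/n) |sum_t X_t exp(-i t omega_k)|^2 computed by FFT.
import numpy as np

rng = np.random.default_rng(4)
n = 512
x = rng.normal(size=n)
x = x - x.mean()

omega = 2 * np.pi * np.arange(n) / n          # Fourier frequencies omega_k = 2 pi k / n
I_n = np.abs(np.fft.fft(x)) ** 2 / n          # periodogram

r = 1
ratio = np.sum(np.cos(r * omega) * I_n) / np.sum(I_n)

# Compare with the direct sample autocorrelation at lag r
rho_hat = np.sum(x[:n - r] * x[r:]) / np.sum(x * x)
print(ratio, rho_hat)                          # close; they differ only by a circular end effect
```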
where $E(\varepsilon_t) = 0$, $\mathrm{var}(\varepsilon_t) = \sigma^2$ and the roots of the characteristic polynomial $1-\sum_{j=1}^{p}\phi_j z^j$
lie outside the unit circle. Our aim in this section is to construct estimators of the AR
parameters $\{\phi_j\}$. We will show that in the case that $\{X_t\}$ has an AR(p) representation
the estimation is relatively straightforward, and the estimation methods all have properties
which are asymptotically equivalent to those of the Gaussian maximum likelihood estimator.
This estimation scheme stems from the following observation. Suppose the
AR(p) time series $\{X_t\}$ is causal (that is, the roots of the characteristic polynomial lie outside
the unit circle, hence it has an MA(∞) representation). Then we can multiply $X_t$ by $X_{t-i}$
for $1 \le i \le p$; since the process is causal, $\varepsilon_t$ and $X_{t-i}$ are uncorrelated for $i > 0$. Therefore taking expectations we have
for all $i > 0$
$$E(X_t X_{t-i}) = \sum_{j=1}^{p}\phi_j E(X_{t-j}X_{t-i}) \quad\Rightarrow\quad c(i) = \sum_{j=1}^{p}\phi_j c(i-j). \qquad (4.5)$$
Recall these are the Yule-Walker equations we considered in Section 2.5.3. Putting the cases
$1 \le i \le p$ together we can write the above as
$$\boldsymbol\gamma_p = \Gamma_p\boldsymbol\phi_p, \qquad (4.6)$$
4.2.1 The Yule-Walker estimator
The Yule-Walker equations inspire the method of moments estimator often called the Yule-
Walker estimator. We use (4.6) as the basis of the estimator. It is clear that γ̂ p and Γ̂p are
estimators of γ p and Γp where (Γ̂p )i,j = ĉn (i − j) and (γ̂ p )i = ĉn (i). Therefore we can use
$$\hat{\boldsymbol\phi}_p = \hat\Gamma_p^{-1}\hat{\boldsymbol\gamma}_p, \qquad (4.7)$$
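A direct implementation of (4.7) is sketched below (Python with numpy and scipy; the simulated AR(2) coefficients are arbitrary choices of ours):

```python
# Sketch of the Yule-Walker estimator: hat phi_p = hat Gamma_p^{-1} hat gamma_p,
# with (hat Gamma_p)_{ij} = hat c_n(i - j) and (hat gamma_p)_i = hat c_n(i).
import numpy as np
from scipy.linalg import toeplitz

def sample_acov(x, max_lag):
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.array([np.sum(x[:n - k] * x[k:]) / n for k in range(max_lag + 1)])

def yule_walker(x, p):
    c = sample_acov(x, p)
    Gamma = toeplitz(c[:p])          # Toeplitz matrix of sample autocovariances
    gamma = c[1:p + 1]
    return np.linalg.solve(Gamma, gamma)

# Simulated AR(2): X_t = 1.5 X_{t-1} - 0.75 X_{t-2} + eps_t
rng = np.random.default_rng(5)
eps = rng.normal(size=2200)
x = np.zeros(2200)
for t in range(2, 2200):
    x[t] = 1.5 * x[t - 1] - 0.75 * x[t - 2] + eps[t]
print(yule_walker(x[200:], p=2))     # should be close to (1.5, -0.75)
```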
doing the estimation. They can have a different distribution; the only difference is that the
estimator may be less efficient (it will not attain the Cramér-Rao lower bound).
Suppose we observe $\{X_t; t = 1,\ldots,n\}$ where $X_t$ are observations from an AR(1) process.
To construct the MLE, we use the fact that the joint distribution of $\{X_t\}$ is the product of the
conditional distributions. Hence we need an expression for the conditional distribution (in
terms of the densities). Let $F_\varepsilon$ and $f_\varepsilon$ be the distribution function and the density function of $\varepsilon_t$
respectively. We first note that the AR(p) process is p-Markovian, that is
$$P(X_t \le x | X_{t-1}, X_{t-2},\ldots) = P(X_t \le x | X_{t-1},\ldots,X_{t-p}) \;\Rightarrow\; f_a(X_t|X_{t-1},X_{t-2},\ldots) = f_a(X_t|X_{t-1},\ldots,X_{t-p}), \qquad (4.8)$$
where $f_a$ is the conditional density of $X_t$ given the past, and the distribution is
derived as if $a$ were the true AR(p) parameter vector.
Remark 4.2.1 To understand why (4.8) is true consider the simple case that p = 1 (AR(1)).
Studying the conditional probability gives
Usually we ignore the initial distribution log fa (X1 , . . . , Xp ) and maximise the conditional
likelihood to obtain the estimator. In the case that the sample size is large ($n \gg p$), the
contribution of log fa (X1 , . . . , Xp ) is minimal and the conditional likelihood and likelihood
are asymptotically equivalent.
We note that in the case that $f_\varepsilon$ is Gaussian, the conditional log-likelihood is $-nL_n(a)$, where
$$L_n(a) = \log\sigma^2 + \frac{1}{n\sigma^2}\sum_{t=p+1}^{n}\Big(X_t - \sum_{j=1}^{p} a_j X_{t-j}\Big)^2.$$
Therefore the estimator of the AR(p) parameters is $\tilde{\boldsymbol\phi}_p = \arg\min_a L_n(a)$. It is clear that $\tilde{\boldsymbol\phi}_p$
is the least squares estimator and can be explicitly obtained using
$$\tilde{\boldsymbol\phi}_p = \tilde\Gamma_p^{-1}\tilde{\boldsymbol\gamma}_p,$$
where $(\tilde\Gamma_p)_{i,j} = \frac{1}{n-p}\sum_{t=p+1}^{n} X_{t-i}X_{t-j}$ and $(\tilde{\boldsymbol\gamma}_p)_i = \frac{1}{n-p}\sum_{t=p+1}^{n} X_t X_{t-i}$.
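A sketch of this least squares estimator, computed as a lagged regression rather than by forming $\tilde\Gamma_p$ explicitly, is given below (Python/numpy; the simulated model is again an arbitrary choice of ours):

```python
# Sketch of the conditional least squares estimator of the AR(p) parameters,
# computed by regressing X_t on (X_{t-1}, ..., X_{t-p}).
import numpy as np

def ar_least_squares(x, p):
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Design matrix: one row per t, containing the p lagged values of X_t
    Z = np.column_stack([x[p - j:n - j] for j in range(1, p + 1)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    sigma2 = np.mean((y - Z @ coef) ** 2)     # residual variance estimate
    return coef, sigma2

rng = np.random.default_rng(6)
eps = rng.normal(size=2200)
x = np.zeros(2200)
for t in range(2, 2200):
    x[t] = 1.5 * x[t - 1] - 0.75 * x[t - 2] + eps[t]
coef, sigma2 = ar_least_squares(x[200:], p=2)
print(coef, sigma2)                            # close to (1.5, -0.75) and 1
```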
4.3.1 The Hannan and Rissanen AR(∞) expansion method
We first describe an easy method to estimate the parameters of an ARMA process. These
estimators may not be necessarily ‘efficient’ (we define this term later) but they have an
explicit form and can be easily obtained. Therefore they are a good starting point, and can
be used as the initial value when using the Gaussian maximum likelihood to estimate the
parameters (as described below). The method was first proposed in Hannan and Rissanen
(1982) and An et al. (1982), and we describe it below. It is worth bearing in mind that
currently the ‘large p small n problem’ is a hot topic. These are generally regression problems
where the sample size n is quite small but the number of regressors p is quite large (usually
model selection is of importance in this context). The method proposed by Hannan and Rissanen involves
expanding the ARMA process (assuming invertibility) as an AR(∞) process and estimating
the parameters of the AR(∞) process. In some sense this can be considered as a regression
problem with an infinite number of regressors. Hence there are some parallels between the
estimation described below and the ‘large p, small n problem’.
As we mentioned in Lemma 2.4.1, if an ARMA process is invertible it can be represented
as
$$X_t = \sum_{j=1}^{\infty} b_j X_{t-j} + \varepsilon_t. \qquad (4.10)$$
The idea behind Hannan’s method is to estimate the parameters {bj }, then estimate the
innovations εt , and use the estimated innovations to construct a multiple linear regression
estimator of the ARMA parameters $\{\theta_i\}$ and $\{\phi_j\}$. Of course in practice we cannot estimate
all parameters {bj } as there are an infinite number of them. So instead we do a type of sieve
estimation where we only estimate a finite number and let the number of parameters to be
estimated grow as the sample size increases. We describe the estimation steps below:
(i) Suppose we observe $\{X_t\}_{t=1}^{n}$. Recalling (4.10), we will estimate the first $p_n$ parameters
$\{b_j\}_{j=1}^{p_n}$. We will suppose that $p_n \to \infty$ as $n \to \infty$ and $p_n \ll n$ (we will state the rate below).
We use least squares to estimate $\{b_j\}_{j=1}^{p_n}$ and define
$$\hat{\mathbf{b}}_n = (\hat b_{1,n},\ldots,\hat b_{p_n,n})' = \hat R_n^{-1}\hat r_n,$$
where, writing $\underline{X}_{t-1} = (X_{t-1},\ldots,X_{t-p_n})'$,
$$\hat R_n = \sum_{t=p_n+1}^{n}\underline{X}_{t-1}\underline{X}_{t-1}' \qquad \hat r_n = \sum_{t=p_n+1}^{n} X_t\underline{X}_{t-1}.$$
(ii) Having estimated the first $p_n$ coefficients $\{b_j\}_{j=1}^{p_n}$, we estimate the residuals with
$$\tilde\varepsilon_t = X_t - \sum_{j=1}^{p_n}\hat b_{j,n} X_{t-j}.$$
(iii) Now use the estimated residuals in place of the unobserved innovations and estimate the ARMA
parameters by least squares, regressing $X_t$ on $\tilde Y_t = (X_{t-1},\ldots,X_{t-p},\tilde\varepsilon_{t-1},\ldots,\tilde\varepsilon_{t-q})'$:
$$(\tilde{\boldsymbol\phi}_n, \tilde{\boldsymbol\theta}_n) = \tilde R_n^{-1}\tilde s_n$$
where
$$\tilde R_n = \frac{1}{n}\sum_{t=\max(p,q)}^{n}\tilde Y_t\tilde Y_t' \quad\text{and}\quad \tilde s_n = \frac{1}{n}\sum_{t=\max(p,q)}^{n}\tilde Y_t X_t.$$
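A sketch of the whole procedure for an ARMA(1,1) is given below (Python/numpy; the choice $p_n = 10$, the simulated parameters and the function names are ours, and this is the plain three-step scheme without any later refinements):

```python
# Sketch of the Hannan-Rissanen procedure for an ARMA(1,1):
#   X_t = phi X_{t-1} + eps_t + theta eps_{t-1}.
import numpy as np

def fit_long_ar(x, p):
    """Step (i): least squares fit of a long AR(p) (the AR(infinity) approximation)."""
    n = len(x)
    Z = np.column_stack([x[p - j:n - j] for j in range(1, p + 1)])
    b, *_ = np.linalg.lstsq(Z, x[p:], rcond=None)
    return b

def hannan_rissanen_arma11(x, p_n=10):
    x = np.asarray(x, dtype=float)
    n = len(x)
    b = fit_long_ar(x, p_n)
    # Step (ii): estimated innovations eps~_t = X_t - sum_j b_j X_{t-j} (set to 0 for t <= p_n)
    Z = np.column_stack([x[p_n - j:n - j] for j in range(1, p_n + 1)])
    eps_tilde = np.concatenate([np.zeros(p_n), x[p_n:] - Z @ b])
    # Step (iii): regress X_t on (X_{t-1}, eps~_{t-1}) to estimate (phi, theta)
    Y = np.column_stack([x[p_n:-1], eps_tilde[p_n:-1]])
    coef, *_ = np.linalg.lstsq(Y, x[p_n + 1:], rcond=None)
    return coef   # (phi_tilde, theta_tilde)

# Simulated ARMA(1,1) with phi = 0.7, theta = 0.4
rng = np.random.default_rng(7)
N = 5200
eps = rng.normal(size=N)
x = np.zeros(N)
for t in range(1, N):
    x[t] = 0.7 * x[t - 1] + eps[t] + 0.4 * eps[t - 1]
print(hannan_rissanen_arma11(x[200:]))   # should be roughly (0.7, 0.4)
```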
be the best linear predictor of $X_{t+1}$ given $X_t,\ldots,X_1$ and the ARMA parameters $\phi$ and $\theta$,
which are used to calculate the covariances in the prediction. Let $r_{t+1}(\sigma,\phi,\theta)$ be the one-
step ahead mean squared error $E(X_{t+1} - X_{t+1|t}^{(\phi,\theta)})^2$. By using the Cholesky decomposition it can
be shown that
$$L_n(\phi,\theta,\sigma) = \frac{1}{n}\sum_{t=1}^{n-1}\log r_{t+1}(\sigma,\phi,\theta) + \frac{1}{n}\sum_{t=1}^{n-1}\frac{(X_{t+1} - X_{t+1|t}^{(\phi,\theta)})^2}{r_{t+1}(\sigma,\phi,\theta)}.$$
We see that we have avoided inverting the matrix $\Gamma(\phi,\theta,\sigma)$. The GMLE is the parameter
$(\hat{\boldsymbol\phi}_n, \hat{\boldsymbol\theta}_n)$ which minimises $L_n(\phi,\theta,\sigma)$. We note that the one-step ahead predictor $X_{t+1|t}^{(\phi,\theta)}$ can
be obtained using the Durbin-Levinson algorithm.
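To make this concrete, the sketch below (Python/numpy; it is our own illustration, and it takes as input the autocovariance sequence implied by the candidate parameters, with $\sigma^2$ folded into the autocovariances) runs the Durbin-Levinson recursion to produce the one-step predictors $X_{t+1|t}$ and their mean squared errors $r_{t+1}$, and then evaluates $L_n(\phi,\theta,\sigma)$:

```python
# Sketch: Durbin-Levinson recursion for the one-step predictors and their MSEs,
# plugged into the Gaussian likelihood L_n.  `acov` holds c(0), ..., c(n-1) implied
# by the candidate parameters (sigma^2 included), however it has been computed.
import numpy as np

def one_step_predictors(x, acov):
    """Return the one-step predictors X_{t+1|t} (t = 1, ..., n-1) and their MSEs r_{t+1}."""
    n = len(x)
    phi = np.array([acov[1] / acov[0]])        # phi_{1,1}
    v = acov[0] * (1 - phi[0] ** 2)            # r_2
    preds, mses = [phi[0] * x[0]], [v]         # X_{2|1} and r_2
    for t in range(2, n):
        k = (acov[t] - np.dot(phi, acov[t - 1:0:-1])) / v   # partial correlation phi_{t,t}
        phi = np.concatenate([phi - k * phi[::-1], [k]])    # phi_{t,1}, ..., phi_{t,t}
        v = v * (1 - k ** 2)                                # r_{t+1}
        preds.append(np.dot(phi, x[t - 1::-1]))             # X_{t+1|t} = sum_j phi_{t,j} X_{t+1-j}
        mses.append(v)
    return np.array(preds), np.array(mses)

def L_n(x, acov):
    preds, r = one_step_predictors(x, acov)
    n = len(x)
    return np.sum(np.log(r)) / n + np.sum((x[1:] - preds) ** 2 / r) / n

# Example: evaluate L_n at AR(1) candidate parameters, where c(k) = sigma^2 phi^|k| / (1 - phi^2)
rng = np.random.default_rng(8)
eps = rng.normal(size=300)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = 0.5 * x[t - 1] + eps[t]
phi0, sigma0 = 0.5, 1.0
acov = sigma0 ** 2 * phi0 ** np.abs(np.arange(300)) / (1 - phi0 ** 2)
print(L_n(x, acov))
```

In practice this evaluation would be placed inside a numerical optimiser over $(\phi,\theta,\sigma)$, with the autocovariances recomputed at each candidate parameter value.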
It is possible to obtain an approximation of $L_n(\phi,\theta,\sigma)$ which is simple to evaluate.
However, this approximation only really makes sense when the sample size $n$ is large. It is,
however, useful when obtaining the asymptotic sampling properties of the GMLE.
To motivate the approximation, consider the one-step ahead prediction error considered
in Section 3.3. We have shown in Proposition 3.3.1 that for large $t$, $\tilde X_{t+1|t,\ldots,0} \approx X_{t+1|t}$ $(=
P_{X_t,X_{t-1},\ldots}(X_{t+1}))$ and $\sigma^2 \approx E(X_{t+1} - X_{t+1|t})^2$. Now define
$$\tilde X_{t+1|t,\ldots,0}^{(\phi,\theta)} = \sum_{j=1}^{t} b_j(\phi,\theta) X_{t+1-j}. \qquad (4.13)$$
We now replace, in $L_n(\phi,\theta,\sigma)$, $X_{t+1|t}^{(\phi,\theta)}$ with $\tilde X_{t+1|t,\ldots,0}^{(\phi,\theta)}$ and $r_{t+1}(\sigma,\phi,\theta)$ with $\sigma^2$ to obtain
$$\tilde L_n(\phi,\theta,\sigma) = \log\sigma^2 + \frac{1}{n\sigma^2}\sum_{t=1}^{n-1}\big(X_{t+1} - \tilde X_{t+1|t,\ldots,0}^{(\phi,\theta)}\big)^2.$$
We show in Section 8 that $L_n(\phi,\theta,\sigma)$ and $\tilde L_n(\phi,\theta,\sigma)$ are asymptotically equivalent.
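For a concrete instance, the sketch below (Python/numpy) evaluates $\tilde L_n$ for an ARMA(1,1), using the AR(∞) coefficients $b_j(\phi,\theta) = (\phi+\theta)(-\theta)^{j-1}$ obtained by inverting the MA polynomial; treat this, and the simulated example, as our own illustration rather than part of the notes:

```python
# Sketch: the approximate likelihood tilde L_n for an ARMA(1,1),
#   X_t = phi X_{t-1} + eps_t + theta eps_{t-1},
# whose AR(infinity) coefficients are b_j(phi, theta) = (phi + theta) * (-theta)^(j-1), j >= 1.
import numpy as np

def approx_likelihood_arma11(params, x):
    phi, theta, sigma2 = params
    n = len(x)
    L = np.log(sigma2)
    for t in range(1, n):                        # predict X_{t+1} (0-indexed x[t]) from x[0..t-1]
        j = np.arange(1, t + 1)
        b = (phi + theta) * (-theta) ** (j - 1)  # b_1, ..., b_t
        pred = np.dot(b, x[t - 1::-1])           # sum_j b_j X_{t+1-j}
        L += (x[t] - pred) ** 2 / (n * sigma2)
    return L

# Example: evaluate at the true parameters of a simulated ARMA(1,1)
rng = np.random.default_rng(9)
n = 500
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + eps[t] + 0.4 * eps[t - 1]
print(approx_likelihood_arma11((0.7, 0.4, 1.0), x))
# In practice tilde L_n would be minimised over (phi, theta, sigma^2), e.g. with scipy.optimize.
```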