
MSc Applied Econometrics — Time Series

Estimation, Specification and Testing

F Blasques, C Francq, SJ Koopman, JM Zakoian

Lancaster University
Timberlake Consultants

January/February 2023
ARMA models: estimation, specification and testing

Program today:
• Estimation via Regression
• Maximum Likelihood
• Specification
• General to specific
• Residuals
• Testing and Diagnostic Checking
• Model Validation

2 / 45
The ARMA Coefficients

Let {Xt } be generated by an ARMA(p, q),

Xt = φ1 Xt−1 + . . . + φp Xt−p + εt + θ1 εt−1 + . . . + θq εt−q

with {εt} ∼ WN(0, σε²). Then,


• The stationarity properties of {Xt } depend on the values of
the parameters φ1 , ..., φp .
• Unconditional moments and dynamic structure (the ACF of
{Xt }) depend on the parameters φ1 , ..., φp , θ1 , ..., θq and σε2 .
• Forecasted paths and forecasting accuracy depend on
φ1 , ..., φp , θ1 , ..., θq and σε2 .
Conclusion: We have to learn about these parameters! We
have to estimate them!

3 / 45
Estimation of ARMA Coefficients

The usual estimators can help us! We just need to be careful with the fact that the data is not IID.
Estimators that you know:
• Maximum likelihood (maximize likelihood)
• Method of moments (minimize distance between moments)
• Least squares (minimize squared residuals)

Here we focus on:


• Maximum likelihood
• based on joint distribution
• based on prediction error decomposition
• Least squares

4 / 45
The Maximum Likelihood Estimator

Definition: Given a sample XT := (X1, ..., XT) with joint pdf f(XT; ψ) depending on some vector of parameters ψ, the Maximum Likelihood (ML) estimator ψ̂T is defined as

$$\hat{\psi}_T = \arg\max_{\psi} f(X_T; \psi).$$

⇒ Given a realized sample xT := (x1, ..., xT), the ML estimate maximizes the realized likelihood function f(xT; ψ).
⇒ The ML estimator ψ̂T selects the value for ψ that is the most likely given the observed data xT.
⇒ The location of the maximum of f(XT; ψ) is the same as the location of the maximum of log f(XT; ψ).

5 / 45
Maximum Likelihood Estimation
Important: Independence is needed to factorize the likelihood,

$$f(x_T; \psi) = \prod_{t=1}^{T} f_t(x_t; \psi),$$

and identical distributions are needed to obtain ft = f for all t,

$$\prod_{t=1}^{T} f_t(x_t; \psi) = \prod_{t=1}^{T} f(x_t; \psi).$$

Important: When X1, ..., XT are not independent, we CANNOT factorize the joint likelihood:

$$f(x_T; \psi) \neq \prod_{t=1}^{T} f_t(x_t; \psi).$$

6 / 45
Joint Likelihood of T -Period Sequence from ARMA(p, q)

Let the sample XT := (X1, ..., XT) be a subset of a time-series {Xt} generated by a stationary ARMA(p, q) model,

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \ldots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q},$$

where εt ∼ NID(0, σε²), with non-zero coefficients φ1, ..., φp, θ1, ..., θq and σε².
Then, the elements X1 , ..., XT are not independent and the
likelihood cannot be factorized!
However, XT is jointly Gaussian and hence we can still
compute the ML estimator!

7 / 45
Multivariate Normal Distribution

Assume XT := (X1, ..., XT) is jointly Gaussian (that is, jointly normally distributed) with distribution XT ∼ N(µ, Γ). The joint log-density function is given by

$$\log f(X_T; \mu, \Gamma) = -\frac{T}{2}\log 2\pi - \frac{1}{2}\log|\Gamma| - \frac{1}{2}(X_T - \mu)'\,\Gamma^{-1}(X_T - \mu).$$

In case XT is a T-period sequence from a stationary ARMA(p, q) process, we have µ = 0 and Γ is a Toeplitz matrix of autocovariances (banded in the pure MA case).

8 / 45
Multivariate Normal Likelihood of ARMA Model
If XT := (X1, ..., XT) is jointly Gaussian, we can define the ML estimator of the vector of parameters ψ := (φ1, ..., φp, θ1, ..., θq, σε²) as

$$\hat{\psi}_T = \arg\max_{\psi} \; -\frac{T}{2}\log 2\pi - \frac{1}{2}\log|\Gamma(\psi)| - \frac{1}{2} X_T'\,\Gamma^{-1}(\psi) X_T,$$

where Γ(ψ) is the variance-covariance matrix of the Gaussian vector (X1, ..., XT), which depends on the ARMA parameters:

$$\Gamma(\psi) = \begin{pmatrix} \gamma(0) & \gamma(1) & \gamma(2) & \cdots & \gamma(T-1) \\ \gamma(1) & \gamma(0) & \gamma(1) & \cdots & \gamma(T-2) \\ \gamma(2) & \gamma(1) & \gamma(0) & \cdots & \gamma(T-3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma(T-1) & \gamma(T-2) & \gamma(T-3) & \cdots & \gamma(0) \end{pmatrix}.$$

9 / 45
Variance-Covariance Matrix of AR(1)

Let X1, ..., XT be a subset of a time-series generated by a stationary Gaussian AR(1) model,

$$X_t = \phi X_{t-1} + \varepsilon_t, \qquad \{\varepsilon_t\} \sim \text{NID}(0, \sigma_\varepsilon^2).$$

Then, using the results from last week:

$$\Gamma(\psi) = \Gamma(\phi, \sigma_\varepsilon^2) = \frac{\sigma_\varepsilon^2}{1-\phi^2} \begin{pmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{T-1} \\ \phi & 1 & \phi & \cdots & \phi^{T-2} \\ \phi^2 & \phi & 1 & \cdots & \phi^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi^{T-1} & \phi^{T-2} & \phi^{T-3} & \cdots & 1 \end{pmatrix}.$$
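To make the exact-likelihood idea concrete, here is a minimal numpy/scipy sketch (an illustration, not part of the slides) that builds this Toeplitz matrix for an AR(1) and evaluates the exact Gaussian log-likelihood:

```python
# A sketch: exact Gaussian log-likelihood of an AR(1) sample via the
# Toeplitz autocovariance matrix (feasible only for moderate T).
import numpy as np
from scipy.linalg import toeplitz
from scipy.stats import multivariate_normal

def ar1_exact_loglik(x, phi, sigma2_eps):
    """log f(x; phi, sigma2_eps) for a stationary Gaussian AR(1)."""
    T = len(x)
    # gamma(k) = sigma2_eps * phi^k / (1 - phi^2), k = 0, ..., T-1
    gamma = sigma2_eps * phi ** np.arange(T) / (1.0 - phi ** 2)
    Gamma = toeplitz(gamma)  # T x T variance-covariance matrix
    return multivariate_normal(mean=np.zeros(T), cov=Gamma).logpdf(x)
```

Note that forming and factorizing Γ costs on the order of T³ operations, which is exactly the practical difficulty flagged two slides below.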

10 / 45
Variance-Covariance Matrix of MA(1)

Let X1, ..., XT be a subset of a time-series generated by a stationary Gaussian MA(1) model,

$$X_t = \varepsilon_t + \theta\varepsilon_{t-1}, \qquad \{\varepsilon_t\} \sim \text{NID}(0, \sigma_\varepsilon^2).$$

Then, using the results from last week:

$$\Gamma(\theta, \sigma_\varepsilon^2) = \sigma_\varepsilon^2 \begin{pmatrix} 1+\theta^2 & \theta & 0 & \cdots & 0 \\ \theta & 1+\theta^2 & \theta & \cdots & 0 \\ 0 & \theta & 1+\theta^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1+\theta^2 \end{pmatrix}.$$

11 / 45
Problems with Multivariate Distribution Likelihood

The idea is straightforward... but the practice is difficult!


Practical problems:
• Maximizing the likelihood function is involved!
• Inverting the matrix Γ(ψ) is challenging for large sample
sizes T .
• Computing the determinant |Γ(ψ)| is also challenging for
large sample sizes T .
• We leave it to clever algorithms for the computer!

Important: We can simplify things by using the prediction error decomposition to re-write the joint likelihood as a product of conditional densities!

12 / 45
Prediction Error Decomposition of ARMA Likelihood

Recall: When observations are dependent,

$$f(x_1, ..., x_T; \psi) \neq f(x_1; \psi) \times \ldots \times f(x_T; \psi).$$

However: We can always factorize the joint distribution function as a product of conditionals times a marginal:

$$f(x_1, x_2; \psi) = f(x_1; \psi) \times f(x_2 \mid x_1; \psi),$$

$$f(x_1, x_2, x_3; \psi) = f(x_1, x_2; \psi) \times f(x_3 \mid x_2, x_1; \psi) = f(x_1; \psi) \times f(x_2 \mid x_1; \psi) \times f(x_3 \mid x_2, x_1; \psi),$$

$$f(x_1, ..., x_T; \psi) = f(x_1; \psi) \times \prod_{t=2}^{T} f(x_t \mid x_{t-1}, \ldots; \psi).$$

13 / 45
Prediction Error Decomposition of ARMA Likelihood

Important: We can always write the log-likelihood function as a sum of log conditional densities!

$$\log L(x_T; \psi) = \sum_{t=1}^{T} \log f(x_t \mid D_{t-1}; \psi),$$

where Dt denotes the set x1, ..., xt and f(x1 | D0; ψ) = f(x1; ψ).

If we know the conditional distributions f(xt | Dt−1; ψ), then we can write the likelihood and maximize it!

What is the distribution of Xt conditional on past values?

14 / 45
Likelihood Function of AR(1)

Let X1, ..., XT be a subset of a time-series generated by a stationary Gaussian AR(1) model,

$$X_t = \phi X_{t-1} + \varepsilon_t, \qquad \{\varepsilon_t\} \sim \text{NID}(0, \sigma_\varepsilon^2).$$

Then, we have

$$X_2 \mid X_1 \sim N(\phi X_1, \sigma_\varepsilon^2), \qquad X_3 \mid X_2, X_1 = X_3 \mid X_2 \sim N(\phi X_2, \sigma_\varepsilon^2),$$

and in general Xt | Xt−1 ∼ N(φXt−1, σε²).

⇒ In the stationary Gaussian AR(1), every f(Xt | Dt−1) is a normal density with a different mean but the same variance!

15 / 45
Likelihood Function of AR(1)
If X1, ..., XT is a subset of a stationary Gaussian AR(1) process, then ψ = (φ, σε²) and

$$\hat{\psi}_T = \arg\max_{\psi} f(x_T; \psi) = \arg\max_{\psi} \prod_{t=2}^{T} f(x_t \mid D_{t-1}; \psi) = \arg\max_{\psi} \prod_{t=2}^{T} \frac{1}{\sqrt{2\pi\sigma_\varepsilon^2}} \exp\Big[ -\frac{(x_t - \phi x_{t-1})^2}{2\sigma_\varepsilon^2} \Big]$$

$$= \arg\max_{\psi} \sum_{t=2}^{T} \Big( -\log\sqrt{2\pi\sigma_\varepsilon^2} - \frac{(x_t - \phi x_{t-1})^2}{2\sigma_\varepsilon^2} \Big).$$

Note: We start at t = 2 and simply treat x1 as a fixed value, since x0 is unknown. We call this conditional ML by prediction error decomposition. If we assume a distribution for x1, then we call it ML by prediction error decomposition with exact initialization.
16 / 45
ML Estimator of an AR(1) with NID(0,1) Innovations
Let X1 , ..., XT be a subset of a time-series generated by the
following stationary AR(1) model,
Xt = φXt−1 + εt , {εt } ∼ NID(0, 1).

⇒ Stability requires |φ| < 1 and hence the parameter space is the interval (−1, 1).
⇒ The ML estimator is given by

$$\hat{\phi}_T = \arg\max_{\phi} \sum_{t=2}^{T} \Big( -\log\sqrt{2\pi} - \frac{(x_t - \phi x_{t-1})^2}{2} \Big).$$

Since the likelihood is differentiable w.r.t. φ, we can find the maximum by setting the derivative equal to zero!
We ignore further details such as checking second derivatives
and asymptotes of the likelihood function.
17 / 45
ML Estimator of an AR(1) with NID(0,1) Innovations

Since

$$\log L(\phi) = \sum_{t=2}^{T} \Big( -\log\sqrt{2\pi} - \frac{(x_t - \phi x_{t-1})^2}{2} \Big),$$

we have

$$\frac{\partial \log L(\phi)}{\partial \phi} = \sum_{t=2}^{T} (x_t - \phi x_{t-1}) x_{t-1},$$

and by construction, φ̂T satisfies

$$\frac{\partial \log L(\hat{\phi}_T)}{\partial \phi} = 0 \;\Leftrightarrow\; \sum_{t=2}^{T} (x_t - \hat{\phi}_T x_{t-1}) x_{t-1} = 0 \;\Leftrightarrow\; \sum_{t=2}^{T} x_t x_{t-1} = \hat{\phi}_T \sum_{t=2}^{T} x_{t-1}^2 \;\Leftrightarrow\; \hat{\phi}_T = \frac{\sum_{t=2}^{T} x_t x_{t-1}}{\sum_{t=2}^{T} x_{t-1}^2}.$$
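A short numpy sketch of this closed-form estimator (illustrative only, not part of the slides):

```python
import numpy as np

def ar1_ml_phi(x):
    """Conditional ML estimate of phi: sum x_t x_{t-1} / sum x_{t-1}^2."""
    x = np.asarray(x, dtype=float)
    return np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
```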

18 / 45
Least Squares Estimator

Definition: Given a subset XT := (X1, ..., XT) of a process {Xt} generated by an AR(1) model,

$$X_t = \phi X_{t-1} + \varepsilon_t,$$

the least-squares (LS) estimator of φ is defined as

$$\hat{\phi}_T = \arg\min_{\phi} \sum_{t=2}^{T} (X_t - \phi X_{t-1})^2.$$

⇒ The least squares estimator φ̂T for the AR(1) selects the value for φ that gives the best squared-error fit given the observed data xT.

⇒ The LS estimator φ̂T is equivalent to the (conditional) ML estimator.

19 / 45
Maximum Likelihood Estimation in ACTION

To illustrate:
• Simulate a long sequence from an AR(1) process with coefficients φ = 0.8, σε² = 1.
• Select a T-period sequence from the long sequence, T = 200.
• Numerically maximize log-likelihood w.r.t. φ and σε2 .
• There are efficient search methods for finding the
maximum.
• The ML estimates are φ̂T = 0.8009 and σ̂²ε,T = 0.992406 (this experiment is sketched in code below).
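A minimal sketch of this experiment, assuming the conditional (prediction error) likelihood and scipy's numerical optimizer; the exact estimates vary with the simulated draw:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

# Simulate a long AR(1) path with phi = 0.8, sigma2 = 1, then keep T = 200.
T, burn = 200, 300
x = np.zeros(T + burn)
for t in range(1, T + burn):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()
x = x[burn:]

def neg_loglik(params):
    """Negative conditional Gaussian log-likelihood of the AR(1)."""
    phi, sigma2 = params
    resid = x[1:] - phi * x[:-1]
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + resid ** 2 / sigma2)

res = minimize(neg_loglik, x0=[0.5, 2.0], bounds=[(-0.99, 0.99), (1e-8, None)])
phi_hat, sigma2_hat = res.x  # both should land close to 0.8 and 1.0
```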

20 / 45
Simulated AR(1) process with φ = 0.8
[Figure: time-series plot of the simulated sample path, t = 0, ..., 500.]

21 / 45
Maximum Likelihood Estimation of AR(1) with φ = 0.8

[Figure: log-likelihood value plotted against the φ coefficient over the range 0.50–0.95; the curve peaks near φ = 0.8.]

22 / 45
ML and LS Estimator Properties

Important: We now know how to obtain point estimates for the autoregressive parameter φ.

Question: How reliable are those estimates? Can we calculate confidence intervals? Can we test hypotheses about φ?

Yes! Both small sample and large sample properties!

• Monte Carlo simulations reveal small sample properties like bias, variance and RMSE (a sketch follows below).
• Asymptotic theory establishes large sample properties
like consistency and asymptotic normality.
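A minimal sketch of such a Monte Carlo study (illustrative; the figure on the next slide is based on 9000 draws):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(phi, T, burn=100):
    x = np.zeros(T + burn)
    for t in range(1, T + burn):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x[burn:]

phi, T, reps = 0.8, 100, 9000
est = np.empty(reps)
for r in range(reps):
    x = simulate_ar1(phi, T)
    est[r] = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)  # LS/ML estimate

bias = est.mean() - phi                      # small-sample bias is negative here
rmse = np.sqrt(np.mean((est - phi) ** 2))
```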

23 / 45
Simulated Small-Sample Properties
Figure: Distribution obtained from 9000 Monte Carlo draws of LS and
ML estimator of φ in AR(1) with known variance.
[Six histogram panels: φ = 0.8 (left column) and φ = 0.97 (right column), for T = 100, 250 and 1000; the sampling distributions tighten around the true value as T grows.]

24 / 45
Model Specification: Selecting p and q

⇒ The properties of the ML and OLS estimators are typically derived under the assumption of a correctly specified ARMA(p, q) model!
⇒ The forecasts and impulse response functions depend on the orders p and q of the ARMA and ADL models.

Question: How can we select p and q? How can we test for correct specification?

Answer: Using the general-to-specific approach (based on autocorrelation tests and parameter significance) or some information criteria.

25 / 45
Determine p and q in ARMA model

Selecting p and q as large as possible is not a good strategy:
• Estimation uncertainty increases a lot!
• Forecasting uncertainty increases due to parameter estimation uncertainty!

Definition: The ‘General-to-Specific’ estimation strategy for ARMA models begins with the estimation of an ARMA(p, q) with large p and q, and reduces the size of the model by eliminating lags that are insignificant.

Alternative: G2S can be done by minimizing an information criterion like Akaike’s Information Criterion (AIC), which introduces a penalty to the log-likelihood that discourages too many parameters.

26 / 45
Akaike’s Information Criterion (AIC)

AIC = 2k − 2 log L(ψ̂), where k = number of parameters in the model.

⇒ For a fixed k, the model with the largest likelihood L has the lowest AIC.
⇒ The likelihood can always be improved by adding parameters to the model!
⇒ The AIC penalizes models with more parameters!

Strategy: the model with the smallest AIC is best! (A sketch of AIC-based order selection follows below.)
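A hedged sketch of AIC-based order selection, assuming statsmodels' ARIMA class (any estimator exposing the maximized likelihood would do):

```python
from statsmodels.tsa.arima.model import ARIMA

def select_arma_by_aic(x, max_p=4, max_q=4):
    """Fit ARMA(p, q) for all orders up to (max_p, max_q);
    return the order with the smallest AIC."""
    best_aic, best_order = float("inf"), None
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            fit = ARIMA(x, order=(p, 0, q)).fit()   # d = 0: plain ARMA
            if fit.aic < best_aic:
                best_aic, best_order = fit.aic, (p, q)
    return best_order, best_aic
```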

27 / 45
Correct Specification in ARMA model

Let XT := (X1, ..., XT) be a subset of a process {Xt} generated by an ARMA(p, q) model,

$$X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}.$$

Then: Estimation of an ARMA(p∗, q∗) model with p∗ < p or q∗ < q introduces autocorrelation in the residuals!

Example:
• DGP is an AR(2): Xt = φ1 Xt−1 + φ2 Xt−2 + εt.
• Estimated model is an AR(1): Xt = φ1 Xt−1 + ut.
⇒ ut = εt + φ2 Xt−2 has autocorrelation because {Xt} has autocorrelation!

28 / 45
Correct Specification in ARMA model

Conclusion: Autocorrelation in the residuals is a sign of misspecification of the ARMA model! (p or q might be too low!)

Conclusion: Testing for autocorrelation in the residuals is important!

29 / 45
Residual Sample Autocorrelation

Note: A host of other autocorrelation tests can be performed! You already know these! Just recap them from previous courses!
30 / 45
Autoregressive Modelling Tips

Let XT := (X1, ..., XT) be a subset of a process {Xt} generated by an AR(p) model,

$$X_t = \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \varepsilon_t.$$

General-to-Specific (sketched in code below):
(1) Estimate an AR(p) for large p.
(2) Check for autocorrelation (or other signs of incorrect specification!).
(3) Eliminate the least significant lag (large p-value! what is large?).
(4) Repeat steps (1)–(3) until all remaining lags appear significant or significant autocorrelation appears in the residuals.
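A sketch of this loop, assuming statsmodels' AutoReg and using only the p-value criterion; the residual autocorrelation check of step (2) is marked in a comment:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def general_to_specific(x, max_lag=8, alpha=0.05):
    """Start from AR(max_lag); drop the least significant lag
    one at a time until all remaining lags are significant."""
    lags = list(range(1, max_lag + 1))
    while lags:
        res = AutoReg(x, lags=lags).fit()
        # res.pvalues: intercept first, then one entry per retained lag.
        pvals = np.asarray(res.pvalues)[1:]
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            return res   # step (2): also check residual autocorrelation here
        del lags[worst]  # eliminate the least significant lag and re-estimate
    return None
```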

31 / 45
ADL(p, q) Model Specification

Important: Let the DGP of {(Yt, Xt)} be given by an ADL(p, q) model. Then, in general, estimation of an ADL(p∗, q∗) with p∗ < p or q∗ < q will result in a regression with autocorrelation in the residuals.

Conclusion: Autocorrelation in the ADL regression residuals is a sign of model misspecification!

32 / 45
Regression Residuals and ADL(p, q)

Let the dynamics of {Yt} be given by an ADL(1, 0) model,

$$Y_t = \alpha + \phi Y_{t-1} + \beta X_t + \varepsilon_t.$$

Suppose that we regress Yt on Xt only,

$$Y_t = \alpha + \beta X_t + u_t.$$

The static regression model is misspecified, and ut = φYt−1 + εt is correlated with its lag ut−1 = φYt−2 + εt−1.

33 / 45
Innovation Properties: Normality tests
⇒ The ML estimator for the AR(1) was derived under the assumption that the innovations are Gaussian!
⇒ The forecast confidence bounds are derived assuming that the innovations are Gaussian!

Question: Can we test whether the innovations are Gaussian?
Answer: Yes! Using the Jarque-Bera statistic:

$$JB = \frac{T - k + 1}{6} \Big( \hat{\mu}_3^2 + \frac{1}{4}(\hat{\mu}_4 - 3)^2 \Big),$$

where k = number of regressors, µ̂3 is the sample skewness and µ̂4 is the sample kurtosis.
H0: the data is normally distributed; H1: the data is not normal.
JB ∼ χ²(2) under H0. (A code sketch follows below.)
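A minimal sketch of the test in code; note that scipy's jarque_bera uses the T/6 scaling without the regressor correction k shown above:

```python
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(1)
residuals = rng.standard_normal(500)  # stand-in for estimated ARMA/ADL residuals

stat, pvalue = jarque_bera(residuals)
# Under H0 (normality), JB ~ chi-squared with 2 degrees of freedom.
print(f"JB = {stat:.3f}, p-value = {pvalue:.3f}")
```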
34 / 45
Eviews Output: Residuals of ADL

35 / 45
Eviews Output: Jarque-Bera Test

36 / 45
Eviews Output: Jarque-Bera Test

37 / 45
Innovation Properties: White Noise

⇒ The point forecasts and IRFs of ARMA and ADL models were derived under the assumption that the innovations are white noise!

Question: Can we test whether the innovations are white noise? Can we test whether the innovations are uncorrelated and have fixed variance (mean zero is ensured by the intercept!)?

Answer: Yes! Using autocorrelation tests and heteroskedasticity tests!

38 / 45
Autocorrelation and Heteroskedasticity Tests

Durbin-Watson: $$d = \sum_{t=2}^{T}(u_t - u_{t-1})^2 \Big/ \sum_{t=1}^{T} u_t^2$$
DW ≈ 2: no autocorrelation; DW < 2: positive autocorrelation.

Ljung-Box Q-test statistic: $$Q = T(T+2)\sum_{k=1}^{p} \hat{\rho}_k^2/(T-k)$$
H0: no autocorrelation; H1: autocorrelation.

Breusch-Godfrey test: auxiliary regression $$u_t = \alpha_0 + \alpha_1 X_t + \rho_1 u_{t-1} + \ldots + \rho_p u_{t-p} + v_t$$
H0: no autocorrelation; H1: autocorrelation.
Test statistic: T × R² ∼ χ²(p) under H0.

Breusch-Pagan test: auxiliary regression $$u_t^2 = \alpha_0 + \alpha_1 X_t + \ldots + \alpha_p X_{t-p} + v_t$$
H0: homoskedasticity; H1: heteroskedasticity.
Test statistic: F test for α1 = ... = αp = 0. (A code sketch of these diagnostics follows below.)
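A sketch running these four diagnostics with statsmodels on the residuals of a toy OLS regression (the EViews output on the following slides reports the same statistics):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import (acorr_ljungbox,
                                          acorr_breusch_godfrey,
                                          het_breuschpagan)

rng = np.random.default_rng(2)
x = rng.standard_normal(200)
y = 1.0 + 0.5 * x + rng.standard_normal(200)

X = sm.add_constant(x)
ols_res = sm.OLS(y, X).fit()
u = ols_res.resid

print("Durbin-Watson:", durbin_watson(u))          # approx. 2 under no autocorrelation
print(acorr_ljungbox(u, lags=[10]))                # Ljung-Box Q-stat and p-value
lm, lm_pval, f, f_pval = acorr_breusch_godfrey(ols_res, nlags=4)  # BG LM test
bp_lm, bp_pval, bp_f, bp_f_pval = het_breuschpagan(u, X)          # Breusch-Pagan
```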
39 / 45
Eviews Output: Autocorrelation Test (Q-Stat)

40 / 45
Eviews Output: Autocorrelation Test (BG LM)

41 / 45
Eviews Output: Heteroeskedasticity Test (BPG)

42 / 45
Model Validation

Important: time-series models can be evaluated and compared by their forecasting accuracy.

Problem: in-sample fit is not a good measure due to the problem of over-fitting.

Important idea: split the sample in two parts. Use the first for estimation and the second for evaluating the forecasting performance of the model... this is called sub-sample validation!

Example: Suppose you have a sample of T observations, DT = X1, ..., XT.
1. use Dn = X1, ..., Xn for estimation (n < T)
2. use Xn+1, Xn+2, ..., XT for validation!

43 / 45
Sub-sample validation with repeated static forecasts

Validation in practice: We have a sample DT = X1, ..., XT.
1. use Dn = X1, ..., Xn for estimation (n < T)
2. produce a 1-step-ahead (i.e. static) forecast X̂n+1 using Dn; compare against the true value Xn+1
3. produce a 1-step-ahead (i.e. static) forecast X̂n+2 using Dn+1; compare against the true value Xn+2
4. continue until you use the entire sample! (A code sketch follows below.)

Note: the forecast is called dynamic when we use past forecasts as regressors for obtaining the next forecast.
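A minimal sketch of the procedure, assuming an AR(1) fitted with statsmodels and re-estimated at every step (the slide leaves open whether parameters are re-estimated; doing so is a common choice):

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(3)
x = np.zeros(276)
for t in range(1, 276):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()

n = 230  # estimation sample size; observations n+1, ..., T form the validation sample
forecasts = np.empty(len(x) - n)
for i, t in enumerate(range(n, len(x))):
    fit = AutoReg(x[:t], lags=1).fit()   # estimate on D_t = x_1, ..., x_t
    forecasts[i] = fit.forecast(1)[0]    # static 1-step-ahead forecast of x_{t+1}
actuals = x[n:]
```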

44 / 45
Sub-sample validation with repeated static forecasts
Validation: the forecast root mean squared error (FRMSE) and the forecast mean absolute error (FMAE) can be used to evaluate the quality of the forecasts!

$$\mathrm{FRMSE} = \Big( \frac{1}{T-n} \sum_{t=n+1}^{T} (X_t - \hat{X}_t)^2 \Big)^{1/2} \qquad \text{(strong penalty for outliers)}$$

$$\mathrm{FMAE} = \frac{1}{T-n} \sum_{t=n+1}^{T} \big| X_t - \hat{X}_t \big| \qquad \text{(smaller penalty for outliers)}$$
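Continuing the sketch from the previous slide, the two criteria are one line each:

```python
import numpy as np

# 'actuals' and 'forecasts' are the validation-sample arrays built above.
frmse = np.sqrt(np.mean((actuals - forecasts) ** 2))  # strong penalty for outliers
fmae = np.mean(np.abs(actuals - forecasts))           # smaller penalty for outliers
```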

                     Model A   Model B   Model C
Total Sample Size        276       276       276
Estimation Sample        230       230       230
Validation Sample         46        46        46
FRMSE                   27.1      28.9      18.4
FMAE                    22.2      21.8      16.1

45 / 45
