Notes: Linear Time Series
1 Introduction
A time series arises when observations of a quantity are collected at successive time points. Examples
include tracking economic indicators, monitoring industrial operations, or recording meteorological
conditions. Mathematically, a time series can be represented by a stochastic process {Yt , t ∈ T },
where the index set T may be discrete (e.g., {0, 1, 2, . . .}) or continuous (e.g., [0, ∞)). In this
course, we will restrict our attention to the discrete case.
Typically, one observes a single realization of a time series over a finite time horizon, such as y1 , y2 , . . . , yn .
The objective of time series analysis is to find an appropriate stochastic model that could
plausibly generate the observed data. Such a model can serve several purposes: 1) to provide
insight into the mechanism underlying the observed phenomenon, 2) to forecast future outcomes, 3)
to monitor and control processes (e.g., in manufacturing, where adjustable systems yield sequences of
measurements), and 4) to explain the dynamics of one time series using information from another, in
a way that is closely related to regression analysis.
Time series data are encountered across a wide range of scientific domains, and the study of such data
is of fundamental importance for both theoretical research and practical decision-making:
• In economics, common examples include daily stock market prices or monthly unemployment figures.
• In the social sciences, population-related series such as birth rates or school enrollment are of
interest.
• In epidemiology, one may analyze the number of recorded influenza cases during a given period.
Asset returns, such as the log return of a stock, can be represented as a time series {rt }, where values
evolve over time. The most general formulation for the returns {rit ; i = 1, . . . , N ; t = 1, . . . , T } is
given by their joint distribution function
$$F_r(r_{11}, \ldots, r_{N1};\; r_{12}, \ldots, r_{N2};\; \ldots;\; r_{1T}, \ldots, r_{NT};\; \theta),$$
where θ is a parameter vector that uniquely determines the distribution Fr(·). The probability distribution Fr(·) governs the stochastic behavior of the returns rit. In practice, the empirical analysis of asset returns typically aims at estimating the unknown parameter vector θ and making statistical inferences about the behavior of {rit} based on previously observed returns.
Focusing on a single stock, our interest is in the joint distribution of {rit}, t = 1, . . . , T. It is useful to factorize this distribution as
$$F(r_{i1}, \ldots, r_{iT}; \theta) = F(r_{i1})\, F(r_{i2} \mid r_{i1}) \cdots F(r_{iT} \mid r_{i,T-1}, \ldots, r_{i1}) = F(r_{i1}) \prod_{t=2}^{T} F(r_{it} \mid r_{i,t-1}, \ldots, r_{i1}).$$
This decomposition highlights the temporal dependence of the log returns rit. The essential modeling problem is the specification of the conditional distribution F(rit | ri,t−1, . . .), and in particular how this distribution evolves over time.
2 Linear Time Series Models
Linear time series analysis provides a framework for studying the dynamic behavior of a time series. For asset returns,
these models focus on the connection between current returns and information available from the
past. Historical return data as well as external economic factors can influence this relationship.
Correlation plays a crucial role, capturing the dependence between present and lagged values. In time
series analysis, these correlations, known as serial correlations or autocorrelations, form the basis for
analyzing stationary processes.
• Time series are modeled as stochastic processes, that is, processes describing a “statistical phenomenon that develops over time according to the laws of probability”. The process is a sequence of random variables
R0 , R1 , R2 , . . . ,
of which we observe a single realization
r1 , r2 , . . . , rT .
• The purpose of time series analysis is to find the probability distribution of a stochastic process
through the observed time series.
• Instead of the full probability distribution, we often focus on examining the moments of the process. The mean and autocovariance functions are defined as
$$\mu_t = E(r_t) \quad\text{and}\quad \gamma(s, t) = \operatorname{Cov}(r_s, r_t) = E[(r_s - \mu_s)(r_t - \mu_t)].$$
Stationary Process
Strictly Stationary: A time series {rt } is strictly stationary if the joint distribution of (rt1 , . . . , rtk )
is unchanged under time shifts for any choice of (t1 , . . . , tk ).
This is a very strong assumption and is rarely testable empirically. A weaker and more practical
notion is weak stationarity.
Weak Stationarity: A time series {rt} is weakly stationary if both the mean of rt and the covariance between rt and rt−ℓ are time-invariant, where ℓ is an arbitrary integer. More specifically, {rt} is weakly stationary if
(i) E(rt) = µ, a constant mean, and
(ii) Cov(rt, rt−ℓ) = γℓ, which depends only on the lag ℓ.
In practice, weak stationarity implies that the time series fluctuates around a fixed level with constant
variance, which allows reliable inference and prediction.
Weak stationarity assumes that the first two moments of rt are finite. If rt is strictly stationary with
finite first and second moments, then it is also weakly stationary. The converse, however, does not
necessarily hold, except when rt is normally distributed, in which case weak and strict stationarity
coincide.
It is often more practical to work with the autocorrelation function (ACF). The ACF at lag ℓ is defined as
$$\rho_\ell = \frac{\gamma_\ell}{\gamma_0} = \frac{\operatorname{Cov}(r_t, r_{t-\ell})}{E[(r_t - \mu)^2]},$$
where µ is the mean of the process and γ0 = Var(rt). This normalization ensures that ρℓ is scale-free, making it easier to compare dependence structures across different time series.
For a given sample of returns {rt}, t = 1, . . . , T, let $\bar r = \sum_{t=1}^{T} r_t / T$ be the sample mean. Then the lag-1 sample autocorrelation of rt is
$$\hat\rho_1 = \frac{\sum_{t=2}^{T} (r_t - \bar r)(r_{t-1} - \bar r)}{\sum_{t=1}^{T} (r_t - \bar r)^2}.$$
The statistics $\hat\rho_1, \hat\rho_2, \ldots$ are called the sample autocorrelation function (ACF) of rt. The autocorrelation function plays a central role in the analysis of linear time series. In fact, a linear time series model can be fully described by its ACF, and linear time series modeling relies on the sample ACF to capture the linear dynamics present in the data.
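The lag-ℓ sample autocorrelation generalizes the lag-1 formula above by summing products of deviations ℓ steps apart. A minimal sketch in Python follows; the function name and the NumPy dependency are illustrative choices, not part of the notes.

```python
import numpy as np

def sample_acf(r, max_lag):
    """Sample autocorrelations rho_hat_1, ..., rho_hat_max_lag of a series r."""
    r = np.asarray(r, dtype=float)
    dev = r - r.mean()
    gamma0 = np.sum(dev ** 2)                    # sum of squared deviations (lag-0 term)
    acf = []
    for lag in range(1, max_lag + 1):
        num = np.sum(dev[lag:] * dev[:-lag])     # sum_{t=lag+1}^{T} (r_t - r_bar)(r_{t-lag} - r_bar)
        acf.append(num / gamma0)
    return np.array(acf)
```

For a white noise sample, these estimates should fluctuate around zero at all lags.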
Remark 1. In time series modeling, the assumption of stationarity is fundamental for several reasons:
(i) Constant mean and variance: A stationary process has statistical properties, such as mean
and variance, that do not change over time. This constancy simplifies estimation, improves
interpretability, and ensures the stability of the model.
(ii) Autocorrelation structure: For stationary series, the autocorrelation function depends only
on the lag between observations, not on the specific time at which the correlation is computed.
This property is crucial since the autocorrelation function is central to most time series methods.
(iii) Model reliability: Stationarity ensures that the model’s behavior remains stable over time. In
contrast, non-stationary series often exhibit trends or seasonality, which can complicate estima-
tion and forecasting.
(iv) Theoretical basis: Many commonly used models, such as ARMA and ARIMA, rely on the
stationarity assumption. In practice, non-stationary series are frequently transformed into sta-
tionary ones before modeling.
It is important to note that not all time series are stationary. When non-stationarity is present,
specialized methods or transformations are typically required.
Remark 2. A structured procedure for analyzing time series data usually involves the following steps:
(i) Inspect the series visually to detect patterns such as trends, seasonality, or unusual observations.
(ii) Apply suitable transformations (for example, differencing or a variance-stabilizing transformation) to remove trends or seasonality and obtain an approximately stationary series.
(iii) Fit candidate models to the transformed data to capture its underlying dynamics.
(iv) Choose the most suitable model using statistical criteria and diagnostic checks of model fit.
(v) Generate forecasts for the transformed series, and then apply inverse transformations to obtain
predictions for the original time series.
3 White Noise and Linear Time Series
A sequence of independent and identically distributed random variables with mean 0 and finite variance σ² is denoted
$$\{r_t\} \sim \mathrm{IID}(0, \sigma^2).$$
More generally, a discrete process {rt} is white noise if it consists of a sequence of uncorrelated random variables with an expected value of 0 and a variance of σ², that is,
$$E(r_t) = 0 \quad\text{and}\quad \operatorname{Cov}(r_s, r_t) = \begin{cases} \sigma^2, & s = t,\\ 0, & s \neq t. \end{cases}$$
We denote
$$\{r_t\} \sim \mathrm{WN}(0, \sigma^2).$$
A time series {rt} is said to be linear if it can be written as
$$r_t = \mu + \sum_{i=0}^{\infty} \psi_i a_{t-i},$$
where µ is the mean of rt, ψ0 = 1, and {at} is a white noise sequence. Here, at captures the new information or innovation at time t, often referred to as a shock.
The dynamic structure of {rt} is determined by the sequence {ψi}, known as the ψ-weights. If the series is weakly stationary, its mean and variance are
$$E(r_t) = \mu, \quad\text{and}\quad \operatorname{Var}(r_t) = \sigma_a^2 \sum_{i=0}^{\infty} \psi_i^2,$$
where σa² is the variance of at. For stationarity, the series $\sum_{i=0}^{\infty} \psi_i^2$ must converge, which implies that ψi → 0 as i → ∞. Thus, the influence of past shocks at−i on rt diminishes as i grows larger.
Moreover, the lag-ℓ autocovariance of rt is
$$\gamma_\ell = \operatorname{Cov}(r_t, r_{t-\ell}) = E\!\left[\left(\sum_{i=0}^{\infty} \psi_i a_{t-i}\right)\!\left(\sum_{j=0}^{\infty} \psi_j a_{t-\ell-j}\right)\right] = \sum_{j=0}^{\infty} \psi_{j+\ell}\,\psi_j\, E(a_{t-\ell-j}^2) = \sigma_a^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+\ell}.$$
Therefore, the ψ-weights are closely related to the autocorrelations of the series. For a weakly station-
ary process, ψi → 0 as i → ∞, which leads to ρℓ → 0 as ℓ → ∞. In the context of asset returns, this
indicates that the linear dependence of current returns on very distant past returns becomes negligible
for large lags.
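Since ρℓ = γℓ/γ0 = (Σj ψjψj+ℓ)/(Σj ψj²), the ACF of a linear process can be approximated numerically by truncating the ψ-weight sequence. A small sketch; the function name and truncation length are illustrative assumptions.

```python
import numpy as np

def acf_from_psi(psi, max_lag):
    """Approximate ACF of a linear process from a truncated psi-weight sequence (psi[0] = 1)."""
    psi = np.asarray(psi, dtype=float)
    gamma0 = np.sum(psi ** 2)                    # proportional to Var(r_t)
    rho = []
    for lag in range(1, max_lag + 1):
        # sum_j psi_j * psi_{j+lag}, truncated at the end of the psi array
        rho.append(np.sum(psi[:-lag] * psi[lag:]) / gamma0)
    return np.array(rho)

# Example: geometrically decaying weights psi_i = phi**i give a geometrically decaying ACF.
phi = 0.6
psi = phi ** np.arange(200)
print(acf_from_psi(psi, 5))   # approximately [0.6, 0.36, 0.216, ...]
```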
Remark 3. Backward shift operator: With the backward shift operator B, the linear process above can be expressed as
$$r_t = \mu + \psi(B) a_t, \quad\text{where } \psi(B) = \sum_{j=0}^{\infty} \psi_j B^j.$$
The backward shift operator B in ψ(B) is an operator that “shifts” the time series backwards: when B acts on a term at, it replaces it with at−1, B² replaces at with at−2, and so on.
4 Simple Autoregressive Models
An autoregressive process of order 1, AR(1), is defined as
rt = ϕ0 + ϕ1 rt−1 + at ,
where {at } is assumed to be a white noise process with mean zero and variance σa2 . This form is similar
to a simple linear regression model, with rt as the dependent variable and rt−1 as the explanatory
variable.
The AR(1) model has properties analogous to regression, but there are also key differences. In particular, conditional on rt−1, we have
$$E(r_t \mid r_{t-1}) = \phi_0 + \phi_1 r_{t-1}, \qquad \operatorname{Var}(r_t \mid r_{t-1}) = \sigma_a^2.$$
Thus, given the past return, the current return is centered around ϕ0 + ϕ1 rt−1 with variance σa².
Importantly, the AR(1) model satisfies the Markov property, meaning that once rt−1 is known, past
returns rt−i for i > 1 provide no additional information about rt .
A natural extension of this framework is the autoregressive model of order p, AR(p), defined as
rt = ϕ0 + ϕ1 rt−1 + · · · + ϕp rt−p + at ,
where p is a nonnegative integer and {at } is again white noise. In this case, the past p returns jointly
determine the conditional expectation of rt . The AR(p) model resembles a multiple linear regression,
with lagged returns serving as explanatory variables.
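To build intuition, an AR(p) series can be simulated by drawing white-noise shocks and applying the recursion directly. A minimal sketch; the function name, burn-in length, and the use of Gaussian shocks are illustrative choices, not part of the notes.

```python
import numpy as np

def simulate_ar(phi0, phi, sigma_a, T, burn_in=200, seed=0):
    """Simulate T observations from r_t = phi0 + phi[0]*r_{t-1} + ... + phi[p-1]*r_{t-p} + a_t."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    n = T + burn_in
    r = np.zeros(n)
    a = rng.normal(0.0, sigma_a, size=n)          # Gaussian white-noise shocks
    for t in range(p, n):
        r[t] = phi0 + np.dot(phi, r[t - p:t][::-1]) + a[t]
    return r[burn_in:]                            # discard the burn-in portion

# Example: 500 observations from a stationary AR(2) model.
r = simulate_ar(phi0=0.1, phi=[0.5, -0.3], sigma_a=1.0, T=500)
```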
There are several questions that need to be addressed. First, what are the properties of AR models? Second, how do we identify the order p of an AR time series? Third, how do we estimate the coefficients? Fourth, how do we check the adequacy of a fitted model? Finally, how do we produce forecasts, which are an important application of time series analysis?
4.1 Properties of AR Models
We begin with the AR(1) model.
(a) Taking expectations on both sides of rt = ϕ0 + ϕ1 rt−1 + at and using E(at) = 0 gives
$$E(r_t) = \phi_0 + \phi_1 E(r_{t-1}).$$
Under weak stationarity, E(rt) = E(rt−1) = µ, so that
$$\mu = \phi_0 + \phi_1 \mu \quad\Longrightarrow\quad E(r_t) = \mu = \frac{\phi_0}{1 - \phi_1}.$$
(b) The mean is zero if and only if ϕ0 = 0.
Hence, for a stationary AR(1) process, the constant ϕ0 determines the long-run mean of rt . If
ϕ0 = 0, then E(rt ) = 0.
Remark 4. Using the identity ϕ0 = (1 − ϕ1)µ, the AR(1) model can be written as
$$r_t - \mu = \phi_1 (r_{t-1} - \mu) + a_t,$$
which, by repeated substitution, is of the form of a linear time series with ψi = ϕ1^i. Thus, the AR(1) process can be expressed as an infinite weighted sum of past shocks {at−i}:
$$r_t - \mu = \sum_{i=0}^{\infty} \phi_1^i a_{t-i}.$$
Next, consider the variance. Taking the variance of both sides of rt − µ = ϕ1(rt−1 − µ) + at gives
$$\operatorname{Var}(r_t) = \phi_1^2 \operatorname{Var}(r_{t-1}) + \sigma_a^2 + 2\phi_1 \operatorname{Cov}(r_{t-1}, a_t),$$
where σa² is the variance of the innovation process {at}. Note that rt − µ can be expressed as a linear combination of past shocks at−i. Since {at} is a white noise sequence, it follows that
$$E[(r_t - \mu)\, a_{t+1}] = 0,$$
so rt−1 and at are uncorrelated and the covariance term vanishes. Thus, we have
$$\operatorname{Var}(r_t) = \phi_1^2 \operatorname{Var}(r_{t-1}) + \sigma_a^2.$$
Under weak stationarity, Var(rt) = Var(rt−1), and hence
$$\operatorname{Var}(r_t) = \frac{\sigma_a^2}{1 - \phi_1^2}.$$
The condition ϕ1² < 1 ensures that the variance is finite and positive. This requirement leads to the weak stationarity condition of the AR(1) model, which implies
$$-1 < \phi_1 < 1.$$
By a similar argument, multiplying rt − µ = ϕ1(rt−1 − µ) + at by rt−ℓ − µ and taking expectations gives, for a weakly stationary AR(1) process,
$$\operatorname{Var}(r_t) = \gamma_0 = \frac{\sigma_a^2}{1 - \phi_1^2}, \quad\text{and}\quad \gamma_\ell = \phi_1 \gamma_{\ell-1}, \quad \ell > 0,$$
where γℓ = Cov(rt, rt−ℓ). Dividing by γ0, the ACF satisfies
$$\rho_\ell = \phi_1 \rho_{\ell-1}, \quad \ell > 0, \qquad\text{so that}\qquad \rho_\ell = \phi_1^{\ell}.$$
This result shows that the ACF of a weakly stationary AR(1) process decays exponentially with lag ℓ, at a rate determined by ϕ1, starting from ρ0 = 1 (see Figure 1).
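The exponential decay can be checked numerically by comparing ϕ1^ℓ with the sample ACF of a simulated AR(1) path. A brief self-contained sketch, assuming a zero-mean Gaussian AR(1) for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
phi1, T = 0.7, 2000
r = np.zeros(T)
for t in range(1, T):                          # zero-mean AR(1): r_t = phi1 * r_{t-1} + a_t
    r[t] = phi1 * r[t - 1] + rng.normal()

dev = r - r.mean()
for lag in range(1, 6):
    rho_hat = np.sum(dev[lag:] * dev[:-lag]) / np.sum(dev ** 2)
    print(lag, round(phi1 ** lag, 3), round(rho_hat, 3))   # theoretical vs. sample ACF
```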
AR(2) Model
The AR(2) model takes the form
$$r_t = \phi_0 + \phi_1 r_{t-1} + \phi_2 r_{t-2} + a_t.$$
Multiplying the mean-adjusted model rt − µ = ϕ1(rt−1 − µ) + ϕ2(rt−2 − µ) + at by rt−ℓ − µ, taking expectations, and using E[(rt−ℓ − µ)at] = 0 for ℓ > 0, we obtain
$$\gamma_\ell = \phi_1 \gamma_{\ell-1} + \phi_2 \gamma_{\ell-2}, \quad \ell > 0.$$
This result is referred to as the moment equation of a stationary AR(2) model. Dividing the above equation by γ0, we have the property
$$\rho_\ell = \phi_1 \rho_{\ell-1} + \phi_2 \rho_{\ell-2}, \quad \ell > 0.$$
In particular, for ℓ = 1, since ρ−1 = ρ1,
$$\rho_1 = \phi_1 \rho_0 + \phi_2 \rho_{-1} = \phi_1 + \phi_2 \rho_1,$$
so that
$$\rho_1 = \frac{\phi_1}{1 - \phi_2}, \qquad \rho_\ell = \phi_1 \rho_{\ell-1} + \phi_2 \rho_{\ell-2}, \quad \ell \geq 2.$$
Thus, the ACF of a stationary AR(2) series satisfies the second-order difference equation
$$(1 - \phi_1 B - \phi_2 B^2)\, \rho_\ell = 0.$$
This difference equation determines the properties of the ACF of a stationary AR(2) time series. It
also determines the behavior of the forecasts of rt .
The corresponding characteristic equation is 1 − ϕ1 x − ϕ2 x² = 0, and the AR(2) process is stationary when both of its roots lie outside the unit circle. Under such a condition, the recursive equation of the ACF ensures that the ACF of the model converges to 0 as the lag ℓ increases. This convergence property is a necessary condition for a stationary time series.
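The AR(2) ACF can be generated directly from this recursion, starting from ρ0 = 1 and ρ1 = ϕ1/(1 − ϕ2). A minimal sketch; the function name and the example coefficients are illustrative.

```python
def ar2_acf(phi1, phi2, max_lag):
    """ACF of a stationary AR(2) process via rho_l = phi1*rho_{l-1} + phi2*rho_{l-2}."""
    rho = [1.0, phi1 / (1.0 - phi2)]             # rho_0 and rho_1
    for _ in range(2, max_lag + 1):
        rho.append(phi1 * rho[-1] + phi2 * rho[-2])
    return rho

# Complex characteristic roots (e.g., phi1 = 1.2, phi2 = -0.8) produce a damped cosine-like ACF.
print([round(v, 3) for v in ar2_acf(1.2, -0.8, 8)])
```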
AR(p) Model
Extending the results obtained for the AR(1) and AR(2) processes, the mean of a stationary AR(p) process is
$$E(r_t) = \frac{\phi_0}{1 - \phi_1 - \cdots - \phi_p},$$
as long as the denominator is nonzero. The stationarity of the AR(p) process depends on the characteristic equation of the model:
$$1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p = 0.$$
If all solutions (roots) of this equation lie outside the unit circle (i.e., their moduli are greater than
one), then the process {rt } is stationary.
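This condition can be checked numerically by finding the roots of the characteristic polynomial. A brief sketch using NumPy; the helper name is an illustrative assumption.

```python
import numpy as np

def is_stationary_ar(phi):
    """Check stationarity of an AR(p) model with coefficients phi = [phi_1, ..., phi_p].

    The characteristic polynomial is 1 - phi_1 x - ... - phi_p x^p; all roots must
    lie outside the unit circle (modulus strictly greater than one).
    """
    # np.roots expects coefficients from the highest power down to the constant term.
    coeffs = [-c for c in phi[::-1]] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stationary_ar([0.5, -0.3]))   # True: a stationary AR(2)
print(is_stationary_ar([1.1]))         # False: explosive AR(1) with phi_1 > 1
```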
For a stationary AR(p) process, the autocorrelation function (ACF) satisfies the difference equation
$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)\, \rho_\ell = 0, \quad \ell > 0.$$
As a result, the ACF of an AR(p) process can exhibit a variety of shapes, including mixtures of
exponentially decaying patterns and damped sine or cosine waves, depending on the location of the
characteristic roots.
4.2 Identifying the Order of an AR Model
Partial Autocorrelation Function (PACF)
The partial autocorrelation function (PACF) of a stationary time series is closely related to the ACF
and serves as a useful tool for identifying the appropriate order p of an autoregressive (AR) model.
One way to introduce the PACF is by considering AR models of increasing order:
$$\begin{aligned}
r_t &= \phi_{0,1} + \phi_{1,1} r_{t-1} + e_{1t},\\
r_t &= \phi_{0,2} + \phi_{1,2} r_{t-1} + \phi_{2,2} r_{t-2} + e_{2t},\\
r_t &= \phi_{0,3} + \phi_{1,3} r_{t-1} + \phi_{2,3} r_{t-2} + \phi_{3,3} r_{t-3} + e_{3t},\\
&\ \vdots
\end{aligned}$$
Here, ϕ0,j is the constant, ϕi,j are the coefficients of the lagged terms, and ejt represents the error in the AR(j) model.
These models resemble multiple regression equations, and their parameters can be estimated by least squares. Note that
• the estimate ϕ̂1,1 from the first equation is called the lag-1 sample PACF of rt,
• the estimate ϕ̂2,2 from the second equation is the lag-2 sample PACF of rt,
• the estimate ϕ̂3,3 from the third equation is the lag-3 sample PACF of rt,
• and so forth.
The lag-2 PACF ϕ̂2,2 measures the incremental effect of rt−2 on rt , after controlling for rt−1 (the AR(1)
component). Similarly, the lag-3 PACF reflects the additional contribution of rt−3 beyond the AR(2)
model, and so on. For an AR(p) process, the lag-p PACF is nonzero, but for lags greater than p, the
PACF values should be approximately zero. This truncation property of the PACF is commonly used
to determine the appropriate order p.
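The sample PACF can therefore be obtained by fitting AR(j) regressions of increasing order and recording the last coefficient of each fit. A small sketch using ordinary least squares via NumPy; the function name is an illustrative assumption, not part of the notes.

```python
import numpy as np

def sample_pacf(r, max_lag):
    """Sample PACF: for each j, fit AR(j) by least squares and keep the lag-j coefficient."""
    r = np.asarray(r, dtype=float)
    pacf = []
    for j in range(1, max_lag + 1):
        # Design matrix with a constant and lags 1..j; usable rows start at t = j.
        y = r[j:]
        X = np.column_stack([np.ones(len(y))] + [r[j - i:-i] for i in range(1, j + 1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        pacf.append(beta[-1])                     # phi_hat_{j,j}
    return np.array(pacf)

# For data generated by an AR(p) model, PACF values beyond lag p should be near zero.
```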
Parameter Estimation
For a specified AR(p) model, the conditional least-squares method, which starts with the (p + 1)th
observation, is often used to estimate the parameters. Specifically, conditioning on the first p obser-
vations, we have
rt = ϕ0 + ϕ1 rt−1 + · · · + ϕp rt−p + at , t = p + 1, . . . , T,
which is in the form of a multiple linear regression and can be estimated by the least-squares method.
Denote the estimate of ϕi by ϕ̂i. The fitted model is
$$\hat r_t = \hat\phi_0 + \hat\phi_1 r_{t-1} + \cdots + \hat\phi_p r_{t-p},$$
and the associated residual is
$$\hat a_t = r_t - \hat r_t.$$
The series {ât} is called the residual series, from which we obtain the estimated innovation variance
$$\hat\sigma_a^2 = \frac{\sum_{t=p+1}^{T} \hat a_t^2}{T - 2p - 1}.$$
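Conditional least squares thus reduces to an ordinary regression of rt on a constant and its first p lags, using observations t = p + 1, . . . , T. A minimal sketch; the function name and the use of NumPy's lstsq are illustrative choices.

```python
import numpy as np

def fit_ar_cls(r, p):
    """Conditional least-squares fit of an AR(p) model; returns coefficients, residuals, sigma_a^2 estimate."""
    r = np.asarray(r, dtype=float)
    y = r[p:]                                     # observations t = p+1, ..., T
    X = np.column_stack([np.ones(len(y))] + [r[p - i:-i] for i in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # [phi0_hat, phi1_hat, ..., phip_hat]
    resid = y - X @ beta                          # residual series a_hat_t
    sigma2_hat = np.sum(resid ** 2) / (len(r) - 2 * p - 1)
    return beta, resid, sigma2_hat
```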
After fitting the model, it is essential to carry out diagnostic checks to verify that the residuals {ât} behave like a white noise process. The residual series is an estimate of the white noise process that generates {rt}; if the model is adequate, the residuals should share the properties of white noise, namely uncorrelatedness and homoscedasticity. Additionally, if normality is assumed for the error process, the residuals should be approximately normal.
Goodness of fit
A commonly used statistic to measure the goodness of fit of a stationary model is R-squared (R²), defined as
$$R^2 = 1 - \frac{\text{residual sum of squares}}{\text{total sum of squares}}.$$
For a stationary AR(p) time series model with T observations {rt | t = 1, . . . , T}, the measure becomes
$$R^2 = 1 - \frac{\sum_{t=p+1}^{T} \hat a_t^2}{\sum_{t=p+1}^{T} (r_t - \bar r)^2},$$
where $\bar r = \frac{1}{T-p} \sum_{t=p+1}^{T} r_t$. It is easy to show that 0 ≤ R² ≤ 1. Typically, a larger R² indicates that the model provides a closer fit to the data.
For a given data set, it is well known that R² is a nondecreasing function of the number of parameters used. To overcome this weakness, an adjusted R² has been proposed, defined as
$$\text{Adj-}R^2 = 1 - \frac{\text{variance of residuals}}{\text{variance of } r_t} = 1 - \frac{\hat\sigma_a^2}{\hat\sigma_r^2},$$
where σ̂r² is the sample variance of rt. This measure takes into account the number of parameters used in the fitted model. Expanding it with the degrees-of-freedom adjustments, we can write
$$\text{Adj-}R^2 = 1 - \frac{\hat\sigma_a^2}{\hat\sigma_r^2} = 1 - \frac{\frac{1}{T-p-1}\sum_{t=p+1}^{T} \hat a_t^2}{\frac{1}{T-1}\sum_{t=1}^{T} (r_t - \bar r)^2}.$$
Thus, compared to the unadjusted R², the adjusted version increases the denominator's degrees of freedom (using T − 1) but penalizes the numerator by dividing the residual sum of squares by (T − p − 1) instead of T, to account for the number of estimated parameters. However, the adjusted R² is no longer guaranteed to lie between 0 and 1.
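As a small numerical sketch of these two measures, assuming the series and the residuals of a fitted AR(p) model are available (the function name is illustrative):

```python
import numpy as np

def r_squared_measures(r, resid, p):
    """R-squared and adjusted R-squared for an AR(p) fit with residuals resid (t = p+1, ..., T)."""
    r = np.asarray(r, dtype=float)
    resid = np.asarray(resid, dtype=float)
    T = len(r)
    rss = np.sum(resid ** 2)                                 # residual sum of squares
    tss = np.sum((r[p:] - r[p:].mean()) ** 2)                # total sum of squares, effective sample
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (rss / (T - p - 1)) / (np.sum((r - r.mean()) ** 2) / (T - 1))
    return r2, adj_r2
```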
4.3 Forecasting
For the AR(p) model, suppose that we are at the time index h and are interested in forecasting rh+ℓ ,
where ℓ ≥ 1. The time index h is called the forecast origin and the positive integer ℓ is the forecast
horizon.
1-Step-Ahead Forecast
From the AR(p) model, we have
$$r_{h+1} = \phi_0 + \phi_1 r_h + \cdots + \phi_p r_{h+1-p} + a_{h+1}.$$
The optimal forecast of rh+1 based on the information set Fh is the conditional expectation, given by
$$\hat r_h(1) = E(r_{h+1} \mid F_h) = \phi_0 + \sum_{i=1}^{p} \phi_i r_{h+1-i},$$
and the associated forecast error is
$$e_h(1) = r_{h+1} - \hat r_h(1) = a_{h+1}.$$
In econometric terminology, a_{h+1} is commonly called the shock of the series at time h + 1. The variance of this one-step-ahead forecast error is, therefore, Var[e_h(1)] = σa².
2-Step-Ahead Forecast
From the AR(p) model, rh+2 = ϕ0 + ϕ1 rh+1 + ϕ2 rh + · · · + ϕp rh+2−p + ah+2. Taking the conditional expectation with respect to Fh, the two-step-ahead forecast becomes
$$\hat r_h(2) = E(r_{h+2} \mid F_h) = \phi_0 + \phi_1 \hat r_h(1) + \phi_2 r_h + \cdots + \phi_p r_{h+2-p},$$
with forecast error
$$e_h(2) = r_{h+2} - \hat r_h(2) = a_{h+2} + \phi_1 a_{h+1},$$
whose variance is $\operatorname{Var}[e_h(2)] = (1 + \phi_1^2)\,\sigma_a^2 \geq \operatorname{Var}[e_h(1)]$, which indicates that as the forecast horizon grows, the uncertainty in the forecast also increases. This observation is consistent with intuition. The same recursive process can be used for multistep-ahead forecasts, with lagged values beyond the forecast origin replaced by their own forecasts.
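A minimal sketch of this recursive multistep forecast, in which future values on the right-hand side are replaced by their own forecasts; the function name and inputs are illustrative assumptions.

```python
import numpy as np

def forecast_ar(r, phi0, phi, horizon):
    """Recursive ell-step-ahead forecasts from an AR(p) model at forecast origin h = len(r)."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    history = list(np.asarray(r, dtype=float)[-p:])   # last p observed values
    forecasts = []
    for _ in range(horizon):
        # phi_1 * r_{h+ell-1} + ... + phi_p * r_{h+ell-p}, using earlier forecasts where needed
        next_val = phi0 + np.dot(phi, history[::-1][:p])
        forecasts.append(next_val)
        history.append(next_val)
    return np.array(forecasts)

# For a stationary AR(p), these forecasts converge to the unconditional mean as the horizon grows.
```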
Remark 5. For a stationary AR(p) process, the forecast r̂h (ℓ) approaches the expected value E(rt ) as
the forecast horizon ℓ becomes very large. In other words, over the long run, predictions for the series
converge to its unconditional mean. In finance, this behavior is known as mean reversion.