Chapter 3 Methodology
ARIMA Model: Autoregressive Integrated Moving Average (ARIMA)
The ARIMA technique does not assume any particular pattern in the historical data of the
series to be forecast. The applications of an ARIMA model are well documented in Barras
(1983), Box and Jenkins (1976), Chow and Choy (1993), Cleary and Levenbach (1982),
Hanke and Reitsch (1986), Herbst (1992), Nazem (1988), Wasim Shahid Malik (2007),
Shakir Khan and Hela Alghulaiakh (2020), and Tahira Bano Qasim et al. (2021), for example. An
ARIMA model uses an iterative approach of identifying a possible useful model from a
general class of models. The simpler autoregressive and moving average models are actually
special cases of the ARIMA class. Moving averages are popular for identifying turning
points, the points at which a market trend changes direction. The simple models can contain
either autoregressive or moving average components, but not both. A mixed model with both
autoregressive and moving average components is known as an ARMA model; when differencing
(integration) is added, it becomes an ARIMA model. The model helps to track the direction of
changes in prices. An ARIMA model describes a time series in terms of its own past observed
values and errors and can be used to forecast future values. In this study, future stock
values are predicted with ARIMA by first testing an automatically selected (auto-ARIMA)
specification and then building customized ARIMA models to obtain a better forecasting
model. The ARIMA model is applied to ________________ data, which is available for ________.
The model was introduced by Box and Jenkins in 1970. In generating short-term forecasts,
ARIMA models have shown the capability to outperform more complex structural models [3].
The future value of a variable in an ARIMA model is a linear combination of past values
and past errors, expressed as follows:
Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q}        (1)
where Yt is the actual value, εt is the random error at time t, ϕi and θj are the
coefficients, and p and q are integers often referred to as the autoregressive and moving
average orders, respectively.
Building an ARIMA model and determining its accuracy involves three stages (a code sketch of the identification stage follows this list):
1) Model identification using the autocorrelation function (ACF) and partial
autocorrelation function (PACF)
2) Model Estimation and Validation by Mean Absolute Percentage Error (MAPE).
3) Model Application
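As a minimal sketch of the identification stage, the ACF and PACF of a series can be plotted with the statsmodels library; the file name "prices.csv" and the "Close" column are illustrative assumptions, not the study's actual data.

# Stage 1 (model identification): ACF and PACF plots with statsmodels.
# "prices.csv" and the "Close" column are placeholder assumptions.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

series = pd.read_csv("prices.csv", index_col=0, parse_dates=True)["Close"]

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(series.dropna(), lags=24, ax=axes[0])    # ACF pattern suggests the MA order q
plot_pacf(series.dropna(), lags=24, ax=axes[1])   # PACF pattern suggests the AR order p
plt.tight_layout()
plt.show()

In practice the correlograms are usually examined on the differenced series as well, since a trend distorts the ACF.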
The general form of the ARIMA model can be written as:

\phi(B)\,(1 - B)^{d} Y_t = \theta(B)\,\varepsilon_t        (2)
where B represents the backshift operator such that BYt = Yt−1, Yt is the value of the
time series observation at time t, εt is a series of random shocks assumed to be
independently and normally distributed with zero mean and constant variance, and d
represents the order of differencing.
If a series is stationary, then d = 0.
In equation (2), φ(B) is a polynomial of order p in the backshift operator B, defined as:

\phi(B) = 1 - \phi_1 B - \phi_2 B^{2} - \dots - \phi_p B^{p}        (3)

Similarly, θ(B) is defined to be a polynomial of order q in B, such that:

\theta(B) = 1 - \theta_1 B - \theta_2 B^{2} - \dots - \theta_q B^{q}        (4)

The conditions for the existence of the model require
(i) stationarity, that is, statistical equilibrium about a fixed mean, and
(ii) invertibility, which guarantees uniqueness of the representation.
While one may not know in advance the appropriate order of the autoregressive process to fit
to a time series, we examine the time series of the property markets by taking differences of
order 1 of the data. Once a tentative model has been selected, the parameters for that model
must be estimated. The first differences of the two series are each given by:

\Delta Y_t = Y_t - Y_{t-1}        (5), (6)

We now proceed to estimate such a model; a brief code sketch follows.
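As an illustrative sketch only (the file names and the "Close" column are assumptions, not the study's data), the first differences and a tentative parameter estimation can be computed with pandas and statsmodels:

# First differences of two price series and a tentative ARIMA fit.
# "series_a.csv", "series_b.csv" and the "Close" column are placeholders.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

a = pd.read_csv("series_a.csv", index_col=0, parse_dates=True)["Close"]
b = pd.read_csv("series_b.csv", index_col=0, parse_dates=True)["Close"]

# Equations (5)-(6): Delta Y_t = Y_t - Y_{t-1} for each series.
diff_a = a.diff().dropna()
diff_b = b.diff().dropna()

# Estimate the parameters of a tentative model; with d=1 the differencing
# is applied internally, so the original series is passed in.
fit = ARIMA(a, order=(1, 1, 1)).fit()
print(fit.summary())

The tentative order (1, 1, 1) is only an assumed starting point; the order actually retained would follow from the identification stage above.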
Capital Asset Pricing Model (CAPM)
The capital asset pricing model (CAPM) links non-diversifiable risk to expected returns. It
is a finance model that describes the linear relationship between the required return on an
investment and its risk, and it is used for:
a) Pricing risky securities.
b) Generating expected returns for assets, given the risk of those assets.
c) Estimating the cost of capital.
The goal of the CAPM formula is to evaluate:
1) Whether a stock is fairly valued when its risk and the time value of money are
compared with its expected return.
2) Whether the current price of a stock is consistent with its likely return.
The inputs to the model are the asset's beta, the risk-free rate, and the equity risk
premium (the expected return on the market minus the risk-free rate).
We will discuss the model in five sections.
1. The first section deals with the beta coefficient, b, a relative measure of non-
diversifiable risk. It is an index of the degree of movement of an asset's return in
response to a change in the market return, where the market return is the return on the
market portfolio of all traded securities. An asset's historical returns are used to
derive the asset's beta coefficient from return data.
2. The second section presents an equation of the model itself, and
3. The third section graphically describes the relationship between risk and return.
4. The fourth section discusses the effects of changes in inflationary expectations and
risk aversion on the relationship between risk and return.
5. The fifth section offers some comments on the CAPM
The capital asset pricing model (CAPM) is given by the following equation:

E(R_i) = R_f + \beta_i \,\big(E(R_m) - R_f\big)

where E(Ri) = expected return on the investment,
Rf = risk-free rate of return,
βi = beta of the investment, and
E(Rm) − Rf = market risk premium.
The CAPM can be divided into two parts:
(1) The risk-free rate of return, Rf, which is the required return on a risk-free asset,
typically a short-term government security such as a Treasury bill, and
(2) The risk premium. The portion {E(Rm) − Rf} of the risk premium is called the market risk
premium because it represents the premium the investor must receive for taking the average
amount of risk associated with holding the market portfolio of assets.
CAPM & β
The beta of a potential investment is a measure of how much risk the investment will add to a
portfolio that looks like the market.
a) If a stock is riskier than the market, it will have a β > 1.
b) If a stock has a β < 1, the formula assumes it will reduce the risk of a portfolio.
The stock's β is then multiplied by the market risk premium, which is the return expected
from the market above the risk-free rate. The risk-free rate is then added to the product of
the stock's β and the market risk premium. The result gives an investor the required return,
or discount rate, that they can use to find the value of an asset; a brief sketch follows.
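As a minimal sketch under assumed inputs (the return series, risk-free rate, and expected market return below are illustrative placeholders, not the study's estimates), beta and the CAPM required return can be computed as follows:

# CAPM sketch: estimate beta from historical returns, then compute the required return.
# All numeric inputs are illustrative assumptions.
import numpy as np

stock_returns = np.array([0.02, -0.01, 0.03, 0.015, -0.005])     # placeholder returns
market_returns = np.array([0.015, -0.008, 0.025, 0.01, -0.002])  # placeholder returns

# beta = Cov(R_stock, R_market) / Var(R_market)
beta = np.cov(stock_returns, market_returns)[0, 1] / np.var(market_returns, ddof=1)

risk_free_rate = 0.03          # assumed risk-free rate
expected_market_return = 0.08  # assumed expected market return

# E(Ri) = Rf + beta * (E(Rm) - Rf)
required_return = risk_free_rate + beta * (expected_market_return - risk_free_rate)
print(f"beta = {beta:.3f}, required return = {required_return:.3%}")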
The Graph:
In the graph, risk as measured by beta, b, is plotted on the x-axis, and required returns,
r, are plotted on the y-axis. The risk-return trade-off is clearly represented by the
security market line (SML).
Augmented Dickey-Fuller test
The first and foremost issue in the testing procedure is to determine whether the data
contain unit roots, indicating that the data are non-stationary. The most commonly used
test, and the one employed in this study, is the Augmented Dickey-Fuller (ADF) test
developed by Dickey and Fuller. The test is used to check whether variables such as the
GDP growth rate and the gross saving growth rate have a unit root. If the parameter α is
equal to zero, the variable contains a unit root, which means the data are non-stationary.
The Augmented Dickey-Fuller test comes in two forms: one with only an intercept and
another with an intercept and a trend. The form chosen depends on the behaviour of the
variable being tested for a unit root. If the time series variable exhibits a trend, the
Augmented Dickey-Fuller test is conducted with an intercept and a trend; if it exhibits no
trend, the ADF test is performed with only an intercept (Samuel Elias & Abebe Worku, 2015).
The ADF test equation is stated as:
\Delta x_t = \phi_0 + \beta_1 x_{t-1} + \delta t + \sum_{i=1}^{n} \theta_i \Delta x_{t-i} + \varepsilon_t
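As a minimal sketch (the input file is a placeholder), the ADF test with an intercept only (regression='c') and with an intercept and trend (regression='ct') can be run with statsmodels:

# ADF unit-root test: regression='c' = intercept only, 'ct' = intercept and trend.
# "variable.csv" is an assumed file holding the series under test.
import pandas as pd
from statsmodels.tsa.stattools import adfuller

series = pd.read_csv("variable.csv", index_col=0, parse_dates=True).squeeze("columns")

for spec in ("c", "ct"):
    stat, pvalue, usedlag, nobs, crit, icbest = adfuller(series.dropna(), regression=spec)
    print(f"regression={spec}: ADF statistic={stat:.3f}, p-value={pvalue:.3f}, "
          f"5% critical value={crit['5%']:.3f}")

A test statistic more negative than the critical value (or a small p-value) rejects the unit-root null, indicating stationarity.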
The ARIMA model is essentially an approach to economic forecasting based on time-
series data. However, the ARIMA model requires the use of stationary time-series data
(Dickey and Fuller, 1981; Granger and Newbold, 1974; Tse, 1996). Under current practice,
developing such data requires that the observed data series should be tested for unit roots.
The tests for unit roots are also known as Dickey-Fuller (DF) and augmented Dickey-Fuller
(ADF) tests. Typically, the ADF test is based on the following formulation:
\Delta Y_t = \mu + \gamma T + \alpha Y_{t-1} + \sum_{i=1}^{N} \beta_i \Delta Y_{t-i} + u_t
where ∆Yt = Yt − Yt−1, µ is a drift term, T is the time trend (entering with coefficient γ),
the null hypothesis is H0: α = 0 with alternative hypothesis H1: α ≠ 0, N is the number of
lags necessary to obtain white noise, and ut is the error term.
The simpler Dickey-Fuller (DF) test omits the summation term. However, the implied
t-statistic does not follow the Student's t distribution; its critical values are instead
generated from Monte Carlo simulations (Engle and Granger, 1987, 1991).
Failing to reject H0 implies that the time series is non-stationary. Generally, many
kinds of non-stationarity are present in time series data. A non-stationary time series is
one for which the parameters are functions of time, and thus one for which the mean,
variance, and so on change over time. A time series is referred to as stationary when it
contains no growth or decline, and as non-stationary when a trend is present. When a time
series is non-stationary, autocorrelations will dominate the pattern. To model the non-trend
patterns in the series, trends must be removed before further analysis can take place. The
most popular approach is to carry out consecutive differencing on the series concerned to
achieve stationarity and then fit the ARIMA model. In fact, many time series can be made
stationary by replacing the original data points with their first differences, that is, the
differences between successive observations.
A popular and widely used statistical method for time series forecasting is the ARIMA
model.
ARIMA stands for AutoRegressive Integrated Moving Average and represents a
cornerstone in time series forecasting. It is a statistical method that has gained immense
popularity due to its efficacy in handling various standard temporal structures present in
time series data.
Autoregressive Integrated Moving Average Model
The ARIMA (AutoRegressive Integrated Moving Average) model stands as a statistical
powerhouse for analyzing and forecasting time series data.
It explicitly caters to a suite of standard structures in time series data, and as such
provides a simple yet powerful method for making skillful time series forecasts.
ARIMA is an acronym for AutoRegressive Integrated Moving Average. It is a generalization of
the simpler AutoRegressive Moving Average (ARMA) model and adds the notion of integration.
Let’s decode the essence of ARIMA:
AR (Autoregression): This emphasizes the dependent relationship between an
observation and its preceding or ‘lagged’ observations.
I (Integrated): To achieve a stationary time series, one that doesn’t exhibit trend or
seasonality, differencing is applied. It typically involves subtracting an observation from
its preceding observation.
MA (Moving Average): This component zeroes in on the relationship between an
observation and the residual error from a moving average model based on lagged
observations.
Each of these components is explicitly specified in the model as a parameter. A standard
notation is used for ARIMA(p,d,q) where the parameters are substituted with integer
values to quickly indicate the specific ARIMA model being used.
The parameters of the ARIMA model are defined as follows:
p: The lag order, representing the number of lag observations incorporated in the model.
d: Degree of differencing, denoting the number of times raw observations undergo
differencing.
q: Order of moving average, indicating the size of the moving average window.
A linear regression model is constructed including the specified number and type of
terms, and the data is prepared by a degree of differencing to make it stationary, i.e. to
remove trend and seasonal structures that negatively affect the regression model.
Interestingly, any of these parameters can be set to 0. Such configurations enable the
ARIMA model to mimic the functions of simpler models like ARMA, AR, I, or MA.
Adopting an ARIMA model for a time series assumes that the underlying process that
generated the observations is an ARIMA process. This may seem obvious but helps to
motivate the need to confirm the assumptions of the model in the raw observations and
the residual errors of forecasts from the model.
Next, let’s take a look at how we can use the ARIMA model in Python. We will start with
loading a simple univariate time series.
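A minimal loading sketch is given below; the file name "shampoo-sales.csv" and its "Month"/"Sales" columns are assumptions about how the data are stored locally.

# Load a simple univariate time series (Shampoo Sales) into a pandas Series.
import pandas as pd
import matplotlib.pyplot as plt

series = pd.read_csv("shampoo-sales.csv", index_col="Month")["Sales"]
print(series.head())
series.plot(title="Shampoo Sales")
plt.show()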
ARIMA with Python
The statsmodels library stands as a vital tool for those looking to harness the power of
ARIMA for time series forecasting in Python.
Building an ARIMA Model: A Step-by-Step Guide:
1. Model Definition: Initialize the ARIMA model by invoking ARIMA() and specifying the p,
d, and q parameters.
2. Model Training: Train the model on your dataset using the fit() method.
3. Making Predictions: Generate forecasts by utilizing the predict() function and
designating the desired time index or indices.
Let’s start with something simple. We will fit an ARIMA model to the entire Shampoo
Sales dataset and review the residual errors.
We’ll employ the ARIMA(5,1,0) configuration:
5 lags for autoregression (AR)
1st order differencing (I)
No moving average term (MA)
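A hedged sketch of that fit using the statsmodels ARIMA class is shown below (the CSV layout is the same assumption as above); the residual plots mirror the review step just described.

# Fit ARIMA(5,1,0) to the Shampoo Sales series and review the residual errors.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA

series = pd.read_csv("shampoo-sales.csv", index_col="Month")["Sales"]

model = ARIMA(series, order=(5, 1, 0))   # p=5, d=1, q=0
model_fit = model.fit()
print(model_fit.summary())

# Residual errors: the line plot and density plot should resemble white noise.
residuals = pd.DataFrame(model_fit.resid)
residuals.plot(title="Residuals")
residuals.plot(kind="kde", title="Residual density")
print(residuals.describe())
plt.show()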
Rolling Forecast ARIMA Model
The ARIMA model can be used to forecast future time steps. In a rolling forecast, the
model is often retrained as new data becomes available, allowing for more accurate and
adaptive predictions.
We can use the predict() function on the ARIMAResults object to make predictions. It
accepts the index of the time steps to make predictions as arguments. These indexes
are relative to the start of the training dataset used to make predictions.
How to Forecast with ARIMA:
1. Use the predict() function on the ARIMAResults object. This function requires the index
of the time steps for which predictions are needed.
2. To revert any differencing and return predictions in the original scale, set the typ
argument to 'levels' (this applies to the older statsmodels ARIMA API; the current
statsmodels.tsa.arima.model.ARIMA returns predictions on the original scale by default).
3. For a simpler one-step forecast, employ the forecast() function.
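The sketch below, which assumes the current statsmodels API and the same Shampoo Sales file as above, illustrates both predict() with index positions and the simpler forecast():

# Predictions from a fitted ARIMAResults object.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

series = pd.read_csv("shampoo-sales.csv", index_col="Month")["Sales"]
model_fit = ARIMA(series, order=(5, 1, 0)).fit()

# predict() takes start/end positions relative to the start of the training data;
# with the current API the values are already on the original (undifferenced) scale.
predictions = model_fit.predict(start=1, end=len(series))
# forecast() is the simpler call for steps beyond the end of the data.
one_step = model_fit.forecast(steps=1)
print(predictions.tail())
print(one_step)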
We can split the training dataset into train and test sets, use the train set to fit the model
and generate a prediction for each element on the test set.
A rolling forecast is required given the dependence on observations in prior time steps
for differencing and the AR model. A crude way to perform this rolling forecast is to re-
create the ARIMA model after each new observation is received.
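A crude rolling-forecast sketch along those lines is shown below; the 66/34 train/test split and the ARIMA(5,1,0) order are assumptions carried over from the example above.

# Rolling forecast: walk forward through the test set, re-fitting at each step.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

series = pd.read_csv("shampoo-sales.csv", index_col="Month")["Sales"]
values = series.to_numpy(dtype=float)
split = int(len(values) * 0.66)                 # assumed train/test split
history, test = list(values[:split]), values[split:]

predictions = []
for actual in test:
    model_fit = ARIMA(history, order=(5, 1, 0)).fit()  # re-create the model each step
    yhat = float(model_fit.forecast(steps=1)[0])        # one-step-ahead forecast
    predictions.append(yhat)
    history.append(actual)                              # add the real observation

mse = float(np.mean((np.array(predictions) - test) ** 2))
print(f"Test MSE: {mse:.3f}")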
Configuring an ARIMA Model
ARIMA is often configured using the classical Box-Jenkins Methodology. This process
employs a meticulous blend of time series analysis and diagnostics to pinpoint the most
fitting parameters for the ARIMA model.
The Box-Jenkins Methodology: A Three-Step Process:
1. Model Identification: Begin with visual tools like plots and leverage summary statistics.
These aids help recognize trends, seasonality, and autoregressive elements. The goal
here is to gauge the extent of differencing required and to determine the optimal lag size.
2. Parameter Estimation: This step involves a fitting procedure tailored to derive the
coefficients integral to the regression model.
3. Model Checking: Armed with plots and statistical tests, delve into the residual errors.
This analysis illuminates the temporal structure that the model might have missed (a
residual-diagnostics sketch follows below).
The process is repeated until either a desirable level of fit is achieved on the in-sample
or out-of-sample observations (e.g. training or test datasets).
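As a sketch of the model-checking step (reusing the assumed Shampoo Sales fit from above), the residual ACF and a Ljung-Box test can flag any temporal structure the model missed:

# Model checking: inspect the residuals for structure the model failed to capture.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA

series = pd.read_csv("shampoo-sales.csv", index_col="Month")["Sales"]
residuals = ARIMA(series, order=(5, 1, 0)).fit().resid

plot_acf(residuals, lags=20)                 # ideally no significant spikes remain
print(acorr_ljungbox(residuals, lags=[10]))  # a large p-value suggests no leftover structure
plt.show()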
The process was described in the classic 1970 textbook on the topic titled Time Series
Analysis: Forecasting and Control by George Box and Gwilym Jenkins. An updated 5th
edition is now available if you are interested in going deeper into this type of model and
methodology.
Given that the model can be fit efficiently on modest-sized time series datasets, grid
searching parameters of the model can be a valuable approach.
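A simple grid-search sketch under assumed settings (small p, d, q ranges, a 66/34 holdout split, and an MSE criterion) could look like this:

# Grid search small (p, d, q) orders by out-of-sample MSE on a holdout split.
import itertools
import warnings
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

series = pd.read_csv("shampoo-sales.csv", index_col="Month")["Sales"].to_numpy(dtype=float)
split = int(len(series) * 0.66)
train, test = series[:split], series[split:]

best_order, best_mse = None, float("inf")
warnings.filterwarnings("ignore")             # convergence warnings are common here
for p, d, q in itertools.product(range(3), range(2), range(3)):   # assumed ranges
    try:
        fit = ARIMA(train, order=(p, d, q)).fit()
        forecast = fit.forecast(steps=len(test))
        mse = float(np.mean((forecast - test) ** 2))
        if mse < best_mse:
            best_order, best_mse = (p, d, q), mse
    except Exception:
        continue                              # skip orders that fail to fit

print(f"Best order {best_order} with test MSE {best_mse:.3f}")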