Forecasting - Introduction
Dinesh Kumar
Those who have knowledge don’t predict. Those who predict
don’t have knowledge
- Lao Tzu
I think there is a world market for maybe 5 computers
- Thomas Watson, Chairman of IBM, 1943
Computers in the future may weigh no more than 1.5 tons
- Popular Mechanics, 1949
640K ought to be enough for everybody
- Attributed to Bill Gates, 1981 (disputed)
But, forecasting helps!
Forecasting
• Forecasting is a process of estimation of an unknown
event/parameter.
• Forecasting is most commonly used in the context of time series data.
• Time series is a sequence of data points measured at
successive time intervals.
[Figure: forecasting in the planning hierarchy]
• Corporate strategy.
• Business forecasting; product and market planning; financial planning.
• Aggregate forecasting; aggregate production planning; resource planning.
• Item forecasting; master production planning; capacity planning.
• Spare forecasting; materials requirement planning; capacity requirement planning.
Forecasting methods
• Qualitative Techniques.
– Expert opinion, astrologers, Vaastu experts.
• Quantitative Techniques.
– Time series techniques.
• Causal Models.
– Use information about relationships between system elements (e.g. regression).
Time Series Techniques
• Moving Average.
• Exponential Smoothing.
• Extrapolation.
• Trend Estimation.
• Auto-regression.
Time Series Analysis - Application
• Time series analysis helps to explain:
– Any systematic variation in the series of data, which is usually due to seasonality.
– Cyclical patterns that repeat.
– Trends in the data.
– Growth rates of these trends.
Time Series Components
• Trend
• Cyclical
• Seasonal
• Irregular
Trend Component
• Persistent, overall upward or downward
pattern
• Due to population, economy, technology etc.
• Several years duration
Cyclical Component
• Repeating up and down movements
• Due to interaction of factors influencing
economy
• Usually 2-10 years duration
Seasonal Component
• Regular pattern of up and down movements
• Due to weather, customs etc.
• Occurs within one year
Irregular Component
• Erratic, unsystematic fluctuations
• Due to random variation or unforeseen events
– Strike
– Floods
• Short duration and non-repeating
[Figure: demand patterns over time: (a) trend, (b) cycle, (c) seasonal pattern, (d) trend with seasonal pattern; random movement is superimposed on each.]
Time series techniques for Forecasting
Seasonal vs. Cyclical
• When a cyclical pattern in the data has a period of one year, it is referred to as seasonal variation.
• When the cyclical pattern has a period of more than one year, we refer to it as cyclical variation.
Seasonal Model
The level, trend and seasonality can be combined in three basic ways:
Multiplicative:
Systematic component = level x trend x seasonal factor
Additive:
Systematic component = level + trend + seasonal factor
Mixed:
Systematic component = (level + trend) x seasonal factor
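A minimal numerical sketch of the three combinations; the level, trend and seasonal-factor values below are hypothetical, chosen only to make the arithmetic visible:

```python
# Minimal sketch: the three ways of combining level, trend and seasonality.
level, trend, seasonal = 100.0, 5.0, 1.2   # illustrative values only

multiplicative = level * trend * seasonal   # level x trend x seasonal factor
additive = level + trend + seasonal         # level + trend + seasonal factor
mixed = (level + trend) * seasonal          # (level + trend) x seasonal factor

print(multiplicative, additive, mixed)      # 600.0 106.2 126.0
```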
Smoothing Techniques
• Smoothing removes random variation from the data while retaining the trend and cyclic-type variation.
• Smoothing Techniques:
– Moving average smoothing
– Exponential smoothing
Moving Average (Rolling Average)
• Simple moving average.
– Used mainly to capture trend and smooth short
term fluctuations.
– Most recent data are given equal weights.
• Weighted moving average
– Uses unequal weights for data
Simple moving average
• The forecast for period t + 1, F(t+1), is the average of the n most recent observations:

$$F_{t+1} = \frac{1}{n} \sum_{i=t-n+1}^{t} D_i$$

where $F_{t+1}$ is the forecast for period t + 1 and $D_i$ is the observation for time period i.
Moving average example – Electra TV Sales
• Electra City is a retail store that sells electronic goods. Each month the manager of the store must order merchandise from a distant warehouse. Currently the manager is trying to estimate how many TVs the store is likely to sell next month. To assist in this process, she has collected data (TV sales.xls) on the number of TVs sold in each of the previous 24 months. She wants to use these data for decision making.
TV sales data for 16 weeks

Week   TVs sold      Week   TVs sold
  1       49           9       63
  2       77          10       85
  3       90          11       98
  4       79          12       88
  5       57          13       73
  6       90          14      102
  7       92          15       98
  8       80          16       89
Moving average with n = 2 and n = 4

$$F_{t+1,2} = \frac{1}{2} \sum_{i=t-1}^{t} D_i \qquad\qquad F_{t+1,4} = \frac{1}{4} \sum_{i=t-3}^{t} D_i$$
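A short pandas sketch of both moving averages on the TV-sales data (pandas is an assumption; any rolling-mean routine would do):

```python
import pandas as pd

# Weekly TV sales from the table above
sales = pd.Series([49, 77, 90, 79, 57, 90, 92, 80,
                   63, 85, 98, 88, 73, 102, 98, 89],
                  index=range(1, 17), name="TVs sold")

# F_{t+1} is the mean of the n most recent observations;
# shift(1) aligns each average so that it forecasts the *next* week
ma2 = sales.rolling(window=2).mean().shift(1)
ma4 = sales.rolling(window=4).mean().shift(1)

print(pd.DataFrame({"actual": sales, "MA(n=2)": ma2, "MA(n=4)": ma4}))
```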
Moving Average with n = 2 and n = 4
[Figure: actual TV sales over time together with moving-average forecasts for n = 2 and n = 4.]
Length of moving average
• What n should be chosen?
– A small n makes the level very responsive to the last observed demand point.
– A large n makes the level less responsive (smoother).
Forecasting Accuracy
• The forecast error is the difference between the actual value and the forecast value for the corresponding period:

$$E_t = Y_t - F_t$$

where $E_t$ is the forecast error at period t, $Y_t$ the actual value at time period t, and $F_t$ the forecast for time period t.
Measures of aggregate error
Mean absolute error (MAE):
$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |E_t|$$

Mean absolute percentage error (MAPE):
$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{E_t}{Y_t} \right| \times 100\%$$

Mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} E_t^2$$

Root mean squared error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} E_t^2}$$
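A small NumPy sketch of the four measures (the function name is ours):

```python
import numpy as np

def error_measures(actual, forecast):
    """Return MAE, MAPE (in %), MSE and RMSE for paired actual/forecast values."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    e = actual - forecast                        # E_t = Y_t - F_t
    mae = np.mean(np.abs(e))
    mape = np.mean(np.abs(e / actual)) * 100     # assumes no zero actual values
    mse = np.mean(e ** 2)
    return mae, mape, mse, np.sqrt(mse)

# Weeks 3-5 of the TV data with their MA(2) forecasts: (49+77)/2, (77+90)/2, (90+79)/2
print(error_measures([90, 79, 57], [63.0, 83.5, 84.5]))
```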
Exponential Smoothing
• Form of Weighted Moving Average
– Weights decline exponentially.
– The largest weight is given to the most recent observation, less weight to the immediately preceding observation, and so on.
• Requires a smoothing constant (α)
– Ranges from 0 to 1.
Exponential Smoothing
Next forecast = α × (present actual value) + (1 − α) × (present forecast)
Simple Exponential Smoothing Equations
• Smoothing equations:

$$L_t = \alpha Y_t + (1-\alpha) L_{t-1}, \qquad L_1 = Y_1$$

– $L_t$: level at time period t (smoothed value).

• Forecast equation:

$$F_{t+1} = L_t$$
Simple Exponential Smoothing Equations
• Expanding the recursion shows the exponentially declining weights:

$$L_t = \alpha Y_t + \alpha(1-\alpha) Y_{t-1} + \alpha(1-\alpha)^2 Y_{t-2} + \cdots$$
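A minimal NumPy sketch of simple exponential smoothing as defined above (our own implementation, not library code):

```python
import numpy as np

def simple_exp_smoothing(y, alpha):
    """L_t = alpha*Y_t + (1 - alpha)*L_{t-1}, with L_1 = Y_1.

    level[t] is also the one-step-ahead forecast F_{t+1}."""
    y = np.asarray(y, dtype=float)
    level = np.empty_like(y)
    level[0] = y[0]                                   # L_1 = Y_1
    for t in range(1, len(y)):
        level[t] = alpha * y[t] + (1 - alpha) * level[t - 1]
    return level

sales = [49, 77, 90, 79, 57, 90, 92, 80]
print(simple_exp_smoothing(sales, alpha=0.4))
```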
[Figure: actual demand over time with exponentially smoothed forecasts for α = 0.8, 0.6 and 0.4.]
Choice of α
• The larger the value of α, the faster the forecast series responds to changes in the original series.
• The smaller the value of α, the less sensitive the forecast is to changes in the original series.
• For "smooth" data, try a high value of α; the forecast is responsive to the most current data.
• For "noisy" data, try a low α; the forecast is more stable and less responsive.
Double Exponential Smoothing – Holt’s Model
– A problem with simple exponential smoothing is
that it will produce consistently biased forecasts
in the presence of a trend.
– Holt's method (double exponential smoothing) is
appropriate when demand has a trend but no
seasonality.
– Systematic component of demand = Level +
Trend
Holt’s method
• Holt’s method can be used to forecast when there is a
linear trend present in the data.
• The method requires separate smoothing constants for the intercept (level) and the slope (trend).
Level Model
The demand $x_t$ in a specific period t consists of the level a plus random noise $u_t$ (which cannot be estimated by a forecasting method). Thus

$$x_t = a + u_t$$
Trend Model
The linear trend b is added to the level model’s equation:

$$x_t = a + b\,t + u_t$$
Holt’s Method
• Holt’s equations:

$$\text{(i)}\quad L_t = \alpha Y_t + (1-\alpha)(L_{t-1} + T_{t-1})$$
$$\text{(ii)}\quad T_t = \beta (L_t - L_{t-1}) + (1-\beta) T_{t-1}$$

• Forecast equations:

$$F_{t+1} = L_t + T_t, \qquad F_{t+m} = L_t + m\,T_t$$
Initial Values of Lt and Tt
• L1 is in general set to Y1.
• T1 can be set to any one of the following values (or use regression to get initial values):

$$T_1 = Y_2 - Y_1$$
$$T_1 = \left[ (Y_2 - Y_1) + (Y_3 - Y_2) + (Y_4 - Y_3) \right] / 3$$
$$T_1 = (Y_n - Y_1)/(n - 1)$$
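A minimal sketch of Holt’s equations in Python (our own implementation; it uses the third initialisation option above, T1 = (Yn − Y1)/(n − 1)):

```python
import numpy as np

def holt_forecast(y, alpha, beta, m=1):
    """Holt's double exponential smoothing; returns F_{t+m} = L_t + m*T_t."""
    y = np.asarray(y, dtype=float)
    level = y[0]                               # L_1 = Y_1
    trend = (y[-1] - y[0]) / (len(y) - 1)      # T_1 = (Y_n - Y_1)/(n - 1)
    for t in range(1, len(y)):
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (level + trend)      # equation (i)
        trend = beta * (level - prev_level) + (1 - beta) * trend  # equation (ii)
    return level + m * trend

sales = [49, 77, 90, 79, 57, 90, 92, 80, 63, 85, 98, 88, 73, 102, 98, 89]
print(holt_forecast(sales, alpha=0.3, beta=0.1, m=2))
```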
Double Exponential Smoothing
[Figure: actual vs. forecast demand over time under double exponential smoothing.]
Forecasting Power of a Model
Theil’s Coefficients

$$U_1 = \frac{\sqrt{\frac{1}{n}\sum_{t=1}^{n} (Y_t - F_t)^2}}{\sqrt{\frac{1}{n}\sum_{t=1}^{n} Y_t^2} \;+\; \sqrt{\frac{1}{n}\sum_{t=1}^{n} F_t^2}} \qquad\qquad U_2 = \sqrt{\frac{\sum_{t=1}^{n-1} \left( \frac{F_{t+1} - Y_{t+1}}{Y_t} \right)^2}{\sum_{t=1}^{n-1} \left( \frac{Y_{t+1} - Y_t}{Y_t} \right)^2}}$$
• U1 is bounded between 0 and 1, with values closer to zero indicating greater accuracy.
• If U2 = 1, there is no difference between the naïve forecast and the forecasting technique.
• If U2 < 1, the technique is better than the naïve forecast.
• If U2 > 1, the technique is no better than the naïve forecast.
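A NumPy sketch of both coefficients as defined above (helper names are our own; `y` holds actuals, `f` the one-step forecasts):

```python
import numpy as np

def theil_u1(y, f):
    """U1: RMSE scaled by the magnitudes of the actual and forecast series."""
    y, f = np.asarray(y, dtype=float), np.asarray(f, dtype=float)
    rmse = np.sqrt(np.mean((y - f) ** 2))
    return rmse / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(f ** 2)))

def theil_u2(y, f):
    """U2: relative errors of the technique vs. the naive forecast F_{t+1} = Y_t."""
    y, f = np.asarray(y, dtype=float), np.asarray(f, dtype=float)
    num = np.sum(((f[1:] - y[1:]) / y[:-1]) ** 2)
    den = np.sum(((y[1:] - y[:-1]) / y[:-1]) ** 2)
    return np.sqrt(num / den)
```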
Theil’s coefficient for TV Sales Problem
Method                         U1       U2
Moving average (2 periods)     0.1488   0.8966
Double exponential smoothing   0.1304   0.7864
Auto-Regressive Integrated Moving Average (ARIMA)
• ARIMA has the following three components:
• Auto-regressive component: Function of past
values of the time series.
• Integration Component: Differencing the time
series to make it a stationary process.
• Moving Average Component: Function of past
error values.
Auto-Regression
An auto-regression is a regression model in which Yt is
regressed against its own lagged values.
The number of lags used as regressors is called the order of
the autoregression.
In a first order autoregression, Yt is regressed against Yt–1
In a pth order autoregression, Yt is regressed against
Yt–1,Yt–2,…,Yt–p.
Auto-regressive process (AR(p))
• Assume {εt} is purely random with mean zero and standard deviation σε.
• Then the autoregressive process of order p, or AR(p) process, is

$$Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \cdots + \alpha_p Y_{t-p} + \varepsilon_t$$

An AR(p) process models each future observation as a function of the p previous observations.
Moving Average Process MA(q)
• Start with {εt} being white noise (purely random) with mean zero and standard deviation σε.
• {Yt} is a moving average process of order q (written MA(q)) if, for some constants β0, β1, …, βq, we have

$$Y_t = \beta_0 + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2} + \cdots + \beta_q \varepsilon_{t-q} + \varepsilon_t$$

An MA(q) process models each future observation as a function of the q previous errors.
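A NumPy sketch simulating an AR(1) and an MA(1) series from white noise; the coefficients are illustrative, not estimated from any data:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 200
eps = rng.normal(0.0, 1.0, size=n)   # white noise: mean zero, s.d. 1

# AR(1): Y_t = a0 + a1*Y_{t-1} + eps_t
y_ar = np.zeros(n)
for t in range(1, n):
    y_ar[t] = 2.0 + 0.7 * y_ar[t - 1] + eps[t]

# MA(1): Y_t = b0 + b1*eps_{t-1} + eps_t
y_ma = np.zeros(n)
for t in range(1, n):
    y_ma[t] = 2.0 + 0.5 * eps[t - 1] + eps[t]
```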
Stationarity
• A stochastic process is stationary if:
– The mean is constant over time.
– The variance is constant over time.
– The covariance between Yt and Yt+k depends only on the lag k, not on the time t.
ACF Plots of Non-Stationary and Stationary Processes
[Figure: ACF of a non-stationary process (large at long lags) vs. a stationary process (dies out quickly).]
Non-Stationary Process
• Non-stationarity is indicated by an ACF that remains large at long lags.
• Stationarity can be achieved by differencing. Differencing once is generally sufficient; twice may occasionally be needed.
Differencing
• Differencing transforms a non-stationary process into a stationary one.
• In differencing, we create a new process Xt, where Xt = Yt − Yt−1.
Integration (d)
• The integration order d specifies how many times the series must be differenced to make it stationary.
• Instead of observed values, differences between observed
values are modelled.
• When d=0, the observations are modelled directly. If d = 1,
the differences between consecutive observations are
modelled. If d = 2, the differences of the differences are
modelled.
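In code, differencing is a one-liner; a sketch with NumPy (the series values are taken from the TV-sales table):

```python
import numpy as np

y = np.array([49, 77, 90, 79, 57, 90, 92, 80], dtype=float)
x1 = np.diff(y)        # d = 1: X_t = Y_t - Y_{t-1}
x2 = np.diff(y, n=2)   # d = 2: differences of the differences
print(x1)
print(x2)
```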
ARIMA (p, d, q)
• The q and p values are identified using the auto-correlation function (ACF) and the partial auto-correlation function (PACF), respectively.
• Usually p + q ≤ 4 and d ≤ 2.
ARIMA(p, 0, q) Model

$$Y_t = \underbrace{\alpha_0 + \alpha_1 Y_{t-1} + \cdots + \alpha_p Y_{t-p}}_{\text{AR}(p)\text{ part}} + \underbrace{\beta_1 \varepsilon_{t-1} + \cdots + \beta_q \varepsilon_{t-q}}_{\text{MA}(q)\text{ part}} + \varepsilon_t$$

ARIMA(p, 1, q) Process

$$X_t = \alpha_0 + \alpha_1 X_{t-1} + \cdots + \alpha_p X_{t-p} + \beta_1 \varepsilon_{t-1} + \cdots + \beta_q \varepsilon_{t-q} + \varepsilon_t$$

where Xt = Yt − Yt−1.
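A sketch of fitting such a model, assuming the statsmodels package; the order (1, 1, 1) is illustrative, not a recommendation for these data:

```python
from statsmodels.tsa.arima.model import ARIMA

sales = [49, 77, 90, 79, 57, 90, 92, 80, 63, 85, 98, 88, 73, 102, 98, 89]

# ARIMA(1, 1, 1): one AR lag, first differencing, one MA lag
result = ARIMA(sales, order=(1, 1, 1)).fit()
print(result.summary())
print(result.forecast(steps=2))   # two-step-ahead forecasts
```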
Auto-Correlation
• Auto-correlation is the correlation between observations of the same series separated in time.
• The autocorrelation for a k-period lag is given by:

$$r_k = \frac{\sum_{t=1}^{n-k} (Y_t - \bar{Y})(Y_{t+k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$$
Auto-correlation Function
• A plot of the autocorrelations rk against the lag k is called the autocorrelation function (ACF) plot, or correlogram.
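Correlograms can be drawn with statsmodels’ plotting helpers (an assumption; the Harmon Foods data are not reproduced here, so the TV-sales series stands in):

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

sales = [49, 77, 90, 79, 57, 90, 92, 80, 63, 85, 98, 88, 73, 102, 98, 89]

fig, axes = plt.subplots(2, 1, figsize=(6, 6))
plot_acf(sales, lags=8, ax=axes[0])    # correlogram (ACF)
plot_pacf(sales, lags=7, ax=axes[1])   # partial autocorrelations (PACF)
plt.show()
```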
ACF Plot for Harmon Foods
[Figure: ACF correlogram for the Harmon Foods series.]
PACF Plot for Harmon Foods
[Figure: PACF for the Harmon Foods series.]
Hypothesis test for autocorrelation
• To test whether the autocorrelation at lag k is significantly
different from 0, the following hypothesis test is used:
• H0: ρk = 0
• HA: ρk ≠ 0
• For any k, reject H0 if |rk| > 1.96/√n, where n is the number of observations.
Harmon Foods Example
Critical value = 1.96/√48 ≈ 0.28
Partial Auto-Correlation
• The partial auto-correlation at lag k is the auto-correlation between Yt and Yt+k after the removal of the linear dependence on Yt+1 through Yt+k−1.
• To test whether the partial autocorrelation at lag k is significantly different from 0, the following hypothesis test is used:
• H0: φk = 0
• HA: φk ≠ 0
• For any k, reject H0 if |φk| > 1.96/√n, where n is the number of observations.
Harmon Foods: Partial Auto-Correlation
[Figure: PACF for the Harmon Foods series, with the ±1.96/√n significance bounds.]
Pure AR and MA Processes
• A non-stationary process has an ACF that is significant for a large number of lags.
• Autoregressive processes have an exponentially declining ACF and
spikes in the first one or more lags of the PACF. The number of
spikes indicates the order of the auto-regression.
• Moving average processes have spikes in the first one or more
lags of the ACF and an exponentially declining PACF. The number
of spikes indicates the order of the moving average.
• Mixed (ARMA) processes typically show exponential declines in
both the ACF and the PACF.
Pure AR & MA Model Identification
• AR(1): ACF shows exponential decay (on the positive side if α1 > 0; alternating in sign, starting on the negative side, if α1 < 0). PACF shows a spike at lag 1, then cuts off to zero (spike positive if α1 > 0, negative if α1 < 0).
• AR(p): ACF shows exponential decay whose pattern depends on the signs of α1, α2, etc. PACF shows spikes at lags 1 to p, then cuts off to zero.
• MA(1): ACF shows a spike at lag 1, then cuts off to zero (spike positive if β1 > 0, negative if β1 < 0). PACF shows exponential decay (on the negative side if β1 > 0, on the positive side if β1 < 0).
• MA(q): ACF shows spikes at lags 1 to q, then cuts off to zero. PACF shows exponential decay or a sine-wave pattern, depending on the signs of β1, β2, etc.
ARMA(p,q) Model Identification
• ARMA(p, q) models are not easy to identify. We usually start with pure AR and MA processes. The following rule of thumb may be used:

Process        ACF                            PACF
ARMA(p,q)      Tails off after (q − p) lags   Tails off after (p − q) lags

• The final ARMA model may be selected based on measures such as RMSE, MAPE, AIC and BIC.
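A sketch of this selection step, assuming statsmodels: fit a small grid of ARIMA orders on the TV-sales series and keep the lowest AIC:

```python
from statsmodels.tsa.arima.model import ARIMA

sales = [49, 77, 90, 79, 57, 90, 92, 80, 63, 85, 98, 88, 73, 102, 98, 89]

best = None
for p in range(3):          # candidate AR orders
    for q in range(3):      # candidate MA orders
        if p == q == 0:
            continue
        result = ARIMA(sales, order=(p, 1, q)).fit()
        if best is None or result.aic < best[0]:
            best = (result.aic, p, q)

print("lowest AIC (aic, p, q):", best)
```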
Forecasting Model Evaluation
Akaike’s Information Criterion:

AIC = −2LL + 2m

where LL is the maximised log-likelihood and m is the number of parameters estimated in the model.

Bayesian Information Criterion:

BIC = −2LL + m ln(n)

where n is the number of observations.
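As a check, both criteria can be recomputed from a fitted model’s log-likelihood (`result` is the statsmodels fit from the earlier sketch; statsmodels’ own parameter-counting convention may differ slightly):

```python
import numpy as np

# result.llf is the maximised log-likelihood LL; len(result.params) plays m.
m = len(result.params)
n = result.nobs
aic = -2 * result.llf + 2 * m
bic = -2 * result.llf + m * np.log(n)
# These should agree with result.aic and result.bic, up to the
# parameter-counting convention used by statsmodels.
print(aic, result.aic, bic, result.bic)
```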
Recommended Readings
• F. Diebold, “Forecasting Applications and Methods”, Cengage Learning, 2009.
• J. Holton Wilson and Barry Keating, “Business Forecasting”, Tata McGraw-Hill, 2010.