Harvani Sumawijaya Stat 5350/7110 Forecasting Fall 2024
Assignment #3
Submit your printed solution at the start of class on Tuesday, October 15.
Short answers, please. Explain your answer concisely. If you refer to a plot in your
answer, include that plot as part of your answer. Do not include extraneous plots that
you do not refer to in your narrative. “Significance” implies statistical significance.
Presume necessary conditions for inference hold unless the question addresses these.
The data for the first five questions is in the oddly-named file “MRTSSM4453USN.csv”
(in the Canvas data folder). This filename is given by FRED to total monthly retail sales
of beer, wine, and liquor stores (in millions of dollars). (Google that if you’re curious.)
Read the data from this file and define time series as follows. We will truncate the series
at the end of 2023 for convenience working with STL. [You need to have installed the
lubridate R package for this assignment.]
library(lubridate)
Data <- read.csv("[ your path ]MRTSSM4453USN.csv")
sales <- ts(Data$MRTSSM4453USN, start=1992,end=2023+11/12,frequency=12)
dates <- lubridate::ymd(Data$DATE[1:length(sales)])
1. Explain why one should use a multiplicative decomposition rather than an additive
decomposition for the sales time series. Include the diagnostic plot of a median polish
of the sales time series in your answer.
A multiplicative decomposition is preferred because the residuals from the median polish show non-constant variance, with fluctuations increasing as sales grow. This suggests that both the seasonal and trend components scale with the level of sales, which is typical in retail data. The log transformation stabilizes this variance, indicating that a multiplicative model captures the proportional changes in the data better than an additive model.
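For reference, a minimal sketch of the diagnostic, assuming the intended plot is Tukey's additivity plot from a median polish of a month-by-year table (an assumption about the course convention, not a prescribed recipe):

# Arrange sales into a 12 x 32 month-by-year table and run a median polish
sales.mat <- matrix(sales, nrow = 12)
mp <- medpolish(sales.mat)
# Tukey's diagnostic: residuals vs comparison values (row effect x column effect / overall);
# a systematic slope in this plot argues for a log (multiplicative) decomposition
cv <- outer(mp$row, mp$col) / mp$overall
plot(as.vector(cv), as.vector(mp$residuals),
     xlab = "Comparison value", ylab = "Residual")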
2. Use the STL function in R to perform a decomposition of log(sales).
sales.stl <- stl(log(sales), s.window=9)
Using this decomposition, define a seasonally adjusted version of the sales time series
(actual sales, not on a log scale). Show a sequence plot of your seasonally adjusted
time series in a plot with the original raw time series (on the same axes in different
colors, with a plot legend identifying which is which).
[Plot: original sales (blue) and seasonally adjusted sales (red) on the same axes, with a legend.]
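A sketch of how the adjusted series and the plot can be constructed from the STL fit (the colors and legend labels are choices, not requirements):

# Remove the STL seasonal component on the log scale, then return to dollars
sa.sales <- exp(log(sales) - sales.stl$time.series[, "seasonal"])
ts.plot(sales, sa.sales, col = c("blue", "red"),
        ylab = "Retail sales ($ millions)")
legend("topleft", legend = c("Original", "Seasonally adjusted"),
       col = c("blue", "red"), lty = 1)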
3. Based on the estimated trend from STL, how did Covid affect the long-term growth
of sales of these products? [ This analysis is most straightforward on the log scale. ]
COVID-19 caused a sharp increase in sales growth, as seen in the steeper trend after 2020. The pandemic accelerated long-term sales, leading to a sustained higher level, though the growth rate stabilized afterward.
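A sketch of the plot this reading is based on (the dashed line marking March 2020 is only an annotation choice):

trend <- sales.stl$time.series[, "trend"]
plot(trend, ylab = "STL trend of log(sales)")
abline(v = 2020 + 2/12, lty = 2)   # approximate onset of Covid (March 2020)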
4. Does the seasonal pattern change over time? In particular, is the seasonal pattern
identified by STL different pre-Covid (2018-2019) and post-Covid (say, 2022-2023)?
Yes, the seasonal pattern has changed over time. Post-COVID, the seasonal variations appear to be more amplified, suggesting that consumer behavior during peak times (like holidays) intensified after the pandemic. However, the overall shape of the pattern remains consistent, indicating that the general timing of seasonal demand hasn't shifted dramatically, but the magnitude has increased.
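One way to make the comparison concrete is to average the STL seasonal component within each two-year window and overlay the results (a sketch; averaging the two years in each period is an assumption, not a requirement):

seas <- sales.stl$time.series[, "seasonal"]
pre  <- rowMeans(matrix(window(seas, start = 2018, end = 2019 + 11/12), nrow = 12))
post <- rowMeans(matrix(window(seas, start = 2022, end = 2023 + 11/12), nrow = 12))
plot(1:12, pre, type = "b", col = "blue", xaxt = "n", ylim = range(pre, post),
     xlab = "Month", ylab = "Seasonal component (log scale)")
lines(1:12, post, type = "b", col = "red")
axis(1, at = 1:12, labels = month.abb)
legend("topleft", c("Pre-Covid (2018-2019)", "Post-Covid (2022-2023)"),
       col = c("blue", "red"), lty = 1)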
5. Does the remainder from the STL decomposition contain further seasonal
information? Use the following commands to create time series that count the number
of days and the number of weekend days in a month.
count_weekdays <- function(from, to) {
  sum(!wday(seq(from, to, "days")) %in% c(1, 7))   # drop Sundays (1) and Saturdays (7)
}
next.date <- c(dates[-1], ymd("2024-01-01")) - 1   # last day of each month
wdays <- mapply(count_weekdays, dates, next.date)
ndays <- lubridate::days_in_month(dates)
Use these variables in a regression model (a choice for the model follows) to
determine whether STL has left seasonal information in the remainder. Covid
produces an artifact in the remainder time series, so use data only through 2018.
library(dynlm)   # provides dynlm(), used below
remain <- window(sales.stl$time.series[,'remainder'], end=2018.99)
n <- length(remain)
wdays <- wdays[1:n]
ndays <- ndays[1:n]
month <- factor(rep(month.abb, n/12), levels=month.abb)
regr <- dynlm(remain ~ month + ndays + wdays)
Yes, the remainder from the STL decomposition contains some further seasonal information. Significant coefficients for months like February, April, June, September, and November, along with the number of days in a month, indicate that some residual seasonality remains. However, the low R-squared value of 0.04527 suggests that STL captured most of the seasonality, leaving only minor seasonal effects in the remainder.
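The individual t-statistics come from summary(regr); a joint F test of the month dummies against a model without them summarizes the same evidence in one number (a sketch):

summary(regr)                      # t statistics for each term and the R-squared
regr0 <- dynlm(remain ~ ndays + wdays)
anova(regr0, regr)                 # joint F test: do the month dummies add anything?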
The file “a3_series.csv” (in the Canvas assignments folder) defines three time series that
are used in questions 6, 7, and 8. All have length n = 150. Define these time series as
follows.
Data <- read.csv("[your path]a3_series.csv")
ts.6 <- ts(Data[,2])
ts.7 <- ts(Data[,3])
ts.8 <- ts(Data[,4])
6. Identify an ARMA model for the time series ts.6.
(a) Identify a model from a visual inspection of the estimated ACF and PACF.
ACF: The plot shows a sharp cutoff around lag 2, which suggests the presence of a
Moving Average (MA) component of order q = 2. After lag 2, the correlations drop and
oscillate around zero, indicating that the MA part might be appropriate.
PACF: The PACF cuts off sharply at lag 1, suggesting an autoregressive (AR) component of order p = 1. The sharp decline indicates a possible AR(1) process.
Visually, the identified model is ARMA(1, 2).
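The ACF and PACF referred to here can be drawn with, for example, acf2 from the astsa package (assumed available; base acf and pacf work as well):

library(astsa)
acf2(ts.6)   # sample ACF and PACF in one display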
(b) Following the procedure described in class that uses the function `fit_models`
(Lecture 12), use AIC and BIC to identify models for this time series.
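As a stand-in for fit_models (whose Lecture 12 interface is not reproduced here), a small grid search over ARMA orders gives the same kind of AIC/BIC comparison; the criteria may be scaled differently than in the course code, so only the rankings should be compared:

# Fit ARMA(p, q) for p, q = 0,...,3 and record AIC and BIC for each order
grid <- expand.grid(p = 0:3, q = 0:3)
grid$AIC <- grid$BIC <- NA
for (i in seq_len(nrow(grid))) {
  fit <- try(arima(ts.6, order = c(grid$p[i], 0, grid$q[i])), silent = TRUE)
  if (!inherits(fit, "try-error")) {
    grid$AIC[i] <- AIC(fit)
    grid$BIC[i] <- BIC(fit)
  }
}
grid[which.min(grid$AIC), ]   # order preferred by AIC
grid[which.min(grid$BIC), ]   # order preferred by BIC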
Best model based on AIC: ARMA(3, 2): This model has three AR terms and two MA
terms. AIC = 451.17.
Best model based on BIC: ARMA(2, 1): This model has two AR terms and one MA term. BIC = 466.77.
(c) Do the chosen models in parts (a) and (b) agree with each other? If the results
disagree, what model would you recommend? Why?
No, the models suggested by the visual inspection (ARMA(1, 2)) and the AIC/BIC
criteria (ARMA(3, 2) for AIC and ARMA(2, 1) for BIC) do not fully agree.
I recommend the ARMA(2, 1) model (suggested by BIC). We usually prefer the simpler, more parsimonious model: the extra terms in the AIC choice add complexity for only a marginal improvement in fit.
7. Repeat the analysis of Q6, but with the time series ts.7.
ACF: The ACF shows a significant spike at lag 1, followed by a gradual decay, which
suggests an AR process. The decay indicates that the data might have autoregressive
behavior.
PACF: The PACF plot shows a sharp cutoff after lag 1, suggesting a potential AR(1)
process. The sharp drop-off in the PACF suggests that we might have a simple
autoregressive model.
Visually, the model is likely AR(1).
AIC: ARMA(1, 3). AIC = 434.77. BIC: MA(2). BIC = 449.99.
The models suggested by visual inspection, AIC, and BIC do not fully agree. I recommend the MA(2) model (suggested by BIC) because it is the simpler one (BIC favors simpler models) and remains easy to interpret.
8. Repeat the analysis of Q6, but with the time series ts.8.
ACF: Spike at lag 1, followed by a quick decay, indicating a possible MA(1) or MA(2) process.
PACF: Cuts off sharply after lag 1, which is indicative of an AR(1) process.
Visually, the model is likely ARMA(1, 1).
AIC: AR(2) model. AIC = 406.49. BIC: AR(2) model. BIC = 418.54.
No, the models from visual inspection (ARMA(1, 1)) and AIC/BIC selection (AR(2)) do
not fully agree.
I recommend the AR(2) model. It was the consistent answer under both AIC and BIC, and it is relatively simple, which guards against overfitting.
9. Use the R program arima.sim to generate a Gaussian realization of length n=800 of
the AR(2) process defined by
X_t = 1.6 X_{t-1} - 0.80 X_{t-2} + w_t
Set the random seed to 54 (i.e., set.seed(54)) and the white noise variance to 1.
(a) Confirm that the ACF and PACF for your simulated time series match those of an
AR(2) process and that the estimated coefficients when fitting this model are
close to the parameters of the process (fit the model using sarima):
sarima(xt, p=2, d=0, q=0)
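A sketch of the simulation and fit (sarima comes from the astsa package):

library(astsa)
set.seed(54)
xt <- arima.sim(n = 800, model = list(ar = c(1.6, -0.8)), sd = 1)
acf2(xt)                           # sample ACF and PACF of the simulated series
fit <- sarima(xt, p = 2, d = 0, q = 0)
fit$ttable                         # estimated AR coefficients with standard errors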
ACF: Shows a slow decay, typical of an AR process.
PACF: Sharp cutoff after lag 2, confirming an AR(2) structure.
Estimated Coefficients:
o AR1: 1.6238 (true value: 1.6)
o AR2: -0.8127 (true value: -0.8)
The estimates are very close to the true values, confirming the accuracy of the fit.
Residuals: Appear random with no significant autocorrelations, and the Ljung-Box test shows no remaining autocorrelation.
Conclusion:
The ACF, PACF, and estimated coefficients confirm that the simulated time series
behaves as an AR(2) process, and the model fit is accurate.
(b) Reset the seed to 62 (set.seed(62)), then generate an independent sequence of
standard Gaussian white noise of the same length as your simulated time series.
Add this white noise to your realization of Xt.
set.seed(62)
yt <- xt + rnorm(length(xt))   # add independent standard Gaussian noise
What ARMA process describes the resulting time series?
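In principle, adding independent white noise to an AR(2) yields an ARMA(2, q) process with q at most 2; a sketch of checking which low-order model the data support:

acf2(yt)                           # compare with the ACF/PACF of xt
sarima(yt, p = 2, d = 0, q = 1)    # candidate fit; an ARMA(2, 2) can be compared as well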
ACF: The ACF shows a slow decay in the early lags, typical of an autoregressive
process, but now it also exhibits a faster drop-off, which is indicative of the addition of a
moving average component.
PACF: The PACF shows a sharp cutoff after lag 2, confirming that the underlying
process is still autoregressive in nature with two AR terms. The added white noise
introduces some new short-term dependencies, seen in the ACF.
The fitted model is an ARMA(2, 1), which includes:
AR1: 1.6837 (close to the true AR1 value of 1.6 from the original AR process).
AR2: -0.8315 (close to the true AR2 value of -0.8).
MA1: -0.6415 (the new moving average term introduced by adding white noise).
Adding white noise to the AR(2) process transforms it into an ARMA(2, 1) process,
where the original AR(2) structure is preserved but a moving average component is
introduced. The estimated parameters closely match the original values, and the addition
of MA1 confirms the impact of the added noise.
10. Textbook exercise 4.3, parts (a) and (b).
(a) Redundancy Check:
Model (i):
o AR roots: 2, 3.33
o MA root: -3.33
No common roots between the AR and MA polynomials, so Model (i) does not have
parameter redundancy.
Model (ii):
o AR roots: 1 ± 1i (complex conjugates)
o MA root: -1
Since there are no exact common roots between the AR and MA polynomials, Model (ii)
also does not show parameter redundancy. However, one MA root is exactly -1, which is
on the edge of invertibility (see below).
(b) Causality and Invertibility:
Model (i):
o Causality: Both AR roots (2, 3.33) are greater than 1 in magnitude, so Model (i)
is causal.
o Invertibility: The MA root (-3.33) is greater than 1 in magnitude, so Model (i) is invertible.
Model (ii):
o Causality: The AR roots (1 ± i, magnitude √2 ≈ 1.41) are greater than 1 in magnitude, so Model (ii) is causal.
o Invertibility: The MA root (-1) has magnitude exactly 1, so Model (ii) sits on the boundary of the invertibility region and is not strictly invertible; it may need adjustment for practical use.
Model (i) is both causal and invertible without any parameter redundancy.
Model (ii) is causal, but its invertibility fails on the boundary because the MA root has magnitude exactly 1. There is no redundancy in either model.
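The root calculations can be checked numerically with polyroot. The coefficients below are placeholders chosen to reproduce the roots quoted above (AR roots 2 and 3.33, MA root -3.33), so they illustrate the method rather than transcribe the textbook's models:

# AR polynomial phi(z) = 1 - phi1*z - phi2*z^2; MA polynomial theta(z) = 1 + theta1*z
phi   <- c(0.8, -0.15)             # hypothetical AR coefficients (roots 2 and 3.33)
theta <- 0.3                       # hypothetical MA coefficient (root -3.33)
Mod(polyroot(c(1, -phi)))          # moduli of AR roots: all > 1  => causal
Mod(polyroot(c(1, theta)))         # moduli of MA roots: all > 1  => invertible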