
############################################

# DEVELOPING AN ARIMA MODEL FOR AAPL

# Note that what is usually modelled is the log returns
##############################################

References:

1) Requirements on the AR and MA (autoregressive and moving average) parameters:
https://online.stat.psu.edu/stat510/lesson/2/2.1

2) How to write the AR equation with seasonal differencing:
https://stackoverflow.com/questions/56879940/writing-mathematical-equation-for-an-arima1-1-00-1-0-12

3) Excellent resource:
https://people.duke.edu/~rnau/411home.htm
https://people.duke.edu/~rnau/411arim.htm

# Required packages
```{r}
library(forecast)   # forecast(), Arima(), auto.arima(), Acf(), Pacf()
library(fpp3)       # tidy time series tools
library(quantmod)   # getSymbols(), chartSeries(), dailyReturn()
library(tseries)    # adf.test(), kpss.test()
library(timeSeries)
library(xts)        # extensible time series objects
library(ggplot2)
library(urca)       # ur.df()
library(plotly)     # ggplotly()
library(ggfortify)  # autoplot() for time series objects
library(tsm)        # ac()
library(FinTS)      # ArchTest()
```

# If you encounter difficulty loading tsm:

```{r}
install.packages("remotes")
remotes::install_github("KevinKotze/tsm")
```

# TIME SERIES ANALYSIS

Time series analysis is the art of extracting meaningful insights from time series data by exploring the series' structure and characteristics and identifying patterns that can then be utilized to forecast future events of the series.

## Box-Jenkins approach: ARIMA (AutoRegressive Integrated Moving Average)

The model's components are:
AR - use past values of the series itself to model the time series
MA - use past error terms to model the time series
I - integrated: the data must be DIFFERENCED first to convert it to a stationary time series

## The First-order Autoregression Model AR(1)

$$
x_t = \delta + \phi_1 x_{t-1} + e_t
$$

(Some texts write the constant as $\beta_0$ instead of $\delta$.)

Assumptions:

1) The error terms satisfy
$$
e_t \overset{iid}{\sim} N(0, \sigma^2_w)
$$
i.e., the errors are independently distributed, following a normal distribution with mean 0 and constant variance.

2) The errors $e_t$ are independent of $x_t$.

3) The series $x_1, x_2, \dots$ is (weakly) stationary. A requirement for a stationary AR(1) is that
$$
|\phi_1| < 1
$$
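
To see the stationarity requirement in action, here is a small simulated comparison (our own illustration, not from the references): an AR(1) with $\phi_1 = 0.5$ versus a random walk, which is the boundary case $\phi_1 = 1$.

```{r}
set.seed(1)
par(mfrow = c(1, 2))
# Stationary AR(1): mean-reverting around a constant level
plot.ts(arima.sim(model = list(ar = 0.5), n = 500), ylab = "x",
        main = "AR(1), phi = 0.5")
# Random walk (phi = 1): wanders, has no fixed mean
plot.ts(cumsum(rnorm(500)), ylab = "x", main = "Random walk, phi = 1")
```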
## AR(2)
$$
x_t = \delta + \phi_1 x_{t-1} + \phi_2 x_{t-2} + e_t
$$

Stationarity requires all of:
$$
|\phi_2| < 1 \\
\phi_1 + \phi_2 < 1 \\
\phi_2 - \phi_1 < 1
$$

## AR(p)
$$
x_t = \delta + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \phi_3 x_{t-3} + \dots + \phi_{p-1} x_{t-(p-1)} + \phi_p x_{t-p} + e_t
$$

## Moving Average with lag 1: MA(1)


$$
x_t = \mu + e_t +\theta_1e_{t-1}
$$

## MA(2)
$$
x_t = \mu + e_t +\theta_1e_{t-1} +\theta_2e_{t-2}
$$

## The qth order moving average model, denoted by MA(q), is:

$$
x_t = \mu + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_{q-1} e_{t-(q-1)} + \theta_q e_{t-q}
$$

## ARMA(p,q): ARMA(1,1)
$$
x_t = c + \phi_1 x_{t-1} + e_t +\theta_1e_{t-1}
$$
## ARMA(2,1)
$$
x_t = c + \phi_1 x_{t-1} + \phi_2 x_{t-2} + e_t + \theta_1e_{t-1}
$$

## ARMA(2,2)
$$
x_t = c + \phi_1 x_{t-1} + \phi_2 x_{t-2} + e_t + \theta_1e_{t-1} + \theta_2e_{t-2}
$$

A restricted form (here an ARMA(2,1) with $\phi_1 = 0$) can also be written:
$$
x_t = c + \phi_2 x_{t-2} + e_t + \theta_1 e_{t-1}
$$

## ARMA(p,q)
$$
x_t = c + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + e_t + \\
\theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q}
$$

ARIMA(p,d,q): d = the order of differencing (the "integrated" part).
For example, ARIMA(1,1,1) is an ARMA(1,1) fitted to the once-differenced series.

## Steps in setting up an ARIMA(p,d,q) model

Step 1: Plot the data as a time series, and check if it is *stationary*
Step 2: Difference the data to make it stationary in mean (remove trend)
Step 3: Log transform the data to make it stationary in variance (if necessary)
Step 4: Difference the log-transformed data to make it stationary in both mean and variance (if necessary)
Step 5: Check statistically whether the time series is already stationary (ADF test, KPSS test)
Step 6: Plot the ACF (autocorrelation function) and PACF (partial autocorrelation function) to identify a potential AR and MA model
Step 7: Identify the best-fit ARIMA model
Step 8: Forecast the time series using the best-fit ARIMA model
Step 9: Plot the ACF and PACF of the ARIMA model's residuals to ensure no more information is left for extraction

A minimal end-to-end sketch of these steps is given below.
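
Here is a minimal sketch of the whole workflow, assuming a hypothetical univariate xts price series named `prices` (adjust the name to your own data):

```{r}
# Sketch only: `prices` is a placeholder for any univariate price series
lp  <- log(prices)                      # Step 3: stabilize the variance
dlp <- na.omit(diff(lp))                # Steps 2/4: difference to remove trend
tseries::adf.test(dlp)                  # Step 5: test stationarity
tseries::kpss.test(dlp)
par(mfrow = c(1, 2))                    # Step 6: inspect ACF and PACF
forecast::Acf(dlp); forecast::Pacf(dlp)
fit <- forecast::auto.arima(dlp)        # Step 7: identify a model
fc  <- forecast::forecast(fit, h = 22)  # Step 8: forecast
forecast::checkresiduals(fit)           # Step 9: residual diagnostics
```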

# STATIONARY TIME SERIES


A stationary time series is one whose statistical properties, such as the mean, variance, and autocorrelation structure, are constant over time.

A stationary series looks flat: no trend, constant variance over time, a constant autocorrelation structure over time, and no periodic fluctuations.

A stationary time series is one whose properties do not depend on the time at which the series is
observed. Thus, time series with trends, or with seasonality, are not stationary - the trend and
seasonality will affect the value of the time series at different times.

Sample plots of stationary vs non-stationary time series:

https://www.oreilly.com/library/view/hands-on-machine-learning/9781788992282/15c9cc40-bea2-4b75-902f-2e9739fec4ae.xhtml

https://towardsdatascience.com/stationarity-in-time-series-analysis-90c94f27322

On the other hand, a **white noise series** is stationary - it does not matter when you observe it, it
should look much the same at any point in time. In general, a stationary time series will have no
predictable patterns in the long-term. Time plots will show the series to be roughly horizontal (although
some cyclic behaviour is possible), with constant variance.

Most statistical forecasting methods are based on the assumption that the time series can be rendered
approximately **stationary** (i.e., "stationarized") through the use of mathematical transformations.

A stationarized series is relatively easy to predict: you simply predict that its statistical properties will be
the same in the future as they have been in the past. The predictions for the stationarized series can
then be "untransformed," by reversing whatever mathematical transformations were previously used, to
obtain predictions for the original series. Stationarizing a time series through differencing (where
needed) is an important part of the process of fitting an Autoregressive Integrated Moving Average
(ARIMA) model.
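
To make the "untransform" step concrete, here is a small sketch using the AirPassengers data loaded below: fit a model to the differenced log series, then reverse the transformations (cumulate, then exponentiate) to recover forecasts on the original scale.

```{r}
# Sketch: forecast a differenced-log series, then undo the transforms
x   <- as.numeric(AirPassengers)
dlx <- diff(log(x))
fit <- forecast::Arima(dlx, order = c(1, 0, 1))
fc  <- forecast::forecast(fit, h = 12)
# Reverse: cumulate the forecast differences, then exponentiate
x_fc <- tail(x, 1) * exp(cumsum(fc$mean))
x_fc
```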

```{r}
data("AirPassengers")
ap <- AirPassengers
ap
plot(ap)
```

```{r}
tseries::adf.test(ap) #Augmented Dickey-Fuller test
# H0: the time series has a UNIT ROOT (loosely, a.k.a NOT Stationary)
```
# Alternative test - KPSS test
```{r}
# H0: time series is STATIONARY - (opposite of adf test)
tseries::kpss.test(ap)
```
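
Since the two tests have opposite null hypotheses, it helps to run them side by side. A small convenience helper (our own, not from any package):

```{r}
# p-values of both tests for one series
# ADF H0: unit root (non-stationary); KPSS H0: stationary
check_stationarity <- function(x) {
  x <- na.omit(as.numeric(x))
  c(adf_p = tseries::adf.test(x)$p.value,
    kpss_p = tseries::kpss.test(x)$p.value)
}
check_stationarity(ap)
```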

```{r}
forecast::auto.arima(ap, trace = T)
```

```{r}
# To treat the non-constant mean, take the first difference
print(ap)
dap <- diff(ap) # first difference

head(ap); head(dap)
mean(ap); mean(dap)
plot(dap)
```

```{r}
# Get the log of the differenced ap series and see the results
ldap <- log(dap) # equivalently log(diff(ap))
ldap
# This produces NaNs because the log of a negative number is undefined


```

```{r}
lap <- log(ap)
lap
plot(lap)

dlap <- diff(log(ap)) # equivalently diff(lap)
plot(dlap)

tseries::adf.test(dlap)
tseries::kpss.test(dlap)
```
```{r}
mod_dlap <- auto.arima(dlap, trace = T); summary(mod_dlap)
```

# Alternative test - KPSS test


```{r}
# H0: time series is STATIONARY - (opposite of adf test)
tseries::kpss.test(dap)
```

C) Using the dlap, and checking statistically for stationarity


```{r}
tseries::adf.test(dlap) #Augmented Dickey-Fuller test
# H0: the time series has a UNIT ROOT (loosely, a.k.a NOT Stationary)

```

# Alternative test - KPSS test


```{r}
# H0: time series is STATIONARY - (opposite of adf test)
tseries::kpss.test(dlap)
```
We also get the same result: the series is STATIONARY.

NOTE (July 18, 2025): we stopped here, after testing dlap statistically.

# Import AAPL stock prices from Yahoo finance using "quantmod" package
```{r}
start_date <- "2007-01-01"
end_date <- "2024-12-31"

quantmod::getSymbols("AAPL",
                     src = "yahoo",
                     from = start_date,
                     to = end_date)
```

```{r}
getSymbols("^GSPC")
getSymbols("META")
```
```{r}
start_date <- "2000-01-01"
end_date <- "2024-12-31"

gs <- quantmod::getSymbols("GS",
                           src = "yahoo",
                           from = start_date,
                           to = end_date,
                           auto.assign = FALSE)
```

```{r}
tickers <- c("MSFT", "TSLA", "JNJ", "SBUX", "META")
getSymbols(tickers,
           from = "2015-01-01")
```

```{r}
aapl <- getSymbols("AAPL",
                   src = "yahoo",
                   from = start_date,
                   to = end_date,
                   auto.assign = FALSE)
```

```{r}
getSymbols("USTB10Y",
src = "FRED")
```

```{r}
getSymbols("MSFT")
```

```{r}
getSymbols("AAPL")
aapl <- AAPL
```

```{r}
class(aapl) #tells us this is an extensible time series (xts/zoo)
View(aapl)
head(aapl,4); tail(aapl,4)
str(aapl)
summary(aapl)

plot(aapl)
plot(aapl$AAPL.Close)
```

# Extract only the Adjusted prices (6th column from aapl)


```{r}
adjaapl <- aapl[, 6] # model the adjusted closing price (6th column)
colnames(adjaapl) <- "adj" # rename the column to "adj"
```

Preliminary analysis:
```{r}
plot(adjaapl) # plot the prices

p1 <- autoplot(adjaapl) +
  ggtitle("Adjusted AAPL prices from 2007-01-01 to 2024-12-31") +
  ylab("Daily prices")
p1

p2 <- ggplotly(p1); p2

```

# Another way to plot


```{r}
quantmod::chartSeries(aapl)
aapl_dec2024 <- aapl["2024-12"]

chartSeries(aapl_dec2024)
```

```{r}
auto.arima(adjaapl, trace = T)
```

Are the AAPL adjusted prices STATIONARY? No, because the series has a trend (it increases over time). A stationary series has neither trend nor seasonality. Our data, though, does not exhibit seasonality.

I. Model the AAPL stock price using ARIMA

# USING THE MANUAL METHOD

A) Convert the aapl time series to a STATIONARY time series by taking the first difference to remove the trend: we subtract each price from the previous one. If the original series has n observations, the differenced series has n - 1, i.e., one fewer point than the original data. (Note that diff() on an xts object keeps the same number of rows but leaves the first row as NA, which we then remove.)
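
A tiny illustration of the point:

```{r}
# Differencing drops one observation
length(1:10); length(diff(1:10))
```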

```{r}
aaplj <- adjaapl
plot(aaplj)

daapl <- diff(aaplj) # first difference of the adjusted price
plot(daapl)

# diff() keeps the same number of rows but leaves the first row NA,
# so remove it
daapl <- daapl[-1]
mean(daapl)

# Or, an alternative way to drop the first (NA) row:
# daapl <- daapl[2:nrow(daapl)]
```

B) Describe the plot of daapl: the series has been detrended. The differenced series appears stationary, although there are volatility shocks in some places. Usually a simple differencing makes the series stationary; further transformation is needed if it is not.

```{r}
plot(daapl)
```

```{r}
ldaapl <- log(daapl) # produces NaNs: daapl contains negative values
plot(ldaapl)
```
NOTE: What would have happened if we had taken the log of the series instead of differencing it?
```{r}
laapl <- log(aaplj)
View(laapl)
plot(laapl)
```
Taking the log did not make the series stationary: it made the trend more linear but did not remove it. Hence we still have to difference the log series to make it stationary.

```{r}
# Stationary series: the difference of the log prices (continuous/log returns)
dlaapl <- diff(log(aaplj)) # equivalently diff(laapl)

dlaapl <- na.omit(dlaapl) # drop the leading NA produced by diff()
plot(dlaapl)
```

```{r}
# The same log returns, computed via quantmod
dlaapl1 <- dailyReturn(aaplj, type = 'log')
dlaapl1 <- dlaapl1[-1] # drop the first row so it aligns with dlaapl

View(dlaapl1)
```

```{r}
head(dlaapl, 6); head(dlaapl1, 6)
```
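
The two constructions should agree up to numerical noise; a quick check:

```{r}
# TRUE if the two log-return series are (numerically) identical
all.equal(as.numeric(dlaapl), as.numeric(dlaapl1))
```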

# Which to choose: the differenced or the log-differenced model?

It depends on the requirements of the study. The basic rule is to choose the simpler approach.

# What if we take the log of the differenced time series daapl?

```{r}
# The differenced series (daapl) contains negative values, so taking its
# log produces NaNs: the log of a negative number is undefined
```

As mentioned above, it is better to take the log first and then difference, not the reverse.
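
The reason is the identity below: differencing the log of a (positive) price series always yields a well-defined log return, whereas the log of a difference is undefined whenever the series falls:

$$
\nabla \log x_t = \log x_t - \log x_{t-1} = \log\left(\frac{x_t}{x_{t-1}}\right)
$$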

Reference for linearizing variables


https://faculty.fuqua.duke.edu/~rnau/Decision411_2007/411log.htm

C) Using the dlaapl, and checking statistically for stationarity


```{r}
tseries::adf.test(dlaapl) #Augmented Dickey-Fuller test
# H0: the time series has a UNIT ROOT (loosely, a.k.a NOT Stationary)

```

# Alternative test - KPSS test


```{r}
# H0: time series is STATIONARY - (opposite of adf test)
tseries::kpss.test(dlaapl)
```
We also get the same result: the series is STATIONARY.

# A more informative test comes from the urca package: ur.df(y, type = c("none", "drift", "trend"), lags = 1, selectlags = c("Fixed", "AIC", "BIC"))

```{r}
# H0: time series is NOT stationary
test <- urca::ur.df(dlaapl)
summary(test)
```

D) DETERMINING THE ARIMA MODEL


Inspect the ACF (autocorrelation function) and PACF(partial autocorrelation function) to determine what
ARIMA model to use.

```{r}
par(mfrow=c(1,2))
Acf(dlaapl, main = "ACF for Diff log aapl Series")
Pacf(dlaapl, main = "PACF for Diff log aapl Series")
```

OR, using the tsm package (tsm::ac appears to expect a plain ts object, hence the conversion):
```{r}
dlaapl_ts <- ts(dlaapl)
tsm::ac(dlaapl_ts, max.lag = 24)
```

```{r}
par(mfrow=c(1,2))
acf(dlaapl, lag.max = 24)
pacf(dlaapl, lag.max = 24)
```

# Based on the ACF and PACF plots, decide on a candidate model and fit it using Arima() from the forecast package:
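
A standard rule of thumb for reading the plots:

| Process   | ACF                  | PACF                 |
|-----------|----------------------|----------------------|
| AR(p)     | tails off gradually  | cuts off after lag p |
| MA(q)     | cuts off after lag q | tails off gradually  |
| ARMA(p,q) | tails off            | tails off            |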

# Make sure required package is loaded


```{r}
library(forecast)
```

# If data is not a time series yet, convert it (e.g., dlaapl <- ts(dlaapl))

# Assume we chose ARMA(1,1) based on ACF and PACF


```{r}
fit_arma11 <- Arima(dlaapl, order = c(1, 0, 1))
```

# Check model summary


```{r}
summary(fit_arma11)

fit_arma11$coef
```

```{r}
fit_arma10 <- Arima(dlaapl, order = c(1, 0, 0))
fit_arma10
```

```{r}
fit_arma51 <- Arima(dlaapl, order = c(5, 0, 1))
fit_arma51
```
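
A quick way to compare the candidate fits (lower AIC is better):

```{r}
# Compare the candidate models by AIC
AIC(fit_arma10, fit_arma11, fit_arma51)
```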

# Diagnostic plots (residuals, ACF, PACF)


```{r}
checkresiduals(fit_arma11)
```
```{r}
checkresiduals(fit_arma51)
```

```{r}
names(fit_arma11)
```

```{r}
fit_arma11$coef
```

# Before we look at another approach, let's illustrate what ARMA processes look like

## Illustrating ACF and PACF for AR(), MA() and ARMA() process
```{r}
# AR process
set.seed(123)
a1 <- arima.sim(model = list(ar = 0.8), n = 1000)
head(a1, 10)
plot.ts(a1)

par(mfrow=c(1,2))
Acf(a1, main = "ACF for a1 AR(1) process")
Pacf(a1, main = "PACF for a1 AR(1) process")
```

```{r}
# MA process
set.seed(123)
m1 <- arima.sim(model = list(ma = 0.6), n = 1000)

plot.ts(m1)
ac(m1, max.lag = 18)
par(mfrow=c(1,2))
Acf(m1, main = "ACF for m1 MA(1) process")
Pacf(m1, main = "PACF for m1 MA(1) process")

```

```{r}
# Simulate 1000 observations from an ARMA(2,2) process
set.seed(123)
arma22 <- arima.sim(model = list(ar = c(0.7, -0.4), ma = c(-0.6, 0.5)), n = 1000)

plot.ts(arma22)
par(mfrow=c(1,2))
Acf(arma22, main = "ACF for arma22 ARMA(2,2) process")
Pacf(arma22, main = "PACF for arma22 ARMA(2,2) process")
```

```{r}
forecast::Arima(arma22, order = c(2, 0, 2), include.constant = TRUE)
```

```{r}
mod22 <- auto.arima(arma22, trace = T)
mod22$coef
summary(mod22)
```

After plotting the ACF and PACF, try various models and compare their AICs. This is a tedious and time-consuming process, which motivates the automated approach below.

# Using the forecast::auto.arima function

A) Use the original data; R will determine whether differencing is needed.

What does this mean? It means we are asking R to determine the best model: the AR order, the MA order, and the number of differences d.

```{r}
mod011 <- auto.arima(dlaapl) # log returns
mod011
```

## NOTE: Compare this with modelling the AAPL price dataset

```{r}
plot(adjaapl)
mod011p <- auto.arima(adjaapl) # raw prices, not log returns
mod011p
```

## Compare the above with modelling the differenced aapl

```{r}
plot(daapl)
mod012 <- auto.arima(daapl) # differenced prices, not log returns
mod012
```
## Are the outputs for adjaapl and daapl above similar?

## Back to the logreturns of aapl data


```{r}
mod111 <- auto.arima(dlaapl)
summary(mod111)
mod111
mod111a <- auto.arima(dlaapl, trace = T)
mod111a
```

```{r}
mod111a$coef
```

```{r}
(modf <- arima(dlaapl, order=c(5,0,1), include.mean=T))

# Or
modf$coef
```

```{r}
(modf1 <- forecast::Arima(dlaapl, order=c(5,0,1)))
modf1$coef
```

## DIAGNOSTICS FOR modf

Plot the residuals of the model:
```{r}
checkresiduals(modf)
tsm::ac(modf$residuals, max.lag = 24)

par(mfrow = c(1,2))
Acf(modf$residuals, main = "ACF of residuals")
Pacf(modf$residuals, main = "PACF of residuals")
```

## Check if residuals are stationary


```{r}
tseries::adf.test(modf$residuals)

tseries::kpss.test(modf$residuals)
```

## Predicting the next 22 days (1 trading month)


```{r}
pred <- predict(modf, n.ahead = 22)
pred
```
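
Note that these forecasts are log returns, not prices. A sketch of converting them back into a price path, starting from the last observed adjusted price:

```{r}
# Cumulate the forecast log returns and exponentiate to get prices
last_price <- as.numeric(tail(aaplj, 1))
price_path <- last_price * exp(cumsum(pred$pred))
price_path
```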

## Check ARCH effects


```{r}
# H0: no ARCH effects (no conditional heteroskedasticity in the returns)
FinTS::ArchTest(dlaapl, lags = 18)
```
