
Siddharth Singireddy

Time Series

Homework 3

Question 1:

a) The series is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. At Lag-7 the value is 28, which is the sum of
the numbers from 1 to 7 [1+2+3+4+5+6+7 = 28]. At Lag-9 the value is 45, which is the sum of
the numbers from 1 to 9 [1+2+3+4+5+6+7+8+9 = 45]. At Lag-11 the value is 66, which is the
sum of the numbers from 1 to 11 [1+2+3+4+5+6+7+8+9+10+11 = 66].

b) The 8th order autocorrelation is the correlation between the 8th and 1st terms. In this
series, the 8th term is 8 and the 1st term is 1; their product is [8 × 1 = 8]. Therefore, the 8th
order autocorrelation is 8. The 10th order autocorrelation is the correlation between the 10th
and 3rd terms. In this series, the 10th term is 10 and the 3rd term is 3; their product is
[10 × 3 = 30]. Therefore, the 10th order autocorrelation is 30.

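These values can also be checked numerically against the standard sample autocorrelation. A
minimal sketch using statsmodels (note that a series of length 11 only supports lags up to 10,
so a lag-11 autocorrelation is undefined, there being no overlapping pairs):

import numpy as np
from statsmodels.tsa.stattools import acf

# The series from the question
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype=float)

# Sample ACF for lags 0..10 (the maximum for a length-11 series)
r = acf(x, nlags=10)
for k in (7, 9):
    print(f"lag {k}: r = {r[k]:.3f}")  # lag 11 would be out of range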

Question 2:

a) The graph shows the residuals (errors) from a forecasting model. These residuals exhibit a
noticeable pattern, suggesting that they are not white noise. White noise residuals would
appear random without any discernible structure.

The histogram and Q-Q plot indicate that the residuals are not normally distributed. If they
were, the points on the Q-Q plot would align closely with the diagonal reference line; here
we see clear deviations from that line.
Given the non-random residuals and the departure from normality, the model may not be a
good fit for the data.
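A minimal sketch of these diagnostics, using hypothetical residuals standing in for the
model's actual errors:

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox

# Hypothetical residuals standing in for the model's errors
rng = np.random.default_rng(0)
resid = rng.normal(size=100)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].plot(resid)                       # residuals over time: look for structure
axes[0].set_title("Residuals")
axes[1].hist(resid, bins=20)              # histogram: compare to a bell shape
axes[1].set_title("Histogram")
stats.probplot(resid, dist="norm", plot=axes[2])  # Q-Q plot against the normal
plt.tight_layout()
plt.show()

# Ljung-Box test: small p-values indicate autocorrelated (non-white-noise) residuals
print(acorr_ljungbox(resid, lags=[10]))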

b) Normally distributed residuals are desirable because they indicate that the model’s errors
are random and unbiased. When residuals follow a normal distribution, the model has
captured the underlying patterns effectively.

Non-normal residuals can lead to biased forecasts. For instance, if the residuals are skewed,
the model may consistently overestimate or underestimate future values. Normality ensures
that the model’s assumptions align with reality, improving its reliability.

c) Small residuals suggest that the model fits the data well during training. However, this
doesn’t guarantee good forecasts.

Overfitting is a concern: a model can have small residuals by memorizing the training data
yet fail to generalize to new data, producing poor forecasts. While small residuals are
desirable, we must also consider the model’s ability to generalize beyond the training set.

d) Increasing model complexity isn’t always the solution. Adding complexity can lead to
overfitting, where the model fits noise rather than true patterns and performs well on
training data but poorly on unseen data. Instead of blindly increasing complexity, we should
focus on better model selection, feature engineering, and addressing biases in the data.

Question 3:

No, the residuals in the graphs do not appear to be uncorrelated and normally distributed.

 Residuals are considered uncorrelated if they show no pattern over time. In the graph,
the residuals plot shows a cyclical pattern, with positive residuals followed by
negative residuals. This suggests that the errors are correlated.
 Normally distributed residuals would follow a bell-shaped curve. The histogram of
the residuals doesn't appear to follow a bell-shaped curve, particularly at the tails.
This suggests the residuals are not normally distributed.

A few reasons why the residuals in the graphs appear correlated and violate the assumption
of normality:
a) No Autocorrelation:

 Imagine the residuals plotted like points on a graph, with time on the x-axis and the
residual value on the y-axis. Ideally, these points should be scattered randomly, with no
trend.
 In the graph, the residuals form a wave-like pattern; such a non-random pattern
suggests that the errors are correlated.
 This means the error made on one measurement might influence the error on the next
measurement, which is not ideal.

b) Normal Distribution:

 Imagine a bell-shaped curve. This is the shape a histogram of normally distributed
data would take.
 The residuals, when plotted as a histogram, should roughly follow this bell-shaped
curve.
 In the graph, the histogram has a different shape, with fatter tails than a bell curve,
indicating the residuals are not normally distributed.

Uncorrelated, normally distributed residuals are key assumptions behind many statistical
tests. If these assumptions are not met, the results of those tests may be unreliable or
misleading.
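These checks can be made concrete with an ACF plot of the residuals and a formal normality
test. A minimal sketch, using hypothetical cyclical residuals standing in for the ones in the
graphs:

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
from statsmodels.graphics.tsaplots import plot_acf

# Hypothetical wave-like residuals (cyclical signal plus noise)
rng = np.random.default_rng(1)
resid = np.sin(np.arange(120) / 5) + rng.normal(scale=0.5, size=120)

plot_acf(resid, lags=24)        # significant spikes => correlated residuals
plt.show()

stat, p = stats.shapiro(resid)  # Shapiro-Wilk: a small p-value => non-normal residuals
print(f"Shapiro-Wilk p-value: {p:.4f}")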

Question 4:

a) Figure 1 (36 random numbers):

 The autocorrelation function (ACF) shows a mix of positive and negative correlations.
 Some lags have noticeable correlations (e.g., lag 5), while others are close to zero.
 At first glance this looks unlike white noise, but with only 36 observations the critical
bounds are wide, and spikes of this size are still consistent with white noise.

Figure 2 (360 random numbers):

 The ACF is smoother and more evenly distributed around zero.
 Most lags have correlations close to zero.
 This graph more closely resembles white noise, though some small fluctuations remain.

Figure 3 (1,000 random numbers):

 The ACF is even smoother and more centred around zero.
 Almost all lags have correlations close to zero.
 This graph closely resembles white noise.

As the sample size increases, the ACF becomes smoother and the correlations approach zero.
All three series are in fact white noise; the larger samples simply make this easier to see.

The graphs differ because of sample size. Figure 1 (36 random numbers) shows mixed
correlations, Figure 2 (360 random numbers) has a smoother ACF with correlations closer to
zero, and Figure 3 (1,000 random numbers) closely resembles white noise with predominantly
near-zero correlations. As the sample size increases, the ACF becomes smoother and the
correlations approach zero, as expected for white noise.
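A minimal sketch reproducing this experiment, assuming the three figures come from
simulated white noise of the stated lengths:

import matplotlib.pyplot as plt
import numpy as np
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(42)
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, n in zip(axes, (36, 360, 1000)):
    plot_acf(rng.normal(size=n), lags=20, ax=ax)  # ACF of n white-noise draws
    ax.set_title(f"{n} random numbers")
plt.tight_layout()
plt.show()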

b) Critical Values:

 Critical values (confidence bounds) are determined by the sample size and the desired
confidence level (e.g., 95%). For white noise, the 95% bounds are approximately
±1.96/√T, where T is the sample size.
 As the sample size increases, the critical values become narrower (closer to zero).
 This is because larger samples provide more accurate estimates, leading to tighter
confidence intervals; the worked values after this list illustrate it.
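For the three figures, the approximate 95% bounds work out to:

±1.96/√36 ≈ ±0.33
±1.96/√360 ≈ ±0.10
±1.96/√1000 ≈ ±0.06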

Autocorrelation:

 Even though all the graphs show white noise, the sample size affects the precision of
the estimates.
 With more data points, the estimated autocorrelations become more stable and closer
to the true population values.
 Smaller sample sizes may exhibit more variability in the ACF due to random
fluctuations.

While all the graphs depict white noise, larger sample sizes provide more reliable estimates:
critical values narrow, the estimated autocorrelations stabilize closer to the true population
values, and the ACFs become smoother. Smaller samples exhibit more variability due to
random fluctuations, so their ACFs show larger spurious spikes even though the underlying
series is still white noise.

Question 5:

The provided plot illustrates characteristics such as trend, seasonality, or changing variance,
all of which suggest non-stationarity. A consistent upward or downward trend indicates that
the mean is not constant over time. Seasonality involves predictable, recurring patterns over
a fixed period, which also violates the stationarity assumption. Changing variance, where the
spread of the data points increases or decreases over time, likewise suggests non-stationarity.
Differencing the data can stabilize the mean by removing changes in the level of the series,
reducing trend and seasonality and potentially making the series stationary.

To make a series stationary, one common method is differencing, where you subtract the
previous value from the current value (the differenced series is y_t − y_(t−1)). This can
eliminate or reduce trend and seasonality, stabilizing the mean of the series over time. The
expected analysis is to examine each plot for these characteristics and then apply
differencing to check whether it achieves stationarity, as in the sketch below.

Question 6:

1: Read the Data:

import pandas as pd

# Read data from Excel, skipping the header row, with the date column as the index
data = pd.read_excel("retail.xlsx", skiprows=1, index_col=0, parse_dates=True)

Imported the retail data from the "retail.xlsx" file, skipping the first row, and set the date
column as the index.

2: Plotting the Time Series Data:

import matplotlib.pyplot as plt

# Plot the time series data
plt.figure(figsize=(10, 6))
plt.plot(data)
plt.title("Retail Data Time Series Plot")
plt.xlabel("Year")
plt.ylabel("Sales")
plt.show()

This visualized the retail sales data over time, letting us observe any trends, seasonality, or
patterns.

3: Plotting the Autocorrelation Function (ACF):

from statsmodels.graphics.tsaplots import plot_acf

# Plot ACF of the sales column (a 1-D series); pass ax so plot_acf draws on the
# sized figure instead of opening a new one
fig, ax = plt.subplots(figsize=(10, 6))
plot_acf(data.iloc[:, 0], lags=50, alpha=0.05, ax=ax)
plt.title("Autocorrelation Function (ACF) Plot")
plt.xlabel("Lag")
plt.ylabel("ACF")
plt.show()

Examined the ACF plot to see how quickly autocorrelation decreases as lag increases,
identifying any long-term dependencies.

4: Plotting the Partial Autocorrelation Function (PACF):

from statsmodels.graphics.tsaplots import plot_pacf

# Plot PACF of the sales column; pass ax so plot_pacf draws on the sized figure
fig, ax = plt.subplots(figsize=(10, 6))
plot_pacf(data.iloc[:, 0], lags=50, alpha=0.05, ax=ax)
plt.title("Partial Autocorrelation Function (PACF) Plot")
plt.xlabel("Lag")
plt.ylabel("PACF")
plt.show()

Analysed the PACF plot to identify direct effects of each lag on the current observation,
looking for significant spikes beyond the first few lags indicating long-term dependencies.

Putting it all together, the complete script:

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Read data from Excel, skipping the header row, with the date column as the index
data = pd.read_excel("retail.xlsx", skiprows=1, index_col=0, parse_dates=True)
sales = data.iloc[:, 0]  # the single sales column as a 1-D series

# Plot the time series data
plt.figure(figsize=(10, 6))
plt.plot(sales)
plt.title("Retail Data Time Series Plot")
plt.xlabel("Year")
plt.ylabel("Sales")
plt.show()

# Plot ACF on its own sized figure
fig, ax = plt.subplots(figsize=(10, 6))
plot_acf(sales, lags=50, alpha=0.05, ax=ax)
ax.set_title("Autocorrelation Function (ACF) Plot")
ax.set_xlabel("Lag")
ax.set_ylabel("ACF")
plt.show()

# Plot PACF on its own sized figure
fig, ax = plt.subplots(figsize=(10, 6))
plot_pacf(sales, lags=50, alpha=0.05, ax=ax)
ax.set_title("Partial Autocorrelation Function (PACF) Plot")
ax.set_xlabel("Lag")
ax.set_ylabel("PACF")
plt.show()

Observations: the time series plot shows a clear trend, and the ACF and PACF plots show
slow decay and significant correlations at many lags, indicating strong dependence on past
values. In summary, we loaded the retail sales data from the Excel file with the date column
as the index, visualized the sales data to detect trends and patterns over time, checked how
quickly correlations decay as lag increases, and identified the direct influence of each lag on
the current observation, helping determine the appropriate differencing order.
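As a follow-up, one could difference the retail series and re-check the ACF. A minimal
sketch, continuing from the script above (reusing data, plt, and plot_acf):

# First-difference the retail series and re-examine the ACF
diff = data.iloc[:, 0].diff().dropna()
fig, ax = plt.subplots(figsize=(10, 6))
plot_acf(diff, lags=50, alpha=0.05, ax=ax)
ax.set_title("ACF of First-Differenced Retail Sales")
plt.show()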
