Assignment 4: Time Series Analysis and
Forecasting of Online Retail Revenue
Objective
The objective of this assignment is to conduct time series analysis and forecasting on the Online
Retail dataset using the Day and Revenue variables. We aim to:
• Visualize temporal patterns
• Identify seasonality and trends
• Apply statistical and machine learning models for forecasting
• Evaluate model performance
Accurate forecasting supports inventory management, marketing strategies, and financial
planning.
Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
• numpy → works with numbers, arrays, and math.
• pandas → works with tables and data (like Excel).
• matplotlib.pyplot → makes basic graphs and plots.
• seaborn → makes nicer-looking graphs on top of matplotlib.
• warnings → filterwarnings('ignore') hides warning messages so the notebook output stays clean.
Read dataset
df = pd.read_csv('teleco_time_series.csv')
df.head()
Day Revenue
0 1 0.000000
1 2 0.000793
2 3 0.825542
3 4 0.320332
4 5 1.082554
df.shape
(731, 2)
df.isnull().sum()
Day 0
Revenue 0
dtype: int64
EDA
start_date = pd.Timestamp("2020-01-01")
df = df.sort_values("Day").reset_index(drop=True)
df["Date"] = start_date + pd.to_timedelta(df["Day"] - df["Day"].min(), unit="D")
df = df.set_index("Date")
start_date
Timestamp('2020-01-01 00:00:00')
• Convert the “Day” numbers into actual calendar dates starting from 2020-01-01 and set
them as the DataFrame index
df.head(5)
Day Revenue
Date
2020-01-01 1 0.000000
2020-01-02 2 0.000793
2020-01-03 3 0.825542
2020-01-04 4 0.320332
2020-01-05 5 1.082554
ts = df["Revenue"].asfreq("D")
• Take the 'Revenue' column and convert it into a time series with daily frequency
print("Shape:", df.shape)
print(df.head(), "\n")
plt.figure(figsize=(12,5))
plt.plot(df['Revenue'], label="Revenue")
plt.title("Telecom Revenue (Daily)")
plt.xlabel("Date")
plt.ylabel("Revenue")
plt.legend()
plt.tight_layout()
plt.show()
Shape: (731, 2)
Day Revenue
Date
2020-01-01 1 0.000000
2020-01-02 2 0.000793
2020-01-03 3 0.825542
2020-01-04 4 0.320332
2020-01-05 5 1.082554
• Plot daily revenue as a labeled line chart.
Data preprocessing
# create seasonal sub-series from the Day column
df['Year'] = df['Day'].apply(lambda d: 1 if d <= 365 else 2)
df['Quarter'] = (((df['Day'] - 1) // 91.3125 % 4) + 1).astype(int)
df['Month'] = (((df['Day'] - 1) // 30.437 % 12) + 1).astype(int)
df['Week'] = (((df['Day'] - 1) // 7 % 52.1429) + 1).astype(int)
• Year: Assign 1 or 2 depending on whether the day is in the first or second year.
• Quarter: Calculate which quarter (1–4) the day falls into.
• Month: Calculate which month (1–12) the day falls into.
• Week: Calculate which week (1–52) the day falls into.
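Since a calendar Date index was already built in the EDA step, the same sub-series can also be derived with pandas' datetime accessors. A minimal alternative sketch (not part of the original cells; it assumes pandas ≥ 1.1 and may differ slightly from the formulas above at period boundaries):
# Alternative sketch: seasonal sub-series from the Date index
df['Year'] = df.index.year - df.index.year.min() + 1      # 1 or 2
df['Quarter'] = df.index.quarter                          # calendar quarter, 1-4
df['Month'] = df.index.month                              # calendar month, 1-12
df['Week'] = df.index.isocalendar().week.astype(int)      # ISO week, 1-53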
Train/Test Split
n = int(len(df)*0.8)
n
584
ts_train = df.iloc[:n,:]
ts_test = df.iloc[n:,:]
print(ts_train.shape,ts_test.shape)
(584, 1) (147, 1)
• In time series analysis we avoid random sampling methods such as train_test_split (commonly used in non-time-series machine learning) when splitting data into training and testing sets. Instead, we use a sequential split (as in ts_train = df.iloc[:n,:] and ts_test = df.iloc[n:,:]) for the following reasons:
– The temporal order of observations must be preserved; a random split would mix future observations into the training set and leak information.
– Holding out the most recent period for testing mimics how the model will actually be used, i.e. forecasting forward in time.
A sketch of an equivalent order-preserving cross-validation split follows.
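For several sequential train/validation folds, scikit-learn's TimeSeriesSplit produces the same kind of order-preserving splits; a minimal sketch (illustrative only, not used in the rest of the notebook):
# Sketch: rolling-origin cross-validation folds that respect time order
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(df), start=1):
    # every training window ends strictly before its validation window begins
    print(f"Fold {fold}: train rows {train_idx[0]}-{train_idx[-1]}, "
          f"test rows {test_idx[0]}-{test_idx[-1]}")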
5. Decompose
• Decompose the series into trend, seasonal, and residual components.
from statsmodels.tsa.seasonal import seasonal_decompose
df = ts.reset_index() # convert to DataFrame
df.columns = ["Date", "Revenue"] # rename columns
print(df.head())
Date Revenue
0 2020-01-01 0.000000
1 2020-01-02 0.000793
2 2020-01-03 0.825542
3 2020-01-04 0.320332
4 2020-01-05 1.082554
results = seasonal_decompose(df['Revenue'], model='additive', period=90)
# Apply rcParams for styling
plt.rcParams.update({
    "figure.figsize": (12, 8),   # Bigger figure
    "axes.titlesize": 14,        # Title size
    "axes.labelsize": 12,        # Label size
    "xtick.labelsize": 10,       # X-axis tick size
    "ytick.labelsize": 10,       # Y-axis tick size
    "legend.fontsize": 12,       # Legend font size
    "lines.linewidth": 1.5       # Thicker lines
})
# Plot with updated style
results.plot()
plt.suptitle("Seasonal Decomposition of Revenue (Additive Model)",
fontsize=16)
plt.show()
6. Test for Stationarity
• We observe that the moving (rolling) mean and moving standard deviation are not constant over time.
• Both show an upward trend.
• Ideally, the rolling mean and rolling standard deviation should be constant.
• Hence, the time series is not stationary.
Augmented Dickey-Fuller test
• The intuition behind the test is that if the series is integrated, the lagged level y(t-1) provides no relevant information for predicting the change in y(t).
• Null hypothesis (H0): the time series is not stationary.
• Alternative hypothesis (Ha): the time series is stationary.
rollmean = ts.rolling(window = 15).mean()
rollstd = ts.rolling(window = 15).std()
• There are multiple ways to make a time series stationary (a short sketch of a few of these follows the list):
a. Transformation
• Log transformation
• Square-root transformation
• Box-Cox transformation
• Exponential transformation
b. Differencing: y'(t) = y(t) - y(t-1)
c. Seasonal differencing: y'(t) = y(t) - y(t-s)
d. Smoothing
e. Decomposition
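A minimal sketch of a few of these transforms applied to the daily series ts (illustrative only; log and Box-Cox require strictly positive values, so a constant of 1 is added because the raw series starts at 0, and scipy is assumed to be available):
# Sketch: common stationarising transforms
from scipy import stats

ts_log = np.log(ts + 1)                  # log transformation
ts_sqrt = np.sqrt(ts)                    # square-root transformation
ts_boxcox, lam = stats.boxcox(ts + 1)    # Box-Cox transformation
ts_diff1 = ts.diff(1).dropna()           # first-order differencing
ts_seas = ts.diff(30).dropna()           # seasonal differencing with lag 30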
orig = plt.plot(ts,label = 'original')
mean = plt.plot(rollmean, label = 'Rolling mean ')
std = plt.plot(rollstd , label = 'Rolling STD')
plt.legend(loc = 'best')
plt.show()
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    rollmean = timeseries.rolling(window=15).mean()
    rollstd = timeseries.rolling(window=15).std()
    orig = plt.plot(timeseries, label='Original')
    mean = plt.plot(rollmean, label='Rolling mean', color='r')
    std = plt.plot(rollstd, label='Rolling STD', color='k')
    plt.legend(loc='best')
    plt.show()
    dftest = adfuller(timeseries, autolag='AIC')
    df_output = pd.Series(dftest[0:4],
                          index=['Test Statistic', 'p_value', '#Lags',
                                 'Number of Observations Used'])
    for key, value in dftest[4].items():
        df_output[f'Critical Value ({key})'] = value
    print(df_output)
Explanation:
1. Rolling Mean & Std
– We use a 15-period rolling window.
– If both rolling mean and std stay roughly constant → series might be stationary.
2. ADF Test
– Null Hypothesis (H0): Series has a unit root → Non-Stationary.
– Alternative Hypothesis (H1): Series is stationary.
– Decision rule:
• If p-value < 0.05 → Reject H0 → Series is Stationary.
• Else → Fail to Reject H0 → Series is Non-Stationary.
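The decision rule can also be wrapped in a small convenience helper; a sketch (assuming a pandas Series as input, separate from the test_stationarity function above):
# Sketch: print an ADF verdict for any series
def adf_decision(series, alpha=0.05):
    stat, p_value = adfuller(series.dropna(), autolag='AIC')[:2]
    verdict = 'Stationary' if p_value < alpha else 'Non-Stationary'
    print(f'ADF statistic = {stat:.4f}, p-value = {p_value:.4f} -> {verdict}')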
test_stationarity(ts)
Test Statistic                    -1.924612
p_value                            0.320573
#Lags                              1.000000
Number of Observations Used      729.000000
Critical Value (1%)               -3.439352
Critical Value (5%)               -2.865513
Critical Value (10%)              -2.568886
dtype: float64
Differencing
# y'(t) = y(t) - y(t-2)  (lag-2 difference)
ts_diff = ts - ts.shift(2)
Explanation:
• ts.shift(1) shifts the series by one time step; subtracting it gives the first-order (lag-1) difference.
• ts.shift(2) shifts the series by two time steps; subtracting it, as done here, gives a lag-2 difference (which is not the same as differencing twice).
Differencing stabilises the mean of a series by removing trend/seasonality, which is required before applying models like ARIMA.
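A quick sketch of the distinction between the lag-2 difference used here and a true second-order difference:
# Sketch: lag-2 difference vs. second-order difference
lag2_diff = ts - ts.shift(2)       # same as ts.diff(2): y(t) - y(t-2), used above
second_order = ts.diff().diff()    # difference of the first difference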
ts_diff.dropna(inplace = True)
test_stationarity(ts_diff)
Test Statistic                   -6.826541e+00
p_value                           1.941386e-09
#Lags                             1.700000e+01
Number of Observations Used       7.110000e+02
Critical Value (1%)              -3.439581e+00
Critical Value (5%)              -2.865614e+00
Critical Value (10%)             -2.568939e+00
dtype: float64
7. ACF and PACF Plots
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plt.figure(figsize = (12,8))
plt.subplot(2,1,1)
plot_acf(ts_diff, ax = plt.gca(),lags = 30)
plt.subplot(2,1,2)
plot_pacf(ts_diff,ax = plt.gca(),lags = 30)
plt.show()
Interpretation Guide:
• From ACF plot → Decide q (MA part)
Look for the lag where the autocorrelation cuts off (drops to near zero).
• From PACF plot → Decide p (AR part)
Look for the lag where the partial autocorrelation cuts off.
These values (p, d, q) form the ARIMA model order.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
# p = 0,1,2, q = 0,1, d = 0,1,2
model = ARIMA(ts_diff, order = (1,0,1))
results = model.fit()
results.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                               SARIMAX Results
==============================================================================
Dep. Variable:                Revenue   No. Observations:                  729
Model:                 ARIMA(1, 0, 1)   Log Likelihood                -491.037
Date:                Fri, 29 Aug 2025   AIC                            990.073
Time:                        10:08:57   BIC                           1008.440
Sample:                    01-03-2020   HQIC                           997.160
                         - 12-31-2021
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0452      0.024      1.898      0.058      -0.001       0.092
ar.L1         -0.4702      0.034    -14.001      0.000      -0.536      -0.404
ma.L1          0.9976      0.007    140.691      0.000       0.984       1.011
sigma2         0.2239      0.013     17.769      0.000       0.199       0.249
===================================================================================
Ljung-Box (L1) (Q):                0.01   Jarque-Bera (JB):                  2.05
Prob(Q):                           0.93   Prob(JB):                          0.36
Heteroskedasticity (H):            1.02   Skew:                             -0.01
Prob(H) (two-sided):               0.85   Kurtosis:                          2.74
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
"""
Conclusion:
• The ARIMA(1,0,1) model fits the data well.
• Both AR and MA terms are statistically significant.
• Diagnostics confirm residuals are independent, normally distributed, and
homoscedastic.
• Information criteria (AIC/BIC/HQIC) can be compared across other candidate ARIMA orders to confirm whether this is the best fit; a sketch of such a comparison follows.
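The AIC comparison mentioned above can be automated over a small grid of candidate orders; a minimal sketch (illustrative, fitted on the same differenced series ts_diff):
# Sketch: compare a small grid of ARIMA orders by AIC
import itertools

best_aic, best_order = float('inf'), None
for p, d, q in itertools.product([0, 1, 2], [0], [0, 1]):
    try:
        aic = ARIMA(ts_diff, order=(p, d, q)).fit().aic
        if aic < best_aic:
            best_aic, best_order = aic, (p, d, q)
    except Exception:
        continue
print('Best order by AIC:', best_order, '| AIC:', round(best_aic, 2))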
ts_diff.plot(label="Actual", figsize=(12,5))
results.fittedvalues.plot(label="Fitted", color='red')
plt.legend()
plt.show()
# Forecast next 30 days
forecast_steps = 30
forecast = results.get_forecast(steps=forecast_steps)
# Get predicted mean and confidence intervals
forecast_mean = forecast.predicted_mean
forecast_ci = forecast.conf_int()
# Plot forecast
plt.figure(figsize=(12,5))
plt.plot(ts_diff, label="Actual")
plt.plot(forecast_mean.index, forecast_mean, label="Forecast", color='green')
plt.fill_between(forecast_ci.index, forecast_ci.iloc[:, 0], forecast_ci.iloc[:, 1],
                 color='lightgreen', alpha=0.5)
plt.legend()
plt.show()
Interpretation:
• Green line → Model’s forecasted revenue for the next 30 days.
• Light green shaded area → Confidence interval (uncertainty range).
• Blue line → Historical differenced series used for fitting.
This helps visualize both future revenue trends and the reliability of predictions.
results.fittedvalues
Date
2020-01-03 0.045205
2020-01-04 0.251925
2020-01-05 -0.030325
2020-01-06 0.183198
2020-01-07 -0.170984
...
2021-12-27 -0.345738
2021-12-28 0.133179
2021-12-29 0.318206
2021-12-30 -0.318382
2021-12-31 -0.299240
Freq: D, Length: 729, dtype: float64
from sklearn.metrics import (r2_score, mean_absolute_error,
                             mean_squared_error, mean_absolute_percentage_error)
import numpy as np
# Actual vs Fitted
y_true = ts_diff
y_pred = results.fittedvalues
# Metrics
r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mape = mean_absolute_percentage_error(y_true, y_pred)
# Print all at once
print(f"R² Score: {r2:.4f}")
print(f"MAE: {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAPE: {mape:.4f}")
R² Score: 0.2598
MAE: 0.3821
RMSE: 0.4744
MAPE: 2.7629
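For reference, the same metrics can be computed directly with numpy (a sketch, reusing y_true and y_pred from above). Note that the MAPE of 2.7629 corresponds to roughly 276% because the differenced series contains values close to zero:
# Sketch: the metrics above, computed by hand
err = y_true - y_pred
mae_manual = np.mean(np.abs(err))
rmse_manual = np.sqrt(np.mean(err ** 2))
mape_manual = np.mean(np.abs(err / y_true))    # unstable when actual values are near zero
r2_manual = 1 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)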
SARIMA model
Unlike simple ARIMA, SARIMA (Seasonal ARIMA) accounts for both trend and seasonality in the
time series.
We specify two sets of parameters:
1. Non-seasonal part (p,d,q):
– p=2 → Includes 2 autoregressive (AR) terms (depends on last 2 lags).
– d=0 → No differencing applied.
– q=0 → No moving average (MA) terms.
2. Seasonal part (P,D,Q,s):
– P=1 → Includes 1 seasonal autoregressive term.
– D=0 → No seasonal differencing.
– Q=0 → No seasonal moving average term.
– s=30 → Seasonal cycle length = 30 (likely monthly seasonality if data is daily).
Other options:
• enforce_stationarity=False → Allows fitting even if strict stationarity is not
satisfied.
• enforce_invertibility=False → Allows fitting even if invertibility conditions are
not met.
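The configuration described above maps directly onto statsmodels' SARIMAX constructor; a sketch of the specification (the actual grid search and fit follow below):
# Sketch: SARIMA specification matching the parameters described above
from statsmodels.tsa.statespace.sarimax import SARIMAX

sarima_spec = SARIMAX(ts,
                      order=(2, 0, 0),               # non-seasonal (p, d, q)
                      seasonal_order=(1, 0, 0, 30),  # seasonal (P, D, Q, s)
                      enforce_stationarity=False,
                      enforce_invertibility=False)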
fig, ax = plt.subplots(1,2, figsize=(14,4))
plot_acf(df['Revenue'], ax=ax[0])
plot_pacf(ts, ax=ax[1])
plt.show()
import itertools
import warnings
from statsmodels.tsa.statespace.sarimax import SARIMAX
warnings.filterwarnings("ignore")
p = [0, 1, 2]
d = [0]              # non-seasonal differencing fixed at 0
q = [0]
seasonal_p = [0, 1]
seasonal_d = [0, 1]
seasonal_q = [0, 1]
s = 30               # ~monthly seasonal period for daily data
non_seasonal_orders = list(itertools.product(p, d, q))
seasonal_orders = list(itertools.product(seasonal_p, seasonal_d, seasonal_q, [s]))

best_aic = float("inf")
best_params = None
best_model = None

for order in non_seasonal_orders:
    for seasonal_order in seasonal_orders:
        try:
            model = SARIMAX(ts,
                            order=order,
                            seasonal_order=seasonal_order,
                            enforce_stationarity=False,
                            enforce_invertibility=False)
            model_fit = model.fit(disp=False)
            current_aic = model_fit.aic
            if current_aic < best_aic:
                best_aic = current_aic
                best_params = (order, seasonal_order)
                best_model = model_fit
        except Exception:
            continue
print("Best AIC:", best_aic)
print("Best Params:", best_params)
Best AIC: 954.0711133375758
Best Params: ((2, 0, 0), (1, 0, 0, 30))
import statsmodels.api as sm
# best orders from the grid search above
p, d, q = 2, 0, 0
P, D, Q, s = 1, 0, 0, 30   # s is the seasonal periodicity
best_sarima_model = SARIMAX(ts, order=(p, d, q), seasonal_order=(P, D, Q, s),
                            enforce_stationarity=False, enforce_invertibility=False)
best_sarima_fit = best_sarima_model.fit()
best_sarima_fit.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                                     SARIMAX Results
==========================================================================================
Dep. Variable:                            Revenue   No. Observations:                  731
Model:             SARIMAX(2, 0, 0)x(1, 0, 0, 30)   Log Likelihood                -473.036
Date:                            Fri, 29 Aug 2025   AIC                            954.071
Time:                                    10:13:17   BIC                            972.270
Sample:                                01-01-2020   HQIC                           961.106
                                     - 12-31-2021
Covariance Type:                              opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.5386      0.034     15.991      0.000       0.473       0.605
ar.L2          0.4631      0.034     13.737      0.000       0.397       0.529
ar.S.L30      -0.0355      0.038     -0.935      0.350      -0.110       0.039
sigma2         0.2266      0.013     17.132      0.000       0.201       0.253
===================================================================================
Ljung-Box (L1) (Q):                0.01   Jarque-Bera (JB):                  2.72
Prob(Q):                           0.91   Prob(JB):                          0.26
Heteroskedasticity (H):            0.96   Skew:                             -0.01
Prob(H) (two-sided):               0.74   Kurtosis:                          2.70
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
"""
forecast = best_sarima_fit.get_forecast(steps=60)
ci = forecast.conf_int()
plt.plot(ts, label='Observed Demand')
plt.plot(best_sarima_fit.fittedvalues, label='Fitted values')
plt.plot(forecast.predicted_mean, label='Forecast', color='orange')
plt.fill_between(ci.index, ci.iloc[:, 0], ci.iloc[:, 1], color='orange', alpha=0.3)
plt.legend()
plt.show()
Interpretation:
• Observed → actual revenue values.
• Fitted values → the model's in-sample fit, showing how well SARIMA explains the historical data.
• Forecast (orange line) → predicted future values.
• Shaded band (orange area) → confidence interval, i.e. the uncertainty of the predictions.
This visualization makes it easy to compare actual vs fitted vs forecast values and to judge how well the model captures both trend and seasonality.
from sklearn.metrics import (r2_score, mean_absolute_error,
                             mean_squared_error, mean_absolute_percentage_error)
import numpy as np
# Actual vs Fitted
y_true = ts
y_pred = best_sarima_fit.fittedvalues
# Metrics
r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mape = mean_absolute_percentage_error(y_true, y_pred)
# Print all at once
print(f"R² Score: {r2:.4f}")
print(f"MAE: {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAPE: {mape:.4f}")
R² Score: 0.9849
MAE: 0.3808
RMSE: 0.4733
MAPE: 0.0624
Prophet model
df = df.rename(columns={"Date": "ds", "Revenue": "y"})
from prophet import Prophet
model = Prophet()
model.fit(df)
10:12:33 - cmdstanpy - INFO - Chain [1] start processing
10:12:33 - cmdstanpy - INFO - Chain [1] done processing
<prophet.forecaster.Prophet at 0x1d006064980>
Time Series Forecasting with Prophet
Prophet is a time series forecasting library by Facebook (Meta) that handles trends, seasonality,
and holidays automatically.
It is especially useful for daily, weekly, or yearly data with strong seasonal effects.
1. Prophet() → Creates a Prophet model object.
– seasonality_mode → 'additive' (default) or 'multiplicative'
– yearly_seasonality → Model captures yearly seasonal patterns
– weekly_seasonality → Model captures weekly patterns
– holidays → Optional DataFrame to include holiday effects
2. df → Must be a pandas DataFrame with:
– ds → datetime column
– y → numeric column to forecast
3. model.fit(df) → Trains the model on your data.
– Learns trends, seasonality, and holiday effects.
– After fitting, the model is ready to predict future values using
model.predict(future).
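A sketch of how those options could be passed when constructing the model (the holiday dates below are purely hypothetical examples, not taken from the dataset):
# Sketch: Prophet with explicit seasonality options and an illustrative holiday frame
holidays = pd.DataFrame({
    'holiday': 'promo',
    'ds': pd.to_datetime(['2020-11-27', '2021-11-26']),   # hypothetical promo days
    'lower_window': 0,
    'upper_window': 1,
})
m = Prophet(seasonality_mode='additive',
            yearly_seasonality=True,
            weekly_seasonality=True,
            holidays=holidays)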
days_forecast = model.make_future_dataframe(periods = 365 )
days_forecast
ds
0 2020-01-01
1 2020-01-02
2 2020-01-03
3 2020-01-04
4 2020-01-05
... ...
1091 2022-12-27
1092 2022-12-28
1093 2022-12-29
1094 2022-12-30
1095 2022-12-31
[1096 rows x 1 columns]
y_prediction = model.predict(days_forecast)
model.plot(y_prediction)
plt.title('Prophet Revenue Forecast')
plt.xlabel('Date')
plt.ylabel('Revenue')
plt.show()
forecast_df = y_prediction[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
result = pd.merge(df, forecast_df, on='ds', how='inner')
result
ds y yhat yhat_lower yhat_upper
0 2020-01-01 0.000000 0.886250 -0.239651 1.979115
1 2020-01-02 0.000793 0.757337 -0.348770 1.907965
2 2020-01-03 0.825542 0.752101 -0.304042 1.959633
3 2020-01-04 0.320332 0.718706 -0.373206 2.013013
4 2020-01-05 1.082554 0.729916 -0.444631 1.859251
.. ... ... ... ... ...
726 2021-12-27 16.931559 17.099233 15.963015 18.280605
727 2021-12-28 17.490666 17.049021 15.825265 18.214090
728 2021-12-29 16.803638 17.094324 15.908044 18.254234
729 2021-12-30 16.194813 16.957491 15.836851 18.198155
730 2021-12-31 16.620798 16.941487 15.792119 18.142373
[731 rows x 5 columns]
n = int(len(df)*0.8)
n
584
train = df.iloc[:n, :]
test = df.iloc[n:, :]
from prophet import Prophet
# initialize a fresh Prophet model on the training split
model = Prophet()
model.fit(train)
10:12:34 - cmdstanpy - INFO - Chain [1] start processing
10:12:34 - cmdstanpy - INFO - Chain [1] done processing
<prophet.forecaster.Prophet at 0x1d00cadf890>
y_pred = model.predict(test)
model.plot(y_pred)
plt.title('Prophet Forecast on the Test Period')
plt.xlabel('Date')
plt.ylabel('Revenue')
plt.show()
from sklearn.metrics import (r2_score, mean_absolute_error,
                             mean_squared_error, mean_absolute_percentage_error)
import numpy as np
# Actual vs Fitted
y_true = test['y']
y_pred = y_pred['yhat']
# Metrics
r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mape = mean_absolute_percentage_error(y_true, y_pred)
# Print all at once
print(f"R² Score: {r2:.4f}")
print(f"MAE: {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAPE: {mape:.4f}")
R² Score: 0.1453
MAE: 1.6607
RMSE: 1.9963
MAPE: 0.1296
LSTM Model
LSTM Model for Time Series Forecasting
Introduction
Long Short-Term Memory (LSTM) networks are a type of Recurrent Neural Network (RNN) that
are particularly effective for sequential data like time series. LSTMs can capture long-term
dependencies and patterns in data, making them suitable for forecasting daily revenue.
Steps:
1. Normalize the revenue data to improve model performance.
2. Prepare the data in sequences (sliding windows) for LSTM input.
3. Build the LSTM model using Keras.
4. Train the model on the training set.
5. Forecast revenue and evaluate performance.
# Assuming df has columns: 'ds' (date) and 'y' (revenue)
df['ds'] = pd.to_datetime(df['ds'])
df.set_index('ds', inplace=True)
data = df['y'].values
# Normalize data
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0,1))
data_scaled = scaler.fit_transform(data.reshape(-1,1))
• We normalize data for LSTM so that all input values are on a similar scale, which helps
the model learn faster and prevents issues like exploding or vanishing gradients.
This improves training stability and leads to more accurate forecasts.
def create_sequences(data, time_steps=30):
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:(i + time_steps), 0])
        y.append(data[i + time_steps, 0])
    return np.array(X), np.array(y)
time_steps = 30 # past 30 days used for prediction
X, y = create_sequences(data_scaled, time_steps)
# Reshape for LSTM [samples, time steps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
model = Sequential()
model.add(LSTM(64, return_sequences=True,
input_shape=(X_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(LSTM(128, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
history = model.fit(
X_train, y_train,
validation_data=(X_test, y_test),
epochs=100,
batch_size=32,
verbose=1
)
Epoch 1/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 1s 76ms/step - loss: 0.0025 - val_loss: 0.0018
Epoch 2/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 2s 71ms/step - loss: 0.0022 - val_loss: 0.0019
Epoch 3/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 2s 74ms/step - loss: 0.0024 - val_loss: 0.0018
...
Epoch 98/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 2s 83ms/step - loss: 0.0013 - val_loss: 0.0010
Epoch 99/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 1s 75ms/step - loss: 0.0014 - val_loss: 9.7181e-04
Epoch 100/100
18/18 ━━━━━━━━━━━━━━━━━━━━ 1s 81ms/step - loss: 0.0014 - val_loss: 0.0012
y_pred = model.predict(X_test)
# Inverse scale
y_pred_inv = scaler.inverse_transform(y_pred)
y_test_inv = scaler.inverse_transform(y_test.reshape(-1,1))
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step
import matplotlib.pyplot as plt
plt.figure(figsize=(12,5))
plt.plot(y_test_inv, label="Actual")
plt.plot(y_pred_inv, label="Predicted")
plt.title("LSTM Forecast vs Actual")
plt.legend()
plt.show()
future_steps = 30                            # forecast horizon
last_sequence = data_scaled[-time_steps:]    # last known 30 days
forecast = []
current_seq = last_sequence.reshape(1, time_steps, 1)

for _ in range(future_steps):
    pred = model.predict(current_seq, verbose=0)[0][0]
    forecast.append(pred)
    # slide the window: drop the oldest value, append the new prediction
    pred_reshaped = np.array(pred).reshape(1, 1, 1)
    current_seq = np.append(current_seq[:, 1:, :], pred_reshaped, axis=1)

# Inverse scale back to the original revenue units
forecast_inv = scaler.inverse_transform(np.array(forecast).reshape(-1, 1))
Explanation of Forecasting Steps
1. Set forecast horizon: We want to predict the next 30 days (future_steps = 30).
2. Start from last known sequence: Take the most recent 30 days of normalized data as
input (last_sequence).
3. Iteratively predict next day:
– Use the LSTM model to predict the next value.
– Append this predicted value to the current sequence to predict the following day.
– Repeat this process for 30 steps.
4. Inverse transform: Convert the normalized predictions back to actual revenue using the
scaler.
This way, we generate future revenue predictions step by step, using previous predictions as
input for the next day.
plt.figure(figsize=(12,6))
plt.plot(data, label="Historical Revenue")
plt.plot(range(len(data), len(data)+future_steps), forecast_inv,
label="Forecast", color="red")
plt.title("LSTM Forecasting into the Future")
plt.xlabel("Time")
plt.ylabel("Revenue")
plt.legend()
plt.show()
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# Calculate metrics
mse = mean_squared_error(y_test_inv, y_pred_inv)
mae = mean_absolute_error(y_test_inv, y_pred_inv)
r2 = r2_score(y_test_inv, y_pred_inv)
# Print results
print("MSE:", mse)
print("MAE:", mae)
print("R² Score:", r2)
MSE: 0.3954797498014632
MAE: 0.4984128810276936
R² Score: 0.9184559761317201
Business Recommendations
Based on the time series analysis of the "teleco_time_series.csv" dataset, which shows an
upward trend and seasonal patterns: