Short-term Electrical Load Forecasting
Dr. Devender Singh, Ayush Kumar Goyal (15084005), Boragapu Sunil Kumar (15084006),
Srimukha Paturi (15084013) and Rishabh Agrahari (15084015)
Department of Electrical Engineering, IIT(BHU), Varanasi
Abstract— Electricity demand forecasting is a central and integral process for planning periodical operations and facility expansion in the electricity sector. The demand pattern is highly complex due to the deregulation of energy markets. Therefore, finding an appropriate forecasting model for a specific electricity network is not an easy task. Although many forecasting methods have been developed, none can be generalized for all demand patterns. This article explains the complexity of the available solutions, their strengths and weaknesses, and the opportunities and threats that the forecasting tools offer or that may be encountered.

I. INTRODUCTION

Electricity as a product has very different characteristics compared to a material product. For instance, electrical energy cannot be stored; it must be generated as soon as it is demanded. Any commercial electric power company has several strategic objectives, one of which is to provide end users (market demands) with safe and stable electricity. Therefore, Electric Load Forecasting is a vital process in the planning of the electricity industry and the operation of electric power systems. Accurate forecasts lead to substantial savings in operating and maintenance costs, increased reliability of the power supply and delivery system, and correct decisions for future development. Electricity demand is assessed by accumulating the consumption periodically; it is usually considered over hourly, daily, weekly, monthly, and yearly periods.

Electric Load Forecasting is classified in terms of the planning horizon's duration: up to one day/week ahead for short-term, one day/week to one year ahead for medium-term, and more than one year ahead for long-term. Short-term forecasts are used to schedule the generation and transmission of electricity. Medium-term forecasts are used to schedule fuel purchases. Long-term forecasts are used to develop the power supply and delivery system (generation units, transmission system, and distribution system).

The electricity demand pattern is necessarily affected by several factors, including time, social, economic, and environmental factors, which give the pattern various complex variations. Social (such as behavioral) and environmental factors are big sources of randomness (noise) in the load pattern. Diversity and complexity in the demand pattern have led to the development of sophisticated Electric Load Forecasting methods. The literature is rich in Electric Load Forecasting methods that attempt to find the best estimate of the future load. The major methods include time series techniques such as exponential smoothing, ARMA, and Box-Jenkins ARIMA; neural networks; fuzzy logic; and support vector machines. Recently, sequential models such as RNNs, LSTMs, and GRUs have come into the picture for forecasting electric load.

ARIMA models and their variants have achieved considerable success in Electric Load Forecasting. In general, ARIMA models can be used when the time series is stationary and has no missing data. They can be further hybridized with artificial intelligence techniques. However, the complexity of the demand pattern depends on its base period; it changes from a fairly smooth curve (annual basis) to a noisy, cyclic, and complex curve (hourly basis), since the effect of environmental factors increases.

In the present era, electric power consumption is growing fast and ever more randomly because of the increasing effect of environmental factors and human behavior. Therefore, the electricity demand pattern is becoming more complex and harder to recognize. For instance, people all over the world are using an increasing number and variety of electric appliances, many of them environmentally related, which increases the cyclic variation and noise in the demand pattern. Though there are many forecasting methods, no single one generalizes well enough for all cases, especially when many factors are considered. Thus, obtaining a proper forecast is not just a matter of adopting a well-known method: an ideal method for one case may perform poorly for another. Therefore, research must be directed toward specially tailored methods; in other words, each electric power plant in any country needs to follow its own forecasting method. For that purpose, general methods can also be adopted, but with efficient and effective modifications that suit the case; otherwise the results will be misleading.

The aim of this paper is to demonstrate a pragmatic forecasting methodology for analyzing the electric load pattern and predicting the future load demand for short, medium, and/or long terms. This methodology can integrate different forecasting models. The rest of the paper is organized as follows. Section 2 describes the dataset used for the proposed methodologies, which are presented in Section 3. Section 4 applies the methodology to a typical power load pattern. Concluding remarks are contained in Section 5. Note: all the source code developed during the course of the project is open source and is available on GitHub¹.

¹ https://github.com/pyaf/load_forecasting

II. DATASET

We are using load data from the State Load Dispatch Center (SLDC), Delhi.
Load data is available at a time step of 5 minutes, i.e., 288 values in total for each day. The general trend of the load data can be seen in the day-wise plot and the month-wise plot in Fig. 1 and Fig. 2, respectively.

Fig. 1: Day wise plot

Fig. 2: Plot for a month

The data is extracted from the SLDC Delhi website² by an automated script. The website updates the data every 5 minutes; thus we have data at time steps of 5 minutes. We acquired the data of the past one year till date from the SLDC Delhi website, which makes the information public. However, there are certain days on which the SLDC website was not functional due to maintenance or other factors, which implies the data collected might have some missing values.

² https://www.delhisldc.org/
III. METHODS

In this paper, several methods, from moving averages to non-traditional algorithms such as RNNs, have been implemented on the dataset described in the previous section. The working of all these methods is discussed thoroughly below.

A. Simple Moving Average (SMA)

When demand for a product is neither growing nor declining rapidly, and if it does not have seasonal characteristics, a moving average can be useful in removing the random fluctuations for forecasting. Although moving averages are frequently centered, it is more convenient to use past data to predict the following period directly.

Although it is important to select the best period for the moving average, there are several conflicting effects of different period lengths. The longer the moving-average period, the more the random elements are smoothed (which may be desirable in many cases). But if there is a trend in the data, either increasing or decreasing, the moving average has the adverse characteristic of lagging the trend. A shorter time span produces more oscillation but follows the trend more closely; conversely, a longer time span gives a smoother response but lags the trend. The formula for a simple moving average is

$$F_t = \frac{1}{n}\sum_{i=1}^{n} A_{t-i}$$

where F_t is the forecast for the coming period, n is the number of periods to be averaged, and A_{t-1}, A_{t-2}, A_{t-3}, and so on are the actual occurrences one period ago, two periods ago, three periods ago, and so on, respectively.
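As a minimal illustration, the SMA forecast can be computed as follows. This is a sketch of our own, not the project code; the window length n and the sample load values are arbitrary placeholders.

```python
import numpy as np

def sma_forecast(history, n):
    """Forecast the next period as the mean of the last n actual values."""
    return float(np.mean(history[-n:]))

# Example: forecast the next 5-minute load value from the last six readings.
load = [3210.0, 3225.0, 3240.0, 3238.0, 3251.0, 3260.0,
        3272.0, 3281.0, 3279.0, 3290.0, 3302.0, 3311.0]
print(sma_forecast(load, n=6))
```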
B. Weighted Moving Average (WMA)

Whereas the simple moving average gives equal weight to each component of the moving-average database, a weighted moving average allows any weights to be placed on each element, provided, of course, that the sum of all weights equals 1. The formula for the weighted moving average is

$$F_t = \sum_{i=1}^{n} w_i A_{t-i}, \qquad \sum_{i=1}^{n} w_i = 1$$

where F_t is the forecast for the coming period, n is the total number of periods in the forecast, w_i is the weight given to the actual occurrence for period t-i, and A_{t-i} is the actual occurrence for period t-i.

Although many periods may be ignored (that is, their weights are zero) and the weighting scheme may be in any order (for example, more distant data may have greater weights than more recent data), the sum of all the weights must equal 1. Experience and trial and error are the simplest ways to choose weights. As a general rule, the most recent past is the most important indicator of what to expect in the future, and therefore it should get the highest weighting.
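A corresponding sketch for the weighted forecast follows; the weights here are chosen purely for illustration, the only requirement being that they sum to 1.

```python
import numpy as np

def wma_forecast(history, weights):
    """Weighted moving average: weights[0] applies to the most recent
    observation A_{t-1}, weights[1] to A_{t-2}, and so on."""
    weights = np.asarray(weights)
    assert np.isclose(weights.sum(), 1.0), "weights must sum to 1"
    recent = np.asarray(history)[::-1][:len(weights)]  # most recent first
    return float(np.dot(weights, recent))

# Heavier weight on the most recent periods (illustrative values).
print(wma_forecast([3210.0, 3240.0, 3260.0, 3290.0],
                   weights=[0.4, 0.3, 0.2, 0.1]))
```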
C. Simple Exponential Smoothing (SES)

In the previous methods of forecasting (simple and weighted moving average), the major drawback is the need to continually carry a large amount of historical data. (This is also true for regression analysis techniques.) As each new piece of data is added in these methods, the oldest observation is dropped, and the new forecast is calculated. In many applications (perhaps in most), the most recent occurrences are more indicative of the future than those in the more distant past. If this premise is valid, that is, if the importance of data diminishes as the past becomes more distant, then exponential smoothing may be the most logical and easiest method to use:

$$F_t = \alpha A_{t-1} + (1-\alpha) F_{t-1}$$

where F_t is the exponentially smoothed forecast for period t, F_{t-1} is the exponentially smoothed forecast made for the prior period, A_{t-1} is the actual demand in the prior period, and α is the desired response rate, or smoothing constant.
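A sketch of the smoothing recurrence is given below. Seeding the first forecast with the first actual value is a common convention assumed here, and the value of α is illustrative.

```python
def ses_forecast(actuals, alpha):
    """Simple exponential smoothing: F_t = alpha*A_{t-1} + (1-alpha)*F_{t-1}."""
    forecast = actuals[0]  # seed the initial forecast with the first actual
    for actual in actuals:
        forecast = alpha * actual + (1 - alpha) * forecast
    return forecast

# A larger alpha reacts faster to recent demand; 0.3 is illustrative.
print(ses_forecast([3210.0, 3240.0, 3260.0, 3290.0], alpha=0.3))
```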
D. AutoRegressive Integrated Moving Average (ARIMA)

ARIMA models provide another approach to time series forecasting. Exponential smoothing and ARIMA models are the two most widely used approaches to time series forecasting, and they provide complementary approaches to the problem. While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.

An ARIMA model can be understood by outlining each of its components as follows:

• Autoregression (AR) refers to a model in which a changing variable regresses on its own lagged, or prior, values.
• Integrated (I) represents the differencing of raw observations to allow the time series to become stationary, i.e., data values are replaced by the differences between the data values and the previous values.
• Moving average (MA) incorporates the dependency between an observation and a residual error from a moving average model applied to lagged observations.

Each component functions as a parameter with a standard notation. For ARIMA models, a standard notation is ARIMA(p, d, q), where integer values substitute for the parameters to indicate the type of ARIMA model used. The parameters can be defined as:

• p: the number of lag observations in the model; also known as the lag order.
• d: the number of times that the raw observations are differenced; also known as the degree of differencing.
• q: the size of the moving average window; also known as the order of the moving average.

Data Preprocessing: We used the last one month's data at a frequency of 30 minutes instead of 5 minutes due to the computational overhead. The best hyper-parameters were found using the grid search method and used for model training. As the data is seasonal, with trends of varying nature depending on the season of the year, the optimized values of p, d, and q were obtained accordingly.

Model Architecture: Once the optimized hyper-parameters were available after the grid search finished, a standard ARIMA model was used for training and prediction using Python's statsmodels library.
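A sketch of this grid-search-plus-fit workflow with statsmodels is shown below. The placeholder series, the (p, d, q) search ranges, and the use of AIC as the selection criterion are our assumptions; the project's exact settings may differ.

```python
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Placeholder series: one month of load at 30-minute resolution (48 points/day).
# In the project this would be the SLDC load series described above.
series = 3000 + np.cumsum(np.random.randn(30 * 48))

# Grid search over (p, d, q); ranges and the AIC criterion are illustrative.
best_aic, best_order = float("inf"), None
for order in itertools.product(range(4), range(2), range(4)):
    try:
        result = ARIMA(series, order=order).fit()
    except Exception:
        continue  # some orders fail to converge; skip them
    if result.aic < best_aic:
        best_aic, best_order = result.aic, order

# Refit with the best order and forecast the next day (48 half-hour steps).
model = ARIMA(series, order=best_order).fit()
forecast = model.forecast(steps=48)
```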
E. Recurrent Neural Networks (RNN)

Recurrent Neural Networks are the state-of-the-art algorithms for sequential data and are used, among others, by Apple's Siri and Google's Voice Search. This is because the RNN is the first algorithm that remembers its input, thanks to an internal memory, which makes it well suited for machine learning problems that involve sequential data. It is one of the algorithms behind the scenes of the remarkable achievements of deep learning in the past few years.

In an RNN, the information cycles through a loop. When the network makes a decision, it takes into consideration the current input and also what it has learned from the inputs it received previously.

Fig. 3: RNN architecture

Data Preprocessing: For the RNN, LSTM, and GRU models, the data was made stationary by detrending using one-lag differencing and was re-scaled to the [-1, 1] range. This helps in faster convergence and prevents features with relatively large magnitudes, such as the previous year's load demand, from carrying a larger weight during training. The model was trained on the last 60 days of data, with each training vector containing the load at a specific time of day for the last 10 days, and the label being the load of the 11th day at the same corresponding time.

Model Architecture: The model consisted of two layers: one with two simple RNN cells with the hyperbolic tangent function (tanh) as activation, followed by a dense layer without any activation. The loss metric was mean squared error, optimized with the Adam optimizer at a learning rate of 0.001. The model was trained for 15 epochs. The code was written in Python using the Keras library.
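A sketch of the preprocessing and architecture just described, using the Keras API, follows. The placeholder series, the variable names, and the recent-Keras optimizer argument are our assumptions, not the project's exact code.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense
from keras.optimizers import Adam

# Placeholder: the load at one fixed time of day over the last 60 days.
raw = 3000 + 1000 * np.random.rand(60)

# Preprocessing as described: one-lag differencing, then re-scaling to [-1, 1].
diff = np.diff(raw)
scaled = 2 * (diff - diff.min()) / (diff.max() - diff.min()) - 1

# Supervised windows: the previous 10 days as input, the 11th day as the label.
X = np.array([scaled[i:i + 10] for i in range(len(scaled) - 10)])
y = scaled[10:]
X = X.reshape((X.shape[0], 10, 1))  # (samples, timesteps, features)

# Two layers as described: a SimpleRNN layer with 2 units and tanh activation,
# followed by a Dense output layer with no activation.
model = Sequential([
    SimpleRNN(2, activation="tanh", input_shape=(10, 1)),
    Dense(1),
])
model.compile(loss="mean_squared_error", optimizer=Adam(learning_rate=0.001))
model.fit(X, y, epochs=15, verbose=0)
```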
F. Long Short-Term Memory (LSTM)

A usual RNN has a short-term memory. There are two major obstacles RNNs have had to deal with: exploding gradients and vanishing gradients. Long Short-Term Memory (LSTM) networks are an extension of recurrent neural networks that essentially extends their memory. They are therefore well suited to learn from important experiences that have very long time lags in between.

LSTMs enable RNNs to remember their inputs over a long period of time. This is because LSTMs hold their information in a memory much like the memory of a computer: the LSTM can read, write, and delete information from its memory. This memory can be seen as a gated cell, where "gated" means that the cell decides whether or not to store or delete information (i.e., whether it opens its gates or not), based on the importance it assigns to the information. The assigning of importance happens through weights, which are also learned by the algorithm. This simply means that the network learns over time which information is important and which is not.

An LSTM has three gates: the input, forget, and output gates. These gates determine whether or not to let new input in (input gate), delete the information because it is not important (forget gate), or let it impact the output at the current time step (output gate).

Data Preprocessing: The data preprocessing used for the LSTM is similar to that used for the RNN model.

Model Architecture: The model consisted of two layers: one with one LSTM cell with the hyperbolic tangent function (tanh) as activation, followed by a dense layer without any activation. The loss metric was mean squared error, optimized with the Adam optimizer at a learning rate of 0.001. The model was trained for 15 epochs. The code was written in Python using the Keras library.
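With the same preprocessing and training setup as in the RNN sketch above, only the model definition changes (again a sketch under the same assumptions):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import Adam

# Same windows as in the RNN sketch (10 timesteps, 1 feature); one LSTM cell.
model = Sequential([
    LSTM(1, activation="tanh", input_shape=(10, 1)),
    Dense(1),
])
model.compile(loss="mean_squared_error", optimizer=Adam(learning_rate=0.001))
# model.fit(X, y, epochs=15)  # with X, y prepared exactly as in the RNN sketch
```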
Fig. 4: An LSTM cell

G. Gated Recurrent Unit (GRU)

A slightly more dramatic variation on the LSTM is the Gated Recurrent Unit, or GRU. It combines the forget and input gates into a single update gate. It also merges the cell state and hidden state, and makes some other changes. The resulting model is simpler than standard LSTM models, and it has been growing increasingly popular.

Fig. 5: A GRU cell

Data Preprocessing: The data preprocessing used for the GRU is similar to that used for the RNN model.

Model Architecture: The model consisted of two layers: one with one GRU cell with the hyperbolic tangent function (tanh) as activation, followed by a dense layer without any activation. The loss metric was mean squared error, optimized with the Adam optimizer at a learning rate of 0.001. The model was trained for 15 epochs. The code was written in Python using the Keras library.
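Relative to the LSTM sketch, only the recurrent cell changes (same assumptions as above):

```python
from keras.models import Sequential
from keras.layers import GRU, Dense
from keras.optimizers import Adam

# Identical training setup; only the recurrent cell is swapped for a GRU.
model = Sequential([
    GRU(1, activation="tanh", input_shape=(10, 1)),
    Dense(1),
])
model.compile(loss="mean_squared_error", optimizer=Adam(learning_rate=0.001))
```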
IV. RESULTS

This section shows the results of the above-mentioned methodologies on the load data for the date 27/11/2018. All the plots below show the real load data obtained from SLDC Delhi together with the load curve forecasted by each of the above-mentioned methods. The real load curve clearly shows the typical day-wise variations.

Forecasted results by all 7 algorithms:

Fig. 6: SMA

Fig. 7: WMA

Fig. 8: SES

Fig. 9: ARIMA

Fig. 10: RNN

Fig. 11: LSTM

Fig. 12: GRU

For the sake of comparing these seven methods performance-wise, the error metrics RMSE³ and MAPE⁴ are taken into account. The table below shows the error metrics for the date 27/11/2018.

³ Root Mean Square Error
⁴ Mean Absolute Percentage Error
Serial No. Method RMSE MAPE
1 SMA 126.17 3.16
2 WMA 62.86 1.85
3 SES 100.01 2.73
4 ARIMA 178.66 5.82
5 RNN 96.86 2.8
6 LSTM 73.56 2.52
7 GRU 88.35 2.48
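For reference, a standard implementation of these two metrics is given below. This is our own sketch, not taken from the project code; the MAPE form assumes no zero actual values.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between two equal-length series."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)
```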
V. CONCLUSION

Seven algorithms were implemented in the project, namely SMA, WMA, SES, ARIMA, RNN, LSTM, and GRU. While the performance varies from day to day, because all the models are retrained every day with the previous day's data incorporated into the training set, WMA, LSTM, and GRU are among the best-performing models. The performance of the LSTM and GRU models can be further improved by incorporating weather parameters and other features, such as the day, weekend, and month, into the training dataset.
ACKNOWLEDGMENT
We would like to extend our sincere thanks to our mentor and guide, Prof. D. Singh, for his constant guidance and supervision, which helped us bring this project to completion.