
A Comparative Study of Optimization Algorithms for LSTM-Based Time Series Forecasting: Adadelta, Nadam, and Genetic Algorithms


Anum Joseph1, Dape Alexander Naanret2, Agabaidu Abraham Sunday3, Abdulbasit Adeyemi Abdulazeez4,
Ramotalahi Adenike Situ5, Godfrey Kigbu6, Christos Asulada Yaro7, Lemuel Isuwa8, Joseph Michael Tanko9, Oklo
Elvis Samuel10, Olaitan Tawakat Olopade11, Paschal O. Igata12, Olonitola Benjamin Damilola13
Department of Computer Engineering, Federal University of Technology Minna 920211, Niger State, Nigeria

Abstract
This research explores the application of three optimization approaches, Adadelta, Nadam, and a Genetic-Algorithm-tuned (GA-Tuned) configuration, to time series forecasting of stock price data obtained from Yahoo Finance. The purpose was to evaluate and compare these optimizers for training Long Short-Term Memory (LSTM) networks. After data preprocessing and feature engineering, models trained with each optimizer were assessed using the performance indicators RMSE, MAE, and R² score. The Genetic Algorithm proved the best performer, providing a lower RMSE (3.466) and MAE (2.71) and a higher R² score (0.98) than Nadam and Adadelta. Based on these findings, hyperparameter tuning with genetic algorithms is a valuable way to improve stock price forecasting performance.

Keywords: Stock Price Prediction, LSTM Model, Optimization Algorithms, Genetic Algorithm (GA-Tuned), Nadam Optimizer, Adadelta Optimizer, Hyperparameter Tuning, Machine Learning, Predictive Modeling
1. Introduction

In many sectors, such as finance, healthcare, energy, and meteorology, time series forecasting is essential because precise projections of future trends enable well-informed decision-making. Conventional statistical techniques such as the Autoregressive Integrated Moving Average (ARIMA) have been widely used for time series analysis, but they frequently fail when handling the complex patterns and non-linear dynamics found in real-world data. The development of machine learning has opened the door for more sophisticated models that overcome these constraints. Among these, Long Short-Term Memory (LSTM) networks have become a potent instrument because of their capacity to identify sequential patterns and long-term dependencies in data, which makes them ideal for time series forecasting tasks (Budiman et al., 2024). In contrast to conventional models, LSTMs can address problems such as the vanishing gradient problem, which frequently impairs conventional recurrent neural networks (RNNs).

An essential component of LSTM models' efficacy is optimization. Optimizers adjust the network's weights and biases to reduce prediction error and improve model accuracy. LSTMs have been paired with a variety of optimization algorithms, each with different advantages and difficulties.
Gradient-based optimizers such as Adadelta and Nadam, for example, have been acknowledged for their ability to dynamically adjust learning rates, which can result in increased accuracy and quicker convergence (Sha, 2024). By employing population-based techniques modelled after natural evolution, however, Genetic Algorithms (GAs) provide an alternative strategy that can explore a wider variety of solutions and possibly circumvent the local minima that gradient-based techniques can run into.

Comparative research on optimization techniques for LSTM-based models shows that model performance is greatly affected by the choice of optimizer. Nadam combines the advantages of Nesterov-accelerated gradients with adaptive learning, offering improved convergence speed and stability, whereas Adadelta prioritizes flexibility and lower memory utilization. Through the evolution of multiple candidate solutions over many generations, genetic algorithms, despite their computational cost, have proven robust in finding optimal configurations (Sonata & Heryadi, 2024). These optimizers are especially useful when datasets are prone to noise and fluctuation or when complex data patterns are not captured by conventional techniques.
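To make the distinction concrete, the following minimal sketch shows how the two gradient-based optimizers discussed above are instantiated in TensorFlow/Keras; the values shown are the library defaults, not necessarily the settings used in this study.

```python
import tensorflow as tf

# Adadelta: adapts per-parameter learning rates from a decaying window of past
# gradients, prioritizing flexibility and low memory overhead.
adadelta = tf.keras.optimizers.Adadelta(learning_rate=0.001, rho=0.95)

# Nadam: Adam's adaptive moment estimates combined with Nesterov-accelerated
# gradients, which typically improves convergence speed and stability.
nadam = tf.keras.optimizers.Nadam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
```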
The purpose of this research is to compare Adadelta, Nadam, and genetic algorithms for LSTM-based time series forecasting. The study aims to determine which optimization strategy offers the best balance between accuracy and computational economy by evaluating performance across a variety of datasets. The knowledge gathered from this investigation will help researchers and practitioners choose the most suitable optimization approach for LSTM applications, increasing the overall efficacy of time series forecasting models.

2. Related Works

Sujan Ghimire et al. (2019) used Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data (aerosol, cloud, and water vapor) and feature selection methods to build deep learning models that estimate long-term global solar radiation (GSR) in Australia's solar towns. The models performed better than conventional machine learning models; deep belief networks (DBN) in particular demonstrated higher accuracy in predicting GSR across all sites. A sensitivity investigation verified the major influence of atmospheric conditions on GSR predictions. The method works well and can be used to forecast solar energy in comparable areas. However, the scope of the study was restricted to the prediction horizon, validating deep learning only for monthly averaged and daily GSR.

Ali Al Bataineh et al. (2021) proposed a novel framework that uses the clonal selection algorithm (CSA) for automatic long short-term memory (LSTM) topology optimization in text classification. Because designing an LSTM topology is complex, with many hyperparameters, the approach efficiently encodes and evolves topologies, outperforming state-of-the-art methods. It reduces overfitting without regularization and achieves high accuracy across datasets; future work can expand the CSA evaluation and apply it to natural language processing (NLP) tasks such as summarization, translation, and speech recognition.
Lirong Yin et al. (2023) suggested an innovative approach that combines U-Net and LSTM to forecast lake boundaries. While the LSTM manages the time-series data and improves predictions by preserving and updating historical information, U-Net captures geographical features. Compared to more conventional models such as Markov chains or cellular automata, the incorporation of LSTM allows the model to assess spatiotemporal changes in lake boundaries more accurately. A variety of optimization strategies guard against overfitting, and the method shows notable benefits over the U-Net-STN model, providing higher accuracy for natural lake management and decision-making. However, the research is limited to remote sensing imagery as the sole basis for predicting changes in lake boundaries, and the results are difficult to interpret.

Johannes De Smedt et al. (2024) introduced Processes-As-Movies (PAM), a technique that uses feature creation to model processes as sequences for training high-dimensional recurrent neural networks. PAM predicts future process models and records dynamic process advancement by predicting constraint sets at various time intervals. It has proven accurate on real-world event logs and supports forecasting, monitoring, and predictive conformance checking. In the future, larger experiments, research into deeper network topologies such as PredRNN (a recurrent neural network), and analysis of various constraint types will be included to strike a balance between depth and performance. The window-based trace splitting might also be improved to support varying trace lengths, and the technique might be adapted for online learning by pre-training the network and then updating it with new input.

Harshal Dhake et al. (2023) proposed two algorithms based on well-established optimization techniques to tune LSTM network hyperparameters and enhance performance using Fast Fourier Transform (FFT)-based data decomposition. Consistent with findings in Shahid et al. (2021), the study shows that optimization approaches significantly improve performance when the right number of iterations and optimizer settings are used. Improving model performance and training requires data decomposition and hyperparameter tuning; in subsequent research, these algorithms can be extended to a range of datasets and models for even greater improvement. Efficient and user-friendly hyperparameter tuning options nevertheless remain scarce, even though earlier research in this field has shown that tuning hyperparameters improves model performance.
Taha Abd El Halim Nakabi (2020) introduced a novel approach to thermostatically controlled load (TCL) control that uses electricity price proxies, in which TCL power usage is influenced by a series of prices. An LSTM network describes TCL behavior from historical observations, framing the problem as a Markov decision process with a non-Markovian state. The LSTM aggregates individual TCL responses to price-based demand response (DR) in order to anticipate the next cluster state, which is then used in a genetic algorithm for profit maximization. Experiments on TCL units demonstrate that the LSTM accurately forecasts load profiles, resulting in increased energy arbitrage gains. The LSTM efficiently manages partial observability and unpredictability, while the flexibility of TCLs facilitates grid integration of renewable energy by offering ancillary services.

Accurate wind speed estimates are essential for reducing conventional energy use and emissions, which are major present-day concerns. R. K. Reja et al. (2023) explored methods for short-term wind speed prediction to improve the dispatch of renewable energy, recommending a two-stage decomposition procedure in conjunction with short-term forecasting. The technique uses data processing, optimization, and deep learning prediction algorithms to forecast wind speed every ten minutes, and the model is assessed with Root Mean Squared Error (RMSE), Mean Squared Error (MSE), and R² metrics to reliably estimate wind speed. Data from the Turkish Supervisory Control and Data Acquisition (SCADA) system is used for training and testing, along with external variables including wind direction, LV active power, and theoretical power curves. The results confirm that the model can accurately forecast future wind speed. Though the model is currently fitted for multivariate parameters, it may later be expanded to include other parameters whose data is available in a given dataset.

Muhammad Usman Tariq (2024) highlighted model performance, compared findings to prior research, acknowledged limitations, and suggested future directions. Predicting new coronavirus disease 2019 (COVID-19) cases with high-performing models such as RNNs and multilayer perceptrons (MLP) can improve vaccination plans, public health initiatives, and resource allocation. Performance is much improved by model optimization, which advances the expanding body of research on deep learning models for pandemic forecasting that may be used in future medical emergencies. Accurate COVID-19 forecasting is essential for well-informed resource management and policymaking: these models can assist authorities in the United Arab Emirates (UAE) in predicting the course of the epidemic, putting prompt actions into place, and allocating resources effectively. Future research could explore alternative feature selection and preprocessing techniques to improve model performance.
Albert Zeyer (2022) compared training methods, neural models, and sequence-to-sequence architectures. Training methods, particularly data augmentation, regularization, and the number of training epochs, showed the most consistent and significant improvements. Compared to training approaches, neural models have slightly less influence but still produce good results, and while end-to-end models typically outperform sequence-to-sequence designs, the comparison remains somewhat ambiguous. Irie (2020) reports similar results in language modeling, where the largest gains were achieved with the introduction of LSTMs and subsequently Transformers, together with extended training times and appropriate regularization. Alternative neural models and further training method improvements should be the main topics of future research; improvements to internal components, such as adding attention mechanisms or other neural advances, could help the extended transducer architecture, which is already free of independence assumptions.

Sicheng Li (2023) covered the application of a predictive model known as finite impulse response long short-term memory (FIR-LSTM) to forecast daily COVID-19 cases. As a preprocessor, the FIR layer removes noise from the input time series data and extracts important characteristics; dense layers produce the final result after several LSTM layers have captured dependencies from these characteristics. The window technique, commonly used for FIR filter design, was utilized to choose the FIR layer weights during training, while the Adam method was employed to optimize the LSTM and dense layer weights. The hyperparameters of both layers were adjusted using the improved self-adaptive global harmony search (SGHS) technique. The FIR-LSTM model performed well in simulation, especially when long-term data was used.

3. Methodology
visualized through loss curves and
i. Data Collection: The historical stock price data is sourced from Yahoo Finance, forming the basis for model training and testing. This data includes essential variables such as open, high, low, and closing prices. Having a long enough history of data helps capture market trends and fluctuations effectively (see the pipeline sketch after this list).
ii. Data Preprocessing: Raw data is cleaned to ensure completeness, and feature engineering is used to enhance predictive performance. For instance, moving averages and price lags are derived from the raw data. The data is then scaled, typically using normalization or standardization, which is crucial for neural networks like LSTM to handle inputs effectively.
iii. Model Training: The LSTM model is trained using three different optimizers: Adadelta, Nadam, and the Genetic Algorithm's optimized parameters. During training, each optimizer adjusts the model's weights to minimize the loss function, with the goal of improving predictive accuracy over 30 epochs.
iv. Model Evaluation: The models are evaluated on test data using metrics such as RMSE, MAE, and R² score. These metrics help determine the accuracy and robustness of the models, and a comparison of the three optimizers' performance reveals which method is best suited for stock price prediction.
v. Visualization of Results: Finally, the training process and predictions are visualized through loss curves and prediction plots. These visualizations show the performance of each optimizer and provide insight into how well the models captured the trends in stock prices.
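The sketch below, referenced in steps i-iv, is a minimal end-to-end illustration of the pipeline described above, assuming TensorFlow/Keras, scikit-learn, and the yfinance package. The ticker, date range, look-back window, and network size are illustrative assumptions rather than the paper's exact values.

```python
import numpy as np
import yfinance as yf
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# i. Data collection: daily closing-price history from Yahoo Finance
# (ticker and dates are illustrative; the paper reports Google's stock).
prices = yf.download("GOOG", start="2015-01-01",
                     end="2024-01-01")["Close"].values.reshape(-1, 1)

# ii. Preprocessing: scale to [0, 1] and build lagged windows as features.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(prices)

window = 60  # illustrative look-back length
X = np.array([scaled[i - window:i, 0] for i in range(window, len(scaled))])
y = scaled[window:, 0]
X = X.reshape(-1, window, 1)

split = int(0.8 * len(X))  # chronological train/test split
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

def build_model(units=50, optimizer="nadam"):
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(units, input_shape=(window, 1)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=optimizer, loss="mse")
    return model

# iii. Training: one model per gradient-based optimizer, 30 epochs as in
# the study (the GA-Tuned variant is sketched separately in the Results
# discussion below).
models, histories = {}, {}
for name, opt in [("Adadelta", tf.keras.optimizers.Adadelta()),
                  ("Nadam", tf.keras.optimizers.Nadam())]:
    models[name] = build_model(optimizer=opt)
    histories[name] = models[name].fit(X_train, y_train, epochs=30,
                                       batch_size=32, verbose=0)

# iv. Evaluation: RMSE, MAE, and R² on the held-out window, after
# inverting the scaling back to price units.
for name, model in models.items():
    pred = scaler.inverse_transform(model.predict(X_test))
    true = scaler.inverse_transform(y_test.reshape(-1, 1))
    rmse = np.sqrt(mean_squared_error(true, pred))
    print(f"{name}: RMSE={rmse:.3f}, "
          f"MAE={mean_absolute_error(true, pred):.3f}, "
          f"R2={r2_score(true, pred):.3f}")
```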

4. Results

From the graph in Figure 1, we can observe that the GA-Tuned model (red line) achieves the lowest and most stable loss after a few epochs compared to Adadelta and Nadam. Initially, all three models experience a sharp decrease in loss within the first few epochs, but the GA-Tuned model quickly converges to a lower loss value than the other two optimizers. This suggests that the genetic algorithm was effective in optimizing the hyperparameters of the model, leading to faster convergence and better performance. Adadelta, on the other hand, shows the highest initial loss and fluctuates more throughout training, indicating less stability. Nadam (blue) performs better than Adadelta, achieving lower loss earlier, but it still converges to a higher loss than the GA-Tuned model.

Figure 1 Loss per Epoch for each Optimizer
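A loss-per-epoch figure like Figure 1 can be produced directly from the Keras training histories; the following is a minimal sketch continuing the pipeline above (the `histories` dictionary is assumed from that sketch).

```python
import matplotlib.pyplot as plt

# Plot the training loss recorded by Keras for each optimizer.
for name, history in histories.items():
    plt.plot(history.history["loss"], label=name)
plt.xlabel("Epoch")
plt.ylabel("Training loss (MSE)")
plt.title("Loss per Epoch for each Optimizer")
plt.legend()
plt.show()
```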

In Figures 2, 3, and 4, we observe the test predictions for Google's stock price using the Adadelta, Genetic Algorithm (GA-Tuned), and Nadam optimizers over a shorter, more volatile period. The GA-Tuned model performs best, closely following the true stock price with minimal deviation, indicating a strong ability to capture rapid changes in stock price trends. In contrast, both Adadelta and Nadam show slight lags and underestimation during periods of sharp price increases. Adadelta in particular tends to lag behind the true price, especially during rapid fluctuations, suggesting it struggles to adapt quickly to volatility. Nadam, while performing better than Adadelta, still shows slight mismatches during high volatility, although it maintains relatively good alignment with the true price. Overall, the GA-Tuned optimizer provides the most accurate and responsive predictions, demonstrating the benefits of hyperparameter tuning through genetic algorithms in capturing complex patterns in stock price movements.

Figure 2 Adadelta Test predictions

Figure 3 Nadam Test predictions

Figure 4 Genetic Optimizer Test predictions
The Genetic Algorithm (GA-Tuned) model demonstrates the best overall performance, with the lowest RMSE (3.466) and MAE (2.71) and the highest R² score (0.98), as shown in Table 1. This indicates that the GA-Tuned model achieves the most accurate predictions, with minimal error, and explains 98% of the variance in the stock price data, confirming its ability to capture complex trends and price movements.
Table 1 Performance Metrics for each Optimization Technique

Optimizer    RMSE     MAE      R² Score
Adadelta     5.808    4.67     0.944
Nadam        3.584    2.771    0.979
Genetic      3.466    2.71     0.98
In comparison, Nadam also performs well, with an RMSE of 3.584, an MAE of 2.771, and an R² score of 0.979. While Nadam is close to the performance of the Genetic Algorithm, its slightly higher errors show that, although effective, it is not as finely tuned as the GA model. Finally, Adadelta shows the weakest performance, with significantly higher RMSE (5.808) and MAE (4.67) and a lower R² score (0.944). While it still explains 94.4% of the variance, the higher errors indicate that Adadelta struggles more to make accurate predictions, particularly during periods of volatility, as suggested by its tendency to lag behind in the test predictions.
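Since the paper does not publish its exact GA configuration, the following is a minimal sketch of how a genetic algorithm might tune LSTM hyperparameters. The population size, gene ranges, generation count, and mutation rate are illustrative assumptions, and `build_model`, `X_train`, `y_train`, `X_test`, and `y_test` are taken from the methodology sketch (in practice a separate validation split should be used for fitness, so the test set is not leaked into tuning).

```python
import random
import tensorflow as tf

def random_genes():
    # Genes: [LSTM units, learning rate]; ranges are illustrative assumptions.
    return [random.choice([32, 50, 64, 128]), 10 ** random.uniform(-4, -2)]

def fitness(genes):
    units, lr = genes
    model = build_model(units=units,
                        optimizer=tf.keras.optimizers.Nadam(learning_rate=lr))
    model.fit(X_train, y_train, epochs=30, batch_size=32, verbose=0)
    return -model.evaluate(X_test, y_test, verbose=0)  # lower MSE = fitter

population = [random_genes() for _ in range(8)]
for generation in range(5):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:4]                                  # selection
    children = []
    while len(parents) + len(children) < len(population):
        a, b = random.sample(parents, 2)
        child = [random.choice(pair) for pair in zip(a, b)]  # uniform crossover
        if random.random() < 0.2:                         # mutation: redraw a gene
            i = random.randrange(len(child))
            child[i] = random_genes()[i]
        children.append(child)
    population = parents + children

best_units, best_lr = max(population, key=fitness)
print(f"Best configuration: units={best_units}, learning_rate={best_lr:.5f}")
```

Note that the fitness evaluations dominate the cost, since every candidate requires a full training run; this is the computational expense the literature attributes to genetic approaches.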
5. Conclusion

This study explored the effectiveness of three optimization algorithms, Adadelta, Nadam, and the Genetic Algorithm (GA-Tuned), in predicting stock prices using time series forecasting models. The results demonstrated that the Genetic Algorithm outperformed the other optimizers, yielding the lowest error rates (RMSE: 3.466, MAE: 2.71) and the highest R² score (0.98), indicating superior accuracy and predictive capability. Nadam also performed well, though slightly behind the GA-Tuned model, while Adadelta showed the weakest performance, with higher error rates and slower adaptation to volatility.

In summary, the study highlights the value of hyperparameter tuning through genetic algorithms in improving stock price prediction models, particularly in capturing complex trends and handling market fluctuations.
References

Abd El Halim Nakabi, T. (2020). Computational intelligence for smart grid flexibility: Prediction, coordination, and optimal pricing.

Al Bataineh, A., & Kaur, D. (2021). Immunocomputing-Based Approach for Optimizing the Topologies of LSTM Networks.

Budiman, N., Alamsyah, R. Y., & Rakhman, A. (2024). Activation Function in LSTM for Improved Forecasting of Closing Natural Gas Stock Prices. JITK (Jurnal Ilmu Pengetahuan Dan Teknologi Komputer), 10(1), 100-107. https://doi.org/10.33480/jitk.v10i1.5258

Ghimire, S., Deo, R. C., Raj, N., & Mi, J. (2019). Deep Learning Neural Networks Trained with MODIS Satellite-Derived Predictors for Long-Term Global Solar Radiation Prediction.

Dhake, H., Kashyap, Y., & Kosmopoulos, P. (2023). Algorithms for Hyperparameter Tuning of LSTMs for Time Series Forecasting.

De Smedt, J., & De Weerdt, J. (2024). Predictive Process Model Monitoring Using Recurrent Neural Networks.

Li, S. (2023). Artificial Neural Network-based COVID-19 Diagnosis and Prediction. School of Science, Computing and Engineering Technologies, Swinburne University of Technology, Hawthorn, VIC, Australia.

Reja, R. K., Amin, R., Tasneem, Z., Abhi, S. H., Ali, M. F., & Badal, M. F. R. (2023). A New ANN Technique for Short-Term Wind Speed Forecasting Based on SCADA System Data in Turkey. Rajshahi University of Engineering & Technology, Rajshahi.

Sha, X. (2024). Time Series Stock Price Forecasting Based on Genetic Algorithm (GA)-Long Short-Term Memory Network (LSTM) Optimization. Advances in Economics, Management, and Political Sciences, 91(1), 142-149. https://doi.org/10.54254/2754-1169/91/20241031

Sonata, I., & Heryadi, Y. (2024). Comparison of LSTM and Transformer for Time Series Data Forecasting. In 2024 7th International Conference on Informatics and Computational Sciences (ICICoS) (pp. 491-495). IEEE. https://doi.org/10.1109/ICICoS.2024.912

Yin, L., Wang, L., Li, T., Lu, S., Tian, J., Yin, Z., Li, X., & Zheng, W. (2023). U-Net-LSTM: Time Series-Enhanced Lake Boundary Prediction Model.

Tariq, M. U., & Ismail, S. B. (2024). AI-powered COVID-19 forecasting: a comprehensive comparison of advanced deep learning methods.

Zeyer, A. (2022). Neural Network based Modeling and Architectures for Automatic Speech Recognition and Machine Translation.