Review
Reinforcement Learning Techniques in Optimizing Energy Systems
Stefan Stavrev 1, * and Dimitar Ginchev 2
1 Department of Software Technologies, Faculty of Mathematics and Informatics, Plovdiv University “Paisii
Hilendarski”, 4000 Plovdiv, Bulgaria
2 Department of Air Transport, Faculty of Transport, Technical University of Sofia, 1000 Sofia, Bulgaria;
[email protected]
* Correspondence: [email protected]; Tel.: +359-889-716-824
Abstract: Reinforcement learning (RL) techniques have emerged as powerful tools for optimizing
energy systems, offering the potential to enhance efficiency, reliability, and sustainability. This
review paper provides a comprehensive examination of the applications of RL in the field of energy
system optimization, spanning various domains such as energy management, grid control, and
renewable energy integration. Beginning with an overview of RL fundamentals, the paper explores
recent advancements in RL algorithms and their adaptation to address the unique challenges of
energy system optimization. Case studies and real-world applications demonstrate the efficacy of
RL-based approaches in improving energy efficiency, reducing costs, and mitigating environmental
impacts. Furthermore, the paper discusses future directions and challenges, including scalability,
interpretability, and integration with domain knowledge. By synthesizing the latest research findings
and identifying key areas for further investigation, this paper aims to inform and inspire future
research endeavors in the intersection of reinforcement learning and energy system optimization.
Citation: Stavrev, S.; Ginchev, D. Reinforcement Learning Techniques in Optimizing Energy Systems. Electronics 2024, 13, 1459. https://doi.org/10.3390/electronics13081459
Academic Editor: Cheng He
Received: 14 March 2024; Revised: 7 April 2024; Accepted: 10 April 2024; Published: 12 April 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
The pursuit of energy efficiency embodies the strategic utilization of technology to minimize energy consumption while maintaining or enhancing the performance of systems across various domains, including industrial operations, power grids, civilian infrastructure, military applications, and Internet of Things (IoT) ecosystems. Achieving higher energy efficiency is paramount in the global endeavor towards sustainability, offering a multi-faceted spectrum of benefits such as significant cost reductions, diminished environmental footprints, and bolstered energy security. The automation of system operations and performance through cutting-edge artificial intelligence (AI) methodologies stands at the forefront of this quest, heralding a new era of efficiency and intelligence in energy management.
Reinforcement learning, a sophisticated branch of machine learning inspired by behavioral psychology, presents a paradigm where intelligent agents learn to make decisions autonomously to maximize cumulative rewards in novel and evolving environments. This methodology, particularly in its advanced form of deep reinforcement learning (DRL), which combines RL with the computational power of neural networks, has demonstrated remarkable proficiency in navigating complex challenges. Its applications span from mastering strategic games like chess and Go to advancing the fields of robotics and autonomous vehicular navigation, highlighting its versatility and potential.
In the realm of energy efficiency, the adaptive nature and online learning capabilities of DRL are drawing increasing interest for their ability to dynamically respond to the evolving demands of energy systems. Through varied neural network architectures, from the foundational multi-layer perceptrons to sophisticated recurrent networks like Long Short-Term Memory (LSTM) or Gated Recurrent Units (GRUs), DRL offers a rich toolkit for modeling and optimizing energy systems under diverse and changing conditions.
As the global demand for energy surges amidst the urgency to mitigate environmental
impacts and facilitate sustainable development, the complexity and unpredictability of
modern energy systems have escalated. These systems, marked by variable demand,
intermittent renewable sources, and the challenges of grid integration, call for innovative
optimization strategies capable of navigating their inherent dynamics and uncertainties.
Reinforcement learning, with its ability to learn from direct interaction with the en-
vironment without relying on predefined models or assumptions, emerges as a potent
solution to these challenges. Unlike conventional optimization techniques, RL’s experi-
ential learning approach equips it to adeptly manage the nonlinear dynamics and the
unpredictability of contemporary energy systems.
This review critically assesses the efficiency of reinforcement learning in the optimiza-
tion of energy systems, delving into RL’s foundational principles, its relevance to energy
optimization, and a comprehensive analysis of its application in this domain through
the scholarly literature. The paper examines representative examples and evaluates different reinforcement learning algorithms to highlight where RL performs well and where it struggles with the complex problems posed by energy systems. Moreover, it considers the societal ramifications of deploying RL-driven optimization solutions and outlines prospects for future studies. With this review, we aim to contribute to the discussion about the transition to cleaner energy sources and to survey innovations in the optimization of energy systems.
Linear programming, for instance, is a widely used method in energy systems for
optimizing a linear objective function, subject to linear equality and inequality constraints.
It is particularly useful for tasks such as cost minimization and load dispatching where
the relationships can be linearized. LP provides solutions that are globally optimal if the
problem is convex. A similar technique is Integer Programming (IP), which extends linear programming by restricting some or all of the variables to integer values. This is useful in energy system optimization for decisions that require discrete choices, such as the number of generators to run or units of equipment to activate. A more advanced approach is Mixed-Integer
Linear Programming (MILP)—it combines LP and IP to handle problems involving both
continuous and discrete variables. MILP is extensively used in the planning and operation
of power systems, including unit commitment and the scheduling of energy resources,
where decisions about which power plants to run and their operating levels need to be
made simultaneously. Quadratic Programming, on the other hand, is used when the objective function is quadratic, which is common in cost optimization problems involving power generation; it optimizes quadratic objectives subject to linear constraints and is applicable to the optimization of fuel consumption and emission levels. For problems that can be broken
down into simpler subproblems and then solved recursively, dynamic programming is
often used. For instance, it is used for solving multi-stage decision-making processes
like hydrothermal scheduling, where the output from various power sources needs to be
optimized over time. For problems with nonlinear objectives or constraints, Nonlinear Programming techniques are used. Finally, when there are uncertainties in the input data, such as future demand, fuel prices, or renewable output, stochastic optimization methods are employed; techniques like Stochastic Programming model these uncertainties as random variables to support more robust decisions under uncertainty.
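To ground the linear-programming case in something concrete, below is a minimal sketch of a two-generator economic dispatch solved with SciPy's linprog; the costs, capacity bounds, and demand figure are invented for illustration and are not drawn from any cited study.

```python
# Minimal economic-dispatch sketch using linear programming (illustrative only).
# Generator costs, capacities, and the demand figure are hypothetical.
from scipy.optimize import linprog

costs = [30.0, 50.0]               # $/MWh for generator 1 and 2 (assumed values)
capacities = [(0, 80), (0, 120)]   # MW output bounds per generator
demand = 150.0                     # MW load to be served

# Equality constraint: total generation must equal demand.
A_eq = [[1.0, 1.0]]
b_eq = [demand]

result = linprog(c=costs, A_eq=A_eq, b_eq=b_eq, bounds=capacities, method="highs")
print("Dispatch (MW):", result.x)       # e.g. [80., 70.]
print("Total cost ($/h):", result.fun)
```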
These conventional methods have provided the backbone for decision-making in
energy systems for decades, offering robust frameworks for optimizing the complex opera-
tions and planning tasks required in the energy sector.
In contrast to these static formulations, RL's ability to adapt continuously as conditions change is crucial for effectively integrating intermittent renewable energy sources such as wind and solar, which experience significant output fluctuations due to changing weather conditions.
Moreover, RL methods excel in making real-time decisions based on current state
observations, providing significant operational benefits for tasks such as demand response
and real-time grid balancing. This is a notable improvement over traditional optimization
methods, which often require model reruns or recalculations that are not feasible on a
minute-to-minute basis. Furthermore, RL can operate effectively with minimal information
about the system’s dynamics, an advantage in scenarios where complete data may not be
available or practical to collect.
Lastly, RL facilitates the simultaneous optimization of multiple objectives, enabling
the balancing of cost, reliability, and sustainability in energy management. This capacity for
multi-objective optimization aligns well with the complex trade-offs required in modern
energy systems, making RL an increasingly preferred approach in the field of energy system
optimization. A visual comparison is structured in Table 1.
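As a sketch of how such multi-objective trade-offs are typically encoded in an RL setting, the function below collapses cost, reliability (unserved energy), and sustainability (CO2) terms into a single scalar reward; the weights and example figures are arbitrary assumptions, not values from the surveyed literature.

```python
# Hedged sketch of a weighted multi-objective reward combining cost, reliability,
# and sustainability terms; the weights are arbitrary examples.
def multi_objective_reward(cost, unserved_energy, co2_emissions,
                           w_cost=1.0, w_rel=5.0, w_co2=0.5):
    # Lower cost, unserved energy, and emissions should all increase the reward,
    # so each term enters with a negative sign.
    return -(w_cost * cost + w_rel * unserved_energy + w_co2 * co2_emissions)

# Example: a dispatch decision costing 120 $, leaving 0.4 MWh unserved,
# and emitting 80 kg CO2 yields a single scalar training signal.
print(multi_objective_reward(cost=120.0, unserved_energy=0.4, co2_emissions=80.0))
```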
3. Reinforcement Learning
RL encompasses foundational concepts crucial to its operation. An agent, representing the decision-maker, is described as “an abstract entity (usually a program) that can make observations, take actions, and receive rewards for the actions taken, transitioning to new states based on the actions taken. The overarching objective for the agent lies in learning a policy that dictates optimal actions in various states, with the aim of maximizing cumulative rewards over time [1]. Given a history of such interactions, the agent must make the next choice of action to maximize the long-term sum of rewards. To do this well, an agent may take suboptimal actions which allow it to gather the information necessary to later take optimal or near-optimal actions with respect to maximizing the long-term sum of rewards” [2]. Software agents can act either by following hand-coded rules or by learning how to act through machine learning algorithms. Reinforcement learning constitutes one such subarea of machine learning.
Formally, most reinforcement learning problems can be described through a Markov
Decision Process (MDP). An MDP is delineated as the tuple {S, A, T, R}, where S denotes
a set of states, A stands for a set of actions, T represents the transitional probability, and
R denotes the reward function. Within MDP environments, a learning agent selects and executes an action a_t ∈ A in the current state s_t ∈ S at time t. Upon transitioning to the state s_{t+1} ∈ S at time t + 1, the agent receives a reward r_t. The primary objective of the agent is to maximize the discounted sum of rewards from time t to infinity, referred to as the return R_t, which is defined as follows:

R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \ldots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}    (4)
The value of a state s under a policy π, V^{\pi}(s), is the expected return when starting in s and following π thereafter:

V^{\pi}(s) = \mathbb{E}\left[ R_t \mid s_t = s \right]    (5)

where \mathbb{E}[\cdot] indicates the expected value. The optimal value of state s, V^{*}(s), is defined as the maximum value over all possible policies:

V^{*}(s) = \max_{\pi} \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\middle|\, s_0 = s, a_t = \pi(s_t) \right]    (6)

Related to the state-value function is the action-value function Q^{\pi}(s, a), which gives the expected return when taking action a in state s and following policy π thereafter:

Q^{\pi}(s, a) = \mathbb{E}\left[ R_t \mid s_t = s, a_t = a \right]    (7)
The optimal Q-value of a state–action pair (s, a) is the maximum Q-value over all possible policies:

Q^{*}(s, a) = \max_{\pi} \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\middle|\, s_0 = s, a_0 = a, a_{t>0} = \pi(s_t) \right]    (8)
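To connect these definitions to practice, the following is a minimal tabular Q-learning sketch that estimates Q*(s, a) through bootstrapped updates; the Gym-style env.reset()/env.step() interface, the discrete state indexing, and the hyperparameters are assumptions made for illustration, not code from the cited works.

```python
# Minimal tabular Q-learning sketch for a generic discrete MDP (illustrative).
# The environment interface env.reset()/env.step() follows a simplified
# Gym-style convention and is an assumption, not code from the reviewed papers.
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state = env.reset()          # assumed to return an integer state index
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            # temporal-difference update toward r + gamma * max_a' Q(s', a')
            td_target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (td_target - Q[state, action])
            state = next_state
    return Q
```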
The discussion that follows examines RL's mechanisms, advantages, and the challenges it faces, particularly in complex domains like energy system optimization.
Conventional optimization techniques typically depend on accurate system models and relatively static assumptions; as a result, they may struggle to adapt to dynamic changes in energy supply and demand, leading to suboptimal solutions.
In the face of these challenges, innovative approaches are needed to enhance energy
system optimization. Reinforcement learning emerges as a promising solution for address-
ing the complexities inherent in sustainable energy and electric systems. As a versatile class of optimal control methods, reinforcement learning can derive value estimates from experience, simulation, or search in highly dynamic, stochastic environments. Its
interactive nature fosters robust learning capabilities and the ability to adapt without the
need for explicit models of system dynamics. This property makes reinforcement learning
particularly well suited for addressing the complex nonlinearities and uncertainties present
in sustainable energy and electric systems.
RL models can predict peak-load periods and adjust demand accordingly, either by directly
controlling smart appliances or through pricing incentives to consumers [30]. The best DRL
model, as identified by Gallego et al. [31], achieves a complete listing of optimal actions for
the forthcoming hour 90% of the time. This level of precision underscores the potential of
DRL in enhancing the flexibility of smart grids, providing a robust mechanism for adjusting
grid operations in response to real-time conditions and forecasts. This predictive prowess
is pivotal for maintaining operational efficiency and optimizing the dispatch of energy
resources within the grid.
Furthermore, RL enables the dynamic allocation of energy resources to where they
are most needed, ensuring optimal load distribution across the grid. The management
of distributed energy resources, such as rooftop solar panels, wind turbines, and battery
storage systems, is another critical aspect of smart grid optimization. RL algorithms are
adept at optimizing the operation of DERs, enhancing grid resilience and facilitating the
integration of renewable energy sources. By dynamically adjusting the dispatch of DERs
based on current grid conditions and forecasted demand, RL contributes to a more flexible
and responsive grid system [32]. Gallego et al. [31] illustrate the application of deep
reinforcement learning techniques, specifically Deep Q-Networks (DQNs), to select optimal
actions for managing grid components.
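For readers unfamiliar with the mechanics of a DQN, the sketch below shows a small PyTorch Q-network and a single Bellman-target update on a dummy batch; the state dimension, the four discrete grid-control actions, and the random data are placeholders rather than the configuration used by Gallego et al. [31].

```python
# Sketch of a DQN-style Q-network and one Bellman-target step (PyTorch).
# The state/action dimensions for a grid-control task are hypothetical.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim=8, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)  # Q-values, one per discrete grid action

q_net, target_net = QNetwork(), QNetwork()
target_net.load_state_dict(q_net.state_dict())

# One Bellman-target step on a dummy batch (states, actions, rewards, next states).
states = torch.randn(32, 8)
actions = torch.randint(0, 4, (32, 1))
rewards = torch.randn(32, 1)
next_states = torch.randn(32, 8)
gamma = 0.99

with torch.no_grad():
    targets = rewards + gamma * target_net(next_states).max(dim=1, keepdim=True).values
loss = nn.functional.smooth_l1_loss(q_net(states).gather(1, actions), targets)
loss.backward()
```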
Maintaining optimal voltage and frequency levels is essential for grid stability and the
efficient operation of electrical devices. RL algorithms, through their ability to learn and
adapt from environmental feedback, can be effectively employed for real-time voltage and
frequency regulation. By continuously monitoring grid conditions and adjusting control
mechanisms, such as capacitor banks and voltage regulators, RL ensures that voltage and
frequency remain within desired ranges, even under fluctuating demand and generation
conditions [22].
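One way such voltage-regulation objectives are commonly expressed for an RL agent is through the reward function; the toy example below penalizes deviations of per-unit bus voltages from nominal, with a ±5% band and penalty weight chosen purely for illustration (not taken from [22]).

```python
# Illustrative reward shaping for voltage regulation: penalize deviation of bus
# voltages (per-unit) from nominal 1.0 p.u.; the band and weights are assumptions.
import numpy as np

def voltage_reward(bus_voltages_pu, nominal=1.0, limit=0.05, penalty=10.0):
    deviation = np.abs(np.asarray(bus_voltages_pu) - nominal)
    # quadratic penalty inside the band, large fixed penalty per violation
    reward = -np.sum(deviation ** 2)
    reward -= penalty * np.sum(deviation > limit)
    return float(reward)

print(voltage_reward([0.98, 1.02, 1.07]))  # third bus violates the ±5% band
```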
Despite the promising applications of RL in smart grid optimization, several challenges
remain, including the scalability of RL solutions, integration with existing grid infrastruc-
ture, and the protection of consumer privacy in data-driven applications. Addressing these
challenges requires ongoing research and development, as well as collaboration between
industry, academia, and regulatory bodies.
Future directions in smart grid optimization with RL include the exploration of more
advanced RL techniques, such as deep reinforcement learning and multi-agent systems,
to manage the increasing complexity of smart grids. In conclusion, RL offers a versatile
and powerful toolset for optimizing smart grid operations across a range of dimensions.
Through its adaptive and predictive capabilities, RL not only enhances grid efficiency
and reliability but also plays a crucial role in the transition toward more sustainable and
resilient energy systems.
RL-based controllers can also deliver an improved voltage profile and reduced system losses. Furthermore, RL techniques can be
applied to optimize distribution system operation, particularly in the context of integrating
distributed energy resources such as solar panels, wind turbines, and energy storage sys-
tems. RL-based controllers can optimize DER dispatch strategies, maximizing renewable
energy utilization and grid reliability. For example, a study by Kumar et al. [35] used
RL techniques to optimize the operation of a microgrid with renewable energy sources,
achieving improved grid stability and reduced energy costs. Moreover, RL algorithms can
be applied to address challenges related to grid congestion and load balancing. By opti-
mizing the scheduling of energy flows and grid assets, RL-based controllers can alleviate
congestion and improve grid efficiency. For example, a study by Jiang and Wu [36] applied RL techniques
to optimize energy scheduling in distribution networks, achieving reduced congestion and
improved grid reliability.
5.5. Summary
The summarized RL techniques and their applications are presented in Table 2.
Table 2. Summary of RL techniques, their applicable energy system scenarios, benefits, and challenges.

| RL Method | Applicable Energy System Scenarios | Benefits | Challenges |
|---|---|---|---|
| Deep Learning Techniques | All energy systems, including microgrids, transmission, and distribution systems. | Enhance predictive accuracy for demand and supply variations; optimize operational efficiency. | Require extensive training data; computationally intensive; may overfit without proper tuning. |
| Deep Q-Networks (DQNs) | Smart grids, particularly in demand response and battery management. | Offer robust decision-making under uncertainty; effective in policy optimization for load balancing. | Prone to overestimation of Q-values; require large and diverse datasets to train effectively. |
| Multi-Agent RL (MARL) | Distributed systems including microgrids and decentralized smart grids. | Facilitates cooperative control and decentralized decision-making; improves resilience and flexibility. | Coordination complexity increases with number of agents; risk of conflicting objectives. |
| Policy Gradient Methods | Voltage and frequency regulation in smart grids. | Directly optimize policy; capable of handling continuous action spaces. | Suffer from high variance in gradient estimates; slow convergence in environments with high-dimensional action spaces. |
6. Discussion
In recent years, the application of reinforcement learning techniques in optimizing
energy systems has witnessed several notable trends and phenomena, driven by advance-
ments in technology, shifts in energy policies, and emerging challenges in the energy sector.
One trend is the increasing integration of RL algorithms with advanced data analytics
techniques, such as machine learning and deep learning [3,7,24,37]. This integration
enables RL-based controllers to use large volumes of data to learn complex patterns and
relationships in energy systems, leading to improved decision-making and optimization
performance. Furthermore, the rise of edge computing and Internet of Things (IoT) devices
has enabled RL algorithms to be deployed directly at the device level, allowing for real-time
control and optimization of distributed energy resources and smart grid components.
Another trend is the growing emphasis on decentralized and distributed energy
systems, driven by the proliferation of renewable energy sources, advancements in energy
storage technologies, and evolving consumer preferences. RL techniques play a crucial
role in optimizing the operation of distributed energy resources, microgrids, and virtual
power plants, enabling greater grid flexibility, resilience, and sustainability. Furthermore,
the emergence of peer-to-peer energy trading platforms and community-based energy
initiatives presents new opportunities for RL-based optimization and coordination among
energy prosumers.
Effective RL applications depend significantly on the quality, granularity, and time-
liness of the data collected. Diverse data sources, from real-time sensor outputs in smart
grids to historical energy consumption records, provide the necessary inputs for training
and refining RL models. The integration of IoT devices [38] and smart meters has revo-
lutionized data collection, enabling more precise and continuous streams of information.
These technologies not only facilitate the accurate modeling of energy demand and supply
dynamics [39] but also support the training of RL algorithms that can predict and adapt to
complex energy patterns efficiently. Additionally, advanced data preprocessing techniques
such as normalization, anomaly detection, and feature engineering are essential to prepare
raw data for effective learning and performance optimization [40]. This comprehensive data
infrastructure supports the adaptive and predictive capabilities of RL models, ultimately
driving their success in optimizing energy systems.
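As a minimal illustration of this preprocessing step, the snippet below normalizes a short synthetic load series and flags anomalies with a simple z-score rule; the readings and the 2-sigma threshold are arbitrary examples.

```python
# Toy preprocessing sketch for a load time series: normalization plus a simple
# z-score anomaly flag. The synthetic readings and threshold are illustrative.
import numpy as np

load = np.array([3.1, 3.0, 2.9, 3.2, 9.5, 3.1, 3.0])  # kW readings (synthetic)
mean, std = load.mean(), load.std()

normalized = (load - mean) / std          # zero-mean, unit-variance features
anomalies = np.abs(normalized) > 2.0      # flag points more than 2 std away
print(normalized.round(2), anomalies)
```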
Moreover, the increasing complexity and interconnectedness of modern energy sys-
tems pose significant challenges for traditional optimization methods, which often struggle
to handle nonlinear dynamics, uncertainty, and stochastic environments. RL algorithms
offer a promising alternative by providing adaptive, model-free optimization approaches
that can learn and adapt to changing system conditions over time. Additionally, the appli-
cation of RL techniques in multi-agent systems and game-theoretic frameworks opens up
new avenues for addressing strategic interactions and market dynamics in energy systems.
One interesting phenomenon is the convergence of RL with other emerging technolo-
gies, such as blockchain and quantum computing, to address key challenges in energy
system optimization. Blockchain technology offers decentralized and transparent mecha-
nisms for peer-to-peer energy trading, while quantum computing promises exponential
gains in computational power for solving complex optimization problems. By integrating
RL with these technologies, researchers and practitioners can explore novel approaches for
optimizing energy systems at scale, leveraging the strengths of each technology to address
specific challenges and constraints.
Furthermore, the increasing focus on energy efficiency and sustainability has led
to the development of novel RL-based approaches for optimizing energy consumption
and reducing environmental impact. RL algorithms can learn adaptive control strategies
for energy-intensive processes, such as industrial manufacturing and HVAC systems, to
minimize energy waste and improve overall efficiency. Additionally, RL-based controllers
can optimize energy consumption in smart buildings and homes, by using real-time data
and user preferences to achieve significant energy savings without compromising comfort
or functionality.
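A toy reward for such a building-level controller might weigh energy use against occupant comfort, as sketched below; the set-point, comfort band, and weights are illustrative assumptions rather than values from any cited study.

```python
# Toy reward for an RL thermostat controller balancing energy use and comfort;
# the comfort band and weights are illustrative assumptions.
def hvac_reward(energy_kwh, indoor_temp_c, setpoint_c=22.0, comfort_band=1.0,
                w_energy=1.0, w_comfort=2.0):
    # only temperature excursions beyond the comfort band are penalized
    discomfort = max(0.0, abs(indoor_temp_c - setpoint_c) - comfort_band)
    return -(w_energy * energy_kwh + w_comfort * discomfort)

print(hvac_reward(energy_kwh=0.6, indoor_temp_c=24.5))
```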
7. Conclusions
The application of reinforcement learning in optimizing energy systems offers trans-
formative potential, addressing key challenges within the sector such as efficiency, reli-
ability, and sustainability. This research has demonstrated that RL’s capability to adapt
to dynamic environments and handle complex, multi-objective optimization tasks can
significantly enhance how energy systems operate. Particularly, RL’s proficiency in real-
time decision-making and its robustness against uncertainties prove advantageous over
traditional optimization methods. These attributes facilitate the integration of renewable
energy sources, optimize demand response, and improve grid management, thereby sup-
porting the transition to more sustainable energy systems. However, challenges such as
scalability, computational demand, and the need for interpretable models remain and
must be addressed through continued interdisciplinary research. Looking forward, the
implementation of RL could reshape energy management practices, making them more
adaptive, efficient, and aligned with global sustainability goals.
Author Contributions: Conceptualization, S.S.; literature search, S.S.; review and analysis, S.S.
and D.G.; citations and references, D.G.; discussion, D.G. All authors have read and agreed to the
published version of the manuscript.
Funding: This research was funded by the Research and Development Sector at the Technical
University of Sofia.
Data Availability Statement: No new data were created or analyzed in this study. Data sharing is
not applicable to this article.
Acknowledgments: This research is supported by the Aerospace Equipment and Technologies
Laboratory at the Technical University of Sofia and the Human–Computer Interactions and Simulation
Laboratory (HSL) at the University of Plovdiv “P. Hilendarski”.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
2. Langford, J. Efficient exploration in reinforcement learning. In Encyclopedia of Machine Learning; Sammut, C., Webb, G.I., Eds.;
Springer: Boston, MA, USA, 2011; pp. 1–5. [CrossRef]
3. Liu, Y.; Swaminathan, A.; Liu, Z. Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning. In Proceedings
of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019. [CrossRef]
4. Janner, M.; Fu, J.; Zhang, M.; Levine, S. When to Trust Your Model: Model-Based Policy Optimization. Adv. Neural Inf. Process.
Syst. 2019, 32, 12519–12530. [CrossRef]
5. Yang, T.; Zhao, L.; Li, W.; Zomaya, A.Y. Reinforcement learning in sustainable energy and electric systems: A survey. Annu. Rev.
Control 2020, 49, 145–163. [CrossRef]
6. Van Hasselt, H. Double Q-learning. Advances in Neural Information Processing Systems 23. In Proceedings of the 24th Annual
Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–9 December 2010; Volume 2010, pp. 2613–2621.
7. Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the Thirtieth AAAI
Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; p. 30. [CrossRef]
8. Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust region policy optimization. In Proceedings of the 32nd International
Conference on Machine Learning, Lille, France, 6–11 July 2015; Volume 37, pp. 1889–1897.
9. Van Seijen, H.; Mahmood, A.R.; Pilarski, P.M.; Machado, M.C.; Sutton, R.S. True online temporal-difference learning. J. Mach.
Learn. Res. 2016, 17, 1–40.
10. Iqbal, S.; Sarfraz, M.; Ayyub, M.; Tariq, M.; Chakrabortty, R.K. A comprehensive review on residential demand side management
strategies in smart grid environment. Sustainability 2021, 13, 7170. [CrossRef]
11. Ali, K.H.; Sigalo, M.; Das, S.; Anderlini, E.; Tahir, A.A.; Abusara, M. Reinforcement Learning for Energy-Storage Systems in
Grid-Connected Microgrids: An Investigation of Online vs. Offline Implementation. Energies 2021, 14, 5688. [CrossRef]
12. Paudel, A.; Hussain, S.A.; Sadiq, R.; Zareipour, H.; Hewage, K. Decentralized cooperative approach for electric vehicle charging.
J. Clean. Prod. 2022, 364, 132590. [CrossRef]
13. Puech, A.; Read, J. An improved yaw control algorithm for wind turbines via reinforcement learning. In Machine Learning and
Knowledge Discovery in Databases. ECML PKDD 2022; Amini, M.R., Canu, S., Fischer, A., Guns, T., Novak, P.K., Tsoumakas, G.,
Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023; Volume 13717. [CrossRef]
14. Mocanu, E.; Mocanu, D.C.; Nguyen, P.H.; Liotta, A.; Webber, M.E.; Gibescu, M.; Slootweg, J.G. On-line building energy
optimization using deep reinforcement learning. IEEE Trans. Smart Grid 2018, 9, 3254–3264. [CrossRef]
15. Zhang, Z.; Zhang, D.; Qiu, R.C. Deep reinforcement learning for power system applications: An overview. Front. Inf. Technol.
Electron. Eng. 2019, 20, 1358–1372. [CrossRef]
16. Glavic, M. (Deep) Reinforcement learning for electric power system control and related problems: A short review and perspectives.
Annu. Rev. Control 2019, 48, 22–35. [CrossRef]
17. Alabi, T.M.; Aghimien, E.I.; Agbajor, F.D.; Yang, Z.; Lu, L.; Adeoye, A.R.; Gopaluni, B. A review on the integrated optimization
techniques and machine learning approaches for modeling, prediction, and decision making on integrated energy systems. Renew.
Energy 2022, 194, 822–849. [CrossRef]
18. DeCarolis, J.; Daly, H.; Dodds, P.; Keppo, I.; Li, F.; McDowall, W.; Pye, S.; Strachan, N.; Trutnevyte, E.; Usher, W.; et al. Formalizing
best practice for energy system optimization modelling. Appl. Energy 2017, 194, 184–198. [CrossRef]
19. Palensky, P.; Dietrich, D. Demand Side Management: Demand Response, Intelligent Energy Systems, and Smart Loads. IEEE
Trans. Ind. Inform. 2011, 7, 381–388. [CrossRef]
20. Cicilio, P.; Glennon, D.; Mate, A.; Barnes, A.; Chalishazar, V.; Cotilla-Sanchez, E.; Vaagensmith, B.; Gentle, J.; Rieger, C.;
Wies, R.; et al. Resilience in an evolving electrical grid. Energies 2021, 14, 694. [CrossRef]
21. Rehman, A.U.; Wadud, Z.; Elavarasan, R.M.; Hafeez, G.; Khan, I.; Shafiq, Z.; Alhelou, H.H. An optimal power usage scheduling
in smart grid integrated with renewable energy sources for energy management. IEEE Access 2021, 9, 9448087. [CrossRef]
22. Vázquez-Canteli, J.R.; Nagy, Z. Reinforcement learning for demand response: A review of algorithms and modeling techniques.
Appl. Energy 2019, 235, 1072–1089. [CrossRef]
23. Ruelens, F.; Claessens, B.J.; Vandael, S.; De Schutter, B.; Babuska, R.; Belmans, R. Residential demand response of thermostatically
controlled loads using batch Reinforcement Learning. IEEE Trans. Smart Grid 2017, 8, 2149–2159. [CrossRef]
24. Zhang, C.; Wang, X.; Li, F.; He, Q.; Huang, M. Deep learning–based network application classification for SDN. Trans. Emerg.
Telecommun. Technol. 2018, 29, e3302. [CrossRef]
25. Mocanu, D.C.; Mocanu, E.; Stone, P.; Nguyen, P.H.; Gibescu, M.; Liotta, A. Scalable training of artificial neural networks with
adaptive sparse connectivity inspired by network science. Nat. Commun. 2018, 9, 2383. [CrossRef]
26. Zhang, Q.; Dehghanpour, K.; Wang, Z.; Qiu, F.; Zhao, D. Multi-Agent Safe Policy Learning for Power Management of Networked
Microgrids. IEEE Trans. Smart Grid 2021, 12, 1048–1062. [CrossRef]
27. Francois-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An introduction to deep reinforcement learning. Found.
Trends Mach. Learn. 2018, 11, 219–354. [CrossRef]
28. Zhang, Q.; Dehghanpour, K.; Wang, Z. A Learning-Based Power Management Method for Networked Microgrids under
Incomplete Information. IEEE Trans. Smart Grid 2020, 11, 1193–1204. [CrossRef]
29. Deng, R.; Yang, Z.; Chow, M.-Y.; Chen, J. A Survey on Demand Response in Smart Grids: Mathematical Models and Approaches.
IEEE Trans. Ind. Inform. 2019, 11, 570–582. [CrossRef]
30. Siano, P. Demand response and smart grids—A survey. Renew. Sustain. Energy Rev. 2014, 30, 461–478. [CrossRef]
31. Gallego, F.; Martín, C.; Díaz, M.; Garrido, D. Maintaining flexibility in smart grid consumption through deep learning and deep
reinforcement learning. Energy AI 2023, 13, 100241. [CrossRef]
32. Dall’Anese, E.; Simonetto, A. Optimal Power Flow Pursuit. IEEE Trans. Smart Grid 2018, 9, 942–959. [CrossRef]
33. Meng, X.; Zhang, P.; Xu, Y.; Xie, H. Construction of decision tree based on C4.5 algorithm for online voltage stability assessment.
Int. J. Electr. Power Energy Syst. 2020, 117, 105668. [CrossRef]
34. Wang, N.; Li, J.; Hu, W.; Zhang, B.; Huang, Q.; Chen, Z. Optimal reactive power dispatch of a full-scale converter based wind
farm considering loss minimization. Renew. Energy 2019, 136, 317–328. [CrossRef]
35. Kumar, S.; Krishnasamy, V.; Kaur, R.; Kandasamy, N.K. Virtual energy storage-based energy management algorithm for optimally
sized DC nanogrid. IEEE Syst. J. 2022, 16, 231–239. [CrossRef]
36. Jiang, X.; Wu, L. Residential power scheduling based on cost efficiency for demand response in smart grid. IEEE Access 2020, 8,
197379–197388. [CrossRef]
37. Christoff, N.; Bardarov, N.; Nikolova, D. Automatic Classification of Wood Species Using Deep Learning. In Proceedings of the
2022 International Conference on INnovations in Intelligent SysTems and Applications (INISTA), Biarritz, France, 8–12 August
2022; pp. 1–5. [CrossRef]
38. Al-Fuqaha, A.; Guizani, M.; Mohammadi, M.; Aledhari, M.; Ayyash, M. Internet of Things: A Survey on Enabling Technologies,
Protocols, and Applications. IEEE Commun. Surv. Tutor. 2015, 17, 2347–2376. [CrossRef]
39. Pourbehzadi, M.; Niknam, T.; Kavousi-Fard, A.; Yilmaz, Y. IoT in Smart Grid: Energy Management Opportunities and Security
Challenges. In Internet of Things. A Confluence of Many Disciplines, IFIPIoT 2019; Casaca, A., Katkoori, S., Ray, S., Strous, L., Eds.;
Springer: Cham, Switzerland, 2020; Volume 574, pp. 236–249. [CrossRef]
40. Alasali, F.; Haben, S.; Foudeh, H.; Holderbaum, W. A Comparative Study of Optimal Energy Management Strategies for Energy
Storage with Stochastic Loads. Energies 2020, 13, 2596. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.