Article
Deep Machine Learning Model-Based Cyber-Attacks Detection
in Smart Power Systems
Abdulaziz Almalaq 1, * , Saleh Albadran 1 and Mohamed A. Mohamed 2, *
1 Department of Electrical Engineering, Engineering College, University of Ha’il, Ha’il 55476, Saudi Arabia;
[email protected]
2 Electrical Engineering Department, Faculty of Engineering, Minia University, Minia 61519, Egypt
* Correspondence: [email protected] (A.A.); [email protected] (M.A.M.)
Abstract: Modern intelligent energy grids enable energy supply and consumption to be efficiently
managed while simultaneously avoiding a variety of security risks. System disturbances can be
caused by both naturally occurring and human-made events. Operators should be aware of the
different kinds and causes of disturbances in the energy systems to make informed decisions and
respond accordingly. This study addresses this problem by proposing an attack detection model
on the basis of deep learning for energy systems, which could be trained utilizing data and logs
gathered through phasor measurement units (PMUs). Property or specification making is used to
create features, and the data are sent to various machine learning methods, of which random forest
has been selected as the base classifier of AdaBoost. Open-source simulated energy system data
containing 37 energy system event case studies are used to test the model. In the end, the suggested
model has been compared with other layouts according to various assessment metrics. The simulation
outcomes showed that this model achieves a detection rate of 93.6% and an accuracy rate of 93.91%,
which is higher than those of the existing methods.
Keywords: cyber-attack detection; deep machine learning; smart power grid; data processing
MSC: 94-10

Citation: Almalaq, A.; Albadran, S.; Mohamed, M.A. Deep Machine Learning Model-Based Cyber-Attacks Detection in Smart Power Systems. Mathematics 2022, 10, 2574. https://doi.org/10.3390/math10152574

Academic Editors: Gurami Tsitsiashvili and Alexander Bochkov

Received: 6 June 2022; Accepted: 22 July 2022; Published: 25 July 2022

1. Introduction
1.1. Necessity of the Research

Cyber-physical systems (CPS) attempt to couple the physical and cyber-worlds, and they are extensively employed by industrial control systems (ICS) to provide users with all the data they need in real-time [1]. Power distribution systems and waste-water treatment plants are among the areas where CPS is being used. Nevertheless, CPS security problems differ from conventional cyber-security problems in that they include integrity, confidentiality, and availability. In addition to transmitting, distributing, monitoring, and controlling electricity, a smart grid (SG) would greatly enhance energy effectiveness and reliability. Such systems may fail and result in temporary damage to infrastructures [2]. Power grids are regarded as essential infrastructure nowadays by many societies, which have developed security measures and policies related to them [3]. Phasor measurement units (PMUs) are adopted in modern electrical systems to improve reliability as they become more complex in their structure and design. Utilizing the gathered information for quick decision making is one of the advantages. There is still the possibility that a hacker exploits vulnerabilities to cause branch overload tripping, which will lead to cascading failures and, therefore, to considerable damage to SG systems [4]. As the operators monitor and manage the energy grid, they must consider possible attacks on the grid. To accomplish this, much energy and grid expertise is required. However, deep machine learning (DML) methods are used because of their capability to recognize patterns and learn, as well as being quickly able to identify potential security boundaries [5].
1.3. Contributions
A model based on machine learning is presented in this study for detecting system
behaviors by analyzing historical data and related log data. Although unsupervised
learning is beneficial for detecting zero-day attacks since it requires no training in attack
scenarios, it is also vulnerable to false positives [17]. In contrast, supervised learning
can clearly improve the detection confidence. The experiments are therefore performed using
the supervised machine learning approach. The main contributions in this paper are
summarized as follows:
(1) Feature construction engineering is performed, and 16 novel features are constructed
via an analysis of the features and possible links of the raw data in the electrical
network. Novel features built from combinations of attributes help the machine learning
models exploit the available types of data instances more effectively.
(2) A new process for handling abnormal data, such as not-a-number (NaN) and infinity
values in the data sets, is proposed. The suggested approach could significantly enhance
accuracy in comparison to conventional processes for handling abnormal data.
(3) A classification model based on machine learning is constructed. The average accuracy
of 0.9389, precision of 0.938, recall of 0.936, and F1 score of 0.935 on 15 data sets
demonstrate that the suggested model successfully distinguishes 37 kinds of behaviors,
such as power grid faults, single-line-to-ground (SLG) fault replay, relay setting
variation, and trip command injection attacks.
The remaining sections of the study are organized as follows. A detailed explanation of the
methodology is provided in Section 2. The results of the classification are discussed in
Section 3. The conclusion appears in Section 4.
2. Model Structure
Scenarios where disturbances and attacks happen in the electric grid, as well as the
meaning of features in the data set, are presented in this part. The suggested model and
data processing are detailed here.
Figure 1. The power system framework configuration.
This experiment applied a data set that contains 128 features recorded using PMUs 1 to 4 and relay snort alarms and logs (relay and PMU have been combined). A synchronous phasor, or PMU, measures electric waves on a power network using a common time source. A total of 29 features could be measured by every PMU. The data set also contains 12 columns of log data from the control panel and one column of an actual tag. There are three main categories of scenarios in the multiclass classification data set: No Events, Intrusion, and Natural Events. Table 1 summarizes the scenarios, and a brief explanation of each category is provided in the data set.

Table 1. Explanation of scenarios.

Case Study No.    Explanation                            Kind
41                Usual operation (load variations)      No events
1-6               SLG faults                             Natural events
13, 14            Line maintenance                       Natural events
7-12              Data injection                         Intrusion events
15-20             Remote tripping command injection      Intrusion events
21-30, 35-40      Relay setting vary                     Intrusion events
(a) SLG fault: A fault occurs whenever the current, voltage, or frequency of the system
changes abnormally, and many faults in electrical systems occur as line-to-ground (SLG) and
line-to-line (LL) faults. The simulated SLG faults are represented as short circuits at diverse
points along the transmission line (TL) in the data set.
(b) Line maintenance: This event occurs when one or more relays have been
deactivated on a particular line for maintenance.
(c) Data injection: More research is being conducted into false data injection against state
estimation in electrical networks. False data injection attacks are one of the main forms
of network attacks, which could affect the power system estimation method. Attackers
alter phase angles in order to create false sensor signals. The objective of such
attacks is to blind the operators and to avoid raising an alarm, which could lead to
economic or physical damage to the electrical systems. Attackers imitate the phasor
measurements of a valid SLG fault and then send a relay trip command on
the affected lines. The data set modeled these conditions by varying variables, such as
current, voltage, and sequence components, which caused faults at various levels
([10 to 19]%, [20 to 79]%, [80 to 90]%) of the TLs.
(d) Remote tripping command injection attack: This occurs when a computer on the
communications network sends unexpected relay trip commands to a relay at the end of
a TL. To carry out the attacks, command injection has been applied against single relays
(R1-R4) or pairs of relays (R1 and R2, R3 and R4).
(e) Relay setting variation attack: The relay is configured with a distance protection
layout. Attackers change the setting so that the relay no longer responds correctly to
authentic faults. In the data sets, such attacks were created by deactivating the relay
function of R1, R2, R3, or R4 at diverse parts of the TLs and then applying a fault.
2.2. Methodology
Despite the fact that the machine learning approach is capable of detecting disturbances
and cyber-attacks on electric grids, it can have the following drawbacks. Current references
mostly discuss how to diagnose attacks in electrical grids and seldom examine the relationships
within the data. In addition, when working with multi-classification problems, many algorithms
convert them into multiple two-class problems. Nonetheless, the AdaBoost algorithm is able to
handle multi-classification problems directly. It cascades weak classifiers well and is capable
of using various classification algorithms as weak classifiers. In terms of the misclassification
error rate, the AdaBoost algorithm is highly competitive [22]. As the amount of data increases,
the fitting ability is affected both by generalization problems and by the increasing difficulty
of computing, and machine learning requires a large amount of computation to find the best
solution. Additionally, the accuracy rates of the models presented in [11,12] on the multiclass
data sets are about 90%, which leaves considerable room for improvement. As a consequence of
these findings, this paper constructs a model that performs improved feature engineering and
then splits the data by the diverse PMUs to minimize computation overhead. It should be noted
that the PMU allocation in the smart grid is performed in the planning stage and might be
implemented according to different purposes. While the high cost might be a limitation, a high
number of PMUs is always preferred to cover all areas of the smart grid. It is worth noting that
PMU allocation is out of the scope of this work but can be found widely in other research works.
In addition, the AdaBoost algorithm for detecting the 37-class fault and cyber-attack case
studies in the electric grids is adopted in this paper.
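As a rough, non-authoritative sketch of this design choice (AdaBoost over a random forest base learner, handling all 37 classes directly), a scikit-learn version might look as follows; the hyperparameter values are placeholders rather than the settings tuned in this work.

# Sketch: AdaBoost with a shallow random forest as its base classifier; the
# SAMME algorithm handles the multiclass case directly. Hyperparameters are
# illustrative placeholders. (In scikit-learn < 1.2 the keyword is
# base_estimator instead of estimator.)
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

base = RandomForestClassifier(n_estimators=50, max_depth=8, random_state=0)
model = AdaBoostClassifier(estimator=base, n_estimators=20, learning_rate=0.5,
                           algorithm="SAMME", random_state=0)
# model.fit(X_train, y_train); y_pred = model.predict(X_test)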
About the feature selection process, it should be noted that this experiment applied a
data set that contains 128 features recorded using PMUs 1 to 4 and relay snort alarms and
logs (relay and PMU have been combined). Please also note that each PMU can record 29 dif-
ferent features. In this regard, and in order to obtain enriched and integrated informative
data, feature construction engineering is performed, and 16 novel features are constructed
via an analysis of the features and possible links of the raw data in the electrical network.
Technically, it is possible to construct novel features using a combination of attributes that
could help more effectively utilize possible types of data instances, which could be used
in machine learning models for better application. It is worth noting that we made use
of
the random forest method to create and classify features. Finally, based on anticipation
weighted voting (AWV), 37 various case studies were implemented for simulation purposes.
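A minimal sketch of this step is given below, assuming the data are held in a pandas DataFrame: a couple of combined features are derived from existing columns, and the whole feature set is then ranked with a random forest's impurity-based importances. The specific column names and combinations are hypothetical illustrations, not the 16 features actually constructed here.

# Sketch: construct a few combined features and rank the full set with a
# random forest's impurity-based importances. Column names and the specific
# combinations are hypothetical examples of the idea.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def add_constructed_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["R1_V_over_I"] = out["R1-PM1:V"] / (out["R1-PM4:I"].abs() + 1e-9)   # voltage/current ratio
    out["R1_R2_angle_diff"] = out["R1-PA1:VH"] - out["R2-PA1:VH"]           # phase-angle difference
    return out

def rank_features(df: pd.DataFrame, label_col: str) -> pd.Series:
    X, y = df.drop(columns=[label_col]), df[label_col]
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    return pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)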
2.3. Diagnosing Attack Behavior Model Structure

A model architecture diagram for detecting the faults and cyber-attacks in electrical grids is
shown in Figure 2. According to Figure 2, the model architecture consists of four stages:
property making, data dividing, weight voting, and layout training, as follows:
Figure 2. Explanation of layout to detect disturbance and cyber-attack in electrical networks.
of this step, several of the original features are combined with novel ones in order to reduce
the dimension. The original features are sorted using feature importance, and afterward, a
variety of proportions of the features are selected, as explained in more detail in Section 3. In
addition, several classifier models are developed to personalize the features after splitting.
Various classifiers are set up so that every section of the data has the greatest impact on its
classifier, i.e., the training model. Using five classifiers and later obtaining five tags after
transferring the information to the layout reduces the effect of any single classifier's
generalization error.
Stage 3. Weights for voting. This module is responsible for assigning diverse
weights to the tags derived from the diverse classifiers and voting on the final classification
tag of the data. The ratio of the various weights is determined according to the accuracy of
every classifier on the training set. Various tags are generated for the test set after it has
passed through the trained classifiers, and the weights are determined for the final voting
session based on the tags of the relevant classifiers. By updating the weights in real-time,
the entire system can become more robust and generalizable.
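A small sketch of this accuracy-weighted voting stage, assuming the weights ω1:...:ω5 are simply the training accuracies of the five classifiers normalized to sum to one:

# Sketch of the weight-voting stage: each of the five per-PMU classifiers votes
# with a weight proportional to its accuracy on the training set.
from collections import defaultdict

def weighted_vote(tags, accuracies):
    """tags: predicted label from each classifier for one sample;
    accuracies: training accuracy of each classifier (its voting weight)."""
    total = sum(accuracies)
    score = defaultdict(float)
    for tag, acc in zip(tags, accuracies):
        score[tag] += acc / total          # normalised weights omega_1..omega_5
    return max(score, key=score.get)       # final tag = largest summed weight

# weighted_vote(["SLG fault", "SLG fault", "data injection", "SLG fault", "data injection"],
#               [0.93, 0.91, 0.89, 0.94, 0.90])  ->  "SLG fault"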
X_scale = (x − µ)/σ    (2)
A data set may contain not-a-number (NaN) and infinity (Inf) values, which are usually
substituted with the mean value or zero. For the data set applied here, a novel replacement
process is proposed to avoid underflows in the final replacement value and to avoid making the
data overly discrete. The log_mean value is used for replacing the NaN and Inf values present
in the data. It can be calculated as follows:

log_mean = (∑ log|k_i| / Num(k_i)) · (1 − 2·l(∑ k_i / Num(k_i) < 0))    (3)

Here, the number of entries in a column is denoted by Num(k_i), and the indicator function is
represented by l(x), which can be described in the following way:

l(x) = 1 if x is true, 0 otherwise    (4)
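A minimal NumPy sketch of the replacement rule in Equations (3) and (4), applied column by column, might look as follows (the small offset inside the logarithm is an added guard against log(0), which the formula itself does not address):

# Sketch of the log_mean rule of Eqs. (3)-(4): NaN/Inf entries of a column are
# replaced by the mean of log|k_i| over the finite entries, with the sign
# flipped when the ordinary mean of those entries is negative.
import numpy as np

def log_mean_fill(column):
    col = np.asarray(column, dtype=float).copy()
    finite = np.isfinite(col)
    k = col[finite]
    magnitude = np.mean(np.log(np.abs(k) + 1e-12))   # tiny offset guards against log(0)
    sign = 1.0 - 2.0 * float(np.mean(k) < 0.0)       # indicator l(.) from Eq. (4)
    col[~finite] = magnitude * sign                  # replace NaN and +/-Inf entries
    return col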
the weights are equal. In a classifier, weights represent the probability value of a tag or its
confidence level. The present study sets up various machine learning models for various
data blocks to address multi-tag problems so as to make the model perform effectively for
the data set. Lastly, different weights are assigned to tags to determine the final results.
Algorithm 1 describes these steps.
F1 score = 2TP / (2TP + FN + FP) = 2 · precision · recall / (precision + recall)    (8)
Table 3. Number of data segments in each data set.

Data set       Data 1   Data 2   Data 3   Data 4   Data 5   Data 6   Data 7   Data 8
Data number    4966     5069     5415     5202     5161     4967     5236     5315

Data set       Data 9   Data 10  Data 11  Data 12  Data 13  Data 14  Data 15  Entire
Data number    5340     5569     5251     5224     5271     5115     5276     78,377
Calculating the constrained problem via the Lagrange function is more efficient, and
an objective function can be derived from the following formula, in which α_i denotes the
Lagrange multiplier and α_i ≥ 0:

L(ω, b, α) = (1/2)||ω||² + ∑_{i=1}^{m} α_i (1 − y_i(ω^T x_i + b))    (10)

Setting the partial derivatives of L with respect to ω and b to zero,

∂L(ω, b, α)/∂ω = 0,    ∂L(ω, b, α)/∂b = 0    (11)

the dual problem can be written as follows:

max_α ∑_{i=1}^{m} α_i − (1/2) ∑_{i=1}^{m} ∑_{j=1}^{m} α_i α_j y_i y_j x_i^T x_j
subject to ∑_{i=1}^{m} α_i y_i = 0, α_i ≥ 0    (12)
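For completeness, the step between (11) and (12) is the standard one: the stationarity conditions yield an expression for ω and a constraint on the multipliers, which are then substituted back into (10) to obtain the dual objective:

∂L/∂ω = 0 ⇒ ω = ∑_{i=1}^{m} α_i y_i x_i,    ∂L/∂b = 0 ⇒ ∑_{i=1}^{m} α_i y_i = 0

Substituting this expression for ω into (10) eliminates both ω and b and leaves exactly the maximization over α stated in (12).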
(C) The decision tree algorithm starts with a group of instances/cases and then builds a
tree information framework, which is applied to novel cases. A group of numeric/symbolic values
describes every case [27]. Entropy is used in C4.5 and C5.0 for the spanning tree algorithm.

(D) A boosting algorithm has been used to improve the XGBoost [28] classifier algorithm.
The model is based on residual lifting. Based on the error function, the objective function is
calculated by taking the first and second derivatives of every data point. The loss function is
a square loss. Its objective function is given below, in which l denotes a differentiable convex
loss function that measures the variation between the prediction ŷ_i and the target y_i. The
second part Ω penalizes the model complexity, and T denotes the number of leaves in the tree.
The γ and λ control the tree's complexity: the greater their values, the simpler the framework
of the tree.

L(φ) = ∑_i l(ŷ_i, y_i) + ∑_k Ω(f_k),   where Ω(f) = γT + (1/2)λ||ω||²    (13)
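As a hedged illustration only, the γ and λ terms of Equation (13) correspond to the gamma and reg_lambda arguments of the common XGBoost Python interface; the values below are placeholders, not the settings used in the experiments.

# Sketch: multiclass XGBoost classifier; gamma and reg_lambda play the role of
# gamma*T and (1/2)*lambda*||w||^2 in Eq. (13). Values are placeholders; the
# scikit-learn wrapper picks the multiclass objective automatically at fit time.
from xgboost import XGBClassifier

clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.1,
                    gamma=0.1, reg_lambda=1.0)
# clf.fit(X_train, y_train); proba = clf.predict_proba(X_test)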
(E) The random forest (RF) exhibits excellent efficiency and has been extensively applied [29].
RF utilizes the decision tree as its base classifier and is an extension of bagging. RF uses two
very significant procedures. The first involves introducing random features in the procedure of
decision tree construction, and the second involves out-of-bag estimation. The RF method can be
described as follows. The first step is to randomly select a sample from the data and afterward
return the sample to the original data; the chosen samples are applied as root samples for
training a decision tree. Second, for splitting the nodes of the decision tree, m attributes are
chosen randomly (there are a total of M attributes, ensuring m << M). An attribute is chosen to
be the dividing feature of the node using a strategy such as information gain. This continues
until the decision tree can no longer be divided.

(F) Among the more popular deep learning networks is the CNN. There are usually input, output,
latent, and max-pooling layers in a CNN model. Several great results have been obtained in
numerous areas of computer vision. Here, one-dimension property vectors are used as input, and
a one-dimension convolution kernel is adopted in the convolution layers. The convolution layer
extracts properties from the input, and here the kernel size is three. The process of the CNN
model is shown in Figure 3.

Figure 3. The procedure of CNN layout.
Actually, the main purpose of this research is to show the high and successful role of
the deep learning models in reinforcing the smart grid against various cyber-attacks. In
this regard, the proposed model would detect and stop cyber-hacking at the installation
location rather than focusing on the cyber-attack type. Therefore, the localization procedure
would be attained through the diverse detection models located in the smart grid, but the
cyber-attack type detection requires more data that can be generated later based on the
recorded abnormal data.
3.2.2. Outcomes

This study considers 37 varied scenarios for events. In order to determine the need for
various models (fault analysis), we performed some comparative experiments according
to various PMU kinds. In one group, properties of localization/segmentation are sent to
the related DML model in order to train, and in the other group, whole features are sent
to various machine learning models. Moreover, it is shown in Table 4 that data can be
effectively split according to the PMU resources. Splitting the data can enhance the accuracy
of the classification models as well as reduce data dimensions, enhance training speed, and
minimize computing resources. The scores of the significant features are shown in Figure 4.
Figure 4. Significance features score.
Several corresponding experiments are conducted on various ways of replacing abnormal
values in the data. Table 5 shows the outcomes. The replacement method is shown in the left
column, and the suggested approach is represented by log_mean. Zero denotes a process that
replaces NaN and Inf with zero values, and mean denotes a process that replaces them with the
mean value. The AWV model is utilized as the trial model, and the accuracy is adopted as the
assessment metric, that is, the right column in Table 5.

Table 5. Diverse methods to process Inf and NaN.

Method       Accuracy
Zero         0.9361
Mean         0.9342
Log-Mean     0.9387

Applying the log_mean technique for replacing the unusual values in the data is intuitively
the best approach. According to the outcome, the suggested process for handling abnormal
values has proven successful.
Comparison experiments are also conducted to verify the feature selection. First, the
significance of the original features is determined, and afterward, they are arranged based
on significance. A variety of mixtures of features has been selected for training, and Table 6
shows these outcomes.

The approach was verified practically through a comparative test. The test extracts the test
group and training group from the 15 multiclass data sets in a 9:1 ratio at random, and
afterward, these data sets have been combined into one training group. The training group has
been transferred to the layout to train and learn. Table 7 presents the outcomes of the 15 test
sets transferred to the model for practically simulating the efficiency of the model in
application. It is apparent that the model's accuracy has decreased. This is because data
interaction occurs as the amount of data increases, changing the model, and whenever all of the
data has been combined, there are unavoidably abnormal points and noises. Because such noises
and anomalies have not been separated out in training, the model's indexes alter, and the
robustness decreases.
Table 7. Accuracy of the suggested model on each of the 15 test sets.

Data set   Data 1   Data 2   Data 3   Data 4   Data 5   Data 6   Data 7   Data 8
Accuracy   0.8894   0.8699   0.9097   0.8830   0.9092   0.9096   0.9066   0.9193

Data set   Data 9   Data 10  Data 11  Data 12  Data 13  Data 14  Data 15  Entire
Accuracy   0.9083   0.9229   0.9241   0.9007   0.9016   0.8966   0.9130   0.9043
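A sketch of this evaluation protocol, assuming the 15 data sets are available as a list of (X, y) pairs; the 9:1 split and the pooling of the training portions follow the description above, while everything else is an illustrative placeholder.

# Sketch of the test protocol: split each of the 15 data sets 9:1, pool the
# training portions into one training group, then score every test set separately.
import numpy as np
from sklearn.model_selection import train_test_split

def evaluate(datasets, model):
    """datasets: list of (X, y) pairs, one per multiclass data set."""
    splits = [train_test_split(X, y, test_size=0.1, random_state=0) for X, y in datasets]
    X_train = np.vstack([X_tr for X_tr, _, _, _ in splits])
    y_train = np.concatenate([y_tr for _, _, y_tr, _ in splits])
    model.fit(X_train, y_train)
    return [model.score(X_te, y_te) for _, X_te, _, y_te in splits]   # per-set accuracy (Table 7)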
Firstly, the efficacy of the features created by the feature construction engineering
in the model is determined by sorting the significance of the features. Model interpretability
can be assessed by determining the significance of the features. Weight, gain, cover, and so
on are general indicators of feature significance. In the XGBoost method [30], the number of
times a property appears in a tree is denoted by the weight, the mean gain of the splits using
the property is represented by the gain, and the mean coverage of the splits using the property
is denoted by the cover. In Figure 4, the weight is used to calculate feature significance. The
abscissa indicates the names of the best 45 properties, and the ordinate indicates the
assessment score. The original features are shown by the gray part. The features derived from
the feature construction engineering are represented by the red mark. It is evident that each
of the 16 constructed properties is in the best 45.
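For reference, these three indicators map directly onto the importance types exposed by the XGBoost booster; a short sketch, assuming clf is an already fitted XGBClassifier:

# Sketch: weight / gain / cover importance scores from a fitted XGBClassifier.
def top_features(clf, k=45):
    booster = clf.get_booster()
    return {t: sorted(booster.get_score(importance_type=t).items(),
                      key=lambda kv: kv[1], reverse=True)[:k]
            for t in ("weight", "gain", "cover")}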
The test trains and tests on the 15 multiclass classification data sets respectively and
uses accuracy as the assessment metric. The accuracy of the trial data sent to the layouts
before and after optimization, based on the main 128 properties, is shown in Figure 5. The
classification accuracy of the trial group on the various layouts with default variables is
shown in Figure 5a, and the accuracy of the trial group on the layouts applying optimized
variables is represented in Figure 5b. For a more intuitive visualization of the variation in
accuracy after the layouts are optimized, Figure 5a,b are combined, and the mean of the accuracy
values over all sets is adopted, i.e., Figure 5c. Figure 5 shows that the SVM layout with default
variables has an accuracy of approximately 0.30, but after optimization, it grows to 0.85, which
represents a nearly 200% improvement. Other models have also improved significantly in accuracy
after optimization. The best accuracy of the suggested AWV model is 0.9217.
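The default-versus-optimized comparison above corresponds to a standard hyperparameter search; a small grid-search sketch for the SVM baseline is given below, with placeholder grid values rather than the ones used in this study.

# Sketch: tuning the SVM baseline with a small grid search; grid values are
# illustrative placeholders, not the settings used in this study.
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = make_pipeline(StandardScaler(), SVC())
grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy", n_jobs=-1)
# search.fit(X_train, y_train); best_model = search.best_estimator_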
Table 3 shows that every data set has about 5000 segments of data; therefore, the CNN
layout cannot be fully exploited. The semantic relationships among features might also be
ignored by several neural networks, such as the CNN and long short-term memory (LSTM) layouts.
Thus, in several cases, statistical features based on manual design could positively affect
model accuracy as well. Moreover, the tree-based algorithms outperform KNN and SVM. The test
set had better performance on the model suggested in this study in comparison to the
conventional DML and CNN models, as shown in Figure 5.
Figure 5. Proficiency comparisons of variables through applying 128 properties (a); (b) precision
over 15 data sets through applying optimum variables; (c) mean accuracy comparison.
4. Conclusions
Various SG information as the experimental foundation is used in the present study,
and property making for the original data is applied. The layout for identifying faults and
cyber-attacks in the electrical system is proposed. The present study uses various DML
assessment indexes for evaluating the suggested model and conventional DML methods in
the experiment. According to the outcomes, the information analyzing process improves
the model's accuracy, and the AWV layout detects 37 types of behavior in electrical systems
efficiently. As a result, machine learning can be used in the power grid to assist operators in
making decisions. In other words, the smart grid operator can always check the health level
of the data gathered by the PMUs all around the grid. In the case that any abnormality is
detected, the possibility of an intentional cyber-attack exists, and thus, some cautious
pre-operation strategies shall be considered to keep the power and demand balance. Moreover,
if the data readings from any PMU are unusual, the system operator can decide to estimate
the system status without this PMU and rely more on the data coming from the other
healthy PMUs.
Author Contributions: Conceptualization, A.A., S.A. and M.A.M.; methodology, A.A., S.A. and
M.A.M.; software, A.A., S.A. and M.A.M.; validation, A.A., S.A. and M.A.M.; formal analysis, A.A.,
S.A. and M.A.M.; investigation, A.A., S.A. and M.A.M.; data curation, A.A., S.A. and M.A.M.;
writing—original draft preparation, A.A., S.A. and M.A.M.; writing—review and editing, A.A., S.A.
and M.A.M.; visualization, A.A., S.A. and M.A.M.; supervision, A.A., S.A. and M.A.M.; project
administration, A.A. and S.A.; funding acquisition, A.A. and S.A. All authors have read and agreed
to the published version of the manuscript.

Funding: This research has been funded by the Scientific Research Deanship at the University of
Ha’il—Saudi Arabia through project number RG-21079.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Not applicable.
References
1. Almalaq, A.; Albadran, S.; Alghadhban, A.; Jin, T.; Mohamed, M.A. An Effective Hybrid-Energy Framework for Grid Vulnerability
Alleviation under Cyber-Stealthy Intrusions. Mathematics 2022, 10, 2510. [CrossRef]
2. Reich, J.; Schneider, D.; Sorokos, I.; Papadopoulos, Y.; Kelly, T.; Wei, R.; Armengaud, E.; Kaypmaz, C. Engineering of Runtime
Safety Monitors for Cyber-Physical Systems with Digital Dependability Identities. In Proceedings of the International Conference
on Computer Safety, Reliability, and Security, Lisbon, Portugal, 15 September 2020; Springer: Cham, Switzerland, 2020; pp. 3–17.
3. Li, Y.; Wang, B.; Wang, H.; Ma, F.; Zhang, J.; Ma, H.; Mohamed, M.A. Importance Assessment of Communication Equipment
in Cyber-Physical Coupled Distribution Network Based on Dynamic Node Failure Mechanism. Front. Energy Res. 2022, 654.
[CrossRef]
4. Zhang, L.; Cheng, L.; Alsokhiry, F.; Mohamed, M.A. A Novel Stochastic Blockchain-Based Energy Management in Smart Cities
Using V2S and V2G. IEEE Trans. Intell. Transp. Syst. 2022, 1–8. [CrossRef]
5. Chen, J.; Alnowibet, K.; Annuk, A.; Mohamed, M.A. An effective distributed approach based machine learning for energy
negotiation in networked microgrids. Energy Strategy Rev. 2021, 38, 100760. [CrossRef]
6. Al-Mhiqani, M.N.; Ahmad, R.; Yassin, W.; Hassan, A.; Abidin, Z.Z.; Ali, N.S.; Abdulkareem, K.H. Cyber-security incidents:
A review cases in cyber-physical systems. Int. J. Adv. Comput. Sci. Appl. 2018, 1, 499–508.
7. Luo, Y.; Cheng, L.; Liang, Y.; Fu, J.; Peng, G. Deepnoise: Learning sensor and process noise to detect data integrity attacks in CPS.
China Commun. 2021, 18, 192–209. [CrossRef]
8. Kaouk, M.; Flaus, J.M.; Potet, M.L.; Groz, R. A review of intrusion detection systems for industrial control systems. In Proceedings
of the 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT), Paris, France, 23 April 2019;
IEEE: Toulouse, France, 2019; pp. 1699–1704.
9. Dehghani, M.; Kavousi-Fard, A.; Dabbaghjamanesh, M.; Avatefipour, O. Deep learning based method for false data injection
attack detection in AC smart islands. IET Gener. Transm. Distrib. 2020, 14, 5756–5765. [CrossRef]
10. Taormina, R.; Galelli, S.; Tippenhauer, N.O.; Salomons, E.; Ostfeld, A.; Eliades, D.G.; Aghashahi, M.; Sundararajan, R.;
Pourahmadi, M.; Banks, M.K.; et al. Battle of the attack detection algorithms: Disclosing cyber attacks on water distribution
networks. J. Water Resour. Plan. Manag. 2018, 144, 04018048. [CrossRef]
11. Chang, Q.; Ma, X.; Chen, M.; Gao, X.; Dehghani, M. A deep learning based secured energy management framework within a
smart island. Sustain. Cities Soc. 2021, 70, 102938. [CrossRef]
12. Keshk, M.; Sitnikova, E.; Moustafa, N.; Hu, J.; Khalil, I. An integrated framework for privacy-preserving based anomaly detection
for cyber-physical systems. IEEE Trans. Sustain. Comput. 2019, 6, 66–79. [CrossRef]
13. Huang, Y.; He, T.; Chaudhuri, N.R.; la Porta, T. Preventing Outages under Coordinated Cyber-Physical Attack with Secured
PMUs. IEEE Trans. Smart Grid 2022, 13, 3160–3173. [CrossRef]
14. Alexopoulos, T.A.; Korres, G.N.; Manousakis, N.M. Complementarity reformulations for false data injection attacks on pmu-only
state estimation. Electr. Power Syst. Res. 2020, 189, 106796. [CrossRef]
15. Alexopoulos, T.A.; Manousakis, N.M.; Korres, G.N. Fault location observability using phasor measurements units via semidefinite
programming. IEEE Access 2016, 4, 5187–5195. [CrossRef]
16. Mamuya, Y.D.; Lee, Y.-D.; Shen, J.-W.; Shafiullah, M.; Kuo, C.-C. Application of Machine Learning for Fault Classification and
Location in a Radial Distribution Grid. Appl. Sci. 2020, 10, 4965. [CrossRef]
17. Chaithanya, P.S.; Priyanga, S.; Pravinraj, S.; Sriram, V.S. SSO-IF: An Outlier Detection Approach for Intrusion Detection in SCADA
Systems. In Inventive Communication and Computational Technologies; Springer: Singapore, 2020; pp. 921–929.
18. Chen, J.; Mohamed, M.A.; Dampage, U.; Rezaei, M.; Salmen, S.H.; Obaid, S.A.; Annuk, A. A multi-layer security scheme for
mitigating smart grid vulnerability against faults and cyber-attacks. Appl. Sci. 2021, 11, 9972. [CrossRef]
19. Avatefipour, O.; Al-Sumaiti, A.S.; El-Sherbeeny, A.M.; Awwad, E.M.; Elmeligy, M.A.; Mohamed, M.A.; Malik, H. An intelligent
secured framework for cyberattack detection in electric vehicles’ CAN bus using machine learning. IEEE Access 2019, 7,
127580–127592. [CrossRef]
20. Wang, B.; Ma, F.; Ge, L.; Ma, H.; Wang, H.; Mohamed, M.A. Icing-EdgeNet: A pruning lightweight edge intelligent method of
discriminative driving channel for ice thickness of transmission lines. IEEE Trans. Instrum. Meas. 2020, 70, 1–12. [CrossRef]
21. Alnowibet, K.; Annuk, A.; Dampage, U.; Mohamed, M.A. Effective energy management via false data detection scheme for the
interconnected smart energy hub–microgrid system under stochastic framework. Sustainability 2021, 13, 11836. [CrossRef]
22. Chen, L.; Liu, Z.; Tong, L.; Jiang, Z.; Wang, S.; Dong, J.; Zhou, H. Underwater object detection using Invert Multi-Class Adaboost
with deep learning. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK,
19 July 2020; IEEE: Toulouse, France, 2020; pp. 1–8.
23. Shafizadeh-Moghadam, H. Fully component selection: An efficient combination of feature selection and principal component
analysis to increase model performance. Expert Syst. Appl. 2021, 186, 115678. [CrossRef]
24. Roshan, K.; Zafar, A. Deep Learning Approaches for Anomaly and Intrusion Detection in Computer Network: A Review. Cyber
Secur. Digit. Forensics 2022, 73, 551–563.
25. Aceto, G.; Ciuonzo, D.; Montieri, A.; Pescape, A. Traffic classification of mobile apps through multi-classification. In Proceedings
of the GLOBECOM 2017-2017 IEEE Global Communications Conference, Singapore, 4 December 2017; IEEE: Toulouse, France,
2017; pp. 1–6.
26. Pham, B.T.; Bui, D.T.; Prakash, I.; Nguyen, L.H.; Dholakia, M.B. A comparative study of sequential minimal optimization-based
support vector machines, vote feature intervals, and logistic regression in landslide susceptibility assessment using GIS. Environ.
Earth Sci. 2017, 76, 371. [CrossRef]
27. Jena, M.; Dehuri, S. Decision tree for classification and regression: A state-of-the art review. Informatica 2020, 44, 405–420.
[CrossRef]
28. Chen, R.C.; Caraka, R.E.; Arnita, N.E.; Pomalingo, S.; Rachman, A.; Toharudin, T.; Tai, S.K.; Pardamean, B. An end to end of
scalable tree boosting system. Sylwan 2020, 165, 1–11.
29. Lulli, A.; Oneto, L.; Anguita, D. Mining big data with random forests. Cogn. Comput. 2019, 11, 294–316. [CrossRef]
30. Franklin, J. The elements of statistical learning: Data mining, inference and prediction. Math. Intell. 2005, 27, 83–85. [CrossRef]