Secrypt 2023 Final


ZT-NIDS: Zero Trust - Network Intrusion Detection System

Abeer Z. Alalmaie1, Priyadarsi Nanda1 and Xiangjian He2
1School of Electrical and Data Engineering, University of Technology Sydney, Sydney, Australia
2School of Computer Science, University of Nottingham Ningbo, China

[email protected], [email protected], [email protected]

Keywords: Zero Trust, Network Intrusion Detection, Network Security, CNN-BiLSTM, Attention, Cybersecurity.

Abstract: Zero Trust security can tackle various cyberthreats. Current trends in security monitoring must shift to a “never trust, always verify” approach, as data security is threatened when cloud-based third parties access network traces. A Network Intrusion Detection System (NIDS) can be used to detect anomalous behaviour. Convolutional Neural Network (CNN) and Bi-directional Long Short-Term Memory (BiLSTM) based classifiers and Auto-Encoder (AE) feature extractors have shown promising results in NIDS. An AE feature extractor can compress the important information and is trained in an unsupervised manner. CNNs detect local spatial relationships, while BiLSTMs can exploit temporal interactions. Furthermore, Attention modules can capture content-based global interactions and can be applied to CNNs to attend to the significant contextual information. In this paper, we combine the advantages of the AE, CNN and BiLSTM structures, using a multi-head Self-Attention mechanism to integrate CNN features before feeding them into a BiLSTM classifier. We use the bottleneck features of a pre-trained AE as input to an Attention-based CNN-BiLSTM classifier. Our experiments with 10-, 6- and 2-category NIDS on the UNSW-NB15 dataset show that the proposed method outperforms state-of-the-art methods, achieving accuracies of 91.72%, 89.79% and 93.01%, respectively. In addition, we introduce a balanced data sampler for training the 10-category NIDS.

1 INTRODUCTION

Today, many private-sector and government organizations depend on technology more than ever before. Due to the increase in threats and actual attacks, the cyber resilience of critical infrastructures is a vital requirement for national security. Cybersecurity is the establishment and compilation of assets, methods, and systems to defend cyberspace and cyberspace-enabled structures from incidents. Thus, cybersecurity is a crucial element in private industries, banking sectors, and government organizations. Technology and cybersecurity are continuously evolving to provide a better user experience across the world. Attackers are also advancing their techniques to exploit critical infrastructures for unauthorized access. It is therefore important to prevent, detect, and respond to cyberattacks quickly so that the damage to critical infrastructures can be reduced. Accordingly, organizations have begun to improve cybersecurity awareness (Newhouse, et al., 2017).

Traditionally, organizations have focused on perimeter defence and have given authorized users access to network traffic on the internal network. Hence, unauthorized lateral movement within the environment has been one of the biggest challenges for organizations' networks. Perimeter firewalls are less useful for detecting and blocking attacks from inside the network and cannot protect subjects outside the enterprise perimeter. Network perimeter security alone is no longer effective for providing enterprise security, and organizations should continually focus on improving their cybersecurity management systems. There is a persistent increase in the types of cyber-attacks on ICT infrastructure and in the major trends that accompany them.

Maturity models are used as guides and reference structures for information system management in various types of companies. A cybersecurity maturity model is a standard to

determine the maturity of a protection system and to provide guidance on how to attain a higher level. Thus, some cybersecurity maturity mechanisms have been suggested to reduce the impact of cyberattacks at the level of country-wide protection. Moreover, regional and international organizations have conducted other studies, but these aim at scoring and ranking countries according to their national-level cybersecurity systems. A cybersecurity maturity model aims to identify processes, methods, and techniques for developing a company's protection, and introduces a level-based advancement approach.

Zero Trust satisfies these characteristics by treating all users, devices, data, and service requests the same. It shifts from the traditional security policy, in which all assets in an organization are open and accessible, to requiring continuous authentication and authorization before any asset can be accessed. Most existing corporate networks are flat. The weakness of the traditional hub-and-spoke network model lies in its architecture: crossing the chasm from trust to distrust via a firewall is inherently risky. Instead, Zero Trust no longer distinguishes between “inside” and “outside” the network perimeter. A Zero Trust Architecture (ZTA) addresses this trend by focusing on protecting resources, not network perimeters, as the network location is no longer viewed as the prime component of the security posture of the resource. Zero Trust is a set of cybersecurity principles used to create a strategy that moves network defences from wide, static network perimeters to a narrower focus on users, systems, and individual or small groups of resources. The concept of trust within the network is eliminated: there are no more trusted interfaces, no more trusted users, no more trusted packets, and no more trusted applications. Across industries, security professionals are shifting the security perimeter to a Zero Trust state of mind and are quickly adopting and implementing the Zero Trust network security model in their environments. Zero Trust is more than just a concept; it is a robust security model that follows 7 security principles: Data, Devices, Workload, Automation and Orchestration, Visibility and Analytics, Users, and Network.

The rest of the paper is organized as follows. In section 2, some of the main related works are reviewed. In section 3, the Zero Trust concept is explained. The chosen approach and the proposed method are presented in sections 4 and 5, respectively. In section 6, the evaluation methods are explained. Experimental results are reported in section 7. Finally, in section 8, a conclusion and a brief outlook on possible future work are given.

2 RELATED WORKS

The Systems Security Engineering Capability Maturity Model (SSE-CMM) was designed on the CMM paradigm to evaluate the assurance of security engineering procedures and capability for customers as a standard mechanism. The SSE-CMM protects organizational security through various features (Nahari and Krutz, 2003).

The Community Cyber Security Maturity Model (CCSMM) was proposed to tackle various problems related to information sharing, metrics, training, testing, technology, random threats, structured risks, and well-structured risks. The CCSMM is designed based on comprehensive working experience with communities and states developing and implementing cybersecurity practices, and it has five maturity levels: security awareness, process development, information availability, tactics development, and full security operational capability (White, 2011).

In (Eusgeld, et al., 2011), researchers examined the weaknesses associated with the combination of industrial control systems and the underlying critical infrastructures. In addition, an integrated model was suggested to address the selection of a suitable method, and the interdependencies between the critical infrastructures were investigated.

In (Mettler and Blondiau, 2012), the authors proposed a maturity model to measure the skill levels of federal critical infrastructure protection efforts, taking into consideration maturity measures based on the identified root causes. The model analysed data concerning nationwide cybersecurity developments through a base-level technique to obtain the fundamental reasons for the vulnerability of vital infrastructures to cyber risks. Moreover, it validates the maturity standards by presenting the results to specialists in a Delphi survey.

In (Karabacak, et al., 2016), the authors focused on developing a wide-ranging maturity mechanism for the Hospital Information System (HIS) environment. They discussed different problems, such as the most important influencing factors identified by information system managers, which are associated with the maturity phases, and the assessment of maturity-affecting elements from the perspective of the maturity periods. In addition, they proposed the HIS Maturity Model (HISMM). They prepared a questionnaire to understand the critical infrastructure requirements, using the design science research methodology. Further, they conducted a survey covering different categories and designed an initial maturity model. Moreover, they applied a qualitative assessment technique to interviews to attempt and
analyse the existence or absence of one or many attributes in the text. The final maturity model stage was designed based on data analysis, strategy, people, systems and infrastructure, electronic medical records, and information security.

In (Nussbaum and Park, 2018), the authors observed that the difficulties of traditional IT contracting are generally evident in cybersecurity as well, since cybersecurity products and services are distinctive. Thus, they suggested a prototype describing a set of measures and restrictions that influence the decision-making process of local governments regarding cybersecurity contracts. Further, they discussed different conditions for government contracts and the challenges of cybersecurity contracting.

In (Saleem, et al., 2019), the authors suggested a multidimensional holistic model to ensure protection across all aspects of information resilience, through innovative technologies, intelligent procedures, and constant evaluations. This layered defence model was integrated to assess the security strengths of critical infrastructures and to suggest the best procedures for cybersecurity analysts towards a robust method before third-party products are integrated.

In (Renteria, et al., 2019), the authors came up with an enabler-based digital government maturity framework, considering three criteria: digital government, multidimensionality, and availability. To provide more precise and specific suggestions, this model comprises 7 aspects: leadership, regulatory regime, data, strategy, governance, technology, and organization.

In (Gourisetti, et al., 2019), the authors designed the cybersecurity weakness improvement framework through empirical model (CyFEr), using multi-scenario criteria-based vulnerability analysis and multitiered constraint-based optimization. CyFEr was applied to the NIST Cybersecurity Framework and checked against a practical cyberattack. It also provides potential solutions that focus on the user requirements for reaching an essential cybersecurity maturity, finding the top-ranking solution based on scalar values.

In (Niazi, et al., 2020), the authors suggested a Requirement Engineering Security Maturity Model (RESMM) to support software-based industries by specifying requirements in a way better suited to secure software development. They studied the security needs and improved Sommerville’s requirements engineering procedures. Further, they conducted a questionnaire study that took the identified requirements into account. Finally, they considered related works, the improved Sommerville procedures, and feedback from security experts.

In (Bazi, et al., 2017), the authors suggested a cloud migration maturity model to ensure dynamism in cloud migration procedures. It gives managers a complete overview of the migration so that they can present a strategic plan and achieve effective management.

As a weakness, maturity mechanisms only offer a minimum compliance framework instead of a required cybersecurity standard that works in an emerging cyber environment. Thus, the mechanism should be practised not only by management but also by security specialists, to evaluate the complete protection of the group or structure and to consider measures that address the limitations of any specific features of the organisation identified by the evaluation.

3 ZERO TRUST

The proliferation of cloud computing, mobile device use, and the Internet of Things has dissolved traditional network boundaries. Hardened network perimeters alone are no longer effective for providing enterprise security in a world of increasingly sophisticated threats. The Zero Trust concept requires strict identity-based verification for every user and device trying to access resources on a private network or in the cloud, regardless of whether they are sitting within or outside the network perimeter. No single specific technology is associated with Zero Trust; it is a holistic approach to network security that incorporates different principles and technologies. ZTA is designed to protect digital environments by leveraging network segmentation, preventing lateral movement, providing Layer 7 threat prevention, and simplifying granular user-access control.

Zero Trust is a cybersecurity paradigm focused on resource protection and the premise that trust is never granted implicitly but must be continually evaluated. Zero Trust implies not trusting any entity inside or outside the network perimeter at any time. It is focused on eliminating trust from within an organization and dictates that no implicit assumptions should be made about the credibility of users, devices, applications, or data accessing or being accessed on an organization’s network. It provides the visibility and IT controls needed to secure, manage, and monitor every device, user, application, and network belonging to or being used by the organization and its employees to access data. Zero Trust scrutinizes any incoming or outgoing traffic. The difference between this and other security models is that even internal traffic, meaning traffic that does not cross the perimeter of the organization, must be treated as a potential danger as well.
Zero Trust provides an opportunity for a scalable protection structure across numerous distinct organizations. Kindervag (2010) stated that maintaining trust in clouds and networks is too critical a task and therefore suggested that it is better to eliminate the idea of trust. Further, the author proposed a Zero Trust mechanism to improve protection structures and technologies in the face of future change. When looking at the failures of organizations to stop cyberattacks, especially lateral movement of threats inside their networks, it was realized that the traditional security model operated on the outdated assumption that everything inside an organization’s network could be trusted. Instead, Zero Trust inverts this model, directing IT teams according to the guiding principle of “never trust, always verify” and redefining the perimeter to include users and data inside the network. Under the broken trust model, it is assumed that a user’s identity is not compromised and that all users act responsibly and can be trusted. The Zero Trust model recognizes that trust is a vulnerability: under the traditional model, network users, including threat actors and malicious insiders, are free to move laterally and access or exfiltrate whatever data they are not restricted from.

In ZTA, none of the pillars, such as clients, components, applications, and packets, is trusted, no matter what kind of entity it is and even if it is part of the network. Zero Trust fundamentally redefines the approach to resource separation, a fundamental principle under which resources must be kept safe, are grouped together, and are carefully separated or kept isolated from unauthorized contact in any area. These mechanisms also offer the opportunity to micro-segment systems, allowing groups to change their requirements without redesigning their whole network. In micro-segmentation, networks are carved into small granular nodes, all the way down to a single machine or application. Security protocols and service delivery models are designed for each unique segment. The free flow of data that was once one of the cornerstones of the Internet needs to be confined to protect networks from penetration, customers from privacy violations, and organizations from attacks on infrastructure and operations.

ZTA is an end-to-end approach to network/data security that encompasses identity, credentials, access management, operations, endpoints, hosting environments, and the interconnecting infrastructure. The focus should be on restricting resource access to those with a “need to know.” Traditionally, agencies have focused on perimeter defence, and authorized users are given broad access to resources. Hence, unauthorized lateral movement within a network is one of the biggest challenges for organizations. The Trusted Internet Connections (TIC) and agency perimeter firewalls provide strong Internet gateways. This helps block attackers from the Internet, but the TICs and perimeter firewalls are less useful for detecting and blocking attacks from inside the network. Thus, the Zero Trust separation access framework can be thought of as the next-generation firewall mechanism, extending micro-segmentation of the networks to achieve flexibility, scalability, and ease of virtualization.

4 THE CHOSEN APPROACH

Our model is proposed against a semi-honest adversary who will conduct the IDS task but might have various incentives to disclose the identity or the exact value of some attributes of some records. Our assumptions are as follows. Firstly, the adversary is semi-honest. Secondly, the adversary can observe the anonymized version of the data and also knows the underlying anonymization algorithm. Thirdly, the adversary knows the original version of a subset of the records, having collected this information by running some semantic attacks. Accordingly, we assume that the adversary knows the original prefix values of α% of the prefixes. Finally, the adversary's objective is to find the original version of as many of the anonymized records as possible. We call the adversary's estimates of the original versions of the anonymized records “matches”. The larger the number of true matches, the higher the adversary's advantage and the greater the privacy leakage.

4.1 IDS in Third-Party Setting

Conducting reliable IDS greatly depends on the accuracy of the received dataset compared to its original version. Specifically, most detection algorithms rely on learning the benign (or suspicious) activities from high-dimensional and large datasets. So, only defence mechanisms with minimal modification to the format and entries of the dataset can lead to trustworthy intrusion detection. Besides the existing homomorphic encryption and computation-over-aggregated-data techniques, the Multi-View (MV) approach (Mohammady, et al., 2018) has been introduced as an effective method to get the best of both worlds. This approach outputs a prefix-preserving version of the IP address attributes and an accurate copy of the values in other attributes. Such a minimally modified version of the network traces can be used to accurately conduct analyses.
Moreover, confidentiality is ensured by hiding that real view among a set of indistinguishable fake views.

4.2 Defence: The MV Approach

We first need to formally introduce the prefix-preserving anonymization PP(·, K), which is basically a cryptographic mapping function like CryptoPAn (Mohammady, et al., 2018) that relies on a secret key K. The most important property of this function is that it preserves the prefixes of numeric-value attributes. Thus, if two real addresses share their first X bits, e.g., 150.10.10.1 and 150.10.20.1 share the first 19 bits of their prefixes, they are mapped to another two anonymized addresses, e.g., 97.61.5.252, 97.61.5.252, which also share their first X bits. As we showed in previous sections, PP is vulnerable to different classes of semantic attack, and the MV approach presented in the following was designed to secure its output. The schema of the MV approach is presented in Figure 1.

There are 7 steps in this approach: the first four are initiated on the data owner side and the final three involve the data analyst side. We note that this approach assumes that the confidential attribute is the IP address, and that if IPs are kept secure, the adversary cannot infer any sensitive information.
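The prefix-preserving property described above can be checked with a small sketch. This is not CryptoPAn itself: as an assumption made only for illustration, the toy function `pp_anonymize` below derives each flip bit from an HMAC-SHA256 of the preceding original bits, whereas CryptoPAn uses a block cipher. The names `pp_anonymize` and `common_prefix_len` and the key value are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def pp_anonymize(ip: str, key: bytes) -> str:
    """Toy prefix-preserving mapping (illustrative stand-in for PP(., K)).

    Bit i of the output is bit i of the input XOR a pseudorandom bit that
    depends only on the key and on the first i input bits. Two inputs that
    agree on their first k bits therefore agree on their first k output bits.
    """
    bits = format(int(ipaddress.IPv4Address(ip)), "032b")
    out = []
    for i in range(32):
        digest = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


key = b"hypothetical-secret-key-K"
a, b = "150.10.10.1", "150.10.20.1"
anon_a, anon_b = pp_anonymize(a, key), pp_anonymize(b, key)
shared_before = common_prefix_len(a, b)
shared_after = common_prefix_len(anon_a, anon_b)
print(a, b, "share", shared_before, "bits")
print(anon_a, anon_b, "share", shared_after, "bits")
```

Because the first differing bit is flipped by the same pseudorandom bit in both addresses, the anonymized pair shares exactly as many leading bits as the original pair, which is also what makes the prefix relations visible to an adversary with partial knowledge.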

Figure 1: MV approach (Mohammady, et al., 2018)

4.2.1 Implementing an Effective Zero Trust Model at the Data Owner Side

Step 1: The data owner generates two 256-bit cryptographic keys (K0 and K1), and the original data is anonymized using PP(·, K0).
Step 2: The anonymized trace is partitioned.
Step 3: Each partition is anonymized again, with the anonymization repeated a different number of times for different partitions. Therefore, the resulting seed trace is not prefix-preserved.
Step 4: The seed trace and some supplementary parameters are outsourced to the data analyst. The pseudo vector and the seed view generation are designed such that after “r” rounds of view generation, the real view (which is identical to the view in Step 1) will be retrieved.

4.2.2 Accuracy Guarantee at the Data Analyst Side

Step 5: The analyst generates N views based on the seed view and the supplementary parameters.
Step 6: The analyst analyses all N views and generates the corresponding reports.
Step 7: The data owner retrieves the report corresponding to the “real view”, using a private information retrieval (PIR) protocol in such a way that the analyst cannot identify which view was retrieved.

Clearly, the quality of the views generated in the MV approach is the main factor in the additional confidentiality the network trace will receive. Specifically, if all other (fake) generated views are too far from or too close to the original trace (prefix-wise and in the presence of some adversary knowledge), the MV approach may end up compromising a high level of privacy. In particular, the adversary can discard many of the (far) fake views by looking at the prefix relations in the IP addresses and comparing them to his/her knowledge to identify any inconsistency. Conversely, a design with fake views generated too close to the real view incurs drastic privacy leakage. In the latter case, the fake views are not really fake. Therefore, in (Mohammady, et al., 2018), the authors proposed a metric called “the indistinguishability” to reflect the distance of each
view from the real view. Based on this formalization, they suggest two schemes for their partitioning algorithms, i.e., IP-based and distinct-IP-based partitioning, each with its own partition sizes. They conclude that distinct-IP-based partitioning with a customized pseudo vector can significantly reduce the privacy violation (by a factor of 50). We will detail these methodologies in our next report.

4.3 Operations Required for IDS

The accuracy of the required analyses must be guaranteed alongside their confidentiality, because the primary motivation of such outsourcing models is to benefit from more accurate analysis tasks. For this purpose, we believe the MV approach is an ideal candidate. In contrast to other solutions, MV allows third-party analysts to receive and interact with the raw data. Nevertheless, as MV anonymization entails, the original versions of some of the attributes must be replaced with a cryptographic prefix-preserving version. We now shed light on the impact of such a transformation on the outcomes of certain classes of IDS algorithms. Since our focus is network traces, and we only apply PP to the IP address attributes, all traffic-level (packet-level) analyses are expected to return trustworthy results, e.g., the traffic of the entire network, or the traffic on a certain port. By similar reasoning, the flow-level analyses also remain intact. The main issue concerns the graph-level set of analyses, e.g., subnet-based analyses, throughputs of a subnet, and reachability analyses.

Recently, solutions based on Deep Neural Networks (DNN) and optimisation algorithms have overtaken traditional machine learning approaches in various applications (Abolghasemi, et al., 2022; Ghezelji, et al., 2022; Tohidi, et al., 2022), due to their strong learning power, and IDS is no exception.

In (Kumar, et al., 2020), a misuse-based intrusion detection system was proposed to detect 5 categories in a network: Probe, Exploit, DOS, Generic and Normal. The system was built on a misuse-based model, which allowed it to act as a firewall with some extra information added to it. Moreover, unlike most related works, their paper used the UNSW-NB15 dataset as the offline dataset to design their own integrated classification-based model for detecting malicious activities in the network. In addition, they generated their own real-time dataset at the NIT Patna CSE lab (RTNITP18), which served as the working example of their intrusion detection model.

In (Cao, et al., 2022), a network intrusion detection model was introduced which fused a CNN and a gated recurrent unit. Their main aim was to tackle the problems associated with the low accuracy of existing intrusion detection models for the multi-class classification of intrusions and the low detection accuracy on class-imbalanced data. In their method, a hybrid sampling technique combining adaptive synthetic sampling and repeated edited nearest neighbours was applied for sample processing to solve the positive and negative sample imbalance issue in the original dataset. Feature Selection (FS) was carried out by combining the Random Forest (RF) algorithm and Pearson correlation analysis to address the feature redundancy problem. Afterwards, spatial features were extracted using a CNN and then refined by fusing average pooling and max pooling, with an attention mechanism assigning different weights to the features, thereby reducing the overhead and improving the model performance. At the same time, a Gated Recurrent Unit (GRU) was applied to extract long-distance dependency features to achieve comprehensive feature learning. Finally, a Softmax function was used for classification.

We consider employing such NN-IDS in our ZTA-IDS, and therefore we will elaborate on the accuracy of these special types of IDS.

4.4 Criteria on Security and Accuracy

Clearly, the characteristics of the framework, e.g., the partitioning algorithm, the number of views, etc., depend on the IDS task. If the IDS task is independent of the prefix relations of the IP addresses, e.g., counting the number of packets with a size larger than 300KB, then we can strengthen the confidentiality aspect of the solution arbitrarily. On the other hand, if the IDS task depends on the trustworthiness of the prefix relations, an appropriate degree of MV must be applied. Interpreting the requirements of a given IDS task is one of the main contributions of our ZTA-IDS framework. For an NN-IDS, depending on the type of intrusion, the learning module may require fingerprinting a larger number of attributes to distinguish the malicious activities from the benign ones. Therefore, the partitioning algorithm, the number of partitions, and the number of attributes involved in the partitioning must be carefully selected to guarantee a maximal level of accuracy for an NN-IDS. These are the most important parts of our experiments, which we will elaborate on in the next report.

4.4.1 Privacy Preservation

On the data owner side, ZTA-IDS performs like the MV approach. The only modification is that the partitioning algorithm (mainly the parameters) is
defined based on both security and accuracy requirements. An appropriate choice of partitioning could vary from distinct-IP-based partitioning for prefix groups with a length of three octets when conducting anomaly-based IDS, to distinct-IP-based partitioning for prefix groups with a length of only one octet or even less when conducting packet/flow-level analyses. The overview is depicted in Figure 2.

Figure 2: Overview of the ZTA-IDS approach (panel labels: Privacy Preservation; Utility Realization; Step 6: Intrusion Detection; Data Analyst).
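To make the notion of prefix groups of a given octet length concrete, the following sketch buckets the distinct IP addresses of a trace by their leading octets. It is only one possible reading of the term: the actual partitioning algorithm is the one defined in (Mohammady, et al., 2018) and is not reproduced here. The function name `prefix_groups` and the sample addresses are hypothetical.

```python
import ipaddress
from collections import defaultdict


def prefix_groups(ips, octets):
    """Group the distinct IPv4 addresses in `ips` by their first `octets` octets."""
    groups = defaultdict(set)
    for ip in ips:
        ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
        key = ".".join(ip.split(".")[:octets])
        groups[key].add(ip)  # a set keeps each distinct IP only once
    return {prefix: sorted(members) for prefix, members in groups.items()}


# Hypothetical source addresses taken from a trace (one address repeats).
trace_ips = ["10.1.2.3", "10.1.2.9", "10.1.7.1", "10.1.2.3", "192.168.0.5"]

fine_groups = prefix_groups(trace_ips, octets=3)    # three-octet prefix groups
coarse_groups = prefix_groups(trace_ips, octets=1)  # one-octet prefix groups
print(fine_groups)
print(coarse_groups)
```

Longer prefixes yield more and smaller groups, and shorter prefixes yield fewer and larger ones, which is the knob the text describes trading off between anomaly-based IDS and packet/flow-level analyses.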

4.4.2 Actions on Data Analyst Side

On the data analyst side, ZTA-IDS follows the MV approach to generate different dataset views. Next, the analyst pre-processes all views into a format suitable for the operations. In addition, the analyst may need to reorder the entire trace by time after generating the views. Prefix-to-prefix communication is another set of queries for which the analyst must run some pre-processing to reduce the impact of anonymization. He/she then runs the IDS algorithms on each of the views and returns the outcomes to the data owner.

5 THE PROPOSED MODEL

The architecture consists of 4 modules: AE feature extractor, Convolution blocks, Attention mechanism and LSTM layers. The proposed model is inspired by AE-CNN for binary NID. Thus, we use the extracted features as the input to our neural network for categorical intrusion detection. We also utilize an Attention module to focus on important features and LSTM layers to handle temporal dynamics.

5.1 AE Feature Extraction

The compressed bottleneck features of network traces are extracted via a pre-trained deep AE. The AE bottleneck layer maps the original input into a compressed representation where the input features are more correlated. The AE consists of 2 components: the encoder, which compresses the input features, and the decoder, which is discarded after pre-training. Thus, a deep AE can be used to extract a combined and compressed feature from network trace attributes. In the AE, the bottleneck feature z is extracted from the original data X using the encoder function ψ. The decoder function φ maps the bottleneck z to the output X̂. The decoder is expected to reconstruct the input, as in Eq. (1).

$$\psi: X \rightarrow z, \qquad \phi: z \rightarrow \hat{X}, \qquad \psi, \phi = \operatorname*{arg\,min}_{\psi,\,\phi} \lVert X - \phi(\psi(X)) \rVert^{2} \tag{1}$$

The Mean Squared Error (MSE) loss function of the AE is given in Eq. (2).

$$L(X, \hat{X}) = \lVert X - \hat{X} \rVert^{2} = \lVert X - \sigma(W_0\,\sigma(WX + b) + b_0) \rVert^{2} \tag{2}$$

Here the squared error is usually averaged over a mini-batch of the training set, σ is the activation function, W and W_0 are the weight matrices, and b and b_0 are the bias vectors of the encoder and decoder, respectively. In our model, no bias is used in the encoder part, so that it only aggregates the input features.

The structure of the AE is shown in Figure 3, where the parts drawn with dotted lines are discarded after training the AE. The bottleneck features of the trained AE, which are more spatially related, are used as input to the CNN-LSTM.

Figure 3: AE feature extractor for network traces.
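A minimal PyTorch sketch of an AE pre-trained with the MSE loss of Eq. (2) is given below. It is an illustration under stated assumptions, not the authors' implementation: the single encoder and decoder layers mirror the form of Eq. (2), the sigmoid is an assumed choice for σ, and the input and bottleneck sizes (42 and 32), the learning rate, and the random stand-in data are hypothetical. Only the batch size of 32 and the bias-free encoder are taken from the text.

```python
import torch
from torch import nn


class AutoEncoder(nn.Module):
    """Single-layer encoder/decoder in the form of Eq. (2)."""

    def __init__(self, in_dim: int, bottleneck_dim: int):
        super().__init__()
        # Encoder psi: X -> z. bias=False follows the remark that the
        # encoder uses no bias; Sigmoid is an assumed choice for sigma.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, bottleneck_dim, bias=False), nn.Sigmoid()
        )
        # Decoder phi: z -> X_hat. Only needed during pre-training.
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, in_dim), nn.Sigmoid()
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


torch.manual_seed(0)
IN_DIM, BOTTLENECK_DIM = 42, 32  # hypothetical sizes
x = torch.rand(32, IN_DIM)       # stand-in mini-batch of 32, values in [0, 1]

ae = AutoEncoder(IN_DIM, BOTTLENECK_DIM)
optimizer = torch.optim.Adam(ae.parameters(), lr=1e-3)
criterion = nn.MSELoss()  # squared error averaged over the mini-batch

for _ in range(5):  # a few pre-training steps on the same batch
    x_hat, _ = ae(x)
    loss = criterion(x_hat, x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After pre-training, the decoder is dropped and only the encoder is used.
with torch.no_grad():
    z = ae.encoder(x)
print(z.shape, float(loss))
```

The bottleneck tensor `z` is what a downstream classifier would consume; because the sigmoid decoder outputs values in (0, 1), this sketch assumes the inputs have been scaled to that range.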
Since an AE with a bottleneck layer accepts any numerical value and compresses the information available in the input numerical values, pre-processing and FS are not needed.

5.2 Convolutional Neural Network (CNN)

We use a CNN to exploit the spatially related features extracted using the AE. The classifier is applied to the bottleneck features extracted from a trained AE for NID, because CNNs work well with data that has a spatial relationship. CNNs are also known to be good feature extractors because of their local convolution filters, the reuse of the same filters across the whole input, and pooling layers, which make them robust. Here, we use a tuned 1D CNN to handle spatial dependencies within traces of data. Our proposed CNN structure is shown in Figure 4, and LeakyReLU with a negative slope of 0.2 is used as the activation function for hidden layers. In the convolution layers, the first number is the number of filters and the number in parentheses is the convolution filter size, e.g., the first layer has 128 filters with a convolution filter size of 11. Pooling with a size of 2 is applied only after the first convolution layer. At the CNN output, we have 256*5 features, whose information needs to be aggregated, since this feature vector has too high a dimension to feed into a regular layer.

[...] intrusion related features. The alternative structure to multi-head self-attention on top of the CNN would be a linear Flatten layer, which maps the multi-dimensional CNN features into one large dimension. The total number of features in this layer is the same as the total number of CNN features in all dimensions. We also report the results of the proposed method with a linear Flatten layer instead of the Attention mechanism. In addition, we use BiLSTM layers after the Attention to handle the temporal dynamics between the sequences of network traces.

5.4 BiLSTM Classifier

Since LSTMs can hold or forget information for a long time, we propose to use LSTMs to handle the temporal dynamics. Also, a BiLSTM is able to take both forward and backward sequences into consideration, which can be important in handling temporal dynamics. We use 2 BiLSTM layers with 128-dimensional representations. Dropout with a probability of 0.2 is applied between the 2 BiLSTM layers. Finally, a linear layer with as many neurons as target categories (from 2 to 10) is applied. In the subsequent paragraphs, we will review the training and test conditions along with the evaluation results.
6 EVALUATION METHODS
In this section, we explain the chosen evaluation
metrics and data.

6.1 Training Setup


All experiments are implemented in PyTorch and
conducted on Colab platform with a batch size of 32.
The AE is trained to minimize MSE criterion as loss
Figure 4: The proposed CNN to handle spatial
dependencies of network traces
function, which is also known as reconstruction error.
Both encoder and decoder parameters are considered
and trained independently. The dimension of the
5.3 Attention Module bottleneck features is considered as 64, which is
compact enough to compress input features. We used
We apply a multi-head Self Attention module to
the Adam optimizer with learning rate of 1e-4 and
aggregate the information available in extracted
weight decay of 1e-5 to minimize the reconstruction
features and handle the relation between the CNN
loss. The model is trained until no more improvement
features and LSTM components (subsequent layers).
is possible according to validation results. All data
The attention module dimension is the number of
attributes are normalized to numerical values between
channels from the last CNN layer (256) and uses 8
0 and 1. Thus, non-numerical attributes are converted
heads. The output is mapped into 64 dimension to
into numerical values using one-hot encoding. The
limit the features. The attention learns to focus on
training dataset is set to prevent over-fitting.
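The normalisation and one-hot encoding steps described above can be illustrated with a minimal sketch. The helper names `one_hot`, `min_max` and `encoded_width` are hypothetical, min-max scaling is assumed as the way of mapping values into [0, 1] (the paper does not name the method), and the three state symbols are an illustrative subset. The attribute counts passed to `encoded_width` are the ones reported in Section 6.2.

```python
def one_hot(value, vocabulary):
    """Map a symbol to a 0/1 vector with a single 1 at its vocabulary index."""
    return [1.0 if value == symbol else 0.0 for symbol in vocabulary]

def min_max(value, low, high):
    """Scale a numeric value into [0, 1]; a constant column maps to 0."""
    return 0.0 if high == low else (value - low) / (high - low)

def encoded_width(n_attributes, categorical_sizes):
    """Input width once every categorical attribute is one-hot encoded."""
    n_numeric = n_attributes - len(categorical_sizes)
    return n_numeric + sum(categorical_sizes)

# Section 6.2: 42 attributes, of which protocol, service and state
# have 133, 13 and 11 distinct symbols.
print(encoded_width(42, [133, 13, 11]))  # 196

# Hypothetical record fragment: one numeric field and one categorical field.
states = ["FIN", "INT", "CON"]  # illustrative subset, not the full symbol list
row = [min_max(30.0, 0.0, 120.0)] + one_hot("INT", states)
print(row)  # [0.25, 0.0, 1.0, 0.0]
```

The width check reproduces the 196-dimensional input stated in the paper: 39 numeric attributes plus 133 + 13 + 11 one-hot columns.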

4 https://pytorch.org/docs/stable/generated/torch.nn.LeakyReLU.html
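The layer sizes quoted in Sections 5.2 and 5.3 can be checked with some simple arithmetic. The sketch below is not the authors' code: it assumes stride 1 and no padding for the first convolution, and a pooling stride equal to the pooling size, none of which is stated in the paper. `conv1d_out_len` follows the output-length formula documented for PyTorch's 1D convolution and pooling layers, and `leaky_relu` mirrors the activation with a negative slope of 0.2.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution or pooling layer."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def leaky_relu(x, negative_slope=0.2):
    """Identity for non-negative inputs, scaled by the slope otherwise."""
    return x if x >= 0 else negative_slope * x

bottleneck = 64                                               # AE bottleneck size
after_conv = conv1d_out_len(bottleneck, kernel=11)            # first layer: 128 (11)
after_pool = conv1d_out_len(after_conv, kernel=2, stride=2)   # pooling of size 2
print(after_conv, after_pool)  # 54 27

channels, heads = 256, 8       # attention dimension and number of heads
print(channels // heads)       # 32 features per attention head
print(channels * 5)            # 1280 features if a Flatten layer is used instead

print(leaky_relu(-1.0), leaky_relu(2.0))  # -0.2 2.0
```

Under these assumptions the first convolution and pooling reduce the 64-point input to 27 positions; the remaining layers shown in Figure 4 would then have to bring this down to the 5 positions implied by the 256*5 output.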
The bottleneck features extracted from the trained AE are fed into the CNN for further training and processing while the AE is frozen. Since the compact features of the input attributes are available in the bottleneck layer with 64 neurons, the CNN input spatial dimension is 64 and the sequence number equals the size of the mini-batch.

The CNN modules are trained with a learning rate of 1e-3, while the Attention and BiLSTM modules are trained with a learning rate of 1e-4 for a maximum of 50 epochs. We used cross-entropy loss as the classifier loss function and the Adam optimizer.

6.2 Data

We evaluate the proposed method on the UNSW-NB15 dataset (Moustafa and Slay, 2015), which is comprised of a hybrid of real modern normal activities and synthetic contemporary attack behaviours. This is an upgraded and more balanced version of the KDD cup dataset. The UNSW-NB15 dataset contains ten classes, namely: Normal, Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. We use the train and test subsets of the UNSW-NB15 dataset, with 175343 and 82337 records respectively. In this dataset each record has a 42-dimensional feature vector, 3 features of which are non-numerical and need pre-processing before being fed into Neural Networks, since the input of a NN should be a numerical matrix. These 3 features are protocol, service, and state, with 133, 13, and 11 symbol attributes, respectively.

One-hot encoding is used to map the non-numerical attributes of the data set to numerical feature vectors. In total, the pre-processed input feature size is 196. Then, the features are normalized between 0 and 1 and used to train the AE in an unsupervised manner. The 64-dimensional bottleneck features extracted from the trained AE are used in the subsequent experiments.

Since the classes of attacks are unbalanced, most studies reduce the number of classes by merging some categories together or removing some of them. Binary classification means that all 9 attack categories are merged into 1 class, Intrusion, so the classes in this scenario are Intrusion/Non-Intrusion. However, some works tried to merge only the categories that are not far from each other. Other works reduced the amount of imbalance by removing the categories with the fewest items, including Backdoor, Analysis, Shellcode, Worms and sometimes Fuzzers, which cause imbalance in the data.

Further, we compare the results of 10-category classification with related works and report the result of removing imbalanced data attributes to have a fair assessment. To show the advantage of the proposed structure, we report the binary classification results using the same data structure, in which the train and test data are used in reverse and therefore only a few training samples are available. Additionally, we propose a nearly balanced sampling procedure to enhance the detection of the categories with fewer samples in the CNN module. Due to the sequential nature required to train the LSTM, we cannot use any sampling strategy to train it. The UNSW-NB15 dataset is highly imbalanced: the Normal category has 56000 samples for training, while the Worms category has only 130 samples. We reduce the impact of this imbalance by sampling based on a smoothing probability function, given in Eq. (3).

$$P(cl_i) = \frac{\#cl_i - \left(1 - \frac{\min \#cl}{\#cl_i} + \epsilon\right) \mathit{median}}{\sum_{j=1}^{10} \left[\#cl_j - \left(1 - \frac{\min \#cl}{\#cl_j} + \epsilon\right) \mathit{median}\right]} \tag{3}$$

In this equation, $cl_i$ denotes the i-th category (class), so $P(cl_i)$ is the probability of choosing a sample from the i-th category, calculated using the number of samples in each category ($\#cl_i$) and the median of the number of samples per category. We use a small $\epsilon$ (0.1) to prevent zero addition for the category with the minimum number of samples. The minimum number of samples is 130, associated with the Worms category, and the median is 11378. The proposed sampling strategy keeps the ordering of the category sizes but makes the sampling more balanced by reducing the distance between the numbers of items in the categories.

7 EXPERIMENTAL RESULTS

In this section, we report the experimental results of the implemented NID methods for ten, six, and two categories.

7.1 Ten Categories CNN-BiLSTM Data Classification

For hyper-parameter optimization, we explored the optimal number of layers and neurons for each part of the CNN-BiLSTM with Attention module. To evaluate the effect of each module, the results of BiLSTM, CNN-BiLSTM (with a linear layer in between), and Attention-based CNN-BiLSTM are compared to related methods in Table 1 for ten-category data classification. Since the results of the baseline CNN model using AE features are not available for categorical classification, we implemented it and report its results for comparison. Attention-based CNN-BiLSTM using AE bottleneck features out-performed other related
works using deep learning methods for 10 categories NID. The confusion matrix of the proposed model is shown in Figure 5. Accordingly, most errors on Analysis, Backdoor and Exploits attacks are misclassifications as DoS. Moreover, Fuzzers and Exploits are misclassified interchangeably. None of the Analysis and Backdoor records are predicted correctly. Only 3 records of the Worms class are predicted correctly. Consequently, removing imbalanced data attributes including Backdoor, Analysis, Shellcode and Worms should improve the accuracy. Removing Fuzzers may also lead to better accuracy.

Figure 5: Confusion matrix of ten categories for CNN-BiLSTM with AE feature extractor

7.2 Pre-train CNN Using Balanced Data Sampler

We use a balanced data sampler to pre-train the CNN for later use in the CNN-BiLSTM with Attention module, instead of reducing the number of categories. Our goal is to help the model discriminate the data better even when the numbers of training samples are imbalanced. Hence, we use the pre-trained CNN in the CNN-BiLSTM with Attention module to enhance the detection of network intrusion. The accuracy results of the CNN-BiLSTM with Attention module with and without the balanced sampler are compared in Table 1.

Table 1: Accuracy of 10 categories NID on test data

Method                                  Accuracy
BiLSTM                                  77.46%
CNN                                     78.23%
CNN-BiLSTM                              78.76%
CNN-Attention + BiLSTM                  87.76%
CNN-Attention                           88.13%
Proposed method (Standard Sampler)      87.76%
Proposed method (Balanced Sampler)      91.72%

We compare the Recall, Precision and F-measure of the proposed method (balanced sampler) in Table 2.

Table 2: Results of NID on test data

Method                                    Precision   Recall   F-measure
Decision Tree C5 (Kumar, et al., 2020)    -           75.8%    75.54%
Rule-based (Kumar, et al., 2020)          -           65.21%   68.13%
FS + ANN (Kasongo and Sun, 2020)          79.50%      77.53%   77.28%
Proposed method                           60.24%      78.5%    62.62%

As can be seen, pre-training the CNN using the balanced sampler outperforms standard training and other works in terms of accuracy. The confusion matrix of the proposed method with the balanced sampler for 10 categories is shown in Figure 6. The number of misclassifications for each category is low, especially for the Normal category.

Figure 6: Confusion matrix of pre-trained CNN via balanced sampler, Attention and BiLSTM with AE feature extractor

7.3 Six Categories CNN-BiLSTM Data Classification

Since recent works removed imbalanced data for their experimental results and most of them reported results with six categories, we also evaluate the proposed method after removing four imbalanced categories. The results are reported in Table 3.

Table 3: Accuracy of 6 categories NID on test data

Method                                  Accuracy
CNN                                     82.01%
BiLSTM                                  83.11%
MLP + IGRF-RFE (Yin, et al., 2022)      84.24%
Rule-based (Kumar, et al., 2020)        84.84%
CNN-GRU + RFP (Cao, et al., 2022)       86.25%
CNN-BiLSTM                              86.28%
CNN-Attention                           87.54%
CNN-Attention + BiLSTM                  89.79%
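Accuracy, Precision, Recall and F-measure, as reported in Tables 1 to 3, can all be derived from a confusion matrix. The sketch below is illustrative only: the paper does not state how the per-class scores are averaged, so macro-averaging is an assumption here, and the 3-class matrix `toy` is made-up data chosen to mimic one dominant and one rare class.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of class-i samples predicted as class j."""
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        predicted = sum(confusion[i][k] for i in range(n))  # column sum
        actual = sum(confusion[k])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results

def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total

def macro_average(results):
    """Unweighted mean of precision, recall and F1 over the classes."""
    n = len(results)
    return tuple(sum(r[i] for r in results) / n for i in range(3))

# Hypothetical 3-class matrix: a dominant class, a medium class, a rare class.
toy = [[90, 5, 5],
       [10, 30, 0],
       [4, 0, 1]]

macro_p, macro_r, macro_f = macro_average(per_class_metrics(toy))
print(accuracy(toy), macro_p, macro_r, macro_f)
```

With imbalanced data, a poorly detected rare class barely moves the accuracy but pulls the macro-averaged scores down sharply, and the macro F-measure need not equal the harmonic mean of the macro precision and recall.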
As seen, the proposed method outperforms related methods for 6 categories classification (confusion matrix in Figure 7).

Figure 7: Confusion matrix of 6 categories for CNN-BiLSTM with AE feature extractor

7.4 Binary CNN-BiLSTM Data Classification

The hyper-parameters of the binary classification model have been kept the same as those of the multi-class model. To evaluate each module, the results of BiLSTM, CNN-BiLSTM, and Attention-based CNN-BiLSTM are compared in Table 4 to the most related work, which uses a CNN with AE bottleneck features for NID.

For a fair comparison, the data setup should be similar. We therefore used the train and test data interchangeably to allow a fair comparison with the CNN and AE method.

As can be seen from Table 4, using BiLSTM decreases the accuracy of the model, especially in combination with CNN. This may be due to the high dimension of the CNN output, which is fed into the BiLSTM layers. However, using an Attention module on the CNN to aggregate the CNN features before feeding them into the BiLSTM layers outperformed the CNN and BiLSTM models. Since other binary classification methods using the original train and test datasets for NID have reached almost 100% accuracy, more experiments and improvement are not needed.

Table 4: Accuracy of NID (binary classification)

Method                                    Accuracy
CNN-BiLSTM                                78.93%
FS + DNN (Kanimozhi and Jacob, 2019)      89%
BiLSTM                                    90.84%
CNN                                       92.23%
CNN-Attention + BiLSTM                    93.01%

According to the results, CNN and BiLSTM both perform well for NID using AE bottleneck features. However, an Attention module is needed to handle the relation between the components of these two structures and compose them together.

8 CONCLUSIONS

In this paper, we have introduced a new cybersecurity maturity model with Zero Trust. The proposed model effectively protects various systems, crucial infrastructure, networks, data, services and end-users from critical security risks, achieving different security requirements. This will present an analysis report for researchers to understand the requirement of a cybersecurity maturity model along with the Zero Trust policy in the fast-growing world. Moreover, it helps organizations and governments in the decision-making process of different security standards. Further, we have proposed an Attention CNN with Bi-directional Long Short Term Memory (CNN-BiLSTM) using Auto-Encoder bottleneck features for a Network Intrusion Detection System. We utilized the compressed bottleneck features of the Auto-Encoder. We also used a CNN to consider the spatial relation between the extracted features. A multi-head Self Attention module is applied on the CNN to aggregate the features and attend to the most important parts of the CNN feature maps for the BiLSTM in the next layer. Finally, two BiLSTM layers are used for classification. To reduce the problem of data imbalance, we also propose a balanced sampler for pre-training the CNN. Our experimental results showed that our proposed approach outperforms state-of-the-art methods for 2, 6 and 10 categories, with classification accuracies of 93.01%, 89.79% and 91.72%, respectively, on the test set of the UNSW-NB15 dataset.

As future work, we will apply transfer learning methods and metaheuristic methods to reduce the number of parameters in such complicated structures so that they can be run on edge devices.

REFERENCES

Abolghasemi, Majid; Dadkhah, Chitra; Tohidi, Nasim. (2022). HTS-DL: Hybrid Text Summarization System using Deep Learning. The 27th International Computer Conference, the Computer Society of Iran. Tehran, Online. Retrieved from https://ieeexplore.ieee.org/abstract/document/9780395
Bazi, Hamid reza; Hassanzadeh, Alireza; Moeini, Ali. (2017). A comprehensive framework for cloud
computing migration using Meta-synthesis approach. Journal of Systems and Software, 128, 87-105.
Cao, Bo; Li, Chenghai; Song, Yafei; Qin, Yueyi; Chen, Chen. (2022). Network Intrusion Detection Model Based on CNN and GRU. Applied Sciences, 12(9). Retrieved from https://doi.org/10.3390/app12094184
Eusgeld, Irene; Nan, Cen; Dietz, Sven. (2011). “System-of-systems” approach for interdependent critical infrastructures. Reliability Engineering & System Safety, 96(6), 679–686.
Ghezelji, Mazyar; Dadkhah, Chitra; Tohidi, Nasim; Gelbukh, Alexander. (2022). Personality-Boosted Matrix Factorization for Recommender Systems. International Journal of Information and Communication Technology Research (IJICTR), 14(1), 48-55.
Gourisetti, S.N.G.; Mylrea, M.; Patangia, H. (2019). Cybersecurity Vulnerability Mitigation Framework Through Empirical Paradigm (CyFEr): Prioritized Gap Analysis. IEEE Systems Journal, 14(2), 1897-1908.
Jiang, Kaiyuan; Wang, Wenya; Wang, Aili; Wu, Haibin. (2020). Network Intrusion Detection Combined Hybrid Sampling With Deep Hierarchical Network. IEEE Access, 8, 32464-32476. doi:10.1109/ACCESS.2020.2973730
Kanimozhi, V.; Jacob, Prem. (2019). UNSW-NB15 dataset feature selection and network intrusion detection using deep learning. Journal of Recent Technology and Engineering, 7(1), 443–446.
Karabacak, Bilge; Yildirim, Sevgi Ozkan; Baykal, Nazife. (2016). A vulnerability-driven cyber security maturity model for measuring national critical infrastructure protection preparedness. International Journal of Critical Infrastructure Protection, 15, 47-59.
Kasongo, Sydney M.; Sun, Yanxia. (2020). Performance Analysis of Intrusion Detection Systems Using a Feature Selection Method on the UNSW-NB15 Dataset. Journal of Big Data, 105(7). doi:10.1186/s40537-020-00379-6
Kindervag, J. (2010). Build Security Into Your Network’s DNA: The Zero Trust Network Architecture. For Security & Risk Professionals, Forrester Research.
Kumar, Vikash; Sinha, Ditipriya; Das, Ayan Kumar; Pandey, Subhash Chandra; Goswami, Radha Tamal. (2020). An integrated rule based intrusion detection system: analysis on UNSW-NB15 data set and the real time online dataset. Cluster Computing, 23, 1397–1418. Retrieved from https://doi.org/10.1007/s10586-019-03008-x
Mettler, T.; Blondiau, A. (2012). HCMM-a maturity model for measuring and assessing the quality of cooperation between and within hospitals. 25th IEEE International Symposium on Computer-Based Medical Systems (CBMS).
Mohammady, M.; Wang, L.; Hong, Y.; Louafi, H.; Pourzandi, M.; Debbabi, M. (2018). Preserving both privacy and utility in network trace anonymization. The 2018 ACM SIGSAC Conference on Computer and Communications Security.
Moustafa, Nour; Slay, Jill. (2015). UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). Military Communications and Information Systems Conference (MilCIS). Canberra, ACT, Australia: IEEE. doi:10.1109/MilCIS.2015.7348942
Nahari, H.; Krutz, R. L. (2003). System Security Engineering Capability Maturity Model (SSE-CMM) Model Description Document. Version 3.0. Carnegie Mellon University.
Newhouse, William; Keith, Stephanie; Scribner, Benjamin; Witte, Greg. (2017). National Initiative for Cybersecurity Education (NICE) Cybersecurity Workforce Framework. Gaithersburg: National Institute of Standards and Technology.
Niazi, Mahmood; Saeed, Ashraf Mohammed; Alshayeb, Mohammad; Mahmood, Sajjad; Zafar, Saad. (2020). A maturity model for secure requirements engineering. Computers & Security, 95.
Nussbaum, B.; Park, S. (2018). A tough decision made easy? local government decision-making about contracting for cybersecurity. The 19th Annual International Conference on Digital Government Research: Governance in the Data Age.
Renteria, C.; Gil-Garcia, J.R.; Pardo, T.A. (2019). Toward an Enabler-Based Digital Government Maturity Framework: A Preliminary Proposal Based on Theories of Change. The 12th International Conference on Theory and Practice of Electronic Governance.
Saleem, D.; Sundararajan, A.; Sanghvi, A.; Rivera, J.; Sarwat, A.I.; Kroposki, B. (2019). A Multidimensional Holistic Framework for the Security of Distributed Energy and Control Systems. IEEE Systems Journal, 14(1), 17-27.
Tohidi, Nasim; Rustamov, Rustam B. (2022). Short Overview of Advanced Metaheuristic Methods. International Journal on Technical and Physical Problems of Engineering (IJTPE), 14(51), 84-97.
White, G. (2011). The community cyber security maturity model. IEEE international conference on technologies for homeland security (HST).
Yin, Yuhua; Jang-Jaccard, Julian; Xu, Wen; Singh, Amardeep; Zhu, Jinting; Sabrina, Fariza; Kwak, Jin. (2022). IGRF-RFE: A Hybrid Feature Selection Method for MLP-based Network Intrusion Detection on UNSW-NB15 Dataset. arXiv:2203.16365. Retrieved from https://doi.org/10.48550/arXiv.2203.16365
