
Real-Time Network Packet Classification Exploiting Computer Vision Architectures

This paper presents a novel real-time network packet classification method that utilizes computer vision techniques to enhance cybersecurity in 6G networks. By converting raw packets into images for analysis, the proposed lightweight classification scheme demonstrates superior performance using a customized Convolutional Neural Network (CNN), achieving high F1-scores across various packet window sizes. The approach emphasizes early threat detection at the network edge, addressing the challenges posed by increasing device connections and sophisticated cyber threats in future network infrastructures.


Received 28 November 2023; revised 17 January 2024; accepted 4 February 2024.

Date of publication 6 February 2024; date of current version 21 February 2024.


Digital Object Identifier 10.1109/OJCOMS.2024.3363082

Real-Time Network Packet Classification Exploiting Computer Vision Architectures

EMILIO PAOLINI 1,2,3 (Student Member, IEEE), LUCA VALCARENGHI 1 (Senior Member, IEEE),
LUCA MAGGIANI 3, AND NICOLA ANDRIOLLI 4 (Senior Member, IEEE)
1 Scuola Superiore Sant’Anna, TeCIP Institute, 56124 Pisa, Italy

2 CNR, Istituto di Elettronica e di Ingegneria dell’Informazione e delle Telecomunicazioni, 56122 Pisa, Italy

3 Sma-RTy Italia, 20061 Carugate, Italy

4 Department of Information Engineering, University of Pisa, 56122 Pisa, Italy

CORRESPONDING AUTHOR: E. PAOLINI (e-mail: [email protected]).


This work was supported in part by the Project CLEVER under Project 101097560, which is supported by the Key Digital Technologies Joint Undertaking and
Its Members [including top-up funding by the Italian Ministry of Research and University (MUR)]; in part by the European Union under the Italian National
Recovery and Resilience Plan (NRRP) of NextGenerationEU, partnership on “Telecommunications of the Future” (Program “RESTART”) under
Grant PE00000001; in part by the Project Smart Computing and Communication at the Edge (SMARTCOM) funded by Sma-RTy and CNR;
and in part by the Italian MUR in the framework of the FoReLab Project (Departments of Excellence).

ABSTRACT Forthcoming 6G/NextG networks highlight the need for advanced Artificial Intelligence
(AI)-based security mechanisms to identify malicious activities and adapt to emerging threats. In this
context, the integration of computer vision techniques into the cybersecurity field is promising due to
their potential for sophisticated pattern recognition. In this paper we introduce a computationally efficient
classification scheme acting directly on the raw packets collected at base stations and enforcing real-time
conversion of packets into images. The innovative points of the proposed solution are the lightweight
implementation, aligning well with the demands of future 6G networks, and the operation at network
edge, enabling early threat identification as close as possible to the packet origin. We investigate the
performance of this approach both in terms of F1-score and prediction time using state-of-the-art computer
vision architectures and a customized Convolutional Neural Network (CNN) in an intrusion detection
problem using a 5G dataset. Experimental results show the superiority of the CNN architecture over
complex models. Across multiple packet window sizes N (i.e., 10, 50, 100 packets), the CNN consistently
outperforms the other state-of-the-art computer vision models, achieving very high F1-scores (namely,
0.99593, 0.99860, 0.99895). A scalability analysis highlights a trade-off between CNN scalability and
performance, where larger N values lead to increased prediction time. On the other hand, the other
computer vision models exhibit better scalability, enabling an optimal model selection without trade-offs.

INDEX TERMS DoS, computer vision, artificial intelligence, 6G networks, packet classification,
convolutional neural networks.

I. INTRODUCTION

As 5G network infrastructures are being deployed, with a more pervasive growth expected in the next few years [1], both academia and industry are now focusing on 6G/NextG to fulfill the requirements of applications of the next decade. Indeed, in many scenarios the limitations of 5G networks are evident in terms of data rate, latency, global coverage, etc. [2]. Applications such as extended reality, holographic communications, and digital twin will leverage the deployment of 6G network infrastructures to fully achieve their potential [3].

Among the many benefits, 6G networks will provide extreme capacity, reliability, and efficiency. To achieve these challenging performance targets, it is expected that 6G networks deploy intelligent operations in both network orchestration and management [4]. Hence, along with


© 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
VOLUME 5, 2024 1155
PAOLINI et al.: REAL-TIME NETWORK PACKET CLASSIFICATION EXPLOITING COMPUTER VISION ARCHITECTURES

network simplification exploiting Radio Access Network (RAN)-Core Network (CN) convergence, a key technology will be Artificial Intelligence (AI), enabling the transition from connected things to collective network intelligence [5], [6]. The advent of AI-driven functionalities in 6G will enable the deployment of proactive networks. These networks can perform operations in an autonomous way, such as self-management to maintain the desired network performance level, or self-protection to secure the network and deal with threats. Hence, 6G security vision has a tight integration with AI, leading to the paradigm of security automation [4]. Security design exploiting AI systems will become pivotal to autonomously detect and mitigate threats, rather than relying on current cryptographic methods [1].

Threat mitigation systems, i.e., systems that proactively recognize and address potential dangers, thereby safeguarding against unforeseen risks and vulnerabilities, will be the key element for enabling future networks in critical scenarios, such as military and banking applications. Additionally, the massive device connections to 6G networks will also pose new challenges to Denial of Service (DoS) attack detection, rendering traditional DoS mitigation methods outdated [7], [8]. Subsequently, statistical and AI-based methods can cope with different types of malicious traffic [9], identifying, mitigating, and preventing these attacks.

Therefore, many works in recent years have focused on the possibility to build AI-based systems for defending wireless networks [10]. However, future networks will be characterized by heterogeneous devices and traffic, demanding more advanced classifiers. In the rapidly evolving landscape of network security, the integration of computer vision techniques for cybersecurity applications represents an opportunity, with the promise to enable sophisticated pattern recognition strategies. Indeed, similarities between DoS detection and computer vision techniques lie in their shared purpose of complex pattern recognition. In computer vision, algorithms process visual information to recognize intricate patterns within images and video. This process involves many layers of abstraction, where lower layers detect basic features like edges, while higher layers mix these features to identify complex objects or scenes. Similarly, in the context of DoS attack detection, network traffic analysis involves identifying anomalous patterns. This recognition of patterns aims at distinguishing normal network behavior from malicious activities. As in computer vision, effective DoS attack detection often requires the extraction of meaningful features from the network traffic, followed by classification or anomaly detection techniques to discern malicious behavior [11]. Hence, algorithms such as image retrieval and object shape recognition adapted from computer vision techniques can offer an effective solution to the threat identification challenge [12].

By converting network traffic data into matrix representations, computer vision techniques can be leveraged to extract meaningful patterns and features. Each network flow can be mapped into a pixel grid, where various attributes like source and destination addresses and ports are encoded. This transformation allows viewing network traffic as visual patterns, enabling the application of Convolutional Neural Networks (CNNs) and other image-based algorithms for analysis. As discussed in [13], [14], [15], leveraging the transformation of network traffic to images not only facilitates efficient and real-time data processing, but also enables the use of pre-existing image analysis tools, opening up new possibilities for enhancing network security.

Hence, in this work we first describe how packets, exploiting their temporal relationships, can be transformed to images ready to be used as inputs to computer vision algorithms. Then, we study the performance of this approach exploiting both well-known CNN architectures and a purpose-built CNN architecture, hereafter called customized-CNN, in an intrusion detection problem, exploiting a 5G dataset. Differently from most implementations to date, the transformation of network traffic to images is done directly on raw packets, which can be directly collected at the base station, enabling a truly real-time system protection. This feature is essential in a system amenable to future networks, complying with 6G requirements on latency and alleviating the DoS attack damage, since a threat can be identified quickly and as close as possible to where it is generated. Moreover, immediate detection holds great importance in this scenario due to the projected expenses associated with service interruptions [16]. Consequently, an on-site solution at the base station level that can effectively detect threats in real-time becomes of essential importance for the future of 6G/NextG wireless networks.

II. RELATED WORKS

Network security has been one of the prime concerns in 5G networks to provide increased user privacy, new trust and service models and enable the support for Internet of Things (IoT) and mission-critical applications [17], [18]. Network protection must be strengthened and enhanced for the safe deployment of different 6G verticals [19]. To overcome some of the additional security challenges imposed by novel network architectures, researchers have focused on novel approaches suitable for 6G networks. Deep Learning (DL) systems have been showing promising results in threat mitigation [20] thanks to their capability of extracting high-level features.

For example, in [21] an Intrusion Detection System (IDS) is developed based on CNN, capable of performing classification on statistics extracted from complete traffic flows of the CIC-IDS 2018 dataset [22]. The proposed solution is compared with a Recurrent Neural Network (RNN) model, showing the advantages of the feed-forward model over its recurrent counterpart. Although the architecture seems promising, an important limitation hampers its deployment in future network infrastructures: the training of the AI model is performed on statistics extracted from traffic flows; this approach is not suited to work on real-time traffic due to the need to wait for complete traffic flows at the base station.



Another work that exploits Deep Neural Networks (DNNs) for an IDS is proposed in [23]. The authors carried out a comparative study of IoT IDS with three DL models: DNN, Long Short-Term Memory (LSTM), and CNN. It is shown that DL models outperform the other methods applied in IoT IDS environment. The study only focuses on the CIC-IDS 2017 dataset [22], which cannot be considered as a good benchmark for a 5G/6G scenario because the dataset has not been collected in a real 5G network and thus the packet characteristics, e.g., packet inter-arrival time, can be very different with respect to the ones of a mobile network. Furthermore, the authors use the csv format of the dataset, i.e., statistics extracted from complete traffic flows, again hampering the possibility to deploy such systems in a real-time environment.

Tailored to specific 5G datasets, both works in [24], [25] deal with traffic classification. The first works on features extracted from complete traffic flows, hampering its exploitation on 5G/6G scenarios. Concerning the latter, a PCAP-to-Embeddings technique is proposed, where Long Short-Term Memory Autoencoders are used for embeddings generation followed by a Fully-Connected network for classification purposes.

At the border between computer vision techniques and DoS traffic detection, authors in [26] propose to exploit ResNet architecture to detect malicious packets. Results are obtained on the CICDDoS2019 dataset [27], which, although being recent, does not resemble 5G/6G traffic characteristics, such as packet inter-arrival times. Furthermore, the authors consider only ResNet as a benchmark, not exploiting other computer vision architectures.

Another interesting work in the context of computer vision techniques applied to network traffic is [11], in which the authors discuss a multivariate correlation analysis technique to accurately represent the network traffic records and convert them into corresponding images. The detection system is developed based on Earth Mover's Distance (EMD), a widely used dissimilarity measure. EMD considers cross-bin matching, resulting in a more precise evaluation of the dissimilarity between distributions compared to other dissimilarity measures like Minkowski-form distance Lp and χ² statistics. The experiments are conducted using two old datasets [28], [29] that do not contain recent DoS threats; in addition the proposed methodology works by building normal traffic profiles, hence not being able to distinguish among different types of attacks. In the same context, in [14] the authors describe a way to capture network traffic using pcap files and then convert these into a 2D image using a visual representation tool, i.e., binvis. For efficiency, the packets are divided into multiple chunks before this conversion process. The proposed approach is limited by the exploitation of the binvis tool, which might slow down when dealing with substantial volumes of data, as in 6G networks [30]. Finally, the work in [15] describes a way to transform packets into images considering both header and payload. The authors include features like source/destination host and source/destination port number, which may hinder the generalization capabilities of the ML model. Furthermore, exploiting payload data, as proposed by the authors, at the 5G/6G base stations level is unfeasible due to encrypted packets.

In contrast to all the aforementioned works, the research proposed in this paper differs in many aspects. First of all, we investigate multiple computer vision architectures, allowing us to explore a broader spectrum of possibilities in our investigation. Through the exploitation of preprocessing techniques, we discuss how a real-time transformation of network packets into images is practically possible. Furthermore, the utilization of a very recent dataset [24] collected within a 5G environment and never exploited with computer vision techniques allows us to set a first benchmark for future studies.

III. PROPOSED ARCHITECTURE

In this section, we first describe how network packets can be transformed into images, with a focus on the used features and the corresponding preprocessing techniques. Then, we give an insight on how the proposed method can be implemented in a next generation eNB (gNB), showing its amenability to future 6G/NextG wireless infrastructure.

A. FROM NETWORK TRAFFIC TO IMAGES

The packet represents the basic unit of data transferred over a computer network. Each packet contains a part of the complete message and embeds information that helps identify the traffic flow. The latter can be identified by a 5-tuple composed of source and destination IPs, source and destination ports, and protocol used.

In this work, relying on the concept of network traffic flow, an encoding scheme to translate packet attributes into a structured format, i.e., matrices, is proposed. By structuring the input as packet matrices, we create a spatial data representation. This representation enables the Neural Network (NN) to learn the traits of both DoS attacks and benign traffic by employing convolutional filters that slide across the input, identifying crucial patterns. Network traffic classification leveraging CNNs allows us to exploit one of their main advantages: the ability to identify DoS patterns irrespective of their temporal occurrence in the data. This intrinsic quality, i.e., producing consistent outputs despite the location of patterns in the input, is one of the paramount features of CNN architectures [31].

Specifically, the approach consists in (i) identifying F features, e.g., Time-to-live and packet length, that can be extracted from packets belonging to a given flow, and (ii) defining a maximum number of packets N for each flow within the time window T [32]. Hence, the maximum size of the input matrices will be N × F. To have a real-time approach, if N packets are not collected within the time window, the matrix is padded with 0s. This allows the method to adapt to situations with long packet inter-arrival time. Finally, each attribute is normalized to the interval [0, 1].
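The encoding described above can be illustrated with a minimal sketch, which is not the authors' script. It assumes that each packet is available as a dictionary; the field names (`src_ip`, `dst_ip`, `src_port`, `dst_port`, `proto`, `time`) and the helper names `flow_key` and `build_matrices` are hypothetical. Packets are grouped by 5-tuple, a matrix is emitted as soon as N packets of a flow are collected or the window T has elapsed, and missing rows are filled with zeros. Normalization to [0, 1] is left out of this sketch.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical packet record: a dict holding the 5-tuple fields, a timestamp in
# seconds under "time", and one numeric entry per feature name.
Packet = Dict[str, object]
FlowKey = Tuple[object, ...]
Matrix = List[List[float]]


def flow_key(pkt: Packet) -> FlowKey:
    """5-tuple identifying the traffic flow a packet belongs to."""
    return (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"])


def build_matrices(packets: Sequence[Packet], features: Sequence[str],
                   n_max: int, window_s: float) -> List[Tuple[FlowKey, Matrix]]:
    """Turn packets into zero-padded n_max x F matrices, one or more per flow."""
    open_windows: Dict[FlowKey, Tuple[float, Matrix]] = {}  # flow -> (start time, rows)
    finished: List[Tuple[FlowKey, Matrix]] = []

    def flush(key: FlowKey) -> None:
        _, rows = open_windows.pop(key)
        # 0-padding: add all-zero rows until the matrix has n_max rows.
        padding = [[0.0] * len(features) for _ in range(n_max - len(rows))]
        finished.append((key, rows + padding))

    for pkt in sorted(packets, key=lambda p: p["time"]):
        key = flow_key(pkt)
        # Close the window if T seconds have elapsed since its first packet.
        # (In this offline sketch, expiry is only noticed when the next packet
        # of the same flow arrives, or at the end of the capture.)
        if key in open_windows and pkt["time"] - open_windows[key][0] >= window_s:
            flush(key)
        if key not in open_windows:
            open_windows[key] = (pkt["time"], [])
        open_windows[key][1].append([float(pkt[f]) for f in features])
        if len(open_windows[key][1]) == n_max:
            flush(key)  # N packets collected: the matrix is complete

    for key in list(open_windows):
        flush(key)  # emit the windows still open at the end of the capture
    return finished
```

A live deployment would presumably close expired windows with a timer rather than waiting for the next packet of the flow; the sketch only aims to show how N, T, and 0-padding interact.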


TABLE 1. Features used to create input images and the corresponding preprocessing techniques.

FIGURE 1. From network packets to images. From each network flow several matrices can be obtained when N packets are collected. In addition, 0-padding is also used when N packets do not arrive within the time window.

A summary of the proposed method, capable of transforming packet flows into images, is depicted in Fig. 1. Concerning the features, those that are deterministic or similar, and hence can hamper the generalization of the NN models, have been excluded, such as IP addresses and TCP ports. A list of the exploited features with the corresponding preprocessing techniques is reported in Table 1.

B. INTEGRATION IN BASE STATIONS

In this section, we detail how the proposed architecture can be implemented in a future 6G base station.

In 5G networks, RAN and CN functions are strictly separated, due to the diverse protocols, interfaces, and management mechanisms. Consequently, achieving a unified, simplified network architecture integrating these components into a converged network proved challenging for 5G architectures. However, with the advent of evolving technologies and the transition to 6G networks, there is an unprecedented opportunity to rethink network architectures. The shift towards a converged RAN-CN architecture will enable the creation of a simpler, more efficient network infrastructure [6]. Hence, in future 6G networks, a new approach will be exploited providing more flexibility in network deployment, where the RAN and the CN functions can be converged in the same platform and optimized together according to the use-case requirements [6], [33], [34]. With a less strict separation between RAN and CN, each 6G base station can be equipped with functionalities coming from both CN and RAN, ultimately deploying a local CN on top of each node.

Among the novel Network Functions (NFs), the Network Data Analytics Function (NWDAF) [35] will assume a more prominent role within 6G networks, serving as a foundation for distributed network intelligence. Hence, each future base station can be equipped to host the NWDAF, offering on-demand data analytics to other NFs [36].

The NWDAF can be exploited for intelligent threat mitigation involving user data. It has the potential to gather User Plane Function (UPF) data emanating from the User Equipments (UEs) and feed this information into a DL system for the identification of malicious traffic. For instance, a threat identification and mitigation system can be implemented at the NWDAF by identifying and automatically dropping packets marked as malicious. This architecture allows the direct identification of potential threats at the base station level, alleviating the need to disseminate them throughout the network. This approach complies with the vision of placing security mechanisms as close as possible to the potential sources of threats. Furthermore, real-time detection is pivotal in this context, given the estimated cost of service disruption [16]. Thus, a base station-level solution capable of real-time threat mitigation is of significant importance for future NextG wireless networks.

The architecture of the proposed system is illustrated in Fig. 2. For conciseness, we report only the principal NFs used for this solution, i.e., the UPF, responsible for data forwarding, routing, and quality of service (QoS) enforcement; the Session Management Function (SMF), involved in the establishment and management of the UPF and the session of the UE; and the NWDAF.



FIGURE 2. Architecture of the proposed system, reporting the main NFs used. The NextG base station is deployed along with a local CN. The proposed technique utilizes NN implemented inside the NWDAF to process packets acquired from the UPF. Malicious packets can be identified and dropped directly at the local NWDAF.

In detail, the NWDAF will perform the following tasks:
(i) Data collection: the NWDAF collects all network traffic flows coming from the UEs connected to the base station.
(ii) Data preprocessing: for each packet, the NWDAF extracts the features and normalizes them, as described in Table 1. If N packets are not collected within T seconds, then it pads the matrix with 0s.
(iii) Classification: once the matrix is ready, the NWDAF is responsible for passing the sample to the NN, deployed along the local CN. The NN architecture can either be user-defined or rely on well-known computer vision models, as discussed in the next section.

IV. METHODOLOGY

In this section, we first describe the dataset used for the experiments, highlighting its amenability to 6G/NextG wireless networks. Then, we briefly review the state-of-the-art computer vision architectures that have been exploited. Finally, the experiments carried out are introduced and results are presented.

A. NETWORK INTRUSION DETECTION

The accuracy and the efficiency of a Machine Learning (ML)-based cybersecurity system heavily depend on the quality of the dataset and how close the behavior of the data is to the behavior in a real network scenario. One of the problems in AI-based security research is the lack of a comprehensive dataset that resembles complex 5G/6G network behaviors.

The majority of the datasets available online are outdated for modern networks as they have been compiled before some critical technological evolutions, e.g., UNSW-NB15 [37], CTU-13 [38]. Other recent datasets available on the Web, such as the CIC-DDoS2019 [27], present limitations in terms of many redundant records/high class unbalance. Additionally, as mentioned in Section II, the behavior of 5G/6G networks is far from the testbeds or the simulation platforms used to create this dataset.

To overcome this problem, authors in [24] recently proposed 5G-NIDD, a network intrusion detection dataset generated from a real 5G test network. The dataset is collected using the 5G Test Network (5GTN) in Oulu, Finland. 5G-NIDD presents a combination of attack traffic and benign traffic under different attack scenarios. Real mobile devices attached to the 5GTN were used to generate traffic.

Data is extracted from two base stations, each connected to an attacker node and several benign 5G UEs. The attack scenarios include DoS attacks and port scans. Under DoS attacks, the dataset contains ICMP Flood, UDP Flood, SYN Flood, HTTP Flood, and Slowrate DoS. Under port scans, the dataset contains SYN Scan, TCP Connect Scan, and UDP Scan. The dataset is publicly available in both pcapng and csv formats. The pcapng format contains full packet payloads, while the csv files are a collection of statistics extracted for each traffic flow.

Hence, in this work, we exploit the 5G-NIDD to test the proposed architectures. A list of the attacks included in the dataset, with the corresponding description, is reported in Table 2. In the experiments, the attack type ICMP flood has not been considered since the number of samples for this class, once the dataset has been preprocessed into matrices, was very low. However, 9 classes are still present since the HTTP flood was performed using two different tools, Slowloris and Torshammer respectively.

B. NEURAL NETWORK ARCHITECTURES

In this section, we report the computer vision models that have been tested on matrices of traffic packets. In addition to state-of-the-art computer vision models, we have also designed and tested a customized-CNN, specifically aimed at recognizing threats.

FIGURE 3. Residual block skipping two layers exploiting skip connections.

One of the major innovative architectures used in computer vision is the Residual Network (ResNet) [39]. In order to solve the problem of the vanishing/exploding gradient, this architecture introduces the concept of Residual Blocks. As depicted in Fig. 3, instead of simply learning F(x), the network fits H(x) = F(x) + x, where x is an input to the residual block and output from the previous layer.

The key concept of residual blocks relies on skip connections, as shown in Fig. 3, allowing smoother gradient flow and ensuring that important features are carried until the final


TABLE 2. Attack types contained in the 5G-NIDD dataset [24] and corresponding description.

layers without adding computational load to the network. In our experiments, we rely on ResNet50V2, composed of 48 convolutional layers, one max pooling layer, and one average pooling layer.

Starting from residual connections, MobileNetV2 [40] exploits an inverted residual structure where the residual connections are between the bottleneck layers. This model, well suited to mobile devices, also exploits lightweight



TABLE 3. Architectures studied in the experiments.

FIGURE 4. Two types of blocks for MobileNetV2: (a) residual block with stride=1 and (b) downsizing block with stride=2.

depthwise convolutions to filter features as a source of non-linearity in the intermediate layers. MobileNetV2 presents two types of blocks, a residual block with stride 1 and another block for downsizing with stride 2. These blocks are depicted in Fig. 4.

Aiming at increasing computational efficiency, authors in [41] propose EfficientNet, a more systematic method for enhancing accuracy and efficiency by scaling depth/width/resolution of CNN models. This is in contrast with conventional scaling methods that are based on random approaches, demanding manual tuning and significant effort. Specifically, the technique is based on a compound scaling method that relies on a constant ratio to perform a balanced scaling of width, depth, and resolution.

Opposed to the computational efficiency of MobileNet and EfficientNet, DenseNet [42] connects each layer to every other layer in a feed-forward fashion, resulting in a highly demanding architecture in terms of computational resources.

An important milestone in the CNN architectures was the Inception Net [43]. The main idea behind this architecture is the Inception layer, a combination of layers with their output filters concatenated into a single output vector forming the input for the next layer.

Taking the principles of Inception to the extreme, the Xception architecture is introduced [44]. In Inception, 1 × 1 convolutions compress the input before applying different filters to various depth spaces. Xception reverses this process, first applying filters to depth maps and then compressing the input with 1 × 1 convolutions across depth, resembling a depthwise separable convolution.

In addition to these NN models, a customized-CNN has been specifically developed aiming at an accurate network packet classification, whose structure is reported in Fig. 5. The CNN is made of 3 convolutional layers, with 8, 64 and 128 filters respectively. 3 Fully-Connected (FC) layers have been added, with 512, 128 and 9 units, respectively. As activation function, ReLU has been adopted for all layers, except for the last one that employs the SoftMax to perform classification. A summary of the state-of-the-art computer vision architectures that have been studied in the experiments, along with the customized-CNN, is reported in Table 3. We can observe a large increase in parameters of the customized-CNN, mainly due to the exploitation of FC layers at the end of the convolutional section of the architecture.

The deployment of these models at the base station can surely increase the power and computational resource consumption. However, advancements in hardware acceleration (like specialized chips or GPUs) [45] and optimization techniques [46] have significantly improved their efficiency. These advancements hence can enable quicker inference times and reduced energy consumption.

C. EXPERIMENTAL SETUP

In this section, the experimental setup is highlighted, describing how classification experiments have been carried out.

Once the raw packets have been obtained, a script to transform pcapng files into suitable input matrices is developed, as highlighted in Section III-A. The source code is publicly available at [47]. The script allows defining the maximum number of packets for each matrix, N, thus enabling experiments for varying matrix length. In the experiments a maximum number of packets per time window N ∈ {10, 50, 100} has been considered. Furthermore, for DenseNet and Inception models an input resizing has been applied. In particular, for DenseNet inputs have been resized to 32 × 32, while for Inception the resizing has resulted in input matrices of 128 × 128. This is due to the fact that these models do not support small input matrices, resulting in negative dimensions of feature maps. Specifically, the resizing has been obtained with the bilinear interpolation of the input matrices.

Concerning the time window T, it has been kept fixed to 10 s for all experiments, due to two reasons: (i) experimentally validated results have shown that 10 seconds is a good choice [32]; (ii) the proposed architecture must be amenable to a 5G/6G implementation, thus it must perform real-time detection, which cannot be achieved with a longer time window. Additionally, shortening the time window would result in most of the packet matrices being 0-padded, thus hampering the classification performance.

Once the input matrices have been created, a normalization and padding phase has been performed. Data normalization is performed to scale values to a predefined uniform range. This prevents larger values from overwhelming smaller ones

VOLUME 5, 2024 1161


PAOLINI et al.: REAL-TIME NETWORK PACKET CLASSIFICATION EXPLOITING COMPUTER VISION ARCHITECTURES

FIGURE 5. Customized CNN developed for the experiments.

TABLE 4. F1-score obtained with different architectures.


during the training process. A Min-Max scaling has been
adopted: the min-max values for each feature have been
searched through the entire dataset, and a rescaling has been
carried out, resulting in all values belonging to the range
[0, 1]. Then, for input matrices with less than N rows a
0-padding strategy is adopted. Furthermore, each flow has
been mapped to a specific label.
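The normalization and padding phase described above can be illustrated with a minimal NumPy sketch. The function names (`fit_min_max`, `scale_and_pad`) and the toy data are hypothetical and are not taken from the authors' code; the handling of constant features (a span of zero) is also an assumption added to keep the example safe to run.

```python
import numpy as np

def fit_min_max(dataset):
    """Per-feature minimum and maximum over every packet row in the dataset."""
    stacked = np.vstack(dataset)
    return stacked.min(axis=0), stacked.max(axis=0)

def scale_and_pad(packets, feat_min, feat_max, n_rows):
    """Min-Max scale one window of packets to [0, 1], then zero-pad to n_rows."""
    span = feat_max - feat_min
    span = np.where(span == 0, 1.0, span)  # assumption: constant features map to 0
    scaled = (packets - feat_min) / span
    scaled = scaled[:n_rows]  # at most N packets are kept per time window
    padded = np.zeros((n_rows, packets.shape[1]), dtype=np.float64)
    padded[: scaled.shape[0]] = scaled
    return padded

# Hypothetical toy data: two time windows with 3 features per packet
windows = [
    np.array([[10.0, 0.0, 5.0], [20.0, 1.0, 5.0], [30.0, 2.0, 5.0]]),
    np.array([[40.0, 4.0, 5.0]]),
]
feat_min, feat_max = fit_min_max(windows)
matrices = [scale_and_pad(w, feat_min, feat_max, n_rows=4) for w in windows]
```

Here the minimum and maximum are computed once over all windows, mirroring the dataset-wide search mentioned in the text, and every resulting matrix has exactly N rows regardless of how many packets fell into the window.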
Finally, an 80-20% training-test split has been performed,
while keeping the original balance of the dataset. To have a
fair comparison, all the models have been trained using the
same training parameters, i.e., batch size, epochs, optimizer
and learning rate.
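As a rough illustration, the split described above can be sketched together with the inverse-frequency class weights that the following paragraph defines. This is a minimal NumPy sketch under stated assumptions: the helper names and the toy labels are hypothetical, and computing the weights on the training labels only is an assumption, since the text defines the counts over "the dataset".

```python
import numpy as np

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) preserving each class's share of the data."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_fraction * idx.size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

def class_weights(labels):
    """w_j = #samples / (#classes * #samples_j) for every class j."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = labels.size / (classes.size * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Hypothetical imbalanced labels: 80 samples of class 0, 15 of class 1, 5 of class 2
labels = np.array([0] * 80 + [1] * 15 + [2] * 5)
train_idx, test_idx = stratified_split(labels)
weights = class_weights(labels[train_idx])
```

With these toy labels the rarest class receives the largest weight, which is the intended effect of the technique. The same formula is used by scikit-learn's `class_weight="balanced"` heuristic, so an existing implementation could be substituted for `class_weights`.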
Since the training set is not balanced, a class weighting
technique has been adopted. Leveraging this technique, it
is possible to assign higher weights to minority classes,
allowing the model to pay more attention to their patterns
and reducing the bias towards majority classes. The weight
for each class is given by:

$$
w_j = \frac{\#\text{samples}}{\#\text{classes} \times \#\text{samples}_j}
$$

where $w_j$ is the weight for class $j$, #samples is the total number of samples in the dataset, #classes is the total number of unique classes in the dataset, and #samples_j is the total number of samples belonging to class $j$.

To evaluate the results of the experiments, we used common evaluation metrics, namely the confusion matrix and the F1-score. Since the test set is kept unbalanced to resemble real-world data as closely as possible, the accuracy metric can lead to skewed results and is therefore not considered. Instead, the F1-score is defined as the harmonic mean of the precision P and recall R values:

$$
\text{F1-score} = \frac{2}{\frac{1}{P} + \frac{1}{R}}
$$

$$
P = \frac{TP}{TP + FP}; \quad R = \frac{TP}{TP + FN}
$$

so it accounts for instances where precision or recall values are exceptionally low, resulting in a diminished score even in the case of imbalanced classes.

V. RESULTS AND DISCUSSION

In this section, the obtained results are reported and discussed. Results for the different considered N values are reported in Table 4. The results include the performance (in terms of F1-score) of the considered architectures, i.e., both the state-of-the-art computer vision architectures and the customized-CNN. We compare the obtained values with the best performing Multi Layer Perceptron (MLP) model for both binary and multi-class experiments obtained by [24], and with the technique introduced in [25].

Since the proposed system must comply with the real-time requirements of future wireless networks, in the experiments we also evaluated the prediction time of the tested architectures over the entire test set on an 11th Gen Intel Core i7-11700K @ 3.60GHz. Results are depicted in Fig. 6.

FIGURE 6. Prediction time of the tested architectures.

As reported in Table 4, the best performing model is the customized-CNN for all the values of N. Notably, this model also slightly outperforms the models proposed in [24] and [25], by 0.00878 and 0.01229, respectively. When increasing N, the F1-score of the model improves. When compared to both binary and multi-class MAGNETO [48], a work proposing the translation of network traffic into images while retaining semantic data about the relationships between features, the customized-CNN achieves a slightly higher F1-score, by 0.00765 and 0.00045, respectively.
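For reference, the F1-scores compared here follow from the TP, FP and FN counts of a confusion matrix. The sketch below computes per-class precision, recall and F1 with NumPy; the function name and the 2-class matrix are hypothetical, and how the per-class scores are aggregated into the single value reported per architecture is not specified in this section, so no averaging is shown.

```python
import numpy as np

def per_class_f1(confusion):
    """Precision, recall and F1 per class from a confusion matrix.

    confusion[i, j] counts samples of true class i predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = confusion.sum(axis=1) - tp  # belong to the class, but missed
    precision = np.divide(tp, tp + fp, out=np.zeros_like(tp), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=np.zeros_like(tp), where=(tp + fn) > 0)
    denom = precision + recall
    # 2PR / (P + R) is the harmonic mean 2 / (1/P + 1/R), safe when P or R is 0
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Hypothetical 2-class confusion matrix
cm = [[90, 10],
      [5, 95]]
precision, recall, f1 = per_class_f1(cm)
```

Because the harmonic mean is dominated by the smaller of its two arguments, a class with poor recall keeps a low F1 even when its precision is high, which is the property the text relies on for imbalanced test data.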
Concerning the other tested state-of-the-art computer vision architectures, only the ResNet and Inception models reach an F1-score above 90% for N = 10. However, these architectures do not improve their performance for increasing N: indeed, a drop can be observed for N = 100 for both of them, while for N = 50 ResNet keeps a similar performance and Inception degrades. The behavior of the Xception architecture is opposite to that of the CNN architecture: for increasing N, a decrease in F1-score is noticed. DenseNet obtains the highest F1-score for N = 50, while both MobileNet and EfficientNet have their best performing scenario for N = 10. The reported results show noticeably different behaviors among architectures for varying N; for instance, the F1-score decreases for increasing N with ResNet, Inception, and Xception, while for DenseNet the F1-score increases for N = 50 when compared with N = 10 and then decreases for N = 100. These different behaviors are due to the fact that these architectures differ in many aspects, e.g., in terms of connections among neurons and number of parameters. Finally, these results highlight one important outcome: for the task of recognizing threats, complex models do not work better than simpler models. Indeed, the customized-CNN, composed of few layers, outperforms all state-of-the-art computer vision architectures.

FIGURE 7. Confusion matrix for CNN model with N = 10.
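To make the "few layers" point concrete, the sketch below counts the trainable parameters of a network with the layer sizes given for the customized-CNN (3 convolutional layers with 8, 64 and 128 filters, then FC layers with 512, 128 and 9 units). Everything else is an assumption and not taken from the paper: 3 × 3 kernels, a single input channel, "same" padding, stride 1, no pooling, and a placeholder of 20 features per packet. The absolute numbers are therefore illustrative only.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus biases of a 2-D convolution with a kernel x kernel window."""
    return (kernel * kernel * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return (n_in + 1) * n_out

def custom_cnn_params(n_packets, n_features, kernel=3):
    """Trainable parameters of a 3-conv + 3-FC network with the stated sizes.

    Assumptions (not from the paper): single-channel input, 'same' padding,
    stride 1 and no pooling, so the spatial size stays n_packets x n_features.
    """
    conv_filters = (8, 64, 128)
    fc_units = (512, 128, 9)
    total, c_in = 0, 1
    for c_out in conv_filters:
        total += conv2d_params(kernel, c_in, c_out)
        c_in = c_out
    n_in = n_packets * n_features * c_in  # flattened feature maps
    for n_out in fc_units:
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total

# n_features=20 is a placeholder, not the feature count used in the paper
counts = {n: custom_cnn_params(n, n_features=20) for n in (10, 50, 100)}
```

Under these assumptions the convolutional layers contribute a fixed number of parameters, while the first FC layer grows linearly with N because the flattened feature maps grow with the number of rows of the input matrix.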
A look at the confusion matrix of the best performing model, i.e., the CNN, can give better insight into the obtained results. The confusion matrices for N = 10, N = 50, and N = 100 are reported in Fig. 7, Fig. 8, and Fig. 9, respectively.

FIGURE 8. Confusion matrix for CNN model with N = 50.

FIGURE 9. Confusion matrix for CNN model with N = 100.

We can observe that for N = 10 the customized-CNN incorrectly classifies as belonging to class 1 (i.e., HTTP flood) almost 10% of the samples actually belonging to class 2 (i.e., Slowrate DoS), while for the other classes only a small number of misclassifications can be observed. In particular, the model misclassifies samples belonging to class 6 (i.e., HTTP flood - Torshammer) as samples of classes 1 and 2.

If N is increased, instead, as depicted in Fig. 8 and 9, the issue with class 2 is solved. Indeed, for N = 50, the CNN reaches a TP rate of 99.4%, while for N = 100 the TP rate increases to 100%. This confirms that, for this model, increasing the number of samples for each matrix is helpful and leads to a better generalization.

However, increasing N leads to an increase in prediction time, as sketched in Fig. 6. Especially for the customized-CNN architecture, the prediction time goes from ≈ 4s for N = 10 to ≈ 46s for N = 100. Indeed, as reported in Table 3, the customized-CNN has a substantial increase in the number of parameters when N increases. This highlights a trade-off between the classification performance and the speed at which the classification is performed for this model.

Concerning the state-of-the-art computer vision architectures, a different behavior can be noticed. Indeed, while Inception has a very high prediction time even for N = 10, this architecture scales well, resulting in an increase of just 10s between the N = 10 and N = 100 scenarios. This can be traced back to the almost constant size and number of parameters for this architecture among different values of N, as reported in Table 3. Similar considerations can be derived for the other models: for instance, DenseNet shows an almost constant prediction time for different values of N, with just ≈ 1s increase. Hence, for these models the prediction time does not influence the choice of N, and the N that leads to the best performing model can be freely chosen.

VI. CONCLUSION

6G/NextG networks will require intelligent threat mitigation systems able to cope with different types of malicious traffic [9] and to adapt to newly discovered threats. Hence, in this paper, we have explored the innovative approach of transforming network traffic packets into image representations and leveraging state-of-the-art computer vision architectures for classification.

In the experiments, an intrusion detection problem has been investigated with the goal of classifying normal and malicious behaviors. We have first discussed how raw packets can be converted in real time into input matrices ready to be fed into state-of-the-art computer vision architectures and the customized CNN.

Results have shown that complex models do not perform better than the developed CNN architecture. Indeed, for all the considered values of N, i.e., 10, 50, and 100, the CNN outperforms all the state-of-the-art computer vision techniques, reaching F1-scores of 0.99593, 0.99860, and 0.99895, respectively. Moreover, the scalability of our approach has been tested. Results indicate a trade-off between the scalability of the CNN and the obtained performance. Indeed, for increasing N, a non-negligible increase in prediction time is observed. On the other hand, state-of-the-art computer vision architectures, even when starting with very high prediction times, scale much better. This makes it possible to choose the best performing model with respect to N without any trade-off.

The proposed system is just a first step in the application of computer vision techniques to network traffic analysis. Indeed, by exploiting convolution-based models, more complex patterns can be discovered in packet matrices. For instance, a proactive approach capable of identifying new types of attacks can be implemented. Additionally, a distributed learning approach, relying on federated/split learning techniques, will be considered in future works to enhance data privacy and model performance. Finally, hardware acceleration techniques could be studied to deploy these models on dedicated platforms, i.e., FPGA, offloading the computational workload from the gNB without compromising its performance.

REFERENCES
[1] Y. Siriwardhana, P. Porambage, M. Liyanage, and M. Ylianttila, “AI and 6G security: Opportunities and challenges,” in Proc. Joint Eur. Conf. Netw. Commun. 6G Summit (EuCNC/6G Summit), 2021, pp. 616–621.
[2] C. De Alwis et al., “Survey on 6G frontiers: Trends, applications, requirements, technologies and future research,” IEEE Open J. Commun. Soc., vol. 2, pp. 836–886, 2021.
[3] W. Jiang, B. Han, M. A. Habibi, and H. D. Schotten, “The road towards 6G: A comprehensive survey,” IEEE Open J. Commun. Soc., vol. 2, pp. 334–366, 2021.
[4] P. Porambage, G. Gür, D. P. M. Osorio, M. Liyanage, A. Gurtov, and M. Ylianttila, “The roadmap to 6G security and privacy,” IEEE Open J. Commun. Soc., vol. 2, pp. 1094–1122, 2021.
[5] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Netw., vol. 34, no. 3, pp. 134–142, May/Jun. 2020.
[6] V. Ziegler, H. Viswanathan, H. Flinck, M. Hoffmann, V. Räisänen, and K. Hätönen, “6G architecture to connect the worlds,” IEEE Access, vol. 8, pp. 173508–173520, 2020.
[7] Y. Ma, X. Chen, W. Feng, and N. Ge, “DDoS detection for 6G Internet of Things: Spatial-temporal trust model and new architecture,” China Commun., vol. 19, no. 5, pp. 141–149, May 2022.
[8] B. A. Khalaf, S. A. Mostafa, A. Mustapha, M. A. Mohammed, and W. M. Abduallah, “Comprehensive review of artificial intelligence and statistical approaches in distributed denial of service attack and defense methods,” IEEE Access, vol. 7, pp. 51691–51713, 2019.
[9] S.-N. Nguyen, V.-Q. Nguyen, J. Choi, and K. Kim, “Design and implementation of intrusion detection system using convolutional neural network for DoS detection,” in Proc. 2nd Int. Conf. Mach. Learn. Soft Comput., 2018, pp. 34–38.
[10] A. Suhag and A. Daniel, “Study of statistical techniques and artificial intelligence methods in distributed denial of service (DDOS) assault and defense,” J. Cyber Secur. Technol., vol. 7, no. 1, pp. 21–51, 2023.



[11] Z. Tan, A. Jamdagni, X. He, P. Nanda, R. P. Liu, and J. Hu, “Detection of denial-of-service attacks based on computer vision techniques,” IEEE Trans. Comput., vol. 64, no. 9, pp. 2519–2533, Sep. 2015.
[12] F. Hussain, S. G. Abbas, M. Husnain, U. U. Fayyaz, F. Shahzad, and G. A. Shah, “IoT DoS and DDoS attack detection using ResNet,” in Proc. IEEE 23rd Int. Multitopic Conf. (INMIC), 2020, pp. 1–6.
[13] S. S. Kim and A. L. N. Reddy, “A study of analyzing network traffic as images in real-time,” in Proc. IEEE 24th Annu. Joint Conf. IEEE Comput. Commun. Soc., 2005, pp. 2056–2067.
[14] G. Bendiab, S. Shiaeles, A. Alruban, and N. Kolokotronis, “IoT malware network traffic classification using visual representation and deep learning,” in Proc. 6th IEEE Conf. Netw. Softw. (NetSoft), 2020, pp. 444–449.
[15] R. Moreira, L. F. Rodrigues, P. F. Rosa, R. L. Aguiar, and F. de Oliveira Silva, “Packet vision: A convolutional neural network approach for network traffic classification,” in Proc. 33rd Conf. Graph., Patterns Images (SIBGRAPI), 2020, pp. 256–263.
[16] A. B. de Neira, B. Kantarci, and M. Nogueira, “Distributed denial of service attack prediction: Challenges, open issues and opportunities,” Comput. Netw., vol. 222, Feb. 2023, Art. no. 109553.
[17] H. Moudoud, L. Khoukhi, and S. Cherkaoui, “Prediction and detection of FDIA and DDoS attacks in 5G enabled IoT,” IEEE Netw., vol. 35, no. 2, pp. 194–201, Mar./Apr. 2021.
[18] M. Liyanage, I. Ahmad, A. B. Abro, A. Gurtov, and M. Ylianttila, A Comprehensive Guide to 5G Security. Hoboken, NJ, USA: Wiley, 2018.
[19] S. A. A. Hakeem, H. H. Hussein, and H. Kim, “Security requirements and challenges of 6G technologies and applications,” Sensors, vol. 22, no. 5, p. 1969, 2022.
[20] X. Yuan, C. Li, and X. Li, “DeepDefense: Identifying DDoS attack via deep learning,” in Proc. IEEE Int. Conf. Smart Comput. (SMARTCOMP), 2017, pp. 1–8.
[21] J. Kim, Y. Shin, and E. Choi, “An intrusion detection model based on a convolutional neural network,” J. Multimedia Inf. Syst., vol. 6, no. 4, pp. 165–172, 2019.
[22] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward generating a new intrusion detection dataset and intrusion traffic characterization,” ICISSp, vol. 1, pp. 108–116, Jan. 2018.
[23] J. Jose and D. V. Jose, “Deep learning algorithms for intrusion detection systems in Internet of Things using CIC-IDS 2017 dataset,” Int. J. Elect. Comput. Eng. (IJECE), vol. 13, no. 1, pp. 1134–1141, 2023.
[24] S. Samarakoon et al., “5G-NIDD: A comprehensive network intrusion detection dataset generated over 5G wireless network,” 2022, arXiv:2212.01298.
[25] G. Agrafiotis, E. Makri, A. Lalas, K. Votis, D. Tzovaras, and N. Tsampieris, “A deep learning-based malware traffic classifier for 5G networks employing protocol-agnostic and PCAP-to-embeddings techniques,” in Proc. Eur. Interdiscipl. Cybersecurity. Conf., 2023, pp. 193–194.
[26] Z. Liu, “Detecting DDoS issues under 5G with ResNet,” in Proc. Int. Conf. Statist., Data Sci., Comput. Intell. (CSDSCI), 2023, pp. 257–264.
[27] I. Sharafaldin, A. H. Lashkari, S. Hakak, and A. A. Ghorbani, “Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy,” in Proc. Int. Carnahan Conf. Security Technol. (ICCST), 2019, pp. 1–8.
[28] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, “Toward developing a systematic approach to generate benchmark datasets for intrusion detection,” Comput. Secur., vol. 31, no. 3, pp. 357–374, 2012.
[29] S. J. Stolfo, W. Fan, W. Lee, A. Prodromidis, and P. K. Chan, “Cost-based modeling for fraud and intrusion detection: Results from the JAM project,” in Proc. DARPA Inf. Survivabil. Conf. Expo. DISCEX, 2000, pp. 130–144.
[30] S. K. Gupta, B. Pattnaik, V. Agrawal, R. S. K. Boddu, A. Srivastava, and B. Hazela, “Malware detection using genetic cascaded support vector machine classifier in Internet of Things,” in Proc. 2nd Int. Conf. Comput. Sci., Eng. Appl. (ICCSEA), 2022, pp. 1–6.
[31] A. Mazari and H. Sahbi, “Deep temporal pyramid design for action recognition,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2019, pp. 2077–2081.
[32] R. Doriguzzi-Corin, S. Millar, S. Scott-Hayward, J. Martinez-del Rincon, and D. Siracusa, “LUCID: A practical, lightweight deep learning solution for DDoS attack detection,” IEEE Trans. Netw. Service Manag., vol. 17, no. 2, pp. 876–889, Jun. 2020.
[33] J. Cha et al., “RAN-CN converged user-plane for 6G cellular networks,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), 2022, pp. 2843–2848.
[34] J. Choi et al., “RAN-CN converged control-plane for 6G cellular networks,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), 2022, pp. 1253–1258.
[35] A. Chouman, D. M. Manias, and A. Shami, “Towards supporting intelligence in 5G/6G core networks: NWDAF implementation and initial analysis,” in Proc. Int. Wireless Commun. Mobile Comput. (IWCMC), 2022, pp. 324–329.
[36] S. Sevgican, M. Turan, K. Gökarslan, H. B. Yilmaz, and T. Tugcu, “Intelligent network data analytics function in 5G cellular networks using machine learning,” J. Commun. Netw., vol. 22, no. 3, pp. 269–280, Jun. 2020.
[37] N. Moustafa and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” in Proc. Mil. Commun. Inf. Syst. Conf. (MilCIS), 2015, pp. 1–6.
[38] S. Garcia, M. Grill, J. Stiborek, and A. Zunino, “An empirical comparison of botnet detection methods,” Comput. Secur., vol. 45, pp. 100–123, Sep. 2014.
[39] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[40] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 4510–4520.
[41] M. Tan and Q. Le, “EfficientNetv2: Smaller models and faster training,” in Proc. 38th Int. Conf. Mach. Learn., 2021, pp. 10096–10106.
[42] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 4700–4708.
[43] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2818–2826.
[44] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 1251–1258.
[45] C. Latotzke and T. Gemmeke, “Efficiency versus accuracy: A review of design techniques for DNN hardware accelerators,” IEEE Access, vol. 9, pp. 9785–9799, 2021.
[46] L. Deng, G. Li, S. Han, L. Shi, and Y. Xie, “Model compression and hardware acceleration for neural networks: A comprehensive survey,” Proc. IEEE, vol. 108, no. 4, pp. 485–532, Apr. 2020.
[47] E. Paolini. “Source code for the experiments.” 2023. [Online]. Available: https://github.com/emiliopaolini/5g_ddos
[48] A. Dunmore, A. Dunning, J. Jang-Jaccard, F. Sabrina, and J. Kwak, “MAGNETO and DeepInsight: Extended image translation with semantic relationships for classifying attack data with machine learning models,” Electronics, vol. 12, no. 16, p. 3463, 2023.

EMILIO PAOLINI (Student Member, IEEE) received the B.S. degree in computer engineering and the M.S. degree in artificial intelligence and data engineering from the University of Pisa, Italy, in 2019 and 2021, respectively. He is currently pursuing the Ph.D. degree with the Scuola Superiore Sant’Anna, with a scholarship co-funded by the National Research Council and Sma-RTy Italia SRL. His research focuses on the integration and acceleration of artificial intelligence technologies in NextG wireless networks.


LUCA VALCARENGHI (Senior Member, IEEE) has been an Associate Professor with Scuola Superiore Sant’Anna, Pisa, Italy, since 2014. He has published more than 100 papers in international journals and conference proceedings and actively participated in the TPC of several IEEE conferences, such as GLOBECOM and ICC. He received a Fulbright Research Scholar Fellowship, in 2009, and a JSPS “Invitation Fellowship Program for Research in Japan (Long Term),” in 2013. His research interests include optical networks design, analysis, optimization, artificial intelligence optimization techniques, communication networks reliability, fixed and mobile network integration, fixed network backhauling for mobile networks, and energy efficiency in communications networks.

LUCA MAGGIANI received the Ph.D. degree from Scuola Superiore Sant’Anna, Pisa, and the Université Clermont Auvergne, Clermont Ferrand, in 2017. He is the Co-Founder and the CEO of SmaRTy SAS, and CEO of Sma-RTy Italia SRL. He manages a Research and Development Team for advanced artificial intelligence applications and at the same time, develops computer vision application for automotive and surveillance. During his research activity, he has coauthored over 15 international reviews on embedded smart video sensors, bio-inspired systems, and their processing architectures.

NICOLA ANDRIOLLI (Senior Member, IEEE) received the Laurea degree in telecommunications engineering from the University of Pisa in 2002, and the Diploma and Ph.D. degrees from Scuola Superiore Sant’Anna, Pisa, in 2003 and 2006, respectively. He was a visiting student with DTU, Copenhagen, Denmark, and a Guest Researcher with NICT, Tokyo, Japan. From 2007 to 2019, he was an Assistant Professor with Scuola Superiore Sant’Anna. From 2019 to 2023, he was a Researcher and then a Senior Researcher with CNR-IEIIT. Since 2024, he has been an Associate Professor with the University of Pisa. He has a background in the design and the performance analysis of optical circuit-switched and packet-switched networks and nodes. He authored more than 200 publications in international journals and conferences, contributed to one IETF RFC, and filed 11 patents. His research interests include photonic integration technologies for telecom, datacom and computing applications, working in the field of optical communications, processing, and computing.

Open Access funding provided by ‘Scuola Superiore “S.Anna” di Studi Universitari e di Perfezionamento’ within
the CRUI CARE Agreement

