Detection of Distributed Denial of Service Attacks
International Journal of Communication Networks and Information Security
ISSN: 2073-607X, 2076-0930
Volume 15 Issue 03 Year 2023
Available online at: https://ijcnis.org
Musmuharam*
Master, Computer Science Department, BINUS Graduate Program, Computer Science,
Bina Nusantara University, Jakarta 10480, Indonesia
[email protected]
Suharjito
Doctor, Industrial Engineering Department, BINUS Graduate Program, Bina Nusantara
University, Jakarta 11480, Indonesia
[email protected]
1. Introduction
Software-defined networking (SDN) separates network functions into a control plane and a data
plane; this separation is what distinguishes it from traditional networks. In SDN, devices such as
switches and routers only forward packets, while decision making and control logic reside in the SDN
controller software. SDN offers clear advantages over traditional networks, and its development has
made significant progress in meeting organizational needs for operational efficiency. However, SDN
also faces significant challenges, the main one being security threats such as distributed denial of
service (DDoS) attacks [1]. The SDN architecture, which divides the control and data layers,
provides centralized and programmable network control. However, centralized control also has
drawbacks, such as a single point of failure. Additionally, during DDoS attacks, the processing and
traffic capacity of SDN controllers can become overloaded [2]. In the first quarter of 2020, Amazon
Web Services (AWS) reported a DDoS attack of 2.3 Tbps. The attack types detected by AWS Shield at
the network and web application layers include UDP reflection vectors, DNS reflection, TCP attacks,
and SYN floods. This was the largest DDoS attack of 2020 against the Amazon website [3].
There are four categories of security problems in software-defined networks. The first is attacks
on forwarding devices, in which DDoS traffic disrupts the network and can cause process failures. The
second is threats in the control plane, where centralized control can lead to failure of the control
process. The third is vulnerabilities in the communication channels. The fourth is fake traffic flows,
through which attackers launch DDoS attacks to exhaust the resources of forwarding devices or the
controller. DDoS attacks have become a preferred tool for attackers because of the persistent threat
they pose to users, organizations, and internet infrastructure [4]. Protecting controllers from
disruptive attacks such as DDoS, which can cause service outages, is therefore both crucial and
urgent. Various methods have been used to detect DDoS attacks, such as traditional firewall
defenses, network analysis, and protocol inspection aimed at recognizing unusual traffic patterns,
but these approaches face several challenges and limitations. A major drawback is the constantly
evolving nature of DDoS attacks, as attackers continually adopt new techniques and methods; botnets,
for example, are widely used to execute DDoS attacks [5].
Machine learning (ML) is a subfield of Artificial Intelligence (AI) devoted to enabling computers to
operate without being explicitly programmed for every conceivable scenario. The essence of ML lies in
creating algorithms that learn autonomously from exposure to large datasets or inputs [6].
Preprocessing the dataset is crucial, and a key part of it is selecting relevant features, so that
the classification model uses only the best features. Feature selection is very important when
developing classification models because it can reduce overfitting, improve model interpretability,
and save time and resources in data processing. Once the best feature subset has been identified,
the training dataset is reduced to those relevant features only, which improves the efficiency and
accuracy of the classification model [7].
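As a minimal illustration of filter-based feature selection, the sketch below scores each feature independently with the ANOVA F-test and keeps the top k using scikit-learn's `SelectKBest` and `f_classif`. The data is synthetic (generated with `make_classification`) and only stands in for the 23 SDN traffic attributes; k = 5 is an arbitrary assumption, not a value from this study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for a traffic dataset: 23 features, 5 informative (assumption).
X, y = make_classification(n_samples=2000, n_features=23, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)

# Score every feature independently with the ANOVA F-test and keep the k best.
selector = SelectKBest(score_func=f_classif, k=5)
X_reduced = selector.fit_transform(X, y)

selected = np.flatnonzero(selector.get_support())
print("selected feature indices:", selected)
print("reduced shape:", X_reduced.shape)
```

Filter methods like this are cheap because they never train the final classifier, but they also ignore interactions between features.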
In this research, we use the "DDoS attack SDN Dataset," which consists of 104,345 records labeled as
benign or attack traffic, each with 23 attributes [8]. Our methodology applies three classification
algorithms: Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). We chose
these algorithms because of their demonstrated effectiveness in classification tasks in prior studies
[2], [9]. The primary goal of our study is to determine the most effective model for detecting DDoS
attacks in SDN environments. To this end, we conduct a series of experiments that incorporate feature
selection and dimensionality reduction techniques, namely Recursive Feature Elimination (RFE),
Principal Component Analysis (PCA), and t-distributed Stochastic Neighbor Embedding (t-SNE), to
improve model performance by identifying the most important features in the dataset. We also use the
SDN dataset as a benchmark to compare our proposed model with existing methods. Our results show a
notable improvement in the reliability of DDoS attack detection.
2. Related Works
A number of studies have assessed how well machine learning works for detecting DDoS attacks. For
example, the authors of [2] compared DDoS attack detection with and without feature selection. They
focused on three variants of DDoS attacks: TCP flood, UDP flood, and ICMP flood. Their findings showed
that the Sequential Forward Floating Selection (SFFS) feature selection technique yielded the highest
accuracy, 98.30%, with the K-Nearest Neighbors algorithm.
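scikit-learn's `SequentialFeatureSelector` implements plain sequential forward (or backward) selection; it does not implement the floating variant used in [2], which additionally tries removing previously selected features after each addition. The sketch below therefore shows plain forward selection wrapped around a KNN classifier, on synthetic data, with the number of features to keep (4) chosen arbitrarily.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data; not the dataset used in [2].
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           n_redundant=0, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)

# Greedy forward selection: start from an empty set and, at each step, add the
# feature that most improves 5-fold cross-validated accuracy.
sfs = SequentialFeatureSelector(knn, n_features_to_select=4, direction="forward",
                                scoring="accuracy", cv=5)
sfs.fit(X, y)

chosen = sfs.get_support(indices=True)
X_sel = sfs.transform(X)
print("selected features:", chosen)
```

Because it is a wrapper method, each candidate subset is evaluated by actually training the classifier, which is more expensive than a filter score but accounts for how the classifier uses the features.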
In [10], the authors proposed a model that uses two machine learning techniques, polynomial SVM and
linear SVM, to classify DDoS attacks in Software-Defined Networks (SDN). UDP flooding attack traffic
and regular traffic were generated in an SDN simulation using Python code and the Scapy packet
crafting tool. The proposed system achieved 95% accuracy in classifying flood DDoS attacks using the
polynomial SVM algorithm.
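To make the difference between the two kernels compared in [10] concrete, the sketch below trains scikit-learn `SVC` models with a linear kernel and a degree-3 polynomial kernel and reports test accuracy. The synthetic data, the polynomial degree, and `coef0` are assumptions for illustration, not the settings used in [10].

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=10, n_informative=6,
                           random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=7)

scores = {}
# Linear kernel: <x, x'>.  Polynomial kernel: (gamma * <x, x'> + coef0) ** degree.
for kernel, params in {"linear": {}, "poly": {"degree": 3, "coef0": 1.0}}.items():
    # SVMs are scale-sensitive, so features are standardized first.
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, **params))
    clf.fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)  # mean accuracy on held-out data

print(scores)
```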
In [11], the researchers investigated the use of Decision Trees (DT) and Support Vector Machines
(SVM) to detect DDoS attacks in SDN. They created datasets tailored to the SDN environment using
Mininet and supplemented them with the KDD99 dataset for training and testing. The data were divided
into two classes: 'attack' (labeled 1) and 'non-attack' (labeled 0). In their experiments, the SVM
algorithm performed better, with an accuracy of 85%, whereas the DT algorithm achieved 78%.
The study in [12] applied the dimensionality reduction methods PCA and t-SNE in Software-Defined
Network (SDN) environments. The primary objective was to reduce data dimensionality in order to
improve the detection of DDoS attacks, using the "DDoS attack SDN dataset." The study sought the best
combination of feature representation method and machine learning algorithm for efficient DDoS
attack detection in SDN. According to the results, the highest accuracy was achieved by the Gradient
Boosting (GB) algorithm at 99.56%, followed by XGBoost at 98.25%.
In [8], the dataset used in the experiments was generated by simulating an SDN network, with the
controller implemented as a Python application built on the Ryu API. The dataset contains 104,345
records, each with 23 features, and has two classes: 0 for normal (benign) traffic and 1 for abnormal
(malicious) traffic. The simulated traffic includes legitimate TCP, UDP, and ICMP communication as
well as malicious TCP SYN, UDP flood, and ICMP traffic. The results highlighted the effectiveness of a
hybrid model combining SVM and RF, which achieved an accuracy of 98.8%.
In the research outlined in [13], an innovative architecture was introduced, integrating Intrusion
Prevention System (IPS) and Intrusion Detection System (IDS) modules directly into the SDN
controller. This setup was further enhanced with the application of various machine learning
algorithms, including J48, Random Tree, REP Tree, Random Forest, SVM, and Multilayer
Perceptron (MLP). The effectiveness of this approach was evaluated using the CIC DoS 2019
dataset, where the MLP algorithm notably achieved an accuracy rate of up to 95%.
In [14], the researchers sought to strengthen Software-Defined Networks (SDN) against DDoS attacks
using machine learning, specifically a decision tree classification model. The aim was to make SDN
systems more resilient to intrusions by using this model to recognize attack traffic and classify
SDN traffic as 'attack' or 'normal.' The research also used a genetic algorithm (GA) to further
improve classification accuracy. The process included dataset preparation and preprocessing,
followed by optimization of the decision tree's hyperparameters with the GA. The resulting
evolutionary decision tree (EDT) model was used to classify network traffic as normal or attack.
Experimental results showed a classification precision of 99.46%.
The study in [15] follows a two-stage machine learning approach. First, the k-means algorithm is
applied during data preprocessing to extract the most salient features. Then, in the detection
phase, the k-Nearest Neighbors (kNN) algorithm uses the selected features to identify attack
patterns. The model's accuracy of 98.85% and recall of 98.47% demonstrate its ability to identify
attack flows precisely and consistently.
In [16], experiments compared DDoS attack detection using six machine learning models: Logistic
Regression (LR), Naive Bayes (NB), SVM, K-NN, Decision Tree (DT), and Random Forest (RF), on the public
NSL-KDD dataset. The evaluation criteria were accuracy, precision, recall, F1 score, and computation
time. The K-NN, DT, and RF models outperformed the others on the most important performance
indicator, the F1 score, with a value of 0.98.
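For reference, the metrics used throughout these studies can be written in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN); the F1 score is the harmonic mean of precision and recall:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}

F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
```

Because F1 ignores TN, it stays informative when benign traffic vastly outnumbers attack traffic, which is why it is often preferred over accuracy for imbalanced detection tasks.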
The research in [17] focused on identifying unusual traffic activity at the SDN controller. The
authors proposed an ensemble approach that combines KNN, NB, SVM, and Self-Organizing Map (SOM), among
other machine learning methods, aiming to improve the precision and efficacy of anomaly detection in
SDN systems by integrating the strengths of these methods. The experimental results showed that the
SVM-SOM combination achieved a classification accuracy of 98.12%.
In [18], the researchers proposed a machine learning-based and proxy-based TCP Flooding Attack
Detection (ML-TFAD) method. The approach uses two proxies, SYN and ACK: the SYN proxy defends against
TCP SYN flood attacks, while the ACK proxy defends against TCP ACK flood attacks. The ML-TFAD module
uses the C4.5 decision tree algorithm to detect SYN flood attacks before they reach the intended
server, enabling early attack detection. The model was trained on the CAIDA 2007 DDoS dataset. The
KNN model achieved an accuracy of 97.15%, whereas the C4.5 decision tree model achieved 97.43%.
In [19], a methodology combining feature extraction with machine learning classification was used to
detect DDoS attacks in SDN. The best features were extracted and used for both training and testing.
Several common classifiers were applied, including Support Vector Machine (SVM), Random Forest (RF),
K-Nearest Neighbor (KNN), eXtreme Gradient Boosting (XGBoost), and Naive Bayes (NB), and their
performance was evaluated with the confusion matrix. According to the experimental findings, SVM
achieved the highest accuracy, 99.38%.
In [20], the researchers applied filter-based, Fisher score-based, wrapper-based, and F-test analysis
of variance (ANOVA) feature selection techniques to identify DDoS attacks on SDN controllers, and
performed optimization using the Rényi joint entropy algorithm. The dataset used in this study
contains 104,345 traffic flows with 23 attributes, covering TCP, UDP, and ICMP traffic labeled as
either normal or attack. The classifiers used were ANN, XGBoost (XGB), SVM, and KNN. The experimental
findings indicated that the ANN model outperformed the other classifiers, achieving an accuracy of
99.35%.
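The Rényi entropy of order α (α ≥ 0, α ≠ 1) of a distribution p is H_α = (1 / (1 − α)) · log₂ Σ p_i^α, and it reduces to Shannon entropy as α → 1; the joint entropy applies the same formula to the distribution of attribute pairs. The sketch below only computes this quantity on a traffic window; how [20] used it for optimization is not described here, so the window contents, the choice α = 2, and the helper names `renyi_entropy` and `renyi_joint_entropy` are hypothetical.

```python
import math
from collections import Counter


def renyi_entropy(values, alpha=2.0):
    """Rényi entropy (in bits) of the empirical distribution of `values`."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    counts = Counter(values)
    n = sum(counts.values())
    probs = [c / n for c in counts.values()]
    if math.isclose(alpha, 1.0):
        # The limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def renyi_joint_entropy(xs, ys, alpha=2.0):
    """Joint Rényi entropy of two aligned attributes, e.g. (source, destination)."""
    return renyi_entropy(list(zip(xs, ys)), alpha)


# Hypothetical destination addresses seen in one 64-packet window.
normal_dst = ["10.0.0.%d" % (i % 8) for i in range(64)]  # spread over 8 hosts
attack_dst = ["10.0.0.1"] * 60 + ["10.0.0.2"] * 4       # concentrated on a victim
src = ["10.0.1.%d" % (i % 4) for i in range(64)]

print(renyi_entropy(normal_dst))              # 3.0 bits (uniform over 8 hosts)
print(renyi_entropy(attack_dst))              # much lower: traffic is concentrated
print(renyi_joint_entropy(src, normal_dst))   # 3.0 bits (8 distinct pairs)
```

A sharp drop in destination entropy within a window is a common DDoS indicator, since flooding traffic converges on a small number of victims.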
3. Methodology
This section describes the research procedure used to classify DDoS attacks on SDN with the RF, SVM,
and KNN machine learning approaches. Figure 1 shows the steps followed in developing the machine
learning model. We first conducted experiments without any feature selection technique, and then
repeated them with RFE, PCA, and t-SNE to select the pertinent features or reduce the data's
dimensionality. The trained models were compared and examined using evaluation metrics such as
accuracy, precision, recall, and F1 score.
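The baseline step described above (training RF, SVM, and KNN on all features and comparing the four metrics) can be sketched with scikit-learn as follows. Synthetic data replaces the SDN dataset, and the hyperparameters (100 trees, RBF kernel, 5 neighbors, 70/30 split) are assumptions; a reduced feature set from RFE or PCA could be substituted for `X` to run the feature-selection variants.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: label 1 = attack, 0 = benign (assumption).
X, y = make_classification(n_samples=3000, n_features=23, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    # Distance/margin-based models get standardized inputs.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
    print(name, {k: round(v, 4) for k, v in results[name].items()})
```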
The dataset used to train machine learning models can contain many features. Some features strongly
influence the classification results, while others have little or no effect. Low-impact features can
degrade overall model performance, making the model less accurate and less efficient at classifying
data. Our goal is therefore to select the most relevant and influential subset of features in the
dataset, so that model performance can be optimized by focusing on those features.
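One common way to see which features carry most of the signal is to rank them by a random forest's impurity-based importances. This is a minimal sketch on synthetic data with hypothetical settings, offered as an illustration of feature influence rather than the selection method used in this study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 10 features, of which only 3 are informative (assumption).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Importances are normalized to sum to 1; sort from most to least influential.
ranking = np.argsort(rf.feature_importances_)[::-1]
for idx in ranking:
    print(f"feature {idx}: importance {rf.feature_importances_[idx]:.3f}")
```

Impurity-based importances can overstate features with many distinct values, so permutation importance is a useful cross-check.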
3.2 Data Preprocessing
Before training a machine learning model, the data must be preprocessed. The dataset was prepared
using a variety of procedures, including cleaning, handling missing values, transforming and
normalizing data, and handling outliers, so that it is ready for modeling. The purpose of this
preprocessing is to prepare the data to meet the analysis requirements, reduce program running time,
and maximize the performance of the model to be developed.
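The preprocessing steps listed above can be expressed as a scikit-learn `ColumnTransformer`. This is a minimal sketch on a six-row toy frame: the column names borrow attribute names mentioned in this paper plus a hypothetical `Protocol` and `label` column, and median imputation, standardization, and one-hot encoding are assumed choices rather than the study's exact procedure.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data; column names are hypothetical stand-ins for SDN flow attributes.
df = pd.DataFrame({
    "pktcount": [120, 45000, np.nan, 300, 52000, 150],
    "bytecount": [9000, 4.1e7, 2.2e4, np.nan, 4.8e7, 11000],
    "dur": [10, 30, 5, 12, 31, 9],
    "Protocol": ["TCP", "UDP", "ICMP", "TCP", "UDP", "ICMP"],
    "label": [0, 1, 0, 0, 1, 0],
})
df = df.drop_duplicates()  # cleaning: remove exact duplicate rows

numeric = ["pktcount", "bytecount", "dur"]
categorical = ["Protocol"]

preprocess = ColumnTransformer([
    # Fill missing numeric values with the median, then standardize.
    # (RobustScaler is an alternative when outliers are severe.)
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    # One-hot encode the protocol; ignore categories unseen during fitting.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

X = preprocess.fit_transform(df.drop(columns="label"))
X = X.toarray() if hasattr(X, "toarray") else X  # ensure a dense array
y = df["label"].to_numpy()
print(X.shape)  # (6, 6): 3 scaled numeric + 3 one-hot protocol columns
```

Fitting the transformer on the training split only, then applying it to the test split, avoids leaking test statistics into the model.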
3.3 Recursive Feature Elimination (RFE)
RFE is a feature selection method that identifies the most important subset of features, thereby
reducing data dimensionality. It relies on the results of a specific base algorithm, whose feature
weights or importance scores are used to rank the features and to determine how many should be
kept. RFE works iteratively, removing the least significant features in each round so that the model
focuses on the most informative ones, which can improve its performance. The detailed RFE procedure
for feature selection is illustrated in Figure 2.
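The RFE loop described above is available as scikit-learn's `RFE`. The sketch below assumes a random forest as the base estimator (this section does not name the estimator used) and selects 5 of 23 synthetic features, removing one feature per iteration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the 23-attribute dataset (assumption).
X, y = make_classification(n_samples=1000, n_features=23, n_informative=5,
                           n_redundant=2, random_state=0)

# Base estimator whose feature_importances_ drive the elimination (assumption).
estimator = RandomForestClassifier(n_estimators=50, random_state=0)

# Refit the estimator, drop the least important feature, repeat until 5 remain.
rfe = RFE(estimator=estimator, n_features_to_select=5, step=1)
rfe.fit(X, y)

print("selected:", rfe.get_support(indices=True))
print("ranking:", rfe.ranking_)  # 1 = kept; larger values were eliminated earlier
X_top5 = rfe.transform(X)
```

If the number of features to keep is not known in advance, `RFECV` chooses it by cross-validation.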
This study's results contribute to building an effective classification model focused on maximizing
accuracy. The confusion matrices in Figures 3 and 4 illustrate the performance of the classification
models, with particular emphasis on the K-Nearest Neighbors (KNN) model, for which the number of
neighbors was set to 5. Applying Recursive Feature Elimination (RFE) identified the five most
influential features: flows, bytecount, dur_nsec, dur, and pktcount, as shown in Figure 5. These
features were selected for their key role in improving the accuracy and efficiency of our model in
detecting and classifying DDoS attacks.
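A KNN classifier with 5 neighbors and its confusion matrix can be produced as sketched below. The five synthetic features only mirror the size of the RFE-selected set; they are not the real flows, bytecount, dur_nsec, dur, and pktcount values, and the 75/25 split is an assumption. Labels follow the dataset convention of 0 for benign and 1 for attack.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Five synthetic features standing in for the five selected attributes.
X, y = make_classification(n_samples=2000, n_features=5, n_informative=5,
                           n_redundant=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

# KNN is distance-based, so features are standardized before the neighbor search.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)

# Rows are true classes, columns predicted classes, in the order [0, 1].
cm = confusion_matrix(y_test, knn.predict(X_test), labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```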
[3] B. Nugraha and R. N. Murthy, “Deep Learning-based Slow DDoS Attack Detection in SDN-
based Networks,” 2020 IEEE Conference on Network Function Virtualization and Software
Defined Networks (NFV-SDN), Nov. 2020.
[4] S. Ejaz, Z. Iqbal, P. Azmat Shah, B. H. Bukhari, A. Ali, and F. Aadil, “Traffic Load
Balancing Using Software Defined Networking (SDN) Controller as Virtualized Network
Function,” IEEE Access, vol. 7, pp. 46646-46658, 2019.
[5] T. A. Tuan, H. V. Long, L. H. Son, R. Kumar, I. Priyadarshini, and N. T. K. Son,
“Performance evaluation of Botnet DDoS attack detection using machine learning,” Evol.
Intell., vol. 13, no. 2, pp. 283-294, Jun. 2020.
[6] A. Al-Nusirat, F. Hanandeh, M. Kharabsheh, M. Al-Ayyoub, and N. Al-Dhufairi, “Dynamic
Detection of Software Defects Using Supervised Learning Techniques,” International Journal
of Communication Networks and Information Security, vol. 11, no. 1, Apr. 2022.
[7] M. T. Kurniawan, S. Yazid, and Y. G. Sucahyo, “Comparison of Feature Selection Methods
for DDoS Attacks on Software Defined Networks using Filter-Based, Wrapper-Based
and Embedded-Based.” [Online]. Available: www.joiv.org/index.php/joiv
[8] N. Ahuja, G. Singal, D. Mukhopadhyay, and N. Kumar, “Automated DDOS attack detection
in software defined networking,” Journal of Network and Computer Applications, vol. 187,
Aug. 2021, doi: 10.1016/j.jnca.2021.103108.
[9] H. Polat, O. Polat, and A. Cetin, “Detecting DDoS attacks in software-defined networks
through feature selection methods and machine learning models,” Sustainability, vol. 12, no.
3, Feb. 2020, doi: 10.3390/su12031035.
[10] Z. O., C. Khin, and C. Kyaw, in The 17th International Conference on Electrical
Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON),
IEEE Thailand Section and Institute of Electrical and Electronics Engineers, 2020, doi:
10.1109/ECTI-CON49241.2020.9158230.
[11] K. M. Sudar, M. Beulah, P. Deepalakshmi, P. Nagaraj, and P. Chinnasamy, “Detection of
Distributed Denial of Service Attacks in SDN using Machine learning techniques,” in 2021
International Conference on Computer Communication and Informatics, ICCCI 2021,
Institute of Electrical and Electronics Engineers Inc., Jan. 2021. doi:
10.1109/ICCCI50826.2021.9402517.
[12] M. A. Setitra, I. Benkhaddra, Z. El Abidine Bensalem, and M. Fan, “Feature Modeling and
Dimensionality Reduction to Improve ML-Based DDOS Detection Systems in SDN
Environment,” 2022 19th International Computer Conference on Wavelet Active Media
Technology and Information Processing (ICCWAMTIP), 2022, doi:
10.1109/ICCWAMTIP56608.2022.10016507.
[13] J. A. Perez-Diaz, I. A. Valdovinos, K. K. R. Choo, and D. Zhu, “A Flexible SDN-Based
Architecture for Identifying and Mitigating Low-Rate DDoS Attacks Using Machine
Learning,” IEEE Access, vol. 8, pp. 155859-155872, 2020, doi:
10.1109/ACCESS.2020.3019330.
[14] H. Kamel and M. Z. Abdullah, “Distributed denial of service attacks detection for software
defined networks based on evolutionary decision tree model,” Bulletin of Electrical
Engineering and Informatics, vol. 11, no. 4, pp. 2322-2330, Aug. 2022, doi:
10.11591/eei.v11i4.3835.
[15] L. Tan, Y. Pan, J. Wu, J. Zhou, H. Jiang, and Y. Deng, “A New Framework for DDoS Attack
Detection and Defense in SDN Environment,” IEEE Access, vol. 8, pp. 161908-161919, 2020,
doi: 10.1109/ACCESS.2020.3021435.
[16] B. Mondal, C. Koner, M. Chakraborty, and S. Gupta, “Detection and Investigation of DDoS
Attacks in Network Traffic using Machine Learning Algorithms,” International Journal of
Innovative Technology and Exploring Engineering, vol. 11, no. 6, pp. 1-6, May 2022, doi:
10.35940/ijitee.F9862.0511622.
[17] V. Deepa, K. M. Sudar, and P. Deepalakshmi, “Design of Ensemble Learning Methods for
DDoS Detection in SDN Environment,” 2019 International Conference on Vision Towards