Topic: A Comparative Examination of Boosting Algorithms in Network Security Utilizing
XGBoost and AdaBoost
OLUGBEBI Muyiwa
Department of Mechanical Engineering, College of Engineering,
Ladoke Akintola University of Technology, Ogbomoso, Nigeria
E-Mail: [email protected]
Abstract
The increasing complexity and frequency of cyber-attacks necessitate robust Intrusion Detection
Systems (IDS) capable of identifying both known and unforeseen threats. This study examined
two prominent ensemble learning algorithms, AdaBoost and XGBoost, using the NSL-KDD
dataset. The research followed the CRISP-DM framework, including data preparation, the Chi-Square
method for feature selection, model training, and performance evaluation based on accuracy,
sensitivity, specificity, precision, and error rate. The findings indicate that XGBoost enhances
IDS performance more significantly than AdaBoost across the majority of evaluation metrics.
Keywords: Intrusion Detection System, Ensemble Learning, AdaBoost, XGBoost, NSL-KDD,
Cybersecurity, Machine Learning, Feature Selection, Chi-Square Test, CRISP-DM,
Classification, Network Security
1. Introduction
The rapid growth of Internet traffic, especially in wide area networks, makes it difficult for
network operators to comprehensively inspect every packet or flow that passes through their
networks. Meanwhile, the sophistication of computer attack methods has grown significantly,
rapidly increasing their capacity to subvert current anomaly detection systems [3].
Due to the Internet's rapid growth, securing network traffic has become a major concern for
computer networks, with attacks increasing daily. Among the most publicized forms of network
traffic attack is intrusion, and an intrusion detection system detects such intrusions to protect
information security. Traditional intrusion detection methods, which look for suspicious network
activity to uncover security problems, fail to meet this challenge [15]: they frequently raise false
positives, alerting when the network is in fact normal, or fail to identify malicious activity altogether.
Intrusion detection systems (IDS) monitor and identify computer breaches, making them
a common information system security measure. Many conventional systems and applications are
designed without considering the security needs of their environment, which is why intrusion
detection is so crucial to the defensive system: even a previously isolated, secure system may
lose its security once linked to the Internet. IDSs monitor for exploits of security weaknesses in
program architecture. A further issue is that shortcomings in information security and software
engineering practice allow cyber attackers to exploit early system design errors to compromise
system security. IDSs may be characterized by their detection method as anomaly-based or
misuse-based [15]. Anomaly detection seeks deviations from a profile's regular behavior to
uncover dangerous actions. Though better at recognizing new attack types, these IDSs fail to
minimize false positives (FP) [10]. Misuse detection, by contrast, distinguishes hazardous from
legitimate events using established patterns [11]; while such IDSs can recognize known attacks,
they cannot discriminate between unknown attacks and familiar ones.
Machine learning has demonstrated tremendous potential in several domains, including intrusion
detection systems. Learning-based approaches have the potential to enhance security applications
by enabling the training of models to effectively manage large volumes of dynamic and more
intricate data. The utilization of learning models in conjunction with firewalls can enhance
operational efficiency. An accurately trained model that encompasses several attack types would
enhance the identification of anomalies at a fair cost and complexity [4]. Recent years have
seen extensive study of Intrusion Detection System (IDS) models. Such efforts must classify
attacks and be applicable to multi-cloud environments. Traditionally, most studies in this field
have focused on anomaly detection rather than the classification of attacks [9]. As attackers
advance, novel risks and vulnerabilities arise, heightening the danger of damage to key
infrastructure. In response, intrusion detection systems have been enhanced to detect and
address emerging attacks, and the machine learning technology underlying misuse and anomaly
detection models has been advanced to improve the efficacy and detection rate of IDSs [8].
Intrusion detection systems examine network data at essential nodes to distinguish between
malicious and benign traffic. The anomaly-based Network Intrusion Detection System, built on
machine learning, is currently widespread, with numerous classification techniques used to
discern normal from anomalous traffic. Yet the approach is often deemed unviable due to
inadequate detection performance, for multiple reasons including traffic diversity and the
inefficiency of feature selection algorithms [13]. This research investigated enhancing intrusion
detection efficacy and detection time using a thorough testing environment. It offers a robust
system that identifies cyberattacks with high accuracy and efficiency by employing KBest for
feature selection and boosting approaches for intrusion detection and prediction on network
datasets. These principles are essential for efficient intrusion detection systems.
2. Literature Review
The area of Intrusion Detection Systems (IDS) is not a new subject of study, and a multitude of
studies have been undertaken. An investigation conducted by [14] focused on enhancing
anomaly detection by applying the grey wolf optimization (GWO) algorithm. The empirical
results indicate that the employed methodology exhibits a significant level of accuracy in
detecting Denial of Service (DoS) attacks (93.64%), Probe attacks (91.01%), Remote-to-Local
(R2L) attacks (57.72%), and User-to-Root (U2R) attacks (53.7%). Nevertheless, the study did not
investigate other types of intrusions.
In their study, [7] employed the Gated Recurrent Unit (GRU) framework to examine the KDD
Cup '99 dataset. To implement a feature-based filter, the authors utilized the Random Forest
Classifier; minimizing the loss function yields the ideal result. The bidirectional long short-term
memory recurrent neural network (BLSTM-RNN) was introduced by [1]. To convert the
categorical features into numerical values, the authors utilized feature normalization. An
inherent limitation of the proposed model is the absence of an analysis of the energy efficiency
of the IDS.
[12] created a decision tree-based IDS feature selection and grouping method. This research
revealed 95.03% detection accuracy. In contrast, [2] report that an improved relevance vector
machine algorithm enhances precision, recall, specificity, and accuracy.
The study undertaken by [1] offers a thorough examination of the characteristics of Intrusion
Detection Systems in Wireless Sensor Networks, encompassing elements such as hardware
design and detection techniques. Furthermore, the authors pinpoint noteworthy research domains
that have not been adequately explored. These limitations encompass the restricted accessibility
of data for authentication (e.g., via simulation or deployment), inadequate optimization of power
consumption, and the lack of universal attack detection.
Despite the extensive use of ensemble classifiers in Intrusion Detection Systems (IDS),
limited research provides a direct and thorough comparison of AdaBoost and XGBoost using a
unified pipeline (feature selection, normalization, and evaluation) on the NSL-KDD dataset
inside the CRISP-DM framework. Furthermore, most studies focus only on arbitrary
feature sets or classifier accuracy, thereby neglecting a reproducible and statistically robust
preprocessing and evaluation methodology.
3. Methodology
The design phase sought to address an issue delineated by the proposed system's specifications.
The experiment assessed the Intrusion Detection system utilizing KBest and a comparative
boosting algorithm to distinguish attack and non-attack categories. Two boosting algorithms,
Adaboost and XGBoost, were employed for a comparative assessment. The system was designed
utilizing the CRISP-DM paradigm, a versatile and adaptable approach for data mining activities.
The procedure encompassed data acquisition from the NSL-KDD dataset, data cleansing
and filtering, data normalization, application of the KBest filter algorithm, and training on the
reduced dataset using AdaBoost and XGBoost. The training set comprised 75% of the data, used to
learn the model's patterns, whereas the testing set assessed the efficacy of each classifier. The
results were subsequently subjected to statistical evaluation and comparison.
3.1 Data set acquisition
A total of 25,192 instances were filtered and collected from the NSL-KDD dataset,
encompassing four primary classes of attack plus the normal (non-attack) class, with a total of
41 attributes, for the experimental setup of this project.
Attacks on Datasets (Class Label)
The dataset is divided into non-attack and sub-attack categories as follows:
Table 3.1: Attack Labelling
Dataset Attributes
The dataset is grouped under the attributes shown in the tables below:
Table 3.2: Basic features of individual TCP connections.
Table 3.3: Content features within a connection suggested by domain knowledge.
Table 3.4: Traffic features computed using a two-second time window.
3.2 Data-set Filtering
Data preprocessing for modelling incorporates the elimination of errors and outliers. At this
stage, any inconsistent data was filtered to enhance dataset operation and facilitate result
optimization. Furthermore, categorical variables were transformed into numerical variables.
3.2.1 Data Normalization
This research uses the Min-Max method, a normalization technique used in machine learning
data preparation, to linearly rescale each feature to the [0, 1] range (unlike z-score
normalization, which standardizes to zero mean and unit variance). The method adjusts numeric
column values without distorting their relative ordering or losing information, ensuring
uniformity in data preparation.
m = (x -xmin) / (xmax -xmin) (3.1)
Where:
m is our new value
x is the original cell value
xmin is the minimum value of the column
xmax is the maximum value of the column
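As an illustration, the rescaling in Eq. (3.1) can be sketched in Python as follows; the sample values are hypothetical and not drawn from the NSL-KDD data.

```python
import numpy as np

def min_max_scale(column):
    """Rescale a numeric column to [0, 1] using Eq. 3.1: m = (x - xmin) / (xmax - xmin)."""
    x = np.asarray(column, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:            # constant column: avoid division by zero
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

# Hypothetical feature column (e.g. a 'duration'-like attribute)
duration = [0, 5, 10, 20]
scaled = min_max_scale(duration)
print(scaled)  # minimum maps to 0.0, maximum to 1.0
```

Note that a constant column is mapped to zeros rather than dividing by zero; in practice such a column carries no information and would be dropped during filtering.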
3.3 Feature Selection (Kbest Method (Chi-Square))
The chi-square test is a statistical method used to assess the independence of two variables. It
bears resemblance to the coefficient of determination, R². The chi-square test is exclusively
applicable to category or nominal data. The chi-square statistic was computed for each feature
variable in relation to the target variable, revealing a link between the variables and the target. If
the target variable is independent of the feature variable, the feature variable was eliminated. If
they are dependent, the feature variable has been picked.
Using the formula of the Chi-Square test:
x² = Σ_{k=1}^{d} (O_k − E_k)² / E_k (3.2)
where O_k is the observed frequency, E_k is the expected frequency, and d is the number of categories.
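A minimal sketch of this KBest/Chi-Square selection using scikit-learn's SelectKBest is shown below; the feature matrix, labels, and the choice of k = 10 are illustrative assumptions, not the study's actual configuration.

```python
# Sketch of Chi-Square feature selection with scikit-learn's SelectKBest.
# X and y are synthetic placeholders, not the actual NSL-KDD data; chi2
# requires non-negative inputs, which min-max scaling guarantees.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.random((100, 41))          # 41 features, as in the NSL-KDD attribute set
y = rng.integers(0, 5, size=100)   # 5 classes: normal (0) plus four attack types

selector = SelectKBest(score_func=chi2, k=10)  # keep the 10 highest-scoring features
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)                      # reduced feature matrix
print(selector.get_support(indices=True))   # index positions of selected features
```

`selector.scores_` exposes the per-feature chi-square statistics, matching the score/index listing described in Section 4.3.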
3.5 Feature Classification
The classification framework for anomaly detection in computer networks is constructed using
an optimal feature selection method and a comparative ensemble learning strategy.
3.5.1 Adaboost Algorithm
AdaBoost is a technique that combines weak classifiers to create a robust classifier, focusing on
misclassified instances by adjusting their weights in the training dataset.
Given:
1. A training dataset {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)} where:
o x_i represents the i-th sample's feature vector.
o y_i is the i-th sample's class label, usually y_i ∈ {−1, +1}.
2. A weak learning algorithm that generates a classifier h_t(x) for each round t with accuracy
better than random guessing.
Steps:
1. Initialize Weights:
o Set the weight for each training sample initially to be equal:
w_i^(1) = 1/N, ∀ i = 1, 2, …, N (3.4)
where N is the number of training samples.
2. For each boosting round t = 1, 2, …, T:
o Train a Weak Classifier:
Train the weak classifier h_t(x) using the current weights w_i^(t) on the
training data.
o Calculate the Weighted Error ε_t of h_t(x):
ε_t = Σ_{i=1}^{N} w_i^(t) · 1(h_t(x_i) ≠ y_i) (3.7)
where 1(h_t(x_i) ≠ y_i) is an indicator function that equals 1 if h_t(x_i) ≠ y_i (i.e., the
classifier misclassifies x_i), and 0 otherwise.
o Calculate the Classifier Weight α_t:
α_t = (1/2) ln((1 − ε_t) / ε_t) (3.8)
This weight reflects the importance of classifier h_t; lower errors result in higher
values of α_t.
o Update Weights for Each Sample:
Update the weights to increase the importance of misclassified samples:
w_i^(t+1) = w_i^(t) · exp(−α_t y_i h_t(x_i)) (3.9)
Normalize the weights to ensure their total equals 1:
w_i^(t+1) = w_i^(t+1) / Σ_{j=1}^{N} w_j^(t+1) (3.10)
This normalization ensures that the weights remain a valid probability
distribution.
3. Final Strong Classifier:
o The final classifier H(x) is a weighted majority vote of the T weak classifiers:
H(x) = sign(Σ_{t=1}^{T} α_t · h_t(x)) (3.11)
o Here, H(x) takes the sign of the weighted sum of the weak classifiers' outputs.
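The steps above can be sketched from scratch as follows, using depth-1 decision stumps as the weak learners; the toy one-dimensional dataset and the choice of T = 5 are illustrative assumptions, not the study's configuration.

```python
# From-scratch sketch of the AdaBoost loop, with decision stumps as weak learners.
import numpy as np

def stump_predict(X, threshold, polarity):
    """Weak classifier h(x): +1 on one side of the threshold, -1 on the other."""
    return polarity * np.where(X[:, 0] < threshold, 1, -1)

def fit_stump(X, y, w):
    """Pick the (threshold, polarity) pair with the lowest weighted error (Eq. 3.7)."""
    best = (None, None, np.inf)
    for threshold in np.unique(X[:, 0]):
        for polarity in (1, -1):
            err = np.sum(w * (stump_predict(X, threshold, polarity) != y))
            if err < best[2]:
                best = (threshold, polarity, err)
    return best

def adaboost_fit(X, y, T=10):
    N = len(y)
    w = np.full(N, 1.0 / N)                      # Eq. 3.4: uniform initial weights
    ensemble = []
    for _ in range(T):
        thr, pol, eps = fit_stump(X, y, w)
        eps = max(eps, 1e-10)                    # guard against log(0)
        alpha = 0.5 * np.log((1 - eps) / eps)    # Eq. 3.8: classifier weight
        h = stump_predict(X, thr, pol)
        w = w * np.exp(-alpha * y * h)           # Eq. 3.9: boost misclassified samples
        w = w / w.sum()                          # Eq. 3.10: renormalize
        ensemble.append((alpha, thr, pol))
    return ensemble

def adaboost_predict(X, ensemble):
    # Eq. 3.11: H(x) = sign(sum_t alpha_t * h_t(x))
    scores = sum(a * stump_predict(X, thr, pol) for a, thr, pol in ensemble)
    return np.sign(scores)

# Toy separable data: x < 5 is class +1, otherwise -1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.where(X[:, 0] < 5, 1, -1)
model = adaboost_fit(X, y, T=5)
print((adaboost_predict(X, model) == y).mean())  # 1.0 (training accuracy)
```

In practice one would use a library implementation such as scikit-learn's AdaBoostClassifier; this sketch only makes the weight-update mechanics of Eqs. (3.4)-(3.11) concrete.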
3.5.2 XGBoost Algorithm
Extreme Gradient Boosting (XGBoost) is a fast and accurate gradient boosting approach.
XGBoost optimizes a regularized objective function and employs sophisticated regularization
techniques to prevent overfitting. The mathematical formulation of the XGBoost algorithm is
presented as follows:
Objective Function
XGBoost minimizes the following objective function over T trees:
L = Σ_{i=1}^{n} l(y_i, ŷ_i^(T)) + Σ_{t=1}^{T} Ω(f_t) (3.12)
Where:
n represents the number of samples.
y_i is the true label for the i-th sample.
ŷ_i^(T) is the prediction for the i-th sample after T trees.
l(y_i, ŷ_i^(T)) is the loss function (e.g., mean squared error for regression or logarithmic loss
for classification).
Ω(f_t) is the regularization term, which controls the complexity of each tree f_t.
Regularization Term
The regularization term controls how complex each tree f_t may be:
Ω(f_t) = γJ + (λ/2) Σ_{j=1}^{J} w_j² (3.13)
where:
J represents the number of leaf nodes in the tree.
w_j is the weight of the j-th leaf.
γ is a regularization parameter that governs the complexity penalty on the number of leaves.
λ is a regularization parameter that mitigates overfitting by penalizing large leaf weights.
Additive Model
XGBoost constructs trees sequentially, with each new tree f_t added to the existing model to
improve predictions:
ŷ_i^(t) = ŷ_i^(t−1) + f_t(x_i) (3.14)
where:
ŷ_i^(t−1) is the prediction for sample i after t − 1 trees.
f_t(x_i) represents the t-th tree's prediction for the i-th sample.
Taylor Expansion for Approximate Loss
To optimize the objective, XGBoost employs a second-order Taylor expansion of the loss
function around ŷ_i^(t−1):
L^(t) ≈ Σ_{i=1}^{n} [ l(y_i, ŷ_i^(t−1)) + g_i f_t(x_i) + (1/2) h_i f_t(x_i)² ] + Ω(f_t) (3.15)
where:
g_i = ∂l(y_i, ŷ_i^(t−1)) / ∂ŷ_i^(t−1) is the first derivative (gradient) of the loss with respect to
the prediction.
h_i = ∂²l(y_i, ŷ_i^(t−1)) / ∂(ŷ_i^(t−1))² is the second derivative (Hessian) of the loss.
Tree Structure Score
To build each tree, XGBoost evaluates the quality of a tree structure that partitions the samples
into leaf regions R_j:
L̃^(t) = −(1/2) Σ_{j=1}^{J} (Σ_{i∈R_j} g_i)² / (Σ_{i∈R_j} h_i + λ) + γJ (3.16)
where:
R_j is the set of samples in leaf j.
Σ_{i∈R_j} g_i is the sum of gradients in leaf j.
Σ_{i∈R_j} h_i is the sum of Hessians in leaf j.
The algorithm chooses splits that maximize the gain (reduction in this objective) to grow the tree.
Final Prediction
After all T trees are built, the final prediction for a sample x is:
ŷ(x) = Σ_{t=1}^{T} f_t(x) (3.17)
Each f_t(x) represents the prediction of the t-th tree.
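As a worked illustration of Eq. (3.16) and the optimal leaf weight w_j* = −Σg / (Σh + λ), the following sketch evaluates one hypothetical split under squared-error loss (where g_i = ŷ_i − y_i and h_i = 1); all values are illustrative, not taken from the study.

```python
# Numeric sketch of the tree-structure score (Eq. 3.16) for squared-error loss.
import numpy as np

def leaf_weight(g, h, lam):
    """Optimal leaf weight w_j* = -sum(g) / (sum(h) + lambda)."""
    return -np.sum(g) / (np.sum(h) + lam)

def structure_score(leaves, lam, gamma):
    """Eq. 3.16: -1/2 * sum_j (sum g)^2 / (sum h + lambda) + gamma * J."""
    score = sum((np.sum(g) ** 2) / (np.sum(h) + lam) for g, h in leaves)
    return -0.5 * score + gamma * len(leaves)

# Current predictions are all 0, so g = yhat - y = -y and h = 1 per sample
y = np.array([1.0, 1.0, -1.0, -1.0])
g = 0.0 - y
h = np.ones_like(y)
lam, gamma = 1.0, 0.1

# One candidate split: samples {0, 1} in the first leaf, {2, 3} in the second
leaves = [(g[:2], h[:2]), (g[2:], h[2:])]
print(structure_score(leaves, lam, gamma))  # lower is better
print(leaf_weight(g[:2], h[:2], lam))       # weight assigned to the first leaf
```

Comparing this score against that of the unsplit tree gives the gain XGBoost uses to decide whether the split is worthwhile.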
3.6 Performance Evaluation
This study conducts a comprehensive analysis of two prominent and efficacious boosting
algorithms, AdaBoost and XGBoost. The evaluation metrics encapsulate the outcomes of
training each classifier. Testing was conducted using TP, FP, TN, FN, classification accuracy,
sensitivity, error rate, and specificity.
3.6.1 Classification Accuracy
Accuracy = Number of accurate predictions / Total number of predictions
Classification Accuracy (%) = (TP + TN) / (TP + TN + FP + FN) × 100 (3.18)
3.6.2 Error Rate
The error rate is the ratio of erroneous test set predictions to overall predictions.
Error Rate = Incorrect Predictions / Total Predictions (3.19)
3.6.3 Sensitivity
Sensitivity is the proportion of actual positive entries in the dataset that are correctly identified
as positive.
Sensitivity = TP / (TP + FN) (3.20)
3.6.4 Specificity
Specificity is defined as the ratio of true negatives to the total number of actual negatives.
Specificity = TN / (TN + FP) (3.21)
3.6.5 Precision
Precision is a statistical metric that measures the proportion of true positive predictions to the
total number of positive predictions, including both true positives and false positives.
Precision = TP / (TP + FP) (3.22)
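The metrics in Eqs. (3.18)-(3.22) can be computed directly from the four confusion-matrix counts, as sketched below; the counts are hypothetical, not the study's results.

```python
# Helper computing the evaluation metrics of Eqs. 3.18-3.22 from confusion-matrix counts.
def evaluate(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "accuracy":    (tp + tn) / total,   # Eq. 3.18 (as a fraction)
        "error_rate":  (fp + fn) / total,   # Eq. 3.19
        "sensitivity": tp / (tp + fn),      # Eq. 3.20
        "specificity": tn / (tn + fp),      # Eq. 3.21
        "precision":   tp / (tp + fp),      # Eq. 3.22
    }

# Hypothetical counts for illustration only
metrics = evaluate(tp=90, tn=85, fp=5, fn=10)
print(metrics["accuracy"])     # (90 + 85) / 190
print(metrics["sensitivity"])  # 90 / 100
```

Note that accuracy and error rate are complementary: they always sum to 1.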
4. Results Analysis
This section presents the results obtained at each stage of the analysis.
4.1 Explorative Data Analysis
Exploratory data analysis helps understand a dataset better and simplifies normalization and
filtering methods for optimal classification performance. Each element was analysed to generate
summary statistics including mean, standard deviation, count, minimum, and maximum values,
as well as interquartile ranges and bounds. Descriptive statistics are presented in Figure 4.1.
Figure 4.1: Summary Statistics
4.2 Preparation of data for the multiclass dataset
The dataset is categorized into classes 0-4, with the non-attack (normal) class labelled 0 and the
four attack classes labelled 1 to 4. A pie chart in Figure 4.2 illustrates the distribution of these
groups.
Figure 4.2: Normal and atypical class pie charts.
4.3 Feature Engineering Selection.
Feature engineering is a systematic method for selecting key features with high prediction
accuracy for response variables. Both datasets, including numerous classes, underwent feature
engineering, with the Selected Features Index displaying feature names, scores, and index
positions.
Table 4.1: Multi-Class Selected Features by Chi-Square
Figure 4.3: Chi-Square Selected Features
4.4 Data Scaling
The StandardScaler command transforms the values in a dataset to have a mean of 0 and a
standard deviation of 1. Figure 4.4 shows an example of the scaled intrusion dataset.
Figure 4.4: Normalized dataset.
4.5 Model Creation
To ensure a balanced distribution of 75% training data and 25% testing data, the data was
partitioned into a training set and a testing set during the model's construction. The test size of
0.25 specifies the fraction of the data held out for testing.
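A minimal sketch of this scaling and 75/25 split with scikit-learn follows; the synthetic feature matrix and the choice of 10 selected features are assumptions for illustration, not the study's actual data.

```python
# Sketch of standard scaling and the 75/25 train-test split described above.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.random((25192, 10))            # 25,192 instances, 10 selected features (stand-in)
y = rng.integers(0, 5, size=25192)     # classes 0 (normal) and 1-4 (attacks)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

scaler = StandardScaler().fit(X_train)     # fit on training data only, to avoid leakage
X_train = scaler.transform(X_train)        # mean 0, std 1 per feature
X_test = scaler.transform(X_test)

print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training partition only, then applying it to the test partition, keeps the evaluation free of information leakage.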
4.6 Experimental Results Evaluation.
The performance of the two boosting algorithms, XGBoost and AdaBoost, on the multi-class
dataset is systematically evaluated in Table 4.2. The models' accuracy, sensitivity, specificity
and error rate were calculated to ascertain the classification-rate evaluation parameters.
Table 4.2: Parameters for Comparative Evaluation of the XGBoost and AdaBoost Algorithms
(Multi-Class)
Figure 4.5 Comparative Classification Accuracy
4.6.2 Sensitivity and Specificity
When applied to the multi-class dataset, the XGBoost algorithm achieved a true positive rate
(sensitivity) of 0.9986 and a true negative rate (specificity) of 0.9995, surpassing AdaBoost's
rates of 0.9448 and 0.6745. Sensitivity is computed as the number of true positive predictions
divided by the total number of actual positives.
Figure 4.6 Comparative Sensitivity and Specificity Chart
4.6.3 Error rate
With recorded error rates of 0.0007 for XGBoost and 0.1098 for AdaBoost on the multi-class
dataset, XGBoost exhibits the lower error rate, indicating fewer misclassifications.
Figure 4.7 Comparative Error Rate
4.6.4 Precision and F1 Score
The F1 Score, determined by the harmonic mean of precision and recall, balances performance
across false positives and false negatives. Precision evaluates a model's accuracy in identifying
positive occurrences while minimizing misclassification of negative examples. XGBoost showed
superior performance in accurately recognizing positive cases and reducing incorrect
classification of negative examples.
Figure 4.8: Comparative Precision and F1 Score
System Flow Chart
The system framework shows the sequence of the processes of the developed model.
Fig 3.2 System Flow Chart
5. Conclusion
This study seeks to improve the effectiveness of intrusion detection systems (IDS) in
safeguarding network security by utilizing machine learning techniques. The increasing
frequency of network attacks requires fortified defenses and the use of advanced strategies. The
CRISP-DM framework is employed, utilizing a multi-class dataset derived from the NSL KDD
Cup Dataset. The KBest method is used to select features and Adaboost and XGBoost are used
for boosting. Empirical evidence demonstrates that XGBoost surpasses AdaBoost in both
accuracy and sensitivity, highlighting the importance of algorithmic choice in achieving high
detection rates. The study suggests further exploration and refinement of intrusion detection
techniques, incorporating diverse machine learning models and datasets. Real-world
implementations and considerations of scalability and adaptability are crucial for translating
findings into practical solutions. The integration of sophisticated machine learning algorithms,
meticulous feature selection, and resilient techniques presents a potential approach for
strengthening network defenses and safeguarding the integrity and security of essential systems.
References
1. Abduvaliyev A., Pathan A.S.K., Zhou J., Roman R., Wong W.C. (2021). On the vital
areas of intrusion detection systems in wireless sensor networks. IEEE Commun. Surv.
Tutor.;15:1223–1237. doi: 10.1109/SURV.2012.121912.00006
2. E. M. Roopa Devi and R. C. Suganthe, (2019)“Improved Relevance Vector Machine
(IRVM) classifier for Intrusion Detection System,” Soft Comput., vol. 23, no. 19, pp.
9111–9119. doi: 10.1007/s00500-018-3621-z.
3. Gill, P. and Corner, E. (2015). “Lone-Actor Terrorist Use of the Internet and Behavioural
Correlates”, in Terrorism Online: Politics, Law, Technology and Unconventional
Violence, L. Jarvis, S. Macdonald and T. Chen (eds.). London: Routledge.
4. Jin Kim, Nara Shin, Jo, S. Y., & Sang Hyun Kim. (2017). Method of intrusion detection
using deep neural network. 2017 IEEE International Conference on Big Data and Smart
Computing (BigComp). doi:10.1109/bigcomp.2017.7881684
5. Joldzic, O., Djuric, Z., Vuletic, P., (2016). A transparent and scalable anomaly-based dos
detection method. Computer Networks 104, 27– 42. doi:10.1016/j.comnet.2016.05.004.
6. Li, L., Das, S., Hansman, R. J., Palacios, R., & Srivastava, A. N. (2015). Analysis of
flight data using clustering techniques for detecting abnormal operations. Journal of
Aerospace Information Systems, 12(9), 587-598. https://doi.org/10.2514/1.I010329
7. M. K. Putchala, (2017). "Deep learning approach for intrusion detection system (IDS) in
the internet of things (IoT) network using gated recurrent neural networks (GRU)."
8. Mishra, P., Varadharajan, V., Tupakula, U., Pilli, E.S., (2018). A detailed investigation
and analysis of using machine learning techniques for intrusion detection. IEEE
Communications Surveys & Tutorials
9. Narudin FA, Feizollah A, Anuar NB, Gani A (2016) Evaluation of machine learning
classifers for mobile malware detection. Soft Comput 20:343–357.
https://doi.org/10.1007/s00500-014-1511-6
10. Papamartzivanos, D., Mármol, F.G., Kambourakis, G., (2018). Dendron: Genetic trees
driven rule induction for network intrusion detection systems. Future Generation
Computer Systems 79, 558–574. doi:10.1016/j.future.2017.09.056.
11. Peng, J., Choo, K.-K. R., & Ashman, H. (2016). User profiling in intrusion detection: A
review. Journal of Network and Computer Applications, 72, 14–
27. doi:10.1016/j.jnca.2016.06.012
12. S. Mohammadi, H. Mirvaziri, M. Ghazizadeh-Ahsaee, and H. Karimipour, “Cyber
intrusion detection by combined feature selection algorithm,” J. Inf. Secur. Appl., vol. 44,
pp. 80–88, 2019, doi: 10.1016/j.jisa.2018.11.007.
13. Saini O, Sharma S. (2020). A review on dimension reduction techniques in data mining.
Comput Eng Intell Syst;9(1):7–14.
14. T. A. Alamiedy, M. Anbar, Z. N. M. Alqattan, and Q. M. Alzubi, “Anomaly-based
intrusion detection system using multi-objective grey wolf optimisation algorithm,” J.
Ambient Intell. Humaniz. Comput., vol. 11, 2019, doi: 10.1007/s12652-019-01569-8
15. Gu, Y., Li, K., Guo, Z., et al. (2019). "Semi-Supervised K-Means DDoS Detection
Method Using Hybrid Feature Selection Algorithm," IEEE Access, vol. 7,
pp. 64351-64365. doi: 10.1109/ACCESS.2019.2917532.