
Obfuscated Memory Malware Detection

Sharmila S P [1,2], Aruna Tiwari [1], Narendra S Chaudhari[1]


[1] Computer Science and Engineering, Indian Institute of Technology Indore, Madhya Pradesh, India
{phd2201101012, artiwari, nsc}@iiti.ac.in
[2] Information Science and Engineering, Siddaganga Institute of Technology, Tumakuru, Karnataka, India
{[email protected]}

Presented in 12th IEEE International Conference on Cloud Computing for Emerging Markets-2023

Abstract—Securing information is critical in the current era of smart, connected devices, where a day without the internet is hard to imagine. Fast and inexpensive internet access has made communication easy not only for legitimate users but also for cybercriminals, who mount attacks of many kinds to breach privacy and security. Cybercriminals gain illegal access and violate users' privacy to harm them in multiple ways, and malware is one of the tools they use to carry out this malicious intent. Malware developers also exploit advances in AI to cause social harm. In this work, we show how Artificial Intelligence and Machine Learning can be used to detect and mitigate cyber-attacks induced by malware, specifically obfuscated malware. We conducted experiments using memory feature engineering on memory analysis of malware samples. Binary classification can identify whether a given sample is malware or not, but only identifying the type of malware indicates which step to take next to stop it from proceeding further. Hence, we propose a multi-class classification model that detects three types of obfuscated malware with an accuracy of 89.07% using the classic Random Forest algorithm. To the best of our knowledge, very little work has been done on classifying multiple obfuscated malware types with a single model. We also compared our model with a few state-of-the-art models and found it comparatively better.

Keywords—Memory feature engineering, Random Forest, Cyber-Attack, Memory Analysis, Multi-class Classification

I. INTRODUCTION

Rapid development in digital technology and the replacement of paper documents with e-documents have caused a rise in the number of cyber attacks [1] every day. Hackers disrupt users' daily activities with multiple types of attacks, ranging from observing user activity to intervening in the network and disrupting the entire working of the system.

Malware is a significant form of cyber-attack [2]. The word malware is a portmanteau of 'malicious' and 'software': it is software written with malicious intent. Antiviruses [3] are built to detect and remove malware using the existing signatures of known malware. However, they lack accuracy in detecting new and unknown malware whose signatures are not in the antivirus database. Such unknown malware is coded to change its form and behavior so that it goes undetected by antiviruses and sandboxes. Obfuscated malware is metamorphic malware that can hide itself from detection. Although malware detection has been researched extensively since 1980, major challenges remain in this field: detecting unknown malware, optimizing the detection rate, detecting obfuscated and evasive malware, predicting malware before the attack, identifying the path of malware propagation, mitigating the flow of malware, recovering from malware infection, and many more.

Our contributions in this paper are:
1. Implementing a multi-class classification model that identifies multiple types of obfuscated malware, so that the proper course of action can be chosen for mitigation.
2. Showing that a Random Forest classifier demonstrates impressive accuracy for selecting important features and for both binary and multi-class classification.
3. Comparing the proposed model with the existing dilated CNN model intended for detecting obfuscated malware.

The rest of the paper is organized as follows. Section II gives a brief introduction to obfuscated malware, Section III discusses background and related work, and Sections IV and V present the proposed methodology and implementation details. Section VI discusses the results, and Section VII provides concluding remarks, followed by references.

II. OBFUSCATED MALWARE

Obfuscation is a software engineering strategy for concealing the internal structure and functionality of software. Malware developers use these techniques to alter malware features and behavior so that the malware stays hidden from malware detection systems.

A. Obfuscation methods

Ilsun You [4] classifies malware into encrypted, oligomorphic, polymorphic, and metamorphic malware. Encrypted malware consists of an encryptor and a decryptor. It evades detection by encrypting itself with different keys during infection, thereby generating different signatures and confusing the antivirus scanner or any ML-based detector. Meanwhile, the decryptor recovers the main body of the malware when the infected file is run. Because the decryptor can be mutated from one generation to the next, oligomorphic malware with multiple decryptors was devised. To complicate detection further, polymorphic malware with an infinite number of decryptors was coded. Dead code insertion and other techniques, such as the use of mutation engines, were employed to generate such malware. Advanced malware is metamorphic, using auto-mutation techniques to evolve as it propagates through a network.

According to S. Schrittwieser et al., there are three software obfuscation techniques in the context of protecting software: data obfuscation, static code rewriting, and dynamic code rewriting [5]. In data obfuscation the program data
is split or merged into several blocks, thus preventing the attacker from evading the software. Dynamic code rewriting makes use of packers and encryptors to alter the program behavior at runtime. SubVirt [6] is one such tool used for code virtualization. Unlike dynamic code rewriting, static code rewriting transforms code during compilation with semantic replacement and substitutions. Injecting dead code and rearranging the basic blocks of the control flow mislead the reverse engineering of software.

Lichen Jia [7] identifies three types of obfuscation methods used by malware, viz. binary, source code level, and packed obfuscation methods. Adversarial examples were developed using these obfuscation methods to evaluate learning-based malware detection systems (LB-MDS). The accuracy of LB-MDS decreases with the frequency with which each obfuscation method is used in its corresponding obfuscation space.

B. Dataset Description

Detection of malicious processes and programs is revitalized by applying memory engineering and forensic analysis to capture vital characteristics and behaviors hidden in obfuscated malware. The Canadian Institute for Cybersecurity at the University of New Brunswick assembled the CIC MalMem 2022 dataset using memory feature engineering [8]. This malware dataset is composed of features extracted through memory analysis of memory dumps taken in debug mode. It is an up-to-date and balanced dataset consisting of 2916 benign samples, 986 Ransomware samples, 982 Spyware samples, and 948 Trojan Horse samples. Each family of malware has 5 subfamilies of data samples. Because it is balanced, the dataset is very useful for our research.

C. Detection of Obfuscated Malware

Extreme Learning Machines (ELM) [9] are employed to detect obfuscated malware using the CIC MalMem 2022 dataset. Accuracy and the geometric mean of sensitivity and specificity are the metrics used for evaluation. The authors worked with standard, regularized, and unbalanced ELM methods for binary and multiclass classification of obfuscated malware. By measuring training and testing time against the number of neurons and other metrics, they show that accuracy increases with the number of neurons, reaching a maximum accuracy above 90% for binary classification but not for multiclass classification.

A dilated CNN model is employed in the classification of obfuscated malware [10]. Its architecture consists of 4 blocks with two convolutional layers, a dropout layer, and a batch normalization layer. For binary classification, a sigmoid activation function and binary cross-entropy loss function are used. For classifying multiple malware types, a one-hot encoder and Softmax activation function are used. A focal loss function is applied to deal with dataset imbalance. The authors achieved 99.92% accuracy with the Adam optimizer over 100 epochs, but only 81.83% accuracy in classifying multiple malware types even with 500 epochs.

Another similar experiment was conducted for the detection of obfuscated malware using an Artificial Neural Network [11]. The network has three hidden layers, with ReLU activation for the hidden layers and Softmax for the output layer; the number of nodes is 64 in the first and second hidden layers and doubles to 128 in the third. With a batch size of 1024 and 100 epochs, this model identified malware with 99.72% accuracy. As the number of epochs and the training time increased, the loss decreased and tended almost to zero. No attempt was made to classify multiple families of malware in this work.

With a similar dataset, the Random Forest algorithm is used to detect obfuscated malware in the cloud environment [12], preceded by nature-inspired optimization techniques for feature selection, viz. the Cuckoo Search Algorithm (CSA), wrapper-based Binary Bat Algorithm (BBA), Particle Swarm Optimization (PSO), and Mayfly Algorithm (MA). These algorithms reduce the selected feature set while improving the classification accuracy. The model achieved an accuracy of 99.99% with {MA, RF}, 99.91% with {PSO, KNN}, and 99.10% with {PSO, SVM} for binary classification. Though PSO is excellent for feature selection, detection of multiple malware types is not addressed.

III. BACKGROUND AND RELATED WORK

A. Role of AI in generating Obfuscated Malware

AI techniques are employed in [13] to prepare obfuscated malware by inserting NOP instructions via deep reinforcement learning. It is apparent that machine learning models used in malware detection systems can be fooled by adversarial examples. A Convolutional Neural Network is implemented to insert dead code at optimal positions, so that the resulting executable is mislabeled by the machine learning classifier.

In [14], obfuscated malware is generated using adversarial deep reinforcement learning, employing an efficient action control strategy for generating new malware to defend against LB-MDS. It has been experimentally shown that 67% of the malware generated by this model is effective in escaping detection. The new metamorphic malware generated by this model has a higher misclassification rate and an enhanced evasion probability.

Prominent API features of 11 families of malware are extracted from the Cuckoo sandbox in [15]. To represent the extracted features, a feature extraction algorithm and procedures for feature reduction and representation are proposed. KNN, RF, and DT multiclass classifiers are used to classify the 11 families of malware with a high training accuracy of 95.7%, but the testing accuracy is not highlighted. Although extracting dynamic features from API call traces is time-consuming, once that cost is overcome, API call sequence analysis serves as a major feature of analysis for obfuscated malware detection.

B. Random Forest

Random Forest [16] is a versatile supervised machine learning algorithm for both classification and regression tasks. Its power comes from growing multiple decision trees and aggregating their results for better decision-making. It gives accurate and stable results for various complex problems, ranging from image classification and image segmentation to cancer cell detection. It is also adaptable enough to extend

its application to multidimensional problems. Various versions of Random Forest, such as Multinomial Random Forest, Oblique Random Forest, and Random Credal Random Forests, have been successfully implemented and are available in Python libraries.

C. Target Multiple malware

We intend to detect three types of obfuscated malware with our trained model, viz. ransomware, trojan, and spyware. Ransomware is a type of malware that encrypts the files on the disk to demand a ransom from the victim, without any guarantee that paying the ransom will restore access. Problems pertaining to ransomware are growing rapidly because of the obfuscation techniques adopted by malware developers. Spyware is a type of malware that performs passive attacks by recording user behavior and activities and transferring them to third-party networks, which is more dangerous than active attacks. A trojan is another type of malware that executes malicious activities in the background while disguised as a harmless program. A Trojan Horse named 'Animal' appeared in 1974; it executed without authorization to copy itself to every directory in a user system, and such malware can execute endless activities in the background.

IV. PROPOSED METHODOLOGY

In this work, we propose a machine learning model, derived from experimental analysis, to detect obfuscated malware and to identify its class. We use the CIC MalMem 2022 dataset described in Section II B to identify the class of a new malware sample as spyware, trojan, or ransomware. We evaluated State of the Art (SoTA) binary and multi-class classifiers on the CIC MalMem 2022 dataset to derive metrics for comparison and further analysis.

Fig. 1. Actual Workflow of the Model

We conducted two experiments with SoTA machine learning models, one for binary classification and one for multi-class classification. In binary classification, a given sample is identified as malware or non-malware. From our literature survey, an enormous amount of work has been done on such classification. However, identifying a new sample as malware or benign does not provide proper insight into the specific type of malware attack or the course of action needed to mitigate the propagation of such malware. Because malware has many variants, such as ransomware, spyware, trojans, backdoors, rootkits, and viruses, identifying the type of malware helps in choosing a suitable action to stop it and to recover from its adverse effects. This in turn aids in mitigating the progress of the malware, in recovering from the loss it causes in a system or a network, and in identifying the source of the attack.

In the first experiment, on binary classification, the SoTA classifiers considered are Logistic Regression, Naïve Bayes, Linear SVM, Decision Tree, and Random Forest. Similarly, the SoTA models considered for multi-class classification are Naïve Bayes, Decision Tree, Random Forest, Gradient Boosting, and K-Nearest Neighbor. With this intersection, SoTA models can also be studied for their application in such problems.

Binary classifier Implemented    Multi-class classifier Implemented
Logistic Regression              Naïve Bayes Classification
Linear SVM classification        Decision Tree
Naïve Bayes Classification       Random Forest
Decision Tree                    Gradient Boosting
Random Forest                    K-Nearest Neighbor

Table 1: List of SoTA Classifier models considered.

Using up-to-date datasets for cyber security models plays a major role in enhancing performance; hence we use the CIC MalMem 2022 dataset, which has 58,596 samples. Of these, 80% are taken for training and the remaining 20% are reserved for testing. A baseline of SoTA algorithms is implemented in Python using the scikit-learn toolkit. Hyperparameters are chosen by performing rigorous random searches; the optimal hyperparameters were finalized after several hundred iterations. Before the actual experiment begins, feature engineering is performed to select the relevant features.

Cleansing the data is the initial step in feature engineering, in which specified row and column labels are dropped. When dealing with multi-index labels on different levels, this is done by specifying the corresponding axis, index, column names, or levels, so that specific labels are removed from rows and columns without affecting the original data frame unless required. Categorical variables are then converted to indicator variables for a more powerful representation in statistical modeling for machine learning. To handle categorical variables, one-hot encoding is employed, which provides options for controlling prefixes and suffixes and for handling missing values. When the data distribution is not Gaussian, keeping the values within a common range ensures that the features contribute equally to the data analysis. The MinMax scaler is used to transform the data by scaling the features to a given range without affecting the shape of the original data distribution. Finally, splitting the dataset 80:20 completes the feature engineering step.

V. IMPLEMENTATION

After dividing the dataset into training and testing sets in an 80:20 ratio, the binary and multi-class classifier models are implemented using Python and the scikit-learn library. After training, the label of a new sample is predicted, which returns the learned label for each object in the array. Metrics are then derived from the predictions. For multi-class classification, we employ the Adam optimizer with a Sigmoid activation function and sparse categorical cross-entropy loss function.
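The feature engineering steps described in Section IV (dropping labels, one-hot encoding, MinMax scaling, and the 80:20 split) can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' code: the tiny data frame and its column names (`sample_id`, `handles`, `dlls`, `source`, `label`) are hypothetical stand-ins and not the real CIC MalMem 2022 schema, and the helper `preprocess` is an illustrative name.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical miniature stand-in for the dataset; the column names are
# illustrative assumptions, not the real CIC MalMem 2022 schema.
raw = pd.DataFrame({
    "sample_id": range(10),                       # identifier, dropped below
    "handles": [120, 340, 90, 410, 75, 220, 510, 60, 300, 150],
    "dlls": [12, 40, 9, 38, 7, 22, 51, 6, 30, 15],
    "source": ["vm1", "vm2"] * 5,                 # categorical feature
    "label": ["benign", "spyware", "benign", "trojan", "benign",
              "ransomware", "trojan", "benign", "spyware", "ransomware"],
})


def preprocess(df):
    """Drop unused columns, one-hot encode, split 80:20, then scale."""
    cleaned = df.drop(columns=["sample_id"])      # returns a new frame
    y = cleaned.pop("label")                      # separate the target
    # One-hot encode the categorical column with a chosen prefix.
    X = pd.get_dummies(cleaned, columns=["source"], prefix="src", dtype=float)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    scaler = MinMaxScaler()                       # scales each feature to [0, 1]
    X_train = scaler.fit_transform(X_train)       # fit on training data only
    X_test = scaler.transform(X_test)             # reuse the training ranges
    return X_train, X_test, y_train, y_test


X_train, X_test, y_train, y_test = preprocess(raw)
print(X_train.shape, X_test.shape)
```

One design choice in this sketch differs from the order given in the text: the scaler is fitted after the split, on the training portion only, so that the test samples do not influence the scaling ranges. Scaling before the split, as the text describes, would only require moving the two scaler calls above `train_test_split`.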

A. Binary Classification of Malware

With reference to Fig. 2, $M_1, M_2, \ldots, M_n$ represent the machine learning classifiers implemented from the scikit-learn library. Each sample $X \in \{X_1, X_2, \ldots, X_n\}$ in the dataset is described by features $F = \{F_1, F_2, \ldots, F_n\}$, defining a mapping $X \rightarrow F$. The main objective of this experiment is to identify the class $Y$ of $X$, where $Y$ is 0 for benign and 1 for malware. $A_1, A_2, \ldots, A_n$ are the accuracies obtained from the models $M_1, M_2, \ldots, M_n$. By comparing these accuracies, we identify the best performer among the binary classifiers. The multi-class classifiers are analyzed in the same way, as in Fig. 3, with $Y$ taking the values 0, 1, 2, and 3 for benign, spyware, ransomware, and trojan, respectively.

With this objective, we carried out the experiment to create a baseline of five machine learning algorithms. As mentioned earlier, the implementation uses the scikit-learn library. Logistic Regression and the Naïve Bayes classifier were used with their default, untuned settings. For the Decision Tree, the minimum number of samples per leaf is 3, the maximum depth is 10, the split criterion is entropy, and the maximum number of features is log2. Random Forest uses the same parameters with 30 estimators. C = 1 was the right choice for Linear SVM.
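The binary baseline described above can be sketched with scikit-learn as follows. This is an illustrative sketch under assumptions and not the authors' implementation: the data is synthetic (`make_classification`) and does not come from CIC MalMem 2022, `GaussianNB` is assumed as the Naïve Bayes variant, `LinearSVC` is assumed for "Linear SVM", and `random_state` values are added only for reproducibility. The hyperparameters for the Decision Tree, Random Forest, and SVM are the ones stated in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): Y = 0 for benign, Y = 1 for malware.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Tree settings reported for the binary experiment.
tree_params = dict(criterion="entropy", max_depth=10,
                   min_samples_leaf=3, max_features="log2", random_state=0)

# M_1 ... M_n: the five baseline classifiers.
models = {
    "Logistic Regression": LogisticRegression(),      # default settings
    "Naive Bayes": GaussianNB(),                      # default settings
    "Linear SVM": LinearSVC(C=1.0),
    "Decision Tree": DecisionTreeClassifier(**tree_params),
    "Random Forest": RandomForestClassifier(n_estimators=30, **tree_params),
}

# A_1 ... A_n: test accuracy of each model.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)

# Rank the models from best to worst accuracy.
for name, acc in sorted(accuracies.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {acc:.4f}")
```

The accuracies printed by this sketch depend on the synthetic data and are not comparable with the figures reported in the paper.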

Fig. 2. Binary Classification Model

B. Multi-class Classification of Malware

The major objective of this experiment is likewise to create a baseline of five machine learning algorithms. As in the binary classification experiment described in Section V A, the implementation uses the scikit-learn library. The Naïve Bayes classifier was used with its default settings. For the Decision Tree, the minimum number of samples per leaf is 16, the maximum depth is 12, the split criterion is entropy, and the maximum number of features is log2. Random Forest repeats these parameters with 30 estimators, min_samples_split = 4, and a maximum depth of 40. With a batch size of 2000 and just 10 epochs, we achieved better accuracy with Random Forest. A learning rate of 0.2 was the right choice for Gradient Boosting. The ML models are tested and evaluated using the following metrics.

i) Accuracy: measures the correctness of the classification as the ratio of correctly identified samples to the total number of samples.

$$\text{Accuracy} = \frac{\text{No. of samples correctly identified}}{\text{Total samples}}$$

ii) Precision: measures how precise the model is when predicting positive samples, i.e., how many of the samples predicted as positive are actually positive.

$$\text{Precision} = \frac{\text{Positive samples rightly predicted as positive}}{\text{Total samples predicted as positive}}$$

iii) Recall: measures how many of the actual positive samples are correctly identified.

$$\text{Recall} = \frac{\text{Positive samples rightly predicted as positive}}{\text{Positive samples rightly predicted as positive} + \text{Positive samples wrongly predicted as negative}}$$

iv) F1-score: the harmonic mean of precision and recall, which balances the two; a higher value indicates better performance.

$$\text{F1 score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Fig. 3. Multi-class Classification Model

VI. RESULTS AND DISCUSSION

A. Results of Binary Classification

The following results are derived from the experiments described in the previous section. The accuracy, precision, recall, and F1 score for binary and multi-class classification are tabulated in Tables 2 and 3, respectively.
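As a worked illustration of how the four metrics of Section V are obtained, the sketch below computes them from a confusion matrix in plain Python. The counts are hypothetical and chosen only for illustration, not results from this paper. The function names `accuracy` and `per_class_metrics` are illustrative, and treating each class in turn as the "positive" class is an assumption about how the per-class values are obtained in the multi-class case.

```python
def accuracy(cm):
    """Share of all samples that lie on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_metrics(cm, k):
    """Precision, recall and F1 for class k, treating k as the positive class.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    tp = cm[k][k]
    fp = sum(row[k] for row in cm) - tp   # predicted k, actually another class
    fn = sum(cm[k]) - tp                  # actually k, predicted another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper).
# Rows are true classes, columns are predictions, in the order
# 0 = benign, 1 = spyware, 2 = ransomware, 3 = trojan.
cm = [
    [50, 0, 0, 0],
    [0, 40, 6, 4],
    [0, 5, 35, 10],
    [0, 5, 5, 40],
]
labels = ["benign", "spyware", "ransomware", "trojan"]

print(f"accuracy = {accuracy(cm):.3f}")
for k, name in enumerate(labels):
    p, r, f1 = per_class_metrics(cm, k)
    print(f"{name:10s} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Averaging the per-class values over the four classes gives macro-averaged precision, recall, and F1. Whether the tables use macro or weighted averages is not stated in the text.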
Model                        Accuracy   Precision   Recall   F1 Score
Logistic Regression          99.56%     99.42       99.71    99.56
Linear SVM classification    99.88%     99.88       99.88    99.88
Naïve Bayes Classification   99.21%     98.78       99.65    99.21
Decision Tree                99.99%     99.98       99.982   99.99
Random Forest                99.982%    99.982      99.982   99.982
ANN [11]                     99.72%     ~100.0      99.9     ~100
MLP Classifier [10]          99.70%     99.70       99.70    99.70
kNN classifier [10]          99.96%     99.96       99.96    99.96
Dilated CNN [10]             99.88%     99.88       99.88    99.88

Table 2: Binary Classification-Results.

With the chosen hyperparameters, all models perform comparably, but Random Forest is consistent across all the tests. The results of the existing ANN, MLP, kNN, and dilated CNN models are also shown here for comparison.

B. Results of Multi-class Classification

Model                        Accuracy   Precision   Recall   F1 Score
Naïve Bayes Classifier       68.86%     68.86       73.26    64.51
Decision Tree                84.67%     84.89       84.92    84.90
Random Forest                89.07%     87.63       87.62    87.62
Gradient Boosting            83.84%     83.84       83.84    83.83
K-Nearest Neighbor           79.80%     79.80       79.85    79.81
Dilated CNN [10]             81.83%     72.71       72.72    72.71

Table 3: Multi-class Classification-Results.

Fig. 4. Multi-class Classifiers - Results

The multi-class results show that Decision Tree and Gradient Boosting are close in performance, while Random Forest outperforms the other models. Although 89% accuracy is not outstanding in absolute terms, it is the best result obtained so far. The combo chart in Fig. 4 shows the distribution of the metrics. Compared with the Dilated CNN model of Mezina and Burget [10], our proposed model is better by roughly 7 percentage points in accuracy (89.07% versus 81.83%). A similar experiment was carried out by L. P. Khan [11] using an ANN; although its binary classification results were the best, the model's performance decreased for multi-class classification. The confusion matrices derived from our experiment are shown in Fig. 5.

Fig. 5. Confusion matrices with Multi-class Classifiers: a. Decision Tree, b. Naïve Bayes, c. kNN Classifier, d. Gradient Boosting, e. Random Forest Classifier

VII. CONCLUSION

Obfuscated malware detection is one of the hot topics of research in the field of AI-infused cyber security. Although a good amount of work exists on classifying a sample as malware or non-malware, to the best of our knowledge significant research is lacking on detecting multiple malware types with a single model. In this paper, we have implemented a Machine Learning-based cybersecurity model for multi-class classification of obfuscated malware that detects three types of malware, viz. spyware, ransomware, and trojan. We compared the results of our work with existing works and showed that, with the hyperparameters we chose, our proposed model is roughly 7 percentage points more accurate than the existing models (89.07% versus 81.83%). With the Random Forest algorithm and considerable hyperparameter tuning, we achieved an accuracy of 89.07% in classifying multiple obfuscated malware types. There is further scope for achieving higher accuracy, and extensive experiments are being conducted toward that goal.

REFERENCES
[1] H. S. Berry, “The Evolution of Cryptocurrency and Cyber Attacks,” in 2022 International Conference on Computer and Applications (ICCA), Dec. 2022, pp. 1–7. doi: 10.1109/ICCA56443.2022.10039632.
[2] B. Marais, T. Quertier, and S. Morucci, “Malware and Ransomware Detection Models,” pp. 1–8, 2022, doi: 10.48550/arXiv.2207.02108.
[3] M. Botacin et al., “AntiViruses under the microscope: A hands-on perspective,” Comput. Secur., vol. 112, p. 102500, Jan. 2022, doi: 10.1016/j.cose.2021.102500.
[4] I. You and K. Yim, “Malware Obfuscation Techniques: A Brief Survey,” in 2010 International Conference on Broadband, Wireless Computing, Communication and Applications, Nov. 2010, pp. 297–300. doi: 10.1109/BWCCA.2010.85.
[5] S. Schrittwieser, S. Katzenbeisser, J. Kinder, G. Merzdovnik, and E. Weippl, “Protecting Software through Obfuscation,” ACM Comput. Surv., vol. 49, no. 1, pp. 1–37, Mar. 2017, doi: 10.1145/2886012.
[6] S. T. King and P. M. Chen, “SubVirt: implementing malware with virtual machines,” in 2006 IEEE Symposium on Security and Privacy (S&P’06), 2006, pp. 314–327. doi: 10.1109/SP.2006.38.
[7] L. Jia, Y. Yang, B. Tang, and Z. Jiang, “ERMDS: A obfuscation dataset for evaluating robustness of learning-based malware detection system,” BenchCouncil Trans. Benchmarks, Stand. Eval., vol. 3, no. 1, p. 100106, Feb. 2023, doi: 10.1016/j.tbench.2023.100106.
[8] T. Carrier, P. Victor, A. Tekeoglu, and A. Lashkari, “Detecting Obfuscated Malware using Memory Feature Engineering,” in Proceedings of the 8th International Conference on Information Systems Security and Privacy, 2022, pp. 177–188. doi: 10.5220/0010908200003120.
[9] L. Igor Moraga, J. P. R. Malcó, D. Zabala-Blanco, R. Ahumada-García, C. A. Azurdia-Meza, and A. D. Firoozabadi, “Detection of Obfuscated Malware by Engineering Memory Functions Applying ELM,” in 2023 IEEE Colombian Conference on Applications of Computational Intelligence (ColCACI), Jul. 2023, pp. 1–6. doi: 10.1109/ColCACI59285.2023.10226058.
[10] A. Mezina and R. Burget, “Obfuscated malware detection using dilated convolutional network,” in 2022 14th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), Oct. 2022, pp. 110–115. doi: 10.1109/ICUMT57764.2022.9943443.
[11] L. P. Khan, “Obfuscated Malware Detection Using Artificial Neural Network (ANN),” in 2023 Fifth International Conference on Electrical, Computer and Communication Technologies (ICECCT), Feb. 2023, pp. 1–5. doi: 10.1109/ICECCT56650.2023.10179639.
[12] M. R. Ghazi and N. S. Raghava, “Machine Learning Based Obfuscated Malware Detection in the Cloud Environment with Nature-Inspired Feature Selection,” in 2022 5th International Conference on Multimedia, Signal Processing and Communication Technologies (IMPACT), Nov. 2022, pp. 1–5. doi: 10.1109/IMPACT55510.2022.10029271.
[13] D. Gibert, M. Fredrikson, C. Mateu, J. Planes, and Q. Le, “Enhancing the insertion of NOP instructions to obfuscate malware via deep reinforcement learning,” Comput. Secur., vol. 113, p. 102543, Feb. 2022, doi: 10.1016/j.cose.2021.102543.
[14] M. Sewak, S. K. Sahay, and H. Rathore, “DOOM: A novel adversarial-DRL-based op-code level metamorphic malware obfuscator for the enhancement of IDS,” UbiComp/ISWC 2020 Adjun. - Proc. 2020 ACM Int. Jt. Conf. Pervasive Ubiquitous Comput. Proc. 2020 ACM Int. Symp. Wearable Comput., pp. 131–134, 2020, doi: 10.1145/3410530.3414411.
[15] C. C. San, M. M. S. Thwin, and N. L. Htun, “Malicious Software Family Classification using Machine Learning Multi-class Classifiers,” 2019, pp. 423–433. doi: 10.1007/978-981-13-2622-6_41.
[16] L. Breiman, “Random Forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001, doi: 10.1023/A:1010933404324.
