2015 Fifth International Conference on Communication Systems and Network Technologies

DaCoMM: Detection and Classification of Metamorphic Malware
Vishakha Mehra, Rajasthan Technical University, Kota, India. E-mail: [email protected]
Vinesh Jain (Asst. Prof.), Government Engineering College, Ajmer, Rajasthan, India. E-mail: [email protected]
Dolly Uppal, Rajasthan Technical University, Kota, India. E-mail: [email protected]

Abstract: With the rapid growth of the IT sector in the 21st century, system security has become a pressing question. While the IT field keeps growing, malware attacks are rising alongside it, and zero-day malware attacks pose a great challenge. Authors of metamorphic and polymorphic malware gain an extra advantage from mutation engines and virus generation toolkits, which let them produce as many malware variants as they want. Our approach focuses on detection and classification of metamorphic malware (MM) according to their families. MM are the hardest for antivirus scanners (AVS) to detect because their variants differ structurally. We gathered a total of 600 malware samples, including some that bypass the AVS, and 150 benign files. These files are disassembled and preprocessed, and control flow graphs and API call graphs are generated from them. We propose the Gourmand Feature Selection algorithm for selecting the desired features from the call graphs. Classification is done with the WEKA tool, for which J-48 gave the highest accuracy, 99.10%. Once the metamorphic malware are detected, they are classified according to their families using histograms and the Chi-square distance measurement formula.

Keywords: metamorphic malware, polymorphic malware, mutation engine, code obfuscation, histogram.

I. INTRODUCTION
MALWARE is a small seven-letter word, yet powerful enough to cause substantial security attacks in the computer world. Malware takes various forms: it is a comprehensive term for worms, Trojans, viruses, spyware, adware, crimeware, botnets and more. In other words, malware, short for MALicious softWARE, is software designed against a system's security, integrity and confidentiality.

A. Malware Analysis
Malware can be analyzed by two techniques: Static Analysis and Dynamic Analysis.
Static analysis uses pattern (byte-code signature) recognition to analyze malware. The suspected code is analyzed without actually executing it. Anti-virus scanners follow the traditional signature-based detection method. Signatures are byte sequences stored in the scanner's database. A major disadvantage of signature-based detection is that it fails to detect malware whose signature is not present in the database. Also, one should have proper knowledge of the software for static analysis, which is not usually possible. Hence, signature-based detection fails against zero-day malware attacks.
Dynamic Analysis is the technique of analyzing the infected file while it is executing, in order to keep watch on its behavior and the actions it performs. Malware today is very smart: some samples stop working as soon as they detect an emulated or virtual environment and can thereby bypass the malware detection scheme. For this technique, proper knowledge of the software is not necessary. Dynamic analysis has proven able to detect unknown malware. At present, dynamic analysis is the most commonly used way to detect malware, but it is not adequate.

B. Metamorphic Malware
The term metamorphic means 'self-change in shape'. Metamorphism is the process of generating copies of itself while changing its shape in each copy. Metamorphic malware are self-generated malware that are functionally the same but structurally different. That is, every new variant generated on each iteration differs in size, syntax, structure and instructions, but its behavior remains constant because the semantics are preserved. All such variants belong to the same family of malware.

C. Polymorphic Malware
The term polymorphic means 'the appearance of more than one form'. Polymorphic malware resembles metamorphic malware in that, to fool detectors, it also changes its initial code on each iteration.

TABLE 1: COMPARISON OF METAMORPHIC MALWARE AND POLYMORPHIC MALWARE
S. No. | Metamorphic malware (MM) | Polymorphic malware (PM)
1 | MM has the ability to rewrite its own source code. | PM does not have any such quality.
2 | Tougher than PM to detect. | If the signature is present in the AVS, these malware can be easily detected.
3 | Cannot be easily detected, as these malware follow a semantic-preserving scheme. | Can be detected by AVS, as one of its parts, called the decryptor, remains the same in every newly generated variant and serves as a signature to the AVS.
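The signature-based detection described under Static Analysis can be illustrated with a minimal sketch. The signature names and byte patterns below are made up for illustration and do not come from any real scanner or sample; the point is only that a fixed byte pattern matches an exact copy but misses a copy whose bytes differ structurally.

```python
from typing import Dict, List

# Hypothetical signature database: family name -> byte pattern (made-up values).
SIGNATURES: Dict[str, bytes] = {
    "Example.Family.A": bytes.fromhex("deadbeef0102"),
    "Example.Family.B": bytes.fromhex("cafebabe99"),
}


def scan(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures whose byte pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


# A buffer containing the stored pattern unchanged.
sample = b"\x00\x11" + bytes.fromhex("deadbeef0102") + b"\x22"
# The same buffer with one extra byte inside the pattern region.
variant = (b"\x00\x11" + bytes.fromhex("deadbeef") + b"\x90"
           + bytes.fromhex("0102") + b"\x22")

print(scan(sample))   # the stored pattern is found
print(scan(variant))  # no stored pattern matches
```

This is why the paper turns to structural features (control flow and API call graphs) rather than raw byte patterns.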

978-1-4799-1797-6/15 $31.00 © 2015 IEEE 668


DOI 10.1109/CSNT.2015.62
Authorized licensed use limited to: Amrita School of Engineering. Downloaded on November 05,2024 at 11:53:07 UTC from IEEE Xplore. Restrictions apply.
But polymorphic malware has two parts: a decryptor and an encrypted main body. The decryptor of each generated malware remains constant, which serves as a signature and makes it somewhat easier for AVS to detect the malware.

II. MOTIVATION
The threat of malware attacks has spread all over the globe. Metamorphic malware mutates its code on each replication, and with the emergence of mutation engines, code obfuscation can be done easily and quickly in bulk. Metamorphic malware cannot be easily detected by the stored signatures in antivirus scanners, as on each iteration the variants differ structurally while remaining functionally the same. Also, two important properties of the mutation engine, its hiddenness and its small size, make it harder for malware analysts to detect them. This motivated us to study, detect and classify metamorphic malware.

III. RELATED WORK
In [1], metamorphic malware are detected by dynamically analyzing the executables in an emulator and tracing API calls. Instead of creating a signature per sample, a signature for the entire malware family is generated, and the authors show that metamorphic malware of the same family can be detected by the same signature pattern. Using the proposed method, malware with the same byte pattern can be detected regardless of the tool that created it, once a metamorphic generator is generated. To determine the similarity between two or more metamorphic generators, a proximity index between them is designed.

An approach to recognize the presence of metamorphic malware using normalization is presented in [3]. The paper models the metamorphic malware as a term-rewriting system and constructs a normalizer for it that maintains three properties: confluence, termination, and equivalence-preservation. The authors present a way to solve the normalizer construction problem and define two approximations for cases where an exact solution is not feasible. A priority method is given to identify false matches produced during approximation. The paper demonstrates the feasibility of the normalization approach on the metamorphic virus "W32.Evol". The study concludes that signature-based methods are not enough for detecting metamorphic engines and that an advancement is required.

Classification of malware on the basis of call graphs is discussed in [2]. In this paper, structural similarity between samples is detected by representing malware as call graphs. Graph similarity is expressed using graph edit distance, which is a variable metric. The authors compare call graphs mutually to compute pairwise graph similarity scores. They employ several clustering algorithms, such as k-medoids and DBSCAN, on several malware samples to detect new malware families. The density-based DBSCAN algorithm obtained the better results and was able to find the malware families.

Authors in [6] and [8] designed a rewriting engine to identify the variants of malware. Semantic as well as syntactic structures of programs are captured for analyzing variants of malware. A control flow graph is used to represent the malware's signature. A novel approach to detect variants of malware using code graphs is proposed by the authors of [10]. Experimental investigation was performed on the malware samples, and a topological graph is used to represent the instructions analogous to system calls.

In their research work [9], the authors proposed a method to diagnose obfuscated malware. Experimental tests were performed on four malware strains, such as Win32 Blaster. API calls are captured from the disassembled program code, and unique IDs are then assigned to these APIs. Various obfuscation techniques, for instance data modification, null operations and dead code insertion, are used to generate the variants of malware. Similarity measures such as the Jaccard measure, Euclidean distance and cosine similarity are used to compare the API sequences extracted from the base malware and from its variants. The experimental results conclude that this method performed best in comparison to the malware scanners used in the study.

A unique approach is seen in [12], where the authors compare their accuracy against anti-virus scanners and state that their proposed methodology detected almost all malicious files, while the AVS detected only 62% of the malicious content. The malware are detected using control flow graphs of arbitrary length, which are then aligned into a similarity matrix.

According to [11], a heuristic engine can detect potentially new, previously unexamined malware. Heuristic detection is analogous to signature-based detection, except that instead of looking for a specific pattern, it looks for instructions or commands within a program that are not found in usual application programs. However, a virtual environment, as in dynamic detection, is required. Heuristic analysis is prone to false alarms, which can leave the system more vulnerable.

IV. TYPES OF METAMORPHIC MALWARE

Name of virus | Win95/Zmist | {Win32, Linux}/Simile
Created by | Zombie | The Mental Driller
Released on | 2000 | 2002
Techniques used | EPO (entry point obfuscation) + code integration + jump instructions | EPO (does not affect the entry point of the code but randomly places anywhere in the code)
Attack on | Portable Executables (.EXE files) | PE files on network as well as on machine, excluding files beginning with NO, SC, F, DR, P and files having the letter V anywhere in their name
Engine used | RPME (Real Permutation Engine) | Uses oligomorphic code, RDTSC (ReaD Time Stamp Counter)

V. METAMORPHIC MALWARE CODE OBFUSCATION TECHNIQUES
Code obfuscation is a technique for making code more difficult and less clear to understand and detect. For metamorphic malware, code obfuscation makes the code different from its source code, so that it is much harder to understand but behaviorally the same. The code obfuscation techniques used by metamorphic malware are listed below; they differ according to the work they perform on the targeted code.

TABLE 2. VARIOUS CODE OBFUSCATION TECHNIQUES
Sr. No. | Technique | Alternative name | Working | Malware evolved
1. | Dead code insertion | Garbage code insertion / junk code insertion | The binary pattern of the code is changed by inserting "do nothing" statements such as NOP, blank lines etc. | Evol
2. | Register renaming | Register usage exchange | On each iteration of the malware, the names of the registers are changed. | RegSwap
3. | Instruction replacement | Equivalent code substitution | Instruction(s) are replaced by a group of equivalent instruction(s). | W32/MetaPhor
4. | Instruction transposition | Instruction reposition | The sequence of instruction(s) is altered while ensuring that the objective of the code is preserved (k frames give k! combinations). | Win32/Ghost
5. | Function transposition | Function permutation | The sequence of function(s) is altered while ensuring that the objective of the code is preserved. | Win95/Zperm
6. | Procedure inlining | Function inlining | Procedure calls are replaced by the actual code. | Win32/Simile
7. | Procedure outlining | Function outlining | Actual code is replaced by a procedure call. | Win32/Simile
8. | Code integration | Code assembler | Decompiles the code into its smallest parts and builds the file again. | Zmist
9. | Host code mutation | Host code alteration | On each iteration the code mutates its own code as well as its host code. | Win95/Zperm

VI. METAMORPHIC MALWARE GENERATOR (MUTATION ENGINE)
When the code is fed to the mutation engine, variants of metamorphic malware are generated, which are functionally the same but differ structurally, as shown in figure 1.

FIGURE 1. METAMORPHIC MALWARE VARIANTS OUT OF MUTATION ENGINE.
[Figure: an input malicious file is fed to the metamorphic mutation engine, which outputs new variants of the existing malware, each drawn with a different dot pattern.]

A. Algorithm for mutation engine
Input: set of instruction code
Output: new variant of malware (metamorphic malware)
Step1. Begin
Step2. Input set of instruction code D.
Step3. Transform D into D^ (by using the obfuscation techniques discussed in table 2).
Step4. One or more instruction(s) in D^ differ from the instruction(s) of D such that the two codes behave similarly.
Step5. Output new variant of metamorphic malware.
Step6. Stop.

As explained in figure 2, to start mutating the code, the malware first has to reveal the location of its own code. Once it finds its own location, the next step is to disassemble the code into mnemonics (assembly code), which is done by the disassembler. The disassembled code is then analyzed to gather information such as code structure, flow of code, variables and subroutines. This information is very useful for the code obfuscator, which is also known

as the heart of the mutation engine. The obfuscator uses the information provided by the code analyzer to transform the binary of the input code, so that a new variant of the malware is generated which is functionally the same but structurally different. The code is then compressed and magnified by the code compressor and assembled into machine code by the assembler. The code is now obfuscated and is attached. The new code does not contain any instruction sequence matching its previous version that could serve as a signature, which makes it difficult for AVS to detect.

FIGURE 2. METAMORPHIC MALWARE GENERATOR
[Figure: flowchart of the metamorphic mutation engine, with the tool used at each step. Reveal the self-code; disassemble the code (disassembler); analyze the code (analyzer); transform the code (obfuscator, the heart of the mutation engine; random inverter); compress the code (compressor, uses code magnification and compression techniques); assemble the code (assembler); attach the code; stop.]

VII. PROPOSED WORK
Metamorphic malware are getting ever smarter at escaping detection by present anti-virus scanners. Our approach focuses on two tasks, detection and classification: detecting metamorphic malware variants and then classifying them according to their families, as shown in figure 3.

A. DATA COLLECTION
We gathered a total of 600 malicious samples. Of these, 268 samples of metamorphic malware were created using the mutation engines available on VX Heaven [4], and the remaining are corrupted malicious samples from infected systems that bypassed the AVS. 150 benign samples were collected from a freshly installed Windows system.

B. OUR APPROACH

For our first case, i.e., detection of metamorphic malware, the steps are shown in figure 3 below. The collected data set is disassembled using the IDA-Pro disassembler [5].

[Figure: processing pipeline. Portable executables (B: 150, M: 332, Eng.: 265) -> disassemble through IDA-Pro disassembler -> preprocess -> build CFGs -> generate API call graphs -> select features (Gourmand Feature Selection Algorithm) -> classification (benign / malware) -> create histogram -> evaluate Chi-square -> classification according to family as -1, 0, and 1.]

FIGURE 3: PROPOSED METHODOLOGY FOR DETECTION AND CLASSIFICATION OF METAMORPHIC MALWARE
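The "preprocess" step of figure 3 (removing comments, blank lines and labels from the disassembled code) can be sketched as follows. This is only an illustration under stated assumptions: the paper does not give its preprocessing code, and the `;` comment marker, the `name:` label form and the sample listing are hypothetical stand-ins for a disassembler's text output.

```python
import re
from typing import List

# Assumption: a label-only line looks like "name:" with nothing after the colon.
LABEL_RE = re.compile(r"^[A-Za-z_.$?@][\w.$?@]*:$")


def preprocess(listing: str) -> List[str]:
    """Strip ';' comments, blank lines and label-only lines from a listing."""
    cleaned = []
    for raw in listing.splitlines():
        # Drop everything after the first ';' and normalize whitespace.
        line = " ".join(raw.split(";", 1)[0].split())
        if not line:                 # blank or comment-only line
            continue
        if LABEL_RE.match(line):     # label-only line
            continue
        cleaned.append(line)
    return cleaned


# Hypothetical listing, not taken from the paper's data set.
listing = """
; sample listing
start:
    push ebp        ; save frame
    mov  ebp, esp

loc_401000:
    call ds:GetProcAddress
    ret
"""

for instruction in preprocess(listing):
    print(instruction)
```

The sketch treats any `;` as the start of a comment, so a semicolon inside a string literal would be cut as well; a real preprocessor would need to handle that case.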

The job of the disassembler is to disassemble the code into assembly code. Once the code is disassembled, it is preprocessed to remove comments, blank lines, labels etc. The residue is the essential code, from which control flow graphs (CFGs) are generated. CFGs illustrate the flow of the code segments, and API call graphs are generated from them. To select the desired features out of the API call graphs, we propose the Gourmand Feature Selection Algorithm, shown in Algorithm 2 below.

ALGORITHM 2: GOURMAND FEATURE SELECTION ALGORITHM
INPUT: API call graphs G (V, E), where V = {v1, v2, v3… vn} is the set of nodes representing the features and E = {e1, e2, e3… en} is the set of edges, each between two nodes vx and vy such that x≠y.
The weight W of each node in V is given by wi. For example, the weight of node vx is wx, and so on. The weight of the edge between two nodes vx and vy is denoted as pxy.
OUTPUT: Set SF holding the selected features
Step1. Set SF = NULL
Step2. Repeat steps 3, 4, 5 and 6 while V ≠ ∅ // repeat steps for all nodes
Step3. The node with maximum weight, vA, is selected.
Step4. Weights of nodes having similar weights to that of the maximum weighted node vA are updated as:
Wx = Wy - pxy * 2C
Step5. Set SF = SF ∪ {vA} // adds the maximum weighted vertex to the selected feature set SF
Step6. Set G (V, E) = G (V, E) – G (VA, EA) // delete the maximum weighted node and all the edges connected to it from the graph
Step7. Stop

This algorithm is a greedy approach to selecting the relevant features. It takes API call graphs as input, treating each node as a feature. Nodes and edges have weights. The set of selected features SF is initially set to null. The node with maximum weight is selected. The nodes having a similar weight to that of the maximum weighted node are then updated using the formula given in step 4 of algorithm 2. The maximum weighted node is added to the set SF and is removed from the main graph together with all its edges. The same procedure is repeated for all the other nodes in the graph. The selected feature set is then ready for classification and is given as input to WEKA [7].

WEKA is a classification tool for classifying benign and malicious files using various algorithms such as J-48, Voted Perceptron, Naïve Bayes, KNN etc. We passed our dataset to all these algorithms, of which J-48 gave the most accurate result, 99.10%, as illustrated in table 3 and graph 1.

TABLE 3: ALGORITHM WEKA DETECTION OF METAMORPHIC MALWARE
Sr. No. | WEKA algorithm | Accuracy
1. | KNN | 89.00%
2. | Voted Perceptron | 92.24%
3. | Naïve Bayes | 94.56%
4. | J-48 | 99.10%
5. | SMO | 91.08%

GRAPH 1: WEKA ALGORITHMIC GRAPHICAL CLASSIFICATION RESULT OF MALICIOUS FILE
[Graph: classification accuracy of each WEKA algorithm.]

Our second case, classification of metamorphic malware according to their families, is illustrated diagrammatically in figure 3. The selected feature sets of the files that are considered malicious are used to create histograms. Histograms are created using the features as the feature vector of the malicious files. Each histogram is compared with the other histograms, and a distance between them is calculated using the cosine similarity distance measurement formula as follows:

$$X = \cos(\theta) = \frac{P \cdot Q}{\lVert P \rVert \, \lVert Q \rVert} = \frac{\sum_{i=1}^{n} P_i Q_i}{\sqrt{\sum_{i=1}^{n} P_i^2}\,\sqrt{\sum_{i=1}^{n} Q_i^2}}$$

where P and Q are the measurements of the two histograms.

The output of the above formula lies in the range of -1 to +1. Hence there are three results of this method, as shown in table 4: histograms whose cosine similarity is 1 (one) or lies close to one belong to the same malware family, whereas those with a result of -1 are totally different malware.

TABLE 4: COSINE SIMILARITY RESULT
Measurement | Result
1 | The two histograms are similar
0 | The two histograms are independent of each other
-1 | The two histograms are totally dissimilar
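The histogram comparison can be sketched in a few lines. `cosine_similarity` follows the standard cosine formula between two vectors P and Q. `chi_square_distance` uses one common symmetric form of the Chi-square distance, 0.5 * sum((P_i - Q_i)^2 / (P_i + Q_i)); this form is an assumption, since the paper names the Chi-square distance in its abstract and in figure 3 but does not print the formula. The histograms `h1`, `h2` and `h3` are made-up examples.

```python
import math
from typing import Sequence


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine of the angle between two histograms of equal length."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("cosine similarity is undefined for an all-zero histogram")
    return dot / (norm_p * norm_q)


def chi_square_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumed form: 0.5 * sum((p_i - q_i)^2 / (p_i + q_i)); empty bins are skipped."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


# Hypothetical feature histograms (counts per selected feature).
h1 = [4, 0, 2, 1]
h2 = [8, 0, 4, 2]   # same shape as h1, every bin doubled
h3 = [0, 5, 0, 0]   # shares no non-empty bin with h1

print(cosine_similarity(h1, h2))    # close to 1: same direction
print(cosine_similarity(h1, h3))    # 0: no shared features
print(chi_square_distance(h1, h3))  # larger value means more dissimilar
```

One point worth noting: histogram bins are counts and therefore non-negative, so the cosine of two such histograms always falls between 0 and 1. The -1 row of table 4 can only occur if the feature vectors are allowed to take negative values.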

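The greedy loop of Algorithm 2 can be sketched as below. Two points are interpretations rather than the authors' stated method and are labelled as such in the code: Step 4 is read as "after picking vA, every remaining neighbour x of vA has its weight reduced by p_xA * 2C", and C is treated as a constant parameter `c`, because the paper does not define it. The node names and weights in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def greedy_select(
    weights: Dict[str, float],
    edges: Dict[Tuple[str, str], float],
    c: float = 1.0,
) -> List[str]:
    """Return the nodes in the order a greedy maximum-weight pass selects them."""
    w = dict(weights)  # working copy of the node weights
    # Undirected adjacency map: node -> {neighbour: edge weight}.
    adj: Dict[str, Dict[str, float]] = {v: {} for v in w}
    for (x, y), p in edges.items():
        adj[x][y] = p
        adj[y][x] = p

    selected: List[str] = []            # Step 1: SF starts empty
    while w:                            # Step 2: repeat until no node is left
        v_a = max(w, key=w.get)         # Step 3: node with maximum weight
        for x, p in adj[v_a].items():   # Step 4: ASSUMED reading of the update
            w[x] -= p * 2 * c
        selected.append(v_a)            # Step 5: add v_a to SF
        del w[v_a]                      # Step 6: remove v_a ...
        for x in adj.pop(v_a):          # ... and every edge attached to it
            adj[x].pop(v_a, None)
    return selected


# Hypothetical API call graph: node weights and edge weights.
weights = {"CreateFileA": 5.0, "WriteFile": 4.5, "RegSetValueA": 3.0}
edges = {("CreateFileA", "WriteFile"): 1.0, ("WriteFile", "RegSetValueA"): 0.25}

print(greedy_select(weights, edges))
```

Because the loop, as written in the paper, runs until the graph is empty, the result is a ranking of all nodes. Keeping only the first k entries (for example `greedy_select(weights, edges)[:2]`) is an added assumption about how a smaller feature set would be obtained.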
VIII. CONCLUSION AND FUTURE WORK
Our approach is divided into two phases. The first phase detects the metamorphic malware and classifies it against benign files with an accuracy of 99.10% through our proposed Gourmand Feature Selection Algorithm and the WEKA [7] classification tool. This process follows the dynamic approach. In the second phase, the detected metamorphic malware are further classified according to their families using histograms and the chi-square measurement formula. The detection accuracy for metamorphic malware is 99.10%; this percentage can be improved in future towards 100% by introducing more novel approaches to detection. Our approach is limited to portable executables; in future it can be extended to PDF files and various web browsers by analyzing these files thoroughly.

REFERENCES
[1] Vinod P., Harshit Jain, Vijay Laxmi, Yashwant K. Golecha, Manoj Singh Gaur. MEDUSA: MEtamorphic malware Dynamic analysis Using Signature from API. In 5th International Conference on Malicious and Unwanted Software, MALWARE 2010, pages 263-269. ACM, 2010.
[2] Qinghua Zhang, Douglas S. Reeves. MetaAware: Identifying Metamorphic Malware. In Computer Security Application Conference, Annual, pages 411-420, 2007.
[3] Andrew Walenstein, Rachit Mathur, Mohamed R. Chouchane, & Arun Lakhotia. Normalizing Metamorphic Malware Using Term Rewriting. In International Workshop on Source Code Analysis and Manipulation SCAM '06, pages 75-84. IEEE, 2006.
[4] VXHeavens, http://vx.netlux.org/vl.php
[5] IDA Pro Disassembler and Debugger, http://www.hex-rays.com
[6] Guillaume Bonfante, Matthieu Kaczmarek, and Jean-Yves Marion, "Architecture of a Morphological Malware Detector", Computer Virology, 2009, pp. 263-270.
[7] WEKA. http://www.cs.waikato.ac.nz/ml/weka.
[8] Matthieu Kaczmarek, Guillaume Bonfante, and Jean-Yves Marion, "Control Flow Graphs as Malware Signatures", 2007.
[9] Andrew H. Sung, Jianyun Xu, Patrick Chavez, and Srinivas Mukkamala, "Static Analyzer of Vicious Executables (SAVE)", In Proc. of 20th Annual Computer Security Applications Conference (ACSAC 2004), 6-10 December 2004, Tucson, AZ, pp. 326-334.
[10] Heejo Lee, Kyoochang Jeong, "Code Graph for Metamorphic Malware Detection", In International Conference on Information Networking, ICOIN, 2008, pp. 1-5. IEEE.
[11] Wong, W. and Stamp, "Hunting for metamorphic engines", Journal in Computer Virology, 2006, 2:21122.
[12] Essam Al Daoud, Ahid Al-Shbail, Adnan M. Al-Smadi. Detecting Metamorphic Malware Using Arbitrary Length of Control Flow Graphs and Nodes Alignment. In ICIT conference- Bioinformatics and Image, 2009.
