Malware - Me Project Document

ABSTRACT:

Although Android apps are expanding rapidly throughout the mobile ecosystem, Android
malware continues to emerge. Malware operations are on the rise, particularly on
Android phones, which make up 72.2 percent of all smartphone sales. Credential theft,
eavesdropping, and malicious advertising are just some of the methods attackers use
against mobile phones. Many researchers have examined Android malware detection from
various perspectives and proposed hypotheses and methodologies. Machine learning
(ML)-based techniques have proven effective in identifying these attacks because they
can build a classifier from a set of training cases, eliminating the need for explicit
signature definitions in malware detection. This project provides a detailed
examination of machine-learning-based Android malware detection approaches. According
to current research, machine learning is a powerful and promising solution for
identifying Android malware.

CHAPTER 1

INTRODUCTION

1.1 Malware Definition:

"Malware" is short for "malicious software"; it is used as a single term to refer to
viruses, Trojans, worms, and similar programs. These programs have a variety of
capabilities, such as stealing, encrypting, or deleting sensitive data, modifying or
hijacking basic computer functions, and monitoring computer activity without the
user's permission.

Malware uses three main infiltration mechanisms to infect Android systems:
repackaging, updating, and downloading.

• Repackaging: Attackers lure users with apps that closely resemble some of the
most popular ones, tempting them with a variety of useful features. These apps are
repackaged, cloned versions of the original apps that contain malicious code.
• Updating: Many malware authors include an update component in their app that
can later download malicious code while the app is in use.
• Downloading: Malware developers lure users into downloading attractive apps
that falsely promise a variety of features and have malicious code inserted
inside them.

Many malware detection mechanisms are based on content signatures, which compare an
app's signature to a database of known malware signature definitions. This mechanism
can detect only known malware, not new variants. Research has shown that
signature-based approaches cannot keep up with the speed of new malware development.
Hence, there is an urgent need to research and develop solutions that reduce the
non-detection of malware on the Android platform. Effective solutions can include
characteristic-based and behavioral-based (static or dynamic) methods. One of the most
popular static behavioral-based methods of malware detection analyzes an app's
requested permission list and resource usage, e.g. Location Services, Contact
Information, WiFi, etc. In our approach, we extracted data from more than one thousand
user-defined permissions. A more powerful approach is dynamic behavioral-based
analysis, which observes the behavior of applications while they run, e.g. dynamic API
calls, capturing the run-time activities of the application. However, such an approach
is complex and time-consuming. We therefore propose a new framework for classifying
Android applications that applies machine learning (ML) techniques to large and diverse
datasets. Our framework compiles a set of more than one thousand different
app-requested permissions. Permissions are encoded to train ML classifiers to detect
malicious applications. Experiments are run on ML input datasets of different quality
and size.
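
The permission encoding described above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the project's actual code: the three example
apps and the helper names build_vocabulary and encode_permissions are hypothetical, and
a real run would cover the thousand-plus permissions mentioned in the text.

```python
from typing import List, Set


def build_vocabulary(apps: List[Set[str]]) -> List[str]:
    """Collect every permission seen in the corpus, in a stable sorted order."""
    return sorted(set().union(*apps)) if apps else []


def encode_permissions(requested: Set[str], vocabulary: List[str]) -> List[int]:
    """Return a 0/1 vector with a 1 wherever the app requests that permission."""
    return [1 if perm in requested else 0 for perm in vocabulary]


# Hypothetical apps and their requested permissions (illustrative only).
apps = [
    {"android.permission.INTERNET", "android.permission.READ_CONTACTS"},
    {"android.permission.INTERNET"},
    {"android.permission.SEND_SMS", "android.permission.ACCESS_FINE_LOCATION"},
]

vocabulary = build_vocabulary(apps)
feature_matrix = [encode_permissions(app, vocabulary) for app in apps]

for row in feature_matrix:
    print(row)
```

Each row of feature_matrix is one application and each column one permission, which is
the tabular shape a classifier expects as input.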

What spyware can be used for:

 Stealing sensitive information. Such programs target personal information, such as
credentials, passwords, bank details, and similar data. In addition, they can
monitor the user's online activity, track web browsing habits, and send all this
data to a remote server.
 Showing unwanted advertisements. Spyware can display a large number of annoying
pop-up ads. Such activity is more commonly associated with adware.
 Redirecting users to questionable or malicious websites against their will. In
addition, some types of spyware can change web browser settings, including the
search engine and home page.
 Inserting numerous links into the victim's search results and redirecting the
victim to chosen destinations (third-party spyware sites and other associated websites).
 Making significant changes to system settings. These changes can reduce overall
security and cause performance issues.
 Connecting to a compromised computer through backdoors. Most spyware threats
can give attackers remote access to the system without the user's knowledge.
 Degrading the overall performance of the system and causing instability.

1.2 Background Information

The first Android smartphone was launched in September 2008, and shortly thereafter,
smartphones powered by the new open-source operating system were everywhere. By
2021, about 12 enhanced versions of Android had been released, and it is the most
widely used mobile operating system in the world, with an 84% share of the global
smartphone market [1].

With this level of adoption coupled with the open-source nature of Android applications,
security attacks are becoming more and more ubiquitous and seriously threaten the
integrity of Android applications. Statistics show that more than 50 million malware and
potentially unwanted applications (PUA) have been identified for Android [2].

Figure 1: Number of Android Malwares Per Year

Researchers have been studying the nature of malware applications for many years and
have categorized them into different families [3]

 Trojans: These appear as benign apps and aim to steal the user's confidential
information without the user's knowledge.
 Backdoors: These exploit root privileges to gain control over the device and
perform any operation without the user's knowledge.
 Worms: This malware creates copies of itself and distributes them over the
mobile device's networks.
 Spyware: These appear as benign apps designed to monitor the user's confidential
information, such as messages, contacts, location, bank information, etc., for
undesirable consequences.
 Botnets: A botnet is a network of compromised Android devices controlled by a
remote server.
 Ransomware: This malware prevents users from accessing their data by locking
the mobile phone until a ransom amount is paid.
 Riskware: These are legitimate programs that malicious authors exploit to degrade
the device's performance or harm the user's data.

Standard approaches to detecting malware can be of two types – Static and Dynamic.

 Static Approach:

In this approach, the functionality and maliciousness of an application are checked
by disassembling and analyzing its code without actually executing the application.

 Dynamic Approach:

In this approach, the application is examined during execution, which can help identify
malware that static analysis techniques miss because of code obfuscation or encryption
of the malware.

These approaches can be further sub-divided based on the method of anomaly detection.
Some of these sub-divisions are shown in the diagram below –

Figure 2: Android Malware Detection Techniques

1.3 Project Definition and Goals

The goals and objectives of the project are two-fold:

1. to achieve good accuracy in detecting malware from samples of benign and
malware applications using multiple approaches, and
2. to compare the results of different approaches and algorithms and provide
recommendations on the best strategy for malware detection.

Specifically, I focused on two static methods: signature-based and permission-based
detection. These two methods are described below.

 Permission-based malware detection:

In Android smartphones, the permissions granted to an application are essential in
governing the access rights given to that application. For example, at the time of
installation, the user can grant an application the permission to send information
over the internet or access contacts. These permissions are assumed to be needed for
the application to perform its designed functions. However, applications often request
permissions that are not required for their functionality.

By analyzing the combinations of permissions requested by benign and malware
applications, it is possible to identify whether an application is malware or not. This
can be achieved by training machine learning classification models with data from known
malware and benign applications.

 Signature-based malware detection:

This method is commonly used by commercial antimalware products. Signatures are
generated for the various API calls that the application makes. By identifying patterns
in these signatures and comparing them with the signatures of existing malware
families, it is possible to detect whether an application is benign or malware.
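
One simple way to picture the comparison step is set overlap between an app's API calls
and per-family call sets. The sketch below is an assumption-laden illustration, not how
commercial products or this project implement signatures: the family names, the API
call names, the Jaccard measure, and the 0.5 threshold are all chosen for the example.

```python
from typing import Dict, FrozenSet, Optional, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_family(
    api_calls: FrozenSet[str],
    family_signatures: Dict[str, FrozenSet[str]],
    threshold: float = 0.5,  # hypothetical cut-off for calling it a match
) -> Tuple[Optional[str], float]:
    """Return the best-matching family (or None) and its similarity score."""
    best_name, best_score = None, 0.0
    for name, signature in family_signatures.items():
        score = jaccard(api_calls, signature)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical family signatures expressed as sets of API call names.
families = {
    "sms_trojan": frozenset({"sendTextMessage", "getDeviceId", "getSubscriberId"}),
    "spy_tracker": frozenset({"getLastKnownLocation", "getDeviceId", "query"}),
}

suspicious = frozenset(
    {"sendTextMessage", "getDeviceId", "getSubscriberId", "openConnection"}
)
harmless = frozenset({"openConnection", "setContentView"})

print(match_family(suspicious, families))
print(match_family(harmless, families))
```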

1.4 Project Definition:

In this project, I demonstrate both permission-based and signature-based methods using
two different datasets.

By using publicly available labeled data sources, different classification models are built
to distinguish between malware and benign applications. The first data source contains
information on permissions granted to the applications, while the second data source
contains information on API call signatures of the applications.

Various data mining models are trained, and their performance metrics, such as precision
and recall, are analyzed and compared. This comparison is first made for different
classification models within an approach. Then, the best results for each approach are
compared to understand which of the two approaches is better for detecting malware.
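
For reference, precision and recall for the malware class can be computed from true and
predicted labels as in the sketch below. The label lists are made up for illustration
and the function name precision_recall is hypothetical.

```python
from typing import List, Tuple


def precision_recall(
    y_true: List[str], y_pred: List[str], positive: str = "malware"
) -> Tuple[float, float]:
    """Compute precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative labels only; not results from the project.
y_true = ["malware", "malware", "malware", "benign", "benign", "benign"]
y_pred = ["malware", "malware", "benign", "malware", "benign", "benign"]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision answers "of the apps flagged as malware, how many really were", while recall
answers "of the real malware, how much was caught".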

1.5 Analysis Methodology

The step-by-step methodology followed in this project is illustrated below.

Figure 3: Project Methodology

Dimensionality Reduction:

The datasets used in the project had a large number of features, so it was necessary to
reduce the dimensions before using classification algorithms.

The dimensions are reduced using a combination of three methods –

 Frequency Counts:

Features that contain only one unique value are removed because they cannot help
differentiate between the classes.

 Correlations:

If any features are highly correlated with each other, only one of those features is
retained.
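
The two filters listed above (frequency counts and correlations) can be sketched in
plain Python as follows. The column names, the toy values, and the 0.95 correlation
threshold are assumptions for illustration; the text does not state the threshold used.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def reduce_features(columns: Dict[str, List[float]], threshold: float = 0.95) -> List[str]:
    """Drop single-valued columns, then keep one column per highly correlated group."""
    # Frequency counts: a column with one unique value cannot separate the classes.
    varied = {name: vals for name, vals in columns.items() if len(set(vals)) > 1}
    # Correlations: keep a column only if it is not highly correlated with a kept one.
    kept: List[str] = []
    for name, vals in varied.items():
        if all(abs(pearson(vals, varied[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data with hypothetical feature names.
columns = {
    "perm_a": [1, 0, 1, 0, 1],
    "perm_b": [1, 1, 1, 1, 1],  # constant, removed by the frequency check
    "perm_c": [1, 0, 1, 0, 1],  # identical to perm_a, removed by the correlation check
    "perm_d": [0, 0, 1, 1, 0],
}

print(reduce_features(columns))
```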

Exploratory Analysis:

In this step, visualizations are used to explore the data in depth.

 The data is split into two classes: malware and benign.
 Each feature is explored individually to see how its values are distributed when
split by the class variable.
 This step helps in identifying the potentially differentiating features. The features
whose distributions differ significantly between malware and benign classes will
likely be the most important features.

Modeling using Supervised Classification Algorithms:

 The dataset is first split into training and testing datasets.
 The training set is used to train the classification algorithm, while the testing set
is used to validate the results seen on the training set and to check for overfitting.
 Three different supervised classification algorithms are used for each approach to
classify applications as benign or malware.
 These algorithms are kNN, Logistic Regression, and Random Forest.
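
The modeling steps above can be sketched with scikit-learn as follows. Synthetic data
from make_classification stands in for the real permission or API-call features, and
the split ratio and hyperparameters are assumptions, not the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the labeled malware/benign feature table.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)

# Hold out 30% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Store (training accuracy, testing accuracy) per model.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))

for name, (train_acc, test_acc) in scores.items():
    print(f"{name}: train={train_acc:.3f} test={test_acc:.3f}")
```

A training accuracy far above the testing accuracy is the usual sign of overfitting that
the held-out set is meant to reveal.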

Figure 4: Supervised Classification Algorithms



CHAPTER 2

LITERATURE REVIEW

Effective Botnet Detection Through Neural Networks on Convolutional Features

AUTHORS: Shao-Chien Chen, Yi-Ruei Chen, Wen-Guey Tzeng

Botnet is one of the major threats on the Internet for committing
cybercrimes, such as DDoS attacks, stealing sensitive information, spreading
spams, etc. It is a challenging issue to detect modern botnets that are
continuously improving for evading detection. In this paper, we propose a
machine learning based botnet detection system that is shown to be effective in
identifying P2P botnets. Our approach extracts convolutional version of
effective flow-based features, and trains a classification model by using a feed-
forward artificial neural network. The experimental results show that the
accuracy of detection using the convolutional features is better than the ones
using the traditional features. It can achieve 94.7% of detection accuracy and
2.2% of false positive rate on the known P2P botnet datasets. Furthermore, our
system provides an additional confidence testing for enhancing performance of
botnet detection. It further classifies the network traffic of insufficient
confidence in the neural network. The experiment shows that this stage can
increase the detection accuracy up to 98.6% and decrease the false positive rate
up to 0.5%.

An approach for host based botnet detection system

AUTHORS: Yulia Aleksieva, Hristo Valchanov, Veneta Aleksieva

The most serious form of modern malware is the botnet. The botnet is a
rapidly evolving problem that is still not well understood and studied. One of
the main goals for modern network security is to create adequate techniques for
the detection and eventual termination of Botnet threats. The article presents an
approach for implementing a host-based Intrusion Detection System for Botnet
attack detection. The approach is based on a variation of a genetic algorithm to
detect anomalies in a case of attacks. An implementation of the approach and
experimental results are presented.

Towards using transfer learning for Botnet Detection

AUTHORS: Prapa Rattadilok, Basil Alothman

Botnet Detection has been an active research area over the last decades.
Researchers have been working hard to develop effective techniques to detect
Botnets. From reviewing existing approaches it can be noticed that many of
them target specific Botnets. Also, many approaches try to identify any Botnet
activity by analysing network traffic. They achieve this by concatenating
existing Botnet datasets to obtain larger datasets, building predictive models
using these datasets and then employing these models to predict whether
network traffic is safe or harmful. The problem with the first approaches is that
data is usually scarce and costly to obtain. By using small amounts of data, the
quality of predictive models will always be questionable. On the other hand, the
problem with the second approaches is that it is not always correct to
concatenate datasets containing network traffic from different Botnets. Datasets
can have different distributions which means they can downgrade the predictive
performance of machine learning models. Instead of concatenating datasets, we
propose using transfer learning approaches to carefully decide what data to use. Our
hypothesis is "Predictive Performance can be improved by using transfer learning
techniques across datasets containing network traffic from different Botnets".

Development of an Intrusion Detection System Using a Botnet with the R Statistical Computing System

AUTHORS: Takashi Yamanoue, Junya Murakami

Development of an intrusion detection system, which tries to detect signs
of technology of malware, is discussed. The system can detect signs of
technology of malware such as peer to peer (P2P) communication, DDoS
attack, Domain Generation Algorithm (DGA), and network scanning. The
system consists of beneficial botnet and the R statistical computing system. The
beneficial botnet is a group of Wiki servers, agent bots and analyzing bots. The
script in a Wiki page of the Wiki server controls an agent bot or an analyzing
bot. An agent bot is placed between a LAN and its gateway. It can capture every
packet between hosts in the LAN and hosts behind the gateway from the LAN.
An analyzing bot can be placed anywhere in the LAN or WAN if it can
communicate with the Wiki server for controlling the analyzing bot. The
analyzing bot has R statistical computing system and it can analyze data which
is collected by agent bots.

An efficient botnet detection system for P2P botnet

AUTHORS: M. Thangapandiyan, P. M. Rubesh Anand

Peer-to-Peer (P2P) botnets are exploited by the botmasters for their
resiliency against the take down efforts. As the modern botnets are stealthier,
the traditional botnet detection approaches are not suitable for the botnet
detection. In this paper, an efficient botnet detection system is proposed for
detecting the P2P botnet. The proposed botnet detection system estimates the
flow export using NetFlow protocol. The packet flow is analyzed using three
main components namely, Exporter, Collector, and Analyzer. The exporter
captures the packet and monitors the contents of the packet. The collector
captures the flow traffic and the analyzer component initiates an automated
analysis of traffic with the captured packet information. The packet flow
information is collected by virtual interface and physical probe. The virtual
interface is used for collecting the malicious traffic information between the
Virtual Machines (VMs) and the physical probe gathers malicious traffic
information between the network bridges connecting VMs. The information
collected from these techniques are analyzed for detecting the botnets in inter
VM and intra VM. Compared to the existing Dendritic Cell Algorithm (DCA),
the proposed VM based botnet detection system has minimal time consumption,
increased detection speed, and higher attack prevention ratio.

Overview of Botnet Detection Based on Machine Learning

AUTHORS: Xiaxin Dong, Jianwei Hu ,Yanpeng Cui

With the rapid development of the information industry, the applications
of Internet of things, cloud computing and artificial intelligence have greatly
affected people's life, and the network equipment has increased with a blowout
type. At the same time, more complex network environment has also led to a
more serious network security problem. The traditional security solution
becomes inefficient in the new situation. Therefore, it is an important task for
the security industry to seek technical progress and improve the protection
detection and protection ability of the security industry. Botnets have been one
of the most important issues in many network security problems, especially in
the last one or two years, and China has become one of the most endangered
countries by botnets, thus the huge impact of botnets in the world has caused its
detection problems to reset people's attention. This paper, based on the topic of
botnet detection, focuses on the latest research achievements of botnet detection
based on machine learning technology. Firstly, it expounds the application
process of machine learning technology in the research of network space
security, introduces the structure characteristics of botnet, and then introduces
the machine learning in botnet detection. The security features of these solutions
and the commonly used machine learning algorithms are emphatically analyzed
and summarized. Finally, it summarizes the existing problems in the existing
solutions, and the future development direction and challenges of machine
learning technology in the research of network space security.

Botnet and P2P Botnet Detection Strategies: A Review

AUTHORS: Jitender Kumar , Himanshi Dhayal

Among various network attacks, botnet led attacks are considered as the
most serious threats. A botnet, i.e., the network of compromised computers is
able to perform large scale illegal activities such as Distributed Denial of
Service attacks, click fraud, bitcoin mining etc. These attacks are considered as
the major concern now-a-days. In this paper, we present a comprehensive
review of botnets, their lifecycle and types. We also discuss the peer-to-peer
botnet detection techniques' behaviors using various latest detection techniques.

Botnet Detection Using Recurrent Variational Autoencoder

AUTHORS: Jeeyung Kim, Alex Sim, Jinoh Kim, Kesheng Wu

Botnet detection is an active research topic as botnets are a source of
many malicious activities, including distributed denial-of-service (DDoS),
click-fraud, spamming, and crypto-mining attacks. However, it is getting more
complicated to identify botnets due to the continuous evolution of botnet
software and families that harness new types of devices and attack vectors.
Recent studies employing machine learning (ML) showed improved
performance to detect botnets to some extent, but they are still limited and
ineffective with the lack of sequential pattern analysis, which is a key to detect
various classes of botnets. In this paper, we propose a novel botnet detection
method, built upon Recurrent Variational Autoencoder (RVAE), that effectively
captures sequential characteristics of botnet anomalies. We validate the
feasibility of the proposed method with the CTU-13 dataset that have been
widely employed for botnet detection studies, and show that our method is at
least comparable to existing techniques in terms of detection accuracy. In
addition, our experimental results show that the proposed method can detect
previously unseen botnets by utilizing sequential patterns of network traffic. We
will also show how our method can detect botnets in the streaming mode, which
is the essential requirement to perform real-time, on-line detection.

Sonification of Network Traffic for Detecting and Learning About Botnet Behavior

AUTHORS: Mohamed Debashi, Paul Vickers

Today's computer networks are under increasing threat from malicious
activity. Botnets (networks of remotely controlled computers, or "bots")
operate in such a way that their activity superficially resembles normal network
traffic which makes their behavior hard to detect by current intrusion detection
systems (IDS). Therefore, new monitoring techniques are needed to enable
network operators to detect botnet activity quickly and in real time. Here, we
show a sonification technique using the SoNSTAR system that maps
characteristics of network traffic to a real-time soundscape enabling an operator
to hear and detect botnet activity. A case study demonstrated how using traffic
log files alongside the interactive SoNSTAR system enabled the identification
of new traffic patterns characteristic of botnet behavior and subsequently the
effective targeting and real-time detection of botnet activity by a human
operator. An experiment using the 11.39 GiB ISOT botnet data set, containing
labeled botnet traffic data, compared the SoNSTAR system with three leading
machine learning-based traffic classifiers in a botnet activity detection test.
SoNSTAR demonstrated greater accuracy (99.92%), precision (97.1%), and
recall (99.5%) and much lower false positive rates (0.007%) than the other
techniques. The knowledge generated about characteristic botnet behaviors
could be used in the development of future IDSs.

Analysis of Machine Learning Algorithms for IoT Botnet

AUTHORS: Umang Garg, Vaibhav Kaushik, Anushka Panwar, Neha Gupta

The Internet of Things (IoT) gains a lot of popularity day-by-day due to
their everlasting availability and ease. As the popularity of IoT increases, it also
attracts hackers which try to take advantage of the vulnerability of IoT devices.
An Intrusion Detection System (IDS) is an intelligence-based system that can
investigate or detect the intrusion in the IoT botnet and check the state of
software and hardware executing in the network. Once the intrusion is detected,
it may generate an alarm to alert the administrator or send some alert message to
the owner. In the last decade, there are several IDSs available which can detect
the intrusion. But the major problems with the existing IDSs like accuracy rate,
generation of the false alarm, and fewer chances of detection of unknown
attacks. To deal with the above problems, some machine learning techniques
have been involved by researchers. These techniques can differentiate between
the normal and abnormal behavior of the user's data or network traffic with high
accuracy. In this paper, we summarize and classify the machine learning
algorithms that can be used in IDS with their metrics, parameters. Then, a case
study is implemented with the UNSW-NB15 dataset that has realistic network
traffic with frequently used machine learning techniques. After that, a
comparison will be done and displayed by using an accuracy percentage table
and a bar chart. Finally, some challenges and future scope of the machine
learning techniques in the improvement of IDS will be discussed.

An Enhancing Framework for Botnet Detection Using Generative Adversarial Networks

AUTHORS: Chuanlong Yin, Yuefei Zhu, Shengli Liu, Jinlong Fei, Hetong Zhang

The botnet, as one of the most formidable threats to cyber security, is
often used to launch large-scale attack sabotage. How to accurately identify the
botnet, especially to improve the performance of the detection model, is a key
technical issue. In this paper, we propose a framework based on generative
adversarial networks to augment botnet detection models (Bot-GAN).
Moreover, we explore the performance of the proposed framework based on
flows. The experimental results show that Bot-GAN is suitable for augmenting
the original detection model. Compared with the original detection model, the
proposed approach improves the detection performance, and decreases the false
positive rate, which provides an effective method for improving the detection
performance. In addition, it also retains the primary characteristics of the
original detection model, which does not care about the network payload
information, and has the ability to detect novel botnets and others using
encryption or proprietary protocols.

CHAPTER 3

SYSTEM DESIGN

3.1 OBJECTIVE

 The main objective of malware detection is to detect malware present in the
system. There are two types of analysis for malware detection: dynamic analysis
and static analysis.

 For effective and efficient detection, feature extraction is recommended for
malware detection (Ahmadi, M. et al., 2016).

 Of the various detection methods, the one used here detects malware through its
hex and assembly files. Features are extracted from both the hex view and the
assembly view of malware files.

 After the features in each category are extracted, all categories are combined into
one feature vector for the classifier to run on (Ahmadi, M. et al., 2016).

 For feature selection, each binary file is separated into blocks so that the
similarities between malware binaries can be compared. This reduces the analysis
overhead and makes the process faster.

3.2 EXISTING SYSTEM

 Malware detection using Windows API sequences and machine learning

 Detecting unknown malicious code by applying classification techniques on opcode
patterns

 Detecting scareware by mining variable-length instruction sequences

 Accurate adware detection using opcode sequence extraction

 Detection of spyware by mining executable files

 Detection using neural networks on the malware

Disadvantages

 Detecting unknown malicious code by applying classification techniques on opcode
patterns:

 A number of experiments were evaluated, and the setting of 2-grams with TF
representation and 300 features selected by the DF measure outperformed the others,
but the work lacks ML-specific techniques.

 Detecting scareware by mining variable-length instruction sequences:

 This work presents a static analysis method based on data mining that extends
general heuristic detection techniques with a variable-length instruction sequence
mining approach for scareware detection, but specific metrics and unsupervised
techniques are not included, and the method can be broken.

3.3 PROPOSED SYSTEM

 The research will cover features obtained from the application code. Three methods
of feature selection will be tested. Then, for each classifier, selected parameters
will be tested, and with the chosen parameters fixed, the classification results will
be examined as a function of the number of features taken into account.

 Next, the most common features in malware will be listed. The best results in terms
of the number of correctly classified instances will be compiled for the 5 tested
classifiers, along with the running time of the algorithm, the preprocessing step,
and the feature extraction.

 Machine learning is used to identify malware in the datasets.

 Different types of machine learning algorithms are applied such as :

 K-nearest neighbors

 Decision tree

 Random forest

 SVM.

 Naïve Bayes

PROPOSED ARCHITECTURE

[Figure: Android malware dataset -> pre-processing -> training and testing data ->
training with algorithm (DT, SVM, RF, KNN, NB) -> build model -> trained model ->
metric evaluation -> final model -> Flask framework with HTML/JS front end (user
input) -> prediction (malware: y/n)]

Fig:Architecture

3.4 MODULES:

 Data Collection
 Data Analysis
 Data Preprocessing
 Model training
 Model testing
 Comparative analysis
 Flask framework prediction

MODULE:1
DATA COLLECTION

Dataset: Kaggle

Attributes

 name, tcp_packets, dist_port_tcp, external_ips, vulume_bytes, udp_packets,
tcp_urg_packet, source_app_packets, remote_app_packets, source_app_bytes,
remote_app_bytes, duracion, avg_local_pkt_rate, avg_remote_pkt_rate,
source_app_packets, dns_query_times, type

SAMPLE DATA :

 AntiVirus;33;0;1;4502;0;0;34;29;2888;4580;NA;NA;NA;34;1;benign
 AntiVirus;95;0;9;22495;0;0;105;90;20823;23268;NA;NA;NA;105;10;benign
 AntiVirus;27;6;4;4731;0;0;32;27;3888;5134;NA;NA;NA;32;5;benign
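
A minimal sketch of reading rows in this semicolon-separated layout with Python's csv
module is shown below. The attribute list repeats source_app_packets, so the second
occurrence is renamed source_app_packets_2 here; that name, and the choice to map "NA"
to None, are assumptions made for the example.

```python
import csv
import io

# Column names from the attribute list; the duplicate gets a hypothetical suffix.
COLUMNS = [
    "name", "tcp_packets", "dist_port_tcp", "external_ips", "vulume_bytes",
    "udp_packets", "tcp_urg_packet", "source_app_packets", "remote_app_packets",
    "source_app_bytes", "remote_app_bytes", "duracion", "avg_local_pkt_rate",
    "avg_remote_pkt_rate", "source_app_packets_2", "dns_query_times", "type",
]

# The three sample rows shown above.
RAW = """AntiVirus;33;0;1;4502;0;0;34;29;2888;4580;NA;NA;NA;34;1;benign
AntiVirus;95;0;9;22495;0;0;105;90;20823;23268;NA;NA;NA;105;10;benign
AntiVirus;27;6;4;4731;0;0;32;27;3888;5134;NA;NA;NA;32;5;benign
"""


def parse_value(text):
    """Map 'NA' to None, digits to int, and leave other text unchanged."""
    if text == "NA":
        return None
    try:
        return int(text)
    except ValueError:
        return text


def load_rows(raw):
    """Parse semicolon-separated text into a list of column-name dictionaries."""
    reader = csv.reader(io.StringIO(raw), delimiter=";")
    return [dict(zip(COLUMNS, map(parse_value, row))) for row in reader if row]


rows = load_rows(RAW)
print(rows[0]["name"], rows[0]["tcp_packets"], rows[0]["type"])
```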

MODULE:2

Data Analysis

 Data analysis is the process of preparing, cleaning, transforming, and exploring data
to gain insights and information that can be used to build and train machine
learning models.
 Density plots are drawn for seven columns ('tcp_packets', 'dist_port_tcp',
'volume_bytes', 'source_app_packets', 'remote_app_packets',
'source_app_bytes', 'remote_app_bytes') using the plot.density method of the
pandas dataframe.

MODULE:3
DATA PREPROCESSING

 Duplicate rows are dropped using the drop method.

 The data is scaled using the RobustScaler from scikit-learn's preprocessing
module, and the scaled data is stored in a new dataframe called "scaledData".

 Rows where the values of 'tcp_packets', 'dist_port_tcp', 'external_ips',
'volume_bytes', 'udp_packets', 'remote_app_packets' are outside a specified range
are removed.
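
The three preprocessing steps can be illustrated without any libraries as below. This
is a sketch of the idea, not the project's pandas and scikit-learn code: the rows, the
range limits, and the choice to filter before scaling are assumptions. The scaling
follows the same principle as RobustScaler (subtract the median, divide by the
interquartile range).

```python
import statistics
from typing import Dict, List, Tuple

Row = Dict[str, float]


def drop_duplicates(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each distinct row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def filter_range(rows: List[Row], limits: Dict[str, Tuple[float, float]]) -> List[Row]:
    """Remove rows whose value in any limited column falls outside [low, high]."""
    return [
        row for row in rows
        if all(low <= row[col] <= high for col, (low, high) in limits.items())
    ]


def robust_scale(rows: List[Row], columns: List[str]) -> List[Row]:
    """Center each column on its median and divide by its interquartile range."""
    scaled = [dict(row) for row in rows]
    for col in columns:
        values = [row[col] for row in rows]
        median = statistics.median(values)
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = (q3 - q1) or 1.0  # avoid division by zero for flat columns
        for row in scaled:
            row[col] = (row[col] - median) / iqr
    return scaled


# Hypothetical traffic rows; the limits are illustrative, not the project's values.
rows = [
    {"tcp_packets": 10, "volume_bytes": 100},
    {"tcp_packets": 10, "volume_bytes": 100},   # exact duplicate
    {"tcp_packets": 20, "volume_bytes": 300},
    {"tcp_packets": 30, "volume_bytes": 200},
    {"tcp_packets": 40, "volume_bytes": 500},
    {"tcp_packets": 5000, "volume_bytes": 400},  # outside the allowed range
]

clean = filter_range(drop_duplicates(rows), {"tcp_packets": (0, 1000)})
scaled_data = robust_scale(clean, ["tcp_packets", "volume_bytes"])
print(len(clean), scaled_data[0])
```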

MODULE:4
MODEL TRAINING

Different types of machine learning algorithms are applied such as :

 K-nearest neighbors

 Decision tree

 Random forest

 SVM.

 Naïve Bayes

KNN:

A k-Nearest Neighbors classifier is built using the KNeighborsClassifier class from
scikit-learn's neighbors module. The classifier is initialized with the following
parameter:

 n_neighbors: Number of neighbors to use for prediction. In this case, the value is
set to 1.
 The classifier is then fit on the training data using the fit method, and the trained
classifier is used to make predictions on the test data using the predict method. The
predicted class labels are stored in the pred variable.

DECISION TREE:

 A decision tree classifier is built using the DecisionTreeClassifier class from
scikit-learn's tree module. The classifier is initialized with default parameters.

 The classifier is then fit on the training data using the fit method, and the trained
classifier is used to make predictions on the test data using the predict method. The
predicted class labels are stored in the pred variable.

RANDOM FOREST:

A random forest classifier is built using the RandomForestClassifier class from
scikit-learn's ensemble module. The classifier is initialized with the following
parameters:

• n_estimators: Number of trees in the forest. The default value is 100. In this
case, the value is set to 650.

• max_depth: Maximum depth of each tree. The default value is None, which
means the nodes are expanded until all the leaves are pure or contain fewer than
min_samples_split samples. In this case, the value is set to 60.

• random_state: Seed for the random number generator. In this case, the value is
set to 25.

• The classifier is then fit on the training data using the fit method, and the
trained classifier is used to make predictions on the test data using the predict
method.
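
The random forest configuration described above can be sketched as follows. The data is synthetic and the variable names are illustrative. Only the three parameter values come from the text. Reading feature_importances_ is an addition for illustration, not something the report mentions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for the project's dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Parameter values as stated in the text.
rf = RandomForestClassifier(n_estimators=650, max_depth=60, random_state=25)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)

# Illustrative extra: impurity-based importance of each feature; the
# values sum to 1 across all features.
importances = rf.feature_importances_
```

Fixing random_state makes the bootstrap samples and the feature choices at each split reproducible, so repeated runs build the same forest.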

SVM:

A linear support vector machine classifier is built using the LinearSVC class from
scikit-learn's svm module. The classifier is initialized with default parameters and is
then fit on the training data using the fit method.
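
A sketch of the linear SVM step, again on synthetic data with illustrative variable names. The report only mentions fitting. The calls to predict and decision_function are added here as an assumption about how the fitted model would be used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic two-class data standing in for the project's dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Default parameters, as in the text.
svm = LinearSVC()
svm.fit(X_train, y_train)

# Signed score of each test sample relative to the separating hyperplane;
# a positive score maps to the second class (label 1 here).
margins = svm.decision_function(X_test)
pred = svm.predict(X_test)
```

Linear SVMs are sensitive to feature scale, which is one reason a scaling step before training matters for this model.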

NAÏVE BAYES:

• A Gaussian Naïve Bayes classifier is built using the GaussianNB class from
scikit-learn's naive_bayes module. The classifier is initialized with default
parameters.

• The classifier is then fit on the training data using the fit method. Finally, the
trained classifier is used to make predictions on the test data using the predict
method. The predicted class labels are stored in the pred variable.
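
The five classifiers in this module can be compared side by side with a loop like the one below. This is a sketch under stated assumptions: the data is synthetic, accuracy is used as the comparison metric (the report does not name one), and the names models, scores and best_name are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data standing in for the project's dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The five classifiers with the parameters given in the text.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=1),
    "Decision tree": DecisionTreeClassifier(),
    "Random forest": RandomForestClassifier(
        n_estimators=650, max_depth=60, random_state=25
    ),
    "SVM": LinearSVC(),
    "Naive Bayes": GaussianNB(),
}

# Fit each model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = accuracy_score(y_test, pred)

# Model with the highest test accuracy.
best_name = max(scores, key=scores.get)
```

Using one train/test split for every model keeps the comparison fair, since each classifier sees the same training samples and is scored on the same held-out samples.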

BLOCK DIAGRAM

[Figure: block diagram. Recoverable box labels: Malware dataset; Data input;
Preprocessing; ML algorithm; Model training & testing; Final best model; Flask
web application framework; Prediction; Malware detected / Malware not detected.]

Fig: Block diagram
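
The last stages of the block diagram (best model, Flask web application, prediction) could be wired together roughly as below. Everything specific here is an assumption made for illustration: the /predict route, the JSON payload, the four-feature FEATURES subset, the convention that label 1 means malware, and the small model trained on synthetic data in place of the project's saved best model.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical subset of the dataset's columns used as model inputs.
FEATURES = ["tcp_packets", "dist_port_tcp", "external_ips", "volume_bytes"]

# Stand-in for the best model from training (assumption: the real project
# would load a trained model from disk instead).
X, y = make_classification(
    n_samples=200, n_features=len(FEATURES), random_state=0
)
model = RandomForestClassifier(n_estimators=50, random_state=25).fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Classify one sample sent as a JSON object of feature values."""
    payload = request.get_json(force=True)
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify({"error": f"missing features: {missing}"}), 400
    row = [[float(payload[name]) for name in FEATURES]]
    label = int(model.predict(row)[0])  # assumption: 1 means malware
    result = "MALWARE DETECTED" if label == 1 else "MALWARE NOT DETECTED"
    return jsonify({"result": result})
```

The endpoint can be exercised without starting a server through app.test_client(), for example by posting a JSON object with one value per feature to /predict.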

3.5 FLOW DIAGRAM

3.6 HARDWARE REQUIREMENTS

• System : Pentium i3 Processor.

• Hard Disk : 500 GB.

• Monitor : 15" LED

• Input Devices : Keyboard, Mouse

• RAM : 2 GB

3.7 SOFTWARE REQUIREMENTS

• Operating system : Windows 10.

• Coding Language : Python (pandas, scikit-learn, Flask).

CHAPTER 4

RESULT

CHAPTER 5

CONCLUSION

Our main target was to come up with a machine learning framework that generically
detects as many malware samples as it can, under the tough constraint of a zero
false positive rate. We came very close to this goal, although the false positive rate is
still non-zero. For this framework to become part of a highly competitive
commercial product, a number of deterministic exception mechanisms have to be added.
In our opinion, malware detection via machine learning will not replace the standard
detection methods used by anti-virus vendors, but will come as an addition to them. Any
commercial anti-virus product is subject to certain speed and memory limitations,
which constrain the choice of algorithms.

This work focuses on malware detection for Android, currently the most popular
mobile system, using static analysis. An overview of Android malware analysis was
presented, and a unique set of features was chosen that was later used in the study of
malware classification. Five classification algorithms (Random Forest, SVM, K-NN,
Naïve Bayes, Logistic Regression) and three attribute selection algorithms were
examined in order to choose those that would provide the most effective malware
detection. The characteristics of malicious software were identified based on a
collected set of applications. This analysis was conducted for features extracted from
Java class code. It was determined which source of features provides higher quality of
classification.

FUTURE ENHANCEMENT:

• Ensemble Methods: Consider combining multiple classification algorithms using
ensemble methods like bagging or boosting. This can often lead to improved
performance compared to using a single algorithm.

• Deep Learning Approaches: Explore the use of deep learning models like
Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs) for
malware detection. Deep learning models have shown promise in various domains,
including cybersecurity.

• Dynamic Analysis: Incorporate dynamic analysis techniques in addition to static
analysis. Dynamic analysis involves executing code in a controlled environment to
observe its behavior, which can provide valuable insights for detecting malware.
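
The ensemble suggestion above (bagging or boosting) could be tried along these lines. This is a sketch on synthetic data, not a result from the project: the choice of BaggingClassifier and GradientBoostingClassifier, the five-fold cross-validation, and the names candidates and cv_scores are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data standing in for the project's dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

candidates = {
    # Baseline: one decision tree.
    "single tree": DecisionTreeClassifier(random_state=0),
    # Bagging: 50 trees, each fit on a bootstrap sample, combined by voting.
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(random_state=0),
        n_estimators=50,
        random_state=0,
    ),
    # Boosting: shallow trees added one at a time, each correcting the
    # errors of the ensemble built so far.
    "boosting": GradientBoostingClassifier(random_state=0),
}

# Mean accuracy over five cross-validation folds for each candidate.
cv_scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
```

Comparing both ensembles against the single-tree baseline shows whether the extra training cost buys any accuracy on a given dataset.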

