Malware Detection

This document discusses requirements for a machine learning-based malware detection system. It outlines both hardware and software requirements, including using Python 3.6 and Anaconda on a system with at least 8GB RAM and 80GB storage. Functional requirements include employing machine learning techniques like Naive Bayes, SVM, J48 and Random Forest for detection and generating a confusion matrix to evaluate performance. Non-functional requirements specify the system should be self-contained, available 100% of the time, and optimized for scalability, maintainability and load balancing. The literature survey will analyze previous research on machine learning for malware detection.

Uploaded by

Sanjana.S

ABSTRACT

Malware is one of the most significant issues facing internet users today. Malware is any
software intentionally designed to cause damage to a computer, server, client, or computer
network. A wide variety of malware types exist, including computer viruses, worms, Trojan
horses, ransomware, spyware, adware, rogue software, wipers and scareware. Polymorphic
malware is a newer type of malicious software that is more adaptable than previous generations
of viruses: it constantly modifies its signature traits to avoid being identified by traditional
signature-based malware detection models. To identify malicious threats, we use a number of
machine learning techniques. Machine learning algorithms can detect malware by identifying
its behaviour and other characteristics. The proposed approach is based on computing the
difference in correlation symmetry integrals, and it indicates that machine learning algorithms
can detect malware effectively, including polymorphic malware. This can help to improve the
security of computer systems and networks for internet users.

CONTENTS

Chapter No. Description

1 Introduction
2 Requirements Specification
2.1 Hardware Requirements
2.2 Software Requirements
2.3 Functional Requirements
2.4 Non-Functional Requirements
3 Literature Survey
3.1 Existing System
3.2 Drawbacks
3.3 Proposed System
3.4 Working
4 Problem Statement
5 Objectives
6 Methodology
7 Functional Modules
8 Conclusion
References

LIST OF FIGURES

Figure No. Description

4.1 System Architecture
Project Title

Chapter 1
INTRODUCTION

Malware is a major threat to the security of computer systems and networks. Cyberattacks are
currently among the most pressing concerns in modern technology. A cyberattack exploits a
system's flaws for malicious purposes, such as stealing from it, changing it, or destroying it,
and malware is one form of cyberattack. Malware is any program or set of instructions that is
designed to harm a computer, user, business, or computer system. The term "malware"
encompasses a wide range of threats, including viruses, Trojan horses, ransomware, spyware,
adware, rogue software, wipers, and scareware. Malicious software is typically code that is
run without the user's knowledge or consent.

Traditional signature-based malware detection methods are becoming increasingly
ineffective against new and emerging malware strains. Machine learning (ML) algorithms
have the potential to overcome these limitations by detecting malware based on its behaviour
and other characteristics. Both static and dynamic analysis may be used to identify
behavioural similarities between members of the same malware family. Static analysis
examines the contents of suspicious files without running them, whereas dynamic analysis
takes their behaviour into account by tracking data flows, recording function calls, and
adding monitoring code to dynamic binaries. Machine learning algorithms can use such
static and behavioural artefacts to describe the ever-evolving structure of contemporary
malware, allowing them to identify increasingly complex malware attacks that could
otherwise avoid detection by signature-based techniques. Because machine learning-based
solutions do not rely on signatures, they are more successful against newly released malware.
Deep learning algorithms, which can perform feature engineering on their own, can be used
to obtain and represent features more accurately.

This synopsis presents a survey of ML algorithms for malware analysis and detection. We
discuss the different types of ML algorithms that can be used for malware detection, as well
as the different features that can be extracted from malware samples for classification. We
also review state-of-the-art ML-based malware detection systems and their performance.

Dept. of CSE,SJCIT 2023-2024 Page 1

Chapter 2
REQUIREMENT SPECIFICATIONS
A System Requirements Specification (SRS) is a central document that forms the foundation
of the software development process. It records the requirements of a system and describes
its major features. An SRS is essentially an organization's written understanding of a client's
or potential customer's system requirements and dependencies at a specific point in time,
usually before any actual design or development work begins. It acts as a two-way insurance
policy that ensures both the client and the organization understand each other's requirements
at that point in time. The SRS discusses the product, not the project that developed it, and
therefore serves as a basis for later enhancement of the finished product. The SRS may need
to be changed, but it provides a foundation for continued evaluation during production. In
simple words, the software requirements specification is the starting point of the software
development activity.

The SRS translates the ideas in the minds of the clients (the input) into a formal document
(the output of the requirements phase). The output of this phase is therefore a set of formally
specified requirements, which ideally are complete and consistent, while the input has none
of these properties.

2.1 Hardware Requirements

• Processor Type : Intel Core™ i5

• Speed : 2.4 GHz

• RAM : 8 GB

• Hard disk : 80 GB HDD

2.2 Software Requirements

• Operating System : Windows 64-bit

• Technology : Python

• IDE : Python IDLE

• Tools : Anaconda

• Python Version : Python 3.6


2.3 Functional Requirements

1. Malware Detection Algorithm: The system must employ machine learning
techniques for malware detection, including Naive Bayes, SVM, J48, RF, and a
proposed approach.
2. High Detection Accuracy: The selected algorithm must achieve a high
detection ratio, ensuring accurate identification of malicious threats.
3. Confusion Matrix: The system should generate a confusion matrix to
measure false positives and false negatives, providing additional performance
insights.
4. Comparison of Classifiers: The system must compare the performance of the
DT, CNN, and SVM algorithms in terms of detection accuracy, particularly at
a low False Positive Rate (FPR).
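
The confusion matrix and FPR named in requirements 3 and 4 can be illustrated with a minimal sketch. It uses only the Python standard library; the function names and the toy labels are assumptions made for illustration and are not taken from the project's code.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive="malware"):
    """Return (TP, FP, TN, FN) for a binary malware/benign task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["tn"], counts["fn"]

def accuracy(tp, fp, tn, fn):
    """Share of all samples that were classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN): share of benign files wrongly flagged."""
    return fp / (fp + tn) if (fp + tn) else 0.0

# Toy labels (hypothetical, for illustration only).
y_true = ["malware", "malware", "benign", "benign", "benign", "malware"]
y_pred = ["malware", "benign", "benign", "malware", "benign", "malware"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("TP, FP, TN, FN:", tp, fp, tn, fn)
print("accuracy:", round(accuracy(tp, fp, tn, fn), 3))
print("FPR:", round(false_positive_rate(fp, tn), 3))
```

Running the same functions on the predictions of each classifier would give comparable accuracy and FPR figures for the comparison in requirement 4.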

2.4 Non-Functional Requirements

These are requirements that are not functional in nature; that is, they are constraints
within which the system must work.
The program must be self-contained so that it can easily be moved from one computer
to another. It is assumed that a network connection will be available on the computer on
which the program resides.
Capacity, scalability and availability.
The system shall achieve 100 per cent availability at all times.
The system shall be scalable to support additional clients and volunteers.
Maintainability.
The system should be optimized for supportability, or ease of maintenance, as far as
possible. This may be achieved through the use of documented coding standards,
naming conventions, class libraries and abstraction.
Randomness, verifiability and load balancing.
The system should check its nodes at random and should be load balanced.


Chapter 3
LITERATURE SURVEY
A literature survey, or literature review, in a project report presents the analyses and
research already carried out in the field of interest and the results already published, taking
into account the parameters and the extent of the project. A literature survey is mainly
carried out to analyze the background of the current project, which helps to find flaws in the
existing system and indicates which unsolved problems can be worked on. The following
topics therefore not only illustrate the background of the project but also uncover the
problems and flaws that motivated us to propose solutions and work on this project.

A literature survey is part of a scholarly paper and covers the current knowledge on a
particular topic, including substantive findings as well as theoretical and methodological
contributions. Literature reviews use secondary sources and do not report new or original
experimental work. They are most often associated with academic literature, such as a
thesis, dissertation or peer-reviewed journal article, where the literature review usually
precedes the methodology and results sections, although this is not always the case.
Literature reviews are also common in a research proposal or prospectus (the document
that is approved before a student formally begins a dissertation or thesis). Their main goals
are to situate the current study within the body of literature and to provide context for the
reader. Literature reviews are a basis for research in nearly every academic field. A
literature survey includes the following:
• Existing theories about the topic which are accepted universally.

• Books written on the topic, both generic and specific.

• Research done in the field, usually in order from oldest to latest.

• Challenges being faced and ongoing work, if available.

A literature survey describes the existing work related to the given project. It deals with the
problems associated with the existing system and gives the reader a clear understanding of
how those problems can be addressed and solved.
Objectives of Literature Survey
• Learning the definitions of the concepts.

• Access to the latest approaches, methods and theories.

• Discovering research topics based on the existing research.

• Concentrating on your own field of expertise: even if another field uses the same
words, they usually mean something completely different.
• Improving the quality of the literature survey by excluding sidetracks; remember to
state explicitly what is excluded.
Before building our application, the following work was taken into consideration:

"Malware Analysis and Detection Using Machine Learning Algorithms", Muhammad Shoaib
Akhtar and Tao Feng. Malware is a major threat to the security of computer systems and
networks. Traditional signature-based malware detection methods are becoming
increasingly ineffective against new and emerging malware strains. Machine learning (ML)
algorithms have the potential to overcome these limitations by detecting malware based on its
behaviour and other characteristics. This paper presents a comprehensive survey of ML
algorithms for malware analysis and detection. The authors discuss the different types of ML
algorithms that can be used for malware detection, as well as the different features that can be
extracted from malware samples for classification. They also review state-of-the-art ML-based
malware detection systems and their performance. The authors conclude that ML algorithms
are a promising approach for malware detection. However, they also highlight some of the
challenges that need to be addressed before ML-based malware detection systems can be
widely deployed. These challenges include the need for large and well-labelled malware
datasets, the need for ML algorithms that are robust to adversarial attacks, and the need for
ML algorithms that can be deployed in real time. Overall, this paper provides a valuable
overview of ML algorithms for malware analysis and detection.
"A state-of-the-art survey of malware detection approaches using data mining techniques",
Alireza Souri and Rahil Hosseini. Malware detection is a challenging task, especially in the
face of new and emerging malware strains. Data mining techniques have the potential to
overcome the limitations of traditional malware detection methods by detecting malware based
on its behaviour and other characteristics. This paper presents a state-of-the-art survey of
malware detection approaches using data mining techniques. The authors discuss the different
types of malware detection approaches, as well as the different data mining techniques that can
be used for malware detection. They also review state-of-the-art malware detection systems
and their performance. The authors conclude that data mining techniques are a promising
approach for malware detection. However, they also highlight some of the challenges that need
to be addressed before data mining-based malware detection systems can be widely deployed.
These challenges include the need for large and well-labelled malware datasets, the need for
data mining algorithms that are robust to adversarial attacks, and the need for data mining
algorithms that can be deployed in real time. Overall, this paper provides a valuable overview
of data mining techniques for malware detection.

3.1 Existing System

The existing system is a high-performance malware detection system that uses deep learning
and feature selection methodologies. Two different malware datasets are used to detect malware
and differentiate it from benign activity. The datasets are preprocessed, and correlation-based
feature selection is then applied to produce different feature-selected datasets. Dense and
LSTM-based deep learning models are then trained on these different versions of the
feature-selected datasets.
Techniques Used:
An LSTM-based deep learning technique is used.

3.2 Drawbacks of Existing System

Due to the deep learning architecture, it requires more training time.
Accuracy is less than 90%.
It can determine only whether an attack is present; it is not suitable for distinguishing
between different types of malware.

3.3 Proposed system

The approach used in this project uses multiple classifiers to detect and classify malware.
Malware classification is treated as two problems: a binary problem and a multi-class problem.
Binary classification differentiates between the malicious and benign classes, whereas
multi-class classification assigns malicious samples to the Virus, Trojan, Spyware, Worms,
Ransomware, and Adware types. A supervised learning approach is used, with machine
learning models such as Random Forest, Decision Tree, Support Vector Machine, Naïve
Bayes, and K-Nearest Neighbour applied to the classification of malware. The results show
that Random Forest performs well on both the binary and the multi-class classification
problems, with accuracies of 95% and 91% respectively.
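
As a minimal sketch of how one labelled sample set can serve both problems, the snippet below derives binary targets and multi-class targets from the same type labels. The type names follow the list above; the sample labels, the "Benign" label, and the function name are assumptions made for illustration.

```python
# Malware types named in the proposed system.
MALWARE_TYPES = {"Virus", "Trojan", "Spyware", "Worms", "Ransomware", "Adware"}

def to_binary_label(label):
    """Collapse any malware type into 'malicious'; everything else is 'benign'."""
    return "malicious" if label in MALWARE_TYPES else "benign"

# Hypothetical per-sample labels, for illustration only.
labels = ["Trojan", "Benign", "Adware", "Benign", "Worms"]

# Targets for the binary problem: malicious vs. benign.
binary_targets = [to_binary_label(label) for label in labels]

# Targets for the multi-class problem: only malicious samples, by type.
multiclass_targets = [label for label in labels if label in MALWARE_TYPES]

print(binary_targets)
print(multiclass_targets)
```

A classifier trained on the binary targets answers whether a file is malicious; a second classifier trained on the multi-class targets then names the type.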
Advantages:
• Less time is required for implementation.
• Accuracy is above 90%.
• It can also detect the type of attack.


3.4 Working

The main objective of this malware detection system is to detect the different types of
attacks efficiently.


Chapter 5
OBJECTIVES

• To investigate how machine learning can be applied to malware detection in order to
detect unknown malware.
• To develop malware detection software that uses machine learning to detect
unknown malware.
• To validate that malware detection based on machine learning can achieve a high
accuracy rate with a low false positive rate.
• To detect malware effectively in specific types of files, such as executable files,
PDFs, or images.


Chapter 4
PROBLEM STATEMENT

The term malware is a contraction of malicious software. Put simply, malware is any piece of software
that was written with the intent of doing harm to data, devices or people. A major part of
protecting a computer system from a malware attack is identifying whether a given file or piece of
software is malware and identifying its family. Malware detection and malware classification are
two separate tasks, both performed by anti-malware and cybersecurity companies. Once
detected, malware needs to be categorized into a specific family for further analysis.
Classification accuracy and classification time tend to trade off against each other. For example,
a decision tree (DT) classifies quickly because it uses a single tree, but its classification
performance may be degraded by instability and dependency on a particular set of features. On
the other hand, random forest (RF) classifies more accurately than DT in most cases, but its
classification speed is much slower than that of DT. Thus, by combining the fast but less
accurate classifier with the slow but more accurate classifier, we can aim for a classifier that
is both fast and accurate.
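
One possible way to combine the two kinds of classifier is a cascade: trust the fast classifier when it is confident and fall back to the slower one otherwise. The sketch below illustrates that idea only; the document does not specify how the classifiers are combined, and the rule-based stand-ins, the feature names (`entropy`, `imports`, `packed`), and the 0.9 threshold are all hypothetical.

```python
def fast_rule(sample):
    """Stand-in for a fast classifier such as a single decision tree."""
    score = sample["entropy"]
    label = "malware" if score > 7.0 else "benign"
    # Toy heuristic: confidence grows with distance from the decision boundary.
    confidence = min(1.0, 0.5 + abs(score - 7.0) / 2.0)
    return label, confidence

def slow_vote(sample):
    """Stand-in for a slower ensemble such as a random forest: majority vote of rules."""
    votes = [sample["entropy"] > 6.5, sample["imports"] < 10, sample["packed"]]
    return "malware" if sum(votes) >= 2 else "benign"

def cascade_classify(sample, threshold=0.9):
    """Use the fast classifier when it is confident; otherwise ask the slow one."""
    label, confidence = fast_rule(sample)
    if confidence >= threshold:
        return label, "fast"
    return slow_vote(sample), "slow"

# Hypothetical samples, for illustration only.
clear_case = {"entropy": 7.9, "imports": 3, "packed": True}
borderline = {"entropy": 6.8, "imports": 4, "packed": True}

print(cascade_classify(clear_case))
print(cascade_classify(borderline))
```

With this design, only the borderline samples pay the cost of the slower classifier, so average classification time stays close to that of the fast one.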


Chapter 6
METHODOLOGY

Figure 1: Flowchart
The system architecture illustrates the workflow from start to finish.


Chapter 7
FUNCTIONAL MODULES
7.1 Dataset
This study relied entirely on data provided by the Canadian Institute for Cybersecurity. The
collection has many data files that include log data for various types of malware. The features
recovered from these logs may be used to train a broad variety of models. Approximately 51
distinct malware families were found in the samples. The dataset had 17,394 rows and 279
columns, with data points drawn from different locations.
7.2 Pre-Processing
The samples were stored in the file system as binary code, and the files themselves were
unprocessed executables, so we prepared them before analysis. Unpacking the executables
required a protected environment, namely a virtual machine (VM). The PEiD software was
used to automate the unpacking of compressed executables.
7.3 Features Extraction
Modern datasets frequently contain tens of thousands of features. As feature counts have
grown in recent years, it has become clear that the resulting machine learning models tend to
overfit. To address this problem, we built a smaller set of features from the larger set; this
technique is commonly used to maintain the same degree of accuracy while using fewer
features. The goal was to refine the existing dataset of dynamic and static features by keeping
those that were most helpful and eliminating those that were not valuable for data analysis.
7.4 Features Selection
Feature selection was performed after feature extraction, which had produced additional
features. Feature selection is a crucial step for enhancing accuracy, simplifying the model,
and reducing overfitting, as it involves choosing the most useful features from the pool of
newly extracted ones. Researchers have used many feature selection strategies in the past in
an effort to identify dangerous code in software.
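
The section above does not name the selection method used. As one common filter-style option, shown here purely as an assumption, features can be ranked by the absolute Pearson correlation between each feature column and the malicious/benign label, keeping those above a threshold. The column names, values, and the 0.5 threshold are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def select_features(columns, labels, min_abs_corr=0.5):
    """Keep feature names whose |correlation with the label| meets the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, labels)) >= min_abs_corr
    ]

# Toy data (hypothetical): 1 = malicious, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "api_calls": [9, 8, 7, 2, 3, 1],      # tracks the label closely
    "file_size": [5, 1, 4, 5, 1, 4],      # unrelated to the label
    "constant_flag": [1, 1, 1, 1, 1, 1],  # carries no information
}

print(select_features(columns, labels))
```

Dropping constant and weakly correlated columns in this way is one route to the smaller, less overfit feature set the section describes.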
7.5 Results and Discussion
The two main phases of the classification process were training and testing. To train the
system, it was given both harmful and safe files, and the classifiers were trained automatically
using a learning algorithm. Each classifier (KNN, CNN, NB, RF, SVM, or DT) improved with
each set of labelled data it processed. In the testing phase, a classifier was given a collection
of new files, some harmful and some not, and determined whether each file was malicious or
clean.
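
The two phases can be sketched with a small K-Nearest Neighbour (KNN) classifier, one of the classifiers listed above, written in plain Python. The two-dimensional feature vectors, the labels, and k = 3 are assumptions made for illustration; they are not the project's data or settings.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training samples."""
    neighbours = sorted(
        (euclidean(features, query), label)
        for features, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Training phase: labelled harmful and safe samples (hypothetical features).
train_X = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3)]
train_y = ["malicious", "malicious", "malicious", "clean", "clean", "clean"]

# Testing phase: new files the classifier has not seen before.
test_X = [(0.85, 0.75), (0.15, 0.25)]
test_y = ["malicious", "clean"]

predictions = [knn_predict(train_X, train_y, query) for query in test_X]
correct = sum(pred == truth for pred, truth in zip(predictions, test_y))
print(predictions, "accuracy:", correct / len(test_y))
```

KNN keeps the training samples and defers all work to prediction time, which makes the split between the training data and the unseen test files easy to see.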

Chapter 8
CONCLUSION
In recent years, academics have shown a growing interest in ML-based solutions for
malware identification. We presented a protective mechanism that evaluated three ML
approaches to malware detection and chose the most appropriate one. The results show that,
compared with other classifiers, DT (99%), LR (98.76%), and SVM (96.41%) performed well
in terms of detection accuracy.


REFERENCES

[1] Akhtar, M.S.; Feng, T., "Malware Analysis and Detection Using Machine Learning
Algorithms" (2022), DOI: 10.3390/sym14112304.
[2] Kamboj, A.; Kumar, P.; Bairwa, A.K., "Detection of Malware in Downloaded Files Using
Various Machine Learning Models" (2022), DOI: 10.1016/j.eij.2022.12.002.
[3] Sinha, R., "Study of Malware Detection Using Machine Learning", DOI:
10.13140/RG.2.2.11478.16963.
[4] Souri, A.; Hosseini, R., "A State-of-the-Art Survey of Malware Detection Approaches
Using Data Mining Techniques" (2018), Hum. Cent. Comput. Inf. Sci., DOI: 10.1186/s13673.
