TABLE OF CONTENTS
CRITIQUE OF PAPER
DESCRIPTION OF DATASET
RAW RESULTS OF ALL EXPERIMENTS
NOVELTY TO THE PAPER
CODE
CRITIQUE OF PAPER
1. Background and Motivation:
o The paper addresses the challenge of detecting zero-day attacks using
social media data, specifically focusing on Twitter. The motivation lies in
the need for proactive threat detection, especially for emerging risks that
lack pre-existing anti-malware measures.
2. Methodology and Approach:
o The authors propose using machine learning, specifically TensorFlow, to
analyze Twitter data. Word categorization is used to identify
vulnerabilities and counteract zero-day attacks swiftly. The Natural
Language Toolkit (NLTK) is integrated to extract targeted words across
various languages. The study reports an 80% success rate in detecting
zero-day attacks with the authors' tool.
o The process of data collection from Twitter is described, including the use
of a crawling procedure that mimics human behavior to bypass limitations
in Twitter's search functionality. This is a clever approach, but the paper
does not sufficiently address ethical considerations and the potential
biases introduced by this method. Discussing these aspects would
strengthen the methodology section.
3. Strengths:
o Utilizing social media data as a proactive tool for early detection and
mitigation of zero-day attacks enhances cybersecurity measures. The
deep character-level anomaly detection technique showcased efficacy in
detecting zero-day threats.
o The paper clearly outlines the use of TensorFlow and NLTK for
data processing and analysis, which is a standard and effective
approach for text-based machine learning tasks. The authors also
mention using real Twitter data, which is crucial for real-world
applicability.
o The paper highlights the potential of using publicly available
information on Twitter for early detection of zero-day attacks, which
could be valuable for security researchers and organizations.
4. Limitations and Considerations:
o The study focuses solely on Twitter data, which may not cover all social
media platforms.
o The effectiveness of the NLTK-based word extraction method may vary
across different languages and contexts.
o The success rate of 80% leaves room for improvement, and false
positives/negatives should be carefully evaluated.
o The crawlers have limited capabilities:
- They cannot refresh more often than once per second.
- They cannot perform real-time searches.
- They cannot directly target specific individuals or groups on Twitter.
- They rely on Twitter's built-in search functionality, which restricts their
ability to operate at higher speeds or target specific entities.
- More robust computational resources and more advanced crawling
techniques are needed to improve the efficiency and effectiveness of data
collection in future research.
- The zero-day detection mechanism presented in this paper uses a simple,
flat dataset with a single attribute: the keyword “zero day”. More
sophisticated advanced persistent threats (APTs) use advanced tactics,
techniques, and procedures (TTPs) that are not represented in the dataset
used to train the model. Moreover, the model depends entirely on keyword
searches such as “zero day” and similar terms. It does not detect a
zero-day attack from its behavior or its TTPs, which makes this an
unreliable approach to detection.
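The point about false positives and false negatives can be made concrete. The sketch below is not taken from the paper: it assumes a binary labeling (1 = the tweet indicates a zero-day, 0 = it does not), and the labels and the function names `confusion_counts` and `precision_recall_f1` are made up for illustration. It shows how a single success rate can hide the error breakdown.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels (assumption, not data from the paper):
# 3 of 10 tweets really concern a zero-day, and the model finds only one.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

metrics = precision_recall_f1(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this invented example the accuracy is 80%, yet the recall is only one in three, because two of the three relevant tweets are missed. Reporting precision and recall alongside the success rate would make such a gap visible.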
DESCRIPTION OF DATASET
1. Dataset Creation:
o The research aimed to generate a dataset for their model, specifically
focusing on social media data.
o They selected Twitter as the social media platform for data collection.
2. Data Collection Approaches:
o To create the dataset, the following approaches were used:
- Crawling Procedure Implementation:
    - Robots were programmed to mimic human behavior on Twitter.
    - These robots operated web browsers in a shadow mode, navigating
      Twitter as if they were human users.
- Data Extraction:
    - The robots browsed through Twitter, identifying relevant data
      related to specific keywords (e.g., “zero day”).
    - Extracted data was stored in a database for further processing.
- Human-like Scrolling Behavior:
    - The crawling procedure replicated typical human scrolling
      behavior on Twitter.
    - Unlike the platform’s interface, which limits scrolling based
      on scroll-down count, the robots could scroll indefinitely.
    - This ensured comprehensive data collection.
- Consideration of Twitter’s Response Time:
    - The study accounted for Twitter’s response time during data
      collection.
    - Understanding and optimizing the crawling process based
      on response time were emphasized.
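The crawling steps described above can be sketched as a simple loop. This is a minimal simulation under stated assumptions, not the paper's crawler: it does not drive a real browser, `fetch_batch` is a hypothetical callable standing in for one scroll of the page, `FAKE_FEED` is invented data, and the one-second minimum interval mirrors the refresh limit noted in the critique. The database storage step is omitted.

```python
import time


def crawl(fetch_batch, keyword, min_interval=1.0, max_empty_scrolls=3,
          sleep=time.sleep, clock=time.monotonic):
    """Scroll until no new items appear; keep items that contain `keyword`.

    fetch_batch(cursor) is a hypothetical stand-in for one page scroll and
    must return a list of {"id": ..., "text": ...} dictionaries.
    """
    seen_ids = set()
    matches = []
    response_times = []
    empty_scrolls = 0
    cursor = 0
    while empty_scrolls < max_empty_scrolls:
        started = clock()
        batch = fetch_batch(cursor)          # one simulated scroll
        elapsed = clock() - started          # response time of this scroll
        response_times.append(elapsed)

        new_items = [item for item in batch if item["id"] not in seen_ids]
        if new_items:
            empty_scrolls = 0
            for item in new_items:
                seen_ids.add(item["id"])
                if keyword.lower() in item["text"].lower():
                    matches.append(item)
        else:
            empty_scrolls += 1               # nothing new: maybe end of feed

        cursor += 1
        if elapsed < min_interval:
            # Never refresh more often than once per `min_interval` seconds.
            sleep(min_interval - elapsed)
    return matches, response_times


# Invented feed used only for this illustration.
FAKE_FEED = [
    [{"id": 1, "text": "New zero day reported in a browser"},
     {"id": 2, "text": "Lunch photos"}],
    [{"id": 2, "text": "Lunch photos"},
     {"id": 3, "text": "Patch released for Zero Day exploit"}],
]


def fake_fetch(cursor):
    """Return the next fake batch, or an empty list past the end."""
    return FAKE_FEED[cursor] if cursor < len(FAKE_FEED) else []


# The sleep is replaced with a no-op so the demonstration runs instantly.
matches, response_times = crawl(fake_fetch, "zero day", sleep=lambda s: None)
print([item["id"] for item in matches])
```

Recording a response time per scroll is what would allow the crawl to be tuned against Twitter's responsiveness, as the study emphasizes. The stopping rule (three consecutive scrolls with nothing new) is an assumption of this sketch.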
3. Missing Items from Dataset:
o Size of the dataset: The paper doesn't mention the number of tweets
collected.
o Time period of data collection: It's unclear when the tweets were
collected.
o Specific keywords used: The exact keywords or search queries used to
gather tweets related to zero-day attacks aren't provided.
o Data labeling: The paper mentions manual intervention for handling
certain cases, implying some level of human labeling for training and/or
evaluating the model. However, the labeling process and the inter-rater
reliability (if multiple annotators were involved) aren't discussed.
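To illustrate what reporting inter-rater reliability could look like, the sketch below computes Cohen's kappa for two annotators. It is an assumption that two annotators labeled the same tweets; the paper does not say so, and the labels and the function name `cohens_kappa` are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set.")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with
    # their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations (1 = zero-day related, 0 = not related).
annotator_a = [1, 1, 0, 0, 1, 0, 0, 0]
annotator_b = [1, 0, 0, 0, 1, 0, 1, 0]

kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

Here the annotators agree on 6 of 8 tweets (75%), but kappa is about 0.47 once chance agreement is removed, which is why raw agreement alone would not be enough to report.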
4. A graph was created to illustrate the relationship between the model's
performance and Twitter's response time. This visual representation helped in
understanding how the model's effectiveness was influenced by the time it took
for Twitter to respond to requests.
RAW RESULTS OF ALL EXPERIMENTS
Using TensorFlow for Zero-Day Attack Detection
NOVELTY TO THE PAPER
Instead of relying solely on keyword-based search for data collection, the authors could
explore more sophisticated methods. Combining multiple machine learning models
(e.g., CNN, RNN, Transformer) could improve detection accuracy and robustness, and
incorporating techniques that provide insight into why the model classifies certain
tweets as indicative of zero-day attacks would enhance trust and interpretability.
Analyzing user metadata (e.g., account age, follower/following ratio, tweet history)
could also help identify potentially malicious accounts spreading zero-day information,
and computer vision techniques could be used to analyze any content for potential
zero-day information. Finally, methodologies could be explored to integrate the zero-day
detection system with existing security information and event management tools to
provide actionable alerts and facilitate rapid response.
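One simple way to combine several models is soft voting, in which the per-tweet probabilities of each model are averaged before a threshold is applied. The sketch below is only an illustration of that idea: the probabilities are invented, no models are actually trained, and `soft_vote`, its weights, and the 0.5 threshold are assumptions rather than anything proposed in the paper.

```python
def soft_vote(model_probs, weights=None, threshold=0.5):
    """Average per-tweet probabilities across models and apply a threshold.

    model_probs maps a model name to a list of probabilities, one per tweet,
    that the tweet indicates a zero-day attack.
    """
    names = list(model_probs)
    if not names:
        raise ValueError("At least one model is required.")
    n_items = len(model_probs[names[0]])
    if any(len(model_probs[name]) != n_items for name in names):
        raise ValueError("All models must score the same number of tweets.")
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weighting by default
    total_weight = sum(weights[name] for name in names)

    averaged = []
    for i in range(n_items):
        score = sum(weights[name] * model_probs[name][i] for name in names)
        averaged.append(score / total_weight)
    labels = [int(score >= threshold) for score in averaged]
    return averaged, labels


# Invented probabilities for three tweets from three hypothetical models.
probs = {
    "cnn": [0.9, 0.2, 0.6],
    "rnn": [0.8, 0.4, 0.3],
    "transformer": [0.7, 0.1, 0.5],
}

averaged, labels = soft_vote(probs)
print(labels)
```

In this invented example the third tweet is flagged by the CNN alone (0.6), but the averaged score falls below the threshold, so the ensemble does not flag it. Whether such averaging improves accuracy on real tweets would have to be measured.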
CODE
The code is submitted along with this paper as a Jupyter notebook.