
AI-Powered Website Vulnerability Scanner With

Exploitability Ranking And Mitigations Using Gemini-AI

A Term Paper Report submitted in partial fulfillment of the requirements for the
award of the degree of

BACHELOR OF TECHNOLOGY
in

COMPUTER SCIENCE AND ENGINEERING

by

Bommareddy Keerthana Reddy (2200030485)


Kondamadugula Hemanth Reddy (2200030094)
Pathuri Manohar (2200030503)
Maram Bhuvanesh (2200030677)

Under the supervision of


Janjhyam Venkata Naga Ramesh
Assistant Professor, Department of
CSE

KONERU LAKSHMAIAH EDUCATION FOUNDATION

DEPARTMENT OF COMPUTER SCIENCE AND

ENGINEERING

Declaration
The Term Paper Report entitled "AI-Powered Website Vulnerability Scanner with
Exploitability Ranking and Mitigations using Gemini AI" is a record of bona fide work of
Bommareddy Keerthana Reddy (2200030485), Kondamadugula Hemanth Reddy (2200030094),
Pathuri Manohar (2200030503), and Maram Bhuvanesh (2200030677), submitted in partial
fulfillment for the award of Bachelor of Technology in Computer Science and Engineering to
K L Deemed to be a University during the academic year 2024-25.
We also declare that this report is our own effort and has not been submitted to any
other university for the award of any degree.

B. Keerthana Reddy 2200030485


K. Hemanth Reddy 2200030094
P. Manohar 2200030503
M. Bhuvanesh 2200030677

KONERU LAKSHMAIAH EDUCATION FOUNDATION

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Certificate

This is to certify that the Term Paper Report entitled “AI-Powered Website Vulnerability
Scanner with Exploitability Ranking and Mitigations using Gemini AI” is being submitted by
Bommareddy Keerthana Reddy (2200030485), Kondamadugula Hemanth Reddy (2200030094),
Pathuri Manohar (2200030503), and Maram Bhuvanesh (2200030677) in
partial fulfillment for the award of Bachelor of Technology in Computer Science and Engineering
to K L Deemed to be a University during the academic year 2024-25.

Signature of the Supervisor                    Signature of the Term Paper Coordinator

Janjhyam Venkata Naga Ramesh


Assistant Professor, Department of CSE

Signature of the HOD Signature of the External Examiner

ACKNOWLEDGEMENTS

It is a great pleasure for us to express our gratitude to our Hon'ble Chancellor Sri. Koneru
Satyanarayana for giving us the opportunity, platform, and facilities to complete the
term paper course successfully.

We express our sincere gratitude to our Principal, Dr. T. K. RamaKrishna Rao, for his
administration towards our academic growth.

We consider it a privilege to deeply thank our pioneers, Prof. V. Hari Kiran, Dean (Addl.)
Academics, and Dr. A. Senthil, HOD, CSE Department, for providing us with efficient faculty and
the facilities to turn our ideas into reality.

We express our sincere thanks to our project supervisor, Janjhyam Venkata Naga Ramesh, for
his novel association of ideas, encouragement, appreciation, and intellectual zeal, which
motivated us to complete this term paper work successfully.

Finally, we are pleased to acknowledge our indebtedness to all those who devoted themselves,
directly or indirectly, to making this term paper a success.

Internship Project Associate


Name Student ID
Bommareddy Keerthana Reddy 2200030485
Kondamadugula Hemanth Reddy 2200030094
Pathuri Manohar 2200030503
Maram Bhuvanesh 2200030677

ABSTRACT:

This paper presents the design and implementation of an AI-powered website
vulnerability scanner that utilizes Google Gemini AI to enhance the detection, ranking, and
remediation of web application vulnerabilities. The proposed system integrates conventional
security scanning techniques—such as static and dynamic analysis—with the advanced
reasoning capabilities of a large language model (LLM) to assess the exploitability of identified
vulnerabilities. It maps the results to the OWASP Top 10 framework, ensuring alignment with
industry standards. By employing AI-driven contextual analysis, the scanner not only prioritizes
security issues based on their severity and potential impact but also provides developers with
clear, tailored mitigation strategies. Furthermore, the system generates comprehensive, real-time
reports to support rapid response and informed decision-making. The goal is to empower
developers and security teams with intelligent tools that streamline the vulnerability management
process, reduce manual effort, and significantly enhance overall application security.

Keywords— Web Application Security, Vulnerability Scanner, Google Gemini AI, Large
Language Model (LLM), OWASP Top 10, Exploitability Ranking, Static and Dynamic
Analysis, Real-time Reporting, Mitigation Strategies, Application Vulnerability Detection.

LIST OF FIGURES

Figure Number   Figure Name                                                      Page Number

4.1   Architecture Diagram                                                       23
6.1   User interface for submitting potential threat indicators (IOCs)
      to the Cyber Threat Intelligence platform.                                 30
6.2   Admin panel showing submitted threat hashes, their descriptions,
      and timestamps in reverse chronological order.                             30
6.3   Dashboard displaying total validated threats, latest IOC hash,
      and most recent threat description.                                        31
6.4   Positive classification result with mapped MITRE ATT&CK technique
      and 100% model confidence.                                                 31
6.5   Negative classification output showing the IOC was not validated,
      with a low model confidence of 20%.                                        31
6.6   Performance Comparison: Baseline vs Proposed CTI Platform.                 32
6.7   Threat Classification Confidence Distribution among 100 IOCs.              32

TABLE OF CONTENTS

Chapter Number   Title   Page Number (Tentative)


1 Introduction 8-10
2 Literature Review 11-16
3 Project Proposal 17-21
4 Design Initializations 22-25
5 Experimental Investigation 26-27
6 Result and Analysis 28-32
7 Conclusion 33-34
8 Future Work 34
9 References 35-36
Chapter 1: Introduction

1.1 Background of the Study


With the growing reliance on web applications across various industries, the importance of robust
cybersecurity practices has become critical. As these applications handle sensitive data and critical functions,
they are increasingly targeted by malicious actors exploiting security flaws such as SQL injection (SQLi), cross-
site scripting (XSS), and broken authentication. Traditional web vulnerability scanners are effective at detecting
such issues but lack intelligent mechanisms for understanding the real-world impact and prioritizing them. The
emergence of AI and large language models (LLMs), such as Google Gemini AI, presents a new opportunity to
enhance existing tools with risk assessment, contextual analysis, and tailored mitigation guidance.

1.2 Motivation

The motivation behind this project is to address the limitations of existing vulnerability scanners that focus
only on detection without offering deeper insights into the severity or risk prioritization of vulnerabilities.
Developers often struggle to understand which threats to resolve first and how to mitigate them effectively.
By leveraging Gemini AI, we aim to introduce an intelligent system that not only detects vulnerabilities but
also evaluates their exploitability and suggests actionable remediation steps. This AI-powered approach
enhances decision-making, reduces manual effort, and improves the overall application security posture.
1.3 Problem Statement
Existing web vulnerability scanners lack advanced exploitability assessment and remediation
support. While they detect common vulnerabilities, they do not prioritize them based on
potential impact, nor do they provide detailed mitigation strategies. This results in inefficient
vulnerability management, increased exposure to cyberattacks, and longer remediation cycles.
There is a clear need for an intelligent, automated system that bridges this gap by combining
traditional scanning techniques with AI-driven analysis.

1.4 Objectives of the Study


The main objectives of this study are:

 To develop an AI-powered web vulnerability scanner that integrates traditional scanning
techniques with Google Gemini AI.

 To classify and prioritize vulnerabilities based on exploitability using LLM-based reasoning.

 To map vulnerabilities to the OWASP Top 10 framework for standardized risk categorization.

 To generate real-time, detailed reports including mitigation strategies for each vulnerability.

 To provide an intuitive user interface for developers to interact with scan results and
download reports in preferred formats.
Chapter 2: Literature Review

2.1 Review of Existing Work
Over the years, researchers have developed various approaches to identify and mitigate web
application vulnerabilities. Some of the most influential works are outlined below:

1. AMNESIA (Analysis and Monitoring for NEutralizing SQL-Injection Attacks)

Developed by Halfond and Orso, AMNESIA is a tool specifically designed to detect and prevent
SQL injection attacks. It combines static code analysis to identify SQL query generation points
and integrates runtime monitoring to ensure that the executed queries match expected safe
patterns. This hybrid approach offers accurate detection and real-time prevention. However,
AMNESIA is limited to SQLi vulnerabilities and lacks the capability to assess other types of
web-based threats or prioritize them by severity.

2. Signature-Based Detection by Sunitha and Sridevi

Sunitha and Sridevi introduced a method that leverages pattern matching techniques to detect SQL
injection attacks. Their approach builds a database of known attack signatures and analyzes code
for matches. While this method is fast and easy to implement, it struggles with detecting new or
obfuscated attacks (zero-day vulnerabilities). Moreover, it does not provide contextual
information or exploitability ranking.
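As a rough sketch of how such signature matching operates (the patterns and names below are illustrative, not Sunitha and Sridevi's actual rule set), a regex-based check can be written in a few lines of Python:

```python
import re

# Hypothetical signature database of common SQL-injection patterns.
# Real signature sets are far larger and tuned to reduce false positives.
SQLI_SIGNATURES = [
    r"(?i)\bunion\b.+\bselect\b",   # UNION-based injection
    r"(?i)\bor\b\s+1\s*=\s*1",      # classic tautology ' OR 1=1
    r"(?i);\s*drop\s+table\b",      # stacked-query DROP TABLE
    r"(?i)'\s*--",                  # quote followed by SQL comment
]

def matches_signature(user_input: str) -> bool:
    """Return True if the input matches any known attack signature."""
    return any(re.search(p, user_input) for p in SQLI_SIGNATURES)
```

A payload such as `' OR 1=1 --` triggers a match, but an obfuscated variant like `UN/**/ION SEL/**/ECT` slips through, which is exactly the zero-day weakness noted above.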

3. OWASP ZAP (Zed Attack Proxy)

OWASP ZAP is an open-source penetration testing tool maintained by the OWASP community. It
performs automated scanning, passive scanning, and even manual testing. ZAP identifies a wide
range of issues including SQLi, XSS, and more. While ZAP is comprehensive, it lacks an AI
component for prioritizing threats or generating intelligent mitigation strategies.

4. Burp Suite

Burp Suite is a popular commercial tool for web vulnerability scanning, offering both automated
and manual testing. It provides detailed reports and payload injection features. However, the
tool is more suitable for experienced penetration testers and lacks AI-driven risk evaluation or
mitigation suggestions tailored to the application context.

5. Acunetix

Acunetix is a fully automated web vulnerability scanner designed to detect a broad range of security
flaws. It integrates with CI/CD pipelines and provides risk scores. However, its proprietary
nature and lack of customizable AI features limit its use in research or educational
settings.
Key Observations

The analysis of existing literature and tools for web vulnerability scanning reveals several important insights that
have influenced the design of the proposed system:

1. Limited Use of Artificial Intelligence in Existing Tools:


Most traditional scanners, such as AMNESIA, Nikto, and OWASP ZAP, rely on rule-based or signature-based
detection methods. While these are effective for identifying known vulnerabilities, they lack intelligent
capabilities to analyze novel threats or adapt to emerging attack patterns. None of the reviewed tools fully utilize
the potential of AI or large language models (LLMs) to enhance vulnerability interpretation or exploitability
ranking.

2. Lack of Exploitability Context and Risk Prioritization:


Existing systems tend to assign basic severity levels (e.g., high, medium, low) without detailed contextual
understanding of how easily a vulnerability can be exploited in the real world. This often leads to inefficient
prioritization and delayed remediation of the most critical issues. Tools like AMNESIA and ZAP, while effective in
detection, do not provide insights into actual risk impact or ranking based on exploitability.

3. Generic or Minimal Remediation Guidance:


Most tools provide limited or generic mitigation suggestions that are not customized to the specific application
context. Developers are often left with vague recommendations that require further interpretation, increasing their
workload. Only a few commercial tools like Acunetix offer detailed fixes, but they are often not tailored and
require paid access.

4. Narrow Scope of Vulnerability Coverage:


Some tools (e.g., AMNESIA, Sunitha & Sridevi) are focused on a specific class of attacks such as SQL injection,
leaving other vulnerabilities (like XSS, CSRF, and security misconfigurations) unaddressed. A comprehensive
solution should map to the entire OWASP Top 10 to ensure complete coverage of common web threats.

5. Insufficient Real-Time Reporting and Usability Features:


Many open-source tools lack user-friendly interfaces or real-time reporting capabilities. The absence of
exportable, well-formatted reports limits their usability in modern DevSecOps pipelines and enterprise settings.
This gap highlights the need for scanners that not only detect issues but also present results in an actionable and
accessible manner.

6. Need for Developer-Friendly and Privacy-Conscious Tools:


Current tools often require manual setup, expertise in cybersecurity, and compromise on privacy by storing user
data or scan histories. There is a growing demand for intelligent, easy-to-use, and privacy-aware tools that can be
used directly by developers, even with minimal security knowledge.

2.2 Comparative Analysis of Techniques/Methods


To evaluate the landscape of vulnerability scanning tools and CTI validation and sharing
platforms, it is important to compare them across dimensions such as detection method,
vulnerability coverage, AI integration, exploitability ranking, remediation guidance, and
open-source availability.
Tool/Approach                 | Detection Method                       | Vulnerability Types Covered | AI Integration | Exploitability Ranking | Remediation Guidance | Open Source
AMNESIA                       | Static + Runtime Analysis              | SQL Injection               | No             | Partial                | No                   | No
Sunitha & Sridevi Model       | Signature-Based Detection              | SQL Injection               | No             | No                     | No                   | No
Nikto                         | Signature & Config Checks              | General Server Issues       | No             | No                     | No                   | Yes
Acunetix                      | Dynamic Analysis                       | SQLi, XSS, CSRF, etc.       | Limited        | Basic                  | Basic                | No
OWASP ZAP                     | Proxy-Based Scanning                   | SQLi, XSS, etc.             | No             | No                     | Generic              | Yes
Proposed Gemini AI System     | AI + Static/Dynamic Scan               | OWASP Top 10                | Yes            | Yes                    | Tailored             | Yes
Google Chronicle              | Behavioral & rule-based                | Internal telemetry          | Implicit trust | High                   | No                   | Yes
Proposed Platform (This Work) | Federated ML + 3-factor trust scoring  | Honeypots, OSINT, users     | Dynamic trust  | High                   | Yes                  | Yes

Key Innovations of the Proposed Platform

 Integrates Google Gemini AI to rank vulnerabilities based on real-world exploitability
using contextual analysis.

 Provides tailored, AI-generated mitigation strategies specific to each detected
vulnerability.

 Maps findings to the OWASP Top 10 and delivers real-time, developer-friendly
reports for efficient remediation.

2.3 Research Gaps Identified

1. Lack of AI Integration in Existing Scanners:
Most current tools do not leverage advanced AI or LLMs for intelligent vulnerability analysis, prioritization, or
mitigation.
2. Absence of Exploitability-Based Ranking:
Traditional scanners fail to evaluate and rank vulnerabilities based on how easily they can be exploited in real-
world scenarios.
3. Limited Contextual Mitigation Support:
Existing tools offer generic remediation advice rather than application-specific, actionable guidance.
4. Inadequate Real-Time Reporting and Developer Usability:
Many tools lack real-time, user-friendly reporting features that can be directly used by developers during the
development lifecycle.
5. Narrow Coverage of Web Security Threats:
Several existing approaches focus on specific attack types, lacking comprehensive coverage of the OWASP Top 10.

2.4 Summary and Key Learnings

This study explores the development of an AI-powered website vulnerability scanner that
integrates traditional security scanning methods with Google Gemini AI. Through a detailed
review of existing tools and techniques, it is evident that while many scanners can detect
vulnerabilities, they lack intelligent prioritization, contextual mitigation, and real-time usability.

Key Learnings:

 Understanding Web Vulnerabilities: Learning about common security flaws in
web applications, such as SQL injection, Cross-Site Scripting (XSS), Cross-Site Request
Forgery (CSRF), and Remote File Inclusion (RFI), and how they can be exploited by
attackers.

 Threat Detection: Developing skills in detecting these vulnerabilities by scanning
websites and web applications for weaknesses, and learning how different
vulnerabilities manifest in code and application logic.

 OWASP Top 10: Becoming familiar with the most common security risks outlined by
the Open Web Application Security Project (OWASP) and understanding their impact
on web security.
Chapter 3: Project Proposal

3.1 Overview of the Proposed System

The Web Vulnerability Scanner project aims to develop an automated tool designed to identify security
vulnerabilities in web applications. In the current digital age, web applications are vulnerable to a variety
of attacks, and ensuring that they are secure is critical to safeguarding user data and maintaining trust.
This tool will help developers and security professionals perform automated scans on web applications,
detect potential vulnerabilities, and provide remediation advice.

The system will focus on detecting common vulnerabilities such as SQL injection, Cross-Site Scripting (XSS),
Cross-Site Request Forgery (CSRF), and others, as outlined by the OWASP Top 10. It will provide an
easy-to-use interface for users to scan their websites and receive detailed reports highlighting detected
security risks.

Key features include:

1. Automated Vulnerability Scanning


 Automatic Detection: The system can automatically scan web applications for common security
vulnerabilities such as SQL injection, Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), Remote
File Inclusion (RFI), and others.
 Continuous Scanning: Users can schedule periodic scans for continuous vulnerability assessment, ensuring
that web applications remain secure over time.
2. Detailed Vulnerability Reporting
 Comprehensive Reports: After each scan, the system generates detailed reports that list the vulnerabilities
found, along with descriptions of each vulnerability.
 Severity Levels: Each vulnerability is categorized into severity levels (e.g., High, Medium, Low) based on its
potential impact on the system.
 Remediation Suggestions: The reports include actionable steps to fix or mitigate each vulnerability,
providing developers with clear guidance on how to address the issue.
3. User-Friendly Interface
 Intuitive Dashboard: The system will feature an easy-to-navigate interface where users can input URLs, configure
scan settings, and view scan results.
 Real-Time Results: Users can view scan results in real time, making it easy to monitor the progress of scans
and identify potential issues immediately.
 Scan Configuration Options: Users can customize scan settings, such as depth of scan, target URL, and specific
vulnerabilities to look for.
4. Integration with Security Tools
 OWASP ZAP Integration: Integration with OWASP ZAP (Zed Attack Proxy) to enhance vulnerability detection
capabilities, especially for advanced security issues.
 SQLmap Integration: Automated detection of SQL injection vulnerabilities using the SQLmap tool, allowing
for more robust security testing.
3.2 Detailed Problem Statement

Web applications are often targets for cyberattacks, and manual testing for security vulnerabilities is time-
consuming, error-prone, and can miss critical vulnerabilities.
Traditional security testing often fails to keep up with the rapidly evolving threats in the web landscape. Many
organizations lack the resources or expertise to perform
the comprehensive security assessments on their web applications, leading to undetected vulnerabilities that can be
exploited.
The problem lies in the lack of automated solutions that continuously and efficiently scan web applications for
known vulnerabilities. Developers may also have
limited knowledge of security practices, and many existing tools are either too complex or fail to identify critical
vulnerabilities. Therefore, there is a need for a
web vulnerability scanner that is both easy to use and effective in detecting security issues while providing clear
remediation steps.
3.3 Project Objectives

The main objectives of the Web Vulnerability Scanner project are:

Develop an Automated Vulnerability Scanner:

 Create an automated tool that scans web applications for common security vulnerabilities.
 Ensure the scanner is able to detect OWASP Top 10 vulnerabilities, such as SQL injection, XSS, CSRF,
and Remote File Inclusion (RFI).

Provide Detailed Vulnerability Reports:

 Generate detailed reports outlining detected vulnerabilities, their severity, and potential impact.
 Suggest actionable remediation steps to address the vulnerabilities.

Ensure Ease of Use and Integration:

 Design a user-friendly interface for both security professionals and developers.


 Provide features such as a simple user input for the target web application URL and the ability to
configure scan parameters.

Support Continuous Security Testing:

 Implement automated scheduling of scans and periodic vulnerability assessments to ensure that
web applications remain secure over time.

Maintain Security Standards:

 Follow ethical guidelines for web vulnerability scanning to ensure that the tool is used only with explicit
authorization and permission from website owners.

3.4 Proposed Methodology

The proposed methodology for the development of the Web Vulnerability Scanner will be structured as follows:
1. Requirements Gathering:
 Gather detailed requirements from stakeholders, including security professionals, developers, and users,
to understand the needs and challenges associated with web vulnerability scanning.
2. System Design:
 Design the architecture of the system, including the front-end user interface and back-end scanning engine.
3. Development:
 Develop the scanning engine, which will send requests to the web application and analyze the responses
for vulnerabilities.
4. Testing and Debugging:
 Perform unit testing, integration testing, and system testing to ensure the scanner works as intended.
5. Deployment and Documentation:
 Deploy the tool for use and provide comprehensive user documentation, including setup instructions
and troubleshooting tips.
3.5 Tools and Technologies to be Used

The following tools and technologies will be used to develop the Web Vulnerability Scanner:
Programming Languages:
 Python: Python will be used for the development of the back-end scanning engine due to its simplicity and its
ability to work with libraries like requests, BeautifulSoup, and Scrapy for web scraping and analysis.
 JavaScript: For implementing a dynamic front-end interface for interacting with the tool and displaying results.
Frameworks:
 Flask: Flask will be used for building the web application framework to serve the vulnerability scanner as a
web-based tool.
 Bootstrap: To design a responsive and user-friendly web interface.
Libraries and APIs:
 OWASP ZAP: Integrating with the OWASP Zed Attack Proxy (ZAP) for additional scanning capabilities.
 BeautifulSoup: For parsing HTML content and detecting XSS vulnerabilities.
 SQLmap: Integration of this tool for automated SQL injection testing.
Database:
 SQLite: For storing scan results and historical data about previous scans.
Version Control and Collaboration:
 Git: For source code management and version control.
Deployment:
 Heroku or AWS: For deploying the web application and making it accessible to users.
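The crawling side of this stack can be sketched with the standard library alone (the BeautifulSoup library listed above would make this shorter); the class and function names below are placeholders, not the project's actual code:

```python
from html.parser import HTMLParser

class FormExtractor(HTMLParser):
    """Collect each <form>'s action/method and the names of its inputs.

    A minimal stand-in for the planned crawler module: the forms and
    input fields it finds are the injection points a scanner would probe.
    """
    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action", ""),
                "method": attrs.get("method", "get").lower(),
                "inputs": [],
            })
        elif tag == "input" and self.forms:
            name = attrs.get("name")
            if name:
                self.forms[-1]["inputs"].append(name)

def extract_forms(html: str):
    """Parse an HTML document and return the forms it contains."""
    parser = FormExtractor()
    parser.feed(html)
    return parser.forms
```

In the full system, the HTML would come from `requests.get(url).text`, and each discovered form would be fed to the payload-injection stage.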
3.6 Timeline / Gantt Chart

Phase Duration Month(s)


Phase 1: Requirements Gathering 2 weeks Week 1 – Week 2

Phase 2: System Design 2 weeks Week 3 – Week 4

Phase 3: Development 2 weeks Week 5 – Week 6

Phase 4: Testing and Debugging 3 weeks Week 7 – Week 9

Phase 5: Deployment 1 week Week 10

Phase 6: Documentation 1 week Week 11

Phase 7: Feedback and Improvement 1 week Week 12


Chapter 4. Design Initializations

4.1 System Architecture

To implement the AI-powered website vulnerability scanner, the system is structured into four key architectural
layers:

Input Layer
The input layer accepts user-provided URLs for vulnerability assessment. It acts as the system’s entry point.

Processing Layer
This is the core intelligence layer where scanning, AI inference, and threat analysis take place. It consists of:
 Crawler Module: Navigates through submitted URLs to discover linked pages and input fields.
 Vulnerability Detection Engine: Applies known test payloads (SQLi, XSS, etc.) to detect potential flaws.
 Gemini AI Integration: Performs real-time analysis of identified vulnerabilities and assigns exploitability
scores using contextual awareness and historical threat data.
 Ranking Module: Sorts vulnerabilities based on OWASP Top 10 categories and severity levels.

Storage Layer
Although the system avoids storing user data for privacy compliance, temporary data handling is necessary for:
Holding scan results during active sessions.

Output Layer
This layer delivers organized, actionable results back to the user via the frontend:
 Visual display of discovered vulnerabilities with AI-generated risk levels.
 Export options for downloading reports in PDF or CSV formats.
 Real-time feedback interface for suggested remediations.
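The Vulnerability Detection Engine's payload-reflection check can be sketched as below; the probe string and the pluggable `fetch` callable are illustrative assumptions (a real scanner would wrap `requests.get` and try many encoded payload variants):

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl

# Hypothetical marker payload; a real scanner rotates many encodings.
XSS_PROBE = "<script>alert('vuln-probe')</script>"

def inject_param(url: str, param: str, payload: str) -> str:
    """Rebuild the URL with one query parameter replaced by the payload."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query[param] = payload
    return urlunsplit(parts._replace(query=urlencode(query)))

def reflects_payload(fetch, url: str, param: str) -> bool:
    """Heuristic reflected-XSS check: does the probe come back unescaped?

    `fetch` is any callable mapping a URL to an HTML body (e.g. a thin
    wrapper around requests.get), injected here so the check is testable
    without network access.
    """
    body = fetch(inject_param(url, param, XSS_PROBE))
    return XSS_PROBE in body
```

A page that echoes the parameter verbatim is flagged; one that HTML-escapes it is not. The flagged finding would then be passed to the Gemini AI layer for exploitability scoring.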

Algorithm 1: Threat Classification and Logging


Input:
 desc: IOC Description
 hp_match: Honeypot Match
 osint_match: OSINT Match
Output:
 Classification Result (True Threat or False Positive)
 Confidence Score
1: reporter_trust ← estimateTrust(desc)

2: vector ← [reporter_trust, hp_match, osint_match]

3: (is_threat, confidence) ← FederatedML.predict(vector)

This layered and modular design ensures scalability, interpretability, and real-time decision-making, while
maintaining user privacy and data integrity throughout
the scanning and classification process.
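Algorithm 1 can be sketched in Python as follows; the trust weights and the stand-in for `FederatedML.predict` are illustrative assumptions only, since the actual platform uses a federated Random Forest:

```python
def estimate_trust(desc: str) -> float:
    """Toy reporter-trust estimate: longer, more specific descriptions
    score higher (an assumption, not the platform's real heuristic)."""
    return min(len(desc) / 100.0, 1.0)

def predict(vector):
    """Stand-in for FederatedML.predict: a weighted vote over the three
    factors. Weights (trust, honeypot, OSINT) are assumed for illustration."""
    weights = (0.2, 0.4, 0.4)
    confidence = sum(w * v for w, v in zip(weights, vector))
    return confidence >= 0.5, confidence

def classify_ioc(desc: str, hp_match: float, osint_match: float):
    reporter_trust = estimate_trust(desc)             # step 1
    vector = [reporter_trust, hp_match, osint_match]  # step 2
    return predict(vector)                            # step 3
```

An IOC corroborated by both the honeypot network and OSINT feeds clears the threshold even with a mediocre reporter-trust score, which mirrors the three-factor intent of the algorithm.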

Figure 4.1: Architecture Diagram


4.2 Module Descriptions

1. Threat Collection Engine


• Collects IOCs from honeypots, user reports, and OSINT feeds
• Outputs raw data to feature extractor
2. Feature Extraction & Scoring Engine
• Parses IOCs to compute reputation score, honeypot match, and OSINT confidence
3. AI Threat Classifier
• Uses federated Random Forest to predict threat confidence
• Outputs label (Validated/False Positive) and confidence score
4. Integrity Module
• Hashes data using SHA-256 and appends to Merkle tree
• Stores timestamp, source ID, decision, and hash in database
5. Admin Dashboard
• Displays stats, supports queries by confidence, type, date
• Enables threat data download/export

4.3 Standards and Compliance

The system aligns with the following security and interoperability standards:

Standard                      | Purpose
STIX/TAXII 2.1                | CTI data formatting and exchange
MITRE ATT&CK                  | Mapping tactics, techniques, and procedures (TTPs)
NIST Cybersecurity Framework  | System reliability, documentation, and auditability
GPG (OpenPGP Standard)        | Digital signatures for user submissions
SHA-256 (FIPS PUB 180-4)      | Cryptographic hashing
OWASP Guidelines              | Securing web and API endpoints
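For reference, a minimal STIX 2.1 indicator for a file-hash IOC might be assembled as below; the field names follow the STIX 2.1 specification, the timestamps are fixed placeholders, and a production system would normally use the official `stix2` library rather than hand-rolled JSON:

```python
import json
import uuid

def make_stix_indicator(sha256_hash: str, description: str) -> str:
    """Build a minimal STIX 2.1 indicator object for a SHA-256 file hash."""
    indicator = {
        "type": "indicator",
        "spec_version": "2.1",
        "id": "indicator--" + str(uuid.uuid4()),
        "created": "2024-01-01T00:00:00.000Z",   # placeholder timestamp
        "modified": "2024-01-01T00:00:00.000Z",  # placeholder timestamp
        "description": description,
        "pattern": f"[file:hashes.'SHA-256' = '{sha256_hash}']",
        "pattern_type": "stix",
        "valid_from": "2024-01-01T00:00:00.000Z",
    }
    return json.dumps(indicator)
```

Emitting IOCs in this shape is what makes the platform consumable by TAXII-capable SIEMs.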
4.4 Design Constraints

1. Federated Learning Overhead


Training across devices/organizations adds latency and needs strong synchronization.
2. Dataset Availability
Limited access to quality real-time threat data may affect model generalization.
3. Resource Limitations
Hashing and Merkle tree updates can be CPU-intensive at scale.
4. Legacy System Integration
Older SIEMs may struggle with STIX/TAXII compatibility.

4.5 Risk Assessment

Risk                                 | Impact   | Mitigation Strategy
Inaccurate Model Predictions         | High     | Use cross-validation and continuously update training data.
Data Tampering                       | Critical | Cryptographic logging and integrity checks on all entries.
Performance Bottlenecks              | Medium   | Optimize database queries and use async APIs.
Privacy Violations                   | High     | Implement federated learning to avoid raw data sharing.
Model Poisoning by Malicious Clients | High     | Use differential privacy and source trust weighting.
Chapter 5. Expected Outcomes (Experimental Investigation)

5.1 Deliverables

The successful development and deployment of the AI-Driven Cyber Threat Intelligence
Sharing Platform will result in a suite of concrete deliverables. These deliverables reflect
not only the project's functional capabilities but also its architectural, analytical, and
security assurances.
Deliverables

1. AI-Powered Threat Intelligence Platform


• Web-based platform for ingesting, validating, scoring, and sharing IOCs
• Accepts data from honeypots, OSINT, and analysts
• Uses federated Random Forest; shows results via admin dashboard
2. Three-Factor Trust Scoring Engine
• Scores IOCs by trust history, honeypot match, OSINT correlation
• Normalizes into trust vector; adapts weights via feedback
3. Tamper-Proof Cryptographic Logging
• SHA-256 hashes and Merkle tree chaining
• Logs stored in SQLite3 with real-time auditing
4. Admin Dashboard
• Real-time UI with charts and filters
• Supports STIX/TAXII 2.1 export
5. GPG Integration for Submissions
• Verifies and secures user-submitted IOCs using GPG
6. Documentation & Reports
• Includes module/API docs, results, datasets, and final report
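The hashing and Merkle-tree chaining behind deliverable 3 can be illustrated with `hashlib`; this is a textbook construction shown only to convey the tamper-evidence idea, not necessarily the platform's exact tree layout:

```python
import hashlib
import json

def sha256(data: bytes) -> str:
    """Hex SHA-256 digest, the primitive used for all log hashing."""
    return hashlib.sha256(data).hexdigest()

def log_entry(source_id: str, decision: str, ts: float) -> str:
    """Hash one log record exactly as stored (timestamp, source, decision)."""
    record = json.dumps({"ts": ts, "src": source_id, "dec": decision},
                        sort_keys=True)
    return sha256(record.encode())

def merkle_root(leaf_hashes):
    """Compute a Merkle root by pairwise hashing; an odd leaf is duplicated."""
    if not leaf_hashes:
        return sha256(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]
```

Altering any logged record changes its leaf hash and therefore the root, so an auditor can detect tampering by recomputing the root over the stored entries.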

5.2 Success Criteria


A. Accuracy & False Positives
• Accuracy ≥ 90%, False Positive Rate ≤ 10%
• Industry average is 25–35%; ≤10% shows strong performance
B. Validation Latency
• IOC validation latency ≤ 3 seconds
• Supports near real-time detection using federated learning and cryptography

C. Trust Score Efficacy


• Correct source discrimination rate ≥ 85%
• Ensures reliable vs. spoofed source distinction

D. Cryptographic Integrity
• Tamper detection: 100%
• Detects all alterations using hash checks and Merkle tree

E. Platform Scalability
• Handles 50 concurrent IOC submissions
• Maintains performance under load

F. Interoperability
• Fully compliant with STIX/TAXII 2.1
• Ensures smooth integration with existing CTI/SIEM tools

Experimental Setup Dataset


• 1,000+ IOCs curated from:
o Custom honeypot networks (simulated SSH brute force, port scans, malware payloads)
o Public OSINT threat feeds (AlienVault OTX, ThreatMiner)
o Simulated crowdsourced reports with signed GPG metadata
Model Configuration
• Model Type: Random Forest (Federated)
• Local Model Nodes: Simulated environments for contributors (3 simulated orgs)
• Features: Source reputation, honeypot score, OSINT overlap
• Training Split: 80% train, 20% test (cross-validation)
Testing Tools
• Flask + JavaScript dashboard for live submission
• PostgreSQL/SQLite for internal storage
• hashlib for hashing, custom Merkle tree library for chaining
• GnuPG for GPG signature verification
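The tamper-evident logging layer (SHA-256 hashing plus Merkle chaining) can be sketched with the standard hashlib module. This is a minimal illustration of the idea, not the platform's custom Merkle library:

```python
import hashlib

def sha256(data: bytes) -> str:
    """Hex digest of a SHA-256 hash."""
    return hashlib.sha256(data).hexdigest()

def merkle_root(entries):
    """Compute a Merkle root over a list of log entries (bytes).

    Any change to any entry changes the root, which is how the audit
    layer detects tampering with stored IOC logs.
    """
    if not entries:
        return sha256(b"")
    level = [sha256(e) for e in entries]
    while len(level) > 1:
        if len(level) % 2:               # duplicate last node on odd levels
            level.append(level[-1])
        level = [sha256((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]
```

Auditing then reduces to recomputing the root over the stored entries and comparing it against the previously recorded value.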
Chapter 6. Result and Analysis

To evaluate the performance and reliability of the proposed AI-driven Cyber Threat Intelligence Sharing Platform, extensive experiments were conducted across multiple dimensions, including classification accuracy, false positive rate, validation latency, and system scalability. The system was tested using a curated dataset of 1,000 synthetic and real-world IOCs generated from stratified honeypots, public OSINT feeds, and simulated user submissions. The goal was to assess the ability of the federated machine learning model to accurately distinguish between true threats and false positives in real time.

The proposed system leverages a federated Random Forest model, achieving a significant improvement in classification performance. Compared to traditional rule-based filtering, which achieved only 66% accuracy, the AI-driven approach yielded 92.7% accuracy and reduced false positives from 34% to just 7.3%. This was made possible through dynamic trust scoring based on reporter reliability, honeypot pattern matching, and OSINT corroboration.

System latency was measured from IOC submission to validation and logging. The baseline system required ~6.5 seconds on average due to centralized processing and static rule matching. In contrast, the proposed platform maintained a median validation time of ~2.7 seconds thanks to lightweight federated models and optimized SHA-256 hashing. The additional cryptographic integrity layer (Merkle chaining) added only ~12 milliseconds per entry.

To test system scalability, we simulated concurrent IOC submissions. The baseline platform handled up to 10 parallel events before degradation. The proposed system successfully processed 50 simultaneous submissions while maintaining sub-5-second response times. The admin dashboard also remained responsive, displaying threat counts, timestamps, and confidence metrics without delay.
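The scalability experiment can be approximated with a stdlib-only load sketch; `validate_ioc` below is a placeholder stand-in for the real validation pipeline, not the platform's code:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def validate_ioc(ioc: str) -> float:
    """Stand-in for the validation pipeline; returns per-IOC latency.

    The sleep is a placeholder for model inference plus hashing work.
    """
    start = time.perf_counter()
    time.sleep(0.01)
    return time.perf_counter() - start

def run_load_test(n_submissions=50, n_workers=50):
    """Submit n_submissions IOCs concurrently and collect latencies."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(validate_ioc,
                             (f"ioc-{i}" for i in range(n_submissions))))
```

In the real experiment, each submission would hit the Flask endpoint instead; the same harness then checks that all 50 latencies stay under the 5-second criterion.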
Metric | Baseline System | Proposed CTI Platform
Validation Model | Static rule-based filtering | Federated Random Forest with three-factor trust scoring
Average Accuracy | 66.0% | 92.7%
False Positive Rate | 34.0% | 7.3%
Classification Latency | ~6.5 seconds per IOC | ~2.7 seconds per IOC
Scalability (Concurrent IOCs) | Handles up to 10 concurrent submissions | Scales up to 50 concurrent submissions
Trust Scoring Factors | None | Reporter history, honeypot match, OSINT corroboration
Tamper-Proof Storage | Not supported | SHA-256 hashing with Merkle tree chaining
Data Sources | Only public feeds | Stratified honeypots, OSINT crawlers, GPG-signed reports
MITRE ATT&CK Mapping | Manual tagging | Automated based on honeypot behavior analysis
Admin Visibility | Limited logs | Dashboard with logs, confidence, timestamps, techniques
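The federated aggregation idea behind the proposed model can be illustrated without any ML dependencies: each simulated organization contributes a locally trained classifier, and the platform combines their votes. The threshold-based "models" below are toy stand-ins for real Random Forests, shown only to make the voting scheme concrete:

```python
# Toy sketch of federated aggregation by majority vote across
# simulated organizations. Thresholds are illustrative stand-ins
# for locally trained Random Forest models.

def make_local_model(threshold):
    """A toy local classifier: flags an IOC whose trust score
    exceeds the org-specific threshold learned from local data."""
    return lambda score: score >= threshold

def federated_predict(models, score):
    """Global decision: a strict majority of local models must agree."""
    votes = sum(1 for m in models if m(score))
    return votes > len(models) / 2

# Three simulated organizations, mirroring the experimental setup.
orgs = [make_local_model(t) for t in (0.5, 0.6, 0.7)]
```

Because only model outputs (votes) cross organizational boundaries, no raw IOC data needs to be shared, which is the privacy property federated learning provides here.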
Fig 6.1: User interface for submitting potential threat indicators (IOCs) to the Cyber Threat
Intelligence platform.

Fig 6.2: Admin panel showing submitted threat hashes, their descriptions, and timestamps in reverse
chronological order.
Fig 6.3: Dashboard displaying total validated threats, latest IOC hash, and most recent threat
description.

Fig 6.4: Positive classification result with mapped MITRE ATT&CK technique and 100% model
confidence.
Fig 6.5: Negative classification output showing the IOC was not validated, with a low model
confidence of 20%.

Fig 6.6: Performance Comparison: Baseline vs Proposed CTI Platform.

Fig 6.7: Threat Classification Confidence Distribution among 100 IOCs.


Chapter 7 : Conclusion

The increasing sophistication of cyber threats demands a paradigm shift towards accurate,
timely, and trustworthy Cyber Threat Intelligence (CTI). Current CTI sharing faces challenges
like high false positives, lack of trust, limited validation, and data tampering risks.

This project addresses these issues by presenting an AI-driven CTI Sharing Platform with key
innovations:

• A federated machine learning (ML) model for privacy-preserving threat classification.

• A three-factor trust scoring mechanism based on source credibility, honeypot correlation, and OSINT matching.

• A cryptographic logging system using SHA-256 and Merkle trees for tamper-proof record keeping.

• An intuitive admin dashboard for real-time monitoring and validated CTI export.

Experimental evaluation demonstrated:

• 92.7% overall classification accuracy, outperforming traditional platforms.

• 7.3% false positive rate, reducing alert fatigue.

• 2.7-second validation latency, enabling real-time response.

• 100% detection of tampered entries, validating data integrity.

Federated learning enabled privacy-compliant collaborative model training. The trust score
mechanism effectively contextualized source credibility, reducing false alerts. The tamper-proof
log infrastructure enhanced data integrity and forensic traceability. The extensible dashboard
provided a user-friendly interface for analysts.
Overall, this project introduces a novel, modular, and scalable solution that enhances the
technical, human, and operational aspects of modern CTI sharing.

Chapter 8 : Future Work

Future work will focus on improving the model's accuracy through continuous learning driven by real-time feedback loops from participating organizations. The platform will be extended to multi-tenant architectures with differential trust profiles, and API-based integration with SIEM and SOAR tools will enable automated response. Advanced threat correlation, based on NLP-driven intent recognition and clustering, will also be explored to detect campaign-level activity. Finally, migration away from SQLite, toward either a distributed NoSQL store or a blockchain-backed store, will be investigated to achieve greater scalability and auditability in enterprise environments.