AI-Powered Website Vulnerability Scanner with Exploitability Ranking and Mitigations using Gemini AI
A Term Paper Report submitted in partial fulfillment of the requirements for the
award of the degree of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
by
Bommareddy Keerthana Reddy (2200030485)
Kondamadugula Hemanth Reddy (2200030094)
Pathuri Manohar (2200030503)
Maram Bhuvanesh (2200030677)
KONERU LAKSHMAIAH EDUCATION FOUNDATION
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Declaration
The Term Paper Report entitled "AI-Powered Website Vulnerability Scanner with
Exploitability Ranking and Mitigations using Gemini AI" is a record of bona fide work of
Bommareddy Keerthana Reddy (2200030485), Kondamadugula Hemanth Reddy (2200030094),
Pathuri Manohar (2200030503), and Maram Bhuvanesh (2200030677), submitted in partial
fulfillment for the award of Bachelor of Technology in Computer Science and Engineering to
K L Deemed to be University during the academic year 2024-25.
We further declare that this report is our own work and has not been submitted to any
other university for the award of any degree.
KONERU LAKSHMAIAH EDUCATION FOUNDATION
Certificate
This is to certify that the Term Paper Report entitled “AI-Powered Website Vulnerability
Scanner with Exploitability Ranking and Mitigations using Gemini AI” is being submitted by
Bommareddy Keerthana Reddy (2200030485), Kondamadugula Hemanth Reddy (2200030094),
Pathuri Manohar (2200030503), and Maram Bhuvanesh (2200030677) in partial fulfillment for
the award of Bachelor of Technology in Computer Science and Engineering to K L Deemed to be
University during the academic year 2024-25.
ACKNOWLEDGEMENTS
It gives us great pleasure to express our gratitude to our Hon’ble Chancellor Sri. Koneru
Satyanarayana for providing the opportunity, platform, and facilities to complete the term
paper course successfully.
We express our sincere gratitude to our Principal, Dr. T. K. RamaKrishna Rao, for his
administration and support of our academic growth.
We deeply thank Prof. V. Hari Kiran, Dean (Addl.) Academics, and Dr. A. Senthil, HOD,
CSE Department, for providing us with dedicated faculty and the facilities to turn our ideas
into reality.
We express our sincere thanks to our project supervisor, Janjhyam Venkata Naga Ramesh,
whose ideas, encouragement, appreciation, and intellectual zeal motivated us to complete this
term paper successfully.
Finally, we gratefully acknowledge all those who contributed, directly or indirectly, to the
success of this term paper.
ABSTRACT:
This paper presents the design and implementation of an AI-powered website
vulnerability scanner that utilizes Google Gemini AI to enhance the detection, ranking, and
remediation of web application vulnerabilities. The proposed system integrates conventional
security scanning techniques—such as static and dynamic analysis—with the advanced
reasoning capabilities of a large language model (LLM) to assess the exploitability of identified
vulnerabilities. It maps the results to the OWASP Top 10 framework, ensuring alignment with
industry standards. By employing AI-driven contextual analysis, the scanner not only prioritizes
security issues based on their severity and potential impact but also provides developers with
clear, tailored mitigation strategies. Furthermore, the system generates comprehensive, real-time
reports to support rapid response and informed decision-making. The goal is to empower
developers and security teams with intelligent tools that streamline the vulnerability management
process, reduce manual effort, and significantly enhance overall application security.
Keywords— Web Application Security, Vulnerability Scanner, Google Gemini AI, Large
Language Model (LLM), OWASP Top 10, Exploitability Ranking, Static and Dynamic
Analysis, Real-time Reporting, Mitigation Strategies, Application Vulnerability Detection.
1.2 Motivation
The motivation behind this project is to address the limitations of existing vulnerability scanners that focus
only on detection without offering deeper insights into the
severity or risk prioritization of vulnerabilities. Developers often struggle with understanding which threats to
resolve first and how to mitigate them effectively.
By leveraging Gemini AI, we aim to introduce an intelligent system that not only detects vulnerabilities but
also evaluates their exploitability and suggests actionable
remediation steps. This AI-powered approach enhances decision-making, reduces manual effort, and
improves overall application security posture.
1.3 Problem Statement
Existing web vulnerability scanners lack advanced exploitability assessment and remediation
support. While they detect common vulnerabilities, they do not prioritize them based on
potential impact, nor do they provide detailed mitigation strategies. This results in inefficient
vulnerability management, increased exposure to cyberattacks, and longer remediation cycles.
There is a clear need for an intelligent, automated system that bridges this gap by combining
traditional scanning techniques with AI-driven analysis.
To map vulnerabilities to the OWASP Top 10 framework for standardized risk categorization.
To generate real-time, detailed reports including mitigation strategies for each vulnerability.
To provide an intuitive user interface for developers to interact with scan results and
download reports in preferred formats.
Chapter 2: Literature Review
2.1 Review of Existing Work
Over the years, researchers have developed various approaches to identify and mitigate web
application vulnerabilities. Some of the most influential works are outlined below:
1. AMNESIA
Developed by Halfond and Orso, AMNESIA is a tool specifically designed to detect and prevent
SQL injection attacks. It combines static code analysis to identify SQL query generation points
and integrates runtime monitoring to ensure that the executed queries match expected safe
patterns. This hybrid approach offers accurate detection and real-time prevention. However,
AMNESIA is limited to SQLi vulnerabilities and lacks the capability to assess other types of
web-based threats or prioritize them by severity.
2. Pattern Matching-Based SQLi Detection
Sunitha and Sridevi introduced a method that leverages pattern matching techniques to detect SQL
injection attacks. Their approach builds a database of known attack signatures and analyzes code
for matches. While this method is fast and easy to implement, it struggles with detecting new or
obfuscated attacks (zero-day vulnerabilities). Moreover, it does not provide contextual
information or exploitability ranking.
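A pattern-matching detector of this kind can be sketched in a few lines of Python. The signatures below are illustrative examples of classic SQL injection payloads, not the authors' actual signature database:

```python
import re

# Illustrative signatures for classic SQL injection payloads;
# a real deployment would maintain a much larger, curated database.
SQLI_SIGNATURES = [
    re.compile(r"(?i)\bunion\b.+\bselect\b"),        # UNION-based injection
    re.compile(r"(?i)'\s*or\s+'?\d+'?\s*=\s*'?\d"),  # tautology: ' OR '1'='1
    re.compile(r"(?i);\s*drop\s+table\b"),           # stacked-query DROP TABLE
]

def looks_like_sqli(value: str) -> bool:
    """Return True if the input matches any known attack signature."""
    return any(sig.search(value) for sig in SQLI_SIGNATURES)
```

For example, `looks_like_sqli("' OR '1'='1")` matches the tautology signature, while a benign value such as `"alice"` does not; an obfuscated or novel payload outside the signature list would also slip through, which is exactly the zero-day weakness noted above.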
3. OWASP ZAP
OWASP ZAP is an open-source penetration testing tool maintained by the OWASP community. It
performs automated scanning, passive scanning, and even manual testing. ZAP identifies a wide
range of issues including SQLi, XSS, and more. While ZAP is comprehensive, it lacks an AI
component for prioritizing threats or generating intelligent mitigation strategies.
4. Burp Suite
Burp Suite is a popular commercial tool for web vulnerability scanning, offering both automated
and manual testing. It provides detailed reports and payload injection features. However, the
tool is more suitable for experienced penetration testers and lacks AI-driven risk evaluation or
mitigation suggestions tailored to the application context.
5. Acunetix
Acunetix is a fully automated web vulnerability scanner designed to detect a broad range of security
flaws. It integrates with CI/CD pipelines and provides risk scores. However, its proprietary
nature and lack of customizable AI features limit its use in research or educational
settings.
Key Observations
The analysis of existing literature and tools for web vulnerability scanning reveals several important insights that
have influenced the design of the proposed system:
Tool | Technique | Vulnerabilities Covered | Exploitability Ranking | AI-Driven Analysis | Mitigation Guidance | Real-Time Reporting
Acunetix | Dynamic analysis | SQLi, XSS, CSRF, etc. | Limited | Basic | Basic | No
OWASP ZAP | Proxy-based scanning | SQLi, XSS, etc. | No | No | Generic | Yes
Proposed Gemini AI System | AI + static/dynamic scan | OWASP Top 10 | Yes | Yes | Tailored | Yes
Platform | Approach | Trust Model | Data Sources | Accuracy | Tamper-Proof Logging | Real-Time Visibility
Google Chronicle | Behavioral & rule-based | Implicit trust | Internal telemetry | High | No | Yes
Proposed Platform (This Work) | Federated ML + 3-factor trust scoring | Dynamic | Honeypots, OSINT, users | High | Yes | Yes
This study explores the development of an AI-powered website vulnerability scanner that
integrates traditional security scanning methods with Google Gemini AI. Through a detailed
review of existing tools and techniques, it is evident that while many scanners can detect
vulnerabilities, they lack intelligent prioritization, contextual mitigation, and real-time usability.
Key Learnings:
Understanding Web Vulnerabilities: Gaining knowledge of common security flaws in
web applications, such as SQL injection, Cross-Site Scripting (XSS), Cross-Site Request
Forgery (CSRF), and Remote File Inclusion (RFI), and of how attackers can exploit
them.
OWASP Top 10: Becoming familiar with the most common security risks outlined by
the Open Web Application Security Project (OWASP) and understanding their impact
on web security.
Chapter 3: Project Proposal
The Web Vulnerability Scanner project aims to develop an automated tool designed to identify security
vulnerabilities in web applications. In the current digital age,
web applications are vulnerable to a variety of attacks, and ensuring that they are secure is critical to
safeguarding user data and maintaining trust. This tool will
help developers and security professionals perform automated scans on web applications, detect potential
vulnerabilities, and provide remediation advice.
The system will focus on detecting common vulnerabilities such as SQL injection, Cross-Site Scripting (XSS),
Cross-Site Request Forgery (CSRF), and others, as outlined
by the OWASP Top 10. It will provide an easy-to-use interface for users to scan their websites and receive
detailed reports highlighting detected security risks.
Web applications are often targets for cyberattacks, and manual testing for security vulnerabilities is time-
consuming, error-prone, and can miss critical vulnerabilities.
Traditional security testing often fails to keep up with the rapidly evolving threats in the web landscape. Many
organizations lack the resources or expertise to perform
comprehensive security assessments on their web applications, leading to undetected vulnerabilities that can be
exploited.
The problem lies in the lack of automated solutions that continuously and efficiently scan web applications for
known vulnerabilities. Developers may also have
limited knowledge of security practices, and many existing tools are either too complex or fail to identify critical
vulnerabilities. Therefore, there is a need for a
web vulnerability scanner that is both easy to use and effective in detecting security issues while providing clear
remediation steps.
3.3 Project Objectives
Create an automated tool that scans web applications for common security vulnerabilities.
Ensure the scanner is able to detect OWASP Top 10 vulnerabilities, such as SQL injection, XSS, CSRF,
and Remote File Inclusion (RFI).
Generate detailed reports outlining detected vulnerabilities, their severity, and potential impact.
Suggest actionable remediation steps to address the vulnerabilities.
Implement automated scheduling of scans and periodic vulnerability assessments to ensure that
web applications remain secure over time.
Follow ethical guidelines for web vulnerability scanning to ensure that the tool is used only with explicit
authorization and permission from website owners.
The proposed methodology for the development of the Web Vulnerability Scanner will be structured as follows:
1. Requirements Gathering:
Gather detailed requirements from stakeholders, including security professionals, developers, and users,
to understand the needs and challenges associated with web vulnerability scanning.
2. System Design:
Design the architecture of the system, including the front-end user interface and back-end scanning engine.
3. Development:
Develop the scanning engine, which will send requests to the web application and analyze the responses
for vulnerabilities.
4. Testing and Debugging:
Perform unit testing, integration testing, and system testing to ensure the scanner works as intended.
5. Deployment and Documentation:
Deploy the tool for use and provide comprehensive user documentation, including setup instructions
and troubleshooting tips.
3.5 Tools and Technologies to be Used
The following tools and technologies will be used to develop the Web Vulnerability Scanner:
Programming Languages:
Python: Python will be used for the development of the back-end scanning engine due to its simplicity and ability
to work with libraries like requests,
BeautifulSoup, and Scrapy for web scraping and analysis.
JavaScript: For implementing a dynamic front-end interface for interacting with the tool and displaying results.
Frameworks:
Flask: Flask will be used for building the web application framework to serve the vulnerability scanner as a web-
based tool.
Bootstrap: To design a responsive and user-friendly web interface.
Libraries and APIs:
OWASP ZAP: Integrating with the OWASP Zed Attack Proxy (ZAP) for additional scanning capabilities.
BeautifulSoup: For parsing HTML content and detecting XSS vulnerabilities.
SQLmap: Integration of this tool for automated SQL injection testing.
Database:
SQLite: For storing scan results and historical data about previous scans.
Version Control and Collaboration:
Git: For source code management and version control.
Deployment:
Heroku or AWS: For deploying the web application and making it accessible to users.
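As an illustrative sketch of how these pieces could fit together, the scanner might be exposed through a minimal Flask endpoint. The route name, response shape, and `run_scan` stub are our assumptions, not the project's actual API:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_scan(url: str) -> list[dict]:
    """Placeholder for the back-end scanning engine; the real engine
    would crawl the target and inject test payloads."""
    return [{"url": url, "vulnerability": "XSS", "severity": "High"}]

@app.route("/scan", methods=["POST"])
def scan():
    # Accept a target URL from the front end and return findings as JSON.
    url = request.json.get("url", "")
    if not url.startswith(("http://", "https://")):
        return jsonify({"error": "A valid http(s) URL is required"}), 400
    return jsonify({"target": url, "findings": run_scan(url)})

if __name__ == "__main__":
    app.run(debug=True)
```

The URL check also reflects the ethical-use objective above: scans should only ever be pointed at explicitly authorized targets.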
3.6 Timeline / Gantt Chart
To implement the AI-powered website vulnerability scanner, the system is structured into four key architectural
layers:
Input Layer
The input layer accepts user-provided URLs for vulnerability assessment. It acts as the system’s entry point.
Processing Layer
This is the core intelligence layer where scanning, AI inference, and threat analysis take place. It consists of:
Crawler Module: Navigates through submitted URLs to discover linked pages and input fields.
Vulnerability Detection Engine: Applies known test payloads (SQLi, XSS, etc.) to detect potential flaws.
Gemini AI Integration: Performs real-time analysis of identified vulnerabilities and assigns exploitability
scores using contextual awareness and historical threat data.
Ranking Module: Sorts vulnerabilities based on OWASP Top 10 categories and severity levels.
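A highly simplified sketch of how these modules could hand off to one another follows. The stub crawler, the payload table, and the hard-coded score are placeholders; in the real system the exploitability score comes from the Gemini AI integration:

```python
from dataclasses import dataclass

# Placeholder payloads the detection engine would inject into input fields.
TEST_PAYLOADS = {
    "XSS": "<script>alert(1)</script>",
    "SQLi": "' OR '1'='1",
}

@dataclass
class Finding:
    page: str
    category: str          # OWASP Top 10-style category label
    exploitability: float  # 0.0-1.0, assigned by the AI layer

def crawl(base_url: str) -> list[str]:
    """Crawler module stub: would follow links and collect input fields."""
    return [base_url, base_url + "/login", base_url + "/search"]

def detect(page: str) -> list[Finding]:
    """Detection-engine stub: a real engine would submit TEST_PAYLOADS
    and inspect the HTTP responses for reflected or executed payloads."""
    if page.endswith("/search"):
        # Pretend the search page reflected the XSS payload unescaped.
        return [Finding(page, "A03:2021 Injection (XSS)", exploitability=0.9)]
    return []

def scan(base_url: str) -> list[Finding]:
    findings = [f for page in crawl(base_url) for f in detect(page)]
    # Ranking module: most exploitable issues first.
    return sorted(findings, key=lambda f: f.exploitability, reverse=True)
```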
Storage Layer
Although the system avoids storing user data for privacy compliance, temporary data handling is necessary for:
Holding scan results during active sessions.
Output Layer
This layer delivers organized, actionable results back to the user via the frontend:
Visual display of discovered vulnerabilities with AI-generated risk levels.
Export options for downloading reports in PDF or CSV formats.
Real-time feedback interface for suggested remediations.
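For the CSV export option, Python's standard library is sufficient; the field names below are illustrative, not the report schema the system actually uses:

```python
import csv
import io

def findings_to_csv(findings: list[dict]) -> str:
    """Serialize scan findings to CSV text for download."""
    fieldnames = ["url", "vulnerability", "severity", "mitigation"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(findings)
    return buf.getvalue()
```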
This layered and modular design ensures scalability, interpretability, and real-time decision-making, while
maintaining user privacy and data integrity throughout
the scanning and classification process.
The system aligns with the following security and interoperability standards:
STIX/TAXII 2.1: CTI data formatting and exchange
MITRE ATT&CK: Mapping tactics, techniques, and procedures (TTPs)
NIST Cybersecurity Framework: System reliability, documentation, and auditability
GPG (OpenPGP Standard): Digital signatures for user submissions
SHA-256 (FIPS PUB 180-4): Cryptographic hashing
OWASP Guidelines: Securing web and API endpoints
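The SHA-256 requirement (FIPS PUB 180-4) maps directly onto Python's standard library; a minimal sketch of hashing and integrity-checking a submission:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """FIPS 180-4 SHA-256 digest of data, as a hex string."""
    return hashlib.sha256(data).hexdigest()

def verify_submission(data: bytes, expected_digest: str) -> bool:
    """Recompute the digest and compare, to detect any alteration."""
    return sha256_hex(data) == expected_digest
```

For instance, `sha256_hex(b"abc")` produces the standard test vector beginning `ba7816bf…`, and any single-byte change to the input yields a completely different digest.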
4.4 Design Constraints
5.1 Deliverables
The successful development and deployment of the AI-Driven Cyber Threat Intelligence
Sharing Platform will result in a suite of concrete deliverables. These deliverables reflect
not only the project’s functional capabilities but also its architectural, analytical, and
security assurances.
D. Cryptographic Integrity
• Tamper detection: 100%
• Detects all alterations using hash checks and Merkle tree
E. Platform Scalability
• Handles 50 concurrent IOC submissions
• Maintains performance under load
F. Interoperability
• Fully compliant with STIX/TAXII 2.1
• Ensures smooth integration with existing CTI/SIEM tools
To evaluate the performance and reliability of the proposed AI-driven Cyber Threat Intelligence
Sharing Platform, extensive experiments were conducted across multiple dimensions, including
classification accuracy, false positive rate, validation latency, and system scalability. The
system was tested using a curated dataset of 1,000 synthetic and real-world IOCs generated
from stratified honeypots, public OSINT feeds, and simulated user submissions. The goal was
to assess the ability of the federated machine learning model to accurately distinguish between
true threats and false positives in real time.

The proposed system leverages a federated Random Forest model, achieving a significant
improvement in classification performance. Compared to traditional rule-based filtering, which
achieved only 66% accuracy, our AI-driven approach yielded 92.7% accuracy and reduced false
positives from 34% to just 7.3%. This was made possible through dynamic trust scoring based
on reporter reliability, honeypot pattern matching, and OSINT corroboration.

System latency was measured from IOC submission to validation and logging. The baseline
system required ~6.5 seconds on average due to centralized processing and static rule matching.
In contrast, our platform maintained a median validation time of ~2.7 seconds thanks to
lightweight federated models and optimized hashing via SHA-256. The additional cryptographic
integrity layer (Merkle chaining) added only ~12 milliseconds per entry.

To test system scalability, we simulated concurrent IOC submissions. The baseline platform
handled up to 10 parallel events before degradation. The proposed system successfully processed
50 simultaneous submissions, maintaining sub-5-second response times. The admin dashboard also
remained responsive, displaying threat counts, timestamps, and confidence metrics without delay.
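The tamper-evident logging measured above can be approximated with a SHA-256 hash chain in which each entry's digest covers the previous one. This sketch simplifies the full Merkle-tree scheme to a linear chain, which is enough to show why any alteration is detectable:

```python
import hashlib

def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def append_entry(log: list[dict], ioc: str) -> None:
    """Append an IOC entry whose hash chains to the previous entry."""
    prev = log[-1]["hash"] if log else "0" * 64
    log.append({"ioc": ioc, "prev": prev, "hash": _h((prev + ioc).encode())})

def verify_chain(log: list[dict]) -> bool:
    """Recompute every link; any alteration breaks the chain."""
    prev = "0" * 64
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _h((prev + entry["ioc"]).encode()):
            return False
        prev = entry["hash"]
    return True
```

Because each digest depends on all earlier entries, editing or deleting any record invalidates every subsequent hash, which is the property the platform's 100% tamper-detection figure relies on.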
Metric | Baseline System | Proposed CTI Platform
Validation Model | Static rule-based filtering | Federated Random Forest with 3-factor trust scoring
Average Accuracy | 66.0% | 92.7%
False Positive Rate | 34.0% | 7.3%
Classification Latency | ~6.5 seconds per IOC | ~2.7 seconds per IOC
Scalability (Concurrent IOCs) | Handles up to 10 concurrent submissions | Scales up to 50 concurrent IOC submissions
Trust Scoring Factors | None | Reporter history, honeypot match, OSINT corroboration
Tamper-Proof Storage | Not supported | SHA-256 hashing with Merkle tree chaining
Data Sources | Only public feeds | Stratified honeypots, OSINT crawlers, GPG-signed reports
MITRE ATT&CK Mapping | Manual tagging | Automated based on honeypot behavior analysis
Admin Visibility | Limited logs | Dashboard with logs, confidence, timestamps, techniques
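The three trust factors (reporter history, honeypot matching, and OSINT corroboration) can be combined into a single score with a simple weighted model. The weights and the 0-1 scaling below are illustrative assumptions, not the platform's actual parameters:

```python
def trust_score(reporter_history: float,
                honeypot_match: bool,
                osint_hits: int) -> float:
    """Combine the three validation factors into a 0-1 trust score.

    reporter_history: past accuracy of this reporter, 0.0-1.0
    honeypot_match:   whether the IOC matched observed honeypot activity
    osint_hits:       number of corroborating OSINT feeds (capped at 3)
    """
    score = (0.4 * reporter_history
             + 0.35 * (1.0 if honeypot_match else 0.0)
             + 0.25 * min(osint_hits, 3) / 3)
    return round(score, 3)
```

A reliable reporter whose IOC is confirmed by honeypots and several OSINT feeds scores near 1.0, while an unknown reporter with no corroboration scores near 0.0, which is the contextual filtering that drives the reduction in false positives.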
Fig 6.1: User interface for submitting potential threat indicators (IOCs) to the Cyber Threat
Intelligence platform.
Fig 6.2: Admin panel showing submitted threat hashes, their descriptions, and timestamps in reverse
chronological order.
Fig 6.3: Dashboard displaying total validated threats, latest IOC hash, and most recent threat
description.
Fig 6.4: Positive classification result with mapped MITRE ATT&CK technique and 100% model
confidence.
Fig 6.5: Negative classification output showing the IOC was not validated, with a low model
confidence of 20%.
The increasing sophistication of cyber threats demands a paradigm shift towards accurate,
timely, and trustworthy Cyber Threat Intelligence (CTI). Current CTI sharing faces challenges
like high false positives, lack of trust, limited validation, and data tampering risks.
This project addresses these issues by presenting an AI-driven CTI Sharing Platform with key
innovations:
• A cryptographic logging system using SHA-256 and Merkle trees for tamper-proof
record keeping.
• An intuitive admin dashboard for real-time monitoring and validated CTI export.
Federated learning enabled privacy-compliant collaborative model training. The trust score
mechanism effectively contextualized source credibility, reducing false alerts. The tamper-proof
log infrastructure enhanced data integrity and forensic traceability. The extensible dashboard
provided a user-friendly interface for analysts.
Overall, this project introduces a novel, modular, and scalable solution that enhances the
technical, human, and operational aspects of modern CTI sharing.
Future work will focus on improving the model’s accuracy through continuous learning driven
by real-time feedback loops from participating organizations. The platform will be extended to
multi-tenant architectures with differential trust profiles, and API-based integration with
SIEM and SOAR tools will enable automated response. Advanced threat correlation based on
NLP-driven intent recognition and clustering will also be explored to detect campaign-level
activity. Finally, we plan to move away from SQLite, toward either a distributed NoSQL store
or a blockchain-backed store, to achieve greater scalability and auditability in enterprise
environments.