PONDICHERRY UNIVERSITY
(A CENTRAL UNIVERSITY)
SCHOOL OF ENGINEERING AND TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE
Master of Technology
(Network & Information Security)
Project Phase -1
Title: Predictive Analysis to SQL Injection Attack Detection and
Prevention using Machine Learning Approach.
Software Requirements Specification document
by
Mukilarasu M
(20394015)
Under Guidance of Project Guide: Mr. K. PALANIVEL
M.Tech, UGC-NET,
Systems Analyst (Sr. Scale),
Computer Centre,
Pondicherry University.
Signature of Project Guide:
Software Requirements Specification (SRS)
Table of Contents
1. Abstract
2. Introduction
3. Scope
4. Background
5. Module
5.1. Prevention of SQLIA
5.2. Understanding the Architecture for Detection of SQLIA
5.3. Proposed Technique
5.4. Testing
6. Language & Platform
7. Design & Deployment
8. Schedule
1. ABSTRACT:
Web applications expose many input functions that are susceptible to SQL
injection attacks. SQL injection occurs when malicious code or data fragments
are injected into a web application. Disclosure of personal information, loss of
authenticity, data theft and site phishing all fall under this attack category.
Available algorithms and approaches struggle to distinguish original queries
from suspicious ones because of inefficient training techniques, inadequate
datasets or design limitations.
In this work we use an SVM (Support Vector Machine) for classification and
prediction of SQL injection attacks. With the proposed algorithm, the SQL
injection detection accuracy is 96.47%, which is the highest among the existing
SQL injection detection techniques.
In this technique, the SVM algorithm is trained with known malicious
expressions to generate a model. Whenever a user submits a new query, the model
is applied to predict whether the query contains any malicious expression. If an
attacker invents a new technique, the SVM can still detect the malicious
expression by matching a minimal set of syntax tokens.
2. INTRODUCTION:
In recent years, much research on preventing injection attacks has been
conducted, at educational institutions and elsewhere, and researchers have
recommended a number of preventive measures. Vulnerabilities in the Web can
compromise personal information and other valuable resources. When a user
submits a request to a web server, it does so through Hypertext Markup
Language (HTML) forms, Uniform Resource Locators (URLs), or other fields where
data can be entered. An unfiltered form allows users to perform SQL injection,
because the form data submitted to the database is processed without being
checked. SQL Hall of Fame Search reports the latest trends in SQLIA data
triggers. Protecting backend databases from SQLIA in the big data era is a
topical issue.
SQL injection is a type of attack that enables an attacker to access or alter
data by inserting query language code into a web form such as a login form. A
SQL injection vulnerability could allow an attacker to send commands directly to
the underlying database of the web application, compromising privacy and
functionality. SVM is a set of supervised learning methods based on statistical
learning theory and used for classification and regression tasks. As a
classification system, SVM is a global classification model that generates
non-overlapping partitions and generally uses all attributes. The partitions are
generated in a single pass, creating flat, linear partitions. SVM is based on
maximum-margin linear discriminants, similar to a probabilistic approach, but
without considering the dependencies between attributes. The basic idea of the
SVM classifier is to choose the maximum-margin hyperplane.
One type of SQL injection detection system is built into the cloud environment;
it protects web applications in cloud deployments and provides dynamic analysis
and input filtering. First, this method extracts the SQL keywords through a
lexical analysis of the SQL statement. It then analyzes the syntax of the SQL
statement to create a rule expression. Finally, the statement is checked against
an attack detection model defined by the SQL syntax rules. To prevent SQL
injection attacks, many techniques such as content filtering, attack testing,
and defensive coding are used to identify and prevent a subset of SQL injection
vulnerabilities. When the SQL checks recognize a query as malicious, the query
submitted by the attacker is stopped by the database parser.
Python is a high-level programming language that is easy to read and execute.
It is open-source and free, including for commercial applications. Like Ruby
and Perl, it is often called a scripting language and is widely used to build
web applications and interactive web content. Scripts written in Python (.py
files) can be run immediately by the interpreter. A file can also be saved as a
module (.py file) and used in other Python programs as a reusable building
block.
This document contributes a representative dataset that can be used to train a
supervised learning model with the SVM (Support Vector Machine) algorithm, so
that SQLIA can be predicted and malicious web requests prevented from accessing
the target backend database. It also offers an SQLIA detection and blocking
environment for large data networks.
3. Scope:
This work focuses on the design and implementation of a model, built with a
machine learning approach, that detects and prevents SQL injection attacks
launched over the internet against web applications. Today, almost everyone is
in touch with computer technology. To serve this large number of users, a great
volume of data is stored in web application databases in different parts of the
globe. From time to time, users interact with the backend databases via the
user interfaces for tasks such as updating data, making queries, extracting
data, and so forth. We therefore develop a model using a machine learning
approach with the Support Vector Machine algorithm, which handles both
regression and classification, to make predictions.
4. Background:
The proposed detection model can report vulnerabilities in web applications. As
a result, this model can reduce the likelihood that an SQLIA succeeds against
the web application. Machine learning with the SVM algorithm is used to detect
SQLIA through runtime monitoring. The idea behind this technique is to detect
and prevent SQLIA when the home page of each application is redirected to a
test page.
5. Module:
5.1. Prevention of SQLIA:
SQL injection flaws are introduced when software developers create dynamic
database queries that include user-supplied input. Avoiding SQL injection flaws
is simple. Developers need to do one or more of the following:
a) stop writing dynamic queries;
b) prevent user-supplied input that contains malicious SQL from affecting the
logic of the executed query;
c) sanitize data by limiting special characters;
d) use stored procedures in the database;
e) actively manage patches and updates.
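Points a) and b) above can be illustrated with a parameterized query. The
following is a minimal sketch, assuming Python's built-in sqlite3 module and a
hypothetical in-memory `admin` table modelled on the example queries used later
in this document; the function names are illustrative and not part of the
proposed system.

```python
import sqlite3

# Hypothetical in-memory table, named after the `admin` example in this document.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admin (uid TEXT, pwd TEXT)")
conn.execute("INSERT INTO admin VALUES ('1', 'abc')")


def find_admin_unsafe(uid):
    # Dynamic query: user input is concatenated into the SQL text (vulnerable).
    return conn.execute(
        "SELECT * FROM admin WHERE uid = '" + uid + "'"
    ).fetchall()


def find_admin_safe(uid):
    # Parameterized query: user input is bound as data and never parsed as SQL.
    return conn.execute(
        "SELECT * FROM admin WHERE uid = ?", (uid,)
    ).fetchall()


payload = "' OR 1=1 --"                    # classic injection string
unsafe_rows = find_admin_unsafe(payload)   # condition is always true
safe_rows = find_admin_safe(payload)       # payload is compared as a plain string
```

With the dynamic query the payload rewrites the WHERE clause and every row is
returned; with the bound parameter the same payload matches no row.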
5.2. Understanding the Architecture for Detection of SQLIA:
The proposed detection model can report vulnerabilities in web applications. As
a result, this model can reduce the likelihood that an SQLIA succeeds against
the web application. Machine learning with the SVM algorithm is used to detect
SQLIA through runtime monitoring. The idea behind this technique is to detect
and prevent SQLIA when the home page of each application is redirected to a
test page. The SVM partitions are generated in a single pass, creating flat,
linear partitions.
Fig. Proposed Architecture
SVM is based on maximum-margin linear discriminants, similar to a probabilistic
approach, but without considering the dependencies between attributes. The
basic idea of the SVM classifier is to select the hyperplane with the largest
margin. The classifier used in the predictive analytics web application is
trained on labeled training data.
Attack signatures take the form of SQL tokens and symbols (SQLIA positive) at
injection points, whereas a legitimate web request takes the form of data that
the application expects. The training data vector is defined as a matrix of
dictionary keyword features (SQLIA negative) and SQL tokens (SQLIA positive).
When an attacker invents a new technique, the SVM identifies the malicious
expression by matching it against a minimal set of syntax tokens.
Support Vector Machine (SVM):
The term SVM is typically used to describe classification with support vector
methods, while support vector regression describes regression with support
vector methods. SVM (Support Vector Machine) is a useful technique for data
classification. The classification problem can be restricted to the two-class
problem without loss of generality. In this problem the goal is to separate the
two classes by a function induced from the available examples. The goal is to
produce a classifier that works well on unseen examples, i.e. one that
generalizes well. Consider the example in the figure.
Here there are many possible linear classifiers that can separate the data, but
only one maximizes the margin (the distance between it and the nearest data
point of each class). This linear classifier is termed the optimal separating
hyperplane. Intuitively, we would expect this boundary to generalize well
compared with the other possible boundaries.
A classification task usually involves training and testing data, each
consisting of data instances. Each instance in the training set contains one
"target value" (class label) and several "attributes" (features). The goal of
SVM is to produce a model that predicts the target values of the instances in
the testing set given only their attributes.
Fig. Optimal Separating Hyperplane
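For reference, the maximum-margin idea described above is usually written as the
following optimization problem. This is the standard textbook hard-margin
formulation, not something taken from this project; the symbols $w$ (normal
vector of the hyperplane), $b$ (offset), $x_i$ (training instances) and
$y_i \in \{-1, +1\}$ (class labels) are introduced here as assumptions.

```latex
\min_{w,\,b} \ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{T} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

Under these constraints the distance between the two class boundaries is
$2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.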
To attain this goal there are four different kernel functions.
1. Linear: $K(x_i, x_j) = x_i^T x_j$
2. Polynomial: The polynomial kernel of degree $d$ is of the form
$K(x_i, x_j) = (x_i^T x_j)^d$
3. RBF: The Gaussian kernel, known also as the radial basis function, is of the
form $K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$
4. Sigmoid: The sigmoid kernel is of the form
$K(x_i, x_j) = \tanh(k\, x_i^T x_j + r)$
The RBF kernel nonlinearly maps samples into a higher-dimensional space, so,
unlike the linear kernel, it can handle the case where the relation between
class labels and attributes is nonlinear. Furthermore, the linear kernel is a
special case of RBF: the linear kernel with a penalty parameter C has the same
performance as the RBF kernel with some parameters (C, σ). In addition, the
sigmoid kernel behaves like RBF for certain parameters.
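The four kernels listed above can be written directly as small functions. The
sketch below uses only the Python standard library; the function names and the
default parameter values (d, sigma, k, r) are illustrative assumptions rather
than values chosen by this project.

```python
import math


def dot(x, y):
    # Inner product x^T y of two equal-length vectors.
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, d=2):
    # Polynomial kernel of degree d.
    return dot(x, y) ** d


def rbf_kernel(x, y, sigma=1.0):
    # Gaussian (radial basis function) kernel with width sigma.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def sigmoid_kernel(x, y, k=1.0, r=0.0):
    return math.tanh(k * dot(x, y) + r)


x_i = [1.0, 2.0]
x_j = [3.0, 0.0]
values = {
    "linear": linear_kernel(x_i, x_j),
    "polynomial": polynomial_kernel(x_i, x_j),
    "rbf": rbf_kernel(x_i, x_j),
    "sigmoid": sigmoid_kernel(x_i, x_j),
}
```

Note that the RBF kernel of a vector with itself is always 1, and it decays
towards 0 as the two vectors move apart.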
Determining SQL injection attacks using SVM (Support Vector Machine):
Suspicious queries are classified by analyzing datasets of original and
suspicious queries. The classifier learns from the dataset and, according to the
learning procedure, classifies the queries. Appropriate classification occurs in
our system because of sound learning approaches and design considerations.
5.3. Proposed Technique:
Our proposed work is based on comparing SQL query strings: it blocks suspicious
SQL queries and passes original SQL queries.
Ex- Original query = select * from admin where uid='1'
Suspicious query = select * from admin where uid='' OR 1=1;--'
Here, the original query is passed and the suspicious query is blocked.
The word-list contains the tokens of the SQL query strings.
'O' - Original query
'S' - Suspicious query
Ex- ('O') select * from admin where uid = '1';
('S') select * from admin where uid = '' OR 1=1;--'
('O') select * from admin where uid = '1' && pwd = 'abc';
('S') select * from admin where uid = '' OR 1=1;--'
Tokens:-
t1='select', t2='*from', t3='admin', t4='where', t5='uid', t6='', t7='OR',
t9='1', t10='&&', t11='pwd=', t12='abc', t13='='
The word-list contains the tokens named t1…t13 listed above. A vector is
created from each query string, and the classifier classifies each query as
original or suspicious.
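The step from query strings to a word-list and a vector per query can be
sketched as follows. This is one possible reading, assuming a simple regular
expression tokenizer and token-count vectors; the pattern `TOKEN_RE` and the
helper names are hypothetical, and the resulting tokens differ slightly from the
t1…t13 list above (for example, `*` and `from` become separate tokens).

```python
import re

# Assumed token pattern: a word, a number, or a single punctuation symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]")


def tokenize(query):
    # Lower-case so that 'OR' and 'or' map to the same token.
    return [tok.lower() for tok in TOKEN_RE.findall(query)]


labeled_queries = [
    ("O", "select * from admin where uid = '1';"),
    ("S", "select * from admin where uid = '' OR 1=1;--'"),
    ("O", "select * from admin where uid = '1' && pwd = 'abc';"),
]

# Word-list: every distinct token seen in the labeled queries.
word_list = sorted({tok for _, q in labeled_queries for tok in tokenize(q)})


def to_vector(query):
    # One count per word-list entry, in word-list order.
    tokens = tokenize(query)
    return [tokens.count(word) for word in word_list]


vectors = [(label, to_vector(q)) for label, q in labeled_queries]
```

Each query becomes a fixed-length vector of token counts, which is the form a
classifier such as an SVM expects as input.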
Algorithm:
Step 1. Select a reasonable number of labeled queries as the training set.
Step 2. Input the SQL query string.
Step 3. Feed the training set into the SVM training process to generate a model.
Step 4. The model is now ready to make predictions.
Step 5. Classify the input query using the trained SVM model.
Step 6. Compare the predicted labels with the known labels to obtain the
accuracy of the algorithm.
Step 7. Repeat steps 2 to 6 until the desired classification precision is
achieved.
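The steps above could be sketched with scikit-learn, which this document lists
as part of the platform. The training queries, the token pattern and the choice
of a linear kernel are illustrative assumptions; a real training set would be
far larger, and accuracy should be measured on held-out queries rather than on
the training data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Step 1: a tiny, hypothetical training set ('O' = original, 'S' = suspicious).
train_queries = [
    "select * from admin where uid = '1';",
    "select * from admin where uid = '1' && pwd = 'abc';",
    "select * from admin where uid = '2';",
    "select * from admin where uid = '' OR 1=1;--'",
    "select * from admin where uid = '' OR 'a'='a';--'",
    "select * from admin where uid = '1' OR 1=1;--' && pwd = ''",
]
train_labels = ["O", "O", "O", "S", "S", "S"]

# Step 3: token counts feed a linear SVM (assumed kernel choice).
model = make_pipeline(
    CountVectorizer(token_pattern=r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]"),
    SVC(kernel="linear"),
)
model.fit(train_queries, train_labels)

# Steps 2, 4 and 5: classify a new query string.
new_query = "select * from admin where uid = '' OR 2=2;--'"
predicted = model.predict([new_query])[0]

# Step 6: fraction of correctly labeled queries (here on the training set only).
train_accuracy = model.score(train_queries, train_labels)
```

Step 7 then corresponds to enlarging or correcting the training set and
refitting until the accuracy on unseen queries is acceptable.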
5.4. Testing:
The proposed technique has been tested on a dataset of SQL query strings.
The dataset was populated with records of original SQL query strings ('O') and
suspicious SQL query strings ('S').
Detection time (in seconds) is calculated by taking the average over 100
original SQL query strings and 100 suspicious SQL injection query strings.
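The average detection time described above could be measured as in the following
sketch. The `classify` function is a hypothetical stand-in for the trained SVM
model, and the repeated example queries merely take the place of the 100
original and 100 suspicious strings of the real dataset.

```python
import time


def average_detection_time(classify, queries):
    # Average wall-clock time (in seconds) spent classifying one query.
    start = time.perf_counter()
    for query in queries:
        classify(query)
    elapsed = time.perf_counter() - start
    return elapsed / len(queries)


def classify(query):
    # Hypothetical stand-in for the trained model's prediction function.
    return "S" if " or " in query.lower() else "O"


original = ["select * from admin where uid = '1';"] * 100
suspicious = ["select * from admin where uid = '' OR 1=1;--'"] * 100

avg_seconds = average_detection_time(classify, original + suspicious)
```

Using `time.perf_counter` gives a high-resolution timer suited to short
intervals; the measured value depends on the machine and on the model used.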
6. Language and Platform:
Language:
Python
HTML, CSS, JavaScript
Platform:
Python runtime environment
JavaScript engine
TensorFlow
scikit-learn
NumPy
7. Design & Deployment:
Design:
I am going to develop a model that can reduce the likelihood that an SQLIA
succeeds against our web application. The model will be developed using
machine learning with the SVM algorithm, which is used to predict SQLIA
through runtime monitoring.
Deployment:
There is no separate deployment, because the program and algorithm are
implemented on the machine learning platform and the web application is run
locally through localhost.
8. Schedule:
Sl.No.  Month       Week  Work
1       Sep         1-2   Choosing base paper
                    3-4   Understanding base paper
2       Oct         1-2   Understanding existing system and design
                    3-4   Finding problem statement and future work
3       Nov         1     Learning language and tools
                    2-3   Literature review
                    4     Gathering Requirements
4       Dec         1-2   Designing architecture and development
                    3-4   Building SQLIA detection model
5       Jan to Mar        Second Phase
References:
1. T.P.Latchoumi, Manoj Sahit Reddy, K.Balamurugan.(February 2020) Applied Machine Learning
Predictive Analytics to SQL Injection Attack Detection and Prevention.
2. Li, Q., Wang, F., Wang, J., & Li, W. (2019). LSTM-based SQL injection detection method for an
intelligent transportation system. IEEE Transactions on Vehicular Technology, 68(5), 4182-4191.
3. Justin Clarke. SQL Injection Attacks and Defense [M]. Elsevier.2012.
4. Wang Zhihu. Research of SQL injection attack and prevention measures. Coal Technology, 2011, 30
(1): 95-97.
5. Li, Q., Li, W., Wang, J., & Cheng, M. (2019). A SQL Injection Detection Method Based on Adaptive
Deep Forest. IEEE Access, 7, 145385-145394.
6. Diksha Patil, Sheetal Sonamble, Prof. V.M. Kharache. SQL Injection Detection and Prevention.
(April 2018) International Journal of Scientific & Engineering Research, Volume 9.
7. Jagdish Halde, San Jose State University. SQL Injection analysis, Detection and Prevention.(2008) San
Jose State University.
8. Stephen W. Boyd and Angelos D. Keromytis. SQLrand: Preventing SQL Injection Attacks, Department
of Computer Science, Columbia University.
9. Tensorflow documentation: https://www.tensorflow.org/tutorials.
10. Scikit-learn documentation: https://scikit-learn.org/stable/tutorial/index.html.
11. Numpy documentation: https://numpy.org/doc/stable/user/whatisnumpy.htm.
12. Python documentation: https://docs.python.org/3/tutorial/index.html.
13. JavaScript documentation: https://www.w3schools.com/js/DEFAULT.asp.