DETECTION OF
PHISHING WEB
PAGE
EC9560 DATA MINING
2020/E/155
Contents
• Objective
• Scope Identification
• Data Description
• Comprehensive study on Data
• Data pre-processing
OBJECTIVE
To develop an ML classifier that predicts whether a
website is phishing or legitimate, based on the
URL of the web page.
Scope
URL Pattern Analysis
Examining the relationships between URL characteristics
(such as length, special characters, and number of
redirections) and phishing likelihood. This scope is
crucial for identifying patterns in URLs that may
distinguish phishing websites from legitimate ones.
Insights from this analysis can help improve the model's
accuracy in real-time phishing detection, supporting
cybersecurity measures and user safety.
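A minimal sketch of how URL characteristics like those named in the scope could be computed from a raw URL string. The feature names, the set of special characters, and the use of an embedded "//" as a rough stand-in for redirections are all assumptions made for illustration; the dataset's actual columns are not listed in these slides.

```python
from urllib.parse import urlparse

# Assumed set of "special" characters; the dataset may define this differently.
SPECIAL_CHARS = "@-_?=&%#~"

def url_features(url: str) -> dict:
    """Compute a few simple, hypothetical URL-level features."""
    parsed = urlparse(url)
    # Drop the scheme so that the "//" in "http://" is not counted.
    after_scheme = url.split("://", 1)[-1]
    return {
        "url_length": len(url),
        "n_special_chars": sum(url.count(c) for c in SPECIAL_CHARS),
        "n_dots": url.count("."),
        # Rough proxy only: an extra "//" inside the URL often marks an embedded redirect.
        "n_redirections": after_scheme.count("//"),
        "hostname_length": len(parsed.hostname or ""),
    }

features = url_features("http://example.com//login?user=a@b.com")
print(features)
```

Each URL becomes one numeric row, which is the shape a classifier expects as input.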
DATA DESCRIPTION
• This dataset was collected from Kaggle.
• Each row represents a website, with the last column
indicating whether it is phishing or not.
• Contains a total of 100077 web pages, each with
20 features.
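As an illustration of the layout described above (one row per website, label in the last column), the sketch below separates features from the label. The tiny table, its column names, and the 0/1 label encoding are placeholders; the real Kaggle file and its columns are not given in these slides.

```python
import pandas as pd

# Stand-in table; the real dataset would be loaded from the Kaggle file instead.
df = pd.DataFrame({
    "url_length": [23, 87, 41, 120],
    "n_dots": [1, 4, 2, 6],
    "label": [0, 1, 0, 1],  # assumed encoding: 1 = phishing, 0 = legitimate
})

n_rows, n_cols = df.shape
X = df.iloc[:, :-1]  # every column except the last is a feature
y = df.iloc[:, -1]   # the last column is the class label

print(n_rows, "rows,", X.shape[1], "features")
print(y.value_counts())  # class balance
```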
Comprehensive study on Data
• About the Data
• Distribution Analysis
• Data Visualization
• Correlation Analysis
About the Data
Handling Null Values
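The slide does not state how null values were handled, so the following is only one possible approach: count the missing values per column, then drop incomplete rows. The table and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in table with a few missing entries.
df = pd.DataFrame({
    "url_length": [23.0, np.nan, 41.0, 120.0],
    "n_dots": [1.0, 4.0, 2.0, np.nan],
    "label": [0, 1, 0, 1],
})

# Number of missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# One option among several: drop every row that has at least one missing value.
cleaned = df.dropna().reset_index(drop=True)
print(len(cleaned), "rows remain")
```

Imputation (for example with the column median) would be an alternative when dropping rows loses too much data.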
Data Visualization & Distribution Analysis
Box Plot
Pair Plot
Correlation Analysis
Correlation Matrix
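A correlation matrix like the one referred to here can be computed with pandas. This sketch uses a small made-up table with hypothetical column names, and ranks features by the strength of their Pearson correlation with the label.

```python
import pandas as pd

# Stand-in numeric table; the real analysis would use the full dataset.
df = pd.DataFrame({
    "url_length": [10, 20, 30, 40, 50],
    "n_dots": [1, 2, 3, 4, 5],
    "n_special_chars": [5, 4, 3, 2, 1],
    "label": [0, 0, 1, 1, 1],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(2))

# Absolute correlation of each feature with the label, strongest first.
label_corr = corr["label"].drop("label").abs().sort_values(ascending=False)
print(label_corr)
```

Feature pairs with correlation close to +1 or -1 carry nearly the same information, which is useful to know before feature selection.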
DATA PREPROCESSING
• REMOVE DUPLICATES
• OUTLIER DETECTION AND REMOVAL
• FEATURE SCALING
• TRAIN TEST SPLIT
Handle Duplicates
• As all the features are numerical,
some values, such as 0, will be
repeated.
• No categorical encoding is
needed.
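One way to read this slide is that repeated values inside a column are expected, while fully identical rows are the duplicates worth removing. The sketch below follows that reading, which is an assumption, on a made-up table.

```python
import pandas as pd

# Stand-in table: rows 0 and 1 are identical, and 0 repeats within n_dots.
df = pd.DataFrame({
    "url_length": [23, 23, 41, 41],
    "n_dots": [0, 0, 0, 2],
    "label": [0, 0, 0, 1],
})

# Count rows that are exact copies of an earlier row.
n_dupes = int(df.duplicated().sum())

# Keep the first occurrence of each distinct row.
deduped = df.drop_duplicates().reset_index(drop=True)

print(n_dupes, "duplicate rows removed;", len(deduped), "rows remain")
```

Repeated zeros in `n_dots` survive, because only whole-row duplicates are dropped.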
Outlier detection and Removal
• Z-Score method
• Removes the rows whose z-scores
lie beyond a threshold.
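A minimal sketch of z-score outlier removal, where z = (x - mean) / standard deviation for each feature. The threshold of 3, the rule that a row is dropped if any feature exceeds it, and the helper name `remove_outliers_zscore` are assumptions; the slides do not give these details.

```python
import pandas as pd

def remove_outliers_zscore(df, feature_cols, threshold=3.0):
    """Drop rows where any listed feature has |z-score| above the threshold."""
    features = df[feature_cols]
    # Population standard deviation; constant columns get 1.0 to avoid dividing by zero.
    std = features.std(ddof=0).replace(0, 1.0)
    z = (features - features.mean()) / std
    keep = (z.abs() <= threshold).all(axis=1)
    return df[keep].reset_index(drop=True)

# Stand-in table: 19 ordinary URLs and one extremely long one.
df = pd.DataFrame({
    "url_length": list(range(20, 39)) + [1000],
    "n_dots": [1, 2] * 10,
    "label": [0, 1] * 10,
})

cleaned = remove_outliers_zscore(df, ["url_length", "n_dots"])
print(len(df), "->", len(cleaned), "rows")
```

The label column is left out of `feature_cols` so that the class itself never triggers a removal.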
Feature Scaling
• Features and target were
separated
• Standard scaler used to
standardize the data, bringing
all the features to a similar
scale
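The two steps on this slide can be sketched with scikit-learn's `StandardScaler`, which rescales each feature to zero mean and unit variance. The table and column names are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Stand-in table with features on very different scales.
df = pd.DataFrame({
    "url_length": [20, 35, 50, 120, 300],
    "n_dots": [1, 2, 2, 4, 6],
    "label": [0, 0, 0, 1, 1],
})

# Separate features and target; only the features are scaled.
X = df.drop(columns=["label"])
y = df["label"]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # NumPy array, one column per feature

print(np.round(X_scaled.mean(axis=0), 6))  # per-feature mean after scaling
print(np.round(X_scaled.std(axis=0), 6))   # per-feature standard deviation after scaling
```

A common variation is to fit the scaler on the training split only and reuse it to transform the validation and test splits, so that no information from held-out data enters the scaling statistics.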
Train-Test Split
• 15% for Test
• 15% for validation
• 70% for Train
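A three-way split with these proportions can be obtained by calling scikit-learn's `train_test_split` twice: first hold out 30%, then halve that portion into validation and test sets. The synthetic data, the fixed `random_state`, and the use of stratification are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 samples, 2 features, balanced binary labels.
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

# Step 1: keep 70% for training and hold out the remaining 30%.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

# Step 2: split the held-out 30% evenly into validation and test (15% each).
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42, stratify=y_tmp
)

print(len(X_train), len(X_val), len(X_test))
```

Stratifying keeps the phishing-to-legitimate ratio roughly the same in all three subsets.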
Future Works
• FEATURE SELECTION
• MODEL BUILDING
• MODEL EVALUATION
THANK
YOU
THARANYA A. R
2020/E/155