- This is the official implementation of "Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision-Based Approach" (USENIX Security '22): link to paper, link to our website
- Existing reference-based phishing detectors:
    - ❌ Subject to false positives because they only capture brand intention
- The contributions of our paper:
    - ✅ We propose a reference-based phishing detection system that captures both brand intention and credential-taking intention. To the best of our knowledge, this is the first work that systematically analyzes both brand intention and credential-taking intention for phishing detection.
    - ✅ We set up a phishing monitoring system. It reports phishing webpages daily, with higher precision than state-of-the-art phishing detection solutions.
Input: a screenshot, Output: Phish/Benign, Phishing target
- Step 1: Enter the Abstract Layout detector and get the predicted elements
- Step 2: Enter Siamese Logo Comparison
    - If the Siamese model reports no target, `Return Benign, None`
    - Else (the Siamese model reports a target), enter Step 3: CRP classifier
- Step 3: CRP classifier
    - If the CRP classifier reports that it is a CRP page, go to Step 5: Return
    - Else if it is not a CRP page and the CRP Locator has not been executed before, go to Step 4: CRP Locator
    - Else (it is not a CRP page and the CRP Locator has already been executed), `Return Benign, None`
- Step 4: CRP Locator
    - Find login/signup links and click them. If a CRP page is reached at the end, go back to Step 1: Abstract Layout detector with the updated URL and screenshot
    - Else (no CRP page can be reached), `Return Benign, None`
- Step 5: Return
    - If a CRP page is reached and the Siamese model reports a target, `Return Phish, Phishing target`
    - Else, `Return Benign, None`
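
The control flow of Steps 1 to 5 can be sketched in a few lines of Python. This is only an illustration of the decision logic described above, not the code in `phishintention.py`: the `Page` class and the four callables (`detect_layout`, `match_logo`, `is_crp`, `locate_crp`) are hypothetical stand-ins for the real modules.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Page:
    """A webpage under test (hypothetical container, not from the repository)."""
    url: str
    screenshot_path: str


def infer_intention(
    page: Page,
    detect_layout: Callable[[Page], list],
    match_logo: Callable[[Page, list], Optional[str]],
    is_crp: Callable[[Page, list], bool],
    locate_crp: Callable[[Page], Optional[Page]],
) -> Tuple[str, Optional[str]]:
    """Walk through Steps 1-5 and return (label, phishing target)."""
    locator_used = False
    while True:
        elements = detect_layout(page)        # Step 1: layout elements
        target = match_logo(page, elements)   # Step 2: logo comparison
        if target is None:
            return "Benign", None
        if is_crp(page, elements):            # Step 3: CRP classifier
            return "Phish", target            # Step 5: CRP page + target
        if locator_used:                      # Step 3: locator already tried
            return "Benign", None
        locator_used = True
        next_page = locate_crp(page)          # Step 4: follow login/signup links
        if next_page is None:
            return "Benign", None
        page = next_page                      # back to Step 1 with the new page
```

Because the CRP Locator runs at most once, the loop body executes at most twice per input page.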
|_ configs: Configuration files for the object detection models and the global configurations
|_ modules: Inference code for layout detector, CRP classifier, CRP locator, and OCR-aided siamese model
|_ models: the model weights and reference list
|_ ocr_lib: external code for the OCR encoder
|_ utils
|_ configs.py: load configuration files
|_ phishintention.py: main script
- Prerequisite: Pixi installed
- For Linux/Mac,

      export KMP_DUPLICATE_LIB_OK=TRUE
      git clone https://github.com/lindsey98/PhishIntention.git
      cd PhishIntention
      pixi install
      chmod +x setup.sh
      ./setup.sh

- For Windows,

      git clone https://github.com/lindsey98/PhishIntention.git
      cd PhishIntention
      pixi install
      setup.bat

- Check your Chrome binary version. You can do so by typing `chrome://version/` in your browser, or by typing `google-chrome --version` on the command line.
- Download the corresponding chromedriver from this repository. For example, if you are using `135.0.7049.42` on Linux, then you should look for `135.0.7049.42 chromedriver-linux64.zip`.
- Unzip the downloaded zip and put the `chromedriver.exe` under `./chromedriver-linux64/`.
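
As a convenience, the version lookup can be scripted. The sketch below is not part of the repository; the helper names are hypothetical, and it assumes that `google-chrome --version` prints a line containing a four-part version number such as `135.0.7049.42`.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_chrome_version(version_output: str) -> Optional[str]:
    """Extract a four-part version such as 135.0.7049.42 from free text."""
    match = re.search(r"\b(\d+\.\d+\.\d+\.\d+)\b", version_output)
    return match.group(1) if match else None


def chromedriver_zip_name(platform: str = "linux64") -> str:
    """Build the archive name to look for, e.g. chromedriver-linux64.zip."""
    return f"chromedriver-{platform}.zip"


if __name__ == "__main__":
    # Only query the binary if it is actually on the PATH.
    if shutil.which("google-chrome") is None:
        print("google-chrome not found; check chrome://version/ in the browser")
    else:
        result = subprocess.run(
            ["google-chrome", "--version"], capture_output=True, text=True
        )
        version = parse_chrome_version(result.stdout)
        print(f"Look for: {version} {chromedriver_zip_name()}")
```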
When you run the scripts for the first time, the reference list needs to be loaded, which may take some time.

    pixi run python phishintention.py --folder <folder you want to test e.g. datasets/test_sites> --output_txt <where you want to save the results e.g. test.txt>

The testing folder should be in the structure of:
test_site_1
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
|__ html.txt (HTML source code, optional)
test_site_2
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
|__ html.txt (HTML source code, optional)
......
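
Before running the script, it can help to check that every site folder follows this layout. The helper below is a minimal sketch and is not part of the repository; the function name is hypothetical, and it only assumes what the listing above states: `info.txt` and `shot.png` are required, `html.txt` is optional.

```python
from pathlib import Path
from typing import Dict, List

# html.txt is optional, so it is not checked here.
REQUIRED_FILES = ("info.txt", "shot.png")


def find_incomplete_sites(test_folder: str) -> Dict[str, List[str]]:
    """Map each site folder name to the required files it is missing."""
    problems: Dict[str, List[str]] = {}
    for site_dir in sorted(Path(test_folder).iterdir()):
        if not site_dir.is_dir():
            continue  # ignore stray files at the top level
        missing = [
            name for name in REQUIRED_FILES if not (site_dir / name).is_file()
        ]
        if missing:
            problems[site_dir.name] = missing
    return problems
```

For example, `find_incomplete_sites("datasets/test_sites")` returns an empty dictionary when every site folder contains both required files.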
- In our paper, we also implement several phishing detection and identification baselines, see here
Please consider citing our work :)
@inproceedings{liu2022inferring,
title={Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision-Based Approach},
author={Liu, Ruofan and Lin, Yun and Yang, Xianglin and Ng, Siang Hwee and Divakaran, Dinil Mon and Dong, Jin Song},
booktitle={31st {USENIX} Security Symposium ({USENIX} Security 22)},
year={2022}
}

If you have any issues running our code, you can raise an issue or email the authors.