PhishBowl

Introduction

This project consists of two parts: the PhishBowl, a phishing email dataset, and PhishNet, a phishing detection tool. Emails can be added to the PhishBowl via the API, and PhishNet will then detect similar phishing scams in the future. PhishNet can analyze both email text and screenshots of emails.

PhishNet uses an ensemble of a prompt-engineered GPT-4o model and a modified weighted k-NN classifier to classify both common and new types of phishing scams. The k-NN classifier lets the model quickly learn and detect new types of scams, while the GPT-based model validates the k-NN classifier's output using its broad internal knowledge.

The k-NN classifier is modified so that it can predict both positive and negative labels even when the PhishBowl contains only positive (phishing) examples. As a result, the classifier's false positive rate should not increase drastically as more phishing emails are added to the PhishBowl. Accuracy can be improved further by also adding benign emails to the PhishBowl, which helps prevent false positives.
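One way a k-NN classifier can produce negative predictions from positive-only data is to threshold a similarity-weighted score over the nearest known phishing emails. The sketch below illustrates that idea only; it is not the project's actual implementation, and the function names, the cosine similarity measure, the value of k, and the threshold are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def phishing_score(query, phishing_embeddings, k=3):
    """Similarity-weighted mean similarity to the k nearest known phishing emails.

    Hypothetical scoring rule: every stored example is phishing, so the only
    signal is how close the query is to its nearest neighbours.
    """
    if not phishing_embeddings:
        return 0.0
    sims = sorted(
        (cosine_similarity(query, e) for e in phishing_embeddings), reverse=True
    )[:k]
    sims = [max(s, 0.0) for s in sims]  # ignore negative similarities
    total = sum(sims)
    if total == 0:
        return 0.0
    # Each neighbour is weighted by its own similarity, so closer ones count more.
    return sum(s * s for s in sims) / total


def is_phishing(query, phishing_embeddings, k=3, threshold=0.8):
    """Label the query as phishing only if it is close enough to known phishing emails."""
    return phishing_score(query, phishing_embeddings, k) >= threshold
```

Under this rule, an email that resembles nothing in the PhishBowl scores low and is labeled benign, regardless of how many unrelated phishing emails are stored.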

Usage

Install Docker

Add the following .env file inside /api:

env="prod"  # prod | stage | dev
AZURE_OPENAI_API_KEY="YOUR_API_KEY"
AZURE_OPENAI_ENDPOINT="YOUR_ENDPOINT"

Add an empty .env file inside /app
Compose up docker-compose.yaml, for example with docker compose -f docker-compose.yaml up

That's it! Go to localhost:3000 to use the interactive PhishNet webapp, or localhost:8000 to try out the API.

Note that for PhishNet to work, you'll need to add a few sample emails to the PhishBowl first. To see how to populate the PhishBowl from a dataset or evaluate different PhishNets, go to the Advanced Usage section.

Development

Install Docker

Add the following .env file inside /api:

env="dev"  # prod | stage | dev
AZURE_OPENAI_API_KEY="YOUR_API_KEY"  # for GPTPhishNet and AzureDB
AZURE_OPENAI_ENDPOINT="https://YOUR_ENDPOINT"  # for GPTPhishNet and AzureDB
HUGGINGFACE_TOKEN_READ="hf_YOUR_HUGGINGFACE_READ_TOKEN"  # for HFBERTPhishNet
HUGGINGFACE_TOKEN_WRITE="hf_YOUR_HUGGINGFACE_WRITE_TOKEN"  # for FineTunedBERTPhishNet

Add an empty .env file inside /app

To enable linting inside your preferred IDE, create a Python virtual environment and install the Python dependencies. Likewise, install the JavaScript dependencies for JS linting.

python -m venv venv
source venv/bin/activate  # for unix
./venv/Scripts/activate  # for windows
python -m pip install -r api/requirements.txt

cd app
npm install yarn
yarn install

Compose up docker-compose-dev.yaml for a build that hot-reloads the webapp on file changes.

Advanced Usage

There are several features available through api/main.py such as evaluating different PhishNets and populating the PhishBowl. To use them, ensure either docker-compose.yaml or docker-compose-dev.yaml is running, and either attach a shell to api or run the commands directly by appending them after docker exec -it phishbowl-api-1.

To get all available commands, run python main.py -h.

Populating the PhishBowl

To analyze emails using PhishNet, you will need to add a few sample emails to the PhishBowl. You can either add your own emails via the API (localhost:8000), or run python main.py populate to populate the PhishBowl from a curated dataset. You will need to download the dataset yourself and add it to /api/services/data/curated. More detailed instructions are shown when you run populate for the first time.

The first time you run populate after downloading the dataset, the data loader loads all emails into memory, shuffles them, splits them into train and test sets, and saves the result to file. After this step, you may delete the /api/services/data/curated directory. Future loading happens lazily and therefore uses much less memory.
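The load-once, then load-lazily pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's data loader: the JSON Lines file format, the file names train.jsonl and test.jsonl, and the functions split_and_save and iter_emails are all hypothetical.

```python
import json
import random
from pathlib import Path
from typing import Iterator


def split_and_save(emails, out_dir, test_fraction=0.2, seed=0):
    """Shuffle all emails in memory once, then write train/test sets as JSON Lines."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    shuffled = list(emails)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    splits = {"test": shuffled[:n_test], "train": shuffled[n_test:]}
    for name, rows in splits.items():
        with open(out_dir / f"{name}.jsonl", "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")


def iter_emails(path) -> Iterator[dict]:
    """Yield one email at a time so the whole file is never held in memory."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Only the first step needs the full dataset in memory. Later runs can stream from the saved files with iter_emails, which is why the original source directory is no longer required.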

Evaluating PhishNets

Several PhishNets are already implemented, such as GPTPhishNet, SemanticPhishNet, and FineTunedBERTPhishNet. Some of these PhishNets require different API keys, as shown in the Development section. You can evaluate a PhishNet by running python main.py eval NAME. There are also optional arguments such as a retrain flag and a batch size; the batch size matters if your API is rate limited. For more details, run python main.py eval -h.
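To show why batch size matters under a rate limit, here is a rough sketch of batched evaluation with an optional pause between requests. It is not the code behind python main.py eval; the functions batched and evaluate_in_batches, the classify_batch callback, and the pause_seconds parameter are assumptions for illustration.

```python
import time


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def evaluate_in_batches(emails, labels, classify_batch, batch_size=8, pause_seconds=0.0):
    """Return accuracy, calling classify_batch once per batch.

    classify_batch is a hypothetical callback that takes a list of emails and
    returns one boolean prediction per email (True means phishing).
    """
    emails, labels = list(emails), list(labels)
    correct = 0
    start = 0
    for batch in batched(emails, batch_size):
        predictions = classify_batch(batch)
        expected = labels[start:start + len(batch)]
        correct += sum(p == y for p, y in zip(predictions, expected))
        start += len(batch)
        if pause_seconds:
            time.sleep(pause_seconds)  # back off between requests to a rate-limited API
    return correct / len(emails) if emails else 0.0
```

A smaller batch size means more, lighter requests; combined with a pause, it keeps the request rate under a provider's limit at the cost of a longer evaluation.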

Testing

There are several test scripts which can be run with the pytest command inside the api container. See the Pytest usage guide for more details on running a specific test. For local usage, it is recommended to add the -v flag for verbose output.
