🛡️ Website Phishing Detection

This project implements a hybrid Machine Learning and Deep Learning approach to detecting phishing websites. It extracts features from each URL and the page behind it, then predicts whether the website is legitimate or phishing. Several ML algorithms and neural networks are trained on the same features and compared to find the most accurate model.

📌 Features

📊 Collects phishing URLs from PhishTank and legitimate URLs from University of New Brunswick datasets
🔍 Extracts 17 handcrafted features (address bar, domain, HTML & JavaScript based)
🧠 Trains multiple ML & DL models including Decision Trees, Random Forest, XGBoost, SVMs, Autoencoders, and MLPs
📈 Evaluates models and compares performance
🏆 Best performing model: XGBoost (86.4% accuracy)
💾 Saves trained model as .pickle file for future predictions
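
To give a feel for the address bar features, here is a minimal sketch that computes a handful of them from a raw URL using only the standard library. The feature names, the shortener list, and the 54-character length threshold are illustrative assumptions, not necessarily the definitions used in `URLFeatureExtraction.py`. The sketch also assumes every URL includes a scheme such as `http://`.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Hypothetical, deliberately short list of URL shortening services.
SHORTENERS = re.compile(
    r"^(bit\.ly|goo\.gl|tinyurl\.com|t\.co|ow\.ly|is\.gd)$", re.IGNORECASE
)


def host_is_ip(host):
    """Return True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def address_bar_features(url):
    """Compute a few binary/count features from the URL string alone."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    path_segments = [s for s in parsed.path.split("/") if s]
    return {
        "has_ip": int(host_is_ip(host)),            # IP address used as host
        "has_at": int("@" in url),                  # '@' hides the real host
        "long_url": int(len(url) >= 54),            # assumed length threshold
        "depth": len(path_segments),                # number of path segments
        "double_slash_redirect": int(url.rfind("//") > 7),  # '//' after scheme
        "https_in_domain": int("https" in host),    # 'https' token inside host
        "shortener": int(bool(SHORTENERS.match(host))),
        "hyphen_in_domain": int("-" in host),       # prefix/suffix style host
    }


features = address_bar_features("http://example.com@evil-site.test/a")
print(features)
```

Each URL becomes one row of numeric values, which is the shape the tree-based models and neural networks expect as input.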

🛠️ Tech Stack

| Purpose | Technology Used |
| --- | --- |
| Data Collection | PhishTank, UNB Dataset |
| Feature Extraction | Python, Regex, Pandas |
| ML Models | Decision Tree, Random Forest, SVM, XGBoost |
| DL Models | Autoencoder, MLP |
| Model Persistence | Pickle |
| Visualization | Matplotlib, Seaborn |
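
As a rough illustration of the model persistence step, the sketch below saves and reloads an object with `pickle`. The helper names `save_model` and `load_model`, the file name, and the stand-in dictionary are assumptions for the example. In the project, the saved object would be the trained classifier rather than a dictionary.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable model object to disk."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load an object written by save_model. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a trained classifier (assumption: the real object would be
# a fitted model, for example the XGBoost classifier).
stand_in_model = {"name": "stand-in", "n_features": 17}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pickle"  # hypothetical file name
    save_model(stand_in_model, model_path)
    restored = load_model(model_path)

print(restored == stand_in_model)
```

A pickled model generally needs the same library versions at load time that were used at save time, so pinning versions in `requirements.txt` helps keep saved files usable.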

🚀 How to Run

Clone the Repo

git clone https://github.com/jasoncobra3/Website_Phishing_Detection.git
cd Website_Phishing_Detection

Create Virtual Environment

python -m venv phishing_env
 # Windows:
 phishing_env\Scripts\activate
 # macOS/Linux:
 source phishing_env/bin/activate

Install Dependencies
```
pip install -r requirements.txt
```
Run Jupyter Notebooks
```
jupyter notebook
```

Open URL Feature Extraction.ipynb - extract features
Open Phishing Website Detection_Models & Training.ipynb - train & evaluate models

📊 Results

✅ XGBoost achieved 86.4% accuracy, outperforming other models
✅ Random Forest and SVM performed moderately well
✅ Autoencoder and MLP showed promising results for DL integration
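
For readers unfamiliar with the metric, the sketch below shows how accuracy is computed from true and predicted labels, along with recall for the phishing class. The labels are toy values made up for the example, not results from this project, and the convention that `1` means phishing is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual phishing samples that were flagged."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels (assumption: 1 = phishing, 0 = legitimate).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(accuracy(y_true, y_pred))  # 0.75
print(recall(y_true, y_pred))    # 0.75
```

Reporting recall next to accuracy can be useful here, because a missed phishing page and a false alarm on a legitimate page are different kinds of error.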

🌟 Future Work

🌐 Develop a browser extension to detect phishing in real-time
🎨 Build a GUI/web app for user-friendly phishing detection
🔄 Improve hybrid ML-DL pipeline with larger datasets

📄 References

PhishTank Dataset : https://www.phishtank.com/developer_info.php
UNB Dataset : https://www.unb.ca/cic/datasets/url-2016.html
UCI Phishing Features : https://archive.ics.uci.edu/ml/datasets/Phishing+Websites

🤝 Contributing

Feel free to fork, star, or submit a pull request to contribute improvements!
