This project implements a hybrid machine learning (ML) and deep learning (DL) approach to detecting phishing websites. By analyzing URLs and their associated features, the system predicts whether a given website is legitimate or phishing, using multiple ML algorithms and neural networks to improve accuracy.
- 📊 Collects phishing URLs from PhishTank and legitimate URLs from University of New Brunswick datasets
- 🔍 Extracts 17 handcrafted features (address bar, domain, HTML & JavaScript based)
- 🧠 Trains multiple ML & DL models including Decision Trees, Random Forest, XGBoost, SVMs, Autoencoders, and MLPs
- 📈 Evaluates models and compares performance
- 🏆 Best performing model: XGBoost (86.4% accuracy)
- 💾 Saves the trained model as a `.pickle` file for future predictions
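The address-bar group of features can be computed from the URL string alone. The following is a minimal sketch of that idea, not the notebook's actual code: the function name `extract_address_bar_features`, the particular features shown, the 54-character length threshold, and the shortener list are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical, partial list of URL-shortening hosts (an assumption for this sketch).
SHORTENERS = {"bit.ly", "goo.gl", "tinyurl.com", "t.co", "ow.ly"}

# Matches a dotted-quad host such as 192.168.0.1 (octet ranges are not validated).
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_address_bar_features(url: str) -> dict:
    """Return a few numeric features derived from the URL string only."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Path depth: number of non-empty path segments.
    depth = len([segment for segment in parsed.path.split("/") if segment])
    return {
        "have_ip": int(bool(IPV4_PATTERN.match(host))),
        "have_at": int("@" in url),
        "long_url": int(len(url) >= 54),  # assumed threshold
        "url_depth": depth,
        # A "//" beyond the scheme separator suggests an embedded redirect.
        "redirection": int(url.rfind("//") > 7),
        "prefix_suffix": int("-" in host),
        "shortener": int(host in SHORTENERS),
    }


if __name__ == "__main__":
    print(extract_address_bar_features("http://192.168.0.1/login/verify//evil.com"))
```

Each URL becomes one row of such values; the domain-based and HTML/JavaScript-based features would need network lookups and page content, so they are left out of this sketch.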
| Purpose | Technology Used |
|---|---|
| Data Collection | PhishTank, UNB Dataset |
| Feature Extraction | Python, Regex, Pandas |
| ML Models | Decision Tree, Random Forest, SVM, XGBoost |
| DL Models | Autoencoder, MLP |
| Model Persistence | Pickle |
| Visualization | Matplotlib, Seaborn |
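Model persistence with Pickle amounts to a save and load round trip. The sketch below assumes nothing about the notebook's code: `ThresholdModel` is a hypothetical stand-in for the trained classifier (used only to keep the example dependency-free), and `save_model`, `load_model`, and the file name `model.pickle` are illustrative names.

```python
import pickle
import tempfile
from pathlib import Path


class ThresholdModel:
    """Hypothetical stand-in for a trained classifier."""

    def __init__(self, threshold: int):
        self.threshold = threshold

    def predict(self, rows):
        # Label a row 1 (phishing) when the sum of its feature values reaches the threshold.
        return [int(sum(row) >= self.threshold) for row in rows]


def save_model(model, path: Path) -> None:
    """Serialize the model object to disk."""
    with path.open("wb") as fh:
        pickle.dump(model, fh)


def load_model(path: Path):
    """Restore a model object from disk."""
    # Only unpickle files you created yourself: unpickling can execute arbitrary code.
    with path.open("rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model_path = Path(tmp) / "model.pickle"  # hypothetical file name
        save_model(ThresholdModel(threshold=3), model_path)
        restored = load_model(model_path)
        print(restored.predict([[1, 1, 1, 0], [0, 1, 0, 0]]))
```

In the project, the object passed to `save_model` would presumably be the fitted model from the training notebook. The library that defines the model class must be installed in the environment that loads the file.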
- Clone the Repo

      git clone https://github.com/jasoncobra3/Website_Phishing_Detection.git
      cd Website_Phishing_Detection

- Create Virtual Environment

      python -m venv phishing_env
      # Windows:
      phishing_env\Scripts\activate
      # macOS/Linux:
      source phishing_env/bin/activate

- Install Dependencies

      pip install -r requirements.txt

- Run Jupyter Notebooks

      jupyter notebook

  - Open `URL Feature Extraction.ipynb` to extract features
  - Open `Phishing Website Detection_Models & Training.ipynb` to train and evaluate models
- ✅ XGBoost achieved 86.4% accuracy, outperforming other models
- ✅ Random Forest and SVM performed moderately well
- ✅ Autoencoder and MLP showed promising results for DL integration
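Accuracy, the metric quoted above, is the fraction of test URLs whose predicted label matches the true label. The following sketch shows one way several models could be ranked on it. The helper names `accuracy` and `rank_models` are hypothetical, and the labels and predictions are toy values invented for illustration, not project results.

```python
def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_models(y_true, predictions: dict) -> list:
    """Return (model_name, accuracy) pairs sorted from best to worst."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: 1 = phishing, 0 = legitimate.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    predictions = {
        "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
        "model_b": [1, 1, 0, 0, 0, 0, 1, 0],
    }
    for name, score in rank_models(y_true, predictions):
        print(f"{name}: {score:.3f}")
```

Accuracy alone can mislead when the phishing and legitimate classes are imbalanced, so precision and recall are often reported alongside it.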
- 🌐 Develop a browser extension to detect phishing in real-time
- 🎨 Build a GUI/web app for user-friendly phishing detection
- 🔄 Improve hybrid ML-DL pipeline with larger datasets
- PhishTank Dataset: https://www.phishtank.com/developer_info.php
- UNB Dataset: https://www.unb.ca/cic/datasets/url-2016.html
- UCI Phishing Features: https://archive.ics.uci.edu/ml/datasets/Phishing+Websites
Feel free to fork, star, or submit a pull request to contribute improvements!