A classic machine learning project for text classification, now built with a clean, modern web interface using Python and Flask.
This project implements a Natural Language Processing (NLP) model to solve a common problem: spam filtering. Originally a command-line tool, it has been transformed into an interactive web application where users can enter any message and instantly see whether it's classified as spam or legitimate ("ham").
It uses a Naive Bayes classifier, a simple yet powerful algorithm well-suited for text-based tasks, to learn the patterns that differentiate spam from ham.
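For reference, this is the standard multinomial Naive Bayes decision rule. It is the variant implemented by scikit-learn's `MultinomialNB`, which applies additive (Laplace) smoothing with $\alpha = 1$ by default. Here $n_w$ is how often word $w$ appears in the incoming message, $N_{c,w}$ is how often it appears across the training messages of class $c$, $V$ is the vocabulary learned by the vectorizer, and $P(c)$ is the fraction of training messages labelled $c$:

$$
\hat{y} = \arg\max_{c \in \{\text{spam},\,\text{ham}\}} \Big[ \log P(c) + \sum_{w \in V} n_w \log P(w \mid c) \Big],
\qquad
P(w \mid c) = \frac{N_{c,w} + \alpha}{\sum_{w' \in V} N_{c,w'} + \alpha\,|V|}
$$

Words never seen during training are not in $V$, so they do not affect the score. The smoothing term keeps a word that only ever appeared in ham from forcing the spam probability to exactly zero.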
- Text Vectorization: Converting text messages into numerical features using CountVectorizer
- Naive Bayes Classifier: Implementing and training a probabilistic model for text classification
- Scikit-learn Pipelines: Building a clean, reusable workflow that chains preprocessing and modeling steps
- Web Development with Flask: Creating routes, handling form submissions, and rendering dynamic templates
- Frontend Integration: Using HTML and Tailwind CSS to build a responsive and user-friendly interface
- Full-Stack Connection: Wiring a Python machine learning backend to a web frontend to create a complete application
- Python: Core programming language
- Flask: Web framework for the backend
- Scikit-learn: For the machine learning pipeline and model
- Pandas: For data manipulation
- HTML & Tailwind CSS: For the frontend user interface
- Pytest: For running the automated tests
The core of the model is a scikit-learn Pipeline that automates the workflow. The Flask application loads this trained model on startup and uses it to serve predictions through a web interface.
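```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
# Inside SpamDetector (spam_detector/model.py): raw text -> word counts -> Naive Bayes
self.model = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('classifier', MultinomialNB()),
])
```
The application then works as follows:
- Model Training: When the Flask application starts, it loads the data/spam.csv dataset and trains the Naive Bayes model just once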
- User Input: A user visits the web page and submits a message through an HTML form
- Backend Prediction: The Flask backend receives the message, feeds it to the pre-trained model, and gets a prediction (spam/ham) along with the confidence probabilities
- Display Results: The application re-renders the webpage, dynamically displaying the result, confidence scores, and a clean visual summary
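The flow above can be sketched as a small class. This is a hypothetical reconstruction for illustration only. The method names `train` and `predict`, their return values, and the four toy messages are assumptions, not the project's actual `spam_detector/model.py`:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


class SpamDetector:
    """Hypothetical sketch of the detector described above."""

    def __init__(self):
        # Same two-step pipeline as the project: word counts -> multinomial NB
        self.model = Pipeline([
            ("vectorizer", CountVectorizer()),
            ("classifier", MultinomialNB()),
        ])

    def train(self, messages, labels):
        # Learns the vocabulary and the per-class word probabilities in one call
        self.model.fit(messages, labels)
        return self

    def predict(self, message):
        # Returns the most likely label plus a {label: probability} dict
        probs = self.model.predict_proba([message])[0]
        classes = self.model.classes_
        label = str(classes[probs.argmax()])
        return label, {str(c): float(p) for c, p in zip(classes, probs)}


# Usage with toy data (hypothetical; the real app would train on data/spam.csv)
detector = SpamDetector().train(
    ["WIN a free prize now", "Claim your free cash",
     "See you at lunch", "Are we still meeting today"],
    ["spam", "spam", "ham", "ham"],
)
label, confidence = detector.predict("free prize inside")
print(label, confidence)
```

Returning the probabilities as plain Python floats in a dict keeps NumPy types out of the Flask layer, which makes them straightforward to pass into a template.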
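To run the application locally:
```bash
git clone https://github.com/username/sms-spam-detector.git
cd sms-spam-detector
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate
pip install -r requirements.txt
flask run
```
By default, `flask run` serves the app at http://127.0.0.1:5000.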
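When no `--app` option is given, `flask run` looks for an `app.py` or `wsgi.py` in the current directory. A minimal `app.py` along the lines described earlier might look like the sketch below. The inline template stands in for `templates/index.html` (without the Tailwind styling) only so the example runs on its own. The toy training data replaces `data/spam.csv`, and the route, the form field name `message`, and the template variables are all assumptions:

```python
from flask import Flask, render_template_string, request
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for templates/index.html
PAGE = """
<form method="post">
  <textarea name="message">{{ message }}</textarea>
  <button type="submit">Check</button>
</form>
{% if label %}<p>Prediction: {{ label }} ({{ "%.1f"|format(confidence * 100) }}% confident)</p>{% endif %}
"""

app = Flask(__name__)

# Train once at startup, before any request is served (toy data, not spam.csv)
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(
    ["WIN a free prize now", "Claim your free cash",
     "See you at lunch", "Are we still meeting today"],
    ["spam", "spam", "ham", "ham"],
)


@app.route("/", methods=["GET", "POST"])
def index():
    message, label, confidence = "", None, None
    if request.method == "POST":
        # Reuse the already-trained model for every submission
        message = request.form.get("message", "")
        probs = model.predict_proba([message])[0]
        label = str(model.classes_[probs.argmax()])
        confidence = float(probs.max())
    # Re-render the same page with the result filled in
    return render_template_string(PAGE, message=message, label=label, confidence=confidence)
```

Training at import time means each request only pays for a single `predict_proba` call.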
To verify that the data handling and model components are working correctly, you can run the test suite.
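```bash
pytest
```
You can install all dependencies from the requirements.txt file:
```text
# requirements.txt
Flask==3.0.0
scikit-learn==1.4.2
pandas==2.2.1
numpy==1.26.4
pytest==7.4.0
pytest-cov==4.1.0
```
Python 3.9 or higher is required (the pinned NumPy 1.26, pandas 2.2, and scikit-learn 1.4 releases do not support Python 3.8).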
The project is organized to separate the Flask application logic from the machine learning model code.
```text
sms-spam-detector/
├── app.py                # Main Flask application file
├── requirements.txt      # Project dependencies
├── data/
│   └── spam.csv          # The training dataset
├── spam_detector/
│   ├── __init__.py
│   ├── data_handler.py   # Functions for loading and cleaning data
│   └── model.py          # The SpamDetector class and ML logic
└── templates/
    └── index.html        # The HTML file for the user interface
```
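As an illustration of what `spam_detector/data_handler.py` might contain, here is a hypothetical `load_data` function. The column names `v1`/`v2` and the Latin-1 encoding are assumptions based on the commonly circulated Kaggle export of the SMS Spam Collection `spam.csv`. The project's own file may differ, so the column names and encoding are parameters:

```python
import pandas as pd


def load_data(path, label_col="v1", text_col="v2", encoding="latin-1"):
    """Hypothetical loader: returns (messages, labels) from a spam CSV.

    Defaults assume the Kaggle-style spam.csv layout; adjust as needed.
    """
    df = pd.read_csv(path, encoding=encoding)
    # Keep only the two useful columns (extra empty columns are ignored)
    df = df[[label_col, text_col]].rename(columns={label_col: "label", text_col: "message"})
    # Drop rows missing a label or message, then exact duplicate rows
    df = df.dropna().drop_duplicates()
    # Normalise labels such as "Spam " to "spam"
    df = df.assign(label=df["label"].str.strip().str.lower())
    return df["message"].tolist(), df["label"].tolist()
```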
Note: A tests/ directory containing test_model.py can be included for development and validation.
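Such a `tests/test_model.py` could hold plain pytest-style test functions like the hypothetical ones below. To keep the sketch runnable on its own, it rebuilds the same pipeline inline with toy data. In the real project you would import the model from `spam_detector` instead:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy training data, not taken from data/spam.csv
MESSAGES = ["WIN a free prize now", "Claim your free cash",
            "See you at lunch", "Are we still meeting today"]
LABELS = ["spam", "spam", "ham", "ham"]


def build_model():
    # Same vectorizer -> classifier pipeline the README describes
    model = Pipeline([
        ("vectorizer", CountVectorizer()),
        ("classifier", MultinomialNB()),
    ])
    return model.fit(MESSAGES, LABELS)


def test_predicts_training_examples():
    model = build_model()
    assert list(model.predict(["Claim your free cash", "See you at lunch"])) == ["spam", "ham"]


def test_probabilities_sum_to_one():
    model = build_model()
    probs = model.predict_proba(["free prize inside"])[0]
    assert abs(probs.sum() - 1.0) < 1e-9


def test_unseen_words_do_not_crash():
    model = build_model()
    # Every token is outside the vocabulary, so only the class priors decide
    assert model.predict(["zzz qqq"])[0] in {"ham", "spam"}
```

pytest collects any `test_*` function automatically, so no extra test framework setup is needed.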
- Use TF-IDF Vectorization: Instead of simple word counts, use Term Frequency-Inverse Document Frequency (TF-IDF) for potentially better feature representation
- Try Other Models: Experiment with other classifiers like Logistic Regression or Support Vector Machines (SVM) to compare performance
- Containerize: Package the application with Docker for easier deployment and scalability
- Deploy to the Cloud: Host the application on a service like Heroku, Vercel, or AWS so anyone can access it
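The first two ideas can be evaluated side by side before changing the app. The sketch below uses a hypothetical helper, `compare_models`. It scores the current count-based Naive Bayes pipeline against TF-IDF variants with cross-validation and macro-averaged F1, which weights spam and ham equally instead of letting the more common class dominate as plain accuracy can:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def compare_models(messages, labels, cv=5):
    """Hypothetical helper: mean cross-validated macro-F1 per candidate pipeline."""
    candidates = {
        "count + NB (current)": make_pipeline(CountVectorizer(), MultinomialNB()),
        "tfidf + NB": make_pipeline(TfidfVectorizer(), MultinomialNB()),
        "tfidf + logistic regression": make_pipeline(
            TfidfVectorizer(), LogisticRegression(max_iter=1000)),
        "tfidf + linear SVM": make_pipeline(TfidfVectorizer(), LinearSVC()),
    }
    results = {}
    for name, pipe in candidates.items():
        # Stratified k-fold is used automatically for classifiers
        scores = cross_val_score(pipe, messages, labels, cv=cv, scoring="f1_macro")
        results[name] = float(scores.mean())
    return results
```

Because each candidate is a full pipeline, the vectorizer is refit inside every fold, so no vocabulary information leaks from the validation split into training.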
Evan William
Version 2.0 (2025)
This project was an incredible learning journey that turned a simple command-line script into a complete ML web application! It's been amazing to see how all the pieces come together, from raw text data to a sleek web interface that anyone can use.
If you have any feedback or suggestions, feel free to open an issue or pull request!
This project is for educational purposes. Feel free to fork, modify, or use it for your own learning.