TuringPass is a captcha-solver detection system that determines whether a captcha was solved by a human or a bot. By analyzing behavioral biometrics with deep learning, TuringPass aims to strengthen CAPTCHA security through anomaly detection and behavioral analysis, going beyond just checking whether the solution is correct.
Traditional CAPTCHA systems are no longer reliable. Bots powered by advanced ML models can now solve text-based CAPTCHAs with high accuracy, making correctness alone an insufficient measure of human verification.
A better approach should:
- Validate the solution, and
- Analyze how the user arrives at it.
This project explores both the attack vector (bots solving captchas) and the defense mechanism (detecting automated solvers using behavioral biometrics and anomaly detection).
- Captures user interactions while solving captchas using a Flask-based web interface.
- Uses an LSTM Autoencoder trained on human behavior to flag bot-like actions as anomalies.
- Simulates bot behavior via:
  - Random mouse movements
  - Timed keystrokes
  - OCR-based captcha solving using a CNN model
- Frontend is built using Tailwind CSS for a clean and responsive UI.
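The anomaly check listed above depends on a reconstruction error between a recorded interaction sequence and the autoencoder's attempt to reproduce it. A minimal, model-agnostic sketch of that error is shown here, assuming each sequence is a list of timesteps with the same scaled features per timestep. The function name and the use of mean squared error are assumptions for illustration; the project's notebooks may compute the error differently.

```python
from typing import Sequence


def reconstruction_error(
    original: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
) -> float:
    """Mean squared error over every timestep and feature (assumed metric)."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same number of timesteps")
    total = 0.0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        if len(row_o) != len(row_r):
            raise ValueError("timesteps must have the same number of features")
        for o, r in zip(row_o, row_r):
            total += (o - r) ** 2
            count += 1
    return total / count if count else 0.0


# Illustrative values only: two timesteps, two scaled features each.
sequence = [[0.0, 0.5], [1.0, 0.5]]
good_reconstruction = [[0.0, 0.5], [1.0, 0.5]]
poor_reconstruction = [[0.0, 0.5], [1.0, 1.0]]

low_error = reconstruction_error(sequence, good_reconstruction)
high_error = reconstruction_error(sequence, poor_reconstruction)
```

A model trained only on human sessions should reproduce human-like sequences closely (low error) and bot-like sequences poorly (high error), which is what makes the error usable as an anomaly score.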
TuringPass/
│
├── backend/                          # Backend code
│   ├── app.py                        # Contains endpoints for OCR inference and human/bot autoencoder inference
│   ├── captcha_model.h5              # OCR model
│   ├── lstm_autoencoder_model.h5     # Trained LSTM Autoencoder
│   └── scaler.save
│
├── models/                           # Model training code
│   ├── captcha/                      # CAPTCHA solving module
│   │   ├── README.md
│   │   ├── human_captcha_solve_data.csv
│   │   ├── inference.ipynb
│   │   ├── ocr-model.ipynb
│   │   └── ocr_model.h5
│   │
│   └── human-bot/                    # Human vs Bot anomaly detection module
│       ├── README.md
│       ├── anomaly.ipynb
│       ├── ci_main_cleaned.csv
│       ├── lstm_autoencoder_model.h5
│       ├── preprocessing.ipynb
│       └── scaler.save
│
├── static/
│   └── dataset/                      # CAPTCHA dataset images
│
├── templates/                        # HTML templates for frontend rendering
│
├── app.py                            # Flask file integrating human data collection and bot detection
├── captcha_interactions.csv          # Human interaction data collected
├── captcha_solver_bot.js             # Advanced bot
├── captcha_solver_simple.js          # Script for simple CAPTCHA solving bot
├── demo.mp4                          # Demo video showcasing the project
├── requirements.txt                  # Python dependencies
└── README.md
To build a robust model for detecting non-human behavior, we collected 2,000+ CAPTCHA solving sessions from 20+ real participants, capturing fine-grained behavioral data during each interaction.
We recorded:
- Mouse Coordinates
  Tracked continuously to analyze movement smoothness, path variance, and reaction time.
- Keystroke Events
  Logged every key press/release along with timestamps, including usage of backspace to detect natural corrections.
- Timestamps
  Captured time per character, total session time, and inter-event delays to model decision-making pace.
- Final Typed Response
  Compared with actual CAPTCHA values to validate correctness and analyze error tendencies.
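To make the recorded signals concrete, the sketch below derives a few of the timing and correction features named in the list from one session's keystroke log. The `KeyEvent` record, its field names, and the feature names are hypothetical and chosen for illustration; they are not the schema of `captcha_interactions.csv`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeyEvent:
    """One keystroke record (hypothetical schema, for illustration only)."""
    key: str      # e.g. "a" or "Backspace"
    t_ms: float   # milliseconds since the session started


def summarize_session(events: List[KeyEvent], typed: str, expected: str) -> dict:
    """Derive simple timing and correction features from one session."""
    times = [e.t_ms for e in events]
    # Delays between consecutive key events approximate typing pace.
    delays = [b - a for a, b in zip(times, times[1:])]
    total_ms = times[-1] - times[0] if times else 0.0
    return {
        "total_time_ms": total_ms,
        "mean_delay_ms": sum(delays) / len(delays) if delays else 0.0,
        "time_per_char_ms": total_ms / len(typed) if typed else 0.0,
        # Backspace usage hints at natural human corrections.
        "backspaces": sum(1 for e in events if e.key == "Backspace"),
        "correct": typed == expected,
    }


# Made-up session: the user types "x7q", deletes the "q", then types "g".
session = [
    KeyEvent("x", 0.0),
    KeyEvent("7", 180.0),
    KeyEvent("q", 420.0),
    KeyEvent("Backspace", 700.0),
    KeyEvent("g", 900.0),
]
features = summarize_session(session, typed="x7g", expected="x7g")
```

Mouse coordinates could be summarized in the same way (for example, path length or variance), though the sequence model described here can also consume the raw event stream directly.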
These features were critical for:
- Training an LSTM Autoencoder on genuine human behavior patterns.
- Setting anomaly detection thresholds using reconstruction error distributions.
- Simulating and distinguishing realistic vs bot-like interactions.
With this multi-dimensional dataset, we created a behavioral fingerprint for every session, enabling precise and interpretable detection of automated activity.
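One common way to turn a distribution of reconstruction errors into a decision threshold is to take the mean of the errors on human sessions plus a multiple of their standard deviation. The sketch below illustrates that idea only; the rule, the multiplier `k = 3.0`, and the sample error values are assumptions, not the threshold actually used by this project.

```python
import statistics
from typing import Sequence


def fit_threshold(human_errors: Sequence[float], k: float = 3.0) -> float:
    """Threshold = mean + k * std of errors on human sessions (assumed rule)."""
    mean = statistics.fmean(human_errors)
    std = statistics.pstdev(human_errors)
    return mean + k * std


def is_anomalous(error: float, threshold: float) -> bool:
    """Flag a session whose reconstruction error exceeds the threshold."""
    return error > threshold


# Made-up reconstruction errors from human sessions.
human_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010, 0.012]
threshold = fit_threshold(human_errors)

human_like = is_anomalous(0.011, threshold)   # error within the human range
bot_like = is_anomalous(0.050, threshold)     # error far above the human range
```

A percentile of the human error distribution (for example, the 95th or 99th) is a reasonable alternative when the errors are strongly skewed.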
- Tracks mouse coordinates, keypresses, and timing during captcha solving.
- Trains an LSTM Autoencoder on normal (human) behavior.
- At inference time, calculates the reconstruction error; a high error implies anomalous (bot) behavior.
- Programmatically simulates user actions with:
  - Noisy/randomized mouse movement paths
  - Fixed or abnormal timing for typing
- Uses a CNN-based OCR model to read captcha images and solve them.
- Custom CNN model trained to recognize alphanumeric characters from captchas.
- Dataset used:
Kaggle - Captcha Dataset
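The simulated bot behavior described above (noisy mouse paths and fixed typing intervals) can be sketched in a few lines. This is an illustrative Python approximation, not the logic in `captcha_solver_bot.js` or `captcha_solver_simple.js`; the function names, the straight-line path, and the default jitter and interval values are all assumptions.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def noisy_mouse_path(
    start: Point,
    end: Point,
    steps: int = 20,
    jitter: float = 5.0,
    seed: Optional[int] = None,
) -> List[Point]:
    """Straight line from start to end with uniform jitter on interior points."""
    rng = random.Random(seed)
    (x0, y0), (x1, y1) = start, end
    path = []
    for i in range(steps + 1):
        t = i / steps
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        # Keep the endpoints exact; perturb only the points in between.
        if 0 < i < steps:
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        path.append((x, y))
    return path


def fixed_keystroke_times(
    text: str, interval_ms: float = 100.0, start_ms: float = 0.0
) -> List[Tuple[str, float]]:
    """Assign each character a timestamp at a constant interval."""
    return [(ch, start_ms + i * interval_ms) for i, ch in enumerate(text)]


path = noisy_mouse_path((0, 0), (100, 50), steps=10, seed=1)
keys = fixed_keystroke_times("ab7", interval_ms=100.0)
```

The constant gap between keystrokes is the kind of regularity a model trained only on human sessions would be expected to reconstruct poorly.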
git clone https://github.com/yourusername/TuringPass.git
cd TuringPass
pip install -r requirements.txt
cd app
python app.py
cd backend
python app.py
ML Models Used:
- LSTM Autoencoder: Anomaly detection on behavioral data
- CNN: Captcha OCR (Optical Character Recognition)
This project is licensed under the MIT License.
See the LICENSE file for more details.
- The captcha OCR model is trained using the open-source dataset: Kaggle - Captcha Dataset
- Inspired by real-world security challenges in distinguishing between bots and humans in critical applications.