amri-tah/TuringPass

🧠 TuringPass

TuringPass is a captcha solver detection system that determines whether a captcha is solved by a human or a bot. By analyzing behavioral biometrics and leveraging deep learning, TuringPass aims to enhance CAPTCHA security through anomaly detection and behavioral analysis, going beyond just checking if the solution is correct.


🧩 Problem Statement

Traditional CAPTCHA systems are no longer reliable. Bots powered by advanced ML models can now solve text-based CAPTCHAs with high accuracy, making correctness alone an insufficient measure of human verification.

🔒 A better approach should:

  • Validate the solution, and
  • Analyze how the user arrives at it.

💡 This project explores both the attack vector (bots solving captchas) and the defense mechanism (detecting automated solvers using behavioral biometrics and anomaly detection).


🔍 What It Does

  • Captures user interactions while solving captchas using a Flask-based web interface.
  • Uses an LSTM Autoencoder trained on human behavior to flag bot-like actions as anomalies.
  • Simulates bot behavior via:
    • Random mouse movements
    • Timed keystrokes
    • OCR-based captcha solving using a CNN model
  • Frontend is built using Tailwind CSS for a clean and responsive UI.

🎥 Demo

DemoCS.mp4

🏗️ Project Architecture

TuringPass/
│
├── backend/                  # Backend code
│   ├── app.py            # Contains endpoints for OCR inference and human/bot autoencoder inference
│   ├── captcha_model.h5  # OCR model
│   ├── lstm_autoencoder_model.h5   # Trained LSTM Autoencoder
│   └── scaler.save
│
├── models/                          # Model training code
│   ├── captcha/                    # CAPTCHA solving module
│   │   ├── README.md
│   │   ├── human_captcha_solve_data.csv
│   │   ├── inference.ipynb
│   │   ├── ocr-model.ipynb
│   │   └── ocr_model.h5
│   │
│   ├── human-bot/                  # Human vs Bot anomaly detection module
│   │   ├── README.md
│   │   ├── anomaly.ipynb
│   │   ├── ci_main_cleaned.csv
│   │   ├── lstm_autoencoder_model.h5
│   │   ├── preprocessing.ipynb
│   │   └── scaler.save
│
├── static/
│   └── dataset/                         # CAPTCHA dataset images
│
├── templates/                           # HTML templates for frontend rendering
│
├── app.py                               # Flask file integrating human data collection and bot detection
├── captcha_interactions.csv             # Human interaction data collected
├── captcha_solver_bot.js                # Advanced bot
├── captcha_solver_simple.js             # Script for simple CAPTCHA solving bot
├── demo.mp4                             # Demo video showcasing the project
├── requirements.txt                     # Python dependencies
└── README.md

📊 Data Collection

To build a robust model for detecting non-human behavior, we collected 2,000+ CAPTCHA solving sessions from 20+ real participants, capturing fine-grained behavioral data during each interaction.

We recorded:

  • 🖱 Mouse Coordinates
    Tracked continuously to analyze movement smoothness, path variance, and reaction time.
  • ⌨️ Keystroke Events
    Logged every key press/release along with timestamps, including usage of backspace to detect natural corrections.
  • ⏱ Timestamps
    Captured time per character, total session time, and inter-event delays to model decision-making pace.
  • 🔑 Final Typed Response
    Compared with actual CAPTCHA values to validate correctness and analyze error tendencies.
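
As a rough illustration of how the signals listed above could be condensed into per-session features, here is a minimal Python sketch. The event layout (`mouse` as `(t_ms, x, y)` tuples, `keys` as `(t_ms, key)` tuples) and the function name `session_features` are assumptions made for this example; they are not the schema actually stored in captcha_interactions.csv.

```python
import math


def session_features(mouse, keys, typed, expected):
    """Summarize one CAPTCHA session (hypothetical event layout).

    mouse: list of (t_ms, x, y) pointer samples
    keys:  list of (t_ms, key) key presses
    """
    # Total distance travelled by the pointer across consecutive samples.
    path_length = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(mouse, mouse[1:])
    )
    # Delays between consecutive key presses.
    gaps = [t2 - t1 for (t1, _), (t2, _) in zip(keys, keys[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    # Session duration spans both mouse and keyboard events.
    times = [t for t, *_ in mouse] + [t for t, _ in keys]
    return {
        "path_length": path_length,
        "mean_key_gap_ms": mean_gap,
        "backspaces": sum(1 for _, k in keys if k == "Backspace"),
        "duration_ms": max(times) - min(times) if times else 0,
        "correct": typed == expected,
    }
```

A summary like this is mainly useful for inspection; a sequence model such as the LSTM Autoencoder would more likely consume the raw, scaled event sequence.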

These features were critical for:

  • Training an LSTM Autoencoder on genuine human behavior patterns.
  • Setting anomaly detection thresholds using reconstruction error distributions.
  • Simulating and distinguishing realistic vs bot-like interactions.
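
One common way to derive a threshold from a reconstruction error distribution is to take the mean of the errors on held-out human sessions plus a multiple of their standard deviation. The sketch below assumes that rule; the function name `anomaly_threshold` and the default `k=3.0` are illustrative choices, and the project's notebooks may use a different rule (for example a percentile).

```python
import statistics


def anomaly_threshold(human_errors, k=3.0):
    """Return mean + k * std of reconstruction errors from human sessions.

    human_errors: reconstruction errors measured on human-only validation data.
    k: how many standard deviations above the mean still counts as human
       (assumed value; tune it on validation data).
    """
    mean = statistics.fmean(human_errors)
    std = statistics.pstdev(human_errors)
    return mean + k * std
```

A larger `k` lowers the false-positive rate for humans at the cost of letting more bot sessions through, so the value is a trade-off rather than a fixed constant.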

With this multi-dimensional dataset, we created a behavioral fingerprint for every session, enabling precise and interpretable detection of automated activity.

💡 How It Works

👀 Human Behavior Detection

  • Tracks mouse coordinates, keypresses, and timing during captcha solving.
  • Trains an LSTM Autoencoder on normal (human) behavior.
  • At inference time, calculates the reconstruction error; a high error implies anomalous (bot) behavior.
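
The inference step can be sketched as follows. This is a minimal, framework-free illustration: `reconstruct` is a hypothetical stand-in for the trained autoencoder (in practice it would wrap a call to the loaded model), and mean squared error is assumed as the error metric.

```python
def reconstruction_error(sequence, reconstruct):
    """Mean squared error between a sequence and its reconstruction.

    sequence:    list of timesteps, each a list of scaled feature values
    reconstruct: callable standing in for the trained autoencoder
                 (hypothetical; not the project's actual API)
    """
    rebuilt = reconstruct(sequence)
    squared = [
        (a - b) ** 2
        for step, step_hat in zip(sequence, rebuilt)
        for a, b in zip(step, step_hat)
    ]
    return sum(squared) / len(squared)


def classify(sequence, reconstruct, threshold):
    """Label a session 'bot' when its error exceeds the threshold."""
    err = reconstruction_error(sequence, reconstruct)
    return ("bot" if err > threshold else "human"), err
```

Because the autoencoder only ever sees human sessions during training, it reconstructs human-like sequences well and unfamiliar (bot-like) sequences poorly, which is what makes the error usable as an anomaly score.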

🤖 Bot Simulation

  • Programmatically simulates user actions with:
    • Noisy/randomized mouse movement paths
    • Fixed or abnormal timing for typing
  • Uses a CNN-based OCR model to read captcha images and solve them.
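
To make the simulated behavior concrete, here is a small Python sketch of a jittered mouse path and fixed-interval keystrokes. The repository's bot scripts are JavaScript (captcha_solver_bot.js, captcha_solver_simple.js), so this is not their code; the function names, the straight-line path, and the default parameters are all assumptions for illustration.

```python
import random


def noisy_mouse_path(start, end, steps=20, jitter=3.0, rng=None):
    """Straight line from start to end with uniform jitter on every point."""
    rng = rng or random.Random()
    (x0, y0), (x1, y1) = start, end
    path = []
    for i in range(steps + 1):
        t = i / steps  # fraction of the way from start to end
        path.append((
            x0 + (x1 - x0) * t + rng.uniform(-jitter, jitter),
            y0 + (y1 - y0) * t + rng.uniform(-jitter, jitter),
        ))
    return path


def fixed_keystrokes(text, start_ms=0, interval_ms=120):
    """Key events typed at a perfectly regular pace."""
    return [(start_ms + i * interval_ms, ch) for i, ch in enumerate(text)]
```

Uniform jitter around a straight line and perfectly regular key intervals are exactly the kinds of patterns that differ from human traces, which tend to show curved paths, pauses, and corrections.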

🧠 OCR Captcha Solver


🚀 Getting Started

1. Clone the Repository

git clone https://github.com/amri-tah/TuringPass.git
cd TuringPass

2. Install Dependencies

pip install -r requirements.txt

3. Run the Flask App

python app.py

4. Run the Bot Simulator (Optional)

cd backend
python app.py

📦 Tech Stack

ML Models Used:

  • 🧠 LSTM Autoencoder: anomaly detection on behavioral data
  • 🔎 CNN: captcha OCR (Optical Character Recognition)

📄 License

This project is licensed under the MIT License.
See the LICENSE file for more details.


🙌 Acknowledgements

  • The captcha OCR model is trained using the open-source dataset: Kaggle - Captcha Dataset
  • Inspired by real-world security challenges in distinguishing between bots and humans in critical applications.
