This project is an automated detection system designed to find exposed cryptocurrency private keys and seed phrases in public GitHub repositories in real time. It uses an efficient, multi-stage pipeline that combines fast local analysis with Large Language Model (LLM) verification to deliver both high speed and high accuracy.
The scanner operates on a fully parallel, multi-stage pipeline designed for maximum efficiency and responsiveness:
```mermaid
graph TD
    A[GitHub Events API] --> B{New Push Event?}
    B -->|Yes| C[Fetch Commit Files in Parallel]
    C --> D[Local Analyzer: Regex & BIP-39 Scan]
    D --> E{Potential Leaks Found?}
    E -->|Yes| F[Add to LLM Analysis Queue]
    E -->|No| G[End]
    F --> H[LLM Worker Threads]
    H --> I{Real Key Verified?}
    I -->|Yes| J[Log Detection & Save File]
    I -->|No| K[Discard]
```
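In concrete terms, the hand-off between the fast local stage and the LLM stage is a producer/consumer pattern. The sketch below is a minimal, self-contained illustration of that shape only; the function names, the single regex pattern, and the stand-in "LLM" verdict are hypothetical and do not come from `scanner/main.py`.

```python
# Minimal producer/consumer sketch of the pipeline shape shown in the diagram.
# Everything here (names, pattern, the fake verdict) is illustrative only.
import queue
import re
import threading

llm_queue: queue.Queue = queue.Queue()

# One example pattern; the real local analyzer uses a whole library of them.
HEX_PRIVKEY = re.compile(r"\b[0-9a-fA-F]{64}\b")

def local_stage(path: str, content: str) -> None:
    """Producer: cheap regex scan; enqueue anything that looks like a key."""
    for match in HEX_PRIVKEY.finditer(content):
        llm_queue.put({"path": path, "candidate": match.group(0)})

def fake_llm_verdict(candidate: str) -> bool:
    """Stand-in for the real LLM call: accept only reasonably varied strings."""
    return len(set(candidate.lower())) > 8

def llm_worker() -> None:
    """Consumer: worker threads drain the queue and verify each candidate."""
    while True:
        item = llm_queue.get()
        try:
            if fake_llm_verdict(item["candidate"]):
                print(f"[DETECTED] possible key in {item['path']}")
        finally:
            llm_queue.task_done()

# A small worker pool, then one fake file pushed through the pipeline.
for _ in range(2):
    threading.Thread(target=llm_worker, daemon=True).start()

local_stage("example.env", "PRIVATE_KEY=" + "0123456789abcdef" * 4)
llm_queue.join()
```

The real scanner adds parallel file downloads in front of this and batches candidates before they reach the LLM, but the queue-plus-worker-threads shape is the one shown in the diagram.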
- Real-Time Monitoring: Scans new commits from the public GitHub Events API as they happen.
- Intelligent Multi-Stage Analysis (a sketch of the local stage follows this list):
  - A fast, parallelized download of all relevant files in a commit.
  - A highly efficient local analysis using a library of specific regex patterns and an intelligent, multi-line BIP-39 seed phrase detector.
  - A parallelized batch analysis of all potential leaks by a powerful LLM for final verification.
- Flexible LLM Backend: Supports both local LLMs via Ollama (recommended) and direct model loading via the Hugging Face `transformers` library.
- Configurable: Easily switch between different LLMs, scanning modes, and providers through a simple and well-documented configuration file.
- Interactive: Allows you to skip repositories on the fly with a simple command (`s` + Enter) and limits the number of files scanned per repository to avoid getting bogged down.
- Actionable Output: Provides clear, color-coded logs and saves the full content of any detected leak to a local directory with a detailed metadata header for easy review.
- Comprehensive Test Suite: Includes an accuracy test suite to verify and benchmark the performance of the LLM and the detection logic.
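To make the local BIP-39 stage concrete, here is a minimal sketch of multi-line seed phrase detection. It assumes the standard 2048-word English wordlist is available locally as `english.txt` (one word per line); the function name, file path, and tuning are illustrative and not taken from `scanner/analyzer.py`.

```python
# Illustrative multi-line BIP-39 seed phrase detector.
# Assumes the standard English wordlist (2048 words) is saved as english.txt.
import re

with open("english.txt", encoding="utf-8") as fh:
    BIP39_WORDS = {word.strip() for word in fh if word.strip()}

VALID_LENGTHS = {12, 15, 18, 21, 24}  # mnemonic lengths defined by BIP-39

def find_seed_phrases(text: str) -> list[str]:
    """Return runs of consecutive wordlist words whose length is a valid mnemonic size."""
    tokens = re.findall(r"[a-z]+", text.lower())  # tolerant of line breaks and punctuation
    phrases, run = [], []
    for token in tokens + [""]:                   # empty sentinel flushes the final run
        if token in BIP39_WORDS:
            run.append(token)
        else:
            if len(run) in VALID_LENGTHS:
                phrases.append(" ".join(run))
            run = []
    return phrases
```

Tokenizing the whole file, rather than scanning line by line, is what makes the detector multi-line: a mnemonic split across several lines or wrapped in quotes still shows up as one run of wordlist words.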
```
.
├── detected_leaks/             # Saved files with detected secrets
├── scanner/
│   ├── logs/
│   │   └── detections.log      # Log file for all verified leaks
│   ├── __init__.py
│   ├── analyzer.py             # Local analysis (regex & BIP-39)
│   ├── config.py               # Your local (ignored) configuration
│   ├── llm_analyzer.py         # LLM-based verification
│   └── main.py                 # Main script to monitor GitHub
├── scripts/
│   ├── clear_logs.sh           # Utility to clear logs and leaks
│   └── proxy_tester/           # Advanced proxy testing utility
├── tests/
│   ├── samples/                # Sample files for accuracy testing
│   └── test_accuracy.py        # Script to test LLM accuracy
├── .gitignore
├── README.md
├── requirements.txt
└── scanner/config.example.py   # Template for configuration
```
- Clone the repository.

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Configure the scanner:
  - Create your local configuration file:

    ```bash
    cp scanner/config.example.py scanner/config.py
    ```

  - Edit `scanner/config.py` and add your GitHub Personal Access Token.
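  As a rough illustration of what the edited file might contain (the variable names below are guesses, except for `LLM_PROVIDER`, which the next step references; check `scanner/config.example.py` for the actual settings):

  ```python
  # scanner/config.py (illustrative only; see config.example.py for the real names)
  GITHUB_TOKEN = "ghp_xxxxxxxxxxxxxxxxxxxx"    # assumed name for the Personal Access Token

  LLM_PROVIDER = "ollama"                      # configured in the next step
  LLM_MODEL = "mistral:7b-instruct-v0.2-q4_1"  # assumed name for the model setting
  ```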
- Set up your LLM provider (Ollama recommended):
  - Install Ollama from ollama.com.
  - Download your chosen model (e.g., `mistral:7b-instruct-v0.2-q4_1`):

    ```bash
    ollama run mistral:7b-instruct-v0.2-q4_1
    ```

  - Ensure the `LLM_PROVIDER` in your `config.py` is set to `"ollama"` and the `model` is correct.
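  With Ollama running, LLM verification is essentially one HTTP call to the local server. The sketch below illustrates the idea using Ollama's standard `/api/generate` endpoint; the prompt wording and function name are made up here and are not the actual code in `scanner/llm_analyzer.py`.

  ```python
  # Rough sketch of verifying one candidate snippet against a local Ollama server.
  # Uses Ollama's standard /api/generate REST endpoint; prompt text is illustrative.
  import json
  import urllib.request

  OLLAMA_URL = "http://localhost:11434/api/generate"
  MODEL = "mistral:7b-instruct-v0.2-q4_1"

  def verify_with_ollama(snippet: str) -> str:
      prompt = (
          "Does the following text contain a real cryptocurrency private key or "
          "BIP-39 seed phrase, rather than a placeholder, example, or test value? "
          "Answer YES or NO.\n\n" + snippet
      )
      payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
      request = urllib.request.Request(
          OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
      )
      with urllib.request.urlopen(request) as response:
          return json.loads(response.read())["response"].strip()

  # Example (requires Ollama running locally with the model pulled):
  # print(verify_with_ollama("PRIVATE_KEY=..."))
  ```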
- Run the scanner (use `--verbose` for detailed logging):

  ```bash
  python3.9 -m scanner.main --verbose
  ```
- Run the accuracy test to verify your setup:

  ```bash
  python3.9 tests/test_accuracy.py
  ```