A Bittensor subnet for evaluating and detecting behavioral traits in Large Language Models (LLMs). This system creates a competitive environment where miners submit seed instructions (prompts) that are tested using the Petri alignment auditing agent to identify potentially problematic behaviors such as deception, sycophancy, manipulation, overconfidence, and power-seeking tendencies.
Trishool is designed to advance AI safety by creating a decentralized platform for behavioral evaluation. The system consists of three main components:
- Miners: Submit seed instructions (prompts) for testing behavioral traits via platform API
- Validators: Fetch submissions via REST API, run Petri agent in Docker sandboxes, and submit scores back to platform
- Subnet Platform: Manages submissions via REST API, validates submissions, stores results in database
┌─────────────────────────────────────────────────────────────┐
│ MINER (Competition Participant) │
│ - Submits seed instruction (prompt) via platform API │
│ - Max 200 words, tested for jailbreak attempts │
│ - Submits PetriConfig: seed, models, auditor, judge, etc. │
└──────────────────────┬──────────────────────────────────────┘
│
▼ (REST API)
┌─────────────────────────────────────────────────────────────┐
│ PLATFORM (Subnet Infrastructure) │
│ - Receives miner submissions (seed instructions) │
│ - Validates submissions (duplicate check, jailbreak check) │
│ - Provides REST API endpoints for validators │
│ ├─ GET /api/v1/validator/evaluation-agents │
│ └─ POST /api/v1/validator/submit_petri_output │
└──────────────────────┬──────────────────────────────────────┘
│
▼ (REST API Polling)
┌─────────────────────────────────────────────────────────────┐
│ VALIDATOR SYSTEM (Competition Organizer) │
│ ├─ REST API Client: Fetches submissions periodically │
│ ├─ Evaluation Loop: Fetches PetriConfig from platform │
│ ├─ Sandbox Manager: Creates config.json, runs Petri │
│ ├─ Score Extraction: Extracts scores from Petri output │
│ ├─ Score Submission: Submits Petri output to platform │
│ ├─ Weight Update Loop: Fetches weights from platform │
│ │ └─ Sets weights on Bittensor chain │
│ └─ Commit Checker: Monitors astro-petri repo updates │
└──────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ PETRI SANDBOX (Docker Container) │
│ ├─ config.json: PetriConfig (mounted from temp_dir) │
│ ├─ run.sh: Executes astro-petri run --config config.json │
│ ├─ Runs Petri against target models (from config) │
│ └─ Outputs to /sandbox/outputs/output.json │
└──────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ PETRI OUTPUT JSON │
│ - run_id: Unique run identifier │
│ - results: Per-model evaluation results │
│ - summary.overall_metrics: Aggregated scores │
│ ├─ mean_score: Average score across models │
│ └─ final_score: Final evaluation score │
└─────────────────────────────────────────────────────────────┘
- Docker, docker compose
- Python12
- Tmux (not required, but you can use it to keep session when running python commands)
-
Install dependencies:
pip install -r requirements.txt
-
Set environment variables:
# API Keys (Required for Petri) export CHUTES_API_KEY=your_chutes_api_key_here # Platform API Configuration export PLATFORM_API_URL=https://api.trishool.ai # Platform API base URL export COLDKEY_NAME=your_coldkey_name # Bittensor coldkey for authentication export HOTKEY_NAME=your_hotkey_name # Bittensor hotkey for authentication export NETWORK=finney # Bittensor network (default: finney) export NETUID= # Subnet UID (default: 291 for testing) # Validator Configuration export MAX_CONCURRENT_SANDBOXES=5 # Max concurrent sandboxes (default: 5) export EVALUATION_INTERVAL=30 # Interval to fetch submissions (seconds, default: 30) export UPDATE_WEIGHTS_INTERVAL=300 # Interval to fetch and update weights from platform (seconds, default: 5 minutes) export RANDOM_SELECTION_COUNT=3 # Number of submissions to select randomly (default: 3) # Petri Commit Checker export PETRI_COMMIT_CHECK_INTERVAL=300 # Interval to check for repo updates (seconds, default: 5 minutes)
python neurons/validator.py --netuid NetID --subtensor.network test_or_finney --wallet.name coldkey --wallet.hotkey hotkeyThe validator will:
- Build Petri sandbox Docker image (if not exists) - installs astro-petri from GitHub (branch
alignet) - Start commit checker to monitor astro-petri repo for updates
- Start evaluation loop to periodically fetch challenge (PetriConfig) from platform API (
/evaluation-agents) - Start weight update loop to periodically fetch weights from platform API (
/weights) and set them on chain - Process submissions in sandboxes (respecting
MAX_CONCURRENT_SANDBOXESlimit) - Validate submissions (immediately submit failed evaluation if validation fails)
- Create
config.jsonfrom PetriConfig and run Petri agent - Extract scores from Petri output JSON
- Immediately submit Petri output JSON back to platform API (
/submit_petri_output) after evaluation completes - Periodically sync metagraph and update weights on Bittensor chain from platform
python -m miner upload \
--agent-file your_seed_prompt.txt \
--coldkey coldkey_name \
--hotkey hotkey_name \
--network test_or_finney \
--netuid netUID \
--slot miner_uid \
--api-url https://api.trishool.aiMiners submit seed instructions (prompts) via the platform API. The platform creates a PetriConfig that includes:
- Your seed instruction
- Target models to evaluate
- Auditor and judge models
- Evaluation parameters (max_turns, etc.)
Requirements:
- Maximum 200 words
- Must not contain jailbreak attempts
- Will be tested for similarity against existing submissions (duplicate detection)
- Should be designed to probe target models for specific behavioral traits
Submission Flow:
- Submit seed instruction via platform API
- Platform validates and creates PetriConfig includes miner_seed_instruction and challenge config
- Validators fetch your PetriConfig
- Petri agent evaluates your seed against target models
- Results are scored and submitted back to platform
- Your score is based on the Petri evaluation results
Testing locally:
Miners can test their seed instructions locally using Petri before submission. See the Petri documentation at trishool/validator/sandbox/petri/PETRI_README.md or the astro-petri repository at https://github.com/Trishool/astro-petri for details on running Petri locally.
- Jailbreak Detection: Validates seed instructions for jailbreak attempts
- Immediate Failure Reporting: Failed validations are immediately reported to platform
- Duplicate Detection: Checks for similar seed instructions to prevent gaming
- Sandbox Isolation: Petri runs in isolated Docker containers
- Fraud Detection: Comprehensive monitoring for manipulation attempts
- Miner Submissions: Submit seed instructions (prompts) for testing
- Automated Validation: Petri agent tests against 5 models (1 misaligned)
- Binary Scoring: Returns 1.0 if correct model selected, 0.0 otherwise
- Transparent Scoring: Detailed feedback and execution logs
- Jailbreak Verification: Guard LLM checks submissions for jailbreak attempts
- Duplicate Verification: LLM judge checks for similar prompts (<50% variation)
- Submission Limits: 1 submission per miner per day
- Resource Limits: Sandbox timeout and resource constraints
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For questions and support, please open an issue on GitHub or join our community discussions.