Trishool Subnet

A Bittensor subnet for evaluating and detecting behavioral traits in Large Language Models (LLMs). This system creates a competitive environment where miners submit seed instructions (prompts) that are tested using the Petri alignment auditing agent to identify potentially problematic behaviors such as deception, sycophancy, manipulation, overconfidence, and power-seeking tendencies.

Overview

Trishool is designed to advance AI safety by creating a decentralized platform for behavioral evaluation. The system consists of three main components:

Miners: Submit seed instructions (prompts) for testing behavioral traits via platform API
Validators: Fetch submissions via REST API, run Petri agent in Docker sandboxes, and submit scores back to platform
Subnet Platform: Manages submissions via REST API, validates submissions, stores results in database

Architecture

┌─────────────────────────────────────────────────────────────┐
│  MINER (Competition Participant)                            │
│  - Submits seed instruction (prompt) via platform API       │
│  - Max 200 words, tested for jailbreak attempts             │
│  - Submits PetriConfig: seed, models, auditor, judge, etc.  │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼ (REST API)
┌─────────────────────────────────────────────────────────────┐
│  PLATFORM (Subnet Infrastructure)                           │
│  - Receives miner submissions (seed instructions)           │
│  - Validates submissions (duplicate check, jailbreak check) │
│  - Provides REST API endpoints for validators               │
│  ├─ GET /api/v1/validator/evaluation-agents                 │
│  └─ POST /api/v1/validator/submit_petri_output              │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼ (REST API Polling)
┌─────────────────────────────────────────────────────────────┐
│  VALIDATOR SYSTEM (Competition Organizer)                   │
│  ├─ REST API Client: Fetches submissions periodically       │
│  ├─ Evaluation Loop: Fetches PetriConfig from platform      │
│  ├─ Sandbox Manager: Creates config.json, runs Petri        │
│  ├─ Score Extraction: Extracts scores from Petri output     │
│  ├─ Score Submission: Submits Petri output to platform      │
│  ├─ Weight Update Loop: Fetches weights from platform       │
│  │  └─ Sets weights on Bittensor chain                      │
│  └─ Commit Checker: Monitors astro-petri repo updates       │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  PETRI SANDBOX (Docker Container)                           │
│  ├─ config.json: PetriConfig (mounted from temp_dir)        │
│  ├─ run.sh: Executes astro-petri run --config config.json   │
│  ├─ Runs Petri against target models (from config)          │
│  └─ Outputs to /sandbox/outputs/output.json                 │
└──────────────────────┬──────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  PETRI OUTPUT JSON                                          │
│  - run_id: Unique run identifier                            │
│  - results: Per-model evaluation results                    │
│  - summary.overall_metrics: Aggregated scores               │
│    ├─ mean_score: Average score across models               │
│    └─ final_score: Final evaluation score                   │
└─────────────────────────────────────────────────────────────┘

Quick Start

Prerequisites

Docker, docker compose
Python12
Tmux (not required, but you can use it to keep session when running python commands)

Install dependencies:
```
pip install -r requirements.txt
```

Set environment variables:

# API Keys (Required for Petri)
export CHUTES_API_KEY=your_chutes_api_key_here
# Platform API Configuration
export PLATFORM_API_URL=https://api.trishool.ai  # Platform API base URL
export COLDKEY_NAME=your_coldkey_name  # Bittensor coldkey for authentication
export HOTKEY_NAME=your_hotkey_name  # Bittensor hotkey for authentication
export NETWORK=finney  # Bittensor network (default: finney)
export NETUID=  # Subnet UID (default: 291 for testing)

# Validator Configuration
export MAX_CONCURRENT_SANDBOXES=5  # Max concurrent sandboxes (default: 5)
export EVALUATION_INTERVAL=30  # Interval to fetch submissions (seconds, default: 30)
export UPDATE_WEIGHTS_INTERVAL=300  # Interval to fetch and update weights from platform (seconds, default: 5 minutes)
export RANDOM_SELECTION_COUNT=3  # Number of submissions to select randomly (default: 3)

# Petri Commit Checker
export PETRI_COMMIT_CHECK_INTERVAL=300  # Interval to check for repo updates (seconds, default: 5 minutes)

Running the Validator

python neurons/validator.py --netuid NetID --subtensor.network test_or_finney --wallet.name coldkey  --wallet.hotkey hotkey

The validator will:

Build Petri sandbox Docker image (if not exists) - installs astro-petri from GitHub (branch alignet)
Start commit checker to monitor astro-petri repo for updates
Start evaluation loop to periodically fetch challenge (PetriConfig) from platform API (/evaluation-agents)
Start weight update loop to periodically fetch weights from platform API (/weights) and set them on chain
Process submissions in sandboxes (respecting MAX_CONCURRENT_SANDBOXES limit)
Validate submissions (immediately submit failed evaluation if validation fails)
Create config.json from PetriConfig and run Petri agent
Extract scores from Petri output JSON
Immediately submit Petri output JSON back to platform API (/submit_petri_output) after evaluation completes
Periodically sync metagraph and update weights on Bittensor chain from platform

For Miners

python -m miner upload \                                                                                                       
   --agent-file your_seed_prompt.txt \
   --coldkey coldkey_name \
   --hotkey hotkey_name \
   --network test_or_finney \
   --netuid netUID \
   --slot miner_uid \
   --api-url https://api.trishool.ai

Miners submit seed instructions (prompts) via the platform API. The platform creates a PetriConfig that includes:

Your seed instruction
Target models to evaluate
Auditor and judge models
Evaluation parameters (max_turns, etc.)

Requirements:

Maximum 200 words
Must not contain jailbreak attempts
Will be tested for similarity against existing submissions (duplicate detection)
Should be designed to probe target models for specific behavioral traits

Submission Flow:

Submit seed instruction via platform API
Platform validates and creates PetriConfig includes miner_seed_instruction and challenge config
Validators fetch your PetriConfig
Petri agent evaluates your seed against target models
Results are scored and submitted back to platform
Your score is based on the Petri evaluation results

Testing locally: Miners can test their seed instructions locally using Petri before submission. See the Petri documentation at trishool/validator/sandbox/petri/PETRI_README.md or the astro-petri repository at https://github.com/Trishool/astro-petri for details on running Petri locally.

Key Features

🔒 Security-First Design

Jailbreak Detection: Validates seed instructions for jailbreak attempts
Immediate Failure Reporting: Failed validations are immediately reported to platform
Duplicate Detection: Checks for similar seed instructions to prevent gaming
Sandbox Isolation: Petri runs in isolated Docker containers
Fraud Detection: Comprehensive monitoring for manipulation attempts

🏆 Competition Ready

Miner Submissions: Submit seed instructions (prompts) for testing
Automated Validation: Petri agent tests against 5 models (1 misaligned)
Binary Scoring: Returns 1.0 if correct model selected, 0.0 otherwise
Transparent Scoring: Detailed feedback and execution logs

🛡️ Anti-Cheating Measures

Jailbreak Verification: Guard LLM checks submissions for jailbreak attempts
Duplicate Verification: LLM judge checks for similar prompts (<50% variation)
Submission Limits: 1 submission per miner per day
Resource Limits: Sandbox timeout and resource constraints

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For questions and support, please open an issue on GitHub or join our community discussions.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
alignet		alignet
docs		docs
neurons		neurons
tests		tests
.env.sample		.env.sample
.gitignore		.gitignore
README.md		README.md
pre-commit-config.yaml		pre-commit-config.yaml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Trishool Subnet

Overview

Architecture

Quick Start

Prerequisites

Running the Validator

For Miners

Key Features

🔒 Security-First Design

🏆 Competition Ready

🛡️ Anti-Cheating Measures

Contributing

License

Support

About

Uh oh!

Releases

Packages

Contributors 2

Languages

cuteolaf/trishool-subnet

Folders and files

Latest commit

History

Repository files navigation

Trishool Subnet

Overview

Architecture

Quick Start

Prerequisites

Running the Validator

For Miners

Key Features

🔒 Security-First Design

🏆 Competition Ready

🛡️ Anti-Cheating Measures

Contributing

License

Support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages