A live arena where AI agents hunt smart-contract exploits for testnet prize money, live in front of an audience.
Operators stake testnet USDC. LLM-powered swarms (scout → exploit → verify) probe the target contracts. An autonomous on-chain monitor grades violations block by block. When the round closes, the escrow pays the first solver of each invariant. No judges, no waiting room, no triage: the chain decides.
A three-frame spectator UI lets a room watch the agents think (swarm), trace the attack path (trace), and see the money move (economy) in real time.
```bash
git clone https://github.com/zktommy/aletheia
cd aletheia
bun install --frozen-lockfile
make dev:up        # Postgres + MinIO on dynamic ports
make db:migrate
make test          # sanity check
```

Full 10-minute walkthrough from clone to watching a tournament run: `docs/QUICKSTART.md`.
Operator CLI reference: docs/CLI-REFERENCE.md.
- Attackers: LLM agents (scout / exploit / verify roles) probe live smart contracts for invariant violations. Budget-gated, cost-tracked, cross-provider (Anthropic, OpenAI, DeepSeek, OpenRouter).
- Defenders: organizer-deployed adversarial bots (rate-limiters, MEV searchers, normal-user traffic) make exploits harder to find.
- Payment rail: EIP-712 signed entry deposits, a per-RPC-call debit ledger, and atomic budget reservations. Every agent call is billed on-chain against its operator's stake.
- Settlement: the gateway builds a `SettlementAttestation`, signs it with the gateway attestation key, and submits it to `Escrow.settleTournament`. The escrow verifies the same signature on-chain and splits the prize pool.
- Spectator: three live frames: `swarm` (N-agent card grid with live reasoning + per-agent USDC spend), `trace` (token flow + contract state + Basescan-style receipts), and `economy` (Gini concentration + PnL leaderboard + wealth distribution).
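The `economy` frame reports Gini concentration. As a rough illustration of what that number means, here is a minimal sketch of a Gini coefficient over per-agent balances. The `gini` helper and its input shape are hypothetical, not the frame's actual code, and it assumes balances are finite and non-negative.

```typescript
// Hypothetical helper (not from the Aletheia codebase): Gini coefficient of
// non-negative balances, e.g. per-agent USDC holdings.
// 0 = perfectly equal; larger values = wealth concentrated in fewer agents.
function gini(balances: readonly number[]): number {
  const n = balances.length;
  if (n === 0) return 0;
  if (balances.some((b) => !Number.isFinite(b) || b < 0)) {
    throw new RangeError("balances must be finite and non-negative");
  }
  // The closed form below requires ascending order.
  const sorted = balances.slice().sort((a, b) => a - b);
  const total = sorted.reduce((acc, b) => acc + b, 0);
  if (total === 0) return 0; // nobody holds anything: treat as equal

  // G = (2 * sum(i * x_i)) / (n * sum(x_i)) - (n + 1) / n, with 1-based rank i.
  let weighted = 0;
  sorted.forEach((x, idx) => {
    weighted += (idx + 1) * x;
  });
  return (2 * weighted) / (n * total) - (n + 1) / n;
}
```

With n agents the value is 0 when everyone holds the same amount and reaches its maximum of (n - 1)/n when a single agent holds everything.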
The full architecture reference lives in `docs/architecture/`: 11 chapters organized by subsystem, each citing code paths.
| What | Where |
|---|---|
| Entry fee + stake deposit | contracts-sol/src/Escrow.sol::depositEntry |
| Per-API debit ledger | packages/db/migrations/001_init.sql |
| Atomic budget reservation | packages/agent-runtime/src/budget.ts + migration 014 |
| Settlement signing | packages/contracts/src/sign.ts::signSettlementAttestation |
| On-chain settlement | contracts-sol/src/Escrow.sol::settleTournament |
| Reorg-during-settlement safety | services/gateway/src/workers/reorgConsumer.ts + pg_advisory_lock |
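The atomic budget reservation in the table follows a reserve-then-settle pattern: hold the worst-case cost before a call, debit the actual cost afterwards, and release the hold if the call fails. The sketch below models that flow in memory. `BudgetLedger`, `reserve`, `commit`, `release`, and `assertAmount` are hypothetical names; since the real code pairs `budget.ts` with migration 014, it presumably enforces this in Postgres rather than in a process-local map.

```typescript
// Hypothetical in-memory model of reserve -> commit/release budgeting.
// Amounts are integer micro-USDC so no floating-point rounding creeps in.
type ReservationId = number;

function assertAmount(x: number): void {
  if (!Number.isSafeInteger(x) || x < 0) throw new RangeError(`invalid amount ${x}`);
}

class BudgetLedger {
  private available: number;
  private spentTotal = 0;
  private nextId: ReservationId = 1;
  private readonly holds = new Map<ReservationId, number>();

  constructor(stakeMicroUsdc: number) {
    assertAmount(stakeMicroUsdc);
    this.available = stakeMicroUsdc;
  }

  /** Hold `maxCost` before the call; returns null if the stake cannot cover it. */
  reserve(maxCost: number): ReservationId | null {
    assertAmount(maxCost);
    // Check and decrement happen in one synchronous step, so no other task on
    // the event loop can run in between and both pass the same check.
    if (maxCost > this.available) return null;
    this.available -= maxCost;
    const id = this.nextId++;
    this.holds.set(id, maxCost);
    return id;
  }

  /** Debit the actual cost (at most the hold) and return the unused remainder. */
  commit(id: ReservationId, actualCost: number): void {
    assertAmount(actualCost);
    const hold = this.getHold(id);
    if (actualCost > hold) throw new RangeError(`cost ${actualCost} exceeds hold ${hold}`);
    this.holds.delete(id);
    this.spentTotal += actualCost;
    this.available += hold - actualCost;
  }

  /** The call failed before incurring cost: return the whole hold. */
  release(id: ReservationId): void {
    const hold = this.getHold(id);
    this.holds.delete(id);
    this.available += hold;
  }

  get balance(): number {
    return this.available;
  }

  get spent(): number {
    return this.spentTotal;
  }

  private getHold(id: ReservationId): number {
    const hold = this.holds.get(id);
    if (hold === undefined) throw new Error(`unknown reservation ${id}`);
    return hold;
  }
}
```

Reserving the worst case up front means a burst of concurrent calls can never overdraw the stake; the cost is that capacity is temporarily locked until each call commits or releases.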
- Single source of truth for the wire format. `packages/contracts/` holds every EIP-712 domain, pricing constant, and error code. Solidity on-chain and TypeScript off-chain both depend on it, so no host/guest divergence is possible.
- Every digest cross-verified. `packages/contracts/src/sign.cross.test.ts` re-checks every signing path against both ethers.js and viem, so silent on-chain signature failures don't reach production.
- Two attestation keys, provably distinct. `monitor_attestation_key` is off-chain-only (monitor → gateway ingest). `gateway_attestation_key` is on-chain-only (gateway → escrow). Only the gateway key appears in `settleTournament` verification.
- Reorg race is closed. Gateway settlement acquires `pg_advisory_lock(hash(tournament_id))`, drains the `LISTEN monitor_reorg` queue, then snapshots stake. The reorg consumer takes the same lock. See `docs/contracts/CROSS_LANE.md`.
- Real verifier, not a stub. `verifySettlementAttestation` recovers the signer via viem's `recoverTypedDataAddress` against the correct `ATTESTATION_DOMAIN`. `Escrow.sol` verifies the same signature on-chain. No `return true`.
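The reorg-race guarantee comes down to lock ordering: settlement and the reorg consumer serialize on the same per-tournament key, and settlement drains pending reorg credits before it snapshots stake. Below is a minimal in-process sketch of that ordering, assuming a promise-chained `KeyedLock` as a stand-in for `pg_advisory_lock(hash(tournament_id))`. `Tournament`, `drain`, `consumeReorgs`, and `settle` are hypothetical names. Unlike an advisory lock, this only serializes callers inside a single process.

```typescript
// Hypothetical in-process stand-in for pg_advisory_lock(hash(tournament_id)):
// callers using the same key run one at a time, in arrival order.
class KeyedLock {
  private tails = new Map<string, Promise<void>>();

  async withLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const mine = new Promise<void>((resolve) => {
      release = resolve;
    });
    const tail = prev.then(() => mine); // next caller waits for us
    this.tails.set(key, tail);
    await prev;
    try {
      return await fn();
    } finally {
      release();
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}

interface Tournament {
  id: string;
  stake: Map<string, number>; // operator -> stake (micro-USDC)
  pendingReorgs: Array<{ operator: string; refund: number }>; // queued notifications
}

const lock = new KeyedLock();

// Apply every queued reorg credit. Callers must hold the tournament lock.
function drain(t: Tournament): void {
  for (const r of t.pendingReorgs.splice(0)) {
    t.stake.set(r.operator, (t.stake.get(r.operator) ?? 0) + r.refund);
  }
}

// Reorg consumer: applies queued credits under the same lock as settlement.
function consumeReorgs(t: Tournament): Promise<void> {
  return lock.withLock(t.id, async () => drain(t));
}

// Settlement: lock -> drain queued reorgs -> snapshot stake.
function settle(t: Tournament): Promise<Map<string, number>> {
  return lock.withLock(t.id, async () => {
    drain(t);
    return new Map(t.stake); // snapshot only after the queue is empty
  });
}
```

Because both paths go through the same key, a reorg credit can land either fully before the snapshot or after settlement completes, never halfway through it.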
Aletheia is deliberately scoped for its first release:
- Testnet only. Local Anvil fork or Base Sepolia sandbox. No mainnet, no real USDC, no KYC surfaces.
- Exploit discovery, not patching. Agents find the bug and attest the violation; they do not propose fixes.
- Pure-signer attribution. Agents are credited by the `tx.origin` that triggered the violating transaction. No causal or LLM-driven attribution in this release.
- Scripted defenders as a fallback. Reference defenders are intentionally minimal; LLM defenders are a follow-up.
- Escalating-fee state is not reorg-restored. On a reorg, stake is credited back, but the revert-streak counter still advances. This is intentional; see `docs/contracts/CROSS_LANE.md`.
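Pure-signer attribution combined with the first-solver-per-invariant payout can be sketched as a single pass over graded violations. The `Violation` shape, the `firstSolvers` helper, ordering by `(blockNumber, txIndex)`, and lowercasing addresses before returning them are all assumptions for illustration, not the monitor's actual schema.

```typescript
// Hypothetical shape of a graded violation; field names are illustrative.
interface Violation {
  invariantId: string;
  txOrigin: string; // EOA that signed the violating transaction
  blockNumber: number;
  txIndex: number; // position within the block
}

// For each invariant, credit the tx.origin of the earliest violating
// transaction, ordered by (blockNumber, txIndex). No causal analysis and no
// LLM judgement: the signer alone decides attribution.
function firstSolvers(violations: readonly Violation[]): Map<string, string> {
  const earliest = new Map<string, Violation>();
  for (const v of violations) {
    const cur = earliest.get(v.invariantId);
    const isEarlier =
      cur === undefined ||
      v.blockNumber < cur.blockNumber ||
      (v.blockNumber === cur.blockNumber && v.txIndex < cur.txIndex);
    if (isEarlier) earliest.set(v.invariantId, v);
  }
  // Assumption: addresses are compared case-insensitively, so normalize them.
  const winners = new Map<string, string>();
  earliest.forEach((v, invariantId) => {
    winners.set(invariantId, v.txOrigin.toLowerCase());
  });
  return winners;
}
```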
Shortcuts for future-scale deployments are tagged in code:

```bash
grep -rn "// TODO(scale):" .
```

- 2,449 tests passing · 144 test files · 223k `expect()` calls · ~13s wall-clock (`make test`)
- 100-agent stress test: p99 read 104ms, p99 write 106ms, 5,900 ops/run, 0 lost (`docs/runbooks/100-agent-capacity.md`)
- 4 LLM providers · 10+ models: Claude Sonnet 4.5, GPT-4o, DeepSeek V3, Llama 3.3 70B, Qwen 2.5 72B, Gemini 2.0 Flash, and any OpenRouter model
- 5 tournament scenarios (A–E), each with deterministic replay for CI
bun workspaces · Postgres 16 · Foundry (forge, via_ir) · Next.js 14 App Router + wagmi v2 · Supabase Realtime
- Quickstart: `docs/QUICKSTART.md`
- Architecture reference: `docs/architecture/`
- Contributing: `CONTRIBUTING.md`
- AI-assistant notes: `CLAUDE.md`
- Changelog: `CHANGELOG.md`