Multi-model deliberation for better answers.
A system where AI models debate, critique, and synthesize answers together.
Instead of asking one LLM and hoping for the best, LLM Council orchestrates multiple frontier models through structured deliberation, producing more accurate, nuanced, and well-reasoned answers.
Single-model responses have blind spots. LLM Council fixes this by:
- Consulting multiple models: GPT, Claude, Gemini, Grok, and DeepSeek all weigh in
- Anonymous peer review: Models rank each other's responses without knowing who wrote what (prevents favoritism)
- Structured debate: Models critique and defend positions across multiple rounds
- Chairman synthesis: A designated model synthesizes the collective wisdom into one answer
The result? Answers that capture the best insights from each model while filtering out individual weaknesses.
Query your council models in parallel. Each provides an independent response, then anonymously evaluates the others. A chairman model synthesizes the final answer based on the full deliberation.
Stage 1: Independent Responses → Council models answer your question
Stage 2: Anonymous Peer Review → Each model ranks the others (blind)
Stage 3: Chairman Synthesis → Best insights combined into final answer
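Stage 1 can be pictured as a concurrent fan-out in which a failing model is dropped rather than aborting the run. The sketch below is illustrative only: `query_model` is a hypothetical stand-in for the real API call, and `stage1_collect` is an assumed name, not the project's actual function.

```python
import asyncio


async def query_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for an API call; 'broken/model' simulates a failure."""
    await asyncio.sleep(0)  # yield control, as a network call would
    if model == "broken/model":
        raise RuntimeError("simulated API error")
    return f"{model} answer to: {prompt}"


async def stage1_collect(models: list[str], prompt: str) -> dict[str, str]:
    """Query every model concurrently and keep only the successful replies."""
    results = await asyncio.gather(
        *(query_model(m, prompt) for m in models),
        return_exceptions=True,  # a single failure must not cancel the others
    )
    return {
        model: result
        for model, result in zip(models, results)
        if not isinstance(result, BaseException)
    }


if __name__ == "__main__":
    replies = asyncio.run(
        stage1_collect(["model/a", "broken/model", "model/b"], "What is 2+2?")
    )
    for model, text in replies.items():
        print(model, "->", text)
```

Passing `return_exceptions=True` is what lets the remaining models finish when one call raises; the failed entry is simply filtered out afterwards.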
For complex or controversial questions, enable multi-round debate where models critique each other's reasoning and defend their positions.
llm-council --debate "Is capitalism or socialism better for reducing poverty?"
llm-council --debate --rounds 3 "Should AI development be paused?"
Round 1: Initial Responses → Each model presents their position
Round 2: Critiques → Models challenge each other's arguments
Round 3: Defense & Revision → Models defend valid points, concede weaknesses
Final: Chairman Synthesis → Synthesizes the evolved positions
Models decide when they need current information. No manual flags: they call the search tool when the question requires it.
llm-council "What is the current price of Bitcoin?"
# Models automatically search for real-time data
The CLI shows which models used search with a [searched] indicator on each panel. When ReAct reasoning is enabled (default), panels also show [reasoned].
Search-enabled rounds in debate mode:
- Round 1 (Initial): Models search to gather facts for their position
- Round 3 (Defense): Models search to find evidence supporting their defense
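One way to express which rounds may search is a small per-round configuration. The sketch below is an assumption-laden illustration: the test suite mentions a `RoundConfig`, but its real fields are not documented here, so the `name` and `uses_search` fields and the `build_schedule` helper are hypothetical. It also assumes `--rounds N` counts critique-defense cycles, as the flag table describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundConfig:
    """Hypothetical per-round settings; the project's real class may differ."""
    name: str
    uses_search: bool


def build_schedule(cycles: int = 1) -> list[RoundConfig]:
    """Initial round, then `cycles` critique-defense pairs.

    Search is enabled for the initial and defense rounds only,
    mirroring the list above.
    """
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    schedule = [RoundConfig("initial", uses_search=True)]
    for _ in range(cycles):
        schedule.append(RoundConfig("critique", uses_search=False))
        schedule.append(RoundConfig("defense", uses_search=True))
    return schedule


if __name__ == "__main__":
    for round_config in build_schedule(cycles=1):
        print(round_config)
```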
The chairman deeply analyzes all model responses before synthesizing. It identifies areas of agreement, disagreement, and factual claims that warrant scrutiny, then produces a well-reasoned final answer.
━━━ CHAIRMAN'S REFLECTION ━━━
╭─ Reflection • gemini-2.5-pro ─────────────────────────╮
│ Areas of agreement: All models agree that...          │
│ Areas of disagreement: GPT claims X while Claude...   │
│ Factual claims to verify: The price cited by...       │
╰───────────────────────────────────────────────────────╯
╭─ Final Answer • gemini-2.5-pro ───────────────────────╮
│ [Synthesized answer]                                  │
╰───────────────────────────────────────────────────────╯
Council members use ReAct (Reasoning + Acting) to decide when to search for current information. Their reasoning is visible in streaming mode.
gpt-4.1 thought: The question asks about current prices. I should verify.
gpt-4.1 search: "bitcoin price today"
Bitcoin is currently trading at $67,234...
gpt-4.1: Based on my research, Bitcoin is currently...
ReAct is enabled by default for council members. Disable with --no-react or /react off in chat mode.
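The thought, search, answer cycle shown above can be sketched as a small loop. This is a minimal illustration, not the project's implementation: the `Action: search("...")` and `Final Answer:` text format, the `react_loop` name, and the `ask_model` and `search` callables are all assumptions made for the example.

```python
import re
from typing import Callable

# Assumed text protocol: the model emits either an action line or a final answer.
ACTION_RE = re.compile(r'^Action:\s*search\("(?P<query>.+)"\)\s*$', re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(?P<answer>.+)$", re.MULTILINE | re.DOTALL)


def react_loop(
    ask_model: Callable[[str], str],
    search: Callable[[str], str],
    question: str,
    max_steps: int = 3,
) -> str:
    """Alternate model output and search observations until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = ask_model(transcript)
        transcript += output + "\n"
        final = FINAL_RE.search(output)
        if final:
            return final.group("answer").strip()
        action = ACTION_RE.search(output)
        if action:
            # Feed the search result back so the next step can use it.
            transcript += f"Observation: {search(action.group('query'))}\n"
    return "No final answer within the step limit."


if __name__ == "__main__":
    scripted = iter([
        'Thought: I should verify.\nAction: search("bitcoin price today")',
        "Thought: I have the data.\nFinal Answer: See the search result.",
    ])
    print(react_loop(lambda _: next(scripted), lambda q: f"results for {q}", "Price?"))
```

The step limit keeps a model that never commits to an answer from looping indefinitely.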
Multi-turn conversations with persistent history. The chat REPL remembers context and lets you switch between ranking and debate modes on the fly.
uv run llm-council chat
────────────────────────────────────────────────────────
Council Chat · abc12345 · Resumed
Mode: Debate · 2 rounds · React on · Stream off
Commands: /new /history /use <id> · /debate /rounds /stream /react · /mode /help /exit
────────────────────────────────────────────────────────
council> What is the capital of France?
Slash commands:
- Session: /new, /history, /use <id>
- Config: /debate on|off, /rounds N, /stream on|off, /react on|off
- Info: /mode, /help, /exit
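The config commands above amount to small updates on a session state object. The following sketch shows one plausible way to parse them; `ChatState` and `apply_command` are hypothetical names, and the defaults for debate and stream are assumptions (only the ReAct default and the `--rounds` default are stated in this README).

```python
from dataclasses import dataclass


@dataclass
class ChatState:
    debate: bool = False  # assumed default
    rounds: int = 1       # matches the documented --rounds default
    stream: bool = False  # assumed default
    react: bool = True    # ReAct is documented as on by default


def apply_command(state: ChatState, line: str) -> str:
    """Apply one config slash command to `state` and return a status message."""
    parts = line.strip().split()
    if not parts or not parts[0].startswith("/"):
        return "not a command"
    name, args = parts[0][1:], parts[1:]
    if name in ("debate", "stream", "react"):
        if args not in (["on"], ["off"]):
            return f"usage: /{name} on|off"
        setattr(state, name, args == ["on"])
        return f"{name} {args[0]}"
    if name == "rounds":
        if len(args) != 1 or not args[0].isdigit() or int(args[0]) < 1:
            return "usage: /rounds N"
        state.rounds = int(args[0])
        return f"rounds {state.rounds}"
    return f"unknown command: /{name}"


if __name__ == "__main__":
    state = ChatState()
    for command in ("/debate on", "/rounds 3", "/react maybe"):
        print(command, "->", apply_command(state, command))
    print(state)
```

Invalid arguments leave the state untouched and return a usage hint instead of raising, which suits an interactive REPL.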
- CLI mode: Full 3-stage output with progress indicators
- Simple mode: Just the final answer, pipe-friendly
- Chat mode: Interactive REPL with conversation history
┌─────────────────────────────────────────────────────────────────┐
│                           User Query                            │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Stage 1: Parallel Model Queries                 │
│   ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────────┐  │
│   │ Model A │ │ Model B │ │ Model C │ │ Model D │ │ Model E  │  │
│   │  (GPT)  │ │(Gemini) │ │(Claude) │ │ (Grok)  │ │(DeepSeek)│  │
│   └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬─────┘  │
│        │           │           │           │           │        │
│        └───────────┴───────────┼───────────┴───────────┘        │
│                                │                                │
│                     ┌──────────▼──────────┐                     │
│                     │   Web Search Tool   │ (Tavily API)        │
│                     │  Models call when   │                     │
│                     │  they need current  │                     │
│                     │     information     │                     │
│                     └─────────────────────┘                     │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Stage 2: Anonymous Peer Review                  │
│                                                                 │
│  Responses anonymized as "Response A, B, C, D, E"               │
│  Each model ranks all responses (can't identify authors)        │
│  Aggregate rankings computed from all evaluations               │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Stage 3: Chairman Synthesis                   │
│                                                                 │
│  Chairman model receives:                                       │
│  - All original responses                                       │
│  - All peer evaluations                                         │
│  - Aggregate rankings                                           │
│                                                                 │
│  Produces: Single comprehensive answer                          │
└─────────────────────────────────────────────────────────────────┘
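Stage 2 in the diagram above relies on two steps: hiding authorship behind neutral labels, then combining every reviewer's ordering into one ranking. The sketch below illustrates both. The average-position scoring is an assumption (the README only says aggregate rankings are computed from all evaluations), and `anonymize` and `aggregate_rankings` are hypothetical names.

```python
import string
from collections import defaultdict


def anonymize(responses: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (label -> text, label -> model) using labels 'Response A', 'Response B', ..."""
    labeled: dict[str, str] = {}
    authors: dict[str, str] = {}
    for letter, (model, text) in zip(string.ascii_uppercase, responses.items()):
        label = f"Response {letter}"
        labeled[label] = text
        authors[label] = model
    return labeled, authors


def aggregate_rankings(
    rankings: list[list[str]], authors: dict[str, str]
) -> list[tuple[str, float]]:
    """Average each response's position (1 = best) across reviewers, best first."""
    positions: dict[str, list[int]] = defaultdict(list)
    for ranking in rankings:
        for position, label in enumerate(ranking, start=1):
            positions[authors[label]].append(position)
    averages = [(model, sum(p) / len(p)) for model, p in positions.items()]
    return sorted(averages, key=lambda item: item[1])


if __name__ == "__main__":
    labeled, authors = anonymize({"gpt": "...", "claude": "...", "gemini": "..."})
    reviews = [
        ["Response B", "Response A", "Response C"],
        ["Response B", "Response C", "Response A"],
        ["Response A", "Response B", "Response C"],
    ]
    for model, score in aggregate_rankings(reviews, authors):
        print(f"{model}: {score:.2f}")
```

Reviewers only ever see the labels; the `authors` mapping stays on the orchestrator side and is used to translate the aggregate back to model names.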
┌─────────────────────────────────────────────────────────────────┐
│  Round 1: Initial     Round 2: Critique      Round 3: Defend    │
│  ┌─────┐ ┌─────┐      ┌───────────────┐      ┌──────┐ ┌──────┐  │
│  │Model│ │Model│  →   │    A ──→ B    │  →   │Revise│ │Revise│  │
│  │  A  │ │  B  │      │critiques B,C,D│      │  A   │ │  B   │  │
│  └─────┘ └─────┘      └───────────────┘      └──────┘ └──────┘  │
│  ┌─────┐ ┌─────┐      ┌───────────────┐      ┌──────┐ ┌──────┐  │
│  │Model│ │Model│  →   │    C ──→ D    │  →   │Revise│ │Revise│  │
│  │  C  │ │  D  │      │critiques A,B,D│      │  C   │ │  D   │  │
│  └─────┘ └─────┘      └───────────────┘      └──────┘ └──────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │    Chairman     │
                        │   Synthesizes   │
                        │   Full Debate   │
                        └─────────────────┘
# Build once
docker build -t llm-council https://github.com/zhihaolin/llm-council-cli.git
# Run
docker run -e OPENROUTER_API_KEY=your-key llm-council query "What is 2+2?"
# With web search
docker run -e OPENROUTER_API_KEY=your-key -e TAVILY_API_KEY=your-key \
llm-council query "What is the current price of Bitcoin?"
# Debate mode
docker run -e OPENROUTER_API_KEY=your-key llm-council query --debate "Should AI be regulated?"
git clone https://github.com/zhihaolin/llm-council-cli.git
cd llm-council-cli
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies
uv sync
# Configure API keys
echo "OPENROUTER_API_KEY=sk-or-v1-your-key-here" > .env
echo "TAVILY_API_KEY=tvly-your-key-here" >> .env # Optional, for web search
# Run
uv run llm-council query "What is the best programming language for beginners?"
Get your API keys:
- openrouter.ai: Required, provides access to GPT, Claude, Gemini, etc.
- tavily.com: Optional, enables web search (free tier: 1000 searches/month)
# Query with full deliberation output
uv run llm-council query "Your question"
# Query with debate mode
uv run llm-council query --debate "Complex question"
uv run llm-council query --debate --rounds 3 "Very complex question"
# Simple output (final answer only, no stages)
uv run llm-council query --simple "Quick question"
# Final answer with formatting (skip stages 1 & 2)
uv run llm-council query --final-only "Question"
# Show current council configuration
uv run llm-council models
# Interactive chat with history
uv run llm-council chat
uv run llm-council chat --new # Start fresh conversation
| Flag | Short | Description |
|---|---|---|
| --simple | -s | Output only the final answer (no formatting) |
| --final-only | -f | Show only chairman's synthesis (with formatting) |
| --debate | -d | Enable debate mode |
| --rounds N | -r N | Number of critique-defense cycles (default: 1) |
| --stream | | Stream token-by-token (sequential, debate mode) |
| --no-react | | Disable council ReAct reasoning (use native function calling) |
| --new | | Start a new conversation (chat mode) |
| --max-turns N | -t N | Context turns to include (chat mode, default: 6) |
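The `--max-turns` flag limits how much chat history is sent along with a new question. A minimal sketch of that idea follows, assuming a turn is one question plus the council's reply; `build_prompt`, the tuple-based history, and the `User:`/`Council:` prefixes are illustrative choices, not the project's actual format.

```python
def build_prompt(
    history: list[tuple[str, str]], question: str, max_turns: int = 6
) -> str:
    """Prefix the new question with at most `max_turns` earlier exchanges."""
    # Guard against history[-0:], which would return the whole list.
    recent = history[-max_turns:] if max_turns > 0 else []
    lines: list[str] = []
    for past_question, past_answer in recent:
        lines.append(f"User: {past_question}")
        lines.append(f"Council: {past_answer}")
    lines.append(f"User: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    history = [(f"question {i}", f"answer {i}") for i in range(10)]
    print(build_prompt(history, "And what about Germany?", max_turns=2))
```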
Edit config.yaml in the project root to customize the council:
# Council models - list of OpenRouter model identifiers
council_models:
- openai/gpt-4o-mini # Fast, cost-effective
- x-ai/grok-3 # X.AI's latest
- deepseek/deepseek-chat # Strong reasoning
# Chairman model - synthesizes the final response
chairman_model: openai/gpt-4o-mini
All models are accessed through OpenRouter, which provides a unified API for 200+ models from OpenAI, Anthropic, Google, Meta, and more. Choose models based on your budget and quality requirements.
Docker users: Mount a custom config with -v /path/to/config.yaml:/app/config.yaml
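Before running with a custom config, it can help to sanity-check the two keys shown above. The sketch below works on an already-parsed mapping (for example, the result of loading config.yaml with a YAML parser). `validate_config` is a hypothetical helper, and the `provider/model` shape check is inferred from the example identifiers rather than from any documented rule.

```python
def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in a parsed council config (empty if none)."""
    errors: list[str] = []

    models = config.get("council_models")
    if not isinstance(models, list) or not models:
        errors.append("council_models must be a non-empty list")
    else:
        for model in models:
            # Assumption: identifiers look like 'provider/model', as in the examples.
            if not isinstance(model, str) or "/" not in model:
                errors.append(f"unexpected model identifier: {model!r}")

    chairman = config.get("chairman_model")
    if not isinstance(chairman, str) or "/" not in chairman:
        errors.append("chairman_model must be a provider/model string")

    return errors


if __name__ == "__main__":
    example = {
        "council_models": ["openai/gpt-4o-mini", "x-ai/grok-3", "deepseek/deepseek-chat"],
        "chairman_model": "openai/gpt-4o-mini",
    }
    print(validate_config(example) or "config looks fine")
```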
| Component | Technology |
|---|---|
| Backend | Python 3.10+, async httpx |
| CLI | Typer, Rich |
| LLM Access | OpenRouter API (unified access to GPT, Claude, Gemini, etc.) |
| Web Search | Tavily API (LLM-optimized search) |
| Testing | pytest, pytest-asyncio |
| Storage | JSON files |
| Practice | Status | Details |
|---|---|---|
| Async/Parallel | ✅ | Concurrent API calls with asyncio.gather() |
| Graceful Degradation | ✅ | Continues if individual models fail |
| Test Suite | ✅ | pytest + pytest-asyncio, 108 tests |
| Linting | ✅ | Ruff (check + format) in CI |
| Type Checking | ✅ | Pyright in basic mode |
| Type Hints | ✅ | Throughout codebase |
| CI/CD | ✅ | GitHub Actions (lint → test → docker pipeline) |
| SOLID (SRP/ISP) | ✅ | Focused modules, clean API exports |
| Pydantic Models | Planned | Data validation (planned) |
| Structured Logging | Planned | JSON logs with correlation IDs (planned) |
| Config Management | ✅ | YAML config file (config.yaml) |
See docs/PLAN.md for the full engineering roadmap.
# Install dev dependencies
uv sync --extra dev
# Run all tests
uv run pytest tests/ -v
tests/
├── conftest.py                   # Fixtures and mock API responses
├── test_chat_commands.py         # Chat REPL + model panel indicators (11 tests)
├── test_cli_imports.py           # CLI smoke test (1 test)
├── test_conversation_context.py  # Context extraction (5 tests)
├── test_debate.py                # Debate mode + RoundConfig + ReAct (24 tests)
├── test_ranking_parser.py        # Ranking extraction (14 tests)
├── test_react.py                 # ReAct parsing & council loop (12 tests)
├── test_reflection.py            # Chairman Reflection parsing & loop (6 tests)
├── test_search.py                # Web search & tool calling (18 tests)
├── test_streaming.py             # Streaming & parallel (17 tests)
└── integration/                  # CLI integration tests (planned)
| Version | Feature | Status |
|---|---|---|
| v1.0 | CLI | ✅ Complete |
| v1.1 | Autonomous Web Search | ✅ Complete |
| v1.2 | Multi-Turn Debate Mode | ✅ Complete |
| v1.3 | Interactive Chat with History | ✅ Complete |
| v1.4 | Token Streaming | ✅ Complete |
| v1.5 | Parallel Execution with Progress | ✅ Complete |
| v1.6 | Tool Calling for Council | ✅ Complete |
| v1.6.1 | SOLID Refactoring | ✅ Complete |
| v1.6.2 | CI Quality Gates (ruff, pyright) | ✅ Complete |
| v1.6.3 | Docker Support | ✅ Complete |
| v1.7 | Unify Debate Logic | ✅ Complete |
| v1.8 | Rename Debate Functions | ✅ Complete |
| v1.9 | Strategy Pattern (OCP/DIP) | ✅ Complete |
| - | Chairman Reflection | ✅ Complete |
| - | Council ReAct | ✅ Complete |
| - | Chat UI Improvements | ✅ Complete |
| - | Compact Chat Banner | ✅ Complete |
| v1.10 | Workflow State Machine | Planned |
| v1.11 | Human-in-the-Loop (HITL) | Planned |
| v1.12 | Observability (OpenTelemetry) | Planned |
| v1.13 | Tool Registry (MCP) | Planned |
| v1.14 | Retry & Fallback Logic | Planned |
| v1.15 | Security Foundations | Planned |
See docs/PLAN.md for the full roadmap and docs/DEVLOG.md for development history.
This project builds upon the original LLM Council concept by Andrej Karpathy. The core idea of using multiple LLMs with peer review comes from his work.
This fork extends the original with:
- Full CLI interface
- Interactive chat with conversation history
- Autonomous web search via tool calling
- Multi-turn debate mode
- Chairman Reflection + Council ReAct reasoning
- Rich terminal output with progress indicators
MIT


