Thanks to visit codestin.com
Credit goes to github.com

Skip to content

zhihaolin/llm-council-cli

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

162 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

LLM Council

MIT License Python 3.10+ Tests Ruff

Multi-model deliberation for better answers.

A system where AI models debate, critique, and synthesize answers together.

Instead of asking one LLM and hoping for the best, LLM Council orchestrates multiple frontier models through structured deliberationβ€”producing more accurate, nuanced, and well-reasoned answers.

Chairman synthesis after multi-round debate

Why This Exists

Single-model responses have blind spots. LLM Council fixes this by:

  1. Consulting multiple models β€” GPT, Claude, Gemini, Grok, and DeepSeek all weigh in
  2. Anonymous peer review β€” Models rank each other's responses without knowing who wrote what (prevents favoritism)
  3. Structured debate β€” Models critique and defend positions across multiple rounds
  4. Chairman synthesis β€” A designated model synthesizes the collective wisdom into one answer

The result? Answers that capture the best insights from each model while filtering out individual weaknesses.


Features

Multi-Model Deliberation

Query your council models in parallel. Each provides an independent response, then anonymously evaluates the others. A chairman model synthesizes the final answer based on the full deliberation.

Stage 1: Independent Responses    β†’  Council models answer your question
Stage 2: Anonymous Peer Review    β†’  Each model ranks the others (blind)
Stage 3: Chairman Synthesis       β†’  Best insights combined into final answer

Debate Mode

For complex or controversial questions, enable multi-round debate where models critique each other's reasoning and defend their positions.

llm-council --debate "Is capitalism or socialism better for reducing poverty?"
llm-council --debate --rounds 3 "Should AI development be paused?"
Round 1: Initial Responses    β†’  Each model presents their position
Round 2: Critiques            β†’  Models challenge each other's arguments
Round 3: Defense & Revision   β†’  Models defend valid points, concede weaknesses
Final:   Chairman Synthesis   β†’  Synthesizes the evolved positions

Models critiquing each other in debate mode

Autonomous Web Search

Models decide when they need current information. No manual flagsβ€”they call the search tool when the question requires it.

llm-council "What is the current price of Bitcoin?"
# Models automatically search for real-time data

The CLI shows which models used search with a [searched] indicator on each panel. When ReAct reasoning is enabled (default), panels also show [reasoned].

Search-enabled rounds in debate mode:

  • Round 1 (Initial): Models search to gather facts for their position
  • Round 3 (Defense): Models search to find evidence supporting their defense

Models autonomously searching for current information

Chairman Reflection

The chairman deeply analyzes all model responses before synthesizing. It identifies areas of agreement, disagreement, and factual claims that warrant scrutinyβ€”then produces a well-reasoned final answer.

━━━ CHAIRMAN'S REFLECTION ━━━

β”Œβ”€ Reflection β€’ gemini-2.5-pro ────────────────────────┐
β”‚ Areas of agreement: All models agree that...          β”‚
β”‚ Areas of disagreement: GPT claims X while Claude...   β”‚
β”‚ Factual claims to verify: The price cited by...       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

β”Œβ”€ Final Answer β€’ gemini-2.5-pro ──────────────────────┐
β”‚ [Synthesized answer]                                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Council ReAct

Council members use ReAct (Reasoning + Acting) to decide when to search for current information. Their reasoning is visible in streaming mode.

  gpt-4.1 thought: The question asks about current prices. I should verify.
  gpt-4.1 search: "bitcoin price today"
  Bitcoin is currently trading at $67,234...
  gpt-4.1: Based on my research, Bitcoin is currently...

ReAct is enabled by default for council members. Disable with --no-react or /react off in chat mode.

Interactive Chat Mode

Multi-turn conversations with persistent history. The chat REPL remembers context and lets you switch between ranking and debate modes on the fly.

uv run llm-council chat
────────────────────────────────────────────────────────
  Council Chat  Β·  abc12345  Β·  Resumed
  Mode: Debate Β· 2 rounds Β· React on Β· Stream off
  Commands: /new /history /use <id> Β· /debate /rounds /stream /react Β· /mode /help /exit
────────────────────────────────────────────────────────

council> What is the capital of France?

Slash commands:

  • Session: /new, /history, /use <id>
  • Config: /debate on|off, /rounds N, /stream on|off, /react on|off
  • Info: /mode, /help, /exit

Rich Terminal Interface

  • CLI mode β€” Full 3-stage output with progress indicators
  • Simple mode β€” Just the final answer, pipe-friendly
  • Chat mode β€” Interactive REPL with conversation history

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         User Query                               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Stage 1: Parallel Model Queries                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Model A β”‚ β”‚ Model B β”‚ β”‚ Model C β”‚ β”‚ Model D β”‚ β”‚ Model E β”‚   β”‚
β”‚  β”‚  (GPT)  β”‚ β”‚(Gemini) β”‚ β”‚(Claude) β”‚ β”‚ (Grok)  β”‚ β”‚(DeepSeek)   β”‚
β”‚  β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜   β”‚
β”‚       β”‚           β”‚           β”‚           β”‚           β”‚         β”‚
β”‚       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚
β”‚                         β”‚                                        β”‚
β”‚              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                            β”‚
β”‚              β”‚   Web Search Tool   β”‚  (Tavily API)              β”‚
β”‚              β”‚  Models call when   β”‚                            β”‚
β”‚              β”‚  they need current  β”‚                            β”‚
β”‚              β”‚    information      β”‚                            β”‚
β”‚              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Stage 2: Anonymous Peer Review                      β”‚
β”‚                                                                  β”‚
β”‚   Responses anonymized as "Response A, B, C, D, E"              β”‚
β”‚   Each model ranks all responses (can't identify authors)        β”‚
β”‚   Aggregate rankings computed from all evaluations               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Stage 3: Chairman Synthesis                         β”‚
β”‚                                                                  β”‚
β”‚   Chairman model receives:                                       β”‚
β”‚   - All original responses                                       β”‚
β”‚   - All peer evaluations                                         β”‚
β”‚   - Aggregate rankings                                           β”‚
β”‚                                                                  β”‚
β”‚   Produces: Single comprehensive answer                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Debate Mode Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Round 1: Initial        Round 2: Critique       Round 3: Defend β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚Modelβ”‚ β”‚Modelβ”‚   β†’    β”‚  A  β”‚β†’β”‚  B  β”‚   β†’    β”‚Reviseβ”‚ β”‚Reviseβ”‚ β”‚
β”‚  β”‚  A  β”‚ β”‚  B  β”‚        β”‚critiques B,C,Dβ”‚        β”‚  A   β”‚ β”‚  B   β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”˜  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚Modelβ”‚ β”‚Modelβ”‚   β†’    β”‚  C  β”‚β†’β”‚  D  β”‚   β†’    β”‚Reviseβ”‚ β”‚Reviseβ”‚ β”‚
β”‚  β”‚  C  β”‚ β”‚  D  β”‚        β”‚critiques A,B,Dβ”‚        β”‚  C   β”‚ β”‚  D   β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                              β–Ό
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚    Chairman     β”‚
                    β”‚   Synthesizes   β”‚
                    β”‚  Full Debate    β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Quick Start

Option A: Docker (Recommended)

# Build once
docker build -t llm-council https://github.com/zhihaolin/llm-council-cli.git

# Run
docker run -e OPENROUTER_API_KEY=your-key llm-council query "What is 2+2?"

# With web search
docker run -e OPENROUTER_API_KEY=your-key -e TAVILY_API_KEY=your-key \
  llm-council query "What is the current price of Bitcoin?"

# Debate mode
docker run -e OPENROUTER_API_KEY=your-key llm-council query --debate "Should AI be regulated?"

Option B: Local Install

git clone https://github.com/zhihaolin/llm-council-cli.git
cd llm-council-cli

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Configure API keys
echo "OPENROUTER_API_KEY=sk-or-v1-your-key-here" > .env
echo "TAVILY_API_KEY=tvly-your-key-here" >> .env  # Optional, for web search

# Run
uv run llm-council query "What is the best programming language for beginners?"

Get your API keys:

  • openrouter.ai β€” Required, provides access to GPT, Claude, Gemini, etc.
  • tavily.com β€” Optional, enables web search (free tier: 1000 searches/month)

CLI Usage

Commands

# Query with full deliberation output
uv run llm-council query "Your question"

# Query with debate mode
uv run llm-council query --debate "Complex question"
uv run llm-council query --debate --rounds 3 "Very complex question"

# Simple output (final answer only, no stages)
uv run llm-council query --simple "Quick question"

# Final answer with formatting (skip stages 1 & 2)
uv run llm-council query --final-only "Question"

# Show current council configuration
uv run llm-council models

# Interactive chat with history
uv run llm-council chat
uv run llm-council chat --new  # Start fresh conversation

Flags

Flag Short Description
--simple -s Output only the final answer (no formatting)
--final-only -f Show only chairman's synthesis (with formatting)
--debate -d Enable debate mode
--rounds N -r N Number of critique-defense cycles (default: 1)
--stream Stream token-by-token (sequential, debate mode)
--no-react Disable council ReAct reasoning (use native function calling)
--new Start a new conversation (chat mode)
--max-turns N -t N Context turns to include (chat mode, default: 6)

Configuration

Models

Edit config.yaml in the project root to customize the council:

# Council models - list of OpenRouter model identifiers
council_models:
  - openai/gpt-4o-mini      # Fast, cost-effective
  - x-ai/grok-3             # X.AI's latest
  - deepseek/deepseek-chat  # Strong reasoning

# Chairman model - synthesizes the final response
chairman_model: openai/gpt-4o-mini

All models are accessed through OpenRouter, which provides a unified API for 200+ models from OpenAI, Anthropic, Google, Meta, and more. Choose models based on your budget and quality requirements.

Docker users: Mount a custom config with -v /path/to/config.yaml:/app/config.yaml


Tech Stack

Component Technology
Backend Python 3.10+, async httpx
CLI Typer, Rich
LLM Access OpenRouter API (unified access to GPT, Claude, Gemini, etc.)
Web Search Tavily API (LLM-optimized search)
Testing pytest, pytest-asyncio
Storage JSON files

Engineering Practices

Practice Status Details
Async/Parallel βœ… Concurrent API calls with asyncio.gather()
Graceful Degradation βœ… Continues if individual models fail
Test Suite βœ… pytest + pytest-asyncio, 108 tests
Linting βœ… Ruff (check + format) in CI
Type Checking βœ… Pyright in basic mode
Type Hints βœ… Throughout codebase
CI/CD βœ… GitHub Actions (lint β†’ test β†’ docker pipeline)
SOLID (SRP/ISP) βœ… Focused modules, clean API exports
Pydantic Models πŸ”œ Data validation (planned)
Structured Logging πŸ”œ JSON logs with correlation IDs (planned)
Config Management βœ… YAML config file (config.yaml)

See docs/PLAN.md for the full engineering roadmap.


Development

Running Tests

# Install dev dependencies
uv sync --extra dev

# Run all tests
uv run pytest tests/ -v

Test Structure

tests/
β”œβ”€β”€ conftest.py                  # Fixtures and mock API responses
β”œβ”€β”€ test_chat_commands.py        # Chat REPL + model panel indicators (11 tests)
β”œβ”€β”€ test_cli_imports.py          # CLI smoke test (1 test)
β”œβ”€β”€ test_conversation_context.py # Context extraction (5 tests)
β”œβ”€β”€ test_debate.py               # Debate mode + RoundConfig + ReAct (24 tests)
β”œβ”€β”€ test_ranking_parser.py       # Ranking extraction (14 tests)
β”œβ”€β”€ test_react.py                # ReAct parsing & council loop (12 tests)
β”œβ”€β”€ test_reflection.py           # Chairman Reflection parsing & loop (6 tests)
β”œβ”€β”€ test_search.py               # Web search & tool calling (18 tests)
β”œβ”€β”€ test_streaming.py            # Streaming & parallel (17 tests)
└── integration/                 # CLI integration tests (planned)

Roadmap

Version Feature Status
v1.0 CLI βœ… Complete
v1.1 Autonomous Web Search βœ… Complete
v1.2 Multi-Turn Debate Mode βœ… Complete
v1.3 Interactive Chat with History βœ… Complete
v1.4 Token Streaming βœ… Complete
v1.5 Parallel Execution with Progress βœ… Complete
v1.6 Tool Calling for Council βœ… Complete
v1.6.1 SOLID Refactoring βœ… Complete
v1.6.2 CI Quality Gates (ruff, pyright) βœ… Complete
v1.6.3 Docker Support βœ… Complete
v1.7 Unify Debate Logic βœ… Complete
v1.8 Rename Debate Functions βœ… Complete
v1.9 Strategy Pattern (OCP/DIP) βœ… Complete
β€” Chairman Reflection βœ… Complete
β€” Council ReAct βœ… Complete
β€” Chat UI Improvements βœ… Complete
β€” Compact Chat Banner βœ… Complete
v1.10 Workflow State Machine Planned
v1.11 Human-in-the-Loop (HITL) Planned
v1.12 Observability (OpenTelemetry) Planned
v1.13 Tool Registry (MCP) Planned
v1.14 Retry & Fallback Logic Planned
v1.15 Security Foundations Planned

See docs/PLAN.md for the full roadmap and docs/DEVLOG.md for development history.


Credits

This project builds upon the original LLM Council concept by Andrej Karpathy. The core idea of using multiple LLMs with peer review comes from his work.

This fork extends the original with:

  • Full CLI interface
  • Interactive chat with conversation history
  • Autonomous web search via tool calling
  • Multi-turn debate mode
  • Chairman Reflection + Council ReAct reasoning
  • Rich terminal output with progress indicators

License

MIT

About

LLM Council works together to answer your hardest questions

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 99.8%
  • Dockerfile 0.2%