Run prompts through 10 AI models in parallel and compare their responses. Built as a Claude Code skill.
- Parallel execution with idle timeout (16 min) and hard timeout (40 min)
- Git reference expansion - Automatically expands commit SHAs and temporal references ("today", "this week", "last 5 days")
- Smart file context injection - Discovers and injects relevant files for code-focused agents
- Web-enabled agent filtering (`--web` flag) for real-time information queries
- Reasoning task filtering (`--reasoning` flag) to skip code-only models
- Concurrency control (`--concurrency N`) to avoid API rate limits
- Dry run mode (`--dry-run`) to preview which agents will run
- Markdown output with summary tables saved to `./agent-logs/compare/`
| Tool | Description | Installation |
|---|---|---|
| Claude Code | Anthropic's CLI for Claude | npm install -g @anthropic-ai/claude-code |
| Gemini CLI | Google's Gemini CLI | npm install -g @google/gemini-cli |
| Codex CLI | OpenAI's Codex CLI | npm install -g @openai/codex |
| Aider | AI pair programming tool | pip install aider-chat |
| jq | JSON parsing | sudo dnf install jq (Fedora) or brew install jq (macOS) |
Set these environment variables (e.g., in ~/.bashrc or ~/.zshrc):
export ANTHROPIC_API_KEY="sk-ant-..." # For Claude
export GOOGLE_AI_API_KEY="..." # For Gemini
export OPENAI_API_KEY="sk-..." # For Codex
export OPENROUTER_API_KEY="sk-or-..." # For Aider models (DeepSeek, Mistral, Grok, Llama)

- Clone this repository:

      git clone https://github.com/David-Marsh-Photo/multi-agent-compare.git
      cd multi-agent-compare

- Copy the agent configuration:

      mkdir -p ~/.claude
      cp agents.json.example ~/.claude/agents.json

- Copy the runner script:

      mkdir -p ~/.claude/scripts
      cp run_agents.sh ~/.claude/scripts/
      chmod +x ~/.claude/scripts/run_agents.sh

- Copy the Claude Code skill (optional; enables the `/compare` command):

      mkdir -p ~/.claude/commands
      cp compare.md ~/.claude/commands/

- Set your API keys (see Requirements above)
If you installed the skill definition, use the /compare command directly:
/compare "review this commit for security issues"
Run the script directly with bash:
# Basic comparison (all 10 agents)
bash ~/.claude/scripts/run_agents.sh "explain the difference between REST and GraphQL"
# Reasoning tasks only (skip code-only models like Qwen3 Coder, Codestral, Grok Code)
bash ~/.claude/scripts/run_agents.sh --reasoning "which is better: microservices or monolith?"
# Web-enabled agents only (Claude, Gemini, Codex)
bash ~/.claude/scripts/run_agents.sh --web "what are the latest Next.js 15 features?"
# Combined flags
bash ~/.claude/scripts/run_agents.sh --web --reasoning "research the best AI agent frameworks in 2025"
# Limit concurrent agents (avoid rate limits)
bash ~/.claude/scripts/run_agents.sh --concurrency 3 "audit this codebase for security issues"
# Dry run (preview which agents will execute)
bash ~/.claude/scripts/run_agents.sh --dry-run "your prompt here"
# Limit git commit expansion
bash ~/.claude/scripts/run_agents.sh --max-commits 5 "review today's commits"

| Flag | Description |
|---|---|
| `--web` | Only run agents with web access (Claude, Gemini, Codex) |
| `--reasoning` | Skip code-only agents (for opinions, analysis, design discussions) |
| `--concurrency N` | Limit parallel agents to N at a time (default: unlimited) |
| `--max-commits N` | Limit git commit expansion to N commits (default: 10) |
| `--dry-run` | Show what would run without executing |
| `--help` | Show usage information |
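
To illustrate what a concurrency cap does, here is a minimal sketch that runs one stand-in command per agent name with at most N running at once. It is not taken from `run_agents.sh`: the `run_limited` function and the `echo` stand-in are hypothetical, and it assumes an `xargs` that supports `-P` (GNU and BSD versions do).

```shell
# Sketch only: run one stand-in command per agent name, at most $1 at a time.
# run_limited and the echo stand-in are hypothetical, not part of run_agents.sh.
run_limited() {
  max=$1
  shift
  # One name per line; xargs starts up to $max worker processes in parallel.
  printf '%s\n' "$@" |
    xargs -P "$max" -I{} sh -c 'echo "finished: $1"' _ {}
}

# Four agents, never more than three running at the same time.
run_limited 3 claude gemini codex deepseek
```

Because the workers run in parallel, the order of the output lines can differ between runs.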
The tool comes with 10 pre-configured agents:
| Agent | Type | Context | Web Access | Code-Only |
|---|---|---|---|---|
| Claude (Opus) | claude | 200K | ✓ | |
| Gemini (3 Pro) | gemini | 1M | ✓ | |
| Codex (GPT-5.2) | codex | 200K | ✓ | |
| DeepSeek V3 | aider | 131K | | |
| Qwen3 Coder | aider | 262K | | ✓ |
| Mistral Large 3 | aider | 262K | | |
| Grok 4.1 | aider | 500K | | |
| Llama 4 Maverick | aider | 1M | | |
| Codestral | aider | 256K | | ✓ |
| Grok Code | aider | 256K | | ✓ |
Edit `~/.claude/agents.json` to:
- Enable/disable agents (`"enabled": true/false`)
- Change models (`"model": "..."`)
- Add new agents

Agent types supported:
- `claude` - Uses Claude Code CLI
- `gemini` - Uses Gemini CLI
- `codex` - Uses Codex CLI
- `aider` - Uses Aider with OpenRouter models
- `ollama` - Uses local Ollama models (optional)
Results are saved to ./agent-logs/compare/YYYYMMDD-HHMM-prompt-slug.md with:
- Summary table (agent, status, time, output size)
- Original prompt
- Expanded prompt (if git refs were resolved)
- Each agent's full response
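
The file name pattern above can be sketched as follows. This is an illustration, not the script's actual code: the `slugify` helper and its 40-character cap are assumptions made for the example.

```shell
# Hypothetical helper: turn a prompt into a lowercase, hyphen-separated slug.
# The 40-character cap is an assumed value, not one documented by the tool.
slugify() {
  printf '%s' "$1" |
    tr '[:upper:]' '[:lower:]' |
    sed 's/[^a-z0-9]\{1,\}/-/g' |   # collapse runs of other characters into one hyphen
    cut -c1-40 |
    sed 's/^-//; s/-$//'            # trim a leading or trailing hyphen
}

prompt="Review this commit for security issues"
outfile="./agent-logs/compare/$(date +%Y%m%d-%H%M)-$(slugify "$prompt").md"
echo "$outfile"
```

The printed path starts with the current date and time and ends with `-review-this-commit-for-security-issues.md`.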
# Agent Comparison Results
**Date:** 2025-01-03 14:30:00
**Agents:** 10 (8 succeeded, 2 failed)
**Total Time:** 3m 45s
## Summary
| Agent | Status | Time | Output |
|-------|--------|------|--------|
| Claude (Opus) | ✅ done | 45s | 12KB |
| Gemini (3 Pro) | ✅ done | 38s | 8KB |
| DeepSeek V3 | ✅ done | 1m 12s | 15KB |
...
## Prompt
review this code for security issues
---
## ✓ Claude (Opus)
**Status:** done | **Time:** 45s
[Full response here]
---
...
- All agents run in read-only or sandboxed modes
- A safety prefix is injected into all prompts requesting read-only behavior
- Stalled processes are terminated with graceful escalation (SIGTERM → SIGKILL)
- Prompts are written to temp files to avoid shell injection
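
The SIGTERM → SIGKILL escalation mentioned above generally looks like the sketch below. The `terminate_gracefully` function and its default 5-second grace period are hypothetical; the runner script may use different names and timings.

```shell
# Hypothetical helper: ask a process to exit, then force it after a grace period.
# Usage: terminate_gracefully PID [GRACE_SECONDS]
terminate_gracefully() {
  pid=$1
  grace=${2:-5}                               # assumed default, in seconds
  kill -TERM "$pid" 2>/dev/null || return 0   # process is already gone
  while [ "$grace" -gt 0 ]; do
    kill -0 "$pid" 2>/dev/null || return 0    # it exited after SIGTERM
    sleep 1
    grace=$((grace - 1))
  done
  kill -KILL "$pid" 2>/dev/null               # still alive: force termination
}
```

For example, `terminate_gracefully "$agent_pid" 10` would give a stalled agent ten seconds to shut down cleanly before it is killed (`agent_pid` is a placeholder name).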
If you see "Warning: [tool] CLI not found, skipping [agent]":
- Ensure the CLI tool is installed and in your PATH
- Check with `which claude`, `which gemini`, `which codex`, `which aider`
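
To check every tool in one pass, a small loop over `command -v` works. This is a convenience sketch, not part of the tool; `missing_tools` is a hypothetical function name.

```shell
# Print the name of each listed CLI that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools claude gemini codex aider jq)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing from PATH:" $missing
else
  echo "All CLIs found"
fi
```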
If agents fail due to rate limits:
- Use `--concurrency 2` or `--concurrency 3` to limit parallel requests
- Wait a few minutes between runs
For very large prompts or many file references:
- The script automatically limits injected file content to 50KB
- Use `--max-commits` to limit git expansion
- Disable some agents in `~/.claude/agents.json`
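
For intuition, a size cap on injected file content can be as simple as the sketch below. It is an assumption-laden illustration: `inject_files` is a hypothetical name, the `=== file ===` header format is invented for the example, and 50KB is taken to mean 51200 bytes. It relies on `head -c`, which GNU and BSD `head` both support.

```shell
# Hypothetical sketch: concatenate files, each under a header line, and stop
# after $1 bytes in total. The header format is invented for this example.
# Usage: inject_files 51200 src/app.py src/util.py
inject_files() {
  limit=$1
  shift
  for f in "$@"; do
    printf '=== %s ===\n' "$f"
    cat "$f"
  done | head -c "$limit"
}
```

Anything past the limit is cut off, so the last file in the output may end mid-line.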
MIT License - see LICENSE file.
Contributions welcome! Please open an issue or PR.
Built by @David-Marsh-Photo for use with Claude Code.