Lightweight LLM API proxy — routing, fallback, and cost tracking for 6 providers. Built to fix what LiteLLM got wrong.
Switching LLM providers requires rewriting code. Rate limits crash your agents. Costs explode silently. LiteLLM "solves" this but ships 100+ dependencies, breaks on Windows, and takes seconds to import.
nano-proxy fixes all of it.
| Problem with others | nano-proxy solution |
|---|---|
| Switching providers = rewriting code | One URL change — base_url="http://localhost:8765/groq/v1" |
| Rate limit crashes entire pipeline | Automatic fallback — 429 → next provider, transparent to caller |
| 100+ dependencies (LiteLLM) | Minimal deps — only fastapi, httpx, pydantic, uvicorn |
| Cloud-only (OpenRouter, Portkey) | Fully local — runs on your machine, data never leaves |
| No cost visibility | Real-time cost tracking — per-provider USD counters, /cost endpoint |
| Windows support broken | Windows-first — tested on PowerShell, no POSIX assumptions |
| Slow import / startup | Fast start — proxy up in under 1 second |
# Install
pip install git+https://github.com/ghanibot/nano-proxy.git
# Start proxy (default: http://localhost:8765)
nano-proxy start
# Start with config file
nano-proxy start --config configs/proxy.yaml
# Check status
nano-proxy status
# Cost report
nano-proxy cost

import anthropic
import openai
# Direct to Anthropic (no proxy)
client = anthropic.Anthropic()
# Via nano-proxy → Anthropic
client = anthropic.Anthropic(base_url="http://localhost:8765/anthropic")
# Via nano-proxy → Groq (llama/mixtral)
client = openai.OpenAI(
    api_key="anything",
    base_url="http://localhost:8765/groq/v1",
)
# Via nano-proxy → auto-route (proxy picks best available provider)
client = openai.OpenAI(
    api_key="anything",
    base_url="http://localhost:8765/auto/v1",  # the OpenAI SDK appends /chat/completions
)

Works with any SDK that supports a custom base_url: anthropic, openai, langchain, nano-eval, nano-memory, nano-orchestrator.
| Provider | Route | Fallback | Cost |
|---|---|---|---|
| Anthropic | /anthropic/v1/... | Yes | Paid |
| OpenAI | /openai/v1/... | Yes | Paid |
| Groq | /groq/v1/... | Yes | Free tier |
| Google Gemini | /gemini/v1/... | Yes | Free tier |
| Ollama | /ollama/v1/... | Yes | Free, local |
| Mistral | /mistral/v1/... | Yes | Paid |
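
The routes in the table explain why the SDK examples use different base URLs: the Anthropic SDK appends `/v1/messages` itself, while the OpenAI SDK appends only `/chat/completions`, so it needs the `/v1` suffix in `base_url`. A minimal sketch of a helper that builds the right URL follows; `proxy_base_url` and `PROVIDERS` are hypothetical names for illustration and are not part of nano-proxy.

```python
# Hypothetical helper (not shipped with nano-proxy): build the base_url an SDK
# needs in order to reach a given provider through the proxy.
PROVIDERS = {"anthropic", "openai", "groq", "gemini", "ollama", "mistral"}


def proxy_base_url(provider, sdk="openai", proxy_url="http://localhost:8765"):
    """Return the base_url for `sdk` talking to `provider` via the proxy."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    root = f"{proxy_url.rstrip('/')}/{provider}"
    if sdk == "anthropic":
        # The Anthropic SDK adds /v1/messages on its own.
        return root
    if sdk == "openai":
        # The OpenAI SDK adds only /chat/completions, so /v1 goes here.
        return f"{root}/v1"
    raise ValueError(f"unknown sdk: {sdk}")
```

For example, `proxy_base_url("groq")` yields the same URL used in the Groq snippet above.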
# configs/proxy.yaml
host: "127.0.0.1"
port: 8765
log_requests: true
router:
  strategy: priority # priority | round-robin | cheapest
  fallback: true
  fallback_order: # explicit fallback sequence
    - groq
    - ollama
  timeout_seconds: 60.0
  max_retries: 2
budget:
  max_cost_usd: 10.0
  alert_at_percent: 0.8
  kill_on_exceed: false
providers:
  - name: anthropic
    base_url: "https://api.anthropic.com"
    api_key_env: "ANTHROPIC_API_KEY"
    priority: 0
    enabled: true
  - name: groq
    base_url: "https://api.groq.com/openai"
    api_key_env: "GROQ_API_KEY"
    priority: 1
    enabled: true
  - name: ollama
    base_url: "http://localhost:11434"
    api_key_env: ""
    priority: 2
    enabled: true

| Strategy | Behavior |
|---|---|
| priority | Always try highest-priority provider first (default) |
| round-robin | Distribute requests evenly across all providers |
| cheapest | Route to lowest cost-per-token provider available |
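
The three strategies can be sketched as different orderings of the same provider list. This is an illustration only, not nano-proxy's actual router: the `Provider` and `Router` classes, the `cost_per_1k_tokens` field and its prices are hypothetical, and it assumes a lower `priority` number means the provider is tried first, as the sample config suggests.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Provider:
    name: str
    priority: int               # assumption: lower number = tried first
    cost_per_1k_tokens: float   # made-up price, used only by this sketch


class Router:
    """Illustrative router; not nano-proxy's real implementation."""

    def __init__(self, providers, strategy="priority"):
        self.providers = list(providers)
        self.strategy = strategy
        self._counter = count()  # advances once per request for round-robin

    def order(self):
        """Return providers in the order they should be attempted."""
        if self.strategy == "priority":
            return sorted(self.providers, key=lambda p: p.priority)
        if self.strategy == "cheapest":
            return sorted(self.providers, key=lambda p: p.cost_per_1k_tokens)
        if self.strategy == "round-robin":
            # Rotate the list so each request starts one provider later.
            start = next(self._counter) % len(self.providers)
            return self.providers[start:] + self.providers[:start]
        raise ValueError(f"unknown strategy: {self.strategy}")


providers = [
    Provider("anthropic", priority=0, cost_per_1k_tokens=3.0),
    Provider("groq", priority=1, cost_per_1k_tokens=0.1),
    Provider("ollama", priority=2, cost_per_1k_tokens=0.0),
]
```

Returning a full ordering rather than a single provider keeps the remaining entries available as fallback candidates for the same request.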
Fallback activates automatically when a provider returns 429 (rate limit) or 5xx (server error). The rate-limited provider is blocked for 60 seconds, then re-enabled.
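
The fallback rule above can be sketched as a loop over candidate providers plus a small cooldown tracker. This is a minimal sketch under stated assumptions, not the proxy's source: `call_with_fallback`, `should_fall_back`, and the `send` callback are hypothetical names, and the class reuses the `RateLimitTracker` name only for readability. It assumes that only a 429 starts the 60-second block, while a 5xx simply moves on to the next provider.

```python
import time


class RateLimitTracker:
    """Remembers which providers are cooling down (illustrative only)."""

    def __init__(self, cooldown_seconds=60.0, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._blocked_until = {}  # provider name -> time it becomes usable

    def block(self, name):
        self._blocked_until[name] = self._clock() + self.cooldown_seconds

    def is_available(self, name):
        return self._clock() >= self._blocked_until.get(name, 0.0)


def should_fall_back(status_code):
    """429 (rate limit) and any 5xx (server error) trigger fallback."""
    return status_code == 429 or 500 <= status_code <= 599


def call_with_fallback(provider_names, send, tracker):
    """Try providers in order; `send(name)` must return (status_code, body)."""
    for name in provider_names:
        if not tracker.is_available(name):
            continue  # still inside its cooldown window
        status, body = send(name)
        if status == 429:
            tracker.block(name)  # assumption: only 429 starts the cooldown
        if should_fall_back(status):
            continue  # caller never sees this response
        return name, status, body
    raise RuntimeError("all providers are rate-limited or failing")
```

Passing the clock in as a parameter makes the cooldown testable without sleeping for a minute.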
POST /{provider}/v1/messages → Anthropic format
POST /{provider}/v1/chat/completions → OpenAI format
POST /auto/v1/... → auto-routed
GET /health → provider status
GET /cost → token + USD breakdown
{
  "total_usd": 0.0142,
  "providers": {
    "anthropic": { "requests": 12, "tokens": 8400, "cost_usd": 0.0112, "errors": 0 },
    "groq": { "requests": 3, "tokens": 1200, "cost_usd": 0.0003, "errors": 0 },
    "ollama": { "requests": 5, "tokens": 2100, "cost_usd": 0.0000, "errors": 0 }
  }
}

Client (nano-eval / nano-memory / any SDK)
│
▼
nano-proxy (FastAPI, port 8765)
├── Router — strategy: priority | round-robin | cheapest
│ └── RateLimitTracker — per-provider backoff, auto-unblock after 60s
├── GenericProvider — rewrites auth header, forwards to real API
├── CostTracker — per-provider USD counters, PRICING table
└── /health /cost — observability endpoints
│
├── api.anthropic.com
├── api.openai.com
├── api.groq.com
├── generativelanguage.googleapis.com
├── localhost:11434 (Ollama)
└── api.mistral.ai
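
The per-provider counters that CostTracker exposes at /cost can be checked against the `budget` settings from the config. The sketch below works on an already-fetched payload, so it runs without a live proxy. `summarize_costs` is a hypothetical helper, the payload shape is copied from the sample response, and it assumes `alert_at_percent` is a fraction (0.8 meaning 80% of `max_cost_usd`).

```python
import json

# Payload shaped like the /cost sample response (per-provider entries only).
SAMPLE = json.loads("""
{
  "providers": {
    "anthropic": {"requests": 12, "tokens": 8400, "cost_usd": 0.0112, "errors": 0},
    "groq": {"requests": 3, "tokens": 1200, "cost_usd": 0.0003, "errors": 0},
    "ollama": {"requests": 5, "tokens": 2100, "cost_usd": 0.0, "errors": 0}
  }
}
""")


def summarize_costs(payload, max_cost_usd=10.0, alert_at_percent=0.8):
    """Aggregate per-provider counters and compare spend with a budget."""
    entries = payload["providers"].values()
    spent = sum(p["cost_usd"] for p in entries)
    return {
        "spent_usd": round(spent, 6),
        "requests": sum(p["requests"] for p in entries),
        "tokens": sum(p["tokens"] for p in entries),
        # Assumption: alert_at_percent is a fraction of max_cost_usd.
        "alert": spent >= max_cost_usd * alert_at_percent,
        "exceeded": spent >= max_cost_usd,
    }


summary = summarize_costs(SAMPLE)
```

With a running proxy, the same function could be fed the parsed body of a GET request to /cost.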
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gsk_...
GOOGLE_API_KEY=AIza...
MISTRAL_API_KEY=...
OLLAMA_HOST=http://localhost:11434 # optional, default: localhost:11434

nano-proxy is the single routing layer for all nano-* projects. Each project supports the NANO_PROXY_URL env var:
# Set once
export NANO_PROXY_URL=http://localhost:8765
# nano-eval routes through proxy automatically
nano-eval run configs/example.yaml
# nano-memory embeddings route through proxy
# nano-orchestrator agent calls route through proxy

import os
import anthropic

base_url = os.environ.get("NANO_PROXY_URL")
client = anthropic.Anthropic(
    base_url=f"{base_url}/anthropic" if base_url else None
)

nano-proxy start # Start on default port 8765
nano-proxy start --config proxy.yaml # Start with config file
nano-proxy start --host 0.0.0.0 --port 9000 # Custom host/port
nano-proxy status # Check if proxy is running
nano-proxy cost # Show cost breakdown

git clone https://github.com/ghanibot/nano-proxy
cd nano-proxy
pip install -e ".[dev]"
pytest

MIT (see LICENSE)