nano-proxy

Lightweight LLM API proxy: routing, fallback, and cost tracking for six providers. Built to fix what LiteLLM got wrong.

The Problem

Switching LLM providers requires rewriting code. Rate limits crash your agents. Costs explode silently. LiteLLM "solves" this but ships 100+ dependencies, breaks on Windows, and takes seconds to import.

nano-proxy fixes all of it.

What Makes It Different

Problem with others	nano-proxy solution
Switching providers = rewriting code	One URL change — `base_url="http://localhost:8765/groq/v1"`
Rate limit crashes entire pipeline	Automatic fallback — 429 → next provider, transparent to caller
100+ dependencies (LiteLLM)	Minimal deps — only `fastapi`, `httpx`, `pydantic`, `uvicorn`
Cloud-only (OpenRouter, Portkey)	Fully local: the proxy runs on your machine, so requests go straight to the provider with no third-party gateway in between
No cost visibility	Real-time cost tracking — per-provider USD counters, `/cost` endpoint
Windows support broken	Windows-first — tested on PowerShell, no POSIX assumptions
Slow import / startup	Fast start — proxy up in under 1 second

Quick Start

# Install
pip install git+https://github.com/ghanibot/nano-proxy.git

# Start proxy (default: http://localhost:8765)
nano-proxy start

# Start with config file
nano-proxy start --config configs/proxy.yaml

# Check status
nano-proxy status

# Cost report
nano-proxy cost

Integration — 1 Line Change

import anthropic
import openai

# Direct to Anthropic (no proxy)
client = anthropic.Anthropic()

# Via nano-proxy → Anthropic
client = anthropic.Anthropic(base_url="http://localhost:8765/anthropic")

# Via nano-proxy → Groq (llama/mixtral)
client = openai.OpenAI(
    api_key="anything",
    base_url="http://localhost:8765/groq/v1"
)

# Via nano-proxy → auto-route (proxy picks the best available provider)
client = openai.OpenAI(
    api_key="anything",
    base_url="http://localhost:8765/auto/v1"
)

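Works with any SDK that accepts a custom `base_url`, including anthropic, openai, langchain, nano-eval, nano-memory, and nano-orchestrator. The `api_key` passed by the client is only a placeholder: the proxy rewrites the auth header before forwarding the request to the real API.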
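The two SDKs build request paths differently, which is why the Anthropic example has no `/v1` suffix and the OpenAI-style examples do: the anthropic client appends `/v1/messages` to its base URL, while the openai client appends `/chat/completions`. A minimal sketch of a helper that encodes this rule (the name `proxy_base_url` is hypothetical and not part of nano-proxy):

```python
PROXY_URL = "http://localhost:8765"  # nano-proxy default address


def proxy_base_url(provider: str, sdk: str, proxy_url: str = PROXY_URL) -> str:
    """Return the base_url to pass to an SDK client (hypothetical helper).

    sdk="anthropic": the SDK adds /v1/messages itself, so no /v1 suffix.
    sdk="openai":    the SDK adds /chat/completions, so /v1 is included.
    """
    root = f"{proxy_url.rstrip('/')}/{provider}"
    if sdk == "anthropic":
        return root
    if sdk == "openai":
        return f"{root}/v1"
    raise ValueError(f"unknown sdk: {sdk!r}")
```

For example, `openai.OpenAI(api_key="anything", base_url=proxy_base_url("groq", "openai"))` would target the Groq route shown above.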

Supported Providers

Provider	Route	Fallback	Cost
Anthropic	`/anthropic/v1/...`	Yes	Paid
OpenAI	`/openai/v1/...`	Yes	Paid
Groq	`/groq/v1/...`	Yes	Free tier
Google Gemini	`/gemini/v1/...`	Yes	Free tier
Ollama	`/ollama/v1/...`	Yes	Free, local
Mistral	`/mistral/v1/...`	Yes	Paid

Configuration

# configs/proxy.yaml
host: "127.0.0.1"
port: 8765
log_requests: true

router:
  strategy: priority       # priority | round-robin | cheapest
  fallback: true
  fallback_order:          # explicit fallback sequence
    - groq
    - ollama
  timeout_seconds: 60.0
  max_retries: 2

budget:
  max_cost_usd: 10.0
  alert_at_percent: 0.8    # presumably a fraction of max_cost_usd (0.8 = 80%)
  kill_on_exceed: false

providers:
  - name: anthropic
    base_url: "https://api.anthropic.com"
    api_key_env: "ANTHROPIC_API_KEY"
    priority: 0
    enabled: true

  - name: groq
    base_url: "https://api.groq.com/openai"
    api_key_env: "GROQ_API_KEY"
    priority: 1
    enabled: true

  - name: ollama
    base_url: "http://localhost:11434"
    api_key_env: ""
    priority: 2
    enabled: true

Routing Strategies

Strategy	Behavior
`priority`	Always try highest-priority provider first (default)
`round-robin`	Distribute requests evenly across all providers
`cheapest`	Route to lowest cost-per-token provider available
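The three strategies can be read as three ways of ordering the enabled providers before a request is attempted. The sketch below is only an illustration: it assumes a lower `priority` number means "try first" (as the sample config suggests), and the `cost_per_token` field and `order_providers` function are hypothetical names.

```python
def order_providers(providers: list[dict], strategy: str, turn: int = 0) -> list[dict]:
    """Return enabled providers in the order they would be tried.

    providers: dicts with "name", "priority", "enabled" and, for the
    cheapest strategy, a hypothetical "cost_per_token" value.
    turn: request counter, used only by round-robin to rotate the start.
    """
    live = [p for p in providers if p.get("enabled", True)]
    if strategy == "priority":
        # Assumed: lower number = higher priority.
        return sorted(live, key=lambda p: p["priority"])
    if strategy == "round-robin":
        if not live:
            return []
        start = turn % len(live)
        return live[start:] + live[:start]
    if strategy == "cheapest":
        return sorted(live, key=lambda p: p["cost_per_token"])
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Returning a full ordering rather than a single provider leaves room for fallback: if the first choice fails, the caller moves to the next entry.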

Fallback activates automatically when a provider returns 429 (rate limit) or 5xx (server error). The rate-limited provider is blocked for 60 seconds, then re-enabled.
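The fallback rule above can be sketched as a loop over providers plus a small tracker that remembers who is blocked. This is a simplified model, not the project's real `Router`: the `route` function, the `send` callback, and the choice to block only on 429 (while 5xx merely triggers fallback) are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

BLOCK_SECONDS = 60.0  # block window stated in the text above


@dataclass
class RateLimitTracker:
    blocked_until: dict[str, float] = field(default_factory=dict)

    def block(self, provider: str, now: float) -> None:
        self.blocked_until[provider] = now + BLOCK_SECONDS

    def is_available(self, provider: str, now: float) -> bool:
        return now >= self.blocked_until.get(provider, float("-inf"))


def should_fallback(status: int) -> bool:
    """429 (rate limit) or any 5xx (server error) triggers fallback."""
    return status == 429 or 500 <= status <= 599


def route(
    providers: list[str],
    send: Callable[[str], tuple[int, str]],
    tracker: RateLimitTracker,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[str, int, str]:
    """Try providers in order; return (provider, status, body)."""
    last = None
    for name in providers:
        if not tracker.is_available(name, clock()):
            continue  # still inside its 60 s block window
        status, body = send(name)
        if not should_fallback(status):
            return name, status, body
        if status == 429:
            tracker.block(name, clock())  # assumed: only 429 blocks
        last = (name, status, body)
    if last is None:
        raise RuntimeError("no provider available")
    return last  # every provider failed: surface the last error
```

Passing the clock in as a parameter keeps the 60-second window easy to test without sleeping.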

API Endpoints

POST  /{provider}/v1/messages          → Anthropic format
POST  /{provider}/v1/chat/completions  → OpenAI format
POST  /auto/v1/...                     → auto-routed

GET   /health                          → provider status
GET   /cost                            → token + USD breakdown

Cost Response

{
  "total_usd": 0.0115,
  "providers": {
    "anthropic": { "requests": 12, "tokens": 8400, "cost_usd": 0.0112, "errors": 0 },
    "groq":      { "requests": 3,  "tokens": 1200, "cost_usd": 0.0003, "errors": 0 },
    "ollama":    { "requests": 5,  "tokens": 2100, "cost_usd": 0.0000, "errors": 0 }
  }
}
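A response of this shape is easy to post-process. The sketch below fetches `/cost` with the standard library and derives a per-provider cost per 1,000 tokens; `fetch_cost` and `summarize_cost` are hypothetical helper names, and the code relies only on the `providers`, `tokens`, and `cost_usd` fields shown above.

```python
import json
from urllib.request import urlopen


def fetch_cost(proxy_url: str = "http://localhost:8765") -> dict:
    """GET /cost from a running proxy and decode the JSON body."""
    with urlopen(f"{proxy_url.rstrip('/')}/cost", timeout=5) as resp:
        return json.load(resp)


def summarize_cost(report: dict) -> list[tuple[str, float, float]]:
    """Return (provider, cost_usd, usd_per_1k_tokens), most expensive first."""
    rows = []
    for name, stats in report["providers"].items():
        tokens = stats["tokens"]
        # Guard against division by zero for providers with no traffic.
        per_1k = stats["cost_usd"] / tokens * 1000 if tokens else 0.0
        rows.append((name, stats["cost_usd"], per_1k))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Calling `summarize_cost(fetch_cost())` against a running proxy would give a quick ranking of where the spend is going.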

Architecture

Client (nano-eval / nano-memory / any SDK)
         │
         ▼
    nano-proxy  (FastAPI, port 8765)
    ├── Router          — strategy: priority | round-robin | cheapest
    │   └── RateLimitTracker  — per-provider backoff, auto-unblock after 60s
    ├── GenericProvider — rewrites auth header, forwards to real API
    ├── CostTracker     — per-provider USD counters, PRICING table
    └── /health /cost   — observability endpoints
         │
         ├── api.anthropic.com
         ├── api.openai.com
         ├── api.groq.com
         ├── generativelanguage.googleapis.com
         ├── localhost:11434  (Ollama)
         └── api.mistral.ai
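The "rewrites auth header" step in `GenericProvider` is what lets clients pass `api_key="anything"`. A minimal sketch of that idea follows, assuming the real key comes from the environment variables listed in this README; the `rewrite_auth_headers` function and its mapping table are illustrative, and only the header conventions themselves (`x-api-key` for Anthropic, `Authorization: Bearer` for OpenAI-compatible APIs) are known facts. Gemini is left out because its native API uses a different key header.

```python
import os
from typing import Mapping

# Provider name -> environment variable holding the real key.
API_KEY_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}

# Client-supplied headers that must not be forwarded upstream.
_DROP = {"authorization", "x-api-key", "host"}


def rewrite_auth_headers(
    provider: str,
    incoming: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Replace the client's placeholder credentials with the real key."""
    headers = {k: v for k, v in incoming.items() if k.lower() not in _DROP}
    key = env.get(API_KEY_ENV.get(provider, ""), "")
    if not key:
        return headers  # e.g. Ollama: local, no key required
    if provider == "anthropic":
        headers["x-api-key"] = key
    else:
        headers["Authorization"] = f"Bearer {key}"
    return headers
```

Stripping the incoming credentials before adding the real ones keeps a placeholder key from ever reaching the upstream API.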

Environment Variables

ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gsk_...
GOOGLE_API_KEY=AIza...
MISTRAL_API_KEY=...
OLLAMA_HOST=http://localhost:11434   # optional, default: localhost:11434

Integration with nano-* Ecosystem

nano-proxy is the single routing layer for all nano-* projects. Each project reads the `NANO_PROXY_URL` environment variable:

# Set once (bash/zsh)
export NANO_PROXY_URL=http://localhost:8765
# PowerShell equivalent:
# $env:NANO_PROXY_URL = "http://localhost:8765"

# nano-eval routes through proxy automatically
nano-eval run configs/example.yaml

# nano-memory embeddings route through proxy
# nano-orchestrator agent calls route through proxy

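import os
import anthropic

# Route through nano-proxy when NANO_PROXY_URL is set;
# passing None keeps the SDK's default endpoint.
base_url = os.environ.get("NANO_PROXY_URL")
client = anthropic.Anthropic(
    base_url=f"{base_url}/anthropic" if base_url else None
)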

CLI Reference

nano-proxy start                              # Start on default port 8765
nano-proxy start --config proxy.yaml          # Start with config file
nano-proxy start --host 0.0.0.0 --port 9000   # Custom host/port
nano-proxy status                             # Check if proxy is running
nano-proxy cost                               # Show cost breakdown

Contributing

git clone https://github.com/ghanibot/nano-proxy
cd nano-proxy
pip install -e ".[dev]"
pytest

License

MIT — see LICENSE
