Production-ready AI orchestration with intelligent cost optimization. Zero infrastructure cost.
# Make API requests:
curl -X POST http://100.96.197.84:4000/v1/chat/completions \
-H "Authorization: Bearer sk-oracle1-master" \
-H "Content-Type: application/json" \
-d '{
"model": "chat-best",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 100
}'
# Or use with Claude Code:
export ANTHROPIC_BASE_URL="http://100.96.197.84:4000"
export ANTHROPIC_API_KEY="sk-oracle1-master"
claude "write code"
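The proxy exposes an OpenAI-compatible endpoint, so the curl call above can also be made from Python. The sketch below uses only the standard library (urllib) and reuses the example URL, key, and chat-best model from the commands above. The helper names build_chat_request and send are hypothetical and not part of this project.

```python
import json
import urllib.request

# Example values from the curl command above; substitute your own deployment's.
BASE_URL = "http://100.96.197.84:4000"
API_KEY = "sk-oracle1-master"


def build_chat_request(prompt: str, model: str = "chat-best",
                       max_tokens: int = 100,
                       base_url: str = BASE_URL,
                       api_key: str = API_KEY) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, send(build_chat_request("Hello")) returns the decoded JSON; in an OpenAI-compatible response the reply text sits at choices[0]["message"]["content"].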
- ✅ Intelligent Routing - Berkeley RouteLLM for cost optimization (47% estimated savings, 68% with caching)
- ✅ Native Python - SQLAlchemy ORM, no Prisma, no TypeScript dependencies
- ✅ ARM64 Optimized - Runs natively on Oracle Cloud Ampere A1
- ✅ OpenAI Compatible - Drop-in replacement for OpenAI API
- ✅ PostgreSQL Logging - All requests tracked with metrics
- ✅ Self-Hosted CI/CD - GitHub Actions runner on oracle1
- ✅ Free Infrastructure - $0/month on Oracle Cloud Always Free
Client → RouteLLM Proxy (4000) → OpenRouter API
                ↓
         PostgreSQL (logging)
         Redis (caching)
         Prometheus (metrics)
         Grafana (dashboards)
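The diagram lists Redis as the caching layer but not how cache keys are formed. One common approach, shown here as a hypothetical sketch rather than this proxy's actual scheme, is to hash a canonical JSON encoding of the request so that identical requests share a key. A plain dict stands in for Redis so the example runs without a server.

```python
import hashlib
import json


def cache_key(model: str, messages: list, max_tokens: int) -> str:
    """Derive a deterministic cache key for a chat request (hypothetical scheme).

    Sorted keys and compact separators make the JSON canonical, so the same
    request always hashes to the same key regardless of dict ordering.
    """
    canonical = json.dumps(
        {"model": model, "messages": messages, "max_tokens": max_tokens},
        sort_keys=True,
        separators=(",", ":"),
    )
    return "chat:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory stand-in for the Redis layer (a dict instead of a server)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        """Return the cached value, or call compute() and store its result."""
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = compute()
        self._store[key] = value
        return value
```

With redis-py, the dict operations would become r.get(key) and r.setex(key, ttl_seconds, value). Caching is only appropriate where a repeated answer is acceptable, such as deterministic (temperature 0) prompts.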
| Query length | Model | Cost per 1M tokens (input/output) | Latency |
|---|---|---|---|
| < 100 chars | Gemini Flash | $0.075/$0.30 | ~150ms |
| 100-500 chars | GPT-4o-mini | $0.15/$0.60 | ~300ms |
| > 500 chars | Claude Sonnet | $3.00/$15.00 | ~400ms |
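The table's length thresholds can be expressed as a small function, together with a per-request cost estimate from the listed prices. This is a sketch of the table's heuristic, assuming length is measured in characters and that the 500-character boundary belongs to the middle tier; it is not RouteLLM's API, and the model names are the table's labels rather than OpenRouter model IDs.

```python
# Per-1M-token prices (input, output) in USD, copied from the table above.
PRICES = {
    "Gemini Flash": (0.075, 0.30),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude Sonnet": (3.00, 15.00),
}


def route_by_length(query: str) -> str:
    """Pick a model tier from the query's character count."""
    n = len(query)
    if n < 100:
        return "Gemini Flash"
    if n <= 500:
        return "GPT-4o-mini"
    return "Claude Sonnet"


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in USD from per-1M-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```

For example, a 1,000-token prompt with a 1,000-token reply on GPT-4o-mini comes to about $0.00075.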
Without routing: $126/month (all Claude)
With RouteLLM: $67/month (47% savings)
With caching (40% hit rate): $40/month (68% savings)
Infrastructure: $0 (Oracle Always Free)
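The savings figures follow from simple ratios against the all-Claude baseline. The sketch below reproduces them, assuming the 40% caching figure is a cache hit rate and that cached responses incur no upstream cost.

```python
def savings_pct(baseline: float, optimized: float) -> float:
    """Percentage saved relative to the baseline monthly cost."""
    return 100 * (1 - optimized / baseline)


BASELINE = 126.0        # all traffic on Claude (figure from the estimate above)
ROUTED = 67.0           # after RouteLLM routing (figure from the estimate above)
CACHE_HIT_RATE = 0.40   # assumption: cached requests cost nothing upstream

cached = ROUTED * (1 - CACHE_HIT_RATE)

print(f"routing only:    {savings_pct(BASELINE, ROUTED):.0f}% saved")
# routing only:    47% saved
print(f"routing + cache: ${cached:.2f}/month, {savings_pct(BASELINE, cached):.0f}% saved")
# routing + cache: $40.20/month, 68% saved
```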
Services:
- RouteLLM Proxy - Python 3.11, FastAPI, SQLAlchemy
- PostgreSQL 16 - Request logging
- Redis 7 - Caching layer
- Prometheus - Metrics collection
- Grafana - Visualization
- GitHub Actions Runner - Auto-deployment
Infrastructure:
- Oracle Cloud Ampere A1 (ARM64)
- 4 cores, 24GB RAM, 194GB storage
- Tailscale private network
- GitHub Pro+ (Actions, GHCR, Pages)
agentz-proxy/
├── services/
│   ├── routellm-proxy/         # Main AI router service
│   ├── attribution-logger/     # Metrics collector
│   └── routing-engine/         # Performance analyzer
├── infra/
│   └── oracle1/
│       ├── docker-compose.yml  # Service orchestration
│       └── configs/            # Service configs
├── docs/                       # Documentation
└── .github/workflows/          # CI/CD pipelines
- Final Architecture - Complete system overview
- Hardware Architecture - Physical deployment
- GitHub Pro+ Features - GitHub integration
- Self-Hosted Runner - CI/CD setup
- Claude Code Usage - IDE integration
git push
# GitHub Actions automatically:
# 1. Runs CI (lint, test, security scan)
# 2. Builds Docker images
# 3. Deploys to oracle1 via self-hosted runner
# Total time: ~2 minutes
ssh oracle1
cd ~/agentz-proxy
git pull
cd infra/oracle1
docker compose up -d
# Health check
curl http://100.96.197.84:4000/health
# Grafana dashboards
open http://100.96.197.84:3000
# Database metrics
ssh oracle1
docker exec agentz-postgres psql -U litellm -d agentz -c \
'SELECT model_routed, COUNT(*), AVG(latency_ms) FROM request_logs GROUP BY model_routed;'
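The health check can also be scripted, for instance as a smoke test after a deployment. This sketch assumes /health answers HTTP 200 when the service is up (the response body format is not documented here); check_health is a hypothetical helper.

```python
import urllib.request


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx status), and socket timeouts are all
        # OSError subclasses, so any of them counts as "unhealthy".
        return False
```

Usage: check_health("http://100.96.197.84:4000") returns True when the proxy is reachable and healthy.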
# Local development (MacBook M3)
git clone https://github.com/aahmed954/agentz-proxy.git
cd agentz-proxy
# Make changes
# ... edit code ...
# Test against oracle1
curl http://100.96.197.84:4000/health
# Deploy
git add .
git commit -m "feat: add feature"
git push
# Monitor
gh run watch
Python-Only Stack (No Prisma Engine Binaries):
- FastAPI, SQLAlchemy, asyncpg, httpx
- RouteLLM, Pydantic, Redis
- All ARM64 native wheels
No Complexity:
- ❌ No Prisma
- ❌ No TypeScript
- ❌ No binary compatibility issues
- ❌ No migration headaches
Standard Tools:
- ✅ SQLAlchemy (Python ORM standard)
- ✅ PostgreSQL (industry standard)
- ✅ FastAPI (modern Python web framework)
- ✅ Docker Compose (simple orchestration)
Target: Sub-500ms for 90% of requests
Actual: 260-700ms depending on model
Breakdown:
- Routing: <1ms (RouteLLM)
- Network: 50-100ms (Tailscale)
- Model TTFT: 150-600ms (varies by model)
Throughput: 100+ req/sec on 4 ARM cores
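A latency range alone does not show whether the 90% target is met; that requires the 90th percentile of observed latencies, for example the latency_ms values in request_logs. A minimal sketch using the standard library follows; the sample values in the usage note are illustrative, not measurements.

```python
import statistics

TARGET_MS = 500.0  # target from above: sub-500ms for 90% of requests


def p90(latencies_ms):
    """90th percentile latency using the 'inclusive' quantile method."""
    # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile.
    return statistics.quantiles(latencies_ms, n=10, method="inclusive")[-1]


def meets_target(latencies_ms, target_ms=TARGET_MS):
    """True if the 90th percentile latency is below the target."""
    return p90(latencies_ms) < target_ms
```

For example, meets_target([260] * 9 + [700]) returns True (p90 is 304.0 ms), while evenly spread samples from 100 to 1000 ms give a p90 of 910.0 ms and miss the target.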
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing)
- Commit your changes (git commit -m 'feat: add amazing')
- Push to the branch (git push origin feature/amazing)
- Open a Pull Request
MIT License - See LICENSE file
- Berkeley RouteLLM - Intelligent routing library
- Oracle Cloud - Free ARM64 infrastructure
- OpenRouter - Multi-provider AI API
- GitHub - CI/CD, Container Registry, Pages
Built with native Python. No Prisma. No complexity. Just works. 🚀