
aahmed954/agentz-proxy

Agentz-Proxy: Native Python AI Router

Production-ready AI orchestration with intelligent cost optimization. Zero infrastructure cost.


Quick Start

# Make API requests:
curl -X POST http://100.96.197.84:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-oracle1-master" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chat-best",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'

# Or use with Claude Code:
export ANTHROPIC_BASE_URL="http://100.96.197.84:4000"
export ANTHROPIC_API_KEY="sk-oracle1-master"
claude "write code"
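
The same chat request can be built from Python. This is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send` are hypothetical, and the URL, key, and `chat-best` model alias are simply copied from the curl example above.

```python
import json
import urllib.request

# Values copied from the curl example; adjust for your own deployment.
BASE_URL = "http://100.96.197.84:4000"
API_KEY = "sk-oracle1-master"


def build_chat_request(prompt, model="chat-best", max_tokens=100,
                       base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) a POST request to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs network access to the proxy):
# reply = send(build_chat_request("Hello"))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without reaching the proxy.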

Features

  • Intelligent Routing - Berkeley RouteLLM for cost optimization (47%+ savings)
  • Native Python - SQLAlchemy ORM, no Prisma, no TypeScript dependencies
  • ARM64 Optimized - Runs natively on Oracle Cloud Ampere A1
  • OpenAI Compatible - Drop-in replacement for OpenAI API
  • PostgreSQL Logging - All requests tracked with metrics
  • Self-Hosted CI/CD - GitHub Actions runner on oracle1
  • Free Infrastructure - $0/month on Oracle Cloud Always Free

Architecture

Client → RouteLLM Proxy (4000) → OpenRouter API
           ↓
    PostgreSQL (logging)
    Redis (caching)
    Prometheus (metrics)
    Grafana (dashboards)

Routing Strategy

Query Length     Model           Cost per 1M Tokens (input/output)   Latency
< 100 chars      Gemini Flash    $0.075/$0.30                        ~150ms
100-500 chars    GPT-4o-mini     $0.15/$0.60                         ~300ms
> 500 chars      Claude Sonnet   $3.00/$15.00                        ~400ms
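
The length tiers in the table can be sketched as a small function. This only mirrors the thresholds listed above and is not taken from the proxy's source or from RouteLLM; the function name `pick_model` is hypothetical, and the returned strings are the table's display names rather than real API model IDs.

```python
def pick_model(query: str) -> str:
    """Return the table's model tier for a query, based on character count."""
    length = len(query)
    if length < 100:
        return "Gemini Flash"   # short queries: cheapest tier
    if length <= 500:
        return "GPT-4o-mini"    # 100-500 chars: mid tier
    return "Claude Sonnet"      # over 500 chars: most capable tier
```

For example, `pick_model("Hello")` falls in the first tier, while a 2,000-character prompt falls in the last.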

Cost Breakdown (60K requests/month)

Without routing:     $126/month (all Claude)
With RouteLLM:       $67/month (47% savings)
With caching (40%):  $40/month (68% savings)

Infrastructure: $0 (Oracle Always Free)
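
The percentages above follow from the monthly figures. The sketch below reproduces the arithmetic, assuming the "40%" is a cache hit rate and that a cached response costs nothing; both are assumptions, and the names are illustrative.

```python
BASELINE = 126.0       # $/month, all requests sent to Claude
ROUTED = 67.0          # $/month with routing
CACHE_HIT_RATE = 0.40  # assumed meaning of "caching (40%)"


def savings_pct(baseline: float, cost: float) -> float:
    """Percentage saved relative to the baseline monthly cost."""
    return (baseline - cost) / baseline * 100


# Cached requests are assumed to cost nothing, so only misses are billed.
cached = ROUTED * (1 - CACHE_HIT_RATE)

print(f"Routing:           ${ROUTED:.0f}/month, {savings_pct(BASELINE, ROUTED):.0f}% saved")
print(f"Routing + caching: ${cached:.0f}/month, {savings_pct(BASELINE, cached):.0f}% saved")
```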

Stack

Services:

  • RouteLLM Proxy - Python 3.11, FastAPI, SQLAlchemy
  • PostgreSQL 16 - Request logging
  • Redis 7 - Caching layer
  • Prometheus - Metrics collection
  • Grafana - Visualization
  • GitHub Actions Runner - Auto-deployment

Infrastructure:

  • Oracle Cloud Ampere A1 (ARM64)
  • 4 cores, 24GB RAM, 194GB storage
  • Tailscale private network
  • GitHub Pro+ (Actions, GHCR, Pages)

Repository Structure

agentz-proxy/
├── services/
│   ├── routellm-proxy/          # Main AI router service
│   ├── attribution-logger/      # Metrics collector
│   └── routing-engine/          # Performance analyzer
├── infra/
│   └── oracle1/
│       ├── docker-compose.yml   # Service orchestration
│       └── configs/             # Service configs
├── docs/                        # Documentation
└── .github/workflows/           # CI/CD pipelines

Documentation

Deployment

Automated (Recommended)

git push
# GitHub Actions automatically:
# 1. Runs CI (lint, test, security scan)
# 2. Builds Docker images
# 3. Deploys to oracle1 via self-hosted runner
# Total time: ~2 minutes

Manual

ssh oracle1
cd ~/agentz-proxy
git pull
cd infra/oracle1
docker compose up -d

Monitoring

# Health check
curl http://100.96.197.84:4000/health

# Grafana dashboards
open http://100.96.197.84:3000

# Database metrics
ssh oracle1
docker exec agentz-postgres psql -U litellm -d agentz -c \
  'SELECT model_routed, COUNT(*), AVG(latency_ms) FROM request_logs GROUP BY model_routed;'
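
To try the aggregate query without access to oracle1, the sketch below runs it against an in-memory SQLite table. The two-column schema is an assumption inferred from the query itself, the sample rows are made up for illustration, and SQLite stands in for PostgreSQL; an `ORDER BY` is added only to make the output order stable.

```python
import sqlite3


def latency_by_model(conn):
    """Return (model_routed, request_count, avg_latency_ms) per model."""
    return conn.execute(
        "SELECT model_routed, COUNT(*), AVG(latency_ms) "
        "FROM request_logs GROUP BY model_routed ORDER BY model_routed"
    ).fetchall()


# Assumed minimal schema and illustrative rows (not real log data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request_logs (model_routed TEXT, latency_ms REAL)")
conn.executemany(
    "INSERT INTO request_logs VALUES (?, ?)",
    [("gemini-flash", 150.0), ("gemini-flash", 170.0), ("claude-sonnet", 400.0)],
)

rows = latency_by_model(conn)
for model, count, avg_latency in rows:
    print(f"{model}: {count} requests, {avg_latency:.0f} ms average")
```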

Development

# Local development (MacBook M3)
git clone https://github.com/aahmed954/agentz-proxy.git
cd agentz-proxy

# Make changes
# ... edit code ...

# Test against oracle1
curl http://100.96.197.84:4000/health

# Deploy
git add .
git commit -m "feat: add feature"
git push

# Monitor
gh run watch

Tech Stack Highlights

Pure Python (No Binaries):

  • FastAPI, SQLAlchemy, asyncpg, httpx
  • RouteLLM, Pydantic, Redis
  • All ARM64 native wheels

No Complexity:

  • ❌ No Prisma
  • ❌ No TypeScript
  • ❌ No binary compatibility issues
  • ❌ No migration headaches

Standard Tools:

  • ✅ SQLAlchemy (Python ORM standard)
  • ✅ PostgreSQL (industry standard)
  • ✅ FastAPI (modern Python web framework)
  • ✅ Docker Compose (simple orchestration)

Performance

Target: Sub-500ms for 90% of requests
Actual: 260-700ms depending on model

Breakdown:
- Routing: <1ms (RouteLLM)
- Network: 50-100ms (Tailscale)
- Model TTFT: 150-600ms (varies by model)

Throughput: 100+ req/sec on 4 ARM cores
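
One way to check the "sub-500ms for 90% of requests" target against measured latencies is sketched below. The function names are hypothetical, and the latency samples would come from your own measurements, for example the `latency_ms` column in `request_logs`.

```python
def fraction_under(latencies_ms, threshold_ms=500.0):
    """Fraction of requests that finished strictly under the threshold."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    under = sum(1 for value in latencies_ms if value < threshold_ms)
    return under / len(latencies_ms)


def meets_target(latencies_ms, threshold_ms=500.0, required_fraction=0.90):
    """True when at least required_fraction of requests beat the threshold."""
    return fraction_under(latencies_ms, threshold_ms) >= required_fraction
```

With nine requests at 260ms and one at 700ms, `meets_target` returns `True`; with two slow requests out of ten it returns `False`.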

Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing)
  3. Commit changes (git commit -m 'feat: add amazing')
  4. Push to branch (git push origin feature/amazing)
  5. Open Pull Request

License

MIT License - See LICENSE file

Acknowledgments

  • Berkeley RouteLLM - Intelligent routing library
  • Oracle Cloud - Free ARM64 infrastructure
  • OpenRouter - Multi-provider AI API
  • GitHub - CI/CD, Container Registry, Pages

Built with native Python. No Prisma. No complexity. Just works. 🚀
