A lightweight HTTP proxy for LLMs. It exposes a single /v1/chat/completions endpoint and routes to OpenAI or Ollama based on the requested model. Includes API key auth and simple rate limiting.
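For orientation, here is a minimal sketch of how such a proxy can be wired together with net/http. It is illustrative only and does not mirror this repository's actual files; authAndRateLimit and routeChat are placeholder names for the pieces described above.

```go
package main

import (
	"log"
	"net/http"
	"os"
)

// Placeholder middleware and handler for this sketch; the real repo keeps
// auth, rate limiting, and provider routing in its own packages.
func authAndRateLimit(next http.Handler) http.Handler { return next }

func routeChat(w http.ResponseWriter, r *http.Request) {
	// The real handler inspects the "model" field and forwards to OpenAI or Ollama.
	http.Error(w, "not implemented in this sketch", http.StatusNotImplemented)
}

func main() {
	mux := http.NewServeMux()

	// Health check, no API key required.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	// Single chat endpoint, wrapped in auth + rate limiting.
	mux.Handle("/v1/chat/completions", authAndRateLimit(http.HandlerFunc(routeChat)))

	port := os.Getenv("PORT")
	if port == "" {
		port = "8081" // fallback used only by this sketch
	}
	log.Fatal(http.ListenAndServe(":"+port, mux))
}
```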
# 1) Clone (replace {user} after you fork)
git clone https://github.com/{user}/llm-proxy.git
cd llm-proxy
# 2) Make your module path yours (replace <your-gh-user>)
# macOS:
sed -i.bak 's#github.com/{user}/llm-proxy#github.com/<your-gh-user>/llm-proxy#g' \
go.mod $(git ls-files "*.go")
# Linux:
# sed -i 's#github.com/{user}/llm-proxy#github.com/<your-gh-user>/llm-proxy#g' \
# go.mod $(git ls-files "*.go")
# 3) Deps
go mod tidy
# 4) Run (Ollama example)
PORT=8081 API_KEYS=demo_123 OLLAMA_URL=http://localhost:11434 go run .

Health:

curl -s localhost:8081/healthz

Chat (Ollama; ensure Ollama is running and a model is pulled, e.g., ollama pull llama3.1):
curl -s -X POST localhost:8081/v1/chat/completions \
-H 'content-type: application/json' \
-H 'X-API-Key: demo_123' \
-d '{
"model":"llama3.1",
"messages":[{"role":"user","content":"Say hi in 3 words"}],
"temperature":0.2
}'

Chat (OpenAI; requires OPENAI_API_KEY):
OPENAI_API_KEY=sk-... PORT=8081 API_KEYS=demo_123 go run .
curl -s -X POST localhost:8081/v1/chat/completions \
-H 'content-type: application/json' \
-H 'X-API-Key: demo_123' \
-d '{
"model":"gpt-4o-mini",
"messages":[{"role":"user","content":"Give me one fun fact"}]
}'

To-do: Enhance later.

Configuration (environment variables):
PORT=8081
API_KEYS=demo_123,admin_456 # comma-separated; required for /v1/chat/completions
OPENAI_API_KEY= # set to use OpenAI
OLLAMA_URL=http://localhost:11434
RATE_LIMIT_TOKENS_PER_MIN=60
RATE_LIMIT_BURST=60

Create .env.example:
PORT=8081
API_KEYS=demo_123
OPENAI_API_KEY=
OLLAMA_URL=http://localhost:11434
RATE_LIMIT_TOKENS_PER_MIN=60
RATE_LIMIT_BURST=60
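For reference, here is a minimal sketch of how these variables could be read at startup. This is an assumption, not the repository's actual config code; the helper names are hypothetical and the fallback values are just the defaults used in this README.

```go
package config

import (
	"os"
	"strconv"
	"strings"
)

// Config mirrors the environment variables documented above.
type Config struct {
	Port            string
	APIKeys         []string // parsed from the comma-separated API_KEYS
	OpenAIAPIKey    string
	OllamaURL       string
	RateLimitPerMin int
	RateLimitBurst  int
}

func getenv(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func getenvInt(key string, fallback int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return fallback
}

// FromEnv loads configuration from the environment.
// Real code would also validate that API_KEYS is non-empty.
func FromEnv() Config {
	return Config{
		Port:            getenv("PORT", "8081"),
		APIKeys:         strings.Split(os.Getenv("API_KEYS"), ","),
		OpenAIAPIKey:    os.Getenv("OPENAI_API_KEY"),
		OllamaURL:       getenv("OLLAMA_URL", "http://localhost:11434"),
		RateLimitPerMin: getenvInt("RATE_LIMIT_TOKENS_PER_MIN", 60),
		RateLimitBurst:  getenvInt("RATE_LIMIT_BURST", 60),
	}
}
```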
Health:

GET /healthz → ok (no API key required)

Chat:
POST /v1/chat/completions

- Header: X-API-Key: <your-key>
- Body:

  {
    "model": "llama3.1",
    "messages": [
      { "role": "user", "content": "Explain RAG in one sentence" }
    ],
    "temperature": 0.2
  }

- Response (OpenAI-like):

  {
    "model": "llama3.1",
    "choices": [
      { "message": { "role": "assistant", "content": "..." } }
    ]
  }
Routing rule: models that start with gpt- (and o* if you keep the sample) go to OpenAI; everything else goes to Ollama. See internal/providers/router.go.
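As an illustration of that rule (the real logic lives in internal/providers/router.go and may differ), a prefix check could look like this:

```go
package providers

import "strings"

// Provider identifies the upstream backend a request is sent to.
type Provider string

const (
	OpenAI Provider = "openai"
	Ollama Provider = "ollama"
)

// Route picks a provider from the requested model name: gpt-* (and o*, if you
// keep the sample prefixes) go to OpenAI, everything else goes to Ollama.
// Note that a bare "o" prefix is broad; tighten it if you only want
// OpenAI's o-series models.
func Route(model string) Provider {
	if strings.HasPrefix(model, "gpt-") || strings.HasPrefix(model, "o") {
		return OpenAI
	}
	return Ollama
}
```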
GoLand run/debug configuration:

- Run → Edit Configurations → + → Go Build
- Run kind: Package
- Package path: .
- Working directory: project root
- Env: PORT=8081 API_KEYS=demo_123 OLLAMA_URL=http://localhost:11434
  (optional: OPENAI_API_KEY=sk-..., RATE_LIMIT_TOKENS_PER_MIN=60, RATE_LIMIT_BURST=60)
- Set a breakpoint in the /v1/chat/completions handler, click Debug, then send a curl request.
Optional GoLand HTTP client file, requests.http:
### Health
GET http://localhost:8081/healthz
### Chat (Ollama)
POST http://localhost:8081/v1/chat/completions
Content-Type: application/json
X-API-Key: demo_123
{
"model": "llama3.1",
"messages": [{"role":"user","content":"name 3 colors"}],
"temperature": 0.2
}

- This repo uses a placeholder import/module path, github.com/{user}/llm-proxy. Replace {user} with your GitHub username after you clone/fork.
License: MIT