llm-proxy (Go)

A lightweight HTTP proxy for LLMs. It exposes a single /v1/chat/completions endpoint and routes to OpenAI or Ollama based on the requested model. Includes API key auth and simple rate limiting.


Quick Start (local)

# 1) Clone (replace {user} after you fork)
git clone https://github.com/{user}/llm-proxy.git
cd llm-proxy

# 2) Make your module path yours (replace <your-gh-user>)
# macOS:
sed -i.bak 's#github.com/{user}/llm-proxy#github.com/<your-gh-user>/llm-proxy#g' \
  go.mod $(git ls-files "*.go")
# Linux:
# sed -i 's#github.com/{user}/llm-proxy#github.com/<your-gh-user>/llm-proxy#g' \
#   go.mod $(git ls-files "*.go")

# 3) Deps
go mod tidy

# 4) Run (Ollama example)
PORT=8081 API_KEYS=demo_123 OLLAMA_URL=http://localhost:11434 go run .

Health:

curl -s localhost:8081/healthz

Chat (Ollama; ensure Ollama is running and a model is pulled, e.g., ollama pull llama3.1):

curl -s -X POST localhost:8081/v1/chat/completions \
  -H 'content-type: application/json' \
  -H 'X-API-Key: demo_123' \
  -d '{
    "model":"llama3.1",
    "messages":[{"role":"user","content":"Say hi in 3 words"}],
    "temperature":0.2
  }'

Chat (OpenAI; requires OPENAI_API_KEY):

OPENAI_API_KEY=sk-... PORT=8081 API_KEYS=demo_123 go run .

curl -s -X POST localhost:8081/v1/chat/completions \
  -H 'content-type: application/json' \
  -H 'X-API-Key: demo_123' \
  -d '{
    "model":"gpt-4o-mini",
    "messages":[{"role":"user","content":"Give me one fun fact"}]
  }'

Run in Docker

To do: Docker support is not documented yet; instructions will be added later.


Environment variables

PORT=8081
API_KEYS=demo_123,admin_456     # comma-separated; required for /v1/chat/completions
OPENAI_API_KEY=                 # set to use OpenAI
OLLAMA_URL=http://localhost:11434
RATE_LIMIT_TOKENS_PER_MIN=60
RATE_LIMIT_BURST=60
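
API_KEYS is a comma-separated allow-list enforced on /v1/chat/completions. A minimal sketch of what that check can look like as net/http middleware; the names parseKeys and requireKey are illustrative, not necessarily the repo's own:

package proxy

import (
	"net/http"
	"os"
	"strings"
)

// parseKeys splits API_KEYS into a set for O(1) lookup.
func parseKeys() map[string]bool {
	keys := map[string]bool{}
	for _, k := range strings.Split(os.Getenv("API_KEYS"), ",") {
		if k = strings.TrimSpace(k); k != "" {
			keys[k] = true
		}
	}
	return keys
}

// requireKey rejects requests whose X-API-Key header is not in the set.
func requireKey(keys map[string]bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !keys[r.Header.Get("X-API-Key")] {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}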

Create .env.example:

PORT=8081
API_KEYS=demo_123
OPENAI_API_KEY=
OLLAMA_URL=http://localhost:11434
RATE_LIMIT_TOKENS_PER_MIN=60
RATE_LIMIT_BURST=60
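
RATE_LIMIT_TOKENS_PER_MIN and RATE_LIMIT_BURST describe a token bucket: a steady refill rate plus a burst allowance. A sketch of how they could back a per-key limiter using golang.org/x/time/rate; this is an assumption about the mechanism, not a copy of the repo's code:

package proxy

import (
	"sync"
	"time"

	"golang.org/x/time/rate"
)

var (
	mu       sync.Mutex
	limiters = map[string]*rate.Limiter{} // one bucket per API key
)

// limiterFor returns the bucket for a key, creating it on first use.
// With perMin=60 and burst=60 this allows a steady 1 req/s with
// bursts of up to 60 requests.
func limiterFor(key string, perMin, burst int) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()
	l, ok := limiters[key]
	if !ok {
		l = rate.NewLimiter(rate.Every(time.Minute/time.Duration(perMin)), burst)
		limiters[key] = l
	}
	return l
}

A handler would call limiterFor(key, 60, 60).Allow() per request and answer 429 Too Many Requests when it returns false.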

API

Health:

  • GET /healthz → ok (no API key required)

Chat:

  • POST /v1/chat/completions
  • Header: X-API-Key: <your-key>
  • Body:
    {
      "model": "llama3.1",
      "messages": [
        { "role": "user", "content": "Explain RAG in one sentence" }
      ],
      "temperature": 0.2
    }
  • Response (OpenAI-like; see the struct sketch after this list):
    {
      "model": "llama3.1",
      "choices": [
        { "message": { "role": "assistant", "content": "..." } }
      ]
    }
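
The body and response above map directly onto a couple of small Go types; a sketch following the JSON shapes in this README (the repo's own type names may differ):

package proxy

// Message is one conversation turn.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatRequest mirrors the request body shown above.
type ChatRequest struct {
	Model       string    `json:"model"`
	Messages    []Message `json:"messages"`
	Temperature float64   `json:"temperature,omitempty"`
}

// ChatResponse mirrors the OpenAI-like response.
type ChatResponse struct {
	Model   string `json:"model"`
	Choices []struct {
		Message Message `json:"message"`
	} `json:"choices"`
}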

Routing rule: models that start with gpt- (and o* if you keep the sample) go to OpenAI; everything else goes to Ollama. See internal/providers/router.go.
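
A minimal sketch of that rule (the real logic lives in internal/providers/router.go and may be more involved):

package providers

import "strings"

// Route picks a provider from the model name: gpt-* (plus the sample's
// o* family) goes to OpenAI; everything else goes to Ollama.
func Route(model string) string {
	if strings.HasPrefix(model, "gpt-") || strings.HasPrefix(model, "o") {
		return "openai"
	}
	return "ollama"
}

Under this rule, "gpt-4o-mini" from the example above routes to OpenAI and "llama3.1" routes to Ollama.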


Run/Debug in GoLand

  • Run → Edit Configurations → + Go Build
    • Run kind: Package
    • Package path: .
    • Working dir: project root
    • Env:
      PORT=8081
      API_KEYS=demo_123
      OLLAMA_URL=http://localhost:11434
      # OPENAI_API_KEY=sk-...  (optional)
      RATE_LIMIT_TOKENS_PER_MIN=60
      RATE_LIMIT_BURST=60
      
  • Set a breakpoint in the /v1/chat/completions handler, click Debug, then send a curl request.

Optional GoLand HTTP client file (requests.http):

### Health
GET http://localhost:8081/healthz

### Chat (Ollama)
POST http://localhost:8081/v1/chat/completions
Content-Type: application/json
X-API-Key: demo_123

{
  "model": "llama3.1",
  "messages": [{"role":"user","content":"name 3 colors"}],
  "temperature": 0.2
}

Notes

  • This repo uses a placeholder import/module path github.com/{user}/llm-proxy. Replace {user} with your GitHub username after you clone/fork.

License

MIT
