Onwards
A Rust-based AI gateway that provides a unified interface for routing requests to OpenAI-compatible targets. The goal is to be as 'transparent' as possible.
Quickstart
Create a config.json file with your target configurations:
{
"targets": {
"gpt-4": {
"url": "https://api.openai.com",
"onwards_key": "sk-your-openai-key",
"onwards_model": "gpt-4"
},
"claude-3": {
"url": "https://api.anthropic.com",
"onwards_key": "sk-ant-your-anthropic-key"
},
"local-model": {
"url": "http://localhost:8080"
}
}
}
Start the gateway:
cargo run -- -f config.json
Modifying the file will automatically and atomically reload the configuration. To disable hot-reloading, set the --watch flag to false.
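For example, a sketch assuming the flag accepts an explicit boolean value:

```bash
cargo run -- -f config.json --watch false
```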
Configuration Options
- `url`: The base URL of the AI provider
- `onwards_key`: API key to include in requests to the target (optional)
- `onwards_model`: Model name to use when forwarding requests (optional)
- `keys`: Array of API keys required for authentication to this target (optional)
- `rate_limit`: Per-target rate limiting configuration (optional)
  - `requests_per_second`: Number of requests allowed per second
  - `burst_size`: Maximum burst size of requests
- `concurrency_limit`: Concurrency limiting configuration with `max_concurrent_requests` (optional)
- `upstream_auth_header_name`: Custom header name for upstream authentication (optional, defaults to "Authorization")
- `upstream_auth_header_prefix`: Custom prefix for the upstream authentication header value (optional, defaults to "Bearer ")
- `response_headers`: Key-value pairs to add or override headers in the response (optional)
Usage
Command Line Options
- `--targets <file>`: Path to configuration file (required)
- `--port <port>`: Port to listen on (default: 3000)
- `--watch`: Enable configuration file watching for hot-reloading (default: true)
- `--metrics`: Enable Prometheus metrics endpoint (default: true)
- `--metrics-port <port>`: Port for Prometheus metrics (default: 9090)
- `--metrics-prefix <prefix>`: Prefix for metrics (default: "onwards")
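A combined invocation might look like this (a sketch; the ports are arbitrary):

```bash
cargo run -- --targets config.json --port 8080 --metrics-port 9100
```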
API Usage
List Available Models
Get a list of all configured targets, in the OpenAI models format:
curl http://localhost:3000/v1/models
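The response follows the OpenAI models-list shape. Illustratively, with the quickstart config above (fields beyond `id` and `object` may vary):

```json
{
  "object": "list",
  "data": [
    { "id": "gpt-4", "object": "model" },
    { "id": "claude-3", "object": "model" },
    { "id": "local-model", "object": "model" }
  ]
}
```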
Sending requests
Send requests to the gateway using the standard OpenAI API format:
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Model Override Header
Override the target using the model-override header:
curl -X POST http://localhost:3000/v1/chat/completions \
-H "model-override: claude-3" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
This is also used for routing requests without bodies - for example, to get the embeddings usage for your organization:
curl -X GET http://localhost:3000/v1/organization/usage/embeddings \
-H "model-override: claude-3"
Metrics
To enable Prometheus metrics, start the gateway with the --metrics flag, then
query the metrics endpoint:
curl http://localhost:9090/metrics
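Metric names are assumed to carry the configured prefix (default "onwards"), so a quick filter might be:

```bash
curl -s http://localhost:9090/metrics | grep '^onwards'
```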
Authentication
Onwards supports bearer token authentication to control access to your AI targets. You can configure authentication keys both globally and per-target.
Global Authentication Keys
Global keys apply to all targets that have authentication enabled:
{
"auth": {
"global_keys": ["global-api-key-1", "global-api-key-2"]
},
"targets": {
"gpt-4": {
"url": "https://api.openai.com",
"onwards_key": "sk-your-openai-key",
"keys": ["target-specific-key"]
}
}
}
Per-Target Authentication
You can also specify authentication keys for individual targets:
{
"targets": {
"secure-gpt-4": {
"url": "https://api.openai.com",
"onwards_key": "sk-your-openai-key",
"keys": ["secure-key-1", "secure-key-2"]
},
"open-local": {
"url": "http://localhost:8080"
}
}
}
In this example:
- `secure-gpt-4` requires a valid bearer token from the `keys` array
- `open-local` has no authentication requirements
If both global and per-target keys are configured, either kind of key is accepted for targets that define their own keys.
How Authentication Works
When a target has keys configured, requests must include a valid
Authorization: Bearer <token> header where <token> matches one of the
configured keys. If global keys are configured, they are automatically added to
each target's key set.
Successful authenticated request:
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Authorization: Bearer secure-key-1" \
-H "Content-Type: application/json" \
-d '{
"model": "secure-gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Failed authentication (invalid key):
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Authorization: Bearer wrong-key" \
-H "Content-Type: application/json" \
-d '{
"model": "secure-gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Returns: 401 Unauthorized
Failed authentication (missing header):
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "secure-gpt-4",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Returns: 401 Unauthorized
No authentication required:
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "open-local",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Success - no authentication required for this target
Upstream Authentication Configuration
By default, Onwards sends upstream API keys using the standard Authorization: Bearer <key> header format. However, some AI providers use different
authentication header formats. You can customize both the header name and
prefix per target.
Custom Header Name
Some providers use custom header names for authentication:
{
"targets": {
"custom-api": {
"url": "https://api.custom-provider.com",
"onwards_key": "your-api-key-123",
"upstream_auth_header_name": "X-API-Key"
}
}
}
This sends: X-API-Key: Bearer your-api-key-123
Custom Header Prefix
Some providers use different prefixes or no prefix at all:
{
"targets": {
"api-with-prefix": {
"url": "https://api.provider1.com",
"onwards_key": "token-xyz",
"upstream_auth_header_prefix": "ApiKey "
},
"api-without-prefix": {
"url": "https://api.provider2.com",
"onwards_key": "plain-key-456",
"upstream_auth_header_prefix": ""
}
}
}
This sends:
- To provider1: `Authorization: ApiKey token-xyz`
- To provider2: `Authorization: plain-key-456`
Combining Custom Name and Prefix
You can customize both the header name and prefix:
{
"targets": {
"fully-custom": {
"url": "https://api.custom.com",
"onwards_key": "secret-key",
"upstream_auth_header_name": "X-Custom-Auth",
"upstream_auth_header_prefix": "Token "
}
}
}
This sends: X-Custom-Auth: Token secret-key
Default Behavior
If these options are not specified, Onwards uses the standard OpenAI-compatible format:
{
"targets": {
"standard-api": {
"url": "https://api.openai.com",
"onwards_key": "sk-openai-key"
}
}
}
This sends: Authorization: Bearer sk-openai-key
Rate Limiting
Onwards supports per-target rate limiting using a token bucket algorithm. This allows you to control the request rate to each AI provider independently.
Configuration
Add rate limiting to any target in your config.json:
{
"targets": {
"rate-limited-model": {
"url": "https://api.provider.com",
"key": "your-api-key",
"rate_limit": {
"requests_per_second": 5.0,
"burst_size": 10
}
}
}
}
How It Works
Onwards uses a token bucket algorithm: each target gets its own token bucket.
Tokens are refilled at the rate given by the `requests_per_second` parameter,
and the maximum number of tokens in the bucket is capped by the `burst_size`
parameter. When the bucket is empty, requests to that target are rejected
with a 429 Too Many Requests response.
Examples
// Allow 1 request per second with burst of 5
"rate_limit": {
"requests_per_second": 1.0,
"burst_size": 5
}
// Allow 100 requests per second with burst of 200
"rate_limit": {
"requests_per_second": 100.0,
"burst_size": 200
}
Rate limiting is optional - targets without rate_limit configuration have no
rate limiting applied.
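A quick way to observe the limiter, assuming the `rate-limited-model` target above (`requests_per_second: 5.0`, `burst_size: 10`): fire requests in a loop and watch the status codes flip to 429 once the burst is exhausted.

```bash
# Expect roughly the first 10 to succeed, then 429s until tokens refill
for i in $(seq 1 15); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST http://localhost:3000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "rate-limited-model", "messages": [{"role": "user", "content": "Hi"}]}'
done
```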
Per-API-Key Rate Limiting
In addition to per-target rate limiting, Onwards supports individual rate limits for different API keys. This allows you to provide different service tiers to your users - for example, basic users might have lower limits while premium users get higher limits.
Configuration
Per-key rate limiting uses a key_definitions section in the auth configuration:
{
"auth": {
"global_keys": ["fallback-key"],
"key_definitions": {
"basic_user": {
"key": "sk-user-12345",
"rate_limit": {
"requests_per_second": 10,
"burst_size": 20
}
},
"premium_user": {
"key": "sk-premium-67890",
"rate_limit": {
"requests_per_second": 100,
"burst_size": 200
}
},
"enterprise_user": {
"key": "sk-enterprise-abcdef",
"rate_limit": {
"requests_per_second": 500,
"burst_size": 1000
}
}
}
},
"targets": {
"gpt-4": {
"url": "https://api.openai.com",
"onwards_key": "sk-your-openai-key",
"keys": ["basic_user", "premium_user", "enterprise_user", "fallback-key"]
}
}
}
Priority Order
Rate limits are checked in this order:
- Per-key rate limits (if the API key has limits configured)
- Per-target rate limits (if the target has limits configured)
If either limit is exceeded, the request returns 429 Too Many Requests.
Usage Examples
Basic user request (10/sec limit):
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Authorization: Bearer sk-user-12345" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
Premium user request (100/sec limit):
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Authorization: Bearer sk-premium-67890" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
Legacy key (no per-key limits):
curl -X POST http://localhost:3000/v1/chat/completions \
-H "Authorization: Bearer fallback-key" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
Concurrency Limiting
In addition to rate limiting (which controls how fast requests are made), Onwards supports concurrency limiting to control how many requests are processed simultaneously. This is useful for managing resource usage and preventing overload.
Per-Target Concurrency Limiting
Limit the number of concurrent requests to a specific target:
{
"targets": {
"resource-limited-model": {
"url": "https://api.provider.com",
"onwards_key": "your-api-key",
"concurrency_limit": {
"max_concurrent_requests": 5
}
}
}
}
With this configuration, only 5 requests will be processed concurrently for
this target. Additional requests will receive a 429 Too Many Requests
response until an in-flight request completes.
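A rough way to see this, assuming the `resource-limited-model` target above and an upstream slow enough for requests to overlap:

```bash
# Fire 8 requests in parallel; with max_concurrent_requests: 5,
# the overflow should be rejected with 429
for i in $(seq 1 8); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST http://localhost:3000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "resource-limited-model", "messages": [{"role": "user", "content": "Hi"}]}' &
done
wait
```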
Per-API-Key Concurrency Limiting
You can also set different concurrency limits for different API keys:
{
"auth": {
"key_definitions": {
"basic_user": {
"key": "sk-user-12345",
"concurrency_limit": {
"max_concurrent_requests": 2
}
},
"premium_user": {
"key": "sk-premium-67890",
"concurrency_limit": {
"max_concurrent_requests": 10
},
"rate_limit": {
"requests_per_second": 100,
"burst_size": 200
}
}
}
},
"targets": {
"gpt-4": {
"url": "https://api.openai.com",
"onwards_key": "sk-your-openai-key"
}
}
}
Combining Rate Limiting and Concurrency Limiting
You can use both rate limiting and concurrency limiting together:
- Rate limiting controls how fast requests are made over time
- Concurrency limiting controls how many requests are active at once
{
"targets": {
"balanced-model": {
"url": "https://api.provider.com",
"onwards_key": "your-api-key",
"rate_limit": {
"requests_per_second": 10,
"burst_size": 20
},
"concurrency_limit": {
"max_concurrent_requests": 5
}
}
}
}
How It Works
Concurrency limits use a semaphore-based approach:
- When a request arrives, it tries to acquire a permit
- If a permit is available, the request proceeds (holding the permit)
- If no permits are available, the request is rejected with 429 Too Many Requests
- When the request completes, the permit is automatically released
The error response distinguishes between rate limiting and concurrency limiting:
- Rate limit: `"code": "rate_limit"`
- Concurrency limit: `"code": "concurrency_limit_exceeded"`
Both use the HTTP 429 status code for consistency.
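The full response body isn't specified here; only the `code` values above are documented. A minimal sketch, assuming an OpenAI-style error envelope:

```json
{
  "error": {
    "message": "Too many requests",
    "code": "concurrency_limit_exceeded"
  }
}
```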
Response Headers
Onwards can include custom headers in the response for each target. These can override existing headers or add new ones.
Pricing
One use of this feature is to publish pricing information. If you charge a dynamic token price, the price in effect when a user's request is accepted is thereby agreed and recorded in the HTTP response headers.
Add pricing information to any target in your config.json:
{
"targets": {
"priced-model": {
"url": "https://api.provider.com",
"key": "your-api-key",
"response_headers": {
"Input-Price-Per-Token": "0.0001",
"Output-Price-Per-Token": "0.0002"
}
}
}
}
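To confirm the agreed prices, inspect the response headers (a sketch; header values as configured above):

```bash
curl -s -D - -o /dev/null -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "priced-model", "messages": [{"role": "user", "content": "Hi"}]}' \
  | grep -i 'price-per-token'
# Input-Price-Per-Token: 0.0001
# Output-Price-Per-Token: 0.0002
```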
Response Sanitization
Onwards can enforce strict OpenAI API schema compliance for /v1/chat/completions responses. This feature:
- Removes provider-specific fields from responses
- Rewrites the model field to match what the client originally requested
- Supports both streaming and non-streaming responses
- Validates responses against OpenAI's official API schema
This is useful when proxying to non-OpenAI providers that add custom fields, or when using onwards_model to rewrite model names upstream.
Enabling Response Sanitization
Add sanitize_response: true to any target or provider in your config.json:
Single provider:
{
"targets": {
"gpt-4": {
"url": "https://api.openai.com",
"onwards_key": "sk-your-key",
"onwards_model": "gpt-4-turbo-2024-04-09",
"sanitize_response": true
}
}
}
Pool with multiple providers:
{
"targets": {
"gpt-4": {
"sanitize_response": true,
"providers": [
{
"url": "https://api1.example.com",
"onwards_key": "sk-key-1"
},
{
"url": "https://api2.example.com",
"onwards_key": "sk-key-2"
}
]
}
}
}
How It Works
When `sanitize_response: true` and a client requests `model: gpt-4`:
- Request sent upstream with `model: gpt-4`
- Upstream responds with custom fields and `model: gpt-4-turbo-2024-04-09`
- Onwards sanitizes:
  - Parses the response using the OpenAI schema (removes unknown fields)
  - Rewrites the `model` field to `gpt-4` (matches the original request)
  - Reserializes the clean response
- Client receives a standard OpenAI response with `model: gpt-4`
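As an illustration (the ID and extra fields here are made up; `provider` and `cost` are the kind of fields third-party gateways add), an upstream response such as:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "gpt-4-turbo-2024-04-09",
  "provider": "openrouter",
  "cost": 0.0021,
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "Hi!" }, "finish_reason": "stop" }
  ]
}
```

would be delivered to the client as:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "gpt-4",
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "Hi!" }, "finish_reason": "stop" }
  ]
}
```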
Common Use Cases
- Third-party providers (e.g., OpenRouter, Together AI) often add extra fields: `provider`, `native_finish_reason`, `cost`, etc.
- Provider comparison: normalize responses from different providers for consistent handling
- Debugging: reduce noise by filtering to only standard OpenAI fields
Supported Endpoints
Currently supports:
- `/v1/chat/completions` (streaming and non-streaming)
Load Balancing
Onwards supports load balancing across multiple providers for a single alias, with automatic failover, weighted distribution, and configurable retry behavior.
Configuration
{
"targets": {
"gpt-4": {
"strategy": "weighted_random",
"fallback": {
"enabled": true,
"on_status": [429, 5],
"on_rate_limit": true
},
"providers": [
{ "url": "https://api.openai.com", "onwards_key": "sk-key-1", "weight": 3 },
{ "url": "https://api.openai.com", "onwards_key": "sk-key-2", "weight": 1 }
]
}
}
}
Strategy
- `weighted_random` (default): Distributes traffic randomly based on weights. A provider with `weight: 3` receives ~3x the traffic of one with `weight: 1`.
- `priority`: Always routes to the first provider. Falls through to subsequent providers only when fallback is triggered.
Fallback
Controls automatic retry on other providers when requests fail:
- `enabled`: Master switch (default: false)
- `on_status`: Status codes that trigger fallback. Supports wildcards: `5` → all 5xx (500-599), `50` → 500-509, `502` → exact match
- `on_rate_limit`: Fall back when hitting local rate limits (default: false)
When fallback triggers, the next provider is selected based on strategy (weighted random resamples from remaining pool; priority uses definition order).
Pool-Level Options
Settings that apply to the entire alias:
| Option | Description |
|---|---|
| `keys` | Access control keys for this alias |
| `rate_limit` | Rate limit for all requests to this alias |
| `concurrency_limit` | Max concurrent requests to this alias |
| `response_headers` | Headers added to all responses |
| `strategy` | `weighted_random` or `priority` |
| `fallback` | Retry configuration (see above) |
| `providers` | Array of provider configurations |
Provider-Level Options
Settings specific to each provider:
| Option | Description |
|---|---|
| `url` | Provider endpoint URL |
| `onwards_key` | API key for this provider |
| `onwards_model` | Model name override |
| `weight` | Traffic weight (default: 1) |
| `rate_limit` | Provider-specific rate limit |
| `concurrency_limit` | Provider-specific concurrency limit |
| `response_headers` | Provider-specific headers |
Examples
Primary/backup failover:
{
"targets": {
"gpt-4": {
"strategy": "priority",
"fallback": { "enabled": true, "on_status": [5], "on_rate_limit": true },
"providers": [
{ "url": "https://primary.example.com", "onwards_key": "sk-primary" },
{ "url": "https://backup.example.com", "onwards_key": "sk-backup" }
]
}
}
}
Multiple API keys with pool-level rate limit:
{
"targets": {
"gpt-4": {
"rate_limit": { "requests_per_second": 100, "burst_size": 200 },
"providers": [
{ "url": "https://api.openai.com", "onwards_key": "sk-key-1" },
{ "url": "https://api.openai.com", "onwards_key": "sk-key-2" }
]
}
}
}
Backwards Compatibility
Single-provider configs still work unchanged:
{ "targets": { "gpt-4": { "url": "https://api.openai.com", "onwards_key": "sk-key" } } }
Testing
Run the test suite:
cargo test