The endpoint package ties together a queue, a limiter, and an HTTP handler into a single rate-limited path. It is the core orchestrator of the rls request lifecycle.
Each Endpoint owns:
- A Queue (from the queue package): holds waiting tickets
- A Limiter (from the limiter package): controls release timing
- A dispatcher goroutine: pulls from the queue after acquiring a rate-limit slot
- A work channel: signals the dispatcher when a new ticket is pushed
HTTP request → Handle()
1. Create Ticket
2. Admission timeout check (if configured)
3. Push to Queue (or 429 if full)
4. Signal work channel
5. Block on Ticket.Release
6. Dispatcher: wait for work → lim.Wait() → queue.Pop() → ticket.Release
7. Return JSON response
For token_window endpoints, the flow changes:
HTTP request → Handle()
1. Create Ticket with Cost from ?tokens=N (or default_tokens)
2. Reject immediately if cost > window capacity (429)
3. Admission timeout check based on pending token cost
4. Push to Queue (or 429 if full)
5. Track pending tokens (atomic counter)
6. Signal work channel
7. Block on Ticket.Release
8. Dispatcher: wait for work or window reset → PopWhere(TryConsume) → release all fitting tickets
9. Return JSON response with token window fields
Admission timeout is predictive queue rejection: a request is turned away up front when its estimated wait time is too long. It is configured via queue_timeout (seconds, 0 = disabled).
How it works: before pushing a ticket, the endpoint estimates how long the request would wait:
- estimatedWait = queueLen / rps (for fifo and priority schedulers)
- For token_bucket, available burst tokens are subtracted from the queue length
- For lifo and random, the check is skipped (wait time is unpredictable)
If the estimate exceeds the timeout, the request is rejected immediately with HTTP 429.
Per-request override: the ?timeout=N query parameter (float seconds) overrides the endpoint config. Use ?timeout=999 to effectively disable the check for a single request.
When algorithm: token_window is configured, the endpoint uses a different dispatch loop and does not create a standard Limiter. Instead, it owns a TokenWindow capacity tracker.
The dispatchTokenWindow() goroutine selects on two channels:
- work: a new ticket arrived
- tokenWindow.ResetCh(): window capacity reset (ticker fired)
On either signal, it calls releaseTokenFitting(), which uses queue.PopWhere(func(t) { return tokenWindow.TryConsume(t.Cost) }) to release every ticket that fits in the current window. This gives best-fit scheduling: a small request can pass a larger deferred one when it fits in the remaining capacity.
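To illustrate how small tickets can overtake a larger deferred one, here is a sketch that makes a single in-order pass over the queue. The types ticket and tokenWindow and the helper popWhere are simplified stand-ins written for this example; the real TokenWindow would also need synchronization, which is omitted here.

```go
package main

import "fmt"

// ticket is a hypothetical queued request with a token cost.
type ticket struct {
	ID   string
	Cost int
}

// tokenWindow is a hypothetical capacity tracker for one window.
type tokenWindow struct {
	capacity  int
	remaining int
}

// TryConsume charges cost against the window if it fits.
func (w *tokenWindow) TryConsume(cost int) bool {
	if cost > w.remaining {
		return false
	}
	w.remaining -= cost
	return true
}

// Reset restores full capacity, as a window ticker would.
func (w *tokenWindow) Reset() { w.remaining = w.capacity }

// popWhere removes every ticket accepted by pred, in queue order,
// and returns the released tickets plus the ones left behind.
func popWhere(queue []ticket, pred func(ticket) bool) (released, rest []ticket) {
	for _, t := range queue {
		if pred(t) {
			released = append(released, t)
		} else {
			rest = append(rest, t)
		}
	}
	return released, rest
}

func main() {
	w := &tokenWindow{capacity: 1000, remaining: 1000}
	queue := []ticket{{"a", 600}, {"b", 700}, {"c", 300}}

	released, rest := popWhere(queue, func(t ticket) bool {
		return w.TryConsume(t.Cost)
	})
	// "b" does not fit after "a", but the smaller "c" still does.
	fmt.Println(released, rest, w.remaining) // [{a 600} {c 300}] [{b 700}] 100
}
```

After Reset(), the deferred ticket "b" would fit on the next pass.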
The pendingTokens atomic counter tracks the total token cost of all queued tickets. estimateWait() uses it for the admission timeout: ceil(pendingTokens / capacity) * windowSeconds.
| Mode | Behavior |
|---|---|
| reject | Return 429 immediately when queue is full (default) |
| block | Retry pushing until space opens or the server shuts down |
Endpoints optionally emit Event structs to a buffered channel for telemetry:
| Event | When |
|---|---|
| EventQueued | Ticket successfully pushed to queue |
| EventServed | Ticket released and response sent |
| EventRejected | Request rejected (queue full or admission timeout) |
Configure with WithEventSink(ch):
```go
ch := make(chan Event, 100)
ep, _ := endpoint.New(cfg, endpoint.WithEventSink(ch))
```
Events are non-blocking: if the channel is full, the event is dropped silently.
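Drop-on-full behavior like this is commonly written in Go as a select with a default case. The sketch below shows that pattern; the Event fields and the emit helper are hypothetical and may not match the package's actual definitions.

```go
package main

import "fmt"

// Event is a hypothetical, reduced telemetry event.
type Event struct {
	Kind string
	Path string
}

// emit tries to send ev without blocking and reports whether it was delivered.
// A nil sink means telemetry is disabled.
func emit(sink chan<- Event, ev Event) bool {
	if sink == nil {
		return false
	}
	select {
	case sink <- ev:
		return true
	default:
		return false // buffer full: drop silently
	}
}

func main() {
	ch := make(chan Event, 1)
	fmt.Println(emit(ch, Event{"queued", "/api"})) // true
	fmt.Println(emit(ch, Event{"served", "/api"})) // false
	fmt.Println(len(ch))                           // 1
}
```

The trade-off is that a slow consumer loses events rather than slowing down request handling, so the buffer size should match the expected burst.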
Registry maps URL paths to endpoints. When a request arrives for an unconfigured path, a dynamic endpoint is created on the fly with its own independent queue and limiter, inheriting configuration from the nearest configured ancestor.
```go
reg, _ := endpoint.NewRegistryWithOpts(configs, []endpoint.RegistryOption{
	endpoint.WithMaxDynamic(1000),
}, endpoint.WithEventSink(ch))
ep, ok := reg.Match("/api/v2/users") // creates dynamic endpoint inheriting from "/api"
```
When Match() is called with a path that has no exact match in the registry:
- Walk parent paths: /api/v2/users → /api/v2 → /api → /
- Find the nearest registered ancestor
- Create a new endpoint with config.InheritFrom(): zero-value fields in the child are filled from the parent
- Register it in the map with Dynamic: true
The dynamic endpoint gets its own queue and limiter — it does not share the parent's. This gives per-path visibility and independent statistics.
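The parent walk in the first two steps might look like the following. The helpers parentPath and nearestAncestor are invented for this sketch, and a plain map of registered paths stands in for the registry.

```go
package main

import (
	"fmt"
	"strings"
)

// parentPath returns the parent of p, for example
// "/api/v2" -> "/api", "/api" -> "/", and "/" -> "" (no parent).
func parentPath(p string) string {
	if p == "/" || p == "" {
		return ""
	}
	i := strings.LastIndex(p, "/")
	if i <= 0 {
		return "/"
	}
	return p[:i]
}

// nearestAncestor walks up from path and returns the first registered parent.
func nearestAncestor(registered map[string]bool, path string) (string, bool) {
	for p := parentPath(path); p != ""; p = parentPath(p) {
		if registered[p] {
			return p, true
		}
	}
	return "", false
}

func main() {
	registered := map[string]bool{"/": true, "/api": true}
	fmt.Println(nearestAncestor(registered, "/api/v2/users")) // /api true
}
```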
Dynamic endpoints persist until server restart. They appear in Snapshot(), QueueDepths(), TUI, and attach mode.
Dynamic creation is capped by max_dynamic_endpoints (default 1000). Once the cap is reached, unconfigured paths fall back to the nearest configured parent's endpoint instead of creating a new one.
```yaml
defaults:
  max_dynamic_endpoints: 1000 # cap on dynamically created endpoints
```
Snapshot() returns all endpoints (configured and dynamic) sorted by path:
```go
type EndpointInfo struct {
	Config   config.EndpointConfig
	QueueLen int
}

infos := reg.Snapshot() // thread-safe, sorted by path
```
The registry uses sync.RWMutex:
- Match() fast path (exact hit): RLock only
- Match() slow path (dynamic creation): releases RLock, acquires Lock, double-checks
- Snapshot(), QueueDepths(): RLock
- StopAll(): Lock
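The fast-path and slow-path locking can be sketched as a double-checked lookup. The registry and endpoint types below are reduced to the minimum needed to show the pattern, and the dynamic-endpoint cap and config inheritance are left out. All names are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// endpoint is a hypothetical, minimal endpoint record.
type endpoint struct {
	path    string
	dynamic bool
}

// registry is a hypothetical path-to-endpoint map guarded by a RWMutex.
type registry struct {
	mu        sync.RWMutex
	endpoints map[string]*endpoint
}

// match returns the endpoint for path, creating a dynamic one if needed.
func (r *registry) match(path string) *endpoint {
	// Fast path: exact hit under the read lock only.
	r.mu.RLock()
	ep, ok := r.endpoints[path]
	r.mu.RUnlock()
	if ok {
		return ep
	}

	// Slow path: take the write lock and check again, because another
	// goroutine may have created the endpoint between the two locks.
	r.mu.Lock()
	defer r.mu.Unlock()
	if ep, ok := r.endpoints[path]; ok {
		return ep
	}
	ep = &endpoint{path: path, dynamic: true}
	r.endpoints[path] = ep
	return ep
}

func main() {
	r := &registry{endpoints: map[string]*endpoint{"/api": {path: "/api"}}}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.match("/api/v2/users")
		}()
	}
	wg.Wait()
	fmt.Println(len(r.endpoints)) // 2
}
```

Without the second check, two concurrent requests for the same new path could each create an endpoint, and one would overwrite the other.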
```yaml
defaults:
  max_dynamic_endpoints: 1000 # cap on dynamically created endpoints (default 1000)
endpoints:
  - path: "/api"
    rate: 10 # requests per unit
    unit: rps # rps | rpm
    scheduler: fifo # fifo | lifo | priority | random
    algorithm: strict # strict | token_bucket | sliding_window | token_window
    max_queue_size: 500 # max tickets in queue
    overflow: reject # reject | block
    burst_size: 20 # token_bucket only
    window_seconds: 60 # sliding_window and token_window
    queue_timeout: 3 # admission timeout in seconds (0 = disabled)
    tokens_per_window: 10000 # token_window only: token budget per window
    default_tokens: 1 # token_window only: cost when client omits ?tokens=
```
Dynamic endpoints inherit all zero-value fields from their nearest configured ancestor. The InheritFrom(child, parent) function fills these fields: Rate, Unit, Scheduler, Algorithm, MaxQueueSize, Overflow, BurstSize, WindowSeconds, QueueTimeout, TokensPerWindow, DefaultTokens. The child's Path and Dynamic flag are always preserved.
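As an illustration of zero-value inheritance, the sketch below fills a handful of the listed fields. The EndpointConfig struct here is a reduced, hypothetical version, and inheritFrom is written for this example; the actual signature and return type of InheritFrom are not shown in this document.

```go
package main

import "fmt"

// EndpointConfig is a reduced, hypothetical config with only a few fields.
type EndpointConfig struct {
	Path         string
	Rate         int
	Unit         string
	Scheduler    string
	MaxQueueSize int
	Dynamic      bool
}

// inheritFrom fills zero-value fields in child from parent.
// Path and Dynamic are never copied from the parent.
func inheritFrom(child, parent EndpointConfig) EndpointConfig {
	if child.Rate == 0 {
		child.Rate = parent.Rate
	}
	if child.Unit == "" {
		child.Unit = parent.Unit
	}
	if child.Scheduler == "" {
		child.Scheduler = parent.Scheduler
	}
	if child.MaxQueueSize == 0 {
		child.MaxQueueSize = parent.MaxQueueSize
	}
	return child
}

func main() {
	parent := EndpointConfig{Path: "/api", Rate: 10, Unit: "rps", Scheduler: "fifo", MaxQueueSize: 500}
	child := EndpointConfig{Path: "/api/v2/users", Scheduler: "lifo", Dynamic: true}
	// Scheduler stays "lifo" because the child set it explicitly.
	fmt.Printf("%+v\n", inheritFrom(child, parent))
}
```

One consequence of this approach is that a child cannot explicitly set a field to its zero value, since zero is treated as "not set".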
Every response includes the full resolved configuration. For dynamic endpoints, inherited values are shown as if explicitly configured:
```json
{
  "ok": true,
  "endpoint": "/api/v2/users",
  "queued_for_ms": 347,
  "queue_depth": 2,
  "rate": 10,
  "unit": "rps",
  "scheduler": "fifo",
  "algorithm": "strict",
  "max_queue_size": 500,
  "overflow": "reject",
  "burst_size": 20,
  "queue_timeout": 3,
  "dynamic": true
}
```
Fields with zero values (burst_size, window_seconds, queue_timeout, dynamic) are omitted from JSON output.
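In Go, this kind of omission is usually achieved with the omitempty option of encoding/json struct tags. Whether rls does it this way is an assumption; the response struct below is a hypothetical subset of the fields shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// response is a hypothetical subset of the JSON response.
// Fields tagged omitempty are dropped when they hold their zero value.
type response struct {
	OK           bool   `json:"ok"`
	Endpoint     string `json:"endpoint"`
	QueuedForMs  int64  `json:"queued_for_ms"`
	Rate         int    `json:"rate"`
	BurstSize    int    `json:"burst_size,omitempty"`
	QueueTimeout int    `json:"queue_timeout,omitempty"`
	Dynamic      bool   `json:"dynamic,omitempty"`
}

// render marshals r to a JSON string, returning "" on error.
func render(r response) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// burst_size, queue_timeout, and dynamic are zero, so they do not appear.
	fmt.Println(render(response{OK: true, Endpoint: "/api", Rate: 10}))
	// {"ok":true,"endpoint":"/api","queued_for_ms":0,"rate":10}
}
```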
For token_window endpoints, the response includes additional fields:
```json
{
  "ok": true,
  "endpoint": "/llm",
  "queued_for_ms": 0,
  "algorithm": "token_window",
  "tokens_consumed": 500,
  "tokens_remaining": 9500,
  "window_capacity": 10000,
  "waiting_for_next_window": 3
}
```

| Field | Description |
|---|---|
| tokens_consumed | Token cost charged for this request |
| tokens_remaining | Tokens left in the current window |
| window_capacity | Total token budget per window |
| waiting_for_next_window | Number of requests still queued (deferred to future windows) |