endpoint

The endpoint package ties together a queue, a limiter, and an HTTP handler into a single rate-limited path. It is the core orchestrator of the rls request lifecycle.

Architecture

Each Endpoint owns:

A Queue (from the queue package) — holds waiting tickets
A Limiter (from the limiter package) — controls release timing
A dispatcher goroutine — pulls from the queue after acquiring a rate-limit slot
A work channel — signals the dispatcher when a new ticket is pushed

Request lifecycle (standard algorithms)

HTTP request → Handle()
  1. Create Ticket
  2. Admission timeout check (if configured)
  3. Push to Queue (or 429 if full)
  4. Signal work channel
  5. Block on Ticket.Release
  6. Dispatcher: wait for work → lim.Wait() → queue.Pop() → ticket.Release
  7. Return JSON response

Request lifecycle (token_window)

HTTP request → Handle()
  1. Create Ticket with Cost from ?tokens=N (or default_tokens)
  2. Reject immediately if cost > window capacity (429)
  3. Admission timeout check based on pending token cost
  4. Push to Queue (or 429 if full)
  5. Track pending tokens (atomic counter)
  6. Signal work channel
  7. Block on Ticket.Release
  8. Dispatcher: wait for work or window reset →
     PopWhere(TryConsume) → release all fitting tickets
  9. Return JSON response with token window fields

Admission Timeout

Predictive queue rejection based on estimated wait time. Configured via queue_timeout (seconds, 0 = disabled).

How it works: before pushing a ticket, the endpoint estimates how long the request would wait:

estimatedWait = queueLen / rps (for fifo and priority schedulers)
For token_bucket, available burst tokens are subtracted from the queue length
For lifo and random, the check is skipped (wait time is unpredictable)

If the estimate exceeds the timeout, the request is rejected immediately with HTTP 429.

Per-request override: the ?timeout=N query parameter (float seconds) overrides the endpoint config. Use ?timeout=999 to effectively disable the check for a single request.

Token Window Dispatch

When algorithm: token_window is configured, the endpoint uses a different dispatch loop and does not create a standard Limiter. Instead, it owns a TokenWindow capacity tracker.

The dispatchTokenWindow() goroutine selects on two channels:

work — new ticket arrived
tokenWindow.ResetCh() — window capacity reset (ticker fired)

On either signal, it calls releaseTokenFitting() which uses queue.PopWhere(func(t) { return tokenWindow.TryConsume(t.Cost) }) to release all tickets that fit in the current window. This gives best-fit scheduling: small requests pass larger deferred ones when they fit in the remaining capacity.

The pendingTokens atomic counter tracks the total token cost of all queued tickets, used by estimateWait() for admission timeout: ceil(pendingTokens / capacity) * windowSeconds.

Overflow Modes

Mode	Behavior
`reject`	Return 429 immediately when queue is full (default)
`block`	Retry pushing until space opens or the server shuts down

Events

Endpoints optionally emit Event structs to a buffered channel for telemetry:

Event	When
`EventQueued`	Ticket successfully pushed to queue
`EventServed`	Ticket released and response sent
`EventRejected`	Request rejected (queue full or admission timeout)

Configure with WithEventSink(ch):

ch := make(chan Event, 100)
ep, _ := endpoint.New(cfg, endpoint.WithEventSink(ch))

Events are non-blocking: if the channel is full, the event is dropped silently.

Registry

Registry maps URL paths to endpoints. When a request arrives for an unconfigured path, a dynamic endpoint is created on the fly with its own independent queue and limiter, inheriting configuration from the nearest configured ancestor.

reg, _ := endpoint.NewRegistryWithOpts(configs, []endpoint.RegistryOption{
    endpoint.WithMaxDynamic(1000),
}, endpoint.WithEventSink(ch))

ep, ok := reg.Match("/api/v2/users")  // creates dynamic endpoint inheriting from "/api"

Dynamic endpoint creation

When Match() is called with a path that has no exact match in the registry:

Walk parent paths: /api/v2/users → /api/v2 → /api → /
Find the nearest registered ancestor
Create a new endpoint with config.InheritFrom() — zero-value fields in the child are filled from the parent
Register it in the map with Dynamic: true

The dynamic endpoint gets its own queue and limiter — it does not share the parent's. This gives per-path visibility and independent statistics.

Dynamic endpoints persist until server restart. They appear in Snapshot(), QueueDepths(), TUI, and attach mode.

DoS protection

Dynamic creation is capped by max_dynamic_endpoints (default 1000). Once the cap is reached, unconfigured paths fall back to the nearest configured parent's endpoint instead of creating a new one.

defaults:
  max_dynamic_endpoints: 1000   # cap on dynamically created endpoints

Snapshot

Snapshot() returns all endpoints (configured and dynamic) sorted by path:

type EndpointInfo struct {
    Config   config.EndpointConfig
    QueueLen int
}

infos := reg.Snapshot()  // thread-safe, sorted by path

Concurrency

The registry uses sync.RWMutex:

Match() fast path (exact hit): RLock only
Match() slow path (dynamic creation): releases RLock, acquires Lock, double-checks
Snapshot(), QueueDepths(): RLock
StopAll(): Lock

Configuration Reference

defaults:
  max_dynamic_endpoints: 1000  # cap on dynamically created endpoints (default 1000)

endpoints:
  - path: "/api"
    rate: 10                  # requests per unit
    unit: rps                 # rps | rpm
    scheduler: fifo           # fifo | lifo | priority | random
    algorithm: strict         # strict | token_bucket | sliding_window | token_window
    max_queue_size: 500       # max tickets in queue
    overflow: reject          # reject | block
    burst_size: 20            # token_bucket only
    window_seconds: 60        # sliding_window and token_window
    queue_timeout: 3          # admission timeout in seconds (0 = disabled)
    tokens_per_window: 10000  # token_window only: token budget per window
    default_tokens: 1         # token_window only: cost when client omits ?tokens=

Config inheritance

Dynamic endpoints inherit all zero-value fields from their nearest configured ancestor. The InheritFrom(child, parent) function fills these fields: Rate, Unit, Scheduler, Algorithm, MaxQueueSize, Overflow, BurstSize, WindowSeconds, QueueTimeout, TokensPerWindow, DefaultTokens. The child's Path and Dynamic flag are always preserved.

Response Format

Every response includes the full resolved configuration. For dynamic endpoints, inherited values are shown as if explicitly configured:

{
  "ok": true,
  "endpoint": "/api/v2/users",
  "queued_for_ms": 347,
  "queue_depth": 2,
  "rate": 10,
  "unit": "rps",
  "scheduler": "fifo",
  "algorithm": "strict",
  "max_queue_size": 500,
  "overflow": "reject",
  "burst_size": 20,
  "queue_timeout": 3,
  "dynamic": true
}

Fields with zero values (burst_size, window_seconds, queue_timeout, dynamic) are omitted from JSON output.

For token_window endpoints, the response includes additional fields:

{
  "ok": true,
  "endpoint": "/llm",
  "queued_for_ms": 0,
  "algorithm": "token_window",
  "tokens_consumed": 500,
  "tokens_remaining": 9500,
  "window_capacity": 10000,
  "waiting_for_next_window": 3
}

Field	Description
`tokens_consumed`	Token cost charged for this request
`tokens_remaining`	Tokens left in the current window
`window_capacity`	Total token budget per window
`waiting_for_next_window`	Number of requests still queued (deferred to future windows)

Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
endpoint.go		endpoint.go
endpoint_test.go		endpoint_test.go
events.go		events.go
events_test.go		events_test.go
registry.go		registry.go
registry_test.go		registry_test.go
response.go		response.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

endpoint

Architecture

Request lifecycle (standard algorithms)

Request lifecycle (token_window)

Admission Timeout

Token Window Dispatch

Overflow Modes

Events

Registry

Dynamic endpoint creation

DoS protection

Snapshot

Concurrency

Configuration Reference

Config inheritance

Response Format

FilesExpand file tree

endpoint

Directory actions

More options

Directory actions

More options

Latest commit

History

endpoint

Folders and files

parent directory

README.md

endpoint

Architecture

Request lifecycle (standard algorithms)

Request lifecycle (token_window)

Admission Timeout

Token Window Dispatch

Overflow Modes

Events

Registry

Dynamic endpoint creation

DoS protection

Snapshot

Concurrency

Configuration Reference

Config inheritance

Response Format