Commit b38e57d

refactor: unify resilience controls (#1449)
Integrated into release/v3.7.0
1 parent 9311f2a commit b38e57d

68 files changed

Lines changed: 3734 additions & 3788 deletions

README.md

Lines changed: 35 additions & 33 deletions
Original file line number | Diff line number | Diff line change
@@ -324,12 +324,13 @@ AI providers can become unstable, return 5xx errors, or hit temporary rate limit
324324

325325
**How OmniRoute solves it:**
326326

327-
- **Settings-Driven Lock Hierarchy** — Provider profiles control default account/model lockouts, global model quarantine, and provider circuit breakers from one control surface, while explicit upstream `Retry-After` windows still take priority
328-
- **Exponential Backoff** — Progressive retry delays for both account/model lockouts and higher-level quarantine
327+
- **Request Queue & Pacing** — Per-connection request buckets smooth bursts before they hit upstream rate caps
328+
- **Connection Cooldown** — A single connection cools down after retryable failures with optional upstream `Retry-After` hints and exponential backoff
329+
- **Provider Circuit Breaker** — The provider only trips after fallback is exhausted and the provider request still fails with provider-wide transient errors; connection-scoped `429` rate limits stay in Connection Cooldown
330+
- **Wait For Cooldown** — The server can wait for the earliest connection cooldown to expire and retry the same client request automatically
329331
- **Anti-Thundering Herd** — Mutex + semaphore protection against concurrent retry storms
330332
- **Combo Fallback Chains** — If the primary provider fails, automatically falls through the chain with no intervention
331-
- **Combo Circuit Breaker** — Auto-disables failing providers within a combo chain
332-
- **Health Dashboard** — Uptime monitoring, circuit breaker states, lockouts, cache stats, p50/p95/p99 latency
333+
- **Health Dashboard** — Uptime monitoring, provider circuit breaker states, cooldowns, cache stats, p50/p95/p99 latency
333334

334335
</details>
335336
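
The Connection Cooldown bullet in this hunk combines exponential backoff with an optional upstream `Retry-After` hint. A minimal sketch of how such a delay might be computed is shown below. It is not OmniRoute's implementation: `cooldownMs`, `baseMs`, and `maxMs` are hypothetical names, the default values are assumptions, and the rule that a valid hint overrides the computed backoff is carried over from the removed "Lock Hierarchy" bullet rather than confirmed by the new text.

```typescript
// Hypothetical helper; names and defaults are illustrative assumptions.
interface CooldownOptions {
  baseMs: number; // delay after the first retryable failure
  maxMs: number; // upper bound applied to every cooldown
}

function cooldownMs(
  failureCount: number,
  retryAfterSec: number | undefined,
  opts: CooldownOptions = { baseMs: 1000, maxMs: 60000 },
): number {
  // Assumption: a valid upstream Retry-After hint wins over computed backoff.
  if (retryAfterSec !== undefined && Number.isFinite(retryAfterSec) && retryAfterSec > 0) {
    return Math.min(retryAfterSec * 1000, opts.maxMs);
  }
  // Otherwise double the delay per consecutive failure: base, 2x, 4x, ...
  const exponent = Math.max(0, failureCount - 1);
  return Math.min(opts.baseMs * 2 ** exponent, opts.maxMs);
}
```

Capping both branches at `maxMs` is a design choice in this sketch so that a very large upstream hint cannot park a connection indefinitely.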

@@ -470,7 +471,7 @@ As request volume grows, without caching the same questions generate duplicate c
470471
- **Semantic Cache** — Two-tier cache (signature + semantic) reduces cost and latency
471472
- **Request Idempotency** — 5s deduplication window for identical requests
472473
- **Rate Limit Detection** — Per-provider RPM, min gap, and max concurrent tracking
473-
- **Editable Rate Limits** — Configurable defaults in Settings → Resilience with persistence
474+
- **Request Queue & Pacing** — Configurable queue, pacing, and concurrency defaults in Settings → Resilience
474475
- **API Key Validation Cache** — 3-tier cache for production performance
475476
- **Health Dashboard with Telemetry** — p50/p95/p99 latency, cache stats, uptime
476477
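
The rate-limit bullets in this hunk mention RPM, minimum gap, and maximum concurrency. One way those three limits could combine into a single pacing decision is sketched below. Every name here (`PacingLimits`, `BucketState`, `delayBeforeStart`) is hypothetical, and the rolling 60-second window is an assumption about how RPM is measured.

```typescript
// Hypothetical pacing sketch; not taken from the OmniRoute source.
interface PacingLimits {
  rpm: number; // max request starts per rolling 60 s window
  minGapMs: number; // minimum spacing between consecutive starts
  maxConcurrent: number; // max in-flight requests
}

interface BucketState {
  startTimes: number[]; // ms timestamps of recent starts, ascending
  inFlight: number;
}

// Returns how long to wait (ms) before the next request may start,
// or null when the concurrency limit is reached and the caller must
// wait for an in-flight request to finish instead.
function delayBeforeStart(now: number, state: BucketState, limits: PacingLimits): number | null {
  if (state.inFlight >= limits.maxConcurrent) return null;
  let earliest = now;
  if (state.startTimes.length > 0) {
    const last = state.startTimes[state.startTimes.length - 1];
    earliest = Math.max(earliest, last + limits.minGapMs);
  }
  const recent = state.startTimes.filter((t) => t > now - 60000);
  if (recent.length >= limits.rpm) {
    // A slot frees up once the oldest of the last `rpm` starts leaves the window.
    const blocking = recent[recent.length - limits.rpm];
    earliest = Math.max(earliest, blocking + 60000);
  }
  return earliest - now;
}
```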

@@ -567,8 +568,8 @@ Teams need quick runtime changes during incidents or cost events.
567568
**How OmniRoute solves it:**
568569

569570
- Switch combo activation directly from MCP dashboard
570-
- Apply resilience profiles from pre-defined policy packs
571-
- Reset circuit breaker state from the same operations panel
571+
- Tune queue, cooldown, breaker, and wait settings from the dedicated Resilience page
572+
- Review live provider breaker state from the Health dashboard
572573

573574
</details>
574575

@@ -1352,7 +1353,7 @@ OmniRoute v3.6 is built as an operational platform, not just a relay proxy.
13521353
| 🔢 **Hybrid Token Counting** | Uses provider-side `/messages/count_tokens` when available; falls back to estimation — accurate usage tracking without guessing |
13531354
| 🌱 **Model Alias Auto-Seed** | 30+ cross-proxy dialect aliases normalised at startup — no more routing mismatches |
13541355
| 🛡️ **Safe Outbound Fetch** | All provider validation and model discovery go through a guarded fetch layer blocking private/local URLs with retry, timeout, and SSRF protection |
1355-
| 🔄 **Cooldown-Aware Retries** | Chat requests auto-retry on model-scoped cooldowns with configurable `requestRetry` and `maxRetryIntervalSec` |
1356+
| **Wait For Cooldown** | Server-side chat retries when every candidate connection is cooling down; configurable `enabled`, `maxRetries`, and `maxRetryWaitSec` |
13561357
| 🔍 **Runtime Env Validation** | Startup validates all env vars with Zod schemas — clear errors for missing secrets, invalid URLs, or wrong types |
13571358
| 📋 **Compliance Audit Expansion** | Structured audit logs with pagination, request context, auth events, provider CRUD events, and SSRF-blocked validation logging |
13581359
| 🔐 **TPS Log Metric** | Log details modal shows Tokens Per Second (TPS) — quick performance at-a-glance for every request |
@@ -1406,7 +1407,7 @@ OmniRoute v3.6 is built as an operational platform, not just a relay proxy.
14061407
| 📡 **A2A Task Lifecycle Management** | List/filter tasks, inspect events/artifacts, cancel running tasks |
14071408
| 📋 **Agent Card Discovery** | `/.well-known/agent.json` for client auto-discovery |
14081409
| 🧪 **Protocol E2E Test Harness** | Real MCP SDK + A2A client flows in `test:protocols:e2e` |
1409-
| ⚙️ **Operational Controls** | Switch combo, apply resilience profiles, reset breakers from one control surface |
1410+
| ⚙️ **Operational Controls** | Switch combos, tune resilience settings, and review breaker state from dedicated Health and Settings surfaces |
14101411
14111412
### 🧠 Routing & Intelligence
14121413
@@ -1446,29 +1447,30 @@ OmniRoute v3.6 is built as an operational platform, not just a relay proxy.
14461447
14471448
### 🛡️ Resilience, Security & Governance
14481449
1449-
| Feature | What It Does |
1450-
| ----------------------------------- | --------------------------------------------------------------------------------------- |
1451-
| 🔌 **Circuit Breakers** | Per-provider and per-model trip/recover with 10-minute cooldowns |
1452-
| 🔒 **Daily Quota Lock** 🆕 | Detects exhaustion signals and locks routing for the specific model until midnight |
1453-
| 🎯 **Endpoint-Aware Models** | Custom models declare supported endpoints + API format |
1454-
| 🛡️ **Anti-Thundering Herd** | Mutex + semaphore protections on retry/rate events |
1455-
| 🧠 **Semantic + Signature Cache** | Cost/latency reduction with two cache layers |
1456-
| ⚡ **Request Idempotency** | Duplicate protection window |
1457-
| 🔒 **TLS Fingerprint Spoofing** | Browser-like TLS fingerprint — **reduces bot detection and account flagging** |
1458-
| 🔏 **CLI Fingerprint Matching** | Matches native CLI request signatures — **reduces ban risk while preserving proxy IP** |
1459-
| 🌐 **IP Filtering** | Allowlist/blocklist control for exposed deployments |
1460-
| 📊 **Editable Rate Limits** | Configurable global/provider-level limits with persistence |
1461-
| 📉 **Graceful Degradation** | Multi-layer capability fallbacks protecting core gateway operations |
1462-
| 📜 **Config Audit Trail** | Diff-based change tracking preventing operational drift with simple rollbacks |
1463-
| ⏳ **Provider Health Sync** | Proactive token expiration monitoring triggering alerts before authorization failures |
1464-
| 🚪 **Auto-Disable Banned Accounts** | Operational circuit breaker sealing permanently blocked token accounts automatically |
1465-
| 🔑 **API Key Management + Scoping** | Secure key issuance/rotation and model/provider controls |
1466-
| 👁️ **Scoped API Key Reveal** 🆕 | Opt-in recovery of API keys via `ALLOW_API_KEY_REVEAL` |
1467-
| 🛡️ **Protected `/models`** | Optional auth gating and provider hiding for model catalog |
1468-
| 🛡️ **Safe Outbound Fetch** 🆕 | Guarded fetch for provider calls — blocks private/local URLs, retries, SSRF protection |
1469-
| 🔄 **Cooldown-Aware Retries** 🆕 | Auto-retry chat on model cooldowns; configurable `requestRetry` / `maxRetryIntervalSec` |
1470-
| 🔍 **Runtime Env Validation** 🆕 | Zod-based env schema validation at startup with actionable error messages |
1471-
| 📋 **Compliance Audit v2** 🆕 | Pagination, request context, auth events, provider CRUD, and SSRF-blocked logging |
1450+
| Feature | What It Does |
1451+
| ----------------------------------- | ------------------------------------------------------------------------------------------------------- |
1452+
| 🔌 **Provider Circuit Breakers** | Provider-wide trip/recover after fallback exhaustion with configurable thresholds |
1453+
| 🔒 **Daily Quota Lock** 🆕 | Detects exhaustion signals and locks routing for the specific model until midnight |
1454+
| 🎯 **Endpoint-Aware Models** | Custom models declare supported endpoints + API format |
1455+
| 🛡️ **Anti-Thundering Herd** | Mutex + semaphore protections on retry/rate events |
1456+
| 🧠 **Semantic + Signature Cache** | Cost/latency reduction with two cache layers |
1457+
| ⚡ **Request Idempotency** | Duplicate protection window |
1458+
| 🔒 **TLS Fingerprint Spoofing** | Browser-like TLS fingerprint — **reduces bot detection and account flagging** |
1459+
| 🔏 **CLI Fingerprint Matching** | Matches native CLI request signatures — **reduces ban risk while preserving proxy IP** |
1460+
| 🌐 **IP Filtering** | Allowlist/blocklist control for exposed deployments |
1461+
| 🚦 **Request Queue & Pacing** | Configurable per-connection request buckets for RPM, spacing, concurrency, and max wait |
1462+
| 📉 **Graceful Degradation** | Multi-layer capability fallbacks protecting core gateway operations |
1463+
| 📜 **Config Audit Trail** | Diff-based change tracking preventing operational drift with simple rollbacks |
1464+
| ⏳ **Provider Health Sync** | Proactive token expiration monitoring triggering alerts before authorization failures |
1465+
| ❄️ **Connection Cooldown** | Retryable 408/429/5xx failures cool down a single connection with optional upstream hints |
1466+
| 🚪 **Auto-Disable Banned Accounts** | Permanently blocked token accounts can be disabled automatically |
1467+
| 🔑 **API Key Management + Scoping** | Secure key issuance/rotation and model/provider controls |
1468+
| 👁️ **Scoped API Key Reveal** 🆕 | Opt-in recovery of API keys via `ALLOW_API_KEY_REVEAL` |
1469+
| 🛡️ **Protected `/models`** | Optional auth gating and provider hiding for model catalog |
1470+
| 🛡️ **Safe Outbound Fetch** 🆕 | Guarded fetch for provider calls — blocks private/local URLs, retries, SSRF protection |
1471+
| ⏳ **Wait For Cooldown** 🆕 | Auto-retry chat after connection cooldowns; configurable `enabled`, `maxRetries`, and `maxRetryWaitSec` |
1472+
| 🔍 **Runtime Env Validation** 🆕 | Zod-based env schema validation at startup with actionable error messages |
1473+
| 📋 **Compliance Audit v2** 🆕 | Pagination, request context, auth events, provider CRUD, and SSRF-blocked logging |
14721474
14731475
### 📊 Observability & Analytics
14741476
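
The Wait For Cooldown row above names three settings: `enabled`, `maxRetries`, and `maxRetryWaitSec`. The sketch below shows one plausible reading of how they could gate a server-side retry. The setting names come from the diff, but their exact semantics, the `nextWaitMs` helper, and its parameters are assumptions made for illustration.

```typescript
// Setting names are from the diff; their semantics here are assumed.
interface WaitForCooldownSettings {
  enabled: boolean;
  maxRetries: number;
  maxRetryWaitSec: number;
}

// Returns the wait (ms) before retrying the same client request,
// or null when the request should fail without waiting.
function nextWaitMs(
  now: number,
  cooldownUntil: number[], // expiry timestamps (ms) of all candidate connections
  attempt: number, // server-side retries already made for this request
  settings: WaitForCooldownSettings,
): number | null {
  if (!settings.enabled || attempt >= settings.maxRetries) return null;
  if (cooldownUntil.length === 0) return null;
  // Wait only for the earliest cooldown to expire, never for all of them.
  const earliest = Math.min(...cooldownUntil);
  const waitMs = Math.max(0, earliest - now);
  return waitMs <= settings.maxRetryWaitSec * 1000 ? waitMs : null;
}
```

Returning `null` when the earliest expiry is beyond `maxRetryWaitSec` keeps the client from hanging on a long cooldown; whether the real server behaves this way is not stated in the diff.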
@@ -2283,7 +2285,7 @@ OmniRoute has **218+ features planned** across multiple development phases. Here
22832285
| 🧠 **Routing & Intelligence** | 25+ | Lowest-latency routing, tag-based routing, quota preflight, quota-aware P2C, step-based combo routing |
22842286
| 🔒 **Security & Compliance** | 20+ | SSRF hardening, credential cloaking, rate-limit per endpoint, management key scoping |
22852287
| 📊 **Observability** | 15+ | OpenTelemetry integration, real-time quota monitoring, combo target health, cost tracking per model |
2286-
| 🔄 **Provider Integrations** | 20+ | Dynamic model registry, provider cooldowns, multi-account Codex, Copilot quota parsing |
2288+
| 🔄 **Provider Integrations** | 20+ | Dynamic model registry, connection cooldowns, multi-account Codex, Copilot quota parsing |
22872289
| ⚡ **Performance** | 15+ | Dual cache layer, prompt cache, response cache, streaming keepalive, batch API |
22882290
| 🌐 **Ecosystem** | 10+ | WebSocket API, config hot-reload, distributed config store, commercial mode |
22892291

docs/API_REFERENCE.md

Lines changed: 6 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -284,12 +284,12 @@ GET response includes `agents[]` (id, name, binary, version, installed, protocol
284284

285285
### Resilience & Rate Limits
286286

287-
| Endpoint | Method | Description |
288-
| ----------------------- | --------- | ------------------------------- |
289-
| `/api/resilience` | GET/PATCH | Get/update resilience profiles |
290-
| `/api/resilience/reset` | POST | Reset circuit breakers |
291-
| `/api/rate-limits` | GET | Per-account rate limit status |
292-
| `/api/rate-limit` | GET | Global rate limit configuration |
287+
| Endpoint | Method | Description |
288+
| ----------------------- | --------- | ---------------------------------------------------------------------------------- |
289+
| `/api/resilience` | GET/PATCH | Get/update request queue, connection cooldown, provider breaker, and wait settings |
290+
| `/api/resilience/reset` | POST | Reset provider circuit breakers |
291+
| `/api/rate-limits` | GET | Per-account rate limit status |
292+
| `/api/rate-limit` | GET | Global rate limit configuration |
293293
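
For readers unfamiliar with what "reset provider circuit breakers" implies, a generic circuit breaker can be sketched as below. This is the textbook pattern under assumed names (`ProviderBreaker`, `failureThreshold`, `resetTimeoutMs`), not the code behind `/api/resilience/reset`. Per the README changes in this commit, only provider-wide failures after fallback exhaustion would be recorded; connection-scoped `429` responses would not.

```typescript
// Generic circuit breaker sketch; names and thresholds are hypothetical.
type BreakerState = "closed" | "open" | "half-open";

class ProviderBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;

  constructor(failureThreshold: number, resetTimeoutMs: number) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
  }

  state(now: number): BreakerState {
    if (this.openedAt === null) return "closed";
    // After the timeout, allow a probe request (half-open).
    return now - this.openedAt >= this.resetTimeoutMs ? "half-open" : "open";
  }

  // Record only provider-wide transient failures, not per-connection 429s.
  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) this.openedAt = now;
  }

  recordSuccess(): void {
    this.reset();
  }

  // Manual reset, analogous in spirit to an operator-triggered reset.
  reset(): void {
    this.failures = 0;
    this.openedAt = null;
  }
}
```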

294294
### Evals
295295

@@ -443,25 +443,6 @@ Content-Type: application/json
443443
}
444444
```
445445

446-
---
447-
448-
## Model Availability
449-
450-
```bash
451-
# Get real-time model availability across all providers
452-
GET /api/models/availability
453-
454-
# Check availability for a specific model
455-
POST /api/models/availability
456-
Content-Type: application/json
457-
458-
{
459-
"model": "claude-sonnet-4-5-20250929"
460-
}
461-
```
462-
463-
---
464-
465446
## Request Processing
466447

467448
1. Client sends request to `/v1/*`

docs/ARCHITECTURE.md

Lines changed: 6 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -42,7 +42,7 @@ Core capabilities:
4242
- Circuit breaker pattern for provider resilience
4343
- Anti-thundering herd protection with mutex locking
4444
- Signature-based request deduplication cache
45-
- Domain layer: model availability, cost rules, fallback policy, lockout policy
45+
- Domain layer: cost rules, fallback policy, lockout policy
4646
- Context Relay: session handoff summaries for account rotation continuity
4747
- Domain state persistence (SQLite write-through cache for fallbacks, budgets, lockouts, circuit breakers)
4848
- Policy engine for centralized request evaluation (lockout → budget → fallback)
@@ -51,7 +51,7 @@ Core capabilities:
5151
- Correlation ID (X-Request-Id) for end-to-end tracing
5252
- Compliance audit logging with opt-out per API key
5353
- Eval framework for LLM quality assurance
54-
- Resilience UI dashboard with real-time circuit breaker status
54+
- Health dashboard with real-time provider circuit breaker status
5555
- MCP Server (25 tools) with 3 transports (stdio/SSE/Streamable HTTP)
5656
- A2A Server (JSON-RPC 2.0 + SSE) with skills and task lifecycle
5757
- Memory system (extraction, injection, retrieval, summarization)
@@ -205,10 +205,9 @@ Management domains:
205205
- System prompt: `src/app/api/settings/system-prompt` (GET/PUT)
206206
- Sessions: `src/app/api/sessions` (GET)
207207
- Rate limits: `src/app/api/rate-limits` (GET)
208-
- Resilience: `src/app/api/resilience` (GET/PATCH) — provider profiles, circuit breaker, rate limit state
209-
- Resilience reset: `src/app/api/resilience/reset` (POST) — reset breakers + cooldowns
208+
- Resilience: `src/app/api/resilience` (GET/PATCH) — request queue, connection cooldown, provider breaker, wait-for-cooldown config
209+
- Resilience reset: `src/app/api/resilience/reset` (POST) — reset provider breakers
210210
- Cache stats: `src/app/api/cache/stats` (GET/DELETE)
211-
- Model availability: `src/app/api/models/availability` (GET/POST)
212211
- Telemetry: `src/app/api/telemetry/summary` (GET)
213212
- Budget: `src/app/api/usage/budget` (GET/POST)
214213
- Fallback chains: `src/app/api/fallback/chains` (GET/POST/DELETE)
@@ -265,7 +264,6 @@ Services (business logic):
265264

266265
Domain layer modules:
267266

268-
- Model availability: `src/lib/domain/modelAvailability.ts`
269267
- Cost rules/budgets: `src/lib/domain/costRules.ts`
270268
- Fallback policy: `src/lib/domain/fallbackPolicy.ts`
271269
- Combo resolver: `src/lib/domain/comboResolver.ts`
@@ -794,7 +792,7 @@ legacy compatibility. The current runtime contract uses:
794792

795793
## 1) Account/Provider Availability
796794

797-
- provider account cooldown on transient/rate/auth errors
795+
- connection cooldown on retryable upstream failures
798796
- account fallback before failing request
799797
- combo model fallback when current model/provider path is exhausted
800798

@@ -876,7 +874,7 @@ Environment variables actively used by code:
876874
5. The `open-sse/` directory is published as the `@omniroute/open-sse` **npm workspace package**. Source code imports it via `@omniroute/open-sse/...` (resolved by Next.js `transpilePackages`). File paths in this document still use the directory name `open-sse/` for consistency.
877875
6. Charts in the dashboard use **Recharts** (SVG-based) for accessible, interactive analytics visualizations (model usage bar charts, provider breakdown tables with success rates).
878876
7. E2E tests use **Playwright** (`tests/e2e/`), run via `npm run test:e2e`. Unit tests use **Node.js test runner** (`tests/unit/`), run via `npm run test:unit`. Source code under `src/` is **TypeScript** (`.ts`/`.tsx`); the `open-sse/` workspace remains JavaScript (`.js`).
879-
8. Settings page is organized into 5 tabs: Security, Routing (6 global strategies: fill-first, round-robin, p2c, random, least-used, cost-optimized), Resilience (editable rate limits, circuit breaker, policies, **Context Relay** handoff config), AI (thinking budget, system prompt, prompt cache), Advanced (proxy).
877+
8. Settings page is organized into 7 tabs: General, Appearance, AI, Security, Routing, Resilience, Advanced. The Resilience page only configures request queue, connection cooldown, provider breaker, and wait-for-cooldown behavior; live breaker runtime state is shown on the Health page.
880878
9. **Context Relay** strategy (`context-relay`) is split across two layers: `combo.ts` decides if a handoff should be generated, `chat.ts` injects the handoff after account resolution. Handoff data lives in `context_handoffs` SQLite table. This split is intentional because only `chat.ts` knows whether the actual account changed.
881879
10. **Proxy enforcement** is now comprehensive: `tokenHealthCheck.ts` resolves proxy per connection, `/api/providers/validate` uses `runWithProxyContext`, and `proxyFetch.ts` uses `undici.fetch()` to maintain dispatcher compatibility on Node 22.
882880
11. **Node.js runtime policy detection**: `/api/settings/require-login` returns `nodeVersion` and `nodeCompatible` fields. The login page renders a warning banner when the runtime falls outside the supported secure Node.js lines.
