TL;DR
Let anyone plug in their own retriever (including private/proprietary ones) via a simple, secure HTTP contract or a lightweight in-repo adapter. Adding a new retriever requires no code changes. Works with the existing dynamic flow, voting/Elo, and the leaderboard.
Why this matters
- Teams want to compare their private retrievers without exposing IP or forking.
- Maintainers need a safe, scalable way to add lots of retrievers with consistent I/O and limits.
- Unlocks the core value: easy, apples-to-apples RAG evaluations using real user feedback.
What we’re proposing
- Standard plugin contract
  - Input: query, topK, metadata
  - Output: docs[] with text/score/metadata, optional latency
  - Health check endpoint
  - Strict schema validation (e.g., Zod)
- External HTTP Retriever adapter
  - POST standard input to a user’s HTTPS endpoint; expect standard output
  - Timeouts, retries, response-size limits, and per-retriever rate limiting
  - Domain allowlist to prevent SSRF; request logging for auditability
- Config-driven registry
  - Supabase table for retrievers (id, key, type, config, limits, enabled)
  - Seed/migrations included; admin UI to add/edit, health-check, enable/disable
  - Dynamic loading so new retrievers appear without code changes
- Seamless scoring/leaderboard
  - Keep current voting/Elo; associate votes with retriever ids for clean history
- Great DX
  - Example external server (Flask/Worker) + copy-pasteable schema
  - Clear docs: how to register, secure, and troubleshoot
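To make the contract and the allowlist check concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not the proposed implementation: the field name `latency_ms`, the helper names `is_allowed_endpoint` and `parse_response`, and the host `retriever.example.com` are all assumptions, and the real project would likely express the same rules as a Zod schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse

# Hypothetical allowlist entry; real entries would come from the registry.
ALLOWED_HOSTS = {"retriever.example.com"}


@dataclass
class RetrievedDoc:
    text: str
    score: float
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class RetrieverResponse:
    docs: List[RetrievedDoc]
    latency_ms: Optional[float] = None  # optional, per the proposal


def is_allowed_endpoint(url: str) -> bool:
    """Accept only HTTPS URLs whose host is on the allowlist."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so reject it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_response(payload: Any, top_k: int) -> RetrieverResponse:
    """Validate an untrusted JSON payload against the assumed output shape."""
    if not isinstance(payload, dict) or not isinstance(payload.get("docs"), list):
        raise ValueError("payload must be an object with a 'docs' list")
    if len(payload["docs"]) > top_k:
        raise ValueError("more docs returned than requested")
    docs = []
    for item in payload["docs"]:
        if not isinstance(item, dict):
            raise ValueError("each doc must be an object")
        if not isinstance(item.get("text"), str):
            raise ValueError("doc.text must be a string")
        if not _is_number(item.get("score")):
            raise ValueError("doc.score must be a number")
        metadata = item.get("metadata", {})
        if not isinstance(metadata, dict):
            raise ValueError("doc.metadata must be an object")
        docs.append(RetrievedDoc(item["text"], float(item["score"]), metadata))
    latency = payload.get("latency_ms")
    if latency is not None and not _is_number(latency):
        raise ValueError("latency_ms must be a number")
    return RetrieverResponse(docs, latency)
```

A hostname check like this is only one layer: it does not cover an allowlisted name that resolves to a private address, so the adapter would still need the timeouts and response-size limits listed above.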
How it works (at a glance)
- Admin registers a retriever (built-in or HTTP) in the registry.
- The dynamic route loads the enabled retrievers from the registry config.
- Arena calls retrievers in parallel with standard input.
- Requests are rate-limited and logged; responses are validated against the schema.
- Users vote; Elo updates by retriever id.
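The fan-out and Elo steps above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: retrievers are modeled as plain callables (a real HTTP adapter would enforce its own timeout inside the call), `call_all` and `update_elo` are hypothetical names, and the K-factor of 32 is an assumed default, since the issue does not say which value the current Elo implementation uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A retriever takes (query, top_k) and returns a list of doc dicts.
Retriever = Callable[[str, int], List[dict]]


def call_all(retrievers: Dict[str, Retriever], query: str,
             top_k: int) -> Dict[str, List[dict]]:
    """Call every retriever in parallel; skip any that raise."""
    results: Dict[str, List[dict]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(retrievers))) as pool:
        futures = {rid: pool.submit(fn, query, top_k)
                   for rid, fn in retrievers.items()}
        for rid, future in futures.items():
            try:
                results[rid] = future.result()
            except Exception:
                # One failing retriever must not break the whole comparison.
                continue
    return results


def update_elo(ratings: Dict[str, float], winner_id: str, loser_id: str,
               k: float = 32.0) -> None:
    """Apply a standard Elo update in place, keyed by retriever id."""
    r_win, r_lose = ratings[winner_id], ratings[loser_id]
    expected_win = 1.0 / (1.0 + 10 ** ((r_lose - r_win) / 400.0))
    delta = k * (1.0 - expected_win)
    ratings[winner_id] = r_win + delta
    ratings[loser_id] = r_lose - delta
```

Keying both the results and the ratings by retriever id means a retriever can be disabled or reconfigured in the registry without losing its vote history.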
Alternatives considered
- PRs per retriever: doesn’t scale; risks IP leakage.
- Sidecar containers: heavier ops.
- Arbitrary code hooks: security risk.
Impact
- Fast onboarding for proprietary retrievers (minutes, not days).
- Safer, scalable growth for maintainers.
- Richer, more meaningful comparisons for everyone.
Priority
High
Related
- Can I add my retriever api? #10 (this proposal generalizes that request to any custom retriever API)