feat(chat): web_search tool via SearXNG-compatible backend with content extraction#72
Open
agpituk wants to merge 2 commits into
Adds an opt-in `web_search` tool to /v1/chat/completions. Mirrors the
SandboxBackend pattern: a thin HTTP client to any service exposing a
SearXNG-compatible /search?format=json endpoint. The bundled default is
a SearXNG container under a new `web-search` compose profile.
Request shape matches OpenAI's server-managed tool plus Anthropic's
versioned variants (matched by prefix, so future dated versions keep
working):
{"tools": [{"type": "web_search"}]} # gateway-native / OpenAI
{"tools": [{"type": "web_search_20250305"}]} # Anthropic
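A minimal sketch of how prefix matching on the tool `type` could look. The function and constant names here are hypothetical and not taken from the PR, and the sketch assumes a dated variant is always `web_search` followed by an underscore suffix; the gateway's actual helper may be looser or stricter.

```python
from typing import Any, Optional

# Hypothetical constant; the PR only says entries are "matched by prefix".
WEB_SEARCH_PREFIX = "web_search"


def extract_web_search_tools(
    tools: Optional[list[dict[str, Any]]],
) -> list[dict[str, Any]]:
    """Return tool entries whose type is web_search or a dated variant."""
    matched = []
    for tool in tools or []:
        tool_type = tool.get("type")
        if not isinstance(tool_type, str):
            continue
        # Accept the bare name and any "web_search_<suffix>" variant, but
        # not unrelated types that merely start with the same letters.
        if tool_type == WEB_SEARCH_PREFIX or tool_type.startswith(
            WEB_SEARCH_PREFIX + "_"
        ):
            matched.append(tool)
    return matched
```

With this shape, a future dated type such as `web_search_20991231` would be picked up without a code change, while a plain `function` tool is ignored.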
Top results are fetched and run through trafilatura to produce LLM-ready
Markdown. Backends that pre-extract content can pass it through the
optional `extracted_content` response field to bypass the gateway-side
trafilatura step.
Security:
- New `validate_outbound_fetch_url()` blocks private / loopback /
link-local / reserved IPs (async DNS so the event loop isn't blocked
under fan-out).
- Manual bounded redirect walk re-validates every hop, preventing the
classic 302-to-cloud-metadata SSRF bypass.
- 5 MB per-page byte cap; semaphore-bounded concurrent fetches.
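The first two bullets can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's `validate_outbound_fetch_url()`: the names `check_fetch_url`, `fetch_following_redirects`, `UnsafeURLError` and `MAX_REDIRECTS` are hypothetical, and the HTTP client is replaced by an injected `send` callable so the redirect logic can be read in isolation.

```python
import asyncio
import ipaddress
import socket
from typing import Awaitable, Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

MAX_REDIRECTS = 5  # hypothetical bound, not taken from the PR


class UnsafeURLError(ValueError):
    """Raised when a URL must not be fetched by the gateway."""


def is_public_ip(address: str) -> bool:
    ip = ipaddress.ip_address(address)
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


async def check_fetch_url(url: str) -> None:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise UnsafeURLError(f"unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    loop = asyncio.get_running_loop()
    # loop.getaddrinfo resolves in a worker thread, so the event loop
    # stays free while many result pages are validated concurrently.
    infos = await loop.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    for _family, _type, _proto, _canon, sockaddr in infos:
        if not is_public_ip(sockaddr[0]):
            raise UnsafeURLError(f"{parts.hostname} resolves to {sockaddr[0]}")


# send(url) returns (status, Location header or None, body) and must NOT
# follow redirects itself; a real client call would go here.
Send = Callable[[str], Awaitable[Tuple[int, Optional[str], bytes]]]


async def fetch_following_redirects(url: str, send: Send) -> bytes:
    for _ in range(MAX_REDIRECTS + 1):
        await check_fetch_url(url)  # re-validate every hop, not just the first
        status, location, body = await send(url)
        if status in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)
            continue
        return body
    raise UnsafeURLError("too many redirects")
```

Because the check runs inside the loop, a public URL that answers with a 302 to `169.254.169.254` is rejected on the second hop instead of being followed.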
Engine defaults: duckduckgo, mojeek, qwant, wikipedia. Google, Bing,
Yahoo and Brave are explicitly disabled in scripts/searxng/settings.yml;
operators who enable them should review the upstream Terms of Service.
For commercial or production use, swap the bundled SearXNG container
for any service that wraps a licensed API (Tavily, Brave Search API,
Exa, Linkup, Serper) using the same wire protocol — no gateway code
change needed.
Mutually exclusive with code_execution and mcp_servers for now;
multi-attempt routing-policy fallback collapses to the primary attempt
when web_search is active, matching the sandbox path.
Demo at demo/web-search/ mirrors demo/code-exec/, with multi-provider
walkthrough (anthropic/openai/llamafile) and an architecture diagram.
Fixes a few pre-existing papercuts in demo/code-exec/ along the way:
hardcoded container names broke after the repo dir rename, stop.sh
required .env even for tear-down, and demo user creation was missing.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Pull request overview
Adds an opt-in, gateway-managed web_search tool for /v1/chat/completions, backed by a SearXNG-compatible /search?format=json service and optional per-result content extraction (via trafilatura). This extends the existing “managed tools” approach (similar to the sandbox/code-exec path) so any routed model can use web search without client-side tool execution.
Changes:
- Introduces `WebSearchBackend` (search + optional fetch/extract + SSRF protections) and wires it into the chat tool loop dispatch path (streaming + non-streaming).
- Adds outbound-fetch URL safety checks (`validate_outbound_fetch_url`) with async DNS resolution and redirect re-validation to mitigate SSRF.
- Adds unit tests, docs, docker-compose profile + demo scripts for bringing up a bundled SearXNG backend.
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/gateway/services/web_search_backend.py` | Implements the web_search managed-tool backend, including fetch/extract and redirect handling. |
| `src/gateway/services/url_safety.py` | Adds async outbound-fetch URL validation for SSRF defense in web-search fetches. |
| `src/gateway/api/routes/chat.py` | Detects web_search tool entries, enforces mutual exclusivity constraints, and dispatches via the existing tool loop. |
| `tests/unit/test_web_search_backend.py` | New unit tests covering formatting, extraction behavior, SSRF protections, redirects, and byte cap behavior. |
| `tests/unit/test_chat_request_helpers.py` | Adds coverage for extracting web_search tool entries from request shapes. |
| `pyproject.toml` | Adds trafilatura dependency and mypy override for `trafilatura.*`. |
| `uv.lock` | Updates lockfile for new dependency tree. |
| `docker-compose.yml` | Adds searxng service under web-search profile and gateway env knobs for web search. |
| `scripts/searxng/settings.yml` | Adds a conservative default SearXNG config with explicit engine allow/deny. |
| `README.md` | Documents built-in tools (code_execution, web_search) and how to enable them. |
| `demo/web-search/*` | Adds a full web-search demo workflow (start/stop/ask/demo flow) mirroring code-exec demo patterns. |
| `demo/code-exec/*` | Demo robustness fixes (conditional .env usage, container lookup by compose service, etc.). |
Three issues raised by the Copilot reviewer:

* `_fetch_capped` could overshoot `_FETCH_MAX_BYTES`: the chunk crossing the threshold was appended in full, and `b"".join(...)` briefly held two copies during the final allocation. Under fetch concurrency that put peak memory at ~2x the cap. Switch to a `bytearray` and truncate the overshooting chunk to the remaining budget; decode once from the capped buffer.
* `WebSearchBackend.__init__` now clamps `max_results` to `[1, cap]` so a misconfigured `GATEWAY_WEB_SEARCH_MAX_RESULTS=0` / `-1` can't reach `results[: 0 / -1]` and produce silently-wrong slicing. Env-path in `_build_web_search_backend` also rejects sub-1 values with a warning before clamp.
* `SEARXNG_BASE_URL` in docker-compose now matches the published host port (`http://localhost:8181/`) so ad-hoc curl / browser access lines up with the file's own comment about the host mapping being for debugging.

Tests added: buffer-never-exceeds-cap (with `_FETCH_MAX_BYTES` monkeypatched to 1 KB so it's enforceable), and three parametrized clamp cases for `max_results` plus a regression guard for the existing above-cap clamp.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
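The first two fixes can be illustrated with a small sketch. It is a simplification under stated assumptions: the real `_fetch_capped` presumably reads from an async HTTP stream, whereas this version takes a plain iterable of chunks, and the names `read_capped` and `clamp_max_results` are hypothetical.

```python
from typing import Iterable

FETCH_MAX_BYTES = 5 * 1024 * 1024  # the 5 MB per-page cap from the PR


def read_capped(chunks: Iterable[bytes], cap: int = FETCH_MAX_BYTES) -> str:
    """Accumulate chunks into one buffer that never grows past `cap`."""
    buf = bytearray()
    for chunk in chunks:
        remaining = cap - len(buf)
        if remaining <= 0:
            break
        # Truncate the chunk that crosses the threshold instead of
        # appending it in full and trimming afterwards.
        buf += chunk[:remaining]
    # Decode once, directly from the capped buffer.
    return buf.decode("utf-8", errors="replace")


def clamp_max_results(value: int, cap: int) -> int:
    """Force max_results into [1, cap] so slicing never sees 0 or -1."""
    return max(1, min(value, cap))
```

The clamp matters because `results[:0]` is empty and `results[:-1]` silently drops the last item, neither of which looks like an error at the call site.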
Summary
Adds an opt-in `web_search` tool to `/v1/chat/completions`, dispatched server-side by the gateway so any model, including open-weight ones, gets parity with what frontier APIs expose as a managed search tool. Mirrors the `SandboxBackend` pattern from #65: a thin HTTP client to any service exposing a SearXNG-compatible `/search?format=json` endpoint. The bundled default is a SearXNG container under a new `web-search` compose profile (opt-in via `docker compose --profile web-search up`, same shape as `code-exec`).
What's in the request body
    { "tools": [{"type": "web_search"}] }          // gateway-native / OpenAI server-managed shape
    { "tools": [{"type": "web_search_20250305"}] } // Anthropic, versioned (matched by prefix)
Optional per-tool overrides on the tool entry:
- `max_results`
- `allowed_domains` / `blocked_domains`
- `purpose_hint` (… `GATEWAY_WEB_SEARCH_PURPOSE_HINT` env, which overrides the built-in default)
Operator-controlled env knobs:
`GATEWAY_WEB_SEARCH_URL` (required), `GATEWAY_WEB_SEARCH_ENGINES`, `GATEWAY_WEB_SEARCH_MAX_RESULTS`, `GATEWAY_WEB_SEARCH_EXTRACT`, `GATEWAY_WEB_SEARCH_PURPOSE_HINT`.
Architecture
`WebSearchBackend` duck-types as the MCP tool-use loop's pool (`openai_tools` / `owns_tool` / `purpose_hints` / `call_tool`), so the existing loop in `mcp_loop.py` accepts it as a drop-in with no refactor. Plugs in at the same three dispatch sites as sandbox/MCP (streaming standalone, platform non-streaming, standalone non-streaming).
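To make the duck-typing concrete, here is a sketch of what such a pool interface might look like as a `typing.Protocol`. Only the four member names come from the PR text; every signature and return type below is an assumption, and `FakeWebSearchPool` and `dispatch` are hypothetical stand-ins rather than code from `mcp_loop.py`.

```python
import asyncio
from typing import Any, Protocol


class ToolPool(Protocol):
    """Assumed shape of what the tool-use loop needs from a pool."""

    def openai_tools(self) -> list[dict[str, Any]]: ...
    def owns_tool(self, name: str) -> bool: ...
    def purpose_hints(self) -> list[str]: ...
    async def call_tool(self, name: str, arguments: dict[str, Any]) -> str: ...


class FakeWebSearchPool:
    """Satisfies ToolPool structurally, without inheriting from it."""

    def openai_tools(self) -> list[dict[str, Any]]:
        return [{"type": "function", "function": {"name": "web_search"}}]

    def owns_tool(self, name: str) -> bool:
        return name == "web_search"

    def purpose_hints(self) -> list[str]:
        return ["Use web_search for facts that may have changed recently."]

    async def call_tool(self, name: str, arguments: dict[str, Any]) -> str:
        return f"results for {arguments.get('query', '')!r}"


async def dispatch(pool: ToolPool, name: str, arguments: dict[str, Any]) -> str:
    # The loop only relies on these methods, so any object providing them
    # can be dropped in where an MCP pool was expected.
    if not pool.owns_tool(name):
        raise KeyError(name)
    return await pool.call_tool(name, arguments)
```

A call such as `asyncio.run(dispatch(FakeWebSearchPool(), "web_search", {"query": "searxng"}))` goes through the same path an MCP-backed pool would.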
Top results are fetched and run through `trafilatura` to produce LLM-ready Markdown for the model. Backends that pre-extract content (commercial-API adapters etc.) can pass it through the optional `extracted_content` response field to bypass the gateway-side extraction.
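The bypass described above might be sketched as follows. The `extracted_content` field name comes from the PR; everything else is assumed for illustration: `fetch_html` and `extract` are injected callables standing in for the gateway's fetcher and its trafilatura step, and the fallback to the search snippet in `content` is a guess at reasonable behavior rather than something the PR states.

```python
from typing import Any, Callable, Optional


def content_for_result(
    result: dict[str, Any],
    fetch_html: Callable[[str], Optional[str]],
    extract: Callable[[str], Optional[str]],
) -> str:
    """Pick the text handed to the model for one search result."""
    # 1. A backend that already extracted the page wins outright:
    #    no fetch and no gateway-side extraction.
    pre_extracted = result.get("extracted_content")
    if pre_extracted:
        return pre_extracted
    # 2. Otherwise fetch the page and run the extractor over the HTML.
    html = fetch_html(result["url"])
    extracted = extract(html) if html else None
    # 3. Assumed fallback: the snippet the search backend returned.
    return extracted or result.get("content", "")
```

Keeping the extractor injectable also makes the bypass easy to test: a result carrying `extracted_content` should never trigger a call to `fetch_html`.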
Security
Test coverage: 19 unit tests including SSRF (cloud metadata / RFC1918 / loopback / DNS rebinding / redirect chain), the env-var override path, and the byte cap.
Engine policy
Bundled SearXNG defaults are pinned to engines that don't formally prohibit metasearch: duckduckgo, mojeek, qwant, wikipedia. Google, Bing, Yahoo and Brave are explicitly disabled in `scripts/searxng/settings.yml`, with a "review ToS first" note for operators who enable them.
For commercial or production use, swap the bundled SearXNG container for any service that wraps a licensed API (Tavily, Brave Search API, Exa, Linkup, Serper) using the same SearXNG-compatible wire protocol — `GATEWAY_WEB_SEARCH_URL` is the only thing that changes. No gateway code change needed.
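For anyone writing such an adapter, the wire protocol amounts to a GET on `/search` with `format=json` and a JSON body containing a `results` list. The sketch below shows the client side under a few assumptions: that `GATEWAY_WEB_SEARCH_URL` holds the service base URL, that engines are passed as a comma-separated `engines` parameter, and that each result carries `url`, `title` and `content` as SearXNG's JSON output does. The helper names are hypothetical.

```python
from typing import Any, Optional
from urllib.parse import urlencode


def build_search_url(
    base_url: str, query: str, engines: Optional[list[str]] = None
) -> str:
    """Build a SearXNG-style search URL from a base URL and a query."""
    params = {"q": query, "format": "json"}
    if engines:
        params["engines"] = ",".join(engines)
    return base_url.rstrip("/") + "/search?" + urlencode(params)


def parse_results(payload: dict[str, Any], max_results: int) -> list[dict[str, str]]:
    """Keep the first max_results entries that have a URL."""
    parsed = []
    for item in payload.get("results", [])[:max_results]:
        if not item.get("url"):
            continue
        parsed.append(
            {
                "title": item.get("title", ""),
                "url": item["url"],
                "snippet": item.get("content", ""),
            }
        )
    return parsed
```

An adapter in front of a licensed API would only need to accept the same query parameters and return a body that `parse_results` can read.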
Constraints (v1)
Mutually exclusive with `code_execution` and `mcp_servers` for now (same constraint code-exec has in #65); multi-attempt routing-policy fallback collapses to the primary attempt when `web_search` is active, matching the sandbox path.
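A request-time check for this constraint could look roughly like the sketch below. The request shape is assumed, not confirmed by the PR: it treats `code_execution` as a tool `type` and `mcp_servers` as a top-level field of the request body, and `check_tool_exclusivity` is a hypothetical name.

```python
from typing import Any


def check_tool_exclusivity(body: dict[str, Any]) -> None:
    """Reject requests that combine web_search with other managed tools."""
    tool_types = [
        tool.get("type")
        for tool in body.get("tools") or []
        if isinstance(tool, dict)
    ]
    has_web_search = any(
        isinstance(t, str) and t.startswith("web_search") for t in tool_types
    )
    has_code_exec = any(
        isinstance(t, str) and t.startswith("code_execution") for t in tool_types
    )
    has_mcp = bool(body.get("mcp_servers"))  # assumed top-level field
    if has_web_search and (has_code_exec or has_mcp):
        raise ValueError(
            "web_search cannot be combined with code_execution or mcp_servers"
        )
```

Failing fast at request validation keeps the three dispatch sites simple, since each one can assume at most one managed backend is active.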
Demo
`demo/web-search/` mirrors `demo/code-exec/` — multi-provider walkthrough (anthropic/openai/llamafile), architecture diagram, request-shape comparison, and a "what the LLM actually receives" inspection of the injected system message. Also fixes a few pre-existing papercuts in `demo/code-exec/` along the way: hardcoded container names broke after the repo dir rename, `stop.sh` required `.env` even for tear-down, and demo-user creation was missing.
Deferred follow-ups
Test plan
🤖 Generated with Claude Code