release: v0.0.18#67
Merged
Merged
Conversation
Delete cooperbench._proxy and the --openai-base-url / --openai-model
flags. vLLM v0.17.1+ exposes /v1/messages natively, so claude-code can
be pointed straight at a vLLM endpoint with --base-url — no LiteLLM,
no proxy subprocess, no extras.
Removing the auto-spawned LiteLLM also removes a class of bugs we kept
hitting from LiteLLM version drift:
- /v1/responses auto-rewrite on litellm>=1.82 when the inbound request
has thinking={"type":"enabled"} (claude-code 2.1.x sends it by
default). vLLM 0.19.0 then rejected the body with hundreds of
Pydantic validation errors against EasyInputMessageParam /
ResponseOutputMessageParam / ResponseFunctionToolCallParam.
- litellm_params.stream: false being honored on the openai/ provider
but silently ignored on hosted_vllm/, with no way to force
non-streaming upstream in newer versions.
- Intermittent "API Error: Content block not found" from vLLM's
streaming tool_call extractor forwarding content_block_delta events
without first emitting the corresponding content_block_start.
After deletion: --base-url alone forwards ANTHROPIC_BASE_URL into the
container; the adapter rewrites localhost -> host.docker.internal,
adds --add-host, injects a placeholder auth token, and writes
settings.json with CLAUDE_CODE_ATTRIBUTION_HEADER=0 (KV-cache fix).
Real Anthropic runs (no --base-url) are unaffected.
Validated end-to-end on the same anyhow_task/390 [1,2] pair that
errored in 0.0.17:
- 0 "Content block not found"
- 0 "212 validation errors"
- 0 "/v1/responses" references in any trajectory
- agent1: 29 turns, 34-line Rust macro patch, Submitted
- agent2: 126 turns of real work (7 Edits + 98 Greps + 10 Reads + 10
Bash) before hitting Qwen's 128K context ceiling (input 126977 +
4096 output > 131072) — model-side context-management limit, not a
plumbing bug; can be addressed with context_window add-back to the
profile in a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Delete
cooperbench._proxyand the--openai-base-url/--openai-modelflags. vLLM v0.17.1+ exposes/v1/messagesnatively, soclaude-codecan be pointed straight at a vLLM endpoint with--base-url— no LiteLLM, no proxy subprocess, no extras.Why
The proxy was added because we assumed an OpenAI-compatible vLLM needed an Anthropic↔OpenAI translation layer for the claude-code CLI. That was wrong — vLLM has implemented the Anthropic Messages API natively since 0.17.1, at the same
/v1/messagespath. Removing the LiteLLM proxy also removes a class of bugs from LiteLLM version drift:/v1/responsesauto-rewrite onlitellm>=1.82when the inbound request hasthinking={"type":"enabled"}(claude-code 2.1.x sends it by default). vLLM 0.19.0 then rejects the body with hundreds of Pydantic validation errors againstEasyInputMessageParam/ResponseOutputMessageParam/ResponseFunctionToolCallParam.litellm_params.stream: falsehonored onopenai/provider but silently ignored onhosted_vllm/, with no way to force non-streaming upstream in newer versions.API Error: Content block not foundfrom vLLM's streaming tool-call extractor forwardingcontent_block_deltaevents without first emitting the correspondingcontent_block_start.What changed
src/cooperbench/_proxy.pycli.pyflags--openai-base-url,--openai-model,--base-url,--auth-token--base-url,--auth-tokenlitellm[proxy]--openai-base-url https://...modal.run/v1 --openai-model Qwen/Qwen3.5-9B--base-url https://...modal.runNet diff: -251 lines across 5 files.
Verified
On the same
anyhow_task/390 [1,2]pair that errored under 0.0.17's--openai-base-urlpath (litellm 1.83.0 + openai 2.24.0 in CooperData's env):Content block not found212 validation errors/v1/responsesreferences in any trajectoryensure!), Submittedcontext_windowcap in profile right now), can address in 0.0.19 by re-addingcontext_window: 120000so claude-code auto-compacts before exhausting the upstream.Test plan
uv run ruff check src/cooperbench/uv run ruff format --check src/cooperbench/uv run python -m mypy src/cooperbench/uv run python -m pytest tests/ -v --tb=short(385 passed, 63 skipped)anyhow_task/390 [1,2]: 0 plumbing errors, real Rust patches produced.🤖 Generated with Claude Code