Thanks to visit codestin.com
Credit goes to github.com

Skip to content

release: v0.0.18#67

Merged
akhatua2 merged 1 commit into
mainfrom
release/v0.0.18
May 26, 2026
Merged

release: v0.0.18#67
akhatua2 merged 1 commit into
mainfrom
release/v0.0.18

Conversation

@akhatua2
Copy link
Copy Markdown
Collaborator

Summary

Delete cooperbench._proxy and the --openai-base-url / --openai-model flags. vLLM v0.17.1+ exposes /v1/messages natively, so claude-code can be pointed straight at a vLLM endpoint with --base-url — no LiteLLM, no proxy subprocess, no extras.

Why

The proxy was added because we assumed an OpenAI-compatible vLLM needed an Anthropic↔OpenAI translation layer for the claude-code CLI. That was wrong — vLLM has implemented the Anthropic Messages API natively since 0.17.1, at the same /v1/messages path. Removing the LiteLLM proxy also removes a class of bugs from LiteLLM version drift:

  • /v1/responses auto-rewrite on litellm>=1.82 when the inbound request has thinking={"type":"enabled"} (claude-code 2.1.x sends it by default). vLLM 0.19.0 then rejects the body with hundreds of Pydantic validation errors against EasyInputMessageParam / ResponseOutputMessageParam / ResponseFunctionToolCallParam.
  • litellm_params.stream: false honored on openai/ provider but silently ignored on hosted_vllm/, with no way to force non-streaming upstream in newer versions.
  • Intermittent API Error: Content block not found from vLLM's streaming tool-call extractor forwarding content_block_delta events without first emitting the corresponding content_block_start.

What changed

Before (0.0.17) After (0.0.18)
src/cooperbench/_proxy.py 186 lines spawning LiteLLM deleted
cli.py flags --openai-base-url, --openai-model, --base-url, --auth-token just --base-url, --auth-token
Runtime deps requires litellm[proxy] none
User invocation --openai-base-url https://...modal.run/v1 --openai-model Qwen/Qwen3.5-9B --base-url https://...modal.run
Path claude-code → LiteLLM-subprocess → vLLM-OpenAI claude-code → vLLM-Anthropic

Net diff: -251 lines across 5 files.

Verified

On the same anyhow_task/390 [1,2] pair that errored under 0.0.17's --openai-base-url path (litellm 1.83.0 + openai 2.24.0 in CooperData's env):

  • 0 Content block not found
  • 0 212 validation errors
  • 0 /v1/responses references in any trajectory
  • agent1: 29 turns, 34-line Rust macro patch (raw-pointer support for ensure!), Submitted
  • agent2: 126 turns of real work (7 Edits + 98 Greps + 10 Reads + 10 Bash) before hitting Qwen's 128K context ceiling — model-side limit (no context_window cap in profile right now), can address in 0.0.19 by re-adding context_window: 120000 so claude-code auto-compacts before exhausting the upstream.

Test plan

  • uv run ruff check src/cooperbench/
  • uv run ruff format --check src/cooperbench/
  • uv run python -m mypy src/cooperbench/
  • uv run python -m pytest tests/ -v --tb=short (385 passed, 63 skipped)
  • End-to-end coop run via direct vLLM endpoint on CooperData's anyhow_task/390 [1,2]: 0 plumbing errors, real Rust patches produced.

🤖 Generated with Claude Code

Delete cooperbench._proxy and the --openai-base-url / --openai-model
flags.  vLLM v0.17.1+ exposes /v1/messages natively, so claude-code can
be pointed straight at a vLLM endpoint with --base-url — no LiteLLM,
no proxy subprocess, no extras.

Removing the auto-spawned LiteLLM also removes a class of bugs we kept
hitting from LiteLLM version drift:

- /v1/responses auto-rewrite on litellm>=1.82 when the inbound request
  has thinking={"type":"enabled"} (claude-code 2.1.x sends it by
  default).  vLLM 0.19.0 then rejected the body with hundreds of
  Pydantic validation errors against EasyInputMessageParam /
  ResponseOutputMessageParam / ResponseFunctionToolCallParam.
- litellm_params.stream: false being honored on the openai/ provider
  but silently ignored on hosted_vllm/, with no way to force
  non-streaming upstream in newer versions.
- Intermittent "API Error: Content block not found" from vLLM's
  streaming tool_call extractor forwarding content_block_delta events
  without first emitting the corresponding content_block_start.

After deletion: --base-url alone forwards ANTHROPIC_BASE_URL into the
container; the adapter rewrites localhost -> host.docker.internal,
adds --add-host, injects a placeholder auth token, and writes
settings.json with CLAUDE_CODE_ATTRIBUTION_HEADER=0 (KV-cache fix).
Real Anthropic runs (no --base-url) are unaffected.

Validated end-to-end on the same anyhow_task/390 [1,2] pair that
errored in 0.0.17:
- 0 "Content block not found"
- 0 "212 validation errors"
- 0 "/v1/responses" references in any trajectory
- agent1: 29 turns, 34-line Rust macro patch, Submitted
- agent2: 126 turns of real work (7 Edits + 98 Greps + 10 Reads + 10
  Bash) before hitting Qwen's 128K context ceiling (input 126977 +
  4096 output > 131072) — model-side context-management limit, not a
  plumbing bug; can be addressed with context_window add-back to the
  profile in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@akhatua2 akhatua2 merged commit 8780541 into main May 26, 2026
3 checks passed
@akhatua2 akhatua2 deleted the release/v0.0.18 branch May 26, 2026 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant