Codestin Search App

akhatua2 · 2026-05-26T02:33:41Z

Summary

Delete cooperbench._proxy and the --openai-base-url / --openai-model flags. vLLM v0.17.1+ exposes /v1/messages natively, so claude-code can be pointed straight at a vLLM endpoint with --base-url — no LiteLLM, no proxy subprocess, no extras.

Why

The proxy was added because we assumed an OpenAI-compatible vLLM needed an Anthropic↔OpenAI translation layer for the claude-code CLI. That was wrong — vLLM has implemented the Anthropic Messages API natively since 0.17.1, at the same /v1/messages path. Removing the LiteLLM proxy also removes a class of bugs from LiteLLM version drift:

/v1/responses auto-rewrite on litellm>=1.82 when the inbound request has thinking={"type":"enabled"} (claude-code 2.1.x sends it by default). vLLM 0.19.0 then rejects the body with hundreds of Pydantic validation errors against EasyInputMessageParam / ResponseOutputMessageParam / ResponseFunctionToolCallParam.
litellm_params.stream: false honored on openai/ provider but silently ignored on hosted_vllm/, with no way to force non-streaming upstream in newer versions.
Intermittent API Error: Content block not found from vLLM's streaming tool-call extractor forwarding content_block_delta events without first emitting the corresponding content_block_start.

What changed

	Before (0.0.17)	After (0.0.18)
`src/cooperbench/_proxy.py`	186 lines spawning LiteLLM	deleted
`cli.py` flags	`--openai-base-url`, `--openai-model`, `--base-url`, `--auth-token`	just `--base-url`, `--auth-token`
Runtime deps	requires `litellm[proxy]`	none
User invocation	`--openai-base-url https://...modal.run/v1 --openai-model Qwen/Qwen3.5-9B`	`--base-url https://...modal.run`
Path	claude-code → LiteLLM-subprocess → vLLM-OpenAI	claude-code → vLLM-Anthropic

Net diff: -251 lines across 5 files.

Verified

On the same anyhow_task/390 [1,2] pair that errored under 0.0.17's --openai-base-url path (litellm 1.83.0 + openai 2.24.0 in CooperData's env):

0 Content block not found
0 212 validation errors
0 /v1/responses references in any trajectory
agent1: 29 turns, 34-line Rust macro patch (raw-pointer support for ensure!), Submitted
agent2: 126 turns of real work (7 Edits + 98 Greps + 10 Reads + 10 Bash) before hitting Qwen's 128K context ceiling — model-side limit (no context_window cap in profile right now), can address in 0.0.19 by re-adding context_window: 120000 so claude-code auto-compacts before exhausting the upstream.

Test plan

uv run ruff check src/cooperbench/
uv run ruff format --check src/cooperbench/
uv run python -m mypy src/cooperbench/
uv run python -m pytest tests/ -v --tb=short (385 passed, 63 skipped)
End-to-end coop run via direct vLLM endpoint on CooperData's anyhow_task/390 [1,2]: 0 plumbing errors, real Rust patches produced.

🤖 Generated with Claude Code

Delete cooperbench._proxy and the --openai-base-url / --openai-model flags. vLLM v0.17.1+ exposes /v1/messages natively, so claude-code can be pointed straight at a vLLM endpoint with --base-url — no LiteLLM, no proxy subprocess, no extras. Removing the auto-spawned LiteLLM also removes a class of bugs we kept hitting from LiteLLM version drift: - /v1/responses auto-rewrite on litellm>=1.82 when the inbound request has thinking={"type":"enabled"} (claude-code 2.1.x sends it by default). vLLM 0.19.0 then rejected the body with hundreds of Pydantic validation errors against EasyInputMessageParam / ResponseOutputMessageParam / ResponseFunctionToolCallParam. - litellm_params.stream: false being honored on the openai/ provider but silently ignored on hosted_vllm/, with no way to force non-streaming upstream in newer versions. - Intermittent "API Error: Content block not found" from vLLM's streaming tool_call extractor forwarding content_block_delta events without first emitting the corresponding content_block_start. After deletion: --base-url alone forwards ANTHROPIC_BASE_URL into the container; the adapter rewrites localhost -> host.docker.internal, adds --add-host, injects a placeholder auth token, and writes settings.json with CLAUDE_CODE_ATTRIBUTION_HEADER=0 (KV-cache fix). Real Anthropic runs (no --base-url) are unaffected. Validated end-to-end on the same anyhow_task/390 [1,2] pair that errored in 0.0.17: - 0 "Content block not found" - 0 "212 validation errors" - 0 "/v1/responses" references in any trajectory - agent1: 29 turns, 34-line Rust macro patch, Submitted - agent2: 126 turns of real work (7 Edits + 98 Greps + 10 Reads + 10 Bash) before hitting Qwen's 128K context ceiling (input 126977 + 4096 output > 131072) — model-side context-management limit, not a plumbing bug; can be addressed with context_window add-back to the profile in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

akhatua2 merged commit 8780541 into main May 26, 2026
3 checks passed

akhatua2 deleted the release/v0.0.18 branch May 26, 2026 02:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: v0.0.18#67

release: v0.0.18#67
akhatua2 merged 1 commit into
mainfrom
release/v0.0.18

akhatua2 commented May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

akhatua2 commented May 26, 2026

Summary

Why

What changed

Verified

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant