release: v0.0.17 #66
Merged
Single-fix release: cooperbench._proxy.managed_litellm now starts LiteLLM with a temp YAML config that sets litellm_params.stream: false, so the upstream call to vLLM is non-streaming. LiteLLM buffers the full response and re-emits well-formed Anthropic SSE to claude-code.

Why: vLLM 0.19.0's qwen3_coder / qwen3_xml streaming tool-call extractors intermittently forward content_block_delta events without first emitting a matching content_block_start for the synthesized tool_use block. claude-code's stream parser then raises "API Error: Content block not found" and the agent loop aborts. Tracking upstream as vllm-project/vllm#39056.

Validation on Qwen3.5-9B at 128k (dspy_task subset):
- streaming upstream: 4/6 Submitted, 8 occurrences of "Content block not found"
- non-streaming upstream: 8/8 Submitted, 0 errors, patches of 30-186 lines, up to 35 steps of real multi-turn iteration.

Confirmed end-to-end through the auto-proxy with the same flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
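A minimal sketch of the config-file approach, assuming LiteLLM's proxy config layout (a model_list whose entries carry litellm_params). The helper names, the openai/ model prefix used to reach vLLM's OpenAI-compatible endpoint, the api_base URL and the --port value are hypothetical; only litellm_params.stream: false and --config <path> come from this PR. It is not cooperbench's actual implementation.

```python
import json
import os
import tempfile


def write_litellm_config(model_name: str, upstream_model: str, api_base: str) -> str:
    """Write a one-model LiteLLM proxy config to a temp file and return its path.

    Hypothetical helper for illustration; not cooperbench's code.
    """
    # json.dumps produces double-quoted strings, which are valid YAML scalars,
    # so values containing "/" or ":" need no extra escaping.
    lines = [
        "model_list:",
        f"  - model_name: {json.dumps(model_name)}",
        "    litellm_params:",
        f"      model: {json.dumps(upstream_model)}",
        f"      api_base: {json.dumps(api_base)}",
        # The setting this release relies on: make the upstream call non-streaming.
        "      stream: false",
    ]
    fd, path = tempfile.mkstemp(prefix="litellm-", suffix=".yaml")
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return path


def litellm_command(config_path: str, port: int) -> list[str]:
    """Build the argv for starting the LiteLLM proxy from a config file."""
    return ["litellm", "--config", config_path, "--port", str(port)]
```

For example, write_litellm_config("Qwen/Qwen3.5-9B", "openai/Qwen/Qwen3.5-9B", "http://localhost:8000/v1") followed by litellm_command(path, 4000) yields an argv that could be handed to subprocess.Popen. Writing a file rather than passing inline --model flags is what makes per-model litellm_params such as stream reachable; the sketch leaves deleting the temp file after the proxy exits to the caller.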
Summary
Single-fix release.
cooperbench._proxy.managed_litellm now forces the upstream call to vLLM to be non-streaming, so LiteLLM buffers the full response and re-emits well-formed Anthropic SSE to claude-code.
The bug
vLLM 0.19.0's qwen3_coder and qwen3_xml streaming tool-call extractors intermittently forward content_block_delta events without first emitting a matching content_block_start for the synthesized tool_use block. claude-code's stream parser then raises "API Error: Content block not found" and the agent loop aborts mid-task.
Tracking upstream as vllm-project/vllm#39056.
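To make the ordering invariant concrete, here is a hypothetical sketch: a strict checker that rejects any content_block_delta or content_block_stop whose index has no preceding content_block_start, plus a helper that re-emits one fully buffered tool_use block in start, delta, stop order. The error string mirrors the message claude-code reports, but this is not claude-code's parser. Event shapes follow Anthropic's streaming format; the tool name, id and arguments are made up.

```python
from typing import Any

Event = dict[str, Any]


def check_block_ordering(events: list[Event]) -> None:
    """Strict check: every delta/stop must refer to a block that was started."""
    open_blocks: set[int] = set()
    for ev in events:
        kind, idx = ev["type"], ev.get("index")
        if kind == "content_block_start":
            open_blocks.add(idx)
        elif kind in ("content_block_delta", "content_block_stop"):
            if idx not in open_blocks:
                # The failure mode described above: a delta for an unknown block.
                raise ValueError("API Error: Content block not found")
            if kind == "content_block_stop":
                open_blocks.discard(idx)


def reemit_tool_use(index: int, tool_id: str, name: str, args_json: str) -> list[Event]:
    """Re-emit a complete, buffered tool call as start -> delta -> stop."""
    return [
        {"type": "content_block_start", "index": index,
         "content_block": {"type": "tool_use", "id": tool_id, "name": name, "input": {}}},
        {"type": "content_block_delta", "index": index,
         "delta": {"type": "input_json_delta", "partial_json": args_json}},
        {"type": "content_block_stop", "index": index},
    ]


# Buggy sequence: a delta for index 1 arrives with no content_block_start.
buggy = [
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '{"path": "a.py"}'}},
]
try:
    check_block_ordering(buggy)
except ValueError as err:
    print(err)  # API Error: Content block not found

# Re-emitted from a fully buffered response: passes the check.
check_block_ordering(reemit_tool_use(1, "toolu_example", "Read", '{"path": "a.py"}'))
```

Because a buffered, non-streaming response already contains the complete tool call, a re-emitter can always send content_block_start before any delta, so the invariant holds no matter how the upstream tool-call parser would have chunked its output.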
The fix
_proxy.py previously spawned LiteLLM with inline --model flags, which offered no way to override stream. It now writes a temp YAML config with litellm_params.stream: false and starts LiteLLM with --config <path>. As a result, upstream calls to vLLM are non-streaming, LiteLLM collects the full response, and the downstream stream to claude-code has proper content_block_start → content_block_delta → content_block_stop ordering.
Validation (Qwen3.5-9B at 128k, dspy_task subset)
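- streaming upstream: 4/6 Submitted, 8 "Content block not found" errors
- non-streaming upstream: 8/8 Submitted, 0 "Content block not found" errors, patches of 30-186 lines, up to 35 steps of real multi-turn iteration
Re-validated end-to-end through the auto-proxy with the new code path: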
cooperbench run --openai-base-url ... -m Qwen/Qwen3.5-9B -a claude_code --setting coop -s lite -r dspy_task -t 8587 -f 1,4
Test plan
- uv run ruff check src/cooperbench/
- uv run ruff format --check src/cooperbench/
- uv run python -m mypy src/cooperbench/
- uv run python -m pytest tests/ -v --tb=short (385 passed, 63 skipped)
🤖 Generated with Claude Code