Codestin Search App

akhatua2 · 2026-05-26T01:20:58Z

Summary

Single-fix release. cooperbench._proxy.managed_litellm now forces the upstream call to vLLM to be non-streaming, so LiteLLM buffers the full response and re-emits well-formed Anthropic SSE to claude-code.

The bug

vLLM 0.19.0's qwen3_coder and qwen3_xml streaming tool-call extractors intermittently forward content_block_delta events without first emitting a matching content_block_start for the synthesized tool_use block. claude-code's stream parser raises API Error: Content block not found and the agent loop aborts mid-task.

Tracking upstream as vllm-project/vllm#39056.

The fix

_proxy.py previously spawned LiteLLM with inline --model flags (no way to override stream). It now writes a temp YAML config with litellm_params.stream: false and starts LiteLLM with --config <path>. Result: upstream calls to vLLM are non-streaming, LiteLLM collects the full response, and downstream streams to claude-code with proper content_block_start → content_block_delta → content_block_stop ordering.

Validation (Qwen3.5-9B at 128k, dspy_task subset)

	streaming upstream (0.0.16)	non-streaming upstream (0.0.17)
Agents Submitted	4/6	8/8
Agents Error	2/6	0/8
`Content block not found` errors	8	0
Patch sizes	30, 142, 75, 48 (rest empty)	30, 102, 72, 76, 70, 48, 186, 47
Cross-agent messages	7	13

Re-validated end-to-end through the auto-proxy with the new code path:

cooperbench run --openai-base-url ... -m Qwen/Qwen3.5-9B -a claude_code --setting coop -s lite -r dspy_task -t 8587 -f 1,4
Result: agent1 Submitted (8 steps, 30L patch), agent2 Submitted (26 steps, 105L patch), 2 coop messages, 0 errors.

Test plan

uv run ruff check src/cooperbench/
uv run ruff format --check src/cooperbench/
uv run python -m mypy src/cooperbench/
uv run python -m pytest tests/ -v --tb=short (385 passed, 63 skipped)
End-to-end coop run via auto-proxy: 0 errors, real patches.

🤖 Generated with Claude Code

Single-fix release: cooperbench._proxy.managed_litellm now starts LiteLLM with a temp YAML config that sets litellm_params.stream: false so the upstream call to vLLM is non-streaming. LiteLLM buffers the full response and re-emits well-formed Anthropic SSE to claude-code. Why: vLLM 0.19.0's qwen3_coder / qwen3_xml streaming tool-call extractors intermittently forward content_block_delta events without first emitting a matching content_block_start for the synthesized tool_use block. claude-code's stream parser then raises "API Error: Content block not found" and the agent loop aborts. Tracking upstream as vllm-project/vllm#39056. Validation on Qwen3.5-9B at 128k (dspy_task subset): - streaming upstream: 4/6 Submitted, 8 occurrences of "Content block not found" - non-streaming upstream: 8/8 Submitted, 0 errors, patches 30-186 lines, up to 35 steps of real multi-turn iteration. Confirmed end-to-end through the auto-proxy with the same flag. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

akhatua2 merged commit 086b6bd into main May 26, 2026
3 checks passed

akhatua2 deleted the release/v0.0.17 branch May 26, 2026 01:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: v0.0.17#66

release: v0.0.17#66
akhatua2 merged 1 commit into
mainfrom
release/v0.0.17

akhatua2 commented May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

akhatua2 commented May 26, 2026

Summary

The bug

The fix

Validation (Qwen3.5-9B at 128k, dspy_task subset)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant