release: v0.0.17 #66

Merged
akhatua2 merged 1 commit into main from release/v0.0.17
May 26, 2026

@akhatua2
Collaborator

Summary

Single-fix release. cooperbench._proxy.managed_litellm now forces the upstream call to vLLM to be non-streaming, so LiteLLM buffers the full response and re-emits well-formed Anthropic SSE to claude-code.

The bug

vLLM 0.19.0's qwen3_coder and qwen3_xml streaming tool-call extractors intermittently forward content_block_delta events without first emitting a matching content_block_start for the synthesized tool_use block. claude-code's stream parser raises API Error: Content block not found and the agent loop aborts mid-task.

Tracking upstream as vllm-project/vllm#39056.
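
To make the failure mode concrete, here is a minimal sketch of the ordering rule a downstream stream parser effectively enforces. It is not claude-code's actual parser: the function and exception names are hypothetical, and events are modeled as plain dicts carrying only the `type` and `index` fields.

```python
from typing import Iterable


class ContentBlockNotFound(Exception):
    """Raised when a delta or stop refers to a block that was never started."""


def check_block_order(events: Iterable[dict]) -> int:
    """Check start -> delta* -> stop ordering for each content block index.

    Returns the number of blocks that were opened and then closed.
    Hypothetical helper for illustration only.
    """
    open_blocks = set()
    completed = 0
    for event in events:
        kind = event.get("type")
        index = event.get("index")
        if kind == "content_block_start":
            open_blocks.add(index)
        elif kind in ("content_block_delta", "content_block_stop"):
            # A delta or stop is only valid for a block that is currently open.
            if index not in open_blocks:
                raise ContentBlockNotFound(f"{kind} for unopened block {index}")
            if kind == "content_block_stop":
                open_blocks.remove(index)
                completed += 1
    return completed
```

A stream that sends `content_block_delta` for index 1 without a preceding `content_block_start` for index 1 trips the check, which mirrors the "Content block not found" abort described above.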

The fix

_proxy.py previously spawned LiteLLM with inline --model flags (no way to override stream). It now writes a temp YAML config with litellm_params.stream: false and starts LiteLLM with --config <path>. Result: upstream calls to vLLM are non-streaming, LiteLLM collects the full response, and downstream streams to claude-code with proper content_block_start → content_block_delta → content_block_stop ordering.
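
The config-file approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in _proxy.py: the helper names, the `model_list` layout, and the use of `--port` are assumptions based on LiteLLM's documented proxy config format; only `litellm_params.stream: false` and `--config <path>` come from this PR.

```python
import json
import tempfile
from pathlib import Path


def write_litellm_config(model_name: str, litellm_model: str, api_base: str) -> Path:
    """Write a temp LiteLLM proxy config that pins stream: false for one model.

    Hypothetical helper; the real _proxy.py may structure this differently.
    """
    # json.dumps yields double-quoted scalars, which are also valid YAML strings.
    text = (
        "model_list:\n"
        f"  - model_name: {json.dumps(model_name)}\n"
        "    litellm_params:\n"
        f"      model: {json.dumps(litellm_model)}\n"
        f"      api_base: {json.dumps(api_base)}\n"
        "      stream: false\n"
    )
    with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as fh:
        fh.write(text)
    return Path(fh.name)


def litellm_command(config_path: Path, port: int) -> list[str]:
    """Build the argv for starting the proxy from the config file."""
    return ["litellm", "--config", str(config_path), "--port", str(port)]
```

For example, `write_litellm_config("Qwen/Qwen3.5-9B", "hosted_vllm/Qwen/Qwen3.5-9B", "http://localhost:8000/v1")` would produce a file whose single model entry carries `stream: false`; the `hosted_vllm/` prefix and the URL are placeholder assumptions. The resulting command list can then be handed to `subprocess.Popen`.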

Validation (Qwen3.5-9B at 128k, dspy_task subset)

| Metric | streaming upstream (0.0.16) | non-streaming upstream (0.0.17) |
| --- | --- | --- |
| Agents Submitted | 4/6 | 8/8 |
| Agents Error | 2/6 | 0/8 |
| Content block not found errors | 8 | 0 |
| Patch sizes (lines) | 30, 142, 75, 48 (rest empty) | 30, 102, 72, 76, 70, 48, 186, 47 |
| Cross-agent messages | 7 | 13 |

Re-validated end-to-end through the auto-proxy with the new code path:

  • cooperbench run --openai-base-url ... -m Qwen/Qwen3.5-9B -a claude_code --setting coop -s lite -r dspy_task -t 8587 -f 1,4
  • Result: agent1 Submitted (8 steps, 30L patch), agent2 Submitted (26 steps, 105L patch), 2 coop messages, 0 errors.

Test plan

  • uv run ruff check src/cooperbench/
  • uv run ruff format --check src/cooperbench/
  • uv run python -m mypy src/cooperbench/
  • uv run python -m pytest tests/ -v --tb=short (385 passed, 63 skipped)
  • End-to-end coop run via auto-proxy: 0 errors, real patches.

🤖 Generated with Claude Code

Single-fix release: cooperbench._proxy.managed_litellm now starts
LiteLLM with a temp YAML config that sets litellm_params.stream: false
so the upstream call to vLLM is non-streaming.  LiteLLM buffers the
full response and re-emits well-formed Anthropic SSE to claude-code.

Why: vLLM 0.19.0's qwen3_coder / qwen3_xml streaming tool-call
extractors intermittently forward content_block_delta events without
first emitting a matching content_block_start for the synthesized
tool_use block.  claude-code's stream parser then raises "API Error:
Content block not found" and the agent loop aborts.  Tracking upstream
as vllm-project/vllm#39056.

Validation on Qwen3.5-9B at 128k (dspy_task subset):
- streaming upstream: 4/6 Submitted, 8 occurrences of "Content block
  not found"
- non-streaming upstream: 8/8 Submitted, 0 errors, patches 30-186
  lines, up to 35 steps of real multi-turn iteration.

Confirmed end-to-end through the auto-proxy with the same flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@akhatua2 akhatua2 merged commit 086b6bd into main May 26, 2026
3 checks passed
@akhatua2 akhatua2 deleted the release/v0.0.17 branch May 26, 2026 01:22