fix(deepseek): preserve reasoning_content for V4 thinking-mode multi-turn #165
Merged
Kuberwastaken merged 1 commit on May 23, 2026
…turn

DeepSeek V4 models enable thinking mode by default and stream chain-of-thought as `reasoning_content`. Per the API contract, any assistant turn that produced a tool call must have its `reasoning_content` echoed back on subsequent turns, otherwise the server returns a 400 error. The OpenAI-compatible provider path was not preserving reasoning in multi-turn tool flows, causing all interactions with V4 to fail.

## Changes

1. **openai_compat.rs stream loop**: Now opens a dedicated Thinking block (index `usize::MAX - 100`) on the first reasoning delta and closes it before text/tool_calls/finish_reason. This ensures reasoning deltas are properly accumulated instead of being silently dropped.
2. **openai_compat.rs message building**: Only includes `reasoning_content` for providers that require it (currently DeepSeek V4). Added a `requires_reasoning_roundtrip` quirk flag to gate this inclusion. Prevents wasting tokens on providers that ignore the field (OpenAI, Groq, Azure, etc.).
3. **Provider registration**: DeepSeek provider explicitly sets `requires_reasoning_roundtrip: true`.

This implementation is based on PR Kuberwastaken#111, which identified and fixed the issue. Research shows this requirement is unique to DeepSeek V4; no other major LLM provider has this multi-turn reasoning round-trip requirement.
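The stream-loop change described above can be sketched roughly as follows. This is a minimal illustration, not the code from `openai_compat.rs`: the `Event` and `Delta` types, the `process` function, and the representation of `tool_calls` as a plain string are all hypothetical. Only the reserved index `usize::MAX - 100` and the open-on-first-delta, close-before-other-output behaviour come from the description.

```rust
// Hypothetical event type; the real stream events in openai_compat.rs are not shown here.
#[derive(Debug, PartialEq)]
enum Event {
    ThinkingStart { index: usize },
    ThinkingDelta { index: usize, text: String },
    BlockStop { index: usize },
    TextDelta { text: String },
}

// Hypothetical, simplified view of one parsed streaming chunk.
#[derive(Default)]
struct Delta {
    reasoning_content: Option<String>,
    content: Option<String>,
    tool_calls: Option<String>,
    finish_reason: Option<String>,
}

// Reserved index for the Thinking block, as stated in the change description.
const THINKING_INDEX: usize = usize::MAX - 100;

// Returns the emitted events plus the accumulated reasoning text.
fn process(deltas: Vec<Delta>) -> (Vec<Event>, String) {
    let mut events = Vec::new();
    let mut reasoning = String::new();
    let mut thinking_open = false;

    for d in deltas {
        if let Some(r) = d.reasoning_content {
            // Open the Thinking block on the first reasoning delta only.
            if !thinking_open {
                events.push(Event::ThinkingStart { index: THINKING_INDEX });
                thinking_open = true;
            }
            // Accumulate instead of dropping the delta.
            reasoning.push_str(&r);
            events.push(Event::ThinkingDelta { index: THINKING_INDEX, text: r });
        }

        // Text, tool calls, or a finish reason all end the Thinking block.
        let ends_thinking =
            d.content.is_some() || d.tool_calls.is_some() || d.finish_reason.is_some();
        if thinking_open && ends_thinking {
            events.push(Event::BlockStop { index: THINKING_INDEX });
            thinking_open = false;
        }

        if let Some(t) = d.content {
            events.push(Event::TextDelta { text: t });
        }
    }
    (events, reasoning)
}
```

The accumulated `reasoning` string is what a later request would need to carry back for providers that require the round-trip.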
Contributor
Author
Fixes #121
Owner
LGTM
Summary
Fixes DeepSeek V4 multi-turn tool-call failures by properly preserving `reasoning_content` through the streaming pipeline. DeepSeek V4's thinking mode requires `reasoning_content` to be echoed back on subsequent turns with tool calls, or the API returns a 400 error.
Related
Based on PR #111 which identified the issue and provided the fix approach.
Changes
openai_compat.rs stream loop: Opens a dedicated Thinking block (using a reserved index) on the first reasoning delta and closes it before text/tool_calls/finish_reason. Ensures reasoning deltas are accumulated instead of silently dropped.
Token optimization: Only includes `reasoning_content` for providers that require it (currently DeepSeek V4). Added a `requires_reasoning_roundtrip` quirk flag to gate inclusion, preventing wasted tokens on providers like OpenAI, Groq, and Azure that ignore the field.
Provider registration: DeepSeek provider sets `requires_reasoning_roundtrip: true`.
Testing
Notes
Research confirms this requirement is unique to DeepSeek V4; no other major LLM provider has multi-turn reasoning round-trip requirements.
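As an illustration of the quirk-flag gating listed under Changes, a simplified sketch might look like the following. The `Quirks` and `AssistantTurn` structs and the `build_assistant_message` function are hypothetical names chosen for this example; only the `requires_reasoning_roundtrip` flag and the `reasoning_content` field name appear in the PR. The sketch also assumes reasoning is echoed whenever it is present, whereas the actual code may restrict this to turns that produced tool calls.

```rust
// Hypothetical per-provider quirk flags; only this one flag is named in the PR.
#[derive(Clone, Copy, Default)]
struct Quirks {
    requires_reasoning_roundtrip: bool,
}

// Hypothetical, simplified assistant turn taken from conversation history.
struct AssistantTurn {
    content: String,
    reasoning: Option<String>,
}

// Builds the outgoing message as (field name, value) pairs.
// `reasoning_content` is added only when the provider's quirks ask for it.
fn build_assistant_message(
    turn: &AssistantTurn,
    quirks: Quirks,
) -> Vec<(&'static str, String)> {
    let mut fields = vec![
        ("role", "assistant".to_string()),
        ("content", turn.content.clone()),
    ];
    if quirks.requires_reasoning_roundtrip {
        if let Some(r) = &turn.reasoning {
            fields.push(("reasoning_content", r.clone()));
        }
    }
    fields
}
```

With `Quirks::default()` the flag is `false`, so providers that ignore the field never receive it; a DeepSeek registration would opt in by setting the flag to `true`.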