Version: 0.7.5
Describe the Bug
When a thinking model's generation is interrupted (tested with Qwen 3 8B), the next response sometimes fails to generate, or the thought window does not appear. If the context memory is nearly full when this happens, responses become very slow and context management breaks.
Steps to Reproduce
- Ask a thinking model a few questions.
- Pause the generation while a question is being answered.
- Ask the thinking model another question. The model then breaks (no response is generated, or the thought window does not appear) or responds only after a delay.
- Repeat the steps above with the context memory nearly full. Responses become very slow and context management breaks.
Screenshots / Logs
Operating System