Refactor chat model health check to lower tokens usage for reasoning models #7317
Conversation
✅ Pull with Spice check passed.
Pull Request Overview
This PR optimizes the health check process for OpenAI chat models to reduce token usage for reasoning models. The changes specifically target models that may trigger reasoning during health checks (GPT-5, O3, O4), which could cause excessive token consumption and model loading failures.
- Added reasoning effort control for reasoning-capable models
- Increased max_tokens limit from 150 to 300 tokens
- Enhanced health check logging and response handling
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/llms/src/openai/mod.rs | Added supports_reasoning_effort() method to identify models that support reasoning effort parameter |
| crates/llms/src/openai/chat.rs | Updated health check to use low reasoning effort for supported models and increased token limits |
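The body of the new `supports_reasoning_effort()` method is not shown on this page; a minimal sketch of the prefix matching it likely performs, assuming the `gpt-5*`, `o3*`, and `o4*` patterns named in the PR summary:

```rust
/// Sketch of a prefix check like the supports_reasoning_effort() method
/// this PR adds. The function body is an assumption; only the model
/// families (gpt-5*, o3*, o4*) come from the PR description.
fn supports_reasoning_effort(model_id: &str) -> bool {
    ["gpt-5", "o3", "o4"]
        .iter()
        .any(|prefix| model_id.starts_with(prefix))
}

fn main() {
    // Reasoning-capable families match by prefix.
    assert!(supports_reasoning_effort("gpt-5-mini"));
    assert!(supports_reasoning_effort("o3-mini"));
    // Non-reasoning models do not.
    assert!(!supports_reasoning_effort("gpt-4o"));
    println!("ok");
}
```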
…models (#7317)

* Refactor chat model health check to lower tokens usage for reasoning models
* Fix comment
* Update crates/llms/src/openai/chat.rs
* Update crates/llms/src/openai/chat.rs

Co-authored-by: Luke Kim <[email protected]>
Co-authored-by: Copilot <[email protected]>
📝 Summary
Problem: Some OpenAI models (GPT-5) may trigger reasoning during the spicepod model loader health check. This can increase total token usage until it reaches the specified `max_tokens` limit, causing the model to fail to load and preventing the spicepod from becoming ready.

Chat health checks were adjusted to apply `reasoning_effort: "low"` for the `gpt-5*`, `o3*`, and `o4*` models, and the `max_tokens` parameter was increased to 300.

Before:
After:
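Assuming the OpenAI chat-completions wire format, the adjusted health-check request can be sketched as below; the helper names here are hypothetical illustrations, not the actual code in `crates/llms/src/openai/chat.rs`:

```rust
/// Hypothetical helper mirroring the prefix check described in this PR.
fn supports_reasoning_effort(model_id: &str) -> bool {
    ["gpt-5", "o3", "o4"].iter().any(|p| model_id.starts_with(p))
}

/// Sketch of the JSON body the adjusted health check would send:
/// reasoning_effort "low" only for reasoning-capable models, plus the
/// raised max_tokens of 300 for every model.
fn health_check_body(model_id: &str) -> String {
    let reasoning = if supports_reasoning_effort(model_id) {
        r#""reasoning_effort":"low","#
    } else {
        ""
    };
    format!(
        r#"{{"model":"{model_id}",{reasoning}"max_tokens":300,"messages":[{{"role":"user","content":"ping"}}]}}"#
    )
}

fn main() {
    // gpt-5 gets the low reasoning effort; gpt-4o does not.
    assert!(health_check_body("gpt-5").contains(r#""reasoning_effort":"low""#));
    assert!(!health_check_body("gpt-4o").contains("reasoning_effort"));
    println!("{}", health_check_body("gpt-5"));
}
```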
🔗 Related
🚨 Breaking Changes
📚 Docs
👀 Notes for Reviewers