
Conversation


@ewgenius ewgenius commented Sep 28, 2025

📝 Summary

Problem: Some OpenAI models (such as GPT-5) may perform reasoning during the spicepod model loader health check. The reasoning tokens increase total token usage, which can hit the configured max_tokens limit. When that happens the model fails to load, and the spicepod never becomes ready.

  • Chat health checks now apply reasoning_effort: "low" for the gpt-5*, o3*, and o4* models.
  • The max_tokens parameter is increased to 300.
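
As a rough illustration of the two changes above, the following Rust sketch picks health check parameters by model name. It is not the code from this PR: `HealthCheckParams` and `health_check_params` are hypothetical names, the prefix list is taken from the summary, and the `o3-mini` exclusion is an assumption inferred from the "After" table, which notes that this model does not accept reasoning_effort. Only the name `supports_reasoning_effort` is taken from the PR, and its real body may differ.

```rust
/// Sketch of a model-name check (assumed logic, not the crate's actual body).
/// Prefixes follow the PR summary: gpt-5*, o3*, o4*.
fn supports_reasoning_effort(model: &str) -> bool {
    const PREFIXES: [&str; 3] = ["gpt-5", "o3", "o4"];
    // Assumption: o3-mini is excluded because the "After" table marks it
    // as not supporting reasoning_effort.
    const EXCLUDED: [&str; 1] = ["o3-mini"];

    let model = model.trim().to_ascii_lowercase();
    if EXCLUDED.iter().any(|m| model.starts_with(m)) {
        return false;
    }
    PREFIXES.iter().any(|p| model.starts_with(p))
}

/// Hypothetical parameter set for the health check request.
#[derive(Debug, PartialEq)]
struct HealthCheckParams {
    max_tokens: u32,
    reasoning_effort: Option<&'static str>,
}

/// Builds health check parameters: max_tokens is always 300, and
/// reasoning_effort is set to "low" only for models that accept it.
fn health_check_params(model: &str) -> HealthCheckParams {
    HealthCheckParams {
        max_tokens: 300,
        reasoning_effort: supports_reasoning_effort(model).then_some("low"),
    }
}
```

Under these assumptions, `health_check_params("gpt-5-nano")` carries `Some("low")`, while `health_check_params("gpt-4o")` leaves `reasoning_effort` unset so the parameter is never sent to models that would reject it.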

Before:

| model | total_tokens |
| --- | --- |
| gpt-5-nano | 94 |
| gpt-5-mini | 30 |
| gpt-5 | 94 |
| gpt-4o | 22 |
| gpt-4o-mini | 22 |
| gpt-4.1-nano | 22 |
| gpt-4.1-mini | 22 |
| gpt-4.1 | 22 |
| o1 | 99 |
| o1-mini | 221 |
| o3 | 39 |
| o3-mini | 220 |
| o4-mini | 103 |

After:

| model | total_tokens | note |
| --- | --- | --- |
| gpt-5-nano | 24 | |
| gpt-5-mini | 24 | |
| gpt-5 | 24 | |
| gpt-4o | 16 | |
| gpt-4o-mini | 16 | |
| gpt-4.1-nano | 16 | |
| gpt-4.1-mini | 16 | |
| gpt-4.1 | 16 | |
| o1 | 214 | doesn't support reasoning_effort |
| o1-mini | 155 | doesn't support reasoning_effort |
| o3 | 33 | |
| o3-mini | 156 | doesn't support reasoning_effort |
| o4-mini | 33 | |

🔗 Related

🚨 Breaking Changes

📚 Docs

👀 Notes for Reviewers

@ewgenius ewgenius requested a review from a team as a code owner September 28, 2025 09:01
@ewgenius ewgenius requested review from Copilot and removed request for a team September 28, 2025 09:01

github-actions bot commented Sep 28, 2025

✅ Pull with Spice Passed

Passing checks:

  • ✅ Title meets minimum length requirement (10 characters)
  • ✅ Has at least one of the required labels: kind/refactor, kind/bug, kind/enhancement, kind/documentation, kind/optimization, kind/dependencies, kind/endgame
  • ✅ No banned labels detected
  • ✅ Has at least one assignee: ewgenius

Copilot AI left a comment

Pull Request Overview

This PR optimizes the health check process for OpenAI chat models to reduce token usage for reasoning models. The changes specifically target models that may trigger reasoning during health checks (GPT-5, O3, O4), which could cause excessive token consumption and model loading failures.

  • Added reasoning effort control for reasoning-capable models
  • Increased max_tokens limit from 150 to 300 tokens
  • Enhanced health check logging and response handling

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| crates/llms/src/openai/mod.rs | Added supports_reasoning_effort() method to identify models that support the reasoning effort parameter |
| crates/llms/src/openai/chat.rs | Updated health check to use low reasoning effort for supported models and increased token limits |

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@ewgenius ewgenius self-assigned this Sep 28, 2025
@ewgenius ewgenius added the kind/bug Something isn't working label Sep 28, 2025
@ewgenius ewgenius requested review from a team and Jeadie September 28, 2025 09:02
Copilot AI review requested due to automatic review settings September 28, 2025 09:55
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


Jeadie
Jeadie previously approved these changes Sep 28, 2025
Copilot AI review requested due to automatic review settings September 28, 2025 19:07
lukekim
lukekim previously approved these changes Sep 28, 2025
@lukekim lukekim enabled auto-merge September 28, 2025 19:07
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


Copilot AI review requested due to automatic review settings September 28, 2025 19:54
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


@lukekim lukekim added this pull request to the merge queue Sep 28, 2025
Merged via the queue into trunk with commit 634eda8 Sep 28, 2025
79 of 80 checks passed
@lukekim lukekim deleted the evgenii/2025-09-25/tweak-openai-healthcheck-prompt branch September 28, 2025 23:55
@phillipleblanc phillipleblanc added this to the v1.7.1 milestone Sep 29, 2025
kczimm pushed a commit that referenced this pull request Sep 29, 2025
…models (#7317)

* Refactor chat model health check to lower tokens usage for reasoning models

* Fix comment

* Update crates/llms/src/openai/chat.rs

* Update crates/llms/src/openai/chat.rs

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Luke Kim <[email protected]>
Co-authored-by: Copilot <[email protected]>
Labels

kind/bug Something isn't working

4 participants