feat: Add streaming tool-call parse buffer limit to prevent excessive memory usage #8811

Open
pskiran1 wants to merge 5 commits into main from spolisetty/tri-1016-psirt-triton-openai-frontend-auto-toolpparsing-can-oom-kill

Conversation

pskiran1 (Member) commented May 31, 2026

What does the PR do?

The streaming tool-call parser (partial_json_parser.loads()) re-parses the full accumulated output on every chunk, so CPU and memory use grow excessively for large tool-call arguments. This PR adds a configurable per-request buffer cap, --max-tool-call-parse-bytes, that truncates the stream gracefully when the cap is exceeded.
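
As a rough illustration of the cap described above (not the PR's actual code): the comparison mirrors the `len(previous_text) + len(delta_text) > self.max_tool_call_parse_bytes` check visible in the diff excerpt, while the `ToolCallBufferGuard` class, the `reparse_cost` helper, and treating `None` as "no limit" are assumptions made for this sketch. Note that `len()` on a Python `str` counts characters, not encoded bytes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ToolCallBufferGuard:
    """Hypothetical helper; the real frontend may structure this differently."""

    # Assumption: None means the cap is disabled.
    max_tool_call_parse_bytes: Optional[int] = None

    def exceeds_limit(self, previous_text: str, delta_text: str) -> bool:
        """Return True if appending delta_text would push the buffer past the cap."""
        if self.max_tool_call_parse_bytes is None:
            return False
        # Same comparison as in the diff excerpt; len() counts str characters.
        return len(previous_text) + len(delta_text) > self.max_tool_call_parse_bytes


def reparse_cost(chunk_sizes: Iterable[int]) -> int:
    """Total characters scanned if each chunk triggers a re-parse of the whole buffer."""
    total = 0
    accumulated = 0
    for size in chunk_sizes:
        accumulated += size   # buffer grows by one chunk
        total += accumulated  # the full buffer is parsed again
    return total
```

Under this simple model, 1,000 chunks of 100 characters produce 100,000 characters of output, yet `reparse_cost([100] * 1000)` reports 50,050,000 characters scanned, which is why an upper bound on the buffer helps.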

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

  • CI Pipeline ID: 53226753

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

pskiran1 added the "PR: feat" (A new feature) label Jun 1, 2026

whoisj (Contributor) left a comment

LGTM. I did have a couple of non-blocking questions.

Would be good if we could get @yinggeh to review this as well, but please merge by EoD Friday even if he's not able to get a review completed by then.

self.chat_template = load_chat_template(chat_template)

if self.tool_call_parser is not None:
print(

Interesting, why print("[INFO] ...") rather than logger.info()?

and len(previous_text) + len(delta_text)
> self.max_tool_call_parse_bytes
):
print(

better as logger.warning()?
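
A minimal sketch of what the two logging suggestions in this review could look like with Python's standard `logging` module. The logger name, function names, and message wording are hypothetical, and the frontend may have its own reasons for using `print` (for example, no logger being configured), so treat this as one possible shape rather than the project's convention.

```python
import logging

# Hypothetical logger name, chosen only for this example.
logger = logging.getLogger("tool_call_parser_example")


def report_parser_enabled(parser_name: str) -> None:
    """Informational message, in place of print("[INFO] ...")."""
    logger.info("Tool call parser enabled: %s", parser_name)


def report_limit_exceeded(current_size: int, limit: int) -> None:
    """Warning emitted when the parse buffer would exceed the configured cap."""
    logger.warning(
        "Tool-call parse buffer size %d exceeds limit %d; truncating stream",
        current_size,
        limit,
    )
```

Using `%s`/`%d` placeholders lets `logging` skip message formatting when the level is filtered out.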
