llm: make token repeat limit configurable and fix done response #14523
Open
4RH1T3CT0R7 wants to merge 1 commit into ollama:main from
Conversation
The hardcoded token repeat limit of 30 in `llm/server.go` caused false positives on valid output (e.g., OCR dots ". . . . . .") and returned an incorrect response without `done: true` when triggered.

Changes:
- Add configurable `token_repeat_limit` option (default 0 = disabled)
- Add `DoneReasonRepeat` to properly signal why generation stopped
- Send `done: true` with `done_reason: "repeat"` instead of returning an error
- Skip empty/whitespace tokens in repeat tracking to avoid false matches
- Map `"repeat"` to `"stop"` in the OpenAI compatibility layer

Fixes ollama#14117
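The tracking described above can be sketched as a small standalone Go program. The type and method names here (`repeatTracker`, `add`) are illustrative, not the actual identifiers in `llm/server.go`:

```go
package main

import (
	"fmt"
	"strings"
)

// doneReason mirrors the idea of the done-reason enum; "repeat" is the
// value this PR introduces.
type doneReason string

const doneReasonRepeat doneReason = "repeat"

// repeatTracker counts consecutive identical non-whitespace tokens.
// A limit of 0 disables detection, matching the PR's default.
type repeatTracker struct {
	limit int
	last  string
	count int
}

// add records a token and reports whether the repeat limit was reached.
func (r *repeatTracker) add(tok string) bool {
	if r.limit <= 0 {
		return false // detection disabled
	}
	if strings.TrimSpace(tok) == "" {
		return false // skip empty/whitespace tokens
	}
	if tok == r.last {
		r.count++
	} else {
		r.last, r.count = tok, 1
	}
	return r.count >= r.limit
}

func main() {
	t := repeatTracker{limit: 3}
	for _, tok := range []string{"a", "a", " ", "a"} {
		if t.add(tok) {
			// done:true is sent with this reason instead of an error
			fmt.Println("stopping, done_reason:", doneReasonRepeat)
		}
	}
}
```

With the limit set to 0 (the default), `add` returns false unconditionally, which is what makes the OCR-dots case safe out of the box.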
Force-pushed from 8bae4e1 to c0f67e6
Summary
- Add configurable `token_repeat_limit` option (default: 0 = disabled)
- Fix the incorrect `done: false` response when the repeat limit is hit — now properly sends `done: true` with `done_reason: "repeat"`
- Map `"repeat"` to `"stop"` in the OpenAI compatibility layer for spec compliance

Context
The hardcoded repeat limit of 30 tokens in `llm/server.go` caused false positives on valid model output (e.g., OCR dots `. . . . . . .` as reported in #14117). When triggered, the function returned `ctx.Err()` without sending a final `done: true` response, leaving clients in an ambiguous state.

Changes
- `llm/server.go`: add `DoneReasonRepeat` enum value, fix the repeat limit check to use the configurable option and send a proper done response, skip empty tokens
- `api/types.go`: add `TokenRepeatLimit` field to the `Options` struct
- `openai/openai.go`: map the `"repeat"` finish reason to `"stop"` for OpenAI API compatibility
- `docs/modelfile.mdx`: document the `token_repeat_limit` parameter
- `docs/api.md`, `docs/openapi.yaml`: update API docs

Usage
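Since the PR adds `TokenRepeatLimit` to the `Options` struct, a plausible request would pass it through the standard `options` object like other sampling parameters (the model name here is only a placeholder):

```json
{
  "model": "llama3.2",
  "prompt": "Transcribe the scanned page.",
  "options": {
    "token_repeat_limit": 100
  }
}
```

Omitting the option (or setting it to 0) leaves repeat detection disabled.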
Test plan
- `token_repeat_limit: 0` (default) disables repeat detection — no false positives on repeated dots/patterns
- `token_repeat_limit: 100` stops generation after 100 consecutive identical tokens
- Response includes `done: true` and `done_reason: "repeat"` when the limit is hit
- `/v1/chat/completions` endpoint returns `finish_reason: "stop"` (not `"repeat"`)

Fixes #14117
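The last test-plan item, the compatibility-layer mapping, is simple enough to sketch. The function name is illustrative, not the actual identifier in `openai/openai.go`:

```go
package main

import "fmt"

// toOpenAIFinishReason translates an internal done reason into an
// OpenAI-compatible finish_reason. The OpenAI spec has no "repeat"
// value, so it is reported as a normal "stop".
func toOpenAIFinishReason(reason string) string {
	if reason == "repeat" {
		return "stop"
	}
	return reason // "stop", "length", etc. pass through unchanged
}

func main() {
	fmt.Println(toOpenAIFinishReason("repeat"))
	fmt.Println(toOpenAIFinishReason("length"))
}
```

This keeps OpenAI clients spec-compliant while native API clients still see the more precise `done_reason: "repeat"`.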