
llm: make token repeat limit configurable and fix done response #14523

Open
4RH1T3CT0R7 wants to merge 1 commit into ollama:main from 4RH1T3CT0R7:fix/token-repeat-limit-configurable

Conversation

@4RH1T3CT0R7

Summary

  • Make the hardcoded token repeat limit (30) configurable via new token_repeat_limit option (default: 0 = disabled)
  • Fix incorrect done: false response when repeat limit is hit — now properly sends done: true with done_reason: "repeat"
  • Skip empty/whitespace-only tokens in repeat tracking to prevent false positives
  • Map "repeat" to "stop" in OpenAI compatibility layer for spec compliance

Context

The hardcoded repeat limit of 30 tokens in llm/server.go caused false positives on valid model output (e.g., OCR dots . . . . . . . as reported in #14117). When triggered, the function returned ctx.Err() without sending a final done: true response, leaving clients in an ambiguous state.

Changes

| File | Change |
| --- | --- |
| llm/server.go | Add DoneReasonRepeat enum value; make the repeat-limit check use the configurable option and send a proper done response; skip empty tokens |
| api/types.go | Add TokenRepeatLimit field to the Options struct |
| openai/openai.go | Map the "repeat" finish reason to "stop" for OpenAI API compatibility |
| docs/modelfile.mdx | Document the token_repeat_limit parameter |
| docs/api.md | Add the option to the API docs example |
| docs/openapi.yaml | Add the option to the OpenAPI specification |
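The OpenAI compatibility change is a small mapping. A hedged sketch (the function name here is illustrative, not the identifier used in openai/openai.go): the OpenAI spec has no "repeat" value for finish_reason, so a repeat-limit stop is reported as an ordinary "stop".

```go
package main

import "fmt"

// toFinishReason converts an Ollama done_reason into an OpenAI-compatible
// finish_reason. "repeat" is not a valid finish_reason in the OpenAI spec,
// so it is reported as "stop"; other non-length reasons also map to "stop".
func toFinishReason(doneReason string) string {
	switch doneReason {
	case "length":
		return "length"
	case "repeat":
		return "stop" // spec compliance: no "repeat" finish_reason exists
	default:
		return "stop"
	}
}

func main() {
	fmt.Println(toFinishReason("repeat")) // stop
	fmt.Println(toFinishReason("length")) // length
}
```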

Usage

# Via API option
curl http://localhost:11434/api/generate -d '{
  "model": "glm-ocr",
  "prompt": "OCR this image",
  "options": {"token_repeat_limit": 100}
}'

# Via Modelfile
PARAMETER token_repeat_limit 100

Test plan

  • Verify token_repeat_limit: 0 (default) disables repeat detection — no false positives on repeated dots/patterns
  • Verify token_repeat_limit: 100 stops generation after 100 consecutive identical tokens
  • Verify response includes done: true and done_reason: "repeat" when limit is hit
  • Verify OpenAI /v1/chat/completions endpoint returns finish_reason: "stop" (not "repeat")
  • Verify existing tests pass

Fixes #14117

The hardcoded token repeat limit of 30 in llm/server.go caused false
positives on valid output (e.g., OCR dots ". . . . . .") and returned
an incorrect response without done:true when triggered.

Changes:
- Add configurable `token_repeat_limit` option (default 0 = disabled)
- Add DoneReasonRepeat to properly signal why generation stopped
- Send done:true with done_reason:"repeat" instead of returning error
- Skip empty/whitespace tokens in repeat tracking to avoid false matches
- Map "repeat" to "stop" in OpenAI compatibility layer

Fixes ollama#14117
@4RH1T3CT0R7 4RH1T3CT0R7 force-pushed the fix/token-repeat-limit-configurable branch from 8bae4e1 to c0f67e6 Compare March 1, 2026 09:32


Development

Successfully merging this pull request may close these issues.

glm-ocr: failed to fully read image, stopped with "done": false
