llm: make token repeat limit configurable and fix done response #14523
Open
4RH1T3CT0R7 wants to merge 1 commit into ollama:main from
Conversation
The hardcoded token repeat limit of 30 in `llm/server.go` caused false positives on valid output (e.g., OCR dots ". . . . . .") and returned an incorrect response without `done: true` when triggered.

Changes:
- Add configurable `token_repeat_limit` option (default 0 = disabled)
- Add `DoneReasonRepeat` to properly signal why generation stopped
- Send `done: true` with `done_reason: "repeat"` instead of returning an error
- Skip empty/whitespace tokens in repeat tracking to avoid false matches
- Map `"repeat"` to `"stop"` in the OpenAI compatibility layer

Fixes ollama#14117
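The tracking described above can be sketched as a small standalone Go program. The type and method names here (`repeatTracker`, `add`) are illustrative, not the actual identifiers in `llm/server.go`:

```go
package main

import (
	"fmt"
	"strings"
)

// doneReason mirrors the idea of the done-reason enum; "repeat" is the
// value this PR introduces.
type doneReason string

const doneReasonRepeat doneReason = "repeat"

// repeatTracker counts consecutive identical non-whitespace tokens.
// A limit of 0 disables detection, matching the PR's default.
type repeatTracker struct {
	limit int
	last  string
	count int
}

// add records a token and reports whether the repeat limit was reached.
func (r *repeatTracker) add(tok string) bool {
	if r.limit <= 0 {
		return false // detection disabled
	}
	if strings.TrimSpace(tok) == "" {
		return false // skip empty/whitespace tokens
	}
	if tok == r.last {
		r.count++
	} else {
		r.last, r.count = tok, 1
	}
	return r.count >= r.limit
}

func main() {
	t := repeatTracker{limit: 3}
	for _, tok := range []string{"a", "a", " ", "a"} {
		if t.add(tok) {
			// done:true is sent with this reason instead of an error
			fmt.Println("stopping, done_reason:", doneReasonRepeat)
		}
	}
}
```

With the limit set to 0 (the default), `add` returns false unconditionally, which is what makes the OCR-dots case safe out of the box.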
Force-pushed from 8bae4e1 to c0f67e6
Summary
- Add configurable `token_repeat_limit` option (default: 0 = disabled)
- Fix the incorrect `done: false` response when the repeat limit is hit — now properly sends `done: true` with `done_reason: "repeat"`
- Map `"repeat"` to `"stop"` in the OpenAI compatibility layer for spec compliance

Context
The hardcoded repeat limit of 30 tokens in `llm/server.go` caused false positives on valid model output (e.g., OCR dots `. . . . . . .` as reported in #14117). When triggered, the function returned `ctx.Err()` without sending a final `done: true` response, leaving clients in an ambiguous state.

Changes
- `llm/server.go`: add `DoneReasonRepeat` enum value, fix the repeat limit check to use the configurable option and send a proper done response, skip empty tokens
- `api/types.go`: add `TokenRepeatLimit` field to the `Options` struct
- `openai/openai.go`: map the `"repeat"` finish reason to `"stop"` for OpenAI API compatibility
- `docs/modelfile.mdx`: document the `token_repeat_limit` parameter
- `docs/api.md`, `docs/openapi.yaml`: update API docs

Usage
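Since the PR adds `TokenRepeatLimit` to the `Options` struct, a plausible request would pass it through the standard `options` object like other sampling parameters (the model name here is only a placeholder):

```json
{
  "model": "llama3.2",
  "prompt": "Transcribe the scanned page.",
  "options": {
    "token_repeat_limit": 100
  }
}
```

Omitting the option (or setting it to 0) leaves repeat detection disabled.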
Test plan
- `token_repeat_limit: 0` (default) disables repeat detection — no false positives on repeated dots/patterns
- `token_repeat_limit: 100` stops generation after 100 consecutive identical tokens
- Response includes `done: true` and `done_reason: "repeat"` when the limit is hit
- `/v1/chat/completions` endpoint returns `finish_reason: "stop"` (not `"repeat"`)

Fixes #14117
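The last test-plan item, the compatibility-layer mapping, is simple enough to sketch. The function name is illustrative, not the actual identifier in `openai/openai.go`:

```go
package main

import "fmt"

// toOpenAIFinishReason translates an internal done reason into an
// OpenAI-compatible finish_reason. The OpenAI spec has no "repeat"
// value, so it is reported as a normal "stop".
func toOpenAIFinishReason(reason string) string {
	if reason == "repeat" {
		return "stop"
	}
	return reason // "stop", "length", etc. pass through unchanged
}

func main() {
	fmt.Println(toOpenAIFinishReason("repeat"))
	fmt.Println(toOpenAIFinishReason("length"))
}
```

This keeps OpenAI clients spec-compliant while native API clients still see the more precise `done_reason: "repeat"`.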