This is a simple two-line change to llama-server that passes the value of the `--keep` command-line flag through to the server slots as `keep_n`. It lets llama-server implement StreamingLLM properly, improving model coherence over long contexts when context shift is enabled. All of the handling for this was already in the code, just unused, so I think this should be considered a bug fix.
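
For context, here is a minimal sketch of the StreamingLLM-style context shift this enables. This is illustrative only, not the actual llama-server code: the function and variable names are made up, and "discard half of the remainder" is a common default rather than a guarantee about this server's behavior. The idea is that when the context fills, the first `n_keep` tokens survive as attention sinks while a chunk of the oldest remaining tokens is dropped:

```cpp
#include <cstdio>
#include <vector>

// Illustrative sketch (hypothetical names): when the context buffer is
// full, keep the first n_keep tokens as attention sinks and discard half
// of the oldest remaining tokens, sliding the rest down.
static void context_shift(std::vector<int> & ctx, int n_keep) {
    const int n_left    = (int) ctx.size() - n_keep;
    const int n_discard = n_left / 2; // assumed default: drop half of the shiftable region
    ctx.erase(ctx.begin() + n_keep, ctx.begin() + n_keep + n_discard);
}

int main() {
    std::vector<int> ctx;
    for (int i = 0; i < 16; ++i) ctx.push_back(i); // pretend the context is full
    context_shift(ctx, /*n_keep =*/ 4);            // the prefix 0..3 survives the shift
    for (int t : ctx) std::printf("%d ", t);       // prints: 0 1 2 3 10 11 12 13 14 15
    std::printf("\n");
    return 0;
}
```

Without the flag reaching the slots, `n_keep` is effectively zero during the shift, so the prompt's leading tokens get evicted and generation quality degrades; with it, the kept prefix anchors attention as StreamingLLM describes.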