server: --n-predict option document and ensure the completion request does not exceed it #5549
Context

The server `--n-predict` option is supported but not documented. When the `completion` endpoint or the OpenAI-compatible one is called with `n_predict` or `max_tokens`, the global configuration is not checked, so the number of completion tokens can exceed the `--n-predict` server option. Enforcing it may help ensure a request never loops infinitely, as reported for example in #3969.
Proposed changes

- Document the `--n-predict` option in `README.md` and in the server print usage.
- Cap the request `n_predict` param using the server params `n_predict` or the user input data, whichever is lower.

Open question
Should the server return a `400` error when `--n-predict > 0 && data['n_predict'] > --n-predict`, or can this be done later on?