server: --n-predict option document and ensure the completion request does not exceed it #5549
Context

The server `--n-predict` option is supported but not documented. When the `completion` endpoint or the OpenAI-compatible one is called with `n_predict` or `max_tokens`, the global configuration is not checked, so the number of completion tokens can exceed the `--n-predict` server option. Enforcing it may help ensure a request never loops infinitely, as reported for example in #3969.
Proposed changes

- Document the `--n-predict` option in `README.md` and in the server print usage.
- Cap the request `n_predict` param using the server params `n_predict` or the user input data, whichever is lower.

Open question
Should the server return a `400` error when `--n-predict > 0 && data['n_predict'] > --n-predict`, or can this be done later on?