server : include usage statistics only when user request them #16052
Conversation
When serving the OpenAI-compatible API, we should check whether
{"stream_options": {"include_usage": true}} is set in the request when
deciding whether to send usage statistics.
closes: ggml-org#16048
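The decision described above can be sketched as follows. This is a minimal Python illustration of the intended behavior, not the actual llama.cpp C++ implementation; the helper name `should_include_usage` is hypothetical.

```python
import json

def should_include_usage(request_body: str) -> bool:
    # Usage statistics are included in the streamed response only when the
    # client explicitly opts in via {"stream_options": {"include_usage": true}}.
    req = json.loads(request_body)
    stream_options = req.get("stream_options") or {}
    return bool(stream_options.get("include_usage", False))

# Client opts in: the usage chunk should be sent.
print(should_include_usage('{"stream": true, "stream_options": {"include_usage": true}}'))  # True
# No opt-in: usage statistics are omitted (the new default).
print(should_include_usage('{"stream": true}'))  # False
```

This matches the OpenAI API convention, where the final usage chunk in a stream is sent only when `stream_options.include_usage` is true.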
Not sure if this change may affect the new web UI, could you check @allozaur? And by the way, I think this should be mentioned in the server's changelog, as it's technically a breaking change.
I will check it tomorrow.
Sure. It seems we missed that in PR #15444, which introduced this new way of sending usage statistics.
I've tested by just running `build/bin/llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja -c 0`. I didn't see anything out of the ordinary when testing the webui on this branch.
Looks good then, thanks for testing. I added an entry to #9291
Squashed commit (ggml-org#16052):

* server : include usage statistics only when user request them

  When serving the OpenAI compatible API, we should check if {"stream_options": {"include_usage": true}} is set in the request when deciding whether we should send usage statistics

  closes: ggml-org#16048

* add unit test