Add stream_options support according to OpenAI API #1552
tpfau wants to merge 3 commits into abetlen:main
## Conversation
@tpfau thank you for starting on this, I'll review in more depth but my initial request would be that instead of

@abetlen I hope this change is what you had in mind.

I would really like to see this in production, as I think this feature can be quite important for applications that need some kind of usage control, or that simply want to let users know about their usage. Of course, clients can work around it, but that requires additional tokenization code in their code base, and thus computation that is completely unnecessary, since the model has already counted the tokens.

@abetlen Any news on this?
Force-pushed 771721e to dcca56a
Updated again to fix conflicts. It would be great to see this feature in the codebase. If anything is missing, please let me know.
…g token generation into its own function to avoid replicated statements
Did some tests again; everything seems to be working. Please let me know if there are any changes you would like included.
```python
if stream:
    if stream_options is not None and "include_usage" in stream_options:
        include_usage = True if stream_options["include_usage"] else False
    else:
        include_usage = False
```
Instead of the double-nested if block, presumably you could do: `if stream and stream_options and "include_usage" in stream_options:`? @tpfau
No, you can't, since all the following code has to run in both cases; only `include_usage` needs to be set based on the options.
Ahh, of course. I meant something like this:

```python
if stream_options and "include_usage" in stream_options:
    include_usage = stream_options["include_usage"]
else:
    include_usage = False
```

It relies on a `None` `stream_options` being falsey, and `stream_options["include_usage"]` being a boolean. Just a little more readable.
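For what it's worth, the two variants can be checked for equivalence with a quick sketch (the function names here are just for illustration; this assumes `stream_options` is either `None` or a plain dict, as in this PR):

```python
def include_usage_nested(stream, stream_options):
    # Original double-nested form from the diff above
    if stream:
        if stream_options is not None and "include_usage" in stream_options:
            include_usage = True if stream_options["include_usage"] else False
        else:
            include_usage = False
        return include_usage
    return False


def include_usage_flat(stream_options):
    # Flattened form suggested in the review
    if stream_options and "include_usage" in stream_options:
        return bool(stream_options["include_usage"])
    return False


# Both agree for the streaming case across the usual inputs
for opts in (None, {}, {"include_usage": True}, {"include_usage": False}):
    assert include_usage_nested(True, opts) == include_usage_flat(opts)
```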
llama_cpp/server/app.py (outdated)
```python
stream_options: Optional[llama_cpp.StreamOptions] = Field(
    default=None,
    description="Options for streaming response. Only set this when you set stream: true.",
)
```
Presumably this is where you would make use of the description you have in `include_usage_field` somehow?
I wonder if FastAPI dependency injection could help with this, or if there is an example pattern in the server code for providing property help descriptions that would be better to follow instead. I've not used FastAPI extensively enough to know yet.
@tpfau
I'm not sure how to do this either. The problem I have is that llama_types defines TypedDicts, which don't carry descriptions the way BaseModels do. If anyone has a good suggestion, I'm happy to update this.
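One possible workaround, sketched here under the assumption that the server already uses Pydantic models for request bodies (the model name and description text below are illustrative, not from the PR): mirror the TypedDict with a small Pydantic model so the request schema gets a per-field description, and convert back to a plain dict before handing it to the TypedDict-based code.

```python
from pydantic import BaseModel, Field


class StreamOptionsModel(BaseModel):
    # Mirrors the llama_types TypedDict; exists only to carry a
    # description for the generated OpenAPI schema.
    include_usage: bool = Field(
        default=False,
        description="If set, an extra chunk with token usage statistics "
        "is streamed before the final [DONE] message.",
    )

    def to_dict(self) -> dict:
        # Convert back to the plain dict the TypedDict-based code expects
        # (avoids depending on a specific Pydantic major version).
        return {"include_usage": self.include_usage}
```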
Signed-off-by: Jeff MAURY <[email protected]>
This PR adds the `stream_options` and `include_usage` fields, as defined in the OpenAI API reference, to the server. Set to false by default, they can be activated, which causes one last chunk carrying the usage information to be returned during streaming.
This solves the question raised in #1461 and the request indicated in #1498.
I decided to put the chunk-generation code into its own function, since not doing so would have led to bloated code.
I did not update the places where `stream` is clearly set to `false`. I'm happy to update the PR with any necessary/requested modifications regarding code style, or if you think all chunk-generation sites should be changed to use functions (or that more options should be added to the chunk-generation function). Just let me know.
Closes #1498