Add stream_options support according to OpenAI API #1552
tpfau wants to merge 3 commits into abetlen:main
## Conversation
@tpfau thank you for starting on this, I'll review in more depth but my initial request would be that instead of

@abetlen I hope this change is what you had in mind.

I would really like to see this in production, as I think this feature can be quite important for applications that need some kind of usage control, or that simply want to let users know about their usage. Of course, clients can work around it, but that requires additional tokenization code in their code base, and thus computation that is completely unnecessary, since the model has already counted the tokens.

@abetlen Any news on this?
Force-pushed 771721e to dcca56a
Updated again to fix conflicts. It would be great to see this feature in the codebase. If anything is missing, please let me know.
…g token generation into its own function to avoid replicated statements
Did some tests again; everything seems to be working. Please let me know if there are any changes you would like included.
```python
if stream:
    if stream_options is not None and "include_usage" in stream_options:
        include_usage = True if stream_options["include_usage"] else False
    else:
        include_usage = False
```
Instead of the double-nested if block, presumably you could do: `if stream and stream_options and "include_usage" in stream_options:`? @tpfau
No, you can't, since all the following code has to run in both cases; only `include_usage` needs to be set based on the options.
Ahh, of course. I meant something like this:

```python
if stream_options and "include_usage" in stream_options:
    include_usage = stream_options["include_usage"]
else:
    include_usage = False
```

It relies on a `None` `stream_options` being falsey, and `stream_options["include_usage"]` being a boolean. Just a little more readable.
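For what it's worth, the two variants can be checked for equivalence with a quick sketch (the function names here are just for illustration; this assumes `stream_options` is either `None` or a plain dict, as in this PR):

```python
def include_usage_nested(stream, stream_options):
    # Original double-nested form from the diff above
    if stream:
        if stream_options is not None and "include_usage" in stream_options:
            include_usage = True if stream_options["include_usage"] else False
        else:
            include_usage = False
        return include_usage
    return False


def include_usage_flat(stream_options):
    # Flattened form suggested in the review
    if stream_options and "include_usage" in stream_options:
        return bool(stream_options["include_usage"])
    return False


# Both agree for the streaming case across the usual inputs
for opts in (None, {}, {"include_usage": True}, {"include_usage": False}):
    assert include_usage_nested(True, opts) == include_usage_flat(opts)
```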
llama_cpp/server/app.py (outdated)
```python
stream_options: Optional[llama_cpp.StreamOptions] = Field(
    default=None,
    description="Options for streaming response. Only set this when you set stream: true.",
)
```
Presumably this is where you would make use of the description you have in `include_usage_field` somehow?
I wonder if FastAPI dependency injection could help with this, or if there is an example pattern in the server code for providing property help descriptions that would be better to follow instead. I've not used FastAPI extensively enough to know yet.
@tpfau
I'm not sure how to do this either. The problem I have is that llama_types defines TypedDicts, which don't carry descriptions the way BaseModels do. If anyone has a good suggestion, I'm happy to update this.
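One possible workaround, sketched here under the assumption that the server already uses Pydantic models for request bodies (the model name and description text below are illustrative, not from the PR): mirror the TypedDict with a small Pydantic model so the request schema gets a per-field description, and convert back to a plain dict before handing it to the TypedDict-based code.

```python
from pydantic import BaseModel, Field


class StreamOptionsModel(BaseModel):
    # Mirrors the llama_types TypedDict; exists only to carry a
    # description for the generated OpenAPI schema.
    include_usage: bool = Field(
        default=False,
        description="If set, an extra chunk with token usage statistics "
        "is streamed before the final [DONE] message.",
    )

    def to_dict(self) -> dict:
        # Convert back to the plain dict the TypedDict-based code expects
        # (avoids depending on a specific Pydantic major version).
        return {"include_usage": self.include_usage}
```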
Signed-off-by: Jeff MAURY <[email protected]>
This PR adds the `stream_options` and `include_usage` fields, as defined in the OpenAI API reference, to the server. Set to false by default, they can be activated, which causes one last chunk carrying the usage information to be returned during streaming.
This solves the question raised in #1461 and the request indicated in #1498.
I decided to put the chunk-generation code into its own function, since not doing so would have led to bloated code.
I did not update the places where `stream` is clearly set to `false`. I'm happy to update the PR with any necessary/requested modifications regarding code style, or if you think all chunk-generation sites should be changed to use functions (or that more options should be added to the chunk-generation function). Just let me know.
Closes #1498