chat: Catch when context gets too long #559
Conversation
leseb left a comment
I believe this code change would benefit from #547. At least, it would allow us to write a functional test by tweaking max_ctx_size and verify this code works as intended :)
cli/chat/chat.py (outdated)
Is it worth a debug log? So we know that something got trimmed at one point.
Good suggestion. I added it, though lab chat doesn't have an option to easily set the logger level. It's there at least! A debug option could be a later PR.
Thanks for considering it! Could we log how much we trimmed as well?
Well, for now, it's the oldest message every time you see the log entry, ha. Need an option to turn on logging, though ...
Fine by me!
Force-pushed from a82202e to 178885f
hickeyma left a comment
Thanks for pushing this PR @russellb. When testing, I get the following error:
$ lab chat
╭──────────────────────────────────────────────── system ─────────────────────────────────────────────────╮
│ Welcome to Chat CLI w/ MERLINITE-7B-Q4_K_M (type /h for help) │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> Tell me about spain in the hot summer days. [S][default]
╭────────────────────────────────────────── merlinite-7b-Q4_K_M ──────────────────────────────────────────╮
│ <|system|><|pad|><|pad|><|pad|><|pad|><|pad|>, or │
╰───────────────────────────────────────────────────────────────────────────────── elapsed 0.159 seconds ─╯
Started the server with a small context window for checking:
$ lab serve --max-ctx-size 10
INFO 2024-03-13 15:20:14,618 lab.py:260 Using model 'models/merlinite-7b-Q4_K_M.gguf' with -1 gpu-layers and 10 max context size.
INFO 2024-03-13 15:20:14,840 server.py:89 Starting server process, press CTRL+C to shutdown server...
INFO 2024-03-13 15:20:14,840 server.py:90 After application startup complete see http://localhost:8000/docs for API.
Exception: Requested tokens (59) exceed context window of 10
Traceback (most recent call last):
File "/Users/mhickey/.pyenv/versions/3.11.7/envs/test-skill/lib/python3.11/site-packages/llama_cpp/server/errors.py", line 171, in custom_route_handler
response = await original_route_handler(request)
[...]
It works as expected with the default or a big enough context size.
@hickeyma That traceback is still expected on the `lab serve` side.
We maintain chat history and pass it as context to the API. When that context grew too long, chat would fail out saying an unknown error had occurred and that it got a BadRequest exception. The output from `lab serve` shows that it got a request with context that was too long.

To resolve this, we just catch this specific error and trim the oldest message from our history and try again. If we have nothing left to trim and still get the error, we'll print an error to the console.

Closes #528

Signed-off-by: Russell Bryant <[email protected]>
@hickeyma OK, as discussed on Slack, I've now addressed the behavior on the `lab chat` side.
hickeyma
left a comment
LGTM, thanks @russellb
After #559, a new message was added in the chat when the context size is too small for both prompt and response. We now have a test that validates this behavior.

Signed-off-by: Sébastien Han <[email protected]>
Signed-off-by: Kai Xu <[email protected]>
Co-authored-by: Kai Xu <[email protected]>
We maintain chat history and pass it as context to the API. When that
context grew too long, chat would fail out saying an unknown error had
occurred and that it got a BadRequest exception.
The output from `lab serve` shows that it got a request with context that was too long.
To resolve this, we just catch this specific error and trim the oldest
message from our history and try again. If we have nothing left to
trim and still get the error, we'll let it pass through and fail as
before.
Closes #528
Signed-off-by: Russell Bryant [email protected]