chat: Catch when context gets too long #559
Conversation
leseb left a comment
I believe this code change would benefit from #547. At least, it would allow us to write a functional test by tweaking max_ctx_size and verify this code works as intended :)
cli/chat/chat.py (outdated)
Is it worth a debug log? So we know that something got trimmed at one point.
Good suggestion. I added it, though lab chat doesn't have an option to easily set the logger level. It's there at least! A debug option could be a later PR.
Thanks for considering it! Could we log how much we trimmed as well?
Well, for now, it's the oldest message every time you see the log entry, ha. Need an option to turn on logging, though ...
Fine by me!
Force-pushed from a82202e to 178885f
hickeyma left a comment
Thanks for pushing this PR @russellb. When testing, I get the following error:
$ lab chat
╭──────────────────────────────────────────────── system ─────────────────────────────────────────────────╮
│ Welcome to Chat CLI w/ MERLINITE-7B-Q4_K_M (type /h for help) │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> Tell me about spain in the hot summer days. [S][default]
╭────────────────────────────────────────── merlinite-7b-Q4_K_M ──────────────────────────────────────────╮
│ <|system|><|pad|><|pad|><|pad|><|pad|><|pad|>, or │
╰───────────────────────────────────────────────────────────────────────────────── elapsed 0.159 seconds ─╯
Started the server with a small context window for checking:
$ lab serve --max-ctx-size 10
INFO 2024-03-13 15:20:14,618 lab.py:260 Using model 'models/merlinite-7b-Q4_K_M.gguf' with -1 gpu-layers and 10 max context size.
INFO 2024-03-13 15:20:14,840 server.py:89 Starting server process, press CTRL+C to shutdown server...
INFO 2024-03-13 15:20:14,840 server.py:90 After application startup complete see http://localhost:8000/docs for API.
Exception: Requested tokens (59) exceed context window of 10
Traceback (most recent call last):
File "/Users/mhickey/.pyenv/versions/3.11.7/envs/test-skill/lib/python3.11/site-packages/llama_cpp/server/errors.py", line 171, in custom_route_handler
response = await original_route_handler(request)
[...]
It works as expected with the default or a big enough context size.
@hickeyma That traceback is still expected on the `lab serve` side.
We maintain chat history and pass it as context to the API. When that context grew too long, chat would fail out saying an unknown error had occurred and that it got a BadRequest exception. The output from `lab serve` shows that it got a request with context that was too long.

To resolve this, we just catch this specific error and trim the oldest message from our history and try again. If we have nothing left to trim and still get the error, we'll print an error to the console.

Closes #528

Signed-off-by: Russell Bryant <[email protected]>
@hickeyma OK, as discussed on Slack, I've now addressed the behavior on the `lab chat` side.
hickeyma
left a comment
LGTM, thanks @russellb
After #559, a new message was added in the chat when the context size is too small for both prompt and response. We now have a test that validates this behavior.

Signed-off-by: Sébastien Han <[email protected]>
Signed-off-by: Kai Xu <[email protected]>
Co-authored-by: Kai Xu <[email protected]>
We maintain chat history and pass it as context to the API. When that
context grew too long, chat would fail out saying an unknown error had
occurred and that it got a BadRequest exception.
The output from `lab serve` shows that it got a request with context that was too long.
To resolve this, we just catch this specific error and trim the oldest
message from our history and try again. If we have nothing left to
trim and still get the error, we'll let it pass through and fail as
before.
Closes #528
Signed-off-by: Russell Bryant [email protected]