Conversation

@russellb (Member):

We maintain chat history and pass it as context to the API. When that
context grew too long, chat would fail, reporting that an unknown error
had occurred and that it got a BadRequest exception.

The output from `lab serve` shows that it received a request with
context that was too long.

To resolve this, we catch this specific error, trim the oldest message
from our history, and try again. If we have nothing left to trim and
still get the error, we let it pass through and fail as before.

Closes #528

Signed-off-by: Russell Bryant [email protected]
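A minimal sketch of the trim-and-retry loop described above; the function name, the shape of the `history` list, and catching every `BadRequestError` (rather than only the context-length variant the real change presumably matches on) are illustrative assumptions, not the PR's actual code:

```python
import logging

import openai

logger = logging.getLogger(__name__)


def send_with_trimming(client: openai.OpenAI, model: str, history: list[dict]):
    """Send the chat history, trimming the oldest message when the context overflows."""
    while True:
        try:
            return client.chat.completions.create(model=model, messages=history)
        except openai.BadRequestError:
            if len(history) <= 1:
                # Nothing left to trim; let the error propagate as before.
                raise
            # Drop the oldest message and retry with a shorter context.
            logger.debug("Trimming oldest message from chat history")
            history.pop(0)
```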

@leseb (Contributor) left a comment:

I believe this code change would benefit from #547. At least, it would allow us to write a functional test by tweaking `max_ctx_size` and verify this code works as intended :)
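A rough functional-test sketch of that idea: start `lab serve` with a tiny `--max-ctx-size`, send one oversized message, and assert the graceful failure. The one-shot `lab chat <message>` invocation and the fixed startup sleep are simplifying assumptions, not the project's actual test harness; the asserted string is the error message this PR ends up printing later in this thread.

```python
import subprocess
import time


def test_chat_handles_tiny_context():
    # Serve with a context window far smaller than any realistic prompt.
    server = subprocess.Popen(["lab", "serve", "--max-ctx-size", "10"])
    try:
        time.sleep(2)  # crude wait for the server to start listening
        result = subprocess.run(
            ["lab", "chat", "Tell me about Spain in the hot summer days."],
            capture_output=True,
            text=True,
        )
        assert "Message too large for context size." in result.stdout
    finally:
        server.terminate()
        server.wait()
```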

Review thread on `cli/chat/chat.py` (outdated):

Contributor:

Is it worth a debug log? So we know that something got trimmed at one point.

Member Author (@russellb):

Good suggestion. I added it, though `lab chat` doesn't have an option to easily set the logger level. It's there at least! A debug option could be a later PR.
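One possible shape for that later-PR debug option, assuming the CLI is built on click; the flag name and wiring here are hypothetical:

```python
import logging

import click


@click.command()
@click.option("--debug", is_flag=True, help="Enable debug logging.")
def chat(debug: bool):
    # Raising the root logger level makes the trim debug message visible.
    logging.basicConfig(level=logging.DEBUG if debug else logging.WARNING)
    # ... the existing chat loop runs here unchanged.
```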

Contributor:

Thanks for considering it! Could we log how much we trimmed as well?

Member Author (@russellb):

Well, for now, it's the oldest message every time you see the log entry, ha. Need an option to turn on logging, though ...

Contributor:

Fine by me!

@russellb force-pushed the issue-528 branch 2 times, most recently from a82202e to 178885f on March 13, 2024 at 15:18.
@hickeyma (Member) left a comment:

Thanks for pushing this PR, @russellb. When testing, I get the following error:

```
$ lab chat
╭──────────────────────────────────────────────── system ─────────────────────────────────────────────────╮
│ Welcome to Chat CLI w/ MERLINITE-7B-Q4_K_M (type /h for help)                                           │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> Tell me about spain in the hot summer days.                                                [S][default]
╭────────────────────────────────────────── merlinite-7b-Q4_K_M ──────────────────────────────────────────╮
│ <|system|><|pad|><|pad|><|pad|><|pad|><|pad|>, or                                                       │
╰───────────────────────────────────────────────────────────────────────────────── elapsed 0.159 seconds ─╯
```

I started the server with a small context window to check this:

```
$ lab serve --max-ctx-size 10

INFO 2024-03-13 15:20:14,618 lab.py:260 Using model 'models/merlinite-7b-Q4_K_M.gguf' with -1 gpu-layers and 10 max context size.
INFO 2024-03-13 15:20:14,840 server.py:89 Starting server process, press CTRL+C to shutdown server...
INFO 2024-03-13 15:20:14,840 server.py:90 After application startup complete see http://localhost:8000/docs for API.
Exception: Requested tokens (59) exceed context window of 10
Traceback (most recent call last):
  File "/Users/mhickey/.pyenv/versions/3.11.7/envs/test-skill/lib/python3.11/site-packages/llama_cpp/server/errors.py", line 171, in custom_route_handler
    response = await original_route_handler(request)
[...]
```

It works as expected with the default context size or a big enough context.

@russellb (Member Author):

@hickeyma That traceback is still expected on the `lab serve` side. The error addressed in this PR is that `lab chat` would exit with an error when this occurred. After these changes, `lab chat` handles it gracefully. More work will be needed to clean up the output on the serve side.

@russellb (Member Author):

@hickeyma OK, as discussed on Slack, I've now addressed the behavior on the `lab chat` side when the single message you just put in is too big for the context size. It now prints an error in red:

```
Message too large for context size.
```
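A sketch of that red error output, assuming a rich `Console` (the boxed chat panels above suggest the chat UI is built on rich); the exact call site in `cli/chat/chat.py` may differ:

```python
from rich.console import Console

console = Console()

# Shown when even a single message exceeds the server's context window,
# i.e. there is no older history left to trim.
console.print("Message too large for context size.", style="bold red")
```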

@russellb requested a review from @hickeyma on March 13, 2024 at 17:34.
@hickeyma (Member) left a comment:

LGTM, thanks @russellb

@hickeyma merged commit eb8386e into instructlab:main on March 13, 2024.
@hickeyma deleted the issue-528 branch on March 13, 2024 at 18:28.
@xukai92 added a commit that referenced this pull request on March 22, 2024:

After #559, a new message was added to the chat when the context size is
too small for both the prompt and the response.
We now have a test that validates this behavior.

Signed-off-by: Sébastien Han <[email protected]>
Signed-off-by: Kai Xu <[email protected]>
Co-authored-by: Kai Xu <[email protected]>
Linked issue (#528): Executing chat failed with: API issue found while executing chat: Unknown error: <class 'openai.BadRequestError'>