Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Separate prompt caches for each chat, plus a default cache that behaves as it does now.
Motivation
When switching between chats in the web interface (or via the API), the prompt is reprocessed from scratch each time. The existing cache does not help here because the prompt in each chat is completely different. If a request could specify a string identifier, with a separate cache allocated and reused for each identifier, performance would improve noticeably. This matters most when several chats are in use and the user switches between them frequently.
Possible Implementation
Add a string parameter cacheid to the server endpoints (including the openapi ones). If it is specified, the server uses a separate cache allocated under that identifier. If the parameter is empty or omitted from the request, the server uses the default cache, exactly as it works now.
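The sketch below shows the intended behavior. It is not the server's actual code. The class PromptCacheStore, its process method and the DEFAULT_ID key are all hypothetical names. It also assumes that a cache saves work by reusing the longest common token prefix between the cached prompt and the new one. Each cacheid gets its own cached prompt, and an empty or missing cacheid falls back to the single default cache.

```python
from typing import Dict, List, Optional

# Hypothetical key under which the default (current-behavior) cache is stored.
DEFAULT_ID = ""


class PromptCacheStore:
    """Hypothetical per-chat prompt cache: maps a cacheid to the prompt
    tokens that were last processed under that id."""

    def __init__(self) -> None:
        self._caches: Dict[str, List[int]] = {DEFAULT_ID: []}

    def _key(self, cache_id: Optional[str]) -> str:
        # An empty or missing cacheid selects the default cache.
        return cache_id if cache_id else DEFAULT_ID

    def process(self, tokens: List[int], cache_id: Optional[str] = None) -> int:
        """Return how many prompt tokens must be computed, assuming the
        common prefix with the cached prompt can be reused, then store
        the new prompt as this cache's contents."""
        key = self._key(cache_id)
        cached = self._caches.setdefault(key, [])
        # Count how many leading tokens match the cached prompt.
        common = 0
        for old, new in zip(cached, tokens):
            if old != new:
                break
            common += 1
        self._caches[key] = list(tokens)
        return len(tokens) - common


# Two chats with unrelated prompts, each under its own cacheid.
per_chat = PromptCacheStore()
chat_a, chat_b = [1, 2, 3, 4], [9, 8, 7]
per_chat.process(chat_a, "a")
per_chat.process(chat_b, "b")
cost_per_chat = per_chat.process(chat_a + [5], "a")  # only the new token

# The same sequence with no cacheid, i.e. one shared default cache.
shared = PromptCacheStore()
shared.process(chat_a)
shared.process(chat_b)
cost_shared = shared.process(chat_a + [5])  # whole prompt recomputed
```

When the user switches back to chat "a", the per-chat store recomputes only the newly appended token. The shared default cache was overwritten by chat "b", so it has to recompute the whole prompt. This is the slowdown described above.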