This repository was archived by the owner on Jun 3, 2025. It is now read-only.

[Text Generation] Optimize the slow update method in the KVCacheDecoder #1190

Merged
dbogunowicz merged 5 commits into main from feature/damian/optimize_decoder on Aug 24, 2023

dbogunowicz (Contributor) commented on Aug 17, 2023

As reported by @mgoin and investigated by me, the update method of the KVCacheDecoder is very slow.
Profiling showed that the slowdown comes from the repeated use of the numpy.delete function:

[image: profiling results]

This Stack Overflow discussion:
https://stackoverflow.com/questions/30399534/shift-elements-in-a-numpy-array
suggests that slicing the arrays is an elegant and efficient replacement for numpy.delete. This PR introduces that change.
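The sketch below contrasts the two approaches on a toy cache. It is not the code from decoder_kv_cache.py: the (batch, num_heads, seq_len, head_dim) layout, the sequence axis index, and the helpers drop_oldest_delete, drop_oldest_slice and update_cache are hypothetical, chosen only to illustrate dropping the oldest cache positions with numpy.delete versus with slicing.

```python
import numpy as np

# Hypothetical cache layout: (batch, num_heads, seq_len, head_dim); sequence axis is 2.
SEQ_AXIS = 2


def drop_oldest_delete(cache: np.ndarray, n: int, axis: int = SEQ_AXIS) -> np.ndarray:
    # numpy.delete allocates a new array and copies every retained element.
    return np.delete(cache, np.arange(n), axis=axis)


def drop_oldest_slice(cache: np.ndarray, n: int, axis: int = SEQ_AXIS) -> np.ndarray:
    # Basic slicing returns a view of the retained positions; nothing is copied here.
    index = [slice(None)] * cache.ndim
    index[axis] = slice(n, None)
    return cache[tuple(index)]


def update_cache(cache: np.ndarray, new_entries: np.ndarray, axis: int = SEQ_AXIS) -> np.ndarray:
    # Drop as many of the oldest positions as there are new ones, then append the
    # new entries, so the cache keeps a fixed length along `axis`.
    n = new_entries.shape[axis]
    return np.concatenate([drop_oldest_slice(cache, n, axis), new_entries], axis=axis)


cache = np.arange(2 * 3 * 5 * 4, dtype=np.float32).reshape(2, 3, 5, 4)
new = np.full((2, 3, 1, 4), -1.0, dtype=np.float32)

# Both strategies keep exactly the same positions.
assert np.array_equal(drop_oldest_delete(cache, 2), drop_oldest_slice(cache, 2))
# The sliced result shares memory with the original buffer (it is a view).
assert np.shares_memory(drop_oldest_slice(cache, 2), cache)

updated = update_cache(cache, new)
print(updated.shape)  # (2, 3, 5, 4)
```

In this sketch np.concatenate still copies the cache once per update; slicing removes the extra allocation and copy that numpy.delete performs before that step.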

Short benchmark results:

[image: benchmark results]
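A rough way to run a similar comparison on your own machine is a timeit micro-benchmark of the removal step alone. This does not reproduce the PR's benchmark: the cache shape, the number of dropped positions N, and the function names with_delete and with_slice are arbitrary assumptions, and no particular timings are implied.

```python
import timeit

import numpy as np

# Hypothetical cache shape: (batch, num_heads, seq_len, head_dim); sequence axis is 2.
SEQ_AXIS = 2
N = 1  # assumed number of oldest positions dropped per update
cache = np.zeros((1, 16, 1024, 64), dtype=np.float32)


def with_delete() -> np.ndarray:
    # Copies the retained part of the cache into a fresh array.
    return np.delete(cache, np.arange(N), axis=SEQ_AXIS)


def with_slice() -> np.ndarray:
    # Returns a view; no data is copied.
    return cache[:, :, N:, :]


results = {}
for name, fn in [("numpy.delete", with_delete), ("slicing", with_slice)]:
    # Best of 5 repeats of 20 calls each, converted to seconds per call.
    best = min(timeit.repeat(fn, number=20, repeat=5)) / 20
    results[name] = best
    print(f"{name}: {best * 1e6:.1f} us per call")
```

Taking the minimum over several repeats reduces noise from other processes; timing the full update (removal plus insertion of new entries) would give a fairer end-to-end picture.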

dbogunowicz marked this pull request as ready for review on August 18, 2023 12:52
Comment thread src/deepsparse/transformers/utils/decoder_kv_cache.py

bfineran (Contributor) left a comment

LGTM pending comment

dbogunowicz requested a review from bfineran on August 22, 2023 10:30
Comment thread src/deepsparse/transformers/utils/decoder_kv_cache.py
Comment thread src/deepsparse/transformers/utils/decoder_kv_cache.py
Comment thread src/deepsparse/transformers/utils/decoder_kv_cache.py
dbogunowicz merged commit 1bd60d2 into main on Aug 24, 2023
dbogunowicz deleted the feature/damian/optimize_decoder branch on August 24, 2023 13:17