This repository was archived by the owner on Jun 3, 2025. It is now read-only.

[Text Generation] Automatically benchmark in auto-regressive setting #1142

Merged
dbogunowicz merged 17 commits into main from feature/damian/benchmark_llm on Aug 24, 2023


dbogunowicz (Contributor) commented Jul 24, 2023

When benchmarking an LLM that has KV cache support, ensure that the input_ids length is one, so that the benchmark emulates the data seen in the auto-regressive setting (one new token per step).
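To make the single-token setting concrete, here is a minimal sketch of the input shapes one decoding step implies. `autoregressive_input_shapes` is a hypothetical standalone helper, not a DeepSparse function. The input names and the (batch, heads, cache length, head dim) layout are copied from the benchmark log in the Manual Testing section (24 layers, 16 heads, head dim 64, sequence length 256), and `positions`/`causal_mask` are assumed to be specific to this CodeGen export. Each step feeds one new token, while the cache holds the remaining `sequence_length - 1` positions.

```python
from typing import Dict, List


def autoregressive_input_shapes(
    sequence_length: int,
    num_layers: int,
    num_heads: int,
    head_dim: int,
    batch_size: int = 1,
) -> Dict[str, List[int]]:
    """Hypothetical helper: input shapes for one decoding step of a KV-cache model."""
    cache_length = sequence_length - 1  # past tokens already held in the KV cache
    shapes: Dict[str, List[int]] = {
        "input_ids": [batch_size, 1],  # exactly one new token per step
        "attention_mask": [batch_size, sequence_length],  # cache + new token
    }
    for layer in range(num_layers):
        for kind in ("key", "value"):
            shapes[f"past_key_values.{layer}.{kind}"] = [
                batch_size, num_heads, cache_length, head_dim
            ]
    # Assumed CodeGen-specific inputs, as seen in the log for this export.
    shapes["positions"] = [batch_size, 1]
    shapes["causal_mask"] = [batch_size, 1, 1, sequence_length]
    return shapes


shapes = autoregressive_input_shapes(
    sequence_length=256, num_layers=24, num_heads=16, head_dim=64
)
print(shapes["input_ids"], shapes["past_key_values.0.key"])
```

With the values read off the log, this yields 52 inputs whose names and shapes line up with the "Generating input" lines printed by `deepsparse.benchmark`.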

Manual Testing

  1. Export the sample model to deployment/model.onnx
  2. Inject the KV cache
python kv_cache_injector.py --input-file deployment/model.onnx --output-file deployment/model_kvcache.onnx
2023-07-24 12:50:46 sparseml.exporters.transforms.kv_cache.configs INFO     Loaded config file deployment/config.json for model: codegen
2023-07-24 12:50:46 sparseml.exporters.transforms.kv_cache.configs INFO     Properly configured arguments for KV Cache Transformation
2023-07-24 12:50:48 sparseml.exporters.transforms.onnx_transform INFO     [CacheKeysAndValues] Transformed 40 matches
2023-07-24 12:50:52 sparseml.exporters.transforms.onnx_transform INFO     [PositionsAdjustmentCodeGen] Transformed 7 matches
Modified model saved to: deployment/model_kvcache.onnx
  3. Benchmark
deepsparse.benchmark /home/ubuntu/damian/sparseml/deployment/model_kvcache.onnx --sequence_length 256

2023-08-01 10:09:08 deepsparse.benchmark.benchmark_model INFO     Thread pinning to cores enabled
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2023-08-01 10:09:11 deepsparse.transformers.utils.helpers INFO     Overwriting in-place the input shapes of the transformer model at /home/ubuntu/damian/sparseml/deployment/model.onnx
2023-08-01 10:09:16 deepsparse.benchmark.benchmark_model INFO     Found model that contains KV cache support. Benchmarking the autoregressive model with sequence length: 256.
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20230727 COMMUNITY | (3cb4a3e5) (optimized) (system=avx2, binary=avx2)
2023-08-01 10:11:00 deepsparse.benchmark.benchmark_model INFO     deepsparse.engine.Engine:
        onnx_file_path: /home/ubuntu/damian/sparseml/deployment/model.onnx
        batch_size: 1
        num_cores: 23
        num_streams: 1
        scheduler: Scheduler.default
        fraction_of_supported_ops: 1.0
        cpu_avx_type: avx2
        cpu_vnni: False
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'input_ids', type = int64, shape = [1, 1]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'attention_mask', type = int64, shape = [1, 256]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.0.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.0.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.1.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.1.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.2.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.2.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.3.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.3.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.4.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.4.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.5.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.5.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.6.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.6.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.7.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.7.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.8.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.8.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.9.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.9.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.10.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.10.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.11.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.11.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.12.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.12.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.13.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.13.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.14.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.14.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.15.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.15.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.16.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.16.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.17.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.17.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.18.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.18.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.19.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.19.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.20.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.20.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.21.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.21.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.22.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.22.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.23.key', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'past_key_values.23.value', type = float32, shape = [1, 16, 255, 64]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'positions', type = int64, shape = [1, 1]
2023-08-01 10:11:01 deepsparse.utils.onnx INFO     Generating input 'causal_mask', type = int64, shape = [1, 1, 1, 256]
2023-08-01 10:11:02 deepsparse.benchmark.benchmark_model INFO     Starting 'singlestream' performance measurements for 10 seconds
Original Model Path: /home/ubuntu/damian/sparseml/deployment/model.onnx
Batch Size: 1
Sequence Length: 256
Scenario: sync
Throughput (items/sec): 30.5009
Latency Mean (ms/batch): 32.6904
Latency Median (ms/batch): 32.6916
Latency Std (ms/batch): 1.4164
Iterations: 306
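As a quick sanity check on these numbers: with batch size 1 and a single synchronous stream, throughput should be roughly 1000 divided by the mean latency in milliseconds. The sketch below uses hypothetical helper names and plain arithmetic, not anything from `deepsparse.benchmark`. It gives about 30.59 items/sec for 32.6904 ms, within about 0.3% of the reported 30.5009.

```python
def throughput_from_latency(mean_latency_ms: float, batch_size: int = 1) -> float:
    """Items/sec implied by the mean per-batch latency of one synchronous stream."""
    return batch_size * 1000.0 / mean_latency_ms


def relative_gap(estimate: float, reported: float) -> float:
    """Relative difference between an estimate and a reported value."""
    return abs(estimate - reported) / reported


# Values copied from the benchmark output above.
estimate = throughput_from_latency(32.6904)  # roughly 30.59 items/sec
gap = relative_gap(estimate, 30.5009)        # roughly 0.003, i.e. about 0.3%
print(f"estimated {estimate:.2f} items/sec, gap {gap:.1%}")
```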

dbogunowicz marked this pull request as ready for review July 24, 2023 12:56
dbogunowicz requested review from bfineran, mgoin and natuan July 24, 2023 12:56
bfineran previously approved these changes Jul 24, 2023

bfineran (Contributor) left a comment

LGTM pending more testing from @dbogunowicz

dbogunowicz (Contributor, Author) commented

Ran a fairly exhaustive set of the manual tests specified in #1083; all are looking good.

bfineran previously approved these changes Jul 26, 2023
Comment thread src/deepsparse/benchmark/benchmark_model.py Outdated
Comment thread src/deepsparse/transformers/utils/helpers.py Outdated
Comment thread src/deepsparse/utils/onnx.py Outdated
Comment thread src/deepsparse/benchmark/benchmark_model.py Outdated
bfineran previously approved these changes Aug 1, 2023
Comment thread src/deepsparse/benchmark/benchmark_model.py Outdated
ProExpertProg (Contributor) commented

Just a heads-up: I'm making my benchmarking script depend on this PR, so please let me know when you merge or if anything changes. Thanks for this utility, @dbogunowicz; it couldn't have come at a better time.

Comment thread src/deepsparse/utils/onnx.py Outdated
Comment thread src/deepsparse/transformers/utils/helpers.py Outdated
Comment thread src/deepsparse/utils/onnx.py Outdated
Comment thread src/deepsparse/benchmark/benchmark_model.py Outdated
Comment thread src/deepsparse/transformers/utils/helpers.py Outdated
Comment thread src/deepsparse/benchmark/benchmark_model.py Outdated
ProExpertProg force-pushed the feature/damian/benchmark_llm branch from e2d19aa to 709853d on August 8, 2023 18:25
bfineran previously approved these changes Aug 23, 2023

ProExpertProg (Contributor) left a comment

A few more type-annotation nits, but looking great!

Comment thread src/deepsparse/transformers/engines/nl_decoder_engine.py Outdated
Comment thread src/deepsparse/utils/onnx.py Outdated
Comment thread src/deepsparse/transformers/utils/helpers.py

ProExpertProg (Contributor) left a comment

LGTM! Thanks Damian

dbogunowicz merged commit 703b47f into main Aug 24, 2023
dbogunowicz deleted the feature/damian/benchmark_llm branch August 24, 2023 14:20

Labels: None yet

Projects: None yet


4 participants