
Performance Benchmarking

This is a comprehensive Python benchmark suite for running performance measurements with different backends. The following backends are supported:

Torch
Torch-TensorRT [Torchscript]
Torch-TensorRT [Dynamo]
Torch-TensorRT [torch_compile]
TensorRT

Note: For ONNX models, first convert the ONNX model to a serialized TensorRT engine and then benchmark that engine with this package.

Prerequisite

The benchmark scripts depend on the following Python packages in addition to those listed in requirements.txt:

Torch-TensorRT
Torch
TensorRT

Structure

./
├── models
├── perf_run.py
├── hub.py
├── custom_models.py
├── utils.py
├── requirements.txt
├── benchmark.sh
└── README.md

models - Model directory
perf_run.py - Performance benchmarking script that supports the torch, ts_trt, torch_compile, dynamo, and tensorrt backends
hub.py - Script to download TorchScript models for VGG16, Resnet50, EfficientNet-B0, VIT, and HF-BERT
custom_models.py - Script that defines custom models not available in torchvision or timm (e.g. HF BERT)
utils.py - Utility functions
benchmark.sh - Script used for internal performance testing of VGG16, Resnet50, EfficientNet-B0, VIT, and HF-BERT

Usage

Here is the list of CompileSpec options that can be passed directly to perf_run.py to compile the PyTorch module:

--backends : Comma-separated list of backends, e.g. torch,ts_trt,dynamo,torch_compile,tensorrt
--model : Name of the model file. This can be a TorchScript module or a TensorRT engine (with a .plan extension). If the backend is dynamo or torch_compile, the input should be a PyTorch module instead of a TorchScript module.
--model_torch : Name of the PyTorch model file (optional; only needed if dynamo or torch_compile is among the chosen backends)
--inputs : List of input shapes and dtypes, e.g. (1, 3, 224, 224)@fp32 for Resnet or (1, 128)@int32;(1, 128)@int32 for BERT
--batch_size : Batch size
--precision : Comma-separated list of precisions with which to build the TensorRT engine, e.g. fp32,fp16
--device : Device ID
--truncate : Truncate long and double weights in the network in Torch-TensorRT
--is_trt_engine : Boolean flag to enable if the provided model file is a TensorRT engine
--report : Path of the output file where the performance summary is written

Example:

  python perf_run.py --model ${MODELS_DIR}/vgg16_scripted.jit.pt \
                     --model_torch ${MODELS_DIR}/vgg16_torch.pt \
                     --precision fp32,fp16 --inputs="(1, 3, 224, 224)@fp32" \
                     --batch_size 1 \
                     --backends torch,ts_trt,dynamo,torch_compile,tensorrt \
                     --report "vgg_perf_bs1.txt"
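To make the option formats concrete, here is a small parser for the --backends, --precision and --inputs values used above. This is a minimal sketch, not the parsing code in perf_run.py: the helper names parse_csv and parse_inputs and the suffix-to-dtype table are hypothetical, and only the fp32, fp16 and int32 suffixes that appear in this README are handled.

```python
import ast
from typing import List, Tuple

# Hypothetical suffix table: only the dtype suffixes shown in this README.
DTYPE_SUFFIXES = {"fp32": "float32", "fp16": "float16", "int32": "int32"}


def parse_csv(value: str) -> List[str]:
    """Split a comma-separated option such as --backends or --precision."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_inputs(spec: str) -> List[Tuple[Tuple[int, ...], str]]:
    """Parse an --inputs string such as "(1, 128)@int32;(1, 128)@int32"."""
    parsed = []
    for item in spec.split(";"):
        # Split on the last "@" into the shape tuple and the dtype suffix.
        shape_str, sep, suffix = item.strip().rpartition("@")
        if not sep or suffix not in DTYPE_SUFFIXES:
            raise ValueError(f"cannot parse input spec: {item!r}")
        value = ast.literal_eval(shape_str)
        # "(1)" evaluates to a plain int, so wrap it into a 1-D shape.
        shape = (value,) if isinstance(value, int) else tuple(value)
        parsed.append((shape, DTYPE_SUFFIXES[suffix]))
    return parsed


if __name__ == "__main__":
    print(parse_csv("torch,ts_trt,dynamo,torch_compile,tensorrt"))
    print(parse_inputs("(1, 3, 224, 224)@fp32"))
    print(parse_inputs("(1, 128)@int32;(1, 128)@int32"))
```

Using ast.literal_eval rather than eval keeps the shape parsing safe: it accepts only Python literals such as tuples of integers.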

Notes:

Measuring INT8 performance is only supported for the torch_tensorrt backend, and only via a calibration cache file or QAT mode.
TensorRT engine filenames should end with .plan; otherwise the file will be treated as a TorchScript module.
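The filename rule for engines can be sketched as a simple extension check. The helper model_file_kind is hypothetical and only illustrates the rule stated above; how perf_run.py combines this check with --is_trt_engine, and whether its match is case-sensitive, is not specified here.

```python
from pathlib import Path


def model_file_kind(path: str) -> str:
    """Classify a --model path by extension (hypothetical helper).

    Files ending in .plan are treated as serialized TensorRT engines;
    anything else is treated as a TorchScript module.
    """
    return "tensorrt_engine" if Path(path).suffix == ".plan" else "torchscript"


if __name__ == "__main__":
    for name in ("vgg16_scripted.jit.pt", "resnet50_fp16.plan", "bert.engine"):
        print(name, "->", model_file_kind(name))
```

Under this rule an engine saved as bert.engine would be treated as a TorchScript module, which is why the .plan suffix matters.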

Example models

This tool benchmarks any PyTorch model or TorchScript module. As examples, hub.py provides the VGG16, Resnet50, EfficientNet-B0, VIT, and HF-BERT models that we test internally for performance. The TorchScript modules for these models can be generated by running:

python hub.py

Refer to benchmark.sh to see how we run and benchmark these models.
