Performance Benchmarking

This is a comprehensive Python benchmark suite for measuring inference performance across the supported backends. The following backends are supported:

  1. Torch
  2. Torch-TensorRT [Torchscript]
  3. Torch-TensorRT [Dynamo]
  4. Torch-TensorRT [torch_compile]
  5. TensorRT

Note: For ONNX models, first convert the ONNX model to a serialized TensorRT engine, then benchmark that engine with this package.
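
One common way to do the ONNX-to-engine conversion mentioned above is NVIDIA's `trtexec` command-line tool, which ships with TensorRT. The sketch below only assembles the command line. The file names are placeholders, and `build_trtexec_command` is a hypothetical helper for illustration, not part of this package.

```python
import shlex


def build_trtexec_command(onnx_path, engine_path, fp16=False):
    """Assemble a trtexec invocation that serializes an ONNX model to an engine.

    Hypothetical helper for illustration; it is not part of perf_run.py.
    """
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow FP16 kernels when building the engine
    return cmd


if __name__ == "__main__":
    # Placeholder file names. The engine name ends in .plan so that the
    # benchmark script recognizes it as a TensorRT engine.
    cmd = build_trtexec_command("model.onnx", "model.plan", fp16=True)
    print(shlex.join(cmd))
```

To actually build the engine, the returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `trtexec` is on the `PATH`.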

Prerequisites

The benchmark scripts depend on the following Python packages, in addition to the packages listed in requirements.txt:

  1. Torch-TensorRT
  2. Torch
  3. TensorRT

Structure

./
├── models
├── perf_run.py
├── hub.py
├── custom_models.py
├── utils.py
├── requirements.txt
├── benchmark.sh
└── README.md
  • models - Model directory
  • perf_run.py - Performance benchmarking script that supports the torch, ts_trt, torch_compile, dynamo, and tensorrt backends
  • hub.py - Script to download TorchScript models for VGG16, Resnet50, EfficientNet-B0, VIT, HF-BERT
  • custom_models.py - Script that defines custom models other than those from torchvision and timm (e.g. HF BERT)
  • utils.py - Utility functions
  • benchmark.sh - This is used for internal performance testing of VGG16, Resnet50, EfficientNet-B0, VIT, HF-BERT.

Usage

The following options can be passed to perf_run.py to configure the benchmark and how the PyTorch module is compiled:

  • --backends : Comma-separated string of backends. E.g. torch,ts_trt,dynamo,torch_compile,tensorrt
  • --model : Name of the model file. This can be a TorchScript module or a TensorRT engine (file name ending in .plan). If the backend is dynamo or torch_compile, the input should be a PyTorch module instead of a TorchScript module.
  • --model_torch : Name of the PyTorch model file (optional; only necessary if dynamo or torch_compile is a chosen backend)
  • --inputs : Semicolon-separated list of input shapes and dtypes. E.g. (1, 3, 224, 224)@fp32 for Resnet, or (1, 128)@int32;(1, 128)@int32 for BERT
  • --batch_size : Batch size
  • --precision : Comma-separated list of precisions used to build the TensorRT engine. E.g. fp32,fp16
  • --device : Device ID
  • --truncate : Truncate long and double weights in the network in Torch-TensorRT
  • --is_trt_engine : Boolean flag; enable it if the model file provided is a TensorRT engine.
  • --report : Path of the output file where the performance summary is written.
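
To make the `--inputs` format concrete, here is a minimal sketch of how such a string could be split into shape and dtype pairs. It is illustrative only: `parse_inputs` is a hypothetical name, and perf_run.py may parse the string differently.

```python
import ast


def parse_inputs(spec):
    """Split an --inputs string into (shape, dtype) pairs.

    Illustrative only: perf_run.py may parse this string differently.
    """
    parsed = []
    # Multiple inputs are separated by ';', each written as '<shape>@<dtype>'.
    for item in spec.split(";"):
        shape_text, sep, dtype = item.strip().partition("@")
        if not sep or not dtype.strip():
            raise ValueError(f"expected '<shape>@<dtype>', got {item!r}")
        # literal_eval safely turns "(1, 3, 224, 224)" into a Python tuple.
        shape = ast.literal_eval(shape_text.strip())
        if not isinstance(shape, tuple) or not all(isinstance(d, int) for d in shape):
            raise ValueError(f"shape must be a tuple of ints, got {shape_text!r}")
        parsed.append((shape, dtype.strip()))
    return parsed


if __name__ == "__main__":
    print(parse_inputs("(1, 3, 224, 224)@fp32"))
    print(parse_inputs("(1, 128)@int32;(1, 128)@int32"))
```

Because the shape contains spaces and parentheses, the whole value should be quoted on the command line, as in `--inputs="(1, 3, 224, 224)@fp32"`.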

Example:

  python perf_run.py --model ${MODELS_DIR}/vgg16_scripted.jit.pt \
                     --model_torch ${MODELS_DIR}/vgg16_torch.pt \
                     --precision fp32,fp16 --inputs="(1, 3, 224, 224)@fp32" \
                     --batch_size 1 \
                     --backends torch,ts_trt,dynamo,torch_compile,tensorrt \
                     --report "vgg_perf_bs1.txt"

Notes:

  1. Measuring INT8 performance is supported only via a calibration cache file or QAT mode for the torch_tensorrt backend.
  2. A TensorRT engine file name should end with .plan; otherwise the file will be treated as a TorchScript module.
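
The file name rule in the second note can be sketched as follows. This is an assumption about the behavior based only on the note above, and `model_kind` is a hypothetical helper, not a function from perf_run.py.

```python
def model_kind(model_path):
    """Classify a model file by its name, following the .plan rule (sketch)."""
    # Only a name ending in .plan marks a serialized TensorRT engine;
    # every other name falls through to TorchScript.
    if str(model_path).endswith(".plan"):
        return "tensorrt_engine"
    return "torchscript"


if __name__ == "__main__":
    for name in ("resnet50.plan", "resnet50.trt", "vgg16_scripted.jit.pt"):
        print(name, "->", model_kind(name))
```

Under this rule, an engine saved with a different extension such as `.trt` would be classified as a TorchScript module, so renaming it to `.plan` before benchmarking avoids that mismatch.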

Example models

This tool benchmarks any PyTorch model or TorchScript module. As examples, hub.py provides the VGG16, Resnet50, EfficientNet-B0, VIT, and HF-BERT models that we test internally for performance. The TorchScript modules for these models can be generated by running:

python hub.py

Refer to benchmark.sh to see how we run and benchmark these models.