This sample demonstrates:
- Converting a pre-trained EfficientNet-B0 ONNX model to a TensorRT engine
- Performing inference with TensorRT using Python APIs
- Comparing inference performance between ONNX Runtime and TensorRT
- Proper memory management and resource cleanup in both Python implementations
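
The performance comparison listed above can be sketched with a small timing harness. This is a minimal illustration under stated assumptions, not the sample's actual code: `benchmark` and `speedup` are hypothetical helper names, and each callable passed in is assumed to wrap one complete ONNX Runtime or TensorRT inference call on the same input.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], warmup: int = 10,
              iterations: int = 100) -> Dict[str, float]:
    """Time a zero-argument inference callable and return latency stats in ms."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    # Warm-up runs are excluded so one-time setup costs do not skew the numbers.
    for _ in range(warmup):
        run_once()
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
        "iterations": float(iterations),
    }


def speedup(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Return how many times lower the candidate's mean latency is."""
    return baseline["mean_ms"] / candidate["mean_ms"]
```

For GPU backends the callable should block until results are available (for example by synchronizing the CUDA stream); otherwise only the launch overhead is measured.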

TensorRT's ONNX parser + ONNX model -> TensorRT engine

- Engine building and serialization
- Input/output tensor handling
- Performance profiling
- Editable timing cache for deterministic engine builds
- Memory pool optimization with workspace configuration
- Configures workspace memory pool for running under limited hardware
- Supports editable timing cache for deterministic builds
- Serialization and deserialization of TensorRT engines
- Efficient image preprocessing with PIL and NumPy
- Supports batch inference
- Implements proper error handling and resource cleanup
- Provides performance comparison between ONNX Runtime and TensorRT
- Performs inference on a real-world image
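
The build-related features above (ONNX parsing, workspace memory pool, timing cache, serialization) can be sketched with the TensorRT Python API roughly as follows. This is an illustrative sketch rather than the sample's own implementation: the function names, the default paths, and the `hasattr` guard around the editable timing cache flag are assumptions, and `tensorrt` is imported lazily so the size helper works without it.

```python
from pathlib import Path


def mib_to_bytes(mib: int) -> int:
    """Convert a size in MiB to bytes (1024 MiB == 1 GiB)."""
    return mib * (1 << 20)


def build_engine(onnx_path: str, engine_path: str, workspace_mib: int = 1024,
                 timing_cache_path: str = "timing.cache") -> bytes:
    """Parse an ONNX file, build a serialized TensorRT engine, and save it."""
    import tensorrt as trt  # lazy import: only needed when actually building

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # The ONNX parser needs an explicit-batch network; recent releases treat
    # this as the default and mark the flag deprecated.
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)

    if not parser.parse(Path(onnx_path).read_bytes()):
        errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
        raise RuntimeError("ONNX parsing failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    # Cap the workspace pool so the build also fits on memory-limited hardware.
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE,
                                 mib_to_bytes(workspace_mib))

    # Assumption: the editable-cache flag may be missing in older releases.
    if hasattr(trt.BuilderFlag, "EDITABLE_TIMING_CACHE"):
        config.set_flag(trt.BuilderFlag.EDITABLE_TIMING_CACHE)

    # Reuse an earlier timing cache, if any, so rebuilds can pick the same tactics.
    cache_file = Path(timing_cache_path)
    cache_bytes = cache_file.read_bytes() if cache_file.exists() else b""
    cache = config.create_timing_cache(cache_bytes)
    config.set_timing_cache(cache, ignore_mismatch=False)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("Engine build failed")

    engine_bytes = bytes(serialized)
    Path(engine_path).write_bytes(engine_bytes)
    cache_file.write_bytes(bytes(config.get_timing_cache().serialize()))
    return engine_bytes
```

A call such as `build_engine("efficientnet-b0.onnx", "efficientnet-b0.plan")` would write both the engine and the timing cache to disk; the engine file name here is only an example.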
Users can run their own ONNX model and generate an engine with similar functionality using trtexec:

    # Basic conversion with performance profiling
    trtexec --onnx=efficientnet-b0.onnx \
        --saveEngine=efficientnet-b0_trtexec.plan \
        --dumpProfile \
        --iterations=100 \
        --avgRuns=100 \
        --workspace=1024 \
        --batch=1

Key options explained:

- `--onnx`: Input ONNX model
- `--saveEngine`: Output TensorRT engine
- `--dumpProfile`: Performance profiling
- `--iterations`: Number of inference iterations
- `--avgRuns`: Number of runs to average for timing
- `--workspace`: Workspace size in MB (1024 MB = 1 GB)
- `--batch`: Batch size for inference
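
To drive the same conversion from a script, the options listed above can be assembled programmatically. The sketch below is a hypothetical wrapper, not part of the sample: it assumes `trtexec` is on the `PATH` and that the installed version accepts exactly the flags shown in the command above.

```python
import shlex
import subprocess
from typing import List


def trtexec_command(onnx_path: str, engine_path: str, iterations: int = 100,
                    avg_runs: int = 100, workspace_mb: int = 1024,
                    batch: int = 1) -> List[str]:
    """Assemble the trtexec argument list from the options described above."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--dumpProfile",
        f"--iterations={iterations}",
        f"--avgRuns={avg_runs}",
        f"--workspace={workspace_mb}",
        f"--batch={batch}",
    ]


def run_trtexec(command: List[str]) -> str:
    """Run trtexec and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    cmd = trtexec_command("efficientnet-b0.onnx", "efficientnet-b0_trtexec.plan")
    # Print the command for inspection; call run_trtexec(cmd) to execute it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with model paths that contain spaces.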

August 2025: Removed support for Python versions < 3.10.