PyTorch-Quantization is a toolkit for training and evaluating PyTorch models with simulated quantization. Quantization can be added to the model automatically or manually, allowing the model to be tuned for accuracy and performance. Quantization is compatible with NVIDIA's high-performance integer kernels, which leverage integer Tensor Cores. The quantized model can be exported to ONNX and imported into an upcoming version of TensorRT.
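To make "simulated quantization" concrete, here is a minimal pure-Python sketch of the general idea: values are rounded onto a signed integer grid and immediately converted back to floats, so the model sees quantization error while still computing in floating point. This is an illustration only, not the toolkit's implementation. The function name `fake_quantize` is hypothetical, and symmetric per-tensor scaling by the maximum absolute value is an assumption made for the example.

```python
def fake_quantize(values, num_bits=8):
    """Quantize to signed integers and immediately dequantize.

    Assumption for this sketch: symmetric, per-tensor scaling based on
    the largest absolute value in `values`.
    """
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        # Nothing to scale; all values are already exactly representable.
        return list(values)
    qmax = 2 ** (num_bits - 1) - 1      # 127 for 8 bits
    scale = amax / qmax                 # float step between integer levels
    result = []
    for v in values:
        q = round(v / scale)            # nearest integer level
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        result.append(q * scale)        # back to float: the "simulated" value
    return result


weights = [0.02, -0.5, 0.731, 1.27]
simulated = fake_quantize(weights)
max_error = max(abs(a - b) for a, b in zip(weights, simulated))
print(simulated)
print("max rounding error:", max_error)
```

The rounding error of each value is bounded by half the step size, which is why a well-chosen scale matters for accuracy.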
To install the prebuilt package:

    pip install pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com

To install from source, clone the repository and change into the toolkit directory:

    git clone https://github.com/NVIDIA/TensorRT.git
    cd TensorRT/tools/pytorch-quantization

Install prerequisites

    pip install -r requirements.txt
    pip install torch

Build and install pytorch-quantization

    python setup.py install

pytorch-quantization is preinstalled in the NVIDIA NGC PyTorch container since 20.12, e.g. nvcr.io/nvidia/pytorch:20.12-py3
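Whichever installation route is used, a quick check that the packages can be found by the interpreter may save debugging time later. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical, and it assumes the toolkit's import name is `pytorch_quantization`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the toolkit is imported as `pytorch_quantization`.
for name in ("torch", "pytorch_quantization"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```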
- PyTorch Quantization Toolkit user guide
- Quantization Basics whitepaper