Releases: quic/aimet
Version 2.8.0
What's Changed
New Features
- ONNX
- Update aimet_onnx QuantizationSimModel.__init__ function signature (cbe67ae)
- Defined new AdaRound API aimet_onnx.apply_adaround (84edcf5)
- Defined new sequential MSE API aimet_onnx.apply_seq_mse (836ab1e)
- Defined new per-layer sensitivity analysis API aimet_onnx.analyze_per_layer_sensitivity (dc34fa4)
- Allowed onnx QuantizationSimModel.compute_encodings to take iterables (2c8ae88)
- PyTorch
- Added native support for huggingface Phi-3 (80cd141)
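The AdaRound API listed above learns whether each weight rounds up or down so that the layer's output error is minimized, rather than rounding every weight to its nearest grid point independently. A toy brute-force sketch of that objective (illustrative only; this is not the aimet_onnx API, and the function names here are made up):

```python
import math
from itertools import product

def quantize(w, scale, direction):
    # Quantize w onto a fixed grid, rounding down (direction=0) or up (1).
    return (math.floor(w / scale) + direction) * scale

def adaround_toy(weights, inputs, scale):
    # Brute-force the per-weight up/down rounding choice that minimizes
    # the squared error of the layer output sum(w_i * x_i) -- the same
    # objective AdaRound optimizes, except AdaRound uses a learned
    # soft-rounding mask instead of exhaustive search.
    target = sum(w * x for w, x in zip(weights, inputs))
    best_err, best_qw = None, None
    for dirs in product((0, 1), repeat=len(weights)):
        qw = [quantize(w, scale, d) for w, d in zip(weights, dirs)]
        err = (sum(q * x for q, x in zip(qw, inputs)) - target) ** 2
        if best_err is None or err < best_err:
            best_err, best_qw = err, qw
    return best_qw
```

For example, with weights [0.26, 0.24] and scale 0.5, nearest rounding treats each weight in isolation, while the search above picks the up/down combination whose summed output best matches the float layer output.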
Bug Fixes and Improvements
- ONNX
- Made dynamic weights of Conv, ConvTranspose, Gemm, and MatMul follow the symmetry of static weights (ce68e75)
- aimet-onnx on PyPI is now compatible with onnxruntime-gpu (6d3aa97)
- Unpinned onnx version (abe8782)
- Changed default execution provider to CPUExecutionProvider (e7d10c7)
- Made QcQuantizeOp's data_type attribute always consistent without additional reconfiguration (8009871)
- Made delta/offset and min/max always consistent (88706ef)
- PyTorch
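The delta/offset vs. min/max consistency fix above concerns the standard affine-quantization relationship between the two encoding representations: once the offset is snapped to the integer grid, min/max re-derived from delta/offset can differ slightly from the original range. A simplified sketch of that round trip (not AIMET's internal code):

```python
def encodings_from_min_max(enc_min, enc_max, bitwidth=8):
    # Derive the affine-quantization delta (scale) and offset from a
    # min/max range, then re-derive min/max from them. Keeping both
    # representations on the same integer grid is what "consistent"
    # means here.
    num_steps = 2 ** bitwidth - 1
    delta = (enc_max - enc_min) / num_steps
    offset = round(enc_min / delta)       # integer value mapped to enc_min
    # Recompute min/max from the grid; they may differ slightly from the input.
    grid_min = offset * delta
    grid_max = (offset + num_steps) * delta
    return delta, offset, grid_min, grid_max
```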
Version 2.7.0
What's Changed
New Features
- PyTorch
- OmniQuant (experimental) - implement OmniQuant PTQ technique (https://arxiv.org/pdf/2308.13137) for Llama and Qwen2 model families
Bug Fixes and Improvements
- ONNX
- Remove DlCompression, DlEqualization, OpenCV, zlib dependencies
- Support loading encodings for missing quantizers
- Set bitwidth of tensor quantizer while loading encodings
- PyTorch
- Remove DlCompression, DlEqualization, OpenCV, zlib dependencies
- Export encodings for data movement operations in ONNX QDQ export
- AdaScale (experimental) - support for updating Conv2D layers in blocks
- AdaScale (experimental) - update API to take num_iterations instead of num_epochs
Version 2.6.0
What's Changed
New Features
- ONNX
- Support for passing onnxruntime EPs directly to QuantizationSimModel.__init__
- PyTorch
- Support for simulating float8 quantization
- Experimental: Added aimet_torch.onnx.export API for exporting QuantizationSimModel to onnx QDQ graph
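Simulating float8 quantization, mentioned above, amounts to rounding each value's mantissa down to the reduced precision while keeping floating-point dynamics. A simplified sketch (E4M3-like mantissa rounding only, ignoring exponent range and saturation; not AIMET's implementation):

```python
import math

def simulate_fp8(x, mantissa_bits=3):
    # Round x to the nearest value representable with `mantissa_bits`
    # of mantissa (E4M3-like). Exponent clipping and saturation are
    # deliberately omitted to keep the sketch short.
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                # x = m * 2**e with 0.5 <= |m| < 1
    grid = 2 ** (mantissa_bits + 1)     # mantissa step count within a binade
    return (round(m * grid) / grid) * 2 ** e
```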
Bug Fixes and Improvements
- ONNX
- Reduced CPU and GPU memory usage during sequential MSE
- Fixed AMP generating incompatible quantizer configurations
- Fixed AMP errors with dynamic Conv ops
- Aligned computation of symmetric encodings with aimet_torch
- PyTorch
- Fixed AttributeError when catching torch.onnx.export failures during QuantSim export
- Fixed errors being thrown when deepspeed import fails
- Aligned input and output encodings for Resize layers
- Added supergroup fusion handling for LeakyRelu layers
- Docs: Updated LoRA user guide
Deprecations
- ONNX
- Deprecated use_cuda, device, rounding_mode, and use_symmetric_encodings args to QuantizationSimModel.__init__
Version 2.5.0
What's Changed
New Features
- ONNX
- Added a new set_quantizers() API to QuantizationSimModel
- PyTorch
- Added a new API to fold param quantizers
- Experimental: AdaScale - a new post-training quantization technique
Bug Fixes
- ONNX
- Cleaned up tempfiles generated by large model export
- PyTorch
- Fixed nullptr error in FloatEncoding
- Checked wrong parameter access only upon AttributeError
- Changed to import spconv lazily
- Fixed type error in transformer utils
Version 2.4.0
New Features
- ONNX
- Introduced option to export only encodings
- Common
- Added RMSNormalization in default AIMET config
Bug Fixes
- ONNX
- Removed cublas dependency from the libpymo executable
- Represent y_zero_point as int
- Represent per-block scale as int
- PyTorch
- SeqMSE optimizes nested modules once, improving turn-around time
- CrossLayerEqualization does not replace ReLU6 with ReLU automatically
- AMP creates distinct quantizer groups for model inputs
Documentation
Release main page: https://github.com/quic/aimet/releases/tag/2.4.0
Documentation: https://quic.github.io/aimet-pages/releases/2.4.0/index.html
Version 2.3.0
New Features
- ONNX
- Upgraded CUDA to 12.1.0
- Upgraded ONNX-Runtime to 1.19.2
- Reduced QuantizationSimModel.export() time
Bug Fixes
- ONNX
- Fixed bug in QuantizationSimModel.export() to export ONNX models with external weights to one file
Documentation
Release main page: https://github.com/quic/aimet/releases/tag/2.3.0
Documentation: https://quic.github.io/aimet-pages/releases/2.3.0/index.html
Version 2.2.0
What's New
- New Features
- PyTorch and ONNX
- Added "min_max" (QuantScheme.min_max) as a new name for the "post_training_tf" quant scheme
- ONNX
- Introduced supergroup pattern-matching for complicated patterns such as LayerNormalization and RMSNorm
- Bug Fixes
- PyTorch
- Restored aimet_torch.v1 tf-enhanced behavior
- Updated Sequential MSE candidate logic to compute encoding candidates; vectorized blockwise sequential MSE loss calculation for nn.Linear
- ONNX
- Fixed bug in QuantizationSimModel._tie_quantizers() which propagates encodings to the first op of parent ops if the parent op is not quantizable
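Sequential MSE, mentioned in the fixes above, selects a per-layer encoding candidate by minimizing the mean squared error between the float layer output and its quantize-dequantize counterpart. A toy illustration of that candidate search (not the AIMET API; names and shapes here are invented for the sketch):

```python
def seq_mse_candidate(weights, inputs, candidates, bitwidth=8):
    # For each candidate max-range, quantize-dequantize the weights
    # symmetrically and measure the squared error of the layer output
    # (a dot product) against the float output; return the best range.
    steps = 2 ** (bitwidth - 1) - 1
    target = sum(w * x for w, x in zip(weights, inputs))

    def qdq(w, scale):
        # Symmetric quantize-dequantize with integer clamping.
        q = max(-steps - 1, min(steps, round(w / scale)))
        return q * scale

    return min(
        candidates,
        key=lambda c: (sum(qdq(w, c / steps) * x
                           for w, x in zip(weights, inputs)) - target) ** 2,
    )
```

Too small a range clips large weights; too large a range wastes grid resolution on values that never occur, so the MSE-minimizing candidate sits in between.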
Documentation
- Release main page: https://github.com/quic/aimet/releases/tag/2.2.0
- Documentation: https://quic.github.io/aimet-pages/releases/2.2.0/index.html
Packages
- aimet_torch-2.2.0+cu121-cp310-none-any.whl
- PyTorch 2.1 GPU package with Python 3.10 and CUDA 12.x
- aimet_torch-2.2.0+cpu-cp310-none-any.whl
- PyTorch 2.1 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_onnx-2.2.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 GPU package with Python 3.10 - Recommended for use with ONNX models
- aimet_onnx-2.2.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_tensorflow-2.2.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 GPU package with Python 3.10 - Recommended for use with TensorFlow models
- aimet_tensorflow-2.2.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 CPU package with Python 3.10 - If installing on a machine without CUDA
Version 2.1.0
What's New
- New Features
- PyTorch and ONNX
- [BREAKING CHANGE]: AIMET QuantSim by default uses per-channel quantization for weights instead of per-tensor
- AIMET QuantSim exports encoding json schema version 1.0.0 by default
- PyTorch
- AIMET now quantizes scalar inputs of type torch.nn.Parameter - these were not quantized in prior releases
- Published recipe for performing LoRA QAT - using LoRA adapters to recover quantized accuracy of the base model. Includes recipes for weight-only (WQ) and weight-and-activation (QWA) QAT
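The per-channel default noted above computes one scale per output channel instead of a single scale for the whole weight tensor, which better fits channels with very different dynamic ranges. A toy sketch of symmetric scale computation (not AIMET code):

```python
def symmetric_scales(weight_rows, bitwidth=8, per_channel=True):
    # Symmetric quantization scale(s) for a 2-D weight given as a list
    # of rows (one row per output channel): one scale per row for
    # per-channel, or a single shared scale for per-tensor.
    steps = 2 ** (bitwidth - 1) - 1
    if per_channel:
        return [max(abs(v) for v in row) / steps for row in weight_rows]
    global_max = max(abs(v) for row in weight_rows for v in row)
    return [global_max / steps] * len(weight_rows)
```

With per-tensor quantization, a single large weight anywhere in the tensor stretches the grid for every channel; per-channel scales avoid that coupling.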
- Bug Fixes
- PyTorch
- Fixed a bug that prevented Adaround from caching data samples with PyTorch versions 2.6 and later
Documentation
- Release main page: https://github.com/quic/aimet/releases/tag/2.1.0
- Documentation: https://quic.github.io/aimet-pages/releases/2.1.0/index.html
Packages
- aimet_torch-2.1.0+cu121-cp310-none-any.whl
- PyTorch 2.1 GPU package with Python 3.10 and CUDA 12.x
- aimet_torch-2.1.0+cpu-cp310-none-any.whl
- PyTorch 2.1 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_onnx-2.1.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 GPU package with Python 3.10 - Recommended for use with ONNX models
- aimet_onnx-2.1.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_tensorflow-2.1.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 GPU package with Python 3.10 - Recommended for use with TensorFlow models
- aimet_tensorflow-2.1.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 CPU package with Python 3.10 - If installing on a machine without CUDA
Version 2.0.0
What's New
- New Features
- Common
- Reorganized the documentation to more clearly explain AIMET procedures
- Redesigned the documentation using the Furo theme
- Added post-AIMET procedures on how to take AIMET quantized model to Qualcomm® AI Engine Direct and Qualcomm® AI Hub
- PyTorch
- BREAKING CHANGE: aimet_torch.v2 has become the default API. All legacy APIs have been migrated to the aimet_torch.v1 subpackage (for example, aimet_torch.qc_quantize_op is now aimet_torch.v1.qc_quantize_op)
- Added Manual Mixed Precision Configurator (Beta) to make it easy to configure a model in Mixed Precision.
- ONNX
- Optimized QuantizationSimModel.__init__() latency
- Align ConnectedGraph representation with onnx graph
- Bug Fixes
- ONNX
- Bug fixes for Adaround
- Bug fixes for BN fold
Upgrading
- PyTorch
- aimet_torch 2 is fully backward compatible with all the public APIs of aimet_torch 1.x. If you are using low-level components of QuantizationSimModel, please see Migrate to aimet_torch 2.
Documentation
- Release main page: https://github.com/quic/aimet/releases/tag/2.0.0
- Installation guide: https://quic.github.io/aimet-pages/releases/2.0.0/install/index.html
- User guide: https://quic.github.io/aimet-pages/releases/2.0.0/userguide/index.html
- API documentation: https://quic.github.io/aimet-pages/releases/2.0.0/apiref/index.html
Packages
- aimet_torch-2.0.0+cu121-cp310-none-any.whl
- PyTorch 2.1 GPU package with Python 3.10 and CUDA 12.x
- aimet_torch-2.0.0+cpu-cp310-none-any.whl
- PyTorch 2.1 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_onnx-2.0.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 GPU package with Python 3.10 - Recommended for use with ONNX models
- aimet_onnx-2.0.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_tensorflow-2.0.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 GPU package with Python 3.10 - Recommended for use with TensorFlow models
- aimet_tensorflow-2.0.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 CPU package with Python 3.10 - If installing on a machine without CUDA
Version 1.35.1
What's New
- PyTorch
- Fixed package versioning for compatibility with latest pip version
Documentation
- Release main page: https://github.com/quic/aimet/releases/tag/1.35.1
- Installation guide: https://quic.github.io/aimet-pages/releases/1.35.1/install/index.html
- User guide: https://quic.github.io/aimet-pages/releases/1.35.1/user_guide/index.html
- API documentation: https://quic.github.io/aimet-pages/releases/1.35.1/api_docs/index.html
Packages
- aimet_torch-1.35.1+cu121-cp310-cp310-manylinux_2_34_x86_64.whl
- PyTorch 2.1 GPU package with Python 3.10 and CUDA 12.x
- aimet_torch-1.35.1+cu117-cp310-cp310-manylinux_2_34_x86_64.whl
- PyTorch 1.13 GPU package with Python 3.10 and CUDA 11.x
- aimet_torch-1.35.1+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- PyTorch 2.1 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_onnx-1.35.1+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 GPU package with Python 3.10 - Recommended for use with ONNX models
- aimet_onnx-1.35.1+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_tensorflow-1.35.1+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 GPU package with Python 3.10 - Recommended for use with TensorFlow models
- aimet_tensorflow-1.35.1+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 CPU package with Python 3.10 - If installing on a machine without CUDA