Releases: quic/aimet
Version 2.8.0
What's Changed
New Features
- ONNX
- Update aimet_onnx QuantizationSimModel.__init__ function signature (cbe67ae)
- Defined new AdaRound API aimet_onnx.apply_adaround (84edcf5)
- Defined new sequential MSE API aimet_onnx.apply_seq_mse (836ab1e)
- Defined new per-layer sensitivity analysis API aimet_onnx.analyze_per_layer_sensitivity (dc34fa4)
- Allowed onnx QuantizationSimModel.compute_encodings to take iterables (2c8ae88)
- PyTorch
- Added native support for huggingface Phi-3 (80cd141)
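The AdaRound API listed above learns whether each weight rounds up or down so that the layer's output error is minimized, rather than rounding every weight to its nearest grid point independently. A toy brute-force sketch of that objective (illustrative only; this is not the aimet_onnx API, and the function names here are made up):

```python
import math
from itertools import product

def quantize(w, scale, direction):
    # Quantize w onto a fixed grid, rounding down (direction=0) or up (1).
    return (math.floor(w / scale) + direction) * scale

def adaround_toy(weights, inputs, scale):
    # Brute-force the per-weight up/down rounding choice that minimizes
    # the squared error of the layer output sum(w_i * x_i) -- the same
    # objective AdaRound optimizes, except AdaRound uses a learned
    # soft-rounding mask instead of exhaustive search.
    target = sum(w * x for w, x in zip(weights, inputs))
    best_err, best_qw = None, None
    for dirs in product((0, 1), repeat=len(weights)):
        qw = [quantize(w, scale, d) for w, d in zip(weights, dirs)]
        err = (sum(q * x for q, x in zip(qw, inputs)) - target) ** 2
        if best_err is None or err < best_err:
            best_err, best_qw = err, qw
    return best_qw
```

For example, with weights [0.26, 0.24] and scale 0.5, nearest rounding treats each weight in isolation, while the search above picks the up/down combination whose summed output best matches the float layer output.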
Bug Fixes and Improvements
- ONNX
- Made dynamic weights of Conv, ConvTranspose, Gemm, and MatMul follow the symmetry of static weights (ce68e75)
- aimet-onnx on PyPI is now compatible with onnxruntime-gpu (6d3aa97)
- Unpinned onnx version (abe8782)
- Changed default execution provider to CPUExecutionProvider (e7d10c7)
- Made QcQuantizeOp's data_type attribute always consistent without additional reconfiguration (8009871)
- Made delta/offset and min/max always consistent (88706ef)
- PyTorch
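The delta/offset vs. min/max consistency fix above concerns the standard affine-quantization relationship between the two encoding representations: once the offset is snapped to the integer grid, min/max re-derived from delta/offset can differ slightly from the original range. A simplified sketch of that round trip (not AIMET's internal code):

```python
def encodings_from_min_max(enc_min, enc_max, bitwidth=8):
    # Derive the affine-quantization delta (scale) and offset from a
    # min/max range, then re-derive min/max from them. Keeping both
    # representations on the same integer grid is what "consistent"
    # means here.
    num_steps = 2 ** bitwidth - 1
    delta = (enc_max - enc_min) / num_steps
    offset = round(enc_min / delta)       # integer value mapped to enc_min
    # Recompute min/max from the grid; they may differ slightly from the input.
    grid_min = offset * delta
    grid_max = (offset + num_steps) * delta
    return delta, offset, grid_min, grid_max
```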
Version 2.7.0
What's Changed
New Features
- PyTorch
- OmniQuant (experimental) - implement OmniQuant PTQ technique (https://arxiv.org/pdf/2308.13137) for Llama and Qwen2 model families
Bug Fixes and Improvements
- ONNX
- Remove DlCompression, DlEqualization, OpenCV, zlib dependencies
- Support loading encodings for missing quantizers
- Set bitwidth of tensor quantizer while loading encodings
- PyTorch
- Remove DlCompression, DlEqualization, OpenCV, zlib dependencies
- Export encodings for data movement operations in ONNX QDQ export
- AdaScale (experimental) - support for updating Conv2D layers in blocks
- AdaScale (experimental) - update API to take num_iterations instead of num_epochs
Version 2.6.0
What's Changed
New Features
- ONNX
- Support for passing onnxruntime EPs directly to QuantizationSimModel.__init__
- PyTorch
- Support for simulating float8 quantization
- Experimental: Added aimet_torch.onnx.export API for exporting QuantizationSimModel to onnx QDQ graph
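Simulating float8 quantization, mentioned above, amounts to rounding each value's mantissa down to the reduced precision while keeping floating-point dynamics. A simplified sketch (E4M3-like mantissa rounding only, ignoring exponent range and saturation; not AIMET's implementation):

```python
import math

def simulate_fp8(x, mantissa_bits=3):
    # Round x to the nearest value representable with `mantissa_bits`
    # of mantissa (E4M3-like). Exponent clipping and saturation are
    # deliberately omitted to keep the sketch short.
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                # x = m * 2**e with 0.5 <= |m| < 1
    grid = 2 ** (mantissa_bits + 1)     # mantissa step count within a binade
    return (round(m * grid) / grid) * 2 ** e
```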
Bug Fixes and Improvements
- ONNX
- Reduced CPU and GPU memory usage during sequential MSE
- Fixed AMP generating incompatible quantizer configurations
- Fixed AMP errors with dynamic Conv ops
- Aligned computation of symmetric encodings with aimet_torch
- PyTorch
- Fixed AttributeError when catching torch.onnx.export failures during QuantSim export
- Fixed errors being thrown when deepspeed import fails
- Aligned input and output encodings for Resize layers
- Added supergroup fusion handling for LeakyRelu layers
- Docs: Updated LoRA user guide
Deprecations
- ONNX
- Deprecated use_cuda, device, rounding_mode, and use_symmetric_encodings args to QuantizationSimModel.__init__
Version 2.5.0
What's Changed
New Features
- ONNX
- Added a new set_quantizers() API to QuantizationSimModel
- PyTorch
- Added a new API to fold param quantizers
- Experimental: AdaScale - a new post-training quantization technique
Bug Fixes
- ONNX
- Cleaned up tempfiles generated by large model export
- PyTorch
- Fixed nullptr error in FloatEncoding
- Checked wrong parameter access only upon AttributeError
- Changed to import spconv lazily
- Fixed type error in transformer utils
Version 2.4.0
New Features
- ONNX
- Introduced option to export only encodings
- Common
- Added RMSNormalization in default AIMET config
Bug Fixes
- ONNX
- Removed cublas dependency from the libpymo executable
- Represent y_zero_point as int
- Represent per-block scale as int
- PyTorch
- SeqMSE optimizes nested modules once, improving turn-around time
- CrossLayerEqualization does not replace ReLU6 with ReLU automatically
- AMP creates distinct quantizer groups for model inputs
Documentation
Release main page: https://github.com/quic/aimet/releases/tag/2.4.0
Documentation: https://quic.github.io/aimet-pages/releases/2.4.0/index.html
Version 2.3.0
New Features
- ONNX
- Upgraded CUDA to 12.1.0
- Upgraded ONNX-Runtime to 1.19.2
- Reduced QuantizationSimModel.export() time
Bug Fixes
- ONNX
- Fixed bug in QuantizationSimModel.export() to export ONNX models with external weights to one file
Documentation
Release main page: https://github.com/quic/aimet/releases/tag/2.3.0
Documentation: https://quic.github.io/aimet-pages/releases/2.3.0/index.html
Version 2.2.0
What's New
- New Features
- PyTorch and ONNX
- Added "min_max" (QuantScheme.min_max) as a new name for the "post_training_tf" quant scheme
- ONNX
- Introduced supergroup pattern-matching for complicated patterns such as LayerNormalization and RMSNorm
- Bug Fixes
- PyTorch
- Restored aimet_torch.v1 tf-enhanced behavior
- Updated Sequential MSE candidate logic to compute encoding candidates; vectorized blockwise sequential MSE loss calculation for nn.Linear
- ONNX
- Fixed bug in QuantizationSimModel._tie_quantizers() which propagates encodings to the first op of parent ops if the parent op is not quantizable
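Sequential MSE, mentioned in the fixes above, selects a per-layer encoding candidate by minimizing the mean squared error between the float layer output and its quantize-dequantize counterpart. A toy illustration of that candidate search (not the AIMET API; names and shapes here are invented for the sketch):

```python
def seq_mse_candidate(weights, inputs, candidates, bitwidth=8):
    # For each candidate max-range, quantize-dequantize the weights
    # symmetrically and measure the squared error of the layer output
    # (a dot product) against the float output; return the best range.
    steps = 2 ** (bitwidth - 1) - 1
    target = sum(w * x for w, x in zip(weights, inputs))

    def qdq(w, scale):
        # Symmetric quantize-dequantize with integer clamping.
        q = max(-steps - 1, min(steps, round(w / scale)))
        return q * scale

    return min(
        candidates,
        key=lambda c: (sum(qdq(w, c / steps) * x
                           for w, x in zip(weights, inputs)) - target) ** 2,
    )
```

Too small a range clips large weights; too large a range wastes grid resolution on values that never occur, so the MSE-minimizing candidate sits in between.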
Documentation
- Release main page: https://github.com/quic/aimet/releases/tag/2.2.0
- Documentation: https://quic.github.io/aimet-pages/releases/2.2.0/index.html
Packages
- aimet_torch-2.2.0+cu121-cp310-none-any.whl
- PyTorch 2.1 GPU package with Python 3.10 and CUDA 12.x
- aimet_torch-2.2.0+cpu-cp310-none-any.whl
- PyTorch 2.1 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_onnx-2.2.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 GPU package with Python 3.10 - Recommended for use with ONNX models
- aimet_onnx-2.2.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_tensorflow-2.2.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 GPU package with Python 3.10 - Recommended for use with TensorFlow models
- aimet_tensorflow-2.2.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 CPU package with Python 3.10 - If installing on a machine without CUDA
Version 2.1.0
What's New
- New Features
- PyTorch and ONNX
- [BREAKING CHANGE]: AIMET QuantSim by default uses per-channel quantization for weights instead of per-tensor
- AIMET QuantSim exports encoding json schema version 1.0.0 by default
- PyTorch
- AIMET now quantizes scalar inputs of type torch.nn.Parameter - these were not quantized in prior releases
- Published recipe for performing LoRA QAT - using LoRA adapters to recover quantized accuracy of the base model. Includes recipes for weight-only (WQ) and weight-and-activation (QWA) QAT
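The per-channel default noted above computes one scale per output channel instead of a single scale for the whole weight tensor, which better fits channels with very different dynamic ranges. A toy sketch of symmetric scale computation (not AIMET code):

```python
def symmetric_scales(weight_rows, bitwidth=8, per_channel=True):
    # Symmetric quantization scale(s) for a 2-D weight given as a list
    # of rows (one row per output channel): one scale per row for
    # per-channel, or a single shared scale for per-tensor.
    steps = 2 ** (bitwidth - 1) - 1
    if per_channel:
        return [max(abs(v) for v in row) / steps for row in weight_rows]
    global_max = max(abs(v) for row in weight_rows for v in row)
    return [global_max / steps] * len(weight_rows)
```

With per-tensor quantization, a single large weight anywhere in the tensor stretches the grid for every channel; per-channel scales avoid that coupling.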
- Bug Fixes
- PyTorch
- Fixed a bug that prevented Adaround from caching data samples with PyTorch versions 2.6 and later
Documentation
- Release main page: https://github.com/quic/aimet/releases/tag/2.1.0
- Documentation: https://quic.github.io/aimet-pages/releases/2.1.0/index.html
Packages
- aimet_torch-2.1.0+cu121-cp310-none-any.whl
- PyTorch 2.1 GPU package with Python 3.10 and CUDA 12.x
- aimet_torch-2.1.0+cpu-cp310-none-any.whl
- PyTorch 2.1 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_onnx-2.1.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 GPU package with Python 3.10 - Recommended for use with ONNX models
- aimet_onnx-2.1.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_tensorflow-2.1.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 GPU package with Python 3.10 - Recommended for use with TensorFlow models
- aimet_tensorflow-2.1.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 CPU package with Python 3.10 - If installing on a machine without CUDA
Version 2.0.0
What's New
- New Features
- Common
- Reorganized the documentation to more clearly explain AIMET procedures
- Redesigned the documentation using the Furo theme
- Added post-AIMET procedures on how to take AIMET quantized model to Qualcomm® AI Engine Direct and Qualcomm® AI Hub
- PyTorch
- BREAKING CHANGE: aimet_torch.v2 has become the default API. All legacy APIs have been migrated to the aimet_torch.v1 subpackage (for example, aimet_torch.qc_quantize_op is now aimet_torch.v1.qc_quantize_op)
- Added Manual Mixed Precision Configurator (Beta) to make it easy to configure a model in Mixed Precision.
- ONNX
- Optimized QuantizationSimModel.__init__() latency
- Align ConnectedGraph representation with onnx graph
- Bug Fixes
- ONNX
- Bug fixes for Adaround
- Bug fixes for BN fold
Upgrading
- PyTorch
- aimet_torch 2 is fully backward compatible with all the public APIs of aimet_torch 1.x. If you are using low-level components of QuantizationSimModel, please see Migrate to aimet_torch 2.
Documentation
- Release main page: https://github.com/quic/aimet/releases/tag/2.0.0
- Installation guide: https://quic.github.io/aimet-pages/releases/2.0.0/install/index.html
- User guide: https://quic.github.io/aimet-pages/releases/2.0.0/userguide/index.html
- API documentation: https://quic.github.io/aimet-pages/releases/2.0.0/apiref/index.html
Packages
- aimet_torch-2.0.0+cu121-cp310-none-any.whl
- PyTorch 2.1 GPU package with Python 3.10 and CUDA 12.x
- aimet_torch-2.0.0+cpu-cp310-none-any.whl
- PyTorch 2.1 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_onnx-2.0.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 GPU package with Python 3.10 - Recommended for use with ONNX models
- aimet_onnx-2.0.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_tensorflow-2.0.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 GPU package with Python 3.10 - Recommended for use with TensorFlow models
- aimet_tensorflow-2.0.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 CPU package with Python 3.10 - If installing on a machine without CUDA
Version 1.35.1
What's New
- PyTorch
- Fixed package versioning for compatibility with latest pip version
Documentation
- Release main page: https://github.com/quic/aimet/releases/tag/1.35.1
- Installation guide: https://quic.github.io/aimet-pages/releases/1.35.1/install/index.html
- User guide: https://quic.github.io/aimet-pages/releases/1.35.1/user_guide/index.html
- API documentation: https://quic.github.io/aimet-pages/releases/1.35.1/api_docs/index.html
Packages
- aimet_torch-1.35.1+cu121-cp310-cp310-manylinux_2_34_x86_64.whl
- PyTorch 2.1 GPU package with Python 3.10 and CUDA 12.x
- aimet_torch-1.35.1+cu117-cp310-cp310-manylinux_2_34_x86_64.whl
- PyTorch 1.13 GPU package with Python 3.10 and CUDA 11.x
- aimet_torch-1.35.1+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- PyTorch 2.1 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_onnx-1.35.1+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 GPU package with Python 3.10 - Recommended for use with ONNX models
- aimet_onnx-1.35.1+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- ONNX 1.16 CPU package with Python 3.10 - If installing on a machine without CUDA
- aimet_tensorflow-1.35.1+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 GPU package with Python 3.10 - Recommended for use with TensorFlow models
- aimet_tensorflow-1.35.1+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
- TensorFlow 2.10 CPU package with Python 3.10 - If installing on a machine without CUDA