Releases: quic/aimet

Version 2.8.0

18 Jun 16:14

What's Changed

New Features

  • ONNX
    • Update aimet_onnx QuantizationSimModel.__init__ function signature (cbe67ae)
    • Defined new AdaRound API aimet_onnx.apply_adaround (84edcf5); see the usage sketch after this list
    • Defined new sequential MSE API aimet_onnx.apply_seq_mse (836ab1e)
    • Defined new per-layer sensitivity analysis API aimet_onnx.analyze_per_layer_sensitivity (dc34fa4)
    • Allowed onnx QuantizationSimModel.compute_encodings to take iterables (2c8ae88)
  • PyTorch
    • Added native support for huggingface Phi-3 (80cd141)
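
A hedged sketch of how the new aimet_onnx pieces above fit together. Only the API names (aimet_onnx.apply_adaround, aimet_onnx.apply_seq_mse, and compute_encodings accepting an iterable) come from these notes; the argument lists, whether the calls take the raw ONNX model or the QuantizationSimModel, and the input dictionary shape are assumptions, so verify against the aimet_onnx API reference.

```python
# Hedged sketch of the 2.8.0 aimet_onnx calibration flow; argument names are assumptions.
import numpy as np
import onnx
import aimet_onnx
from aimet_onnx.quantsim import QuantizationSimModel

model = onnx.load("model.onnx")
# Placeholder calibration inputs; the input name and shape depend on your model
calib_data = [{"input": np.random.randn(1, 3, 224, 224).astype(np.float32)} for _ in range(8)]

sim = QuantizationSimModel(model)             # simplified constructor call
aimet_onnx.apply_seq_mse(sim, calib_data)     # assumed to take the sim and calibration inputs
aimet_onnx.apply_adaround(sim, calib_data)    # assumed to take the sim and calibration inputs

# compute_encodings can now consume an iterable of inputs directly
sim.compute_encodings(calib_data)
```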

Bug Fixes and Improvements

  • ONNX
    • Made dynamic weights of Conv, ConvTranspose, Gemm, and MatMul follow the symmetry of static weights (ce68e75)
    • aimet-onnx on PyPI is now compatible with onnxruntime-gpu (6d3aa97)
    • Unpinned onnx version (abe8782)
    • Changed default execution provider to CPUExecutionProvider (e7d10c7)
    • Made QcQuantizeOp's data_type attribute always consistent without additional reconfiguration (8009871)
    • Made delta/offset and min/max always consistent (88706ef)
  • PyTorch
    • Made input quantizers always enabled whenever the input isn't already quantized (a2adae2)
    • Deprecated saving the PyTorch model object during QuantizationSimModel.export (b5521f3)

Version 2.7.0

02 Jun 18:05

What's Changed

New Features

Bug Fixes and Improvements

  • ONNX

    • Remove DlCompression, DlEqualization, OpenCV, zlib dependencies
    • Support loading encodings for missing quantizers
    • Set bitwidth of tensor quantizer while loading encodings
  • PyTorch

    • Remove DlCompression, DlEqualization, OpenCV, zlib dependencies
    • Export encodings for data movement operations in ONNX QDQ export
    • AdaScale (experimental) - support for updating Conv2D layers in blocks
    • AdaScale (experimental) - update API to take num_iterations instead of num_epochs

Version 2.6.0

16 May 23:52

What's Changed

New Features

  • ONNX

    • Support for passing onnxruntime EPs directly to QuantizationSimModel.__init__ (see the sketch after this list)
  • PyTorch

    • Support for simulating float8 quantization
    • Experimental: Added aimet_torch.onnx.export API for exporting QuantizationSimModel to onnx QDQ graph
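
A minimal sketch of handing ONNX Runtime execution providers straight to the ONNX QuantizationSimModel. The keyword name providers is an assumption (it mirrors onnxruntime.InferenceSession); only the feature itself, and the experimental aimet_torch.onnx.export name, come from these notes.

```python
# Hedged sketch: pass onnxruntime execution providers to QuantizationSimModel.__init__.
import onnx
from aimet_onnx.quantsim import QuantizationSimModel

model = onnx.load("model.onnx")
sim = QuantizationSimModel(
    model,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # keyword name assumed
)
```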

Bug Fixes and Improvements

  • ONNX

    • Reduced CPU and GPU memory usage during sequential MSE
    • Fixed AMP generating incompatible quantizer configurations
    • Fixed AMP errors with dynamic Conv ops
    • Aligned computation of symmetric encodings with aimet_torch
  • PyTorch

    • Fixed AttributeError when catching torch.onnx.export failures during QuantSim export
    • Fixed errors being thrown when deepspeed import fails
    • Aligned input and output encodings for Resize layers
    • Added supergroup fusion handling for LeakyRelu layers
    • Docs: Updated LoRA user guide

Deprecations

  • ONNX
    • Deprecated use_cuda, device, rounding_mode, and use_symmetric_encodings args to QuantizationSimModel.__init__

Version 2.5.0

05 May 21:44

What's Changed

New Features

  • ONNX

    • Added a new set_quantizers() API to QuantizationSimModel
  • PyTorch

    • Added a new API to fold param quantizers
    • Experimental: AdaScale - a new post-training quantization technique

Bug Fixes

  • ONNX

    • Cleaned up tempfiles generated by large model export
  • PyTorch

    • Fixed nullptr error in FloatEncoding
    • Checked for wrong parameter access only upon AttributeError
    • Changed to import spconv lazily
    • Fixed type error in transformer utils

Version 2.4.0

23 Apr 17:30

New Features

  • ONNX
    • Introduced option to export only encodings
  • Common
    • Added RMSNormalization in default AIMET config

Bug Fixes

  • ONNX
    • Removed cublas dependency from the libpymo executable
    • Represent y_zero_point as int
    • Represent per-block scale as int
  • PyTorch
    • SeqMSE optimizes nested modules once, improving turnaround time
    • CrossLayerEqualization does not replace ReLU6 with ReLU automatically
    • AMP creates distinct quantizer groups for model inputs

Documentation

Release main page: https://github.com/quic/aimet/releases/tag/2.4.0
Documentation: https://quic.github.io/aimet-pages/releases/2.4.0/index.html

Version 2.3.0

08 Apr 16:46

New Features

  • ONNX
    • Upgraded CUDA to 12.1.0
    • Upgraded ONNX-Runtime to 1.19.2
    • Reduced QuantizationSimModel.export() time

Bug Fixes

  • ONNX
    • Fixed bug in QuantizationSimModel.export() to export ONNX models with external weights to one file

Documentation

Release main page: https://github.com/quic/aimet/releases/tag/2.3.0
Documentation: https://quic.github.io/aimet-pages/releases/2.3.0/index.html

Version 2.2.0

24 Mar 22:21

What's New

  • New Features
    • PyTorch and ONNX
      • Added "min_max" (QuantScheme.min_max) as a new name for "post_training_tf" quant scheme
    • ONNX
      • Introduced supergroup pattern-matching for complicated patterns such as LayerNormalization and RMSNorm
  • Bug Fixes
    • PyTorch
      • Restored aimet_torch.v1 tf-enhanced behavior
      • Updated Sequential MSE candidate logic to compute encoding candidates. Vectorized blockwise sequential MSE loss calculation for nn.Linear
    • ONNX
      • Fixed bug in QuantizationSimModel._tie_quantizers() that propagated encodings to the first op of parent ops when the parent op is not quantizable
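
A minimal sketch of the new scheme name on the aimet_torch side. QuantScheme.min_max comes from these notes as the new name for post_training_tf; whether it is a strict alias of the same enum member is an assumption, and the constructor call follows the standard QuantizationSimModel signature.

```python
# Minimal sketch: use the new "min_max" name for the post_training_tf quant scheme.
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 8)

# QuantScheme.min_max is the new spelling; assumed to select the same scheme as post_training_tf
sim = QuantizationSimModel(model, dummy_input, quant_scheme=QuantScheme.min_max)
```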

Documentation

Packages

  • aimet_torch-2.2.0+cu121-cp310-none-any.whl
    • PyTorch 2.1 GPU package with Python 3.10 and CUDA 12.x
  • aimet_torch-2.2.0+cpu-cp310-none-any.whl
    • PyTorch 2.1 CPU package with Python 3.10 - If installing on a machine without CUDA
  • aimet_onnx-2.2.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
    • ONNX 1.16 GPU package with Python 3.10 - Recommended for use with ONNX models
  • aimet_onnx-2.2.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
    • ONNX 1.16 CPU package with Python 3.10 - If installing on a machine without CUDA
  • aimet_tensorflow-2.2.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
    • TensorFlow 2.10 GPU package with Python 3.10 - Recommended for use with TensorFlow models
  • aimet_tensorflow-2.2.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
    • TensorFlow 2.10 CPU package with Python 3.10 - If installing on a machine without CUDA

Version 2.1.0

11 Mar 20:50

What's New

  • New Features

    • PyTorch and ONNX
      • [BREAKING CHANGE]: AIMET QuantSim by default uses per-channel quantization for weights instead of per-tensor
      • AIMET QuantSim exports encodings JSON schema version 1.0.0 by default (see the sketch after this list)
    • PyTorch
      • AIMET now quantizes scalar inputs of type torch.nn.Parameter - these were not quantized in prior releases
      • Published recipe for performing LoRA QAT - using LoRA adapters to recover quantized accuracy of the base model. Includes recipes for weight-only (WQ) and weight-and-activation (QWA) QAT
  • Bug Fixes

    • PyTorch
      • Fixed a bug that prevented Adaround from caching data samples with PyTorch versions 2.6 and later
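
A hedged sketch of what the two defaults above look like in practice on the aimet_torch side: weights are now quantized per-channel, and export writes the 1.0.0 encodings schema. The compute_encodings callback pattern, the model.encodings filename, and the top-level "version" field are assumptions based on the usual AIMET export layout; verify against the docs.

```python
# Hedged sketch: default per-channel weights and the 1.0.0 encodings schema on export.
import json
import os
import torch
from aimet_torch.quantsim import QuantizationSimModel

model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 8)

# Weight quantizers now default to per-channel (the breaking change above)
sim = QuantizationSimModel(model, dummy_input)
sim.compute_encodings(lambda m, _: m(dummy_input), None)  # callback pattern assumed

os.makedirs("/tmp/export", exist_ok=True)
sim.export("/tmp/export", "model", dummy_input)

# The ".encodings" filename and "version" field are assumptions; 1.0.0 is the new default
with open("/tmp/export/model.encodings") as f:
    print(json.load(f)["version"])
```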

Documentation

Packages

  • aimet_torch-2.1.0+cu121-cp310-none-any.whl
    • PyTorch 2.1 GPU package with Python 3.10 and CUDA 12.x
  • aimet_torch-2.1.0+cpu-cp310-none-any.whl
    • PyTorch 2.1 CPU package with Python 3.10 - If installing on a machine without CUDA
  • aimet_onnx-2.1.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
    • ONNX 1.16 GPU package with Python 3.10 - Recommended for use with ONNX models
  • aimet_onnx-2.1.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
    • ONNX 1.16 CPU package with Python 3.10 - If installing on a machine without CUDA
  • aimet_tensorflow-2.1.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
    • TensorFlow 2.10 GPU package with Python 3.10 - Recommended for use with TensorFlow models
  • aimet_tensorflow-2.1.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
    • TensorFlow 2.10 CPU package with Python 3.10 - If installing on a machine without CUDA

Version 2.0.0

13 Jan 21:19

What's New

  • New Features
    • Common
      • Reorganized the documentation to more clearly explain AIMET procedures
      • Redesigned the documentation using the Furo theme
      • Added post-AIMET procedures on how to take AIMET quantized model to Qualcomm® AI Engine Direct and Qualcomm® AI Hub
    • PyTorch
      • BREAKING CHANGE: aimet_torch.v2 has become the default API. All legacy APIs have been migrated to the aimet_torch.v1 subpackage, for example aimet_torch.qc_quantize_op is now aimet_torch.v1.qc_quantize_op (see the import sketch after this list)
      • Added Manual Mixed Precision Configurator (Beta) to make it easy to configure a model in Mixed Precision.
    • ONNX
      • Optimized QuantizationSimModel.__init__() latency
      • Aligned the ConnectedGraph representation with the ONNX graph
  • Bug Fixes
    • ONNX
      • Bug fixes for Adaround
      • Bug fixes for BN fold
  • Upgrading
    • PyTorch
      • aimet_torch 2 is fully backward compatible with all the public APIs of aimet_torch 1.x. If you are using low-level components of QuantizationSimModel, please see Migrate to aimet_torch 2.
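
A short sketch of the import change; the qc_quantize_op path is the example given in these notes, and other legacy modules are assumed to follow the same aimet_torch.v1 pattern.

```python
# Legacy aimet_torch 1.x module, now reached through the v1 subpackage
from aimet_torch.v1 import qc_quantize_op  # noqa: F401

# Default 2.x entry point (unchanged public API)
from aimet_torch.quantsim import QuantizationSimModel  # noqa: F401
```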

Documentation

Packages

  • aimet_torch-2.0.0+cu121-cp310-none-any.whl
    • PyTorch 2.1 GPU package with Python 3.10 and CUDA 12.x
  • aimet_torch-2.0.0+cpu-cp310-none-any.whl
    • PyTorch 2.1 CPU package with Python 3.10 - If installing on a machine without CUDA
  • aimet_onnx-2.0.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
    • ONNX 1.16 GPU package with Python 3.10 - Recommended for use with ONNX models
  • aimet_onnx-2.0.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
    • ONNX 1.16 CPU package with Python 3.10 - If installing on a machine without CUDA
  • aimet_tensorflow-2.0.0+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
    • TensorFlow 2.10 GPU package with Python 3.10 - Recommended for use with TensorFlow models
  • aimet_tensorflow-2.0.0+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
    • TensorFlow 2.10 CPU package with Python 3.10 - If installing on a machine without CUDA

Version 1.35.1

21 Dec 09:54

What's New

  • PyTorch
    • Fixed package versioning for compatibility with latest pip version

Documentation

Packages

  • aimet_torch-1.35.1+cu121-cp310-cp310-manylinux_2_34_x86_64.whl
    • PyTorch 2.1 GPU package with Python 3.10 and CUDA 12.x
  • aimet_torch-1.35.1+cu117-cp310-cp310-manylinux_2_34_x86_64.whl
    • PyTorch 1.13 GPU package with Python 3.10 and CUDA 11.x
  • aimet_torch-1.35.1+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
    • PyTorch 2.1 CPU package with Python 3.10 - If installing on a machine without CUDA
  • aimet_onnx-1.35.1+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
    • ONNX 1.16 GPU package with Python 3.10 - Recommended for use with ONNX models
  • aimet_onnx-1.35.1+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
    • ONNX 1.16 CPU package with Python 3.10 - If installing on a machine without CUDA
  • aimet_tensorflow-1.35.1+cu118-cp310-cp310-manylinux_2_34_x86_64.whl
    • TensorFlow 2.10 GPU package with Python 3.10 - Recommended for use with TensorFlow models
  • aimet_tensorflow-1.35.1+cpu-cp310-cp310-manylinux_2_34_x86_64.whl
    • TensorFlow 2.10 CPU package with Python 3.10 - If installing on a machine without CUDA