
Releases: quic/aimet

Version 2.17.0

20 Oct 21:38


  • Bug fixes and Improvements
    • ONNX

      • Optimize SeqMSE latency and CPU memory usage (434ac6b)
      • Support excluding nodes from SeqMSE optimization (6a37239)
      • Support exporting large models (> 2GB) to ONNX QDQ (b1dafe6, 1bf8b82)
      • Support exporting float16 ONNX models to ONNX QDQ (66ccb45)
      • Allow disabling MatMul-Add supergroup via config file (e49660c)
      • Fix bug where on-disk tensor data is deleted before InferenceSession (d57a934)
    • Torch

      • Fix sim.export bug when using Python >= 3.12 (ee949a2)
      • Allow export for back-to-back quantizers which share the same encodings (28a7382)
      • Fix numerical issue in FPTQuant (f0bc6c9)
    • Common

      • Remove Conv-Relu supergroup from HTP < V73 config files (19e5a4e)
      • Fix LayerNorm and InstanceNorm weight symmetry in HTP < V73 config files (eb1ac5c, ce1ea63)
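
The SeqMSE node-exclusion change above can be illustrated with a small sketch. The function and parameter names below (`select_seqmse_candidates`, `exclude`) are assumptions for illustration only, not the actual aimet-onnx API:

```python
# Hypothetical sketch of excluding nodes from a SeqMSE-style pass.
# The real aimet-onnx API may differ; all names here are illustrative.

def select_seqmse_candidates(node_names, exclude=()):
    """Return the nodes SeqMSE would optimize, minus an exclusion set."""
    excluded = set(exclude)
    return [name for name in node_names if name not in excluded]

nodes = ["conv1", "fc1", "fc2"]
print(select_seqmse_candidates(nodes, exclude=["fc1"]))  # ['conv1', 'fc2']
```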

Version 2.16.0

07 Oct 04:36


New Features

  • ONNX
    • Experimental - Added Adascale, a post-training quantization technique (5e23ceb)

Bug fixes and Improvements

  • ONNX

    • Skip tying Concat input/output quantizers with conflicting encoding constraints (b924107)
    • Small updates to FPT Quant for improved accuracy (ba10947)
    • Implement partial encoding freezing mechanism in aimet-onnx (658ec3c)
    • Add Relu partial encoding constraints to HTP config files (dc8d978)
    • Clear encoding analyzer stats after computing param encodings (3d4725f)
    • Remove wasted computation/memory in FPTQuant local optimizer (59350af)
  • Torch

    • Allow boolean type casting of QuantizedTensors (7d63e66)
    • Implement partial encoding freezing mechanism in aimet-torch (1b99a39)
    • Improve scale post-processing to prevent scale freezing during QAT (6fe56b0)
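
The "partial encoding freezing" mechanism mentioned above can be sketched in a few lines: quantizers marked frozen keep their existing encodings while the rest are recomputed. The class and method names below are assumptions for illustration, not the aimet API:

```python
# Illustrative sketch of partial encoding freezing: frozen quantizers
# retain their encodings; unfrozen ones are recomputed from new stats.
# Names here are assumptions, not the actual aimet implementation.

class Quantizer:
    def __init__(self, name):
        self.name = name
        self.encoding = None
        self.frozen = False

    def freeze(self):
        self.frozen = True

def compute_encodings(quantizers, observed_ranges):
    for q in quantizers:
        if q.frozen:
            continue  # frozen encodings are left untouched
        q.encoding = observed_ranges[q.name]

q1, q2 = Quantizer("a"), Quantizer("b")
q1.encoding = (-1.0, 1.0)
q1.freeze()
compute_encodings([q1, q2], {"a": (-5.0, 5.0), "b": (0.0, 2.0)})
print(q1.encoding, q2.encoding)  # (-1.0, 1.0) (0.0, 2.0)
```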

Version 2.15.1

27 Sep 05:34


New Features

  • ONNX
    • Experimental - Added Adascale, a post-training quantization technique (5e23ceb)

Version 2.15.0

22 Sep 18:06


  • Bug fixes and Improvements

    • ONNX

      • Throws an error on bfloat16 models (5181860)
      • Added docs and examples for LiteMP (3d5e0dd)
      • Export to QDQ ONNX with pre-quantized constants (a97354f)
    • PyTorch

      • Fix multiple dispatch issue when torch function is called in nested context manager (6216ca0)
    • Keras

      • 2.14.0 is the last release of aimet-tf (087e9b1)

Version 2.14.0

08 Sep 20:00


  • New Feature

    • ONNX
      • Add support for FP16 in QuantizationSimModel (2494d90)
  • Bug fixes and Improvements

    • ONNX

      • Add sequential MSE support for onnx >= 1.18.0. (754d030)
      • Improve histogram granularity during TFE calibration (91109af)
      • Improve runtime for QuantizationSimModel creation for large models like LLMs (f7e700f)
      • Improve runtime for setting quantizers in a QuantizationSimModel for use cases like tying KV Cache input and output quantizers. (c0bdb46)
      • Add a check for None values in the group attribute of Conv layers, and fix improper handling of a None group attribute in ConvTranspose within fold_all_batch_norms_to_weight (374e8db)
    • PyTorch

      • Address QAT convergence issue: Add a fix for cases where quantizer.min becomes equal to quantizer.max during training, leading to NaN values (51f8990)
    • Keras

      • Fix accuracy drop issue for GPU wheel by excluding libpython*.so* from the aimet wheel packages (22cac5c)
    • Common

      • Remove Conv3d, Conv3dTranspose, and DepthwiseConv ops followed by activation from the supergroup until HTP support is available. (05f6810)
      • Fix color theme issue in documentation causing code snippets to render incorrectly (2c64eac)
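
The Conv `group` attribute check above can be sketched as follows; the helper name is hypothetical, but the ONNX default of `group = 1` is standard:

```python
# Sketch of guarding against a missing/None `group` attribute on Conv
# nodes before batch-norm folding. The function name is illustrative.

def effective_groups(group_attr):
    """Treat an absent/None `group` attribute as the ONNX default of 1."""
    if group_attr is None:
        return 1
    if group_attr < 1:
        raise ValueError(f"invalid group attribute: {group_attr}")
    return group_attr

print(effective_groups(None), effective_groups(4))  # 1 4
```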

Version 2.13.0

26 Aug 16:05


  • Bug fixes and Improvements

    • ONNX

      • Adjust weight scale for int32 bias overflow in W16A16 quantization (f39c0bf)
      • AutoQuant: Remove deprecated feature (414cdde)
      • Support exporting large models in aimet-onnx (0fe6701)
      • AdaRound: Delete deprecated top-level API. (bfba557)
      • AdaRound: Skip optimization if no input to layer (18dfedc)
    • PyTorch

      • Enable save_model_as_external_data for sim.onnx.export (107b339)
  • Known Issues

    • Keras
      • Accuracy drop observed with AIMET Keras for certain models. Fix is planned for the next release.
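
The W16A16 int32-bias fix listed above rests on the fact that the bias scale is derived as weight_scale × input_scale; if the quantized bias would overflow int32, the weight scale must be inflated. A minimal sketch, where the function name and exact adjustment policy are assumptions:

```python
# Sketch of adjusting weight scale so the quantized int32 bias fits.
# bias_scale = weight_scale * input_scale; if |bias| / bias_scale would
# exceed int32 range, inflate the weight scale just enough to fit.
# Names and the adjustment policy are illustrative assumptions.

INT32_MAX = 2**31 - 1

def adjust_weight_scale(weight_scale, input_scale, max_abs_bias):
    bias_scale = weight_scale * input_scale
    if max_abs_bias / bias_scale <= INT32_MAX:
        return weight_scale  # already representable in int32
    # inflate weight scale so max_abs_bias maps exactly to INT32_MAX
    return max_abs_bias / (INT32_MAX * input_scale)

ws = adjust_weight_scale(weight_scale=1e-8, input_scale=1e-4, max_abs_bias=10.0)
print(ws > 1e-8)  # True
```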

Version 2.12.0

13 Aug 02:34


  • Bug fixes and Improvements

    • Common

      • Remove data movement ops from config (ae02aa8)
    • ONNX

      • Exclude bias from quantization when weights are not quantized (62f5879)
      • AdaRound: Fix prelu failing in CUDA model (b2350b2)
    • PyTorch

      • Wrap aimet_torch.onnx.export with torch.no_grad (b73bb71)
  • Known Issues

    • Keras
      • Accuracy drop observed with AIMET Keras for certain models. Fix is planned for the next release.
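
The bias-exclusion rule above follows from how bias scale is derived: without a quantized weight there is no weight scale to derive the bias scale from. A trivial sketch of the decision, with hypothetical names:

```python
# Sketch of "exclude bias from quantization when weights are not
# quantized": bias quantization only makes sense when a weight scale
# exists to derive the bias scale from. Names are illustrative.

def should_quantize_bias(weight_quantizer_enabled, has_bias):
    return has_bias and weight_quantizer_enabled

print(should_quantize_bias(True, True), should_quantize_bias(False, True))  # True False
```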

Version 2.11.0

29 Jul 20:53

Choose a tag to compare

  • New Feature

    • ONNX

      • Enable llm_configurator for Llama (Experimental) (08c17b8)
  • Bug fixes and Improvements

    • Common

      • Represent LPBQ as DequantizeLinear in onnx QDQ (a967b8f)
      • Add additional sanity checks in LPBQ export logic (45c2a65)
      • Allow negative block axis in LPBQ QDQ export (6f670a4)
      • Add support for enabling param bw=2 in QuantSim (2d4e0eb)
      • Fix tanh output encoding range to [-1, 1] (3c92bb7)
    • ONNX

      • Apply matmul exception rule only for integer quantization (bb93c76)
      • Optimize blockwise min-max encoding analyzer (4febdd4)
      • Remove explicit FP32 model creation inside AdaRound and optimize building sessions during the optimization process (b1415bd)
      • Make Concat output quantizer inherit fixed input range (50f35dd)
      • Enable output quantizers to inherit input encoding when tying encodings (3750526)
      • Fix bug in CLE with bn_conv groups (654f4b1)
    • PyTorch

      • Guarantee positive scale during aimet-torch QAT (2ed8305)
      • Add secondary progress bars to Adascale and Omniquant (6c92a97)
  • Documentation Updates

    • Update Quick Start example and PTQ section (6c9f584)
    • Add missing workflow images (f961ed4)
  • Known Issues

    • Keras
      • Accuracy drop observed with AIMET Keras for certain models. Fix is planned for the next release.
      • Skipping 2.11 aimet-keras release due to regression
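
The tanh encoding fix above pins the output range to the function's mathematical range [-1, 1] rather than observed statistics. A sketch of deriving an 8-bit encoding from that fixed range; the asymmetric grid and function name are illustrative assumptions:

```python
# Sketch of pinning a tanh output encoding to its mathematical range
# [-1, 1] instead of observed min/max. The unsigned 8-bit grid below
# is an illustrative assumption, not the exact aimet scheme.

def tanh_output_encoding(bitwidth=8):
    qmin, qmax = 0, 2**bitwidth - 1
    vmin, vmax = -1.0, 1.0          # fixed range for tanh outputs
    scale = (vmax - vmin) / (qmax - qmin)
    offset = round(vmin / scale)    # offset in quantized-grid units
    return scale, offset

scale, offset = tanh_output_encoding()
print(offset)  # -128
```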

Version 2.10.0

14 Jul 21:49


What's Changed

  • New Feature

    • Promote to_onnx_qdq to a public API (f333188). Note: this is currently a beta feature.
  • Bug fixes and Improvements

    • Common
      • Added hover tooltip to the per-layer sensitivity plot. Changed x-axis to plot layer indices instead of names (c96894f)
    • PyTorch
      • Implement scaling factor in aimet-torch float QDQ (9b8c655)
      • Fix CustomSiLU bug (499df9f)
      • Added extra logic to isolate model outputs from connectedgraph (4ad0703)
      • Always instantiate quantizers with requires_grad=True (5aac9c5)
      • Add logic to place adascale quantizers into correct dtype (5e1e6f2)
    • ONNX
      • Allow AdaRound and SeqMSE to take uncalibrated sims (31ca7fd)
      • Modify bias quantizer setting based on weight quantizer (b47a97e)
      • Fix cnt overflow issue (70029c5)
      • Make memory saving optimization default in build_session and _infer_activation_dtypes (4b94ca9)
      • Implement two-phase AMP API (1603c17)
      • Work-around onnx version converter issue for models with external weights (22f0f23)
  • Documentation

    • Update SeqMSE feature guide (fefd504)
    • Fix links in example notebooks (fe66376)
    • Modify docs for CLE (f9d0d6c)
    • Edit automatic mixed precision feature guide (22b5c94)
    • Polish BQ user guide (f547a49)
    • Polish QAT user guide (339a225)
    • Update Quick start example, PTQ section (example) and notebook (03fdd64)
    • Add missing workflow images (010898a)
    • Add reference to mixed precision page in docs (188d401)
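
The "uncalibrated sims" change above can be pictured as a fallback: quantizers that already have encodings keep them, and quantizers without encodings get one derived from calibration data on the fly. All names below are illustrative assumptions, not the aimet-onnx API:

```python
# Sketch of letting AdaRound/SeqMSE-style routines accept an
# uncalibrated sim: quantizers with no encodings get a min/max
# encoding computed from data. Names are illustrative only.

def ensure_calibrated(encodings, name, samples):
    """Return an existing encoding, or derive min/max from data."""
    if encodings.get(name) is not None:
        return encodings[name]
    encodings[name] = (min(samples), max(samples))
    return encodings[name]

enc = {"layer1": None}
print(ensure_calibrated(enc, "layer1", [-0.5, 0.25, 2.0]))  # (-0.5, 2.0)
```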

Version 2.9.0

01 Jul 17:02


What's Changed

  • Bug Fixes and Improvements
    • ONNX
      • Rename QuantizeLinear outputs from <...>_int to <...>_q in onnx QDQ export (e78dbec)
      • Preserve I/O names in onnx QDQ export (35ad990)
      • Allow freezing loaded encodings in load_encodings_to_sim (911af75)
      • Represent activation QDQ with uint in encodings 2.0.0 in onnx QDQ export (92f63f5)
      • Allow aimet-onnx to load partial encodings (6636515)
      • Fix onnx sim.export permanently removing quantizers (9a2a407)
      • Fix onnx QDQ export output name swapping bug (6d1664c)
      • Switch AdaRound API naming to num_iterations (fea395f)
    • PyTorch
      • Add native support for Mistral-0.3 (db99447)
      • AdaScale: Update the learning rates for AdaScale learnable parameters (7336ead)
    • Common
      • Add docs to build aimet from source (ae981f7)
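
The partial-encodings change above means a loaded encodings file no longer has to cover every quantizer: entries that are present are applied, and quantizers without an entry keep their current state. A minimal sketch, with names and the strict/non-strict behavior as illustrative assumptions:

```python
# Sketch of loading *partial* encodings: present entries are applied,
# quantizers without an entry are left as-is (or raise when strict).
# Names and behavior are illustrative assumptions, not the aimet API.

def load_partial_encodings(quantizer_encodings, loaded, strict=False):
    missing = [name for name in quantizer_encodings if name not in loaded]
    if strict and missing:
        raise KeyError(f"missing encodings for: {missing}")
    for name, enc in loaded.items():
        if name in quantizer_encodings:
            quantizer_encodings[name] = enc
    return missing

current = {"conv1": None, "fc1": None}
missing = load_partial_encodings(current, {"conv1": (-1.0, 1.0)})
print(current["conv1"], missing)  # (-1.0, 1.0) ['fc1']
```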