Releases: quic/aimet
Version 2.17.0
Bug fixes and Improvements
ONNX
- Optimize SeqMSE latency and CPU memory usage (434ac6b)
- Support excluding nodes from SeqMSE optimization (6a37239)
- Support exporting large models (> 2GB) to ONNX QDQ (b1dafe6, 1bf8b82)
- Support exporting float16 ONNX models to ONNX QDQ (66ccb45)
- Allow disabling MatMul-Add supergroup via config file (e49660c)
- Fix bug where on-disk tensor data is deleted before InferenceSession (d57a934)
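For context on the large-model entry above: protobuf caps a single serialized message at 2 GB, which is why ONNX models beyond that size must store tensor data in side files ("external data"). A minimal sketch of the threshold check (the helper name is hypothetical, not an aimet-onnx API):

```python
# Protobuf refuses to serialize a single message larger than 2 GB, so an
# ONNX model whose initializers exceed that must spill them to disk.
TWO_GB = 2**31 - 1

def needs_external_data(tensor_sizes_bytes) -> bool:
    # Hypothetical helper: decide whether initializers must be stored
    # as external data instead of inline in the .onnx file.
    return sum(tensor_sizes_bytes) >= TWO_GB

# A 7B-parameter model in FP16 is ~14 GB of weights alone:
print(needs_external_data([7_000_000_000 * 2]))  # True
```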
Version 2.16.0
New Features
ONNX
- Experimental - Added Adascale, a post-training quantization technique (5e23ceb)
Bug fixes and Improvements
ONNX
- Skip tying Concat input/output quantizers with conflicting encoding constraints (b924107)
- Small updates to FPT Quant for improved accuracy (ba10947)
- Implement partial encoding freezing mechanism in aimet-onnx (658ec3c)
- Add Relu partial encoding constraints to HTP config files (dc8d978)
- Clear encoding analyzer stats after computing param encodings (3d4725f)
- Remove wasted computation/memory in FPTQuant local optimizer (59350af)
Version 2.15.1
New Features
- ONNX
- Experimental - Added Adascale, a post-training quantization technique (5e23ceb)
Version 2.15.0
Version 2.14.0
New Feature
ONNX
- Add support for FP16 in QuantizationSimModel (2494d90)
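The FP16 support above concerns simulating half-precision numerics. As a reminder of what FP16 rounding costs, a value can be round-tripped through IEEE 754 half precision with Python's struct module (this is an illustration of the number format only, not aimet-onnx code):

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip a value through IEEE 754 half precision ('e' format).
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(to_fp16(1.0))  # 1.0 is exactly representable
print(to_fp16(0.1))  # slightly off: FP16 keeps only a 10-bit mantissa
```

Values like powers of two survive exactly; most decimals pick up an error around the third significant digit.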
Bug fixes and Improvements
ONNX
- Add sequential MSE support for onnx >= 1.18.0 (754d030)
- Improve histogram granularity during TFE calibration (91109af)
- Improve runtime for QuantizationSimModel creation for large models like LLMs (f7e700f)
- Improve runtime for setting quantizers in a QuantizationSimModel for use cases like tying KV Cache input and output quantizers (c0bdb46)
- Add a check for None values in the group attribute of Conv layers, and fix improper handling of a None group attribute in ConvTranspose within fold_all_batch_norms_to_weight (374e8db)
PyTorch
- Address QAT convergence issue: add a fix for cases where quantizer.min becomes equal to quantizer.max during training, leading to NaN values (51f8990)
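The NaN issue above arises because an affine quantizer's scale is derived from the min/max range; when the two collapse to the same value, the scale becomes zero and the division inside fake quantization produces NaN. A minimal sketch of the usual guard (names are hypothetical, not the aimet_torch internals):

```python
def compute_scale(min_val: float, max_val: float, num_bits: int = 8,
                  eps: float = 1e-5) -> float:
    # When min == max (e.g. a channel that only ever saw one constant
    # value), the raw range is 0; clamping it keeps x / scale finite
    # during QAT instead of propagating NaN through training.
    quant_range = max(max_val - min_val, eps)
    return quant_range / (2 ** num_bits - 1)

print(compute_scale(0.0, 0.0))  # small positive scale instead of 0.0
```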
Keras
- Fix accuracy drop issue for GPU wheel by excluding libpython*.so* from the aimet wheel packages (22cac5c)
Version 2.13.0
Bug fixes and Improvements
PyTorch
- Enable save_model_as_external_data for sim.onnx.export (107b339)
Known Issues
Keras
- Accuracy drop observed with AIMET Keras for certain models. Fix is planned for the next release.
Version 2.12.0
Bug fixes and Improvements
Known Issues
Keras
- Accuracy drop observed with AIMET Keras for certain models. Fix is planned for the next release.
Version 2.11.0
New Feature
PyTorch
- SpinQuant (experimental) - implement SpinQuant PTQ technique (https://arxiv.org/pdf/2308.13137) for Llama, Qwen2, and Mistral families (R1 rotation w/o optimization) (7364b37)
- Enable Adascale and Omniquant for Mistral (d33e98c)
ONNX
- Enable llm_configurator for Llama (Experimental) (08c17b8)
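The SpinQuant entry above hinges on a simple identity: for an orthogonal matrix R, folding R into the activations and R^T into the weights leaves a linear layer's output unchanged while redistributing outliers, which makes the tensors easier to quantize. A toy 2x2 demonstration in pure Python (not the aimet implementation):

```python
import math

def matmul(a, b):
    # Naive matrix multiply for small list-of-lists matrices.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

theta = 0.7
R = [[math.cos(theta), -math.sin(theta)],   # rotation matrix: R @ R.T == I
     [math.sin(theta),  math.cos(theta)]]

x = [[1.0, 2.0]]                  # activations
W = [[0.5, -1.0],                 # weights
     [2.0,  0.25]]

y_plain = matmul(x, W)
y_rotated = matmul(matmul(x, R), matmul(transpose(R), W))
# y_rotated equals y_plain up to floating-point error
```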
Bug fixes and Improvements
ONNX
- Apply matmul exception rule only for integer quantization (bb93c76)
- Optimize blockwise min-max encoding analyzer (4febdd4)
- Remove explicit FP32 model creation inside AdaRound and optimize building sessions during the optimization process (b1415bd)
- Make Concat output quantizer inherit fixed input range (50f35dd)
- Enable output quantizers to inherit input encoding when tying encodings (3750526)
- Fix bug in CLE with bn_conv groups (654f4b1)
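For the blockwise min-max analyzer entry above: blockwise quantization computes one (min, max) range per fixed-size block of values, rather than a single range for the whole tensor. A minimal sketch of the idea (hypothetical helper, not the aimet-onnx analyzer):

```python
def blockwise_min_max(values, block_size):
    # One (min, max) pair per block; this yields tighter quantization
    # ranges than a single per-tensor range when magnitudes vary a lot
    # from block to block.
    return [(min(values[i:i + block_size]), max(values[i:i + block_size]))
            for i in range(0, len(values), block_size)]

print(blockwise_min_max([0.1, 0.2, 9.0, 9.5], 2))  # [(0.1, 0.2), (9.0, 9.5)]
```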
Documentation Updates
Known Issues
- Keras
- Accuracy drop observed with AIMET Keras for certain models. Fix is planned for the next release.
- Skipping 2.11 aimet-keras release due to regression
- Keras
Version 2.10.0
What's Changed
New Feature
- Promote to_onnx_qdq to a public API (f333188). Note: this is currently a beta feature.
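to_onnx_qdq exports the simulated quantizers as QuantizeLinear/DequantizeLinear node pairs. The arithmetic a QDQ pair encodes is plain affine quantize-then-dequantize; a minimal sketch of that math (an illustration, not the exporter itself):

```python
def quantize(x: float, scale: float, zero_point: int,
             qmin: int = 0, qmax: int = 255) -> int:
    # QuantizeLinear: real value -> clamped uint8 grid point.
    return max(qmin, min(qmax, round(x / scale) + zero_point))

def dequantize(q: int, scale: float, zero_point: int) -> float:
    # DequantizeLinear: grid point -> real value.
    return (q - zero_point) * scale

scale, zp = 1 / 255, 0
print(quantize(0.42, scale, zp))                          # 107
print(dequantize(quantize(0.42, scale, zp), scale, zp))   # ~0.4196, lossy
```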
Bug fixes and Improvements
Common
- Added hover tooltip to plot per layer sensitivity. Changed x-axis to plot layer indices instead of names (c96894f)
ONNX
- Allow AdaRound and SeqMSE to take uncalibrated sims (31ca7fd)
- Modify bias quantizer setting based on weight quantizer (b47a97e)
- Fix cnt overflow issue (70029c5)
- Make memory saving optimization default in build_session and _infer_activation_dtypes (4b94ca9)
- Implement two-phase AMP API (1603c17)
- Work-around onnx version converter issue for models with external weights (22f0f23)
Documentation
- Update SeqMSE feature guide (fefd504)
- Fix links in example notebooks (fe66376)
- Modify docs for CLE (f9d0d6c)
- Edit automatic mixed precision feature guide (22b5c94)
- Polish BQ user guide (f547a49)
- Polish QAT user guide (339a225)
- Update Quick start example, PTQ section (example) and notebook (03fdd64)
- Add missing workflow images (010898a)
- Add reference to mixed precision page in docs (188d401)
Version 2.9.0
What's Changed
Bug Fixes and Improvements
ONNX
- Rename QuantizeLinear outputs from <...>_int to <...>_q in onnx QDQ export (e78dbec)
- Preserve I/O names in onnx QDQ export (35ad990)
- Allow freezing loaded encodings in load_encodings_to_sim (911af75)
- Represent activation QDQ with uint in encodings 2.0.0 in onnx QDQ export (92f63f5)
- Allow aimet-onnx to load partial encodings (6636515)
- Fix onnx sim.export permanently removing quantizers (9a2a407)
- Fix onnx QDQ export output name swapping bug (6d1664c)
- Switch AdaRound API naming to num_iterations (fea395f)
Common
- Add docs to build aimet from source (ae981f7)