Releases: quic/aimet
Version 2.17.0
Bug fixes and Improvements
ONNX
- Optimize SeqMSE latency and CPU memory usage (434ac6b)
- Support excluding nodes from SeqMSE optimization (6a37239)
- Support exporting large models (> 2GB) to ONNX QDQ (b1dafe6, 1bf8b82)
- Support exporting float16 ONNX models to ONNX QDQ (66ccb45)
- Allow disabling MatMul-Add supergroup via config file (e49660c)
- Fix bug where on-disk tensor data is deleted before InferenceSession (d57a934)
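For context on the large-model entry above: protobuf caps a single serialized message at 2 GB, which is why ONNX models beyond that size must store tensor data in side files ("external data"). A minimal sketch of the threshold check (the helper name is hypothetical, not an aimet-onnx API):

```python
# Protobuf refuses to serialize a single message larger than 2 GB, so an
# ONNX model whose initializers exceed that must spill them to disk.
TWO_GB = 2**31 - 1

def needs_external_data(tensor_sizes_bytes) -> bool:
    # Hypothetical helper: decide whether initializers must be stored
    # as external data instead of inline in the .onnx file.
    return sum(tensor_sizes_bytes) >= TWO_GB

# A 7B-parameter model in FP16 is ~14 GB of weights alone:
print(needs_external_data([7_000_000_000 * 2]))  # True
```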
Version 2.16.0
New Features
ONNX
- Experimental - Added Adascale, a post-training quantization technique (5e23ceb)
Bug fixes and Improvements
ONNX
- Skip tying Concat input/output quantizers with conflicting encoding constraints (b924107)
- Small updates to FPT Quant for improved accuracy (ba10947)
- Implement partial encoding freezing mechanism in aimet-onnx (658ec3c)
- Add Relu partial encoding constraints to HTP config files (dc8d978)
- Clear encoding analyzer stats after computing param encodings (3d4725f)
- Remove wasted computation/memory in FPTQuant local optimizer (59350af)
Version 2.15.1
New Features
- ONNX
- Experimental - Added Adascale, a post-training quantization technique (5e23ceb)
Version 2.15.0
Version 2.14.0
New Feature
ONNX
- Add support for FP16 in QuantizationSimModel (2494d90)
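The FP16 support above concerns simulating half-precision numerics. As a reminder of what FP16 rounding costs, a value can be round-tripped through IEEE 754 half precision with Python's struct module (this is an illustration of the number format only, not aimet-onnx code):

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip a value through IEEE 754 half precision ('e' format).
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(to_fp16(1.0))  # 1.0 is exactly representable
print(to_fp16(0.1))  # slightly off: FP16 keeps only a 10-bit mantissa
```

Values like powers of two survive exactly; most decimals pick up an error around the third significant digit.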
Bug fixes and Improvements
ONNX
- Add sequential MSE support for onnx >= 1.18.0 (754d030)
- Improve histogram granularity during TFE calibration (91109af)
- Improve runtime for QuantizationSimModel creation for large models like LLMs (f7e700f)
- Improve runtime for setting quantizers in a QuantizationSimModel for use cases like tying KV Cache input and output quantizers (c0bdb46)
- Add a check for None values in the group attribute of Conv layers, and fix improper handling of a None group attribute in ConvTranspose within fold_all_batch_norms_to_weight (374e8db)
PyTorch
- Address QAT convergence issue: add a fix for cases where quantizer.min becomes equal to quantizer.max during training, leading to NaN values (51f8990)
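The NaN issue above arises because an affine quantizer's scale is derived from the min/max range; when the two collapse to the same value, the scale becomes zero and the division inside fake quantization produces NaN. A minimal sketch of the usual guard (names are hypothetical, not the aimet_torch internals):

```python
def compute_scale(min_val: float, max_val: float, num_bits: int = 8,
                  eps: float = 1e-5) -> float:
    # When min == max (e.g. a channel that only ever saw one constant
    # value), the raw range is 0; clamping it keeps x / scale finite
    # during QAT instead of propagating NaN through training.
    quant_range = max(max_val - min_val, eps)
    return quant_range / (2 ** num_bits - 1)

print(compute_scale(0.0, 0.0))  # small positive scale instead of 0.0
```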
Keras
- Fix accuracy drop issue for GPU wheel by excluding libpython*.so* from the aimet wheel packages (22cac5c)
Version 2.13.0
Bug fixes and Improvements
PyTorch
- Enable save_model_as_external_data for sim.onnx.export (107b339)
Known Issues
Keras
- Accuracy drop observed with AIMET Keras for certain models. Fix is planned for the next release.
Version 2.12.0
Bug fixes and Improvements
Known Issues
Keras
- Accuracy drop observed with AIMET Keras for certain models. Fix is planned for the next release.
Version 2.11.0
New Feature
PyTorch
- SpinQuant (experimental) - implement SpinQuant PTQ technique (https://arxiv.org/pdf/2308.13137) for Llama, Qwen2, and Mistral families (R1 rotation w/o optimization) (7364b37)
- Enable Adascale and Omniquant for Mistral (d33e98c)
ONNX
- Enable llm_configurator for Llama (Experimental) (08c17b8)
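The SpinQuant entry above hinges on a simple identity: for an orthogonal matrix R, folding R into the activations and R^T into the weights leaves a linear layer's output unchanged while redistributing outliers, which makes the tensors easier to quantize. A toy 2x2 demonstration in pure Python (not the aimet implementation):

```python
import math

def matmul(a, b):
    # Naive matrix multiply for small list-of-lists matrices.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

theta = 0.7
R = [[math.cos(theta), -math.sin(theta)],   # rotation matrix: R @ R.T == I
     [math.sin(theta),  math.cos(theta)]]

x = [[1.0, 2.0]]                  # activations
W = [[0.5, -1.0],                 # weights
     [2.0,  0.25]]

y_plain = matmul(x, W)
y_rotated = matmul(matmul(x, R), matmul(transpose(R), W))
# y_rotated equals y_plain up to floating-point error
```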
Bug fixes and Improvements
ONNX
- Apply matmul exception rule only for integer quantization (bb93c76)
- Optimize blockwise min-max encoding analyzer (4febdd4)
- Remove explicit FP32 model creation inside AdaRound and optimize building sessions during the optimization process (b1415bd)
- Make Concat output quantizer inherit fixed input range (50f35dd)
- Enable output quantizers to inherit input encoding when tying encodings (3750526)
- Fix bug in CLE with bn_conv groups (654f4b1)
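For the blockwise min-max analyzer entry above: blockwise quantization computes one (min, max) range per fixed-size block of values, rather than a single range for the whole tensor. A minimal sketch of the idea (hypothetical helper, not the aimet-onnx analyzer):

```python
def blockwise_min_max(values, block_size):
    # One (min, max) pair per block; this yields tighter quantization
    # ranges than a single per-tensor range when magnitudes vary a lot
    # from block to block.
    return [(min(values[i:i + block_size]), max(values[i:i + block_size]))
            for i in range(0, len(values), block_size)]

print(blockwise_min_max([0.1, 0.2, 9.0, 9.5], 2))  # [(0.1, 0.2), (9.0, 9.5)]
```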
Documentation Updates
Known Issues
- Keras
- Accuracy drop observed with AIMET Keras for certain models. Fix is planned for the next release.
- Skipping 2.11 aimet-keras release due to regression
- Keras
Version 2.10.0
What's Changed
New Feature
- Promote to_onnx_qdq to a public API (f333188). Note: this is currently a beta feature.
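to_onnx_qdq exports the simulated quantizers as QuantizeLinear/DequantizeLinear node pairs. The arithmetic a QDQ pair encodes is plain affine quantize-then-dequantize; a minimal sketch of that math (an illustration, not the exporter itself):

```python
def quantize(x: float, scale: float, zero_point: int,
             qmin: int = 0, qmax: int = 255) -> int:
    # QuantizeLinear: real value -> clamped uint8 grid point.
    return max(qmin, min(qmax, round(x / scale) + zero_point))

def dequantize(q: int, scale: float, zero_point: int) -> float:
    # DequantizeLinear: grid point -> real value.
    return (q - zero_point) * scale

scale, zp = 1 / 255, 0
print(quantize(0.42, scale, zp))                          # 107
print(dequantize(quantize(0.42, scale, zp), scale, zp))   # ~0.4196, lossy
```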
Bug fixes and Improvements
Common
- Added hover tooltip to plot per layer sensitivity. Changed x-axis to plot layer indices instead of names (c96894f)
ONNX
- Allow AdaRound and SeqMSE to take uncalibrated sims (31ca7fd)
- Modify bias quantizer setting based on weight quantizer (b47a97e)
- Fix cnt overflow issue (70029c5)
- Make memory saving optimization default in build_session and _infer_activation_dtypes (4b94ca9)
- Implement two-phase AMP API (1603c17)
- Work-around onnx version converter issue for models with external weights (22f0f23)
Documentation
- Update SeqMSE feature guide (fefd504)
- Fix links in example notebooks (fe66376)
- Modify docs for CLE (f9d0d6c)
- Edit automatic mixed precision feature guide (22b5c94)
- Polish BQ user guide (f547a49)
- Polish QAT user guide (339a225)
- Update Quick start example, PTQ section (example) and notebook (03fdd64)
- Add missing workflow images (010898a)
- Add reference to mixed precision page in docs (188d401)
Version 2.9.0
What's Changed
Bug Fixes and Improvements
ONNX
- Rename QuantizeLinear outputs from <...>_int to <...>_q in onnx QDQ export (e78dbec)
- Preserve I/O names in onnx QDQ export (35ad990)
- Allow freezing loaded encodings in load_encodings_to_sim (911af75)
- Represent activation QDQ with uint in encodings 2.0.0 in onnx QDQ export (92f63f5)
- Allow aimet-onnx to load partial encodings (6636515)
- Fix onnx sim.export permanently removing quantizers (9a2a407)
- Fix onnx QDQ export output name swapping bug (6d1664c)
- Switch AdaRound API naming to num_iterations (fea395f)
Common
- Add docs to build aimet from source (ae981f7)