Releases: ARM-software/ComputeLibrary
v52.0.0
v52.0.0 Public Major Release
Fix
- Make NEReorderLayer backwards compatible
- String conversion for Datatype::BFLOAT16
- Add missing header to winograd transforms for better leftover handling
- Update 3x3 winograd coefficients to increase numerical stability
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v52.0.0/index.xhtml
v25.04
v25.04 Public Major Release
Feat
- Add Neon(TM) and SVE hybrid FP16 matmul kernels using FP32 accumulation.
Fix
- Fix BF16 CpuGemmAssembly tests.
- SME softmax FP32 kernel failing given large inputs.
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v25.04/index.xhtml
v25.03.1
v25.03.1 Public Major Release
Feat
- Add experimental QNX(R) support.
- Add matmul fp16->fp32 kernels to enable fp16 PyTorch attention through ACL.
Fix
- Replace .word with .inst when encoding instructions.
- Neon(TM) detection for Bare Metal.
Refactor
- Refactor reorder kernel and layer.
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v25.03.1/index.xhtml
v25.03
v25.03 Public Major Release
Feat
- Notice: Migration to Semantic Versioning will take place by the end of April
- Modernize ACL CMake build
- Add a wrapper class for CpuPRelu operators
Fix
- Validation in Cpu Deconv for negative padded cases
- Reserved register list in [U]Int8 SME2 Softmax kernels
- Register allocation in [U]Int8 SME2 Softmax kernels
- C and C++ build flags assigned to proper SCons flags
- Don't pass filenames to the check-bad-style pre-commit hook
- Apply -fPIC flag both to C and C++ code
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v25.03/index.xhtml
v25.02.1
v25.02.1 Public Major Release
Feat
- Add stateless support for GEMM kernels that need working_space
- Add extra_cc_flags flag to SCons
Fix
- Enable wrapper tests
- Refactor format_code.py and pre-commit config
- Adjust tolerance in CPP/DFT/DFT1D/Complex test
Refactor
- Remove dynamic fusion and compute kernel writer files and mentions
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v25.02.1/index.xhtml
v25.02
v25.02 Public Major Release
Feat
- Detect number of CPU cores in OpenBSD
- Support tensors with dynamic shapes in NEGEMM
- Support FP16 dequantization in NEGEMMLowpMatrixMultiplyCore
- Add a public API for CpuMeanStdDevNormalization
- Enable BF16 inputs in CpuFullyConnected
Fix
- Linking errors in C++17 while compiling with clang
- False positive compiler warning stringop-overflow
- Redundant declaration warning of constexpr static data member (in C++17)
- Make GemmLowp return an error in validate when F16 is not supported
- Reorder interleave_by in CpuGemmAssemblyDispatch test code
- Gemm_hybrid_quantized.hpp was passing incorrect K size to the kernel
- Wrong kernel choice in CpuMul when build does not have SME2
- Incorrect scheduling hint heuristic for GEMMs
- Incorrect trademark usage in Readme for Arm(R)-Neoverse(TM) core
Refactor
- Use operator API inside NEMeanStdDevNormalizationLayer
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v25.02/index.xhtml
v25.01
v25.01 Public Major Release
Feat
- Add KleidiAI as third_party module
- Add NHWC FP16 kernels in CpuDirectConv
- Add support for all non-quantized data types in NEScatter
- Implement NEScatter for FP32 for all size configurations for Add/Sub/Min/Max/Update
- Add option to print time used by each iteration in the validation suite
- Support multi ISA build for macOS
Fix
- Performance regression in NEDeconvolutionLayer
- Performance regression in NEConvolutionLayer
- Usages of dynamic shapes in the library
- Use separate build flags for C and C++ for CMake
- Compiler error with gcc14 in 3rd party header stb_image
- Werror=noexcept compilation issue in NEScatter
- Unused tolerance_f16 in non-F16 builds
- SegFault in SME Softmax Int8 tests
- Disable pre-commit copyright validation for outside contributions
- SME2 interleaved s8 x s8 = f32 kernel mismatches
- Invalidate Bf16 Softmax when FEAT_SVE is not present and fix the tests
- Illegal instruction caused by SVE instruction outside streaming mode
- SME Winograd output transform 4x4_3x3 kernel
- Typo in SConstruct:301: 'estate' corrected to 'arch'
Refactor
- Removed deprecated NCHW kernels from CpuDirectConv2d
- Check pre-commit copyright, Android.bp and formatting separately
Perf
- Choose latest Gpu if Gpu name is not recognized and alter GEMM heuristics
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v25.01/index.xhtml
v24.12
v24.12 Public Major Release
Feat
- Add a build flag to make scheduler object thread_local and make it default in Bazel build
Fix
- CPU regression in Reshape from excess threads
- NEDeconvolutionLayer regression
- Ensure bias type is BF16 for BF16 indirect convolutions
Perf
- Disable mmul kernel selection for fp16 in GPU backend
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v24.12/index.xhtml
v24.11.1
v24.11.1 Public Minor Release
Feat
- Add stateless GEMM execution via ICPPKernel::run_op
- TensorShape class supports dynamic shapes
- Add skeletons for Dynamic GEMM operator
- Convert Double rounding to Single rounding quantization behaviour in both Cpu/Gpu backend
Fix
- Detect Advanced SIMD support on Windows®
Perf
- Implement activation heuristics for Neoverse™ V1
- Optimize PReLU on quantized datatypes
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v24.11.1/index.xhtml
v24.11
v24.11 Public Major Release
Feat
- Add SVE SoftmaxLayer kernel for BF16
- Provide stateless API for CpuGemmLowpMatrixMultiplyCore, CpuQuantize, and DequantizationLayer
- Extend static quantization interface for both matmul and convolution operations
Fix
- Clarify Third-Party IP licenses
- Check if CpuGemmAssemblyDispatch is configured in CpuMatMul before continuing
- Add BF16 support for CpuGemmAssemblyDispatchWrapper
- Detect SVE support on Windows® to run the available kernels
- Fixed missing cstdint include which occurs with GCC 15
- Disable -O2 when building for Windows® as this crashes when certain compiler versions are used
- Make cast on CPU truncate float to int instead of rounding, for consistency with other ML frameworks
- Return error in validate() for CpuGemmLowpMatrixMultiplyCore if pretransposed A or B are true as this is not supported
- Avoid implicit conversion from __fp16 to arm_compute::bfloat16 to avoid illegal instructions in hardware with FP16 but no BF16 support
- Softmax SME2 kernel selection now correctly detects if SME2 is supported
- Requantization rounding issues in CPU/GPU Quantize
- Scale normalising coefficient in GPU LogSoftmax
- Apply consistent rounding policy in NEReduceMean
- Revert default memory manager for NEQLSTMLayer
- Create default memory manager when none is provided
Refactor
- Turn duplicated code in the elementwise_binary kernel into templates to reduce code size
- Move CpuSoftmaxKernel LUT to LUTManager to consolidate location of all LUTs
Perf
- Use SME instead of SVE for subtractions in SoftmaxLayer for Q8 relating to LUT address calculation
Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v24.11/index.xhtml