Releases: ARM-software/ComputeLibrary

v52.0.0

01 May 15:32

v52.0.0 Public Major Release

Fix

v25.04

17 Apr 13:01

v25.04 Public Major Release

Feat

  • Add Neon(TM) and SVE hybrid FP16 matmul kernels using FP32 accumulation.
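
As a rough illustration of why FP32 accumulation matters for FP16 matmul, the sketch below emulates both accumulator widths with Python's `struct` module. It is not the library's kernel: the helper names (`to_fp16`, `to_fp32`, `dot`) are hypothetical, and keeping the final result in FP32 is an assumption made for the example.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot(a, b, round_acc):
    """Dot product of FP16 inputs; round_acc models the accumulator width."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fp16(x) * to_fp16(y)  # inputs are stored as FP16
        acc = round_acc(acc + round_acc(product))
    return acc

ones = [1.0] * 4096
fp16_result = dot(ones, ones, to_fp16)  # 2048.0: the sum stalls once FP16 spacing reaches 2
fp32_result = dot(ones, ones, to_fp32)  # 4096.0: exact
```

With an FP16 accumulator the running sum stops growing at 2048, because 2049 is not representable in FP16 and rounds back to 2048. The FP32 accumulator returns the exact result.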

Fix

v25.03.1

04 Apr 14:05

v25.03.1 Public Major Release

Feat

  • Add experimental QNX(R) support.
  • Add matmul fp16->fp32 kernels to enable fp16 PyTorch attention through ACL.

Fix

  • Replace .word with .inst when encoding instructions.
  • Neon(TM) detection for Bare Metal.

Refactor

v25.03

21 Mar 11:00

v25.03 Public Major Release

Feat

  • Notice: Migration to Semantic Versioning will take place by the end of April
  • Modernize ACL CMake build
  • Add a wrapper class for CpuPRelu operators

Fix

  • Validation in Cpu Deconv for negative padded cases
  • Reserved register list in [U]Int8 SME2 Softmax kernels
  • Register allocation in [U]Int8 SME2 Softmax kernels
  • C and C++ build flags assigned to proper SCons flags
  • Don't pass filenames to the check-bad-style pre-commit hook
  • Apply -fPIC flag both to C and C++ code

Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v25.03/index.xhtml

v25.02.1

07 Mar 10:02

v25.02.1 Public Major Release

Feat

  • Add stateless support for GEMM kernels that need working_space
  • Add extra_cc_flags flag to SCons

Fix

  • Enable wrapper tests
  • Refactor format_code.py and pre-commit config
  • Adjust tolerance in CPP/DFT/DFT1D/Complex test

Refactor

v25.02

17 Feb 16:40

v25.02 Public Major Release

Feat

  • Detect number of CPU cores in OpenBSD
  • Support tensors with dynamic shapes in NEGEMM
  • Support FP16 dequantization in NEGEMMLowpMatrixMultiplyCore
  • Add a public API for CpuMeanStdDevNormalization
  • Enable BF16 inputs in CpuFullyConnected

Fix

  • Linking errors in C++17 while compiling with clang
  • False positive compiler warning stringop-overflow
  • Redundant declaration warning of constexpr static data member (in C++17)
  • Make GemmLowp return an error in validate when F16 is not supported
  • Reorder interleave_by in CpuGemmAssemblyDispatch test code
  • Gemm_hybrid_quantized.hpp was passing incorrect K size to the kernel
  • Wrong kernel choice in CpuMul when build does not have SME2
  • Incorrect scheduling hint heuristic for GEMMs
  • Incorrect trademark usage in Readme for Arm(R)-Neoverse(TM) core

Refactor

v25.01

30 Jan 17:05
Compare
Choose a tag to compare

v25.01 Public Major Release

Feat

  • Add KleidiAI as third_party module
  • Add NHWC FP16 kernels in CpuDirectConv
  • Add support of all non-quantized data types for NEScatter
  • Implement NEScatter for FP32 for all size configurations for Add/Sub/Min/Max/Update
  • Add option to print time used by each iteration in the validation suite
  • Support multi ISA build for macOS

Fix

  • Performance regression in NEDeconvolutionLayer
  • Performance regression in NEConvolutionLayer
  • Usages of dynamic shapes in the library
  • Use separate build flags for C and C++ for CMake
  • Compiler error with gcc14 in 3rd party header stb_image
  • Werror=noexcept compilation issue in NEScatter
  • Unused tolerance_f16 in non-F16 builds
  • SegFault in SME Softmax Int8 tests
  • Disable pre-commit copyright validation for outside contributions
  • SME2 interleaved s8 x s8 = f32 kernel mismatches
  • Invalidate Bf16 Softmax when FEAT_SVE is not present and fix the tests
  • Illegal instruction caused by SVE instruction outside streaming mode
  • SME Winograd output transform 4x4_3x3 kernel
  • Misspelling in SConstruct:301: 'estate' corrected to 'arch'

Refactor

  • Removed deprecated NCHW kernels from CpuDirectConv2d
  • Check pre-commit copyright, Android.bp and formatting separately

Perf

v24.12

19 Dec 11:46

v24.12 Public Major Release

Feat

  • Add a build flag to make scheduler object thread_local and make it default in Bazel build

Fix

  • CPU regression in Reshape from excess threads
  • NEDeconvolutionLayer regression
  • Ensure bias type is BF16 for BF16 indirect convolutions

Perf

v24.11.1

02 Dec 17:46

v24.11.1 Public Minor Release

Feat

  • Add stateless GEMM execution via ICPPKernel::run_op
  • TensorShape class supports dynamic shapes
  • Add skeletons for Dynamic GEMM operator
  • Convert Double rounding to Single rounding quantization behaviour in both Cpu/Gpu backend
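
For readers unfamiliar with the terms, the difference between double and single rounding in fixed-point requantization can be sketched as follows. This is a generic illustration, not the library's implementation: the function names, the Q31 multiplier format, and the round-half-up rule are all assumptions made for the example.

```python
def round_shift(n: int, k: int) -> int:
    """Divide non-negative n by 2**k (k >= 1), rounding half up."""
    return (n + (1 << (k - 1))) >> k

def requant_double_rounding(x: int, multiplier: int, shift: int) -> int:
    """Round after the Q31 multiply, then round again after the right shift."""
    high = round_shift(x * multiplier, 31)
    return round_shift(high, shift)

def requant_single_rounding(x: int, multiplier: int, shift: int) -> int:
    """Apply the multiply and the shift together and round only once."""
    return round_shift(x * multiplier, 31 + shift)

multiplier = 1 << 30  # 0.5 in Q31 fixed point
# Exact value of 1 * 0.5 / 2 is 0.25.
double_result = requant_double_rounding(1, multiplier, 1)  # 1: 0.5 rounds to 1, then 0.5 rounds to 1
single_result = requant_single_rounding(1, multiplier, 1)  # 0: 0.25 rounds to 0
```

Rounding twice can push a value across a rounding threshold that the exact result never reaches, which is why the two schemes can differ by one in the quantized output.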

Fix

  • Detect Advanced SIMD support on Windows®

Perf

v24.11

18 Nov 11:53

v24.11 Public Major Release

Feat

  • Add SVE SoftmaxLayer kernel for BF16
  • Provide stateless API for CpuGemmLowpMatrixMultiplyCore, CpuQuantize, and DequantizationLayer
  • Extend static quantization interface for both matmul and convolution operations

Fix

  • Clarify Third-Party IP licenses
  • Check if CpuGemmAssemblyDispatch is configured in CpuMatMul before continuing
  • Add BF16 support for CpuGemmAssemblyDispatchWrapper
  • Detect SVE support on Windows® to run the available kernels
  • Fixed missing cstdint include which occurs with GCC 15
  • Disable -O2 when building for Windows® as this crashes when certain compiler versions are used
  • Make cast on CPU truncate float to int instead of round to be consistent with other ML frameworks
  • Return error in validate() for CpuGemmLowpMatrixMultiplyCore if pretransposed A or B are true as this is not supported
  • Avoid implicit conversion from __fp16 to arm_compute::bfloat16 to avoid illegal instructions in hardware with FP16 but no BF16 support
  • Softmax SME2 kernel selection now correctly detects if SME2 is supported
  • Requantization rounding issues in CPU/GPU Quantize
  • Scale normalising coefficient in GPU LogSoftmax
  • Apply consistent rounding policy in NEReduceMean
  • Revert default memory manager for NEQLSTMLayer
  • Create default memory manager when none is provided
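
The cast fix in the list above (truncate float to int instead of round) changes results for any input with a fractional part. The sketch below contrasts the two behaviours in plain Python. The function names are hypothetical, and the ties-away-from-zero rule used for the rounding variant is an assumption, since the note does not say which rounding mode was used previously.

```python
import math

def cast_truncate(x: float) -> int:
    """Float-to-int cast that drops the fractional part (rounds toward zero)."""
    return math.trunc(x)

def cast_round_nearest(x: float) -> int:
    """Float-to-int cast that rounds to nearest, ties away from zero (assumed mode)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

samples = [2.7, -2.7, 0.5, -0.5]
truncated = [cast_truncate(v) for v in samples]     # [2, -2, 0, 0]
rounded = [cast_round_nearest(v) for v in samples]  # [3, -3, 1, -1]
```

Truncation toward zero matches what a C-style cast does, and what NumPy's `astype(int)` does for float arrays.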

Refactor

  • Turn duplicated code in the elementwise_binary kernel into templates to reduce code size
  • Move CpuSoftmaxKernel LUT to LUTManager to consolidate location of all LUTs

Perf