31 Oct 16:12

developer-compute

v52.6.0 Latest

v52.6.0 Public Minor Release

Feat

Enable F32 output in Quantized CpuGemmConv2d

Fix

Invalidate certain Cpu operations if tensor sizes are large
Missing output type validation in CpuGemmDirectConv2d
Handle padding updates after configure() in CpuActivation

Refactor

Flatten nested zip usage in validation/NEON directory.
Flatten nested combine and zip usage in validation/CL directory.
Flatten nested combine usage in validation/NEON directory.

Perf

Do only one iteration of refinement for FP16 inv

Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v52.6.0/index.xhtml

13 Oct 21:10

developer-compute

v52.5.0

v52.5.0 Public Minor Release

Feat

Add profiling tracepoints to CPU and GPU platforms
Add Perfetto profiler as default backend
Further modernization in CMake build
Add CMakePresets.json

Fix

Handle padding updates after configure() in CpuActivation
Broken URLs in rendered non-released README.md
Linker errors on macOS when building with CMake

Perf

Add FP16 GEMM MMUL Reshaped Only Rhs Kernel

Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v52.5.0/index.xhtml

27 Aug 07:44

developer-compute

v52.4.0

v52.4.0 Public Minor Release

Notice

The generation of pre-built binaries for macOS and Windows is currently under review and may be temporarily unavailable following this release.

Feat

Updates to operator CpuGEMMLowp for static quantization, and associated tests.

Fix

Potential null pointer access in CpuFullyConnected validate method.

Perf

Remove switch statements in activation kernels.

Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v52.4.0/index.xhtml

04 Jul 14:02

developer-compute

v52.3.0

v52.3.0 Public Minor Release

Feat

Support QSYMM8_PER_CHANNEL in NEQuantizationLayer.
Add stateless wrapper for CpuFullyConnected.

Fix

Support mixed-type quantized matmul when updating quantization after configure.
Prevent overread when computing row sums in GEMM.
Resolve out-of-bounds access in Dimensions::collapse().

Perf

Remove switch in SVE activation.
Remove switch in SVE2 activation.

Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v52.3.0/index.xhtml

13 Jun 09:01

developer-compute

v52.2.0

v52.2.0 Public Minor Release

Feat

Enable non-transposed BF16 reorders.

Fix

Reorder test failures on multi-isa builds.
Over-eager read ahead of operands in a64_hgemm_8x24.

Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v52.2.0/index.xhtml

02 Jun 09:04

developer-compute

v52.1.0

v52.1.0 Public Minor Release

Feat

Restrict GEMM stateless execution to fixed-format kernels only
Add wrapper class to expose cpu::CpuPool2d functionality
Enable non-transposed F32 reorders

Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v52.1.0/index.xhtml

15 May 09:10

developer-compute

v52.0.1

v52.0.1 Public Patch Release

Fix

Fill the padding area with zeros in CpuIm2ColKernel
Public header files pass -Wundef check
Limit thread split to the window size for run_parallel_pretranspose_B_array

Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v52.0.1/index.xhtml

01 May 15:32

developer-compute

v52.0.0

v52.0.0 Public Major Release

Fix

Make NEReorderLayer backwards compatible
String conversion for Datatype::BFLOAT16
Add missing header to winograd transforms for better leftover handling
Update 3x3 winograd coefficients to increase numerical stability

Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v52.0.0/index.xhtml

17 Apr 13:01

developer-compute

v25.04

v25.04 Public Major Release

Feat

Add Neon(TM) and SVE hybrid FP16 matmul kernels using FP32 accumulation.

Fix

Fix BF16 CpuGemmAssembly tests.
SME softmax FP32 kernel failing given large inputs.

Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v25.04/index.xhtml

04 Apr 14:05

developer-compute

v25.03.1

v25.03.1 Public Major Release

Feat

Add experimental QNX(R) support.
Add matmul fp16->fp32 kernels to enable fp16 PyTorch attention through ACL.

Fix

Replace .word with .inst when encoding instructions.
Neon(TM) detection for Bare Metal.

Refactor

Refactor reorder kernel and layer.

Documentation (API, build guide, contribution guide, errata, etc.) available here:
https://artificial-intelligence.sites.arm.com/computelibrary/v25.03.1/index.xhtml
