Releases: ROCm/rocWMMA

rocWMMA 2.0.0 for ROCm 7.0.2

10 Oct 12:09
b5f06e6

rocWMMA code for ROCm 7.0.2 did not change. The library was rebuilt for the updated ROCm 7.0.2 stack.

rocWMMA 2.0.0 for ROCm 7.1.0

30 Oct 05:22
27a847f

rocWMMA code for ROCm 7.1.0 did not change. The library was rebuilt for the updated ROCm 7.1.0 stack.

rocWMMA 2.0.0 for ROCm 7.0.1

17 Sep 16:41
2445fb2

rocWMMA code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.

rocWMMA 2.0.0 for ROCm 7.0.0

16 Sep 06:37
2445fb2

Added

  • Added internal register layout transforms to support interleaved MMA layouts
  • Added support for the gfx950 target
  • Added mixed input bf8 / fp8 types for MMA support
  • Added fragment scheduler API objects to embed thread block cooperation properties in fragments

Changed

  • Augmented load / store / MMA internals with static loop unrolling
  • rocWMMA mma_sync API now supports wave tile fragment sizes
  • rocWMMA cooperative fragments are now expressed with fragment scheduler template arguments
  • rocWMMA cooperative fragments now use the same base API as non-cooperative fragments
  • The register usage footprint of rocWMMA cooperative fragments has been reduced
  • rocWMMA fragments now support partial tile sizes with padding
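
As a rough illustration of the padding idea in the last item, the sketch below rounds a partial tile dimension up to the next multiple of a block size. It shows only the general arithmetic: `padded_dim`, `padding`, and the block size of 16 are hypothetical names and values chosen for this example, not part of the rocWMMA API, and the library's actual padding scheme may differ.

```cpp
#include <cstddef>

// Hypothetical helper (not a rocWMMA function): round `dim` up to the
// next multiple of `block`. Assumes block > 0.
constexpr std::size_t padded_dim(std::size_t dim, std::size_t block)
{
    return ((dim + block - 1) / block) * block;
}

// Hypothetical helper: number of padding elements added along one dimension.
constexpr std::size_t padding(std::size_t dim, std::size_t block)
{
    return padded_dim(dim, block) - dim;
}

// A 24-wide partial tile with an assumed block size of 16 pads up to 32.
static_assert(padded_dim(24, 16) == 32, "24 rounds up to 32");
static_assert(padding(24, 16) == 8, "8 padding elements are added");
// A dimension that is already a multiple of the block size needs no padding.
static_assert(padded_dim(32, 16) == 32, "32 is unchanged");
static_assert(padding(32, 16) == 0, "no padding is added");
```

Under these assumptions, a dimension that is already a multiple of the block size adds no padding elements.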

Optimized

  • Added internal flow control barriers to improve assembly code generation and overall performance
  • Enabled interleaved layouts by default in MMA to improve overall performance

Removed

  • Removed support for the gfx940 and gfx941 targets
  • Removed the rocWMMA cooperative API
  • Removed wave count template parameters from transforms APIs

Resolved issues

  • Fixed a validation issue for compute types narrower than 32 bits (< B32) on gfx9
  • Fixed CMake validation of compiler support for bf8 / fp8 types
  • Fixed the linkage of rocwmma::synchronize_workgroup by making it inline

rocWMMA 1.7.0 for ROCm 6.4.4

24 Sep 14:02
1a5b623

rocWMMA code for ROCm 6.4.4 did not change. The library was rebuilt for the updated ROCm 6.4.4 stack.

rocWMMA 1.7.0 for ROCm 6.4.3

07 Aug 14:20
1a5b623

rocWMMA code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.

rocWMMA 1.7.0 for ROCm 6.4.2

21 Jul 16:54
1a5b623

rocWMMA code for ROCm 6.4.2 did not change. The library was rebuilt for the updated ROCm 6.4.2 stack.

rocWMMA 1.7.0 for ROCm 6.4.1

20 May 13:16
1a5b623

rocWMMA code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.

rocWMMA 1.7.0 for ROCm 6.4.0

11 Apr 13:35
1c029a0

Added

  • Added interleaved layouts that enhance the performance of GEMM operations
  • Added emulation test suites. These suites are lightweight and well-suited for execution on emulator platforms

Changed

  • CMakeLists.txt now uses GPU_TARGETS instead of AMDGPU_TARGETS
  • The --offload-compress flag is now used with compilers that support it

Resolved issues

  • Worked around a CMake bug by setting CMAKE_NO_BUILTIN_CHRPATH when BUILD_OFFLOAD_COMPRESS is unset

rocWMMA 1.6.0 for ROCm 6.3.3

19 Feb 17:47
ba38cf3

rocWMMA code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.