Releases: ROCm/rocWMMA
rocWMMA 2.0.0 for ROCm 7.0.2
rocWMMA code for ROCm 7.0.2 did not change. The library was rebuilt for the updated ROCm 7.0.2 stack.
rocWMMA 2.0.0 for ROCm 7.1.0
rocWMMA code for ROCm 7.1.0 did not change. The library was rebuilt for the updated ROCm 7.1.0 stack.
rocWMMA 2.0.0 for ROCm 7.0.1
rocWMMA code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.
rocWMMA 2.0.0 for ROCm 7.0.0
Added
- Added internal register layout transforms to support interleaved MMA layouts
- Added support for the gfx950 target
- Added mixed input bf8/fp8 types for MMA support
- Added fragment scheduler API objects to embed thread block cooperation properties in fragments
Changed
- Augmented load / store / MMA internals with static loop unrolling
- rocWMMA mma_sync API now supports wave tile fragment sizes
- rocWMMA cooperative fragments are now expressed with fragment scheduler template arguments
- rocWMMA cooperative fragments now use the same base API as non-cooperative fragments
- rocWMMA cooperative fragments register usage footprint has been reduced
- rocWMMA fragments now support partial tile sizes with padding
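As a rough illustration of what padding a partial tile means, the host-side sketch below rounds a matrix extent up to the next multiple of a tile size and copies the data into a zero-filled buffer of the padded shape. It does not use or reproduce the rocWMMA API or its internals; the helper names padded_extent and pad_matrix are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of rocWMMA): round `extent` up to the
// next multiple of `tile`, e.g. 100 elements with a tile of 16 -> 112.
constexpr std::size_t padded_extent(std::size_t extent, std::size_t tile)
{
    return ((extent + tile - 1) / tile) * tile;
}

// Hypothetical helper (not part of rocWMMA): copy a row-major
// rows x cols matrix into a zero-filled row-major buffer whose
// dimensions are whole multiples of tile_m x tile_n.
std::vector<float> pad_matrix(const std::vector<float>& src,
                              std::size_t rows, std::size_t cols,
                              std::size_t tile_m, std::size_t tile_n)
{
    const std::size_t prows = padded_extent(rows, tile_m);
    const std::size_t pcols = padded_extent(cols, tile_n);

    // Elements outside the original matrix stay zero, so they add
    // nothing to a multiply-accumulate over the padded tiles.
    std::vector<float> dst(prows * pcols, 0.0f);
    for (std::size_t r = 0; r < rows; ++r)
    {
        for (std::size_t c = 0; c < cols; ++c)
        {
            dst[r * pcols + c] = src[r * cols + c];
        }
    }
    return dst;
}
```

With this approach a 2 x 3 matrix padded to 4 x 4 tiles occupies 16 elements, of which only the original 6 are non-zero. Per the note above, fragments can now handle partial tile sizes themselves, so explicit host-side padding like this is shown only to make the idea concrete.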
Optimized
- Added internal flow control barriers to improve assembly code generation and overall performance
- Enabled interleaved layouts by default in MMA to improve overall performance
Removed
- Removed support for the gfx940 and gfx941 targets
- Removed the rocWMMA cooperative API
- Removed wave count template parameters from transforms APIs
Resolved issues
- Fixed a validation issue for small precision compute types < B32 on gfx9
- Fixed CMake validation of compiler support for bf8/fp8 types
- Fixed linkage of rocwmma::synchronize_workgroup to inline
rocWMMA 1.7.0 for ROCm 6.4.4
rocWMMA code for ROCm 6.4.4 did not change. The library was rebuilt for the updated ROCm 6.4.4 stack.
rocWMMA 1.7.0 for ROCm 6.4.3
rocWMMA code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.
rocWMMA 1.7.0 for ROCm 6.4.2
rocWMMA code for ROCm 6.4.2 did not change. The library was rebuilt for the updated ROCm 6.4.2 stack.
rocWMMA 1.7.0 for ROCm 6.4.1
rocWMMA code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.
rocWMMA 1.7.0 for ROCm 6.4.0
Added
- Added interleaved layouts that enhance the performance of GEMM operations
- Added emulation test suites. These suites are lightweight and well-suited for execution on emulator platforms
Changed
- Used GPU_TARGETS instead of AMDGPU_TARGETS in cmakelists.txt
- Used --offload-compress flag for supported compilers
Resolved issues
- Worked around a CMake bug by setting CMAKE_NO_BUILTIN_CHRPATH when BUILD_OFFLOAD_COMPRESS is unset
rocWMMA 1.6.0 for ROCm 6.3.3
rocWMMA code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.