[Bug][MI450][Assembler]: AMDGPU assembler accepts invalid mx scale combinations on v_wmma_scale_f32_16x16x128_f8f6f4 #2634

@jaopaulolc

Description

  • Target / CPU: amdgcn-amd-amdhsa / gfx1250
  • Toolchain:
    AMD clang version 23.0.0git (https://github.com/ROCm/llvm-project.git 43215c7+PATCHED:d17c5aa0e3ea29cde402f58f27e39b6034effa27)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /opt/rocm-versions/gfx1250-7.14.0a20260521/lib/llvm/bin

Issue

gfx1250-assembler-bug-mx-scales.zip

For v_wmma_scale_f32_16x16x128_f8f6f4, the ISA constrains the legal
(matrix_a_fmt, matrix_a_scale_fmt, matrix_b_fmt, matrix_b_scale_fmt)
tuples (see table-valid-combinations.txt): matrix-format classes FP8/BF8
and FP6/BF6 require scale E8; class FP4 allows E8 / E5M3 / E4M3;
when both sides are class FP4, the two scales must match.

The integrated assembler ignores these joint constraints on the
matrix_*_fmt / matrix_*_scale_fmt modifiers and accepts arbitrary
combinations, emitting encodings that are not legal per the ISA. Out of
the 225 (A fmt, A scale, B fmt, B scale) tuples, only 43 are valid;
the other 182 are all silently accepted today.
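
The 43 / 182 split can be re-derived from the three rules above with a short sketch. It assumes each side offers five matrix formats (FP8, BF8, FP6, BF6, FP4) and three scale formats (E8, E5M3, E4M3), which is an inference from 15 x 15 = 225 and not a copy of table-valid-combinations.txt. The names `side_ok` and `is_valid` are illustrative and are not taken from enumerate.py.

```python
from itertools import product

# Assumed label sets: five matrix formats and three scale formats per side
# (5 * 3 = 15 per side, 15 * 15 = 225 tuples, matching the issue's total).
FORMATS = ("FP8", "BF8", "FP6", "BF6", "FP4")
SCALES = ("E8", "E5M3", "E4M3")


def side_ok(fmt: str, scale: str) -> bool:
    """Per-side rule: only class FP4 may use a scale other than E8."""
    return fmt == "FP4" or scale == "E8"


def is_valid(a_fmt: str, a_scale: str, b_fmt: str, b_scale: str) -> bool:
    """Joint rule: both sides legal, and FP4 on both sides needs equal scales."""
    if not (side_ok(a_fmt, a_scale) and side_ok(b_fmt, b_scale)):
        return False
    if a_fmt == "FP4" and b_fmt == "FP4":
        return a_scale == b_scale
    return True


tuples = list(product(FORMATS, SCALES, FORMATS, SCALES))
valid = [t for t in tuples if is_valid(*t)]
print(len(tuples), len(valid), len(tuples) - len(valid))  # 225 43 182
```

Per side, 4 + 3 = 7 (format, scale) pairs are legal, giving 7 x 7 = 49 joint pairs; the FP4/FP4 matching rule removes 6 of those, which leaves the 43 quoted above.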

Steps to reproduce

The reproducer is a single self-contained script with no arguments
required. From the directory containing it:

python3 enumerate.py --clean

enumerate.py enumerates all 225 tuples, runs each through llvm-mc
(/opt/rocm/llvm/bin/llvm-mc by default; override with --llvm-mc PATH
or $LLVM_MC), splits the cases into four buckets under results/
(valid-accepted/, valid-rejected/, invalid-accepted/,
invalid-rejected/) as individual .s files, and prints a summary.
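
The bucketing described above reduces to two facts per tuple: whether the ISA allows it, and whether llvm-mc exited with status 0. A minimal sketch of that mapping follows. `bucket_for` and `case_filename` are hypothetical helpers, not the actual enumerate.py code, and the file-name pattern is assumed from the single example path quoted in this issue (results/invalid-accepted/A-FP8-E8__B-FP8-E5M3.s).

```python
def bucket_for(is_valid: bool, returncode: int) -> str:
    """Map (ISA validity, assembler exit status) to a results/ subdirectory."""
    accepted = returncode == 0  # exit status 0 is treated as "accepted"
    validity = "valid" if is_valid else "invalid"
    outcome = "accepted" if accepted else "rejected"
    return f"{validity}-{outcome}"


def case_filename(a_fmt: str, a_scale: str, b_fmt: str, b_scale: str) -> str:
    """Build a per-case file name (pattern assumed from one example path)."""
    return f"A-{a_fmt}-{a_scale}__B-{b_fmt}-{b_scale}.s"


# An invalid tuple that assembles cleanly lands in the bug bucket.
print(bucket_for(False, 0))                       # invalid-accepted
print(case_filename("FP8", "E8", "FP8", "E5M3"))  # A-FP8-E8__B-FP8-E5M3.s
```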

Reproduce any single failing case (each .s is self-contained and headed):

/opt/rocm/llvm/bin/llvm-mc -triple=amdgcn-amd-amdhsa -mcpu=gfx1250 \
    -filetype=obj -o /dev/null \
    results/invalid-accepted/A-FP8-E8__B-FP8-E5M3.s ; echo $?
# 0   (expected: non-zero with a diagnostic on matrix_b_scale_fmt)

CI gate that flips when any fix lands:

N=$(ls results/invalid-accepted/*.s | wc -l)
fail=0
for f in results/invalid-accepted/*.s; do
  /opt/rocm/llvm/bin/llvm-mc -triple=amdgcn-amd-amdhsa -mcpu=gfx1250 \
      -filetype=obj -o /dev/null "$f" 2>/dev/null || fail=$((fail + 1))
done
[ "$fail" -eq 0 ] \
  && echo "STILL BUGGY: $N/$N invalid combinations accepted" \
  || echo "PROGRESS: $fail/$N now rejected"
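
The shell snippet above only prints a message; its exit status is that of the final echo, so a CI job running it would not fail on its own. A Python variant that returns a non-zero status while the bug persists could look like the sketch below. The `gate` and `assembles` names and the injectable `accepts` callback are assumptions made for illustration; the llvm-mc flags are the ones used in the commands above.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable

LLVM_MC = "/opt/rocm/llvm/bin/llvm-mc"  # default path quoted in this issue


def assembles(path: Path) -> bool:
    """Return True if llvm-mc accepts the file (exit status 0)."""
    cmd = [LLVM_MC, "-triple=amdgcn-amd-amdhsa", "-mcpu=gfx1250",
           "-filetype=obj", "-o", "/dev/null", str(path)]
    return subprocess.run(cmd, stderr=subprocess.DEVNULL).returncode == 0


def gate(paths: Iterable[Path],
         accepts: Callable[[Path], bool] = assembles) -> int:
    """Return 1 (red) while every invalid case is still accepted, else 0."""
    cases = list(paths)
    rejected = sum(1 for p in cases if not accepts(p))
    if rejected == 0:
        print(f"STILL BUGGY: {len(cases)}/{len(cases)} invalid combinations accepted")
        return 1
    print(f"PROGRESS: {rejected}/{len(cases)} now rejected")
    return 0


# Typical use from a CI step (not executed here):
#   sys.exit(gate(sorted(Path("results/invalid-accepted").glob("*.s"))))
```

Passing `accepts` as a parameter keeps the counting logic testable on machines without a gfx1250-capable llvm-mc.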

Results / summary

enumerate.py output on the toolchain above:

============================================================
Summary  (total tested: 225)
============================================================
  valid-accepted    :  43   (expected behavior)
  invalid-rejected  :   0   (expected behavior)
  invalid-accepted  : 182   (BUG: assembler should reject)
  valid-rejected    :   0   (BUG: assembler should accept)
------------------------------------------------------------
  correct           :  43 / 225
  incorrect (bugs)  : 182 / 225
  • All 182 invalid (A fmt, A scale, B fmt, B scale) tuples are
    accepted; none is rejected.
  • All 43 valid tuples are accepted, so no valid combination is
    wrongly rejected.
  • All 182 bug repros are persisted as standalone .s files in
    results/invalid-accepted/; the CI gate above keeps printing
    STILL BUGGY until the assembler rejects at least one of them.
