jcaraban · 2026-06-02T19:28:47Z

Motivation

Follows #3186, now with mxfp4 Q·K

bench_sage --kernel=all  (b=1 hq=5 sq=75600 sk=75600 d=128 input=transformer):
kernel             time(ms)     TFLOPS          MAE         MaxE       Cosine
------------------------------------------------------------------------------
sage_fp8             9.7205    1505.20    5.493e-04    6.250e-02     0.998512
sage_mxfp4           7.9692    1835.98    2.522e-03    4.375e-01     0.968982
aiter_fp8            7.2131    2028.43    8.275e-04    2.305e-01     0.996718
aiter_i8fp8          7.8374    1866.87    7.299e-04    9.375e-02     0.997669
aiter_mxfp4          6.6262    2208.10    2.528e-03    4.141e-01     0.968871   <---
aiter_bf16          12.2339    1195.97    0.000e+00    0.000e+00     1.000000
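As a sanity check on the table, the TFLOPS column appears to follow from the time column and the benchmark shape. The sketch below assumes the usual forward-attention FLOP count of 4·b·hq·sq·sk·d (two matmuls, Q·K and P·V, at 2 FLOPs per multiply-add, no causal masking). That formula is an assumption, not taken from `bench_sage`, but it reproduces the reported figures to within rounding. The helper names `attention_flops` and `tflops` are hypothetical.

```python
# Assumption: non-causal forward attention costs 4 * b * hq * sq * sk * d FLOPs
# (Q.K and P.V matmuls, 2 FLOPs per multiply-add). Not taken from bench_sage.
def attention_flops(b, hq, sq, sk, d):
    return 4 * b * hq * sq * sk * d


def tflops(flops, time_ms):
    # Convert milliseconds to seconds, then FLOP/s to TFLOP/s.
    return flops / (time_ms * 1e-3) / 1e12


# Times (ms) copied from the table above.
TIMES_MS = {
    "sage_fp8": 9.7205,
    "sage_mxfp4": 7.9692,
    "aiter_fp8": 7.2131,
    "aiter_i8fp8": 7.8374,
    "aiter_mxfp4": 6.6262,
    "aiter_bf16": 12.2339,
}

flops = attention_flops(b=1, hq=5, sq=75600, sk=75600, d=128)
baseline_ms = TIMES_MS["aiter_bf16"]

for name, ms in TIMES_MS.items():
    speedup = baseline_ms / ms  # relative to the bf16 kernel
    print(f"{name:<12} {tflops(flops, ms):9.2f} TFLOPS  {speedup:5.2f}x vs aiter_bf16")
```

The speedup column is only a derived convenience. The accuracy columns (MAE, MaxE, Cosine) are not recomputed here.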

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

github-actions · 2026-06-02T19:29:42Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label	Tests
`ci:triton-300x`	Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X
`ci:sglang`	SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy
`ci:atom`	ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B
`ci:atom_full`	ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json`
`ci:vllm`	vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5
`ci:all`	All standard extended tests (excludes `ci:atom_full`)

Only add `ci:atom_full` for FlyDSL or Triton upgrades.
Add labels via the sidebar or `gh pr edit 3495 --add-label <label>`.

mxfp4 ASM fmha and needed API and bench changes

f8b1677

jcaraban requested review from a team, JohnNikolay84 and minmengdie June 2, 2026 19:28

jcaraban changed the base branch from main to i8fp8_fmha_gfx950 June 2, 2026 19:29

jcaraban requested a review from valarLip June 2, 2026 19:29

jcaraban mentioned this pull request Jun 3, 2026

AITER Development Roadmap (2026 Q3) #3443

Open


mxfp4fp8 fmha gfx950 #3495

jcaraban wants to merge 1 commit into i8fp8_fmha_gfx950 from mxfp4fp8_fmha_gfx950
