
feifei14119 · 2026-06-02T23:50:18Z

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

github-actions · 2026-06-02T23:51:20Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| `ci:triton-300x` | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| `ci:sglang` | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| `ci:atom` | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| `ci:atom_full` | ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json` |
| `ci:vllm` | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| `ci:all` | All standard extended tests (excludes `ci:atom_full`) |

Only add `ci:atom_full` for FlyDSL or Triton upgrades.
Add labels via the sidebar or `gh pr edit 3500 --add-label <label>`.

Copilot

Pull request overview

Adds gfx1250 (mi400) MLA v3 decode support by wiring in a dedicated asm dispatch path, a mi400-focused sweep in the existing MLA test driver, and config entries for gfx1250 MLA asm kernels. Also introduces an env-based runtime switch to fully disable the optional FlyDSL backend.

Changes:

Extend mla_decode_stage1_asm_fwd dispatch to route gfx1250 to a mi400-specific kernarg pack + kernel selection from hsa/gfx1250/mla/mla_asm.csv (with optional debug dumping under ASM_DEBUG).
Add a --mi400 {auto,on,off} sweep mode to op_tests/test_mla.py that builds fp8/rope-split2 packed inputs and validates numerics against a reference.
Add ENABLE_FLYDSL runtime opt-out plumbing for FlyDSL availability checks.
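The runtime opt-out in the last item can be pictured with a minimal sketch. The PR page does not show how `ENABLE_FLYDSL` is parsed, so the helper names (`env_flag`, `flydsl_available`), the default of "enabled", and the set of accepted falsy strings are all assumptions made for illustration, not the code in `aiter/ops/flydsl/utils.py`.

```python
import os

# Assumption: these spellings disable the backend; the PR may accept others.
_FALSY = {"0", "false", "off", "no"}


def env_flag(name, default=True, environ=None):
    """Return False only when the variable is set to a recognised falsy value."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        # Variable not set: fall back to the default (enabled).
        return default
    return raw.strip().lower() not in _FALSY


def flydsl_available(package_importable, environ=None):
    """Hypothetical availability check: installed AND not disabled via env."""
    return package_importable and env_flag("ENABLE_FLYDSL", True, environ)
```

Passing `environ` explicitly keeps the check testable without mutating the process environment; with `environ=None` the sketch reads `os.environ` at call time.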

Reviewed changes

Copilot reviewed 6 out of 10 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| op_tests/test_mla.py | Adds mi400 sweep mode, fp8 packing helpers, and a mi400-specific decode validation path. |
| hsa/gfx1250/mla/mla_asm.csv | Introduces gfx1250 MLA asm kernel registry entries used by heuristic dispatch. |
| csrc/py_itfs_cu/asm_mla.cu | Adds gfx1250 mi400 stage1 dispatch (kernargs ABI) and optional debug instrumentation/dumps. |
| aiter/ops/flydsl/utils.py | Adds `ENABLE_FLYDSL` env opt-out to disable FlyDSL backend at runtime. |
| aiter/mla.py | Adjusts decode buffer allocation/aliasing and kv_indptr handling for gfx1250 mi400 decode. |
| aiter/jit/core.py | Adds global `ENABLE_FLYDSL` flag mirroring `ENABLE_CK`. |
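For the `--mi400 {auto,on,off}` sweep mode listed for op_tests/test_mla.py, a tri-state flag of this shape could be wired roughly as follows. This is only a sketch: the default value of `auto`, the rule that `auto` enables the sweep on gfx1250 only, and the helper names `build_parser` and `resolve_mi400` are assumptions, since the page does not show the test driver's argument handling.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing only the tri-state --mi400 switch."""
    parser = argparse.ArgumentParser(description="MLA test driver (sketch)")
    parser.add_argument(
        "--mi400",
        choices=["auto", "on", "off"],
        default="auto",  # assumption: the real default is not shown in the PR
        help="run the mi400 sweep: always (on), never (off), or by GPU arch (auto)",
    )
    return parser


def resolve_mi400(mode, gfx):
    """Map the tri-state flag to a boolean for the detected architecture."""
    if mode == "auto":
        # Assumption: auto means "only on gfx1250".
        return gfx == "gfx1250"
    return mode == "on"
```

For example, `resolve_mi400(build_parser().parse_args(["--mi400", "on"]).mi400, "gfx942")` forces the sweep regardless of the architecture string.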


+        _is_gfx1250 = get_gfx() == "gfx1250"
+        _can_alias_o_as_logits = (
+            num_kv_splits == 1
+            and (
+                q.dtype == dtypes.fp8 or (q.dtype == dtypes.bf16 and max_seqlen_q == 4)
+            )
+            and not _is_gfx1250
+        )
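The aliasing condition in the snippet above can be restated as a standalone predicate to make its truth table easy to check. In this sketch the dtypes are plain strings standing in for `dtypes.fp8` and `dtypes.bf16`, and the architecture is passed in rather than read from `get_gfx()`; both substitutions, and the function name, are assumptions made so the example runs without the aiter package.

```python
def can_alias_o_as_logits(num_kv_splits, q_dtype, max_seqlen_q, gfx):
    """Mirror of the diff's condition, with strings in place of aiter dtypes."""
    # fp8 queries always qualify; bf16 queries qualify only when max_seqlen_q == 4.
    dtype_ok = q_dtype == "fp8" or (q_dtype == "bf16" and max_seqlen_q == 4)
    # Aliasing needs a single KV split and is switched off on gfx1250.
    return num_kv_splits == 1 and dtype_ok and gfx != "gfx1250"


cases = [
    (1, "fp8", 1, "gfx942"),   # single split, fp8, non-gfx1250
    (1, "fp8", 1, "gfx1250"),  # same inputs on gfx1250
    (1, "bf16", 1, "gfx950"),  # bf16 with max_seqlen_q != 4
    (2, "fp8", 1, "gfx942"),   # more than one KV split
]
results = [can_alias_o_as_logits(*c) for c in cases]
```

Only the first case evaluates to `True`; the added `not _is_gfx1250` term is what turns the second case off.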


+def _cosine_diff(actual, expected):
+    actual = actual.detach().float().cpu()
+    expected = expected.detach().float().cpu()
+    assert torch.isfinite(actual).all()
+    assert torch.isfinite(expected).all()
+    numerator = 2 * (actual.double() * expected.double()).sum()
+    denominator = (
+        (actual.double().square() + expected.double().square()).sum().clamp_min(1e-12)
+    )
+    return (1 - (numerator / denominator)).item()
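The metric computed by `_cosine_diff` is 1 - 2·Σ(a·e) / Σ(a² + e²). A dependency-free restatement on Python lists (an illustrative stand-in, not the PR's torch code, and without its finiteness asserts) shows how it behaves on a few simple inputs:

```python
def cosine_diff(actual, expected, eps=1e-12):
    """Plain-Python version of 1 - 2*sum(a*e) / max(sum(a^2 + e^2), eps)."""
    numerator = 2.0 * sum(a * e for a, e in zip(actual, expected))
    denominator = max(sum(a * a + e * e for a, e in zip(actual, expected)), eps)
    return 1.0 - numerator / denominator


same = cosine_diff([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])        # identical inputs
flipped = cosine_diff([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0])  # sign-flipped
scaled = cosine_diff([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])      # same direction, 2x magnitude
zeros = cosine_diff([0.0, 0.0], [0.0, 0.0])                 # denominator hits the clamp
```

Identical inputs give 0 and sign-flipped inputs give 2. Unlike a plain cosine similarity, the metric also penalises magnitude mismatch: the 2x-scaled case gives about 0.2. When both inputs are all zeros, the clamped denominator makes the result 1 rather than 0, which is worth keeping in mind when choosing a tolerance.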


-        is_causal=True,
-        dtype=out_dtype,
-    )
+    # troch implementation. mi400 uses its own _ref_mla_mi400 golden (built on


feifei14119 force-pushed the feiw/pr/mla2 branch 2 times, most recently from 8ccaaa6 to 583232d on June 3, 2026 06:01

feifei14119 marked this pull request as ready for review June 3, 2026 06:04

feifei14119 requested review from a team and Copilot June 3, 2026 06:04

Copilot started reviewing on behalf of feifei14119 June 3, 2026 06:04

Copilot AI reviewed Jun 3, 2026


feifei14119 force-pushed the feiw/pr/mla2 branch from 583232d to 1d5a8cf on June 3, 2026 11:08

mla

fd6a141

feifei14119 force-pushed the feiw/pr/mla2 branch from 1d5a8cf to fd6a141 on June 4, 2026 00:21


[gfx12] mla v3 #3500

feifei14119 wants to merge 1 commit into ROCm:main from feifei14119:feiw/pr/mla2
