AITER Development Roadmap (2026 Q3)
Modeled on the SGLang AMD roadmap (sgl-project/sglang#23494). Last updated 2026-05-30.
Contributions and feedback are welcome.
Legend: ✓ done · ▶ in progress · ○ planned. Each item links a tracked PR/issue.
Focus
- Model enablement velocity: Day-N kernels for the frontier MoE models driving inference demand (MiniMax-M2.5, GLM-5.x, GPT-OSS, DeepSeek-V4).
- MXFP4 across the stack: MoE, GEMM, attention, and Sage attention kernels running in MXFP4 with assured accuracy.
- MLA completeness: Close the remaining MLA feature gaps (FP8 KV cache, small head counts, speculative decode).
- Build & architecture: CK-Free build, faster Python-binding compilation, and config loading without a rebuild, all to reduce time-to-kernel for agentic workflows.
- Training kernels (new): First grouped-GEMM APIs for MoE training, beyond inference.
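The Focus item on config loading without rebuild can be illustrated with a minimal sketch. It assumes tuned GEMM configs live in a CSV keyed by problem shape (M, N, K) and are parsed at runtime, so retuning only means editing data. The column names, the `GemmConfig` type, the kernel names, and the "smallest tuned M that covers the request" fallback are all hypothetical, not AITER's actual schema or lookup rule.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, TextIO, Tuple

Shape = Tuple[int, int, int]  # (M, N, K)


@dataclass(frozen=True)
class GemmConfig:
    kernel: str   # hypothetical kernel instance name
    split_k: int


def load_tuned_configs(f: TextIO) -> Dict[Shape, GemmConfig]:
    """Parse a tuned-config CSV at runtime; nothing is compiled."""
    table: Dict[Shape, GemmConfig] = {}
    for row in csv.DictReader(f):
        shape = (int(row["M"]), int(row["N"]), int(row["K"]))
        table[shape] = GemmConfig(kernel=row["kernel"], split_k=int(row["splitK"]))
    return table


def pick_config(table: Dict[Shape, GemmConfig], m: int, n: int, k: int,
                default: GemmConfig) -> GemmConfig:
    if (m, n, k) in table:
        return table[(m, n, k)]
    # Fall back to the smallest tuned M that covers m for the same (N, K).
    covering = sorted(tm for (tm, tn, tk) in table if tn == n and tk == k and tm >= m)
    return table[(covering[0], n, k)] if covering else default


csv_text = "M,N,K,kernel,splitK\n128,4096,7168,gemm_tile128,1\n512,4096,7168,gemm_tile256,2\n"
table = load_tuned_configs(io.StringIO(csv_text))
default = GemmConfig(kernel="gemm_default", split_k=1)
print(pick_config(table, 200, 4096, 7168, default))  # GemmConfig(kernel='gemm_tile256', split_k=2)
```

Keeping the table as plain data means a newly tuned shape takes effect on the next process start, with no extension rebuild in the loop.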
Feature and Performance Improvements (Q3 planned)
MLA capability completion
PoC: @larryli2-amd @minmengdie
Goal: Fill the high-priority MLA gaps for production MoE serving.
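Why an FP8 KV cache matters for MLA is easiest to see with per-token cache arithmetic. The sketch below assumes DeepSeek-V3-style MLA dimensions (128 heads, a 512-wide compressed latent, a 64-wide decoupled RoPE key, 61 layers) and a single uniform element width for the whole cache entry. Real FP8 MLA caches may keep the RoPE part or per-block scales in higher precision, which this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MlaDims:
    # DeepSeek-V3-style dimensions, assumed for illustration.
    num_heads: int = 128
    kv_lora_rank: int = 512      # width of the compressed KV latent
    qk_rope_head_dim: int = 64   # decoupled RoPE key, shared across heads
    qk_nope_head_dim: int = 128
    v_head_dim: int = 128
    num_layers: int = 61


def mla_cache_bytes_per_token(d: MlaDims, bytes_per_elem: int) -> int:
    # MLA caches one latent plus one shared RoPE key per token per layer.
    return (d.kv_lora_rank + d.qk_rope_head_dim) * bytes_per_elem * d.num_layers


def mha_cache_bytes_per_token(d: MlaDims, bytes_per_elem: int) -> int:
    # Without compression, every head would cache its own K (nope + rope) and V.
    per_layer = d.num_heads * (d.qk_nope_head_dim + d.qk_rope_head_dim + d.v_head_dim)
    return per_layer * bytes_per_elem * d.num_layers


d = MlaDims()
print(mla_cache_bytes_per_token(d, 2))  # BF16: 70272
print(mla_cache_bytes_per_token(d, 1))  # FP8: 35136
print(mha_cache_bytes_per_token(d, 2))  # uncompressed BF16 baseline: 4997120
```

Halving the element width halves cache bytes per token, which translates directly into more resident sequences or longer contexts per GPU.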
Frontier MoE model enablement
PoC: @peymanr @akii96 @andyluo7
Goal: Day-N kernels for the models with active customer demand.
MXFP4 full stack
PoC: @haoyangli0109 @aoli26 @ksikiric
Goal: MXFP4 MoE / GEMM / attention with minimal accuracy loss.
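For reference, MXFP4 as defined in the OCP Microscaling (MX) specification groups 32 elements under one shared power-of-two (E8M0) scale, and stores each element as an FP4 E2M1 value from {0, 0.5, 1, 1.5, 2, 3, 4, 6} plus a sign. The pure-Python sketch below quantizes and dequantizes a list to show where accuracy loss comes from. It picks the shared exponent as floor(log2(max|x|)) minus the element format's maximum exponent, saturates at ±6, and rounds to nearest with ties to even. It is meant for intuition only and does not reproduce the packed layout or rounding mode of any AITER kernel.

```python
import math
from typing import List, Tuple

BLOCK = 32  # elements sharing one scale (OCP MX block size)
# Non-negative FP4 E2M1 magnitudes indexed by 3-bit code; odd codes have mantissa bit 1.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_EMAX = 2  # largest E2M1 value is 1.5 * 2**2 = 6


def quantize_fp4(v: float) -> float:
    """Round to the nearest E2M1 value (ties to the even code), saturating at +/-6."""
    mag = min(abs(v), FP4_GRID[-1])
    code = min(range(len(FP4_GRID)), key=lambda i: (abs(FP4_GRID[i] - mag), i % 2))
    return math.copysign(FP4_GRID[code], v)


def quantize_block(xs: List[float]) -> Tuple[int, List[float]]:
    """Return (shared power-of-two exponent, FP4 elements) for one block."""
    amax = max(abs(x) for x in xs)
    if amax == 0.0:
        return 0, [0.0] * len(xs)
    # Put the block maximum in E2M1's top binade; clamp to the E8M0 exponent range.
    shared_exp = max(-127, min(127, math.floor(math.log2(amax)) - FP4_EMAX))
    scale = 2.0 ** shared_exp
    return shared_exp, [quantize_fp4(x / scale) for x in xs]


def mxfp4_roundtrip(xs: List[float]) -> List[float]:
    """Quantize block by block, then dequantize as element * 2**shared_exp."""
    out: List[float] = []
    for start in range(0, len(xs), BLOCK):
        shared_exp, q = quantize_block(xs[start:start + BLOCK])
        out.extend(v * 2.0 ** shared_exp for v in q)
    return out


print(mxfp4_roundtrip([0.3] * 32)[0])          # 0.25
print(mxfp4_roundtrip([0.3] * 31 + [5.0])[0])  # 0.5: one outlier coarsens the whole block
```

The second call shows the main accuracy risk: a single large value in a block sets the shared scale, so every small value in that block is quantized on a coarser grid.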
MoE kernel optimization
PoC: @valarLip @nsusanto @zx3xyy
Goal: Next-stage MoE perf (no-combine, preshuffle, EP reduce).
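The MoE optimization items all act on the same dispatch, expert compute, and combine data flow. As a plain reference for which stage is which, here is a pure-Python sketch. It assumes softmax-then-top-k gating with renormalized weights (one common variant; some models gate differently) and experts supplied as batched callables. It is not AITER's fused MoE implementation, and it says nothing about how no-combine, preshuffle, or EP reduce are realized.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vec = List[float]
Expert = Callable[[List[Vec]], List[Vec]]  # batched: list of tokens -> list of outputs


def topk_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Softmax over experts, keep the top-k, renormalize their weights to sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [p / total for p in exps]
    top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
    norm = sum(probs[e] for e in top)
    return [(e, probs[e] / norm) for e in top]


def moe_forward(tokens: List[Vec], router_logits: List[Vec], experts: List[Expert],
                k: int) -> List[Vec]:
    # Dispatch: bucket (token index, weight) pairs by expert.
    buckets: Dict[int, List[Tuple[int, float]]] = {e: [] for e in range(len(experts))}
    for t, logits in enumerate(router_logits):
        for e, w in topk_route(logits, k):
            buckets[e].append((t, w))
    out = [[0.0] * len(tokens[0]) for _ in tokens]
    for e, items in buckets.items():
        if not items:
            continue
        # Expert compute: one batched call per expert over its routed tokens.
        ys = experts[e]([tokens[t] for t, _ in items])
        # Combine: accumulate each token's routing-weighted expert outputs.
        for (t, w), y in zip(items, ys):
            out[t] = [o + w * v for o, v in zip(out[t], y)]
    return out


experts = [
    lambda xs: [[2.0 * v for v in x] for x in xs],
    lambda xs: [[v + 1.0 for v in x] for x in xs],
    lambda xs: [[0.0 for _ in x] for x in xs],
]
print(moe_forward([[1.0, 2.0]], [[0.0, 0.0, -100.0]], experts, k=2))  # [[2.0, 3.5]]
```

Production kernels fuse these stages and reorder the data, but the math they must preserve is the routing-weighted sum in the final loop.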
Training kernels (new direction)
PoC: @jayfurmanek
Goal: Enable MoE training, not just inference.
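Training needs more than the inference forward pass: each expert's grouped GEMM also needs backward GEMMs for the input gradient and the weight gradient. The weight gradient reduces over each group's token dimension. The pure-Python reference below shows those three operations, assuming tokens are pre-sorted by expert and described by a `group_sizes` list. The function names and the layout are hypothetical, for illustration only, and do not reflect the planned AITER grouped-GEMM API.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def grouped_gemm_fwd(x: Matrix, weights: List[Matrix], group_sizes: List[int]) -> Matrix:
    """Rows of x are pre-sorted by expert; rows of group g are multiplied by weights[g]."""
    assert sum(group_sizes) == len(x)
    y: Matrix = []
    offset = 0
    for w, size in zip(weights, group_sizes):
        y.extend(matmul(x[offset:offset + size], w))
        offset += size
    return y


def grouped_gemm_bwd(x: Matrix, weights: List[Matrix], group_sizes: List[int],
                     dy: Matrix) -> Tuple[Matrix, List[Matrix]]:
    """Per group g: dx_g = dy_g @ W_g^T and dW_g = x_g^T @ dy_g."""
    dx: Matrix = []
    dw: List[Matrix] = []
    offset = 0
    for w, size in zip(weights, group_sizes):
        xg, dyg = x[offset:offset + size], dy[offset:offset + size]
        dx.extend(matmul(dyg, transpose(w)))
        if size == 0:
            # An expert that received no tokens still gets a (zero) weight gradient.
            dw.append([[0.0] * len(w[0]) for _ in w])
        else:
            dw.append(matmul(transpose(xg), dyg))
        offset += size
    return dx, dw


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 2.0]]]
print(grouped_gemm_fwd(x, weights, [2, 1]))  # [[1.0, 2.0], [3.0, 4.0], [5.0, 17.0]]
```

The empty-group branch matters in practice: under skewed routing some experts see no tokens in a step, yet the optimizer still expects a gradient tensor for each of them.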
Build & architecture RFCs
PoC: @carlushuang @sunway513 @tjtanaa
Goal: Reduce time-to-kernel and build complexity.
- AITER_REBUILD=1 with Prebuilt AITER #2797 (@tjtanaa)
gfx1250 / MI450 enablement (new)
PoC: @sunway513 @farlukas @aoli26
Goal: Bring AITER kernel coverage to gfx1250 across FlyDSL, Gluon, Triton, and CK paths; clear the known JIT/asm bring-up bugs.
Communication / distributed
PoC: @hubertlu-tw @xytpai
Goal: Native collective ops and EP scaling.
CI quality gates
PoC: @gyohuangxin
Goal: Catch regressions before release.
TODO: WIP.
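One way a regression gate could work is sketched below: compare each benchmark case against a stored baseline and fail on either a latency slowdown or an accuracy breach. The `BenchResult` record, the case names, and the 5% / 1e-2 thresholds are hypothetical placeholders. Producing stable latency numbers (warmup, repeated runs, taking the median) is assumed to happen upstream and is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BenchResult:
    latency_us: float   # median kernel latency for one test case
    max_abs_err: float  # vs. a higher-precision reference


def find_regressions(baseline: Dict[str, BenchResult], current: Dict[str, BenchResult],
                     max_slowdown: float = 0.05, max_err: float = 1e-2) -> List[str]:
    """Return failure messages; an empty list means the gate passes."""
    failures: List[str] = []
    for case, base in sorted(baseline.items()):
        cur = current.get(case)
        if cur is None:
            failures.append(f"{case}: missing from current run")
            continue
        if cur.latency_us > base.latency_us * (1.0 + max_slowdown):
            pct = 100.0 * (cur.latency_us / base.latency_us - 1.0)
            failures.append(f"{case}: {pct:.1f}% slower than baseline")
        if cur.max_abs_err > max_err:
            failures.append(f"{case}: max abs error {cur.max_abs_err:g} exceeds {max_err:g}")
    return failures


baseline = {"gemm_a8w8": BenchResult(100.0, 1e-3), "fused_moe": BenchResult(50.0, 1e-3)}
current = {"gemm_a8w8": BenchResult(104.0, 1e-3), "fused_moe": BenchResult(60.0, 5e-2)}
for msg in find_regressions(baseline, current):
    print(msg)
```

Treating a missing case as a failure keeps a silently dropped test from reading as a pass.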