
sunway513 · 2026-06-03T21:05:22Z

What

Add GLM-5.1-MXFP4 to the SGLang downstream accuracy coverage. GLM-5.1
is one of the three InferenceX headline frontier models (Kimi K2.6 /
DeepSeek v4 / GLM5). This is the first step of the planned buildout toward
the InferenceX MI355X top set on both the vLLM and SGLang lanes.

How

Reuses SGLang's own registered regression test
nightly-amd-2-gpu-mi35x-glm51-mxfp4
(test/registered/amd/accuracy/mi35x/test_glm51_mxfp4_tp2_gsm8k_mi35x.py),
which exists specifically to catch AITER GLM-5.1-MXFP4 TP=2 accuracy
drops on gfx950 (a bad BF16 GEMM path). So this gate directly guards
AITER against GLM-5.1 regressions.

- Model: amd/GLM-5.1-MXFP4 (InferenceX-aligned, deployable MXFP4 quant)
- gsm8k 3-shot, threshold 0.92
- run_on_pr + run_on_schedule
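
For orientation, the settings above could be written as an entry like the one below. This is only a sketch: the keys `timeout_minutes`, `extra_exec_args` and `test_command`, and their values, come from the diff in this PR, while `name`, `runner`, `run_on_pr`, `run_on_schedule` as dictionary keys and the helper functions are hypothetical. It also assumes `--timeout-per-file` is given in seconds, and uses that to check that one test file can finish inside the job budget.

```python
import shlex

# Sketch of a downstream test entry. Keys marked "from the diff" are confirmed
# by this PR; all other keys are hypothetical and only illustrate the settings.
GLM51_MXFP4_ENTRY = {
    "name": "glm51-mxfp4-tp2-gsm8k",        # hypothetical
    "runner": "linux-aiter-do-mi350x-2",    # hypothetical key; pool name from the PR text
    "run_on_pr": True,                      # hypothetical key
    "run_on_schedule": True,                # hypothetical key
    "timeout_minutes": 110,                 # from the diff
    "extra_exec_args": "",                  # from the diff
    "test_command": (                       # from the diff
        "python3 run_suite.py --hw amd "
        "--suite nightly-amd-2-gpu-mi35x-glm51-mxfp4 "
        "--nightly --timeout-per-file 5400"
    ),
}

REQUIRED_KEYS = {"timeout_minutes", "extra_exec_args", "test_command"}


def per_file_timeout_seconds(test_command):
    """Return the --timeout-per-file value, or None if the flag is absent."""
    tokens = shlex.split(test_command)
    if "--timeout-per-file" not in tokens:
        return None
    return int(tokens[tokens.index("--timeout-per-file") + 1])


def validate_entry(entry):
    """Check required keys and return the headroom (seconds) left in the job budget."""
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    job_budget = entry["timeout_minutes"] * 60
    per_file = per_file_timeout_seconds(entry["test_command"]) or 0
    if per_file >= job_budget:
        raise ValueError("per-file timeout does not fit inside the job timeout")
    return job_budget - per_file
```

With the values from the diff, the job budget is 110 × 60 = 6600 s and the per-file timeout is 5400 s, which under the seconds assumption leaves 1200 s for setup and model download.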

Runner choice — TP2 on the 2-GPU pool

The test is TP=2, so it runs on linux-aiter-do-mi350x-2, not the
8-GPU pool. Queue-time data collected this week shows 8-GPU downstream
jobs (do-mi350x-8 / aiter-8gpu) queue 15-55 min under nightly burst,
while the 2-/4-GPU pools stay under ~90s. A TP2 gate has no reason to
sit in the 8-GPU queue.
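
The routing rule described here (use the smallest pool that fits the tensor-parallel size) can be sketched as follows. Only `linux-aiter-do-mi350x-2` is named in this PR; the 4-GPU and 8-GPU labels and the `pick_runner` helper are hypothetical placeholders for illustration.

```python
# Sketch: map a tensor-parallel size to the smallest runner pool that fits it.
RUNNER_BY_GPU_COUNT = {
    2: "linux-aiter-do-mi350x-2",  # named in this PR
    4: "linux-aiter-do-mi350x-4",  # hypothetical label
    8: "linux-aiter-do-mi350x-8",  # hypothetical label; the PR text says do-mi350x-8
}


def pick_runner(tp_size):
    """Return the runner label of the smallest pool with at least tp_size GPUs."""
    if tp_size < 1:
        raise ValueError("tp_size must be positive")
    for gpus in sorted(RUNNER_BY_GPU_COUNT):
        if tp_size <= gpus:
            return RUNNER_BY_GPU_COUNT[gpus]
    raise ValueError(f"no pool large enough for TP={tp_size}")
```

Under this rule a TP=2 gate lands on the 2-GPU pool and never waits behind 8-GPU jobs.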

Follow-ups (tracked, not in this PR)

- vLLM lane: MiniMax-M2.5, DeepSeek-V4-Pro
- SGLang lane: DeepSeek-V4-Pro; re-enable Qwen3-235B-MXFP4 / DeepSeek-V3.2
- /models cache patch for GLM-5.1 if HF download latency matters

GLM-5.1 is one of the three InferenceX headline frontier models (Kimi K2.6 / DeepSeek v4 / GLM5). Add it to the SGLang downstream coverage via SGLang's own registered regression test (nightly-amd-2-gpu-mi35x-glm51-mxfp4), which exists specifically to catch AITER GLM-5.1-MXFP4 TP=2 accuracy drops on gfx950 (bad BF16 GEMM path).

- Model: amd/GLM-5.1-MXFP4 (InferenceX-aligned, deployable quant)
- TP=2 -> linux-aiter-do-mi350x-2, deliberately on the 2-GPU pool: the 8-GPU downstream pool queues 15-55 min under nightly burst, while the 2-/4-GPU pools stay under ~90s. A TP2 gate has no reason to sit in the 8-GPU queue.
- gsm8k accuracy threshold 0.92 (from the SGLang test).
- run_on_pr + run_on_schedule.

Pulls amd/GLM-5.1-MXFP4 from HF (the SGLang test hardcodes it and does not import os); a /models cache patch can follow if download latency matters.

github-actions · 2026-06-03T21:05:41Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| `ci:triton-300x` | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| `ci:sglang` | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| `ci:atom` | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| `ci:atom_full` | ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json` |
| `ci:vllm` | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| `ci:all` | All standard extended tests (excludes `ci:atom_full`) |

Only add `ci:atom_full` for FlyDSL or Triton upgrades.
Add labels via the sidebar or `gh pr edit 3523 --add-label <label>`

Copilot

Pull request overview

Adds a new SGLang downstream accuracy gate for GLM-5.1-MXFP4 (TP=2) to improve regression coverage on MI35x-class (gfx950) runners, aligning AITER CI with SGLang’s registered nightly regression test.

Changes:

- Adds a new downstream test entry targeting the SGLang registered suite nightly-amd-2-gpu-mi35x-glm51-mxfp4.
- Routes the job to the 2-GPU MI350x runner pool to better match TP=2 and reduce queue time.


+        "timeout_minutes": 110,
+        "extra_exec_args": "",
+        "test_command": "python3 run_suite.py --hw amd --suite nightly-amd-2-gpu-mi35x-glm51-mxfp4 --nightly --timeout-per-file 5400",


The first attempt referenced suite nightly-amd-2-gpu-mi35x-glm51-mxfp4, which only exists on sglang main. The downstream CI clones the amd/aiter-ci branch, where run_suite silently reports '0/0 passed' for an unknown suite (false green).

Switch to the GLM-5.1 eval that actually exists on amd/aiter-ci: test_glm51_eval_mi35x.py -> suite nightly-amd-8-gpu-mi35x-glm51 (zai-org/GLM-5.1-FP8, DSA backend, TP=8, threshold 0.93). Runs on do-mi350x-8.

Follow-up: when the MXFP4 TP=2 variant lands on amd/aiter-ci, move this gate to TP2 on do-mi350x-2 to get off the saturated 8-GPU pool.
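
The false green described in this commit message (an unknown suite reporting '0/0 passed') could be caught by a guard on the runner output. The sketch below is not part of this PR and not run_suite's own behavior; it assumes the summary appears somewhere in the captured output in the form `N/M passed`, which is inferred only from the '0/0 passed' string quoted above. The function name is hypothetical.

```python
import re

# Assumed summary format, inferred from the quoted '0/0 passed' string.
_SUMMARY_RE = re.compile(r"(\d+)\s*/\s*(\d+)\s+passed")


def check_suite_summary(output):
    """Fail loudly when a suite ran zero tests instead of treating it as green."""
    matches = _SUMMARY_RE.findall(output)
    if not matches:
        raise RuntimeError("no 'N/M passed' summary found in output")
    passed, total = map(int, matches[-1])  # use the last summary line
    if total == 0:
        raise RuntimeError("suite ran 0 tests: unknown or empty suite")
    if passed != total:
        raise RuntimeError(f"{total - passed} of {total} tests did not pass")
    return passed, total
```

A wrapper would feed the captured stdout of the `test_command` into `check_suite_summary` and exit non-zero if it raises, so a missing suite name turns the job red rather than green.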

sunway513 requested review from a team and Copilot June 3, 2026 21:05

sunway513 added the ci:sglang label Jun 3, 2026

Copilot started reviewing on behalf of sunway513 June 3, 2026 21:05

Copilot AI reviewed Jun 3, 2026


Comment thread .github/scripts/sglang_downstream.py Outdated

Comment on lines +127 to +129

"timeout_minutes": 110,
"extra_exec_args": "",
"test_command": "python3 run_suite.py --hw amd --suite nightly-amd-2-gpu-mi35x-glm51-mxfp4 --nightly --timeout-per-file 5400",


ci(sglang-downstream): add GLM-5.1-MXFP4 accuracy gate (TP2) #3523

sunway513 wants to merge 2 commits into main from ci/sglang-glm51-downstream
