ci(sglang-downstream): add GLM-5.1-MXFP4 accuracy gate (TP2)#3523
Open
sunway513 wants to merge 2 commits into
GLM-5.1 is one of the three InferenceX headline frontier models (Kimi K2.6 / DeepSeek v4 / GLM5). Add it to the SGLang downstream coverage via SGLang's own registered regression test (nightly-amd-2-gpu-mi35x-glm51-mxfp4), which exists specifically to catch AITER GLM-5.1-MXFP4 TP=2 accuracy drops on gfx950 (bad BF16 GEMM path).
- Model: amd/GLM-5.1-MXFP4 (InferenceX-aligned, deployable quant)
- TP=2 -> linux-aiter-do-mi350x-2, deliberately on the 2-GPU pool: the 8-GPU downstream pool queues 15-55 min under nightly burst, while the 2-/4-GPU pools stay under ~90s. A TP2 gate has no reason to sit in the 8-GPU queue.
- gsm8k accuracy threshold 0.92 (from the SGLang test).
- run_on_pr + run_on_schedule.
Pulls amd/GLM-5.1-MXFP4 from HF (the SGLang test hardcodes it and does not import os); a /models cache patch can follow if download latency matters.
Pull request overview
Adds a new SGLang downstream accuracy gate for GLM-5.1-MXFP4 (TP=2) to improve regression coverage on MI35x-class (gfx950) runners, aligning AITER CI with SGLang’s registered nightly regression test.
Changes:
- Adds a new downstream test entry targeting the SGLang registered suite nightly-amd-2-gpu-mi35x-glm51-mxfp4.
- Routes the job to the 2-GPU MI350x runner pool to better match TP=2 and reduce queue time.
Comment on lines +127 to +129
    "timeout_minutes": 110,
    "extra_exec_args": "",
    "test_command": "python3 run_suite.py --hw amd --suite nightly-amd-2-gpu-mi35x-glm51-mxfp4 --nightly --timeout-per-file 5400",
The first attempt referenced suite nightly-amd-2-gpu-mi35x-glm51-mxfp4, which only exists on sglang main. The downstream CI clones the amd/aiter-ci branch, where run_suite silently reports '0/0 passed' for an unknown suite (false green).
Switch to the GLM-5.1 eval that actually exists on amd/aiter-ci: test_glm51_eval_mi35x.py -> suite nightly-amd-8-gpu-mi35x-glm51 (zai-org/GLM-5.1-FP8, DSA backend, TP=8, threshold 0.93). Runs on do-mi350x-8.
Follow-up: when the MXFP4 TP=2 variant lands on amd/aiter-ci, move this gate to TP2 on do-mi350x-2 to get off the saturated 8-GPU pool.
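The false green described above can be caught by a guard that treats an empty run as a failure. The following is a minimal sketch, not part of this PR: it assumes the suite log ends with a summary shaped like the quoted '0/0 passed' string, and the function name `check_suite_summary` is hypothetical.

```python
import re

# Assumed summary format, modeled on the '0/0 passed' string quoted above.
_SUMMARY_RE = re.compile(r"(\d+)\s*/\s*(\d+)\s+passed")


def check_suite_summary(log_text: str) -> None:
    """Raise if the log shows no executed tests or any failed test."""
    matches = _SUMMARY_RE.findall(log_text)
    if not matches:
        raise RuntimeError("no 'N/M passed' summary found in log")
    # Use the last summary line in case the log contains several.
    passed, total = (int(x) for x in matches[-1])
    if total == 0:
        raise RuntimeError("suite ran 0 tests; is the suite name known on this branch?")
    if passed != total:
        raise RuntimeError(f"{total - passed} of {total} tests failed")
```

A CI step could pass the captured log to `check_suite_summary` after the suite exits, so that an unknown suite name fails the job instead of passing vacuously.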
What
Add GLM-5.1-MXFP4 to the SGLang downstream accuracy coverage. GLM-5.1
is one of the three InferenceX headline frontier models (Kimi K2.6 /
DeepSeek v4 / GLM5). This is the first step of the planned buildout toward the
InferenceX MI355X top set on both the vLLM and SGLang lanes.
How
Reuses SGLang's own registered regression test
nightly-amd-2-gpu-mi35x-glm51-mxfp4
(test/registered/amd/accuracy/mi35x/test_glm51_mxfp4_tp2_gsm8k_mi35x.py),
which exists specifically to catch AITER GLM-5.1-MXFP4 TP=2 accuracy
drops on gfx950 (a bad BF16 GEMM path). So this gate directly guards
AITER against GLM-5.1 regressions.
- amd/GLM-5.1-MXFP4 (InferenceX-aligned, deployable MXFP4 quant)
- run_on_pr + run_on_schedule
Runner choice: TP2 on the 2-GPU pool
The test is TP=2, so it runs on
linux-aiter-do-mi350x-2, not the 8-GPU pool. Queue-time data collected this week shows 8-GPU downstream
jobs (do-mi350x-8 / aiter-8gpu) queue 15-55 min under nightly burst,
while the 2-/4-GPU pools stay under ~90s. A TP2 gate has no reason to
sit in the 8-GPU queue.
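The routing rule argued for here (send a job to the smallest pool that fits its TP size) can be sketched as follows. This is an illustration only: the 2-GPU and 8-GPU labels are the ones named in this PR, while the 4-GPU label and the `pick_runner` helper are hypothetical placeholders.

```python
# GPU count -> runner label. The 4-GPU label is a hypothetical placeholder;
# the other two labels are the ones named in this PR.
RUNNER_POOLS = {
    2: "linux-aiter-do-mi350x-2",
    4: "hypothetical-4gpu-runner",
    8: "do-mi350x-8",
}


def pick_runner(tp_size: int, pools: dict[int, str] = RUNNER_POOLS) -> str:
    """Return the label of the smallest pool with at least tp_size GPUs."""
    fitting = [gpus for gpus in pools if gpus >= tp_size]
    if not fitting:
        raise ValueError(f"no runner pool has {tp_size} GPUs")
    return pools[min(fitting)]
```

With this mapping, `pick_runner(2)` resolves to the 2-GPU pool, so a TP=2 gate never waits in the 8-GPU queue.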
Follow-ups (tracked, not in this PR)