[tune] GLM-4.7-FP8 FMOE configs for EP=4 + fused shared expert (MI355x)#3529

Open
omirosh wants to merge 1 commit into ROCm:main from omirosh:glm47/fp8-tuned-fmoe-ep4-fse

Conversation

@omirosh omirosh commented Jun 4, 2026

Summary

Adds FP8 fused-MoE tuning entries for GLM-4.7 running with expert parallelism
(EP) = 4 plus the fused shared-expert (FSE) path (introduced in
vllm-project/vllm#44313).

Changes

  • aiter/configs/model_configs/glm47_fp8_untuned_fmoe.csv: append 20
    rows for the new expert=41, topk=10 block.

  • aiter/configs/model_configs/glm47_fp8_tuned_fmoe.csv: append the
    corresponding 20 tuned entries.

Test plan

  • Boot vLLM with GLM-4.7 EP=4 + FSE, verify the new tuned rows are
    picked up via aiter.tuned_moe lookup and that the cached kernel
    configs are used (no fallback warnings).
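
As a cheap pre-check before booting vLLM, the coverage half of this test plan can be sketched offline: every untuned shape in the expert=41, topk=10 block should have a matching tuned row. This is a minimal sketch, not part of aiter. The helper names are hypothetical, and the key column names (`cu_num`, `token`, `model_dim`, `inter_dim`, `expert`, `topk`) are an assumption about the CSV headers based on the parameters named in this PR. It does not exercise the runtime lookup or the fallback warnings.

```python
import csv
import io

# Assumed key columns; adjust to match the actual CSV headers.
KEY_COLUMNS = ("cu_num", "token", "model_dim", "inter_dim", "expert", "topk")


def shape_keys(csv_text, expert, topk):
    """Collect the shape keys of all rows in one expert/topk block."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        tuple(row[col].strip() for col in KEY_COLUMNS)
        for row in reader
        if int(row["expert"]) == expert and int(row["topk"]) == topk
    }


def missing_tuned_shapes(untuned_text, tuned_text, expert=41, topk=10):
    """Return untuned shapes in the block that have no tuned counterpart."""
    untuned = shape_keys(untuned_text, expert, topk)
    tuned = shape_keys(tuned_text, expert, topk)
    return sorted(untuned - tuned)
```

To run it against the repository, read both files with `pathlib.Path(...).read_text()` and pass the contents in; an empty list means every untuned shape in the block has a tuned entry.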

Made with Cursor

Adds FP8 fused-MoE tuning entries for GLM-4.7 running with Expert
Parallel=4 plus the fused shared-expert (FSE) path introduced in
vllm-project/vllm#44313. Each EP=4 rank carries 160/4 = 40 routed
experts plus 1 fused shared expert (expert=41, topk=10) at cu_num=256,
model_dim=5120, inter_dim=1536.

- aiter/configs/model_configs/glm47_fp8_untuned_fmoe.csv: append 20
  token rows for the new expert=41, topk=10 block.
- aiter/configs/model_configs/glm47_fp8_tuned_fmoe.csv: append the
  corresponding 20 tuned entries.
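
The expert=41 figure follows from the arithmetic in the commit message: 160 routed experts split evenly over 4 EP ranks, plus 1 fused shared expert per rank. A minimal sketch of that calculation is below. The function name is hypothetical and is not an aiter or vLLM API.

```python
def experts_per_rank(routed_experts: int, ep_size: int, fused_shared: int = 1) -> int:
    """Experts seen by one EP rank: its share of routed experts plus fused shared experts."""
    if routed_experts % ep_size != 0:
        raise ValueError("routed experts must divide evenly across EP ranks")
    return routed_experts // ep_size + fused_shared


# GLM-4.7 values from this PR: 160 routed experts, EP=4, 1 fused shared expert.
print(experts_per_rank(160, 4))  # 41
```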

Co-authored-by: Cursor <[email protected]>
@omirosh omirosh requested review from a team and Copilot June 4, 2026 05:19

github-actions Bot commented Jun 4, 2026

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-300x | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| ci:sglang | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| ci:atom | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| ci:atom_full | ATOM accuracy suite for PR and main models from ATOM models_accuracy.json |
| ci:vllm | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| ci:all | All standard extended tests (excludes ci:atom_full) |

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or gh pr edit 3529 --add-label <label>

Copilot AI left a comment

Pull request overview

Adds additional FP8 fused-MoE tuning coverage for GLM-4.7 in the model-specific config set, targeting the EP=4 + fused shared-expert (FSE) path by extending the expert=41, topk=10 shape block and ensuring both untuned and tuned CSVs contain corresponding entries.

Changes:

  • Extended glm47_fp8_untuned_fmoe.csv with a new expert=41, topk=10 block covering token sizes from 1 through 32768 (20 rows).
  • Extended glm47_fp8_tuned_fmoe.csv with the corresponding 20 tuned entries (kernel selections + timings) for the same shapes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| aiter/configs/model_configs/glm47_fp8_untuned_fmoe.csv | Adds new untuned shape rows for expert=41, topk=10 to drive tuning/lookup coverage. |
| aiter/configs/model_configs/glm47_fp8_tuned_fmoe.csv | Adds the matching tuned kernel configs for expert=41, topk=10 shapes. |
