Open
Description
Problem Description
The stress test fails on MI300X + CX7 because it hangs.
When running the benchmark, we hit the following device-side assertion failure:
/home/ditian12/mori/src/ops/dispatch_combine/internode_v1.cpp:375: void mori::moe::v1::DispatchInterNodeRecv(EpDispatchCombineArgs<T> &) [T = hip_bfloat16]: Device-side assertion `(lanePe < config.worldSize) && (lanePe >= 0)' failed.
Operating System
Ubuntu
CPU
AMD
GPU
MI300X
ROCm Version
ROCm-7.0.0
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response
Metadata
Assignees
Labels
No labels