[Inductor] Dynamo hangs when processing an operator, seemingly depending on a logical argument value #151743
Labels
module: dynamo
oncall: pt2
triaged
🐛 Describe the bug
Here is a reproducer:
The first iteration of the loop, where the use_fast_accum argument of the _scaled_grouped_mm operator is set to False, runs fine. In the second iteration, where the argument is set to True, compilation hangs. If a breakpoint is set here and I then step over and return from this function, the hang appears to happen at this place. (Note: the _scaled_grouped_mm operator works on Hopper only.)

Background: Initial support for auto-tuning of this operator was added in #150421, and I encountered the issue while working on extending it in #150944. However, the problem is not related to auto-tuning: it can be reproduced with c3bc6b3, which predates #150421.
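To see where Python is stuck without stepping through breakpoints by hand, one option is the standard library's faulthandler watchdog. This was not used in the original report. The sketch below wraps a call so that, while the call is still running after a timeout, the Python stack of every thread is dumped to stderr. The loop mirrors the reproducer's two iterations. compiled_step is a hypothetical stand-in for the compiled function that calls _scaled_grouped_mm, and the 60-second timeout is an arbitrary assumption.

```python
import faulthandler
import sys


def run_with_hang_dump(fn, timeout_s, *args, dump_file=None, **kwargs):
    """Call fn(*args, **kwargs). While the call has not returned, dump the
    Python stacks of all threads every timeout_s seconds."""
    # faulthandler needs a real file descriptor; fall back to the process's
    # original stderr in case sys.stderr has been replaced (e.g. by a capture).
    out = dump_file if dump_file is not None else sys.__stderr__
    # The watchdog runs in its own C-level thread, so it is meant to fire even
    # while the calling thread is blocked inside native code.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=out)
    try:
        return fn(*args, **kwargs)
    finally:
        # Disarm the watchdog once the call returns or raises.
        faulthandler.cancel_dump_traceback_later()


def compiled_step(use_fast_accum):
    # Hypothetical stand-in: in the real reproducer this would be the compiled
    # function that calls _scaled_grouped_mm(..., use_fast_accum=use_fast_accum).
    return use_fast_accum


results = []
for use_fast_accum in (False, True):
    # Per the report, False compiles fine and True hangs.
    results.append(run_with_hang_dump(compiled_step, 60.0, use_fast_accum))
```

Using repeat=True keeps printing stacks periodically, which helps distinguish a single stuck frame from slow but progressing work.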
Error logs
Here is a gdb backtrace, taken after the reproducer had been hanging for some time. Apparently, it hangs in cudaStreamSynchronize().
Gdb backtrace
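cudaStreamSynchronize() blocks until all work previously queued on the stream has finished, so the frame in the backtrace may show where the host waited rather than which GPU work never completed. A common way to narrow this down is to rerun with CUDA_LAUNCH_BLOCKING=1, which makes kernel launches synchronous. This was not tried in the original report. Below is a minimal launcher sketch, assuming the reproducer is saved as a standalone script. The function name is hypothetical.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(cmd, timeout_s=None):
    """Run cmd with CUDA_LAUNCH_BLOCKING=1 set in its environment.

    Returns the exit code, or None if the process was still running after
    timeout_s seconds (subprocess.run kills it in that case). Leave timeout_s
    as None to keep a hanging process alive so gdb can be attached.
    """
    # With synchronous launches, a kernel that never finishes should block at
    # its own launch site instead of at a later cudaStreamSynchronize() call.
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    try:
        return subprocess.run(cmd, env=env, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None
```

For example, run_with_blocking_launches([sys.executable, "repro.py"]), where repro.py is a hypothetical name for the reproducer script. A gdb backtrace taken from the hanging process should then point closer to the stuck launch.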
Versions
The collect_env.py output
cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @amjames