[lit] Fix lit hang on pool join when exception is thrown #131881
Hello @jh7370, wanted to get the ball rolling on the review for this fix. Figured you might be the right person to bug since I saw you review a couple of other lit-related PRs. Thanks :)
Sorry, I'm a bit snowed under with reviews and my regular non-LLVM work at the moment, having just come back from a bit of time off. I'll add this to my queue, but it may be a while before I have a chance to look at this.
Hey @jh7370, was wondering if you would have free cycles to take a look at this soon. If not, do you know of anyone else who may be appropriate to review this PR? Thanks.
Sorry, my workload is unlikely to lessen in the next few weeks. I've added a few possible reviewers (@DavidSpickett, @jroelofs, @MaskRay, @pcc) in the hope that someone else can take a look.
I understand the problem as it existed previously but am confused about which versions this applies to. The linked cpython issue has a comment:
LLVM requires Python 3.8 (https://llvm.org/docs/GettingStarted.html#software). So is this in fact still an issue on 3.8+ and was not fixed there? Or are you referring to those issues to explain something else about how
Yep, I was able to reproduce the hang on Python 3.12.3
Hey @DavidSpickett, I should clarify. I was able to reproduce the lit hang that I reported in #133914 (which this PR fixes) with Python 3.12.3. As for the cpython issue, I added it to the description because it was cited as the likely root cause in the Stack Overflow thread I've linked. However, looking at it, I think you are right that the issue isn't reproducible in the Python versions that are used for LLVM, so I think it's unlikely to be the root cause in this case. Going to remove it from the description.
Cool, I will try to understand it as just how Python works then, rather than a workaround for a bug.
LGTM
Fixes llvm#133914

When using the internal shell with a timeout set, lit will hang on the following call if an exception is thrown and not immediately caught:
https://github.com/llvm/llvm-project/blob/19970535f92c0f2dcda01b7fc60f95945166e424/llvm/utils/lit/lit/run.py#L93

This can occur when using the internal lit shell and trying to run a program that does not exist. In this case `_executeShCmd` will throw an internal shell error, which is not caught by the function directly calling it, `executeShCmd`. Instead, it is caught one function higher in the call stack, in `executeScriptInternal`. Because the exception propagates up the call stack instead of being caught immediately, lit will hang until the test timeout expires. This patch catches the exception in `executeShCmd` instead to avoid this.

For more background on what causes this hang see:
https://stackoverflow.com/questions/15314189/python-multiprocessing-pool-hangs-at-join
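For anyone following along, here is a minimal sketch of why the catch location can matter. It is not lit's code: `ShellError`, `run_command`, `execute_uncaught`, and `execute_caught` are hypothetical stand-ins, and the mechanism shown (a timeout timer that is never cancelled once the exception skips past the cleanup line) is an assumption about the cause, not something stated in this thread.

```python
import threading


class ShellError(Exception):
    """Hypothetical stand-in for lit's internal shell error."""


def run_command(cmd):
    # Hypothetical stand-in for the inner function that raises
    # when asked to run a program that does not exist.
    if cmd == "does-not-exist":
        raise ShellError(f"{cmd!r}: command not found")
    return 0


def execute_uncaught(cmd, timer):
    # The exception escapes to a caller further up the stack,
    # so timer.cancel() is skipped and the timer thread keeps
    # running until its interval elapses.
    timer.start()
    exit_code = run_command(cmd)
    timer.cancel()
    return exit_code


def execute_caught(cmd, timer):
    # Catching the error in the direct caller lets the cleanup
    # below run on both the success and the failure path.
    timer.start()
    try:
        exit_code = run_command(cmd)
    except ShellError:
        exit_code = 127  # assumed "command not found" exit code
    timer.cancel()
    return exit_code
```

With `execute_caught`, a `threading.Timer` passed in is cancelled even when the command is missing, so no non-daemon thread is left behind to keep a worker alive. With `execute_uncaught`, the timer thread survives the exception and only ends when the timeout elapses, which would match the "hang until the test timeout expires" behaviour described above.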