[lit] Fix lit hang on pool join when exception is thrown #131881

Merged
merged 9 commits into from
May 6, 2025

Conversation

ayylol
Contributor

@ayylol ayylol commented Mar 18, 2025

Fixes #133914

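When using the internal shell with a timeout set, lit will hang on the following call if an exception is thrown and not immediately caught:
https://github.com/llvm/llvm-project/blob/19970535f92c0f2dcda01b7fc60f95945166e424/llvm/utils/lit/lit/run.py#L93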

This can occur when using the internal lit shell to run a program that does not exist. In this case `_executeShCmd` throws an internal shell error. The function that calls it directly, `executeShCmd`, does not catch the error; it is caught one function higher in the call stack, in `executeScriptInternal`. Because the exception propagates up the call stack instead of being caught immediately, lit hangs until the test timeout expires. This patch catches the exception in `executeShCmd` instead, which avoids the hang.
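To illustrate why the location of the `except` clause can matter, here is a minimal sketch. It is not lit's code: it assumes, purely for illustration, that a timer is started before the failing call and cancelled after it. `run_command`, `execute_leaky`, `execute_fixed`, the `timers` list, the exit code 127, and the local `InternalShellError` class are all hypothetical stand-ins.

```python
import threading


class InternalShellError(Exception):
    """Hypothetical stand-in for the internal shell error."""


def run_command(name):
    # Hypothetical inner helper: fails the way a missing program would.
    raise InternalShellError(f"{name!r}: command not found")


def execute_leaky(name, timeout, timers):
    # The exception escapes this frame, so timer.cancel() is never reached
    # and the timer thread stays alive until `timeout` seconds have passed.
    timer = threading.Timer(timeout, lambda: None)
    timers.append(timer)
    timer.start()
    exit_code = run_command(name)
    timer.cancel()
    return exit_code


def execute_fixed(name, timeout, timers):
    # The exception is caught in the same frame that owns the timer,
    # so the cancel() call below always runs.
    timer = threading.Timer(timeout, lambda: None)
    timers.append(timer)
    timer.start()
    try:
        exit_code = run_command(name)
    except InternalShellError:
        exit_code = 127  # assumed "command not found" style exit code
    timer.cancel()
    return exit_code
```

In the leaky variant a caller further up the stack can still catch the error, but by then the cleanup step has already been skipped. Whether this is exactly what happens inside lit is an assumption; the sketch only shows the general pattern.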

For more background on what causes this hang see:
https://stackoverflow.com/questions/15314189/python-multiprocessing-pool-hangs-at-join

@ayylol
Contributor Author

ayylol commented Apr 2, 2025

Hello @jh7370, wanted to get the ball rolling on the review for this fix. Figured you might be the right person to bug since I saw you review a couple of other lit related PRs. Thanks :)

@jh7370
Collaborator

jh7370 commented Apr 2, 2025

> Hello @jh7370, wanted to get the ball rolling on the review for this fix. Figured you might be the right person to bug since I saw you review a couple of other lit related PRs. Thanks :)

Sorry, I'm a bit snowed under with reviews and my regular non-LLVM work at the moment, having just come back from a bit of time off. I'll add this to my queue, but it may be a while before I have a chance to look at this.

@ayylol
Contributor Author

ayylol commented Apr 28, 2025

Hey @jh7370, was wondering if you would have free cycles to take a look at this soon. If not do you know of anyone else that may be appropriate to review this PR? Thanks.

@jh7370
Collaborator

jh7370 commented Apr 29, 2025

> Hey @jh7370, was wondering if you would have free cycles to take a look at this soon. If not do you know of anyone else that may be appropriate to review this PR? Thanks.

Sorry, my workload is unlikely to lessen in the next few weeks. I've added a few possible reviewers (@DavidSpickett, @jroelofs, @MaskRay, @pcc) in the hope that someone else can take a look.

@DavidSpickett
Collaborator

I understand the problem as it existed previously, but I am confused about which versions this applies to. The linked CPython issue has a comment:

> Also confirmed that this can not be reproduced on the 3.5 or 3.6 branches.
And the bug is marked as won't fix for 2.7.

LLVM requires Python 3.8 (https://llvm.org/docs/GettingStarted.html#software).

So is this in fact still an issue on 3.8+ and was not fixed there?

Or are you referring to those issues to explain something else about how join() should be used?

@ayylol
Contributor Author

ayylol commented May 1, 2025

> So is this in fact still an issue on 3.8+ and was not fixed there?

Yep, I was able to reproduce the hang on Python 3.12.3

@ayylol
Contributor Author

ayylol commented May 2, 2025

Hey @DavidSpickett, I should clarify. I was able to reproduce the lit hang that I reported in #133914 (which this PR fixes) with Python 3.12.3. As for the CPython issue, I added it to the description because it was cited as the likely root cause in the Stack Overflow thread I've linked. However, looking at it, I think you are right that the issue isn't reproducible in the Python versions used for LLVM, so I think it's unlikely to be the root cause in this case.

Going to remove it from the description.

@DavidSpickett
Collaborator

Cool, I will try to understand it as just how Python works then, rather than a workaround for a bug.
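One piece of ordinary Python behavior that may be relevant here (an assumption on my part; the thread does not name the exact mechanism) is that an interpreter does not exit while a non-daemon thread is still running, and a pending `threading.Timer` is such a thread by default. The sketch below measures this in a child interpreter; `CHILD_SOURCE` and `time_child_exit` are hypothetical names used only for this illustration.

```python
import subprocess
import sys
import time

# Child script: starts a non-daemon timer, then falls off the end of main.
CHILD_SOURCE = """
import threading
threading.Timer({delay}, lambda: None).start()
"""


def time_child_exit(delay):
    """Return how many seconds a child interpreter takes to exit."""
    start = time.monotonic()
    subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE.format(delay=delay)],
        check=True,
    )
    return time.monotonic() - start
```

Calling `time_child_exit(2.0)` should take at least about two seconds even though the child's main thread finishes immediately, because interpreter shutdown waits for the timer thread.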

Collaborator

@DavidSpickett DavidSpickett left a comment

LGTM

@sarnex sarnex merged commit 009b9f4 into llvm:main May 6, 2025
11 checks passed
@ayylol ayylol deleted the llvmlit-hang branch May 6, 2025 17:05
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
Fixes llvm#133914

When using the internal shell with a timeout set lit will hang on the
following call if an exception is thrown and not immediately caught
https://github.com/llvm/llvm-project/blob/19970535f92c0f2dcda01b7fc60f95945166e424/llvm/utils/lit/lit/run.py#L93

This can occur when using the internal lit shell and trying to run a
program that does not exist. In this case `_executeShCmd` will throw an
internal shell error, which will not be caught by the function directly
calling it, `executeShCmd`, rather it is caught one function higher in
the call stack in `executeScriptInternal`. Because that exception is
percolated up the call stack instead of being immediately caught lit
will hang until the test timeout expires. This patch changes the
location where we catch this exception to `executeShCmd` instead to
avoid this.

For more background on what causes this hang see:

https://stackoverflow.com/questions/15314189/python-multiprocessing-pool-hangs-at-join
Successfully merging this pull request may close these issues.

[LIT] Tests take entire timeout duration after failing when exception is thrown in internal shell