fix custom op fork test #14753
Conversation
Is the reason that the CUDA context is not shared across different processes?
anirudh2290
left a comment
Yes, CUDA doesn't work well with fork. @arcadiaphy this PR looks good to me. Feel free to open another PR to improve the docs on forking. Thank you!
@wkcn Naively sharing a CUDA context across processes by forking will not work, and I'm not sure it's possible at all.
I found an answer about
@wkcn I've read this answer; it's possible to use a separate device in the forked process. But that still doesn't work for a different device in MXNet if the main process has already created the context. Perhaps some cleanup needs to be done in a pthread_atfork handler.
* fix custom op fork test
* trigger CI
Description
The custom op fork test introduced in #14451 causes an error when run on GPU. The common situation is:
When the CUDA context has been created in the main process, the forked process tries to access the same context, which causes an initialization error.
This PR adds a check on the exit code of the forked process and removes the test from the GPU tests.
BTW, right now the correct way to fork MXNet is to do it before the CUDA context is created; otherwise a CUDA error is very likely. Maybe we should add a warning to the docs?
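For illustration, here is a minimal sketch of the exit code check, written with only the Python standard library rather than the actual test code from this PR. The names `run_forked`, `healthy_child` and `failing_child` are hypothetical, and the failing child only simulates an initialization error; no MXNet or CUDA call is made.

```python
import multiprocessing as mp


def run_forked(target, timeout=30):
    """Run `target` in a forked child process and return the child's exit code."""
    # The "fork" start method is POSIX only; it matches the scenario discussed here.
    # Assumption: in a real MXNet program the parent has not touched the GPU
    # before this point, so no CUDA context exists yet when the fork happens.
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=target)
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        # The child hung; kill it so the caller sees a nonzero exit code.
        proc.terminate()
        proc.join()
    return proc.exitcode


def healthy_child():
    # Placeholder for work that succeeds in the child, e.g. a CPU-only custom op.
    pass


def failing_child():
    # Placeholder for a child that fails, e.g. with a CUDA initialization error.
    raise RuntimeError("simulated initialization error")


if __name__ == "__main__":
    assert run_forked(healthy_child) == 0
    assert run_forked(failing_child) != 0
```

Without a check like this, an exception raised in the child only shows up as a traceback on stderr and the parent test would still pass, which is presumably why the missing check hid the GPU failure.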