[device_mesh] improve device selection logic #150897
base: gh/wanchaol/370/base
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150897
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 1 Unrelated Failure
As of commit 9d5b0ca with merge base 6f6fac6:
NEW FAILURES - The following jobs have failed:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Sorry, I don't have enough context on DeviceMesh, so I'm asking some questions before I can review. Meanwhile, @fegin may be able to unblock this.
As titled, this PR improves the device selection logic for the case where the user did not set the device before calling the DeviceMesh constructor. As a device manager, DeviceMesh should try to set the device for users in a sensible way.

The behavior of set_device before:
- If the user calls init_process_group to init a world process group, we assume the user has already called set_device, and we don't set the device for the user.
- If the user does not init a world process group themselves, we init a world process group for the user and follow a heuristic to set the device.

This is OK, but sometimes the set_device heuristic doesn't work well (e.g. if the user uses CUDA_VISIBLE_DEVICES).

So this PR improves the device selection logic to:
- If the default CUDA context is already initialized by the time we init DeviceMesh, we assume the user must have called some CUDA operation before and therefore must have selected the device themselves.
- Otherwise, we check whether the environment variables "LOCAL_RANK" and "WORLD_SIZE" are set by the launcher (e.g. torchrun). If so, we use "LOCAL_RANK" to set the device for the current process, which is standard practice. (This solves the CUDA_VISIBLE_DEVICES issue.)
- Otherwise, we fall back to the old heuristic.

ghstack-source-id: a96dc0b
Pull Request resolved: #150897
ghstack-source-id: 2baca3c
Pull Request resolved: #150897
ghstack-source-id: 3f555ea
Pull Request resolved: #150897
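The selection order described in the commit message can be sketched roughly as follows. This is only an illustration, not the code from this PR: `pick_device_index` and its parameters are hypothetical names, the CUDA-context check is passed in as a plain boolean (a caller might obtain it from `torch.cuda.is_initialized()`), and the old heuristic is left as a caller-supplied function because the description does not spell it out.

```python
import os
from typing import Callable, Mapping, Optional


def pick_device_index(
    context_initialized: bool,
    fallback: Callable[[], int],
    env: Mapping[str, str] = os.environ,
) -> Optional[int]:
    """Hypothetical helper mirroring the three-step order in the PR text.

    Returns None when the device should be left untouched, otherwise the
    device index the caller could pass to something like set_device.
    """
    # Step 1: an initialized CUDA context suggests the user already ran a
    # CUDA operation and picked a device, so do not override it.
    if context_initialized:
        return None

    # Step 2: launchers such as torchrun export LOCAL_RANK and WORLD_SIZE;
    # when both are present, use LOCAL_RANK as the device index.
    if "LOCAL_RANK" in env and "WORLD_SIZE" in env:
        return int(env["LOCAL_RANK"])

    # Step 3: otherwise defer to the old heuristic (assumed to be supplied
    # by the caller; its exact form is not given in the description).
    return fallback()


if __name__ == "__main__":
    # Stand-in for the old heuristic, chosen only for this demo.
    demo_fallback = lambda: 0
    launcher_env = {"LOCAL_RANK": "3", "WORLD_SIZE": "8"}
    print(pick_device_index(True, demo_fallback, launcher_env))
    print(pick_device_index(False, demo_fallback, launcher_env))
    print(pick_device_index(False, demo_fallback, {}))
```

Keeping the decision in a pure function like this makes the precedence easy to test without a GPU; the actual call that sets the device would happen in the caller only when the result is not None.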
Stack from ghstack (oldest at bottom):
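As titled, this PR improves the device selection logic for the case where the user did not
set the device before calling the DeviceMesh constructor. As a device
manager, DeviceMesh should try to set the device for users in a sensible
way.
The behavior of set_device before:
- If the user calls init_process_group to init a world process group, we assume the user has already called set_device, and we don't set the device for the user.
- If the user does not init a world process group themselves, we init a world process group for the user and follow a heuristic to set the device.
This is OK, but sometimes the set_device heuristic doesn't work well (e.g. if the user uses CUDA_VISIBLE_DEVICES).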
So this PR improves the device selection logic to: