🚀 The feature, motivation and pitch
`torchrun` has an `--nproc-per-node` option to specify how many processes/GPUs to use, but it has no option for specifying which GPUs to use. So if you run `torchrun` multiple times, the same GPUs will be used. You can get around that by passing the device selection through an environment variable such as `CUDA_VISIBLE_DEVICES` alongside `--nproc-per-node`.

This works if you have a single-node setup (perhaps not if you have multiple nodes?), but it is not intuitive and is error-prone because you are passing some configuration in an environment variable and some in options. I think it would be better if `torchrun` had an option such as `--bind-devices=2,4,7` for it, supplanting/replacing `--nproc-per-node`.
Alternatives
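Until something like this exists, a thin wrapper can derive both settings from a single device list, which is roughly what the proposed option would do. The following is only a minimal sketch under assumptions: `build_torchrun_launch` is a hypothetical helper (not part of PyTorch), it assumes a single node with CUDA devices, and it relies on `CUDA_VISIBLE_DEVICES` to restrict which GPUs the workers can see.

```python
import os


def build_torchrun_launch(devices, script_args, base_env=None):
    """Hypothetical helper: return (argv, env) for a torchrun launch
    pinned to the given GPU ids on a single node."""
    if not devices:
        raise ValueError("at least one device id is required")
    if len(set(devices)) != len(devices):
        raise ValueError("device ids must be unique")

    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    # Restrict the GPUs visible to every worker process.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in devices)

    # One worker per selected GPU, so the count is derived, not repeated.
    argv = ["torchrun", f"--nproc-per-node={len(devices)}", *script_args]
    return argv, env


# Possible usage (assumes torchrun is on PATH and train.py exists):
#   import subprocess
#   argv, env = build_torchrun_launch([2, 4, 7], ["train.py"])
#   subprocess.run(argv, env=env, check=True)
```

Note that inside the workers the selected GPUs are renumbered from 0, so physical devices 2, 4 and 7 would appear as `cuda:0`, `cuda:1` and `cuda:2`.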
No response
Additional context
No response
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k