Add gpu guard for broadcast_coalesce #5655

Merged
soumith merged 1 commit into pytorch:master from ailzhang:fix_broadcast_gpu_context
Mar 9, 2018

ailzhang (Contributor) commented Mar 9, 2018

This patch fixes a bug triggered by #5182 that occurs when the model has multiple layers and DDP runs on a single node with each process using a subset of the GPUs.
For example, the test runs 2 processes on an 8-GPU node, and all GPUs are visible to both processes. We create the DDP model with nn.parallel.DistributedDataParallel(model_DDP, device_ids=gpu_subset), where gpu_subset is 0,1,2,3 for process 1 and 4,5,6,7 for process 2.
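The GPU split in this example can be sketched as a small helper. This is only an illustration: `gpu_subset_for` is a hypothetical name, not part of PyTorch, and it assumes the GPUs divide evenly among the processes and that ranks are counted from 0 (the description above counts processes from 1).

```python
def gpu_subset_for(rank, num_procs, num_gpus):
    """Return the contiguous block of GPU indices owned by process `rank` (0-based)."""
    if num_gpus % num_procs != 0:
        raise ValueError("GPUs must divide evenly among processes")
    per_proc = num_gpus // num_procs
    start = rank * per_proc
    return list(range(start, start + per_proc))


# Two processes on an 8-GPU node, as in the test.
print(gpu_subset_for(0, 2, 8))  # [0, 1, 2, 3]
print(gpu_subset_for(1, 2, 8))  # [4, 5, 6, 7]
```

Each returned list is what would be passed as `device_ids` for that process.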
utils::flatten_dense_tensors(chunk.tensors) creates a new Tensor that is a flattened version of the layer weights. Without this patch, this tensor goes to the default GPU 0 even though all layer weights for process 2 are on GPU 4. The broadcast then errors out, because it requires the tensor to be on GPU 4 for process 2.
The gpu guard inside the for loop has nothing to do with the current bug; I thought it would be good to add it as a safety guard.
@apaszke
