
ailzhang · 2018-03-09T01:50:05Z

This patch fixes a bug triggered by #5182. It occurs when the model has multiple layers and DDP runs on a single node, with each process using a subset of the GPUs.
For example, as in the test, we run 2 processes on an 8-GPU node, and all GPUs are visible to both processes. We create the DDP model with `nn.parallel.DistributedDataParallel(model_DDP, device_ids=gpu_subset)`, where `gpu_subset` is 0,1,2,3 for process 1 and 4,5,6,7 for process 2.
`utils::flatten_dense_tensors(chunk.tensors)` creates a new tensor that is a flattened version of the layer weights. Without this patch, that tensor is allocated on the default device, GPU 0, even though all of process 2's layer weights are on GPU 4. The subsequent broadcast then errors out, because for process 2 it requires the tensor to be on GPU 4.
The GPU guard inside the for loop is unrelated to this bug; I added it as an extra safety guard.
@apaszke

add gpu guard for broadcast_coalesce

35c23e1

onnxbot-worker-3 mentioned this pull request Mar 9, 2018

[auto] pytorch-pr-5655 onnxbot/onnx-fb-universe#1036

Closed

soumith merged commit a3f4635 into pytorch:master Mar 9, 2018

apaszke mentioned this pull request Mar 10, 2018

Minor improvement in AutoGPU usage in CUDA bindings #5689

Merged

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

add gpu guard for broadcast_coalesce (pytorch#5655)

75ff672


Add gpu guard for broadcast_coalesce #5655
soumith merged 1 commit into pytorch:master from ailzhang:fix_broadcast_gpu_context

ailzhang commented Mar 9, 2018