nn.Parameter{List,Dict} not copied to gpus in forward pass when nn.DataParallel is used #36035

@grtzsohalf

🐛 Bug

When I wrap an nn.Module X in nn.DataParallel, the nn.Parameter objects that X holds in an nn.ParameterList are not copied to the GPUs in the forward pass. I think these are part of the module's parameters, so they should be treated like the parameters of any other nn.Module in X. Is this behavior intentional?

To Reproduce

test.py:

import sys
import torch
import torch.nn as nn
import torch.nn.functional as F


gpus = list(map(int, sys.argv[1].split(',')))


class Net(nn.Module):
    def __init__(self):
        super().__init__()

        self.alpha = nn.ParameterList()

        for i in range(4):
            self.alpha.append(nn.Parameter(1e-3*torch.randn(i+2, 5)))

        self.cnn = nn.Conv2d(1, 1, 1, 1, 1)


    def forward(self, x):
        print(self.alpha)
        print(self.cnn)
        return x


if __name__ == '__main__':
    net = Net().cuda()
    if len(gpus) > 1:
        net = nn.DataParallel(net, device_ids=gpus)

    net(torch.rand(4, 5))

When I run python3 test.py 0 (so gpus = [0] and nn.DataParallel is not used), the output is

ParameterList(
    (0): Parameter containing: [torch.cuda.FloatTensor of size 2x5 (GPU 0)]
    (1): Parameter containing: [torch.cuda.FloatTensor of size 3x5 (GPU 0)]
    (2): Parameter containing: [torch.cuda.FloatTensor of size 4x5 (GPU 0)]
    (3): Parameter containing: [torch.cuda.FloatTensor of size 5x5 (GPU 0)]
)
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))

However, when I run python3 test.py 0,1 (so device_ids = [0, 1]), the output is

ParameterList()
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
ParameterList()
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))

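Only the nn.Module children (here, the Conv2d) are copied to the GPUs in the forward pass; the nn.ParameterList prints as empty on every replica.
How can I use and train nn.Parameter objects with nn.DataParallel just as I would nn.Module children?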
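
One possible workaround, not confirmed anywhere in this report, is to register each tensor directly on the module and look it up by name inside forward, instead of storing it in an nn.ParameterList. The sketch below assumes that nn.DataParallel makes directly registered parameters available as plain attributes on each replica; the alpha_{i} naming scheme and the alphas() helper are hypothetical. Because the tensors are still registered parameters of the original module, net.parameters() should continue to hand them to an optimizer.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, num_alphas=4):
        super().__init__()
        self.num_alphas = num_alphas
        # Register each tensor directly on the module instead of appending
        # it to an nn.ParameterList. The alpha_{i} names are hypothetical.
        for i in range(num_alphas):
            self.register_parameter(
                f"alpha_{i}", nn.Parameter(1e-3 * torch.randn(i + 2, 5))
            )
        self.cnn = nn.Conv2d(1, 1, 1, 1, 1)

    def alphas(self):
        # Resolve the tensors by name at call time, so that a replica
        # returns whatever copies were attached to it.
        return [getattr(self, f"alpha_{i}") for i in range(self.num_alphas)]

    def forward(self, x):
        # Print the shape and device of every alpha visible in forward.
        for i, a in enumerate(self.alphas()):
            print(i, tuple(a.shape), a.device)
        return x


if __name__ == "__main__":
    net = Net()
    # Only wrap in DataParallel when more than one GPU is present.
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net.cuda())
    net(torch.rand(4, 5))
```

On a machine with two or more GPUs, each replica should print four alphas on its own device if the assumption above holds; on a machine without multiple GPUs the script only exercises the unwrapped module.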

Expected behavior

When the nn.Module X is wrapped with nn.DataParallel, both the nn.Module children and the nn.Parameter objects in X should be copied to the GPUs.

Environment

PyTorch version: 1.6.0.dev20200401+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Arch Linux
GCC version: (Arch Linux 9.3.0-1) 9.3.0
CMake version: version 3.17.0

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: GeForce GTX TITAN X
GPU 1: GeForce GTX 1060 6GB

Nvidia driver version: 440.64
cuDNN version: /usr/lib/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip3] numpy==1.18.2
[pip3] torch==1.6.0.dev20200401+cu101
[pip3] torchexp==0.1.0
[pip3] torchvision==0.6.0.dev20200401+cu101
[conda] Could not collect

cc @ezyang @gchanan @zou3519 @albanD @mruberry

Labels

high priority, module: data parallel, module: nn (Related to torch.nn), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)