nn.Parameter{List,Dict} not copied to gpus in forward pass when nn.DataParallel is used

## 🐛 Bug

When I use nn.DataParallel to wrap an nn.Module X, nn.Parameter in X is not copied to gpus in the forward pass. I think nn.Parameter can be considered as a part of module parameters, so it should be treated like other nn.Module parameters in X as well. Is it an intentional design?

## To Reproduce

test.py:
```python
import sys
import torch
import torch.nn as nn
import torch.nn.functional as F


gpus = list(map(int, sys.argv[1].split(',')))


class Net(nn.Module):
    def __init__(self):
        super().__init__()

        self.alpha = nn.ParameterList()

        for i in range(4):
            self.alpha.append(nn.Parameter(1e-3*torch.randn(i+2, 5)))

        self.cnn = nn.Conv2d(1, 1, 1, 1, 1)


    def forward(self, x):
        print(self.alpha)
        print(self.cnn)
        return x


if __name__ == '__main__':
    net = Net().cuda()
    if len(gpus) > 1:
        net = nn.DataParallel(net, device_ids=gpus)

    net(torch.rand(4, 5))
```
When I run `python3 test.py 0` (which means device_id = [0]), the output is
```
ParameterList(
    (0): Parameter containing: [torch.cuda.FloatTensor of size 2x5 (GPU 0)]
    (1): Parameter containing: [torch.cuda.FloatTensor of size 3x5 (GPU 0)]
    (2): Parameter containing: [torch.cuda.FloatTensor of size 4x5 (GPU 0)]
    (3): Parameter containing: [torch.cuda.FloatTensor of size 5x5 (GPU 0)]
)
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
```
However, when I run `python3 test.py 0,1` (which means device_id = [0, 1]), the output is
```
ParameterList()
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
ParameterList()
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
```
Only nn.Module is copied to gpus in forward pass.
How can I use and train nn.Parameter just like nn.Module with nn.DataParallel?

## Expected behavior

When the nn.Module X is wrapped with nn.DataParallel, both nn.Module and nn.Parameter in X should be copied to gpus.

## Environment

PyTorch version: 1.6.0.dev20200401+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Arch Linux
GCC version: (Arch Linux 9.3.0-1) 9.3.0
CMake version: version 3.17.0

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: 10.2.89
GPU models and configuration: 
GPU 0: GeForce GTX TITAN X
GPU 1: GeForce GTX 1060 6GB

Nvidia driver version: 440.64
cuDNN version: /usr/lib/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip3] numpy==1.18.2
[pip3] torch==1.6.0.dev20200401+cu101
[pip3] torchexp==0.1.0
[pip3] torchvision==0.6.0.dev20200401+cu101
[conda] Could not collect



cc @ezyang @gchanan @zou3519 @albanD @mruberry

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nn.Parameter{List,Dict} not copied to gpus in forward pass when nn.DataParallel is used #36035

🐛 Bug

To Reproduce

Expected behavior

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

nn.Parameter{List,Dict} not copied to gpus in forward pass when nn.DataParallel is used #36035

Description

🐛 Bug

To Reproduce

Expected behavior

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions