🐛 Bug
When I wrap an nn.Module X in nn.DataParallel, the nn.Parameter objects that X holds in an nn.ParameterList are not copied to the GPUs in the forward pass. Since they are module parameters, I would expect them to be replicated like the parameters of X's nn.Module children. Is this behavior intentional?
To Reproduce
test.py:
import sys

import torch
import torch.nn as nn
import torch.nn.functional as F

# Comma-separated GPU ids from the command line, e.g. "0" or "0,1".
gpus = list(map(int, sys.argv[1].split(',')))


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Four parameters of shape (i + 2, 5), held in a ParameterList.
        self.alpha = nn.ParameterList()
        for i in range(4):
            self.alpha.append(nn.Parameter(1e-3 * torch.randn(i + 2, 5)))
        self.cnn = nn.Conv2d(1, 1, 1, 1, 1)

    def forward(self, x):
        # Print what this copy of the module sees.
        print(self.alpha)
        print(self.cnn)
        return x


if __name__ == '__main__':
    net = Net().cuda()
    if len(gpus) > 1:
        net = nn.DataParallel(net, device_ids=gpus)
    net(torch.rand(4, 5))
When I run python3 test.py 0 (that is, gpus = [0], so the model is not wrapped in nn.DataParallel), the output is
ParameterList(
(0): Parameter containing: [torch.cuda.FloatTensor of size 2x5 (GPU 0)]
(1): Parameter containing: [torch.cuda.FloatTensor of size 3x5 (GPU 0)]
(2): Parameter containing: [torch.cuda.FloatTensor of size 4x5 (GPU 0)]
(3): Parameter containing: [torch.cuda.FloatTensor of size 5x5 (GPU 0)]
)
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
However, when I run python3 test.py 0,1 (that is, device_ids = [0, 1]), the output is
ParameterList()
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
ParameterList()
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
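With more than one GPU, only the nn.Module children (here, cnn) are copied to each device in the forward pass; the nn.ParameterList is empty on every replica.
How can I use and train nn.Parameter objects with nn.DataParallel the same way as nn.Module children?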
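A possible workaround, sketched here under assumptions rather than confirmed on this nightly build: register each parameter directly on the module instead of inside an nn.ParameterList, and read it back by attribute name in forward. This assumes nn.DataParallel exposes directly registered parameters as plain attributes on each replica. The alpha_{i} names, the num_alphas argument, and the alphas() helper are hypothetical and are not part of the original test.py; forward is also changed so that the parameters actually take part in the computation.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, num_alphas=4):
        super().__init__()
        self.num_alphas = num_alphas
        # Register every parameter directly on the module (hypothetical
        # names alpha_0 .. alpha_3) instead of using nn.ParameterList.
        for i in range(num_alphas):
            self.register_parameter(
                f"alpha_{i}", nn.Parameter(1e-3 * torch.randn(i + 2, 5))
            )
        self.cnn = nn.Conv2d(1, 1, 1, 1, 1)

    def alphas(self):
        # Look the tensors up by attribute name, not via self.parameters().
        return [getattr(self, f"alpha_{i}") for i in range(self.num_alphas)]

    def forward(self, x):
        # Use the parameters so that gradients flow back to them.
        scale = sum(a.sum() for a in self.alphas())
        return x * (1 + scale)


net = Net()
x = torch.rand(4, 5)
if torch.cuda.is_available():
    net, x = net.cuda(), x.cuda()
    if torch.cuda.device_count() > 1:
        # Wrap only when more than one GPU is visible.
        net = nn.DataParallel(net)
out = net(x)
print(out.shape)
```

If the assumption holds, each replica sees all four alpha tensors on its own device, and gradients accumulate on the original parameters after backward().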
Expected behavior
When the nn.Module X is wrapped with nn.DataParallel, both the nn.Module children and the nn.Parameter objects in X should be copied to each GPU.
Environment
PyTorch version: 1.6.0.dev20200401+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Arch Linux
GCC version: (Arch Linux 9.3.0-1) 9.3.0
CMake version: version 3.17.0
Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: GeForce GTX TITAN X
GPU 1: GeForce GTX 1060 6GB
Nvidia driver version: 440.64
cuDNN version: /usr/lib/libcudnn.so.7.6.5
Versions of relevant libraries:
[pip3] numpy==1.18.2
[pip3] torch==1.6.0.dev20200401+cu101
[pip3] torchexp==0.1.0
[pip3] torchvision==0.6.0.dev20200401+cu101
[conda] Could not collect
cc @ezyang @gchanan @zou3519 @albanD @mruberry