Labels: high priority, module: rpc, module: tensorpipe, oncall: distributed, triaged
Description
🐛 Bug
test_ddp_under_dist_autograd is failing on the release/1.6 branch with no additional changes.
To Reproduce
Steps to reproduce the behavior:
Run:
python test/distributed/rpc/tensorpipe/test_ddp_under_dist_autograd.py
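To narrow the failure down, the failing case can likely be run on its own by passing the test name through unittest's command-line filtering (the class and method names below are taken from the traceback that follows; whether this script forwards argv to unittest is an assumption):

```
# Run only the failing test case via unittest name filtering
python test/distributed/rpc/tensorpipe/test_ddp_under_dist_autograd.py \
    TestDdpComparisonTensorPipe.test_ddp_dist_autograd_local_vs_remote_gpu
```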
Expected behavior
Passing test.
ERROR: test_ddp_dist_autograd_local_vs_remote_gpu (__main__.TestDdpComparisonTensorPipe)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 204, in wrapper
self._join_processes(fn)
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 306, in _join_processes
self._check_return_codes(elapsed_time)
File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 339, in _check_return_codes
raise RuntimeError(error)
RuntimeError: Processes 4 5 exited with error code 10
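For context on the error message: the common_distributed harness spawns one process per simulated worker, joins them, and raises if any exit with a nonzero code. Here is a minimal sketch of that pattern, not the actual common_distributed implementation; the worker count and which ranks fail are hypothetical, chosen to mirror the "Processes 4 5 exited with error code 10" message:

```python
import multiprocessing

def _worker(rank):
    # Each test worker runs in its own process; an assertion failure or
    # unexpected exception makes the process exit with a nonzero code.
    if rank in (4, 5):  # hypothetical failing ranks, as in the traceback
        raise SystemExit(10)

def join_and_check(world_size=8):
    procs = [multiprocessing.Process(target=_worker, args=(r,))
             for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Collect ranks whose process exited nonzero and report them together,
    # analogous to _check_return_codes in the traceback above.
    failed = [(r, p.exitcode) for r, p in enumerate(procs) if p.exitcode != 0]
    if failed:
        ranks = " ".join(str(r) for r, _ in failed)
        code = failed[0][1]
        raise RuntimeError(f"Processes {ranks} exited with error code {code}")

if __name__ == "__main__":
    join_and_check()
```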
Environment
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
OS: Amazon Linux 2
GCC version: (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6)
CMake version: version 3.13.3
Python version: 3.7
Is CUDA available: N/A
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB
Nvidia driver version: 440.33.01
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
Versions of relevant libraries:
[pip3] numpy==1.18.5
[conda] blas 1.0 mkl
[conda] mkl 2020.0 166
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] numpy 1.18.1 py37h4f9e942_0
[conda] numpy-base 1.18.1 py37hde5b4d6_1
[conda] numpydoc 0.9.2 py_0
[conda] torch 1.6.0a0+cefb9e0 pypi_0 pypi
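The environment report above is the output of PyTorch's collect_env script, which can be re-run with:

```
python -m torch.utils.collect_env
```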
cc @ezyang @gchanan @zou3519 @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar @jiayisuse @agolynski @jjlilley @lw @beauby