
Commit 6b1d310

address comments
1 parent 7c0b10e commit 6b1d310

1 file changed

Lines changed: 16 additions & 8 deletions

File tree

intermediate_source/rpc_tutorial.rst

@@ -21,8 +21,9 @@ paradigms:
    data between observers and the trainer
 2) Your model might be too large to fit in GPUs on a single machine, and hence
    would need a library to help split a model onto multiple machines. Or you
-   might be implementing a parameter server training framework, where model
-   parameters and trainers live on different machines.
+   might be implementing a `parameter server <https://www.cs.cmu.edu/~muli/file/parameter_server_osdi14.pdf>`__
+   training framework, where model parameters and trainers live on different
+   machines.


 The `torch.distributed.rpc <https://pytorch.org/docs/master/rpc.html>`__ package
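
For context on the parameter-server scenario this hunk now links to, here is a minimal sketch (not part of the commit) of how a trainer can reference parameters hosted on another worker via ``torch.distributed.rpc``; the worker names ``"ps"`` and ``"trainer"`` and the two-process setup are illustrative assumptions, not code from the tutorial.

.. code:: python

    import torch
    import torch.distributed.rpc as rpc

    def run_ps():
        # The parameter server only hosts tensors and serves RPC requests;
        # rpc.shutdown() blocks until all RPC workers have finished.
        rpc.init_rpc("ps", rank=0, world_size=2)
        rpc.shutdown()

    def run_trainer():
        rpc.init_rpc("trainer", rank=1, world_size=2)
        # Create a tensor that lives on "ps" and hold an RRef to it locally.
        param_rref = rpc.remote("ps", torch.zeros, args=(10,))
        # Fetch a copy to the trainer when a local value is needed.
        local_copy = param_rref.to_here()
        rpc.shutdown()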
@@ -360,21 +361,27 @@ borrowed from the word language model in PyTorch
 `example <https://github.com/pytorch/examples/tree/master/word_language_model>`__
 repository, which contains three main components, an embedding table, an
 ``LSTM`` layer, and a decoder. The code below wraps the embedding table and the
-decode into sub-modules, so that their constructors can be passed to the RPC
-API.
+decoder into sub-modules, so that their constructors can be passed to the RPC
+API. In the `EmbeddingTable` sub-module, we intentionally put the `Embedding`
+layer on GPU to demonstrate the use case. In v1.4, RPC always creates CPU tensor
+arguments or return values on the destination server. If the function takes a
+GPU tensor, you need to move it to the proper device explicitly.


 .. code:: python

     class EmbeddingTable(nn.Module):
+        r"""
+        Encoding layers of the RNNModel
+        """
         def __init__(self, ntoken, ninp, dropout):
             super(EmbeddingTable, self).__init__()
             self.drop = nn.Dropout(dropout)
-            self.encoder = nn.Embedding(ntoken, ninp)
+            self.encoder = nn.Embedding(ntoken, ninp).cuda()
             self.encoder.weight.data.uniform_(-0.1, 0.1)

         def forward(self, input):
-            return self.drop(self.encoder(input))
+            return self.drop(self.encoder(input.cuda()).cpu())


     class Decoder(nn.Module):
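
As a side note on the v1.4 behavior described in this hunk (RPC delivering CPU tensors on the destination), the same explicit device moves apply to a plain function invoked over RPC. The sketch below is not part of the commit; the function name ``remote_matmul`` and the worker name ``"server"`` are illustrative, and it assumes a CUDA device is available on the callee.

.. code:: python

    import torch
    import torch.distributed.rpc as rpc

    def remote_matmul(x, w):
        # In v1.4, x and w arrive on the destination as CPU tensors; move
        # them onto the GPU explicitly before computing there.
        y = x.cuda() @ w.cuda()
        # Move the result back to CPU so RPC can ship it to the caller.
        return y.cpu()

    # On the caller, after rpc.init_rpc(...):
    # out = rpc.rpc_sync("server", remote_matmul,
    #                    args=(torch.randn(4, 8), torch.randn(8, 2)))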
@@ -470,8 +477,9 @@ Then, as the ``RNNModel`` contains three sub-modules, we need to call
 Now, we are ready to implement the training loop. After initializing the model
 arguments, we create the ``RNNModel`` and the ``DistributedOptimizer``. The
 distributed optimizer will take a list of parameter ``RRefs``, find all distinct
-owner workers, and create the given local optimizer (i.e., ``SGD`` in this case)
-on each of the owner worker using the given arguments (i.e., ``lr=0.05``).
+owner workers, and create the given local optimizer (i.e., ``SGD`` in this case;
+you can use other local optimizers as well) on each of the owner workers using
+the given arguments (i.e., ``lr=0.05``).

 In the training loop, it first creates a distributed autograd context, which
 will help the distributed autograd engine to find gradients and involved RPC
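
For reference, the construction described in this hunk looks roughly like the sketch below (not part of the commit); it assumes ``model`` is the tutorial's ``RNNModel`` and that ``model.parameter_rrefs()`` collects the parameter ``RRefs`` from its sub-modules.

.. code:: python

    import torch.optim as optim
    from torch.distributed.optim import DistributedOptimizer

    # One local SGD instance is created on every worker that owns some of the
    # parameters referenced by the RRefs, each constructed with lr=0.05.
    opt = DistributedOptimizer(
        optim.SGD,                  # local optimizer class
        model.parameter_rrefs(),    # RRefs to parameters, possibly remote
        lr=0.05,                    # forwarded to each local SGD constructor
    )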
