beginner_source/dist_overview.rst (+7 -3)
@@ -113,7 +113,7 @@ model replicas. Moreover, the model is broadcast at DDP construction time instead
 of in every forward pass, which also helps to speed up training. DDP is shipped
 with several performance optimization technologies. For a more in-depth
 explanation, please refer to this
-`DDP paper <https://arxiv.org/abs/2006.15704>`__ (VLDB'20).
+`DDP paper <http://www.vldb.org/pvldb/vol13/p3005-li.pdf>`__ (VLDB'20).


 DDP materials are listed below:
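The context lines above note that DDP broadcasts the model once at construction time rather than in every forward pass. Before the list that the next hunk extends, here is a minimal sketch of that pattern; the ``gloo`` backend, the rendezvous environment variables, and the toy model are illustrative assumptions, not part of this diff:

.. code:: python

    import os
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def run(rank, world_size):
        # Placeholder rendezvous settings; real deployments set these externally.
        os.environ.setdefault("MASTER_ADDR", "localhost")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group("gloo", rank=rank, world_size=world_size)

        # Parameters and buffers are broadcast from rank 0 here, once at
        # construction time, not in every forward pass.
        ddp_model = DDP(nn.Linear(10, 10))

        dist.destroy_process_group()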
@@ -131,6 +131,10 @@ DDP materials are listed below:
    tutorial.
 3. The `Launching and configuring distributed data parallel applications <https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md>`__
    document shows how to use the DDP launching script.
+4. The `Shard Optimizer States With ZeroRedundancyOptimizer <https://pytorch.org/tutorials/recipes/zero_redundancy_optimizer.html>`__
+   recipe demonstrates how `ZeroRedundancyOptimizer <https://pytorch.org/docs/master/distributed.optim.html>`__
+   helps to reduce the optimizer memory footprint for distributed data-parallel
+   training.

 TorchElastic
 ~~~~~~~~~~~~
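The new list item in the hunk above points to the ZeroRedundancyOptimizer recipe. A minimal sketch of the technique it describes, assuming a process group is already initialized as in the previous snippet; the model size and learning rate are illustrative:

.. code:: python

    import torch
    from torch.distributed.optim import ZeroRedundancyOptimizer
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Assumes dist.init_process_group(...) has already run on every rank.
    model = DDP(torch.nn.Linear(2000, 2000))

    # Each rank stores only its shard of the optimizer state instead of a
    # full replica, shrinking the per-rank optimizer memory footprint.
    optimizer = ZeroRedundancyOptimizer(
        model.parameters(),
        optimizer_class=torch.optim.Adam,
        lr=1e-3,
    )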
@@ -194,12 +198,12 @@ RPC Tutorials are listed below:
    decorator, which can help speed up inference and training. It uses similar
    RL and PS examples employed in the above tutorials 1 and 2.
 5. The `Combining Distributed DataParallel with Distributed RPC Framework <../advanced/rpc_ddp_tutorial.html>`__
-   tutorial demonstrates how to combine DDP with RPC to train a model using
+   tutorial demonstrates how to combine DDP with RPC to train a model using
    distributed data parallelism combined with distributed model parallelism.


 PyTorch Distributed Developers
 ------------------------------

-If you'd like to contribute to PyTorch Distributed, please refer to our
+If you'd like to contribute to PyTorch Distributed, please refer to our
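The context lines in the hunk above mention the ``@rpc.functions.async_execution`` decorator. A minimal sketch of its usage, following the pattern in the PyTorch RPC documentation; the ``torch.add`` payload and worker names are illustrative assumptions:

.. code:: python

    import torch
    import torch.distributed.rpc as rpc

    @rpc.functions.async_execution
    def async_add(to, x, y):
        # Returning a Future lets the callee's RPC thread move on to other
        # work; the response is sent automatically when the Future completes.
        return rpc.rpc_async(to, torch.add, args=(x, y))

    # Hypothetical call site, after rpc.init_rpc(...) on every worker:
    #   result = rpc.rpc_sync("worker0", async_add, args=("worker1", x, y))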