Commit 78cf46b

mrshenli and brianjo authored
Add ZeroRedundancyOptimizer recipe to distributed overview page (#1393)
Co-authored-by: Brian Johnson <[email protected]>
1 parent db6d74d commit 78cf46b

1 file changed

Lines changed: 7 additions & 3 deletions

File tree

beginner_source/dist_overview.rst

@@ -113,7 +113,7 @@ model replicas. Moreover, the model is broadcast at DDP construction time instea
 of in every forward pass, which also helps to speed up training. DDP is shipped
 with several performance optimization technologies. For a more in-depth
 explanation, please refer to this
-`DDP paper <https://arxiv.org/abs/2006.15704>`__ (VLDB'20).
+`DDP paper <http://www.vldb.org/pvldb/vol13/p3005-li.pdf>`__ (VLDB'20).


 DDP materials are listed below:
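As an illustrative aside (not part of the diff above): the broadcast-at-construction behavior described in this hunk can be seen in a minimal DDP sketch. The gloo backend, the toy model, and launching via torchrun are assumptions made only for this example.

.. code:: python

    # Minimal DDP sketch: parameters are broadcast from rank 0 once, at DDP
    # construction time; gradients are then all-reduced during each backward().
    # Assumes the script is launched with torchrun so env:// rendezvous works.
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        dist.init_process_group("gloo")          # "nccl" on GPU machines
        ddp_model = DDP(nn.Linear(10, 10))       # one-time parameter broadcast
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

        loss = ddp_model(torch.randn(20, 10)).sum()
        loss.backward()                          # gradients synchronized here
        optimizer.step()
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()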
@@ -131,6 +131,10 @@ DDP materials are listed below:
    tutorial.
 3. The `Launching and configuring distributed data parallel applications <https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md>`__
    document shows how to use the DDP launching script.
+4. The `Shard Optimizer States With ZeroRedundancyOptimizer <https://pytorch.org/tutorials/recipes/zero_redundancy_optimizer.html>`__
+   recipe demonstrates how `ZeroRedundancyOptimizer <https://pytorch.org/docs/master/distributed.optim.html>`__
+   helps to reduce optimizer memory footprint for distributed data-parallel
+   training.

 TorchElastic
 ~~~~~~~~~~~~
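As an illustrative aside on what the newly linked recipe covers (this sketch is not taken from the recipe): ZeroRedundancyOptimizer wraps a regular optimizer and shards its per-parameter states across the ranks in the process group, so each rank holds roughly 1/world_size of the optimizer memory. The layer size, backend, and hyperparameters below are arbitrary assumptions.

.. code:: python

    # ZeroRedundancyOptimizer sketch: Adam states are sharded across ranks
    # instead of being replicated on every rank. Assumes launch via torchrun.
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.optim import ZeroRedundancyOptimizer
    from torch.nn.parallel import DistributedDataParallel as DDP

    def train():
        dist.init_process_group("gloo")
        ddp_model = DDP(nn.Linear(2000, 2000))
        optimizer = ZeroRedundancyOptimizer(
            ddp_model.parameters(),
            optimizer_class=torch.optim.Adam,    # the wrapped, sharded optimizer
            lr=0.01,
        )
        loss = ddp_model(torch.randn(64, 2000)).sum()
        loss.backward()
        optimizer.step()    # each rank updates its own shard, then parameters sync
        dist.destroy_process_group()

    if __name__ == "__main__":
        train()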
@@ -194,12 +198,12 @@ RPC Tutorials are listed below:
    decorator, which can help speed up inference and training. It uses similar
    RL and PS examples employed in the above tutorials 1 and 2.
 5. The `Combining Distributed DataParallel with Distributed RPC Framework <../advanced/rpc_ddp_tutorial.html>`__
-   tutorial demonstrates how to combine DDP with RPC to train a model using
+   tutorial demonstrates how to combine DDP with RPC to train a model using
    distributed data parallelism combined with distributed model parallelism.


 PyTorch Distributed Developers
 ------------------------------

-If you'd like to contribute to PyTorch Distributed, please refer to our
+If you'd like to contribute to PyTorch Distributed, please refer to our
 `Developer Guide <https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md>`_.
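As an illustrative aside (not part of the diff above): the RPC tutorials referenced in this hunk build on torch.distributed.rpc primitives such as rpc_sync. Below is a minimal two-worker round trip, far simpler than the DDP-plus-RPC recipe itself; the worker names and the environment-variable rendezvous are assumptions for this sketch.

.. code:: python

    # Minimal RPC sketch: rank 0 runs square() on worker1 and waits for the
    # result. Assumes RANK, MASTER_ADDR, and MASTER_PORT are set (e.g. by torchrun).
    import os
    import torch
    import torch.distributed.rpc as rpc

    def square(x):
        return x * x

    def run(rank, world_size=2):
        rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
        if rank == 0:
            result = rpc.rpc_sync("worker1", square, args=(torch.tensor(3.0),))
            print(result)        # tensor(9.)
        rpc.shutdown()           # waits for outstanding RPC work on all ranks

    if __name__ == "__main__":
        run(int(os.environ["RANK"]))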
