Commit db6d74d

Update ZeRO recipe to match argument name change (#1392)
* Update ZeRO recipe to match argument name change
* Add links to point to ZeroRedundancyOptimizer doc page
1 parent 4d8e788 commit db6d74d

1 file changed

Lines changed: 12 additions & 9 deletions

File tree

recipes_source/zero_redundancy_optimizer.rst

@@ -7,8 +7,9 @@ Shard Optimizer States with ZeroRedundancyOptimizer
 
 In this recipe, you will learn:
 
-- The high-level idea of ``ZeroRedundancyOptimizer``.
-- How to use ``ZeroRedundancyOptimizer`` in distributed training and its impact.
+- The high-level idea of `ZeroRedundancyOptimizer <https://pytorch.org/docs/master/distributed.optim.html>`__.
+- How to use `ZeroRedundancyOptimizer <https://pytorch.org/docs/master/distributed.optim.html>`__
+  in distributed training and its impact.
 
 
 Requirements
@@ -21,8 +22,8 @@ Requirements
 What is ``ZeroRedundancyOptimizer``?
 ------------------------------------
 
-The idea of ``ZeroRedundancyOptimizer`` comes from
-`DeepSpeed/ZeRO project <https://github.com/microsoft/DeepSpeed>`_ and
+The idea of `ZeroRedundancyOptimizer <https://pytorch.org/docs/master/distributed.optim.html>`__
+comes from `DeepSpeed/ZeRO project <https://github.com/microsoft/DeepSpeed>`_ and
 `Marian <https://github.com/marian-nmt/marian-dev>`_ that shard
 optimizer states across distributed data-parallel processes to
 reduce per-process memory footprint. In the
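
The hunk above is where the recipe states the core idea: each data-parallel rank keeps only a shard of the optimizer states instead of a full replica. As a back-of-the-envelope sketch of why this matters (not part of the commit; the model size and world size below are made-up assumptions):

::

    # Rough per-rank memory for fp32 Adam states (exp_avg and exp_avg_sq),
    # with and without sharding them across data-parallel ranks.
    num_params = 1_000_000_000       # hypothetical 1B-parameter model
    bytes_per_param_state = 2 * 4    # two fp32 state tensors per parameter
    world_size = 8                   # hypothetical number of data-parallel ranks

    full_copy = num_params * bytes_per_param_state  # every rank holds it all
    sharded = full_copy / world_size                # each rank holds ~1/world_size

    print(f"per-rank Adam state, replicated: {full_copy / 2**30:.1f} GiB")
    print(f"per-rank Adam state, sharded:    {sharded / 2**30:.1f} GiB")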
@@ -47,12 +48,14 @@ processes, so that all model replicas still land in the same state.
 How to use ``ZeroRedundancyOptimizer``?
 ---------------------------------------
 
-The code below demonstrates how to use ``ZeroRedundancyOptimizer``. The majority
-of the code is similar to the simple DDP example presented in
+The code below demonstrates how to use
+`ZeroRedundancyOptimizer <https://pytorch.org/docs/master/distributed.optim.html>`__.
+The majority of the code is similar to the simple DDP example presented in
 `Distributed Data Parallel notes <https://pytorch.org/docs/stable/notes/ddp.html>`_.
 The main difference is the ``if-else`` clause in the ``example`` function which
-wraps optimizer constructions, toggling between ``ZeroRedundancyOptimizer`` and
-``Adam`` optimizer.
+wraps optimizer constructions, toggling between
+`ZeroRedundancyOptimizer <https://pytorch.org/docs/master/distributed.optim.html>`__
+and ``Adam`` optimizer.
 
 
 ::
@@ -91,7 +94,7 @@ wraps optimizer constructions, toggling between ``ZeroRedundancyOptimizer`` and
         if use_zero:
             optimizer = ZeroRedundancyOptimizer(
                 ddp_model.parameters(),
-                optim=torch.optim.Adam,
+                optimizer_class=torch.optim.Adam,
                 lr=0.01
             )
         else:
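
Taken out of the diff context, the updated construction looks like the minimal sketch below. It assumes a PyTorch release where the keyword is already ``optimizer_class`` (older releases still expect ``optim``); the ``build_optimizer`` helper and the learning rate are illustrative, not from the recipe, and a ``torch.distributed`` process group must already be initialized for the ZeRO branch to run:

::

    import torch
    from torch.distributed.optim import ZeroRedundancyOptimizer
    from torch.nn.parallel import DistributedDataParallel as DDP

    def build_optimizer(ddp_model: DDP, use_zero: bool):
        if use_zero:
            # Wraps a regular optimizer class; each rank only materializes its
            # shard of the Adam states and broadcasts updated parameters after step().
            return ZeroRedundancyOptimizer(
                ddp_model.parameters(),
                optimizer_class=torch.optim.Adam,  # formerly the ``optim`` keyword
                lr=0.01,
            )
        # Plain Adam replicates the full optimizer state on every rank.
        return torch.optim.Adam(ddp_model.parameters(), lr=0.01)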

0 commit comments
