Commit 4606df9

Rearranging documentation
1 parent 0c2a629 commit 4606df9

2 files changed

Lines changed: 37 additions & 35 deletions


README.md

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ users as quickly as possible.
1212

1313
## 1. Mixed Precision
1414

15-
### amp: Automatic Mixed Precision
15+
### Amp: Automatic Mixed Precision
1616

1717
`apex.amp` is a tool to enable mixed precision training by changing only 3 lines of your script.
1818
Users can easily experiment with different pure and mixed precision training modes by supplying
@@ -27,7 +27,7 @@ different flags to `amp.initialize`.
2727

2828
[DCGAN example coming soon...](https://github.com/NVIDIA/apex/tree/master/examples/dcgan)
2929

30-
[Moving to the new Amp API] (for users of the deprecated tools formerly called "Amp" and "FP16_Optimizer")
30+
[Moving to the new Amp API](https://nvidia.github.io/apex/amp.html#transition-guide-for-old-api-users) (for users of the deprecated tools formerly called "Amp" and "FP16_Optimizer")
3131

3232
## 2. Distributed Training
3333

docs/source/amp.rst

Lines changed: 35 additions & 33 deletions
@@ -7,11 +7,22 @@ apex.amp
77
This page documents the updated API for Amp (Automatic Mixed Precision),
88
a tool to enable Tensor Core-accelerated training in only 3 lines of Python.
99

10-
Users **should not** manually cast their model or data to ``.half()``, regardless of what ``opt_level``
11-
or properties are chosen. Amp intends that users start with an existing default (FP32) script,
12-
add the three lines corresponding to the Amp API, and begin training with mixed precision.
13-
Amp can also be disabled, in which case the original script will behave exactly as it used to.
14-
In this way, there's no risk adhering to the Amp API, and a lot of potential performance benefit.
10+
A `runnable, comprehensive Imagenet example`_ demonstrating good practices can be found
11+
on the Github page.
12+
13+
GANs are a tricky case that many people have requested. A `comprehensive DCGAN example`_
14+
is under construction.
15+
16+
``opt_level``\ s and Properties
17+
-------------------------------
18+
19+
Amp allows users to easily experiment with different pure and mixed precision modes.
20+
Commonly-used default modes are chosen by
21+
selecting an "optimization level" or ``opt_level``; each ``opt_level`` establishes a set of
22+
properties that govern Amp's implementation of pure or mixed precision training.
23+
Finer-grained control of how a given ``opt_level`` behaves can be achieved by passing values for
24+
particular properties directly to ``amp.initialize``. These manually specified values
25+
override the defaults established by the ``opt_level``.
1526

1627
Example::
1728

@@ -27,39 +38,33 @@ Example::
2738
scaled_loss.backward()
2839
...
2940

30-
A `runnable, comprehensive Imagenet example`_ demonstrating good practices can be found
31-
on the Github page.
41+
Users **should not** manually cast their model or data to ``.half()``, regardless of what ``opt_level``
42+
or properties are chosen. Amp intends that users start with an existing default (FP32) script,
43+
add the three lines corresponding to the Amp API, and begin training with mixed precision.
44+
Amp can also be disabled, in which case the original script will behave exactly as it used to.
45+
In this way, there's no risk adhering to the Amp API, and a lot of potential performance benefit.
3246

33-
GANs are a tricky case that many people have requested. A `comprehensive DCGAN example`_
34-
is under construction.
47+
.. note::
48+
Because it's never necessary to manually cast your model (aside from the call to ``amp.initialize``)
49+
or input data, a script that adheres to the new API
50+
can switch between different ``opt_level``\ s without having to make any other changes.
3551

3652
.. _`runnable, comprehensive Imagenet example`:
3753
https://github.com/NVIDIA/apex/tree/master/examples/imagenet
3854

3955
.. _`comprehensive DCGAN example`:
4056
https://github.com/NVIDIA/apex/tree/master/examples/dcgan
4157

42-
``opt_level``\ s and Properties
43-
-------------------------------
44-
45-
Amp allows users to easily experiment with different pure and mixed precision modes, including
46-
pure FP16 training and pure FP32 training. Commonly-used default modes are chosen by
47-
selecting an "optimization level" or ``opt_level``; each ``opt_level`` establishes a set of
48-
properties that govern Amp's implementation of pure or mixed precision training.
49-
Finer-grained control of how a given ``opt_level`` behaves can be achieved by passing values for
50-
particular properties directly to ``amp.initialize``. These manually specified values will
51-
override the defaults established by the ``opt_level``.
52-
5358
Properties
5459
**********
5560

5661
Currently, the under-the-hood properties that govern pure or mixed precision training are the following:
5762

5863
- ``cast_model_type``: Casts your model's parameters and buffers to the desired type.
5964
- ``patch_torch_functions``: Patch all Torch functions and Tensor methods to perform Tensor Core-friendly ops like GEMMs and convolutions in FP16, and any ops that benefit from FP32 precision in FP32.
60-
- ``keep_batchnorm_fp32``: To enhance precision and enable cudnn batchnorm (which improves performance), it's often beneficial to keep batchnorms in particular in FP32 even if the rest of the model is FP16.
65+
- ``keep_batchnorm_fp32``: To enhance precision and enable cudnn batchnorm (which improves performance), it's often beneficial to keep batchnorm weights in FP32 even if the rest of the model is FP16.
6166
- ``master_weights``: Maintain FP32 master weights to accompany any FP16 model weights. FP32 master weights are stepped by the optimizer to enhance precision and capture small gradients.
62-
- ``loss_scale``: If ``loss_scale`` is a float value, use this value as the static (fixed) loss scale. If ``loss_scale`` is the string ``"dynamic"``, adapatively adjust the loss scale over time. Dynamic loss scale adjustments are performed by Amp automatically.
67+
- ``loss_scale``: If ``loss_scale`` is a float value, use this value as the static (fixed) loss scale. If ``loss_scale`` is the string ``"dynamic"``, adaptively adjust the loss scale over time. Dynamic loss scale adjustments are performed by Amp automatically.
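
To make the difference between a static and a ``"dynamic"`` loss scale concrete, here is a minimal sketch of one common adaptive policy: halve the scale and skip the step when a gradient overflows, and double it after a run of clean steps. This is an illustration only, not Amp's internal implementation; the class name ``DynamicLossScaler``, the ``growth_interval`` parameter, and the factor of 2 are assumptions made for the example.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler (hypothetical; not Amp's internal code)."""

    def __init__(self, init_scale=2.0 ** 16, growth_interval=2000):
        self.scale = init_scale
        # Assumed knob: number of consecutive overflow-free steps before growing.
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads):
        """Inspect scaled gradients; return True if the optimizer step should run."""
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            # Overflow: the scale was too large, so shrink it and skip this step.
            self.scale /= 2.0
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            # A long run of clean steps: try a larger scale again.
            self.scale *= 2.0
            self._good_steps = 0
        return True


scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
stepped = scaler.update([0.5, float("inf")])  # overflow, so the step is skipped
```

A static loss scale corresponds to never calling ``update``: the scale stays at the float the user supplied.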
6368

6469
Again, you often don't need to specify these properties by hand. Instead, select an ``opt_level``,
6570
which will set them up for you. After selecting an ``opt_level``, you can optionally pass property
@@ -85,7 +90,7 @@ Your incoming model should be FP32 already, so this is likely a no-op.
8590
| Default properties set by ``O0``:
8691
| ``cast_model_type=torch.float32``
8792
| ``patch_torch_functions=False``
88-
| ``keep_batchnorm_fp32=None`` (effectively, "not applicable")
93+
| ``keep_batchnorm_fp32=None`` (effectively, "not applicable," everything is FP32)
8994
| ``master_weights=False``
9095
| ``loss_scale=1.0``
9196
|
@@ -116,21 +121,23 @@ what gives the best speedup and accuracy for your model.
116121

117122
Patch all Torch functions and Tensor methods to cast their inputs according to a whitelist-blacklist
118123
model. Whitelist ops (for example, Tensor Core-friendly ops like GEMMs and convolutions) are performed
119-
in FP16. Blacklist ops that benefit from FP32 precision (for example, batchnorm and softmax)
124+
in FP16. Blacklist ops that benefit from FP32 precision (for example, softmax)
120125
are performed in FP32. ``O1`` also uses dynamic loss scaling, unless overridden.
121126

122127
| Default properties set by ``O1``:
123128
| ``cast_model_type=None`` (not applicable)
124129
| ``patch_torch_functions=True``
125-
| ``keep_batchnorm_fp32=None`` (not necessary to specify True, batchnorm inputs are cast to FP32)
130+
| ``keep_batchnorm_fp32=None`` (again, "not applicable," all model weights remain FP32)
126131
| ``master_weights=None`` (not applicable, model weights remain FP32)
127132
| ``loss_scale="dynamic"``
128133
|
129134
|
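
The way an ``opt_level`` establishes defaults that explicitly passed properties then override can be sketched in plain Python. The default values below are copied from the ``O0`` and ``O1`` listings above; the helper name ``resolve_properties`` and the use of the string ``"torch.float32"`` in place of the real dtype are assumptions made so the sketch runs without PyTorch. It is not how ``amp.initialize`` is actually implemented.

```python
# Defaults taken from the O0 and O1 listings; other opt_levels are omitted here.
OPT_LEVEL_DEFAULTS = {
    "O0": {
        "cast_model_type": "torch.float32",  # stand-in string for the real dtype
        "patch_torch_functions": False,
        "keep_batchnorm_fp32": None,
        "master_weights": False,
        "loss_scale": 1.0,
    },
    "O1": {
        "cast_model_type": None,
        "patch_torch_functions": True,
        "keep_batchnorm_fp32": None,
        "master_weights": None,
        "loss_scale": "dynamic",
    },
}


def resolve_properties(opt_level, **overrides):
    """Hypothetical helper: start from the opt_level defaults, then apply overrides."""
    if opt_level not in OPT_LEVEL_DEFAULTS:
        raise ValueError(f"unknown opt_level: {opt_level!r}")
    props = dict(OPT_LEVEL_DEFAULTS[opt_level])  # copy so the defaults stay untouched
    for name, value in overrides.items():
        if name not in props:
            raise KeyError(f"unknown property: {name!r}")
        props[name] = value  # a manually specified value wins over the default
    return props


# O1 with a static loss scale instead of the default "dynamic".
props = resolve_properties("O1", loss_scale=128.0)
```

Only the overridden property changes; every other property keeps the value set by the chosen ``opt_level``.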
130135
131136
``O2``: Fast Mixed Precision
132137
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
133-
``O2`` casts the model to FP16, keeps batchnorms in FP32, maintains master weights in FP32,
138+
``O2`` casts the model weights to FP16,
139+
patches the model's ``forward`` method to cast input
140+
data to FP16, keeps batchnorms in FP32, maintains FP32 master weights,
134141
and implements dynamic loss scaling (unless overridden).
135142
Unlike ``O1``, ``O2`` does not patch Torch functions or Tensor methods.
136143

@@ -221,14 +228,9 @@ One annoying aspect of FP16_Optimizer was that the user had to manually convert
221228
(either by calling ``.half()`` on it, or using a function or module wrapper from
222229
``apex.fp16_utils``), and also manually call ``.half()`` on input data. **Neither of these is
223230
necessary in the new API. No matter what --opt-level
224-
you choose, you can and should simply build your model in the default FP32 format.** The new Amp
225-
API will perform the right conversions during
231+
you choose, you can and should simply build your model and pass input data in the default FP32 format.**
232+
The new Amp API will perform the right conversions during
226233
``model, optimizer = amp.initialize(model, optimizer, opt_level=....)`` based on the ``--opt-level``
227234
and any overridden flags. Floating point input data may be FP32 or FP16, but you may as well just
228235
let it be FP16, because the ``model`` returned by ``amp.initialize`` will have its ``forward``
229236
method patched to cast the input data appropriately.
230-
231-
.. note::
232-
Aside from the call to ``amp.initialize`` itself, it's never necessary to manually cast
233-
your model or data with the new API. Therefore, a script that adheres to the new API
234-
can switch between different ``opt-level``\ s without having to make any other changes.
