Commit 26b30d1

README touchup
1 parent 3c4b789 commit 26b30d1

1 file changed

Lines changed: 30 additions & 24 deletions

examples/imagenet/README.md

@@ -16,7 +16,7 @@ Notice that with the new Amp API **you never need to explicitly convert your mod
1616

1717
To train a model, create softlinks to the ImageNet dataset, then run `main_amp.py` with the desired model architecture, as shown in `Example commands` below.
1818

19-
The default learning rate schedule is set for ResNet50. `main_amp.py` script rescales the learning rate according to the global batch size (number of distributed processes x per-process minibatch size).
19+
The default learning rate schedule is set for ResNet50. The `main_amp.py` script rescales the learning rate according to the global batch size (number of distributed processes \* per-process minibatch size).
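The rescaling described above can be sketched as follows. This is a minimal illustration, not the script's code: the linear scaling rule, the reference batch size of 256, and the helper name `rescale_lr` are all assumptions, so check `main_amp.py` for the formula it actually uses.

```python
def rescale_lr(base_lr: float, per_process_batch: int, world_size: int,
               reference_batch: int = 256) -> float:
    """Scale base_lr linearly with the global batch size.

    reference_batch is an assumed value: the batch size for which
    base_lr was tuned.
    """
    # Global batch size = number of distributed processes * per-process minibatch size.
    global_batch = per_process_batch * world_size
    return base_lr * global_batch / reference_batch


# Example: 2 processes with 224 images each give a global batch of 448.
print(rescale_lr(0.1, per_process_batch=224, world_size=2))
```

With this rule, doubling the number of processes at a fixed per-process batch size doubles the learning rate.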
2020

2121
## Example commands
2222

@@ -26,59 +26,65 @@ The default learning rate schedule is set for ResNet50. `main_amp.py` script re
2626
CPU data loading bottlenecks.
2727

2828
**Note:** `--opt-level` `O1` and `O2` both use dynamic loss scaling by default unless manually overridden.
29-
`--opt-level` `O0` and `O3` (the "pure" training modes) do not use loss scaling by default, but they
30-
can also be told to use loss scaling via manual overrides. Using loss scaling with `O0`
31-
(pure FP32 training) does not really make sense, though, and will trigger a warning.
29+
`--opt-level` `O0` and `O3` (the "pure" training modes) do not use loss scaling by default.
30+
`O0` and `O3` can be told to use loss scaling via manual overrides, but using loss scaling with `O0`
31+
(pure FP32 training) does not really make sense, and will trigger a warning.
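The defaults and overrides in the note above can be summarized in a small sketch. The helper `resolve_loss_scale` is hypothetical and not part of Amp, and representing "no loss scaling" as a scale of `1.0` is an assumption made for illustration.

```python
import warnings

# Defaults per the note above: O1 and O2 scale dynamically, O0 and O3 do not scale.
# Representing "no scaling" as 1.0 is an assumption of this sketch.
DEFAULT_LOSS_SCALE = {"O0": 1.0, "O1": "dynamic", "O2": "dynamic", "O3": 1.0}


def resolve_loss_scale(opt_level, override=None):
    """Return the effective loss scale for opt_level (hypothetical helper)."""
    if opt_level not in DEFAULT_LOSS_SCALE:
        raise ValueError(f"unknown opt level: {opt_level!r}")
    if override is None:
        return DEFAULT_LOSS_SCALE[opt_level]
    if opt_level == "O0":
        # Pure FP32 training gains nothing from loss scaling.
        warnings.warn("loss scaling with O0 (pure FP32) does not really make sense")
    return override


print(resolve_loss_scale("O1"))         # default for O1
print(resolve_loss_scale("O1", 128.0))  # manual override
```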
3232

33-
```bash
34-
### Softlink training dataset into current directory
33+
Softlink the training and validation datasets into the current directory:
34+
```
3535
$ ln -sf /data/imagenet/train-jpeg/ train
36-
### Softlink validation dataset into current directory
3736
$ ln -sf /data/imagenet/val-jpeg/ val
3837
```
3938

40-
Single-process "pure fp32" training
39+
### `--opt-level O0` (FP32 training) and `O3` (FP16 training)
40+
41+
"Pure FP32" training:
4142
```
4243
$ python main_amp.py -a resnet50 --b 128 --workers 4 --opt-level O0 ./
4344
```
44-
Single-process "pure fp16" training
45+
"Pure FP16" training:
4546
```
4647
$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
4748
```
48-
Single-process fp16 training with fp32 batchnorm
49+
FP16 training with FP32 batchnorm:
4950
```
50-
$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
51+
$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
5152
```
52-
Keeping the batchnorms in fp32 improves stability and allows Pytorch
53+
Keeping the batchnorms in FP32 improves stability and allows PyTorch
5354
to use cuDNN batchnorms, which significantly increases speed in ResNet50.
5455

55-
The "O3" options might not converge, because they are not true mixed precision.
56+
The `O3` options might not converge, because they are not true mixed precision.
5657
However, they can be useful to establish "speed of light" performance for
57-
your model, which provides a baseline for comparison with opt-levels O1 and O2.
58-
For Resnet50 in particular, --opt-level O3 --keep-batchnorm-fp32 True establishes
59-
the "speed of light." (Without --keep-batchnorm-fp32, it's slower, because it does
58+
your model, which provides a baseline for comparison with `O1` and `O2`.
59+
For ResNet50 in particular, `--opt-level O3 --keep-batchnorm-fp32 True` establishes
60+
the "speed of light." (Without `--keep-batchnorm-fp32`, it's slower, because it does
6061
not use cudnn batchnorm.)
6162

62-
`--opt-level O1` ("conservative mixed precision") training patches Torch functions
63-
to cast inputs according to a whitelist-blacklist model. FP16-friendly (Tensor Core)
64-
ops like gemms and convolutions run in FP16, while ops that benefit from FP32,
65-
like batchnorm and softmax, run in FP32 (also, dynamic loss scaling is used by default):
63+
### `--opt-level O1` ("conservative mixed precision")
64+
65+
`O1` patches Torch functions to cast inputs according to a whitelist-blacklist model.
66+
FP16-friendly (Tensor Core) ops like gemms and convolutions run in FP16, while ops
67+
that benefit from FP32, like batchnorm and softmax, run in FP32.
68+
Also, dynamic loss scaling is used by default.
6669
```
6770
$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
6871
```
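The whitelist-blacklist casting that `O1` applies can be pictured with a toy dispatcher. This is only a sketch: the op names, the two sets, and the helper `cast_dtype_for` are hypothetical, and passing unlisted ops through with their input dtype is an assumption rather than a description of Amp's internals.

```python
# Hypothetical lists, based on the examples named in the text above.
FP16_OPS = {"conv2d", "linear", "matmul"}  # "whitelist": Tensor Core friendly ops
FP32_OPS = {"batch_norm", "softmax"}       # "blacklist": ops that benefit from FP32


def cast_dtype_for(op_name: str, input_dtype: str) -> str:
    """Pick the dtype an op's inputs would be cast to in this toy model."""
    if op_name in FP16_OPS:
        return "float16"
    if op_name in FP32_OPS:
        return "float32"
    # Assumption: ops on neither list keep whatever dtype they were given.
    return input_dtype


for op in ("conv2d", "softmax", "relu"):
    print(op, "->", cast_dtype_for(op, "float32"))
```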
69-
"Conservative mixed precision" overridden to use static loss scaling:
72+
`O1` overridden to use static loss scaling:
7073
```
7174
$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 --loss-scale 128.0 ./
7275
```
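To see why a static loss scale such as `128.0` can help, the sketch below rounds a small gradient to half precision using only the standard library. The gradient value `1e-8` is a made-up example, and `to_fp16` is a helper defined here, not an Amp function.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


LOSS_SCALE = 128.0
grad = 1e-8  # hypothetical gradient, smaller than FP16 can represent

unscaled = to_fp16(grad)             # the gradient underflows to 0.0
scaled = to_fp16(grad * LOSS_SCALE)  # the scaled gradient survives in FP16
recovered = scaled / LOSS_SCALE      # unscale in higher precision before the update

print("without scaling:", unscaled)
print("with scaling:   ", recovered)
```

Scaling the loss scales every gradient by the same factor, so dividing by that factor before the optimizer step restores the original magnitude, up to FP16 rounding error.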
73-
Distributed training with 2 processes (1 GPU per process)
76+
Distributed training with 2 processes (1 GPU per process, see **Distributed training** below
77+
for more detail)
7478
```
7579
$ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
7680
```
7781
For best performance, set `--nproc_per_node` equal to the total number of GPUs on the node
7882
to use all available resources.
7983

80-
`--opt-level O2` ("fast mixed precision") training casts the model to FP16,
81-
keeps batchnorms in FP32, maintains master weights in FP32, and implements
84+
### `--opt-level O2` ("fast mixed precision")
85+
86+
`O2` casts the model to FP16, keeps batchnorms in FP32,
87+
maintains master weights in FP32, and implements
8288
dynamic loss scaling by default. (Unlike `--opt-level O1`, `--opt-level O2`
8389
does not patch Torch functions.)
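The value of FP32 master weights can be illustrated with a small standard-library sketch. The weight, update size, and step count are made-up numbers, `to_fp16` is a helper defined here, and a Python float (FP64) stands in for the FP32 master copy.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


update = 1e-4  # hypothetical lr * grad, smaller than FP16 spacing near 1.0
steps = 10

# FP16-only weights: every update is rounded away, so the weight never moves.
w16 = to_fp16(1.0)
for _ in range(steps):
    w16 = to_fp16(w16 - update)

# Master weights: accumulate in higher precision, cast to FP16 only for the model copy.
master = 1.0
for _ in range(steps):
    master -= update
model_copy = to_fp16(master)

print("FP16-only weight:", w16)
print("FP16 copy of master weight:", model_copy)
```

In the FP16-only loop each small update is lost to rounding, while the master copy accumulates them until the change is large enough to show up in the FP16 model copy.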
8490
```
