Commit a3dbea3

Adding summary
1 parent 26b30d1 commit a3dbea3

1 file changed

Lines changed: 19 additions & 3 deletions

examples/imagenet/README.md

@@ -36,7 +36,23 @@ $ ln -sf /data/imagenet/train-jpeg/ train
 $ ln -sf /data/imagenet/val-jpeg/ val
 ```
 
-### `--opt-level O0` (FP32 training) and `O3` (FP16 training)
+### Summary
+
+Amp enables easy experimentation with various pure and mixed precision options.
+```
+$ python main_amp.py -a resnet50 --b 128 --workers 4 --opt-level O0 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 --loss-scale 128.0 ./
+$ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O2 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O2 --loss-scale 128.0 ./
+$ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O2 ./
+```
+Options are broken down in detail below.
+
+#### `--opt-level O0` (FP32 training) and `O3` (FP16 training)
 
 "Pure FP32" training:
 ```
@@ -60,7 +76,7 @@ For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-FP32 True` establis
 the "speed of light." (Without `--keep-batchnorm-FP32`, it's slower, because it does
 not use cudnn batchnorm.)
 
-### `--opt-level O1` ("conservative mixed precision")
+#### `--opt-level O1` ("conservative mixed precision")
 
 `O1` patches Torch functions to cast inputs according to a whitelist-blacklist model.
 FP16-friendly (Tensor Core) ops like gemms and convolutions run in FP16, while ops
@@ -81,7 +97,7 @@ $ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50
 For best performance, set `--nproc_per_node` equal to the total number of GPUs on the node
 to use all available resources.
 
-### `--opt-level O2` ("fast mixed precision")
+#### `--opt-level O2` ("fast mixed precision")
 
 `O2` casts the model to FP16, keeps batchnorms in FP32,
 maintains master weights in FP32, and implements
