@@ -36,7 +36,23 @@ $ ln -sf /data/imagenet/train-jpeg/ train
 $ ln -sf /data/imagenet/val-jpeg/ val
 ```
 
-### `--opt-level O0` (FP32 training) and `O3` (FP16 training)
+### Summary
+
+Amp enables easy experimentation with various pure and mixed precision options.
+```
+$ python main_amp.py -a resnet50 --b 128 --workers 4 --opt-level O0 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 --loss-scale 128.0 ./
+$ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O2 ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O2 --loss-scale 128.0 ./
+$ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O2 ./
+```
+Options are broken down in detail below.
+
+#### `--opt-level O0` (FP32 training) and `O3` (FP16 training)
 
 "Pure FP32" training:
 ```
@@ -60,7 +76,7 @@ For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-FP32 True` establis
 the "speed of light." (Without `--keep-batchnorm-FP32`, it's slower, because it does
 not use cudnn batchnorm.)
 
-### `--opt-level O1` ("conservative mixed precision")
+#### `--opt-level O1` ("conservative mixed precision")
 
 `O1` patches Torch functions to cast inputs according to a whitelist-blacklist model.
 FP16-friendly (Tensor Core) ops like gemms and convolutions run in FP16, while ops
@@ -81,7 +97,7 @@ $ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50
 For best performance, set `--nproc_per_node` equal to the total number of GPUs on the node
 to use all available resources.
 
-### `--opt-level O2` ("fast mixed precision")
+#### `--opt-level O2` ("fast mixed precision")
 
 `O2` casts the model to FP16, keeps batchnorms in FP32,
 maintains master weights in FP32, and implements