@@ -30,7 +30,7 @@ CPU data loading bottlenecks.
 `O0` and `O3` can be told to use loss scaling via manual overrides, but using loss scaling with `O0`
 (pure FP32 training) does not really make sense, and will trigger a warning.
 
-Softlink training and validation dataset into current directory
+Softlink training and validation dataset into current directory:
 ```
 $ ln -sf /data/imagenet/train-jpeg/ train
 $ ln -sf /data/imagenet/val-jpeg/ val
@@ -42,7 +42,7 @@ Amp enables easy experimentation with various pure and mixed precision options.
 ```
 $ python main_amp.py -a resnet50 --b 128 --workers 4 --opt-level O0 ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
-$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 --loss-scale 128.0 ./
 $ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
@@ -64,16 +64,16 @@ $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
 ```
 FP16 training with FP32 batchnorm:
 ```
-$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
 ```
 Keeping the batchnorms in FP32 improves stability and allows Pytorch
 to use cudnn batchnorms, which significantly increases speed in Resnet50.
 
 The `O3` options might not converge, because they are not true mixed precision.
 However, they can be useful to establish "speed of light" performance for
 your model, which provides a baseline for comparison with `O1` and `O2`.
-For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-FP32 True` establishes
-the "speed of light." (Without `--keep-batchnorm-FP32`, it's slower, because it does
+For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-fp32 True` establishes
+the "speed of light." (Without `--keep-batchnorm-fp32`, it's slower, because it does
 not use cudnn batchnorm.)
 
 #### `--opt-level O1` ("conservative mixed precision")
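Note on why the casing change matters: Python's `argparse` matches long option names case-sensitively, so a README that spells the flag `--keep-batchnorm-FP32` would not match an option registered as `--keep-batchnorm-fp32`. The sketch below illustrates this with a hypothetical, cut-down parser. The option names are copied from the commands in this diff, but the types and defaults are assumptions and this is not the actual parser from `main_amp.py`.

```python
import argparse


def build_parser():
    # Hypothetical subset of the example script's options. Flag names mirror
    # the commands in the diff; types and defaults are assumed for illustration.
    parser = argparse.ArgumentParser(prog="main_amp.py")
    parser.add_argument("--opt-level", type=str, default="O1")
    parser.add_argument("--keep-batchnorm-fp32", type=str, default=None)
    parser.add_argument("--loss-scale", type=str, default=None)
    return parser


def parse(argv):
    """Parse argv and return (namespace, unrecognized_args) without exiting."""
    return build_parser().parse_known_args(argv)


# Lowercase spelling (the corrected README text): the option is recognized.
good_args, good_unknown = parse(["--opt-level", "O3", "--keep-batchnorm-fp32", "True"])

# Uppercase spelling (the old README text): argparse does not match it.
bad_args, bad_unknown = parse(["--opt-level", "O3", "--keep-batchnorm-FP32", "True"])

print("lowercase:", good_args.keep_batchnorm_fp32, good_unknown)
print("uppercase:", bad_args.keep_batchnorm_fp32, bad_unknown)
```

With the lowercase spelling the value is stored on the namespace. With the uppercase spelling the option stays at its default and the flag is reported as unrecognized. A script that calls `parse_args()` instead of `parse_known_args()` would exit with an "unrecognized arguments" error in that case.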