Commit b90b057

Minor typos
1 parent a3dbea3 commit b90b057

2 files changed

Lines changed: 9 additions & 14 deletions

examples/imagenet/README.md

Lines changed: 5 additions & 5 deletions
@@ -30,7 +30,7 @@ CPU data loading bottlenecks.
 `O0` and `O3` can be told to use loss scaling via manual overrides, but using loss scaling with `O0`
 (pure FP32 training) does not really make sense, and will trigger a warning.
 
-Softlink training and validation dataset into current directory
+Softlink training and validation dataset into current directory:
 ```
 $ ln -sf /data/imagenet/train-jpeg/ train
 $ ln -sf /data/imagenet/val-jpeg/ val
@@ -42,7 +42,7 @@ Amp enables easy experimentation with various pure and mixed precision options.
 ```
 $ python main_amp.py -a resnet50 --b 128 --workers 4 --opt-level O0 ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
-$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 --loss-scale 128.0 ./
 $ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
@@ -64,16 +64,16 @@ $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
 ```
 FP16 training with FP32 batchnorm:
 ```
-$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
 ```
 Keeping the batchnorms in FP32 improves stability and allows Pytorch
 to use cudnn batchnorms, which significantly increases speed in Resnet50.
 
 The `O3` options might not converge, because they are not true mixed precision.
 However, they can be useful to establish "speed of light" performance for
 your model, which provides a baseline for comparison with `O1` and `O2`.
-For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-FP32 True` establishes
-the "speed of light." (Without `--keep-batchnorm-FP32`, it's slower, because it does
+For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-fp32 True` establishes
+the "speed of light." (Without `--keep-batchnorm-fp32`, it's slower, because it does
 not use cudnn batchnorm.)
 
 #### `--opt-level O1` ("conservative mixed precision")
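
The README hunks above only change the spelling of one flag, from `--keep-batchnorm-FP32` to `--keep-batchnorm-fp32`. The case matters because `argparse` matches option strings case-sensitively. The following is a minimal sketch, not the script's actual parser: `build_parser` is a hypothetical helper, and the `type=str` and default values are assumptions, since the parser definition in `main_amp.py` is not part of this diff.

```python
import argparse


def build_parser():
    # Hypothetical, trimmed-down parser: it registers only flags that appear
    # in the README commands. Types and defaults are assumptions.
    parser = argparse.ArgumentParser(description="Sketch of the AMP example flags")
    parser.add_argument("--opt-level", type=str, default="O1")
    parser.add_argument("--keep-batchnorm-fp32", type=str, default=None)
    parser.add_argument("--loss-scale", type=str, default=None)
    return parser


# The lower-case spelling is recognized; dashes become underscores in the attribute name.
args = build_parser().parse_args(
    ["--opt-level", "O3", "--keep-batchnorm-fp32", "True"]
)
print(args.opt_level, args.keep_batchnorm_fp32, type(args.keep_batchnorm_fp32))

# The old upper-case spelling is not a registered option, so parse_known_args
# returns it among the unrecognized arguments (parse_args would exit with an error).
_, unknown = build_parser().parse_known_args(["--keep-batchnorm-FP32", "True"])
print(unknown)
```

With a parser like this, a README command that used the old spelling would fail at argument parsing instead of silently training without FP32 batchnorm.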

examples/imagenet/main_amp.py

Lines changed: 4 additions & 9 deletions
@@ -95,15 +95,10 @@ def fast_collate(batch):
 best_prec1 = 0
 args = parser.parse_args()
 
-# Let multi_tensor_applier be the canary in the coalmine
-# that verifies if the backend is what we think it is
-assert multi_tensor_applier.available == args.has_ext
-
 print("opt_level = {}".format(args.opt_level))
 print("keep_batchnorm_fp32 = {}".format(args.keep_batchnorm_fp32), type(args.keep_batchnorm_fp32))
 print("loss_scale = {}".format(args.loss_scale), type(args.loss_scale))
 
-
 print("\nCUDNN VERSION: {}\n".format(torch.backends.cudnn.version()))
 
 if args.deterministic:
@@ -342,8 +337,8 @@ def train(train_loader, model, criterion, optimizer, epoch):
         input, target = prefetcher.next()
 
         if i%args.print_freq == 0:
-            # Every print_freq iterations, let's check the accuracy and speed.
-            # For best performance, it doesn't make sense to collect these metrics every
+            # Every print_freq iterations, check the loss accuracy and speed.
+            # For best performance, it doesn't make sense to print these metrics every
             # iteration, since they incur an allreduce and some host<->device syncs.
 
             # Measure accuracy
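
The comment in this hunk describes a gating pattern: metrics are computed only every `print_freq` iterations, because each measurement is expensive. A minimal, framework-free sketch of that pattern follows. The names `run_with_periodic_metrics` and `measure` are hypothetical; in the real script the measurement involves an allreduce and host<->device syncs, which this sketch replaces with a plain callback.

```python
def run_with_periodic_metrics(num_iters, print_freq, measure):
    """Run num_iters steps, calling the (expensive) measure() only on
    every print_freq-th iteration. Returns a list of (iteration, metric)."""
    reports = []
    for i in range(1, num_iters + 1):
        # ... forward pass, backward pass, and optimizer step would go here ...
        if i % print_freq == 0:
            # Only here do we pay the cost of collecting metrics.
            reports.append((i, measure()))
    return reports


# Usage: count how often the expensive callback actually runs.
calls = []
reports = run_with_periodic_metrics(10, 5, lambda: calls.append(1) or len(calls))
print(reports, len(calls))
```

In this sketch, ten iterations with `print_freq=5` trigger the callback twice, so the synchronization cost scales with the number of reports and not with the number of training steps.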
@@ -374,8 +369,8 @@ def train(train_loader, model, criterion, optimizer, epoch):
                       'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                       'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
                        epoch, i, len(train_loader),
-                       args.print_freq*args.world_size*args.batch_size/batch_time.val,
-                       args.print_freq*args.world_size*args.batch_size/batch_time.avg,
+                       args.world_size*args.batch_size/batch_time.val,
+                       args.world_size*args.batch_size/batch_time.avg,
                        batch_time=batch_time,
                        loss=losses, top1=top1, top5=top5))
 
