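Minor typos

Fix the spelling of `--keep-batchnorm-fp32` and a missing colon in the ImageNet
README, remove the `multi_tensor_applier` backend assertion from `main_amp.py`,
and drop the `args.print_freq` factor from the two throughput values passed to
the progress printout.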

definitelynotmcarilli · definitelynotmcarilli · commit b90b05709260 · 2019-03-03T21:06:27.000-08:00
diff --git a/examples/imagenet/README.md b/examples/imagenet/README.md
@@ -30,7 +30,7 @@ CPU data loading bottlenecks.
 `O0` and `O3` can be told to use loss scaling via manual overrides, but using loss scaling with `O0`
 (pure FP32 training) does not really make sense, and will trigger a warning.
 
-Softlink training and validation dataset into current directory
+Softlink training and validation dataset into current directory:
 ```
 $ ln -sf /data/imagenet/train-jpeg/ train
 $ ln -sf /data/imagenet/val-jpeg/ val
@@ -42,7 +42,7 @@ Amp enables easy experimentation with various pure and mixed precision options.
 ```
 $ python main_amp.py -a resnet50 --b 128 --workers 4 --opt-level O0 ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
-$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
 $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 --loss-scale 128.0 ./
 $ python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O1 ./
@@ -64,16 +64,16 @@ $ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 ./
 ```
 FP16 training with FP32 batchnorm:
 ```
-$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-FP32 True ./
+$ python main_amp.py -a resnet50 --b 224 --workers 4 --opt-level O3 --keep-batchnorm-fp32 True ./
 ```
 Keeping the batchnorms in FP32 improves stability and allows Pytorch
 to use cudnn batchnorms, which significantly increases speed in Resnet50.
 
 The `O3` options might not converge, because they are not true mixed precision.
 However, they can be useful to establish "speed of light" performance for
 your model, which provides a baseline for comparison with `O1` and `O2`.
-For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-FP32 True` establishes
-the "speed of light."  (Without `--keep-batchnorm-FP32`, it's slower, because it does
+For Resnet50 in particular, `--opt-level O3 --keep-batchnorm-fp32 True` establishes
+the "speed of light."  (Without `--keep-batchnorm-fp32`, it's slower, because it does
 not use cudnn batchnorm.)
 
 #### `--opt-level O1` ("conservative mixed precision")
diff --git a/examples/imagenet/main_amp.py b/examples/imagenet/main_amp.py
@@ -95,15 +95,10 @@ def fast_collate(batch):
 best_prec1 = 0
 args = parser.parse_args()
 
-# Let multi_tensor_applier be the canary in the coalmine
-# that verifies if the backend is what we think it is
-assert multi_tensor_applier.available == args.has_ext 
-
 print("opt_level = {}".format(args.opt_level))
 print("keep_batchnorm_fp32 = {}".format(args.keep_batchnorm_fp32), type(args.keep_batchnorm_fp32))
 print("loss_scale = {}".format(args.loss_scale), type(args.loss_scale))
 
-
 print("\nCUDNN VERSION: {}\n".format(torch.backends.cudnn.version()))
 
 if args.deterministic:
@@ -342,8 +337,8 @@ def train(train_loader, model, criterion, optimizer, epoch):
         input, target = prefetcher.next()
 
         if i%args.print_freq == 0:
-            # Every print_freq iterations, let's check the accuracy and speed.
-            # For best performance, it doesn't make sense to collect these metrics every
+            # Every print_freq iterations, check the loss, accuracy, and speed.
+            # For best performance, it doesn't make sense to print these metrics every
             # iteration, since they incur an allreduce and some host<->device syncs.
 
             # Measure accuracy
@@ -374,8 +369,8 @@ def train(train_loader, model, criterion, optimizer, epoch):
                       'Prec@1 {top1.val:.3f} ({top1.avg:.3f})\t'
                       'Prec@5 {top5.val:.3f} ({top5.avg:.3f})'.format(
                        epoch, i, len(train_loader),
-                       args.print_freq*args.world_size*args.batch_size/batch_time.val,
-                       args.print_freq*args.world_size*args.batch_size/batch_time.avg,
+                       args.world_size*args.batch_size/batch_time.val,
+                       args.world_size*args.batch_size/batch_time.avg,
                        batch_time=batch_time,
                        loss=losses, top1=top1, top5=top5))
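
Note on the last hunk: the two changed arguments compute a global throughput as `world_size * batch_size / batch_time`. The sketch below shows that arithmetic and how much larger the removed expression would be. It assumes `batch_time` holds the duration of a single iteration in seconds, which this diff does not show. The function name `images_per_second` and the numbers are hypothetical and are not part of `main_amp.py`.

```python
def images_per_second(world_size, batch_size, seconds_per_iteration):
    """Global throughput when each process handles batch_size images per iteration."""
    if seconds_per_iteration <= 0:
        raise ValueError("seconds_per_iteration must be positive")
    return world_size * batch_size / seconds_per_iteration


# Hypothetical values: 2 processes, 224 images per process, 0.5 s per iteration.
world_size, batch_size, seconds_per_iteration = 2, 224, 0.5
print_freq = 10  # hypothetical print interval

# Expression after this commit.
new_speed = images_per_second(world_size, batch_size, seconds_per_iteration)

# Expression before this commit: the same value scaled by print_freq.
# Under the per-iteration assumption above, this overstates the speed.
old_speed = print_freq * new_speed

print("new: {:.3f} img/s, old: {:.3f} img/s".format(new_speed, old_speed))
```

If `batch_time` instead accumulated the time for `print_freq` iterations, the removed factor would have been the correct one, so the change only makes sense together with how the timer is updated.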