Data fetching issue for ImageNet example

I tried to run the ImageNet example from https://github.com/NVIDIA/apex/blob/master/examples/imagenet/main_amp.py

At the end of the epoch I got the following error message:
 
Traceback (most recent call last):
  File "main_amp.py", line 520, in <module>
    main()
  File "main_amp.py", line 239, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main_amp.py", line 345, in train
    prec1, prec5 = accuracy(output.data, target, topk=(1, 5))
  File "main_amp.py", line 504, in accuracy
    correct = pred.eq(target.view(1, -1).expand_as(pred))
RuntimeError: The expanded size of the tensor (128) must match the existing size (96) at non-singleton dimension 1.  Target sizes: [5, 128].  Tensor sizes: [1, 96]

The problem is that the targets and the output have different sizes. Checking the code, I see that the inputs and targets for the next batch are fetched at line 337
        input, target = prefetcher.next()

This target is then used to calculate accuracy at line 345
            prec1, prec5 = accuracy(output.data, target, topk=(1, 5)) 

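The output.data in line 345 was computed from the previous input, not from the input fetched at line 337. When the next batch is smaller (here 96 samples instead of 128, presumably the last batch of the epoch), that causes the size mismatch. Even when the sizes match, accuracy is calculated against the next batch's targets, so it will not be correct.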

I suggest moving line 337 down to line 376, after the accuracy calculation and the printing of results.
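
To illustrate the ordering problem without needing the dataset, here is a minimal pure-Python sketch. It is not the code from main_amp.py: `ToyPrefetcher`, `make_batches`, `accuracy`, and `run_epoch` are hypothetical names made up for this illustration, plain lists stand in for tensors, and the "model" is just the identity function. It only assumes that the prefetcher hands back the next batch each time `next()` is called.

```python
class ToyPrefetcher:
    """Hypothetical stand-in for the example's prefetcher (not the real class)."""

    def __init__(self, batches):
        self._batches = iter(batches)

    def next(self):
        # Return (None, None) once the data is exhausted.
        return next(self._batches, (None, None))


def make_batches(sizes):
    # Batch k holds labels shifted by k, so consecutive batches differ.
    # Inputs equal targets, so an identity "model" is always correct.
    batches = []
    for k, size in enumerate(sizes):
        labels = [(i + k) % 10 for i in range(size)]
        batches.append((labels, list(labels)))
    return batches


def accuracy(output, target):
    # Plain-list analogue of the size check that fails in the traceback.
    if len(output) != len(target):
        raise ValueError(
            f"size mismatch: output has {len(output)}, target has {len(target)}"
        )
    return sum(o == t for o, t in zip(output, target)) / len(target)


def run_epoch(batches, fetch_before_accuracy):
    prefetcher = ToyPrefetcher(batches)
    inputs, target = prefetcher.next()
    accuracies = []
    while inputs is not None:
        output = list(inputs)  # identity "forward pass" on the current batch
        if fetch_before_accuracy:
            # Reported ordering: target now belongs to the NEXT batch.
            inputs, target = prefetcher.next()
            if target is None:
                break
            accuracies.append(accuracy(output, target))
        else:
            # Suggested ordering: score first, then fetch the next batch.
            accuracies.append(accuracy(output, target))
            inputs, target = prefetcher.next()
    return accuracies
```

With batch sizes 128, 128, 96, calling `run_epoch(make_batches([128, 128, 96]), fetch_before_accuracy=True)` first scores batch 0's output against batch 1's targets and then raises on the 128 vs 96 mismatch, while `fetch_before_accuracy=False` scores every batch against its own targets.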


