Data fetching issue for ImageNet example #188

@sozubek

I tried to run the ImageNet example using https://github.com/NVIDIA/apex/blob/master/examples/imagenet/main_amp.py

At the end of the epoch I got the following error message:

Traceback (most recent call last):
  File "main_amp.py", line 520, in <module>
    main()
  File "main_amp.py", line 239, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main_amp.py", line 345, in train
    prec1, prec5 = accuracy(output.data, target, topk=(1, 5))
  File "main_amp.py", line 504, in accuracy
    correct = pred.eq(target.view(1, -1).expand_as(pred))
RuntimeError: The expanded size of the tensor (128) must match the existing size (96) at non-singleton dimension 1. Target sizes: [5, 128]. Tensor sizes: [1, 96]

The problem is that the targets and the output have different sizes. Checking the code, I see that the inputs and targets for the next batch are fetched at line 337:
input, target = prefetcher.next()

This new target is then used to calculate the accuracy at line 345:
prec1, prec5 = accuracy(output.data, target, topk=(1, 5))

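The output.data used at line 345 was computed from the previous input, not from the input fetched at line 337. When the newly fetched batch is smaller than the previous one (here 96 samples instead of 128, presumably the last batch of the epoch), this causes the size mismatch. Even when the sizes happen to match, the accuracy is computed against the wrong targets.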
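
To illustrate the ordering problem, here is a minimal sketch in plain Python. It is not the code from main_amp.py: the `batches` generator and `buggy_epoch` function are hypothetical stand-ins for the data prefetcher and the training loop, and the "model" simply echoes its input. The sample count of 352 is chosen only so that the batches come out as 128, 128 and 96.

```python
def batches(num_samples, batch_size):
    """Hypothetical data source: yields (inputs, targets) as lists of ids.

    The last batch is smaller when num_samples is not a multiple of batch_size.
    """
    for start in range(0, num_samples, batch_size):
        ids = list(range(start, min(start + batch_size, num_samples)))
        yield ids, ids  # toy data: the correct target for an input is its own id


def buggy_epoch(num_samples, batch_size):
    """Fetch the next batch BEFORE computing metrics, as in the issue."""
    it = batches(num_samples, batch_size)
    inputs, targets = next(it, (None, None))
    checks = []
    while inputs is not None:
        outputs = list(inputs)  # stand-in for model(inputs)
        # Problem: targets is overwritten with the NEXT batch here.
        inputs, targets = next(it, (None, None))
        if targets is not None:  # nothing to compare once the data is exhausted
            # (output size, target size, do the targets belong to these outputs?)
            checks.append((len(outputs), len(targets), outputs == targets))
    return checks


print(buggy_epoch(352, 128))
# [(128, 128, False), (128, 96, False)]
```

The second tuple reproduces the 128 versus 96 mismatch from the error message, and the first shows that the targets are already wrong even when the sizes agree.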

I suggest moving line 337 to after the accuracy calculation and the printing of results, i.e. to line 376.
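
The suggested reordering can be sketched as follows, again in plain Python rather than the actual main_amp.py code. The `batches` generator and `fixed_epoch` function are hypothetical names, and the "model" just echoes its input so that alignment between outputs and targets is easy to check.

```python
def batches(num_samples, batch_size):
    """Hypothetical data source: yields (inputs, targets) as lists of ids."""
    for start in range(0, num_samples, batch_size):
        ids = list(range(start, min(start + batch_size, num_samples)))
        yield ids, ids  # toy data: the correct target for an input is its own id


def fixed_epoch(num_samples, batch_size):
    """Compute metrics first, then fetch the next batch."""
    it = batches(num_samples, batch_size)
    inputs, targets = next(it, (None, None))
    checks = []
    while inputs is not None:
        outputs = list(inputs)  # stand-in for model(inputs)
        # Metrics use the targets that belong to the current outputs.
        checks.append((len(outputs), len(targets), outputs == targets))
        # Only now advance to the next batch.
        inputs, targets = next(it, (None, None))
    return checks


print(fixed_epoch(352, 128))
# [(128, 128, True), (128, 128, True), (96, 96, True)]
```

With this ordering every batch, including the smaller final one, is scored against its own targets.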
