inefficient data loader #12

@eminorhan

Description

I just wanted to point out that the data loader in this implementation is a lot less efficient than it could be. Right now, the code writes each encoded image to a separate .npy file and, during training, loads each file in a batch individually, which incurs a lot of unnecessary file I/O. You could instead save all the pre-extracted features in a single array/tensor and load that one file into RAM (or even into GPU RAM) once before training starts. Stored as uint8 in this way, the entire ImageNet dataset takes up only about 5 GB, e.g.: https://huggingface.co/datasets/cloneofsimo/imagenet.int8.
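For concreteness, here is a minimal sketch of that approach. The file names, the (N, D) feature shape, and the batch size are illustrative assumptions, not this repo's actual layout:

```python
import glob
import numpy as np
import torch

# One-time consolidation: merge the per-image .npy feature files into a
# single uint8 array (paths and shapes here are hypothetical).
paths = sorted(glob.glob("features/*.npy"))
features = np.stack([np.load(p) for p in paths]).astype(np.uint8)
np.save("features_all.npy", features)  # one file instead of one per image

# Training side: load the whole array into RAM once, then index batches
# directly instead of hitting the filesystem for every sample.
features = torch.from_numpy(np.load("features_all.npy"))  # (N, D) uint8
# features = features.cuda()  # optional: ~5 GB fits on most modern GPUs

batch_idx = torch.randint(0, features.shape[0], (256,))
batch = features[batch_idx].float()  # cast/dequantize per batch as needed
```

If the consolidated array ever outgrows RAM, `np.load("features_all.npy", mmap_mode="r")` gives a memory-mapped fallback with the same single-file layout.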
