I just wanted to point out that the data loader in this implementation seems considerably less efficient than it could be. Right now, the code writes each encoded image to a separate .npy file and, during training, loads every file in a batch individually, which incurs a lot of unnecessary file I/O. You could instead save all pre-extracted features in a single array/tensor and load that one file into RAM (or even GPU RAM) once before training starts. Stored as uint8 this way, the entire ImageNet takes up only ~5 GB, e.g.: https://huggingface.co/datasets/cloneofsimo/imagenet.int8. A rough sketch of what I mean is below.
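Here is a minimal sketch of the idea. The file names, the `(N, C, H, W)` layout, and the `[-5, 5]` quantization range are assumptions for illustration (the range mirrors what the imagenet.int8 dataset linked above does), not the actual values used by this repo:

```python
import numpy as np

# --- One-time preprocessing: pack all per-image .npy files into one array ---
# Assumes each file holds a float latent that we quantize to uint8 by mapping
# a fixed range (assumed [-5, 5] here) onto [0, 255].

LO, HI = -5.0, 5.0  # assumed latent value range

def quantize(latents: np.ndarray) -> np.ndarray:
    scaled = (latents - LO) / (HI - LO)              # -> [0, 1]
    return np.clip(scaled * 255.0, 0, 255).astype(np.uint8)

def pack_features(files, out_path="features.npy"):
    # Stack every per-image latent into one (N, C, H, W) uint8 array.
    packed = np.stack([quantize(np.load(f)) for f in files])
    np.save(out_path, packed)

# --- Training time: one load instead of one file read per sample ---
features = np.load("features.npy")                   # whole dataset in RAM once
# Optionally push it to the GPU once as well, e.g. with torch:
# import torch
# features_gpu = torch.from_numpy(features).cuda()

def get_batch(indices: np.ndarray) -> np.ndarray:
    # Dequantize on the fly; indexing an in-memory array replaces all file I/O.
    batch = features[indices].astype(np.float32) / 255.0
    return batch * (HI - LO) + LO
```

With this layout the per-batch cost is just an array index plus a cheap dequantization, instead of one filesystem read per sample.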