torchrun --nnodes=1 --nproc_per_node=N extract_features.py --model DiT-XL/2 --data-path /path/to/imagenet/train --features-path /path/to/store/features --global-batch-size=256
This is the feature extraction command. I'm curious whether I have to extract features separately for each model before training.