Official PyTorch implementation of
A Coefficient Makes SVRG Effective
Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu
UC Berkeley, University of Pennsylvania, Toyota Technological Institute at Chicago, and Meta AI Research
[Paper] [Video] [Project page]
We introduce α-SVRG: SVRG with a coefficient α that scales its variance reduction term. With a suitable, scheduled coefficient, SVRG becomes effective for training modern neural networks, consistently reducing train loss compared to the baseline while maintaining or improving validation accuracy.
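At a high level, α-SVRG scales SVRG's control variate by the coefficient α. Below is a minimal sketch of the resulting gradient estimate, assuming the standard SVRG snapshot formulation; the function and argument names are illustrative, not this repository's API.

```python
import torch

def alpha_svrg_gradient(grad: torch.Tensor,
                        snapshot_grad: torch.Tensor,
                        snapshot_full_grad: torch.Tensor,
                        alpha: float) -> torch.Tensor:
    """Sketch of a variance-reduced gradient estimate with a coefficient.

    grad:               mini-batch gradient at the current weights
    snapshot_grad:      gradient of the same mini-batch at the snapshot weights
    snapshot_full_grad: full-dataset gradient at the snapshot weights

    alpha = 1 recovers standard SVRG; alpha = 0 recovers the plain baseline.
    """
    return grad - alpha * (snapshot_grad - snapshot_full_grad)
```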
**Train loss of ConvNeXt-Femto on small datasets**
| | CIFAR-100 | Pets | Flowers | STL-10 | Food-101 | DTD | SVHN | EuroSAT |
|---|---|---|---|---|---|---|---|---|
| baseline | 2.66 | 2.20 | 2.40 | 1.64 | 2.45 | 1.98 | 1.59 | 1.25 |
| SVRG | 2.94 | 3.42 | 2.26 | 1.90 | 3.03 | 2.01 | 1.64 | 1.25 |
| α-SVRG | 2.62 | 1.96 | 2.16 | 1.57 | 2.42 | 1.83 | 1.57 | 1.23 |
**Validation accuracy of ConvNeXt-Femto on small datasets**
| | CIFAR-100 | Pets | Flowers | STL-10 | Food-101 | DTD | SVHN | EuroSAT |
|---|---|---|---|---|---|---|---|---|
| baseline | 81.0 | 72.8 | 80.8 | 82.3 | 85.9 | 57.9 | 94.9 | 98.1 |
| SVRG | 78.2 | 17.6 | 82.6 | 65.1 | 79.6 | 57.8 | 95.7 | 97.9 |
| α-SVRG | 81.4 | 77.8 | 83.3 | 84.0 | 85.9 | 61.8 | 95.8 | 98.2 |
**Train loss on ImageNet-1K**
| | ConvNeXt-F | ViT-T | Swin-F | Mixer-S | ViT-B | ConvNeXt-B |
|---|---|---|---|---|---|---|
| baseline | 3.487 | 3.443 | 3.427 | 3.635 | 2.817 | 2.644 |
| SVRG | 3.505 | 3.431 | 3.389 | 3.776 | 2.309 | 3.113 |
| α-SVRG | 3.467 | 3.415 | 3.392 | 3.609 | 2.806 | 2.642 |
**Validation accuracy on ImageNet-1K**
| | ConvNeXt-F | ViT-T | Swin-F | Mixer-S | ViT-B | ConvNeXt-B |
|---|---|---|---|---|---|---|
| baseline | 76.0 | 73.9 | 74.3 | 71.0 | 81.6 | 83.7 |
| SVRG | 75.7 | 74.3 | 74.3 | 68.8 | 78.0 | 80.8 |
| α-SVRG | 76.3 | 74.2 | 74.8 | 70.5 | 81.6 | 83.1 |
Please check INSTALL.md for installation instructions.
We list commands for `convnext_femto` and `vit_base` with coefficient `0.75`.
- For training other models, change `--model` accordingly, e.g., to `vit_tiny`, `convnext_base`, `vit_base`.
- For using different coefficients, change `--coefficient` accordingly, e.g., to `1`, `0.5`; a sketch of the linear schedule follows this list.
- `--use_cache_svrg` can be enabled on smaller models given sufficient memory and should be disabled on larger models.
- Our results of smaller models on ImageNet-1K were produced with 4 nodes, each with 8 gpus. Our results of larger models on ImageNet-1K were produced with 8 nodes, each with 8 gpus. Our results of ConvNeXt-Femto on small datasets were produced with 8 gpus.
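For reference, here is a sketch of what a linear coefficient schedule could look like, assuming the coefficient decays linearly from its initial value toward 0 over training; the function name and decay endpoints are assumptions, so check `main.py` for the schedule the code actually implements.

```python
def linear_coefficient(alpha0: float, epoch: int, total_epochs: int) -> float:
    # Hypothetical linear decay: start at alpha0 (e.g., --coefficient 0.75)
    # and move linearly toward 0 by the end of training.
    return alpha0 * (1.0 - epoch / total_epochs)

# With --coefficient 0.75 and --epochs 300:
# epoch 0 -> 0.75, epoch 150 -> 0.375, epoch 299 -> 0.0025
```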
Below we give example commands for smaller and larger models on ImageNet-1K, as well as for ConvNeXt-Femto on small datasets.
**Smaller models**
```bash
python run_with_submitit.py --nodes 4 --ngpus 8 \
--model convnext_femto --epochs 300 \
--batch_size 128 --lr 4e-3 \
--use_svrg true --coefficient 0.75 --svrg_schedule linear --use_cache_svrg true \
--data_path /path/to/data/ --data_set IMNET \
--output_dir /path/to/results/
```
**Larger models**
```bash
python run_with_submitit.py --nodes 8 --ngpus 8 \
--model vit_base --epochs 300 \
--batch_size 64 --lr 4e-3 \
--use_svrg true --coefficient 0.75 --svrg_schedule linear \
--data_path /path/to/data/ --data_set IMNET \
--output_dir /path/to/results/
```
**ConvNeXt-Femto on small datasets**
- Fill in `epochs`, `warmup_epochs`, and `batch_size` based on `data_set`.
- Note that `batch_size` is the batch size for each gpu; see the sketch below for the effective global batch size.
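Since `batch_size` is per gpu, the effective global batch size is the per-gpu batch size multiplied by the total number of gpus. A quick sanity check of the arithmetic (variable names below are illustrative):

```python
# Effective global batch size = per-gpu batch size * gpus per node * nodes.
# For the smaller-models ImageNet-1K command above: 128 * 8 * 4 = 4096.
# For this single-node command, it is $batch_size * 8.
per_gpu_batch, gpus_per_node, nodes = 128, 8, 4
print(per_gpu_batch * gpus_per_node * nodes)  # 4096
```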
```bash
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model convnext_femto --epochs $epochs --warmup_epochs $warmup_epochs \
--batch_size $batch_size --lr 4e-3 \
--use_svrg true --coefficient 0.75 --svrg_schedule linear --use_cache_svrg true \
--data_path /path/to/data/ --data_set $data_set \
--output_dir /path/to/results/
```
**Single-GPU evaluation**
```bash
python main.py --model convnext_femto --eval true \
--resume /path/to/model \
--data_path /path/to/data
```
**Multi-GPU evaluation**
```bash
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model convnext_femto --eval true \
--resume /path/to/model \
--data_path /path/to/data
```
This repository is built using the `timm` library and the ConvNeXt codebase.
If you find this repository helpful, please consider citing:
```bibtex
@inproceedings{yin2023coefficient,
  title={A Coefficient Makes SVRG Effective},
  author={Yida Yin and Zhiqiu Xu and Zhiyuan Li and Trevor Darrell and Zhuang Liu},
  booktitle={ICLR},
  year={2025},
}
```