Official PyTorch implementation of
A Coefficient Makes SVRG Effective
Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu
UC Berkeley, University of Pennsylvania, Toyota Technological Institute at Chicago, and Meta AI Research
[Paper] [Video] [Project page]
We introduce α-SVRG: SVRG with a coefficient α that scales its variance reduction term. With a suitable, scheduled coefficient, SVRG becomes effective for training modern neural networks, consistently reducing train loss compared to the baseline while maintaining or improving validation accuracy.
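At a high level, α-SVRG scales SVRG's control variate by the coefficient α. Below is a minimal sketch of the resulting gradient estimate, assuming the standard SVRG snapshot formulation; the function and argument names are illustrative, not this repository's API.

```python
import torch

def alpha_svrg_gradient(grad: torch.Tensor,
                        snapshot_grad: torch.Tensor,
                        snapshot_full_grad: torch.Tensor,
                        alpha: float) -> torch.Tensor:
    """Sketch of a variance-reduced gradient estimate with a coefficient.

    grad:               mini-batch gradient at the current weights
    snapshot_grad:      gradient of the same mini-batch at the snapshot weights
    snapshot_full_grad: full-dataset gradient at the snapshot weights

    alpha = 1 recovers standard SVRG; alpha = 0 recovers the plain baseline.
    """
    return grad - alpha * (snapshot_grad - snapshot_full_grad)
```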
**Train loss of ConvNeXt-Femto on small datasets**
| | CIFAR-100 | Pets | Flowers | STL-10 | Food-101 | DTD | SVHN | EuroSAT |
|---|---|---|---|---|---|---|---|---|
| baseline | 2.66 | 2.20 | 2.40 | 1.64 | 2.45 | 1.98 | 1.59 | 1.25 |
| SVRG | 2.94 | 3.42 | 2.26 | 1.90 | 3.03 | 2.01 | 1.64 | 1.25 |
| α-SVRG | 2.62 | 1.96 | 2.16 | 1.57 | 2.42 | 1.83 | 1.57 | 1.23 |
**Validation accuracy of ConvNeXt-Femto on small datasets**
| | CIFAR-100 | Pets | Flowers | STL-10 | Food-101 | DTD | SVHN | EuroSAT |
|---|---|---|---|---|---|---|---|---|
| baseline | 81.0 | 72.8 | 80.8 | 82.3 | 85.9 | 57.9 | 94.9 | 98.1 |
| SVRG | 78.2 | 17.6 | 82.6 | 65.1 | 79.6 | 57.8 | 95.7 | 97.9 |
| α-SVRG | 81.4 | 77.8 | 83.3 | 84.0 | 85.9 | 61.8 | 95.8 | 98.2 |
**Train loss on ImageNet-1K**
| | ConvNeXt-F | ViT-T | Swin-F | Mixer-S | ViT-B | ConvNeXt-B |
|---|---|---|---|---|---|---|
| baseline | 3.487 | 3.443 | 3.427 | 3.635 | 2.817 | 2.644 |
| SVRG | 3.505 | 3.431 | 3.389 | 3.776 | 2.309 | 3.113 |
| α-SVRG | 3.467 | 3.415 | 3.392 | 3.609 | 2.806 | 2.642 |
**Validation accuracy on ImageNet-1K**
| | ConvNeXt-F | ViT-T | Swin-F | Mixer-S | ViT-B | ConvNeXt-B |
|---|---|---|---|---|---|---|
| baseline | 76.0 | 73.9 | 74.3 | 71.0 | 81.6 | 83.7 |
| SVRG | 75.7 | 74.3 | 74.3 | 68.8 | 78.0 | 80.8 |
| α-SVRG | 76.3 | 74.2 | 74.8 | 70.5 | 81.6 | 83.1 |
Please check INSTALL.md for installation instructions.
We list commands for `convnext_femto` and `vit_base` with coefficient `0.75`.
- For training other models, change `--model` accordingly, e.g., to `vit_tiny`, `convnext_base`, `vit_base`.
- For using different coefficients, change `--coefficient` accordingly, e.g., to `1`, `0.5`; a sketch of the linear schedule follows this list.
- `--use_cache_svrg` can be enabled on smaller models given sufficient memory and should be disabled on larger models.
- Our results of smaller models on ImageNet-1K were produced with 4 nodes, each with 8 gpus. Our results of larger models on ImageNet-1K were produced with 8 nodes, each with 8 gpus. Our results of ConvNeXt-Femto on small datasets were produced with 8 gpus.
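For reference, here is a sketch of what a linear coefficient schedule could look like, assuming the coefficient decays linearly from its initial value toward 0 over training; the function name and decay endpoints are assumptions, so check `main.py` for the schedule the code actually implements.

```python
def linear_coefficient(alpha0: float, epoch: int, total_epochs: int) -> float:
    # Hypothetical linear decay: start at alpha0 (e.g., --coefficient 0.75)
    # and move linearly toward 0 by the end of training.
    return alpha0 * (1.0 - epoch / total_epochs)

# With --coefficient 0.75 and --epochs 300:
# epoch 0 -> 0.75, epoch 150 -> 0.375, epoch 299 -> 0.0025
```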
Below we give example commands for smaller and larger models on ImageNet-1K, as well as for ConvNeXt-Femto on small datasets.
**Smaller models**
```bash
python run_with_submitit.py --nodes 4 --ngpus 8 \
--model convnext_femto --epochs 300 \
--batch_size 128 --lr 4e-3 \
--use_svrg true --coefficient 0.75 --svrg_schedule linear --use_cache_svrg true \
--data_path /path/to/data/ --data_set IMNET \
--output_dir /path/to/results/
```
**Larger models**
```bash
python run_with_submitit.py --nodes 8 --ngpus 8 \
--model vit_base --epochs 300 \
--batch_size 64 --lr 4e-3 \
--use_svrg true --coefficient 0.75 --svrg_schedule linear \
--data_path /path/to/data/ --data_set IMNET \
--output_dir /path/to/results/
```
**ConvNeXt-Femto on small datasets**
- Fill in `epochs`, `warmup_epochs`, and `batch_size` based on `data_set`.
- Note that `batch_size` is the batch size for each gpu; see the sketch below for the effective global batch size.
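Since `batch_size` is per gpu, the effective global batch size is the per-gpu batch size multiplied by the total number of gpus. A quick sanity check of the arithmetic (variable names below are illustrative):

```python
# Effective global batch size = per-gpu batch size * gpus per node * nodes.
# For the smaller-models ImageNet-1K command above: 128 * 8 * 4 = 4096.
# For this single-node command, it is $batch_size * 8.
per_gpu_batch, gpus_per_node, nodes = 128, 8, 4
print(per_gpu_batch * gpus_per_node * nodes)  # 4096
```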
```bash
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model convnext_femto --epochs $epochs --warmup_epochs $warmup_epochs \
--batch_size $batch_size --lr 4e-3 \
--use_svrg true --coefficient 0.75 --svrg_schedule linear --use_cache_svrg true \
--data_path /path/to/data/ --data_set $data_set \
--output_dir /path/to/results/
```
**Single-GPU evaluation**
```bash
python main.py --model convnext_femto --eval true \
--resume /path/to/model \
--data_path /path/to/data
```
**Multi-GPU evaluation**
```bash
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model convnext_femto --eval true \
--resume /path/to/model \
--data_path /path/to/data
```
This repository is built using the `timm` library and the ConvNeXt codebase.
If you find this repository helpful, please consider citing:
```bibtex
@inproceedings{yin2023coefficient,
  title={A Coefficient Makes SVRG Effective},
  author={Yida Yin and Zhiqiu Xu and Zhiyuan Li and Trevor Darrell and Zhuang Liu},
  booktitle={ICLR},
  year={2025},
}
```