- Introduction
- How LSRS Works
- Performance
- Getting Started
- Training
- Inference
- Evaluation
- Model Zoo
- File Description
- Citation
We propose Latent Scale Rejection Sampling (LSRS) to improve Visual Autoregressive (VAR) image generation. LSRS uses a lightweight scoring model to reject suboptimal token maps during inference, effectively fixing structural errors.
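The selection step described above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: `sample_token_map`, `select_best_candidate`, `num_candidates`, and the toy scoring function are hypothetical names, and the real scoring model is a small neural network rather than `sum`.

```python
import random
from typing import Callable, List, Sequence

TokenMap = List[int]  # flattened token indices for one scale (illustrative)


def sample_token_map(probs: Sequence[Sequence[float]], rng: random.Random) -> TokenMap:
    """Draw one token per position from per-position categorical distributions."""
    return [rng.choices(range(len(p)), weights=p, k=1)[0] for p in probs]


def select_best_candidate(
    probs: Sequence[Sequence[float]],
    score_fn: Callable[[TokenMap], float],
    num_candidates: int,
    rng: random.Random,
) -> TokenMap:
    """Sample several candidate token maps and keep the highest-scoring one.

    All other candidates are rejected, which is the idea the paragraph above
    describes; how the repository batches or scores candidates is not shown.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = [sample_token_map(probs, rng) for _ in range(num_candidates)]
    return max(candidates, key=score_fn)


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [[0.5, 0.5]] * 4   # 4 positions, 2 possible tokens each
    toy_score = sum            # stand-in for the scoring model (assumption)
    best = select_best_candidate(probs, toy_score, num_candidates=8, rng=rng)
    print(best, toy_score(best))
```

With `num_candidates=1` the function reduces to plain sampling, so the extra cost comes only from drawing and scoring the additional candidates.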
On VAR-d30, LSRS reduces the FID score from 1.95 to 1.78 with only 1% extra inference time, and to 1.66 with 15% extra inference time.
| Model | FID↓ | IS↑ | Precision↑ | Recall↑ | Param | Step | Time |
|---|---|---|---|---|---|---|---|
| VAR-d16 | 3.36 | 274.5 | 0.84 | 0.51 | 310M | 10 | 0.20 |
| + LSRS M=4 | 3.19 | 278.1 | 0.82 | 0.54 | 310M+4M | 10 | 0.21 |
| + LSRS M=128 | 2.97 | 276.4 | 0.81 | 0.55 | 310M+4M | 10 | 0.30 |
| VAR-d30 | 1.95 | 303.1 | 0.82 | 0.59 | 2.0B | 10 | 1.00 |
| + LSRS M=4 | 1.78 | 305.9 | 0.81 | 0.61 | 2.0B+4M | 10 | 1.01 |
| + LSRS M=128 | 1.66 | 298.9 | 0.80 | 0.63 | 2.0B+4M | 10 | 1.15 |
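To make the trade-off in the table easier to read, the short script below computes the relative FID reduction and the relative increase in the Time column for each LSRS setting. The numbers are copied from the table; the helper names `relative_change` and `summarize` are illustrative only.

```python
from typing import Dict, List, Tuple

# (FID, Time) pairs copied from the table above.
RESULTS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "VAR-d16": {"baseline": (3.36, 0.20), "M=4": (3.19, 0.21), "M=128": (2.97, 0.30)},
    "VAR-d30": {"baseline": (1.95, 1.00), "M=4": (1.78, 1.01), "M=128": (1.66, 1.15)},
}


def relative_change(new: float, old: float) -> float:
    """Percentage change of `new` with respect to `old`."""
    return (new - old) / old * 100.0


def summarize(results: Dict[str, Dict[str, Tuple[float, float]]]) -> List[Tuple[str, str, float, float]]:
    """Return (model, setting, FID reduction %, time increase %) rows."""
    rows = []
    for model, runs in results.items():
        base_fid, base_time = runs["baseline"]
        for setting, (fid, time) in runs.items():
            if setting == "baseline":
                continue
            fid_reduction = -relative_change(fid, base_fid)
            time_increase = relative_change(time, base_time)
            rows.append((model, setting, round(fid_reduction, 1), round(time_increase, 1)))
    return rows


if __name__ == "__main__":
    for model, setting, fid_red, time_inc in summarize(RESULTS):
        print(f"{model} {setting}: FID -{fid_red}%, time +{time_inc}%")
```

For VAR-d30 this reproduces the 1% and 15% overheads quoted above, which correspond to FID reductions of roughly 8.7% and 14.9%.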
conda create -n lsrs python=3.8 -y
conda activate lsrs
pip install -r requirements.txt
Download pre-trained VAE and VAR weights from the official VAR repository.
var_ckpt_folder/
├── var_d16.pth
├── var_d20.pth
├── var_d24.pth
└── var_d30.pth
Modify configs.py as needed, especially:
- vae_ckpt
- var_ckpt_folder
- data_path
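Before launching the scripts, it can help to confirm that these three settings point to real files. The helper below is a hypothetical sanity check, not part of the repository: it assumes `vae_ckpt` is a single file, `var_ckpt_folder` follows the `var_d{depth}.pth` layout shown above, and `data_path` is any existing file or directory.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing_paths(
    vae_ckpt: str,
    var_ckpt_folder: str,
    data_path: str,
    depths: Iterable[int] = (16, 20, 24, 30),
) -> List[str]:
    """Return every expected path that does not exist on disk."""
    missing = []
    if not Path(vae_ckpt).is_file():
        missing.append(str(vae_ckpt))
    folder = Path(var_ckpt_folder)
    for depth in depths:
        ckpt = folder / f"var_d{depth}.pth"  # naming taken from the folder listing
        if not ckpt.is_file():
            missing.append(str(ckpt))
    if not Path(data_path).exists():
        missing.append(str(data_path))
    return missing
```

Calling it with the values set in configs.py and printing the result gives a quick list of anything still to download; pass a shorter `depths` tuple if only one VAR depth is needed.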
Convert ImageNet training set:
CUDA_VISIBLE_DEVICES=0 python data_imagenet.py
Sample data using VAR:
CUDA_VISIBLE_DEVICES=0 python data_var.py --batch_size 25 --model_depth 16
Train the scoring model:
CUDA_VISIBLE_DEVICES=0 python train_score.py --d 30
Original VAR:
CUDA_VISIBLE_DEVICES=0 python sample_lsrs.py -b 25 -d 30
With LSRS:
CUDA_VISIBLE_DEVICES=0 python sample_lsrs.py -b 25 -d 30 -u -s ./scoring_model.pth --st 1 --ed 2 --mk 32
⚠️ Note: Scale indices range from 0 to 9, corresponding to 1 to 10 in the paper.
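Because the command line is 0-based while the paper is 1-based, an off-by-one error is easy to make. The sketch below converts paper scale numbers to indices and assembles the LSRS command shown above. It assumes that `--st` and `--ed` take the start and end scale indices, which the flag names suggest but this README does not spell out, and it makes no claim about whether the end index is inclusive. `paper_scale_to_index` and `build_lsrs_command` are hypothetical helpers.

```python
import shlex
from typing import List

NUM_SCALES = 10  # scales 1..10 in the paper, indices 0..9 on the command line


def paper_scale_to_index(scale: int) -> int:
    """Convert a 1-based scale number from the paper to a 0-based index."""
    if not 1 <= scale <= NUM_SCALES:
        raise ValueError(f"scale must be in 1..{NUM_SCALES}, got {scale}")
    return scale - 1


def build_lsrs_command(
    start_scale: int,
    end_scale: int,
    scoring_model: str = "./scoring_model.pth",
    batch_size: int = 25,
    depth: int = 30,
    mk: int = 32,
) -> List[str]:
    """Assemble the sample_lsrs.py arguments from paper-style scale numbers."""
    st = paper_scale_to_index(start_scale)  # assumed meaning of --st
    ed = paper_scale_to_index(end_scale)    # assumed meaning of --ed
    return [
        "python", "sample_lsrs.py",
        "-b", str(batch_size), "-d", str(depth),
        "-u", "-s", scoring_model,
        "--st", str(st), "--ed", str(ed), "--mk", str(mk),
    ]


if __name__ == "__main__":
    # Paper scales 2 and 3 map to indices 1 and 2, as in the command above.
    print(shlex.join(build_lsrs_command(2, 3)))
```

The resulting list can be passed to `subprocess.run` with `CUDA_VISIBLE_DEVICES` set in the environment.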
The generated .npz files are saved in path_lsrs_sample_output. Use OpenAI's FID evaluation toolkit with the 256×256 reference to evaluate FID, IS, precision, and recall.
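A quick way to inspect a generated archive before running the full evaluation is to list the arrays it contains. An .npz file is a zip archive of .npy files, so the standard library is enough. The check below assumes the samples are stored under NumPy's default key `arr_0`, which is the layout the evaluation toolkit is commonly used with; whether sample_lsrs.py writes that key is an assumption, and the function names are illustrative.

```python
import zipfile
from typing import List


def list_npz_arrays(npz_path: str) -> List[str]:
    """Return the array names stored in an .npz archive (a zip of .npy files)."""
    with zipfile.ZipFile(npz_path) as archive:
        return [
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        ]


def has_default_array(npz_path: str) -> bool:
    """Check for the default positional key that numpy.savez assigns."""
    return "arr_0" in list_npz_arrays(npz_path)
```

If `has_default_array` returns False, the archive likely needs to be re-saved with the images as the first positional array before evaluation.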
Coming soon: pre-trained scoring models will be available for direct download.
| File | Description |
|---|---|
| configs.py | Configuration for paths and hyperparameters |
| data_imagenet.py | Convert ImageNet dataset to training format |
| data_var.py | Sample training data using VAR model |
| train_score.py | Training script with ranking loss |
| score_net.py | PyTorch implementation of scoring model |
| load_score_models.py | Load scoring models for inference |
| sample_lsrs.py | Sample 50,000 images for evaluation |
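The table mentions that train_score.py uses a ranking loss without giving its form. For readers unfamiliar with the idea, here is a generic pairwise logistic ranking loss in plain Python. It is a textbook formulation chosen for illustration, not necessarily the loss used in this repository, and how training pairs are built (for example, a token map encoded from a real image versus one sampled by VAR) is likewise an assumption.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(
    scores_preferred: Sequence[float],
    scores_other: Sequence[float],
) -> float:
    """Mean of -log(sigmoid(s_preferred - s_other)) over all pairs.

    The loss is small when each preferred item is scored well above its
    counterpart and grows when the order is reversed.
    """
    if len(scores_preferred) != len(scores_other) or not scores_preferred:
        raise ValueError("score lists must be non-empty and of equal length")
    total = sum(softplus(o - p) for p, o in zip(scores_preferred, scores_other))
    return total / len(scores_preferred)


if __name__ == "__main__":
    print(pairwise_ranking_loss([2.0, 1.5], [0.5, 0.0]))  # correctly ordered pairs
    print(pairwise_ranking_loss([0.5, 0.0], [2.0, 1.5]))  # reversed pairs, larger loss
```

A loss of this kind only constrains score differences, which is sufficient when the scores are used solely to pick the best of several candidates.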
| File | Modification |
|---|---|
| __init__.py | Added build_vae function for VQVAE construction |
| var.py | Added autoregressive_infer_cfg_idx and autoregressive_infer_cfg_lsrs classes |
| quant.py | Added get_next_autoregressive_input_lsrs, _get_best_candidate, idxBl_to_fhat |
Thanks to VAR for their wonderful work and codebase!
If you find this work useful, please cite:
@article{zheng2025lsrs,
title={LSRS: Latent Scale Rejection Sampling for Visual Autoregressive Modeling},
author={Zheng, Hong-Kai and Li, Piji},
journal={arXiv preprint arXiv:2512.03796},
year={2025}
}