Du Chen1,2* · Liyi Chen1* · Zhengqiang Zhang1,2 · Lei Zhang1,2†
1The Hong Kong Polytechnic University 2OPPO Research Institute
*Equal contribution †Corresponding author
- 2025-06-25: GSASR is accepted by ICCV 2025. Congratulations! We will update the final version soon.
- 2025-06-05: The online demo with the most powerful HATL-based GSASR is released; click to try it.
- 2025-05-30: The {EDSR, RDN, SWIN, HATL}-based GSASR models are available.
- 2025-01-16: The GSASR paper and project page are released.
This work presents GSASR, which achieves SoTA performance in arbitrary-scale super-resolution (ASR) by representing a given LR image with millions of continuous 2D Gaussians.
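For intuition, such a continuous representation can be queried at any output resolution by evaluating a weighted sum of 2D Gaussians. Below is a minimal, illustrative PyTorch sketch with made-up per-Gaussian parameters (mean, inverse covariance, color); it is not the paper's CUDA rasterizer, which is far more efficient.

```python
import torch

def render_gaussians(means, inv_covs, colors, height, width):
    """Naively splat N 2D Gaussians onto an H x W grid (illustrative only)."""
    ys = torch.linspace(0, 1, height)
    xs = torch.linspace(0, 1, width)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)   # (H, W, 2)
    d = grid[None] - means[:, None, None, :]                            # (N, H, W, 2)
    # Mahalanobis term d^T Sigma^{-1} d for every (Gaussian, pixel) pair
    m = torch.einsum("nhwi,nij,nhwj->nhw", d, inv_covs, d)
    w = torch.exp(-0.5 * m)                                             # (N, H, W)
    return torch.einsum("nhw,nc->hwc", w, colors)                       # (H, W, C)

# Render 3 random Gaussians at an arbitrary output resolution (the arbitrary-scale idea)
img = render_gaussians(torch.rand(3, 2),
                       torch.eye(2).repeat(3, 1, 1) * 400.0,
                       torch.rand(3, 3),
                       height=128, width=192)
print(img.shape)  # torch.Size([128, 192, 3])
```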
| Encoder Backbone | Methods | Version | Training Dataset | DIV2K | LSDIR | Urban100 |
|---|---|---|---|---|---|---|
| EDSR | LIIF | Paper | DIV2K | 30.43/0.8388/0.2662/0.1403 | 26.21/0.7614/0.2978/0.1678 | 26.14/0.7885/0.2271/0.1738 |
| EDSR | GaussianSR | Paper | DIV2K | 30.46/0.8389/0.2684/0.1406 | 26.23/0.7615/0.3007/0.1679 | 26.19/0.7893/0.2283/0.1730 |
| EDSR | CiaoSR | Paper | DIV2K | 30.67/0.8431/0.2585/0.1370 | 26.42/0.7681/0.2865/0.1631 | 26.69/0.8091/0.2078/0.1659 |
| EDSR | GSASR | Paper Reported | DIV2K | 30.89/0.8486/0.2518/0.1301 | 26.65/0.7774/0.2777/0.1554 | 27.01/0.8142/0.1987/0.1552 |
| EDSR | GSASR | Enhanced | DIV2K | 31.01/0.8509/0.2508/0.1306 | 26.78/0.7813/0.2962/0.1543 | 27.34/0.8230/0.1920/0.1515 |
| EDSR | GSASR | Enhanced | DF2K | 31.04/0.8515/0.2512/0.1307 | 26.82/0.7827/0.2751/0.1540 | 27.45/0.8256/0.1902/0.1507 |
| RDN | LIIF | Paper | DIV2K | 30.71/0.8449/0.2566/0.1354 | 26.48/0.7714/0.2838/0.1603 | 26.71/0.8055/0.2062/0.1562 |
| RDN | GaussianSR | Paper | DIV2K | 30.76/0.8457/0.2570/0.1347 | 26.53/0.7727/0.2837/0.1595 | 26.77/0.8064/0.2069/0.1610 |
| RDN | CiaoSR | Paper | DIV2K | 30.91/0.8481/0.2525/0.1327 | 26.66/0.7770/0.2768/0.1563 | 27.10/0.8142/0.1966/0.1559 |
| RDN | GSASR | Paper Reported | DIV2K | 30.96/0.8500/0.2505/0.1288 | 26.73/0.7801/0.2752/0.1533 | 27.15/0.8177/0.1953/0.1515 |
| RDN | GSASR | Enhanced | DIV2K | 31.03/0.8513/0.2499/0.1306 | 26.79/0.7819/0.2740/0.1543 | 27.37/0.8238/0.1898/0.1511 |
| RDN | GSASR | Enhanced | DF2K | 31.10/0.8525/0.2482/0.1296 | 26.88/0.7848/0.2709/0.1527 | 27.58/0.8289/0.1849/0.1500 |
| SWIN | CiaoSR | Paper | DIV2K | 31.05/0.8511/0.2487/0.1316 | 26.80/0.7812/0.2724/0.1552 | 27.40/0.8231/0.1869/0.1535 |
| SWIN | GSASR | Paper (not Reported) | DIV2K | 31.06/0.8521/0.2487/0.1270 | 26.84/0.7837/0.2719/0.1503 | 27.39/0.8247/0.1913/0.1466 |
| SWIN | GSASR | Enhanced | DIV2K | 31.10/0.8530/0.2463/0.1285 | 26.88/0.7849/0.2690/0.1517 | 27.55/0.8280/0.1850/0.1475 |
| SWIN | GSASR | Enhanced | DF2K | 31.17/0.8541/0.2456/0.1288 | 26.96/0.7876/0.2665/0.1513 | 27.81/0.8343/0.1781/0.1465 |
| HATL | GSASR | Ultra Performance | SA1B | 31.31/0.8570/0.2381/0.1268 | 27.17/0.7948/0.2548/0.1470 | 28.44/0.8493/0.1580/0.1394 |
Comparisons with representative/SoTA ASR models. Each cell reports PSNR/SSIM/LPIPS/DISTS at the x4 scaling factor; PSNR/SSIM are computed on the Y channel of the YCbCr space.
We provide three versions of GSASR:
- Paper: the results we reported in our paper. "(not Reported)" means the results are not shown in the paper due to limited pages.
- Enhanced: we introduce Rotary Position Embedding (RoPE) with Flash Attention, and adopt the Automatic Mixed Precision (AMP) strategy during training/inference to reduce memory and time cost (see the sketch after this list).
- Ultra Performance: based on the Enhanced settings, we explore the performance upper bound of GSASR by introducing the HAT-L encoder and the SA1B dataset.
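For reference, the Enhanced models rely on PyTorch 2.x features. The following is a minimal sketch of combining AMP autocast with torch's `scaled_dot_product_attention` (which dispatches to Flash Attention kernels when available); it only illustrates the ingredients and is not GSASR's actual attention module.

```python
import torch
import torch.nn.functional as F

# Dummy query/key/value tensors: (batch, heads, tokens, head_dim)
q = torch.randn(1, 8, 1024, 64, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# AMP autocast + fused SDPA (Flash Attention backend when supported).
# In the Enhanced models, RoPE would be applied to q/k before this call.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = F.scaled_dot_product_attention(q, k, v)

print(out.dtype, out.shape)
```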
| Model Backbone | Training Dataset | Download | Version |
|---|---|---|---|
| EDSR | DIV2K | Google Drive, Hugging Face | Enhanced |
| EDSR | DF2K | Google Drive, Hugging Face | Enhanced |
| RDN | DIV2K | Google Drive, Hugging Face | Enhanced |
| RDN | DF2K | Google Drive, Hugging Face | Enhanced |
| SWIN | DIV2K | Google Drive, Hugging Face | Enhanced |
| SWIN | DF2K | Google Drive, Hugging Face | Enhanced |
| HATL | SA1B | Google Drive, Hugging Face | Ultra Performance |
For the results reported in our paper, we do not use these tricks (AMP + RoPE + Flash Attention, or extra training datasets), to ensure a fair comparison.
As for the pretrained models reported in our paper, please refer to Pre-trained Models (Paper Version).
- PyTorch (version must be >= 2.0)
- Anaconda
- CUDA Toolkit (necessary)
Firstly, please make sure you have installed the CUDA Toolkit! Since we provide hand-crafted CUDA operators, you need to compile them before running GSASR.
git clone https://github.com/ChrisDud0257/GSASR
cd GSASR
conda create --name gsasr python=3.10
conda activate gsasr
export CUDA_HOME=${path_to_CUDA} ### specify the path to cuda-11.8
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
python setup_gscuda.py install # gscuda
cd TrainTestGSASR
pip install -r requirements.txt
BASICSR_EXT=True python setup_basicsr.py develop # basicsr

We have tested CUDA versions from 11.0 to 12.4; all of them work.
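After compilation, a quick sanity check may be helpful. The module name `gscuda` below follows the `# gscuda` comment in the build command above and is an assumption; adjust it if your build registers a different name.

```python
# Run inside the gsasr conda environment.
import torch
import gscuda  # assumed module name of the compiled CUDA rasterizer; ImportError means compilation failed

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```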
You need to authenticate with Hugging Face to download our model weights. Once authenticated, our code will handle the download automatically on your first run. You can authenticate by running:
# This will prompt you to enter your Hugging Face credentials.
huggingface-cli login
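Alternatively, you can authenticate from Python with the huggingface_hub library (a minimal sketch; create an access token at https://huggingface.co/settings/tokens):

```python
from huggingface_hub import login

# Prompts for your access token; you may also pass it directly, e.g. login(token="hf_xxx").
login()
```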
You can try GSASR easily by launching the Gradio demo or running it from the command line.

python demo_gr.py

python inference_enhenced.py \
--input_img_path <path_to_img> \
--save_sr_path <path_to_saved_folder> \
--model <{EDSR_DIV2K, EDSR_DF2K, RDN_DIV2K, RDN_DF2K, SWIN_DIV2K,SWIN_DF2K, HATL_SA1B}> \
--scale <scale> [--tile_process] [--AMP_test]

If it fails to access Hugging Face, try manually downloading the pretrained models and specify the local path with --model_path <path_to_model_weight>.
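If you prefer to download manually with the huggingface_hub Python API instead of the browser links above, a sketch would look like the following; the repo id and filename are placeholders, use the actual ones from the model table.

```python
from huggingface_hub import hf_hub_download

# <org_or_user>/<repo_name> and <model_file>.pth are placeholders for the actual weights.
ckpt_path = hf_hub_download(repo_id="<org_or_user>/<repo_name>", filename="<model_file>.pth")
print(ckpt_path)  # pass this path to --model_path
```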
Use --tile_process and --AMP_test if GPU memory is limited.
Please note that, in our paper, we only train GSASR on the DIV2K dataset, without the AMP + RoPE + Flash Attention tricks, for fair comparison. Due to the limited pages of the paper, we do not report the results of the Swin-based model there. Here, besides the {EDSR, RDN}-based GSASR presented in the paper, we further provide the Swin-based GSASR model. The {EDSR, RDN}-based GSASR models provided below should generate exactly the same results as those reported in our paper (Table 1 in the main paper and Tables 1-7 in the supplementary).
Download the models from the following links.
| Encoder Backbone | Training Dataset | Download | Version |
|---|---|---|---|
| EDSR | DIV2K | Google Drive, Hugging Face | Paper Reported |
| RDN | DIV2K | Google Drive, Hugging Face | Paper Reported |
| SWIN | DIV2K | Google Drive, Hugging Face | Paper (not Reported) |
If you have logged in to Hugging Face, directly execute inference_paper.py as follows.
python inference_paper.py \
--input_img_path <path_to_img> \
--save_sr_path <path_to_saved_folder> \
--model <{EDSR, RDN, SWIN}> \
--scale <scale> [--tile_process]

To reproduce the numerical results in Table 2 of the main paper, please download the cropped 720x720 GT images and the corresponding LR images of the DIV2K testing set, which are used in our paper.
The {EDSR, RDN}-based GSASR models provided below should generate exactly the same PSNR/SSIM/LPIPS/DISTS results as those reported in our paper (Table 2 in the main paper and Table 8 in the supplementary).
| Dataset | Download |
|---|---|
| DIV2K_GT720 | Google Drive |
If you want to crop the images by yourself, please follow this instruction to prepare the data that can be used to test the computational cost.
After you download them, please test by the following command.
python inference_paper_benchmark.py \
--input_img_path <path_to_LRx4_folder> \
--save_sr_path <path_to_saved_folder> \
--model <{EDSR, RDN, SWIN}> \
--scale 4 [--tile_process]

Please set "input_img_path" to your downloaded DIV2K testing set (provided by us).
If you want to test GSASR on the standard benchmarks at full size, please use the same commands as above. We also provide the widely used testing benchmarks, including Set5, Set14, DIV2K-val 100, LSDIR-val 250, Urban100, Manga109, BSDS100 and General100, together with each GT image's LR counterparts at different scaling factors, obtained by bicubic downsampling (a minimal downsampling sketch is given after the table below).
| Dataset | Link |
|---|---|
| Testing Benchmarks | Google Drive |
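For reference, here is a minimal sketch of generating an LR counterpart by bicubic downsampling (using PIL, with a hypothetical file path; the exact bicubic implementation used to produce our provided LR images may differ slightly, so prefer the provided files for exact reproduction).

```python
from PIL import Image

scale = 4
gt = Image.open("GT/0801.png")  # hypothetical GT image path
lr = gt.resize((gt.width // scale, gt.height // scale), resample=Image.BICUBIC)
lr.save("LRx4/0801.png")
```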
In inference_paper_benchmark.py, we integrate code to report the inference time (ms) and GPU memory (MB). In our paper, we measure the computational cost on a single NVIDIA A100 GPU, feeding the full-size image into the model without tile_process. The inference time excludes pre-processing and post-processing, and records the cost of the full pipeline, including the encoder, decoder and rendering.
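A minimal sketch of how such statistics can be collected with PyTorch is shown below (illustrative only; `model` and `lr_tensor` are hypothetical names, and the actual bookkeeping lives in the benchmark script).

```python
import time
import torch

torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()
start = time.time()

with torch.no_grad():
    sr = model(lr_tensor)  # hypothetical: full pipeline (encoder, decoder, rendering)

torch.cuda.synchronize()
print(f"time: {(time.time() - start) * 1000:.1f} ms")
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1024**2:.1f} MB")
```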
After inference, execute the following code to compute PSNR/SSIM/LPIPS/DISTS.
cd TrainTestGSASR/scripts/metrics/
python calculate_psnr_ssim.py --test_y_channel --gt <path_to_GT_folder> --restored <path_to_SR_folder> --scale <scale> [--suffix <suffix_of_images>]
python calculate_lpips.py --gt <path_to_GT_folder> --restored <path_to_SR_folder> --scale <scale> [--suffix <suffix_of_images>]
python calculate_dists.py --gt <path_to_GT_folder> --restored <path_to_SR_folder> --scale <scale> [--suffix <suffix_of_images>]

Please note that we compute PSNR/SSIM on the Y channel of the YCbCr space with --test_y_channel. When calculating PSNR/SSIM/LPIPS/DISTS, we set crop_border=${scale} if the scaling factor is not larger than 8, otherwise crop_border=8.
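To make the convention concrete, here is a minimal NumPy sketch of Y-channel PSNR with the crop_border rule described above (the provided scripts handle more cases, including SSIM, LPIPS and DISTS).

```python
import numpy as np

def rgb_to_y(img):
    """BT.601 luma from an RGB image with values in [0, 255]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(gt, sr, scale):
    crop = int(scale) if scale <= 8 else 8  # crop_border rule above (integer scales assumed in this sketch)
    gt_y = rgb_to_y(gt.astype(np.float64))[crop:-crop, crop:-crop]
    sr_y = rgb_to_y(sr.astype(np.float64))[crop:-crop, crop:-crop]
    mse = np.mean((gt_y - sr_y) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```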
Please follow this instruction to prepare the training and testing datasets.
Please follow this instruction to train GSASR.
Please follow this instruction to test GSASR if you want to evaluate it further.
This project is mainly built upon the excellent BasicSR, HAT and RoPE-ViT codebases. We sincerely appreciate their developers.
We sincerely thank Mr. Zhengqiang Zhang for his support with the CUDA operator for rasterization.
If you find this research helpful, please cite our paper.
@article{chen2025generalized,
title={Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution},
author={Chen, Du and Chen, Liyi and Zhang, Zhengqiang and Zhang, Lei},
journal={arXiv preprint arXiv:2501.06838},
year={2025}
}

If you have any questions or suggestions about this project, please contact me at [email protected].