Yu-Cheng Lin, Yu-Syuan Xu, Hao-Wei Chen, Hsien-Kai Kuo, Chun-Yi Lee
Abstract: Image restoration is a key task in low-level computer vision that aims to reconstruct high-quality images from degraded inputs. The emergence of Vision Mamba, which draws inspiration from the advanced state space model Mamba, marks a significant advancement in this field. Vision Mamba excels at modeling long-range dependencies with linear complexity, a crucial advantage for image restoration tasks. Despite these strengths, Vision Mamba faces challenges in low-level vision tasks, including computational complexity that scales with the number of scanning sequences, and local pixel forgetting. To address these limitations, this study introduces Efficient All-Around Mamba (EAMamba), an enhanced framework that incorporates a Multi-Head Selective Scan Module (MHSSM) with an all-around scanning mechanism. MHSSM efficiently aggregates multiple scanning sequences without increasing computational complexity or parameter count. The all-around scanning strategy employs multiple patterns to capture holistic information and resolves the local pixel forgetting issue. We validate these innovations experimentally across several restoration tasks, including super-resolution, denoising, deblurring, and dehazing. The results show that EAMamba achieves a significant 31-89% reduction in FLOPs while maintaining favorable performance compared to existing low-level Vision Mamba methods. The source code is available in this repository.
*Figure: Overall Framework of EAMamba*

*Figure: Multi-Head Selective Scan (MHSS)*
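As a concrete illustration of the core idea, below is a minimal PyTorch sketch of a multi-head all-around scan: the channels are split into heads, each head's feature map is flattened along a different scan direction, run through a shared 1D sequence mixer, restored to its spatial layout, and the heads are concatenated back together. This is not the official implementation: the head count, the four scan directions, and the Conv1d stand-in for the selective scan (S6) core are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiHeadAllAroundScan(nn.Module):
    """Illustrative sketch (not the official EAMamba code).

    Splits the channels into heads, flattens each head's feature map
    along a different scan direction, runs each sequence through a
    shared 1D mixer, then restores the spatial layout and concatenates
    the heads, so extra scan directions add no extra parameters.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        assert channels % num_heads == 0
        self.num_heads = num_heads
        head_dim = channels // num_heads
        # Stand-in for the selective scan (S6) core; the real model
        # would use a Mamba selective-scan kernel here.
        self.mixer = nn.Conv1d(head_dim, head_dim, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        outs = []
        for i, feat in enumerate(x.chunk(self.num_heads, dim=1)):
            d = i % 4  # pick a scan direction per head
            if d == 1:                              # reversed row-major
                feat = torch.flip(feat, dims=[3])
            elif d == 2:                            # column-major
                feat = feat.transpose(2, 3)
            elif d == 3:                            # reversed column-major
                feat = torch.flip(feat.transpose(2, 3), dims=[3])
            seq = self.mixer(feat.flatten(2))       # (b, head_dim, h*w)
            feat = seq.view(b, -1, *feat.shape[2:])
            # Undo the reordering so all heads align spatially again.
            if d == 1:
                feat = torch.flip(feat, dims=[3])
            elif d == 2:
                feat = feat.transpose(2, 3)
            elif d == 3:
                feat = torch.flip(feat, dims=[3]).transpose(2, 3)
            outs.append(feat)
        return torch.cat(outs, dim=1)

# Quick shape check:
x = torch.randn(1, 64, 32, 32)
assert MultiHeadAllAroundScan(64)(x).shape == x.shape
```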
We use Python 3.9.12 and PyTorch >= 1.13.1 with CUDA 11.7.
Installation of mamba-ssm (v2.2.2):

```bash
git clone https://github.com/state-spaces/mamba.git
cd mamba
git checkout 8ffd905
pip install .
```

Installation of required packages:

```bash
pip install -r requirements.txt
```

To train a model on a single GPU, run:

```bash
python3 train.py --config [MODEL_CONFIG] --name [OUTPUT_FOLDER_NAME]
```

| Parameter | Description |
|---|---|
| `MODEL_CONFIG` | Path to the model configuration file. |
| `OUTPUT_FOLDER_NAME` | Name of the output folder. |
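For example, a single-GPU training run might look like the following (the config path and run name are illustrative; use a configuration file that actually ships with the repository):

```bash
python3 train.py --config configs/denoise_sidd.yaml --name eamamba_sidd
```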
To train on multiple GPUs, run:

```bash
accelerate launch --config_file accelerate_[2]gpus.yaml --num_cpu_threads_per_process 32 train.py --config [MODEL_CONFIG] --name [OUTPUT_FOLDER_NAME]
```

| Parameter | Description |
|---|---|
| `accelerate_[2]gpus.yaml` | Configuration file for the number of GPUs. |
| `MODEL_CONFIG` | Path to the model configuration file. |
| `OUTPUT_FOLDER_NAME` | Name of the output folder. |
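For example, a four-GPU run might look like this (the config path and run name are again illustrative):

```bash
accelerate launch --config_file accelerate_4gpus.yaml --num_cpu_threads_per_process 32 train.py --config configs/denoise_sidd.yaml --name eamamba_sidd_4gpu
```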
Configuration files are currently provided for 2, 4, and 8 GPUs, but you can create your own for a different number of GPUs; a sketch of such a file follows below.
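This is a minimal sketch of what such an Accelerate configuration file can look like; the field values are illustrative assumptions, and you can generate a complete file interactively with `accelerate config`:

```yaml
# Illustrative Accelerate config for 4 GPUs on a single machine
# (values are assumptions; `accelerate config` generates a full file).
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 4  # one process per GPU
use_cpu: false
```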
To evaluate a model, run:

```bash
python3 test.py --model [MODEL_FILE] --dataset [DATASET] (--ensemble) (--save)
```

| Parameter | Description |
|---|---|
| `MODEL_FILE` | Path to the model file. |
| `DATASET` | Name of the dataset. |
| `--ensemble` | Use ensemble mode. (Optional) |
| `--save` | Save the output images. (Optional) |
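For example (the checkpoint path and dataset name are illustrative):

```bash
python3 test.py --model checkpoints/eamamba_sidd.pth --dataset SIDD --save
```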
Ensemble mode averages the outputs of the model over multiple rotated copies of the input image to improve performance; we did not use it in our experiments. A minimal sketch of the idea follows.
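This sketch is illustrative rather than the exact logic in test.py: it runs the model on the four 90-degree rotations of the input, rotates each output back, and averages.

```python
import torch

def rotation_ensemble(model, x):
    """Average model outputs over the four 90-degree rotations of x.

    Illustrative sketch of ensemble mode; not necessarily the exact
    implementation used in test.py.
    """
    outs = []
    for k in range(4):
        rotated = torch.rot90(x, k, dims=(2, 3))         # rotate the input
        out = model(rotated)                             # restore with the model
        outs.append(torch.rot90(out, -k, dims=(2, 3)))   # undo the rotation
    return torch.stack(outs).mean(dim=0)

# Example with a dummy identity "model":
y = rotation_ensemble(torch.nn.Identity(), torch.randn(1, 3, 64, 64))
```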
To visualize the effective receptive field (ERF) of a model, run:

```bash
python3 profiling/erf_viz.py --model [MODEL_FILE] (--noise)
```

| Parameter | Description |
|---|---|
| `MODEL_FILE` | Path to the model file. |
| `--noise` | Add noise to the input image. (Optional) |
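For example (the checkpoint path is illustrative):

```bash
python3 profiling/erf_viz.py --model checkpoints/eamamba_sidd.pth --noise
```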
Note that you need to manually change some variables in erf_viz.py for different datasets or noise levels.
To profile FLOPs and parameter counts, run:

```bash
python3 profiling/flops.py --config [CONFIG_FILE] (--var) (--simple)
```

| Parameter | Description |
|---|---|
| `CONFIG_FILE` | Path to the configuration file. |
| `--var` | Calculate FLOPs and parameters for increasing input sizes until OOM. (Optional) |
| `--simple` | Do not output the full model summary. (Optional) |
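For example (the config path is illustrative):

```bash
python3 profiling/flops.py --config configs/denoise_sidd.yaml --var --simple
```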
To generate the .mat results file for SIDD evaluation, run:

```bash
python3 task_evaluation/gen_sidd_mat.py --model [MODEL_FILE]
```

| Parameter | Description |
|---|---|
| `MODEL_FILE` | Path to the model file. |
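For example (the checkpoint path is illustrative):

```bash
python3 task_evaluation/gen_sidd_mat.py --model checkpoints/eamamba_sidd.pth
```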
To generate deblurred PNG results for evaluation, run:

```bash
python3 task_evaluation/gen_deblur_png.py --model [MODEL_FILE] --dataset [DATASET] --input_dir [INPUT_DIR] --result_dir [RESULT_DIR]
```

| Parameter | Description |
|---|---|
| `MODEL_FILE` | Path to the model file. |
| `DATASET` | Path to the GoPro dataset. |
| `INPUT_DIR` | Path to the input images. |
| `RESULT_DIR` | Path to save the results. |
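For example (all paths are illustrative):

```bash
python3 task_evaluation/gen_deblur_png.py --model checkpoints/eamamba_gopro.pth --dataset datasets/GoPro --input_dir datasets/GoPro/test/input --result_dir results/gopro
```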
The grayscale row shows the difference between the ground truth and the generated results, with enhanced contrast for clearer visualization. Click on the dropdown to see the results for each dataset.
Here are the quantitative results for each dataset. Click on the dropdown to see the results for each task.
**Real-world Super Resolution**
| Dataset | PSNR | SSIM |
|---|---|---|
| RealSR x2 | 34.18 | 0.927 |
| RealSR x3 | 31.11 | 0.872 |
| RealSR x4 | 29.60 | 0.835 |
**Synthetic Gaussian Color Denoising**
| Dataset | Sigma 15 (PSNR) | Sigma 25 (PSNR) | Sigma 50 (PSNR) |
|---|---|---|---|
| Urban100 | 35.10 | 32.93 | 30.01 |
| CBSD68 | 34.43 | 31.81 | 28.62 |
| Kodak24 | 35.36 | 32.95 | 29.91 |
| McMaster | 35.59 | 33.34 | 30.31 |
**Real-world Denoising**
| Dataset | PSNR | SSIM |
|---|---|---|
| SIDD | 39.87 | 0.960 |
**Motion Deblurring**
| Dataset | PSNR | SSIM |
|---|---|---|
| GoPro | 33.58 | 0.966 |
| HIDE | 31.42 | 0.944 |
**Synthetic Dehazing**
| Dataset | PSNR | SSIM |
|---|---|---|
| SOTS-indoor | 43.19 | 0.995 |
| SOTS-outdoor | 36.34 | 0.988 |
**Single-image Defocus Deblurring**
| Dataset | PSNR | SSIM |
|---|---|---|
| Indoor | 28.90 | 0.887 |
| Outdoor | 23.23 | 0.785 |
| Combined | 25.99 | 0.821 |