πŸš€ SMoE-Stereo (ICCV 2025 Highlight) πŸš€

[ICCV 2025 Highlight] 🌟 Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts (arXiv:2507.04631)

🌼 Abstract

Our SMoE-Stereo framework fuses Vision Foundation Models (VFMs) with a Selective-MoE design to unlock robust stereo matching at minimal computational cost. Its standout features 😄:

  • Our SMoE dynamically selects the most suitable experts for each input, adapting to varying input characteristics and generalizing seamlessly to diverse “in-the-wild” scenes and domain shifts.

  • Unlike existing stereo matching methods that push every input through the same rigid, sequential pipeline, SMoE-Stereo prioritizes computational resources by engaging only the most relevant MoE modules for simpler scenes. This adaptive design balances accuracy and speed according to the available compute budget (see the sketch after this list).

  • Remarkably, despite being trained exclusively on standard datasets (KITTI 2012/2015, Middlebury, and ETH3D training splits) without any additional data, SMoE-Stereo has achieved top ranking on the Robust Vision Challenge (RVC) leaderboards.
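
Below is a minimal sketch of the selective behavior described above (our illustration, not the authors' implementation): a router scores the experts per token and only the top-k are mixed, while a lightweight layer gate can down-weight or skip the whole MoE block for easy inputs. All module names and sizes here are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveMoE(nn.Module):
    # Illustrative sketch only; names and sizes are assumptions, not the repo's API.
    def __init__(self, dim=256, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
             for _ in range(num_experts)])
        self.router = nn.Linear(dim, num_experts)  # per-token expert scores
        self.layer_gate = nn.Linear(dim, 1)        # per-sample "run this block?" score
        self.top_k = top_k

    def forward(self, x):  # x: (batch, tokens, dim)
        gate = torch.sigmoid(self.layer_gate(x.mean(dim=1)))  # (batch, 1)
        scores = F.softmax(self.router(x), dim=-1)            # (batch, tokens, experts)
        topv, topi = scores.topk(self.top_k, dim=-1)          # keep top-k experts per token
        out = torch.zeros_like(x)
        # For clarity every expert runs densely here; a real implementation would
        # dispatch only the tokens actually routed to each expert.
        for e, expert in enumerate(self.experts):
            weight = (topv * (topi == e)).sum(dim=-1, keepdim=True)  # 0 if e not selected
            out = out + weight * expert(x)
        return x + gate.unsqueeze(-1) * out  # residual; the gate can suppress the block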

πŸ“ Zero-shot performance on Standard Stereo Benchmarks

[teaser figure]

πŸ‘€ Zero-shot on Adverse Weather Conditions and Enjoyable Inference Efficiency

[adverse-weather and inference-efficiency figures]

πŸ˜‡ Robust Vision Challenge (RVC) Benchmark

[RVC leaderboard figure]

πŸŽ‡ Parameter-efficient Finetuning Methods (PEFT) & VFM backbones

Exciting update! Our framework now supports the mainstream PEFT strategies for stereo matching, including our SMoE as well as LoRA, Adapter, and VPT variants (see the --peft_type choices below).

Additionally, the framework is compatible with multiple leading vision foundation models (VFMs), including SAM, DINOv2, and Depth Anything V1/V2 (see the --vfm_type choices below).

All of these backbones can leverage our PEFT implementation for enhanced performance and flexibility. Choose whichever variants suit your needs!

The relevant command-line options:

parser.add_argument('--peft_type', default='smoe', choices=["lora", "smoe", "adapter", "tuning", "vpt", "ff"], type=str)
parser.add_argument('--vfm_type', default='damv2', choices=["sam", "dam", "damv2", "dinov2"], type=str)
parser.add_argument('--vfm_size', default='vitl', choices=["vitb", "vits", "vitl"], type=str)
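
For example, to train with SMoE modules on a DINOv2 ViT-B backbone (the training script name below is an assumption; substitute the repository's actual entry point):

python train_stereo.py --peft_type smoe --vfm_type dinov2 --vfm_size vitb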

βœ… TODO List

  • Upload the ViT-Small weights of SMoE-Stereo.
  • Add the SMoE-IGEV backbone.
  • Add the KITTI demo.mp4.

😎 Our Framework

We use RAFT-Stereo as our backbone and replace its feature extractor with VFMs, while keeping the remaining structure unchanged.

[framework figure]
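
A rough sketch of the parameter setup this design implies (attribute names such as vfm are our assumptions, not the repository's actual module names): the VFM backbone stays frozen and only the inserted PEFT/MoE modules and the stereo head receive gradients.

import torch.nn as nn

def freeze_vfm_keep_peft(model: nn.Module) -> None:
    # Freeze the foundation-model backbone; train everything else plus any
    # lightweight PEFT modules (MoE/LoRA/adapters) inserted into it.
    for name, param in model.named_parameters():
        in_backbone = name.startswith('vfm.')  # assumed attribute name
        is_peft = any(k in name for k in ('moe', 'lora', 'adapter'))
        param.requires_grad = (not in_backbone) or is_peft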

πŸ’ͺ Flexible Selective Property

Our MoE modules and the experts within each MoE layer can be selectively activated to adapt to different input characteristics, facilitating scene-specific adaptation and enabling robust stereo matching across diverse real-world scenarios.

[framework figure]

βš™οΈ Installation

  • NVIDIA RTX A6000
  • Python 3.8.13

⏳ Create a virtual environment and activate it.

conda create -n smoestereo python=3.8
conda activate smoestereo

🎬 Dependencies

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install tqdm
pip install scipy
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib 
pip install timm==0.5.4
pip install thop
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.0/index.html  # match the torch 2.0.x install above
pip install accelerate==1.0.1
pip install gradio_imageslider
pip install gradio==4.29.0
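
A quick sanity check that the CUDA build of PyTorch was installed correctly (optional):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"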

✏️ Required Data

To train or evaluate SMoE-Stereo, download the datasets referenced in the Evaluation section below (Scene Flow, KITTI 2012/2015, Middlebury, ETH3D, Virtual KITTI 2, and DrivingStereo).

✈️ Model weights

Model                                 Link
sceneflow                             Google Drive
RVC (mix of all training datasets)    Google Drive

The RVC (mix-all) model is trained on all of the datasets mentioned above and gives the best zero-shot generalization. Place the downloaded weights in the ckpt folder.
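
A possible layout (file names taken from the --resume flags used below; note those commands read from ./pretrained/, so adjust either the folder or the flag so they match):

mkdir -p ckpt
# e.g. ckpt/damv2_sceneflow.pth
#      ckpt/SMoEStereo_RVC.pth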

✈️ Evaluation

To evaluate the zero-shot performance of SMoE-Stereo on Scene Flow, KITTI, ETH3D, vkitti, DrivingStereo, or Middlebury, run

python evaluate_stereo.py --resume ./pretrained/damv2_sceneflow.pth --eval_dataset <eth3d | kitti | middlebury | robust_weather | robust>

or use the model trained on all datasets, which is better for zero-shot generalization.

python evaluate_stereo.py --resume ./pretrained/SMoEStereo_RVC.pth --eval_dataset <eth3d | kitti | sceneflow | vkitti | driving>
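
For instance, to evaluate the RVC model zero-shot on KITTI:

python evaluate_stereo.py --resume ./pretrained/SMoEStereo_RVC.pth --eval_dataset kitti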

Citation

If you find our work useful in your research, please consider citing our paper:

@article{wang2025moe,
  title={Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts},
  author={Wang, Yun and Wang, Longguang and Zhang, Chenghao and Zhang, Yongjian and Zhang, Zhanjie and Ma, Ao and Fan, Chenyou and Lam, Tin Lun and Hu, Junjie},
  journal={arXiv preprint arXiv:2507.04631},
  year={2025}
}

Acknowledgements

This project is based on RAFT-Stereo and GMStereo. We thank the original authors for their excellent work.
