Our SMoE-Stereo framework fuses Vision Foundation Models (VFMs) with a Selective-MoE design to unlock robust stereo matching at minimal computational cost. Its standout features are:
- Our SMoE dynamically selects the most suitable experts for each input, adapting to varying input characteristics and thereby generalizing seamlessly to diverse "in-the-wild" scenes and domain shifts.
- Unlike existing stereo matching methods that apply the same rigid, sequential processing pipeline to every input, SMoE-Stereo allocates computation intelligently, engaging only the most relevant MoEs for simpler scenes. This adaptive architecture balances accuracy and processing speed according to the available resources.
- Remarkably, despite being trained exclusively on standard datasets (the KITTI 2012/2015, Middlebury, and ETH3D training splits) without any additional data, SMoE-Stereo achieves top ranking on the Robust Vision Challenge (RVC) leaderboards.
Exciting Update! Our framework now comprehensively supports mainstream PEFT strategies for stereo matching, including:
- Visual Prompt Tuning (ECCV 2022)
- LoRA (ICLR 2022)
- AdaptFormer (NeurIPS 2022)
- Adapter Tuning (ECCV 2024)
- LoRA MoE, Adapter MoE
- Our SMoE strategy
Additionally, the framework is compatible with multiple leading vision foundation models (VFMs), including SAM, DINOv2, Depth Anything (DAM), and Depth Anything V2 (DAMv2).
All of these models can leverage our PEFT implementation for enhanced performance and flexibility. Please choose the model variant you want!
Example arguments:
parser.add_argument('--peft_type', default='smoe', choices=["lora", "smoe", "adapter", "tuning", "vpt", "ff"], type=str)
parser.add_argument('--vfm_type', default='damv2', choices=["sam", "dam", "damv2", "dinov2"], type=str)
parser.add_argument('--vfm_size', default='vitl', choices=["vitb", "vits", "vitl"], type=str)
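For example, the RVC checkpoint could be evaluated with the SMoE adapter on a Depth Anything V2 ViT-L backbone using a command like the one below (shown for illustration, assuming these flags are exposed by evaluate_stereo.py as in the snippet above):

python evaluate_stereo.py --resume ./pretrained/SMoEStereo_RVC.pth --eval_dataset kitti --peft_type smoe --vfm_type damv2 --vfm_size vitl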
- Upload the ViT-small weights of SMoE-Stereo.
- Add the SMoE-IGEV backbone.
- Add the KITTI demo.mp4.
We use RAFT-Stereo as our backbone and replace its feature extractor with VFMs, while keeping the remaining structure unchanged.
Our MoE modules and the experts within each MoE layer can be selectively activated to adapt to different input characteristics, facilitating scene-specific adaptation and enabling robust stereo matching across diverse real-world scenarios.
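To make the routing idea concrete, here is a minimal, self-contained PyTorch sketch of a selective MoE adapter layer. It is an illustration under our own assumptions (the class names `SelectiveMoELayer` and `Expert`, the bottleneck expert design, and the mean-pooled routing descriptor are not taken from the released code): a lightweight router picks the top-k experts per input, and a layer-level gate can down-weight, and effectively skip, the MoE branch for easy scenes.

```python
# Illustrative sketch only; not the exact SMoE-Stereo implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A lightweight bottleneck adapter acting as a single expert."""

    def __init__(self, dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)


class SelectiveMoELayer(nn.Module):
    """Selects the top-k experts per input; a layer-level gate can
    down-weight (effectively skip) the MoE branch for easy inputs."""

    def __init__(self, dim, num_experts=4, top_k=1):
        super().__init__()
        self.experts = nn.ModuleList(Expert(dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)  # expert-level selection
        self.layer_gate = nn.Linear(dim, 2)        # layer-level keep/skip decision
        self.top_k = top_k

    def forward(self, x):  # x: (B, N, C) token features from the frozen VFM
        ctx = x.mean(dim=1)                                      # per-image descriptor, (B, C)
        keep_prob = F.softmax(self.layer_gate(ctx), dim=-1)[:, 1]  # (B,), chance of engaging this layer
        scores = F.softmax(self.router(ctx), dim=-1)             # (B, E) expert weights
        topk_w, topk_idx = scores.topk(self.top_k, dim=-1)       # keep only the top-k experts
        out = torch.zeros_like(x)
        for b in range(x.size(0)):
            for w, idx in zip(topk_w[b], topk_idx[b]):
                out[b] = out[b] + w * self.experts[int(idx)](x[b])
        # Residual connection: the gated MoE branch refines the frozen VFM features.
        return x + keep_prob.view(-1, 1, 1) * out


if __name__ == "__main__":
    layer = SelectiveMoELayer(dim=256, num_experts=4, top_k=1)
    tokens = torch.randn(2, 196, 256)  # two images, 196 tokens, 256 channels
    print(layer(tokens).shape)         # torch.Size([2, 196, 256])
```

In this sketch, only the routers and experts would be trainable while the VFM encoder stays frozen, which is what keeps the parameter-efficient fine-tuning cost small.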
- NVIDIA RTX A6000
- Python 3.8.13
conda create -n smoestereo python=3.8
conda activate smoestereo
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install tqdm
pip install scipy
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib
pip install timm==0.5.4
pip install thop
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.1/index.html
pip install accelerate==1.0.1
pip install gradio_imageslider
pip install gradio==4.29.0
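Optionally, you can verify that the CUDA-enabled PyTorch build is picked up (a quick sanity check, not part of the original setup steps):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"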
| Model | Link |
|---|---|
| sceneflow | Google Drive |
| RVC (mix of all training datasets) | Google Drive |
The mix_all model is trained on all of the datasets mentioned above and achieves the best zero-shot generalization. Place the downloaded model weights in the ckpt folder.
To evaluate the zero-shot performance of SMoE-Stereo on Scene Flow, KITTI, ETH3D, vkitti, DrivingStereo, or Middlebury, run
python evaluate_stereo.py --resume ./pretrained/damv2_sceneflow.pth --eval_dataset *(select one of ["eth3d", "kitti", "middlebury", "robust_weather", "robust"])

or use the model trained on all datasets, which is better for zero-shot generalization:
python evaluate_stereo.py --resume ./pretrained/SMoEStereo_RVC.pth --eval_dataset *(select one of ["eth3d", "kitti", "sceneflow", "vkitti", "driving"])

If you find our work useful in your research, please consider citing our paper:
@article{wang2025moe,
  title={Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts},
  author={Yun Wang and Longguang Wang and Chenghao Zhang and Yongjian Zhang and Zhanjie Zhang and Ao Ma and Chenyou Fan and Tin Lun Lam and Junjie Hu},
  journal={arXiv preprint arXiv:2507.04631},
  year={2025}
}

This project is based on RAFT-Stereo and GMStereo. We thank the original authors for their excellent work.