Ranran Huang · Krystian Mikolajczyk
SPFSplat simultaneously predicts 3D Gaussians and camera poses in a canonical space
from unposed sparse images, requiring no ground-truth poses during training or inference.
Also check out our follow-up work, SPFSplatV2, which improves SPFSplat's performance and training efficiency and further extends it to VGGT!
- Clone SPFSplat.
```bash
git clone https://github.com/ranrhuang/SPFSplat
cd SPFSplat
```
- Create the environment; here we show an example using conda.
```bash
conda create -n spfsplat python=3.11
conda activate spfsplat
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
```
- Optional: compile the CUDA kernels for RoPE (as in CroCo v2).
```bash
cd src/model/encoder/backbone/croco/curope/
python setup.py build_ext --inplace
cd ../../../../../..
```

Our models are hosted on Hugging Face 🤗
| Model name | Training resolutions | Training data | Training settings |
|---|---|---|---|
| re10k.ckpt | 256x256 | re10k | RE10K, 2 views |
| acid.ckpt | 256x256 | acid | ACID, 2 views |
| re10k_dl3dv.ckpt | 256x256 | re10k, dl3dv | RE10K + DL3DV, 2 views |
| re10k_10view.ckpt | 256x256 | re10k | RE10K, 10 views |
| re10k_nointrin.ckpt | 256x256 | re10k | RE10K, without intrinsics embedding, 2 views |
We assume the downloaded weights are located in the pretrained_weights directory.
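If you prefer to script the download, a sketch using huggingface_hub could look like the following. The repo id `ranrhuang/SPFSplat` and the filename are assumptions; check the Hugging Face page above for the actual values.

```python
# Hedged sketch: fetch a released checkpoint into pretrained_weights/.
# The repo id below is an assumption -- substitute the actual repo id
# from the Hugging Face model page.
from pathlib import Path

from huggingface_hub import hf_hub_download

def fetch_checkpoint(filename: str, out_dir: str = "pretrained_weights") -> Path:
    """Download one checkpoint (e.g. "re10k.ckpt") into out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    path = hf_hub_download(
        repo_id="ranrhuang/SPFSplat",  # assumed repo id
        filename=filename,
        local_dir=out_dir,
    )
    return Path(path)

if __name__ == "__main__":
    print(fetch_checkpoint("re10k.ckpt"))
```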
Please refer to DATASETS.md for dataset preparation.
- Download the MASt3R pretrained model and put it in the `./pretrained_weights` directory.
- Train with:
```bash
# 2 view
python -m src.main +experiment=spfsplat/re10k wandb.mode=online wandb.name=re10k
# For multi-view training, we suggest fine-tuning from a released model. Here we use 3 views as an example; adjust the batch size to your available GPU memory.
python -m src.main +experiment=spfsplat/re10k_3view wandb.mode=online wandb.name=re10k_3view checkpointing.load=./pretrained_weights/re10k.ckpt checkpointing.resume=false
# To run inference without known intrinsics, train the model with model.encoder.backbone.intrinsics_embed_loc='none'
python -m src.main +experiment=spfsplat/re10k wandb.mode=online wandb.name=re10k_nointrin model.encoder.backbone.intrinsics_embed_loc='none'
```

To evaluate novel view synthesis, run:

```bash
# RealEstate10K (enable test.align_pose=true if using evaluation-time pose alignment)
python -m src.main +experiment=spfsplat/re10k mode=test wandb.name=re10k \
dataset/view_sampler@dataset.re10k.view_sampler=evaluation \
dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json \
checkpointing.load=./pretrained_weights/re10k.ckpt \
test.save_image=true test.align_pose=false
# ACID (enable test.align_pose=true if using evaluation-time pose alignment)
python -m src.main +experiment=spfsplat/acid mode=test wandb.name=acid \
dataset/view_sampler@dataset.re10k.view_sampler=evaluation \
dataset.re10k.view_sampler.index_path=assets/evaluation_index_acid.json \
checkpointing.load=./pretrained_weights/acid.ckpt \
test.save_image=false test.align_pose=false
# Multi-view evaluation on RealEstate10K
python -m src.main +experiment=spfsplat/re10k mode=test wandb.name=re10k_10view \
dataset/view_sampler@dataset.re10k.view_sampler=evaluation \
dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json \
dataset.re10k.view_sampler.num_context_views=10 \
checkpointing.load=./pretrained_weights/re10k_10view.ckpt \
test.save_image=false test.align_pose=false
# RealEstate10K, evaluate on images without known intrinsics
python -m src.main +experiment=spfsplat/re10k mode=test wandb.name=re10k \
dataset/view_sampler@dataset.re10k.view_sampler=evaluation \
dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json \
checkpointing.load=./pretrained_weights/re10k_nointrin.ckpt \
model.encoder.backbone.intrinsics_embed_loc='none' \
model.encoder.estimating_focal=true \
test.save_image=true test.align_pose=false
# Evaluate on in-the-wild images, export .ply files, and render videos.
# If camera intrinsics are available, provide them in the code and use one of the checkpoints trained with intrinsics embedding instead.
python -m src.paper.validate_in_the_wild +experiment=spfsplat/re10k wandb.name=re10k_iphone \
model.encoder.backbone.intrinsics_embed_loc='none' \
model.encoder.estimating_focal=true \
mode="test" \
checkpointing.load=./pretrained_weights/re10k_nointrin.ckpt
```
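As a concrete (but hypothetical) illustration of "provide them in the code": if your camera's focal length is known in pixels, the normalized intrinsics matrix the pipeline expects (see the camera conventions below) can be built as follows. The helper name and example numbers are assumptions, not repo code; see validate_in_the_wild itself for where intrinsics enter.

```python
# Illustrative only: build a normalized pinhole intrinsics matrix for an
# in-the-wild image. Names and values here are assumptions.
import numpy as np

def normalized_intrinsics(fx: float, fy: float, width: int, height: int) -> np.ndarray:
    """First row divided by image width, second row by image height;
    principal point assumed at the image center."""
    return np.array([
        [fx / width, 0.0,         0.5],
        [0.0,        fy / height, 0.5],
        [0.0,        0.0,         1.0],
    ])

# e.g. a 1920x1440 photo captured with a ~1600 px focal length
print(normalized_intrinsics(1600.0, 1600.0, 1920, 1440))
```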
To evaluate pose estimation performance, run:

```bash
# RealEstate10K
python -m src.eval_pose +experiment=spfsplat/re10k +evaluation=eval_pose mode=test wandb.name=re10k \
checkpointing.load=./pretrained_weights/re10k.ckpt \
dataset/view_sampler@dataset.re10k.view_sampler=evaluation \
dataset.re10k.view_sampler.index_path=assets/evaluation_index_re10k.json
# ACID
python -m src.eval_pose +experiment=spfsplat/acid +evaluation=eval_pose mode=test wandb.name=acid \
checkpointing.load=./pretrained_weights/re10k.ckpt \
dataset/view_sampler@dataset.re10k.view_sampler=evaluation \
dataset.re10k.view_sampler.index_path=assets/evaluation_index_acid.json
```

Note that the commands above evaluate the model trained on RealEstate10K; you can replace the checkpoint path with any other trained model.
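For reference, relative rotation error is one standard way such pose accuracy is scored; the metrics actually reported by eval_pose are defined in src/eval_pose.py, so treat this as an illustrative sketch only.

```python
# Illustrative sketch of relative rotation error (degrees); not the
# repo's eval_pose implementation.
import numpy as np

def rotation_error_deg(R_pred: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic distance between two rotation matrices, in degrees."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    cos = np.clip(cos, -1.0, 1.0)  # guard against numerical drift
    return float(np.degrees(np.arccos(cos)))
```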
We follow the pixelSplat camera system. Camera intrinsic matrices are normalized: the first row is divided by the image width and the second row by the image height. Camera extrinsic matrices are OpenCV-style camera-to-world matrices (+X right, +Y down, +Z pointing into the screen).
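A minimal sketch of these conventions (assumed helper names, not repo code): unnormalizing an intrinsics matrix for a target resolution, and inverting an OpenCV-style camera-to-world matrix.

```python
# Hedged sketch of the camera conventions above; names are illustrative.
import numpy as np

def unnormalize_intrinsics(K_norm: np.ndarray, width: int, height: int) -> np.ndarray:
    """Undo the normalization: multiply the first row by image width and
    the second row by image height to recover pixel-unit intrinsics."""
    K = K_norm.astype(np.float64).copy()
    K[0, :] *= width
    K[1, :] *= height
    return K

def c2w_to_w2c(c2w: np.ndarray) -> np.ndarray:
    """Invert a rigid camera-to-world transform (+X right, +Y down,
    +Z into the screen, OpenCV convention)."""
    R, t = c2w[:3, :3], c2w[:3, 3]
    w2c = np.eye(4)
    w2c[:3, :3] = R.T
    w2c[:3, 3] = -R.T @ t
    return w2c
```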
This project is built upon these excellent repositories: NoPoSplat, pixelSplat, DUSt3R, and CroCo. We thank the original authors for their great work.
```bibtex
@article{huang2025spfsplat,
title={No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views},
author={Huang, Ranran and Mikolajczyk, Krystian},
journal={arXiv preprint arXiv:2508.01171},
year={2025}
}
```