POMATO: Marrying Pointmap Matching with Temporal Motions for Dynamic 3D Reconstruction

ArXiv

🤗 HuggingFace models

Songyan Zhang1*, Yongtao Ge2,3*, Jinyuan Tian2*, Hao Chen2†, Chen Lv1, Chunhua Shen2

1Nanyang Technological University, Singapore; 2Zhejiang University, China; 3The University of Adelaide, Australia

*Equal Contributions, †Corresponding Author

We present POMATO, a model that enables 3D reconstruction from arbitrary dynamic videos. Without relying on external modules, POMATO directly performs 3D reconstruction along with temporal 3D point tracking and dynamic mask estimation.

📰 News

  • [Apr 2025] Released the paper and initialized the GitHub repo.
  • [June 2025] POMATO was accepted to ICCV 2025 🎉🎉🎉!
  • [July 2025] Released pre-trained models (pairwise and temporal versions) on Hugging Face, together with the related inference code.

🔨 TODO LIST

  • Release the inference code and Hugging Face models.
  • Release the pose evaluation code.
  • Release the visualization and evaluation of 3D tracking.
  • Release the video depth evaluation.
  • Release the training code.

🚀 Getting Started

Installation

  1. Clone the repository and set up the environment:
git clone https://github.com/wyddmw/POMATO.git
cd POMATO
  2. Install dependencies: follow MonST3R to build the conda environment.
conda create -n pomato python=3.11 cmake=3.14.0
conda activate pomato 
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia  # use the correct version of cuda for your system
pip install -r requirements.txt
# Optional: you can also install additional packages for:
# - training
# - evaluation on camera pose
# - dataset preparation
pip install -r requirements_optional.txt
  3. Optional: install the 4D visualization tool, viser.
pip install -e viser
  4. Optional: compile the CUDA kernels for RoPE (as in CroCo v2).
# DUSt3R relies on RoPE positional embeddings, for which you can compile CUDA kernels for a faster runtime.
cd croco/models/curope/
python setup.py build_ext --inplace
cd ../../../
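
Before moving on, a quick sanity check (a minimal sketch, not part of the repo) can confirm that PyTorch was installed with working CUDA support:

# Minimal environment check: verify PyTorch and CUDA before running any scripts
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))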

Download Model Weights

Download the pre-trained POMATO model weights and place them under pretrained_models/.

  • POMATO Pairwise Model & POMATO Temporal Model: Available Here.
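
If you prefer to fetch the weights programmatically, the huggingface_hub package can be used as sketched below; the repo_id here is a placeholder (substitute the actual model repository from the link above), and the filename is taken from the inference example later in this README:

# Sketch: download a checkpoint with huggingface_hub (repo_id is a placeholder)
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="USER/POMATO",               # placeholder: use the actual HF repo ID
    filename="POMATO_temp_6frames.pth",  # filename as used in the inference example below
    local_dir="pretrained_models",
)
print(ckpt_path)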

Quick Demo

Play with the demo data for 3D reconstruction.

bash demo.sh

The estimated depth and dynamic masks are saved in ./recon_results. Check the visualization at http://0.0.0.0:8080.
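
The exact file layout under ./recon_results depends on demo.sh; assuming the depth maps and dynamic masks are saved as numpy arrays (an assumption, so check the script for the actual format and filenames), they can be inspected like this:

# Sketch: inspect saved demo outputs (filenames are hypothetical; check ./recon_results)
import numpy as np

depth = np.load("recon_results/depth_0000.npy")  # hypothetical filename
mask = np.load("recon_results/mask_0000.npy")    # hypothetical filename
print("depth:", depth.shape, depth.min(), depth.max())
print("mask:", mask.shape, mask.dtype)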

Fast 3D Reconstruction Inference

Use our pre-trained temporal models for temporally enhanced 3D reconstruction in a feed-forward manner. If the input video sequence is shorter than the target window of 12 frames, the last frame is repeated as padding, as sketched after the command below.

python inference_scripts/fast_recon_temp.py --model pretrained_models/SPECIFIC_PRETRAINED_MODEL --image_folder YOUR/IMAGE/FOLDER
# an inference example:
# python inference_scripts/fast_recon_temp.py --model pretrained_models/POMATO_temp_6frames.pth --image_folder assets/davis/
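
The padding described above amounts to repeating the final frame until the window reaches the target length; a minimal sketch of that logic (not the repo's actual implementation):

# Sketch: pad a short frame sequence by repeating the last frame
def pad_to_target(frames, target=12):
    # frames: list of images (e.g., numpy arrays); returned unchanged if long enough
    if len(frames) >= target:
        return frames
    return frames + [frames[-1]] * (target - len(frames))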

Pose Evaluation

Follow the download instructions to prepare the datasets first. Then construct the sampled data for the Bonn and TUM datasets.

python datasets_preprocess/prepare_bonn.py  # sample with interval 5.
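
Sampling with interval 5 simply keeps every fifth frame of each sequence; conceptually (a sketch, not the preprocessing script itself):

# Sketch: keep every 5th frame of a sorted sequence of image paths
sampled_paths = sorted(all_frame_paths)[::5]  # all_frame_paths: list of image paths (assumed)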

Run the evaluation script:

bash eval_scrips/eval_pose.sh

Tracking

Prepare the tracking validation data.

For ADT and PStudio, we cannot provide the processed data due to license restrictions on the original data. However, we provide scripts to process the original data.

First, follow the guidance of TAPVid-3D and download the minival dataset to data/.

For example, after setting up the appropriate environment, run:

python3 -m tapnet.tapvid3d.annotation_generation.generate_adt \
--output_dir [path to tapvid datasets] \
--split=minival

python3 -m tapnet.tapvid3d.annotation_generation.generate_pstudio \
--output_dir [path to tapvid datasets] \
--split=minival

Then prepare the validation data:

python datasets_preprocess/prepare_tracking_valid.py \
--input_path data/tapvid_datasets \
--output_path data/tracking_eval_data \
--config_path datasets_preprocess/configs/tracking_valid.json

For PointOdyssey, we provide the link to download the processed data: POMATO_Tracking.

huggingface-cli download xiaochui/POMATO_Tracking po_seq.zip  --local-dir ./data/tracking_eval_data

After downloading the data, unzip it to data/tracking_eval_data

cd data/tracking_eval_data 
unzip po_seq.zip

Modify the input path, output path, and weight path before running the script if needed. The outputs will be saved as xxx.npz files. (For more details, please refer to inference_scripts/tracking.sh.)

bash inference_scripts/tracking.sh
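
The resulting .npz files can be inspected with numpy to see which arrays the script wrote; the snippet below lists the stored keys dynamically rather than assuming their names:

# Sketch: list the arrays stored in a tracking output file
import numpy as np

out = np.load("data/tracking_eval_data/RESULT.npz")  # substitute an actual output file
for key in out.files:
    print(key, out[key].shape, out[key].dtype)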

Please note that during data generation for ADT and PStudio, the sampled data and their start-time configurations are consistent with those in the original paper. However, since the query points are sampled randomly, the results may differ slightly from those reported in the paper.

We provide the result of one random run for comparison with the results reported in the paper.

Result        | adt-12 | adt-24 | pstudio-12 | pstudio-24 | po-12 | po-24
In Paper      | 31.57  | 28.22  | 24.59      | 19.79      | 33.20 | 33.58
Random Result | 31.35  | 28.11  | 24.56      | 19.62      | 33.20 | 33.58

📌 Citation

If you find POMATO useful in your research or applications, please consider giving it a star ⭐ and citing it using the following BibTeX:

@article{zhang2025pomato,
  title={POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction},
  author={Zhang, Songyan and Ge, Yongtao and Tian, Jinyuan and Xu, Guangkai and Chen, Hao and Lv, Chen and Shen, Chunhua},
  journal={arXiv preprint arXiv:2504.05692},
  year={2025}
}

🙏 Acknowledgements

Our code is based on MonST3R, DUSt3R, and MASt3R. We appreciate the authors for their excellent work! We also thank the authors of GCD for their help with the ParallelDomain-4D dataset.
