GPS-Gaussian+: Generalizable Pixel-Wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views
Boyao Zhou1*, Shunyuan Zheng2*†, Hanzhang Tu1, Ruizhi Shao1, Boning Liu1, Shengping Zhang2✉, Liqiang Nie2, Yebin Liu1
1Tsinghua University 2Harbin Institute of Technology
*Equal contribution †Work done during an internship at Tsinghua University ✉Corresponding author
Project page · Paper · Dataset
We present GPS-Gaussian+, a generalizable 3D Gaussian Splatting approach for human-centered scene rendering from sparse views in a feed-forward manner.
To deploy and run GPS-Gaussian+, run the following scripts:
conda env create --file environment.yml
conda activate gps_plus
Then, compile diff-gaussian-rasterization from the 3DGS repository:
git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
cd gaussian-splatting/
pip install -e submodules/diff-gaussian-rasterization
cd ..
(Optional) For training with the geometry regularization, install pytorch3d for chamfer_distance; a sketch of this term is shown below. Otherwise, set if_chamfer = False in train.py.
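The regularization is based on pytorch3d's chamfer_distance between point sets. The snippet below is only a minimal sketch of how such a term can be toggled; the variable and function names (pred_points, gt_points, geometry_regularization) are illustrative and not necessarily those used in train.py.

```python
# Minimal sketch of an optional Chamfer regularization term.
# Names are illustrative, not necessarily those used in train.py.
import torch
from pytorch3d.loss import chamfer_distance

if_chamfer = True  # set False to skip the geometry regularization

def geometry_regularization(pred_points, gt_points, weight=0.1):
    """pred_points, gt_points: (B, N, 3) tensors of 3D points."""
    if not if_chamfer:
        return torch.zeros((), device=pred_points.device)
    loss, _ = chamfer_distance(pred_points, gt_points)
    return weight * loss
```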
- You can download our captured THumanMV dataset from OneDrive. We provide 15 sequences of human performance captured in a 10-camera setting. In our experiments, we split the 10 cameras into 3 work sets: (1,2,3,4) (4,5,6,7) (7,8,9,10); a sketch of this grouping is shown below.
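Adjacent work sets share one camera, so the 10 cameras yield 3 sets of 4. A small sketch of this grouping (a hypothetical helper, not part of the released code):

```python
# Sketch: split an arc of cameras into overlapping work sets of size n,
# reproducing the (1,2,3,4) (4,5,6,7) (7,8,9,10) grouping used for THumanMV.
def make_work_sets(camera_ids, n=4):
    sets, start = [], 0
    while start + n <= len(camera_ids):
        sets.append(tuple(camera_ids[start:start + n]))
        start += n - 1  # adjacent work sets share one camera
    return sets

print(make_work_sets(list(range(1, 11)), n=4))
# [(1, 2, 3, 4), (4, 5, 6, 7), (7, 8, 9, 10)]
```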
- We provide step_0rect.py for source view rectification and step_1.py for novel view processing. To prepare the data, set the correct paths for data_root (raw data) and processed_data_root (processed data) in step_0rect.py and step_1.py. Then you can run, for example:
cd data_process
python step_0rect.py -i s1a1 -t train
python step_1.py -i s1a1 -t train
python step_0rect.py -i s3a5 -t val
python step_1.py -i s3a5 -t val
python step_0rect.py -i s1a6 -t test
python step_1.py -i s1a6 -t test
cd ..
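If you need to process many sequences, a small driver script can loop over the two steps (a sketch, assumed to be run from data_process; the sequence names below are just the examples above, so adjust the lists to your own train/val/test split):

```python
# Sketch: batch-run the two processing steps over several sequences.
# Sequence names and splits are examples; adjust them to your own data.
import subprocess

splits = {
    "train": ["s1a1"],  # e.g. all 9 training sequences
    "val":   ["s3a5"],
    "test":  ["s1a6"],
}

for tag, sequences in splits.items():
    for seq in sequences:
        subprocess.run(["python", "step_0rect.py", "-i", seq, "-t", tag], check=True)
        subprocess.run(["python", "step_1.py", "-i", seq, "-t", tag], check=True)
```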
The processed dataset should be organized as follows:
processed_data_root
├── train/
│   ├── img/
│   │   ├── s1a1_s1_0000/
│   │   │   ├── 0.jpg
│   │   │   ├── 1.jpg
│   │   │   ├── 2.jpg
│   │   │   ├── 3.jpg
│   │   │   ├── 4.jpg
│   │   │   └── 5.jpg
│   │   └── ...
│   ├── mask/
│   │   ├── s1a1_s1_0000/
│   │   │   ├── 0.jpg
│   │   │   ├── 1.jpg
│   │   │   └── ...
│   │   └── ...
│   └── parameter/
│       ├── s1a1_s1_0000/
│       │   ├── 0_1.json
│       │   ├── 2_extrinsic.npy
│       │   ├── 2_intrinsic.npy
│       │   └── ...
│       └── ...
├── val/
│   ├── img/
│   ├── mask/
│   └── parameter/
└── test/
    └── s1a6_process/
        ├── img/
        ├── mask/
        └── parameter/
Note that 0.jpg and 1.jpg are the rectified input images, while 2.jpg-5.jpg are images for supervision or evaluation. In particular, 4.jpg and 5.jpg are the original (unrectified) images of views 0 and 1.
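To sanity-check a processed frame, you can load its camera parameters with numpy. This is only a minimal sketch; the path is an example and the contents of 0_1.json depend on the processing scripts, so only the .npy files are inspected here.

```python
# Sketch: inspect the camera parameters of one processed frame.
# The path below is an example; point it at your own processed_data_root.
import numpy as np

frame_dir = "processed_data_root/train/parameter/s1a1_s1_0000"
extrinsic = np.load(f"{frame_dir}/2_extrinsic.npy")  # extrinsic matrix of view 2
intrinsic = np.load(f"{frame_dir}/2_intrinsic.npy")  # intrinsic matrix of view 2

print("extrinsic shape:", extrinsic.shape)
print("intrinsic shape:", intrinsic.shape)
```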
We provide the pretrained checkpoint on OneDrive and 60 frames of processed data on OneDrive. You can directly put the downloaded data into /PATH/TO/processed_data_root/test/. Then modify local_data_root=/PATH/TO/processed_data_root/ in stage.yaml.
- For novel-view synthesis, set the checkpoint path in test.py and pick a target view from 2-3.
python test.py -i example_data -v 2
- For free-view rendering, set the checkpoint path and LOOP_NUM (the number of rendered frames per work set) in run_interpolation.py.
python run_interpolation.py -i example_data
You can check the results in experiments/gps_plus.
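Free-view rendering interpolates virtual cameras between the two source views of a work set; LOOP_NUM controls how many interpolated poses are rendered. The sketch below illustrates the underlying idea (rotation slerp via scipy, linear translation); it is not the exact code in run_interpolation.py.

```python
# Sketch: interpolate loop_num virtual camera poses between two source views.
# Illustrates the idea behind free-view rendering, not the repo's exact logic.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_poses(R0, t0, R1, t1, loop_num=30):
    """R0, R1: (3, 3) rotation matrices; t0, t1: (3,) translations."""
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix(np.stack([R0, R1])))
    poses = []
    for alpha in np.linspace(0.0, 1.0, loop_num):
        R = slerp(alpha).as_matrix()          # spherically interpolated rotation
        t = (1.0 - alpha) * t0 + alpha * t1   # linearly interpolated translation
        poses.append((R, t))
    return poses
```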
Once you have prepared the training data of all 9 sequences and at least one sequence as validation data, you can modify train_data_root and val_data_root in stage.yaml.
python train.py
If you would like to train our network with your own data, you can organize the dataset as above and set inverse_depth_init in stage.yaml. We use inverse_depth_init = 0.3 in our experiments since the largest depth of the scene is around 3.33 meters.
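In other words, inverse_depth_init is roughly the reciprocal of the maximum scene depth. A worked example, assuming depth is measured in meters:

```python
# Worked example: inverse_depth_init ≈ 1 / (largest scene depth in meters).
max_depth = 3.33                      # farthest point of the scene from the cameras
inverse_depth_init = 1.0 / max_depth  # ≈ 0.3, the value used in our experiments
print(round(inverse_depth_init, 2))   # 0.3
```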
- We assume that you have already calibrated the cameras using the images of the first frame with COLMAP and obtained the "sparse" folder. We provide an example custom_data.zip in this repo.
- Organize your sequential custom data into the following structure:
raw_custom_data_root
├── sparse/
│ ├── 0/
│ │ ├── cameras.bin
│ │ └── images.bin
├── frame0_cam0.jpg (e.g. 0000_00.jpg)
├── frame0_cam1.jpg
├── ...
├── frame0_camN.jpg
├── ......
├── ......
└── frameT_camN.jpg
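Before processing, you can double-check the COLMAP calibration by reading the sparse model back, e.g. with pycolmap (a sketch; pycolmap is not a dependency of this repo and the path below is an example):

```python
# Sketch: inspect the COLMAP sparse model with pycolmap (optional tool,
# not a dependency of this repo; the path is an example).
import pycolmap

rec = pycolmap.Reconstruction("raw_custom_data_root/sparse/0")
print("cameras:", len(rec.cameras), "registered images:", len(rec.images))
for image_id, image in rec.images.items():
    print(image_id, image.name)
```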
- You should determine the size (n) of the camera work set, i.e., how many cameras there are between the leftmost and the rightmost one. This number cannot be too large; otherwise, the rectification could collapse. Once you set /PATH/TO/custom_data, you can run the following:
cd data_process
python step_0rect_custom.py -t train -n 4
python step_1_custom.py -t train -n 4
python step_0rect_custom.py -t val -n 4
python step_1_custom.py -t val -n 4
- You can estimate the inverse_depth_init in stage.yaml by reading the output of the distance between the left-most and the right-most cameras. See here.
- Please verify that the corresponding pixels of 0.jpg and 1.jpg in the processed data are aligned on the same horizontal line (a verification sketch follows after the training command below). If that is not the case, the work set size (n) might be too large or the calibration in the sparse folder is not good.
- You can modify train_data_root and val_data_root in stage.yaml with your /PATH/TO/processed_custom_data and train our network from scratch.
python train.py
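As mentioned above, the rectification can be verified by checking that matched pixels in 0.jpg and 1.jpg share (almost) the same vertical coordinate. A minimal sketch using OpenCV ORB matching; the frame directory is an example path:

```python
# Sketch: check that corresponding pixels in the rectified pair 0.jpg / 1.jpg
# lie on (almost) the same horizontal line. The path below is an example.
import cv2
import numpy as np

frame_dir = "processed_custom_data_root/train/img/some_frame"  # example path
img0 = cv2.imread(f"{frame_dir}/0.jpg", cv2.IMREAD_GRAYSCALE)
img1 = cv2.imread(f"{frame_dir}/1.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(1000)
kp0, des0 = orb.detectAndCompute(img0, None)
kp1, des1 = orb.detectAndCompute(img1, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des0, des1)

# For a good rectification, the vertical offset of matched keypoints is near 0.
dy = [abs(kp0[m.queryIdx].pt[1] - kp1[m.trainIdx].pt[1]) for m in matches]
print("median |dy| in pixels:", np.median(dy))
```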
If the results are not good, you could modify inverse_depth_init in stage.yaml, or crop the processed images and modify the intrinsic parameters accordingly, as described here.
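Cropping an image shifts the principal point, so the intrinsics must be updated together with the crop. A minimal sketch of the adjustment, assuming a standard 3x3 pinhole intrinsic matrix (the helper itself is hypothetical and not part of the repo):

```python
# Sketch: crop an image and shift the principal point of a 3x3 pinhole
# intrinsic matrix accordingly (hypothetical helper, not part of the repo).
import numpy as np

def crop_image_and_intrinsics(image, K, x0, y0, width, height):
    """Crop `image` to [y0:y0+height, x0:x0+width] and update K to match."""
    cropped = image[y0:y0 + height, x0:x0 + width]
    K_new = K.copy()
    K_new[0, 2] -= x0  # cx shifts by the left offset of the crop
    K_new[1, 2] -= y0  # cy shifts by the top offset of the crop
    return cropped, K_new
```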
If you find the code or the data useful for your research, please consider citing:
@article{zhou2024gps,
title={GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views},
author={Zhou, Boyao and Zheng, Shunyuan and Tu, Hanzhang and Shao, Ruizhi and Liu, Boning and Zhang, Shengping and Nie, Liqiang and Liu, Yebin},
journal={arXiv preprint arXiv:2411.11363},
year={2024}
}