Wanhua Li*, Renping Zhou*, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang†, Hanspeter Pfister†
(* indicates equal contribution, † indicates co-corresponding author)
| Project page | Full Paper | Video |
| Dataset Annotations | Google Drive | Baidu Wangpan |
| Pretrained Model | Google Drive | Baidu Wangpan |
| Pregenerated Point Clouds by COLMAP | Google Drive | Baidu Wangpan |
This repository contains the official implementation of the paper "4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models" (CVPR 2025).
```bibtex
@inproceedings{li20254d,
  title={4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models},
  author={Li, Wanhua and Zhou, Renping and Zhou, Jiawei and Song, Yingwei and Herter, Johannes and Qin, Minghan and Huang, Gao and Pfister, Hanspeter},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={22001--22011},
  year={2025}
}
```

🎉 Our work is based on LangSplat, and we thank them for their contributions! LangSplat grounds CLIP features into a set of 3D language Gaussians, attaining precise 3D language fields while being 199× faster than LERF. [CVPR 2024] LangSplat
```bibtex
@inproceedings{qin2024langsplat,
  title={LangSplat: 3D Language Gaussian Splatting},
  author={Qin, Minghan and Li, Wanhua and Zhou, Jiawei and Wang, Haoqian and Pfister, Hanspeter},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={20051--20060},
  year={2024}
}
```

🎉 We have released LangSplat V2! The new version significantly improves performance, achieving 450+ FPS rendering speed. [NeurIPS 2025] LangSplat V2
```bibtex
@article{li2025langsplatv2,
  title={LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS},
  author={Li, Wanhua and Zhao, Yujie and Qin, Minghan and Liu, Yang and Cai, Yuanhao and Gan, Chuang and Pfister, Hanspeter},
  journal={arXiv preprint arXiv:2507.07136},
  year={2025}
}
```
The repository contains submodules; please check it out with:

```bash
git clone [email protected]:zrporz/4DLangSplat.git --recursive
```

4D LangSplat uses the following software versions:
- Python 3.10
- CUDA 12.4
- GCC 10.2.0
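Optionally, you can sanity-check your toolchain against these versions before installing anything. A minimal sketch, assuming only that `python`, `nvcc`, and `gcc` are on your PATH:

```python
import shutil
import subprocess

# Print each tool's --version banner so it can be compared with the
# versions listed above (Python 3.10, CUDA 12.4, GCC 10.2.0).
for tool in ("python", "nvcc", "gcc"):
    if shutil.which(tool) is None:
        print(f"{tool}: not found on PATH")
        continue
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    print(f"== {tool} ==\n{(result.stdout or result.stderr).strip()}\n")
```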
By default, run the following commands to install the required packages:
```bash
conda create -n 4DLangSplat python=3.10
conda activate 4DLangSplat
pip install -r requirements.txt
### submodules for gaussian rasterization ###
pip install -e submodules/simple-knn
pip install -e submodules/4d-langsplat-rasterization
### submodules for generating segmentation maps ###
pip install -e submodules/4d-langsplat-tracking-anything-with-deva
pip install git+https://github.com/facebookresearch/segment-anything.git
```

Our models are trained and evaluated on the HyperNeRF and Neu3D datasets. Please follow their instructions to prepare your dataset, or run the following commands:
```bash
bash scripts/download_hypernerf.sh data/hypernerf
bash scripts/download_neu3d.sh data/neu3d
```

To evaluate the rendering results, we use RoboFlow to annotate the datasets. The annotations can be accessed through this link: Download the Annotations.
Following 4DGaussians, we use COLMAP to generate the point clouds. Please follow their pipeline, or use ours: Download the Point Clouds.
Then put them under data/<hypernerf or neu3d>/<dataset name>. You need to ensure that the data folder is organized as follows:
```
|——data
|   |——hypernerf
|   |   |——americano
|   |   |   |——annotations
|   |   |   |   |——train
|   |   |   |   |   |——README
|   |   |   |   |   |——video_annotations.json
|   |   |   |——camera
|   |   |   |——rgb
|   |   |   |   |——1x
|   |   |   |   |   |——000001.png
|   |   |   |   |   ...
|   |   |   |   |——2x
|   |   |   |   ...
|   |   |   |——dataset.json
|   |   |   |——metadata.json
|   |   |   |——points.npy
|   |   |   |——scene.json
|   |   |   |——points3D_downsample2.ply
|   |   |——chickchicken
|   |   ...
|   |——neu3d
|   |   |——coffee_martini
|   |   |   |——annotations
|   |   |   |   |——train
|   |   |   |   |   |——README
|   |   |   |——cam00
|   |   |   |   |——images
|   |   |   |   |   |——0000.png
|   |   |   |   |   ...
|   |   |   |——cam01
|   |   |   ...
|   |   |   |——cam00.mp4
|   |   |   |——cam01.mp4
|   |   |   ...
|   |   |   |——poses_bounds.npy
|   |   |   |——points3D_downsample2.ply
|   |   |——cur_roasted_beef
|   |   ...
```
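To catch layout mistakes early, here is a minimal sketch that checks a HyperNeRF scene against the tree above (the scene path is an example; adjust the expected entries accordingly for Neu3D scenes):

```python
from pathlib import Path

# Expected entries for a HyperNeRF scene, mirroring the tree above.
EXPECTED = [
    "annotations/train/video_annotations.json",
    "camera",
    "rgb/1x",
    "dataset.json",
    "metadata.json",
    "points.npy",
    "scene.json",
    "points3D_downsample2.ply",
]

def check_scene(scene_dir: str) -> None:
    scene = Path(scene_dir)
    missing = [entry for entry in EXPECTED if not (scene / entry).exists()]
    if missing:
        print(f"{scene}: missing {missing}")
    else:
        print(f"{scene}: layout looks complete")

check_scene("data/hypernerf/americano")
```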
We provide pretrained checkpoints of the Gaussian model and the autoencoder: Download Pretrained Checkpoint.
For the HyperNeRF dataset, take americano as an example. Put the checkpoint folder under output/hypernerf/americano and run the following commands for rendering and evaluation:

```bash
bash scripts/render-hypernerf.sh
bash scripts/eval-hypernerf.sh
```

For the Neu3D dataset, take coffee_martini as an example. Put the checkpoint folder under output/neu3d/coffee_martini and run the following commands for rendering and evaluation:

```bash
bash scripts/render-neu3d.sh
bash scripts/eval-neu3d.sh
```

The evaluation results will be saved under eval/eval_results.
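The exact files written there depend on the evaluation scripts, so a quick way to see what a run produced is simply to enumerate the directory (a minimal sketch):

```python
from pathlib import Path

# List everything the evaluation step wrote under eval/eval_results.
for path in sorted(Path("eval/eval_results").rglob("*")):
    if path.is_file():
        print(path)
```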
First, execute the demo script to generate segmentation maps:

```bash
cd submodules/4d-langsplat-tracking-anything-with-deva
bash scripts/download_models.sh # Download the model parameters if you are a first-time user
bash scripts/demo-chickchicken.sh
```

The output segmentation maps will be saved in submodules/4d-langsplat-tracking-anything-with-deva/output.
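As a quick check that the demo ran, you can count the generated files. A minimal sketch; that the maps are serialized as PNG images is an assumption about the output format:

```python
from pathlib import Path

# Count image files under the DEVA output directory; the .png extension is
# an assumption about how the segmentation maps are saved.
out_dir = Path("submodules/4d-langsplat-tracking-anything-with-deva/output")
pngs = list(out_dir.rglob("*.png"))
print(f"{len(pngs)} PNG files under {out_dir}")
```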
Extract CLIP features:

```bash
bash scripts/extract_clip_features.sh
```

Generate video features:

```bash
bash scripts/generate-video-feature.sh
```

These commands will create two feature directories under your dataset path:

- clip_features: extracted by the CLIP model
- video_features: extracted by the E5 model
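To verify the extraction step, you can peek at the saved features. A minimal sketch; the scene path is an example, and that the features are stored as per-frame .npy arrays is an assumption about the scripts' output format:

```python
import numpy as np
from pathlib import Path

scene = Path("data/hypernerf/americano")  # example scene path
for name in ("clip_features", "video_features"):
    files = sorted((scene / name).rglob("*.npy"))  # .npy layout is assumed
    if not files:
        print(f"{name}: no .npy files found")
        continue
    arr = np.load(files[0])
    print(f"{name}: {len(files)} files; first array shape {arr.shape}, dtype {arr.dtype}")
```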
Run the training and evaluation script:

```bash
bash scripts/train_eval.sh
```

This will train the 4D LangSplat field and perform evaluation.
- release the code of the 4d-langsplat-rasterization
- release the code of the 4d-langsplat-tracking-anything-with-deva
- release the code of the evaluation
- release the code of the autoencoder
- release the code of preprocessing
- release the code of training
- release the pretrained model
- release the preprocessed dataset
- update the arXiv link