Robotic systems demand accurate and comprehensive 3D environment perception, requiring the simultaneous capture of photo-realistic appearance (optical), precise layout and shape (geometric), and open-vocabulary scene understanding (semantic). Existing methods typically fulfill these requirements only partially, exhibiting optical blurring, geometric irregularities, and semantic ambiguities. To address these challenges, we propose OmniMap, the first online mapping framework that simultaneously captures optical, geometric, and semantic scene attributes while maintaining real-time performance and model compactness. At the architectural level, OmniMap employs a tightly coupled 3DGS–Voxel hybrid representation that combines fine-grained modeling with structural stability. At the implementation level, it identifies key challenges across the different modalities and introduces several innovations: adaptive camera modeling for motion blur and exposure compensation, a hybrid incremental representation with normal constraints, and probabilistic fusion for robust instance-level understanding. Extensive experiments show OmniMap's superior performance in rendering fidelity, geometric accuracy, and zero-shot semantic segmentation compared to state-of-the-art methods across diverse scenes. The framework's versatility is further demonstrated through a variety of downstream applications, including multi-domain scene Q&A, interactive editing, perception-guided manipulation, and map-assisted navigation.
git clone https://github.com/omnimap123/anonymous_code.git
cd anonymous_code
Use conda to install the required environment. To avoid dependency issues, we recommend following the instructions below to set up the environment.
conda env create -f environment.yml
conda activate omnimap
Follow the instructions to install the YOLO-World model and download the pretrained YOLO-Worldv2-L (CLIP-Large) weights.
Follow the instructions to install the TAP model and download the pretrained weights here.
pip install -U sentence-transformers
Download the pretrained weights:
git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
Then update the path to this model in the configuration file under omnimap/config/.
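As a quick sanity check that the downloaded weights load correctly, the minimal Python sketch below loads the model from a local path with the sentence-transformers API; the path is a placeholder, and the actual key used by OmniMap lives in the configuration file under omnimap/config/.

# Minimal check (assumption: the config points to the locally cloned model directory)
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("./all-MiniLM-L6-v2")  # replace with the path set in omnimap/config/
embeddings = model.encode(["a chair next to the table"])
print(embeddings.shape)  # expected: (1, 384) for all-MiniLM-L6-v2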
OmniMap has been validated on Replica (the same sequences as vMAP) and ScanNet. Please download the following datasets.
- Replica Demo - Replica Room 0 only, for faster experimentation.
- Replica - All pre-generated Replica sequences.
- ScanNet - Official ScanNet sequences.
Run the following command to start incremental mapping.
# for replica
python demo.py --dataset replica --scene {scene} --vis_gui
# for scannet
python main.py --dataset scannet --scene {scene} --vis_gui
You can use --start {start_id} and --length {length} to specify the starting frame ID and the mapping duration, respectively. The --vis_gui flag controls online visualization; disabling it may improve processing speed.
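For example, to map a 500-frame segment of a Replica sequence starting at frame 0 without the GUI (the scene name room_0 is illustrative and depends on your dataset folder layout):

python demo.py --dataset replica --scene room_0 --start 0 --length 500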
After building the map, the results are saved in the folder outputs/{scene}, which contains the rendered outputs and evaluation metrics.
We use the rendered depth and color images to generate a colored mesh via TSDF integration. Run the following commands to perform this operation.
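For intuition, the sketch below shows the general idea of fusing rendered RGB-D frames into a mesh with Open3D's ScalableTSDFVolume. It is not the project's script (the repository's tsdf_integrate.py below is the actual tool), and the file names, camera intrinsics, and poses are placeholders.

# Illustrative TSDF fusion sketch; paths, intrinsics, and poses are placeholders
import numpy as np
import open3d as o3d

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.01, sdf_trunc=0.04,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)
intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 480, 600.0, 600.0, 320.0, 240.0)  # fx, fy, cx, cy placeholders

for i in range(num_frames):  # num_frames, file names, and poses[i] (camera-to-world) assumed given
    color = o3d.io.read_image(f"render/color_{i:05d}.png")
    depth = o3d.io.read_image(f"render/depth_{i:05d}.png")
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_scale=1000.0, depth_trunc=5.0, convert_rgb_to_intensity=False)
    volume.integrate(rgbd, intrinsic, np.linalg.inv(poses[i]))  # extrinsic = world-to-camera

mesh = volume.extract_triangle_mesh()
mesh.compute_vertex_normals()
o3d.io.write_triangle_mesh("outputs/mesh.ply", mesh)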
# for replica
python tsdf_integrate.py --dataset replica --scene {scene}
# for scannet
python tsdf_integrate.py --dataset scannet --scene {scene}
If you find our work helpful, please cite:
@article{omnimap,
title={OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics},
author={Deng, Yinan and Yue, Yufeng and Dou, Jianyu and Zhao, Jingyu and Wang, Jiahui and Tang, Yujie and Yang, Yi and Fu, Mengyin},
journal={IEEE Transactions on Robotics},
year={2025}
}
We would like to express our gratitude to the open-source project HI-SLAM2 and its contributors. Their valuable work has greatly contributed to the development of our codebase.