DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction [arxiv]
Zhen Yang, Heng Wang, Yanpeng Dong
Beijing Mechanical Equipment Institute, Beijing, China
This is the official implementation of DAOcc. DAOcc is a novel multi-modal occupancy prediction framework that leverages 3D object detection to assist in achieving superior performance while using a deployment-friendly image encoder and practical input image resolution.
- 2025-09-09: DAOcc is accepted to TCSVT — cue the confetti! 🎉
- 2025-07-20: We have open-sourced the TensorRT inference code for DAOcc, achieving 54.25 mIoU at 104.9 FPS. Check it out here.
- 2025-07-11: DAOcc achieved 54.33 mIoU on Occ3D-nuScenes without EMA.
- 2025-04-24: Following SparseBEV, we optimized the 2D-to-3D image feature transformation process, achieving substantial reductions in GPU memory consumption while slightly reducing training time. Check the config file.
- 2025-01-31: Released the model weights and the first version of the code.
- 2024-10-01: Our preprint is available on arXiv.
3D Semantic Occupancy Prediction on Occ3D-nuScenes
| Method | Camera Mask | Image Backbone | Image Resolution | mIoU | Config | Model | Log |
|---|---|---|---|---|---|---|---|
| DAOcc | √ | R50 | 256×704 | 54.33 | config | model | log |

| Method | Camera Mask | Image Backbone | Image Resolution | RayIoU | Config | Model | Log |
|---|---|---|---|---|---|---|---|
| DAOcc | × | R50 | 256×704 | 48.4 | config | model | log |
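
For readers new to the metric, the following is a minimal sketch of how a masked mIoU such as the one in the first table above could be computed. It assumes that the camera mask restricts evaluation to camera-visible voxels and that classes absent from both prediction and ground truth are skipped. The function name `masked_miou` and the toy data are hypothetical, and the benchmark's official evaluation code (class list, which classes are averaged) may differ.

```python
def masked_miou(pred, target, mask, num_classes):
    """Mean IoU over classes, counting only voxels where mask is True.

    pred, target: flat sequences of integer class labels, one per voxel.
    mask: flat sequence of booleans (assumed: True = voxel is evaluated).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t, m in zip(pred, target, mask):
        if not m:
            continue  # voxel excluded by the mask
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch adds the voxel to the union of both classes.
            union[p] += 1
            union[t] += 1
    # Assumption: classes that never appear are left out of the mean.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six voxels, three classes, last voxel masked out.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 1]
mask = [True, True, True, True, True, False]
toy_miou = masked_miou(pred, target, mask, num_classes=3)
print(f"{toy_miou:.4f}")
```

In the toy example the per-class IoUs are 1, 1/2, and 2/3, so the mean is 13/18.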
Deprecated results (archived)
| Method | Camera Mask | Image Backbone | Image Resolution | mIoU | Config | Model | Log |
|---|---|---|---|---|---|---|---|
| DAOcc | √ | R50 | 256×704 | 53.82 | config | model | log |
| DAOcc* | √ | R50 | 256×704 | 54.19 | - | model | - |

| Method | Camera Mask | Image Backbone | Image Resolution | RayIoU | Config | Model | Log |
|---|---|---|---|---|---|---|---|
| DAOcc | × | R50 | 256×704 | 48.2 | config | model | log |
3D Semantic Occupancy Prediction on SurroundOcc
| Method | Image Backbone | Image Resolution | IoU | mIoU | Config | Model | Log |
|---|---|---|---|---|---|---|---|
| DAOcc | R50 | 256×704 | 45.0 | 30.5 | config | model | log |
3D Semantic Occupancy Prediction on OpenOccupancy
| Method | Image Backbone | Image Resolution | IoU | mIoU | Config | Model | Log |
|---|---|---|---|---|---|---|---|
| DAOcc | R18 | 256×704 | 32.2 | 24.1 | config | model | log |
3D Semantic Occupancy Prediction on Occ3D-Waymo
| Method | Camera Mask | Infov Mask | Image Backbone | Image Resolution | mIoU | Config | Model | Log |
|---|---|---|---|---|---|---|---|---|
| DAOcc | √ | √ | R50 | 256×704 | 44.69 | config | - | log |
| DAOcc* | √ | √ | R50 | 256×704 | 45.13 | - | - | - |
- The `*` means using the exponential moving average (EMA) hook.
- For Occ3D-Waymo, we use only 20% of the training data.
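
As a rough illustration of what an EMA hook does, the sketch below keeps a smoothed copy of the model weights and blends in the current weights after each training step. The smoothed copy is then used for evaluation. The function name `ema_update`, the dictionary representation of the weights, and the momentum values are assumptions for illustration, not the hook or settings used in this repository.

```python
def ema_update(ema_weights, model_weights, momentum=0.999):
    """Blend the current model weights into the running EMA copy, in place.

    ema <- momentum * ema + (1 - momentum) * current
    """
    for name, value in model_weights.items():
        ema_weights[name] = momentum * ema_weights[name] + (1.0 - momentum) * value
    return ema_weights


# Toy example: a single scalar "weight" stands in for a tensor.
ema = {"w": 0.0}
for current_value in [1.0, 1.0, 1.0]:
    ema_update(ema, {"w": current_value}, momentum=0.5)
print(ema["w"])  # 0.0 -> 0.5 -> 0.75 -> 0.875
```

A momentum close to 1 makes the averaged weights change slowly, which smooths out step-to-step noise in the trained weights.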
@article{yang2025daocc,
title={Daocc: 3d object detection assisted multi-sensor fusion for 3d occupancy prediction},
author={Yang, Zhen and Dong, Yanpeng and Wang, Jiayu and Wang, Heng and Ma, Lichao and Cui, Zijian and Liu, Qi and Pei, Haoran and Zhang, Kexin and Zhang, Chao},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
year={2025},
publisher={IEEE}
}
Many thanks to these excellent open-source projects: