The official implementation of the NeurIPS 2024 paper "VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization".
- September 2024: VQ-Map is accepted at NeurIPS 2024.
- November 2024: Surround-View BEV layout estimation code is released.
Bird's-eye-view (BEV) map layout estimation requires an accurate and complete understanding of the semantics of the environmental elements around the ego car so that the results are coherent and realistic. Due to the challenges posed by occlusion, unfavourable imaging conditions and low resolution, generating BEV semantic maps for areas that are corrupted or invalid in the perspective view (PV) has recently become very appealing. The question is how to align the PV features with the generative models to facilitate the map estimation. In this paper, we propose to utilize a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge of the high-level BEV semantics in a tokenized discrete space. Thanks to the obtained BEV tokens, accompanied by a codebook embedding that encapsulates the semantics of the different BEV elements in the ground-truth maps, we are able to directly align the sparse backbone image features with the BEV tokens obtained from the discrete representation learning through a specialized token decoder module, and finally generate high-quality BEV maps with the BEV codebook embedding serving as a bridge between PV and BEV. We evaluate the BEV map layout estimation performance of our model, termed VQ-Map, on both the nuScenes and Argoverse benchmarks, achieving 62.2/47.6 mean IoU for surround-view/monocular evaluation on nuScenes, as well as 73.4 IoU for monocular evaluation on Argoverse, all of which set a new record for this map layout estimation task.
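As a rough illustration of the tokenized discrete space mentioned above, here is a minimal, hypothetical PyTorch sketch of the core vector-quantization step (nearest-codebook-entry lookup with a straight-through gradient). It is not the VQ-Map implementation; the class name, sizes, and loss weights are assumptions.

```python
# Minimal vector-quantization sketch (NOT the official VQ-Map code).
# Sizes are illustrative; the actual codebook configuration lives in the repo's configs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_tokens=128, embed_dim=256):
        super().__init__()
        # Learnable codebook: one embedding per discrete BEV token.
        self.codebook = nn.Embedding(num_tokens, embed_dim)

    def forward(self, z):                              # z: (N, embed_dim) continuous features
        # L2 distance from each feature to every codebook entry.
        d = torch.cdist(z, self.codebook.weight)       # (N, num_tokens)
        tokens = d.argmin(dim=1)                       # discrete BEV token indices
        z_q = self.codebook(tokens)                    # quantized embeddings
        # Codebook and commitment losses, as in a standard VQ-VAE.
        loss = F.mse_loss(z_q, z.detach()) + 0.25 * F.mse_loss(z, z_q.detach())
        # Straight-through estimator so gradients flow back to the encoder.
        z_q = z + (z_q - z).detach()
        return z_q, tokens, loss
```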
Surround-view BEV layout estimation on nuScenes (IoU, %):

| Model | Drivable | Ped. Crossing | Walkway | Stop Line | Carpark | Divider | Mean |
|---|---|---|---|---|---|---|---|
| VQ-Map | 83.8 | 60.9 | 64.2 | 57.7 | 55.7 | 50.8 | 62.2 |
Monocular BEV layout estimation on nuScenes (IoU, %):

| Model | Drivable | Crossing | Walkway | Mean |
|---|---|---|---|---|
| VQ-Map | 70.0 | 43.9 | 32.7 | 47.6 |
Monocular BEV layout estimation on Argoverse (IoU, %):

| Model | Drivable |
|---|---|
| VQ-Map | 73.4 |
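For reference, the numbers above are per-class intersection-over-union (IoU) between predicted and ground-truth BEV masks, with the mean taken over classes. A small, hypothetical sketch of that metric (not the repository's evaluation code):

```python
# Illustrative per-class IoU / mean IoU over binary BEV masks
# (hypothetical sketch, not the repository's evaluation script).
import torch

def bev_iou(pred, gt, eps=1e-6):
    """pred, gt: boolean tensors of shape (num_classes, H, W) in BEV space."""
    inter = (pred & gt).flatten(1).sum(dim=1).float()
    union = (pred | gt).flatten(1).sum(dim=1).float()
    return inter / (union + eps)            # per-class IoU

# Example: mean IoU over classes, as reported in the tables above.
pred = torch.rand(6, 200, 200) > 0.5
gt   = torch.rand(6, 200, 200) > 0.5
print(bev_iou(pred, gt).mean().item())
```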
For the surround-view version of VQ-Map, set up the environment:

cd vq_map_sur
conda create -n vq-map-sur python==3.8
conda activate vq-map-sur

Please follow BEVFusion to prepare the environment.
Then, compile the deformable attention CUDA operator:
cd other/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

Next, prepare the nuScenes data:

mkdir data
cd data

Please refer to mmdet3d for preparing the data.
After downloading the dataset:
python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes

The folder structure after processing should be as below:
├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-trainval
│   │   ├── nuscenes_infos_train.pkl
│   │   ├── nuscenes_infos_val.pkl

Train the VQ-VAE first to learn the BEV codebook:

torchpack dist-run -np 2 python tools/train.py configs/nuscenes/vqvae/codebook128x256.yaml -p --run-dir workdir/{YOUR_PRETRAIN_EXP_NAME}

To train VQ-Map, first set pretrain_model.checkpoint in the configuration file (e.g., configs/nuscenes/seg/vq-map-512d8l.yaml) to the checkpoint from the VQ-VAE training.
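If you want to sanity-check the pretrained codebook before pointing the config at it, a quick inspection like the following can help; the checkpoint path and state-dict key names below are assumptions, not the repository's actual naming.

```python
# Hypothetical sanity check of the VQ-VAE checkpoint (not part of the repo).
# The checkpoint path and key names are assumptions; adjust them to your run.
import torch

ckpt_path = "workdir/YOUR_PRETRAIN_EXP_NAME/checkpoint.pth"   # placeholder path
ckpt = torch.load(ckpt_path, map_location="cpu")
state = ckpt.get("state_dict", ckpt)   # some checkpoints nest weights under "state_dict"

# Print codebook-related parameters and their shapes.
for name, tensor in state.items():
    if "codebook" in name or "embed" in name:
        print(name, tuple(tensor.shape))
```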
Then launch training:

torchpack dist-run -np 4 python tools/train.py configs/nuscenes/seg/vq-map-512d8l.yaml --run-dir workdir/{YOUR_EXP_NAME}

To evaluate a trained model:

python tools/test.py configs/nuscenes/seg/ workdir/{YOUR_EXP_NAME}/{YOUR_SAVED_CHECKPOINT} --eval map

For the monocular version of VQ-Map, set up the environment:

cd vq_map_mono
conda create -n vq-map-mono python==3.8
conda activate vq-map-mono
pip install -r requirements.txt

Then, compile the deformable attention CUDA operator:
cd src/model/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

Please refer to PON to prepare the data (into data/nuscenes_ground_truth). The folder structure should be as below:
├── configs
├── src
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-trainval
│   ├── nuscenes_ground_truth
│   │   ├── ....png
│   │   ├── ....png
│   ├── nuscenes_splits
│   ├── nuscenes_infos..

First, train the VQ-VAE to learn the monocular BEV codebook:

accelerate launch --main_process_port 20000 src/train_cnn_vq.py --num-epochs 50 --optimizer adamw -n {YOUR_EXP_NAME} --data-cfg config/data/nuscenes_pretrain.yaml --model-cfg config/model/codebook.yaml -bs 32

Then train VQ-Map:

accelerate launch --main_process_port 20000 src/train_vq_map.py -bs 16 --model-cfg config/model/vq_map.yaml -n {YOUR_EXP_NAME} -nw 16 --lr 5e-4 --lr-backbone 5e-4 --data-cfg config/data/nuscenes.yaml --num-epochs 40

The surround-view version of VQ-Map is based on BEVFusion. It is also greatly inspired by the following outstanding contributions to the open-source community: Deformable-DETR, BEVFormer, BEiTv2, VQGAN, and PON.
If VQ-Map supports or enhances your research, please acknowledge our work by citing our paper. Thank you!
@misc{zhang2024vqmapbirdseyeviewmaplayout,
title={VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization},
author={Yiwei Zhang and Jin Gao and Fudong Ge and Guan Luo and Bing Li and Zhaoxiang Zhang and Haibin Ling and Weiming Hu},
year={2024},
eprint={2411.01618},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.01618},
}