VQ-Map [NeurIPS 2024]

Official implementation of the NeurIPS 2024 paper VQ-Map.

VQ-Map-Pipeline

News

  • September 2024: VQ-Map accepted at NeurIPS 2024 🎉
  • November 2024: Surround-View BEV layout estimation code is released.

Abstract

Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics of the environmental elements around the ego car to make the results coherent and realistic. Due to the challenges posed by occlusion, unfavourable imaging conditions and low resolution, generating the BEV semantic maps corresponding to corrupted or invalid areas in the perspective view (PV) has recently become an appealing direction. The question is how to align the PV features with the generative models to facilitate the map estimation. In this paper, we propose to utilize a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge for the high-level BEV semantics in the tokenized discrete space. Thanks to the obtained BEV tokens, accompanied by a codebook embedding encapsulating the semantics of different BEV elements in the ground-truth maps, we are able to directly align the sparse backbone image features with the obtained BEV tokens from the discrete representation learning based on a specialized token decoder module, and finally generate high-quality BEV maps with the BEV codebook embedding serving as a bridge between PV and BEV. We evaluate the BEV map layout estimation performance of our model, termed VQ-Map, on both the nuScenes and Argoverse benchmarks, achieving 62.2/47.6 mean IoU for surround-view/monocular evaluation on nuScenes, as well as 73.4 IoU for monocular evaluation on Argoverse, all setting a new record for this map layout estimation task.
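As a rough illustration of the discrete representation described above, the sketch below shows a plain nearest-neighbour codebook lookup in PyTorch. It is a minimal sketch, not the released module; the 128x256 codebook size only mirrors the codebook128x256.yaml config name used later in this README.

import torch
import torch.nn as nn

class CodebookLookup(nn.Module):
    """Toy VQ-style quantization: map each BEV feature vector to its
    nearest codebook embedding (illustrative only, not the repo's module)."""
    def __init__(self, num_tokens=128, embed_dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_tokens, embed_dim)

    def forward(self, feats):                             # feats: (N, embed_dim)
        dists = torch.cdist(feats, self.codebook.weight)  # (N, num_tokens) L2 distances
        tokens = dists.argmin(dim=1)                       # discrete BEV tokens
        return tokens, self.codebook(tokens)               # tokens and their embeddings

tokens, quantized = CodebookLookup()(torch.randn(4, 256))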

Results

Surround-View BEV Layout Estimation (IoU threshold = 0.5)

Model  | Drivable | Ped. Crossing | Walkway | Stop Line | Carpark | Divider | Mean
VQ-Map | 83.8     | 60.9          | 64.2    | 57.7      | 55.7    | 50.8    | 62.2

Monocular BEV Layout Estimation (IoU threshold = 0.5)

nuScenes

Model  | Drivable | Crossing | Walkway | Mean
VQ-Map | 70.0     | 43.9     | 32.7    | 47.6

Argoverse

Model  | Drivable
VQ-Map | 73.4
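For reference, the numbers above are per-class intersection-over-union scores between predicted and ground-truth BEV masks, with predictions binarized at the 0.5 threshold noted in the table headers. A minimal sketch of such a computation (function and variable names are illustrative, not the repo's evaluation code):

import numpy as np

def bev_iou(pred_scores, gt_mask, threshold=0.5):
    """IoU for a single BEV class: binarize scores, then intersection / union."""
    pred = pred_scores >= threshold
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union > 0 else float("nan")

# toy example on a 200x200 BEV grid
print(bev_iou(np.random.rand(200, 200), np.random.rand(200, 200) > 0.7))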

Usage

Surround-View

Prerequisites

cd vq_map_sur
conda create -n vq-map-sur python==3.8
conda activate vq-map-sur

Please follow bevfusion to prepare the environment.

Then, compile the deformable attention CUDA operator:

cd other/ops
sh ./make.sh
# unit test (you should see all checks report True)
python test.py
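If the build succeeds, the compiled extension should be importable from Python. A quick sanity check, assuming the op keeps the MultiScaleDeformableAttention extension name used by Deformable-DETR (adjust the name if this repo's setup script differs):

import torch
import MultiScaleDeformableAttention as MSDA  # extension name assumed from Deformable-DETR

print("CUDA available:", torch.cuda.is_available())
print("extension loaded from:", MSDA.__file__)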

Data Preparation for Surround-View Task

mkdir data
cd data

Please refer to mmdet3d to prepare the data.

After downloading the dataset:

python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes

The folder structure after processing should be as below:

├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-trainval
│   │   ├── nuscenes_infos_train.pkl
│   │   ├── nuscenes_infos_val.pkl
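To verify that create_data.py produced usable annotation files, the generated pickle can be inspected directly. This is a sketch assuming the mmdet3d/BEVFusion-style layout with 'infos' and 'metadata' keys; adjust the keys if your mmdet3d version stores them differently:

import pickle

# quick sanity check on the generated annotation file
with open("data/nuscenes/nuscenes_infos_train.pkl", "rb") as f:
    data = pickle.load(f)

print("metadata:", data.get("metadata"))            # e.g. dataset version
print("train samples:", len(data.get("infos", [])))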

Training

For VQ-VAE training:
torchpack dist-run -np 2 python tools/train.py configs/nuscenes/vqvae/codebook128x256.yaml  -p --run-dir workdir/{YOUR_PRETRAIN_EXP_NAME}
For PV-BEV Alignment training:

First, set pretrain_model.checkpoint in the configuration file (e.g., configs/nuscenes/seg/vq-map-512d8l.yaml) to the checkpoint from VQ-VAE training.
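If you prefer to script that edit, a minimal sketch using PyYAML is shown below; the config path and the pretrain_model.checkpoint key come from the step above, while the checkpoint filename is a placeholder for whatever your VQ-VAE run produced:

import yaml

cfg_path = "configs/nuscenes/seg/vq-map-512d8l.yaml"
vqvae_ckpt = "workdir/{YOUR_PRETRAIN_EXP_NAME}/latest.pth"  # placeholder checkpoint path

with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

# assumes pretrain_model.checkpoint is a nested mapping in the YAML
cfg.setdefault("pretrain_model", {})["checkpoint"] = vqvae_ckpt

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f)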

torchpack dist-run -np 4 python tools/train.py configs/nuscenes/seg/vq-map-512d8l.yaml --run-dir workdir/{YOUR_EXP_NAME}

Evaluation

python tools/test.py configs/nuscenes/seg/ workdir/{YOUR_EXP_NAME}/{YOUR_SAVED_CHECKPOINT} --eval map

Monocular

Prerequisites

cd vq_map_mono
conda create -n vq-map-mono python==3.8
conda activate vq-map-mono

pip install -r requirements.txt

Then, compile the deformable attention CUDA operator:

cd src/model/ops
sh ./make.sh
# unit test (you should see all checks report True)
python test.py

Data Preparation for Monocular Task

Please refer to PON to prepare the data (into data/nuscenes_ground_truth). The expected folder structure is shown below:

├── configs
├── src
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-trainval
│   ├── nuscenes_ground_truth
│   │   ├── ....png
│   │   ├── ....png
│   ├── nuscenes_splits
│   ├── nuscenes_infos..
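Once the PON-style ground truth has been generated, a quick file count confirms the directory is populated (a sketch; it only assumes the exported masks are .png files directly under data/nuscenes_ground_truth):

from pathlib import Path

gt_dir = Path("data/nuscenes_ground_truth")
masks = sorted(gt_dir.glob("*.png"))
print(f"found {len(masks)} ground-truth masks in {gt_dir}")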

Training

For VQ-VAE training:
accelerate launch --main_process_port 20000 src/train_cnn_vq.py --num-epochs 50 --optimizer adamw -n {YOUR_EXP_NAME} --data-cfg config/data/nuscenes_pretrain.yaml --model-cfg config/model/codebook.yaml -bs 32
For PV-BEV Alignment training:
accelerate launch --main_process_port 20000 src/train_vq_map.py -bs 16 --model-cfg config/model/vq_map.yaml -n {YOUR_EXP_NAME} -nw 16 --lr 5e-4 --lr-backbone 5e-4 --data-cfg config/data/nuscenes.yaml --num-epochs 40

Acknowledgements

The surround-view version of VQ-Map is based on BEVFusion. It also draws great inspiration from the following outstanding contributions to the open-source community: Deformable-DETR, BEVFormer, BEiTv2, VQGAN, and PON.

Citation

If VQ-Map supports or enhances your research, please acknowledge our work by citing our paper. Thank you!

@misc{zhang2024vqmapbirdseyeviewmaplayout,
      title={VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization}, 
      author={Yiwei Zhang and Jin Gao and Fudong Ge and Guan Luo and Bing Li and Zhaoxiang Zhang and Haibin Ling and Weiming Hu},
      year={2024},
      eprint={2411.01618},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.01618}, 
}
