This paper addresses fine-grained building attribute segmentation in a cross-view setting, i.e., using paired satellite and street-view images. The main challenge is the large perspective difference between street views and satellite views. We introduce SG-BEV, a novel approach to satellite-guided BEV fusion for cross-view semantic segmentation. To overcome the limitations of existing cross-view projection methods in capturing complete building facade features, we incorporate a Bird's Eye View (BEV) method to establish a spatially explicit mapping of street-view features. Moreover, we leverage the advantages of multiple perspectives by introducing a novel satellite-guided reprojection module that mitigates the uneven feature distribution associated with traditional BEV methods. Our method demonstrates significant improvements on four cross-view datasets collected from multiple cities, including New York, San Francisco, and Boston. Averaged across these datasets, our method improves mIoU by 10.13% and 5.21% over state-of-the-art satellite-based and cross-view methods, respectively.
This project was developed and tested with CUDA 12.1.
#### To create conda env:
cd /path/to/SG_BEV
conda env create -f environment.yml
conda activate SG_BEV
pip uninstall mmcv
pip install openmim
mim install mmcv==2.1.0

The OmniCity dataset can be downloaded from https://opendatalab.com/OmniCity.
The Brooklyn dataset can be downloaded from https://opendatalab.com/CVeRS/Cross-view.
The dataset should be organized as follows:
Dataset_root/
│
├── train/
│   ├── gt/
│   └── images/
│       ├── sate
│       └── svi
│
└── val/
    ├── gt/
    └── images/
        ├── sate
        └── svi
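Before training, it can help to confirm that a downloaded dataset matches the layout above. The following is a minimal sketch and not part of the SG-BEV code base: the helper name `missing_dataset_dirs` is hypothetical, and it only checks that the directories exist, not what they contain.

```python
from pathlib import Path

# Directory names taken from the layout described in this README.
SPLITS = ("train", "val")
EXPECTED_SUBDIRS = ("gt", "images/sate", "images/svi")


def missing_dataset_dirs(dataset_root):
    """Return the expected directories that are absent under dataset_root.

    Hypothetical helper for illustration; paths are returned relative to
    dataset_root, using forward slashes.
    """
    root = Path(dataset_root)
    missing = []
    for split in SPLITS:
        for sub in EXPECTED_SUBDIRS:
            candidate = root / split / sub
            if not candidate.is_dir():
                missing.append(candidate.relative_to(root).as_posix())
    return missing
```

Calling `missing_dataset_dirs("/path/to/Dataset_root")` returns an empty list when all six directories are present. Otherwise it lists the missing ones, e.g. `val/images/svi`.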
This project uses SegNeXt (MSCAN-B2 variant) as the feature extractor for both street-view and satellite imagery, with non-shared weights pre-trained on the Cityscapes dataset. Pre-trained weights can be downloaded from Tsinghua Cloud.
Place the pre-trained weights in /SG_BEV/checkpoints.
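A quick way to confirm that the weights landed in the right place is to list the checkpoint directory. This is a sketch under assumptions: the helper name `list_checkpoints` is hypothetical, and the `.pth` suffix is a guess at the file extension, since this README does not state the checkpoint file names.

```python
from pathlib import Path


def list_checkpoints(checkpoint_dir="checkpoints", suffixes=(".pth",)):
    """Return sorted names of weight files found directly in checkpoint_dir.

    Hypothetical helper; the default ".pth" suffix is an assumption and
    should be adjusted to match the downloaded files.
    """
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return []
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.suffix in suffixes
    )
```

An empty result means the directory is missing or holds no files with the given suffixes, in which case the download location should be checked before training starts.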
This project provides a training script, scripts/train.sh, that supports multi-GPU distributed training.
bash scripts/train.sh # Specify the desired configuration file inside train.sh