Official codebase for the DragOSM paper.
[Paper] | [Project HomePage]
DragOSM: Interactive Label Correction for Off-Nadir Aerial Imagery
- About DragOSM
- Key Contributions
- Installation
- Dataset: ReBO
- Dataset Annotator
- Quick Start
- License
- Citation
- Contact
As this is an emerging research direction, I will add further potential research directions here after the paper is accepted.
DragOSM is a new framework to accurately align and correct building roof and footprint labels in off-nadir (oblique) aerial images, especially where historical labels are spatially misaligned. Our method formulates label correction as an interactive "dragging" process, introducing the novel concept of the Alignment Token, and adopts denoising training to robustly learn spatial offsets.
For more details, see our paper:
DragOSM: Extract Building Roofs and Footprints from Aerial Images by Aligning Historical Labels
- Problem Transformation: We reformulate polygonal building extraction as a task of aligning historical labels with the remote sensing imagery, thereby enabling a unified method that performs effectively on both near-nadir and off-nadir imagery.
- Alignment Token & Interactive Label Correction: We introduce the alignment token concept and formulate historical label correction as a two-step interactive dragging process for off-nadir images, solving the mismatch between outdated labels and current imagery.
- Denoising Training & Inference Scheme: DragOSM uses dynamic Gaussian noise to simulate label displacements during training and learns to iteratively denoise and align annotations. DragOSM interprets the positional perturbation of OSM labels as a Gaussian process centered at the ground truth, rather than starting from structureless noise as in diffusion-based approaches. During multi-step inference, the correction process is modeled as the cumulative effect of Gaussian process differences.
- ReBO: A New Benchmark Dataset: We curated the Repairing Buildings in OSM (ReBO) dataset with over 179,000 buildings and detailed instance-level polygon corrections (roof, footprint, OSM) across 41 cities.
- Strong Empirical Performance: DragOSM achieves state-of-the-art results on label alignment tasks, outperforming both extraction-based and prompt-based baselines in both accuracy and robustness.
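To make the denoising idea above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a label is a list of (x, y) vertices displaced by a single translation, reads "dynamic" noise as a standard deviation drawn afresh for each sample (an assumption), and replaces the trained network with a hypothetical toy predictor that moves the polygon part of the way toward the ground-truth centroid. All function names (`perturb`, `iterative_align`, `make_toy_predictor`) and parameters (`sigma_range`, `gain`, `steps`) are illustrative and do not come from the DragOSM codebase.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def shift(polygon: Polygon, dx: float, dy: float) -> Polygon:
    """Translate every vertex by the same (dx, dy) offset."""
    return [(x + dx, y + dy) for x, y in polygon]


def centroid(polygon: Polygon) -> Point:
    """Mean of the vertices (sufficient for measuring a pure translation)."""
    n = len(polygon)
    return (sum(x for x, _ in polygon) / n, sum(y for _, y in polygon) / n)


def perturb(
    polygon: Polygon, sigma_range: Tuple[float, float], rng: random.Random
) -> Tuple[Polygon, Point]:
    """Simulate a misaligned historical label for training.

    Assumption: the noise level is drawn per sample from `sigma_range`,
    then one zero-mean Gaussian offset (centered at the ground truth) is
    applied to the whole polygon. Returns the noisy polygon and the
    offset that would undo the displacement (the regression target).
    """
    sigma = rng.uniform(*sigma_range)
    dx, dy = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
    return shift(polygon, dx, dy), (-dx, -dy)


def make_toy_predictor(ground_truth: Polygon, gain: float = 0.5) -> Callable[[Polygon], Point]:
    """Hypothetical stand-in for the trained model: it predicts a fraction
    `gain` of the remaining centroid offset toward the ground truth."""
    gx, gy = centroid(ground_truth)

    def predict(polygon: Polygon) -> Point:
        cx, cy = centroid(polygon)
        return gain * (gx - cx), gain * (gy - cy)

    return predict


def iterative_align(
    polygon: Polygon, predict_offset: Callable[[Polygon], Point], steps: int = 3
) -> Polygon:
    """Multi-step inference: apply the predicted offsets one after another,
    so the final correction is the sum of the per-step corrections."""
    current = polygon
    for _ in range(steps):
        dx, dy = predict_offset(current)
        current = shift(current, dx, dy)
    return current
```

In this toy setting each step removes a fixed fraction of the remaining offset, so the residual shrinks geometrically with the number of steps. A trained model would take the place of `make_toy_predictor`.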
DragOSM is built upon MMDetection and reuses parts of BONAI.
Please follow the official installation guides for these dependencies:
Make sure you are using a compatible environment (e.g., PyTorch 1.7+, CUDA 11.1+, Python 3.8+) and properly install all dependencies.
# Example (do not run as-is, refer to official docs for details)
conda create -n dragosm python=3.8
conda activate dragosm
pip install torch torchvision
# Install MMDetection and BONAI following their instructions
The dataset and statistics are available on OneDrive and BaiduDisk.
We provide an annotation tool for revising OSM data and reviewing DragOSM predictions. You may find it here.
Descriptions of the training and testing settings are included in the default configs.
# To train
bash tools/dist_train.sh configs/DragOSM/dragosm_vit_b_osmrand_intensity_snd_512_omni_ISRA_200.py
# To test
python test_offset.py
@misc{li2025dragosm,
title={DragOSM: Extract Building Roofs and Footprints from Aerial Images by Aligning Historical Labels},
author={Kai Li and Xingxing Weng and Yupeng Deng and Yu Meng and Chao Pang and Gui-Song Xia and Xiangyu Zhao},
year={2025},
eprint={2509.17951},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.17951},
}