REOBench: Benchmarking Robustness of Earth Observation Foundation Models
Xiang Li, Yong Tao, Siyuan Zhang, Siwei Liu, Zhitong Xiong, Chunbo Luo, Lu Liu, Mykola Pechenizkiy, Xiao Xiang Zhu, Tianjin Huang
We introduce REOBench, a comprehensive benchmark designed to evaluate the robustness of Earth observation foundation models. Our benchmark systematically evaluates a wide range of prevalent foundation models, covering state-of-the-art models based on masked image modeling, contrastive learning, and large language models. REOBench focuses on high-resolution optical remote sensing images, which are widely used in real-world applications such as urban planning and disaster response. We conducted experiments on six widely studied remote sensing image understanding tasks, covering both vision-centric and vision-language tasks, under twelve types of perturbations. These include both appearance-based corruptions (e.g., noise, blur, haze) and geometric distortions (e.g., rotation, scale, translation), applied at varying severity levels to simulate realistic environmental and sensor-induced challenges.
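As a minimal sketch of how a severity-controlled appearance corruption can be applied, the snippet below adds Gaussian noise to an image at one of five severity levels. The standard-deviation values are illustrative assumptions; REOBench's exact corruption parameters are not specified in this README.

```python
import numpy as np

# Assumed severity-to-noise mapping (illustrative, not REOBench's actual values).
SEVERITY_STD = {1: 0.04, 2: 0.08, 3: 0.12, 4: 0.16, 5: 0.20}

def gaussian_noise(image: np.ndarray, severity: int = 1) -> np.ndarray:
    """Apply Gaussian noise to an image with pixel values in [0, 1]."""
    std = SEVERITY_STD[severity]
    noisy = image + np.random.normal(0.0, std, size=image.shape)
    # Clip back to the valid pixel range after perturbation.
    return np.clip(noisy, 0.0, 1.0)
```

Geometric distortions (rotation, scale, translation) would instead warp pixel coordinates rather than pixel values, but follow the same severity-level pattern.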
- [2025.05.15] We release REOBench, a benchmark for evaluating the robustness of Earth observation foundation models.
The dataset can be downloaded from link and used via the Hugging Face datasets library. To load the dataset, you can use the following code snippet:

```python
from datasets import load_dataset

fw = load_dataset("xiang709/REOBench", streaming=True)
```

We use mmsegmentation for semantic segmentation experiments. Please check the Segmenation folder for details.
We use mmrotate for object detection experiments. Please check the Detection folder for details.
For classification experiments, please check the Classification folder for details.
We provide evaluation code for vision-language models. Please check the VRSBench folder for details. Codes are adapted from VRSBench.
The dataset is released under the CC-BY-4.0 license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
- VRSBench. A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding.
- CLAIR. Automatic GPT-based caption evaluation.
@misc{li2025reobenchbenchmarkingrobustnessearth,
title={REOBench: Benchmarking Robustness of Earth Observation Foundation Models},
author={Xiang Li and Yong Tao and Siyuan Zhang and Siwei Liu and Zhitong Xiong and Chunbo Luo and Lu Liu and Mykola Pechenizkiy and Xiao Xiang Zhu and Tianjin Huang},
year={2025},
eprint={2505.16793},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2505.16793},
}

Our REOBench dataset is built based on the AID, DIOR, and VRSBench datasets.
We use mmdetection and mmsegmentation in our experiments.