RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events
Zhenyuan Chen, Chenxi Wang, Ningyu Zhang, Feng Zhang
Zhejiang University
Accepted by NeurIPS 2025 Datasets and Benchmarks Track
We introduce the Remote Sensing Change Caption (RSCC) dataset, a new benchmark designed to advance the development of large vision-language models for remote sensing. Existing image-text datasets typically rely on single-snapshot imagery and lack the temporal detail crucial for Earth observation tasks. By providing 62,351 pairs of pre-event and post-event images accompanied by detailed change captions, RSCC bridges this gap and enables robust, disaster-aware bi-temporal understanding. We demonstrate its utility through comprehensive experiments using interleaved multimodal large language models. Our results highlight RSCC’s ability to facilitate detailed disaster-related analysis, paving the way for more accurate, interpretable, and scalable vision-language applications in remote sensing.
[NEWS] 🎉 2025/09/19: Our paper "RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events" has been accepted by NeurIPS 2025 Datasets and Benchmarks Track!
[COMPLETED] Release RSCC dataset
- 2025/05/01 All pre-event & post-event images of RSCC (total: 62,351 pairs) are released.
- 2025/05/01 The change captions of RSCC-Subset (988 pairs) are released, including 10 baseline model results and QvQ-Max results (ground truth).
- 2025/05/01 The change captions based on Qwen2.5-VL-72B-Instruct of RSCC (total: 62,351 pairs) are released.
- 2025/09/09 Release RSCC change captions based on strong models (e.g., QvQ-Max, o3).
[COMPLETED] Release code for inference
- 2025/05/01 Naive inference with baseline models.
- 2025/05/15 Training-free method augmentation (e.g., VCD, DoLa, DeCo).
[COMPLETED] Release RSCCM training scripts
[COMPLETED] Release code for evaluation
- 2025/05/01 N-gram metrics (e.g., BLEU, METEOR, ROUGE).
- 2025/05/01 Contextual similarity metrics (e.g., Sentence-T5 similarity, BERTScore).
- 2025/05/01 Auto comparison of change captions using QvQ-Max (visual reasoning VLM) as a judge.
The dataset can be downloaded from Hugging Face.
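If you prefer a scripted download, a snapshot of the dataset repository can be mirrored locally with huggingface_hub. This is only a sketch: the repo id below is a placeholder, so substitute the actual RSCC identifier shown on the Hugging Face page.

```python
from huggingface_hub import snapshot_download  # pip install huggingface_hub

# NOTE: "<hf-org>/RSCC" is a placeholder; replace it with the dataset's actual
# repository id from its Hugging Face page before running.
snapshot_download(
    repo_id="<hf-org>/RSCC",
    repo_type="dataset",
    local_dir="/path/to/dataset/folder",
)
```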
Baseline results on the RSCC-Subset (QvQ-Max captions serve as ground truth):

| Model (#Active Params) | ROUGE(%)↑ (N-Gram) | METEOR(%)↑ (N-Gram) | BERT(%)↑ (Contextual Sim.) | ST5-SCS(%)↑ (Contextual Sim.) | Avg_L (#Words) |
|---|---|---|---|---|---|
| BLIP-3 (3B) | 4.53 | 10.85 | 98.83 | 44.05 | *456 |
| + Textual Prompt | 10.07 (+5.54↑) | 20.69 (+9.84↑) | 98.95 (+0.12↑) | 63.67 (+19.62↑) | *302 |
| + Visual Prompt | 8.45 (-1.62↓) | 19.18 (-1.51↓) | 99.01 (+0.06↑) | 68.34 (+4.67↑) | *354 |
| Kimi-VL (3B) | 12.47 | 16.95 | 98.83 | 51.35 | 87 |
| + Textual Prompt | 16.83 (+4.36↑) | 25.47 (+8.52↑) | 99.22 (+0.39↑) | 70.75 (+19.40↑) | 108 |
| + Visual Prompt | 16.83 (+0.00) | 25.39 (-0.08↓) | 99.30 (+0.08↑) | 69.97 (-0.78↓) | 109 |
| Phi-4-Multimodal (4B) | 4.09 | 1.45 | 98.60 | 34.55 | 7 |
| + Textual Prompt | 17.08 (+13.00↑) | 19.70 (+18.25↑) | 98.93 (+0.33↑) | 67.62 (+33.07↑) | 75 |
| + Visual Prompt | 17.05 (-0.03↓) | 19.09 (-0.61↓) | 98.90 (-0.03↓) | 66.69 (-0.93↓) | 70 |
| Qwen2-VL (7B) | 11.02 | 9.95 | 99.11 | 45.55 | 42 |
| + Textual Prompt | 19.04 (+8.02↑) | 25.20 (+15.25↑) | 99.01 (-0.10↓) | 72.65 (+27.10↑) | 84 |
| + Visual Prompt | 18.43 (-0.61↓) | 25.03 (-0.17↓) | 99.03 (+0.02↑) | 72.89 (+0.24↑) | 88 |
| LLaVA-NeXT-Interleave (8B) | 12.51 | 13.29 | 99.11 | 46.99 | 57 |
| + Textual Prompt | 16.09 (+3.58↑) | 20.73 (+7.44↑) | 99.22 (+0.11↑) | 62.60 (+15.61↑) | 75 |
| + Visual Prompt | 15.76 (-0.33↓) | 21.17 (+0.44↑) | 99.24 (+0.02↑) | 65.75 (+3.15↑) | 88 |
| LLaVA-OneVision (8B) | 8.40 | 10.97 | 98.64 | 46.15 | *221 |
| + Textual Prompt | 11.15 (+2.75↑) | 19.09 (+8.12↑) | 98.85 (+0.21↑) | 70.08 (+23.93↑) | *285 |
| + Visual Prompt | 10.68 (-0.47↓) | 18.27 (-0.82↓) | 98.79 (-0.06↓) | 69.34 (-0.74↓) | *290 |
| InternVL 3 (8B) | 12.76 | 15.77 | 99.31 | 51.84 | 64 |
| + Textual Prompt | 19.81 (+7.05↑) | 28.51 (+12.74↑) | 99.55 (+0.24↑) | 78.57 (+26.73↑) | 81 |
| + Visual Prompt | 19.70 (-0.11↓) | 28.46 (-0.05↓) | 99.51 (-0.04↓) | 79.18 (+0.61↑) | 84 |
| Pixtral (12B) | 12.34 | 15.94 | 99.34 | 49.36 | 70 |
| + Textual Prompt | 19.87 (+7.53↑) | 29.01 (+13.07↑) | 99.51 (+0.17↑) | 79.07 (+29.71↑) | 97 |
| + Visual Prompt | 19.03 (-0.84↓) | 28.44 (-0.57↓) | 99.52 (+0.01↑) | 78.71 (-0.36↓) | 102 |
| CCExpert (7B) | 7.61 | 4.32 | 99.17 | 40.81 | 12 |
| + Textual Prompt | 8.71 (+1.10↑) | 5.35 (+1.03↑) | 99.23 (+0.06↑) | 47.13 (+6.32↑) | 14 |
| + Visual Prompt | 8.84 (+0.13↑) | 5.41 (+0.06↑) | 99.23 (+0.00) | 46.58 (-0.55↓) | 14 |
| TEOChat (7B) | 7.86 | 5.77 | 98.99 | 52.64 | 15 |
| + Textual Prompt | 11.81 (+3.95↑) | 10.24 (+4.47↑) | 99.12 (+0.13↑) | 61.73 (+9.09↑) | 22 |
| + Visual Prompt | 11.55 (-0.26↓) | 10.04 (-0.20↓) | 99.09 (-0.03↓) | 62.53 (+0.80↑) | 22 |
cd RSCC # path of project root
conda env create -f environment.yaml # genai: env for most baseline models
conda env create -f environment_teochat.yaml # teochat: env for TEOChat
conda env create -f environment_ccexpert.yaml # CCExpert: env for CCExpert
Because the `from_pretrained` function in `transformers` automatically downloads pre-trained models from huggingface.co, you may want to point it to a local pre-trained model folder when you have no internet connection.
We follow the same `repo_id/model_id` naming style as huggingface.co. The model folder should be structured as below:
Show Structure
/path/to/model/folder/
├── moonshotai/
│ └── Kimi-VL-A3B-Instruct/
├── Qwen/
│ └── Qwen2-VL-7B-Instruct/
├── Salesforce/
│ └── xgen-mm-phi3-mini-instruct-interleave-r-v1.5/
├── microsoft/
│ └── Phi-4-multimodal-instruct/
├── OpenGVLab/
│ └── InternVL3-8B/
├── llava-hf/
│ ├── llava-interleave-qwen-7b-hf/
│ └── llava-onevision-qwen2-7b-ov-hf/
├── mistralai/
│ └── Pixtral-12B-2409/
├── Meize0729/
│ └── CCExpert_7b/
└── jirvin16/
└── TEOChat/
[!NOTE]
When inferencing with BLIP-3 (xgen-mm-phi3-mini-instruct-interleave-r-v1.5) and CCExpert, you may need to pre-download google/siglip-so400m-patch14-384 under the model folder.
When inferencing with TEOChat, you may need to pre-download:
- LanguageBind/LanguageBind_Image
- (Optionally) LanguageBind/LanguageBind_Video_merge

Then set in TEOChat's configs.json:

{ "mm_image_tower": "/path/to/model/folder/LanguageBind/LanguageBind_Image", "mm_video_tower": "/path/to/model/folder/LanguageBind/LanguageBind_Video_merge" }
Download the RSCC dataset and place it under your dataset folder:
/path/to/dataset/folder
├── EBD/
│ └── {events}/
├── xbd/
│ └── images-w512-h512/
│ └── {events}/
└── xbdsubset/
└── {events}/
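As a quick illustration of how pre-event/post-event pairs could be enumerated from this layout, the sketch below globs the xbdsubset split and assumes xBD-style `*_pre_disaster*` / `*_post_disaster*` filenames; verify the actual naming in your downloaded copy.

```python
from pathlib import Path

DATASET_ROOT = Path("/path/to/dataset/folder")

# Assumption: xBD-style naming, where pre/post filenames differ only by the
# "pre_disaster"/"post_disaster" token. Adjust if your copy differs.
for pre_path in sorted((DATASET_ROOT / "xbdsubset").rglob("*pre_disaster*")):
    post_path = Path(str(pre_path).replace("pre_disaster", "post_disaster"))
    if post_path.exists():
        print(pre_path.name, "->", post_path.name)
```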
Set the global variables PATH_TO_MODEL_FOLDER and PATH_TO_DATASET_FOLDER:
# `RSCC/utils/constants.py`
PATH_TO_MODEL_FOLDER = "/path/to/model/folder/"    # e.g. "/home/models"
PATH_TO_DATASET_FOLDER = "/path/to/dataset/folder" # e.g. "/home/datasets"
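With this layout and these constants, a script can resolve a Hugging Face-style `repo_id/model_id` to a local checkpoint directory and pass it to `from_pretrained`. The snippet below is a minimal sketch (not the repository's exact loading code), using Qwen2-VL as an example and assuming `utils.constants` is importable from the project root.

```python
import os

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

from utils.constants import PATH_TO_MODEL_FOLDER  # assumes you run from the project root

repo_id = "Qwen/Qwen2-VL-7B-Instruct"
local_path = os.path.join(PATH_TO_MODEL_FOLDER, repo_id)

# Prefer the local copy when it exists; otherwise fall back to huggingface.co.
model_source = local_path if os.path.isdir(local_path) else repo_id

processor = AutoProcessor.from_pretrained(model_source)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_source,
    torch_dtype="auto",  # keep the checkpoint's dtype
    device_map="auto",   # requires accelerate; spreads layers across available GPUs
)
```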
0. Inference with QvQ-Max

- Set API configs under `RSCC/.env`:
# API key for DashScope (keep this secret!)
DASHSCOPE_API_KEY="sk-xxxxxxxxxx"
# Model ID should match the official code
QVQ_MODEL_NAME="qvq-max-2025-03-25"
# API base URL
API_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
# Maximum concurrent workers
MAX_WORKERS=30
# Token threshold warning level
TOKEN_THRESHOLD=10000

- Run the script:
conda activate genai
python ./inference/xbd_subset_qvq.py
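To sanity-check your DashScope credentials before launching the full script, note that the endpoint above is OpenAI-compatible, so a minimal text-only request might look like the sketch below (illustrative only; the repository's script handles image inputs, concurrency, and token accounting itself).

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv
from openai import OpenAI       # pip install openai

load_dotenv(".env")  # assumes you run from the RSCC project root, where .env lives

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url=os.environ["API_BASE_URL"],
)

# Text-only round trip just to confirm the key and model id are accepted.
# Streaming is used because QvQ-style reasoning models may reject non-streaming calls.
stream = client.chat.completions.create(
    model=os.environ["QVQ_MODEL_NAME"],
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```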
1. Inference with baseline models

[!WARNING]
We support multi-GPU inference, but the Pixtral and CCExpert models should only be run on cuda:0.
# inference/xbd_subset_baseline.py
...existing code...
INFERENCE_MODEL_LIST = [
"moonshotai/Kimi-VL-A3B-Instruct",
"Qwen/Qwen2-VL-7B-Instruct",
"Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5",
"microsoft/Phi-4-multimodal-instruct",
"OpenGVLab/InternVL3-8B",
"llava-hf/llava-interleave-qwen-7b-hf",
"llava-hf/llava-onevision-qwen2-7b-ov-hf",
"mistralai/Pixtral-12B-2409",
# "Meize0729/CCExpert_7b", # omit
# "jirvin16/TEOChat", # omit
]

conda activate genai
python ./inference/xbd_subset_baseline.py
# or you can specify the output file path, log file path and device
python ./inference/xbd_subset_baseline.py --output_file "./output/xbd_subset_baseline.jsonl" --log_file "./logs/xbd_subset_baseline.log" --device "cuda:0"

2. Inference with TEOChat
[!NOTE]
The baseline models and the specialized models (i.e., TEOChat and CCExpert) use different environments. Use the corresponding environment together with the matching model list.
# inference/xbd_subset_baseline.py
...existing code...
INFERENCE_MODEL_LIST = ["jirvin16/TEOChat"]

conda activate teochat
python ./inference/xbd_subset_baseline.py
# or you can specify the output file path, log file path and device

3. Inference with CCExpert
[!NOTE]
The baseline models and the specialized models (i.e., TEOChat and CCExpert) use different environments. Use the corresponding environment together with the matching model list.
# inference/xbd_subset_baseline.py
...existing code...
INFERENCE_MODEL_LIST = ["Meize0729/CCExpert_7b"]

conda activate CCExpert
python ./inference/xbd_subset_baseline.py

To run inference with training-free augmentation methods (e.g., VCD, DoLa, DeCo):

python ./inference_with_cd/inference_baseline_cd.py

For evaluation, download the following models under your model folder:

/path/to/model/folder
├── sentence-transformers/ # used for STS-SCS metric
│ └── sentence-t5-xxl/ # or use `sentence-t5-base` for faster evaluation
└── FacebookAI/ # used for BERTSCORE metric
└── roberta-large/ # or use `roberta-base` for faster evaluation
We calculate BLEU, ROUGE, METEOR, BERTScore, and Sentence-T5 embedding similarity between the ground-truth change captions and those generated by the baseline models.
[!NOTE]
As we use huggingface/evaluate, you need a connection to huggingface.co to fetch the metric scripts and related resources (e.g., BLEU, ROUGE, and METEOR).
conda activate genai
python ./evaluation/metrics.py \
--ground_truth_file ./output/xbd_subset_qvq.jsonl \
--predictions_file ./output/xbd_subset_baseline.jsonl > ./logs/eval.log
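For reference, these metrics can be reproduced in a few lines with huggingface/evaluate and sentence-transformers. The snippet below is a standalone sketch, not the repository's metrics.py; it uses sentence-t5-base for speed, whereas the folder layout above also lists the xxl variant.

```python
import evaluate  # pip install evaluate
from sentence_transformers import SentenceTransformer, util

predictions = ["Several buildings along the coast are destroyed after the event."]
references = ["Many coastal buildings were destroyed by the disaster."]

# N-gram metrics (metric scripts are fetched from huggingface.co on first use).
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)
meteor = evaluate.load("meteor").compute(predictions=predictions, references=references)

# BERTScore with a RoBERTa backbone (requires the bert-score package).
bertscore = evaluate.load("bertscore").compute(
    predictions=predictions, references=references, model_type="roberta-large"
)

# Sentence-T5 cosine similarity; sentence-t5-base is used here for speed.
st5 = SentenceTransformer("sentence-transformers/sentence-t5-base")
sim = util.cos_sim(
    st5.encode(predictions, convert_to_tensor=True),
    st5.encode(references, convert_to_tensor=True),
).diagonal().mean().item()

print(rouge["rougeL"], meteor["meteor"], sum(bertscore["f1"]) / len(bertscore["f1"]), sim)
```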
Train RSCCM with the Qwen-VL fine-tuning scripts:

cd RSCC
conda env create -f environment_qwenvl_ft.yaml
conda activate qwenvl_ft
bash train/qwen-vl-finetune/scripts/sft_for_rscc_model.sh

We provide scripts that employ the latest visual reasoning proprietary model (QvQ-Max) to choose the best change caption from a series of candidates.
Show Steps
- Set API configs under `RSCC/.env`:
# API key for DashScope (keep this secret!)
DASHSCOPE_API_KEY="sk-xxxxxxxxxx"
# Model ID should match the official code
QVQ_MODEL_NAME="qvq-max-2025-03-25"
# API base URL
API_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
# Maximum concurrent workers
MAX_WORKERS=30
# Token threshold warning level
TOKEN_THRESHOLD=10000

- Run the script:
conda activate genai
python ./evaluation/autoeval.py

Token usage is logged automatically; you can also check RSCC/data/token_usage.json to keep track of the remaining token budget.
The dataset is released under the CC-BY-4.0 license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Our RSCC dataset is built on top of the xBD and EBD datasets.
We are thankful to Kimi-VL, BLIP-3, Phi-4-Multimodal, Qwen2-VL, Qwen2.5-VL, LLaVA-NeXT-Interleave, LLaVA-OneVision, InternVL 3, Pixtral, TEOChat, and CCExpert for releasing their models and code as open-source contributions.
The metric implementations are derived from huggingface/evaluate.
The training implementation is derived from QwenLM/Qwen2.5-VL.
@misc{chen2025rscclargescaleremotesensing,
title={RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events},
author={Zhenyuan Chen and Chenxi Wang and Ningyu Zhang and Feng Zhang},
year={2025},
eprint={2509.01907},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.01907},
}