Diffusion-RSCC: Diffusion Probabilistic Model for Change Captioning in Remote Sensing Images
Xiaofei Yu, Yitong Li, Jie Ma*, Chang Li*, Hanlin Wu [paper]
The proposed Diffusion-RSCC consists of:
- A forward diffusion process that adds noise to caption embeddings until they resemble Gaussian noise.
- A reverse denoising process using a specially designed Condition Denoiser:
  - Feature Extractor: Pretrained ResNet101 to extract features from bi-temporal images.
  - Cross-Mode Fusion (CMF): Integrates visual and textual modalities for precise alignment.
  - Stacking Self-Attention (SSA): Refines cross-modal information for accurate conditional mean estimation.
- The denoised latent vectors are converted into natural language captions.
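The forward diffusion step in the list above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the linear beta schedule, the 1000 steps, the embedding shape, and the names `make_alpha_bars` and `q_sample` are all assumptions chosen for illustration. It only shows how caption embeddings drift toward Gaussian noise as the timestep grows.

```python
import numpy as np


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in closed form."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise


rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()

# Hypothetical caption embedding: 20 tokens, 64 dimensions each.
x0 = rng.standard_normal((20, 64))

x_early, _ = q_sample(x0, 10, alpha_bars, rng)   # still close to x0
x_late, _ = q_sample(x0, 999, alpha_bars, rng)   # almost pure noise
```

In a denoising setup of this kind, the network is trained to recover the clean embedding (or the added noise) from `x_t`, conditioned here on the bi-temporal image features.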
- A large-scale RSICC dataset with 10,077 bi-temporal image pairs and 50,385 captions.
- Covers multiple semantic change types: buildings, roads, vegetation, parking lots, water.
- Resized images: 256×256.
Download source: thanks to Liu et al. for the dataset: [GitHub]. Put the contents of the downloaded dataset under the folder 'data'.
path to ./data:
├─LevirCCcaptions.json
├─images
   ├─train
   │  ├─A
   │  ├─B
   ├─val
   │  ├─A
   │  ├─B
   ├─test
   │  ├─A
   │  ├─B

- Contains 500 urban area image pairs with 2500 annotations for changes in roads, buildings, lakes, etc.
- Resized to 256×256 in Diffusion-RSCC.
- Focuses on urbanization and land cover changes over 10 years.
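For the `./data` layout listed above, where each split folder holds an `A` and a `B` subfolder, a small helper can collect the bi-temporal pairs. This is a sketch under one assumption that the source does not confirm: that the two images of a pair share the same file name in `A` and `B`. The function name `list_image_pairs` is hypothetical and not part of the repository.

```python
from pathlib import Path


def list_image_pairs(images_root, split):
    """Return sorted (before, after) path pairs for one split.

    Assumes (hypothetically) that both images of a pair share a file name
    in <images_root>/<split>/A and <images_root>/<split>/B.
    """
    dir_a = Path(images_root) / split / "A"
    dir_b = Path(images_root) / split / "B"
    pairs = []
    for path_a in sorted(dir_a.iterdir()):
        path_b = dir_b / path_a.name
        # Keep only files that have a counterpart in the other folder.
        if path_a.is_file() and path_b.is_file():
            pairs.append((path_a, path_b))
    return pairs
```

Calling `list_image_pairs("data/images", "train")` would then yield the training pairs, provided the dataset was unpacked as described; unmatched files are skipped silently rather than raising an error.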
git clone https://github.com/Fay-Y/Diffusion-RSCC
cd Diffusion-RSCC
conda create -n DiffusionRSCC_env python=3.8
conda activate DiffusionRSCC_env
pip install -r requirements.txt

Preprocess the raw captions and image pairs:
python word_encode.py
python img_preprocess.py

To train the proposed Diffusion-RSCC, run the following command:
sh demo.sh

To test, evaluate, and visualize on the test dataset, run the following commands:
sh testlm.sh
cd result

The predicted captions are saved in the folder "result".
A subset of the predictions on the test set, each compared with its 5 ground-truth captions, illustrates the effectiveness of our model.
[Figure: example predictions with ground-truth captions]
- Release training logs and checkpoints
- Support more RSICC datasets


