Diffusion-RSCC: Diffusion Probabilistic Model for Change Captioning in Remote Sensing Images

Xiaofei Yu, Yitong Li, Jie Ma*, Chang Li*, Hanlin Wu [paper]

Model Architecture

The proposed Diffusion-RSCC consists of:

A forward diffusion process that adds noise to caption embeddings until they resemble Gaussian noise.
A reverse denoising process using a specially designed Condition Denoiser:
- Feature Extractor: Pretrained ResNet101 to extract features from bi-temporal images.
- Cross-Mode Fusion (CMF): Integrates visual and textual modalities for precise alignment.
- Stacking Self-Attention (SSA): Refines cross-modal information for accurate conditional mean estimation.
The denoised latent vectors are converted into natural language captions.
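
The forward process described above has a well-known closed form in DDPM-style models. The following is a minimal sketch of it, assuming a linear noise schedule with 1000 steps and a toy 8-dimensional vector standing in for one caption embedding. The schedule values and all function names are illustrative assumptions, not taken from the Diffusion-RSCC code.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): share of signal variance left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

x0 = [1.0] * 8                               # toy stand-in for a caption embedding
x_early = q_sample(x0, 10, alpha_bars, rng)  # still close to x0
x_late = q_sample(x0, 999, alpha_bars, rng)  # almost pure Gaussian noise
```

With this schedule, `alpha_bars[-1]` is close to zero, so the last step retains almost none of the original embedding, which matches the "until they resemble Gaussian noise" description. Pure Python lists are used here only for portability; a real implementation would operate on tensors.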

Datasets

LEVIR-CC

A large-scale RSICC dataset with 10,077 bi-temporal image pairs and 50,385 captions.
Covers multiple semantic change types: buildings, roads, vegetation, parking lots, water.
Resized images: 256×256.

Download source: thanks to Liu et al. for the dataset: [GitHub]. Put the contents of the downloaded dataset under the folder 'data'.

path to ./data:
                ├─LevirCCcaptions.json
                ├─images
                  ├─train
                  │  ├─A
                  │  ├─B
                  ├─val
                  │  ├─A
                  │  ├─B
                  ├─test
                  │  ├─A
                  │  ├─B
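
Before running the preprocessing scripts, it can help to confirm that the layout above is in place. The following is a small sketch of such a check; the helper name `missing_entries` is hypothetical and is not part of the repository.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")
PHASES = ("A", "B")  # the two temporal images of each pair


def missing_entries(data_root):
    """Return the expected paths (relative to data_root) that do not exist."""
    root = Path(data_root)
    expected = [root / "LevirCCcaptions.json"]
    expected += [root / "images" / split / phase
                 for split in SPLITS for phase in PHASES]
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]


# Report anything missing under ./data (prints nothing if the layout is complete).
for entry in missing_entries("./data"):
    print("missing:", entry)
```

The check only looks at the caption file and the six image folders shown in the tree; it does not validate the images themselves.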

DUBAI-CC

Contains 500 urban-area image pairs with 2,500 annotations covering changes in roads, buildings, lakes, etc.
Images are resized to 256×256 in Diffusion-RSCC.
Focuses on urbanization and land cover changes over 10 years.

Installation and Dependencies

git clone https://github.com/Fay-Y/Diffusion-RSCC
cd Diffusion-RSCC
conda create -n DiffusionRSCC_env python=3.8
conda activate DiffusionRSCC_env
pip install -r requirements.txt

Preparation

Preprocess the raw captions and image pairs:

python word_encode.py
python img_preprocess.py
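
The internals of word_encode.py are not described here. As a rough illustration of what caption encoding typically involves, the sketch below builds a vocabulary and maps captions to fixed-length token ids. The special tokens, the whitespace tokenization, the function names, and the example captions are all assumptions made for illustration and may differ from what the script actually does.

```python
from collections import Counter

# Hypothetical special tokens; the repository may use different ones.
SPECIALS = ["<PAD>", "<START>", "<END>", "<UNK>"]


def build_vocab(captions, min_count=1):
    """Map each sufficiently frequent word to an integer id, after the specials."""
    counts = Counter(w for c in captions for w in c.lower().split())
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return {tok: i for i, tok in enumerate(SPECIALS + words)}


def encode(caption, vocab, max_len):
    """Encode a caption as <START> words... <END>, truncated and padded to max_len."""
    ids = [vocab["<START>"]]
    ids += [vocab.get(w, vocab["<UNK>"]) for w in caption.lower().split()]
    ids = ids[: max_len - 1] + [vocab["<END>"]]
    return ids + [vocab["<PAD>"]] * (max_len - len(ids))


captions = ["a road is built", "some houses are built near the road"]
vocab = build_vocab(captions)
encoded = [encode(c, vocab, max_len=10) for c in captions]
```

Fixed-length id sequences like these are what an embedding layer would turn into the caption embeddings that the diffusion process operates on.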

Training

To train the proposed Diffusion-RSCC, run the following command:

sh demo.sh

Testing

To test, evaluate, and visualize on the test dataset, run the following command:

sh testlm.sh
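
This README does not list which metrics testlm.sh reports. To illustrate how a predicted caption can be scored against several reference captions, here is a minimal sketch of clipped unigram precision, the building block of BLEU-1. The function name and example sentences are hypothetical, and this is not the repository's evaluation code.

```python
from collections import Counter


def clipped_unigram_precision(candidate, references):
    """Fraction of candidate words that also occur in some reference.

    Each word is credited at most as many times as it appears in the
    single reference that contains it most often.
    """
    cand = Counter(candidate.lower().split())
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for word, n in Counter(ref.lower().split()).items():
            max_ref[word] = max(max_ref[word], n)
    matched = sum(min(n, max_ref[word]) for word, n in cand.items())
    return matched / sum(cand.values())


score = clipped_unigram_precision(
    "a road is built",
    ["a new road is built", "roads appear in the forest"],
)
```

Clipping stops a caption from scoring well by repeating one common word. Full caption metrics also account for longer n-grams and length, so this number alone should not be read as a quality measure.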

Visualization

cd result

The predicted captions are saved in the folder "result".

Prediction samples

Selected predictions on the test set, each paired with its 5 ground-truth captions, illustrate the effectiveness of our model.

TODO

Release training logs and checkpoints
Support more RSICC datasets
