Lavender105/RSGPT


RSGPT: A Remote Sensing Vision Language Model and Benchmark

Yuan Hu, Jianlong Yuan, Congcong Wen, Xiaonan Lu, Xiang Li☨

☨corresponding author

This is an ongoing project. We are working on increasing the dataset size.

Related Projects

RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model

Congcong Wen*, Yiting Lin*, Xiaokang Qu, Nan Li, Yong Liao, Hui Lin, Xiang Li

FedRSCLIP: Federated learning for remote sensing scene classification using vision-language models

Hui Lin*, Chao Zhang*, Danfeng Hong, Kexin Dong, and Congcong Wen☨

RS-MoE: A Vision–Language Model With Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering

Hui Lin*, Danfeng Hong*, Shuhang Ge*, Chuyao Luo, Kai Jiang, Hao Jin, and Congcong Wen☨

VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding

Xiang Li, Jian Ding, Mohamed Elhoseiny

Vision-language models in remote sensing: Current progress and future trends

Xiang Li*☨, Congcong Wen*, Yuan Hu*, Zhenghang Yuan, Xiao Xiang Zhu

RS-CLIP: Zero Shot Remote Sensing Scene Classification via Contrastive Vision-Language Supervision

Xiang Li, Congcong Wen, Yuan Hu, Nan Zhou

🔥 Updates

  • [2025.05.08] We release the code for training and testing RSGPT.
  • [2024.12.18] We release the manual scoring results for RSIEval.
  • [2024.06.19] We release VRSBench, a versatile vision-language benchmark dataset for remote sensing image understanding. VRSBench contains 29,614 images, with 29,614 human-verified detailed captions, 52,472 object references, and 123,221 question-answer pairs. See the VRSBench project page.
  • [2024.05.23] We release the RSICap dataset. Please fill out this form to get both the RSICap and RSIEval datasets.
  • [2023.11.10] We release our survey on vision-language models in remote sensing (RSVLM).
  • [2023.10.22] The RSICap dataset and code will be released upon paper acceptance.
  • [2023.10.22] We release the evaluation dataset RSIEval. Please fill out this form to get the RSIEval dataset.

Dataset

  • RSICap: 2,585 image-text pairs with high-quality human-annotated captions.
  • RSIEval: 100 high-quality human-annotated captions with 936 open-ended visual question-answer pairs.

Code

The idea of finetuning our vision-language model is borrowed from MiniGPT-4. RSGPT is built by finetuning InstructBLIP on our RSICap dataset.

🚀 Installation

Set up a conda environment using the provided environment.yml file:

Step 1: Create the environment

conda env create -f environment.yml

Step 2: Activate the environment

conda activate rsgpt

Training

torchrun --nproc_per_node=8 train.py --cfg-path train_configs/rsgpt_train.yaml
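The command above hard-codes 8 processes, one per GPU. On a machine with a different number of GPUs, the launch could be adapted along the lines of the following sketch. It is a hypothetical helper, not part of the RSGPT repository: the function names `count_gpus` and `launch_rsgpt_train` and the `CFG` and `DRY_RUN` variables are illustrative assumptions; only `torchrun --nproc_per_node` and the `train.py --cfg-path train_configs/rsgpt_train.yaml` invocation come from this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with RSGPT): launch training on however many
# GPUs are visible instead of the hard-coded --nproc_per_node=8.

# Count GPUs: honour CUDA_VISIBLE_DEVICES if it is set, otherwise ask nvidia-smi.
count_gpus() {
  if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
    # "0,1,3" -> one line per device id -> 3
    echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .
  elif command -v nvidia-smi >/dev/null 2>&1; then
    # nvidia-smi -L prints one "GPU <i>: ..." line per device
    nvidia-smi -L | grep -c '^GPU'
  else
    echo 0
  fi
}

# Build the torchrun command; DRY_RUN=1 (an assumed convenience flag) prints it
# instead of running it. CFG defaults to the config path used in the README.
launch_rsgpt_train() {
  local n
  n="$(count_gpus)"
  if [ "$n" -lt 1 ]; then
    echo "no GPU found" >&2
    return 1
  fi
  local cmd=(torchrun --nproc_per_node="$n" train.py --cfg-path "${CFG:-train_configs/rsgpt_train.yaml}")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage (from the repository root, inside the activated rsgpt environment):
#   launch_rsgpt_train
#   CUDA_VISIBLE_DEVICES=0,1 DRY_RUN=1 launch_rsgpt_train
```

Assuming training uses standard PyTorch DistributedDataParallel (as LAVIS-based code typically does), each process handles its own per-GPU batch, so launching fewer processes shrinks the global batch size. You may need to adjust the batch size or learning rate in train_configs/rsgpt_train.yaml; the exact key names depend on that config.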

Testing

Test image captioning:

python test.py --cfg-path eval_configs/rsgpt_eval.yaml --gpu-id 0 --out-path rsgpt/output --task ic

Test visual question answering:

python test.py --cfg-path eval_configs/rsgpt_eval.yaml --gpu-id 0 --out-path rsgpt/output --task vqa
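To run both evaluation tasks in one go, the two test commands can be wrapped in a small loop. This is a sketch, not a script shipped with RSGPT: `run_rsgpt_eval` and the `CFG`, `GPU_ID`, `OUT_DIR` and `DRY_RUN` variables are hypothetical names, while the `test.py` flags and the `ic`/`vqa` task values are taken from the commands in this README.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of RSGPT): run image captioning (ic) and
# visual question answering (vqa) evaluation back to back.
# Defaults mirror the README commands; DRY_RUN=1 only prints the commands.

run_rsgpt_eval() {
  local cfg="${CFG:-eval_configs/rsgpt_eval.yaml}"
  local gpu="${GPU_ID:-0}"
  local out="${OUT_DIR:-rsgpt/output}"
  local task
  for task in ic vqa; do
    local cmd=(python test.py --cfg-path "$cfg" --gpu-id "$gpu" --out-path "$out" --task "$task")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"
    else
      # Stop at the first failing task instead of silently continuing
      "${cmd[@]}" || return 1
    fi
  done
}

# Usage (from the repository root, inside the activated rsgpt environment):
#   run_rsgpt_eval
#   DRY_RUN=1 GPU_ID=1 run_rsgpt_eval
```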

Acknowledgement

  • MiniGPT-4. A popular open-source vision-language model.
  • InstructBLIP. The model architecture of RSGPT follows InstructBLIP. Don't forget to check out this great open-source work if you don't already know it!
  • Lavis. This repository is built upon Lavis!
  • Vicuna. The language ability of Vicuna, with only 13B parameters, is remarkable. And it is open-source!

If you're using RSGPT in your research or applications, please cite using this BibTeX:

@article{hu2025rsgpt,
  title={{RSGPT}: A remote sensing vision language model and benchmark},
  author={Hu, Yuan and Yuan, Jianlong and Wen, Congcong and Lu, Xiaonan and Liu, Yu and Li, Xiang},
  journal={ISPRS Journal of Photogrammetry and Remote Sensing},
  volume={224},
  pages={272--286},
  year={2025},
  publisher={Elsevier}
}
