This is the official repository of the paper:
SAR-TEXT: A Large-Scale SAR Image-Text Dataset Built with SAR-Narrator and A Progressive Learning Strategy for Downstream Tasks (https://arxiv.org/pdf/2507.18743)
It includes:
- 📁 A large-scale SAR image–text paired dataset (SAR-TEXT)
- 🤖 Multiple vision-language foundation models (VLFMs), including:
  - SAR-CLIP for retrieval
  - SAR-CoCa for captioning
  - SAR-GPT for generation
- 🧠 An automatic captioning pipeline based on our SAR-Narrator framework (coming soon)
The goal of this project is to bridge the gap between synthetic aperture radar (SAR) imagery and semantic understanding via vision-language modeling. Everything — code, models, and data — will be open-sourced to support the community.
The complete image and caption data for the SAR-TEXT image-text matching dataset is available via Baidu Netdisk:
- 🖼️ SAR Image–Text Matching Dataset (SAR-TEXT)
  `SAR-TEXT-data.zip` (shared via Baidu NetDisk)
  🔗 Download Link
  🔑 Extraction Code: fw5a
This is the SAR image–text dialogue dataset introduced in our paper. This release includes:
- 🛰 Optical Remote Sensing (RS) Dialogue Dataset (RS-VQA)
  `RS-VQA_conv.json`
  Based on the RS-VQA dataset, providing multi-turn visual question answering (VQA) dialogue annotations for optical remote sensing images.
- 📡 SAR Image–Text Dialogue Dataset (SAR-VQA)
  `SAR-VQA_conv.json` (shared via Baidu NetDisk)
  🔗 Download Link
  🔑 Extraction Code: 1qqj
- 🧠 SAR-RS-CLIP
  `SAR-RS-CLIP.pt` (shared via Baidu NetDisk)
  🔗 Download Link
  🔑 Extraction Code: 1472
- 🧠 SAR-RS-CoCa
  `SAR-RS-CoCa.pt` (shared via Baidu NetDisk)
  🔗 Download Link
  🔑 Extraction Code: g4x3
- 🧠 SAR-GPT
  `SAR-GPT.pth` (shared via Baidu NetDisk)
  🔗 Download Link
  🔑 Extraction Code: aqjy
This repository integrates multiple models from different codebases. Please make sure to follow the correct environment setup for each component:
- ✅ CLIP and CoCa models are implemented using the OpenCLIP framework. All related model loading, training, and inference scripts are based on OpenCLIP (a minimal loading sketch follows below).
- ✅ SAR-GPT is based on the TinyGPT-V repository. Any generation tasks involving SAR-GPT should be executed in the TinyGPT-V environment.
Ensure dependencies are installed accordingly before running any module.
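For quick orientation, here is a minimal sketch of loading the SAR-RS-CLIP checkpoint with OpenCLIP. The base architecture (ViT-L-14), the checkpoint path, and the state-dict layout are assumptions, not taken from the repository scripts, so treat the provided scripts as the reference.

```python
# Minimal sketch: loading SAR-RS-CLIP with OpenCLIP.
# The architecture name and checkpoint path below are assumptions.
import torch
import open_clip

# Create the base model and its preprocessing pipeline.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14")
tokenizer = open_clip.get_tokenizer("ViT-L-14")

# Load the fine-tuned weights.
state_dict = torch.load("./checkpoints/SAR-RS-CLIP.pt", map_location="cpu")
# Some checkpoints wrap the weights in a "state_dict" key; unwrap if present.
if isinstance(state_dict, dict) and "state_dict" in state_dict:
    state_dict = state_dict["state_dict"]
model.load_state_dict(state_dict, strict=False)
model.eval()
```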
The script `SAR-CLIP-retrieval.py` evaluates image–text retrieval performance using SAR-CLIP fine-tuned on SAR-TEXT.
- CSV File:
  `HRSID_test_caption.csv`
  Contains image paths and their corresponding captions.
- Images:
  📦 Download from Baidu NetDisk:
  `HRSID_JPG.rar`
  🔑 Extraction Code: i4xf
```
python evaluate_retrieval.py \
  --model-name ViT-L-14 \
  --retrieval-csv-path ./HRSID_test_caption.csv \
  --sarclip-path ./checkpoints/sarclip_weights.pt \
  --batch-size 64 \
  --workers 8
```

The script will print standard retrieval metrics:
- retrieval-image2text-R@1, R@5, R@10
- retrieval-text2image-R@1, R@5, R@10
- retrieval-mean-recall
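For reference, the sketch below shows how image-to-text Recall@K can be computed from paired embeddings. The function and tensor layout are illustrative and not taken from the evaluation script; text-to-image recall is the same computation with the arguments swapped, and mean recall averages all six values.

```python
# Illustrative Recall@K computation for paired image/text embeddings.
import torch

def recall_at_k(image_feats: torch.Tensor, text_feats: torch.Tensor, k: int) -> float:
    """Image-to-text Recall@K for (N, D) L2-normalized features, where pair i <-> i is the match."""
    sims = image_feats @ text_feats.T                  # (N, N) cosine similarity matrix
    topk = sims.topk(k, dim=1).indices                 # indices of the k most similar texts per image
    targets = torch.arange(sims.size(0)).unsqueeze(1)  # ground-truth caption index for each image
    hits = (topk == targets).any(dim=1).float()        # 1 if the true caption is within the top k
    return hits.mean().item()
```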
The script SAR-CoCa-generate-caption.py is used to generate captions for SAR images using the CoCa model.
⚠️ Please ensure that this script is run in the OpenCLIP environment.
- Set the `folder_path` variable in the script to point to the directory containing your SAR images.
- Run the script. It will generate a CSV file named `SAR-CoCa-caption.csv`, containing:
  - File path for each image
  - Corresponding caption generated by the CoCa model
```
filepath,caption
./test_images/img001.jpg,A ship appears in open water.
./test_images/img002.jpg,A satellite view of a bridge across a river.
```
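For orientation, here is a minimal captioning sketch using the OpenCLIP CoCa API. It assumes the SAR-RS-CoCa checkpoint is compatible with the `coca_ViT-L-14` architecture and that the paths shown are placeholders; the repository script remains the reference implementation.

```python
# Minimal sketch: generating SAR image captions with an OpenCLIP CoCa model.
# The architecture name, checkpoint path, and image folder are assumptions.
import csv
import os

import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms("coca_ViT-L-14")
ckpt = torch.load("./checkpoints/SAR-RS-CoCa.pt", map_location="cpu")
model.load_state_dict(ckpt.get("state_dict", ckpt), strict=False)
model.eval()

folder_path = "./test_images"  # directory containing your SAR images
rows = []
for name in sorted(os.listdir(folder_path)):
    path = os.path.join(folder_path, name)
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        tokens = model.generate(image)                 # autoregressive caption tokens
    caption = (
        open_clip.decode(tokens[0])
        .split("<end_of_text>")[0]
        .replace("<start_of_text>", "")
        .strip()
    )
    rows.append({"filepath": path, "caption": caption})

# Write results in the same filepath,caption layout shown above.
with open("SAR-CoCa-caption.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["filepath", "caption"])
    writer.writeheader()
    writer.writerows(rows)
```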