This repository is the official implementation of "Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation" [IEEE TGRS] [arXiv].
The code has been verified to work with PyTorch v1.12.1 and Python 3.7.
- Clone this repository.
- Change directory to root of this repository.
- Create a new Conda environment with Python 3.7 then activate it:
conda create -n FIANet python==3.7
conda activate FIANet
- Install PyTorch v1.12.1 with a CUDA version that works on your cluster/machine (CUDA 10.2 is used in this example):
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch
- Install the packages in requirements.txt via pip:
pip install -r requirements.txt
- Create the ./pretrained_weights directory where we will be storing the weights:
mkdir ./pretrained_weights
- Download the pre-trained classification weights of the Swin Transformer, and put the pth file in ./pretrained_weights. These weights are needed during training to initialize the visual encoder.
- Download the BERT weights from HuggingFace's Transformers library, and put them in the root directory.
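Before training, it can help to confirm that the files from the steps above are where the scripts expect them. The following is a minimal sketch using only the Python standard library. The function name `check_setup` and the BERT folder name used in the usage example are assumptions made for illustration; they are not part of this repository.

```python
from pathlib import Path


def check_setup(root, bert_dir_name):
    """Return a list of human-readable problems found under `root`.

    `bert_dir_name` is the folder holding the BERT weights. Its name is an
    assumption supplied by the caller, since it depends on what you downloaded.
    """
    root = Path(root)
    problems = []

    # The Swin Transformer checkpoint should be a .pth file in ./pretrained_weights.
    weights_dir = root / "pretrained_weights"
    if not weights_dir.is_dir():
        problems.append(f"missing directory: {weights_dir}")
    elif not any(weights_dir.glob("*.pth")):
        problems.append(f"no .pth file found in {weights_dir}")

    # The BERT weights are expected in the repository root.
    bert_dir = root / bert_dir_name
    if not bert_dir.is_dir():
        problems.append(f"missing BERT directory: {bert_dir}")

    return problems


if __name__ == "__main__":
    # "bert-base-uncased" is a hypothetical folder name; adjust it to your download.
    for problem in check_setup(".", "bert-base-uncased"):
        print(problem)
```

An empty result means both the Swin checkpoint and the BERT folder were found; the check does not validate the contents of either.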
We perform experiments on two datasets: RefSegRS and RRSIS-D.
We use one GPU to train our model. For training on the RefSegRS dataset:
python train.py --dataset refsegrs --model_id FIANet --epochs 60 --lr 5e-5 --num_tmem 1
For training on the RRSIS-D dataset:
python train.py --dataset rrsisd --model_id FIANet --epochs 40 --lr 3e-5 --num_tmem 3
The pretrained models can be downloaded from [BaiduNetDisk] (extraction code: 65g4).
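The two training commands differ only in the dataset name, epoch count, learning rate, and `--num_tmem`. As a rough sketch, these settings could be kept in one table and turned into a command line. `TRAIN_SETTINGS` and `build_train_command` are hypothetical names introduced here; only the flag names and values come from the commands above.

```python
# Per-dataset hyperparameters copied from the training commands above.
TRAIN_SETTINGS = {
    "refsegrs": {"epochs": 60, "lr": "5e-5", "num_tmem": 1},
    "rrsisd": {"epochs": 40, "lr": "3e-5", "num_tmem": 3},
}


def build_train_command(dataset, model_id="FIANet"):
    """Return the train.py invocation for `dataset` as a list of arguments."""
    settings = TRAIN_SETTINGS[dataset]  # raises KeyError for unknown datasets
    return [
        "python", "train.py",
        "--dataset", dataset,
        "--model_id", model_id,
        "--epochs", str(settings["epochs"]),
        "--lr", settings["lr"],
        "--num_tmem", str(settings["num_tmem"]),
    ]


if __name__ == "__main__":
    for name in TRAIN_SETTINGS:
        # Print each command so it can be inspected or copied into a shell.
        print(" ".join(build_train_command(name)))
```

The returned list can also be passed directly to `subprocess.run` to launch training from a script.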
To test on the RefSegRS dataset:
python test.py --swin_type base --dataset refsegrs --resume ./your_checkpoints_path --split test --window12 --img_size 480 --num_tmem 1
To test on the RRSIS-D dataset:
python test.py --swin_type base --dataset rrsisd --resume ./your_checkpoints_path --split test --window12 --img_size 480 --num_tmem 3
If you find this code useful for your research, please cite our paper:
@ARTICLE{10816052,
author={Lei, Sen and Xiao, Xinyu and Zhang, Tianlin and Li, Heng-Chao and Shi, Zhenwei and Zhu, Qing},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation},
year={2025},
volume={63},
number={},
pages={1-11},
doi={10.1109/TGRS.2024.3522293}}
The code in this repository is built on RMSIN and LAVT. We'd like to thank the authors for open-sourcing their projects.