Ming Dai1, Wenxuan Cheng1, Jiang-Jiang Liu2, Lingfeng Yang3, Zhenhua Feng4, Wankou Yang1*, Jingdong Wang2
1Southeast University 2Baidu VIS 3Jiangnan University 4Nanjing University of Science and Technology
- [2025.10.11] Code, pretrained models, and datasets are now released! 🎉
Generalized visual grounding tasks, including Generalized Referring Expression Comprehension (GREC) and Segmentation (GRES), extend the classical paradigm by accommodating multi-target and non-target scenarios. While GREC focuses on coarse-level bounding box localization, GRES aims for fine-grained pixel-level segmentation.
Existing approaches typically treat these tasks independently, ignoring the potential benefits of joint learning and cross-granularity consistency. Moreover, most methods cast GRES as plain semantic segmentation, lacking instance-aware reasoning that links boxes and masks.
We propose InstanceVG, a multi-task generalized visual grounding framework that unifies GREC and GRES via instance-aware joint learning. InstanceVG introduces instance queries with prior reference points to ensure consistent prediction of points, boxes, and masks across granularities.
To our knowledge, InstanceVG is the first framework to jointly tackle both GREC and GRES while integrating instance-aware consistency learning. Extensive experiments on 10 datasets across 4 tasks demonstrate that InstanceVG achieves state-of-the-art performance, substantially surpassing existing methods across various evaluation metrics.
Environment requirements
CUDA == 11.8
torch == 2.0.0
torchvision == 0.15.1

pip install -r requirements.txt

InstanceVG depends on components from detrex and detectron2:
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
git clone https://github.com/IDEA-Research/detrex.git
cd detrex
git submodule init && git submodule update
pip install -e .

Finally, install InstanceVG in editable mode (run from the InstanceVG root directory):

pip install -e .
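After installation, a quick sanity check can confirm that the expected PyTorch/CUDA versions are active and that detectron2 and detrex import correctly. This is a minimal sketch, not part of the repository's tooling:

```python
# check_env.py -- hypothetical sanity check; not shipped with InstanceVG.
import torch

print("torch:", torch.__version__)            # expected 2.0.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)    # expected 11.8

# Both imports should succeed after the steps above.
import detectron2
import detrex

print("detectron2:", detectron2.__version__)
print("detrex imported from:", detrex.__file__)
```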
Prepare the MS-COCO dataset and download the referring and foreground annotations from HF-Data.
Expected directory structure:
data/
└── seqtr_type/
├── annotations/
│ ├── mixed-seg/
│ │ └── instances_nogoogle_withid.json
│ ├── grefs/instance.json
│ ├── ref-zom/instance.json
│ └── rrefcoco/instance.json
└── images/
└── mscoco/
└── train2014/
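As a convenience, the sketch below (hypothetical; the paths simply mirror the layout above) checks that the annotation files and the image folder are where the configs expect them:

```python
# check_data.py -- hypothetical helper; paths mirror the expected directory layout above.
from pathlib import Path

ROOT = Path("data/seqtr_type")
EXPECTED = [
    ROOT / "annotations/mixed-seg/instances_nogoogle_withid.json",
    ROOT / "annotations/grefs/instance.json",
    ROOT / "annotations/ref-zom/instance.json",
    ROOT / "annotations/rrefcoco/instance.json",
    ROOT / "images/mscoco/train2014",
]

for path in EXPECTED:
    print(f"[{'ok' if path.exists() else 'MISSING'}] {path}")
```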
InstanceVG uses BEiT-3 as both the backbone and multi-modal fusion module.
Download pretrained weights and tokenizer from BEiT-3’s official repository.
mkdir pretrain_weights

Place the following files:
pretrain_weights/
├── beit3_base_patch16_224.zip
├── beit3_large_patch16_224.zip
└── beit3.spm
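To verify the downloaded tokenizer file, it can be loaded directly. The snippet below is a sketch that assumes the transformers and sentencepiece packages are installed; loading beit3.spm through XLMRobertaTokenizer follows the official BEiT-3 repository's usage:

```python
# check_tokenizer.py -- a sketch; assumes `transformers` and `sentencepiece` are installed.
from transformers import XLMRobertaTokenizer

# BEiT-3 ships its vocabulary as a SentencePiece model (beit3.spm).
tokenizer = XLMRobertaTokenizer("pretrain_weights/beit3.spm")
print(tokenizer.tokenize("three skateboard guys"))
```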
Example 1 — GRES task
python tools/demo.py \
--img "asserts/imgs/Figure_1.jpg" \
--expression "three skateboard guys" \
--config "configs/gres/InstanceVG-grefcoco.py" \
--checkpoint /PATH/TO/InstanceVG-grefcoco.pth

Example 2 — RIS task
python tools/demo.py \
--img "asserts/imgs/Figure_2.jpg" \
--expression "full half fruit" \
--config "configs/refcoco/InstanceVG-refcoco.py" \
--checkpoint /PATH/TO/InstanceVG-refcoco.pth

For additional options (e.g., thresholds, alternate checkpoints), see tools/demo.py.
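To run the demo over several image/expression pairs, a small wrapper like the one below can call tools/demo.py in a loop. This is a sketch: it reuses only the CLI flags shown above, the case list is illustrative, and the checkpoint paths are placeholders:

```python
# batch_demo.py -- hypothetical wrapper; reuses only the demo.py flags shown above.
import subprocess

cases = [
    # (image, expression, config, checkpoint) -- checkpoint paths are placeholders
    ("asserts/imgs/Figure_1.jpg", "three skateboard guys",
     "configs/gres/InstanceVG-grefcoco.py", "/PATH/TO/InstanceVG-grefcoco.pth"),
    ("asserts/imgs/Figure_2.jpg", "full half fruit",
     "configs/refcoco/InstanceVG-refcoco.py", "/PATH/TO/InstanceVG-refcoco.pth"),
]

for img, expression, config, checkpoint in cases:
    subprocess.run(
        ["python", "tools/demo.py",
         "--img", img,
         "--expression", expression,
         "--config", config,
         "--checkpoint", checkpoint],
        check=True,
    )
```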
To train InstanceVG from scratch:
bash tools/dist_train.sh [PATH_TO_CONFIG] [NUM_GPUS]

To reproduce reported results:
bash tools/dist_test.sh [PATH_TO_CONFIG] [NUM_GPUS] \
--load-from [PATH_TO_CHECKPOINT_FILE]

All pretrained checkpoints are available on Model.
| Task / Train Set | Config | Checkpoint |
|---|---|---|
| RefCOCO/+/g (Base) | configs/refcoco/InstanceVG-B-refcoco.py | InstanceVG-B-refcoco.pth |
| RefCOCO/+/g (Large) | configs/refcoco/InstanceVG-L-refcoco.py | InstanceVG-L-refcoco.pth |
| gRefCOCO | configs/gres/InstanceVG-grefcoco.py | InstanceVG-grefcoco.pth |
| Ref-ZOM | configs/refzom/InstanceVG-refzom.py | InstanceVG-refzom.pth |
| RRefCOCO | configs/rrefcoco/InstanceVG-rrefcoco.py | InstanceVG-rrefcoco.pth |
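To evaluate all released checkpoints in sequence, the rows of the table can be driven by a small loop such as the sketch below; the checkpoint directory and the single-GPU setting are assumptions to adjust for your setup:

```python
# eval_all.py -- hypothetical batch evaluation; config names come from the table above,
# while CHECKPOINT_DIR and NUM_GPUS are assumptions to adapt.
import subprocess
from pathlib import Path

CHECKPOINT_DIR = Path("work_dir")  # directory holding the downloaded .pth files
NUM_GPUS = "1"

RUNS = [
    ("configs/refcoco/InstanceVG-B-refcoco.py", "InstanceVG-B-refcoco.pth"),
    ("configs/refcoco/InstanceVG-L-refcoco.py", "InstanceVG-L-refcoco.pth"),
    ("configs/gres/InstanceVG-grefcoco.py", "InstanceVG-grefcoco.pth"),
    ("configs/refzom/InstanceVG-refzom.py", "InstanceVG-refzom.pth"),
    ("configs/rrefcoco/InstanceVG-rrefcoco.py", "InstanceVG-rrefcoco.pth"),
]

for config, ckpt in RUNS:
    subprocess.run(
        ["bash", "tools/dist_test.sh", config, NUM_GPUS,
         "--load-from", str(CHECKPOINT_DIR / ckpt)],
        check=True,
    )
```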
Example reproduction:
bash tools/dist_test.sh configs/refcoco/InstanceVG-B-refcoco.py 1 \
--load-from work_dir/refcoco/InstanceVG-B-refcoco.pth

If you find our work useful, please cite:
@ARTICLE{instancevg,
author={Dai, Ming and Cheng, Wenxuan and Liu, Jiang-Jiang and Yang, Lingfeng and Feng, Zhenhua and Yang, Wankou and Wang, Jingdong},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Improving Generalized Visual Grounding with Instance-aware Joint Learning},
year={2025},
doi={10.1109/TPAMI.2025.3607387}
}
@article{dai2024simvg,
title={SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-Modal Fusion},
author={Dai, Ming and Yang, Lingfeng and Xu, Yihao and Feng, Zhenhua and Yang, Wankou},
journal={Advances in Neural Information Processing Systems},
volume={37},
pages={121670--121698},
year={2024}
}
@inproceedings{dai2025multi,
title={Multi-Task Visual Grounding with Coarse-to-Fine Consistency Constraints},
author={Dai, Ming and Li, Jian and Zhuang, Jiedong and Zhang, Xian and Yang, Wankou},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={39},
number={3},
pages={2618--2626},
year={2025}
}

Our implementation builds upon open-source projects such as detrex, detectron2, and BEiT-3.
We thank these excellent open-source projects for their contributions to the community.