- This repository provides the code for our accepted TPAMI 2025 paper "Uncertainty-aware Medical Diagnostic Phrase Identifying and Grounding"
- Preliminary implementation of uMedGround: arXiv / IEEE
- [2025/09/22] We have released most of the datasets via the link.
- [2025/09/14] We have released all code for uMedGround. The curated datasets are coming soon.
- [2025/08/13] We launched the Grounding-VLM research group 🌐. If you are interested in Grounding-VLM, please email me to join.
- [2025/08/06] We will release all code and datasets as soon as possible.
- [2025/08/06] Our camera-ready paper was first released on arXiv.
- [2025/08/04] Our paper was accepted by IEEE TPAMI 2025. Thanks to all my collaborators!
- [2025/06/26] Our related paper "GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis" was accepted by ICCV 2025 [Paper][Link]. Congrats to Bo!
- [2024/12/02] Our paper was submitted to IEEE TPAMI.
Medical phrase grounding is crucial for identifying relevant regions in medical images based on phrase queries, facilitating accurate image analysis and diagnosis. However, current methods rely on manual extraction of key phrases from medical reports, reducing efficiency and increasing the workload for clinicians. Additionally, the lack of model confidence estimation limits clinical trust and usability. In this paper, we introduce a novel task, Medical Report Grounding (MRG), which aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. To address this challenge, we propose uMedGround, a reliable framework that leverages a multimodal large language model (LLM) to predict diagnostic phrases by embedding a unique token, BOX, into the vocabulary to enhance detection capabilities. The embedded token, together with the input medical image, is decoded by a vision encoder-decoder to generate the corresponding grounding box. Critically, uMedGround incorporates an uncertainty-aware prediction model, significantly improving the robustness and reliability of grounding predictions. Experimental results demonstrate that uMedGround outperforms state-of-the-art medical phrase grounding methods and fine-tuned large visual-language models, validating its effectiveness and reliability. This study represents the first exploration of the MRG task. Additionally, we explore the potential of uMedGround in grounded medical visual question answering and class-based localization applications.
uMedGround unlocks medical report grounding (MRG), which aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner (a toy sketch of the BOX-token idea follows this list):
- End-to-end medical report phrase grounding;
- Multimodal framework with box token embedding;
- Uncertainty-aware grounding prediction model;
- Generalization to VQA and localization tasks.
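The following is a minimal, illustrative PyTorch sketch of the BOX-token idea and is not the released uMedGround implementation: the LLM hidden state at the predicted BOX token, fused with image features, is decoded into a bounding box, while a separate head produces an uncertainty estimate. All module names and dimensions (`BoxTokenGroundingHead`, `llm_dim`, `vis_dim`) are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class BoxTokenGroundingHead(nn.Module):
    """Illustrative sketch (not the official code): fuse the LLM hidden state at the
    BOX token with pooled image features, then predict a grounding box and an
    uncertainty score."""

    def __init__(self, llm_dim=4096, vis_dim=1024, hidden=512):
        super().__init__()
        self.text_proj = nn.Linear(llm_dim, hidden)
        self.vis_proj = nn.Linear(vis_dim, hidden)
        self.box_head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),   # (cx, cy, w, h), normalized to [0, 1]
        )
        # Extra head for an uncertainty estimate attached to the predicted box.
        self.unc_head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, box_token_hidden, image_feat):
        # box_token_hidden: (B, llm_dim) hidden state at the BOX token position
        # image_feat:       (B, vis_dim) pooled features from the vision encoder
        fused = torch.cat([self.text_proj(box_token_hidden),
                           self.vis_proj(image_feat)], dim=-1)
        box = torch.sigmoid(self.box_head(fused))           # normalized box
        uncertainty = torch.sigmoid(self.unc_head(fused))   # higher = less certain
        return box, uncertainty

# Toy usage with random tensors
head = BoxTokenGroundingHead()
box, unc = head(torch.randn(2, 4096), torch.randn(2, 1024))
print(box.shape, unc.shape)  # torch.Size([2, 4]) torch.Size([2, 1])
```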
pip install -r requirements.txt
- torch==1.13.1
- torchvision==0.14.1
- packaging
- sentencepiece
- peft==0.4.0
- einops==0.4.1
- ...
Most of the curated datasets have already been released (see the news above); the remaining datasets will be disclosed soon.
- [MRG-MS-CXR dataset]
- [MRG-ChestX-ray8 dataset]
- [MRG-MIMIC-VQA]
- [MRG-MIMIC-Class dataset]
- See the readme:
./pretrained/readme.txt
To train uMedGround-7B or 13B, you need to follow the LLaVA instructions to merge the LLaVA delta weights. Typically, we use the final weights LLaVA-Lightning-7B-v1-1 and LLaVA-13B-v1-1 merged from liuhaotian/LLaVA-Lightning-7B-delta-v1-1 and liuhaotian/LLaVA-13b-delta-v1-1, respectively. For Llama2, we can directly use the LLaVA full weights liuhaotian/llava-llama-2-13b-chat-lightning-preview.
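For reference, below is a conceptual sketch of what the delta-weight merge does; in practice, follow LLaVA's official merging instructions. The `apply_delta` function here is a simplified assumption that ignores tokenizer extension and resized embedding matrices.

```python
import torch

def apply_delta(base_state, delta_state):
    """Conceptual sketch: LLaVA delta checkpoints store (target - base) weights,
    so full weights are recovered by adding the delta to the base LLaMA
    parameters. Resized embeddings and tokenizer changes are omitted here."""
    merged = {}
    for name, delta in delta_state.items():
        if name in base_state and base_state[name].shape == delta.shape:
            merged[name] = base_state[name] + delta
        else:
            # Params present only in the delta, or with changed shapes
            # (e.g., extended vocabulary embeddings); simplified here.
            merged[name] = delta
    return merged

# base = torch.load("llama-7b/pytorch_model.bin", map_location="cpu")
# delta = torch.load("LLaVA-Lightning-7B-delta-v1-1/pytorch_model.bin", map_location="cpu")
# torch.save(apply_delta(base, delta), "LLaVA-Lightning-7B-v1-1/pytorch_model.bin")
```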
Download SAM ViT-H pre-trained weights from the link.
- See the readme:
./ln_data/readme.txt
├── ./ln_data/reason_gro/MS_CXR
│   ├── files
│   │   ├── p10
│   │   │   └── xxx.png
│   │   ├── p11
│   │   │   └── xxx.png
│   │   ├── ...
│   ├── X-ray14
│   │   ├── resize_images
│   │   │   ├── xxx.png
│   │   │   ├── xxx.png
│   │   │   └── ....png
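A small, hypothetical helper to sanity-check that your local data directory matches the layout above (the paths follow the tree shown; adjust them if your setup differs):

```python
from pathlib import Path

# Expected sub-directories under ./ln_data/reason_gro (per the tree above).
EXPECTED = [
    "MS_CXR/files",
    "MS_CXR/X-ray14/resize_images",
]

def check_layout(root="./ln_data/reason_gro"):
    root = Path(root)
    for rel in EXPECTED:
        d = root / rel
        n = len(list(d.glob("**/*.png"))) if d.is_dir() else 0
        status = "OK" if n > 0 else "MISSING"
        print(f"{status:8s} {d}  ({n} .png files)")

if __name__ == "__main__":
    check_layout()
```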
- Run the script
python train.py
deepspeed --master_port=24999 train.py \
  --version="PATH_TO_LLaVA" \
  --dataset_dir='./data' \
  --vision_pretrained="PATH_TO_SAM" \
  --dataset="vqa||reason_gro" \
  --sample_rates="9,3,3,1" \
  --exp_name="uMedGround-7b" \
  --mhp_box_head    # MedGround; replace with --Umhp_box_head for uMedGround
When training is finished, to get the full model weight:
cd ./runs/uMedGround-7b/ckpt_model && python zero_to_fp32.py . ../pytorch_model.bin
Merge the LoRA weights contained in pytorch_model.bin and save the resulting model to your desired path in the Hugging Face format:
CUDA_VISIBLE_DEVICES="" python merge_lora_weights_and_save_hf_model.py \
--version="PATH_TO_LLaVA" \
--weight="PATH_TO_pytorch_model.bin" \
--save_path="PATH_TO_SAVED_MODEL"
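Conceptually, this step attaches the trained LoRA weights to the base model, folds them into the base weights, and saves a standard Hugging Face checkpoint. Below is a simplified sketch using the PEFT merge_and_unload API; the repository's merge_lora_weights_and_save_hf_model.py additionally handles the extended vocabulary (BOX token) and the vision modules, which are omitted here, and all paths are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder paths; the actual script also restores the extended vocabulary
# (BOX token) and the vision tower, which this sketch skips.
base = AutoModelForCausalLM.from_pretrained("PATH_TO_LLaVA")
model = PeftModel.from_pretrained(base, "PATH_TO_LORA_CHECKPOINT")
merged = model.merge_and_unload()          # fold LoRA deltas into the base weights
merged.save_pretrained("PATH_TO_SAVED_MODEL")
AutoTokenizer.from_pretrained("PATH_TO_LLaVA").save_pretrained("PATH_TO_SAVED_MODEL")
```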
- Run the script
python test.py
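Phrase-grounding results are commonly reported with box IoU-based metrics such as mIoU and accuracy at an IoU threshold. The helper below is a generic illustration of these metrics, not the repository's own evaluation code:

```python
import numpy as np

def box_iou(pred, gt):
    """IoU between two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    x2, y2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_p + area_g - inter + 1e-8)

def grounding_metrics(preds, gts, thresh=0.5):
    """Mean IoU and accuracy@thresh over paired predicted / ground-truth boxes."""
    ious = np.array([box_iou(p, g) for p, g in zip(preds, gts)])
    return {"mIoU": float(ious.mean()), f"acc@{thresh}": float((ious > thresh).mean())}

print(grounding_metrics([[10, 10, 60, 60]], [[20, 20, 70, 70]]))
```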
If you find uMedGround helpful for your research, please cite our paper:
@article{uMedGround_Zou_2024,
  author  = {Zou, Ke and Bai, Yang and Liu, Bo and Chen, Yidi and Chen, Zhihao and Yuan, Xuedong and Wang, Meng and Shen, Xiaojing and Cao, Xiaochun and Tham, Yih Chung and Fu, Huazhu},
  title   = {MedRG: Medical Report Grounding with Multi-modal Large Language Model},
  journal = {arXiv preprint arXiv:2404.06798},
  year    = {2024}
}
- If you have any questions about our work, please contact me
- Project Link: UMedGround
- Wechat Group:




