Official PyTorch implementation of the following paper:

**Towards Scalable Language-Image Pre-training for 3D Medical Imaging**
*University of Michigan*
Directly leveraging uncurated clinical studies enables scalable language-image pre-training in 3D medical imaging, as the scale is no longer constrained by the manual effort required from clinicians to select a single representative scan or slice from each study. This paradigm could be more effective when equipped with a hierarchical attention mechanism inspired by the natural structure of the data: slice, scan, and study. We name this framework Hierarchical attention for Language-Image Pre-training (HLIP). For real-world clinical use, HLIP can be applied to studies containing either a single scan (e.g., chest CT) or multiple scans (e.g., brain MRI).
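For intuition, below is a minimal sketch of the two-level "scan + study" variant of this idea as a toy PyTorch block. It is not the official HLIP implementation; the module names, shapes, and pre-norm layout are illustrative assumptions only.

```python
# Toy sketch (NOT the official HLIP code) of hierarchical attention over a study
# organized as [scans, tokens-per-scan]: attention is first restricted to tokens
# within each scan, then applied across all tokens of the study.
import torch
import torch.nn as nn

class HierarchicalBlock(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 12):
        super().__init__()
        self.scan_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.study_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_scans, tokens_per_scan, dim)
        b, s, t, d = x.shape
        # scan-level attention: tokens attend only within their own scan
        h = x.reshape(b * s, t, d)
        n = self.norm1(h)
        h = h + self.scan_attn(n, n, n)[0]
        # study-level attention: all tokens of the study attend to each other
        h = h.reshape(b, s * t, d)
        n = self.norm2(h)
        h = h + self.study_attn(n, n, n)[0]
        return h.reshape(b, s, t, d)

# example: a 2-scan study with 1176 visual tokens per scan
study = torch.randn(1, 2, 1176, 768)
out = HierarchicalBlock()(study)  # (1, 2, 1176, 768)
```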
- (2025-06) Completed the initial setup of the HLIP repository.
- (2025-05) Released HLIP models trained on chest CT and brain MRI; feel free to try our demos.
```bash
python3 -m venv env
source env/bin/activate
pip install -U pip
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121

git clone git@github.com:mlfoundations/open_clip.git
cd open_clip
make install
make install-training
```

| Data | Objective | Patch Size | Attention | Model | Performance |
|---|---|---|---|---|---|
| CT-RATE (20K) | SigLIP | 8, 24, 24 | slice + scan | ViT Base | -/- |
| CT-RATE (20K) | CLIP | 8, 24, 24 | slice + scan | ViT Base | -/- |
| BrainMRI (220K) | CLIP | 16, 16, 16 | scan + study | ViT Base | -/- |
| BrainMRI (220K) | CLIP | 8, 16, 16 | scan + study | ViT Base | -/- |
| BrainMRI (220K) | CLIP | 8, 16, 16 | slice + scan + study | ViT Base | -/- |
| HeadCT (240K) | CLIP | 8, 16, 16 | scan + study | ViT Base | -/- |
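As a sanity check on the naming, the `tokenN` suffix in the model names appears to correspond to the number of visual tokens per scan. For the brain MRI setting above (patch size 8, 16, 16, with the 48, 224, 224 input size mentioned in the evaluation notes), this works out as follows.

```python
# Quick arithmetic check (assumption: "token1176" counts visual tokens per scan).
# A 48 x 224 x 224 volume with a patch size of 8 x 16 x 16 yields
# (48 / 8) * (224 / 16) * (224 / 16) = 6 * 14 * 14 = 1176 tokens,
# matching the suffix in clip_vit_base_multiscan_h2_token1176.
depth, height, width = 48, 224, 224
pd, ph, pw = 8, 16, 16
num_tokens = (depth // pd) * (height // ph) * (width // pw)
print(num_tokens)  # 1176
```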
Chest CT: an example from the external Rad-ChestCT dataset.
```bash
python inference_rad_chestct.py \
--model clip_vit_base_singlescan_h2_token1176 \
--use-cxr-bert \
--resume /path/to/chestct_clip_vit_base_singlescan_h2_token1176.pt \
    --data ../../docs/tst32751/tst32751.pt
```

Brain MRI: an example from the external BraTS23 dataset.

```bash
python inference_pub_brain_5.py \
--model clip_vit_base_multiscan_h2_token1176 \
--resume /path/to/brainmri_clip_vit_base_multiscan_h2_token1176.pt \
--patch-size 8 16 16 \
--num-slices 72 \
    --data ../../docs/BraTS-GLI-00459-000
```

Visualize the activation by adding `--interpret`.
CT-RATE
```bash
python zeroshot_ct_rate.py \
--model clip_vit_base_singlescan_h2_token2744 \
--use-cxr-bert \
--resume /path/to/chestct_clip_vit_base_singlescan_h2_token2744.pt \
--data-root /data/ct_rate/ \
    --zeroshot-template volume
```

Rad-ChestCT

```bash
python zeroshot_rad_chestct.py \
--model clip_vit_base_singlescan_h2_token2744 \
--use-cxr-bert \
--resume /path/to/chestct_clip_vit_base_singlescan_h2_token2744.pt \
--data-root /data/rad_chestct/ \
    --zeroshot-template volume
```

Brain MRI

```bash
python pub_brain_5_embed.py \
--model clip_vit_base_multiscan_h2_token1176 \
--resume /path/to/brainmri_clip_vit_base_multiscan_h2_token1176.pt \
    --data-root /path/to/pub_brain_5 \
--num-slices 144 \
    --embed-root /path/to/pub_brain_5_embed
```

```bash
python zeroshot_pub_brain_5.py \
--model clip_vit_base_multiscan_h2_token1176 \
--resume /path/to/brainmri_clip_vit_base_multiscan_h2_token1176.pt \
--embed-root /path/to/pub_brain_5_embed \
--num-slices 144 \
--zeroshot_prompt prompt \
    --zeroshot_template template
```

As the Pub-Brain-5 dataset contains ~18K studies, evaluation may take ~30 minutes. We first extract the embedding for each study, then perform zero-shot classification. This two-step procedure supports researchers interested in prompt engineering.
Note that `--num-slices` is set to 144 during evaluation, even though we use a fixed input size of 48, 224, 224 during training. We found that HLIP transfers directly to, and benefits from, higher-resolution inputs at test time.
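Because the study embeddings are cached in the first step, trying new prompts only requires re-running the lightweight second step. The sketch below shows the general CLIP-style zero-shot scoring that step performs; the shapes, stand-in tensors, and helper names are illustrative assumptions rather than the repository's actual interface.

```python
# Minimal sketch of zero-shot classification from cached study embeddings.
# The random tensors are stand-ins; the real scripts (pub_brain_5_embed.py /
# zeroshot_pub_brain_5.py) define their own prompts, templates, and file I/O.
import torch
import torch.nn.functional as F

def zeroshot_classify(study_embeds: torch.Tensor, prompt_embeds: torch.Tensor) -> torch.Tensor:
    # study_embeds: (num_studies, dim); prompt_embeds: (num_classes, dim)
    study_embeds = F.normalize(study_embeds, dim=-1)
    prompt_embeds = F.normalize(prompt_embeds, dim=-1)
    logits = study_embeds @ prompt_embeds.t()  # cosine similarity per class
    return logits.argmax(dim=-1)               # predicted class index per study

# ~18K studies and 5 candidate classes, standing in for the cached embeddings
preds = zeroshot_classify(torch.randn(18_000, 512), torch.randn(5, 512))
```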
Our training implementation is closely aligned with open_clip, which allows us to leverage features such as patch dropout and SigLIP. Below is a training command for chest CT. Training on CT-RATE for 20 epochs takes ~6 hours on a node with 4 A40 GPUs.
```bash
torchrun --rdzv_endpoint=localhost:29500 --nproc_per_node 4 main.py \
--logs_dir /path/to/logs/ \
--json-root ../../data/ct_rate/files/ --data-root /path/to/data/ct_rate/ \
--train-data raw_annotation --input-info -1150 350 crop \
--zeroshot-ct-rate ../../data/ct_rate/metafiles/valid_labels.csv --zeroshot-template volume \
--zeroshot-frequency 1 \
--save-frequency 1 \
--report-to wandb \
--wandb-project-name chest_ct \
--warmup 377 \
--batch-size 16 \
--accum-batch 1 \
--lr=1e-5 \
--wd=0.2 \
--epochs=20 \
--precision amp \
--workers 4 \
--grad-checkpointing \
--model clip_vit_base_singlescan_h2_token2744 \
--use-cxr-bert \
    --lock-text
```

Add the following flags to enable patch dropout:

```bash
--force-patch-dropout 0.5 \
    --beta2 0.95
```

Add the following flags to enable SigLIP:

```bash
--model siglip_vit_base_singlescan_h2_token2744 \
--beta2 0.95 \
    --siglip
```

If you find this repository helpful, please consider citing:

```bibtex
@article{zhao2025towards,
title={Towards Scalable Language-Image Pre-training for 3D Medical Imaging},
author={Zhao, Chenhui and Lyu, Yiwei and Chowdury, Asadur and Harake, Edward and Kondepudi, Akhil and Rao, Akshay and Hou, Xinhai and Lee, Honglak and Hollon, Todd},
journal={arXiv preprint arXiv:2505.21862},
year={2025}
}
```