MedSigLIP is a large-scale medical vision-language model developed by Google Health. It is designed to encode medical images and associated text into a shared embedding space, enabling advanced applications in healthcare AI.
This repository provides a FiftyOne integration for Google's MedSigLIP embedding models, enabling powerful text-image similarity search capabilities in your FiftyOne datasets.
This is a gated model, so you will need to fill out the form on the model card: https://huggingface.co/google/medsiglip-448
Approval should be instantaneous.
You'll also have to set your Hugging Face token in your environment:

export HF_TOKEN="your_token"

Or sign in to Hugging Face via the CLI:

huggingface-cli login

- Architecture: Two-tower encoder with 400 million parameters per tower: one for images (vision transformer) and one for text (text transformer).
- Input Support:
  - Images: 448x448 resolution
  - Text: Up to 64 tokens
- Training Data: Trained on a diverse mix of de-identified medical images and text pairs (e.g., chest X-rays, dermatology, ophthalmology, pathology, CT/MRI slices) plus natural image-text pairs.
- Primary Use Cases:
  - Medical image interpretation
  - Data-efficient and zero-shot classification
  - Semantic image retrieval
- Performance: Demonstrates strong zero-shot and linear probe performance across multiple medical imaging domains, outperforming or matching specialized models on key benchmarks.
- Recommended For: Healthcare AI developers seeking robust, general-purpose medical image and text embeddings, especially for classification and retrieval tasks (not for text generation).
- Zero-shot classification of medical images
- Semantic search in medical image databases
- Embedding generation for downstream machine learning tasks
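
The zero-shot classification use case listed above comes down to comparing an image embedding with the embeddings of candidate text prompts in the shared space and picking the closest one. The following is a minimal sketch of that idea in plain Python. The three-dimensional vectors are made up for illustration and are not real MedSigLIP outputs, and `cosine_similarity` and `zero_shot_classify` are hypothetical helper names, not part of this integration or of FiftyOne.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, prompt_embeddings):
    """Score every prompt against the image and return (best_label, scores)."""
    scores = {
        label: cosine_similarity(image_embedding, embedding)
        for label, embedding in prompt_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy vectors standing in for real embeddings (assumption: made up for illustration)
image_embedding = [0.9, 0.1, 0.2]
prompt_embeddings = {
    "chest x-ray": [1.0, 0.0, 0.1],
    "brain MRI": [0.0, 1.0, 0.3],
    "abdominal CT": [0.1, 0.2, 1.0],
}

label, scores = zero_shot_classify(image_embedding, prompt_embeddings)
print(label)
```

Semantic retrieval follows the same pattern in the other direction: one text embedding is scored against many image embeddings, and the images are sorted by that score.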
You can use the SLAKE dataset as a running example. This is how to download it from the Hugging Face hub:

import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

dataset = load_from_hub(
    "Voxel51/SLAKE",
    name="SLAKE",
    overwrite=True,
    max_samples=10,
)

Next, you need to register and download the model:

import fiftyone.zoo as foz

# Register this custom model source
foz.register_zoo_model_source("https://github.com/harpreetsahota204/medsiglip")

# Download the MedSigLIP model
foz.download_zoo_model(
    "https://github.com/harpreetsahota204/medsiglip",
    model_name="google/medsiglip-448",
)

Then load the model:

import fiftyone.zoo as foz
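
model = foz.load_zoo_model(
    "google/medsiglip-448"
)

Compute embeddings for every sample in the dataset and store them in a field:

dataset.compute_embeddings(
    model=model,
    embeddings_field="medsiglip_embeddings",
)

Visualize the stored embeddings in two dimensions with UMAP:

import fiftyone.brain as fob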

results = fob.compute_visualization(
    dataset,
    embeddings="medsiglip_embeddings",
    method="umap",
    brain_key="medsiglip_viz",
    num_dims=2,
)

# View in the App
session = fo.launch_app(dataset)

To search images by text, build a similarity index with the model:

import fiftyone.brain as fob

# Build a similarity index
text_img_index = fob.compute_similarity(
    dataset,
    model=model,
    brain_key="medsiglip_similarity",
)

# Search by text query
similar_images = text_img_index.sort_by_similarity("a photo of a chest x-ray")

# View results
session = fo.launch_app(similar_images)

This model is released under the Health AI Developer Foundations Terms of Use. Refer to the official license for details.

@article{sellergren2025medgemma,
  title={MedGemma Technical Report},
  author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and Traverse, Madeleine and Kohlberger, Timo and Xu, Shawn and Jamil, Fayaz and Hughes, Cían and Lau, Charles and others},
  journal={arXiv preprint arXiv:2507.05201},
  year={2025}
}