This repository contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" (ICLR 2025).
Topics: benchmark, representation-learning, image-retrieval, embedding, vlm, multimodal, rag, video-retrieval, contrastive-learning, mmeb, visual-document-retrieval