

Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳


tonywu71/colpali-cookbooks


ColPali Cookbooks 👨🏻‍🍳


[ColPali Engine] [ViDoRe Benchmark]

Introduction

ColPali is a model designed to retrieve documents by analyzing their visual features. Unlike traditional systems that rely heavily on text extraction and OCR, ColPali treats each page as an image. It builds on PaliGemma-3B to capture not only text but also layout, tables, charts, and other visual elements, producing multi-vector embeddings: many vectors per page (mostly one per image patch) and one vector per query token. A page is scored against a query with late interaction: each query token embedding is matched to its most similar page embedding, and these maximum similarities are summed. This captures more of each document's content than text-only pipelines and removes OCR and layout parsing from indexing, making retrieval both more efficient and more accurate.
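The late-interaction scoring described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the colpali-engine implementation: it assumes the embeddings have already been computed and are represented as lists of floats, and the names `late_interaction_score` and `rank_pages` are hypothetical. In practice the vectors come from the model and are scored in batches on a GPU.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query: Sequence[Vector], page: Sequence[Vector]) -> float:
    """Hypothetical MaxSim-style score: for every query token embedding,
    keep its best match among the page's embeddings, then sum those maxima."""
    return sum(max(dot(q, p) for p in page) for q in query)


def rank_pages(
    query: Sequence[Vector], pages: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Score every page against the query and return page indices best-first."""
    scores = [late_interaction_score(query, page) for page in pages]
    order = sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy 2-D embeddings (real embeddings are higher-dimensional and produced by the model).
query = [[1.0, 0.0], [0.0, 1.0]]   # two query tokens
page_a = [[1.0, 0.0], [0.0, 0.5]]  # two patch embeddings
page_b = [[0.2, 0.2]]              # one patch embedding
order, scores = rank_pages(query, [page_a, page_b])
print(order, scores)  # [0, 1] [1.5, 0.4]
```

Because each query token keeps only its best-matching patch, a page scores highly when every part of the query is well represented somewhere on it, regardless of where.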

This repository contains notebooks for learning about the ColVision family of models, fine-tuning them for your specific use case, creating similarity maps to interpret their predictions, and more! 😍

Table of Contents

You can find the cookbooks in the examples directory. In the table below, they are listed from most recent to oldest.

| Task | Notebook | Description |
| --- | --- | --- |
| Inference, interpretability | Use the 🤗 transformers-native ColQwen2 | Use the 🤗 transformers-native implementation of ColQwen2 for inference, scoring, and interpretability. |
| Inference, interpretability | Use the 🤗 transformers-native ColPali | Use the 🤗 transformers-native implementation of ColPali for inference, scoring, and interpretability. |
| RAG | ColQwen2: One model for your whole RAG pipeline with adapter hot-swapping 🔥 | Save VRAM by using a single VLM for your entire RAG pipeline. Works even on Colab's free T4 GPU! |
| Interpretability | ColQwen2: Generate your own similarity maps 👀 | Generate your own similarity maps to interpret ColQwen2's predictions. |
| Interpretability | ColPali: Generate your own similarity maps 👀 | Generate your own similarity maps to interpret ColPali's predictions. |
| Fine-tuning | Fine-tune ColPali 🛠️ | Fine-tune ColPali using LoRA and optional 4-bit/8-bit quantization. |
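The similarity-map notebooks listed above visualize how strongly a single query token matches each image patch of a page. A minimal sketch of that idea, assuming the page's patch embeddings are stored in row-major order over an `n_rows` × `n_cols` grid; the helper name `similarity_map` and the min-max normalization are illustrative assumptions, not part of colpali-engine:

```python
from typing import List, Sequence


def similarity_map(
    token: Sequence[float],
    patches: Sequence[Sequence[float]],
    n_rows: int,
    n_cols: int,
) -> List[List[float]]:
    """Hypothetical helper: return an n_rows x n_cols grid of min-max
    normalized similarities between one query token and every patch."""
    if len(patches) != n_rows * n_cols:
        raise ValueError("expected n_rows * n_cols patch embeddings")
    # Raw dot-product similarity of the token with each patch embedding.
    scores = [sum(a * b for a, b in zip(token, p)) for p in patches]
    # Rescale to [0, 1] so the grid can be drawn as a heatmap over the page.
    lo, hi = min(scores), max(scores)
    span = hi - lo
    norm = [(s - lo) / span if span > 0 else 0.0 for s in scores]
    # Reshape the flat, row-major list into a 2-D grid.
    return [norm[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


grid = similarity_map(
    [1.0, 0.0],
    [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [0.0, 1.0]],
    n_rows=2,
    n_cols=2,
)
print(grid)  # [[0.0, 1.0], [0.5, 0.0]]
```

Upsampling such a grid to the page resolution and overlaying it on the image gives the heatmap; the notebooks themselves may normalize or aggregate differently.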

Instructions

Open with Colab

The easiest way to use the notebooks is to open them from the examples directory and click on the Colab button below:

Colab

This will open the notebook in Google Colab, where you can run the code and experiment with the models.

Run locally

If you prefer to run the notebooks locally, you can clone the repository and open the notebooks in Jupyter Notebook or in your IDE.

Citation

ColPali: Efficient Document Retrieval with Vision Language Models

Authors: Manuel Faysse*, Hugues Sibille*, Tony Wu*, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo (* denotes equal contribution)

@misc{faysse2024colpaliefficientdocumentretrieval,
      title={ColPali: Efficient Document Retrieval with Vision Language Models}, 
      author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
      year={2024},
      eprint={2407.01449},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2407.01449}, 
}
