diff --git a/docs/docs/examples/examples/image_search.md b/docs/docs/examples/examples/image_search.md
index fa4fbe81..d86908a0 100644
--- a/docs/docs/examples/examples/image_search.md
+++ b/docs/docs/examples/examples/image_search.md
@@ -1,5 +1,5 @@
 ---
-title: Index Images with ColPali
+title: Image Search App with ColPali and FastAPI
 description: Build image search index with ColPali and FastAPI
 sidebar_class_name: hidden
 slug: /examples/image_search
@@ -10,46 +10,45 @@ sidebar_custom_props:
   tags: [vector-index, multi-modal]
 ---
 
-import { GitHubButton, YouTubeButton } from '../../../src/components/GitHubButton';
+import { GitHubButton, YouTubeButton, DocumentationButton } from '../../../src/components/GitHubButton';
 
 ## Overview
+CocoIndex supports native integration with ColPali: with just a few lines of code, you can embed and index images with ColPali’s late-interaction architecture. We also build a lightweight image search application with FastAPI.
 
-CocoIndex now supports native integration with ColPali — enabling multi-vector, patch-level image indexing using cutting-edge multimodal models. With just a few lines of code, you can now embed and index images with ColPali’s late-interaction architecture, fully integrated into CocoIndex’s composable flow system.
-
-## Why ColPali for Indexing?
+## ColPali
 
 **ColPali (Contextual Late-interaction over Patches)** is a powerful model for multimodal retrieval. It fundamentally rethinks how documents—especially visually complex or image-rich ones—are represented and searched.
 
 Instead of reducing each image or page to a single dense vector (as in traditional bi-encoders), ColPali breaks an image into many smaller patches, preserving local spatial and semantic structure. Each patch receives its own embedding, which together form a multi-vector representation of the complete document.
 
+![ColPali Architecture](/img/examples/image_search/multi_modal_architecture.png)
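+
+To make that representation concrete, here is a toy sketch of the stored shape (illustrative numbers only; the inner dimension is the model's projection size, e.g. 128 for ColPali v1.2):
+
+```python
+# A ColPali image embedding is a multi-vector: one vector per image patch.
+# Conceptually Vector[Vector[Float32, N]], where the outer dimension is the
+# number of patches and the inner dimension is the model's hidden size.
+embedding: list[list[float]] = [
+    [0.12, -0.03, 0.88],  # patch 1
+    [0.08, 0.21, -0.40],  # patch 2
+    [0.55, -0.19, 0.07],  # patch 3
+]
+num_patches = len(embedding)     # outer dimension
+hidden_size = len(embedding[0])  # inner dimension
+```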
 
-## Declare an Image Indexing Flow with CocoIndex
-
-
-In this example, we will use CocoIndex to index images with ColPali, and Qdrant to store and retrieve the embeddings.
-
-This flow illustrates how we’ll process and index images using ColPali:
+## Flow Overview
+![Flow](/img/examples/image_search/flow.png)
 
 1. Ingest image files from the local filesystem
 2. Use **ColPali** to embed each image into patch-level multi-vectors
 3. Optionally extract image captions using an LLM
 4. Export the embeddings (and optional captions) to a Qdrant collection
 
-Check out the full working code [here](https://github.com/cocoindex-io/cocoindex/blob/main/examples/image_search/colpali_main.py).
+## Setup
+- [Install Postgres](https://cocoindex.io/docs/getting_started/installation#-install-postgres) if you don't have one.
 
-:star: Star [CocoIndex on GitHub](https://github.com/cocoindex-io/cocoindex) if you like it!
+- Make sure Qdrant is running:
+  ```
+  docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant
+  ```
 
-### 1. Ingest the Images
+## Add Source
 
 We start by defining a flow to read `.jpg`, `.jpeg`, and `.png` files from a local directory using `LocalFile`.
 
 ```python
-
 @cocoindex.flow_def(name="ImageObjectEmbeddingColpali")
 def image_object_embedding_flow(flow_builder, data_scope):
     data_scope["images"] = flow_builder.add_source(
@@ -60,41 +59,36 @@ def image_object_embedding_flow(flow_builder, data_scope):
         ),
         refresh_interval=datetime.timedelta(minutes=1),
     )
-
 ```
 
 The `add_source` function sets up a table with fields like `filename` and `content`. Images are automatically re-scanned every minute.
+
-### 2. Process Each Image and Collect the Embedding
-### 2.1 Embed the Image with ColPali
+## Process Each Image and Collect the Embedding
 
 We use CocoIndex's built-in `ColPaliEmbedImage` function, which returns a **multi-vector representation** for each image. Each patch receives its own vector, preserving spatial and semantic information.
+
+
 ```python
 img_embeddings = data_scope.add_collector()
 with data_scope["images"].row() as img:
     img["embedding"] = img["content"].transform(cocoindex.functions.ColPaliEmbedImage(model="vidore/colpali-v1.2"))
+    collect_fields = {
+        "id": cocoindex.GeneratedField.UUID,
+        "filename": img["filename"],
+        "embedding": img["embedding"],
+    }
+    img_embeddings.collect(**collect_fields)
 ```
 
-This transformation turns the raw image bytes into a list of vectors — one per patch — that can later be used for **late interaction search**.
-
-
-### 3. Collect and Export the Embeddings
+This transformation turns the raw image bytes into a list of vectors — one per patch — that can later be used for **late interaction search**. We then collect each embedding together with a generated UUID and the filename.
 
-Once we’ve processed each image, we collect its metadata and embedding and send it to Qdrant.
-
-```python
-collect_fields = {
-    "id": cocoindex.GeneratedField.UUID,
-    "filename": img["filename"],
-    "embedding": img["embedding"],
-}
-img_embeddings.collect(**collect_fields)
-```
+![ColPali Embedding](/img/examples/image_search/embedding.png)
 
-Then we export to Qdrant using the `Qdrant` target:
+## Export the Embeddings
 
 ```python
 img_embeddings.export(
@@ -107,7 +101,7 @@ img_embeddings.export(
 
 This creates a vector collection in Qdrant that supports **multi-vector fields** — required for ColPali-style late interaction search.
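+
+CocoIndex creates and configures this collection for you. Purely as an illustration of what the multi-vector configuration means on the Qdrant side, a manual equivalent with `qdrant-client` would look roughly like this (the collection name, vector size, and distance metric here are assumptions, not taken from the example code):
+
+```python
+from qdrant_client import QdrantClient, models
+
+client = QdrantClient(url="http://localhost:6333")
+client.create_collection(
+    collection_name="ImageObjectEmbeddingColpali",  # hypothetical name
+    vectors_config={
+        "embedding": models.VectorParams(
+            size=128,  # assumed ColPali projection size
+            distance=models.Distance.COSINE,
+            # Multi-vector field: patch vectors are compared with MaxSim
+            multivector_config=models.MultiVectorConfig(
+                comparator=models.MultiVectorComparator.MAX_SIM
+            ),
+        )
+    },
+)
+```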
 
-### 4. Enable Real-Time Indexing
+## Enable Real-Time Indexing
 
 To keep the image index up to date automatically, we wrap the flow in a `FlowLiveUpdater`:
 
@@ -124,50 +118,43 @@ async def lifespan(app: FastAPI):
 
 This keeps your vector index fresh as new images arrive.
 
+## FastAPI Application
 
-## What’s Actually Stored?
-
-Unlike typical image search pipelines that store one global vector per image, ColPali stores:
+We build a simple FastAPI application to query the index.
 
 ```python
-Vector[Vector[Float32, N]]
+app = FastAPI(lifespan=lifespan)
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+# Serve images from the 'img' directory at /img
+app.mount("/img", StaticFiles(directory="img"), name="img")
 ```
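+
+The `/search` endpoint below reads a Qdrant client from `app.state.qdrant_client`, which is not shown in the snippets above. A minimal sketch of how the lifespan could provide it (the URL assumes the local Docker Qdrant from the Setup section; see the full example code for the actual wiring):
+
+```python
+from contextlib import asynccontextmanager
+
+import cocoindex
+from dotenv import load_dotenv
+from fastapi import FastAPI
+from qdrant_client import QdrantClient
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    load_dotenv()
+    cocoindex.init()
+    image_object_embedding_flow.setup(report_to_stdout=True)
+    # Client used by the /search endpoint via app.state.qdrant_client
+    app.state.qdrant_client = QdrantClient(url="http://localhost:6333")
+    app.state.live_updater = cocoindex.FlowLiveUpdater(image_object_embedding_flow)
+    app.state.live_updater.start()
+    yield
+```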
 
-Where:
-
-- The outer dimension is the **number of patches**
-- The inner dimension is the **model’s hidden size**
+## Query the Index
 
-This makes the index **multi-vector ready**, and compatible with late-interaction query strategies — like MaxSim or learned fusion.
+We use `ColPaliEmbedQuery` to embed the query text into a multi-vector format.
 
-
-## Real-Time Indexing with Live Updater
-
-You can also attach CocoIndex’s `FlowLiveUpdater` to your FastAPI or any Python app to keep your ColPali index synced in real time:
+
 
 ```python
-from fastapi import FastAPI
-from contextlib import asynccontextmanager
-
-@asynccontextmanager
-async def lifespan(app: FastAPI):
-    load_dotenv()
-    cocoindex.init()
-    image_object_embedding_flow.setup(report_to_stdout=True)
-    app.state.live_updater = cocoindex.FlowLiveUpdater(image_object_embedding_flow)
-    app.state.live_updater.start()
-    yield
-
+@cocoindex.transform_flow()
+def text_to_colpali_embedding(
+    text: cocoindex.DataSlice[str],
+) -> cocoindex.DataSlice[list[list[float]]]:
+    return text.transform(
+        cocoindex.functions.ColPaliEmbedQuery(model=COLPALI_MODEL_NAME)
+    )
 ```
-
-## Retrivel and application
-
-Refer to this example on Query and application building:
-https://cocoindex.io/blogs/live-image-search#3-query-the-index
-
-Make sure we use ColPali to embed the query
+Then we build a search API to query the index.
 
 ```python
+# --- Search API ---
 @app.get("/search")
 def search(
     q: str = Query(..., description="Search query"),
@@ -175,40 +162,107 @@
 ) -> Any:
     # Get the multi-vector embedding for the query
     query_embedding = text_to_colpali_embedding.eval(q)
+    print(
+        f"🔍 Query multi-vector shape: {len(query_embedding)} tokens x {len(query_embedding[0]) if query_embedding else 0} dims"
+    )
+    # Search in Qdrant with multi-vector MaxSim scoring using the query_points API
+    search_results = app.state.qdrant_client.query_points(
+        collection_name=QDRANT_COLLECTION,
+        query=query_embedding,  # Multi-vector format: list[list[float]]
+        using="embedding",  # Specify the vector field name
+        limit=limit,
+        with_payload=True,
+    )
+
+    print(f"📈 Found {len(search_results.points)} results with MaxSim scoring")
+
+    return {
+        "results": [
+            {
+                "filename": result.payload["filename"],
+                "score": result.score,
+                "caption": result.payload.get("caption"),
+            }
+            for result in search_results.points
+        ]
+    }
 ```
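+
+Qdrant scores each image against the query with MaxSim late interaction: every query-token vector is matched to its most similar image patch, and the per-token maxima are summed. For intuition only (this is not Qdrant's actual implementation), a minimal NumPy sketch of that scoring:
+
+```python
+import numpy as np
+
+def maxsim_score(query_vecs: np.ndarray, patch_vecs: np.ndarray) -> float:
+    """MaxSim over a (tokens, dim) query and a (patches, dim) image embedding."""
+    sim = query_vecs @ patch_vecs.T      # (tokens, patches) similarity matrix
+    return float(sim.max(axis=1).sum())  # best patch per token, summed over tokens
+```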
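+
+Once the backend is running (see the next section), you can exercise the endpoint directly; a quick sanity check (the query text is illustrative):
+
+```python
+import requests
+
+resp = requests.get(
+    "http://localhost:8000/search",
+    params={"q": "a dog playing in the snow", "limit": 5},
+)
+print(resp.json()["results"])
+```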
 
-Full working code is available [here](https://github.com/cocoindex-io/cocoindex/blob/main/examples/image_search/colpali_main.py).
+## Run the Application
+
+- Install dependencies:
+  ```
+  pip install -e .
+  pip install 'cocoindex[colpali]'  # Adds ColPali support
+  ```
 
-Check it out for yourself! It is fun :) In this image search example, the results look better compared to [using CLIP](http://localhost:3000/blogs/live-image-search) with a single dense vector (1D embedding).
-ColPali produces richer and more fine-grained retrieval.
+- Configure the model (optional):
+  ```sh
+  # All ColVision models supported by colpali-engine are available
+  # See https://github.com/illuin-tech/colpali#list-of-colvision-models for the complete list
+  # ColPali models (colpali-*) - PaliGemma-based, best for general document retrieval
+  export COLPALI_MODEL="vidore/colpali-v1.2"  # Default model
+  export COLPALI_MODEL="vidore/colpali-v1.3"  # Latest version
 
-## Built with Flexibility in Mind
+  # ColQwen2 models (colqwen-*) - Qwen2-VL-based, excellent for multilingual text (29+ languages) and general vision
+  export COLPALI_MODEL="vidore/colqwen2-v1.0"
+  export COLPALI_MODEL="vidore/colqwen2.5-v0.2"  # Latest Qwen2.5 model
 
-Whether you’re working on:
+  # ColSmol models (colsmol-*) - Lightweight, good for resource-constrained environments
+  export COLPALI_MODEL="vidore/colSmol-256M"
 
-- Visual RAG
-- Multimodal retrieval systems
-- Fine-grained visual search tools
-- Or want to bring image understanding to your AI agent workflows
+  # Any other ColVision models from https://github.com/illuin-tech/colpali are supported
+  ```
 
-[CocoIndex](https://github.com/cocoindex-io/cocoindex) + ColPali gives you a modular, modern foundation to build from.
+- Run the ColPali backend:
+  ```
+  uvicorn colpali_main:app --reload --host 0.0.0.0 --port 8000
+  ```
+  :::warning
+  Note that recent Nvidia GPUs (such as the RTX 5090) are not supported by stable PyTorch releases up to 2.7.1.
+  :::
 
-## Connect to Any Data Source — and Keep It in Sync
+  If you get this error:
 
-One of CocoIndex’s core strengths is its ability to connect to your existing data sources and automatically keep your index fresh.
-Beyond local files, CocoIndex natively supports source connectors including:
+  ```
+  The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90 compute_37.
+  ```
+
+  you can install the nightly PyTorch build (see https://pytorch.org/get-started/locally/):
+
+  ```sh
+  pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129
+  ```
+- Run the frontend:
+  ```
+  cd frontend
+  npm install
+  npm run dev
+  ```
+
+  Go to `http://localhost:5173` to search. The frontend works identically with both backends.
+
+  ![Result](/img/examples/image_search/result.png)
+
+## CLIP Model & Comparison with ColPali
+We've also built a similar application with the CLIP model.
+
+
+In general:
+- CLIP: faster, good for general image-text matching
+- ColPali: more accurate for document images and text-heavy content; supports multi-vector late interaction for better precision
+
+## Connect to Any Data Source
+
+One of CocoIndex’s core strengths is its ability to connect to your existing data sources and automatically keep your index fresh. Beyond local files, CocoIndex natively supports source connectors including:
 
 - Google Drive
 - Amazon S3 / SQS
 - Azure Blob Storage
 
-See documentation [here](https://cocoindex.io/docs/ops/sources).
+
 Once connected, CocoIndex continuously watches for changes — new uploads, updates, or deletions — and applies them to your index in real time.
 
-
-## Support us
-
-We’re constantly adding more examples and improving our runtime.
-If you found this helpful, please ⭐ star [CocoIndex on GitHub](https://github.com/cocoindex-io/cocoindex) and share it with others.
\ No newline at end of file
diff --git a/docs/static/img/examples/image_search/cover.png b/docs/static/img/examples/image_search/cover.png
index db360205..e75e4678 100644
Binary files a/docs/static/img/examples/image_search/cover.png and b/docs/static/img/examples/image_search/cover.png differ
diff --git a/docs/static/img/examples/image_search/embedding.png b/docs/static/img/examples/image_search/embedding.png
new file mode 100644
index 00000000..6b87d5d2
Binary files /dev/null and b/docs/static/img/examples/image_search/embedding.png differ
diff --git a/docs/static/img/examples/image_search/flow.png b/docs/static/img/examples/image_search/flow.png
new file mode 100644
index 00000000..06ea6aab
Binary files /dev/null and b/docs/static/img/examples/image_search/flow.png differ
diff --git a/docs/static/img/examples/image_search/multi_modal_architecture.png b/docs/static/img/examples/image_search/multi_modal_architecture.png
new file mode 100644
index 00000000..2b317013
Binary files /dev/null and b/docs/static/img/examples/image_search/multi_modal_architecture.png differ
diff --git a/docs/static/img/examples/image_search/result.png b/docs/static/img/examples/image_search/result.png
new file mode 100644
index 00000000..c56da124
Binary files /dev/null and b/docs/static/img/examples/image_search/result.png differ