Snappy pairs a FastAPI backend, a ColPali embedding service, and a Next.js frontend to deliver vision-first retrieval over PDFs. Each page is rasterized, embedded as multivectors, and stored alongside its image, so you can search by how documents look rather than only by their extracted text.
TL;DR
- Vision-focused retrieval and chat with ColPali multivector embeddings, MinIO image storage, and Qdrant search.
- Streaming responses, live indexing progress, and a schema-driven configuration UI to keep changes safe.
- One Docker Compose stack or individual services for local development and production-style deployments.
Table of Contents
- Quick Start
- Highlights
- Use Cases
- Architecture
- Frontend Experience
- Demo
- Environment Variables
- API Overview
- Troubleshooting
- Developer Notes
- Documentation
- Further Reading
- License
- Acknowledgements
---
config:
theme: neutral
layout: elk
---
flowchart TB
subgraph Frontend["Next.js Frontend"]
UI["Pages (/upload, /search, /chat, /configuration, /maintenance)"]
CHAT["Chat API Route"]
end
subgraph Backend["FastAPI Backend"]
API["REST Routers"]
end
subgraph Services["Supporting Services"]
QDRANT["Qdrant"]
MINIO["MinIO"]
COLPALI["ColPali Embedding API"]
OPENAI["OpenAI Responses API"]
end
USER["Browser"] <--> UI
UI --> API
API --> QDRANT
API --> MINIO
API --> COLPALI
CHAT --> API
CHAT --> OPENAI
CHAT -- SSE --> USER
Head to backend/docs/architecture.md and backend/docs/analysis.md for a deeper walkthrough of the indexing and retrieval flows.
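As a rough illustration of what multivector retrieval means: late-interaction models in the ColPali family score a page by matching each query token vector against its best page patch vector and summing those maxima (often called MaxSim). The sketch below uses plain Python lists and toy 2-dimensional vectors. The function names and data are illustrative and not taken from the Snappy codebase; in the real stack this scoring is expected to happen inside the vector database rather than in application code.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vectors: List[Vector], page_vectors: List[Vector]) -> float:
    """Sum, over query vectors, of the best dot product against any page vector."""
    return sum(max(dot(q, p) for p in page_vectors) for q in query_vectors)


def rank_pages(query_vectors: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page ids ordered from best to worst late-interaction score."""
    scored = {page_id: maxsim_score(query_vectors, vecs) for page_id, vecs in pages.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Toy data (invented): two query token vectors, two pages with two patch vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page-1": [[0.9, 0.1], [0.2, 0.8]],
    "page-2": [[0.5, 0.5], [0.4, 0.3]],
}
ranking = rank_pages(query, pages)
```

Here page-1 ranks first because each query vector finds a closely aligned patch on it, even though no single patch matches the whole query.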
Using pre-built images? Skip to Option A for the fastest deployment using the pre-built containers from GitHub Container Registry.
cp .env.example .env
cp frontend/.env.example frontend/.env.local
Add your OpenAI API key to frontend/.env.local and review the backend defaults in .env.
From colpali/ pick one profile:
# GPU profile (CUDA + flash-attn tooling)
docker compose --profile gpu up -d --build
# CPU profile (no GPU dependencies)
docker compose --profile cpu up -d --build
Only start one profile at a time to avoid port clashes. The first GPU build compiles flash-attn; subsequent builds reuse the cached wheel.
Use the pre-built images from GitHub Container Registry for instant deployment:
# Pull pre-built images
docker pull ghcr.io/athrael-soju/Snappy/backend:latest
docker pull ghcr.io/athrael-soju/Snappy/frontend:latest
docker pull ghcr.io/athrael-soju/Snappy/colpali-cpu:latest
# Create minimal docker-compose.yml (see docs/DOCKER_IMAGES.md)
# Then start services
docker compose up -d
Available images:
- backend:latest - FastAPI backend (amd64/arm64)
- frontend:latest - Next.js frontend (amd64/arm64)
- colpali-cpu:latest - CPU embedding service (amd64/arm64)
- colpali-gpu:latest - GPU embedding service (amd64 only)
Full guide: See docs/DOCKER_IMAGES.md for complete documentation on using pre-built images, version tags, configuration, and production deployment examples.
At the project root:
docker compose up -d --build
Services will come online at:
- Backend: http://localhost:8000
- Frontend: http://localhost:3000
- Qdrant: http://localhost:6333
- MinIO: http://localhost:9000 (console at :9001)
Update .env and frontend/.env.local if you need to expose different hostnames or ports.
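To confirm that the stack came up, a small script can probe each endpoint. This is a minimal sketch using only the Python standard library. The ports are the defaults listed above and GET /health is the backend endpoint from the API overview; the helper names are hypothetical. Only reachability is checked, since the response body format is not assumed.

```python
import urllib.error
import urllib.request

# Default local endpoints as listed above; adjust if you changed .env.
SERVICES = {
    "backend": "http://localhost:8000/health",
    "frontend": "http://localhost:3000",
    "qdrant": "http://localhost:6333",
    "minio": "http://localhost:9000",
}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers at all, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example with 403 or 404), so it is up.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL.
        return False


def check_stack(services=SERVICES, probe=is_reachable):
    """Map each service name to a reachability flag."""
    return {name: probe(url) for name, url in services.items()}
```

Calling check_stack() after the containers start returns a dictionary of booleans keyed by service name; a False entry is a hint to inspect that container's logs.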
- In backend/, install dependencies and launch FastAPI:
      python -m venv .venv
      source .venv/bin/activate  # Windows: .venv\Scripts\Activate.ps1
      pip install -U pip setuptools wheel
      pip install -r backend/requirements.txt
      uvicorn backend:app --host 0.0.0.0 --port 8000 --reload
- Start Qdrant and MinIO (via Docker or your preferred deployment).
- In frontend/, install and run the Next.js app:
      yarn install --frozen-lockfile
      yarn dev
- Keep the ColPali service from step 2 running (Docker or uvicorn colpali/app.py).
- Page-level vision retrieval powered by ColPali multivector embeddings; no OCR pipeline to maintain.
- Streaming chat responses from the OpenAI Responses API with inline visual citations so you can see each supporting page.
- Pipelined indexing with live Server-Sent Events progress updates and optional MUVERA-assisted first-stage search.
- Runtime configuration UI backed by a typed schema, with reset and draft flows that make experimentation safe.
- Docker Compose profiles for ColPali (GPU or CPU) plus an all-in-one stack for local development.
Snappy excels at retrieval scenarios where visual layout, formatting, and appearance matter as much as textual content:
- Legal Document Analysis - Search case files, contracts, and legal briefs by visual layout, annotations, and document structure without relying on OCR accuracy.
- Medical Records Retrieval - Find patient charts, diagnostic reports, and medical forms by handwritten notes, stamps, diagrams, and visual markers that traditional text search misses.
- Financial Auditing and Compliance - Locate invoices, receipts, financial statements, and compliance documents by visual characteristics like logos, stamps, signatures, and table layouts.
- Academic Research and Papers - Search scientific papers, technical documents, and research archives by figures, tables, equations, charts, and visual presentation; ideal for literature reviews.
- Archive and Document Management - Retrieve historical documents, scanned archives, and legacy records by visual appearance, preserving context that text extraction destroys.
- Engineering and Technical Documentation - Find blueprints, schematics, technical drawings, and specification sheets by visual elements, diagrams, and layout patterns.
- Media and Publishing - Search newspaper archives, magazine layouts, and published materials by visual design, page composition, and formatting.
- Educational Content - Organize and retrieve textbooks, lecture notes, and educational materials by visual structure, highlighting, and annotations.
The Next.js 16 frontend with React 19.2 keeps things fast and friendly: real-time streaming, responsive layouts, and design tokens (text-body-*, size-icon-*) that make extending the UI consistent. Configuration and maintenance pages expose everything the backend can do, while upload/search/chat give you the workflows you need day to day.
- COLPALI_URL, COLPALI_API_TIMEOUT
- QDRANT_EMBEDDED, QDRANT_URL, QDRANT_COLLECTION_NAME, QDRANT_PREFETCH_LIMIT, QDRANT_MEAN_POOLING_ENABLED, optional quantisation toggles
- MINIO_URL, MINIO_PUBLIC_URL, credentials, bucket naming, IMAGE_FORMAT, IMAGE_QUALITY
- MUVERA_ENABLED and related settings (requires fastembed[postprocess] in your environment)
- LOG_LEVEL, ALLOWED_ORIGINS, UVICORN_RELOAD
All schema-backed settings (and defaults) are documented in backend/docs/configuration.md. Runtime updates via /config/update are ephemeral; update .env for persistence.
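The difference between ephemeral runtime updates and persisted .env values can be pictured as two layers, where runtime overrides shadow the file-backed values until the process restarts. The following is a conceptual sketch only, not Snappy's implementation: the class, its methods, and the example value are hypothetical.

```python
from collections import ChainMap


class LayeredConfig:
    """Hypothetical model: runtime overrides shadow values loaded from .env."""

    def __init__(self, env_values: dict):
        self._env = dict(env_values)  # persisted layer (what .env holds)
        self._runtime = {}            # ephemeral layer (lost on restart)
        # Lookups check the runtime layer first, then fall back to the env layer.
        self._view = ChainMap(self._runtime, self._env)

    def get(self, key: str) -> str:
        return self._view[key]

    def update(self, key: str, value: str) -> None:
        """Analogous to a runtime update: affects only the running process."""
        self._runtime[key] = value

    def restart(self) -> None:
        """Simulate a restart: runtime overrides are dropped, .env values remain."""
        self._runtime.clear()


config = LayeredConfig({"LOG_LEVEL": "INFO"})  # example value, not a documented default
config.update("LOG_LEVEL", "DEBUG")
before_restart = config.get("LOG_LEVEL")
config.restart()
after_restart = config.get("LOG_LEVEL")
```

In this model the override is visible until the simulated restart and then falls back to the persisted value, which is the behaviour to expect when a setting is changed at runtime but not written to .env.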
- NEXT_PUBLIC_API_BASE_URL (defaults to http://localhost:8000)
- OPENAI_API_KEY, OPENAI_MODEL, optional OPENAI_TEMPERATURE, OPENAI_MAX_TOKENS
| Area | Endpoint(s) | Notes |
|---|---|---|
| Meta | GET /health | Service and dependency status |
| Retrieval | GET /search?q=...&k=5 | Page-level search (defaults to 10 when k omitted) |
| Indexing | POST /index | Background indexing job (multipart PDF upload) |
| | GET /progress/stream/{job_id} | Real-time progress (SSE) |
| | POST /index/cancel/{job_id} | Cancel an active job |
| Maintenance | GET /status | Collection/bucket statistics |
| | POST /initialize, DELETE /delete | Provision or tear down collection + bucket |
| | POST /clear/qdrant, /clear/minio, /clear/all | Data reset helpers |
| Configuration | GET /config/schema, /config/values | Expose runtime schema and values |
| | POST /config/update, /config/reset | Runtime configuration management |
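For scripted access, the retrieval endpoint in the table can be called with the Python standard library. The sketch below relies only on what the table states (GET /search with q and an optional k). The helper names, the default base URL, and the assumption that the response body is JSON are not confirmed by this document.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_search_url(base_url: str, query: str, k: Optional[int] = None) -> str:
    """Build the GET /search URL; omit k to let the backend apply its default."""
    params = {"q": query}
    if k is not None:
        params["k"] = str(k)
    return f"{base_url.rstrip('/')}/search?{urllib.parse.urlencode(params)}"


def search(query: str, k: Optional[int] = None, base_url: str = "http://localhost:8000"):
    """Call the backend and decode the body, assuming it is JSON."""
    url = build_search_url(base_url, query, k)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, search("quarterly revenue table", k=5) would request the five best-matching pages from a backend running on the default port.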
Chat streaming lives in frontend/app/api/chat/route.ts. The route calls the backend search endpoint, invokes the OpenAI Responses API, and streams Server-Sent Events to the browser. The backend does not proxy OpenAI calls.
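Server-Sent Events are a plain-text format in which each event is a group of "field: value" lines terminated by a blank line. A minimal parser for the data field is sketched below to show what a client of an SSE endpoint such as /progress/stream/{job_id} has to handle. Payloads are treated as opaque strings because their shape is not described here, and the sample input is invented.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event from an iterable of SSE text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.


# Invented sample stream: one single-line event, a comment, one two-line event.
sample = ["data: first\n", "\n", ": keep-alive\n", "data: line one\n", "data: line two\n", "\n"]
events = list(iter_sse_data(sample))
```

Multiple data lines within one event are joined with newlines, and an event that is still incomplete when the stream ends is dropped.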
- ColPali timing out? Increase COLPALI_API_TIMEOUT or run the GPU profile for heavy workloads.
- Progress bar stuck? Ensure Poppler is installed and check backend logs for PDF conversion errors.
- Missing images? Verify MinIO credentials/URLs and confirm next.config.ts allows the domains you expect.
- CORS issues? Replace wildcard ALLOWED_ORIGINS entries with explicit URLs before exposing the API publicly.
- Config changes vanish? /config/update modifies runtime state only; update .env for anything you need to keep after a restart.
- Upload rejected? The uploader currently accepts PDFs only. Adjust max size, chunk size, or file count limits in the "Uploads" section of the configuration UI.
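The CORS advice above can be checked before a deployment. The sketch below flags wildcard or non-explicit entries in an origins string. It assumes ALLOWED_ORIGINS is a comma-separated list, which this document does not confirm, and the function names are hypothetical.

```python
from typing import List
from urllib.parse import urlparse


def parse_origins(raw: str) -> List[str]:
    """Split a comma-separated origins string into trimmed, non-empty entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def unsafe_origins(raw: str) -> List[str]:
    """Return entries that contain a wildcard or are not explicit http(s) URLs."""
    flagged = []
    for origin in parse_origins(raw):
        parsed = urlparse(origin)
        explicit = parsed.scheme in ("http", "https") and bool(parsed.netloc)
        if "*" in origin or not explicit:
            flagged.append(origin)
    return flagged
```

An empty result from unsafe_origins means every entry is an explicit http or https origin; anything returned should be replaced before the API is exposed publicly.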
backend/docs/configuration.md and backend/CONFIGURATION_GUIDE.md cover advanced troubleshooting and implementation details.
- Background indexing uses FastAPI BackgroundTasks. For larger deployments consider a dedicated task queue.
- MinIO worker pools auto-size based on hardware. Override only when you have specific throughput limits.
- TypeScript types and Zod schemas regenerate from the OpenAPI spec (yarn gen:sdk, yarn gen:zod) to keep the frontend in sync.
- Pre-commit hooks (autoflake, isort, black, pyright) keep the codebase tidy; run them before contributing.
- Version management: Uses Release Please + Conventional Commits for automated releases. See VERSIONING.md for details.
- backend/README.md - FastAPI backend guide
- frontend/README.md - Next.js frontend guide
- colpali/README.md - ColPali embedding service guide
- backend/docs/configuration.md - Configuration reference
- VERSIONING.md - Release and version workflow
- backend/docs/analysis.md - vision vs. text RAG comparison
- backend/docs/architecture.md - collection, indexing, and search deep dive
- colpali/README.md - details on the standalone embedding service
MIT License - see LICENSE.
Snappy builds on the work of:
- ColPali / ColModernVBert - multimodal models for visual retrieval
  https://arxiv.org/abs/2407.01449
  https://arxiv.org/abs/2510.01149
- Qdrant - the vector database powering multivector search
  https://qdrant.tech/blog/colpali-qdrant-optimization/
  https://qdrant.tech/articles/binary-quantization/
  https://qdrant.tech/articles/muvera-embeddings/
- PyTorch - core deep learning framework
  https://pytorch.org/