📑 What is PageIndex?
PageIndex is a vectorless, reasoning-based retrieval-augmented generation (RAG) framework that simulates how human experts navigate and extract knowledge from long, complex documents. Instead of relying on vector similarity search, it transforms documents into a tree-structured index and lets LLMs perform agentic reasoning over that structure for context-aware retrieval. The retrieval process is traceable and interpretable, and it requires no vector database and no chunking.
To learn more, see the detailed introduction to the PageIndex framework. You can also check out our GitHub repo for the open-source code, and the cookbooks, tutorials, and blog for additional usage guides and examples. The PageIndex service is available as a ChatGPT-style chat platform, or it can be integrated via MCP or API.
Figure: PageIndex workflow (tree index generation, then LLM reasoning over the index for context-aware retrieval).
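To make the second step of the workflow concrete, here is a minimal sketch of reasoning-based retrieval over a tree index. It is not PageIndex's implementation or API: `Node`, `tree_search`, and `keyword_selector` are hypothetical names, and the keyword-overlap selector is only a stand-in for the LLM that, in PageIndex, reasons over section titles and summaries to decide where to descend.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One section in a hypothetical tree index of a document."""
    title: str
    summary: str = ""
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# A selector looks at the query and the candidate child sections and returns
# the one to descend into (or None to stop). An LLM would fill this role.
Selector = Callable[[str, List[Node]], Optional[Node]]


def tree_search(root: Node, query: str, choose: Selector) -> Tuple[List[str], Node]:
    """Descend from the root, recording the titles of the sections visited."""
    path = [root.title]
    node = root
    while node.children:
        nxt = choose(query, node.children)
        if nxt is None:  # selector found nothing relevant; stop at this node
            break
        path.append(nxt.title)
        node = nxt
    return path, node


def keyword_selector(query: str, candidates: List[Node]) -> Optional[Node]:
    """Stand-in for an LLM: pick the child whose title/summary shares the most words."""
    words = set(re.findall(r"\w+", query.lower()))

    def score(n: Node) -> int:
        return len(words & set(re.findall(r"\w+", f"{n.title} {n.summary}".lower())))

    best = max(candidates, key=score)
    return best if score(best) > 0 else None


# A toy document tree with placeholder section text.
doc = Node("Annual Report", children=[
    Node("Financial Statements", summary="revenue balance sheet cash flow", children=[
        Node("Income Statement", summary="revenue and net income", text="(income statement text)"),
        Node("Balance Sheet", summary="assets and liabilities", text="(balance sheet text)"),
    ]),
    Node("Risk Factors", summary="market and regulatory risks", text="(risk factors text)"),
])

path, leaf = tree_search(doc, "What was the revenue?", keyword_selector)
print(" > ".join(path))
print(leaf.text)
```

Because the search returns the sequence of section titles it followed, every retrieval can be inspected step by step, which is the sense in which tree-based retrieval is traceable. Swapping `keyword_selector` for an LLM-backed function with the same signature changes how decisions are made without changing the traversal.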
Services
- PageIndex Chat Platform: A chat platform that allows you to directly analyze multiple long documents with reasoning-based retrieval.
- PageIndex Chat API (beta): API service for PageIndex Chat.
- PageIndex MCP: Integrate PageIndex with your own LLM agents.
Enterprise options are available for private or on-prem deployment. Contact us or book a demo to learn more.
Tools
- PageIndex Tree Generation: Generates hierarchical tree indexes for documents.
- PageIndex OCR: An OCR model that preserves the global structure of the document; see this blog for details.
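PageIndex's own tree generator is not reproduced here. As a toy illustration only, assuming a document that already carries markdown-style headings, a hierarchical index can be built by nesting sections according to heading level. `Section`, `build_tree`, and `outline` are hypothetical names chosen for this sketch.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Matches markdown-style headings such as "## Balance Sheet".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Section:
    """A node in a hypothetical heading-based tree index."""
    title: str
    level: int
    lines: List[str] = field(default_factory=list)
    children: List["Section"] = field(default_factory=list)


def build_tree(markdown: str) -> Section:
    """Nest sections by heading level using a stack of open sections."""
    root = Section(title="(document)", level=0)
    stack = [root]
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            node = Section(title=m.group(2), level=len(m.group(1)))
            # Close any open sections at the same or a deeper level.
            while stack[-1].level >= node.level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            # Body text belongs to the innermost open section.
            stack[-1].lines.append(line)
    return root


def outline(node: Section, depth: int = 0) -> List[str]:
    """Render the tree as indented titles, like a table of contents."""
    out = []
    for child in node.children:
        out.append("  " * depth + child.title)
        out.extend(outline(child, depth + 1))
    return out


sample = """# Annual Report
Introductory text.
## Financial Statements
### Income Statement
(income statement text)
### Balance Sheet
(balance sheet text)
## Risk Factors
(risk factors text)
"""

tree = build_tree(sample)
print("\n".join(outline(tree)))
```

The stack keeps only the chain of currently open sections, so each heading is attached to the nearest preceding heading with a smaller level in a single pass over the text. Real documents such as PDFs often lack clean heading markup, which is where a dedicated generator is needed.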
Cookbook
- Vectorless RAG with PageIndex: A quick, hands-on introduction to the PageIndex approach.
- Vision RAG with PageIndex: A simple example of building a vision-only RAG system using PageIndex.
- Agentic Retrieval with Chat API: How to create an agentic retriever by prompting the PageIndex Chat API.