Open-source search and retrieval database for AI applications.
Parsing-free RAG supported by VLMs
Specialized Retrieval Augmented Generation pipeline designed for Corporate Documents such as CBA/CCNL, Company Regulations, and Business Policies
[KPMG x Columbia] Intelligent Document Analysis for Healthcare Programs Using LLMs and RAG | Fall 2025
A comprehensive multimodal OCR application that supports both image and video document processing using state-of-the-art vision-language models. This application provides an intuitive Gradio interface for extracting text, converting documents to markdown, and performing advanced document analysis.
An Information Retrieval engine for scientific papers – Lucene-powered with synonyms, wildcards, and smart query expansion.
Distributed vector search for AI-native applications
Agentic AI system that lets users upload documents (PDFs, DOCX, etc.) and ask natural language questions about them. It uses LLM-based RAG to extract relevant information. The architecture includes multi-agent components such as document retrievers, summarizers, web searchers, and tool routers, enabling dynamic reasoning and accurate responses.
[ECML PKDD 2025] Official implementation of "PromptDSI: Prompt-based Rehearsal-free Continual Learning for Document Retrieval"
An intelligent Model Context Protocol (MCP) server for Azure AI Search integration with Claude Desktop - Transform enterprise document search into natural AI conversations using LangGraph workflows, Google Gemini, and advanced retrieval-augmented generation (RAG).
[VLSP 2025] ViDRILL is a Vietnamese document retrieval system for VLSP 2025. It combines dense and sparse retrieval, reranking, and optional LLM-based query rewriting and reasoning to support high-accuracy information retrieval and future LLM-enhanced pipelines.
AetherCare is an AI-powered healthcare platform that leverages Generative AI to assist users with medical inquiries, symptom-based disease prediction, hospital location services, and a knowledge repository for healthcare education. This project aims to enhance accessibility to healthcare information.
AI-powered RAG (Retrieval Augmented Generation) chatbot for the UCSB College of Engineering with semantic search, source verification & comprehensive academic information retrieval. Built with Streamlit, Python, Google Gemini, ChromaDB, LangChain & custom web scraping pipeline using Puppeteer (Node.js/JavaScript).
The drex-062225-exp (Document Retrieval and Extraction Expert) model is a specialized fine-tuned version of docscopeocr-7b-050425-exp, optimized for document retrieval, content extraction, and analysis recognition. It is built on top of the Qwen2.5-VL architecture.
A Streamlit-powered chatbot for querying PDF documents using RAG architecture with citation-based answers.
Doc-VLMs-v2-Localization is a demo app for the Camel-Doc-OCR-062825 model, fine-tuned from Qwen2.5-VL-7B-Instruct for advanced document retrieval, extraction, and analysis. It enhances document understanding and also integrates other notable Hugging Face models.
An experimental document-focused Vision-Language Model application that provides advanced document analysis, text extraction, and multimodal understanding capabilities. This application features a streamlined Gradio interface for processing both images and videos using state-of-the-art vision-language models specialized in document understanding.
AICUP 2023 fact verification system using PERT-large and RoBERTa for document retrieval, evidence extraction, and claim validation. Ranked 4th on the private leaderboard (0.71007); built with Hugging Face Transformers and TensorFlow.
Exploring How Cognee's Knowledge Graphs Can Answer Questions About the Harry Potter Universe