
Production-ready Enterprise RAG Platform. Features Agentic Intent Routing (LangGraph), Strict Citation Governance, and Asynchronous Vector Ingestion. Built on a Microservices Architecture (FastAPI + React + PostgreSQL) and fully containerized with Docker.


Nibir1/documind-enterprise


DocuMind Enterprise

DocuMind Demo

📺 Watch the full end-to-end demo, featuring asynchronous PDF ingestion, LangGraph agentic routing, and validation by a test suite covering 100% of the critical path.

DocuMind is a production-grade, containerized RAG (Retrieval-Augmented Generation) knowledge management system. It is modeled on a secure Azure enterprise setup and features an agentic core that routes each user query either to general conversation or to strict document search.


DocuMind Enterprise is a reference architecture for building secure, compliant Retrieval-Augmented Generation (RAG) systems.

Why this exists

Most RAG demos fail in enterprise production because they lack governance and cost control. This project implements a strict "Citation-First" architecture designed for regulated industries (Legal, Finance, GDPR-compliant sectors). It enforces:

  1. Strict Source Attribution: No answer is generated without a verified PDF page reference (Zero Hallucination Policy).
  2. Agentic Routing: Uses LangGraph to distinguish "general chitchat" from "database queries," so conversational messages skip document retrieval, which reduces token costs and latency.
  3. Asynchronous Ingestion: Non-blocking FastAPI pipelines for high-throughput document processing.
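
The Strict Source Attribution rule can be illustrated with a small guard that refuses to return an answer unless at least one retrieved source carries a filename and page number. This is a minimal sketch under assumptions: the `Source` and `CitedAnswer` types, the `build_cited_answer` function, and the refusal text are hypothetical names for illustration, not code from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A retrieved chunk plus the metadata needed for attribution (hypothetical shape)."""
    filename: str
    page: int
    confidence: float
    text: str


@dataclass(frozen=True)
class CitedAnswer:
    answer: str
    citations: list[str]


# Hypothetical fixed refusal returned when no source can back the answer.
REFUSAL = "I could not find this in the indexed documents."


def build_cited_answer(draft: str, sources: list[Source]) -> CitedAnswer:
    """Return the draft only if it can be attributed to at least one valid source.

    A source is valid when it names a file and a positive page number.
    With no valid source, a fixed refusal replaces the unsupported draft.
    """
    valid = [s for s in sources if s.filename and s.page > 0]
    if not valid:
        return CitedAnswer(answer=REFUSAL, citations=[])
    citations = [f"{s.filename}, p. {s.page} (confidence {s.confidence:.2f})" for s in valid]
    return CitedAnswer(answer=draft, citations=citations)


if __name__ == "__main__":
    docs = [Source("policy.pdf", 4, 0.91, "Retention period is 7 years.")]
    print(build_cited_answer("Records are kept for 7 years.", docs).citations)
    print(build_cited_answer("Unsupported claim.", []).answer)
```

Refusing outright, instead of answering without a source, makes the attribution rule enforceable in code rather than relying only on prompt instructions.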

System Architecture

The application is built on a Microservices architecture using Docker Compose:

  1. Frontend (React + Vite): A modern "Glassmorphism" UI with streaming chat support and file ingestion status tracking.
  2. Backend (FastAPI): Asynchronous Python service handling file parsing, chunking, and AI orchestration.
  3. Database (PostgreSQL 16): Uses pgvector for high-performance vector similarity search (1536 dimensions) alongside relational metadata.
  4. AI Core (LangGraph): A state-machine agent that routes intent (Search vs. Chitchat) and enforces citation governance.
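
pgvector exposes cosine distance through the `<=>` operator, so a similarity query typically orders chunks by `embedding <=> query_embedding` and keeps the first k rows. The sketch below reproduces that ranking in plain Python to show what the database computes. It assumes cosine distance is the metric in use; the in-memory `chunks` list, the chunk ids, and the 3-dimensional vectors are illustrative stand-ins for the 1536-dimension embeddings stored in PostgreSQL.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance (1 - cosine similarity), the quantity pgvector's <=> returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return chunk ids by ascending cosine distance, like ORDER BY embedding <=> q LIMIT k."""
    ranked = sorted(chunks, key=lambda item: cosine_distance(query, item[1]))
    return [chunk_id for chunk_id, _ in ranked[:k]]


if __name__ == "__main__":
    chunks = [
        ("contract.pdf#p2", [0.9, 0.1, 0.0]),
        ("handbook.pdf#p7", [0.0, 1.0, 0.0]),
        ("contract.pdf#p5", [0.7, 0.3, 0.1]),
    ]
    print(top_k([1.0, 0.0, 0.0], chunks, k=2))  # ['contract.pdf#p2', 'contract.pdf#p5']
```

In the database, an approximate index (HNSW or IVFFlat in pgvector) replaces this full sort so the query does not scan every row.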

Key Features

  • Agentic Routing: Uses LangGraph to classify intent.
  • Strict Governance: No hallucinations; every answer includes citations.
  • Enterprise Ingestion: Asynchronous pipeline for PDF/TXT files.
  • Modern UX: Responsive React interface with real-time feedback.
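
The asynchronous ingestion pattern can be sketched with `asyncio`. The upload handler records a `pending` status and schedules processing as a background task, so it returns immediately while parsing and embedding continue. The status strings, the in-memory `STATUS` dict, and the `upload` and `process_document` functions are hypothetical; the real service presumably persists status in PostgreSQL and performs actual parsing and embedding calls.

```python
import asyncio

# Hypothetical in-memory status store; a real service would persist this.
STATUS: dict[str, str] = {}


async def process_document(doc_id: str, text: str) -> None:
    """Stand-in for parse -> chunk -> embed -> store."""
    STATUS[doc_id] = "processing"
    await asyncio.sleep(0.01)  # simulates awaiting I/O such as embedding API calls
    chunks = [text[i:i + 20] for i in range(0, len(text), 20)]
    STATUS[doc_id] = f"completed ({len(chunks)} chunks)"


def upload(doc_id: str, text: str, tasks: set[asyncio.Task]) -> str:
    """Accept a document and return at once; processing continues in the background."""
    STATUS[doc_id] = "pending"
    task = asyncio.create_task(process_document(doc_id, text))
    tasks.add(task)  # keep a strong reference so the task is not garbage-collected
    task.add_done_callback(tasks.discard)
    return STATUS[doc_id]  # still "pending": the task has not started running yet


async def main() -> None:
    tasks: set[asyncio.Task] = set()
    print(upload("doc-1", "x" * 50, tasks))  # pending
    await asyncio.gather(*tasks)
    print(STATUS["doc-1"])  # completed (3 chunks)


if __name__ == "__main__":
    asyncio.run(main())
```

A frontend can then poll the stored status to drive the real-time ingestion feedback described above.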

Tech Stack

  • Backend: Python 3.11, FastAPI 0.110, SQLAlchemy Async, Alembic
  • AI: LangChain, LangGraph, OpenAI, pgvector
  • Frontend: React 18, TypeScript, Tailwind
  • Infra: Docker Compose, Nginx

Getting Started

Prerequisites

  • Docker & Docker Compose
  • OpenAI API Key

Installation

git clone https://github.com/Nibir1/documind-enterprise.git
cd documind-enterprise
cp .env.example .env
# Set OPENAI_API_KEY in .env
make build

Testing & Validation

This project includes a comprehensive integration test suite covering 100% of the critical-path logic. The tests run inside the Docker container for environment consistency and use AsyncMock to simulate OpenAI and PostgreSQL, so they run quickly and incur no API costs.

To run the test suite:

make test
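
The mocking approach can be illustrated with the standard library's `unittest.mock.AsyncMock`: awaiting a mocked method returns a canned value, so code that normally calls an embedding API can be tested offline. The `embed_chunks` function and the `client.embed` method are hypothetical stand-ins, not the project's service code or the OpenAI SDK's interface. Under pytest, a coroutine test like this is usually collected through an async plugin; here it is run directly with `asyncio.run`.

```python
import asyncio
from unittest.mock import AsyncMock


async def embed_chunks(client, chunks: list[str]) -> list[list[float]]:
    """Hypothetical service function: embed each chunk through an async client."""
    return [await client.embed(chunk) for chunk in chunks]


async def test_embed_chunks_without_network() -> None:
    client = AsyncMock()  # attributes of an AsyncMock are awaitable AsyncMocks too
    client.embed.return_value = [0.1, 0.2, 0.3]  # canned embedding, no API call

    vectors = await embed_chunks(client, ["alpha", "beta"])

    assert vectors == [[0.1, 0.2, 0.3], [0.1, 0.2, 0.3]]
    assert client.embed.await_count == 2
    client.embed.assert_awaited_with("beta")  # checks the most recent await


if __name__ == "__main__":
    asyncio.run(test_embed_chunks_without_network())
    print("ok")
```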

Access

How It Works

Ingestion

  • PDF uploaded → Text extracted → 1000-token chunks
  • Embedded via text-embedding-3-small → Stored in Postgres

Retrieval (RAG)

  • Router Node → Retriever Node → Generator Node
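
The three-node flow can be sketched as a plain-Python state machine to show how routing short-circuits retrieval. This does not use the LangGraph API (where the branch would typically be a conditional edge out of the router node). The keyword heuristic in `route` is a hypothetical placeholder for the LLM-based intent classification the project describes, and `retrieve` and `generate` are stubs returning made-up text.

```python
from typing import Callable, TypedDict


class GraphState(TypedDict, total=False):
    question: str
    intent: str
    context: list[str]
    answer: str


# Hypothetical heuristic; the project classifies intent with an LLM instead.
CHITCHAT_PHRASES = ("hello", "hi", "thanks", "thank you", "how are you")


def route(state: GraphState) -> GraphState:
    """Router node: label the question as chitchat or search."""
    q = state["question"].lower().strip(" !?.")
    intent = "chitchat" if q in CHITCHAT_PHRASES else "search"
    return {**state, "intent": intent}


def retrieve(state: GraphState) -> GraphState:
    """Retriever node stub: would run the pgvector similarity query."""
    return {**state, "context": [f"[handbook.pdf p.3] passage about: {state['question']}"]}


def generate(state: GraphState) -> GraphState:
    """Generator node stub: answers from retrieved context when the intent is search."""
    if state["intent"] == "chitchat":
        return {**state, "answer": "Hello! Ask me about your documents."}
    return {**state, "answer": f"Based on {state['context'][0]}"}


def run_graph(question: str) -> GraphState:
    """Router -> Retriever (search only) -> Generator."""
    state: GraphState = route({"question": question})
    steps: list[Callable[[GraphState], GraphState]] = (
        [retrieve, generate] if state["intent"] == "search" else [generate]
    )
    for step in steps:
        state = step(state)
    return state


if __name__ == "__main__":
    print(run_graph("Hello!")["intent"])  # chitchat
    print(run_graph("What is the leave policy?")["intent"])  # search
```

Because chitchat never reaches `retrieve`, no embedding lookup or extra context tokens are spent on it.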

Features

Governance

  • Mandatory citations (Filename + Page + Confidence)
  • Zero hallucination policy
  • Azure Monitor-ready audit logs
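
An audit record for each answer can be emitted as one JSON object per line, a structured format that log pipelines can parse and query. This minimal sketch uses only the standard `logging` and `json` modules; the logger name, the field names, and the `audit_answer` helper are hypothetical, not the project's log schema.

```python
import json
import logging
import sys
from datetime import datetime, timezone

logger = logging.getLogger("documind.audit")  # hypothetical logger name
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(message)s"))  # emit the raw JSON line only
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def audit_answer(user_id: str, question: str, citations: list[dict]) -> str:
    """Build, log, and return one JSON audit line for an answered question."""
    record = {
        "event": "rag_answer",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "question": question,
        "citations": citations,  # each entry: filename, page, confidence
        "cited": bool(citations),
    }
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


if __name__ == "__main__":
    audit_answer("u-42", "What is the retention period?",
                 [{"filename": "policy.pdf", "page": 4, "confidence": 0.91}])
```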

Cost Optimization

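  • The router decides whether a query needs document search or can be handled as chitchat, so chitchat skips retrieval and the extra context tokens it adds.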

Ingestion Pipeline

  • Recursive chunking & real-time status
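
Recursive chunking splits on the coarsest separator first (paragraphs) and falls back to finer ones (lines, then words) only for pieces that are still too long; LangChain's `RecursiveCharacterTextSplitter` follows the same idea. The sketch below measures size in whitespace-separated words as a stand-in for tokens and adds no overlap between chunks. The separator order, `recursive_split`, and `max_words` are assumptions for illustration, not the project's settings.

```python
SEPARATORS = ("\n\n", "\n", " ")  # coarsest to finest


def word_count(text: str) -> int:
    return len(text.split())


def recursive_split(text: str, max_words: int,
                    separators: tuple[str, ...] = SEPARATORS) -> list[str]:
    """Split text into chunks of at most max_words words, preferring coarse boundaries."""
    if word_count(text) <= max_words:
        return [text.strip()] if text.strip() else []
    if not separators:  # fallback: hard split on words
        words = text.split()
        return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if word_count(candidate) <= max_words:
            current = candidate  # keep merging small pieces
            continue
        if current.strip():
            chunks.append(current.strip())
        if word_count(piece) > max_words:
            chunks.extend(recursive_split(piece, max_words, finer))  # recurse with finer separator
            current = ""
        else:
            current = piece
    if current.strip():
        chunks.append(current.strip())
    return chunks


if __name__ == "__main__":
    doc = "one two three\n\nfour five\nsix seven eight nine"
    print(recursive_split(doc, max_words=4))
    # ['one two three', 'four five', 'six seven eight nine']
```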

Project Structure

documind-enterprise/
├── backend/
│   ├── app/
│   │   ├── api/v1/
│   │   ├── core/
│   │   ├── models/
│   │   ├── schemas/
│   │   └── services/
├── frontend/
│   ├── src/
│   │   ├── api/
│   │   ├── features/
│   │   └── components/
├── docker-compose.yml
└── Makefile

Roadmap

  • Core RAG Architecture
  • Azure AD (Entra ID) SSO
  • Azure Container Apps deployment
  • RBAC for document sets

MIT License

Author: Nahasat Nibir, Senior Backend Engineer & AI Systems Architect
