Transform scanned and written documents into fully searchable, selectable PDFs using the power of Local LLM Vision.
PDF LLM OCR is a next-generation OCR tool that moves beyond traditional Tesseract-based scanning. By leveraging Vision Language Models (VLMs) like olmOCR running locally on your machine, it "reads" documents with human-like understanding while keeping 100% of your data private.
- 🧠 AI-Powered Vision: Uses advanced VLMs to transcribe text with high accuracy, even on complex layouts or noisy scans.
- 🤖 Hybrid Alignment Strategy: Combines Surya OCR detection for precise bounding boxes with a local LLM for accurate text content via position-based alignment.
- ⚡ 10-21x Faster Detection: Uses detection-only mode (skipping slow recognition) and batch processing for maximum speed.
- 🔒 100% Local & Private: No cloud APIs, no subscription fees. Runs entirely offline using LM Studio.
- 🔍 Searchable Outputs: Embeds an invisible text layer directly into your PDF, making it searchable (Ctrl+F) and selectable in any standard PDF reader.
- 🖥️ Dual Interfaces:
  - Web UI: An interface with drag & drop, dark mode, and real-time progress tracking.
  - CLI: A robust command-line tool for power users and batch automation, featuring a "lively" terminal UI.
- ⚡ Real-time Feedback: Watch your document process page by page with live WebSockets or animated terminal bars.
```mermaid
graph TD
    A[Input PDF] --> B[PDF to Image Conversion]
    B --> C[Batch Processing]
    subgraph "Phase 1: Layout Detection (Surya)"
        C --> D[Surya DetectionPredictor]
        D --> E[Bounding Boxes]
        E --> F[Sorted by Reading Order]
    end
    subgraph "Phase 2: Text Extraction (Local LLM)"
        C --> G[OlmOCR Vision Model]
        G --> H[Pure Text Content]
    end
    F --> I[Position-Based Aligner]
    H --> I
    I -->|Distribute by Box Width| J[Aligned Text Blocks]
    J --> K[Sandwich PDF Generator]
    K --> L[Searchable PDF Output]
```
- Batch Layout Detection: Surya's `DetectionPredictor` processes all pages at once, extracting bounding boxes without the slow text-recognition step (~1s total vs. ~20s per page with recognition).
- LLM Text Extraction: A local vision model (OlmOCR) reads each page with human-like understanding, handling handwriting and complex layouts.
- Position-Based Alignment: The aligner distributes the LLM text across detected boxes proportionally by box width in reading order; no fuzzy matching is needed (a sketch follows this list).
- Sandwich PDF: The original page is rendered as an image with invisible, searchable text overlaid using PyMuPDF.
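To make the alignment step concrete, here is a minimal Python sketch of the idea. The function name `align_text_to_boxes` is hypothetical and the real logic lives in `src/pdf_ocr/core/aligner.py`, which may differ; this only illustrates distributing words across boxes by relative width.

```python
def align_text_to_boxes(text: str, boxes: list[tuple[float, float, float, float]]) -> list[str]:
    """Distribute LLM-extracted text across detected boxes by relative width.

    `boxes` are (x0, y0, x1, y1) rectangles already sorted in reading order.
    Illustrative sketch only, not the project's actual aligner.
    """
    words = text.split()
    if not boxes or not words:
        return [""] * len(boxes)

    widths = [x1 - x0 for x0, _, x1, _ in boxes]
    total_width = sum(widths) or 1.0

    chunks, cursor = [], 0
    for i, w in enumerate(widths):
        if i == len(widths) - 1:          # last box takes the remainder
            take = len(words) - cursor
        else:                             # otherwise, a proportional share
            take = round(len(words) * w / total_width)
        chunks.append(" ".join(words[cursor:cursor + take]))
        cursor += take
    return chunks


# Example: two boxes, the first twice as wide as the second
print(align_text_to_boxes("one two three four five six", [(0, 0, 200, 20), (0, 30, 100, 50)]))
# -> ['one two three four', 'five six']
```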
- Python 3.10+
- LM Studio: Download and install LM Studio.
- Load a Vision Model (highly recommended: `allenai/olmocr-2-7b`).
- Start the Local Server at the default port `1234`.
Create a `.env` file in the root directory to configure your local LLM:

```env
LLM_API_BASE=http://localhost:1234/v1
LLM_MODEL=allenai/olmocr-2-7b
```
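These settings can be read in Python with python-dotenv. A small sketch, assuming the project picks them up this way (check the source for the actual mechanism):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current directory

api_base = os.getenv("LLM_API_BASE", "http://localhost:1234/v1")
model = os.getenv("LLM_MODEL", "allenai/olmocr-2-7b")
print(f"Using {model} at {api_base}")
```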
This project is managed with uv for lightning-fast dependency management.

- Install `uv` (if not installed):

  ```bash
  pip install uv
  ```

- Clone the repository:

  ```bash
  git clone https://github.com/ahnafnafee/pdf-ocr-llm.git
  cd pdf-ocr-llm
  ```

- Sync dependencies:

  ```bash
  uv sync
  ```
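With LM Studio running and the `.env` in place, you can sanity-check that the vision model responds before running OCR. This is an optional, illustrative snippet using the OpenAI-compatible endpoint (the image path is a stand-in for any test image; LM Studio ignores the API key value):

```python
import base64

from openai import OpenAI

# LM Studio exposes an OpenAI-compatible server; the key can be any string.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

with open("sample.png", "rb") as f:  # any test image you have on hand
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="allenai/olmocr-2-7b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe all text in this image."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```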
The web UI is the easiest way to use the tool. It features a modern dashboard with dark mode and text preview.
- Start the server:

  ```bash
  uv run uvicorn server:app --reload --port 8000
  ```

- Open your browser to `http://localhost:8000`.
- Drag & drop your PDF.
- Watch the magic happen! ✨
- Real-time Progress: Track per-page OCR status.
- Preview: Click "View Text" to inspect the raw AI extraction.
- Dark Mode: Toggle the moon icon for a sleek dark theme.
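The per-page progress updates are the kind of thing FastAPI's WebSocket support handles naturally. A minimal illustrative sketch follows; the endpoint name `/ws/progress` and the payload shape are assumptions, not the project's actual API:

```python
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/progress")  # hypothetical endpoint name
async def progress(websocket: WebSocket):
    await websocket.accept()
    total_pages = 5  # stand-in for the real page count
    for page in range(1, total_pages + 1):
        # In a real server, this would fire after each page finishes OCR.
        await websocket.send_json({"page": page, "total": total_pages, "status": "done"})
    await websocket.close()
```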
The CLI is perfect for developers and for batch automation in scripts.

Run the OCR tool on any PDF:

```bash
uv run main.py input.pdf output_ocr.pdf
```

Options:
| Option | Description |
|---|---|
| `input_pdf` | Path to input PDF (required) |
| `output_pdf` | Path to output PDF (optional, defaults to `<input>_ocr.pdf`) |
| `-v, --verbose` | Enable debug logging (alignment details, box counts) |
| `-q, --quiet` | Suppress all output except errors |
| `--dpi <int>` | DPI for image rendering (default: 200) |
| `--pages <range>` | Page range to process, e.g., `1-3,5` (default: all) |
| `--api-base <url>` | Override LLM API base URL |
| `--model <name>` | Override LLM model name |
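For reference, a range expression like `1-3,5` expands as shown in this illustrative parser (not the project's actual implementation):

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page-range spec like '1-3,5' into [1, 2, 3, 5]."""
    pages: list[int] = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            pages.extend(range(int(start), int(end) + 1))
        else:
            pages.append(int(part))
    return pages

print(parse_pages("1-3,5"))  # -> [1, 2, 3, 5]
```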
Examples:
```bash
# Basic usage (auto-generates input_ocr.pdf)
uv run main.py scan.pdf

# Process specific pages with higher quality
uv run main.py document.pdf output.pdf --pages 1-5 --dpi 300

# Use a different model with verbose output
uv run main.py report.pdf --model "custom-model" --verbose
```

You'll see beautiful animated progress bars showing batch detection and per-page LLM processing.
```
local-llm-pdf-ocr/
├── src/pdf_ocr/            # Core package
│   ├── core/               # OCR processing modules
│   │   ├── aligner.py      # Hybrid text alignment
│   │   ├── ocr.py          # LLM OCR processor
│   │   └── pdf.py          # PDF handling utilities
│   └── utils/              # Utility modules
│       └── tqdm_patch.py   # Progress bar silencer
├── scripts/                # Debug and visualization tools
├── static/                 # Web UI assets
├── examples/               # Sample PDFs
├── main.py                 # CLI entry point
└── server.py               # Web server
```
- Backend: FastAPI (Async Web Framework)
- Frontend: Vanilla JS + CSS Variables
- PDF Processing: PyMuPDF (Fitz)
- Layout Detection: Surya OCR (Detection-only mode)
- AI Integration: OpenAI Client (compatible with Local LLM servers)
- CLI UI: Rich (Terminal formatting)
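Of these, PyMuPDF is what makes the sandwich PDF possible: its `render_mode=3` draws text that is searchable and selectable but never painted. A minimal sketch of the idea, with made-up file names and coordinates:

```python
import fitz  # PyMuPDF

doc = fitz.open()
page = doc.new_page(width=612, height=792)  # US Letter, in points

# 1. The scanned page image is the visible layer.
page.insert_image(page.rect, filename="page_scan.png")  # hypothetical file

# 2. Invisible text is overlaid at the aligned box positions.
page.insert_text(
    fitz.Point(72, 100),          # where a detected box starts (example values)
    "Text recovered by the LLM",  # the aligned text for that box
    fontsize=11,
    render_mode=3,                # 3 = invisible: searchable, not painted
)

doc.save("sandwich.pdf")
```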
| Document Type | Detection Time | Speedup vs Recognition |
|---|---|---|
| Digital PDF | ~1s | 21x faster |
| Handwritten | ~1s | 10x faster |
| Hybrid Form | ~1s | 11x faster |
Detection uses batch processing: all pages are handled in one call.
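In code, that batched call looks roughly like the sketch below. It assumes the surya-ocr package's `DetectionPredictor` (named in the pipeline above) is a callable that takes a list of PIL images and returns per-page results with `bboxes`; the exact interface may vary by version.

```python
import fitz  # PyMuPDF
from PIL import Image
from surya.detection import DetectionPredictor  # assumed import path

# Render every page once, then detect layout for all of them in one batch.
doc = fitz.open("input.pdf")
images = []
for page in doc:
    pix = page.get_pixmap(dpi=200)
    images.append(Image.frombytes("RGB", (pix.width, pix.height), pix.samples))

predictor = DetectionPredictor()
results = predictor(images)  # one batched call; no slow recognition step

for page_num, result in enumerate(results, start=1):
    print(f"page {page_num}: {len(result.bboxes)} text boxes")
```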
Contributions are welcome! Please feel free to submit a Pull Request.
License: MIT