
# 📄 Local LLM PDF OCR


Transform scanned and handwritten documents into fully searchable, selectable PDFs using the power of local LLM vision.

Local LLM PDF OCR is a next-generation OCR tool that moves beyond traditional Tesseract-based scanning. By leveraging Vision Language Models (VLMs) like olmOCR running locally on your machine, it "reads" documents with human-like understanding while keeping 100% of your data private.


## ✨ Features

- 🧠 AI-Powered Vision: Uses advanced VLMs to transcribe text with high accuracy, even on complex layouts or noisy scans.
- 🤝 Hybrid Alignment Strategy: Combines Surya OCR detection for precise bounding boxes with a local LLM for accurate text content via position-based alignment.
- ⚡ 10-21x Faster Detection: Uses detection-only mode (skipping slow recognition) and batch processing for maximum speed.
- 🔒 100% Local & Private: No cloud APIs, no subscription fees. Runs entirely offline using LM Studio.
- 🔍 Searchable Outputs: Embeds an invisible text layer directly into your PDF, making it searchable (Ctrl+F) and selectable in standard PDF readers.
- 🖥️ Dual Interfaces:
  - Web UI: A modern interface with drag & drop, dark mode, and real-time progress tracking.
  - CLI: A robust command-line tool for power users and batch automation, featuring a lively terminal UI.
- ⚡ Real-time Feedback: Watch your document process page-by-page over live WebSockets or animated terminal bars.

πŸ—οΈ Architecture

```mermaid
graph TD
    A[Input PDF] --> B[PDF to Image Conversion]
    B --> C[Batch Processing]

    subgraph "Phase 1: Layout Detection (Surya)"
        C --> D[Surya DetectionPredictor]
        D --> E[Bounding Boxes]
        E --> F[Sorted by Reading Order]
    end

    subgraph "Phase 2: Text Extraction (Local LLM)"
        C --> G[OlmOCR Vision Model]
        G --> H[Pure Text Content]
    end

    F --> I[Position-Based Aligner]
    H --> I

    I -->|Distribute by Box Width| J[Aligned Text Blocks]
    J --> K[Sandwich PDF Generator]
    K --> L[Searchable PDF Output]
```

### How It Works

1. Batch Layout Detection: Surya's DetectionPredictor processes all pages at once, extracting bounding boxes without slow text recognition (~1s total vs ~20s per page with recognition).
2. LLM Text Extraction: A local vision model (olmOCR) reads each page with human-like understanding, handling handwriting and complex layouts.
3. Position-Based Alignment: The aligner distributes LLM text across detected boxes proportionally by box width in reading order; no fuzzy matching needed.
4. Sandwich PDF: The original page is rendered as an image with invisible, searchable text overlaid using PyMuPDF.
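The position-based alignment in step 3 can be sketched in a few lines of Python. This is a simplified illustration of the distribute-by-width idea, not the project's actual aligner (which lives in `src/pdf_ocr/core/aligner.py`):

```python
def align_text_to_boxes(words: list[str], box_widths: list[float]) -> list[str]:
    """Distribute words across boxes proportionally to each box's width.

    Boxes are assumed to be pre-sorted in reading order; `words` is the
    LLM transcription split on whitespace. (Illustrative sketch only.)
    """
    total_width = sum(box_widths)
    blocks, cursor = [], 0
    for i, width in enumerate(box_widths):
        if i == len(box_widths) - 1:
            take = len(words) - cursor  # last box absorbs the remainder
        else:
            take = round(len(words) * width / total_width)
        blocks.append(" ".join(words[cursor:cursor + take]))
        cursor += take
    return blocks

# Two boxes, the first twice as wide: it receives ~2/3 of the words.
print(align_text_to_boxes("a b c d e f".split(), [200.0, 100.0]))
# → ['a b c d', 'e f']
```

Because the split is purely proportional, no fuzzy string matching between the detection pass and the LLM output is required.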


## 🚀 Getting Started

### Prerequisites

1. Python 3.10+
2. LM Studio: Download and install LM Studio.
   - Load a vision model (highly recommended: allenai/olmocr-2-7b).
   - Start the local server on the default port (1234).

### Configuration

Create a .env file in the root directory to configure your Local LLM:

```env
LLM_API_BASE=http://localhost:1234/v1
LLM_MODEL=allenai/olmocr-2-7b
```
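Roughly, these two settings tell an OpenAI-compatible client where to connect and which model to request. The helper below is a hypothetical sketch of that pattern (not the project's actual code), falling back to the documented defaults:

```python
import os

# Hypothetical helper (not from the project): read the LLM settings,
# falling back to the documented defaults (LM Studio on port 1234).
def load_llm_config() -> dict:
    return {
        "api_base": os.environ.get("LLM_API_BASE", "http://localhost:1234/v1"),
        "model": os.environ.get("LLM_MODEL", "allenai/olmocr-2-7b"),
    }

# Any OpenAI-compatible client can then be constructed from these values,
# e.g. base_url=config["api_base"] with model=config["model"].
config = load_llm_config()
```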

### Installation

This project is managed with uv for lightning-fast dependency management.

1. Install uv (if not installed):

   ```sh
   pip install uv
   ```

2. Clone the repository:

   ```sh
   git clone https://github.com/ahnafnafee/pdf-ocr-llm.git
   cd pdf-ocr-llm
   ```

3. Sync dependencies:

   ```sh
   uv sync
   ```

### Usage

#### 1. 🌐 Web Interface (Recommended)

The easiest way to use the tool. Features a modern dashboard with Dark Mode and Text Preview.

1. Start the server:

   ```sh
   uv run uvicorn server:app --reload --port 8000
   ```

2. Open your browser to http://localhost:8000.
3. Drag & drop your PDF.
4. Watch the magic happen! ✨
   - Real-time progress: Track per-page OCR status.
   - Preview: Click "View Text" to inspect the raw AI extraction.
   - Dark mode: Toggle the moon icon for a sleek dark theme.
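Under the hood, per-page progress like this typically travels as small JSON messages over the WebSocket. The message shape below is invented purely for illustration; the real schema in `server.py` may differ:

```python
import json

# Hypothetical progress payload; the field names are illustrative only,
# not the actual schema used by server.py.
def progress_message(page: int, total: int, status: str) -> str:
    """Serialize a per-page OCR status update for a WebSocket stream."""
    return json.dumps({
        "page": page,
        "total": total,
        "status": status,  # e.g. "detecting", "ocr", "done"
        "percent": round(100 * page / total),
    })

print(progress_message(3, 10, "ocr"))
# → {"page": 3, "total": 10, "status": "ocr", "percent": 30}
```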

#### 2. 💻 Command Line Interface (CLI)

Perfect for developers or integrating into scripts.

Run the OCR tool on any PDF:

```sh
uv run main.py input.pdf output_ocr.pdf
```

Options:

| Option | Description |
| --- | --- |
| `input_pdf` | Path to input PDF (required) |
| `output_pdf` | Path to output PDF (optional; defaults to `<input>_ocr.pdf`) |
| `-v, --verbose` | Enable debug logging (alignment details, box counts) |
| `-q, --quiet` | Suppress all output except errors |
| `--dpi <int>` | DPI for image rendering (default: 200) |
| `--pages <range>` | Page range to process, e.g. `1-3,5` (default: all) |
| `--api-base <url>` | Override LLM API base URL |
| `--model <name>` | Override LLM model name |
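For reference, the `--pages` syntax (`1-3,5`) is a comma-separated list of single pages and inclusive ranges. A minimal parser for that format, written here as a hypothetical helper rather than the project's actual implementation:

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page-range spec like "1-3,5" into [1, 2, 3, 5].

    Hypothetical sketch: ranges are inclusive, parts are comma-separated.
    """
    pages: list[int] = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            pages.extend(range(int(start), int(end) + 1))
        else:
            pages.append(int(part))
    return pages

print(parse_pages("1-3,5"))  # → [1, 2, 3, 5]
```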

Examples:

```sh
# Basic usage (auto-generates input_ocr.pdf)
uv run main.py scan.pdf

# Process specific pages with higher quality
uv run main.py document.pdf output.pdf --pages 1-5 --dpi 300

# Use a different model with verbose output
uv run main.py report.pdf --model "custom-model" --verbose
```

You'll see beautiful animated progress bars showing batch detection and per-page LLM processing.


## 📁 Project Structure

```
local-llm-pdf-ocr/
├── src/pdf_ocr/           # Core package
│   ├── core/              # OCR processing modules
│   │   ├── aligner.py     # Hybrid text alignment
│   │   ├── ocr.py         # LLM OCR processor
│   │   └── pdf.py         # PDF handling utilities
│   └── utils/             # Utility modules
│       └── tqdm_patch.py  # Progress bar silencer
├── scripts/               # Debug and visualization tools
├── static/                # Web UI assets
├── examples/              # Sample PDFs
├── main.py                # CLI entry point
└── server.py              # Web server
```

## 🛠️ Tech Stack

- Backend: FastAPI (async web framework)
- Frontend: Vanilla JS + CSS variables
- PDF Processing: PyMuPDF (fitz)
- Layout Detection: Surya OCR (detection-only mode)
- AI Integration: OpenAI client (compatible with local LLM servers)
- CLI UI: Rich (terminal formatting)

## ⚡ Performance

| Document Type | Detection Time | Speedup vs Recognition |
| --- | --- | --- |
| Digital PDF | ~1s | 21x faster |
| Handwritten | ~1s | 10x faster |
| Hybrid Form | ~1s | 11x faster |

Detection uses batch processing: all pages in one call.


## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License: MIT
