Kreuzberg
Document intelligence with a Rust core and native bindings for 16 languages. Extract text, tables, and metadata from 90+ formats with optional OCR. Kreuzberg is usable as an SDK, CLI, REST API, MCP server, or Docker image.
Why Kreuzberg
- High Performance
Rust core with native PDFium, SIMD optimizations, and full parallelism. Process thousands of documents per minute without a GPU.
- 90+ File Formats
PDF, DOCX, XLSX, PPTX, images, HTML, XML, emails, archives, academic formats — one API handles them all.
- Multi-Engine OCR
Tesseract and PaddleOCR work across all language bindings. EasyOCR is available for Python only.
- 16 Language Bindings
Native bindings for Python, TypeScript, Rust, Go, Java, Kotlin, C#, Ruby, PHP, Elixir, R, Dart, Swift, Zig, C, and WebAssembly.
- Code Intelligence
Extract functions, classes, imports, symbols, and docstrings from 300+ programming languages. Results are returned in the code_intelligence field, with semantic chunking applied.
- Plugin System
Register custom extractors, OCR backends, post-processors, and validators. Plugin authoring is primarily supported in Python; all bindings can consume registered plugins.
- Flexible Deployment
Use as a library, CLI tool, REST API server, MCP server, or Docker container. Pick what fits your stack.
Language Support
| Language | Package | Docs |
|---|---|---|
| Python | pip install kreuzberg | API Reference |
| TypeScript (Native) | npm install @kreuzberg/node | API Reference |
| TypeScript (WASM) | npm install @kreuzberg/wasm | API Reference |
| Rust | cargo add kreuzberg | API Reference |
| Go | go get github.com/kreuzberg-dev/kreuzberg/v5 | API Reference |
| Java | Maven Central dev.kreuzberg:kreuzberg | API Reference |
| Kotlin | Maven Central dev.kreuzberg:kreuzberg-kotlin | API Reference |
| C# | dotnet add package Kreuzberg | API Reference |
| Ruby | gem install kreuzberg | API Reference |
| PHP | composer require kreuzberg/kreuzberg | API Reference |
| Elixir | {:kreuzberg, "~> 5.0.0-rc.1"} | API Reference |
| R | r-universe kreuzberg | API Reference |
| Dart / Flutter | dart pub add kreuzberg | API Reference |
| Swift | Swift Package Manager | API Reference |
| Zig | zig fetch --save from GitHub | API Reference |
| C (FFI) | Shared library + header | API Reference |
| CLI | brew install kreuzberg-dev/tap/kreuzberg | CLI Guide |
| Docker | ghcr.io/kreuzberg-dev/kreuzberg | Docker Guide |
Choosing Between TypeScript Packages
@kreuzberg/node: Use for Node.js servers and CLI tools. Runs at native speed (the 100% baseline).
@kreuzberg/wasm: Use for browsers, Cloudflare Workers, Deno, Bun, and serverless environments. Cross-platform, at roughly 60-80% of native speed.
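The guidance above can be written down as a small decision helper. This is only an illustrative sketch: `choosePackage`, `estimateWasmThroughput`, and the `Runtime` type are hypothetical names, not part of any Kreuzberg package, and the throughput estimate assumes the quoted 60-80% figure is measured relative to the native binding.

```typescript
// Hypothetical helper: encodes the package guidance only, calls no Kreuzberg API.
type Runtime =
  | "node"
  | "browser"
  | "cloudflare-workers"
  | "deno"
  | "bun"
  | "serverless";

type KreuzbergPackage = "@kreuzberg/node" | "@kreuzberg/wasm";

function choosePackage(runtime: Runtime): KreuzbergPackage {
  // Node.js servers and CLI tools use the native binding;
  // every other listed runtime uses the portable WASM build.
  return runtime === "node" ? "@kreuzberg/node" : "@kreuzberg/wasm";
}

// Assumption: WASM runs at 60-80% of native throughput, as quoted above.
// Integer percentages keep the arithmetic free of floating-point noise.
function estimateWasmThroughput(nativeDocsPerMinute: number): {
  low: number;
  high: number;
} {
  return {
    low: (nativeDocsPerMinute * 60) / 100,
    high: (nativeDocsPerMinute * 80) / 100,
  };
}

console.log(choosePackage("node"));
console.log(choosePackage("cloudflare-workers"));
console.log(estimateWasmThroughput(1000));
```

If a workload measured 1,000 documents per minute with the native binding, this estimate would put the WASM build somewhere between 600 and 800 documents per minute. Actual numbers depend on the documents and the host runtime.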
Explore the Docs
- Getting Started
Install Kreuzberg and extract your first document in minutes.
- Guides
Configuration, OCR setup, Docker deployment, plugins, and more.
- Concepts
Architecture, extraction pipeline, MIME detection, and performance.
- API Reference
Complete API docs for every language binding, types, and errors.
- CLI & Servers
Command-line tool, REST API server, and MCP server for AI agents.
- Migration
Migrate from Unstructured or other document extraction libraries.
Getting Help
- Bugs & feature requests — Open an issue on GitHub
- Community chat — Join the Discord
- Contributing — Read the contributor guide