Kreuzberg

Document intelligence with a Rust core and native bindings for 16 languages. Extract text, tables, and metadata from 90+ formats with optional OCR, usable as an SDK, CLI, REST API, MCP server, or Docker image.


Why Kreuzberg

  • High Performance

Rust core with native PDFium, SIMD optimizations, and full parallelism. Process thousands of documents per minute without a GPU.

  • 90+ File Formats

PDF, DOCX, XLSX, PPTX, images, HTML, XML, emails, archives, academic formats — one API handles them all.

  • Multi-Engine OCR

Tesseract and PaddleOCR work across all language bindings. EasyOCR is available for Python only.

  • 16 Language Bindings

Native bindings for Python, TypeScript, Rust, Go, Java, Kotlin, C#, Ruby, PHP, Elixir, R, Dart, Swift, Zig, C, and WebAssembly.

  • Code Intelligence

Extract functions, classes, imports, symbols, and docstrings from 300+ programming languages. Results are returned in the code_intelligence field, with semantic chunking.

  • Plugin System

Register custom extractors, OCR backends, post-processors, and validators. Plugin authoring is primarily supported in Python; all bindings can consume registered plugins.

  • Flexible Deployment

Use as a library, CLI tool, REST API server, MCP server, or Docker container. Pick what fits your stack.

See all features


Language Support

Language             Package                                       Docs
Python               pip install kreuzberg                         API Reference
TypeScript (Native)  npm install @kreuzberg/node                   API Reference
TypeScript (WASM)    npm install @kreuzberg/wasm                   API Reference
Rust                 cargo add kreuzberg                           API Reference
Go                   go get github.com/kreuzberg-dev/kreuzberg/v5  API Reference
Java                 Maven Central dev.kreuzberg:kreuzberg         API Reference
Kotlin               Maven Central dev.kreuzberg:kreuzberg-kotlin  API Reference
C#                   dotnet add package Kreuzberg                  API Reference
Ruby                 gem install kreuzberg                         API Reference
PHP                  composer require kreuzberg/kreuzberg          API Reference
Elixir               {:kreuzberg, "~> 5.0.0-rc.1"}                 API Reference
R                    r-universe kreuzberg                          API Reference
Dart / Flutter       dart pub add kreuzberg                        API Reference
Swift                Swift Package Manager                         API Reference
Zig                  zig fetch --save from GitHub                  API Reference
C (FFI)              Shared library + header                       API Reference
CLI                  brew install kreuzberg-dev/tap/kreuzberg      CLI Guide
Docker               ghcr.io/kreuzberg-dev/kreuzberg               Docker Guide

Choosing Between TypeScript Packages

@kreuzberg/node: Use for Node.js servers and CLI tools. Runs at native speed (the 100% baseline).

@kreuzberg/wasm: Use for browsers, Cloudflare Workers, Deno, Bun, and serverless environments. Cross-platform, at roughly 60-80% of native speed.
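
The choice between the two packages can be sketched as a small decision function. This is only an illustration of the guidance above, not part of Kreuzberg: the names RuntimeInfo, detectRuntime, and choosePackage are hypothetical, and the sketch calls no Kreuzberg API. It assumes that a plain Node.js process can be recognized by process.versions.node, and that Bun and Deno expose Bun and Deno globals.

```typescript
// Hypothetical helper: maps a runtime description to the package name
// recommended in the text above. It does not import or call Kreuzberg.
type KreuzbergPackage = "@kreuzberg/node" | "@kreuzberg/wasm";

interface RuntimeInfo {
  hasNodeVersion: boolean; // process.versions.node is present
  isBun: boolean;          // a global `Bun` object exists
  isDeno: boolean;         // a global `Deno` object exists
}

// Inspect globals without assuming any of them exist (safe in browsers).
function detectRuntime(): RuntimeInfo {
  const g = globalThis as any;
  return {
    hasNodeVersion: typeof g.process?.versions?.node === "string",
    isBun: typeof g.Bun !== "undefined",
    isDeno: typeof g.Deno !== "undefined",
  };
}

// Bun and Deno can also report a Node version for compatibility,
// so they are ruled out before the native package is selected.
function choosePackage(rt: RuntimeInfo): KreuzbergPackage {
  if (rt.hasNodeVersion && !rt.isBun && !rt.isDeno) {
    return "@kreuzberg/node";
  }
  // Browsers, Cloudflare Workers, Deno, Bun, and other serverless runtimes.
  return "@kreuzberg/wasm";
}

const recommended: KreuzbergPackage = choosePackage(detectRuntime());
console.log(`Recommended package: ${recommended}`);
```

In practice the package is usually chosen once at install time rather than detected at run time; the function only makes the rule explicit.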


Explore the Docs

  • Getting Started

Install Kreuzberg and extract your first document in minutes.

Quick Start

  • Guides

Configuration, OCR setup, Docker deployment, plugins, and more.

All Guides

  • Concepts

Architecture, extraction pipeline, MIME detection, and performance.

Architecture

  • API Reference

Complete API docs for every language binding, types, and errors.

References

  • CLI & Servers

Command-line tool, REST API server, and MCP server for AI agents.

CLI Usage

  • Migration

Migrate from Unstructured or other document extraction libraries.

Migration Guides


Getting Help
