Codestin Search App

Chunkr | Open Source Document Intelligence API

Production-ready API service for document layout analysis, OCR, and semantic chunking.
Convert PDFs, PPTs, Word docs & images into RAG/LLM-ready chunks.

Layout Analysis | OCR + Bounding Boxes | Structured HTML and markdown | VLM Processing controls

Try it out! · Report Bug · Contact · Discord

(Super) Quick Start

Go to chunkr.ai
Make an account and copy your API key
Install our Python SDK:

pip install chunkr-ai

Use the SDK to process your documents:

from chunkr_ai import Chunkr

# Initialize with your API key from chunkr.ai
chunkr = Chunkr(api_key="your_api_key")

# Upload a document (URL or local file path)
url = "https://chunkr-web.s3.us-east-1.amazonaws.com/landing_page/input/science.pdf"
task = chunkr.upload(url)

# Export results in various formats
html = task.html(output_file="output.html")
markdown = task.markdown(output_file="output.md")
content = task.content(output_file="output.txt")
task.json(output_file="output.json")

# Clean up
chunkr.close()

Documentation

Visit our docs for more information and examples.

Self-Hosted Deployment Options

Quick Start with Docker Compose

Prerequisites:
- Docker and Docker Compose
- NVIDIA Container Toolkit (for GPU support, optional)
Clone the repo:

git clone https://github.com/lumina-ai-inc/chunkr
cd chunkr

Set up environment variables:

# Copy the example environment file
cp .env.example .env

# Configure your environment variables
# Required: LLM__KEY as your OpenAI API key

For more information on how to set up LLMs, see here.

Start the services:

With GPU:

docker compose up -d

Access the services:

Web UI: http://localhost:5173
API: http://localhost:8000

Important:

Requires an NVIDIA CUDA GPU

CPU-only deployment via compose-cpu.yaml is currently in development and not recommended for use

To use a CPU version run docker compose -f compose-cpu.yaml up -d

Stop the services when done:

docker compose down

Deployment with Kubernetes

For production environments, we provide a Helm chart and detailed deployment instructions:

See our detailed guide at kube/README.md
Includes configurations for high availability and scaling

For enterprise support and deployment assistance, contact us.

LLM Configuration

You can use any OpenAI API compatible endpoint by setting the following variables in your .env file:

LLM__KEY:
LLM__MODEL:
LLM__URL:

OpenAI Configuration

LLM__KEY=your_openai_api_key
LLM__MODEL=gpt-4o
LLM__URL=https://api.openai.com/v1/chat/completions

Google AI Studio Configuration

For getting a Google AI Studio API key, see here.

LLM__KEY=your_google_ai_studio_api_key
LLM__MODEL=gemini-2.0-flash-lite
LLM__URL=https://generativelanguage.googleapis.com/v1beta/openai/chat/completions

OpenRouter Configuration

Check here for available models.

LLM__KEY=your_open_router_api_key
LLM__MODEL=google/gemini-pro-1.5
LLM__URL=https://openrouter.ai/api/v1/chat/completions

Self-Hosted Configuration

You can use any OpenAI API compatible endpoint. To host your own LLM you can use VLLM or Ollama.

LLM__KEY=your_api_key
LLM__MODEL=model_name
LLM__URL=http://localhost:8000/v1

Licensing

The core of this project is dual-licensed:

GNU Affero General Public License v3.0 (AGPL-3.0)
Commercial License

To use Chunkr without complying with the AGPL-3.0 license terms you can contact us or visit our website.

Connect With Us

📧 Email: [email protected]
📅 Schedule a call: Book a 30-minute meeting
🌐 Visit our website: chunkr.ai

Name		Name	Last commit message	Last commit date
Latest commit History 5,178 Commits
.github		.github
.vscode		.vscode
apps/web		apps/web
clients		clients
core		core
docker		docker
images		images
kube		kube
packages		packages
services		services
.codespellrc		.codespellrc
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
.npmrc		.npmrc
.release-please-config.json		.release-please-config.json
.release-please-manifest.json		.release-please-manifest.json
CHANGELOG.md		CHANGELOG.md
COMMERCIAL_LICENSE.md		COMMERCIAL_LICENSE.md
LICENSE		LICENSE
README.md		README.md
THIRD-PARTY-NOTICES.md		THIRD-PARTY-NOTICES.md
build_dockers.sh		build_dockers.sh
compose-cpu.yaml		compose-cpu.yaml
compose.yaml		compose.yaml
git.sh		git.sh
nginx-segmentation.conf		nginx-segmentation.conf
realm-export.json		realm-export.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Chunkr | Open Source Document Intelligence API

Table of Contents

(Super) Quick Start

Documentation

Self-Hosted Deployment Options

Quick Start with Docker Compose

Deployment with Kubernetes

LLM Configuration

OpenAI Configuration

Google AI Studio Configuration

OpenRouter Configuration

Self-Hosted Configuration

Licensing

Connect With Us

About

Uh oh!

Releases

Packages

Languages

License

securiumsss/chunkr

Folders and files

Latest commit

History

Repository files navigation

Chunkr | Open Source Document Intelligence API

Table of Contents

(Super) Quick Start

Documentation

Self-Hosted Deployment Options

Quick Start with Docker Compose

Deployment with Kubernetes

LLM Configuration

OpenAI Configuration

Google AI Studio Configuration

OpenRouter Configuration

Self-Hosted Configuration

Licensing

Connect With Us

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages