Version 0.1.0 (unstable), released Apr 1, 2026
Datalab CLI
Convert, extract, and process documents from the command line
Installation | Quick Start | Usage | Documentation
A powerful command-line interface for the Datalab document processing API. Built in Rust for speed and reliability.
Features
- 📄 Document Conversion — Convert PDFs, images, and documents to Markdown, HTML, JSON, or semantic chunks
- 🔍 Structured Extraction — Extract data using JSON schemas with confidence scores
- 📝 Form Filling — Fill PDF forms programmatically with smart field matching
- ⚡ Smart Caching — Local file-based caching reduces API costs on repeated requests
- 🤖 Agent-Friendly — JSON output to stdout, progress events to stderr, designed for piping
- 📊 Progress Streaming — Real-time JSON progress events for monitoring long operations
Installation
From crates.io
cargo install datalab-cli
From source
git clone https://github.com/dipankar/datalab-cli
cd datalab-cli
cargo install --path .
Pre-built binaries
Download from GitHub Releases.
Quick Start
1. Get your API key from datalab.to/app/keys
2. Set the environment variable
export DATALAB_API_KEY="your-api-key"
3. Convert your first document
datalab convert document.pdf
That's it! The result is printed to stdout as JSON that contains the converted Markdown.
Usage
Convert Documents
# Convert to markdown (default)
datalab convert document.pdf
# Convert to HTML
datalab convert document.pdf --output-format html
# High-quality mode for complex documents
datalab convert report.pdf --mode accurate
# Convert specific pages
datalab convert book.pdf --page-range "0-10"
# Save to file
datalab convert document.pdf --output result.json
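When the CLI is driven from a script rather than typed by hand, the flags above can be assembled programmatically. The following is a minimal Python sketch. `build_convert_args` is a hypothetical helper, not part of Datalab CLI, and it emits only the flags shown in this section.

```python
from typing import List, Optional


def build_convert_args(
    path: str,
    output_format: Optional[str] = None,
    mode: Optional[str] = None,
    page_range: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `datalab convert` (hypothetical helper)."""
    args = ["datalab", "convert", path]
    # Only flags that appear in the examples above are emitted.
    if output_format is not None:
        args += ["--output-format", output_format]
    if mode is not None:
        args += ["--mode", mode]
    if page_range is not None:
        args += ["--page-range", page_range]
    if output is not None:
        args += ["--output", output]
    return args


if __name__ == "__main__":
    print(build_convert_args("book.pdf", output_format="html", page_range="0-10"))
```

Passing the resulting list to `subprocess.run` without `shell=True` avoids shell quoting problems with file names that contain spaces.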
Extract Structured Data
# Extract with inline schema
datalab extract invoice.pdf --schema '{
"fields": [
{"name": "total", "type": "number"},
{"name": "date", "type": "string"}
]
}'
# Extract with schema file
datalab extract invoice.pdf --schema schema.json
# Include confidence scores
datalab extract invoice.pdf --schema schema.json --include-scores
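For larger schemas, generating the JSON is less error-prone than quoting it by hand on the command line. The sketch below produces the same `{"fields": [...]}` shape as the inline example above. `make_schema` is a hypothetical helper, and the sketch assumes no schema features beyond the `name` and `type` keys shown here.

```python
import json
from typing import Dict, List


def make_schema(fields: Dict[str, str]) -> str:
    """Return a schema JSON string in the {"fields": [...]} shape shown above."""
    entries: List[Dict[str, str]] = [
        {"name": name, "type": type_name} for name, type_name in fields.items()
    ]
    return json.dumps({"fields": entries}, indent=2)


if __name__ == "__main__":
    # Redirect this output to schema.json, then pass it with --schema schema.json.
    print(make_schema({"total": "number", "date": "string"}))
```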
Fill Forms
# Fill a form
datalab fill application.pdf \
--fields '{"name": "John Doe", "email": "john.doe@example.com"}' \
--output filled.pdf
File Management
# Upload a file
datalab files upload document.pdf
# List files
datalab files list
# Download a file
datalab files download file_abc123 --output downloaded.pdf
Cache Management
# View cache stats
datalab cache stats
# Clear old entries
datalab cache clear --older-than 7
Output Format
All commands output JSON to stdout for easy piping:
# Pipe to jq
datalab convert document.pdf | jq '.content'
# Save to file
datalab convert document.pdf > result.json
Progress events stream to stderr as JSON:
{"type":"start","operation":"convert","file":"document.pdf"}
{"type":"poll","status":"processing","elapsed_secs":1.2}
{"type":"complete","elapsed_secs":3.4}
Use --quiet to suppress progress, --verbose to force it.
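Because progress arrives as one JSON object per line on stderr, a wrapper script can parse it separately from the result on stdout. The following is a minimal sketch, assuming the event fields shown in the sample above (`type`, `elapsed_secs`). `parse_progress` and `total_elapsed` are hypothetical helpers, and skipping non-JSON lines is a defensive assumption rather than documented behavior.

```python
import json
from typing import Any, Dict, Iterable, List, Optional


def parse_progress(stderr_lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON progress events, skipping blank or non-JSON lines."""
    events: List[Dict[str, Any]] = []
    for line in stderr_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate any non-JSON diagnostics on stderr
        if isinstance(event, dict):
            events.append(event)
    return events


def total_elapsed(events: List[Dict[str, Any]]) -> Optional[float]:
    """Return elapsed_secs from the last "complete" event, if there is one."""
    for event in reversed(events):
        if event.get("type") == "complete":
            return event.get("elapsed_secs")
    return None


if __name__ == "__main__":
    sample = [
        '{"type":"start","operation":"convert","file":"document.pdf"}',
        '{"type":"poll","status":"processing","elapsed_secs":1.2}',
        '{"type":"complete","elapsed_secs":3.4}',
    ]
    print(total_elapsed(parse_progress(sample)))
```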
Environment Variables
| Variable | Required | Description |
|---|---|---|
| DATALAB_API_KEY | Yes | Your API key |
| DATALAB_BASE_URL | No | Custom API endpoint (for on-prem) |
| NO_COLOR | No | Disable colored output |
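A wrapper script can check these variables before it invokes the CLI, so that a missing key fails early with a clear message. The sketch below is illustrative only. `Settings` and `load_settings` are hypothetical names, and treating any non-empty `NO_COLOR` value as "disable color" follows the common NO_COLOR convention, which is an assumption about this CLI.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Settings(NamedTuple):
    api_key: str
    base_url: Optional[str]  # None means "use the CLI's default endpoint"
    no_color: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables from the table above and validate the required one."""
    api_key = env.get("DATALAB_API_KEY", "")
    if not api_key:
        raise RuntimeError("DATALAB_API_KEY is required but not set")
    return Settings(
        api_key=api_key,
        base_url=env.get("DATALAB_BASE_URL") or None,
        # Assumption: any non-empty value disables color.
        no_color=bool(env.get("NO_COLOR")),
    )
```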
Caching
Results are cached locally in ~/.cache/datalab/ to reduce API costs:
# First run: calls API
datalab convert document.pdf
# Second run: instant from cache
datalab convert document.pdf
# Bypass cache
datalab convert document.pdf --skip-cache
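This README does not describe how cache entries are keyed. One plausible scheme for a file-based cache like this is to hash the file content together with the operation and its options, so that a changed file or a changed flag produces a different entry. The sketch below illustrates that idea only. `cache_key`, `cache_path`, and the one-JSON-file-per-key layout are hypothetical and may differ from what Datalab CLI actually does.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict


def cache_key(file_bytes: bytes, operation: str, options: Dict[str, Any]) -> str:
    """Derive a deterministic key from file content, operation, and options."""
    digest = hashlib.sha256()
    digest.update(operation.encode("utf-8"))
    digest.update(b"\0")
    # sort_keys makes the key independent of option ordering.
    digest.update(json.dumps(options, sort_keys=True).encode("utf-8"))
    digest.update(b"\0")
    digest.update(file_bytes)
    return digest.hexdigest()


def cache_path(key: str, root: Path = Path("~/.cache/datalab")) -> Path:
    """Hypothetical on-disk layout: one JSON file per key under the cache root."""
    return root.expanduser() / f"{key}.json"
```

Under this scheme, running the same command on an unchanged file maps to the same path, which is what makes the second run a cache hit.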
Documentation
Full documentation is available in the documentation directory. To view locally:
cd documentation
pip install -r requirements.txt
mkdocs serve
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
# Development setup
git clone https://github.com/dipankar/datalab-cli
cd datalab-cli
cargo build
# Run tests
cargo test
# Run lints
cargo clippy
cargo fmt --check
License
MIT License - see LICENSE for details.
Built with Rust | Powered by Datalab