Version 0.1.0, released Feb 9, 2026 (1 unstable release). Uses the Rust 2024 edition.
#612 in Text processing. 355KB, 7.5K SLoC.
PDF-CLI
A Rust library and CLI tool for reading, writing, and manipulating PDF files. Converts to/from Markdown. Implemented entirely in Rust without external PDF libraries.
Features
Library API
- In-memory PDF generation: `generate_pdf_bytes()` (no filesystem needed)
- PDF validation: `validate_pdf()` / `validate_pdf_bytes()` (structural integrity checks)
- Rich element model: 17 `Element` variants for document modeling
- Accessibility: `StructureType` enum (35 types), `StructureElement` tree, `AccessibilityOptions`
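To give a feel for what "structural integrity checks" can involve, here is a minimal standalone sketch. It is not the library's implementation: `BasicCheck` and `basic_check` are hypothetical names, and the checks are a simplified heuristic (header marker, `startxref` and `%%EOF` near the end, and matching `obj`/`endobj` counts).

```rust
/// Outcome of the simplified checks below.
/// Hypothetical type for illustration, not the library's result struct.
#[derive(Debug, Default, PartialEq)]
pub struct BasicCheck {
    pub has_header: bool,
    pub has_startxref: bool,
    pub has_eof_marker: bool,
    pub objects_paired: bool,
}

impl BasicCheck {
    /// True only when every individual check passed.
    pub fn valid(&self) -> bool {
        self.has_header && self.has_startxref && self.has_eof_marker && self.objects_paired
    }
}

/// Count occurrences of a non-empty `needle` in `haystack`.
fn count(haystack: &[u8], needle: &[u8]) -> usize {
    haystack.windows(needle.len()).filter(|w| *w == needle).count()
}

/// Run a few cheap structural checks on raw PDF bytes.
pub fn basic_check(bytes: &[u8]) -> BasicCheck {
    // The trailer keywords are expected near the end of the file,
    // so only the last 1024 bytes are searched for them.
    let tail = &bytes[bytes.len().saturating_sub(1024)..];
    BasicCheck {
        has_header: bytes.starts_with(b"%PDF-"),
        has_startxref: count(tail, b"startxref") > 0,
        has_eof_marker: count(tail, b"%%EOF") > 0,
        // Heuristic: "N G obj" openers should match "endobj" closers.
        // Binary stream data could produce false matches.
        objects_paired: count(bytes, b" obj") == count(bytes, b"endobj"),
    }
}
```

A real validator would also follow the xref table to the catalog and page tree, which this sketch does not attempt.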
PDF Generation
- From scratch: Create PDFs with custom fonts and text content
- From Markdown: Rich formatting (headers, lists, task lists, blockquotes, tables, code blocks, definition lists, footnotes, images, links, page breaks)
- Text color: `Color` struct (RGB), code blocks in gray, links in blue
- Text alignment: H1 centered, configurable `TextAlign` enum
- Page orientation: Landscape/portrait with `--landscape` CLI flag
- Page numbering: Automatic footer page numbers
- Watermarks: Diagonal text with configurable opacity/size
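As an illustration of how diagonal watermark text can be expressed in a PDF content stream, the sketch below emits standard PDF operators with a rotated text matrix (`Tm`). It is an assumption-laden example, not this crate's code: `watermark_ops` and `escape_pdf_text` are hypothetical helpers, `/F1` is an assumed font resource name, and opacity is left out because it needs an extra graphics-state resource on the page.

```rust
/// Escape the characters that are special inside a PDF literal string.
fn escape_pdf_text(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        if matches!(ch, '(' | ')' | '\\') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Build content-stream operators that draw `text` in light gray,
/// rotated counter-clockwise by `angle_deg` with its origin at (x, y).
/// Assumes ASCII text and a font registered on the page as /F1.
pub fn watermark_ops(text: &str, font_size: f32, angle_deg: f32, x: f32, y: f32) -> String {
    let (sin, cos) = angle_deg.to_radians().sin_cos();
    let neg_sin = -sin;
    let escaped = escape_pdf_text(text);
    // Text matrix for a rotation by theta: [cos sin -sin cos x y].
    format!(
        "q\n0.75 g\nBT\n/F1 {font_size} Tf\n\
         {cos:.4} {sin:.4} {neg_sin:.4} {cos:.4} {x} {y} Tm\n\
         ({escaped}) Tj\nET\nQ\n"
    )
}
```

Calling `watermark_ops("DRAFT", 48.0, 45.0, 150.0, 300.0)` yields a fragment that could be appended to a page's content stream.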
PDF Parsing
- Text extraction: Tj, TJ operators, font encodings (WinAnsi, MacRoman)
- Cross-reference streams: PDF 1.5+ xref stream parsing
- Object streams: Compressed object stream handling
- Validation: Header, xref, trailer, catalog, pages, object pairing checks
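The `Tj` and `TJ` operators mentioned above show text from string operands: `Tj` takes one string, `TJ` takes an array of strings and spacing adjustments. The following is a simplified sketch of pulling text out of an uncompressed content stream. It is not the crate's parser: `extract_text` is a hypothetical function, and hex strings, octal escapes, and font encodings are ignored (bytes are treated as Latin-1).

```rust
/// True for PDF whitespace and delimiter characters.
fn is_delim(b: u8) -> bool {
    b.is_ascii_whitespace() || b"()<>[]{}/%".contains(&b)
}

/// Read a literal string starting just after its opening '('.
/// Appends the decoded text to `out` and returns the index after the closing ')'.
fn read_literal(content: &[u8], mut i: usize, out: &mut String) -> usize {
    let mut depth = 1;
    while i < content.len() {
        match content[i] {
            b'\\' if i + 1 < content.len() => {
                i += 1;
                out.push(match content[i] {
                    b'n' => '\n',
                    b'r' => '\r',
                    b't' => '\t',
                    other => other as char, // covers \( \) and \\
                });
            }
            b'(' => {
                depth += 1;
                out.push('(');
            }
            b')' => {
                depth -= 1;
                if depth == 0 {
                    return i + 1;
                }
                out.push(')');
            }
            other => out.push(other as char),
        }
        i += 1;
    }
    i
}

/// Collect the string operands of Tj/TJ operators; ET ends a line.
pub fn extract_text(content: &[u8]) -> String {
    let mut out = String::new();
    let mut pending = String::new(); // strings seen since the last operator
    let mut i = 0;
    while i < content.len() {
        let c = content[i];
        if c == b'(' {
            i = read_literal(content, i + 1, &mut pending);
        } else if c == b'/' {
            // Skip name objects such as /F1 so they are not read as operators.
            i += 1;
            while i < content.len() && !is_delim(content[i]) {
                i += 1;
            }
        } else if c.is_ascii_alphabetic() {
            let start = i;
            while i < content.len() && content[i].is_ascii_alphabetic() {
                i += 1;
            }
            match &content[start..i] {
                b"Tj" | b"TJ" => out.push_str(&pending),
                b"ET" => out.push('\n'),
                _ => {}
            }
            pending.clear();
        } else {
            i += 1; // numbers, brackets, whitespace
        }
    }
    out
}
```

Streams in real files are usually compressed, so a decompression step would come first.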
PDF Manipulation
- Merge: Combine multiple PDFs
- Split: Extract page ranges
- Rotate: 0/90/180/270°
- Reorder: Arbitrary page ordering
- Watermark: Diagonal text overlay
- Metadata: Title, author, subject, keywords
- Annotations: Text, link, and highlight annotations
- Images: JPEG embedding with aspect-ratio scaling
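Aspect-ratio scaling for an embedded image comes down to picking one scale factor for both dimensions. The sketch below shows that arithmetic together with the standard `cm` / `Do` operators used to place an image. The helper names `fit_within` and `image_ops` and the resource name `/Im1` are assumptions for illustration, and whether this tool also scales small images up is not stated in this README.

```rust
/// Scale an image of size (img_w, img_h) so it fits inside (max_w, max_h)
/// while keeping its aspect ratio. Returns (0, 0) for degenerate input.
pub fn fit_within(img_w: f32, img_h: f32, max_w: f32, max_h: f32) -> (f32, f32) {
    if img_w <= 0.0 || img_h <= 0.0 {
        return (0.0, 0.0);
    }
    // The smaller of the two ratios keeps both dimensions inside the box.
    let scale = (max_w / img_w).min(max_h / img_h);
    (img_w * scale, img_h * scale)
}

/// Content-stream operators that draw the image resource /Im1 with its
/// lower-left corner at (x, y), scaled to w by h points.
/// PDF images occupy a unit square, so the matrix [w 0 0 h x y] sizes and places it.
pub fn image_ops(x: f32, y: f32, w: f32, h: f32) -> String {
    format!("q {w} 0 0 {h} {x} {y} cm /Im1 Do Q")
}
```

For example, a 400x200 image fitted into a 200x200 box comes out as 200x100.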
Installation
From Source
git clone https://github.com/yourusername/pdf-cli.git
cd pdf-cli
cargo build --release
The binary will be available at target/release/pdf-cli.
Usage
Basic Commands
Create a Simple PDF
pdf-cli create output.pdf "Hello, World!"
Create PDF with Custom Font and Size
pdf-cli create output.pdf "Hello, World!" --font "Times-Roman" --font-size 14
Convert Markdown to PDF
pdf-cli md-to-pdf input.md output.pdf
Convert Markdown to PDF with Custom Styling
pdf-cli md-to-pdf input.md output.pdf --font "Helvetica" --font-size 12
Extract Text from PDF
pdf-cli extract input.pdf
Convert PDF to Markdown
pdf-cli pdf-to-md input.pdf output.md
Add Image to PDF
pdf-cli add-image document.pdf image.jpg --x 100 --y 100 --width 200 --height 200
Landscape PDF
pdf-cli md-to-pdf input.md output.pdf --landscape
Merge PDFs
pdf-cli merge file1.pdf file2.pdf file3.pdf -o merged.pdf
Split PDF (extract pages 2-5)
pdf-cli split input.pdf -o pages2to5.pdf --start 2 --end 5
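The example above suggests that `--start` and `--end` are 1-based and inclusive. Under that assumption, mapping them to zero-based page indices could look like the sketch below; `page_range` is a hypothetical helper, not part of the documented API.

```rust
use std::ops::Range;

/// Convert a 1-based inclusive page range into a zero-based half-open range,
/// rejecting ranges that are empty, reversed, or past the end of the document.
pub fn page_range(start: usize, end: usize, page_count: usize) -> Result<Range<usize>, String> {
    if start == 0 || start > end {
        return Err(format!("invalid page range {start}-{end}"));
    }
    if end > page_count {
        return Err(format!("page {end} is past the last page ({page_count})"));
    }
    Ok(start - 1..end)
}
```

With a 10-page input, `page_range(2, 5, 10)` selects indices 1 through 4, which are pages 2 to 5.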
Rotate PDF
pdf-cli rotate input.pdf -o rotated.pdf --angle 90
Create PDF with Metadata
pdf-cli md-to-pdf-meta input.md output.pdf --title "My Document" --author "Author Name" --subject "Topic"
Supported Fonts
- Helvetica
- Times-Roman
- Courier
- And other standard PDF Type 1 fonts
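For reference, the PDF specification defines 14 standard Type 1 fonts. The sketch below lists them and checks a name against the list. Whether this tool accepts all 14 is an assumption, and `STANDARD_14` and `is_standard_font` are illustrative names only.

```rust
/// The 14 standard Type 1 font names defined by the PDF specification.
pub const STANDARD_14: [&str; 14] = [
    "Times-Roman",
    "Times-Bold",
    "Times-Italic",
    "Times-BoldItalic",
    "Helvetica",
    "Helvetica-Bold",
    "Helvetica-Oblique",
    "Helvetica-BoldOblique",
    "Courier",
    "Courier-Bold",
    "Courier-Oblique",
    "Courier-BoldOblique",
    "Symbol",
    "ZapfDingbats",
];

/// True when `name` is one of the standard 14 fonts (case-sensitive).
pub fn is_standard_font(name: &str) -> bool {
    STANDARD_14.contains(&name)
}
```

A check like this could be used to reject an unsupported `--font` value before generating any output.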
Examples
Creating a Multi-page Document
pdf-cli create long-document.pdf "$(cat document.txt)" --font-size 10
Converting Complex Markdown
````bash
# Create a sample markdown file.
# The quoted 'EOF' delimiter stops the shell from treating the
# backticks inside the heredoc as command substitution.
cat > sample.md << 'EOF'
# Sample Document

This is **bold** text with *italic* formatting.

## Tables

| Name | Age | Country |
|------|-----|---------|
| John | 25  | USA     |
| Jane | 30  | UK      |

### Lists

1. First item
2. Second item
   - Nested item
   - Another nested item

### Code Examples

```rust
fn main() {
    println!("Hello, PDF!");
}
```
EOF

# Convert to PDF
pdf-cli md-to-pdf sample.md sample.pdf --font "Times-Roman" --font-size 12
````
Library Usage
```rust
use pdfrs::{elements, pdf, pdf_generator};

fn main() {
    // Parse markdown into elements
    let elements = elements::parse_markdown("# Hello\n\nWorld");

    // Generate PDF bytes in memory
    let layout = pdf_generator::PageLayout::portrait();
    let pdf_bytes = pdf_generator::generate_pdf_bytes(&elements, "Helvetica", 12.0, layout)
        .unwrap();

    // Validate the generated PDF
    let validation = pdf::validate_pdf_bytes(&pdf_bytes);
    assert!(validation.valid);
    assert!(validation.page_count >= 1);
}
```
Architecture
This tool is built with a modular architecture:
- PDF Parser (`src/pdf.rs`): PDF parsing, text extraction, validation, xref/object stream parsing
- PDF Generator (`src/pdf_generator.rs`): Creates PDFs with layout, color, alignment, accessibility
- Elements (`src/elements.rs`): 17 structured element types and markdown parser
- Markdown (`src/markdown.rs`): Markdown-to-PDF pipeline with rich formatting
- PDF Operations (`src/pdf_ops.rs`): Merge, split, rotate, reorder, watermark, metadata, annotations
- Image Handler (`src/image.rs`): JPEG/PNG/BMP embedding with dimension parsing
- Compression (`src/compression.rs`): PDF stream compression (deflate)
- Security (`src/security.rs`): Password protection, permissions
See ARCHITECTURE.md for detailed module documentation.
Testing
251 tests across 4 test suites:
- 115 lib tests: Unit tests for all modules
- 112 bin tests: CLI command tests
- 13 integration tests: End-to-end roundtrip, merge, split, rotate, watermark, reorder
- 11 bench tests: Property-based and benchmark tests
Round-trip validation tests verify that every element type survives the full cycle: generate → validate → parse → verify.
cargo test
Limitations
- Text extraction works best with PDFs generated by this tool or simple Type 1 font PDFs
- Font support is limited to standard Type 1 fonts (Helvetica, Times-Roman, Courier)
- Image embedding is JPEG-focused (PNG/BMP dimension parsing available)
- Full tagged PDF output not yet implemented (structure types defined)
Contributing
Contributions are welcome! Please read our Contributing Guidelines for details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built entirely in Rust without external PDF dependencies
- Implements core PDF specifications from scratch
- Inspired by the need for a lightweight PDF toolchain