Thanks to visit codestin.com
Credit goes to lib.rs

#ai-agent #llm #document #document-parser #parser

litedoc-core

Deterministic document parser for machine-generated output with error recovery

1 unstable release

0.1.0 Feb 6, 2026

#1941 in Parser implementations


Used in litedoc-cli

Apache-2.0

82KB
2K SLoC

LiteDoc

CI crates.io docs.rs

Deterministic document format for AI agents and LLM output. Explicit block fencing, zero-copy parsing, and error recovery.

Why LiteDoc?

Markdown is ambiguous. Indentation rules vary, edge cases abound, and parsers disagree.[1] LiteDoc uses explicit ::block fencing for deterministic parsing that recovers gracefully from malformed input, which is ideal for machine-generated content in LLM pipelines where output parsing can fail when formatting is off.[2]

Status

v0.1.0 - Initial release with Rust parser and CLI. APIs are intended to be stable within v0.1 but may evolve.

Stability & Compatibility

LiteDoc follows semantic versioning. For the v0.1 line, we will not make breaking changes to the core format or public APIs without a version bump and a migration note in the changelog.

Document Description
LITEDOC_SPEC.md Language specification
LITEDOC_AST.md AST reference

Performance

Metric LiteDoc Markdown Improvement
Parse speed 5.451 µs 5.660 µs 3.7% faster
Inline parsing 490 ns 1.313 µs 63% faster
Error recovery 0.89 0.67 33% better

Install

cargo add litedoc-core            # Rust library
pip install litedoc-py                  # Python library
cargo install litedoc-cli          # CLI tool

Usage

Rust

use litedoc_core::{Block, Parser, Profile};

let mut parser = Parser::new(Profile::Litedoc);
let result = parser.parse_with_recovery(input);

for block in &result.document.blocks {
    match block {
        Block::List(list) => process_list(list),
        Block::Table(table) => process_table(table),
        _ => {}
    }
}

Python

import pyld

doc = pyld.parse("# Hello\n\nWorld")
for block in doc.blocks:
    match block:
        case pyld.Heading(level=level):
            print(f"H{level}")
        case pyld.Paragraph(content=content):
            print(content)

CLI

ldcli agent_output.ld            # Parse and display structure
ldcli -j agent_output.ld         # Output as JSON
ldcli validate agent_output.ld   # Check for errors
ldcli stats agent_output.ld      # Show statistics

Format

::list
- First item
- Second item
::

::quote
Quoted text
::

::table
| A | B |
|---|---|
| 1 | 2 |
::

Metadata:

--- meta ---
agent: summarizer-v2
task_id: abc123
timestamp: 1704067200
confidence: 0.92
tags: [summary, final]
---

Benchmarks

cargo bench -p litedoc-core
cargo test -p litedoc-core robustness_report -- --nocapture

CSV output:

ROBUSTNESS_CSV=1 cargo test -p litedoc-core robustness_report -- --nocapture
ROBUSTNESS_BENCH_CSV=1 cargo bench -p litedoc-core robustness_benchmark -- --nocapture

References

[1] CommonMark Spec, “Why is a spec needed?” (notes original Markdown syntax is not unambiguous and implementations diverged). https://spec.commonmark.org/0.31.2/

[2] LangChain docs: OUTPUT_PARSING_FAILURE (example of JSON-in-Markdown parsing failures). https://docs.langchain.com/oss/python/langchain/errors/OUTPUT_PARSING_FAILURE

License

Apache-2.0

Dependencies

~225KB