MinerU is a document parsing system that converts PDF files and images into machine-readable formats, primarily Markdown and JSON. The system supports three processing backends—pipeline, VLM (Vision-Language Model), and hybrid—each offering different trade-offs between accuracy, speed, and hardware requirements. MinerU provides multiple user interfaces (CLI, FastAPI, Gradio, OpenAI-compatible server) and generates normalized intermediate representations (middle.json) that can be transformed into various output formats.
Scope: This page describes the overall architecture, core processing components, backend selection logic, and key entry points. For installation details, see Getting Started. For backend-specific implementation details, see Pipeline Backend, VLM Backend, and Hybrid Backend. For output format specifications, see Data Transformation and Output Generation.
MinerU follows a layered architecture with clear separation between user interfaces, orchestration logic, backend processing, and model management.
Sources: mineru/cli/common.py1-550 mineru/cli/client.py1-224 mineru/cli/fast_api.py1-289 mineru/cli/gradio_app.py1-450 mineru/backend/vlm/vlm_analyze.py1-244 mineru/backend/hybrid/hybrid_analyze.py1-485
The central orchestration functions do_parse and aio_do_parse route requests to appropriate backends and manage the complete parsing lifecycle.
| Entry Point | Location | Purpose | Async Support |
|---|---|---|---|
| do_parse | mineru/cli/common.py414-484 | Synchronous parsing orchestrator | No |
| aio_do_parse | mineru/cli/common.py486-550 | Asynchronous parsing orchestrator | Yes |
| read_fn | mineru/cli/common.py32-43 | File type detection and normalization | N/A |
| _prepare_pdf_bytes | mineru/cli/common.py85-91 | Page range extraction via pypdfium2 | N/A |
| _process_output | mineru/cli/common.py94-168 | Output file generation and writing | N/A |
Sources: mineru/cli/common.py414-550 mineru/utils/engine_utils.py
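The relationship between a synchronous and an asynchronous orchestrator that share one routing step can be sketched as follows. This is a minimal illustration, not MinerU's implementation: `backend_family`, `parse_sync`, and `parse_async` are hypothetical names, the parsing work is stubbed out, and the backend suffixes used in the usage note are placeholders.

```python
import asyncio


def backend_family(backend: str) -> str:
    """Map a backend name (e.g. 'vlm-<engine>') to its family (hypothetical helper)."""
    if backend == "pipeline":
        return "pipeline"
    for family in ("vlm", "hybrid"):
        if backend.startswith(family + "-"):
            return family
    raise ValueError(f"unknown backend: {backend!r}")


def parse_sync(pdf_bytes: bytes, backend: str) -> dict:
    """Blocking orchestrator: route to a backend family, then parse (stubbed here)."""
    return {"family": backend_family(backend), "size": len(pdf_bytes)}


async def parse_async(pdf_bytes: bytes, backend: str) -> dict:
    """Async orchestrator: same routing, with the blocking work moved to a
    worker thread so the event loop stays responsive."""
    return await asyncio.to_thread(parse_sync, pdf_bytes, backend)
```

A caller in a web server would await `parse_async(data, "vlm-foo")`, while a command-line tool could call `parse_sync(data, "pipeline")` directly.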
MinerU supports three distinct backends with different characteristics:
| Feature | pipeline | vlm-* | hybrid-* |
|---|---|---|---|
| Accuracy | 82+ (OmniDocBench v1.5) | 90+ | 90+ |
| CPU-Only Support | ✅ Yes | ❌ No | ❌ No |
| Min VRAM | 6GB | 8GB | 10GB |
| Language Support | 109 languages via OCR | Chinese, English only | 109 languages |
| Processing Approach | Sequential models | End-to-end VLM | VLM structure + traditional OCR |
| Hallucination Risk | None (rule-based) | Present (LLM-based) | Reduced (hybrid) |
| Implementation | pipeline_analyze.py | vlm_analyze.py193-243 | hybrid_analyze.py313-485 |
Sources: README.md120-192 mineru/cli/common.py439-483
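To make the trade-offs in the table concrete, here is one possible selection policy written as a small function. It is an assumption-laden sketch: `suggest_backend` is a hypothetical helper that does not exist in MinerU, and preferring the hybrid family whenever enough VRAM is available is only one reasonable reading of the table. The VRAM thresholds and language constraints are taken from the table above.

```python
from typing import Optional

# Minimum VRAM per backend family, in GB, as listed in the comparison table.
MIN_VRAM_GB = {"pipeline": 6, "vlm": 8, "hybrid": 10}


def suggest_backend(vram_gb: Optional[float], needs_multilingual: bool) -> str:
    """Suggest a backend family (hypothetical policy).

    vram_gb=None means no GPU is available.
    """
    if vram_gb is None:
        return "pipeline"  # the only family with CPU-only support
    if vram_gb >= MIN_VRAM_GB["hybrid"]:
        return "hybrid"  # high accuracy, 109 languages, reduced hallucination risk
    if vram_gb >= MIN_VRAM_GB["vlm"] and not needs_multilingual:
        return "vlm"  # high accuracy, but Chinese and English only
    return "pipeline"  # falls back to the backend that can also run on CPU
```

For example, a machine with 8 GB of VRAM processing English documents would be steered to the VLM family, while the same machine processing documents in other languages would fall back to the pipeline backend.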
MinerU provides four primary user interfaces (CLI, FastAPI, Gradio, and an OpenAI-compatible server), each implemented as a separate CLI entry point.
All CLI entry points are defined in pyproject.toml111-118.
Sources: pyproject.toml111-118 mineru/cli/client.py154-223 mineru/cli/fast_api.py46-289 mineru/cli/gradio_app.py199-450
MinerU processes documents through a multi-stage pipeline that converges on a normalized middle.json format before generating user-facing outputs.
| Function | Location | Purpose |
|---|---|---|
| guess_suffix_by_bytes | mineru/utils/guess_suffix_or_lang.py | Detect file type (PDF/image) |
| images_bytes_to_pdf_bytes | mineru/utils/pdf_image_tools.py | Convert images to PDF format |
| load_images_from_pdf | mineru/utils/pdf_image_tools.py | Extract page images from PDF |
| result_to_middle_json | Pipeline: backend/pipeline/model_json_to_middle_json.py; VLM: backend/vlm/model_output_to_middle_json.py; Hybrid: backend/hybrid/hybrid_model_output_to_middle_json.py | Convert backend output to middle.json |
| union_make | Pipeline: backend/pipeline/pipeline_middle_json_mkcontent.py; VLM: backend/vlm/vlm_middle_json_mkcontent.py | Generate final output formats |
Sources: mineru/cli/common.py32-168 mineru/utils/pdf_image_tools.py mineru/backend/pipeline/model_json_to_middle_json.py mineru/backend/vlm/model_output_to_middle_json.py mineru/backend/hybrid/hybrid_model_output_to_middle_json.py
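The first stage of this flow, detecting the input type from its leading bytes and deciding whether it must be converted to PDF, can be sketched with well-known file signatures. This is not the logic of guess_suffix_by_bytes, whose internals are not shown here; `sniff_kind` and `needs_pdf_conversion` are hypothetical helpers that cover only three formats.

```python
def sniff_kind(data: bytes) -> str:
    """Classify input by its leading magic bytes (hypothetical helper)."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    return "unknown"


def needs_pdf_conversion(data: bytes) -> bool:
    """Images are normalized to PDF before parsing; PDFs pass through unchanged."""
    return sniff_kind(data) in ("png", "jpeg")
```

Normalizing every input to PDF bytes up front means the later stages only need to handle a single input format.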
MinerU uses singleton patterns to cache loaded models across multiple parsing requests, minimizing initialization overhead.
The ModelSingleton class in vlm_analyze.py22-190 implements this pattern.
It ensures that each unique backend configuration is initialized only once and then reused across all parsing requests.
Sources: mineru/backend/vlm/vlm_analyze.py22-190 mineru/backend/pipeline/model_init.py mineru/utils/models_download_utils.py
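A minimal sketch of a singleton model cache keyed by configuration is shown below. It illustrates the general technique only; the class name `ModelCache`, its `get` method, and the `(backend, model_path)` key are assumptions for this example and do not reproduce the actual ModelSingleton code.

```python
import threading


class ModelCache:
    """Process-wide cache that loads each configuration at most once (illustrative)."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Always hand back the same instance, creating it on first use.
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._models = {}
        return cls._instance

    def get(self, backend: str, model_path: str, loader):
        """Return the cached model for this configuration, loading it if needed."""
        key = (backend, model_path)
        with self._lock:
            if key not in self._models:
                self._models[key] = loader()  # expensive initialization runs once per key
        return self._models[key]
```

Because the cache key is the configuration tuple, two requests with the same backend and model path share one loaded model, while a different configuration triggers a separate load.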
MinerU supports multiple hardware accelerators through platform-specific configurations:
| Accelerator | Environment Variable | Backend Support | Documentation |
|---|---|---|---|
| NVIDIA GPU (CUDA) | CUDA_VISIBLE_DEVICES | All backends | Standard PyTorch |
| Huawei Ascend NPU | MINERU_LMDEPLOY_DEVICE=ascend | vllm, lmdeploy | Ascend.md |
| METAX GPU | MINERU_LMDEPLOY_DEVICE=maca | lmdeploy | METAX.md |
| T-Head PPU | Platform-specific | vllm | THead.md |
| Apple Silicon MPS | Automatic detection | transformers, mlx | MPS/MLX |
| AMD GPU (ROCm) | Custom kernels | vllm | AMD.md |
Sources: mineru/utils/config_reader.py mineru/cli/client.py163-178 docs/zh/usage/acceleration_cards/
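As an illustration of how the environment variables in the table might be applied before launching a parsing process, the sketch below builds an environment mapping per accelerator. `build_env` and the accelerator keys (`"cuda"`, `"ascend"`, `"metax"`) are hypothetical names chosen for this example; only the variable names and values come from the table.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Variable settings taken from the accelerator table above.
ACCELERATOR_ENV = {
    "ascend": {"MINERU_LMDEPLOY_DEVICE": "ascend"},
    "metax": {"MINERU_LMDEPLOY_DEVICE": "maca"},
}


def build_env(
    accelerator: str,
    cuda_devices: Optional[Iterable[int]] = None,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with accelerator settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(ACCELERATOR_ENV.get(accelerator, {}))
    if accelerator == "cuda" and cuda_devices is not None:
        # Restrict the process to the listed GPU indices.
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in cuda_devices)
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` so that the current process environment is left untouched.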
MinerU generates multiple output files, each serving different downstream use cases:
| File | Format | Purpose | Generation |
|---|---|---|---|
| `<name>.md` | Markdown | Human-readable document | union_make(MakeMode.MM_MD) |
| middle.json | JSON | Normalized intermediate format | result_to_middle_json() |
| content_list.json | JSON | Flat structured data (v1) | union_make(MakeMode.CONTENT_LIST) |
| content_list_v2.json | JSON | Enhanced structured data (v2, VLM only) | union_make(MakeMode.CONTENT_LIST_V2) |
| model.json | JSON | Raw model inference output | Direct backend output |
| layout.pdf | PDF | Layout bounding box visualization | draw_layout_bbox() |
| span.pdf | PDF | Text span visualization | draw_span_bbox() |
Output generation is controlled by boolean flags in common.py414-434:
- f_dump_md - Generate Markdown files
- f_dump_middle_json - Generate middle.json
- f_dump_model_output - Generate model.json
- f_dump_content_list - Generate content_list.json
- f_draw_layout_bbox - Generate layout.pdf
- f_draw_span_bbox - Generate span.pdf
- f_make_md_mode - Markdown generation mode (MM_MD vs NLP_FRIENDLY)

Sources: mineru/cli/common.py94-168 mineru/backend/pipeline/pipeline_middle_json_mkcontent.py mineru/backend/vlm/vlm_middle_json_mkcontent.py mineru/utils/draw_bbox.py
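The mapping from flags to output files can be sketched as a small data structure. The flag names below come from the list above, but the `DumpFlags` dataclass, the `expected_outputs` helper, and the default values are assumptions made for illustration. File names follow the output table; f_make_md_mode is omitted because it selects a mode rather than toggling a file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DumpFlags:
    """Boolean output switches (defaults here are placeholders, not MinerU's)."""
    f_dump_md: bool = True
    f_dump_middle_json: bool = True
    f_dump_model_output: bool = True
    f_dump_content_list: bool = True
    f_draw_layout_bbox: bool = True
    f_draw_span_bbox: bool = True


def expected_outputs(name: str, flags: DumpFlags) -> List[str]:
    """List the files a run would produce for the given flags."""
    candidates = [
        (flags.f_dump_md, f"{name}.md"),
        (flags.f_dump_middle_json, "middle.json"),
        (flags.f_dump_model_output, "model.json"),
        (flags.f_dump_content_list, "content_list.json"),
        (flags.f_draw_layout_bbox, "layout.pdf"),
        (flags.f_draw_span_bbox, "span.pdf"),
    ]
    return [filename for enabled, filename in candidates if enabled]
```

Disabling the two drawing flags, for instance, leaves only the Markdown and JSON outputs, which is a typical choice for batch processing where the visualizations are not inspected.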
MinerU provides multiple installation options via pyproject.toml46-103:
| Package | Dependencies | Use Case |
|---|---|---|
| mineru | Core only | Minimal installation |
| mineru[pipeline] | Pipeline backend models | CPU-compatible parsing |
| mineru[vlm] | Transformers | GPU VLM parsing |
| mineru[vllm] | vLLM | Linux GPU optimization |
| mineru[lmdeploy] | LMDeploy | Multi-platform GPU |
| mineru[mlx] | MLX | macOS Apple Silicon |
| mineru[all] | All backends | Complete installation |
Key environment variables for configuration:
- MINERU_MODEL_SOURCE - Model repository (huggingface/modelscope/local)
- MINERU_DEVICE_MODE - Hardware device override
- MINERU_VIRTUAL_VRAM_SIZE - GPU memory limit
- MINERU_LOG_LEVEL - Logging verbosity
- MINERU_VLM_FORMULA_ENABLE - VLM formula recognition toggle
- MINERU_VLM_TABLE_ENABLE - VLM table recognition toggle

Sources: pyproject.toml46-103 mineru/cli/client.py169-181 docs/zh/usage/model_source.md17-26
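Reading and validating one of these variables might look like the following sketch. The function `read_model_source` is hypothetical, and the fallback value is an assumption made for this example rather than a documented default; the three accepted values are the ones listed above.

```python
import os
from typing import Mapping, Optional

VALID_MODEL_SOURCES = ("huggingface", "modelscope", "local")


def read_model_source(
    env: Optional[Mapping[str, str]] = None,
    default: str = "huggingface",  # assumed fallback, for illustration only
) -> str:
    """Return the configured model source, rejecting unsupported values."""
    env = os.environ if env is None else env
    value = env.get("MINERU_MODEL_SOURCE", default)
    if value not in VALID_MODEL_SOURCES:
        raise ValueError(
            f"MINERU_MODEL_SOURCE must be one of {VALID_MODEL_SOURCES}, got {value!r}"
        )
    return value
```

Validating the value at startup surfaces a typo immediately instead of at the first model download.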
MinerU's architecture is built around:
- do_parse/aio_do_parse in common.py414-550

The system's modular design allows users to select appropriate backends based on their accuracy requirements, hardware constraints, and language support needs, while maintaining consistent interfaces and output formats across all processing paths.