Overview of MinerU


MinerU is a document parsing system that converts PDF files and images into machine-readable formats, primarily Markdown and JSON. It supports three processing backends: pipeline, VLM (Vision-Language Model), and hybrid. Each offers a different trade-off between accuracy, speed, and hardware requirements. MinerU provides four user interfaces (CLI, FastAPI, Gradio, and an OpenAI-compatible server) and generates a normalized intermediate representation (middle.json) that can be transformed into various output formats.

Scope: This page describes the overall architecture, core processing components, backend selection logic, and key entry points. For installation details, see Getting Started. For backend-specific implementation details, see Pipeline Backend, VLM Backend, and Hybrid Backend. For output format specifications, see Data Transformation and Output Generation.

System Architecture

MinerU follows a layered architecture with clear separation between user interfaces, orchestration logic, backend processing, and model management.

Sources: mineru/cli/common.py:1-550, mineru/cli/client.py:1-224, mineru/cli/fast_api.py:1-289, mineru/cli/gradio_app.py:1-450, mineru/backend/vlm/vlm_analyze.py:1-244, mineru/backend/hybrid/hybrid_analyze.py:1-485

Core Processing Entry Points

The central orchestration functions do_parse and aio_do_parse route requests to appropriate backends and manage the complete parsing lifecycle.

Entry Point	Location	Purpose	Async Support
`do_parse`	mineru/cli/common.py:414-484	Synchronous parsing orchestrator	No
`aio_do_parse`	mineru/cli/common.py:486-550	Asynchronous parsing orchestrator	Yes
`read_fn`	mineru/cli/common.py:32-43	File type detection and normalization	N/A
`_prepare_pdf_bytes`	mineru/cli/common.py:85-91	Page range extraction via pypdfium2	N/A
`_process_output`	mineru/cli/common.py:94-168	Output file generation and writing	N/A

Backend Routing Logic
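
The routing step can be illustrated with a minimal sketch that dispatches on the backend name family used on this page (`pipeline`, `vlm-*`, `hybrid-*`). The function names `route_backend`, `run_pipeline`, `run_vlm`, and `run_hybrid` are hypothetical stand-ins, not MinerU's actual functions, and the engine suffixes in the usage note are assumed examples.

```python
# Hypothetical stand-ins for the per-backend analysis functions.
def run_pipeline(pdf_bytes: bytes) -> str:
    return "pipeline"


def run_vlm(pdf_bytes: bytes, engine: str) -> str:
    return f"vlm:{engine}"


def run_hybrid(pdf_bytes: bytes, engine: str) -> str:
    return f"hybrid:{engine}"


def route_backend(backend: str, pdf_bytes: bytes) -> str:
    """Dispatch on the backend name family: pipeline, vlm-*, or hybrid-*."""
    if backend == "pipeline":
        return run_pipeline(pdf_bytes)
    # Split "vlm-<engine>" / "hybrid-<engine>" at the first hyphen only,
    # so engine names that themselves contain hyphens stay intact.
    family, sep, engine = backend.partition("-")
    if sep and engine:
        if family == "vlm":
            return run_vlm(pdf_bytes, engine)
        if family == "hybrid":
            return run_hybrid(pdf_bytes, engine)
    raise ValueError(f"unknown backend: {backend!r}")
```

Under these assumptions, `route_backend("vlm-transformers", data)` reaches the VLM branch with `engine="transformers"`, while an unrecognized name fails fast with `ValueError` instead of silently falling back to a default backend.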

Sources: mineru/cli/common.py:414-550, mineru/utils/engine_utils.py

Backend Comparison

MinerU supports three distinct backends with different characteristics:

Feature	`pipeline`	`vlm-*`	`hybrid-*`
Accuracy	82+ (OmniDocBench v1.5)	90+	90+
CPU-Only Support	✅ Yes	❌ No	❌ No
Min VRAM	6GB	8GB	10GB
Language Support	109 languages via OCR	Chinese, English only	109 languages
Processing Approach	Sequential models	End-to-end VLM	VLM structure + traditional OCR
Hallucination Risk	None (rule-based)	Present (LLM-based)	Reduced (hybrid)
Implementation	pipeline_analyze.py	vlm_analyze.py:193-243	hybrid_analyze.py:313-485

Backend Selection Guidelines
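
The comparison table above can be read as a small decision procedure. The sketch below encodes only the table's rows (CPU-only support, minimum VRAM, language coverage, accuracy); the `BackendProfile` type, the `candidate_backends` helper, and the `"zh"`/`"en"` language codes are illustrative assumptions, not part of MinerU.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackendProfile:
    name: str
    cpu_only: bool        # can run without a GPU
    min_vram_gb: int      # minimum VRAM when run on a GPU
    multilingual: bool    # False means Chinese and English only
    accuracy_floor: int   # OmniDocBench v1.5 score from the table


# Values copied from the comparison table.
PROFILES = (
    BackendProfile("pipeline", True, 6, True, 82),
    BackendProfile("vlm", False, 8, False, 90),
    BackendProfile("hybrid", False, 10, True, 90),
)


def candidate_backends(vram_gb: float, language: str) -> List[str]:
    """Return backend families that fit the hardware and language, best accuracy first.

    vram_gb == 0 means no GPU is available. Language codes are assumed to be
    "zh", "en", or anything else for other languages.
    """
    fits = []
    for p in PROFILES:
        fits_hw = p.cpu_only or vram_gb >= p.min_vram_gb
        fits_lang = p.multilingual or language in {"zh", "en"}
        if fits_hw and fits_lang:
            fits.append(p)
    # Stable sort: ties keep the table order.
    fits.sort(key=lambda p: p.accuracy_floor, reverse=True)
    return [p.name for p in fits]
```

For example, a machine without a GPU is left with `pipeline` only, and a 12 GB GPU processing Japanese documents gets `hybrid` ahead of `pipeline`, because the `vlm` row is limited to Chinese and English.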

Sources: README.md:120-192, mineru/cli/common.py:439-483

User Interface Components

MinerU provides four primary user interfaces (a CLI client, a FastAPI service, a Gradio web app, and an OpenAI-compatible server), each implemented as a separate CLI entry point.

Entry Point Definitions

All CLI entry points are defined in pyproject.toml:111-118.

Sources: pyproject.toml:111-118, mineru/cli/client.py:154-223, mineru/cli/fast_api.py:46-289, mineru/cli/gradio_app.py:199-450

Data Flow and Transformations

MinerU processes documents through a multi-stage pipeline that converges on a normalized middle.json format before generating user-facing outputs.

Key Transformation Functions

Function	Location	Purpose
`guess_suffix_by_bytes`	mineru/utils/guess_suffix_or_lang.py	Detect file type (PDF/image)
`images_bytes_to_pdf_bytes`	mineru/utils/pdf_image_tools.py	Convert images to PDF format
`load_images_from_pdf`	mineru/utils/pdf_image_tools.py	Extract page images from PDF
`result_to_middle_json`	Pipeline: backend/pipeline/model_json_to_middle_json.py VLM: backend/vlm/model_output_to_middle_json.py Hybrid: backend/hybrid/hybrid_model_output_to_middle_json.py	Convert backend output to middle.json
`union_make`	Pipeline: backend/pipeline/pipeline_middle_json_mkcontent.py VLM: backend/vlm/vlm_middle_json_mkcontent.py	Generate final output formats
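
To make the first stage concrete, here is a minimal sketch of byte-based file type detection followed by the decision to convert images to PDF. It is not the implementation of `guess_suffix_by_bytes`: `sniff_suffix` and `needs_pdf_conversion` are hypothetical names, and the sketch checks only the well-known magic numbers for PDF, PNG, and JPEG.

```python
# Well-known file signatures (magic numbers).
PDF_MAGIC = b"%PDF-"
IMAGE_MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
}


def sniff_suffix(data: bytes) -> str:
    """Guess a file suffix from the leading bytes (hypothetical helper)."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    for magic, suffix in IMAGE_MAGIC.items():
        if data.startswith(magic):
            return suffix
    raise ValueError("unsupported file type")


def needs_pdf_conversion(data: bytes) -> bool:
    """Images must be wrapped into a PDF before page-level processing."""
    return sniff_suffix(data) != "pdf"
```

In this sketch a PNG or JPEG input would be routed through an image-to-PDF step (the role `images_bytes_to_pdf_bytes` plays in the table), so that every later stage only has to handle PDF bytes.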

Sources: mineru/cli/common.py:32-168, mineru/utils/pdf_image_tools.py, mineru/backend/pipeline/model_json_to_middle_json.py, mineru/backend/vlm/model_output_to_middle_json.py, mineru/backend/hybrid/hybrid_model_output_to_middle_json.py

Model Management Architecture

MinerU uses singleton patterns to cache loaded models across multiple parsing requests, minimizing initialization overhead.

Singleton Implementation Pattern

The ModelSingleton class in vlm_analyze.py:22-190 implements the singleton pattern, so each unique combination of backend configuration is initialized only once across all parsing requests.
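
The general technique can be sketched as a process-wide cache keyed by configuration. The class below is an illustration of the pattern only: `ModelCache`, its `get_model` method, the `(backend, model_path)` key, and the `loader` callback are assumptions and do not reproduce the actual `ModelSingleton` code.

```python
import threading
from typing import Any, Callable, Dict, Tuple


class ModelCache:
    """Process-wide singleton that loads each configuration at most once."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Every ModelCache() call returns the same shared instance.
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._models = {}
        return cls._instance

    def get_model(
        self,
        backend: str,
        model_path: str,
        loader: Callable[[str, str], Any],
    ) -> Any:
        key: Tuple[str, str] = (backend, model_path)
        # The lock keeps concurrent requests from loading the same model twice.
        # It is not reentrant, so the loader must not call get_model itself.
        with self._lock:
            models: Dict[Tuple[str, str], Any] = self._models
            if key not in models:
                models[key] = loader(backend, model_path)
            return models[key]
```

With this shape, two requests that share a configuration receive the same model object and the expensive loader runs once, while a different configuration triggers its own one-time load.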

Sources: mineru/backend/vlm/vlm_analyze.py:22-190, mineru/backend/pipeline/model_init.py, mineru/utils/models_download_utils.py

Hardware Acceleration Support

MinerU supports multiple hardware accelerators through platform-specific configurations:

Accelerator	Environment Variable	Backend Support	Documentation
NVIDIA GPU (CUDA)	`CUDA_VISIBLE_DEVICES`	All backends	Standard PyTorch
Huawei Ascend NPU	`MINERU_LMDEPLOY_DEVICE=ascend`	vllm, lmdeploy	Ascend.md
METAX GPU	`MINERU_LMDEPLOY_DEVICE=maca`	lmdeploy	METAX.md
T-Head PPU	Platform-specific	vllm	THead.md
Apple Silicon MPS	Automatic detection	transformers, mlx	MPS/MLX
AMD GPU (ROCm)	Custom kernels	vllm	AMD.md

Device Detection Flow

Sources: mineru/utils/config_reader.py, mineru/cli/client.py:163-178, docs/zh/usage/acceleration_cards/

Output Formats and Structure

MinerU generates multiple output files, each serving different downstream use cases:

File	Format	Purpose	Generation
`<name>.md`	Markdown	Human-readable document	`union_make(MakeMode.MM_MD)`
`middle.json`	JSON	Normalized intermediate format	`result_to_middle_json()`
`content_list.json`	JSON	Flat structured data (v1)	`union_make(MakeMode.CONTENT_LIST)`
`content_list_v2.json`	JSON	Enhanced structured data (v2, VLM only)	`union_make(MakeMode.CONTENT_LIST_V2)`
`model.json`	JSON	Raw model inference output	Direct backend output
`layout.pdf`	PDF	Layout bounding box visualization	`draw_layout_bbox()`
`span.pdf`	PDF	Text span visualization	`draw_span_bbox()`

Output Control Flags

Output generation is controlled by flags in common.py:414-434. All are boolean except f_make_md_mode, which selects a mode:

f_dump_md - Generate Markdown files
f_dump_middle_json - Generate middle.json
f_dump_model_output - Generate model.json
f_dump_content_list - Generate content_list.json
f_draw_layout_bbox - Generate layout.pdf
f_draw_span_bbox - Generate span.pdf
f_make_md_mode - Markdown generation mode (MakeMode.MM_MD vs MakeMode.NLP_MD)
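
As a rough illustration of how such flags gate the output files, the sketch below maps the six boolean flags to the file names from the output table. The `OutputFlags` dataclass, its default values, and the `planned_outputs` helper are hypothetical; they do not mirror the actual signature in common.py.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OutputFlags:
    # Flag names follow the list above; the defaults are an assumption.
    f_dump_md: bool = True
    f_dump_middle_json: bool = True
    f_dump_model_output: bool = True
    f_dump_content_list: bool = True
    f_draw_layout_bbox: bool = True
    f_draw_span_bbox: bool = True


def planned_outputs(name: str, flags: OutputFlags) -> List[str]:
    """List the files that would be written for a document called `name`."""
    table = [
        (flags.f_dump_md, f"{name}.md"),
        (flags.f_dump_middle_json, "middle.json"),
        (flags.f_dump_model_output, "model.json"),
        (flags.f_dump_content_list, "content_list.json"),
        (flags.f_draw_layout_bbox, "layout.pdf"),
        (flags.f_draw_span_bbox, "span.pdf"),
    ]
    return [filename for enabled, filename in table if enabled]
```

A caller that only needs Markdown and the structured list could disable the two visualization PDFs and the raw model dump, which in this sketch leaves `<name>.md`, `middle.json`, and `content_list.json`.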

Sources: mineru/cli/common.py:94-168, mineru/backend/pipeline/pipeline_middle_json_mkcontent.py, mineru/backend/vlm/vlm_middle_json_mkcontent.py, mineru/utils/draw_bbox.py

Configuration and Extension

Installation Variants

MinerU provides multiple installation options, with optional extras declared in pyproject.toml:46-103:

Package	Dependencies	Use Case
`mineru`	Core only	Minimal installation
`mineru[pipeline]`	Pipeline backend models	CPU-compatible parsing
`mineru[vlm]`	Transformers	GPU VLM parsing
`mineru[vllm]`	vLLM	Linux GPU optimization
`mineru[lmdeploy]`	LMDeploy	Multi-platform GPU
`mineru[mlx]`	MLX	macOS Apple Silicon
`mineru[all]`	All backends	Complete installation

Environment Variables

Key environment variables for configuration:

MINERU_MODEL_SOURCE - Model repository (huggingface/modelscope/local)
MINERU_DEVICE_MODE - Hardware device override
MINERU_VIRTUAL_VRAM_SIZE - GPU memory limit
MINERU_LOG_LEVEL - Logging verbosity
MINERU_VLM_FORMULA_ENABLE - VLM formula recognition toggle
MINERU_VLM_TABLE_ENABLE - VLM table recognition toggle
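
A minimal sketch of reading these variables into one settings dictionary is shown below. The variable names come from the list above; the `read_settings` helper, the default values, the accepted truthy strings, and the integer parsing of `MINERU_VIRTUAL_VRAM_SIZE` are assumptions made for illustration and may differ from how MinerU interprets them.

```python
import os
from typing import Mapping, Optional

VALID_MODEL_SOURCES = {"huggingface", "modelscope", "local"}
_TRUTHY = {"1", "true", "yes", "on"}  # assumed spelling of "enabled"


def _as_bool(value: Optional[str], default: bool) -> bool:
    if value is None:
        return default
    return value.strip().lower() in _TRUTHY


def read_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect MinerU-related environment variables (defaults are assumptions)."""
    source = env.get("MINERU_MODEL_SOURCE", "huggingface")
    if source not in VALID_MODEL_SOURCES:
        raise ValueError(f"unsupported MINERU_MODEL_SOURCE: {source!r}")
    vram = env.get("MINERU_VIRTUAL_VRAM_SIZE")
    return {
        "model_source": source,
        "device_mode": env.get("MINERU_DEVICE_MODE"),  # None: no override set
        "virtual_vram_size": int(vram) if vram is not None else None,
        "log_level": env.get("MINERU_LOG_LEVEL", "INFO").upper(),
        "vlm_formula_enable": _as_bool(env.get("MINERU_VLM_FORMULA_ENABLE"), True),
        "vlm_table_enable": _as_bool(env.get("MINERU_VLM_TABLE_ENABLE"), True),
    }
```

Calling `read_settings()` with no argument reads the live process environment, while passing a plain dictionary makes the parsing easy to test without touching `os.environ`.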

Sources: pyproject.toml:46-103, mineru/cli/client.py:169-181, docs/zh/usage/model_source.md:17-26

Summary

MinerU's architecture is built around:

Unified orchestration via do_parse/aio_do_parse in common.py:414-550
Three processing backends with different accuracy/compatibility trade-offs
Singleton model management for efficient resource utilization
Normalized middle.json format as the central data representation
Multiple user interfaces (CLI, API, Gradio, OpenAI-compatible)
Flexible hardware support across CUDA, NPU, MPS, and CPU platforms
Extensible output generation for diverse downstream use cases

The system's modular design allows users to select appropriate backends based on their accuracy requirements, hardware constraints, and language support needs, while maintaining consistent interfaces and output formats across all processing paths.

