linxule/mineru-mcp

mineru-mcp

MCP server for MinerU document parsing API — extract text, tables, and formulas from PDFs, DOCs, and images.

Features

  • VLM model — 90%+ accuracy for complex documents
  • Pipeline model — Fast processing for simple documents
  • Local file upload — Upload files from disk for batch parsing
  • Batch processing — Parse up to 200 documents at once
  • Download & rename — Extract markdown with original filenames
  • Page ranges — Extract specific pages only
  • 109 language OCR support
  • Optimized for Claude Code — 73% token reduction vs alternatives

Tools

| Tool | Description |
| --- | --- |
| mineru_parse | Parse a document URL |
| mineru_status | Check task progress, get download URL |
| mineru_batch | Parse multiple URLs (max 200) |
| mineru_batch_status | Get batch results with pagination |
| mineru_upload_batch | Upload local files for batch parsing |
| mineru_download_results | Download results as named markdown files |

Installation

Requires Node.js 18+ and a MinerU API key.

CLI Install (one-liner)

# Claude Code
claude mcp add mineru-mcp -e MINERU_API_KEY=your-api-key -- npx -y mineru-mcp

# Codex CLI (OpenAI)
codex mcp add mineru --env MINERU_API_KEY=your-api-key -- npx -y mineru-mcp

# Gemini CLI (Google)
gemini mcp add -e MINERU_API_KEY=your-api-key mineru npx -y mineru-mcp

Claude Desktop

Add to your claude_desktop_config.json:

| OS | Config path |
| --- | --- |
| macOS | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Windows | %APPDATA%\Claude\claude_desktop_config.json |
| Linux | ~/.config/Claude/claude_desktop_config.json |

{
  "mcpServers": {
    "mineru": {
      "command": "npx",
      "args": ["-y", "mineru-mcp"],
      "env": {
        "MINERU_API_KEY": "your-api-key"
      }
    }
  }
}

VS Code

Add to .vscode/mcp.json (workspace) or open Command Palette > MCP: Open User Configuration (global):

{
  "servers": {
    "mineru": {
      "command": "npx",
      "args": ["-y", "mineru-mcp"],
      "env": {
        "MINERU_API_KEY": "your-api-key"
      }
    }
  }
}

Note: VS Code uses "servers" as the top-level key, not "mcpServers". Other VS Code forks (Trae, Void, PearAI, etc.) typically use this same format.

Cursor

Add to ~/.cursor/mcp.json (global) or .cursor/mcp.json (project):

{
  "mcpServers": {
    "mineru": {
      "command": "npx",
      "args": ["-y", "mineru-mcp"],
      "env": {
        "MINERU_API_KEY": "your-api-key"
      }
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json (Windows: %USERPROFILE%\.codeium\windsurf\mcp_config.json):

{
  "mcpServers": {
    "mineru": {
      "command": "npx",
      "args": ["-y", "mineru-mcp"],
      "env": {
        "MINERU_API_KEY": "your-api-key"
      }
    }
  }
}

Cline

Open the MCP Servers icon in the Cline panel, go to Configure > Advanced MCP Settings, then add:

{
  "mcpServers": {
    "mineru": {
      "command": "npx",
      "args": ["-y", "mineru-mcp"],
      "env": {
        "MINERU_API_KEY": "your-api-key"
      }
    }
  }
}

Cherry Studio

In Settings > MCP Servers > Add Server, set Type to STDIO, Command to npx, Args to -y mineru-mcp, and add environment variable MINERU_API_KEY. Or paste in JSON/Code mode:

{
  "mineru": {
    "name": "MinerU",
    "command": "npx",
    "args": ["-y", "mineru-mcp"],
    "env": {
      "MINERU_API_KEY": "your-api-key"
    },
    "isActive": true
  }
}

Witsy

In Settings > MCP Servers, add a new server with Type: stdio, Command: npx, Args: -y mineru-mcp, and set environment variable MINERU_API_KEY to your API key.

Codex CLI (TOML config)

Alternatively, edit ~/.codex/config.toml directly:

[mcp_servers.mineru]
command = "npx"
args = ["-y", "mineru-mcp"]

[mcp_servers.mineru.env]
MINERU_API_KEY = "your-api-key"

Gemini CLI (JSON config)

Alternatively, edit ~/.gemini/settings.json directly:

{
  "mcpServers": {
    "mineru": {
      "command": "npx",
      "args": ["-y", "mineru-mcp"],
      "env": {
        "MINERU_API_KEY": "your-api-key"
      }
    }
  }
}

Windows

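On Windows, npx is a batch script (npx.cmd), so MCP clients generally need to launch it through cmd. In the JSON configs above, replace the "command" and "args" fields of the server entry with: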

{
  "command": "cmd",
  "args": ["/c", "npx", "-y", "mineru-mcp"],
  "env": {
    "MINERU_API_KEY": "your-api-key"
  }
}

For CLI tools on Windows:

claude mcp add mineru-mcp -e MINERU_API_KEY=your-api-key -- cmd /c npx -y mineru-mcp
codex mcp add mineru --env MINERU_API_KEY=your-api-key -- cmd /c npx -y mineru-mcp
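
If you generate client configs from a script, the platform switch described above can be sketched as follows. The helper name npxLaunchSpec is hypothetical and not part of mineru-mcp; it only reproduces the two command/args shapes shown in this README.

```javascript
// Hypothetical helper (not part of mineru-mcp): pick the command/args pair
// for launching the server, wrapping npx in cmd on Windows.
function npxLaunchSpec(platform) {
  const npxArgs = ["-y", "mineru-mcp"];
  if (platform === "win32") {
    // Windows: run npx through the cmd shell wrapper.
    return { command: "cmd", args: ["/c", "npx", ...npxArgs] };
  }
  // macOS and Linux: npx can be spawned directly.
  return { command: "npx", args: npxArgs };
}
```

In Node.js, calling npxLaunchSpec(process.platform) selects the right variant for the current machine.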

ChatGPT

ChatGPT only supports remote MCP servers over HTTPS, so local stdio servers like this one are not directly supported. To use it with ChatGPT, you would need to deploy the server behind a public URL with an HTTP transport.

Configuration

| Environment Variable | Default | Description |
| --- | --- | --- |
| MINERU_API_KEY | (required) | Your MinerU API Bearer token |
| MINERU_BASE_URL | https://mineru.net/api/v4 | API base URL |
| MINERU_DEFAULT_MODEL | pipeline | Default model: pipeline or vlm |

Get your API key at mineru.net
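
As a rough illustration of how these three variables and their defaults combine, here is a minimal sketch. The function resolveConfig and the shape of the object it returns are assumptions made for this example, not the server's actual code.

```javascript
// Hypothetical helper: turn an environment object (e.g. process.env) into a
// config object, applying the defaults listed in the Configuration section.
function resolveConfig(env) {
  const apiKey = env.MINERU_API_KEY;
  if (!apiKey) {
    throw new Error("MINERU_API_KEY is required");
  }
  const model = env.MINERU_DEFAULT_MODEL || "pipeline";
  if (model !== "pipeline" && model !== "vlm") {
    throw new Error(`MINERU_DEFAULT_MODEL must be "pipeline" or "vlm", got "${model}"`);
  }
  return {
    apiKey,
    baseUrl: env.MINERU_BASE_URL || "https://mineru.net/api/v4",
    model,
  };
}
```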

Usage

Parse a single URL

mineru_parse({
  url: "https://example.com/document.pdf",
  model: "vlm",        // optional: "pipeline" (default) or "vlm" (90%+ accuracy)
  pages: "1-10,15",    // optional: page ranges
  ocr: true,           // optional: enable OCR (pipeline only)
  formula: true,       // optional: formula recognition
  table: true,         // optional: table recognition
  language: "en",      // optional: language code
  formats: ["html"]    // optional: extra export formats
})
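
To make the pages syntax concrete, the sketch below expands a string such as "1-10,15" into individual page numbers. It assumes 1-based pages and inclusive ranges; the exact rules the MinerU API applies are not documented here, and expandPageRanges is a hypothetical helper.

```javascript
// Hypothetical helper: expand a page-range string into a list of page numbers.
// Assumes 1-based page numbers and inclusive "start-end" ranges.
function expandPageRanges(spec) {
  const pages = [];
  for (const part of spec.split(",")) {
    const match = /^\s*(\d+)(?:\s*-\s*(\d+))?\s*$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    for (let page = start; page <= end; page++) {
      pages.push(page);
    }
  }
  return pages;
}
```

Under these assumptions, "1-10,15" selects eleven pages: 1 through 10, plus page 15.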

Check task progress

mineru_status({
  task_id: "abc-123",
  format: "concise"    // optional: "concise" (default) or "detailed"
})

Concise output: done | abc-123 | https://cdn-mineru.../result.zip
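
If a script needs to consume the concise output, a line like the one above could be split as shown below. The three-field layout (state, task ID, result URL) is inferred from that single example, so other states may produce different fields; parseConciseStatus is a hypothetical helper.

```javascript
// Hypothetical helper: split a concise status line into its parts.
// The "state | task_id | url" layout is inferred from one example only.
function parseConciseStatus(line) {
  const [state, taskId, ...rest] = line.split("|").map((part) => part.trim());
  if (!state || !taskId) {
    throw new Error(`Unrecognized status line: "${line}"`);
  }
  // Rejoin the remainder in case the URL itself contains a "|" character.
  const resultUrl = rest.length > 0 ? rest.join("|") : null;
  return { state, taskId, resultUrl };
}
```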

Batch parse URLs

mineru_batch({
  urls: ["https://example.com/doc1.pdf", "https://example.com/doc2.pdf"],
  model: "vlm"
})

Check batch progress

mineru_batch_status({
  batch_id: "batch-123",
  limit: 10,           // optional: max results (default: 10)
  offset: 0,           // optional: skip first N results
  format: "concise"    // optional: "concise" or "detailed"
})
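
For batches larger than one page of results, the limit and offset parameters can be stepped as in this sketch. It assumes you already know how many files were submitted; batchStatusPages is a hypothetical helper that only builds the argument objects, it does not call the tool.

```javascript
// Hypothetical helper: build the mineru_batch_status arguments needed to page
// through every result of a batch, given the number of files submitted.
function batchStatusPages(batchId, totalFiles, limit = 10) {
  const calls = [];
  for (let offset = 0; offset < totalFiles; offset += limit) {
    calls.push({ batch_id: batchId, limit, offset, format: "concise" });
  }
  return calls;
}
```

For a 25-file batch with the default limit of 10, this yields three calls with offsets 0, 10, and 20.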

Upload local files

mineru_upload_batch({
  directory: "/path/to/pdfs",  // scan a directory for supported files
  // OR, instead of directory:
  files: ["/path/to/doc1.pdf", "/path/to/doc2.pdf"],  // explicit file list
  model: "vlm",        // optional
  formula: true,       // optional
  table: true,         // optional
  language: "en",      // optional
  formats: ["html"]    // optional
})

Returns batch_id for tracking. Each file's original name is preserved via data_id (spaces become underscores).

Download results as markdown

mineru_download_results({
  batch_id: "batch-123",       // from mineru_upload_batch or mineru_batch
  output_dir: "/path/to/output",
  overwrite: false             // optional: overwrite existing files
})

Output filenames are derived from data_id (e.g., my_paper_title.md). Spaces in original filenames become underscores.
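
The naming rule can be approximated as follows. This is a sketch under two assumptions that the README does not confirm: the original extension is dropped, and spaces are the only characters replaced. The helper name markdownNameFor is hypothetical.

```javascript
// Hypothetical helper: predict the markdown filename for an uploaded file.
// Assumes the extension is stripped and only spaces become underscores.
function markdownNameFor(originalFilename) {
  // Keep only the last path segment (handles both "/" and "\" separators).
  const base = originalFilename.split(/[\\/]/).pop();
  const dot = base.lastIndexOf(".");
  const stem = dot > 0 ? base.slice(0, dot) : base;
  return stem.replace(/ /g, "_") + ".md";
}
```

With these assumptions, a file named "my paper title.pdf" maps to my_paper_title.md, matching the example above.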

Typical local file workflow

mineru_upload_batch → mineru_batch_status (poll) → mineru_download_results
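
The same sequence, written as a polling loop, might look like the sketch below. Everything about the tools object is an assumption: upload, status, and download stand in for however your MCP client invokes mineru_upload_batch, mineru_batch_status, and mineru_download_results, and the pending count is a hypothetical summary of the status result.

```javascript
// Hypothetical orchestration sketch: upload, poll until nothing is pending,
// then download. `tools` is a caller-supplied object wrapping the MCP tools;
// its method names and return shapes are assumptions for illustration.
async function runLocalWorkflow(tools, options) {
  const { directory, outputDir, intervalMs = 5000, maxPolls = 120 } = options;
  const { batch_id } = await tools.upload({ directory });
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const { pending } = await tools.status({ batch_id });
    if (pending === 0) {
      return tools.download({ batch_id, output_dir: outputDir });
    }
    // Wait before the next poll so the API is not queried in a tight loop.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Batch ${batch_id} did not finish after ${maxPolls} polls`);
}
```

Bounding the loop with maxPolls keeps a stuck batch from polling forever.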

Supported Formats

  • PDF, DOC, DOCX, PPT, PPTX
  • PNG, JPG, JPEG

Limits

  • Single file: 200MB max, 600 pages max
  • Daily quota: 2000 pages at high priority
  • Batch: max 200 files per request
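
To stay under the per-request batch limit, a longer list can be split before calling mineru_batch, as in this sketch. The helper chunkForBatches is hypothetical; it only groups the inputs and does not check the size, page, or daily quota limits.

```javascript
// Documented limit: at most 200 files per batch request.
const MAX_BATCH_SIZE = 200;

// Hypothetical helper: split a list of URLs or file paths into groups that
// each fit in one batch request.
function chunkForBatches(items, size = MAX_BATCH_SIZE) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```

A list of 450 URLs, for example, becomes three requests of 200, 200, and 50 items.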

License

MIT
