PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Updated Mar 2, 2026 - Python
Open source Python library for converting PDF to DOCX.
eBook and PDF translation: a multilingual eBook processing tool supporting all eBook formats. Features online and offline translation while preserving original layouts. Compatible with both scanned and digital PDFs. Elegant user interface. The world's highest-performing open-source layout-preserving eBook translator.
A CLI toolset to generate table of contents for PDF files automatically.
Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, and SVG.
A collection of PDF parsing libraries, including AI-based docling, Claude, OpenAI, Gemini, Meta's Llama Vision, and unstructured-io, as well as pdfminer, PyMuPDF, and pdfplumber, for efficient snapshot, text, table, and metadata extraction.
Markdown to PDF renderer
Smart PDF to Markdown converter with intelligent heading detection, automatic header/footer removal, orphan fragment merging, and image export. Features a user-friendly GUI with preview mode, persistent settings, and per-page error recovery. Optimized for Obsidian and other Markdown-based note-taking workflows.
A pure-Python PDF viewer that provides the same functionality as other well-known PDF viewers.
A simple implementation of a PDF-to-audio converter.
A powerful PDF processing engine that deconstructs documents into their core elements—text, images, and tables—and seamlessly reconstructs them into pristine, structured Markdown. Built with a React frontend and a robust Python (PyMuPDF) backend on Appwrite.
Multimodal RAG with PyMuPDF
pdfgui_tools is a user interface tool developed in Qt and Python that integrates with poppler-utils and PyPDF2 for PDF document management. It's a simple and user-friendly tool that includes various utilities.
A comprehensive web application that detects AI-generated content in PDF documents and transforms AI text into natural human-like writing. Built with Streamlit, spaCy, and Hugging Face transformers.
AI-Powered Thesis Review Tool
Advanced document analysis platform that extracts text from PDF, DOCX, and TXT files with AI-powered topic classification using Sentence Transformers. Features keyword matching, real-time analysis, interactive Streamlit web interface, and multi-topic support.