A beautiful web application built with FastAPI and HTML that converts PDF files to Markdown format using various conversion libraries.
- 🚀 Multiple Conversion Libraries: Support for `pymupdf4llm`, `markitdown`, `marker`, and `docling`
- 📱 Modern UI: Beautiful, responsive design with drag-and-drop file upload
- ⚡ Fast Processing: Efficient backend processing with real-time feedback
- 🔄 Library Status: Dynamic checking of available libraries
- 📄 Preview Results: View converted Markdown content directly in the browser
- pymupdf4llm (`>=0.0.26`) - Fast PDF text extraction with PyMuPDF
- markitdown (`>=0.1.2`) - Microsoft's document conversion tool
- marker (`>=1.8.1`) - Advanced PDF-to-Markdown conversion with ML
- docling (`>=2.41.0`) - IBM's document intelligence platform
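To confirm that installed versions meet these minimums, a small standard-library script can help. This is a sketch: the distribution names (notably `marker-pdf`) are taken from the `uv add` commands in the troubleshooting notes, the file name `check_versions.py` is hypothetical, and the version comparison is deliberately simplified (pre- and post-release suffixes are ignored, so use `packaging.version` if you need strict PEP 440 ordering).

```python
"""Check installed converter libraries against the minimum versions listed in this README."""
import re
from importlib.metadata import PackageNotFoundError, version

# Distribution names (as used with `uv add`) mapped to the README's minimum versions.
MINIMUM_VERSIONS = {
    "pymupdf4llm": "0.0.26",
    "markitdown": "0.1.2",
    "marker-pdf": "1.8.1",
    "docling": "2.41.0",
}


def parse_release(text):
    """Return the leading numeric release, e.g. '2.41.0.post1' -> (2, 41, 0).

    Simplification: pre-/post-release suffixes are dropped, not compared.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def meets_minimum(installed, minimum):
    """Compare release tuples after zero-padding, so '2.41' counts as equal to '2.41.0'."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_versions():
    """Map each distribution name to a short status string."""
    report = {}
    for dist, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            report[dist] = "not installed"
            continue
        status = "ok" if meets_minimum(installed, minimum) else f"needs >={minimum}"
        report[dist] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    for dist, status in check_versions().items():
        print(f"{dist:<12} {status}")
```

Run it inside the project environment, for example with `uv run python check_versions.py`.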
- Python 3.8 or higher
- uv (fast Python package installer)
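- Clone or download the project files.
- Install dependencies:
  ```bash
  uv sync
  ```
  Or, to install from `requirements.txt` into an existing virtual environment (create one first with `uv venv` if you don't have one):
  ```bash
  uv pip install -r requirements.txt
  ```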
Note: Some libraries may require additional system dependencies:
- For marker: May require additional ML dependencies
- For docling: May require specific Python versions and dependencies
- For pymupdf4llm: Should work out of the box
- For markitdown: May require additional dependencies for certain file types
- Verify the installation (optional):
  ```bash
  uv run python -c "import fastapi; print('FastAPI installed successfully')"
  ```
- Start the server:
  ```bash
  uv run python main.py
  ```
  Or alternatively:
  ```bash
  uv run uvicorn main:app --host 0.0.0.0 --port 8000 --reload
  ```
- Open your browser and navigate to http://localhost:8000.
- Use the application:
  - Select a conversion library from the dropdown.
  - Upload a PDF file (drag and drop, or click to select).
  - Click "Convert to Markdown".
  - View the converted result.
This project includes a Dockerfile and docker-compose.yml for easy containerization.
- Build and run the container:
  ```bash
  docker-compose up --build
  ```
- Open your browser and navigate to http://localhost:8000.

This starts the application inside a Docker container, accessible on port 8000.
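Whether the server runs via `uv run` or Docker, scripts that call it may need to wait until it is ready. The app exposes a `GET /health` endpoint, so a readiness check can simply poll it. This sketch assumes `/health` answers with HTTP 200 once the app is up (the README lists the endpoint but not its response); `wait_for_health` is an illustrative name, not part of the project.

```python
"""Poll the app's /health endpoint until it responds (readiness-check sketch)."""
import http.client
import time
import urllib.request


def wait_for_health(url="http://localhost:8000/health", timeout=30.0, interval=0.5):
    """Return True once `url` answers with HTTP 200, or False after `timeout` seconds.

    Assumption: a healthy server returns 200 on /health.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=2.0) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError/HTTPError are OSError subclasses: refused, reset, timed out, or non-2xx.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, after `docker-compose up --build -d`, call `wait_for_health(timeout=120)` before sending conversion requests, since the first startup can be slow.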
The API exposes the following endpoints:
- `GET /` - Main web interface
- `GET /health` - Health check endpoint
- `GET /libraries` - Get available conversion libraries
- `POST /convert` - Convert PDF to Markdown
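For scripted use, these endpoints can also be called from Python using only the standard library. This is a sketch under stated assumptions: the upload field name `file` is a guess (this README doesn't state it), responses are returned as raw text because their format isn't documented here, the 300-second timeout is an arbitrary allowance for slow libraries, and `BASE_URL`, `build_multipart`, `get_libraries`, and `convert_pdf` are hypothetical names, not part of the project.

```python
"""Minimal stdlib-only client sketch for the /libraries and /convert endpoints."""
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:8000"  # assumption: default local address


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode form fields plus one PDF as multipart/form-data; return (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        ).encode()
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def get_libraries(base_url=BASE_URL):
    """GET /libraries and return the raw response text."""
    with urllib.request.urlopen(f"{base_url}/libraries") as response:
        return response.read().decode("utf-8")


def convert_pdf(pdf_path, library="pymupdf4llm", base_url=BASE_URL, file_field="file", timeout=300):
    """POST a PDF to /convert; `file_field="file"` is an assumed form field name."""
    path = Path(pdf_path)
    body, content_type = build_multipart({"library": library}, file_field, path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/convert", data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(get_libraries())` lists what is installed, and `convert_pdf("your-document.pdf", library="marker")` sends a file (the path is a placeholder). If the real form field differs, pass it via `file_field`.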
Example requests with curl:

```bash
# Check available libraries
curl http://localhost:8000/libraries

# Convert a PDF file (the upload field name "file" is assumed; check main.py)
curl -X POST \
  -F "file=@your-document.pdf" \
  -F "library=pymupdf4llm" \
  http://localhost:8000/convert
```

If you encounter issues with specific libraries:
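- pymupdf4llm: Usually installs without issues.
  ```bash
  uv add pymupdf4llm
  ```
- markitdown: May need Microsoft Build Tools on Windows. PDF support is an optional extra in markitdown 0.1.x, so install it with the `pdf` extra:
  ```bash
  uv add "markitdown[pdf]"
  ```
- marker: Requires additional ML dependencies.
  ```bash
  uv add marker-pdf
  ```
- docling: May have specific version requirements.
  ```bash
  uv add docling
  ```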
- "Library not available": The library failed to import. Check the installation.
- "Conversion failed": The selected library couldn't process your PDF. Try a different library.
- Large file timeout: Some libraries, marker and docling in particular, can take much longer on large PDFs. Allow more time or try a faster library.
- pymupdf4llm: Fastest for simple text extraction
- markitdown: Good balance of speed and quality
- marker: Best quality but slower, especially on first run
- docling: Advanced features but may be slower
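Since a failed conversion often just means trying a different library, a client can automate the retry. This is a hypothetical sketch: `convert` is any function you supply that takes a library name and returns Markdown (for instance, one that POSTs to `/convert`), and the default order simply follows the list above; reorder it if you prefer quality over speed.

```python
"""Client-side fallback sketch: try conversion libraries in order until one succeeds."""
from typing import Callable, Iterable, List, Tuple

# Hypothetical default order, taken from the performance notes above.
DEFAULT_ORDER = ("pymupdf4llm", "markitdown", "marker", "docling")


def convert_with_fallback(
    convert: Callable[[str], str],
    libraries: Iterable[str] = DEFAULT_ORDER,
) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Call convert(library) for each library in turn.

    Returns (library_used, markdown, failures), where failures lists the
    (library, error message) pairs that were skipped along the way.
    """
    failures: List[Tuple[str, str]] = []
    for library in libraries:
        try:
            return library, convert(library), failures
        except Exception as exc:  # any failure: record it and move on to the next library
            failures.append((library, str(exc)))
    raise RuntimeError(f"All libraries failed: {failures}")
```

Returning the recorded failures alongside the result makes it easy to log why the faster libraries were skipped for a given PDF.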
```text
pdftomd-ui/
├── main.py              # FastAPI application
├── requirements.txt     # Python dependencies
├── templates/
│   └── index.html       # Web interface
├── uploads/             # Temporary upload directory (auto-created)
└── README.md            # This file
```
To add support for additional conversion libraries:
- Add the library to `requirements.txt`.
- Import it in `main.py` inside a `try`/`except` block.
- Add it to the `get_available_libraries()` function.
- Create a conversion function following the existing pattern.
- Add it to the conversion logic in the `/convert` endpoint.
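The steps above amount to an optional-import plus registry pattern. The sketch below shows one way to lay it out; it is not the project's actual `main.py`, and `CONVERTERS`, `convert`, and the `convert_with_*` helpers are hypothetical names. The library calls (`pymupdf4llm.to_markdown`, `MarkItDown().convert(...).text_content`, and docling's `DocumentConverter().convert(...).document.export_to_markdown()`) are those libraries' documented entry points. marker is omitted for brevity but would be registered the same way.

```python
"""Sketch of the try/except import + registry pattern (hypothetical layout, not main.py)."""
from typing import Dict

# Optional imports: a missing library becomes None instead of crashing the app.
try:
    import pymupdf4llm
except ImportError:
    pymupdf4llm = None

try:
    from markitdown import MarkItDown
except ImportError:
    MarkItDown = None

try:
    from docling.document_converter import DocumentConverter
except ImportError:
    DocumentConverter = None


def convert_with_pymupdf4llm(pdf_path: str) -> str:
    return pymupdf4llm.to_markdown(pdf_path)


def convert_with_markitdown(pdf_path: str) -> str:
    return MarkItDown().convert(pdf_path).text_content


def convert_with_docling(pdf_path: str) -> str:
    return DocumentConverter().convert(pdf_path).document.export_to_markdown()


# Hypothetical registry: name -> (imported module/class or None, conversion function).
CONVERTERS: Dict[str, tuple] = {
    "pymupdf4llm": (pymupdf4llm, convert_with_pymupdf4llm),
    "markitdown": (MarkItDown, convert_with_markitdown),
    "docling": (DocumentConverter, convert_with_docling),
}


def get_available_libraries() -> Dict[str, bool]:
    """Report which registered libraries imported successfully."""
    return {name: dependency is not None for name, (dependency, _) in CONVERTERS.items()}


def convert(library: str, pdf_path: str) -> str:
    """Dispatch to the registered converter, as a /convert handler might."""
    if library not in CONVERTERS:
        raise ValueError(f"Unknown library: {library}")
    dependency, func = CONVERTERS[library]
    if dependency is None:
        raise RuntimeError(f"Library not available: {library}")
    return func(pdf_path)
```

With this layout, supporting a new library means one more guarded import, one conversion function, and one entry in `CONVERTERS`; `get_available_libraries()` and the "Library not available" error then cover it automatically.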
This project is provided as-is for educational and development purposes.
Feel free to submit issues and enhancement requests!