A Python package for converting PDFs to markdown while extracting images and tables, and for generating text descriptions of the extracted tables and images using several LLM clients, along with other utilities. Markdrop is available on PyPI.
- PDF to Markdown conversion with formatting preservation using Docling
- Automatic image extraction with quality preservation using XRef Id
- Table detection using Microsoft's Table Transformer
- PDF URL support for core functionalities
- AI-powered image and table descriptions using multiple LLM providers
- Interactive HTML output with downloadable Excel tables
- Customizable image resolution and UI elements
- Comprehensive logging system
- Support for other file types
- Streamlit/web interface
pip install markdrop
If you are working from a local clone of the repository (for example, to run the CLI from source), install the package in editable mode from the repository root:
python -m pip install -e .
Python Package Index (PyPI) Page: https://pypi.org/project/markdrop
After installing the package, you can use the markdrop command-line interface.
1. Convert PDF to Markdown and HTML:
markdrop convert <input_path> --output_dir <output_directory> [--add_tables]
- <input_path>: Path or URL to the input PDF file.
- <output_directory>: Directory to save output files (default: output).
- --add_tables: (Optional) Add downloadable tables to the HTML output.
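To drive the convert command from a Python script (for example, as a build step), the arguments above can be assembled into a list and passed to subprocess. This is a minimal sketch: build_convert_command and run_convert are hypothetical helper names, and it assumes the markdrop executable is on your PATH.

```python
import subprocess
from typing import List


def build_convert_command(input_path: str, output_dir: str = "output",
                          add_tables: bool = False) -> List[str]:
    """Assemble the argument list for `markdrop convert` (hypothetical helper)."""
    cmd = ["markdrop", "convert", input_path, "--output_dir", output_dir]
    if add_tables:
        # Optional flag: add downloadable tables to the HTML output
        cmd.append("--add_tables")
    return cmd


def run_convert(input_path: str, output_dir: str = "output",
                add_tables: bool = False) -> None:
    """Run the CLI; raises CalledProcessError if markdrop exits with a non-zero status."""
    # Passing a list (no shell=True) avoids quoting problems with spaces in paths
    subprocess.run(build_convert_command(input_path, output_dir, add_tables), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without markdrop installed
    print(build_convert_command("my_document.pdf", "processed_docs", add_tables=True))
```

Calling run_convert("my_document.pdf", "processed_docs", add_tables=True) would then be equivalent to the shell example that follows.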
Example:
markdrop convert my_document.pdf --output_dir processed_docs --add_tables
2. Generate Descriptions for Images and Tables in a Markdown File:
markdrop describe <input_path> --output_dir <output_directory> --ai_provider <provider> [--remove_images] [--remove_tables]
- <input_path>: Path to the markdown file.
- <output_directory>: Directory to save the processed file (default: output).
- <provider>: AI provider to use (gemini or openai).
- --remove_images: (Optional) Remove images from the markdown file.
- --remove_tables: (Optional) Remove tables from the markdown file.
Example:
markdrop describe my_markdown.md --output_dir described_content --ai_provider gemini --remove_images
3. Analyze Images in a PDF File:
markdrop analyze <input_path> --output_dir <output_directory> [--save_images]
- <input_path>: Path or URL to the PDF file.
- <output_directory>: Directory to save analysis results (default: output/analysis).
- --save_images: (Optional) Save extracted images.
Example:
markdrop analyze report.pdf --output_dir pdf_analysis --save_images
4. Set Up API Keys for AI Providers:
markdrop setup <provider>
- <provider>: The AI provider to set up (gemini or openai).
Example:
markdrop setup gemini
5. Generate Descriptions for Images (Standalone):
markdrop generate <input_path> --output_dir <output_directory> [--prompt <prompt_text>] [--llm_client <client1> <client2> ...]
- <input_path>: Path to an image file or a directory of images.
- <output_directory>: Directory to save the descriptions CSV (default: output/descriptions).
- --prompt: (Optional) Prompt for the AI model (default: "Describe the image in detail.").
- --llm_client: (Optional) List of LLM clients to use (default: gemini). Available: qwen, gemini, openai, llama-vision, molmo, pixtral.
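The standalone generate command takes either a single image or a directory of images, plus one or more client names. Before starting a long run, it can help to check both up front. The sketch below does that in plain Python; the helper names, the set of image extensions, and the non-recursive directory scan are assumptions for illustration, not markdrop's own behavior.

```python
from pathlib import Path
from typing import Iterable, List

# Client names listed for --llm_client above
AVAILABLE_CLIENTS = {"qwen", "gemini", "openai", "llama-vision", "molmo", "pixtral"}

# Assumed image extensions; markdrop's own file filter may differ
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".webp"}


def validate_clients(clients: Iterable[str]) -> List[str]:
    """Return the client names as a list, or raise if any name is not recognized."""
    names = list(clients)
    unknown = [c for c in names if c not in AVAILABLE_CLIENTS]
    if unknown:
        raise ValueError(f"Unknown LLM client(s): {', '.join(unknown)}")
    return names


def collect_images(input_path: str) -> List[Path]:
    """Resolve a single image file or a directory (non-recursive) into a sorted list of image paths."""
    path = Path(input_path)
    if path.is_file():
        return [path] if path.suffix.lower() in IMAGE_EXTENSIONS else []
    if path.is_dir():
        return sorted(p for p in path.iterdir()
                      if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS)
    raise FileNotFoundError(f"No such file or directory: {input_path}")


if __name__ == "__main__":
    print(validate_clients(["gemini", "openai"]))
```

Extensions are compared case-insensitively, so a file named photo.JPG is picked up as well.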
Example:
markdrop generate my_images/ --output_dir image_descriptions --prompt "What is in this picture?" --llm_client gemini openai
Using the Python API to convert a PDF to Markdown and HTML:
from markdrop import markdrop, MarkDropConfig, add_downloadable_tables
from pathlib import Path
import logging
# Configure processing options
config = MarkDropConfig(
    image_resolution_scale=2.0,        # Scale factor for image resolution
    download_button_color='#444444',   # Color for download buttons in HTML
    log_level=logging.INFO,           # Logging detail level
    log_dir='logs',                   # Directory for log files
    excel_dir='markdropped-excel-tables'  # Directory for Excel table exports
)
# Process PDF document
input_doc_path = "path/to/input.pdf"
output_dir = Path('output_directory')
# Convert PDF and generate HTML with images and tables
html_path = markdrop(input_doc_path, str(output_dir), config)
# Add interactive table download functionality
downloadable_html = add_downloadable_tables(html_path, config)
Using the Python API to add AI-generated descriptions to a markdown file:
from markdrop import setup_keys, process_markdown, ProcessorConfig, AIProvider
from pathlib import Path
# Set up API keys for AI providers
setup_keys(key='gemini')  # or setup_keys(key='openai')
# Configure AI processing options
config = ProcessorConfig(
    input_path="path/to/markdown/file.md",    # Input markdown file path
    output_dir=Path("output_directory"),      # Output directory
    ai_provider=AIProvider.GEMINI,            # AI provider (GEMINI or OPENAI)
    remove_images=False,                      # Keep or remove original images
    remove_tables=False,                      # Keep or remove original tables
    table_descriptions=True,                  # Generate table descriptions
    image_descriptions=True,                  # Generate image descriptions
    max_retries=3,                           # Number of API call retries
    retry_delay=2,                           # Delay between retries in seconds
    gemini_model_name="gemini-2.5-flash",    # Gemini model for images
    gemini_text_model_name="gemini-2.5-flash",     # Gemini model for text
    image_prompt="Describe this image in detail.",  # Custom prompt for image analysis (example wording)
    table_prompt="Describe this table in detail."   # Custom prompt for table analysis (example wording)
)
# Process markdown with AI descriptions
output_path = process_markdown(config)
Using the Python API to generate descriptions for standalone images:
from markdrop import generate_descriptions
prompt = "Give textual highly detailed descriptions from this image ONLY, nothing else."
input_path = 'path/to/img_file/or/dir'
output_dir = 'data/output'
llm_clients = ['gemini', 'llama-vision']  # Available: ['qwen', 'gemini', 'openai', 'llama-vision', 'molmo', 'pixtral']
generate_descriptions(
    input_path=input_path,
    output_dir=output_dir,
    prompt=prompt,
    llm_client=llm_clients
)
markdrop(input_doc_path, output_dir, config): Converts a PDF to Markdown and HTML, extracting its images and tables.
Parameters:
- input_doc_path (str): Path to input PDF file
- output_dir (str): Output directory path
- config (MarkDropConfig, optional): Configuration options for processing
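To convert a whole folder of PDFs with markdrop(), one option is to loop over the files and give each one its own output directory. A minimal sketch, assuming a flat folder of files with a lowercase .pdf extension (Path.glob matching is case-sensitive on most platforms) and that omitting config makes markdrop() use its defaults, as the "optional" label suggests. convert_folder is a hypothetical helper; its convert parameter exists only so the loop can be exercised without markdrop installed.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def convert_folder(pdf_dir: str, output_root: str,
                   convert: Optional[Callable[..., Any]] = None,
                   config: Any = None) -> Dict[str, Any]:
    """Convert every *.pdf in pdf_dir into output_root/<file stem>/ (hypothetical helper)."""
    if convert is None:
        # Lazy import: only needed when using the real markdrop() function
        from markdrop import markdrop as convert
    results: Dict[str, Any] = {}
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        out_dir = Path(output_root) / pdf.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        # Documented argument order: input_doc_path, output_dir, config (optional)
        if config is None:
            results[pdf.name] = convert(str(pdf), str(out_dir))
        else:
            results[pdf.name] = convert(str(pdf), str(out_dir), config)
    return results
```

With a MarkDropConfig built as in the earlier example, convert_folder("reports", "output", config=config) would write each report's results to output/<report name>/ and return whatever markdrop() returned for each file, keyed by file name.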
add_downloadable_tables(html_path, config): Adds interactive table download functionality to the HTML output.
Parameters:
- html_path (Path): Path to HTML file
- config (MarkDropConfig, optional): Configuration options
MarkDropConfig (configuration for PDF processing):
- image_resolution_scale (float): Scale factor for image resolution (default: 2.0)
- download_button_color (str): HTML color code for download buttons (default: '#444444')
- log_level (int): Logging level (default: logging.INFO)
- log_dir (str): Directory for log files (default: 'logs')
- excel_dir (str): Directory for Excel table exports (default: 'markdropped-excel-tables')
ProcessorConfig (configuration for AI processing):
- input_path (str): Path to markdown file
- output_dir (str): Output directory path
- ai_provider (AIProvider): AI provider selection (GEMINI or OPENAI)
- remove_images (bool): Whether to remove original images
- remove_tables (bool): Whether to remove original tables
- table_descriptions (bool): Generate table descriptions
- image_descriptions (bool): Generate image descriptions
- max_retries (int): Maximum API call retries
- retry_delay (int): Delay between retries in seconds
- gemini_model_name (str): Gemini model for image processing
- gemini_text_model_name (str): Gemini model for text processing
- image_prompt (str): Custom prompt for image analysis
- table_prompt (str): Custom prompt for table analysis
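The max_retries and retry_delay fields suggest a fixed-delay retry loop around each API call. The sketch below shows one common way such a loop works; it is not markdrop's implementation, and treating max_retries as retries after the first attempt (so up to max_retries + 1 calls in total) is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_retries: int = 3,
                      retry_delay: float = 2) -> T:
    """Call fn, retrying up to max_retries times and sleeping retry_delay seconds between attempts."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                raise  # out of retries: re-raise the last error
            attempt += 1
            time.sleep(retry_delay)
```

For simplicity this retries on any exception; a production version would usually retry only on transient failures such as timeouts or rate-limit responses and re-raise everything else immediately.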
We welcome contributions! Please see our Contributing Guidelines for details.
- Clone the repository:
git clone https://github.com/shoryasethia/markdrop.git
cd markdrop
- Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install development dependencies:
pip install -r requirements.txt
Project structure:
markdrop/
├── LICENSE  
├── README.md  
├── CONTRIBUTING.md  
├── CHANGELOG.md  
├── requirements.txt  
├── setup.py  
└── markdrop/ 
    ├── __init__.py 
    ├── src
    │   └── markdrop-logo.png
    ├── main.py
    ├── process.py
    ├── api_setup.py
    ├── parse.py
    ├── utils.py  
    ├── helper.py
    ├── ignore_warnings.py
    ├── run.py
    └── models/
        ├── __init__.py
        ├── .env
        ├── img_descriptions.py
        ├── logger.py
        ├── model_loader.py
        ├── responder.py
        └── setup_keys.py
This project is licensed under the MIT License - see the LICENSE file for details.
See CHANGELOG.md for version history.
Please note that this project follows our Code of Conduct.