A production-ready web server built with FastAPI that accepts documents in a wide range of formats and converts them to Markdown using the MarkItDown library.
💡 Quick Answer: Is there a concurrency limit?
By default, the server runs with 1 worker and no rate limiting. You can configure workers and rate limits using environment variables.
See docs/CONCURRENCY_SUMMARY.md for a quick guide or docs/CONCURRENCY.md for detailed information.
- Multiple Format Support: Convert DOC, DOCX, PPT, PPTX, PDF, XLS, XLSX, ODT, ODS, ODP, and TXT files to Markdown
- FastAPI Framework: Modern, fast, and well-documented REST API
- Health Checks: Built-in health monitoring endpoints
- Input Validation: Comprehensive file size, type, and content validation
- Error Handling: Robust error handling with detailed error messages
- CORS Support: Configurable CORS for web client integration
- Security Headers: Built-in security headers middleware
- Docker Support: Containerized deployment ready
- Azure Compatible: Ready for Azure Container Apps deployment
- AI Chat Web App Sample: Full-featured .NET Aspire application with document upload and RAG chat capabilities (see samples/AiChatWebApp)
# Build the Docker image
docker build -t markitdownserver .
# Run the container
docker run -d --name markitdownserver -p 8490:8490 markitdownserver
# Test the health endpoint
curl http://localhost:8490/health

# Install dependencies
pip install -r requirements.txt

# Run the server
python app.py

The server will be available at http://localhost:8490
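Once the server is up, the health endpoint can also be checked from Python. In the sketch below, `is_healthy` and `check_server` are illustrative names, not part of the project; `check_server` needs a running server, so it is only defined here, not called:

```python
import json
from urllib.request import urlopen

HEALTH_URL = "http://localhost:8490/health"

def is_healthy(payload: dict) -> bool:
    # The /health endpoint (documented below) returns {"status": "healthy", ...}.
    return payload.get("status") == "healthy"

def check_server(url: str = HEALTH_URL) -> bool:
    # Requires a running server; raises URLError if it is unreachable.
    with urlopen(url, timeout=5) as resp:
        return is_healthy(json.load(resp))
```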
- Python 3.12+ (for local development)
- Docker (for containerized deployment)
- .NET 9.0 SDK (for running C# client examples)
- Clone the repository:
  git clone https://github.com/elbruno/MarkItDownServer.git
  cd MarkItDownServer
- Create a virtual environment (recommended):
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Run the server:
  python app.py
  The server will start on http://0.0.0.0:8490
- Build the Docker image:
  docker build -t markitdownserver:latest .
- Run the container:
  docker run -d \
    --name markitdownserver \
    -p 8490:8490 \
    markitdownserver:latest
- Verify the container is running:
  docker ps | grep markitdownserver
- View logs:
  docker logs markitdownserver
- Stop the container:
  docker stop markitdownserver
  docker rm markitdownserver
GET /
Returns service information and available endpoints.
Response:
{
  "service": "MarkItDown Server",
  "description": "API for converting documents to Markdown",
  "version": "1.0.0",
  "endpoints": {
    "health": "/health",
    "docs": "/docs",
    "process": "/process_file"
  }
}

GET /health
Returns the health status of the service.
Response:
{
  "status": "healthy",
  "timestamp": "2025-01-07T12:00:00",
  "service": "MarkItDown Server",
  "version": "1.0.0"
}

POST /process_file
Upload a document file and receive its content in Markdown format.
Parameters:
file: The document file to convert (multipart/form-data)
Supported File Types:
- Microsoft Office: DOC, DOCX, XLS, XLSX, PPT, PPTX
- PDF: PDF
- OpenDocument: ODT, ODS, ODP
- Text: TXT
File Size Limit: 50MB
Response:
{
  "markdown": "# Document Title\n\nContent in markdown format..."
}

Error Responses:
- 400 Bad Request: Invalid file type or empty file
- 413 Payload Too Large: File exceeds 50MB
- 500 Internal Server Error: Conversion error
Located in samples/SimpleConsole/, this is a basic example showing minimal code to use the API.
cd samples/SimpleConsole
dotnet run

Code:
using System.Net.Http.Headers;
HttpClient client = new HttpClient();
string url = "http://127.0.0.1:8490/process_file";
string filePath = "Benefit_Options.pdf";
using (var content = new MultipartFormDataContent())
{
    byte[] fileBytes = File.ReadAllBytes(filePath);
    var fileContent = new ByteArrayContent(fileBytes);
    fileContent.Headers.ContentType = MediaTypeHeaderValue.Parse("application/pdf");
    content.Add(fileContent, "file", Path.GetFileName(filePath));

    var response = await client.PostAsync(url, content);
    if (response.IsSuccessStatusCode)
    {
        string responseBody = await response.Content.ReadAsStringAsync();
        Console.WriteLine($"MarkDown for {filePath}\n\n{responseBody}");
    }
}
}

Located in samples/DetailedConsole/, this includes comprehensive error handling, configuration, and additional features.
cd samples/DetailedConsole
dotnet run

Features:
- Configuration file support (appsettings.json)
- Comprehensive error handling
- Timeout configuration
- File validation
- Colored console output
- Automatic markdown file saving
- Content type detection
Configuration (appsettings.json):
{
  "MarkItDownServer": {
    "Url": "http://127.0.0.1:8490/process_file",
    "FilePath": "Benefit_Options.pdf",
    "TimeoutMinutes": "5"
  }
}

Located in samples/AiChatWebApp/, this is a complete .NET Aspire application with:
- Blazor Server UI with modern chat interface
- Document upload with drag-and-drop support
- Integration with GitHub Models for AI chat
- Retrieval-Augmented Generation (RAG) with vector search
- Real-time document processing and ingestion
cd samples/AiChatWebApp
# See QUICKSTART.md for detailed setup instructions
dotnet run --project AiChatWebApp.AppHost

Features:
- Upload documents (PDF, Word, PowerPoint, Excel, Text) through the web UI
- Documents are automatically converted to Markdown via MarkItDown
- Chat with your documents using AI
- Semantic search with citations
- .NET Aspire orchestration with health monitoring
For complete documentation, see samples/AiChatWebApp/README.md or QUICKSTART.md.
curl -X POST "http://localhost:8490/process_file" \
  -F "file=@document.pdf"

import requests
url = "http://localhost:8490/process_file"
files = {"file": open("document.pdf", "rb")}
response = requests.post(url, files=files)
if response.status_code == 200:
    markdown = response.json()["markdown"]
    print(markdown)
else:
    print(f"Error: {response.status_code}")
    print(response.json())

$url = "http://localhost:8490/process_file"
$filePath = "document.pdf"
# Invoke-RestMethod -Form (PowerShell 6.1+) builds the multipart body itself;
# hand-building it from UTF-8-decoded bytes corrupts binary formats such as PDF.
$form = @{ file = Get-Item -Path $filePath }
$response = Invoke-RestMethod -Uri $url -Method Post -Form $form
$response.markdown

The server can be configured using environment variables:
- PORT: Server port (default: 8490)
- HOST: Server host (default: 0.0.0.0)
- MAX_FILE_SIZE: Maximum file size in bytes (default: 52428800 = 50MB)
- LOG_LEVEL: Logging level (default: INFO)
- WORKERS: Number of worker processes (default: 1)
- ENABLE_RATE_LIMIT: Enable rate limiting (default: false)
- RATE_LIMIT: Rate limit (default: 60/minute)
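A sketch of how these variables might be read at startup; the names and defaults follow the list above, not necessarily the actual app.py code:

```python
import os

# Hypothetical startup configuration loader; defaults match the documented values.
def load_config(env=os.environ):
    return {
        "port": int(env.get("PORT", "8490")),
        "host": env.get("HOST", "0.0.0.0"),
        "max_file_size": int(env.get("MAX_FILE_SIZE", str(50 * 1024 * 1024))),
        "log_level": env.get("LOG_LEVEL", "INFO"),
        "workers": int(env.get("WORKERS", "1")),
        "enable_rate_limit": env.get("ENABLE_RATE_LIMIT", "false").lower() == "true",
        "rate_limit": env.get("RATE_LIMIT", "60/minute"),
    }
```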
docker run -d \
--name markitdownserver \
-p 8490:8490 \
-e PORT=8490 \
-e MAX_FILE_SIZE=104857600 \
-e WORKERS=4 \
-e ENABLE_RATE_LIMIT=true \
-e RATE_LIMIT=100/minute \
markitdownserver:latest

By default, the server runs with:
- 1 worker process
- Async request handling via FastAPI
- No rate limiting
Multi-worker setup for better performance:
# Run with 4 workers
docker run -d -p 8490:8490 -e WORKERS=4 markitdownserver:latest

Enable rate limiting to prevent abuse:
# Limit to 100 requests per minute per IP
docker run -d -p 8490:8490 \
-e ENABLE_RATE_LIMIT=true \
-e RATE_LIMIT=100/minute \
markitdownserver:latest

Note: Rate limiting requires the slowapi package. Install with:
pip install slowapi

- Small scale (< 100 req/min): 1-2 workers
- Medium scale (100-1000 req/min): 2-4 workers
- Large scale (> 1000 req/min): Use horizontal scaling with load balancer
📖 For detailed concurrency information, see docs/CONCURRENCY.md
- Start the server:
  python app.py
- Run health check:
  curl http://localhost:8490/health
- Test file conversion:
  curl -X POST "http://localhost:8490/process_file" \
    -F "file=@samples/SimpleConsole/Benefit_Options.pdf"
Simple Console:
cd samples/SimpleConsole
dotnet run

Detailed Console:
cd samples/DetailedConsole
dotnet run

For development and testing:
# Using Python
python app.py
# Using uvicorn directly
uvicorn app:app --host 0.0.0.0 --port 8490 --reload

For production:
# Build
docker build -t markitdownserver:1.0.0 .
# Run
docker run -d \
--name markitdownserver \
-p 8490:8490 \
--restart unless-stopped \
markitdownserver:1.0.0
# View logs
docker logs -f markitdownserver

See docs/CODE_QUALITY_IMPROVEMENTS.md for detailed Azure deployment instructions, including:
- Multi-stage Dockerfile optimization
- Azure CLI deployment scripts
- Bicep templates for Infrastructure as Code
- Environment configuration
- Security best practices
- Cost estimation and optimization
Quick Azure Deployment:
# Set variables
RESOURCE_GROUP="rg-markitdown"
LOCATION="eastus"
CONTAINER_APP_NAME="markitdown-server"
# Create resource group
az group create --name $RESOURCE_GROUP --location $LOCATION
# Deploy (see docs/CODE_QUALITY_IMPROVEMENTS.md for complete script)
az containerapp up \
--name $CONTAINER_APP_NAME \
--resource-group $RESOURCE_GROUP \
--location $LOCATION \
  --source .

MarkItDownServer/
├── app.py                   # Main FastAPI application
├── requirements.txt         # Python dependencies
├── dockerfile               # Docker configuration
├── README.md                # This file
├── docs/                    # Comprehensive documentation
├── samples/
│   ├── SimpleConsole/       # Basic C# client example
│   │   ├── Program.cs
│   │   ├── SimpleConsole.csproj
│   │   └── Benefit_Options.pdf
│   └── DetailedConsole/     # Advanced C# client example
│       ├── Program.cs
│       ├── DetailedConsole.csproj
│       ├── appsettings.json
│       └── Benefit_Options.pdf
├── src/                     # Legacy client (preserved)
│   └── ...
└── utils/
    └── file_handler.py      # Utility functions
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
The project follows Python best practices:
- Type hints for better code clarity
- Comprehensive error handling
- Input validation
- Security headers
- Structured logging
The server provides automatic interactive API documentation:
- Swagger UI: http://localhost:8490/docs
- ReDoc: http://localhost:8490/redoc
These interfaces allow you to:
- Explore all available endpoints
- Test API calls directly from the browser
- View request/response schemas
- See example requests and responses
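FastAPI apps also expose the raw OpenAPI schema (by default at /openapi.json), which the Swagger UI itself consumes. Given that schema as a dict, the available endpoints can be listed with a small hypothetical helper:

```python
def list_endpoints(openapi_schema: dict) -> list[str]:
    """Return sorted 'METHOD /path' entries from an OpenAPI document."""
    out = []
    for path, operations in openapi_schema.get("paths", {}).items():
        for method in operations:
            out.append(f"{method.upper()} {path}")
    return sorted(out)
```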
- fastapi (0.115.5): Modern web framework for building APIs
- uvicorn (0.32.1): ASGI server for FastAPI
- python-multipart (0.0.20): Multipart form data support
- markitdown (0.0.1a2): Document to Markdown conversion
- pydantic (2.10.3): Data validation using Python type hints
- Python 3.12 or higher
- 512MB RAM minimum (1GB recommended)
- 100MB disk space
Issue: Port already in use
Error: [Errno 48] Address already in use
Solution: Change the port or stop the conflicting service
# Find process using port 8490
lsof -i :8490
# Kill the process
kill -9 <PID>
# Or use a different port
python app.py --port 8491

Issue: "File type not allowed"
Solution: Ensure your file has a supported extension (doc, docx, pdf, etc.)
Issue: "File too large"
Solution: Files must be under 50MB. Compress or split large files.
Issue: Cannot connect to Docker daemon
Solution: Ensure Docker Desktop is running
docker ps  # Test Docker connection

Issue: Container exits immediately
Solution: Check container logs
docker logs markitdownserver

For issues, questions, or contributions:
- GitHub Issues: Create an issue
- Documentation: See docs/
This project is licensed under the MIT License. See the LICENSE file for details.
- Built with FastAPI
- Powered by MarkItDown
- Developed by El Bruno
Comprehensive documentation is available in the docs directory:
- Quick Reference - Common commands and API usage
- Developer Manual - Integration guide for developers
- Concurrency Guide - Performance and scaling information
- Code Quality - Best practices and improvements
- Implementation Plans - Detailed feature implementation plans
- AI Chat Web App - Full .NET Aspire application with:
- Document upload and conversion
- RAG-based chat with semantic search
- Vector store integration
- Real-time markdown preview
- Quick Start Guide
- User Manual (coming soon)
- v1.0.0 (2025-01): Initial release with production-ready features
- Multi-format document conversion
- Comprehensive error handling
- Health check endpoints
- Docker support
- Azure deployment ready
- AI Chat Web App sample with .NET Aspire
Ready to convert your documents to Markdown? 🚀
Start the server and visit http://localhost:8490/docs to explore the interactive API documentation!