Fast, efficient, and high-quality OCR powered by open visual language models.
Transform your PDF documents and images into structured Markdown text with advanced OCR capabilities. Built on state-of-the-art vision-language models, OCRFlux delivers superior text extraction with intelligent document structure recognition.
- Single File Processing: Upload PDF or image files for immediate OCR processing with real-time results
- Batch Processing: Process multiple files simultaneously with intelligent resource management
- Asynchronous Processing: Submit large files for background processing with comprehensive status tracking
- Smart Retry Logic: Configurable retry mechanisms for robust processing of challenging documents
- PDF Documents: Multi-page PDF files with complex layouts, tables, and mixed content
- Image Files: PNG, JPG, JPEG images containing text, documents, receipts, and forms
- Quality Optimization: Automatic image preprocessing for optimal text recognition
- Cross-page Merging: Intelligently merge text elements that span across page boundaries
- Image Processing: Adjustable resolution and rotation for optimal OCR accuracy
- Quality Control: Fallback handling and quality assessment for each processed page
- Configurable Parameters: Fine-tune processing behavior for different document types
- Health Monitoring: Comprehensive system health checks and performance metrics
- Task Management: Full lifecycle management for asynchronous processing tasks
- Detailed Logging: Complete audit trail and debugging information
- Performance Analytics: Processing statistics and optimization insights
| Format | Extensions | Max Size | Notes |
|---|---|---|---|
| PDF | .pdf | 100MB | Multi-page documents, forms, reports |
| Images | .png, .jpg, .jpeg | 100MB | Scanned documents, photos, screenshots |
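If you want to fail fast on the client side, a small pre-upload check against the limits in the table above might look like the sketch below (the 100MB value mirrors the default `MAX_FILE_SIZE` shown later in the configuration section; the server still performs its own validation):

```python
from pathlib import Path

# Limits mirrored from the table above; the server enforces its own checks.
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}
MAX_FILE_SIZE = 100 * 1024 * 1024  # 100MB

def is_uploadable(path: str) -> bool:
    """Return True if the file looks acceptable before uploading."""
    p = Path(path)
    return (
        p.is_file()
        and p.suffix.lower() in ALLOWED_EXTENSIONS
        and p.stat().st_size <= MAX_FILE_SIZE
    )

print(is_uploadable("document.pdf"))  # False if missing, too large, or an unsupported type
```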
The fastest way to get started is using Docker:
```bash
# Clone the repository
git clone https://github.com/your-org/ocrflux-api.git
cd ocrflux-api

# Copy environment configuration
cp .env.example .env

# Start the service
./scripts/deploy.sh up

# Check service status
./scripts/deploy.sh health
```

The API will be available at http://localhost:8000
For local development and testing:
```bash
# Clone the repository
git clone https://github.com/your-org/ocrflux-api.git
cd ocrflux-api

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .

# Set up environment
cp .env.example .env
# Edit .env file with your configuration

# Run configuration check
python scripts/check_config.py

# Start the development server
python run_server.py --mode dev
```

Once the service is running, you can access:
- Interactive API Documentation: http://localhost:8000/docs
- Alternative Documentation: http://localhost:8000/redoc
- OpenAPI Schema: http://localhost:8000/openapi.json
- Service Information: http://localhost:8000/api-info
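To script against these endpoints, a small readiness check is often handy. The sketch below uses the third-party `requests` package (an assumption; `pip install requests`) and relies only on the URLs listed above:

```python
import time

import requests  # assumed available: pip install requests

BASE_URL = "http://localhost:8000"

def wait_until_ready(timeout: float = 60.0) -> dict:
    """Poll the basic health endpoint until the service answers, then fetch /api-info."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(f"{BASE_URL}/api/v1/health", timeout=5).ok:
                return requests.get(f"{BASE_URL}/api-info", timeout=5).json()
        except requests.ConnectionError:
            pass  # service not up yet
        time.sleep(2)
    raise TimeoutError("OCRFlux API did not become healthy in time")

if __name__ == "__main__":
    print(wait_until_ready())
```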
Key configuration options (see .env.example for complete list):
```bash
# Application
APP_NAME=OCRFlux API Service
DEBUG=false
LOG_LEVEL=INFO

# Server
HOST=0.0.0.0
PORT=8000

# Model
MODEL_PATH=/path/to/OCRFlux-3B
GPU_MEMORY_UTILIZATION=0.8

# Processing
MAX_FILE_SIZE=104857600  # 100MB
MAX_CONCURRENT_TASKS=4

# Security
CORS_ORIGINS=http://localhost:3000,https://yourdomain.com
ENABLE_RATE_LIMITING=true
```

- Download OCRFlux Model: Download the OCRFlux-3B model from the official repository
- Set Model Path: Update `MODEL_PATH` in your `.env` file
- GPU Configuration: Adjust `GPU_MEMORY_UTILIZATION` based on your hardware (see the sketch after this list)
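A minimal pre-flight sketch of those two settings, assuming they are already exported into the environment (loading `.env` itself is not shown, and `scripts/check_config.py` remains the authoritative check):

```python
import os
from pathlib import Path

# Read the two model-related settings from the environment (assumes .env is already loaded).
model_path = os.environ.get("MODEL_PATH", "")
gpu_util = float(os.environ.get("GPU_MEMORY_UTILIZATION", "0.8"))

if not Path(model_path).is_dir():
    print(f"MODEL_PATH does not point to an existing directory: {model_path!r}")
if not 0.0 < gpu_util <= 1.0:
    print(f"GPU_MEMORY_UTILIZATION should be a fraction in (0, 1], got {gpu_util}")
else:
    print(f"Model at {model_path}, using {gpu_util:.0%} of GPU memory")
```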
```bash
# Start basic service
./scripts/deploy.sh up

# Start with Redis caching
./scripts/deploy.sh -p with-redis up

# Start with Nginx reverse proxy
./scripts/deploy.sh -p with-nginx up

# Start with full monitoring stack
./scripts/deploy.sh -p with-monitoring up

# Start everything
./scripts/deploy.sh -p all up
```

```bash
# Production deployment with build
./scripts/deploy.sh -p with-nginx --build up

# Check service health
./scripts/deploy.sh health

# View logs
./scripts/deploy.sh logs

# Stop services
./scripts/deploy.sh down
```

- default: Basic OCRFlux API service
- with-redis: Adds Redis for result caching
- with-nginx: Adds Nginx reverse proxy with SSL support
- with-monitoring: Adds Prometheus and Grafana for monitoring
- all: Includes all optional services
```bash
# Process a PDF file
curl -X POST "http://localhost:8000/api/v1/parse" \
  -H "Content-Type: multipart/form-data" \
  -F "[email protected]" \
  -F "skip_cross_page_merge=false"

# Process an image
curl -X POST "http://localhost:8000/api/v1/parse" \
  -H "Content-Type: multipart/form-data" \
  -F "[email protected]"
```

```bash
# Process multiple files
curl -X POST "http://localhost:8000/api/v1/batch" \
  -H "Content-Type: multipart/form-data" \
  -F "[email protected]" \
  -F "[email protected]" \
  -F "[email protected]"
```

```bash
# Submit async task
TASK_ID=$(curl -X POST "http://localhost:8000/api/v1/parse-async" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@large_document.pdf" | jq -r '.task_id')

# Check task status
curl "http://localhost:8000/api/v1/tasks/$TASK_ID"

# Get results when complete
curl "http://localhost:8000/api/v1/tasks/$TASK_ID/result"
```
curl "http://localhost:8000/api/v1/health"
# Detailed health information
curl "http://localhost:8000/api/v1/health/detailed"The service provides multiple health check endpoints:
- `/api/v1/health` - Basic health status
- `/api/v1/health/detailed` - Comprehensive system information
- `/api/v1/health/model` - Model-specific health status
Structured logging with configurable levels:
```bash
# Set log level
export LOG_LEVEL=DEBUG

# Log to file
export LOG_FILE=/app/logs/ocrflux.log

# View logs in Docker
docker logs ocrflux-api -f
```
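For local experiments, the same `LOG_LEVEL` and `LOG_FILE` variables can be wired into plain stdlib logging; this is only a generic sketch, not the service's actual structured-logging setup:

```python
import logging
import os

# Generic sketch: map LOG_LEVEL / LOG_FILE onto stdlib logging.
level = os.environ.get("LOG_LEVEL", "INFO").upper()
log_file = os.environ.get("LOG_FILE")  # e.g. /app/logs/ocrflux.log

logging.basicConfig(
    level=getattr(logging, level, logging.INFO),
    filename=log_file,  # None means log to stderr
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logging.getLogger("ocrflux").info("logging configured at %s", level)
```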
When using the monitoring profile:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
Minimum Requirements:
- CPU: 4 cores
- RAM: 8GB
- Storage: 20GB
- GPU: Optional (CUDA-compatible for better performance)
Recommended for Production:
- CPU: 8+ cores
- RAM: 16GB+
- Storage: 50GB+ SSD
- GPU: 8GB+ VRAM (RTX 3080 or better)
```bash
# Increase concurrent tasks for more CPU cores
MAX_CONCURRENT_TASKS=8

# Adjust GPU memory usage
GPU_MEMORY_UTILIZATION=0.9

# Increase file size limit for large documents
MAX_FILE_SIZE=209715200  # 200MB

# Optimize for your use case
TASK_TIMEOUT=600  # 10 minutes for very large files
```

- Environment Variables: Never commit `.env` files with secrets
- CORS Configuration: Restrict origins to your domains only (see the sketch after this list)
- Rate Limiting: Enable rate limiting in production
- HTTPS: Use SSL/TLS certificates (configure in Nginx)
- File Validation: Ensure proper file type validation
- Resource Limits: Set appropriate Docker resource limits
- Network Security: Use Docker networks for service isolation
- Regular Updates: Keep dependencies and base images updated
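For the CORS item above, FastAPI applications typically apply such an origin list through the standard `CORSMiddleware`; the snippet below is a generic sketch of that pattern, not the service's actual wiring:

```python
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Parse the comma-separated CORS_ORIGINS value from the environment,
# mirroring the .env setting shown below.
origins = [o.strip() for o in os.environ.get("CORS_ORIGINS", "").split(",") if o.strip()]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,   # only the domains you listed, never "*" in production
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)
```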
```bash
# Restrict CORS origins
CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.com

# Enable rate limiting
ENABLE_RATE_LIMITING=true
RATE_LIMIT_PER_MINUTE=60

# Set resource limits in docker-compose.yml
MEMORY_LIMIT=4G
CPU_LIMIT=2.0
```

```bash
# Install test dependencies
pip install pytest pytest-asyncio

# Run unit tests
pytest tests/test_*.py -v

# Run integration tests
pytest tests/test_*_integration.py -v

# Run all tests with coverage
pytest --cov=api tests/ -v
```

```bash
# Check configuration
python scripts/check_config.py

# Check specific components
python scripts/check_config.py --check deps
python scripts/check_config.py --check config
python scripts/check_config.py --check security
```

```bash
# Clone and setup
git clone https://github.com/your-org/ocrflux-api.git
cd ocrflux-api
# Development environment
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
# Pre-commit hooks
pre-commit install
# Run development server
python run_server.py --mode dev --reload
```

```
ocrflux-api/
├── api/                  # Main application code
│   ├── core/             # Core components
│   ├── models/           # Pydantic models
│   ├── routes/           # API routes
│   ├── middleware/       # Custom middleware
│   └── services/         # Business logic
├── tests/                # Test suite
├── scripts/              # Utility scripts
├── nginx/                # Nginx configuration
├── monitoring/           # Monitoring configs
├── docs/                 # Additional documentation
├── Dockerfile            # Docker configuration
├── docker-compose.yml    # Docker Compose setup
├── pyproject.toml        # Python project config
└── README.md             # This file
```
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests for new functionality
- Run the test suite (`pytest`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Check configuration
python scripts/check_config.py

# Check Docker status
docker ps
docker logs ocrflux-api

# Check resource usage
docker stats
```

```bash
# Verify model path
ls -la $MODEL_PATH

# Check GPU availability
nvidia-smi  # For NVIDIA GPUs

# Check memory usage
free -h
```

```bash
# Monitor resource usage
htop
nvidia-smi -l 1

# Check service metrics
curl http://localhost:8000/api/v1/health/detailed

# Review logs for bottlenecks
docker logs ocrflux-api | grep -i "slow\|timeout\|error"
```

```bash
# Check port availability
netstat -tlnp | grep 8000

# Test connectivity
curl -v http://localhost:8000/api/v1/health

# Check Docker networks
docker network ls
docker network inspect ocrflux-network
```

- Documentation: Check the `/docs` endpoint for API documentation
- Issues: Report bugs on GitHub Issues
- Discussions: Join community discussions
- Support: Contact [email protected]
This project is licensed under the MIT License - see the LICENSE file for details.
- OCRFlux - The underlying OCR engine
- FastAPI - The web framework
- vLLM - High-performance LLM inference
- Docker - Containerization platform
For support and questions:
- Email: [email protected]
- Documentation: http://localhost:8000/docs
- GitHub Issues: https://github.com/your-org/ocrflux-api/issues
- Community: https://github.com/your-org/ocrflux-api/discussions
Made with ❤️ by the OCRFlux Team