A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images and PDFs. It is available both as a Python package and as a Streamlit web application.
- Multiple Vision Models Support
  - LLaVA: Efficient vision-language model for real-time processing (note: LLaVA can sometimes generate incorrect output)
  - Llama 3.2 Vision: Advanced model with high accuracy for complex documents
  - Granite3.2-vision: A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more
  - Moondream: Small vision language model designed to run efficiently on edge devices
  - MiniCPM-V: MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344)
 
- Multiple Output Formats
  - Markdown: Preserves text formatting with headers and lists
  - Plain Text: Clean, simple text extraction
  - JSON: Structured data format
  - Structured: Tables and organized data
  - Key-Value Pairs: Extracts labeled information
  - Table: Extracts all tabular data
 
- Batch Processing
  - Process multiple images in parallel
  - Progress tracking for each image
  - Image preprocessing (resize, normalize, etc.)
 
- Custom Prompts
  - Override default prompts with custom instructions for text extraction
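
To make the batch-processing feature above more concrete, here is a minimal sketch of how parallel processing with per-image progress reporting can be built on Python's standard `concurrent.futures` module. It is not the package's actual implementation: `fake_ocr` and `process_batch_sketch` are hypothetical names, and `fake_ocr` is a stand-in for a real OCR call. The returned dictionary is shaped like the `results` and `statistics` keys used in the batch example in this README, with an extra `errors` key added for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_ocr(image_path: str) -> str:
    """Hypothetical stand-in for a real OCR call; fails on paths ending in '.bad'."""
    if image_path.endswith(".bad"):
        raise ValueError(f"cannot read {image_path}")
    return f"text from {image_path}"


def process_batch_sketch(paths, worker=fake_ocr, max_workers=4):
    """Run `worker` over a list of paths in parallel, collecting results and errors."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its path so results can be labeled.
        futures = {pool.submit(worker, path): path for path in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going when one image fails
                errors[path] = str(exc)
            print(f"[{done}/{len(paths)}] finished {path}")
    stats = {
        "total": len(paths),
        "successful": len(results),
        "failed": len(errors),
    }
    return {"results": results, "errors": errors, "statistics": stats}


if __name__ == "__main__":
    summary = process_batch_sketch(["a.png", "b.png", "c.bad"])
    print(summary["statistics"])
```

Threads suit this kind of workload because each task mostly waits on a model response, so a failure in one image does not need to stop the others.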
 
 
pip install ollama-ocr

- Install Ollama
- Pull the required model:

ollama pull llama3.2-vision:11b
ollama pull granite3.2-vision
ollama pull moondream
ollama pull minicpm-v

from ollama_ocr import OCRProcessor
# Initialize the OCR processor
ocr = OCRProcessor(
    model_name="llama3.2-vision:11b",  # Any vision model available on Ollama
    base_url="http://host.docker.internal:11434/api/generate",  # Optional: your custom Ollama API endpoint
)

# Process an image
result = ocr.process_image(
    image_path="path/to/your/image.png",  # Or a PDF file: "path/to/your/file.pdf"
    format_type="markdown",  # Options: markdown, text, json, structured, key_value
    custom_prompt="Extract all text, focusing on dates and names.",  # Optional custom prompt
    language="English",  # Language of the text (New!)
)
print(result)

from ollama_ocr import OCRProcessor
# Initialize the OCR processor
ocr = OCRProcessor(model_name="llama3.2-vision:11b", max_workers=4)  # max_workers: number of parallel workers

# Process multiple images with progress tracking
batch_results = ocr.process_batch(
    input_path="path/to/images/folder",  # Directory or list of image paths
    format_type="markdown",
    recursive=True,  # Search subdirectories
    preprocess=True,  # Enable image preprocessing
    custom_prompt="Extract all text, focusing on dates and names.",  # Optional custom prompt
    language="English",  # Language of the text (New!)
)

# Access results
for file_path, text in batch_results['results'].items():
    print(f"\nFile: {file_path}")
    print(f"Extracted Text: {text}")
# View statistics
print("\nProcessing Statistics:")
print(f"Total images: {batch_results['statistics']['total']}")
print(f"Successfully processed: {batch_results['statistics']['successful']}")
print(f"Failed: {batch_results['statistics']['failed']}")

- Markdown Format: The output is a markdown string containing the extracted text from the image.
- Text Format: The output is a plain text string containing the extracted text from the image.
- JSON Format: The output is a JSON object containing the extracted text from the image.
- Structured Format: The output is a structured object containing the extracted text from the image.
- Key-Value Format: The output is a dictionary containing the extracted text from the image.
- Table Format: The output contains all tabular data extracted from the image.
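
When working with the JSON format described above, it can help to parse the returned text defensively. The sketch below assumes that the result arrives as a string and that a vision model may wrap its JSON in a Markdown code fence; both are assumptions rather than documented behavior of this package, and `parse_json_output` is a hypothetical helper, not part of `ollama_ocr`.

```python
import json

FENCE = "`" * 3  # three backticks, the Markdown code-fence marker


def parse_json_output(raw: str):
    """Parse model output as JSON, tolerating a surrounding Markdown code fence.

    Returns the decoded object, or None when the text is not valid JSON.
    """
    text = raw.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        # Drop the opening fence line (which may carry a "json" tag)
        # and the closing fence.
        first_newline = text.find("\n")
        if first_newline == -1:
            text = ""
        else:
            text = text[first_newline + 1 : -len(FENCE)].strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    wrapped = FENCE + 'json\n{"name": "Ada", "date": "1843"}\n' + FENCE
    print(parse_json_output(wrapped))
    print(parse_json_output("not json at all"))
```

Returning `None` instead of raising keeps a batch loop simple: callers can fall back to the raw text for any file whose output did not parse.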
 
- User-Friendly Interface
  - Drag-and-drop file upload
  - Real-time processing
  - Download extracted text
  - Image preview with details
  - Responsive design
  - Language Selection: Specify the language for better OCR accuracy. (New!)
 
 
- Clone the repository:

git clone https://github.com/imanoop7/Ollama-OCR.git
cd Ollama-OCR

- Install dependencies:

pip install -r requirements.txt

- Go to the directory where app.py is located:

cd src/ollama_ocr

- Run the Streamlit app:

streamlit run app.py

- Ollama OCR on Colab: How to use Ollama-OCR on Google Colab.
- Example Notebook: Example usage of Ollama OCR.
- Ollama OCR with AutoGen: Use Ollama-OCR with AutoGen.
- Ollama OCR with LangGraph: Use Ollama-OCR with LangGraph.
 
This project is licensed under the MIT License - see the LICENSE file for details.
Built with Ollama. Powered by Vision Models.