
Releases: MIMICLab/DocsRay

v1.9.0

01 Aug 10:47

[1.9.0] - 2025-02-01

Added

  • LibreOffice Integration: Enhanced document conversion capabilities
    • Automatic detection and use of LibreOffice when available
    • Improved conversion quality for Office documents (DOCX, XLSX, PPTX)
    • Better support for OpenDocument formats (ODT, ODS, ODP)
    • Enhanced HWP/HWPX document handling with LibreOffice
    • Fallback mechanisms when LibreOffice is not available
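The "use LibreOffice when available, otherwise fall back" flow above can be sketched as follows. This sketch assumes LibreOffice is reachable on PATH as `soffice` or `libreoffice` and uses its headless `--convert-to pdf` mode. The `fallback` callable is a hypothetical stand-in for whatever non-LibreOffice converter applies; this is not DocsRay's actual code.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List, Optional


def find_libreoffice() -> Optional[str]:
    """Return the path of the LibreOffice binary on PATH, or None if absent."""
    for name in ("soffice", "libreoffice"):
        path = shutil.which(name)
        if path:
            return path
    return None


def libreoffice_pdf_cmd(soffice: str, src: str, out_dir: str) -> List[str]:
    """Build a headless LibreOffice command that writes <stem>.pdf into out_dir."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", out_dir, src]


def convert_to_pdf(src: str, out_dir: str,
                   fallback: Callable[[str, str], str]) -> str:
    """Prefer LibreOffice; use the (hypothetical) fallback if it is missing or fails."""
    soffice = find_libreoffice()
    if soffice is not None:
        result = subprocess.run(libreoffice_pdf_cmd(soffice, src, out_dir),
                                capture_output=True)
        pdf_path = Path(out_dir) / (Path(src).stem + ".pdf")
        if result.returncode == 0 and pdf_path.exists():
            return str(pdf_path)
    return fallback(src, out_dir)  # LibreOffice not installed or conversion failed
```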

Changed

  • File converter now prioritizes LibreOffice for office document conversions
  • Improved error messages and conversion feedback
  • Better handling of conversion failures with automatic fallback methods

v1.8.0

31 Jul 03:40

Changelog

All notable changes to this project will be documented in this file.

[1.8.0] - 2025-01-31

Added

  • Video Input Support: Process and extract information from video files
    • Automatic audio extraction from video formats
    • Frame extraction for visual content analysis
    • Support for common video formats (MP4, AVI, MOV, etc.)
  • Audio Input Support: Direct processing of audio files
    • Transcription using faster-whisper for speech-to-text
    • Support for various audio formats (MP3, WAV, M4A, etc.)
  • Multimedia Processing Pipeline: New multimedia_processor.py module
    • Unified interface for handling video and audio inputs
    • Automatic format detection and conversion
    • Integration with existing document processing pipeline
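The audio half of this pipeline can be sketched in two steps: extract a mono 16 kHz track from a video with the ffmpeg CLI, then transcribe it with faster-whisper's `WhisperModel.transcribe`. The ffmpeg step, the `base` model size and the function names are assumptions for illustration. DocsRay's own multimedia_processor.py may be structured differently, and faster-whisper can also decode many video files directly.

```python
import os
import subprocess
import tempfile
from typing import List


def ffmpeg_extract_audio_cmd(video_path: str, wav_path: str) -> List[str]:
    # -vn drops the video stream; -ac 1 and -ar 16000 give mono 16 kHz audio for speech models.
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", wav_path]


def transcribe_video(video_path: str, model_size: str = "base") -> str:
    """Extract the audio track with ffmpeg, then transcribe it with faster-whisper."""
    from faster_whisper import WhisperModel  # lazy import: pip install faster-whisper

    with tempfile.TemporaryDirectory() as tmp:
        wav_path = os.path.join(tmp, "audio.wav")
        subprocess.run(ffmpeg_extract_audio_cmd(video_path, wav_path),
                       check=True, capture_output=True)
        model = WhisperModel(model_size)
        segments, _info = model.transcribe(wav_path)
        # segments is a lazy generator, so consume it before the temp dir is deleted.
        return " ".join(segment.text.strip() for segment in segments)
```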

Changed

  • Enhanced file converter to support multimedia file types
  • Updated dependencies to include faster-whisper for audio transcription

[1.7.2] - 2025-01-26

Added

  • Configurable --timeout parameter for perf-test command
    • Allows custom request timeout in seconds
    • No timeout if parameter is not specified (replaces hardcoded 300 seconds)

Changed

  • Modified perf-test command to accept optional timeout parameter
  • Updated error messages to show actual timeout value instead of hardcoded 300 seconds

[1.7.1] - 2025-01-25

Added

  • Auto-restart functionality for Web, API, and MCP servers with --auto-restart flag
  • Request timeout monitoring for API server (triggers restart on timeout when auto-restart is enabled)
  • Optional --max-retries parameter (unlimited retries if not specified)
  • Configurable --retry-delay parameter for restart attempts
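This standard-library sketch shows one way to build an auto-restart monitor with `--max-retries` and `--retry-delay` semantics. It assumes a clean exit (code 0) should not trigger a restart and uses a 5-second default delay; both are illustrative choices. The timeout-triggered restart of the API server is not modelled here.

```python
import subprocess
import sys
import time
from typing import List, Optional


def run_with_auto_restart(cmd: List[str],
                          max_retries: Optional[int] = None,
                          retry_delay: float = 5.0) -> int:
    """Run cmd and restart it after abnormal exits.

    max_retries=None means unlimited restarts. Returns the last exit code.
    """
    restarts = 0
    while True:
        proc = subprocess.Popen(cmd)
        try:
            code = proc.wait()  # wait() also reaps the child, so no zombie is left
        except KeyboardInterrupt:
            proc.terminate()
            proc.wait()
            raise
        if code == 0:
            return 0  # clean shutdown: do not restart
        if max_retries is not None and restarts >= max_retries:
            return code  # retry budget exhausted
        restarts += 1
        print(f"process exited with {code}; restart {restarts} in {retry_delay}s",
              file=sys.stderr)
        time.sleep(retry_delay)


if __name__ == "__main__":
    # A child that always fails: started once, restarted twice, then given up on.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_with_auto_restart(failing, max_retries=2, retry_delay=0.1))
```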

Changed

  • --timeout parameter is now optional for both web and API (no timeout if not specified)
  • --pages parameter is now optional for web interface (process all pages if not specified)
  • Updated FastAPI from deprecated @app.on_event to modern lifespan context manager
  • API server now tracks request processing activity via /activity endpoint

Fixed

  • Fixed deprecation warning in FastAPI shutdown event handler
  • Improved process cleanup in auto-restart monitor
  • Better handling of zombie processes when restarting services

[1.7.0] - 2025-01-24

Changed

  • BREAKING CHANGE: Modified embedding synthesis method from element-wise addition to concatenation
    • This change doubles the embedding dimension by concatenating two model outputs instead of adding them
    • Results in better semantic representation but requires reindexing of existing documents

Technical Details

  • Changed np.add(emb_1, emb_2) to np.concatenate([emb_1, emb_2]) in get_embedding method
  • Updated batch processing in get_embeddings to use list comprehension with concatenation
  • Both embeddings are now L2-normalized after concatenation
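This plain-Python sketch of the 1.7.0 combination step shows the dimension change concretely. It uses no NumPy, so it runs anywhere. The names mirror get_embedding/get_embeddings from the notes, but these are standalone helpers that take precomputed vectors, not the actual DocsRay methods. Because the output length changes, vectors stored by an older version cannot be compared with new ones, which is why reindexing is required.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def get_embedding(emb_1: Sequence[float], emb_2: Sequence[float]) -> List[float]:
    """1.7.0 style: concatenate the two model outputs, then L2-normalize the result."""
    # Before 1.7.0 the outputs were added element-wise, keeping dimension d.
    # Concatenation yields len(emb_1) + len(emb_2), i.e. 2d for two d-dim models.
    return l2_normalize(list(emb_1) + list(emb_2))


def get_embeddings(batch_1: Sequence[Sequence[float]],
                   batch_2: Sequence[Sequence[float]]) -> List[List[float]]:
    # Batch version: one concatenated vector per pair, via a list comprehension.
    return [get_embedding(e1, e2) for e1, e2 in zip(batch_1, batch_2)]
```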

Removed

  • Removed mteb_embedding.py test file

[1.6.2] - Previous Release

  • Previous release details...

v1.7.1

24 Jul 03:59

Changed

  • BREAKING CHANGE: Modified embedding synthesis method from element-wise addition to concatenation
    • This change doubles the embedding dimension by concatenating two model outputs instead of adding them
    • Results in better semantic representation but requires reindexing of existing documents
  • --timeout parameter is now optional for both web and API (no timeout if not specified)
  • --pages parameter is now optional for web interface (process all pages if not specified)
  • Updated FastAPI from deprecated @app.on_event to modern lifespan context manager
  • API server now tracks request processing activity via /activity endpoint

Added

  • Auto-restart functionality for Web, API, and MCP servers with --auto-restart flag
  • Request timeout monitoring for API server (triggers restart on timeout when auto-restart is enabled)
  • Optional --max-retries parameter (unlimited retries if not specified)
  • Configurable --retry-delay parameter for restart attempts

Fixed

  • Fixed deprecation warning in FastAPI shutdown event handler
  • Improved process cleanup in auto-restart monitor
  • Better handling of zombie processes when restarting services

Technical Details

  • Changed np.add(emb_1, emb_2) to np.concatenate([emb_1, emb_2]) in get_embedding method
  • Updated batch processing in get_embeddings to use list comprehension with concatenation
  • Both embeddings are now L2-normalized after concatenation

Removed

  • Removed mteb_embedding.py test file

[1.6.2] - Previous Release

  • Previous release details...

v1.6.0

13 Jul 11:17

DocsRay v1.6.0 Release Notes

🚀 Major Features & Improvements

🎯 Enhanced Model Selection System

  • Flexible Model Types: Choose between lite (4b), base (12b), and pro
    (27b) models using the new --model-type option
  • Selective Downloads: Download only the model type you need,
    significantly reducing storage requirements
  • Runtime Model Selection: Models are selected based on environment
    variables, allowing dynamic switching
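This sketch shows one plausible way runtime selection could combine the --model-type option with the DOCSRAY_MODEL_TYPE environment variable. The precedence (flag, then environment, then a default) and the `lite` default are assumptions. resolve_model_type is a hypothetical helper.

```python
import os
from typing import Optional

# Parameter counts per model type, as listed in these notes.
MODEL_SIZES = {"lite": "4b", "base": "12b", "pro": "27b"}


def resolve_model_type(cli_value: Optional[str] = None, default: str = "lite") -> str:
    """Pick a model type: CLI flag first, then DOCSRAY_MODEL_TYPE, then a fallback.

    The precedence order and the "lite" fallback are assumptions for this sketch.
    """
    value = (cli_value or os.environ.get("DOCSRAY_MODEL_TYPE") or default).strip().lower()
    if value not in MODEL_SIZES:
        raise ValueError(f"unknown model type {value!r}; expected one of {sorted(MODEL_SIZES)}")
    return value
```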

🔄 Redesigned API Architecture

  • Per-Request Document Processing: API now accepts document paths with
    each request instead of pre-loading documents
  • Automatic Document Caching: Documents are automatically processed and
    cached on first access
  • New API Endpoints:
    • /cache/info - View cached document information
    • /cache/clear - Clear document cache
  • Enhanced Flexibility: No need to restart the API server when switching
    between documents
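Per-request caching can be pictured as a small cache keyed by document path. In this sketch, invalidation on file modification time is an assumption, and the info()/clear() methods only stand in for /cache/info and /cache/clear. DocsRay's real cache layout is not described in these notes.

```python
import os
from typing import Any, Callable, Dict, Tuple


class DocumentCache:
    """Process each document once and reuse the result until the file changes (hypothetical)."""

    def __init__(self, process: Callable[[str], Any]):
        self._process = process
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, document_path: str) -> Any:
        key = os.path.abspath(document_path)
        mtime = os.path.getmtime(key)
        cached = self._entries.get(key)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # cache hit: skip reprocessing
        result = self._process(key)  # first access, or the file was modified
        self._entries[key] = (mtime, result)
        return result

    def info(self) -> Dict[str, int]:
        return {"cached_documents": len(self._entries)}

    def clear(self) -> None:
        self._entries.clear()
```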

⚑ Performance Testing & Monitoring

  • New perf-test Command: Comprehensive API performance benchmarking tool
  • Detailed Metrics: Track response times, success rates, and cache
    performance
  • Iterative Testing: Run multiple test iterations with statistical
    analysis
  • Cache Performance Analysis: Separate timing for first-time vs cached
    requests

🛠️ Improved CLI Interface

  • Unified Argument Structure: Consistent file path arguments across all
    commands
  • Simplified Syntax:
    • Old: docsray ask "question" --doc file.pdf
    • New: docsray ask file.pdf "question"
  • Force Download Option: --force flag to re-download existing models
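This argparse sketch shows how the positional `ask file.pdf "question"` form and the `download-models --model-type base --force` options parse. It is an illustrative reconstruction, not DocsRay's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="docsray")
    sub = parser.add_subparsers(dest="command", required=True)

    # New syntax: the file path comes first as a positional argument.
    ask = sub.add_parser("ask", help="Ask a question about a document")
    ask.add_argument("file_path", help="Document to query")
    ask.add_argument("question", help="Question text")

    dl = sub.add_parser("download-models", help="Download models")
    dl.add_argument("--model-type", choices=["lite", "base", "pro"])
    dl.add_argument("--force", action="store_true", help="Re-download existing models")
    return parser
```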

📋 Usage Examples

Model Selection

# Download lite models (4b, ~3GB)
docsray download-models --model-type lite

# Use base models for web interface (12b, ~8GB)
docsray web --model-type base

# Process documents with pro models (27b, ~16GB)
docsray process document.pdf --model-type pro

New API Usage

# Start API server (no pre-loading required)
docsray api --port 8000

# Ask questions about any document
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "document_path": "/path/to/document.pdf",
    "question": "What is the main topic?",
    "use_coarse_search": true
  }'
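The same request can be sent from Python using only the standard library. This sketch assumes the /ask endpoint and the JSON fields from the curl example. It returns the parsed JSON response unchanged because these notes do not document the response schema. Passing `timeout=None` makes the call wait until the server answers.

```python
import json
import urllib.request
from typing import Any, Optional


def build_ask_request(document_path: str, question: str,
                      use_coarse_search: bool = True,
                      base_url: str = "http://localhost:8000") -> urllib.request.Request:
    payload = {
        "document_path": document_path,
        "question": question,
        "use_coarse_search": use_coarse_search,
    }
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(document_path: str, question: str, timeout: Optional[float] = None, **kwargs: Any) -> Any:
    """POST to /ask and return the decoded JSON body (schema not assumed)."""
    req = build_ask_request(document_path, question, **kwargs)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```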

Performance Testing

# Basic performance test
docsray perf-test document.pdf "What is this about?"

# Advanced testing with multiple iterations
docsray perf-test document.pdf "Analyze the key points" \
  --iterations 5 --port 8000 --host localhost

Force Model Re-download

# Re-download all models for base type
docsray download-models --model-type base --force

🔧 Technical Improvements

Resource Management

  • Embedding Models: Always downloaded regardless of model type
  • LLM Models: Downloaded selectively based on chosen model type
  • Memory Optimization: Better resource allocation based on system
    capabilities

Code Quality & Maintainability

  • Consistent Parameter Naming: All file path parameters unified as
    file_path
  • Environment Variable Integration: Seamless integration with
    DOCSRAY_MODEL_TYPE
  • Improved Error Handling: Better error messages and recovery mechanisms

📦 Installation & Upgrade

# New installation
pip install docsray==1.6.0

# Upgrade from previous version
pip install --upgrade docsray

# Download models for your preferred type
docsray download-models --model-type lite  # or base/pro

🔄 Migration Guide

For Existing Users

  1. CLI Commands: Update any scripts using the old --doc syntax
  2. API Integration: Update API calls to include document_path in request
    body
  3. Model Downloads: Re-run docsray download-models with your preferred
    --model-type

Breaking Changes

  • CLI Syntax: ask command now takes file path as positional argument
  • API Endpoints: /info endpoint moved to /cache/info
  • API Request Format: Document path now required in each request

🐛 Bug Fixes

  • Fixed model type selection not being applied during runtime
  • Improved error handling for missing model files
  • Enhanced file path resolution across different operating systems
  • Better cleanup of temporary files during document processing

📈 Performance Improvements

  • Reduced memory usage during model loading
  • Faster document processing with improved caching
  • Optimized embedding model selection based on system resources
  • Enhanced response times for cached documents

Full Changelog: https://github.com/MIMICLab/DocsRay/compare/v1.5.4...v1.6.0

PyPI Package: https://pypi.org/project/docsray/1.6.0/

For support and questions, please visit our issue tracker: https://github.com/MIMICLab/DocsRay/issues.

v1.5.4

03 Jun 07:58

DocsRay v1.5.4 Release Notes

Release Date: June 2025
Version: 1.5.4
Package: docsray

🎯 What's New

Universal Document Support

DocsRay now supports 30+ file formats with automatic conversion to PDF for seamless processing:

📄 Newly Supported Formats

  • Office Documents: Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt)
  • OpenDocument: .odt, .ods, .odp
  • Text Formats: Plain text (.txt), Markdown (.md), Rich Text (.rtf), reStructuredText (.rst)
  • Web Formats: HTML (.html, .htm), XML (.xml)
  • Image Formats: JPEG, PNG, GIF, BMP, TIFF, WebP
  • E-book Formats: EPUB (.epub), MOBI (.mobi)

🔄 Smart Auto-Conversion

  • Automatically detects file format and converts to PDF
  • Preserves original file metadata and structure
  • Supports both system tools (LibreOffice, Pandoc) and Python libraries
  • Graceful fallback options for maximum compatibility

Enhanced Hybrid OCR System

DocsRay now features a dual OCR approach for maximum text extraction accuracy:

🤖 AI-Powered OCR

  • Primary: Gemma-3-4B multimodal model for intelligent text recognition
  • Advanced understanding of document layout and structure
  • Better handling of complex formatting and multilingual content

⚑ Traditional OCR Fallback

  • Secondary: Pytesseract integration for speed and reliability
  • Automatic selection based on system configuration
  • Korean language support with tesseract-ocr-kor

Adaptive Performance Optimization

Intelligent system resource detection with automatic mode selection:

| System Memory | Processing Mode | OCR Support | Visual Analysis | Max Tokens |
|---|---|---|---|---|
| CPU Only | FAST (Q4) | ❌ | ❌ | 8K |
| < 16GB | FAST (Q4) | ✅ | ✅ | 8K |
| 16-24GB | STANDARD (Q8) | ✅ | ✅ | 16K |
| > 24GB | FULL_FEATURE (F16) | ✅ | ✅ | 32K |
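The table can be read as a simple decision function. In this sketch, "CPU Only" is taken to mean that no GPU was detected, "8K" is read as 8,192 tokens, and a machine with exactly 24GB is placed in STANDARD. Those boundary choices are assumptions, and select_mode is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeConfig:
    mode: str             # processing mode name from the table
    quantization: str     # model weight precision
    ocr: bool
    visual_analysis: bool
    max_tokens: int


def select_mode(ram_gb: float, has_gpu: bool) -> ModeConfig:
    """Map system resources to one row of the table (boundaries assumed)."""
    if not has_gpu:
        return ModeConfig("FAST", "Q4", False, False, 8 * 1024)
    if ram_gb < 16:
        return ModeConfig("FAST", "Q4", True, True, 8 * 1024)
    if ram_gb <= 24:
        return ModeConfig("STANDARD", "Q8", True, True, 16 * 1024)
    return ModeConfig("FULL_FEATURE", "F16", True, True, 32 * 1024)
```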

Enhanced MCP (Model Context Protocol) Integration

🔍 Dual Search Capabilities

  1. File System Search (search_files)

    • Recursive directory scanning with intelligent filtering
    • Advanced file type, size, and date filters
    • Smart exclusion of system directories
    • Progress tracking and cancellation support
  2. Content-Based Search (search_by_content)

    • Semantic search using document summary embeddings
    • GPU-accelerated similarity computation
    • Query enhancement with LLM assistance
    • Multiple detail levels (brief/standard/detailed)

📊 Intelligent Batch Processing

  • Unified Summary Generation: Process entire directories with summary embeddings
  • Smart File Detection: Automatically skips sensitive files (passwords, keys, etc.)
  • Performance Scaling: Optimizes processing based on available system resources
  • Cache Management: Efficient storage and retrieval of processed documents

🎯 Enhanced Directory Management

  • Recommended Search Paths: OS-specific suggestions for common document locations
  • Path Analysis: Complexity estimation and search time prediction
  • Interactive Setup: First-run directory configuration with automatic detection

Visual Content Analysis Improvements

👁️ Advanced Visual Processing

  • Multi-Image Analysis: Process multiple images per page in reading order
  • Vector Graphics Support: Analysis of charts, diagrams, and complex layouts
  • Smart Image Merging: Combine multiple visual elements for comprehensive analysis
  • Configurable Analysis Intervals: Process visuals every N pages for performance optimization

🔧 Global Visual Analysis Control

  • Environment Variable Support: DOCSRAY_DISABLE_VISUALS=1
  • Per-Document Settings: Override global settings for specific files
  • MCP Integration: Toggle visual analysis through Claude Desktop
  • Performance Awareness: Automatic disabling in resource-constrained environments
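This sketch shows how the global switch, a per-document override and the analysis interval could combine. Only DOCSRAY_DISABLE_VISUALS=1 and the every-N-pages interval come from these notes. Three details are assumptions: other values count as "enabled", a per-document setting takes precedence over the environment variable, and the helper names are hypothetical.

```python
import os
from typing import Optional


def visuals_disabled_globally() -> bool:
    # DOCSRAY_DISABLE_VISUALS=1 is the documented switch; other values are treated as "enabled" here.
    return os.environ.get("DOCSRAY_DISABLE_VISUALS") == "1"


def should_analyze_page(page_index: int,
                        interval: int = 1,
                        per_document: Optional[bool] = None) -> bool:
    """Decide whether to run visual analysis on a 0-based page index."""
    # A per-document setting (if given) overrides the global environment switch.
    enabled = per_document if per_document is not None else not visuals_disabled_globally()
    if not enabled:
        return False
    # Analyze every `interval`-th page to save time on long documents.
    return page_index % max(interval, 1) == 0
```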

🛠️ Technical Improvements

File Conversion Architecture

from docsray.scripts.file_converter import FileConverter

converter = FileConverter()

# Check format support
if converter.is_supported("document.docx"):
    # Convert any supported format
    success, pdf_path = converter.convert_to_pdf("document.docx")
    if success:
        print(f"Converted to {pdf_path}")

Enhanced API Endpoints

  • Universal Document Loading: All formats supported in web UI and API
  • Conversion Status Tracking: Real-time progress for file conversion
  • Format Detection: Automatic identification of file types
  • Metadata Preservation: Original format information retained

Improved Caching System

  • Hierarchical Caching: Section-level and document-level cache management
  • Summary Embeddings: Persistent storage of semantic representations
  • Cache Analytics: Detailed statistics and cleanup tools
  • Cross-Session Persistence: Maintain processed documents between sessions

📈 Performance Enhancements

GPU Acceleration

  • Optimized Vector Search: Batch processing with PyTorch GPU acceleration
  • Smart Device Selection: Automatic CUDA/MPS/CPU detection
  • Memory Management: Dynamic allocation based on available resources
  • Fallback Mechanisms: Graceful degradation for resource constraints
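A common PyTorch pattern for CUDA/MPS/CPU selection looks like the sketch below. The preference order is an assumption. If PyTorch is not installed, the helper simply reports `cpu`.

```python
def pick_device() -> str:
    """Return "cuda", "mps" or "cpu", in that order of preference."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing: CPU is the only option
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```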

Search Algorithm Improvements

  • Query Enhancement: LLM-powered query expansion for better results
  • Hybrid Scoring: Combination of title and content similarity
  • Partial Sorting: O(N log k) complexity for top-k retrieval
  • Vectorized Operations: Batch similarity computation
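Hybrid scoring and partial sorting fit together as follows. Each section is scored by a weighted mix of title and content cosine similarity, and only the top k are kept with a bounded heap. The 0.3 title weight and the tuple layout are hypothetical. heapq.nlargest is the standard-library way to get O(N log k) top-k selection.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_sections(query: Vector,
                   sections: Sequence[Tuple[str, Vector, Vector]],
                   k: int = 5,
                   title_weight: float = 0.3) -> List[Tuple[float, str]]:
    """Return the k best (score, section_id) pairs, highest first.

    Each section is (section_id, title_embedding, content_embedding).
    title_weight is a hypothetical blend factor, not a documented DocsRay default.
    """
    scored = (
        (title_weight * cosine(query, title) + (1 - title_weight) * cosine(query, content), sid)
        for sid, title, content in sections
    )
    # nlargest keeps a heap of size k: O(N log k) instead of O(N log N) for a full sort.
    return heapq.nlargest(k, scored)
```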

🔧 Configuration & Deployment

Environment Variables

# Custom data directory
export DOCSRAY_HOME=/path/to/custom/directory

# Force specific performance mode
export DOCSRAY_FAST_MODE=1

# Disable visual analysis globally
export DOCSRAY_DISABLE_VISUALS=1

# Enable debug mode
export DOCSRAY_DEBUG=1

New CLI Commands

# Process any supported document format
docsray process document.docx --analyze-visuals

# Enhanced model download with verification
docsray download-models --check

# Improved Claude Desktop configuration
docsray configure-claude

πŸ› Bug Fixes & Stability

Core Fixes

  • Memory Management: Improved handling of large documents and batch processing
  • Error Handling: More robust error recovery and user feedback
  • File System: Better handling of permissions and cross-platform paths
  • MCP Server: Enhanced stability and connection management

Compatibility Improvements

  • Python 3.8+ Support: Broader compatibility across Python versions
  • Cross-Platform: Enhanced Windows, macOS, and Linux support
  • Library Dependencies: More flexible dependency management
  • Model Loading: Improved error handling for missing or corrupted models

📚 Documentation & Examples

New Usage Patterns

# Universal document processing
from docsray import PDFChatBot
from docsray.scripts import pdf_extractor, chunker, build_index, section_rep_builder

# Process any document type - auto-conversion handled internally
extracted = pdf_extractor.extract_content(
    "report.xlsx",  # Excel spreadsheet
    analyze_visuals=True,
    visual_analysis_interval=1,
)

# Continue with normal workflow
chunks = chunker.process_extracted_file(extracted)
chunk_index = build_index.build_chunk_index(chunks)
sections = section_rep_builder.build_section_reps(extracted["sections"], chunk_index)

# Ask questions about the spreadsheet content
chatbot = PDFChatBot(sections, chunk_index)
answer, references = chatbot.answer("What are the quarterly sales figures?")

MCP Workflow Examples

# Batch process directory with summaries
"Process all documents in /path/to/folder with brief summaries"

# Content-based document discovery
"Search for documents about quarterly reports"
"Load the best matching document"

# Visual content analysis
"What charts and graphs are in this presentation?"
"Describe the diagram on slide 5"

🔄 Migration Guide

From v1.2.x to v1.3.0

New Dependencies (Optional)

# For optimal file conversion support
sudo apt-get install libreoffice pandoc wkhtm...

v1.2.1 Hotfix and Stable update

01 Jun 07:48

Hotfix applied for Gemma3 multimodal functionality.

  • Document search for the MCP server
  • Fast-mode test for visual analysis

📋 Features

  • Advanced RAG System: Coarse-to-Fine search for accurate document retrieval
  • Multimodal AI: Visual content analysis using Gemma-3-4B's image recognition capabilities
  • Hybrid OCR System: Intelligent selection between AI-powered OCR and traditional Pytesseract
  • Adaptive Performance: Automatically optimizes based on available system resources
  • Multi-Model Support: Uses BGE-M3, E5-Large, Gemma-3-1B, and Gemma-3-4B models
  • MCP Integration: Seamless integration with Claude Desktop
  • Multiple Interfaces: Web UI, API server, CLI, and MCP server
  • Directory Management: Advanced PDF directory handling and caching
  • Multi-Language: Supports multiple languages including Korean and English
  • Smart Resource Management: FAST_MODE, Standard, and FULL_FEATURE_MODE based on system specs
  • Universal Document Support: Automatically converts 30+ file formats to PDF for processing
  • Smart File Conversion: Handles Office documents, images, HTML, Markdown, and more

🎯 What's New in v1.0.x

Universal Document Support

DocsRay now automatically converts various document formats to PDF for processing:

Supported File Formats

Office Documents

  • Microsoft Word (.docx, .doc)
  • Microsoft Excel (.xlsx, .xls)
  • Microsoft PowerPoint (.pptx, .ppt)
  • OpenDocument formats (.odt, .ods, .odp)

Text Formats

  • Plain Text (.txt)
  • Markdown (.md)
  • Rich Text Format (.rtf)
  • reStructuredText (.rst)

Web Formats

  • HTML (.html, .htm)
  • XML (.xml)

Image Formats

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • GIF (.gif)
  • BMP (.bmp)
  • TIFF (.tiff, .tif)
  • WebP (.webp)

E-book Formats

  • EPUB (.epub)
  • MOBI (.mobi)

Automatic Conversion

Simply load any supported file type, and DocsRay will:

  1. Automatically detect the file format
  2. Convert it to PDF in the background
  3. Process it with all the same features as native PDFs
  4. Clean up temporary files automatically

# Works with any supported format!
docsray process /path/to/document.docx
docsray process /path/to/spreadsheet.xlsx
docsray process /path/to/image.png
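Steps 1 and 4 of this flow can be sketched as follows: classify the file by extension using the supported-format list, and run the conversion inside a temporary directory that is removed automatically. `convert` is a hypothetical callable standing in for the real converter. Extension-based detection is a simplification, and content sniffing is not shown.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Callable, Iterator

# Extension groups copied from the "Supported File Formats" list.
FORMAT_GROUPS = {
    "office": {".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt", ".odt", ".ods", ".odp"},
    "text": {".txt", ".md", ".rtf", ".rst"},
    "web": {".html", ".htm", ".xml"},
    "image": {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".tif", ".webp"},
    "ebook": {".epub", ".mobi"},
}


def detect_format(path: str) -> str:
    """Step 1: classify a file by its (case-insensitive) extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".pdf":
        return "pdf"
    for group, extensions in FORMAT_GROUPS.items():
        if ext in extensions:
            return group
    raise ValueError(f"unsupported file type: {path}")


@contextmanager
def as_pdf(path: str, convert: Callable[[str, str], str]) -> Iterator[str]:
    """Yield a PDF path for `path`; converted output is deleted on exit (step 4)."""
    if detect_format(path) == "pdf":
        yield path  # native PDF: nothing to convert or clean up
        return
    with tempfile.TemporaryDirectory() as tmp_dir:
        yield convert(path, tmp_dir)  # hypothetical converter writes into tmp_dir
```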

Hybrid OCR System

DocsRay now features an intelligent hybrid OCR system that automatically selects the optimal OCR method based on your system resources:

  • FULL_FEATURE_MODE (RAM > 32GB): AI-powered OCR using Gemma-3-4B model

    • Accurately recognizes complex layouts and multilingual text
    • Understands context when extracting text from tables, charts, and diagrams
    • Significantly improves text quality from scanned PDFs
  • Standard Mode (RAM 8-32GB): Traditional Pytesseract-based OCR

    • Stable and fast text extraction
    • Multi-language support (including Korean)
  • FAST_MODE (RAM < 8GB): OCR disabled

    • Memory efficiency prioritized
    • Processes only PDFs with embedded text

Adaptive Performance Optimization

Automatically detects system resources and optimizes performance:

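# Automatic resource detection and mode configuration
import psutil  # assumption: psutil as the RAM probe

available_ram_gb = psutil.virtual_memory().available / 1024**3
FULL_FEATURE_MODE = False
FAST_MODE = False

if available_ram_gb > 32:
    FULL_FEATURE_MODE = True  # All features enabled
elif available_ram_gb < 4:
    FAST_MODE = True  # Lightweight mode
else:
    pass  # Standard mode (balanced performance)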

Enhanced MCP Commands

  • Cache Management: clear_all_cache, get_cache_info
  • Improved Summarization: Batch processing with section-by-section caching
  • Detail Levels: Adjustable summary detail (brief/standard/detailed)

v1.0.0

01 Jun 04:16

📋 Features

  • Advanced RAG System: Coarse-to-Fine search for accurate document retrieval
  • Multimodal AI: Visual content analysis using Gemma-3-4B's image recognition capabilities
  • Hybrid OCR System: Intelligent selection between AI-powered OCR and traditional Pytesseract
  • Adaptive Performance: Automatically optimizes based on available system resources
  • Multi-Model Support: Uses BGE-M3, E5-Large, Gemma-3-1B, and Gemma-3-4B models
  • MCP Integration: Seamless integration with Claude Desktop
  • Multiple Interfaces: Web UI, API server, CLI, and MCP server
  • Directory Management: Advanced PDF directory handling and caching
  • Multi-Language: Supports multiple languages including Korean and English
  • Smart Resource Management: FAST_MODE, Standard, and FULL_FEATURE_MODE based on system specs
  • Universal Document Support: Automatically converts 30+ file formats to PDF for processing
  • Smart File Conversion: Handles Office documents, images, HTML, Markdown, and more

🎯 What's New in v1.0.x

Universal Document Support

DocsRay now automatically converts various document formats to PDF for processing:

Supported File Formats

Office Documents

  • Microsoft Word (.docx, .doc)
  • Microsoft Excel (.xlsx, .xls)
  • Microsoft PowerPoint (.pptx, .ppt)
  • OpenDocument formats (.odt, .ods, .odp)

Text Formats

  • Plain Text (.txt)
  • Markdown (.md)
  • Rich Text Format (.rtf)
  • reStructuredText (.rst)

Web Formats

  • HTML (.html, .htm)
  • XML (.xml)

Image Formats

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • GIF (.gif)
  • BMP (.bmp)
  • TIFF (.tiff, .tif)
  • WebP (.webp)

E-book Formats

  • EPUB (.epub)
  • MOBI (.mobi)

Automatic Conversion

Simply load any supported file type, and DocsRay will:

  1. Automatically detect the file format
  2. Convert it to PDF in the background
  3. Process it with all the same features as native PDFs
  4. Clean up temporary files automatically

# Works with any supported format!
docsray process /path/to/document.docx
docsray process /path/to/spreadsheet.xlsx
docsray process /path/to/image.png

Hybrid OCR System

DocsRay now features an intelligent hybrid OCR system that automatically selects the optimal OCR method based on your system resources:

  • FULL_FEATURE_MODE (RAM > 32GB): AI-powered OCR using Gemma-3-4B model

    • Accurately recognizes complex layouts and multilingual text
    • Understands context when extracting text from tables, charts, and diagrams
    • Significantly improves text quality from scanned PDFs
  • Standard Mode (RAM 8-32GB): Traditional Pytesseract-based OCR

    • Stable and fast text extraction
    • Multi-language support (including Korean)
  • FAST_MODE (RAM < 8GB): OCR disabled

    • Memory efficiency prioritized
    • Processes only PDFs with embedded text

Adaptive Performance Optimization

Automatically detects system resources and optimizes performance:

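# Automatic resource detection and mode configuration
import psutil  # assumption: psutil as the RAM probe

available_ram_gb = psutil.virtual_memory().available / 1024**3
FULL_FEATURE_MODE = False
FAST_MODE = False

if available_ram_gb > 32:
    FULL_FEATURE_MODE = True  # All features enabled
elif available_ram_gb < 4:
    FAST_MODE = True  # Lightweight mode
else:
    pass  # Standard mode (balanced performance)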

Enhanced MCP Commands

  • Cache Management: clear_all_cache, get_cache_info
  • Improved Summarization: Batch processing with section-by-section caching
  • Detail Levels: Adjustable summary detail (brief/standard/detailed)

📊 Performance Optimization Guide

Recommended Settings by Memory

| System Memory | Mode | OCR | Visual Analysis | Max Tokens |
|---|---|---|---|---|
| < 4GB | FAST_MODE | ❌ | ❌ | 4,096 |
| 4-8GB | FAST_MODE | ✅ (Pytesseract) | Limited | 8K |
| 8-16GB | Standard | ✅ (Pytesseract) | ✅ | 16K |
| 16-32GB | Standard | ✅ (Pytesseract) | ✅ | 32K |
| > 32GB | FULL_FEATURE | ✅ (AI OCR) | ✅ | 128K |

v0.3.0

31 May 03:28

Full Changelog: v0.2.0...v0.3.0

  • New feature update: document summarization

v0.2.0

30 May 08:13
73eead3
