Releases: MIMICLab/DocsRay
v1.9.0
[1.9.0] - 2025-02-01
Added
- LibreOffice Integration: Enhanced document conversion capabilities
- Automatic detection and use of LibreOffice when available
- Improved conversion quality for Office documents (DOCX, XLSX, PPTX)
- Better support for OpenDocument formats (ODT, ODS, ODP)
- Enhanced HWP/HWPX document handling with LibreOffice
- Fallback mechanisms when LibreOffice is not available
Changed
- File converter now prioritizes LibreOffice for office document conversions
- Improved error messages and conversion feedback
- Better handling of conversion failures with automatic fallback methods
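The detect-then-fall-back behaviour listed above could be sketched roughly as follows. This is a minimal illustration, not DocsRay's actual converter: the function names are hypothetical, and it assumes the LibreOffice executable is reachable on PATH as `soffice` or `libreoffice`. The `--headless`, `--convert-to`, and `--outdir` options are standard LibreOffice command-line flags.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def find_libreoffice() -> Optional[str]:
    """Return the path of a LibreOffice executable on PATH, or None."""
    for name in ("soffice", "libreoffice"):
        path = shutil.which(name)
        if path:
            return path
    return None


def build_convert_command(binary: str, source: str, out_dir: str) -> List[str]:
    """Build a headless LibreOffice command that converts `source` to PDF."""
    return [binary, "--headless", "--convert-to", "pdf",
            "--outdir", str(Path(out_dir)), str(Path(source))]


def choose_converter(source: str, out_dir: str) -> List[str]:
    """Prefer LibreOffice; return an empty command when it is unavailable."""
    binary = find_libreoffice()
    if binary is None:
        # Hypothetical marker: a real converter would switch to a
        # Python-library fallback here instead of returning nothing.
        return []
    return build_convert_command(binary, source, out_dir)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; an empty list signals that a fallback method is needed.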
v1.8.0
Changelog
All notable changes to this project will be documented in this file.
[1.8.0] - 2025-01-31
Added
- Video Input Support: Process and extract information from video files
- Automatic audio extraction from video formats
- Frame extraction for visual content analysis
- Support for common video formats (MP4, AVI, MOV, etc.)
- Audio Input Support: Direct processing of audio files
- Transcription using faster-whisper for speech-to-text
- Support for various audio formats (MP3, WAV, M4A, etc.)
- Multimedia Processing Pipeline: New multimedia_processor.py module
- Unified interface for handling video and audio inputs
- Automatic format detection and conversion
- Integration with existing document processing pipeline
Changed
- Enhanced file converter to support multimedia file types
- Updated dependencies to include faster-whisper for audio transcription
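As a rough sketch of the format detection and audio extraction steps in this release: the helper names below are hypothetical, the extension sets contain only the formats named in these notes, and the use of the `ffmpeg` command-line tool is an assumption (the notes do not say how audio is extracted).

```python
from pathlib import Path
from typing import List

# Extensions named in the release notes; the real module may accept more.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def classify_media(path: str) -> str:
    """Return 'video', 'audio', or 'other' based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return "other"


def audio_extraction_command(video_path: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command that drops the video stream (-vn) and writes
    16 kHz mono PCM audio, a common input format for speech-to-text."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn",
            "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", wav_path]
```

The extracted WAV file could then be handed to a transcription model such as faster-whisper.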
[1.7.2] - 2025-01-26
Added
- Configurable --timeout parameter for perf-test command
- Allows custom request timeout in seconds
- No timeout if parameter is not specified (replaces hardcoded 300 seconds)
Changed
- Modified perf-test command to accept optional timeout parameter
- Updated error messages to show actual timeout value instead of hardcoded 300 seconds
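One way to express an optional timeout whose absence means "no timeout" is an argparse option with a default of `None`. The sketch below is illustrative only; the parser layout and the helper name `describe_timeout` are assumptions, not the project's code.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="perf-test")
    # default=None means "wait indefinitely" rather than a hardcoded 300 s.
    parser.add_argument("--timeout", type=float, default=None,
                        help="request timeout in seconds (no timeout if omitted)")
    return parser


def describe_timeout(argv: List[str]) -> str:
    """Return a message that reports the timeout actually in effect."""
    timeout: Optional[float] = build_parser().parse_args(argv).timeout
    if timeout is None:
        return "no timeout"
    return f"timeout after {timeout:g} seconds"
```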
[1.7.1] - 2025-01-25
Added
- Auto-restart functionality for Web, API, and MCP servers with --auto-restart flag
- Request timeout monitoring for API server (triggers restart on timeout when auto-restart is enabled)
- Optional --max-retries parameter (unlimited retries if not specified)
- Configurable --retry-delay parameter for restart attempts
Changed
- --timeout parameter is now optional for both web and API (no timeout if not specified)
- --pages parameter is now optional for web interface (process all pages if not specified)
- Updated FastAPI from deprecated @app.on_event to modern lifespan context manager
- API server now tracks request processing activity via /activity endpoint
Fixed
- Fixed deprecation warning in FastAPI shutdown event handler
- Improved process cleanup in auto-restart monitor
- Better handling of zombie processes when restarting services
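The restart semantics described in this release (unlimited retries unless a maximum is given, with a delay between attempts) can be sketched as a small loop. This is an assumption-laden illustration: `run_with_restart` is a hypothetical name, the 5-second default delay is invented for the example, and the real monitor supervises server processes rather than a Python callable.

```python
import time
from typing import Callable, Optional


def run_with_restart(start: Callable[[], None],
                     max_retries: Optional[int] = None,
                     retry_delay: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `start` until it returns normally; restart it when it raises.

    max_retries=None means unlimited restarts. Returns the number of
    restarts that were needed.
    """
    restarts = 0
    while True:
        try:
            start()
            return restarts
        except Exception:
            # Give up only when a retry limit exists and has been reached.
            if max_retries is not None and restarts >= max_retries:
                raise
            restarts += 1
            sleep(retry_delay)
```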
[1.7.0] - 2025-01-24
Changed
- BREAKING CHANGE: Modified embedding synthesis method from element-wise addition to concatenation
- This change doubles the embedding dimension by concatenating two model outputs instead of adding them
- Results in better semantic representation but requires reindexing of existing documents
Technical Details
- Changed np.add(emb_1, emb_2) to np.concatenate([emb_1, emb_2]) in get_embedding method
- Updated batch processing in get_embeddings to use list comprehension with concatenation
- Both embeddings are now L2-normalized after concatenation
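To make the dimension change concrete, the sketch below contrasts the two synthesis methods on toy vectors. It assumes both model outputs have the same length and that normalization is applied to the combined vector; the function names are hypothetical and this is not the project's `get_embedding` implementation.

```python
import numpy as np


def l2_normalize(vec: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def combine_by_addition(emb_1: np.ndarray, emb_2: np.ndarray) -> np.ndarray:
    """Element-wise sum: the output keeps the input dimension."""
    return l2_normalize(np.add(emb_1, emb_2))


def combine_by_concatenation(emb_1: np.ndarray, emb_2: np.ndarray) -> np.ndarray:
    """Concatenation: the output dimension is the sum of both inputs."""
    return l2_normalize(np.concatenate([emb_1, emb_2]))
```

Because the concatenated vectors are twice as long, an index built from summed embeddings cannot be compared against them, which is why reindexing is required.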
Removed
- Removed mteb_embedding.py test file
[1.6.2] - Previous Release
- Previous release details...
v1.7.1
Changed
- BREAKING CHANGE: Modified embedding synthesis method from element-wise addition to concatenation
- This change doubles the embedding dimension by concatenating two model outputs instead of adding them
- Results in better semantic representation but requires reindexing of existing documents
- --timeout parameter is now optional for both web and API (no timeout if not specified)
- --pages parameter is now optional for web interface (process all pages if not specified)
- Updated FastAPI from deprecated @app.on_event to modern lifespan context manager
- API server now tracks request processing activity via /activity endpoint
Added
- Auto-restart functionality for Web, API, and MCP servers with --auto-restart flag
- Request timeout monitoring for API server (triggers restart on timeout when auto-restart is enabled)
- Optional --max-retries parameter (unlimited retries if not specified)
- Configurable --retry-delay parameter for restart attempts
Fixed
- Fixed deprecation warning in FastAPI shutdown event handler
- Improved process cleanup in auto-restart monitor
- Better handling of zombie processes when restarting services
Technical Details
- Changed np.add(emb_1, emb_2) to np.concatenate([emb_1, emb_2]) in get_embedding method
- Updated batch processing in get_embeddings to use list comprehension with concatenation
- Both embeddings are now L2-normalized after concatenation
Removed
- Removed mteb_embedding.py test file
[1.6.2] - Previous Release
- Previous release details...
v1.6.0
DocsRay v1.6.0 Release Notes
Major Features & Improvements
Enhanced Model Selection System
- Flexible Model Types: Choose between lite (4b), base (12b), and pro (27b) models using the new --model-type option
- Selective Downloads: Download only the model type you need, significantly reducing storage requirements
- Runtime Model Selection: Models are selected based on environment variables, allowing dynamic switching
Redesigned API Architecture
- Per-Request Document Processing: API now accepts document paths with each request instead of pre-loading documents
- Automatic Document Caching: Documents are automatically processed and cached on first access
- New API Endpoints:
  - /cache/info - View cached document information
  - /cache/clear - Clear document cache
- Enhanced Flexibility: No need to restart the API server when switching between documents
Performance Testing & Monitoring
- New perf-test Command: Comprehensive API performance benchmarking tool
- Detailed Metrics: Track response times, success rates, and cache performance
- Iterative Testing: Run multiple test iterations with statistical analysis
- Cache Performance Analysis: Separate timing for first-time vs cached requests
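The idea of timing the first (uncached) request separately from later (cached) ones can be sketched with the standard library. The function name `benchmark` and the returned keys are hypothetical; the actual perf-test command may report different metrics.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(request: Callable[[], object], iterations: int = 5) -> Dict[str, float]:
    """Time `request` repeatedly; report the first (uncached) call separately."""
    if iterations < 2:
        raise ValueError("need at least 2 iterations to separate first and cached calls")
    timings = []
    successes = 0
    for _ in range(iterations):
        started = time.perf_counter()
        try:
            request()
            successes += 1
        except Exception:
            pass  # a failed request still contributes a timing sample
        timings.append(time.perf_counter() - started)
    cached = timings[1:]
    return {
        "first_request_s": timings[0],
        "cached_mean_s": statistics.mean(cached),
        "cached_stdev_s": statistics.stdev(cached) if len(cached) > 1 else 0.0,
        "success_rate": successes / iterations,
    }
```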
Improved CLI Interface
- Unified Argument Structure: Consistent file path arguments across all commands
- Simplified Syntax:
  - Old: docsray ask "question" --doc file.pdf
  - New: docsray ask file.pdf "question"
- Force Download Option: --force flag to re-download existing models
Usage Examples
Model Selection
# Download lite models (4b, ~3GB)
docsray download-models --model-type lite
# Use base models for web interface (12b, ~8GB)
docsray web --model-type base
# Process documents with pro models (27b, ~16GB)
docsray process document.pdf --model-type pro
New API Usage
# Start API server (no pre-loading required)
docsray api --port 8000
# Ask questions about any document
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "document_path": "/path/to/document.pdf",
    "question": "What is the main topic?",
    "use_coarse_search": true
  }'
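The same request can be assembled from Python with only the standard library. The sketch below builds the request without sending it; the helper name `build_ask_request` is hypothetical, and the JSON fields mirror the curl example above. The response format is not described in these notes, so it is left unparsed.

```python
import json
import urllib.request


def build_ask_request(document_path: str, question: str,
                      base_url: str = "http://localhost:8000",
                      use_coarse_search: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST /ask request with a JSON body."""
    payload = {
        "document_path": document_path,
        "question": question,
        "use_coarse_search": use_coarse_search,
    }
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the API server running, the request can be sent with `urllib.request.urlopen(req)`.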
Performance Testing
# Basic performance test
docsray perf-test document.pdf "What is this about?"
# Advanced testing with multiple iterations
docsray perf-test document.pdf "Analyze the key points" \
  --iterations 5 --port 8000 --host localhost
Force Model Re-download
# Re-download all models for base type
docsray download-models --model-type base --force
Technical Improvements
Resource Management
- Embedding Models: Always downloaded regardless of model type
- LLM Models: Downloaded selectively based on chosen model type
- Memory Optimization: Better resource allocation based on system capabilities
Code Quality & Maintainability
- Consistent Parameter Naming: All file path parameters unified as file_path
- Environment Variable Integration: Seamless integration with DOCSRAY_MODEL_TYPE
- Improved Error Handling: Better error messages and recovery mechanisms
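Runtime selection through DOCSRAY_MODEL_TYPE might be resolved along these lines. This is a sketch under stated assumptions: the function name is hypothetical, and the fallback to "lite" when the variable is unset is an assumption, not documented behaviour.

```python
import os
from typing import Mapping

# Sizes as listed in the release notes: lite (4b), base (12b), pro (27b).
MODEL_SIZES = {"lite": "4b", "base": "12b", "pro": "27b"}


def resolve_model_type(env: Mapping[str, str] = os.environ,
                       default: str = "lite") -> str:
    """Read DOCSRAY_MODEL_TYPE and validate it against the known types."""
    value = env.get("DOCSRAY_MODEL_TYPE", default).strip().lower()
    if value not in MODEL_SIZES:
        raise ValueError(
            f"unknown model type {value!r}; expected one of {sorted(MODEL_SIZES)}")
    return value
```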
Installation & Upgrade
# New installation
pip install docsray==1.6.0
# Upgrade from previous version
pip install --upgrade docsray
# Download models for your preferred type
docsray download-models --model-type lite  # or base/pro
Migration Guide
For Existing Users
- CLI Commands: Update any scripts using the old --doc syntax
- API Integration: Update API calls to include document_path in request body
- Model Downloads: Re-run docsray download-models with your preferred --model-type
Breaking Changes
- CLI Syntax: ask command now takes file path as positional argument
- API Endpoints: /info endpoint moved to /cache/info
- API Request Format: Document path now required in each request
Bug Fixes
- Fixed model type selection not being applied during runtime
- Improved error handling for missing model files
- Enhanced file path resolution across different operating systems
- Better cleanup of temporary files during document processing
Performance Improvements
- Reduced memory usage during model loading
- Faster document processing with improved caching
- Optimized embedding model selection based on system resources
- Enhanced response times for cached documents
Full Changelog: https://github.com/your-repo/DocsRay/compare/v1.5.4...v1.6.0
PyPI Package: https://pypi.org/project/docsray/1.6.0/
For support and questions, please visit our issue tracker: https://github.com/your-repo/DocsRay/issues
v1.5.4
DocsRay v1.5.4 Release Notes
Release Date: June 2025
Version: 1.5.4
Package: docsray
What's New
Universal Document Support
DocsRay now supports 30+ file formats with automatic conversion to PDF for seamless processing:
Newly Supported Formats
- Office Documents: Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt)
- OpenDocument: .odt, .ods, .odp
- Text Formats: Plain text (.txt), Markdown (.md), Rich Text (.rtf), reStructuredText (.rst)
- Web Formats: HTML (.html, .htm), XML (.xml)
- Image Formats: JPEG, PNG, GIF, BMP, TIFF, WebP
- E-book Formats: EPUB (.epub), MOBI (.mobi)
Smart Auto-Conversion
- Automatically detects file format and converts to PDF
- Preserves original file metadata and structure
- Supports both system tools (LibreOffice, Pandoc) and Python libraries
- Graceful fallback options for maximum compatibility
Enhanced Hybrid OCR System
DocsRay now features a dual OCR approach for maximum text extraction accuracy:
AI-Powered OCR
- Primary: Gemma-3-4B multimodal model for intelligent text recognition
- Advanced understanding of document layout and structure
- Better handling of complex formatting and multilingual content
Traditional OCR Fallback
- Secondary: Pytesseract integration for speed and reliability
- Automatic selection based on system configuration
- Korean language support with tesseract-ocr-kor
Adaptive Performance Optimization
Intelligent system resource detection with automatic mode selection:
| System Memory | Processing Mode | OCR Support | Visual Analysis | Max Tokens |
|---|---|---|---|---|
| CPU Only | FAST (Q4) | β | β | 8K |
| < 16GB | FAST (Q4) | β | β | 8K |
| 16-24GB | STANDARD (Q8) | β | β | 16K |
| > 24GB | FULL_FEATURE (F16) | β | β | 32K |
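The mode column of the table above can be read as a simple threshold function. The sketch below is illustrative only: `select_mode` is a hypothetical name, the handling of the exact 16 GB and 24 GB boundaries is an assumption, and "8K" is interpreted as 8 × 1024 tokens, which the notes do not confirm.

```python
def select_mode(memory_gb: float, cpu_only: bool = False) -> dict:
    """Map system memory (in GB) to a processing mode, per the table."""
    if cpu_only or memory_gb < 16:
        return {"mode": "FAST", "quantization": "Q4", "max_tokens": 8 * 1024}
    if memory_gb <= 24:
        return {"mode": "STANDARD", "quantization": "Q8", "max_tokens": 16 * 1024}
    return {"mode": "FULL_FEATURE", "quantization": "F16", "max_tokens": 32 * 1024}
```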
Enhanced MCP (Model Context Protocol) Integration
Dual Search Capabilities
- File System Search (search_files)
  - Recursive directory scanning with intelligent filtering
  - Advanced file type, size, and date filters
  - Smart exclusion of system directories
  - Progress tracking and cancellation support
- Content-Based Search (search_by_content)
  - Semantic search using document summary embeddings
  - GPU-accelerated similarity computation
  - Query enhancement with LLM assistance
  - Multiple detail levels (brief/standard/detailed)
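Recursive scanning with directory exclusion and file filters can be sketched with `os.walk`. This is not the MCP tool itself: `find_documents` is a hypothetical helper, and the excluded directory names are placeholders because the notes do not list the actual exclusions.

```python
import os
from typing import Iterable, Iterator, Optional

# Hypothetical defaults; the actual exclusion list is not given in the notes.
EXCLUDED_DIRS = {".git", "node_modules", "__pycache__"}


def find_documents(root: str,
                   extensions: Iterable[str] = (".pdf",),
                   max_size_bytes: Optional[int] = None) -> Iterator[str]:
    """Yield files under `root` that match the extension and size filters."""
    wanted = {ext.lower() for ext in extensions}
    for current, dirs, files in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
        for name in files:
            if os.path.splitext(name)[1].lower() not in wanted:
                continue
            path = os.path.join(current, name)
            if max_size_bytes is not None and os.path.getsize(path) > max_size_bytes:
                continue
            yield path
```

Writing it as a generator lets a caller stop iterating early, which is one simple way to support cancellation.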
Intelligent Batch Processing
- Unified Summary Generation: Process entire directories with summary embeddings
- Smart File Detection: Automatically skips sensitive files (passwords, keys, etc.)
- Performance Scaling: Optimizes processing based on available system resources
- Cache Management: Efficient storage and retrieval of processed documents
Enhanced Directory Management
- Recommended Search Paths: OS-specific suggestions for common document locations
- Path Analysis: Complexity estimation and search time prediction
- Interactive Setup: First-run directory configuration with automatic detection
Visual Content Analysis Improvements
Advanced Visual Processing
- Multi-Image Analysis: Process multiple images per page in reading order
- Vector Graphics Support: Analysis of charts, diagrams, and complex layouts
- Smart Image Merging: Combine multiple visual elements for comprehensive analysis
- Configurable Analysis Intervals: Process visuals every N pages for performance optimization
Global Visual Analysis Control
- Environment Variable Support: DOCSRAY_DISABLE_VISUALS=1
- Per-Document Settings: Override global settings for specific files
- MCP Integration: Toggle visual analysis through Claude Desktop
- Performance Awareness: Automatic disabling in resource-constrained environments
Technical Improvements
File Conversion Architecture
from docsray.scripts.file_converter import FileConverter

converter = FileConverter()

# Check format support
if converter.is_supported("document.docx"):
    # Convert any supported format
    success, pdf_path = converter.convert_to_pdf("document.docx")
Enhanced API Endpoints
- Universal Document Loading: All formats supported in web UI and API
- Conversion Status Tracking: Real-time progress for file conversion
- Format Detection: Automatic identification of file types
- Metadata Preservation: Original format information retained
Improved Caching System
- Hierarchical Caching: Section-level and document-level cache management
- Summary Embeddings: Persistent storage of semantic representations
- Cache Analytics: Detailed statistics and cleanup tools
- Cross-Session Persistence: Maintain processed documents between sessions
Performance Enhancements
GPU Acceleration
- Optimized Vector Search: Batch processing with PyTorch GPU acceleration
- Smart Device Selection: Automatic CUDA/MPS/CPU detection
- Memory Management: Dynamic allocation based on available resources
- Fallback Mechanisms: Graceful degradation for resource constraints
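Automatic CUDA/MPS/CPU detection with graceful degradation could look like the sketch below. The function name `pick_device` is hypothetical; the calls `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are standard PyTorch APIs, and the import is guarded so the function still returns "cpu" when PyTorch is not installed.

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu', preferring the fastest available."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing: degrade gracefully to CPU
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```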
Search Algorithm Improvements
- Query Enhancement: LLM-powered query expansion for better results
- Hybrid Scoring: Combination of title and content similarity
- Partial Sorting: O(N log k) complexity for top-k retrieval
- Vectorized Operations: Batch similarity computation
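The O(N log k) bound for top-k retrieval is what a bounded heap gives. A minimal sketch using the standard library (the function name `top_k` is hypothetical, and the project may use a different partial-sort routine):

```python
import heapq
from typing import List, Sequence, Tuple


def top_k(scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return the k best (index, score) pairs, highest score first.

    heapq.nlargest keeps a heap of at most k items, so the cost is
    O(N log k) instead of the O(N log N) of a full sort.
    """
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
```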
Configuration & Deployment
Environment Variables
# Custom data directory
export DOCSRAY_HOME=/path/to/custom/directory
# Force specific performance mode
export DOCSRAY_FAST_MODE=1
# Disable visual analysis globally
export DOCSRAY_DISABLE_VISUALS=1
# Enable debug mode
export DOCSRAY_DEBUG=1
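On the Python side, these variables could be read roughly as follows. This is a sketch, not the project's configuration loader: the helper names are hypothetical, treating only the value "1" as enabled is an assumption based on the examples above, and the `~/.docsray` default for DOCSRAY_HOME is invented for illustration.

```python
import os
from pathlib import Path
from typing import Mapping


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Treat '1' as enabled, matching the `export NAME=1` examples."""
    return env.get(name, "").strip() == "1"


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the DocsRay-related environment settings into a dict."""
    # The ~/.docsray default is an assumption made for this sketch.
    home = Path(env.get("DOCSRAY_HOME", str(Path.home() / ".docsray")))
    return {
        "home": home,
        "fast_mode": env_flag(env, "DOCSRAY_FAST_MODE"),
        "disable_visuals": env_flag(env, "DOCSRAY_DISABLE_VISUALS"),
        "debug": env_flag(env, "DOCSRAY_DEBUG"),
    }
```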
New CLI Commands
# Process any supported document format
docsray process document.docx --analyze-visuals
# Enhanced model download with verification
docsray download-models --check
# Improved Claude Desktop configuration
docsray configure-claude
Bug Fixes & Stability
Core Fixes
- Memory Management: Improved handling of large documents and batch processing
- Error Handling: More robust error recovery and user feedback
- File System: Better handling of permissions and cross-platform paths
- MCP Server: Enhanced stability and connection management
Compatibility Improvements
- Python 3.8+ Support: Broader compatibility across Python versions
- Cross-Platform: Enhanced Windows, macOS, and Linux support
- Library Dependencies: More flexible dependency management
- Model Loading: Improved error handling for missing or corrupted models
Documentation & Examples
New Usage Patterns
# Universal document processing
from docsray import PDFChatBot
from docsray.scripts import pdf_extractor, chunker, build_index, section_rep_builder

# Process any document type - auto-conversion handled internally
extracted = pdf_extractor.extract_content(
    "report.xlsx",  # Excel spreadsheet
    analyze_visuals=True,
    visual_analysis_interval=1
)

# Continue with normal workflow
chunks = chunker.process_extracted_file(extracted)
chunk_index = build_index.build_chunk_index(chunks)
sections = section_rep_builder.build_section_reps(extracted["sections"], chunk_index)

# Ask questions about the spreadsheet content
chatbot = PDFChatBot(sections, chunk_index)
answer, references = chatbot.answer("What are the quarterly sales figures?")
MCP Workflow Examples
# Batch process directory with summaries
"Process all documents in /path/to/folder with brief summaries"
# Content-based document discovery
"Search for documents about quarterly reports"
"Load the best matching document"
# Visual content analysis
"What charts and graphs are in this presentation?"
"Describe the diagram on slide 5"
Migration Guide
From v1.2.x to v1.3.0
New Dependencies (Optional)
# For optimal file conversion support
sudo apt-get install libreoffice pandoc wkhtm...
v1.2.1 Hotfix and Stable update
Hotfix applied for Gemma3 multimodal functionality.
- document search for mcp server
- fast mode test for visual analysis
Features
- Advanced RAG System: Coarse-to-Fine search for accurate document retrieval
- Multimodal AI: Visual content analysis using Gemma-3-4B's image recognition capabilities
- Hybrid OCR System: Intelligent selection between AI-powered OCR and traditional Pytesseract
- Adaptive Performance: Automatically optimizes based on available system resources
- Multi-Model Support: Uses BGE-M3, E5-Large, Gemma-3-1B, and Gemma-3-4B models
- MCP Integration: Seamless integration with Claude Desktop
- Multiple Interfaces: Web UI, API server, CLI, and MCP server
- Directory Management: Advanced PDF directory handling and caching
- Multi-Language: Supports multiple languages including Korean and English
- Smart Resource Management: FAST_MODE, Standard, and FULL_FEATURE_MODE based on system specs
- Universal Document Support: Automatically converts 30+ file formats to PDF for processing
- Smart File Conversion: Handles Office documents, images, HTML, Markdown, and more
What's New in v1.0.x
Universal Document Support
DocsRay now automatically converts various document formats to PDF for processing:
Supported File Formats
Office Documents
- Microsoft Word (.docx, .doc)
- Microsoft Excel (.xlsx, .xls)
- Microsoft PowerPoint (.pptx, .ppt)
- OpenDocument formats (.odt, .ods, .odp)
Text Formats
- Plain Text (.txt)
- Markdown (.md)
- Rich Text Format (.rtf)
- reStructuredText (.rst)
Web Formats
- HTML (.html, .htm)
- XML (.xml)
Image Formats
- JPEG (.jpg, .jpeg)
- PNG (.png)
- GIF (.gif)
- BMP (.bmp)
- TIFF (.tiff, .tif)
- WebP (.webp)
E-book Formats
- EPUB (.epub)
- MOBI (.mobi)
Automatic Conversion
Simply load any supported file type, and DocsRay will:
- Automatically detect the file format
- Convert it to PDF in the background
- Process it with all the same features as native PDFs
- Clean up temporary files automatically
# Works with any supported format!
docsray process /path/to/document.docx
docsray process /path/to/spreadsheet.xlsx
docsray process /path/to/image.png
Hybrid OCR System
DocsRay now features an intelligent hybrid OCR system that automatically selects the optimal OCR method based on your system resources:
- FULL_FEATURE_MODE (RAM > 32GB): AI-powered OCR using Gemma-3-4B model
  - Accurately recognizes complex layouts and multilingual text
  - Understands context when extracting text from tables, charts, and diagrams
  - Significantly improves text quality from scanned PDFs
- Standard Mode (RAM 8-32GB): Traditional Pytesseract-based OCR
  - Stable and fast text extraction
  - Multi-language support (including Korean)
- FAST_MODE (RAM < 8GB): OCR disabled
  - Memory efficiency prioritized
  - Processes only PDFs with embedded text
Adaptive Performance Optimization
Automatically detects system resources and optimizes performance:
# Automatic resource detection and mode configuration
# (available_ram is expressed in GB)
if available_ram > 32:
    FULL_FEATURE_MODE = True  # All features enabled
elif available_ram < 4:
    FAST_MODE = True  # Lightweight mode
else:
    pass  # Standard mode (balanced performance)
Enhanced MCP Commands
- Cache Management: clear_all_cache, get_cache_info
- Improved Summarization: Batch processing with section-by-section caching
- Detail Levels: Adjustable summary detail (brief/standard/detailed)
v1.0.0
Features
- Advanced RAG System: Coarse-to-Fine search for accurate document retrieval
- Multimodal AI: Visual content analysis using Gemma-3-4B's image recognition capabilities
- Hybrid OCR System: Intelligent selection between AI-powered OCR and traditional Pytesseract
- Adaptive Performance: Automatically optimizes based on available system resources
- Multi-Model Support: Uses BGE-M3, E5-Large, Gemma-3-1B, and Gemma-3-4B models
- MCP Integration: Seamless integration with Claude Desktop
- Multiple Interfaces: Web UI, API server, CLI, and MCP server
- Directory Management: Advanced PDF directory handling and caching
- Multi-Language: Supports multiple languages including Korean and English
- Smart Resource Management: FAST_MODE, Standard, and FULL_FEATURE_MODE based on system specs
- Universal Document Support: Automatically converts 30+ file formats to PDF for processing
- Smart File Conversion: Handles Office documents, images, HTML, Markdown, and more
What's New in v1.0.x
Universal Document Support
DocsRay now automatically converts various document formats to PDF for processing:
Supported File Formats
Office Documents
- Microsoft Word (.docx, .doc)
- Microsoft Excel (.xlsx, .xls)
- Microsoft PowerPoint (.pptx, .ppt)
- OpenDocument formats (.odt, .ods, .odp)
Text Formats
- Plain Text (.txt)
- Markdown (.md)
- Rich Text Format (.rtf)
- reStructuredText (.rst)
Web Formats
- HTML (.html, .htm)
- XML (.xml)
Image Formats
- JPEG (.jpg, .jpeg)
- PNG (.png)
- GIF (.gif)
- BMP (.bmp)
- TIFF (.tiff, .tif)
- WebP (.webp)
E-book Formats
- EPUB (.epub)
- MOBI (.mobi)
Automatic Conversion
Simply load any supported file type, and DocsRay will:
- Automatically detect the file format
- Convert it to PDF in the background
- Process it with all the same features as native PDFs
- Clean up temporary files automatically
# Works with any supported format!
docsray process /path/to/document.docx
docsray process /path/to/spreadsheet.xlsx
docsray process /path/to/image.png
Hybrid OCR System
DocsRay now features an intelligent hybrid OCR system that automatically selects the optimal OCR method based on your system resources:
- FULL_FEATURE_MODE (RAM > 32GB): AI-powered OCR using Gemma-3-4B model
  - Accurately recognizes complex layouts and multilingual text
  - Understands context when extracting text from tables, charts, and diagrams
  - Significantly improves text quality from scanned PDFs
- Standard Mode (RAM 8-32GB): Traditional Pytesseract-based OCR
  - Stable and fast text extraction
  - Multi-language support (including Korean)
- FAST_MODE (RAM < 8GB): OCR disabled
  - Memory efficiency prioritized
  - Processes only PDFs with embedded text
Adaptive Performance Optimization
Automatically detects system resources and optimizes performance:
# Automatic resource detection and mode configuration
# (available_ram is expressed in GB)
if available_ram > 32:
    FULL_FEATURE_MODE = True  # All features enabled
elif available_ram < 4:
    FAST_MODE = True  # Lightweight mode
else:
    pass  # Standard mode (balanced performance)
Enhanced MCP Commands
- Cache Management: clear_all_cache, get_cache_info
- Improved Summarization: Batch processing with section-by-section caching
- Detail Levels: Adjustable summary detail (brief/standard/detailed)
Performance Optimization Guide
Recommended Settings by Memory
| System Memory | Mode | OCR | Visual Analysis | Max Tokens |
|---|---|---|---|---|
| < 4GB | FAST_MODE | β | β | 4,096 |
| 4-8GB | FAST_MODE | β (Pytesseract) | Limited | 8K |
| 8-16GB | Standard | β (Pytesseract) | β | 16K |
| 16-32GB | Standard | β (Pytesseract) | β | 32K |
| > 32GB | FULL_FEATURE | β (AI OCR) | β | 128K |
v0.3.0
Full Changelog: v0.2.0...v0.3.0
- New feature update: document summarization
v0.2.0
Full Changelog: https://github.com/MIMICLab/DocsRay/commits/v0.2.0