Changelog

[1.9.0] - 2025-02-01

Added

LibreOffice Integration: Enhanced document conversion capabilities
- Automatic detection and use of LibreOffice when available
- Improved conversion quality for Office documents (DOCX, XLSX, PPTX)
- Better support for OpenDocument formats (ODT, ODS, ODP)
- Enhanced HWP/HWPX document handling with LibreOffice
- Fallback mechanisms when LibreOffice is not available

Changed

File converter now prioritizes LibreOffice for office document conversions
Improved error messages and conversion feedback
Better handling of conversion failures with automatic fallback methods
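The LibreOffice-first behaviour with a fallback path can be sketched as follows. This is a minimal illustration, not DocsRay's actual file converter. The helper names `find_libreoffice`, `build_libreoffice_command` and `choose_conversion_method` are hypothetical. The sketch also assumes LibreOffice is driven through its standard headless command line (`soffice --headless --convert-to pdf --outdir <dir> <file>`).

```python
import shutil
from pathlib import Path
from typing import List, Optional

# Formats this sketch routes to LibreOffice (taken from the 1.9.0 notes above).
OFFICE_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".odt", ".ods", ".odp", ".hwp", ".hwpx"}

def find_libreoffice() -> Optional[str]:
    """Return the path of a LibreOffice executable on PATH, or None if absent."""
    for name in ("soffice", "libreoffice"):
        path = shutil.which(name)
        if path:
            return path
    return None

def build_libreoffice_command(soffice: str, src: str, out_dir: str) -> List[str]:
    """Build a headless LibreOffice command that converts src to PDF in out_dir."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", out_dir, src]

def choose_conversion_method(src: str, soffice: Optional[str]) -> str:
    """Prefer LibreOffice for office formats when it is installed, else fall back."""
    if soffice and Path(src).suffix.lower() in OFFICE_EXTENSIONS:
        return "libreoffice"
    return "fallback"
```

A caller would run `choose_conversion_method("report.docx", find_libreoffice())` and, for `"libreoffice"`, execute the built command with `subprocess.run(..., check=True)`. Any failure would then route to the fallback converter.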

All notable changes to this project will be documented in this file.

[1.8.0] - 2025-01-31

Added

Video Input Support: Process and extract information from video files
- Automatic audio extraction from video formats
- Frame extraction for visual content analysis
- Support for common video formats (MP4, AVI, MOV, etc.)
Audio Input Support: Direct processing of audio files
- Transcription using faster-whisper for speech-to-text
- Support for various audio formats (MP3, WAV, M4A, etc.)
Multimedia Processing Pipeline: New multimedia_processor.py module
- Unified interface for handling video and audio inputs
- Automatic format detection and conversion
- Integration with existing document processing pipeline
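The routing and audio-extraction step could look like the sketch below. It assumes the pipeline shells out to FFmpeg, although the notes do not say which tool DocsRay uses. The extension sets and both function names are hypothetical. The FFmpeg flags themselves are standard: `-vn` drops the video stream, and `-ac 1 -ar 16000` writes 16 kHz mono audio, the sample rate Whisper models operate at.

```python
from pathlib import Path
from typing import List

# Illustrative extension sets; the real module may recognise more formats.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}

def detect_media_type(path: str) -> str:
    """Classify a file as 'video', 'audio', or 'document' by its extension."""
    suffix = Path(path).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return "document"

def build_audio_extraction_command(video_path: str, wav_path: str) -> List[str]:
    """FFmpeg command that strips the video stream and writes 16 kHz mono WAV."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", wav_path]
```

After running the command with `subprocess.run(cmd, check=True)`, the WAV file could be passed to faster-whisper's `WhisperModel.transcribe`, which returns the transcribed segments together with detection info.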

Changed

Enhanced file converter to support multimedia file types
Updated dependencies to include faster-whisper for audio transcription

[1.7.2] - 2025-01-26

Added

Configurable --timeout parameter for perf-test command
- Allows custom request timeout in seconds
- No timeout if parameter is not specified (replaces hardcoded 300 seconds)

Changed

Modified perf-test command to accept optional timeout parameter
Updated error messages to show actual timeout value instead of hardcoded 300 seconds
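The "no timeout unless specified" behaviour maps onto `argparse` with a `None` default. This is a hedged sketch, not DocsRay's actual parser. It models only the options named in this entry, and both function names are hypothetical. In HTTP clients such as `requests`, `timeout=None` means wait indefinitely, so the parsed value can be passed through unchanged.

```python
import argparse
from typing import List, Optional

def parse_perf_test_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a simplified perf-test command line (illustrative subset only)."""
    parser = argparse.ArgumentParser(prog="docsray perf-test")
    parser.add_argument("file_path")
    parser.add_argument("question")
    parser.add_argument("--timeout", type=float, default=None,
                        help="Request timeout in seconds (no timeout if omitted)")
    return parser.parse_args(argv)

def describe_timeout(timeout: Optional[float]) -> str:
    """Render the configured timeout for error messages instead of a fixed 300 s."""
    return "no timeout" if timeout is None else f"{timeout:g} seconds"
```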

[1.7.1] - 2025-01-25

Added

Auto-restart functionality for Web, API, and MCP servers with --auto-restart flag
Request timeout monitoring for API server (triggers restart on timeout when auto-restart is enabled)
Optional --max-retries parameter (unlimited retries if not specified)
Configurable --retry-delay parameter for restart attempts
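Below is a minimal sketch of the restart policy these flags describe. It assumes the monitored server is started by a callable that blocks until the process exits and returns its exit code. `run_with_auto_restart`, its signature, and the 5-second default delay are hypothetical. `max_retries=None` mirrors "unlimited retries if not specified".

```python
import time
from typing import Callable, Optional

def run_with_auto_restart(start_server: Callable[[], int],
                          max_retries: Optional[int] = None,
                          retry_delay: float = 5.0,
                          sleep: Callable[[float], None] = time.sleep) -> int:
    """Restart the server after abnormal exits; return how many restarts happened."""
    restarts = 0
    while True:
        exit_code = start_server()  # blocks until the server process stops
        if exit_code == 0:
            return restarts  # clean shutdown: do not restart
        if max_retries is not None and restarts >= max_retries:
            raise RuntimeError(f"server failed after {restarts} restarts")
        restarts += 1
        sleep(retry_delay)  # wait before the next attempt
```

In practice `start_server` might wrap something like `lambda: subprocess.run(["docsray", "api", "--port", "8000"]).returncode`.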

Changed

--timeout parameter is now optional for both web and API (no timeout if not specified)
--pages parameter is now optional for web interface (process all pages if not specified)
Updated FastAPI from deprecated @app.on_event to modern lifespan context manager
API server now tracks request processing activity via /activity endpoint
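The sketch below shows how a monitor could use activity information to detect a stuck request. The tracker class, its fields, and the dictionary it returns are hypothetical stand-ins for whatever the /activity endpoint actually reports. Only the idea comes from the notes above: record when each request starts and finishes, and flag any request running longer than the timeout.

```python
import threading
import time
from typing import Callable, Dict, Optional

class ActivityTracker:
    """Thread-safe record of in-flight requests and their start times."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._active: Dict[int, float] = {}
        self._next_id = 0

    def start(self) -> int:
        """Register a new request and return its id."""
        with self._lock:
            request_id = self._next_id
            self._next_id += 1
            self._active[request_id] = self._clock()
            return request_id

    def finish(self, request_id: int) -> None:
        """Mark a request as done (unknown ids are ignored)."""
        with self._lock:
            self._active.pop(request_id, None)

    def snapshot(self) -> Dict[str, Optional[float]]:
        """Data an /activity-style endpoint could return as JSON."""
        with self._lock:
            now = self._clock()
            oldest = min(self._active.values(), default=None)
            return {"active_requests": len(self._active),
                    "longest_running_s": None if oldest is None else now - oldest}

def is_timed_out(snapshot: Dict[str, Optional[float]], timeout: Optional[float]) -> bool:
    """True if some request has run longer than timeout; never when timeout is None."""
    longest = snapshot["longest_running_s"]
    return timeout is not None and longest is not None and longest > timeout
```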

Fixed

Fixed deprecation warning in FastAPI shutdown event handler
Improved process cleanup in auto-restart monitor
Better handling of zombie processes when restarting services

[1.7.0] - 2025-01-24

Changed

BREAKING CHANGE: Modified embedding synthesis method from element-wise addition to concatenation
- This change doubles the embedding dimension by concatenating two model outputs instead of adding them
- Results in better semantic representation but requires reindexing of existing documents

Technical Details

Changed np.add(emb_1, emb_2) to np.concatenate([emb_1, emb_2]) in get_embedding method
Updated batch processing in get_embeddings to use list comprehension with concatenation
Both embeddings are now L2-normalized after concatenation
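NumPy illustrates the change. This is a sketch, not the project's `get_embedding`. It assumes each model returns a 1-D vector and reads "L2-normalized after concatenation" as normalizing the concatenated vector. With two 1024-dimensional models (BGE-M3 and E5-Large both produce 1024-d dense vectors), the result is 2048-d. The old element-wise sum stayed at 1024-d, which is why existing indexes must be rebuilt.

```python
import numpy as np

def combine_embeddings(emb_1: np.ndarray, emb_2: np.ndarray) -> np.ndarray:
    """Concatenate two model outputs and L2-normalize the combined vector."""
    combined = np.concatenate([emb_1, emb_2])  # before 1.7.0: np.add(emb_1, emb_2)
    norm = np.linalg.norm(combined)
    return combined / norm if norm > 0 else combined

def combine_batch(batch_1, batch_2):
    """Batch version: one combined vector per (emb_1, emb_2) pair."""
    return [combine_embeddings(a, b) for a, b in zip(batch_1, batch_2)]
```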

Removed

Removed mteb_embedding.py test file

[1.6.2] - Previous Release

Previous release details...

DocsRay v1.6.0 Release Notes

🚀 Major Features & Improvements

🎯 Enhanced Model Selection System

Flexible Model Types: Choose between lite (4b), base (12b), and pro (27b) models using the new --model-type option
Selective Downloads: Download only the model type you need, significantly reducing storage requirements
Runtime Model Selection: Models are selected based on environment variables, allowing dynamic switching
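Runtime model selection could work like the hedged sketch below. An explicit `--model-type` value wins; otherwise the `DOCSRAY_MODEL_TYPE` environment variable mentioned in these notes is consulted. The fallback default of `lite` and the function name are assumptions for illustration. The parameter sizes come from the notes above.

```python
import os
from typing import Mapping, Optional

MODEL_SIZES = {"lite": "4b", "base": "12b", "pro": "27b"}

def resolve_model_type(cli_value: Optional[str] = None,
                       env: Mapping[str, str] = os.environ,
                       default: str = "lite") -> str:
    """Pick the model type: CLI flag first, then DOCSRAY_MODEL_TYPE, then a default."""
    model_type = (cli_value or env.get("DOCSRAY_MODEL_TYPE") or default).lower()
    if model_type not in MODEL_SIZES:
        raise ValueError(f"unknown model type {model_type!r}; expected one of {sorted(MODEL_SIZES)}")
    return model_type
```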

🔄 Redesigned API Architecture

Per-Request Document Processing: API now accepts document paths with each request instead of pre-loading documents
Automatic Document Caching: Documents are automatically processed and cached on first access
New API Endpoints:
- /cache/info - View cached document information
- /cache/clear - Clear document cache
Enhanced Flexibility: No need to restart the API server when switching between documents
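Per-request processing with first-access caching can be sketched as a dictionary keyed by the resolved path. Each entry also records the file's modification time, so an edited file is reprocessed. `DocumentCache` and its `process` callback are hypothetical. DocsRay's real cache may also persist to disk, which this in-memory version does not. The `info` and `clear` methods correspond loosely to the /cache/info and /cache/clear endpoints.

```python
import os
from typing import Any, Callable, Dict, Tuple

class DocumentCache:
    """In-memory cache of processed documents, keyed by absolute path."""

    def __init__(self, process: Callable[[str], Any]):
        self._process = process  # expensive step: convert, chunk, embed
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, document_path: str) -> Any:
        """Return the processed document, processing it on first access or after edits."""
        path = os.path.abspath(document_path)
        mtime = os.path.getmtime(path)
        cached = self._entries.get(path)
        if cached is None or cached[0] != mtime:
            self._entries[path] = (mtime, self._process(path))
        return self._entries[path][1]

    def info(self) -> Dict[str, Any]:
        return {"cached_documents": sorted(self._entries)}

    def clear(self) -> None:
        self._entries.clear()
```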

⚡ Performance Testing & Monitoring

New perf-test Command: Comprehensive API performance benchmarking tool
Detailed Metrics: Track response times, success rates, and cache performance
Iterative Testing: Run multiple test iterations with statistical analysis
Cache Performance Analysis: Separate timing for first-time vs cached requests
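The statistics described above can be computed with the standard library. This sketch assumes the first request for a document is the uncached one and every later request hits the cache. The function name and result keys are illustrative, not the actual output of `perf-test`.

```python
import statistics
from typing import Dict, List

def summarize_timings(timings: List[float], successes: int) -> Dict[str, float]:
    """Summarize response times in seconds, separating the first (uncached) request."""
    if not timings:
        raise ValueError("no timings recorded")
    first, cached = timings[0], timings[1:]
    summary = {"first_request_s": first, "success_rate": successes / len(timings)}
    if cached:
        summary["cached_mean_s"] = statistics.mean(cached)
        summary["cached_median_s"] = statistics.median(cached)
        summary["cached_stdev_s"] = statistics.stdev(cached) if len(cached) > 1 else 0.0
        summary["speedup"] = first / summary["cached_mean_s"]  # cache benefit
    return summary
```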

🛠️ Improved CLI Interface

Unified Argument Structure: Consistent file path arguments across all commands
Simplified Syntax:
- Old: docsray ask "question" --doc file.pdf
- New: docsray ask file.pdf "question"
Force Download Option: --force flag to re-download existing models
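The new argument order can be expressed with `argparse` subcommands. This is a simplified sketch, not DocsRay's actual CLI definition. Only the `ask` and `download-models` subcommands are modeled. The destination names are assumptions, except `file_path`, which these notes name.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Illustrative subset of the docsray command line."""
    parser = argparse.ArgumentParser(prog="docsray")
    sub = parser.add_subparsers(dest="command", required=True)

    ask = sub.add_parser("ask", help="Ask a question about a document")
    ask.add_argument("file_path")  # positional; replaces the old --doc option
    ask.add_argument("question")

    download = sub.add_parser("download-models", help="Download models")
    download.add_argument("--model-type", choices=["lite", "base", "pro"])
    download.add_argument("--force", action="store_true", help="Re-download existing models")
    return parser
```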

📋 Usage Examples

Model Selection

Download lite models (4b, ~3GB)

docsray download-models --model-type lite

Use base models for web interface (12b, ~8GB)

docsray web --model-type base

Process documents with pro models (27b, ~16GB)

docsray process document.pdf --model-type pro

New API Usage

Start API server (no pre-loading required)

docsray api --port 8000

Ask questions about any document

curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "document_path": "/path/to/document.pdf",
    "question": "What is the main topic?",
    "use_coarse_search": true
  }'
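The same request can be made from Python using only the standard library. The endpoint and JSON fields are copied from the curl example. The helper names are hypothetical. The response format is not documented here, so the reply is simply decoded as JSON.

```python
import json
import urllib.request
from typing import Any, Dict

def build_ask_request(document_path: str, question: str,
                      use_coarse_search: bool = True,
                      base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /ask request mirroring the curl example."""
    payload = {
        "document_path": document_path,
        "question": question,
        "use_coarse_search": use_coarse_search,
    }
    return urllib.request.Request(
        f"{base_url}/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(document_path: str, question: str) -> Dict[str, Any]:
    """Send the request to a running API server and decode the JSON reply."""
    with urllib.request.urlopen(build_ask_request(document_path, question)) as resp:
        return json.load(resp)
```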

Performance Testing

Basic performance test

docsray perf-test document.pdf "What is this about?"

Advanced testing with multiple iterations

docsray perf-test document.pdf "Analyze the key points" --iterations 5 --port 8000 --host localhost

Force Model Re-download

Re-download all models for base type

docsray download-models --model-type base --force

🔧 Technical Improvements

Resource Management

Embedding Models: Always downloaded regardless of model type
LLM Models: Downloaded selectively based on chosen model type
Memory Optimization: Better resource allocation based on system capabilities
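The download rule above (embedding models always, one LLM tier on demand) can be sketched as a small planning function. The model identifiers are placeholders built from model families named elsewhere in these notes (BGE-M3, E5-Large, Gemma-3). They are not the actual file names DocsRay downloads.

```python
from typing import List

EMBEDDING_MODELS = ["bge-m3", "e5-large"]  # placeholder identifiers
LLM_MODELS = {"lite": "gemma-3-4b", "base": "gemma-3-12b", "pro": "gemma-3-27b"}  # placeholders

def plan_downloads(model_type: str, already_present: List[str], force: bool = False) -> List[str]:
    """Return the models to fetch: every embedding model plus the chosen LLM tier."""
    if model_type not in LLM_MODELS:
        raise ValueError(f"unknown model type: {model_type}")
    wanted = EMBEDDING_MODELS + [LLM_MODELS[model_type]]
    if force:
        return wanted  # --force re-downloads even models that already exist
    return [m for m in wanted if m not in already_present]
```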

Code Quality & Maintainability

Consistent Parameter Naming: All file path parameters unified as file_path
Environment Variable Integration: Seamless integration with DOCSRAY_MODEL_TYPE
Improved Error Handling: Better error messages and recovery mechanisms

📦 Installation & Upgrade

New installation

pip install docsray==1.6.0

Upgrade from previous version

pip install --upgrade docsray

Download models for your preferred type

docsray download-models --model-type lite # or base/pro

🔄 Migration Guide

For Existing Users

CLI Commands: Update any scripts using the old --doc syntax
API Integration: Update API calls to include document_path in the request body
Model Downloads: Re-run docsray download-models with your preferred --model-type

Breaking Changes

CLI Syntax: ask command now takes file path as positional argument
API Endpoints: /info endpoint moved to /cache/info
API Request Format: Document path now required in each request

🐛 Bug Fixes

Fixed model type selection not being applied during runtime
Improved error handling for missing model files
Enhanced file path resolution across different operating systems
Better cleanup of temporary files during document processing

📈 Performance Improvements

Reduced memory usage during model loading
Faster document processing with improved caching
Optimized embedding model selection based on system resources
Enhanced response times for cached documents

Full Changelog: https://github.com/your-repo/DocsRay/compare/v1.5.4...v1.6.0

PyPI Package: https://pypi.org/project/docsray/1.6.0/

For support and questions, please visit our issue tracker at https://github.com/your-repo/DocsRay/issues

DocsRay v1.5.4 Release Notes

Release Date: June 2025
Version: 1.5.4
Package: docsray

🎯 What's New

Universal Document Support

DocsRay now supports 30+ file formats with automatic conversion to PDF for seamless processing:

📄 Newly Supported Formats

Office Documents: Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt)
OpenDocument: .odt, .ods, .odp
Text Formats: Plain text (.txt), Markdown (.md), Rich Text (.rtf), reStructuredText (.rst)
Web Formats: HTML (.html, .htm), XML (.xml)
Image Formats: JPEG, PNG, GIF, BMP, TIFF, WebP
E-book Formats: EPUB (.epub), MOBI (.mobi)

🔄 Smart Auto-Conversion

Automatically detects file format and converts to PDF
Preserves original file metadata and structure
Supports both system tools (LibreOffice, Pandoc) and Python libraries
Graceful fallback options for maximum compatibility
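Extension-based format detection, the first step of auto-conversion, might look like the sketch below. The category table is built from the formats listed above. The function name and category labels are assumptions, not DocsRay's actual routing.

```python
from pathlib import Path
from typing import Optional

# Categories built from the "Newly Supported Formats" list above.
FORMAT_CATEGORIES = {
    "office": {".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt", ".odt", ".ods", ".odp"},
    "text": {".txt", ".md", ".rtf", ".rst"},
    "web": {".html", ".htm", ".xml"},
    "image": {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".tif", ".webp"},
    "ebook": {".epub", ".mobi"},
}

def detect_category(path: str) -> Optional[str]:
    """Return 'pdf' for PDFs, a format category for convertible files, or None."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"  # no conversion needed
    for category, extensions in FORMAT_CATEGORIES.items():
        if suffix in extensions:
            return category
    return None  # unsupported format
```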

Enhanced Hybrid OCR System

DocsRay now features a dual OCR approach for maximum text extraction accuracy:

🤖 AI-Powered OCR

Primary: Gemma-3-4B multimodal model for intelligent text recognition
Advanced understanding of document layout and structure
Better handling of complex formatting and multilingual content

⚡ Traditional OCR Fallback

Secondary: Pytesseract integration for speed and reliability
Automatic selection based on system configuration
Korean language support with tesseract-ocr-kor
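The primary/fallback arrangement can be sketched as trying each OCR engine in order until one returns usable text. The engines are passed in as plain callables so the sketch stays self-contained. In a real setup the fallback could wrap `pytesseract.image_to_string(image, lang="kor+eng")`, which needs the `tesseract-ocr-kor` language data. The function name and the "non-empty text counts as success" rule are assumptions.

```python
from typing import Any, Callable, List, Tuple

OcrEngine = Tuple[str, Callable[[Any], str]]  # (engine name, image -> text)

def run_ocr(image: Any, engines: List[OcrEngine]) -> Tuple[str, str]:
    """Try OCR engines in order; return (engine_name, text) from the first success."""
    errors = []
    for name, engine in engines:
        try:
            text = engine(image)
        except Exception as exc:  # e.g. model not loaded or out of memory
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all OCR engines failed: " + "; ".join(errors))
```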

Adaptive Performance Optimization

Intelligent system resource detection with automatic mode selection:

System Memory	Processing Mode	OCR Support	Visual Analysis	Max Tokens
CPU Only	FAST (Q4)	❌	❌	8K
< 16GB	FAST (Q4)	✅	✅	8K
16-24GB	STANDARD (Q8)	✅	✅	16K
> 24GB	FULL_FEATURE (F16)	✅	✅	32K
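The table above can be read as a lookup from available memory to a processing profile. This sketch encodes its rows, but several details are assumptions: 16 GB and 24 GB are placed in the middle row, a separate `has_gpu` flag represents the "CPU Only" row, "8K" is read as 8,192 tokens, and the `Profile` type is hypothetical.

```python
from typing import NamedTuple

class Profile(NamedTuple):
    mode: str
    quantization: str
    ocr: bool
    visual_analysis: bool
    max_tokens: int

def select_profile(memory_gb: float, has_gpu: bool) -> Profile:
    """Map system memory to a processing profile following the v1.5.4 table."""
    if not has_gpu:
        return Profile("FAST", "Q4", False, False, 8_192)  # CPU Only row
    if memory_gb < 16:
        return Profile("FAST", "Q4", True, True, 8_192)
    if memory_gb <= 24:
        return Profile("STANDARD", "Q8", True, True, 16_384)
    return Profile("FULL_FEATURE", "F16", True, True, 32_768)
```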

Enhanced MCP (Model Context Protocol) Integration

🔍 Dual Search Capabilities

File System Search (search_files)
- Recursive directory scanning with intelligent filtering
- Advanced file type, size, and date filters
- Smart exclusion of system directories
- Progress tracking and cancellation support
Content-Based Search (search_by_content)
- Semantic search using document summary embeddings
- GPU-accelerated similarity computation
- Query enhancement with LLM assistance
- Multiple detail levels (brief/standard/detailed)
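The file-system side of search_files (recursive scan, type/size/date filters, skipped system directories) can be approximated with `os.walk`. The excluded directory names and parameter names are illustrative assumptions, not the MCP tool's actual schema. Progress tracking and cancellation are omitted.

```python
import os
from typing import Iterable, Iterator, Optional

EXCLUDED_DIRS = {".git", "node_modules", "__pycache__", "$RECYCLE.BIN", "System Volume Information"}

def search_files(root: str,
                 extensions: Iterable[str] = (".pdf", ".docx"),
                 max_size_bytes: Optional[int] = None,
                 modified_after: Optional[float] = None) -> Iterator[str]:
    """Yield matching file paths under root, pruning excluded directories."""
    wanted = {e.lower() for e in extensions}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into excluded or hidden dirs.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS and not d.startswith(".")]
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in wanted:
                continue
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            if max_size_bytes is not None and stat.st_size > max_size_bytes:
                continue  # size filter
            if modified_after is not None and stat.st_mtime < modified_after:
                continue  # date filter (Unix timestamp)
            yield path
```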

📊 Intelligent Batch Processing

Unified Summary Generation: Process entire directories with summary embeddings
Smart File Detection: Automatically skips sensitive files (passwords, keys, etc.)
Performance Scaling: Optimizes processing based on available system resources
Cache Management: Efficient storage and retrieval of processed documents

🎯 Enhanced Directory Management

Recommended Search Paths: OS-specific suggestions for common document locations
Path Analysis: Complexity estimation and search time prediction
Interactive Setup: First-run directory configuration with automatic detection

Visual Content Analysis Improvements

👁️ Advanced Visual Processing

Multi-Image Analysis: Process multiple images per page in reading order
Vector Graphics Support: Analysis of charts, diagrams, and complex layouts
Smart Image Merging: Combine multiple visual elements for comprehensive analysis
Configurable Analysis Intervals: Process visuals every N pages for performance optimization

🔧 Global Visual Analysis Control

Environment Variable Support: DOCSRAY_DISABLE_VISUALS=1
Per-Document Settings: Override global settings for specific files
MCP Integration: Toggle visual analysis through Claude Desktop
Performance Awareness: Automatic disabling in resource-constrained environments
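The precedence described here can be sketched as a pure function. A per-document setting overrides the global `DOCSRAY_DISABLE_VISUALS=1` switch, and visuals are analysed every N pages. The function and parameter names are hypothetical. Treating the zero-based page 0 as the first analysed page is an assumption.

```python
import os
from typing import Mapping, Optional

def should_analyze_visuals(page_index: int,
                           interval: int = 1,
                           per_document: Optional[bool] = None,
                           env: Mapping[str, str] = os.environ) -> bool:
    """Decide whether to run visual analysis on a zero-based page index."""
    if per_document is not None:
        enabled = per_document  # per-document setting wins over the global switch
    else:
        enabled = env.get("DOCSRAY_DISABLE_VISUALS") != "1"
    if not enabled:
        return False
    return page_index % max(interval, 1) == 0  # every N-th page: 0, N, 2N, ...
```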

🛠️ Technical Improvements

File Conversion Architecture

from docsray.scripts.file_converter import FileConverter

converter = FileConverter()

# Check format support
if converter.is_supported("document.docx"):
    # Convert any supported format
    success, pdf_path = converter.convert_to_pdf("document.docx")

Enhanced API Endpoints

Universal Document Loading: All formats supported in web UI and API
Conversion Status Tracking: Real-time progress for file conversion
Format Detection: Automatic identification of file types
Metadata Preservation: Original format information retained

Improved Caching System

Hierarchical Caching: Section-level and document-level cache management
Summary Embeddings: Persistent storage of semantic representations
Cache Analytics: Detailed statistics and cleanup tools
Cross-Session Persistence: Maintain processed documents between sessions

📈 Performance Enhancements

GPU Acceleration

Optimized Vector Search: Batch processing with PyTorch GPU acceleration
Smart Device Selection: Automatic CUDA/MPS/CPU detection
Memory Management: Dynamic allocation based on available resources
Fallback Mechanisms: Graceful degradation for resource constraints
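Automatic CUDA/MPS/CPU detection is commonly written with PyTorch's availability checks, as in the sketch below. It shows a generic pattern rather than DocsRay's own code, and it degrades to `"cpu"` when PyTorch is not installed.

```python
def select_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch: CPU-only path
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```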

Search Algorithm Improvements

Query Enhancement: LLM-powered query expansion for better results
Hybrid Scoring: Combination of title and content similarity
Partial Sorting: O(N log k) complexity for top-k retrieval
Vectorized Operations: Batch similarity computation
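Hybrid scoring and partial sorting can be combined as follows. Each document gets a weighted blend of title and content cosine similarity. `heapq.nlargest` then selects the top k in O(N log k) without sorting all N scores. The 0.3/0.7 weighting and the data layout are assumptions, because the notes do not give the actual weights. A production version would vectorize the similarity step on the GPU as described above.

```python
import heapq
import math
from typing import List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_documents(query: Sequence[float],
                    docs: List[Tuple[str, Sequence[float], Sequence[float]]],
                    k: int = 3,
                    title_weight: float = 0.3) -> List[Tuple[float, str]]:
    """docs holds (name, title_embedding, content_embedding); return the k best (score, name)."""
    scored = (
        (title_weight * cosine(query, t) + (1 - title_weight) * cosine(query, c), name)
        for name, t, c in docs
    )
    return heapq.nlargest(k, scored)  # partial selection, not a full sort
```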

🔧 Configuration & Deployment

Environment Variables

# Custom data directory
export DOCSRAY_HOME=/path/to/custom/directory

# Force specific performance mode
export DOCSRAY_FAST_MODE=1

# Disable visual analysis globally
export DOCSRAY_DISABLE_VISUALS=1

# Enable debug mode
export DOCSRAY_DEBUG=1

New CLI Commands

# Process any supported document format
docsray process document.docx --analyze-visuals

# Enhanced model download with verification
docsray download-models --check

# Improved Claude Desktop configuration
docsray configure-claude

🐛 Bug Fixes & Stability

Core Fixes

Memory Management: Improved handling of large documents and batch processing
Error Handling: More robust error recovery and user feedback
File System: Better handling of permissions and cross-platform paths
MCP Server: Enhanced stability and connection management

Compatibility Improvements

Python 3.8+ Support: Broader compatibility across Python versions
Cross-Platform: Enhanced Windows, macOS, and Linux support
Library Dependencies: More flexible dependency management
Model Loading: Improved error handling for missing or corrupted models

📚 Documentation & Examples

New Usage Patterns

# Universal document processing
from docsray import PDFChatBot
from docsray.scripts import pdf_extractor, chunker, build_index, section_rep_builder

# Process any document type - auto-conversion handled internally
extracted = pdf_extractor.extract_content(
    "report.xlsx",  # Excel spreadsheet
    analyze_visuals=True,
    visual_analysis_interval=1,
)

# Continue with normal workflow
chunks = chunker.process_extracted_file(extracted)
chunk_index = build_index.build_chunk_index(chunks)
sections = section_rep_builder.build_section_reps(extracted["sections"], chunk_index)

# Ask questions about the spreadsheet content
chatbot = PDFChatBot(sections, chunk_index)
answer, references = chatbot.answer("What are the quarterly sales figures?")

MCP Workflow Examples

# Batch process directory with summaries
"Process all documents in /path/to/folder with brief summaries"

# Content-based document discovery
"Search for documents about quarterly reports"
"Load the best matching document"

# Visual content analysis
"What charts and graphs are in this presentation?"
"Describe the diagram on slide 5"

🔄 Migration Guide

From v1.2.x to v1.3.0

New Dependencies (Optional)

# For optimal file conversion support
sudo apt-get install libreoffice pandoc wkhtm...

Hotfix applied for Gemma3 multimodal functionality.

Document search for MCP server
Fast mode test for visual analysis

📋 Features

Advanced RAG System: Coarse-to-Fine search for accurate document retrieval
Multimodal AI: Visual content analysis using Gemma-3-4B's image recognition capabilities
Hybrid OCR System: Intelligent selection between AI-powered OCR and traditional Pytesseract
Adaptive Performance: Automatically optimizes based on available system resources
Multi-Model Support: Uses BGE-M3, E5-Large, Gemma-3-1B, and Gemma-3-4B models
MCP Integration: Seamless integration with Claude Desktop
Multiple Interfaces: Web UI, API server, CLI, and MCP server
Directory Management: Advanced PDF directory handling and caching
Multi-Language: Supports multiple languages including Korean and English
Smart Resource Management: FAST_MODE, Standard, and FULL_FEATURE_MODE based on system specs
Universal Document Support: Automatically converts 30+ file formats to PDF for processing
Smart File Conversion: Handles Office documents, images, HTML, Markdown, and more

🎯 What's New in v1.0.x

Universal Document Support

DocsRay now automatically converts various document formats to PDF for processing:

Supported File Formats

Office Documents

Microsoft Word (.docx, .doc)
Microsoft Excel (.xlsx, .xls)
Microsoft PowerPoint (.pptx, .ppt)
OpenDocument formats (.odt, .ods, .odp)

Text Formats

Plain Text (.txt)
Markdown (.md)
Rich Text Format (.rtf)
reStructuredText (.rst)

Web Formats

HTML (.html, .htm)
XML (.xml)

Image Formats

JPEG (.jpg, .jpeg)
PNG (.png)
GIF (.gif)
BMP (.bmp)
TIFF (.tiff, .tif)
WebP (.webp)

E-book Formats

EPUB (.epub)
MOBI (.mobi)

Automatic Conversion

Simply load any supported file type, and DocsRay will:

Automatically detect the file format
Convert it to PDF in the background
Process it with all the same features as native PDFs
Clean up temporary files automatically

# Works with any supported format!
docsray process /path/to/document.docx
docsray process /path/to/spreadsheet.xlsx
docsray process /path/to/image.png
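The detect, convert, process and clean-up sequence can be sketched with a temporary directory that is deleted automatically, even if processing fails. `convert` and `process` are hypothetical callables standing in for the real converter and processing pipeline.

```python
import os
import tempfile
from pathlib import Path
from typing import Any, Callable

def process_any_document(path: str,
                         convert: Callable[[str, str], str],
                         process: Callable[[str], Any]) -> Any:
    """Convert non-PDF input to PDF in a temp dir, process it, then clean up."""
    if Path(path).suffix.lower() == ".pdf":
        return process(path)  # native PDFs need no conversion
    with tempfile.TemporaryDirectory(prefix="docsray-") as tmp_dir:
        pdf_path = convert(path, tmp_dir)  # expected to write a PDF inside tmp_dir
        if not os.path.exists(pdf_path):
            raise RuntimeError(f"conversion produced no file for {path}")
        return process(pdf_path)  # tmp_dir and the PDF are removed on exit
```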

Hybrid OCR System

DocsRay now features an intelligent hybrid OCR system that automatically selects the optimal OCR method based on your system resources:

FULL_FEATURE_MODE (RAM > 32GB): AI-powered OCR using Gemma-3-4B model
- Accurately recognizes complex layouts and multilingual text
- Understands context when extracting text from tables, charts, and diagrams
- Significantly improves text quality from scanned PDFs
Standard Mode (RAM 8-32GB): Traditional Pytesseract-based OCR
- Stable and fast text extraction
- Multi-language support (including Korean)
FAST_MODE (RAM < 8GB): OCR disabled
- Memory efficiency prioritized
- Processes only PDFs with embedded text

Adaptive Performance Optimization

Automatically detects system resources and optimizes performance:

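import psutil

# Automatic resource detection and mode configuration (RAM read via psutil here)
available_ram_gb = psutil.virtual_memory().available / 1024**3
FULL_FEATURE_MODE = False
FAST_MODE = False
if available_ram_gb > 32:
    FULL_FEATURE_MODE = True  # All features enabled
elif available_ram_gb < 4:
    FAST_MODE = True  # Lightweight mode
else:
    pass  # Standard mode (balanced performance)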

Enhanced MCP Commands

Cache Management: clear_all_cache, get_cache_info
Improved Summarization: Batch processing with section-by-section caching
Detail Levels: Adjustable summary detail (brief/standard/detailed)

📊 Performance Optimization Guide

Recommended Settings by Memory

System Memory	Mode	OCR	Visual Analysis	Max Tokens
< 4GB	FAST_MODE	❌	❌	4K
4-8GB	FAST_MODE	✅ (Pytesseract)	Limited	8K
8-16GB	Standard	✅ (Pytesseract)	✅	16K
16-32GB	Standard	✅ (Pytesseract)	✅	32K
> 32GB	FULL_FEATURE	✅ (AI OCR)	✅	128K

Full Changelog: v0.2.0...v0.3.0

New feature update: document summarization

Full Changelog: https://github.com/MIMICLab/DocsRay/commits/v0.2.0
