 [![PyPI Status](https://badge.fury.io/py/docsray.svg)](https://badge.fury.io/py/docsray)
 [![license](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/MIMICLab/DocsRay/blob/main/LICENSE)
 [![Downloads](https://pepy.tech/badge/docsray)](https://pepy.tech/project/docsray)
+[![arXiv](https://img.shields.io/badge/arXiv-2507.23217-b31b1b.svg?style=flat)](http://arxiv.org/abs/2507.23217)
 [![Verified on MseeP](https://mseep.ai/badge.svg)](https://mseep.ai/app/f6dfcc65-8ee3-4ad1-9101-88b6dbdcf37b)
 
 **[🌐 Live Demo (Base Model)](https://docsray.com/)**
@@ -30,17 +31,17 @@ If the automatic setup doesn't work properly, you can run the setup manually:
 # 1. Install DocsRay
 pip install docsray
 
-# 2. Run manual setup
+# 2. Run setup (REQUIRED)
 docsray setup
+# This will:
+# - Detect your GPU (NVIDIA CUDA, Apple Metal, or CPU)
+# - Install the optimized llama-cpp-python for your platform
+# - Install ffmpeg for audio/video processing
+# - Show additional recommendations for your OS
 
-# (If above doesn't work)
-# 2-1. ffmpeg for Audio/Video processing (recommended)
-# macOS: brew install ffmpeg
-# Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg
-# Windows: Download from https://ffmpeg.org/download.html
-
-# 2-2. CUDA support for faster processing
-# CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python==0.3.9 --upgrade --force-reinstall --no-cache-dir
+# (If setup fails, manually install)
+# For CUDA support: CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python==0.3.9 --upgrade --force-reinstall --no-cache-dir
+# For ffmpeg: See "Audio/Video Processing" section below
 
 
 # 3. Download models (default: lite)
@@ -53,10 +54,74 @@ docsray download-models --model-type lite # 4b model (~3GB)
 
 ### Optional Components
 
+#### **LibreOffice for Better Office Document Support (Recommended)**
+```bash
+# Ubuntu/Debian
+sudo apt-get install libreoffice libreoffice-l10n-ko # l10n-ko for Korean support
+
+# CentOS/RHEL/Fedora
+sudo yum install libreoffice
+# or
+sudo dnf install libreoffice
+
+# macOS
+brew install libreoffice
+# For HWP support on macOS, additionally install h2orestart extension:
+# https://extensions.libreoffice.org/en/extensions/show/27504
+
+# Windows
+# Download LibreOffice from: https://www.libreoffice.org/download/
+# For HWP support on Windows, additionally install h2orestart extension:
+# https://extensions.libreoffice.org/en/extensions/show/27504
+
+# Arch Linux
+sudo pacman -S libreoffice-fresh
+```
+
+#### **Audio/Video Processing (Optional)**
 ```bash
-# 1. Tesseract OCR (for enhanced OCR performance)
-# Ubuntu/Debian: sudo apt-get install tesseract-ocr tesseract-ocr-kor
-# macOS: brew install tesseract tesseract-lang
+# For audio transcription support
+pip install faster-whisper
+
+# FFmpeg for video processing
+# Ubuntu/Debian
+sudo apt-get install ffmpeg
+
+# macOS
+brew install ffmpeg
+
+# CentOS/RHEL
+sudo yum install epel-release
+sudo yum install ffmpeg
+
+# Windows (via Chocolatey)
+choco install ffmpeg
+```
+
+#### **Additional Format Support**
+```bash
+# For pandoc-based conversions
+# Ubuntu/Debian
+sudo apt-get install pandoc
+
+# macOS
+brew install pandoc
+
+# For better HTML/Markdown processing
+pip install beautifulsoup4 markdown pdfkit
+
+# For Korean fonts (better HWP rendering)
+# Ubuntu/Debian
+sudo apt-get install fonts-nanum fonts-nanum-coding fonts-nanum-extra
+```
+
+#### **Tesseract OCR (for enhanced OCR performance)**
+```bash
+# Ubuntu/Debian
+sudo apt-get install tesseract-ocr tesseract-ocr-kor
+
+# macOS
+brew install tesseract tesseract-lang
 ```
 
 ### Start Using DocsRay
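The optional components in this hunk are all external command-line tools, so one quick way to see which of them are already installed is to look each one up on `PATH`. The following is a minimal sketch, not part of DocsRay: `check_tool` is a hypothetical helper, and the executable names (`soffice` for LibreOffice, `ffmpeg`, `pandoc`, `tesseract`) are assumed from the usual package defaults rather than taken from the README.

```bash
#!/bin/sh
# Hypothetical helper (not a DocsRay command): report whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Assumed executable names; adjust them if your platform installs different ones.
for tool in soffice ffmpeg pandoc tesseract; do
  check_tool "$tool"
done
```

A tool reported as missing may still be installed outside `PATH` (LibreOffice on Windows is a common case), so treat the output as a hint rather than a definitive check.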
@@ -79,34 +144,21 @@ docsray configure-claude # MCP for Claude Desktop
 - **📁 Universal Document Support**: 30+ file formats with automatic conversion
 - **🌍 Multi-Language**: Korean, English, and other languages supported
 
-## 🎯 What's New in v1.8.0
+## 🎯 What's New
 
-### Video and Audio Input Support
-- **Video Processing**: Extract and analyze content from video files
-  - Automatic audio extraction from video formats
-  - Frame extraction for visual content analysis
-  - Support for MP4, AVI, MOV, and other common formats
-- **Audio Processing**: Direct transcription and analysis of audio files
-  - Speech-to-text using faster-whisper
-  - Support for MP3, WAV, M4A, and other audio formats
-- **Multimedia Pipeline**: Unified processing for all media types
-- **Automatic Setup**: DocsRay now automatically installs dependencies and downloads models on first run
+### v1.9.0: Enhanced Document Conversion
+- **LibreOffice Integration**: Better quality conversions for Office documents when LibreOffice is installed
+- **Improved Format Support**: Enhanced handling of DOCX, XLSX, PPTX, ODT, ODS, ODP, HWP/HWPX
 
-## 📰 Recent Updates
+### v1.8.0: Multimedia Support
+- **Video/Audio Processing**: Extract and analyze content from video and audio files
+- **Automatic Setup**: DocsRay now automatically installs dependencies and downloads models
 
-### v1.7.1
+### Recent Updates
+- Auto-restart capability for all servers
+- Enhanced embedding method (v1.7.0) - requires reindexing existing documents
 
-### Auto-Restart and Timeout Features
-- **Auto-Restart Support**: Web, API, and MCP servers now support automatic restart on crashes
-- **Optional Timeout**: `--timeout` parameter only applies when explicitly specified
-- **Optional Page Limits**: `--pages` parameter only applies when explicitly specified
-- **Request Timeout for API**: API server can auto-restart if request processing exceeds timeout
-- **Unlimited Retries**: `--max-retries` is optional; if not specified, servers will retry indefinitely
-
-### v1.7.0: Breaking Change - Enhanced Embedding Method
-- **Improved Embedding Synthesis**: Changed from element-wise addition to concatenation
-- **IMPORTANT**: This change requires reindexing of existing documents
-- **Better Accuracy**: Concatenation preserves more information from both embedding models
+For detailed changelog, see [CHANGELOG.md](CHANGELOG.md)
 
 ## 📖 Usage Guide
 
@@ -179,6 +231,9 @@ docsray perf-test document.pdf "What is this about?"
 # Advanced testing
 docsray perf-test document.pdf "Analyze key points" \
   --iterations 5 --port 8000 --host localhost
+
+# With custom timeout
+docsray perf-test document.pdf "What is this?" --timeout 600
 ```
 
 ### MCP Integration (Claude Desktop)
@@ -284,6 +339,21 @@ We welcome contributions! Please check our [GitHub repository](https://github.co
 
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 
+## 🙏 Open Source Dependencies
+
+DocsRay is built on top of these excellent open source projects:
+
+- **llama.cpp** - GGML/GGUF model inference (MIT License)
+- **PyMuPDF** - PDF processing (AGPL-3.0 License)
+- **pdfplumber** - PDF text extraction (MIT License)
+- **FastAPI** - Web framework (MIT License)
+- **Gradio** - Web UI components (Apache-2.0 License)
+- **OpenCV** - Image processing (Apache-2.0 License)
+- **faster-whisper** - Audio transcription (MIT License)
+- **Pandas** - Data manipulation (BSD-3-Clause License)
+- **NumPy** - Numerical computing (BSD-3-Clause License)
+- **scikit-learn** - Machine learning utilities (BSD-3-Clause License)
+
 ## 🔗 Links
 
 - **Live Demo (Base Model)**: https://docsray.com/