Commit 03eaedf

bump up to v1.9.0
1 parent ec73d66 commit 03eaedf

9 files changed

+652
-74
lines changed

CHANGELOG.md

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
1+
# Changelog
2+
3+
All notable changes to this project will be documented in this file.
4+
5+
## [1.9.0] - 2025-02-01
6+
7+
### Added
8+
- **LibreOffice Integration**: Enhanced document conversion capabilities
9+
- Automatic detection and use of LibreOffice when available
10+
- Improved conversion quality for Office documents (DOCX, XLSX, PPTX)
11+
- Better support for OpenDocument formats (ODT, ODS, ODP)
12+
- Enhanced HWP/HWPX document handling with LibreOffice
13+
- Fallback mechanisms when LibreOffice is not available
14+
15+
### Changed
16+
- File converter now prioritizes LibreOffice for office document conversions
17+
- Improved error messages and conversion feedback
18+
- Better handling of conversion failures with automatic fallback methods
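
The LibreOffice-first behaviour described for 1.9.0 could look roughly like the sketch below. It is not taken from the DocsRay source: the helper names (`find_libreoffice`, `build_convert_command`, `choose_converter`) and the extension list are assumptions made for illustration. Only `shutil.which` and LibreOffice's `--headless --convert-to --outdir` options are existing interfaces.

```python
import shutil
from pathlib import Path
from typing import List, Optional

# Hypothetical list of formats routed to LibreOffice, based on the changelog entry.
OFFICE_EXTS = {".docx", ".xlsx", ".pptx", ".odt", ".ods", ".odp", ".hwp", ".hwpx"}


def find_libreoffice() -> Optional[str]:
    """Return the path of a LibreOffice executable found on PATH, or None."""
    for name in ("soffice", "libreoffice"):
        path = shutil.which(name)
        if path:
            return path
    return None


def build_convert_command(soffice: str, source: str, out_dir: str) -> List[str]:
    """Build a headless LibreOffice command that converts `source` to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", out_dir, source]


def choose_converter(source: str, soffice: Optional[str]) -> str:
    """Pick "libreoffice" for office formats when it is installed, else "fallback"."""
    if soffice and Path(source).suffix.lower() in OFFICE_EXTS:
        return "libreoffice"
    return "fallback"
```

A caller would pass the result of `find_libreoffice()` to `choose_converter` and run the command from `build_convert_command` with `subprocess.run`, switching to the fallback path if LibreOffice is missing or the command fails.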
19+
20+
## [1.8.0] - 2025-01-31
21+
22+
### Added
23+
- **Video Input Support**: Process and extract information from video files
24+
- Automatic audio extraction from video formats
25+
- Frame extraction for visual content analysis
26+
- Support for common video formats (MP4, AVI, MOV, etc.)
27+
- **Audio Input Support**: Direct processing of audio files
28+
- Transcription using faster-whisper for speech-to-text
29+
- Support for various audio formats (MP3, WAV, M4A, etc.)
30+
- **Multimedia Processing Pipeline**: New `multimedia_processor.py` module
31+
- Unified interface for handling video and audio inputs
32+
- Automatic format detection and conversion
33+
- Integration with existing document processing pipeline
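
As a rough illustration of the format detection and audio extraction steps listed above, the sketch below classifies a file by extension and builds an ffmpeg command line. The function names and extension sets are hypothetical and do not come from `multimedia_processor.py`. The ffmpeg options used (`-y`, `-i`, `-vn`, `-ac`, `-ar`) are standard ones, and the mono 16 kHz output is an assumed choice that suits speech-to-text input.

```python
from pathlib import Path
from typing import List

# Assumed extension sets, limited to the formats named in the changelog.
VIDEO_EXTS = {".mp4", ".avi", ".mov"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a"}


def detect_media_kind(path: str) -> str:
    """Classify a file as "video", "audio", or "document" by its extension."""
    suffix = Path(path).suffix.lower()
    if suffix in VIDEO_EXTS:
        return "video"
    if suffix in AUDIO_EXTS:
        return "audio"
    return "document"


def build_audio_extract_command(video_path: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command that writes the audio track as mono 16 kHz audio."""
    # -y overwrites the output, -vn drops the video stream,
    # -ac 1 selects one channel, -ar 16000 sets the sample rate.
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", wav_path]
```

The extracted audio file would then be handed to the transcription step, and anything classified as "document" would continue through the existing document pipeline.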
34+
35+
### Changed
36+
- Enhanced file converter to support multimedia file types
37+
- Updated dependencies to include faster-whisper for audio transcription
38+
39+
## [1.7.2] - 2025-01-26
40+
41+
### Added
42+
- Configurable `--timeout` parameter for `perf-test` command
43+
- Allows custom request timeout in seconds
44+
- No timeout if parameter is not specified (replaces hardcoded 300 seconds)
45+
46+
### Changed
47+
- Modified `perf-test` command to accept optional timeout parameter
48+
- Updated error messages to show actual timeout value instead of hardcoded 300 seconds
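
One way to express "no timeout unless specified" is an argparse option whose default is `None`. The sketch below is an assumption about how such a flag might be wired, not the actual DocsRay CLI code. The parser, the integer type, and the `describe_timeout` helper are hypothetical.

```python
import argparse
from typing import Optional


def build_perf_test_parser() -> argparse.ArgumentParser:
    """Build a minimal parser with an optional --timeout flag (seconds)."""
    parser = argparse.ArgumentParser(prog="perf-test")
    parser.add_argument(
        "--timeout",
        type=int,
        default=None,  # None means "wait indefinitely"
        help="Request timeout in seconds (no timeout if omitted)",
    )
    return parser


def describe_timeout(timeout: Optional[int]) -> str:
    """Format the timeout for an error message using the actual value."""
    return "no timeout" if timeout is None else f"{timeout} seconds"
```

Passing `None` through to the HTTP client lets the request wait indefinitely, while an explicit value such as `--timeout 600` is reported as given in any error message.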
49+
50+
## [1.7.1] - 2025-01-25
51+
52+
### Added
53+
- Auto-restart functionality for Web, API, and MCP servers with `--auto-restart` flag
54+
- Request timeout monitoring for API server (triggers restart on timeout when auto-restart is enabled)
55+
- Optional `--max-retries` parameter (unlimited retries if not specified)
56+
- Configurable `--retry-delay` parameter for restart attempts
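
The interaction between `--max-retries` and `--retry-delay` can be sketched as a small supervisor loop. This is an illustrative assumption, not the project's restart monitor. `run_with_restart` and its parameters are hypothetical names, and the default delay of 5 seconds is arbitrary.

```python
import time
from typing import Callable, Optional


def run_with_restart(
    start: Callable[[], int],
    max_retries: Optional[int] = None,
    retry_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `start` until it exits with code 0; return the number of restarts.

    `max_retries=None` means retry without limit, mirroring the optional flag.
    Raises RuntimeError once the retry budget is exhausted.
    """
    restarts = 0
    while True:
        exit_code = start()
        if exit_code == 0:
            return restarts
        if max_retries is not None and restarts >= max_retries:
            raise RuntimeError(f"gave up after {restarts} restarts")
        restarts += 1
        sleep(retry_delay)  # wait before the next attempt
```

Injecting `sleep` keeps the loop testable without real delays. In practice `start` would launch the server process and return its exit code.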
57+
58+
### Changed
59+
- `--timeout` parameter is now optional for both web and API (no timeout if not specified)
60+
- `--pages` parameter is now optional for web interface (process all pages if not specified)
61+
- Updated FastAPI from deprecated `@app.on_event` to modern lifespan context manager
62+
- API server now tracks request processing activity via `/activity` endpoint
63+
64+
### Fixed
65+
- Fixed deprecation warning in FastAPI shutdown event handler
66+
- Improved process cleanup in auto-restart monitor
67+
- Better handling of zombie processes when restarting services
68+
69+
## [1.7.0] - 2025-01-24
70+
71+
### Changed
72+
- **BREAKING CHANGE**: Modified embedding synthesis method from element-wise addition to concatenation
73+
- This change doubles the embedding dimension by concatenating two model outputs instead of adding them
74+
- Results in better semantic representation but requires reindexing of existing documents
75+
76+
### Technical Details
77+
- Changed `np.add(emb_1, emb_2)` to `np.concatenate([emb_1, emb_2])` in `get_embedding` method
78+
- Updated batch processing in `get_embeddings` to use list comprehension with concatenation
79+
- Both embeddings are now L2-normalized after concatenation
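
The effect of the change on vector shape can be seen in a small NumPy sketch. The function names and sample vectors are made up for illustration, and normalizing the concatenated vector as a whole is an assumption about how the entry above should be read, not a copy of `get_embedding`.

```python
import numpy as np


def combine_by_addition(emb_1: np.ndarray, emb_2: np.ndarray) -> np.ndarray:
    """Old behaviour: element-wise sum, dimension unchanged."""
    return np.add(emb_1, emb_2)


def combine_by_concatenation(emb_1: np.ndarray, emb_2: np.ndarray) -> np.ndarray:
    """New behaviour: concatenate, then L2-normalize the combined vector (assumed order)."""
    combined = np.concatenate([emb_1, emb_2])
    norm = np.linalg.norm(combined)
    return combined / norm if norm > 0 else combined


# Toy two-dimensional embeddings from two hypothetical models.
emb_1 = np.array([3.0, 0.0])
emb_2 = np.array([0.0, 4.0])

added = combine_by_addition(emb_1, emb_2)        # shape (2,)
concat = combine_by_concatenation(emb_1, emb_2)  # shape (4,), unit length
```

Because the concatenated vector has twice the dimension, vectors stored by the old method cannot be compared with new ones, which is why existing documents need reindexing.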
80+
81+
### Removed
82+
- Removed `mteb_embedding.py` test file
83+
84+
## [1.6.2] - Previous Release
85+
86+
- Previous release details...

README.md

Lines changed: 106 additions & 36 deletions
@@ -2,6 +2,7 @@
22
[![PyPI Status](https://badge.fury.io/py/docsray.svg)](https://badge.fury.io/py/docsray)
33
[![license](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/MIMICLab/DocsRay/blob/main/LICENSE)
44
[![Downloads](https://pepy.tech/badge/docsray)](https://pepy.tech/project/docsray)
5+
[![arXiv](https://img.shields.io/badge/arXiv-2507.23217-b31b1b.svg?style=flat)](http://arxiv.org/abs/2507.23217)
56
[![Verified on MseeP](https://mseep.ai/badge.svg)](https://mseep.ai/app/f6dfcc65-8ee3-4ad1-9101-88b6dbdcf37b)
67

78
**[🌐 Live Demo (Base Model)](https://docsray.com/)**
@@ -30,17 +31,17 @@ If the automatic setup doesn't work properly, you can run the setup manually:
3031
# 1. Install DocsRay
3132
pip install docsray
3233

33-
# 2. Run manual setup
34+
# 2. Run setup (REQUIRED)
3435
docsray setup
36+
# This will:
37+
# - Detect your GPU (NVIDIA CUDA, Apple Metal, or CPU)
38+
# - Install the optimized llama-cpp-python for your platform
39+
# - Install ffmpeg for audio/video processing
40+
# - Show additional recommendations for your OS
3541

36-
#(If above doesn't work)
37-
# 2-1. ffmpeg for Audio/Video processing (recommended)
38-
# macOS: brew install ffmpeg
39-
# Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg
40-
# Windows: Download from https://ffmpeg.org/download.html
41-
42-
# 2-2. CUDA support for faster processing
43-
# CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python==0.3.9 --upgrade --force-reinstall --no-cache-dir
42+
# (If setup fails, manually install)
43+
# For CUDA support: CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python==0.3.9 --upgrade --force-reinstall --no-cache-dir
44+
# For ffmpeg: See "Audio/Video Processing" section below
4445

4546

4647
# 3. Download models (default: lite)
@@ -53,10 +54,74 @@ docsray download-models --model-type lite # 4b model (~3GB)
5354

5455
### Optional Components
5556

57+
#### **LibreOffice for Better Office Document Support (Recommended)**
58+
```bash
59+
# Ubuntu/Debian
60+
sudo apt-get install libreoffice libreoffice-l10n-ko # l10n-ko for Korean support
61+
62+
# CentOS/RHEL/Fedora
63+
sudo yum install libreoffice
64+
# or
65+
sudo dnf install libreoffice
66+
67+
# macOS
68+
brew install libreoffice
69+
# For HWP support on macOS, additionally install h2orestart extension:
70+
# https://extensions.libreoffice.org/en/extensions/show/27504
71+
72+
# Windows
73+
# Download LibreOffice from: https://www.libreoffice.org/download/
74+
# For HWP support on Windows, additionally install h2orestart extension:
75+
# https://extensions.libreoffice.org/en/extensions/show/27504
76+
77+
# Arch Linux
78+
sudo pacman -S libreoffice-fresh
79+
```
80+
81+
#### **Audio/Video Processing (Optional)**
5682
```bash
57-
# 1. Tesseract OCR (for enhanced OCR performance)
58-
# Ubuntu/Debian: sudo apt-get install tesseract-ocr tesseract-ocr-kor
59-
# macOS: brew install tesseract tesseract-lang
83+
# For audio transcription support
84+
pip install faster-whisper
85+
86+
# FFmpeg for video processing
87+
# Ubuntu/Debian
88+
sudo apt-get install ffmpeg
89+
90+
# macOS
91+
brew install ffmpeg
92+
93+
# CentOS/RHEL
94+
sudo yum install epel-release
95+
sudo yum install ffmpeg
96+
97+
# Windows (via Chocolatey)
98+
choco install ffmpeg
99+
```
100+
101+
#### **Additional Format Support**
102+
```bash
103+
# For pandoc-based conversions
104+
# Ubuntu/Debian
105+
sudo apt-get install pandoc
106+
107+
# macOS
108+
brew install pandoc
109+
110+
# For better HTML/Markdown processing
111+
pip install beautifulsoup4 markdown pdfkit
112+
113+
# For Korean fonts (better HWP rendering)
114+
# Ubuntu/Debian
115+
sudo apt-get install fonts-nanum fonts-nanum-coding fonts-nanum-extra
116+
```
117+
118+
#### **Tesseract OCR (for enhanced OCR performance)**
119+
```bash
120+
# Ubuntu/Debian
121+
sudo apt-get install tesseract-ocr tesseract-ocr-kor
122+
123+
# macOS
124+
brew install tesseract tesseract-lang
60125
```
61126

62127
### Start Using DocsRay
@@ -79,34 +144,21 @@ docsray configure-claude # MCP for Claude Desktop
79144
- **📁 Universal Document Support**: 30+ file formats with automatic conversion
80145
- **🌍 Multi-Language**: Korean, English, and other languages supported
81146

82-
## 🎯 What's New in v1.8.0
147+
## 🎯 What's New
83148

84-
### Video and Audio Input Support
85-
- **Video Processing**: Extract and analyze content from video files
86-
- Automatic audio extraction from video formats
87-
- Frame extraction for visual content analysis
88-
- Support for MP4, AVI, MOV, and other common formats
89-
- **Audio Processing**: Direct transcription and analysis of audio files
90-
- Speech-to-text using faster-whisper
91-
- Support for MP3, WAV, M4A, and other audio formats
92-
- **Multimedia Pipeline**: Unified processing for all media types
93-
- **Automatic Setup**: DocsRay now automatically installs dependencies and downloads models on first run
149+
### v1.9.0: Enhanced Document Conversion
150+
- **LibreOffice Integration**: Better quality conversions for Office documents when LibreOffice is installed
151+
- **Improved Format Support**: Enhanced handling of DOCX, XLSX, PPTX, ODT, ODS, ODP, HWP/HWPX
94152

95-
## 📰 Recent Updates
153+
### v1.8.0: Multimedia Support
154+
- **Video/Audio Processing**: Extract and analyze content from video and audio files
155+
- **Automatic Setup**: DocsRay now automatically installs dependencies and downloads models
96156

97-
### v1.7.1
157+
### Recent Updates
158+
- Auto-restart capability for all servers
159+
- Enhanced embedding method (v1.7.0) - requires reindexing existing documents
98160

99-
### Auto-Restart and Timeout Features
100-
- **Auto-Restart Support**: Web, API, and MCP servers now support automatic restart on crashes
101-
- **Optional Timeout**: `--timeout` parameter only applies when explicitly specified
102-
- **Optional Page Limits**: `--pages` parameter only applies when explicitly specified
103-
- **Request Timeout for API**: API server can auto-restart if request processing exceeds timeout
104-
- **Unlimited Retries**: `--max-retries` is optional; if not specified, servers will retry indefinitely
105-
106-
### v1.7.0: Breaking Change - Enhanced Embedding Method
107-
- **Improved Embedding Synthesis**: Changed from element-wise addition to concatenation
108-
- **IMPORTANT**: This change requires reindexing of existing documents
109-
- **Better Accuracy**: Concatenation preserves more information from both embedding models
161+
For detailed changelog, see [CHANGELOG.md](CHANGELOG.md)
110162

111163
## 📖 Usage Guide
112164

@@ -179,6 +231,9 @@ docsray perf-test document.pdf "What is this about?"
179231
# Advanced testing
180232
docsray perf-test document.pdf "Analyze key points" \
181233
--iterations 5 --port 8000 --host localhost
234+
235+
# With custom timeout
236+
docsray perf-test document.pdf "What is this?" --timeout 600
182237
```
183238

184239
### MCP Integration (Claude Desktop)
@@ -284,6 +339,21 @@ We welcome contributions! Please check our [GitHub repository](https://github.co
284339

285340
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
286341

342+
## 🙏 Open Source Dependencies
343+
344+
DocsRay is built on top of these excellent open source projects:
345+
346+
- **llama.cpp** - GGML/GGUF model inference (MIT License)
347+
- **PyMuPDF** - PDF processing (AGPL-3.0 License)
348+
- **pdfplumber** - PDF text extraction (MIT License)
349+
- **FastAPI** - Web framework (MIT License)
350+
- **Gradio** - Web UI components (Apache-2.0 License)
351+
- **OpenCV** - Image processing (Apache-2.0 License)
352+
- **faster-whisper** - Audio transcription (MIT License)
353+
- **Pandas** - Data manipulation (BSD-3-Clause License)
354+
- **NumPy** - Numerical computing (BSD-3-Clause License)
355+
- **scikit-learn** - Machine learning utilities (BSD-3-Clause License)
356+
287357
## 🔗 Links
288358

289359
- **Live Demo (Base Model)**: https://docsray.com/

docsray/app.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ async def lifespan(app: FastAPI):
2626
app = FastAPI(
2727
title="DocsRay API",
2828
description="Universal Document Question-Answering System API",
29-
version="1.8.0",
29+
version="1.9.0",
3030
lifespan=lifespan
3131
)
3232
