Tags: scottgal/mostlylucidweb
fix(docsummarizer): standardize DOCX processing with single-pass conversion
- Removed split processing for DOCX files due to issues with chunked text extraction.
- Updated conversion logic to enforce single-pass conversion for improved stability and accuracy.
feat(docsummarizer): add OCR fallback for PDF backend with text sanitization and retry support
- Introduced a fallback OCR backend (`OcrPdfBackend`) for handling garbled text in PDFs, with configurable support for OCR-based extraction.
- Enhanced text processing by sanitizing gibberish encodings in markdown across summarization workflows.
- Updated conversion logic to allow backend retry mechanisms, ensuring robust handling of corrupt text layers.
feat(docsummarizer): simplify ONNX model download and optimize embedding budget
- Removed interactive Spectre progress bar during ONNX model downloads to prevent concurrent display issues.
- Adjusted embedding budget parameters for improved coverage and performance on small-chunk documents.
- Updated verbose logging to better track embedding progress and status.
feat(docsummarizer): enhance mode reasoning and configuration insights
- Refined summarization mode selection logic to improve user transparency while preserving auto-mode behavior.
- Added detailed reasoning for backend configuration, including embedding, summarization, PDF processing, and vector storage.
- Updated service detection output with actionable tips and clearer guidance for enabling additional features.
- Improved verbose configuration display for better debugging and understanding of system decisions.
test(config): update OnnxConfig test to reflect model directory change to app base directory
- Changed OnnxConfig test to assert the model directory is based in the application directory instead of the user profile for better portability.
- Updated ChunkCacheService tests to clarify flaky CI behavior and recommend manual execution.
docs(datasummarizer): clarify trust model, heuristics, and safety guarantees
- Added "Who this is for" section with clear use cases and non-goals
- Moved heuristics table earlier with threshold documentation and "flag, not prove" clarification
- Documented SQL safety constraints (read-only, no COPY/ATTACH/INSTALL/EXPORT, 20-row limit)
- Clarified Registry vs Store distinction with their respective purposes
- Added advisory note that auto-generated constraints require review before CI enforcement
- Highlighted .NET 10 foundation for high-performance local analytics