Releases: scottgal/mostlylucidweb
DocSummarizer v3.2.0
Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.
Full Documentation | Blog Post
What's New
Summarization Modes:
- ⚡ Bert Mode - Pure extractive summarization, completely offline, no LLM needed (~3-5s)
- 🚀 BertRag Mode - Production-grade BERT extraction → retrieval → LLM synthesis
- 🤖 Auto Mode - Smart mode selection based on document characteristics
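Bert mode is extractive: it picks existing sentences instead of generating new text. The sketch below illustrates that general idea by ranking sentences against the document's centroid vector. It is a stand-in under stated assumptions, not DocSummarizer's algorithm: it swaps simple bag-of-words counts in for the ONNX BERT embeddings, and `extractive_summary` is a hypothetical name.

```python
import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    # Bag-of-words term counts; a stand-in for a real sentence embedding.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return the k sentences most similar to the whole document, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    vectors = [_vectorize(s) for s in sentences]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)  # the document "centroid" is the sum of all sentence vectors
    ranked = sorted(range(len(sentences)), key=lambda i: _cosine(vectors[i], centroid), reverse=True)
    keep = sorted(ranked[:k])  # restore document order for readability
    return [sentences[i] for i in keep]


doc = ("Ollama runs models locally. DocSummarizer can summarize documents locally. "
       "The weather was nice. Local models keep documents private.")
print(extractive_summary(doc, k=2))
```

The off-topic sentence scores lowest because it shares no terms with the rest of the document, which is the intuition behind centroid-based extraction.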
Key Features:
- 📦 ONNX Embeddings - Zero-config local embeddings (auto-downloads ~23MB model)
- 📚 ZIP Archive Support - Summarize Project Gutenberg books directly from ZIP
- 🎯 Adaptive Retrieval - Auto-scales context based on document size and type
- 📊 14 Templates - prose, brief, executive, bookreport, technical, academic, and more
- 🔍 Template Benchmarking - Compare multiple templates on the same document
- 🌐 Playwright Support - Summarize JavaScript-rendered pages (SPAs, React apps)
- 🎨 Spectre.Console UI - Beautiful progress bars and tables
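BertRag is described as BERT extraction, then retrieval, then LLM synthesis, and Adaptive Retrieval as scaling context with document size. Retrieval of this kind is commonly implemented as top-k cosine similarity over chunk embeddings; the sketch below shows that technique with a k that grows with the number of chunks. The square-root scaling rule, its bounds, and the function names are hypothetical choices for illustration, not DocSummarizer's actual heuristics.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def adaptive_k(num_chunks: int, lo: int = 3, hi: int = 20) -> int:
    # Hypothetical heuristic: grow context with the square root of document size,
    # clamped so tiny documents still get some context and huge ones stay bounded.
    return max(lo, min(hi, round(math.sqrt(num_chunks))))


def retrieve(query_vec: list[float], chunk_vecs: list[list[float]], chunks: list[str]) -> list[str]:
    """Return the adaptive_k(len(chunks)) chunks most similar to the query, best first."""
    k = min(adaptive_k(len(chunks)), len(chunks))
    scored = sorted(zip(chunk_vecs, chunks), key=lambda p: cosine(query_vec, p[0]), reverse=True)
    return [chunk for _, chunk in scored[:k]]
```

In a full pipeline the vectors would come from the embedding model, and the returned chunks would be placed in the LLM prompt for the synthesis step.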
Downloads
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |
Quick Start
Fully Offline (Bert mode - no dependencies):
./docsummarizer -f document.md -m Bert
The ONNX model auto-downloads on first run (~23MB). Works with Markdown/TXT files.
With LLM (better quality):
# Install Ollama from https://ollama.ai, then start the server
# (skip "ollama serve" if Ollama already runs as a background service):
ollama serve &
ollama pull llama3.2:3b
./docsummarizer -f document.pdf   # PDF input also needs the Docling service (see Prerequisites)
Example Commands
# Offline mode - no LLM needed
./docsummarizer -f doc.md -m Bert
# Best quality with LLM
./docsummarizer -f doc.pdf -m BertRag
# Summarize a Gutenberg book from ZIP
./docsummarizer -f pg1234.zip -t bookreport
# Compare templates on same document
./docsummarizer benchmark-templates -f doc.pdf -t "brief,prose,executive"
# Ask questions about a document
./docsummarizer -f manual.pdf --query "How do I install?"
# JSON output for AI agents
./docsummarizer tool -f contract.pdf
# List available templates
./docsummarizer templates
Prerequisites
For Bert mode: None! Works completely offline after first model download.
For LLM modes (Auto, BertRag): Ollama running locally with a model pulled (start it with ollama serve if it is not already running as a service, then ollama pull llama3.2:3b)
For PDF/DOCX/PPTX/XLSX/Images: Docling - docker run -p 5001:5001 quay.io/docling-project/docling-serve
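Which modes are available depends on which local services are up. A minimal pre-flight check, assuming Ollama listens on its default port 11434 and Docling on 5001 as in the docker run line above, only tests that a TCP connection succeeds; it does not prove a model has been pulled or that the service is healthy. `service_reachable` is a hypothetical helper, not part of DocSummarizer.

```python
import socket


def service_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed ports: 11434 is Ollama's default; 5001 matches the docker run mapping for Docling.
SERVICES = {"Ollama": 11434, "Docling": 5001}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if service_reachable("127.0.0.1", port) else "not reachable"
        print(f"{name} (port {port}): {status}")
```

If only Ollama is up, Markdown/TXT inputs can use BertRag; if neither is up, Bert mode on Markdown/TXT still works.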
Supported Formats
| Format | Requirements |
|---|---|
| Markdown (.md), Text (.txt) | None - works offline |
| ZIP archives (.zip) | None - extracts text/HTML automatically |
| PDF, DOCX, PPTX, XLSX | Docling service |
| Images (PNG, JPG, TIFF) | Docling service (OCR) |
| HTML, URLs | Built-in (optional Playwright for JS-rendered pages) |
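For batch jobs it can help to know up front which local files need Docling. This sketch simply encodes the Supported Formats table by file extension; the extension sets and the `group_by_requirement` name are assumptions drawn from the table, and URLs are out of scope.

```python
from pathlib import Path

# Extensions taken from the Supported Formats table above. Spellings the table does
# not list (such as .jpeg or .tif) are deliberately left out.
BUILT_IN = {".md", ".txt", ".zip", ".html"}
DOCLING = {".pdf", ".docx", ".pptx", ".xlsx", ".png", ".jpg", ".tiff"}


def group_by_requirement(paths: list[str]) -> dict[str, list[str]]:
    """Split local input files into built-in, Docling-dependent, and unknown formats."""
    groups: dict[str, list[str]] = {"built_in": [], "docling": [], "unknown": []}
    for p in paths:
        ext = Path(p).suffix.lower()  # case-insensitive: report.PDF counts as .pdf
        if ext in BUILT_IN:
            groups["built_in"].append(p)
        elif ext in DOCLING:
            groups["docling"].append(p)
        else:
            groups["unknown"].append(p)
    return groups
```

Files in the docling group could be deferred until the Docling container is running, while the built_in group can be processed immediately.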
Full Changelog: datasummarizer-v0.5.0...docsummarizer-v3.2.0
DataSummarizer v0.5.0
Statistics-first data profiling with DuckDB - compute facts first, then optionally narrate with a local LLM.
Full Documentation | Introduction Article
Key Features
- DuckDB-First - Stream data directly from files, handles datasets larger than RAM
- Statistical Profiling - Row/column counts, nulls, uniques, quantiles, distributions, correlations
- Smart Alerts - Target imbalance, potential leakage, outliers, zero-inflation, data quality issues
- Target Analysis - Feature effects with Cohen's d and rate deltas (supervised analysis without modeling)
- Pattern Detection - Email, UUID, phone, novel patterns + distribution shape labeling
- Constraint Validation - Auto-generate data contracts and validate against them (drift detection)
- Segment Comparison - Compare datasets, cohorts, or time periods (A/B profiling)
- LLM Integration - Optional Ollama-powered insights + SQL generation (works offline without LLM)
- JSON Tool Mode - Structured output for MCP tools and AI agents
- Synthetic Data - Generate realistic test data from profiles
- Multi-Dataset Registry - Ingest and search across many datasets
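Target Analysis reports feature effects as Cohen's d. For a numeric feature split by a binary target (for example churned vs. retained), the standard pooled-standard-deviation form is sketched below. Whether DataSummarizer uses exactly this pooling, rather than a variant such as Hedges' g, is an assumption, and `cohens_d` is a hypothetical name.

```python
import math
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d with pooled sample standard deviation: (mean_a - mean_b) / s_pooled."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("both groups are constant; d is undefined")
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

Cohen's conventional thresholds treat |d| of roughly 0.2, 0.5, and 0.8 as small, medium, and large effects.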
Downloads
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | datasummarizer-win-x64.zip |
| Windows | ARM64 | datasummarizer-win-arm64.zip |
| Linux | x64 | datasummarizer-linux-x64.tar.gz |
| Linux | ARM64 | datasummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | datasummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | datasummarizer-osx-arm64.tar.gz |
Quick Start
Default (full profile with LLM narrative - most detailed):
./datasummarizer -f data.csv --model qwen2.5-coder:7b
Super fast mode (stats only, no LLM - fastest):
./datasummarizer -f data.csv --no-llm --fast
Target-aware analysis (analyze feature effects on the target label):
./datasummarizer -f customers.csv --target Churned --no-llm
Generate and validate constraints (data contracts):
# Generate constraints from production data
./datasummarizer validate --source production.csv --target test.csv \
--generate-constraints --output constraints.json --no-llm
# Validate new data against constraints
./datasummarizer validate --source production.csv --target new_batch.csv \
--constraints constraints.json --format markdown --strict
Compare datasets (segment comparison):
./datasummarizer segment --segment-a production.csv --segment-b synthetic.csv --format markdown
Generate synthetic data:
./datasummarizer profile -f original.csv --output profile.json
./datasummarizer synth --profile profile.json --synthesize-to synthetic.csv --synthesize-rows 10000
JSON output for AI agents/pipelines:
./datasummarizer tool -f data.csv --target Revenue
With LLM (Enhanced Insights)
# Install Ollama from https://ollama.ai, then start the server
# (skip "ollama serve" if Ollama already runs as a background service):
ollama serve &
ollama pull qwen2.5-coder:7b
./datasummarizer -f data.csv --model qwen2.5-coder:7b
Supported Formats
| Format | Extension | Notes |
|---|---|---|
| CSV | .csv | Auto-detects delimiter |
| Excel | .xlsx, .xls | Specify sheet with -s |
| Parquet | .parquet | Full support |
| JSON | .json | Auto-infers schema |
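Delimiter auto-detection for CSV usually works by finding a candidate character that appears a consistent number of times per line. The sketch below uses Python's standard `csv.Sniffer` to show the idea; DataSummarizer reads files through DuckDB, whose CSV reader performs its own auto-detection, so this is an illustration rather than the tool's method. `detect_delimiter` and `sniff_file` are hypothetical names.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the field delimiter from a text sample using the stdlib csv.Sniffer."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


def sniff_file(path: str, sample_chars: int = 64 * 1024) -> str:
    # Only the first chunk of the file is inspected, so large files stay cheap to sniff.
    with open(path, newline="", encoding="utf-8") as f:
        return detect_delimiter(f.read(sample_chars))
```

Restricting `candidates` keeps the sniffer from choosing an ordinary letter or digit that happens to appear equally often on every line.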
Statistics Calculated
Numeric Columns:
- Mean, Median, StdDev, MAD (robust)
- Quartiles (Q25, Q50, Q75), IQR
- Skewness, Kurtosis
- Coefficient of Variation
- Outlier count (IQR method)
- Zero count (for sparse data)
- Distribution type (Normal, Uniform, Skewed, Bimodal)
- Trend detection (with R²)
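Several of the numeric statistics above have standard definitions that fit in a few lines. The sketch below computes median, unscaled MAD, quartiles, IQR, an outlier count using the conventional 1.5 × IQR fences (assumed here to match the "IQR method"), zero count, and coefficient of variation with Python's `statistics` module. Quantile interpolation differs between tools, so DuckDB-computed quartiles may not match these exactly; `numeric_profile` is a hypothetical name.

```python
from statistics import mean, median, pstdev, quantiles


def numeric_profile(values: list[float]) -> dict[str, float]:
    """A few of the listed numeric statistics, computed with the standard library."""
    med = median(values)
    # MAD: median absolute deviation from the median (robust to outliers, unscaled).
    mad = median([abs(v - med) for v in values])
    # Quartiles via linear interpolation ("inclusive" method); other tools may differ.
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    # Tukey fences: points beyond 1.5 * IQR outside the quartiles count as outliers.
    lo, hi = q25 - 1.5 * iqr, q75 + 1.5 * iqr
    mu = mean(values)
    return {
        "median": med,
        "mad": mad,
        "q25": q25,
        "q75": q75,
        "iqr": iqr,
        "outliers": sum(1 for v in values if v < lo or v > hi),
        "zeros": sum(1 for v in values if v == 0),
        # Coefficient of variation: spread relative to the mean (undefined when the mean is 0).
        "cv": pstdev(values) / mu if mu else float("nan"),
    }
```

With the input `[0, 1, 2, 3, 4, 5, 6, 7, 8, 100]`, the single value 100 lands outside the fences while the median and MAD barely notice it, which is why the robust statistics are listed alongside mean and standard deviation.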
Categorical Columns:
- Unique count, Top values
- Mode, Imbalance ratio
- Shannon Entropy (information content)
- Ordinal detection
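Mode, top values, entropy, and imbalance for a categorical column can all be computed from value counts. The release notes do not define the imbalance ratio, so this sketch assumes the common "most frequent count divided by least frequent count"; `categorical_profile` is a hypothetical name.

```python
import math
from collections import Counter


def categorical_profile(values: list[str]) -> dict[str, object]:
    if not values:
        raise ValueError("empty column")
    counts = Counter(values)
    total = sum(counts.values())
    mode, mode_count = counts.most_common(1)[0]
    # Shannon entropy in bits: 0 for a single category, log2(k) for k equally frequent ones.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "unique": len(counts),
        "top_values": counts.most_common(3),
        "mode": mode,
        # Assumed definition: most frequent count divided by least frequent count.
        "imbalance_ratio": mode_count / min(counts.values()),
        "entropy_bits": entropy,
    }
```

Entropy near log2(unique) means values are spread evenly; entropy near 0 means one category dominates.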
Text Columns:
- Length stats (min, max, avg)
- Empty string count
- Pattern detection (Email, URL, Phone, UUID, etc.)
- Novel pattern discovery with regex generation
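Pattern detection can be approximated by testing what fraction of a column's non-empty values match a known regex, and novel patterns by generalizing sample values into a character-class "shape". The regexes, the 0.9 threshold, and the names `detect_pattern` and `shape_regex` are simplified, hypothetical choices, not DataSummarizer's rules; real-world email and phone validation is considerably messier.

```python
import re
from typing import Optional

# Simplified patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "uuid": re.compile(r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"),
    "url": re.compile(r"^https?://\S+$"),
}


def detect_pattern(values: list[str], threshold: float = 0.9) -> Optional[str]:
    """Name the first pattern matched by at least `threshold` of non-empty values, else None."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    for name, rx in PATTERNS.items():
        hits = sum(1 for v in non_empty if rx.match(v))
        if hits / len(non_empty) >= threshold:
            return name
    return None


def shape_regex(value: str) -> str:
    """Generalize a value into a regex by collapsing runs of ASCII digits, upper, and lower letters."""
    runs: list[list] = []
    for ch in value:
        if "0" <= ch <= "9":
            cls = r"\d"
        elif "A" <= ch <= "Z":
            cls = "[A-Z]"
        elif "a" <= ch <= "z":
            cls = "[a-z]"
        else:
            cls = re.escape(ch)  # punctuation is kept literally
        if runs and runs[-1][0] == cls:
            runs[-1][1] += 1
        else:
            runs.append([cls, 1])
    return "^" + "".join(c if n == 1 else f"{c}{{{n}}}" for c, n in runs) + "$"
```

For example, `shape_regex("AB-1234")` yields a regex that accepts other codes of the form two capitals, a hyphen, and four digits.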
DateTime Columns:
- Date range and span
- Time series granularity
- Gap detection
- Seasonality detection
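Granularity and gap detection can be approximated by looking at differences between consecutive timestamps. In this sketch the typical step is the median difference, and any interval longer than 1.5 times that step is reported as a gap. Both the median rule and the 1.5 tolerance are assumptions, not DataSummarizer's documented behavior, and `granularity_and_gaps` is a hypothetical name.

```python
from datetime import datetime, timedelta
from statistics import median


def granularity_and_gaps(
    stamps: list[datetime], tolerance: float = 1.5
) -> tuple[timedelta, list[tuple[datetime, datetime]]]:
    """Infer the typical step as the median difference, then report intervals much longer than it."""
    ordered = sorted(set(stamps))  # duplicates would otherwise produce zero-length steps
    if len(ordered) < 2:
        raise ValueError("need at least two distinct timestamps")
    diffs = [b - a for a, b in zip(ordered, ordered[1:])]
    step = median(diffs)  # the median is robust to a few missing periods
    gaps = [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > step * tolerance]
    return step, gaps
```

For daily data with January 6 to 8 missing, the inferred step is one day and the single gap is reported as the pair (January 5, January 9).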
Prerequisites
For basic profiling: None - works completely standalone!
For LLM-enhanced insights: Ollama with any model
Full Changelog: datasummarizer-v0.3.1...datasummarizer-v0.5.0
DataSummarizer v0.3.1
Statistics-first data profiling with DuckDB - compute facts first, then optionally narrate with a local LLM.
Full Documentation | Introduction Article
Key Features
- DuckDB-First - Stream data directly from files, handles datasets larger than RAM
- Statistical Profiling - Row/column counts, nulls, uniques, quantiles, distributions, correlations
- Smart Alerts - Target imbalance, potential leakage, outliers, zero-inflation, data quality issues
- Target Analysis - Feature effects with Cohen's d and rate deltas (supervised analysis without modeling)
- Pattern Detection - Email, UUID, phone, novel patterns + distribution shape labeling
- Constraint Validation - Auto-generate data contracts and validate against them (drift detection)
- Segment Comparison - Compare datasets, cohorts, or time periods (A/B profiling)
- LLM Integration - Optional Ollama-powered insights + SQL generation (works offline without LLM)
- JSON Tool Mode - Structured output for MCP tools and AI agents
- Synthetic Data - Generate realistic test data from profiles
- Multi-Dataset Registry - Ingest and search across many datasets
Downloads
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | datasummarizer-win-x64.zip |
| Windows | ARM64 | datasummarizer-win-arm64.zip |
| Linux | x64 | datasummarizer-linux-x64.tar.gz |
| Linux | ARM64 | datasummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | datasummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | datasummarizer-osx-arm64.tar.gz |
Quick Start
Default (full profile with LLM narrative - most detailed):
./datasummarizer -f data.csv --model qwen2.5-coder:7b
Super fast mode (stats only, no LLM - fastest):
./datasummarizer -f data.csv --no-llm --fast
Target-aware analysis (analyze feature effects on the target label):
./datasummarizer -f customers.csv --target Churned --no-llm
Generate and validate constraints (data contracts):
# Generate constraints from production data
./datasummarizer validate --source production.csv --target test.csv \
--generate-constraints --output constraints.json --no-llm
# Validate new data against constraints
./datasummarizer validate --source production.csv --target new_batch.csv \
--constraints constraints.json --format markdown --strict
Compare datasets (segment comparison):
./datasummarizer segment --segment-a production.csv --segment-b synthetic.csv --format markdown
Generate synthetic data:
./datasummarizer profile -f original.csv --output profile.json
./datasummarizer synth --profile profile.json --synthesize-to synthetic.csv --synthesize-rows 10000
JSON output for AI agents/pipelines:
./datasummarizer tool -f data.csv --target Revenue
With LLM (Enhanced Insights)
# Install Ollama from https://ollama.ai, then start the server
# (skip "ollama serve" if Ollama already runs as a background service):
ollama serve &
ollama pull qwen2.5-coder:7b
./datasummarizer -f data.csv --model qwen2.5-coder:7b
Supported Formats
| Format | Extension | Notes |
|---|---|---|
| CSV | .csv | Auto-detects delimiter |
| Excel | .xlsx, .xls | Specify sheet with -s |
| Parquet | .parquet | Full support |
| JSON | .json | Auto-infers schema |
Statistics Calculated
Numeric Columns:
- Mean, Median, StdDev, MAD (robust)
- Quartiles (Q25, Q50, Q75), IQR
- Skewness, Kurtosis
- Coefficient of Variation
- Outlier count (IQR method)
- Zero count (for sparse data)
- Distribution type (Normal, Uniform, Skewed, Bimodal)
- Trend detection (with R²)
Categorical Columns:
- Unique count, Top values
- Mode, Imbalance ratio
- Shannon Entropy (information content)
- Ordinal detection
Text Columns:
- Length stats (min, max, avg)
- Empty string count
- Pattern detection (Email, URL, Phone, UUID, etc.)
- Novel pattern discovery with regex generation
DateTime Columns:
- Date range and span
- Time series granularity
- Gap detection
- Seasonality detection
Prerequisites
For basic profiling: None - works completely standalone!
For LLM-enhanced insights: Ollama with any model
Full Changelog: datasummarizer-v0.3.0...datasummarizer-v0.3.1
DataSummarizer v0.3.0-alpha1
Statistics-first data profiling with DuckDB - compute facts first, then optionally narrate with a local LLM.
Full Documentation | Introduction Article
Key Features
- DuckDB-First - Stream data directly from files, handles datasets larger than RAM
- Statistical Profiling - Row/column counts, nulls, uniques, quantiles, distributions, correlations
- Smart Alerts - Target imbalance, potential leakage, outliers, zero-inflation, data quality issues
- Target Analysis - Feature effects with Cohen's d and rate deltas (supervised analysis without modeling)
- Pattern Detection - Email, UUID, phone, novel patterns + distribution shape labeling
- Constraint Validation - Auto-generate data contracts and validate against them (drift detection)
- Segment Comparison - Compare datasets, cohorts, or time periods (A/B profiling)
- LLM Integration - Optional Ollama-powered insights + SQL generation (works offline without LLM)
- JSON Tool Mode - Structured output for MCP tools and AI agents
- Synthetic Data - Generate realistic test data from profiles
- Multi-Dataset Registry - Ingest and search across many datasets
Downloads
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | datasummarizer-win-x64.zip |
| Windows | ARM64 | datasummarizer-win-arm64.zip |
| Linux | x64 | datasummarizer-linux-x64.tar.gz |
| Linux | ARM64 | datasummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | datasummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | datasummarizer-osx-arm64.tar.gz |
Quick Start
Default (full profile with LLM narrative - most detailed):
./datasummarizer -f data.csv --model qwen2.5-coder:7b
Super fast mode (stats only, no LLM - fastest):
./datasummarizer -f data.csv --no-llm --fast
Target-aware analysis (analyze feature effects on the target label):
./datasummarizer -f customers.csv --target Churned --no-llm
Generate and validate constraints (data contracts):
# Generate constraints from production data
./datasummarizer validate --source production.csv --target test.csv \
--generate-constraints --output constraints.json --no-llm
# Validate new data against constraints
./datasummarizer validate --source production.csv --target new_batch.csv \
--constraints constraints.json --format markdown --strict
Compare datasets (segment comparison):
./datasummarizer segment --segment-a production.csv --segment-b synthetic.csv --format markdown
Generate synthetic data:
./datasummarizer profile -f original.csv --output profile.json
./datasummarizer synth --profile profile.json --synthesize-to synthetic.csv --synthesize-rows 10000
JSON output for AI agents/pipelines:
./datasummarizer tool -f data.csv --target Revenue
With LLM (Enhanced Insights)
# Install Ollama from https://ollama.ai, then start the server
# (skip "ollama serve" if Ollama already runs as a background service):
ollama serve &
ollama pull qwen2.5-coder:7b
./datasummarizer -f data.csv --model qwen2.5-coder:7b
Supported Formats
| Format | Extension | Notes |
|---|---|---|
| CSV | .csv | Auto-detects delimiter |
| Excel | .xlsx, .xls | Specify sheet with -s |
| Parquet | .parquet | Full support |
| JSON | .json | Auto-infers schema |
Statistics Calculated
Numeric Columns:
- Mean, Median, StdDev, MAD (robust)
- Quartiles (Q25, Q50, Q75), IQR
- Skewness, Kurtosis
- Coefficient of Variation
- Outlier count (IQR method)
- Zero count (for sparse data)
- Distribution type (Normal, Uniform, Skewed, Bimodal)
- Trend detection (with R²)
Categorical Columns:
- Unique count, Top values
- Mode, Imbalance ratio
- Shannon Entropy (information content)
- Ordinal detection
Text Columns:
- Length stats (min, max, avg)
- Empty string count
- Pattern detection (Email, URL, Phone, UUID, etc.)
- Novel pattern discovery with regex generation
DateTime Columns:
- Date range and span
- Time series granularity
- Gap detection
- Seasonality detection
Prerequisites
For basic profiling: None - works completely standalone!
For LLM-enhanced insights: Ollama with any model
Full Changelog: datasummarizer-v0.1.0...datasummarizer-v0.3.0-alpha1
DataSummarizer v0.3.0
Statistics-first data profiling with DuckDB - compute facts first, then optionally narrate with a local LLM.
Full Documentation | Introduction Article
Key Features
- DuckDB-First - Stream data directly from files, handles datasets larger than RAM
- Statistical Profiling - Row/column counts, nulls, uniques, quantiles, distributions, correlations
- Smart Alerts - Target imbalance, potential leakage, outliers, zero-inflation, data quality issues
- Target Analysis - Feature effects with Cohen's d and rate deltas (supervised analysis without modeling)
- Pattern Detection - Email, UUID, phone, novel patterns + distribution shape labeling
- Constraint Validation - Auto-generate data contracts and validate against them (drift detection)
- Segment Comparison - Compare datasets, cohorts, or time periods (A/B profiling)
- LLM Integration - Optional Ollama-powered insights + SQL generation (works offline without LLM)
- JSON Tool Mode - Structured output for MCP tools and AI agents
- Synthetic Data - Generate realistic test data from profiles
- Multi-Dataset Registry - Ingest and search across many datasets
Downloads
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | datasummarizer-win-x64.zip |
| Windows | ARM64 | datasummarizer-win-arm64.zip |
| Linux | x64 | datasummarizer-linux-x64.tar.gz |
| Linux | ARM64 | datasummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | datasummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | datasummarizer-osx-arm64.tar.gz |
Quick Start
Default (full profile with LLM narrative - most detailed):
./datasummarizer -f data.csv --model qwen2.5-coder:7b
Super fast mode (stats only, no LLM - fastest):
./datasummarizer -f data.csv --no-llm --fast
Target-aware analysis (analyze feature effects on the target label):
./datasummarizer -f customers.csv --target Churned --no-llm
Generate and validate constraints (data contracts):
# Generate constraints from production data
./datasummarizer validate --source production.csv --target test.csv \
--generate-constraints --output constraints.json --no-llm
# Validate new data against constraints
./datasummarizer validate --source production.csv --target new_batch.csv \
--constraints constraints.json --format markdown --strict
Compare datasets (segment comparison):
./datasummarizer segment --segment-a production.csv --segment-b synthetic.csv --format markdown
Generate synthetic data:
./datasummarizer profile -f original.csv --output profile.json
./datasummarizer synth --profile profile.json --synthesize-to synthetic.csv --synthesize-rows 10000
JSON output for AI agents/pipelines:
./datasummarizer tool -f data.csv --target Revenue
With LLM (Enhanced Insights)
# Install Ollama from https://ollama.ai, then start the server
# (skip "ollama serve" if Ollama already runs as a background service):
ollama serve &
ollama pull qwen2.5-coder:7b
./datasummarizer -f data.csv --model qwen2.5-coder:7b
Supported Formats
| Format | Extension | Notes |
|---|---|---|
| CSV | .csv | Auto-detects delimiter |
| Excel | .xlsx, .xls | Specify sheet with -s |
| Parquet | .parquet | Full support |
| JSON | .json | Auto-infers schema |
Statistics Calculated
Numeric Columns:
- Mean, Median, StdDev, MAD (robust)
- Quartiles (Q25, Q50, Q75), IQR
- Skewness, Kurtosis
- Coefficient of Variation
- Outlier count (IQR method)
- Zero count (for sparse data)
- Distribution type (Normal, Uniform, Skewed, Bimodal)
- Trend detection (with R²)
Categorical Columns:
- Unique count, Top values
- Mode, Imbalance ratio
- Shannon Entropy (information content)
- Ordinal detection
Text Columns:
- Length stats (min, max, avg)
- Empty string count
- Pattern detection (Email, URL, Phone, UUID, etc.)
- Novel pattern discovery with regex generation
DateTime Columns:
- Date range and span
- Time series granularity
- Gap detection
- Seasonality detection
Prerequisites
For basic profiling: None - works completely standalone!
For LLM-enhanced insights: Ollama with any model
Full Changelog: datasummarizer-v0.1.0...datasummarizer-v0.3.0
DocSummarizer v3.1.6
Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.
Full Documentation | Blog Post
What's New
Summarization Modes:
- ⚡ Bert Mode - Pure extractive summarization, completely offline, no LLM needed (~3-5s)
- 🚀 BertRag Mode - Production-grade BERT extraction → retrieval → LLM synthesis
- 🤖 Auto Mode - Smart mode selection based on document characteristics
Key Features:
- 📦 ONNX Embeddings - Zero-config local embeddings (auto-downloads ~23MB model)
- 📚 ZIP Archive Support - Summarize Project Gutenberg books directly from ZIP
- 🎯 Adaptive Retrieval - Auto-scales context based on document size and type
- 📊 14 Templates - prose, brief, executive, bookreport, technical, academic, and more
- 🔍 Template Benchmarking - Compare multiple templates on the same document
- 🌐 Playwright Support - Summarize JavaScript-rendered pages (SPAs, React apps)
- 🎨 Spectre.Console UI - Beautiful progress bars and tables
Downloads
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |
Quick Start
Fully Offline (Bert mode - no dependencies):
./docsummarizer -f document.md -m Bert
The ONNX model auto-downloads on first run (~23MB). Works with Markdown/TXT files.
With LLM (better quality):
# Install Ollama from https://ollama.ai, then start the server
# (skip "ollama serve" if Ollama already runs as a background service):
ollama serve &
ollama pull llama3.2:3b
./docsummarizer -f document.pdf   # PDF input also needs the Docling service (see Prerequisites)
Example Commands
# Offline mode - no LLM needed
./docsummarizer -f doc.md -m Bert
# Best quality with LLM
./docsummarizer -f doc.pdf -m BertRag
# Summarize a Gutenberg book from ZIP
./docsummarizer -f pg1234.zip -t bookreport
# Compare templates on same document
./docsummarizer benchmark-templates -f doc.pdf -t "brief,prose,executive"
# Ask questions about a document
./docsummarizer -f manual.pdf --query "How do I install?"
# JSON output for AI agents
./docsummarizer tool -f contract.pdf
# List available templates
./docsummarizer templates
Prerequisites
For Bert mode: None! Works completely offline after first model download.
For LLM modes (Auto, BertRag): Ollama running locally with a model pulled (start it with ollama serve if it is not already running as a service, then ollama pull llama3.2:3b)
For PDF/DOCX/PPTX/XLSX/Images: Docling - docker run -p 5001:5001 quay.io/docling-project/docling-serve
Supported Formats
| Format | Requirements |
|---|---|
| Markdown (.md), Text (.txt) | None - works offline |
| ZIP archives (.zip) | None - extracts text/HTML automatically |
| PDF, DOCX, PPTX, XLSX | Docling service |
| Images (PNG, JPG, TIFF) | Docling service (OCR) |
| HTML, URLs | Built-in (optional Playwright for JS-rendered pages) |
Full Changelog: docsummarizer-v3.1.4...docsummarizer-v3.1.6
DocSummarizer v3.1.5
Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.
Full Documentation | Blog Post
What's New
Summarization Modes:
- ⚡ Bert Mode - Pure extractive summarization, completely offline, no LLM needed (~3-5s)
- 🚀 BertRag Mode - Production-grade BERT extraction → retrieval → LLM synthesis
- 🤖 Auto Mode - Smart mode selection based on document characteristics
Key Features:
- 📦 ONNX Embeddings - Zero-config local embeddings (auto-downloads ~23MB model)
- 📚 ZIP Archive Support - Summarize Project Gutenberg books directly from ZIP
- 🎯 Adaptive Retrieval - Auto-scales context based on document size and type
- 📊 14 Templates - prose, brief, executive, bookreport, technical, academic, and more
- 🔍 Template Benchmarking - Compare multiple templates on the same document
- 🌐 Playwright Support - Summarize JavaScript-rendered pages (SPAs, React apps)
- 🎨 Spectre.Console UI - Beautiful progress bars and tables
Downloads
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |
Quick Start
Fully Offline (Bert mode - no dependencies):
./docsummarizer -f document.md -m Bert
The ONNX model auto-downloads on first run (~23MB). Works with Markdown/TXT files.
With LLM (better quality):
# Install Ollama from https://ollama.ai, then start the server
# (skip "ollama serve" if Ollama already runs as a background service):
ollama serve &
ollama pull llama3.2:3b
./docsummarizer -f document.pdf   # PDF input also needs the Docling service (see Prerequisites)
Example Commands
# Offline mode - no LLM needed
./docsummarizer -f doc.md -m Bert
# Best quality with LLM
./docsummarizer -f doc.pdf -m BertRag
# Summarize a Gutenberg book from ZIP
./docsummarizer -f pg1234.zip -t bookreport
# Compare templates on same document
./docsummarizer benchmark-templates -f doc.pdf -t "brief,prose,executive"
# Ask questions about a document
./docsummarizer -f manual.pdf --query "How do I install?"
# JSON output for AI agents
./docsummarizer tool -f contract.pdf
# List available templates
./docsummarizer templates
Prerequisites
For Bert mode: None! Works completely offline after first model download.
For LLM modes (Auto, BertRag): Ollama running locally with a model pulled (start it with ollama serve if it is not already running as a service, then ollama pull llama3.2:3b)
For PDF/DOCX/PPTX/XLSX/Images: Docling - docker run -p 5001:5001 quay.io/docling-project/docling-serve
Supported Formats
| Format | Requirements |
|---|---|
| Markdown (.md), Text (.txt) | None - works offline |
| ZIP archives (.zip) | None - extracts text/HTML automatically |
| PDF, DOCX, PPTX, XLSX | Docling service |
| Images (PNG, JPG, TIFF) | Docling service (OCR) |
| HTML, URLs | Built-in (optional Playwright for JS-rendered pages) |
Full Changelog: docsummarizer-v3.1.4...docsummarizer-v3.1.5
DocSummarizer v3.1.4-rc0
Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.
Full Documentation | Blog Post
What's New
Summarization Modes:
- ⚡ Bert Mode - Pure extractive summarization, completely offline, no LLM needed (~3-5s)
- 🚀 BertRag Mode - Production-grade BERT extraction → retrieval → LLM synthesis
- 🤖 Auto Mode - Smart mode selection based on document characteristics
Key Features:
- 📦 ONNX Embeddings - Zero-config local embeddings (auto-downloads ~23MB model)
- 📚 ZIP Archive Support - Summarize Project Gutenberg books directly from ZIP
- 🎯 Adaptive Retrieval - Auto-scales context based on document size and type
- 📊 14 Templates - prose, brief, executive, bookreport, technical, academic, and more
- 🔍 Template Benchmarking - Compare multiple templates on the same document
- 🌐 Playwright Support - Summarize JavaScript-rendered pages (SPAs, React apps)
- 🎨 Spectre.Console UI - Beautiful progress bars and tables
Downloads
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |
Quick Start
Fully Offline (Bert mode - no dependencies):
./docsummarizer -f document.md -m Bert
The ONNX model auto-downloads on first run (~23MB). Works with Markdown/TXT files.
With LLM (better quality):
# Install Ollama from https://ollama.ai, then start the server
# (skip "ollama serve" if Ollama already runs as a background service):
ollama serve &
ollama pull llama3.2:3b
./docsummarizer -f document.pdf   # PDF input also needs the Docling service (see Prerequisites)
Example Commands
# Offline mode - no LLM needed
./docsummarizer -f doc.md -m Bert
# Best quality with LLM
./docsummarizer -f doc.pdf -m BertRag
# Summarize a Gutenberg book from ZIP
./docsummarizer -f pg1234.zip -t bookreport
# Compare templates on same document
./docsummarizer benchmark-templates -f doc.pdf -t "brief,prose,executive"
# Ask questions about a document
./docsummarizer -f manual.pdf --query "How do I install?"
# JSON output for AI agents
./docsummarizer tool -f contract.pdf
# List available templates
./docsummarizer templates
Prerequisites
For Bert mode: None! Works completely offline after first model download.
For LLM modes (Auto, BertRag): Ollama running locally with a model pulled (start it with ollama serve if it is not already running as a service, then ollama pull llama3.2:3b)
For PDF/DOCX/PPTX/XLSX/Images: Docling - docker run -p 5001:5001 quay.io/docling-project/docling-serve
Supported Formats
| Format | Requirements |
|---|---|
| Markdown (.md), Text (.txt) | None - works offline |
| ZIP archives (.zip) | None - extracts text/HTML automatically |
| PDF, DOCX, PPTX, XLSX | Docling service |
| Images (PNG, JPG, TIFF) | Docling service (OCR) |
| HTML, URLs | Built-in (optional Playwright for JS-rendered pages) |
Full Changelog: docsummarizer-v3.1.3...docsummarizer-v3.1.4-rc0
DocSummarizer v3.1.4
Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.
Full Documentation | Blog Post
What's New
Summarization Modes:
- ⚡ Bert Mode - Pure extractive summarization, completely offline, no LLM needed (~3-5s)
- 🚀 BertRag Mode - Production-grade BERT extraction → retrieval → LLM synthesis
- 🤖 Auto Mode - Smart mode selection based on document characteristics
Key Features:
- 📦 ONNX Embeddings - Zero-config local embeddings (auto-downloads ~23MB model)
- 📚 ZIP Archive Support - Summarize Project Gutenberg books directly from ZIP
- 🎯 Adaptive Retrieval - Auto-scales context based on document size and type
- 📊 14 Templates - prose, brief, executive, bookreport, technical, academic, and more
- 🔍 Template Benchmarking - Compare multiple templates on the same document
- 🌐 Playwright Support - Summarize JavaScript-rendered pages (SPAs, React apps)
- 🎨 Spectre.Console UI - Beautiful progress bars and tables
Downloads
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |
Quick Start
Fully Offline (Bert mode - no dependencies):
./docsummarizer -f document.md -m Bert
The ONNX model auto-downloads on first run (~23MB). Works with Markdown/TXT files.
With LLM (better quality):
# Install Ollama from https://ollama.ai, then start the server
# (skip "ollama serve" if Ollama already runs as a background service):
ollama serve &
ollama pull llama3.2:3b
./docsummarizer -f document.pdf   # PDF input also needs the Docling service (see Prerequisites)
Example Commands
# Offline mode - no LLM needed
./docsummarizer -f doc.md -m Bert
# Best quality with LLM
./docsummarizer -f doc.pdf -m BertRag
# Summarize a Gutenberg book from ZIP
./docsummarizer -f pg1234.zip -t bookreport
# Compare templates on same document
./docsummarizer benchmark-templates -f doc.pdf -t "brief,prose,executive"
# Ask questions about a document
./docsummarizer -f manual.pdf --query "How do I install?"
# JSON output for AI agents
./docsummarizer tool -f contract.pdf
# List available templates
./docsummarizer templates
Prerequisites
For Bert mode: None! Works completely offline after first model download.
For LLM modes (Auto, BertRag): Ollama running locally with a model pulled (start it with ollama serve if it is not already running as a service, then ollama pull llama3.2:3b)
For PDF/DOCX/PPTX/XLSX/Images: Docling - docker run -p 5001:5001 quay.io/docling-project/docling-serve
Supported Formats
| Format | Requirements |
|---|---|
| Markdown (.md), Text (.txt) | None - works offline |
| ZIP archives (.zip) | None - extracts text/HTML automatically |
| PDF, DOCX, PPTX, XLSX | Docling service |
| Images (PNG, JPG, TIFF) | Docling service (OCR) |
| HTML, URLs | Built-in (optional Playwright for JS-rendered pages) |
Full Changelog: docsummarizer-v3.1.4-rc0...docsummarizer-v3.1.4
DocSummarizer v3.1.3
Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.
Full Documentation | Blog Post
What's New
Summarization Modes:
- ⚡ Bert Mode - Pure extractive summarization, completely offline, no LLM needed (~3-5s)
- 🚀 BertRag Mode - Production-grade BERT extraction → retrieval → LLM synthesis
- 🤖 Auto Mode - Smart mode selection based on document characteristics
Key Features:
- 📦 ONNX Embeddings - Zero-config local embeddings (auto-downloads ~23MB model)
- 📚 ZIP Archive Support - Summarize Project Gutenberg books directly from ZIP
- 🎯 Adaptive Retrieval - Auto-scales context based on document size and type
- 📊 14 Templates - prose, brief, executive, bookreport, technical, academic, and more
- 🔍 Template Benchmarking - Compare multiple templates on the same document
- 🌐 Playwright Support - Summarize JavaScript-rendered pages (SPAs, React apps)
- 🎨 Spectre.Console UI - Beautiful progress bars and tables
Downloads
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |
Quick Start
Fully Offline (Bert mode - no dependencies):
./docsummarizer -f document.md -m Bert
The ONNX model auto-downloads on first run (~23MB). Works with Markdown/TXT files.
With LLM (better quality):
# Install Ollama from https://ollama.ai, then start the server
# (skip "ollama serve" if Ollama already runs as a background service):
ollama serve &
ollama pull llama3.2:3b
./docsummarizer -f document.pdf   # PDF input also needs the Docling service (see Prerequisites)
Example Commands
# Offline mode - no LLM needed
./docsummarizer -f doc.md -m Bert
# Best quality with LLM
./docsummarizer -f doc.pdf -m BertRag
# Summarize a Gutenberg book from ZIP
./docsummarizer -f pg1234.zip -t bookreport
# Compare templates on same document
./docsummarizer benchmark-templates -f doc.pdf -t "brief,prose,executive"
# Ask questions about a document
./docsummarizer -f manual.pdf --query "How do I install?"
# JSON output for AI agents
./docsummarizer tool -f contract.pdf
# List available templates
./docsummarizer templates
Prerequisites
For Bert mode: None! Works completely offline after first model download.
For LLM modes (Auto, BertRag): Ollama running locally with a model pulled (start it with ollama serve if it is not already running as a service, then ollama pull llama3.2:3b)
For PDF/DOCX/PPTX/XLSX/Images: Docling - docker run -p 5001:5001 quay.io/docling-project/docling-serve
Supported Formats
| Format | Requirements |
|---|---|
| Markdown (.md), Text (.txt) | None - works offline |
| ZIP archives (.zip) | None - extracts text/HTML automatically |
| PDF, DOCX, PPTX, XLSX | Docling service |
| Images (PNG, JPG, TIFF) | Docling service (OCR) |
| HTML, URLs | Built-in (optional Playwright for JS-rendered pages) |
Full Changelog: docsummarizer-v3.1.2...docsummarizer-v3.1.3