Releases: scottgal/mostlylucidweb

DocSummarizer v3.2.0

24 Dec 18:40

DocSummarizer v3.2.0

Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.

Full Documentation | Blog Post

What's New

Summarization Modes:

  • Bert Mode - Pure extractive summarization, completely offline, no LLM needed (~3-5s)
  • 🚀 BertRag Mode - Production-grade BERT extraction → retrieval → LLM synthesis
  • 🤖 Auto Mode - Smart mode selection based on document characteristics
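
To make "extractive" concrete: an extractive summarizer picks existing sentences rather than writing new ones. The sketch below is a toy stand-in, not DocSummarizer's method. It scores sentences by average word frequency instead of BERT embeddings, and the function names `split_sentences` and `extractive_summary` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text: str, max_sentences: int = 2) -> list[str]:
    """Return the highest-scoring sentences, verbatim and in document order."""
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        if not tokens:
            return 0.0
        # Average frequency so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original order
    return [sentences[i] for i in keep]
```

Because every output sentence is copied from the input, no language model is needed, which is the property that lets this kind of mode run offline.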

Key Features:

  • 📦 ONNX Embeddings - Zero-config local embeddings (auto-downloads ~23MB model)
  • 📚 ZIP Archive Support - Summarize Project Gutenberg books directly from ZIP
  • 🎯 Adaptive Retrieval - Auto-scales context based on document size and type
  • 📊 14 Templates - prose, brief, executive, bookreport, technical, academic, and more
  • 🔍 Template Benchmarking - Compare multiple templates on the same document
  • 🌐 Playwright Support - Summarize JavaScript-rendered pages (SPAs, React apps)
  • 🎨 Spectre.Console UI - Beautiful progress bars and tables

Downloads

| Platform | Architecture | Download |
|----------|--------------|----------|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |

Quick Start

Fully Offline (Bert mode - no dependencies):

./docsummarizer -f document.md -m Bert

ONNX model auto-downloads on first run (~23MB). Works with Markdown/TXT files.

With LLM (better quality):

# Install Ollama from https://ollama.ai, then:
ollama pull llama3.2:3b && ollama serve

./docsummarizer -f document.pdf

Example Commands

# Offline mode - no LLM needed
./docsummarizer -f doc.md -m Bert

# Best quality with LLM
./docsummarizer -f doc.pdf -m BertRag

# Summarize a Gutenberg book from ZIP
./docsummarizer -f pg1234.zip -t bookreport

# Compare templates on same document
./docsummarizer benchmark-templates -f doc.pdf -t "brief,prose,executive"

# Ask questions about a document
./docsummarizer -f manual.pdf --query "How do I install?"

# JSON output for AI agents
./docsummarizer tool -f contract.pdf

# List available templates
./docsummarizer templates
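
An agent or pipeline written in Python could wrap the `tool` subcommand shown above roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the binary is assumed to sit in the current directory, and stdout is assumed to contain a single JSON document. The output schema is not described in these notes, so the result is returned unparsed beyond `json.loads`.

```python
import json
import subprocess


def build_tool_command(path: str, binary: str = "./docsummarizer") -> list[str]:
    # Mirrors the documented invocation: ./docsummarizer tool -f <file>
    return [binary, "tool", "-f", path]


def run_tool(path: str, binary: str = "./docsummarizer"):
    """Run the tool subcommand and parse its stdout as JSON (assumed format)."""
    result = subprocess.run(
        build_tool_command(path, binary),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.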

Prerequisites

For Bert mode: None! Works completely offline after first model download.

For LLM modes (Auto, BertRag): Ollama - ollama pull llama3.2:3b && ollama serve

For PDF/DOCX/PPTX/XLSX/Images: Docling - docker run -p 5001:5001 quay.io/docling-project/docling-serve

Supported Formats

| Format | Requirements |
|--------|--------------|
| Markdown (.md), Text (.txt) | None - works offline |
| ZIP archives (.zip) | None - extracts text/HTML automatically |
| PDF, DOCX, PPTX, XLSX | Docling service |
| Images (PNG, JPG, TIFF) | Docling service (OCR) |
| HTML, URLs | Built-in (optional Playwright for JS-rendered pages) |

Full Changelog: datasummarizer-v0.5.0...docsummarizer-v3.2.0

DataSummarizer v0.5.0

20 Dec 19:28

DataSummarizer v0.5.0

Statistics-first data profiling with DuckDB - compute facts first, then optionally narrate with a local LLM.

Full Documentation | Introduction Article

Key Features

  • DuckDB-First - Stream data directly from files, handles datasets larger than RAM
  • Statistical Profiling - Row/column counts, nulls, uniques, quantiles, distributions, correlations
  • Smart Alerts - Target imbalance, potential leakage, outliers, zero-inflation, data quality issues
  • Target Analysis - Feature effects with Cohen's d and rate deltas (supervised analysis without modeling)
  • Pattern Detection - Email, UUID, phone, novel patterns + distribution shape labeling
  • Constraint Validation - Auto-generate data contracts and validate against them (drift detection)
  • Segment Comparison - Compare datasets, cohorts, or time periods (A/B profiling)
  • LLM Integration - Optional Ollama-powered insights + SQL generation (works offline without LLM)
  • JSON Tool Mode - Structured output for MCP tools and AI agents
  • Synthetic Data - Generate realistic test data from profiles
  • Multi-Dataset Registry - Ingest and search across many datasets
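
For readers unfamiliar with Cohen's d, mentioned under Target Analysis: it is the difference between two group means divided by their pooled standard deviation. The sketch below uses the standard textbook formula. Whether DataSummarizer applies exactly this variant is not stated in these notes, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = (
        (n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)
    ) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

With a binary target such as `Churned`, the two groups would be a numeric feature's values for churned and non-churned rows.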

Downloads

| Platform | Architecture | Download |
|----------|--------------|----------|
| Windows | x64 | datasummarizer-win-x64.zip |
| Windows | ARM64 | datasummarizer-win-arm64.zip |
| Linux | x64 | datasummarizer-linux-x64.tar.gz |
| Linux | ARM64 | datasummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | datasummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | datasummarizer-osx-arm64.tar.gz |

Quick Start

Default (full profile with LLM narrative - most detailed):

./datasummarizer -f data.csv --model qwen2.5-coder:7b

Super fast mode (stats only, no LLM - fastest):

./datasummarizer -f data.csv --no-llm --fast

Target-aware analysis (analyze feature effects on label):

./datasummarizer -f customers.csv --target Churned --no-llm

Generate and validate constraints (data contracts):

# Generate constraints from production data
./datasummarizer validate --source production.csv --target test.csv \
  --generate-constraints --output constraints.json --no-llm

# Validate new data against constraints
./datasummarizer validate --source production.csv --target new_batch.csv \
  --constraints constraints.json --format markdown --strict
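
The idea behind a data contract can be sketched in a few lines of Python: derive simple rules from a trusted dataset, then report rows in a new batch that break them. Everything here is illustrative. The rule format and the function names are hypothetical and are not the layout of `constraints.json`, which these notes do not describe.

```python
def generate_constraints(rows: list[dict]) -> dict:
    """Derive nullability and numeric range per column from reference rows."""
    constraints = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows if r[col] is not None]
        rule = {"nullable": len(values) < len(rows)}
        if values and all(isinstance(v, (int, float)) for v in values):
            rule["min"], rule["max"] = min(values), max(values)
        constraints[col] = rule
    return constraints


def validate(rows: list[dict], constraints: dict) -> list[str]:
    """Return one human-readable message per violated rule."""
    violations = []
    for i, row in enumerate(rows):
        for col, rule in constraints.items():
            value = row.get(col)
            if value is None:
                if not rule["nullable"]:
                    violations.append(f"row {i}: {col} is null")
                continue
            if "min" in rule and not (rule["min"] <= value <= rule["max"]):
                violations.append(
                    f"row {i}: {col}={value} outside "
                    f"[{rule['min']}, {rule['max']}]"
                )
    return violations
```

An empty list from `validate` means the new batch stays within the observed ranges; any entries indicate possible drift.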

Compare datasets (segment comparison):

./datasummarizer segment --segment-a production.csv --segment-b synthetic.csv --format markdown

Generate synthetic data:

./datasummarizer profile -f original.csv --output profile.json
./datasummarizer synth --profile profile.json --synthesize-to synthetic.csv --synthesize-rows 10000

JSON output for AI agents/pipelines:

./datasummarizer tool -f data.csv --target Revenue

With LLM (Enhanced Insights)

# Install Ollama from https://ollama.ai, then:
ollama pull qwen2.5-coder:7b && ollama serve

./datasummarizer -f data.csv --model qwen2.5-coder:7b

Supported Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| CSV | .csv | Auto-detects delimiter |
| Excel | .xlsx, .xls | Specify sheet with -s |
| Parquet | .parquet | Full support |
| JSON | .json | Auto-infers schema |

Statistics Calculated

Numeric Columns:

  • Mean, Median, StdDev, MAD (robust)
  • Quartiles (Q25, Q50, Q75), IQR
  • Skewness, Kurtosis
  • Coefficient of Variation
  • Outlier count (IQR method)
  • Zero count (for sparse data)
  • Distribution type (Normal, Uniform, Skewed, Bimodal)
  • Trend detection (with R²)
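
Two of the numeric statistics above are easy to show in plain Python. The sketch assumes the common 1.5 × IQR fence for outliers and the unscaled median absolute deviation for MAD. The quantile interpolation method is also an assumption, since tools differ on it, and the function names are hypothetical.

```python
from statistics import median, quantiles


def iqr_outlier_count(values: list[float], k: float = 1.5) -> int:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < low or v > high)


def mad(values: list[float]) -> float:
    """Median absolute deviation: median of |x - median(x)|."""
    centre = median(values)
    return median(abs(v - centre) for v in values)
```

MAD is called robust because a single extreme value barely moves it, whereas the standard deviation can change substantially.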

Categorical Columns:

  • Unique count, Top values
  • Mode, Imbalance ratio
  • Shannon Entropy (information content)
  • Ordinal detection
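
Shannon entropy and an imbalance ratio for a categorical column can be sketched as below. Entropy uses base-2 logarithms, giving bits. The imbalance ratio is taken here as the most frequent count divided by the least frequent count, which is one common definition and an assumption about what DataSummarizer reports.

```python
from collections import Counter
from math import log2


def shannon_entropy(values: list) -> float:
    """Entropy in bits of the empirical category distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def imbalance_ratio(values: list) -> float:
    """Most frequent category count divided by least frequent count."""
    counts = Counter(values)
    return max(counts.values()) / min(counts.values())
```

A column split evenly between two categories has an entropy of 1 bit; the more one category dominates, the closer the entropy gets to 0.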

Text Columns:

  • Length stats (min, max, avg)
  • Empty string count
  • Pattern detection (Email, URL, Phone, UUID, etc.)
  • Novel pattern discovery with regex generation
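
Pattern detection on a text column can be approximated by testing each value against a set of regular expressions and labelling the column when most values match. The patterns and the 90% threshold below are deliberately simple assumptions for illustration, not the rules DataSummarizer ships with.

```python
import re

# Illustrative patterns only; real-world validators are stricter.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "uuid": re.compile(
        r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
    ),
    "url": re.compile(r"^https?://\S+$"),
}


def detect_pattern(values: list[str], threshold: float = 0.9):
    """Return the first pattern matched by at least `threshold` of non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    for name, regex in PATTERNS.items():
        hits = sum(1 for v in non_empty if regex.match(v))
        if hits / len(non_empty) >= threshold:
            return name
    return None
```

Using a threshold below 100% lets a column still be labelled when a few values are malformed.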

DateTime Columns:

  • Date range and span
  • Time series granularity
  • Gap detection
  • Seasonality detection
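
Gap detection in a date column can be sketched by inferring the usual spacing between consecutive values and flagging any larger jump. Treating the most common spacing as the series granularity is an assumption made for this example, and `find_gaps` is a hypothetical name.

```python
from collections import Counter
from datetime import date


def find_gaps(dates: list[date]) -> list[tuple[date, date]]:
    """Return (before, after) pairs where spacing exceeds the usual step."""
    ordered = sorted(set(dates))
    if len(ordered) < 3:
        return []  # too few points to infer a typical step
    pairs = list(zip(ordered, ordered[1:]))
    deltas = [b - a for a, b in pairs]
    step = Counter(deltas).most_common(1)[0][0]  # inferred granularity
    return [(a, b) for a, b in pairs if b - a > step]
```

For a daily series with observations on 1, 2, 3, 6 and 7 January, the only gap reported is between 3 and 6 January.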

Prerequisites

For basic profiling: None - works completely standalone!

For LLM-enhanced insights: Ollama with any model

Full Changelog: datasummarizer-v0.3.1...datasummarizer-v0.5.0

DataSummarizer v0.3.1

20 Dec 17:16

DataSummarizer v0.3.1

Statistics-first data profiling with DuckDB - compute facts first, then optionally narrate with a local LLM.

Full Documentation | Introduction Article

Key Features

  • DuckDB-First - Stream data directly from files, handles datasets larger than RAM
  • Statistical Profiling - Row/column counts, nulls, uniques, quantiles, distributions, correlations
  • Smart Alerts - Target imbalance, potential leakage, outliers, zero-inflation, data quality issues
  • Target Analysis - Feature effects with Cohen's d and rate deltas (supervised analysis without modeling)
  • Pattern Detection - Email, UUID, phone, novel patterns + distribution shape labeling
  • Constraint Validation - Auto-generate data contracts and validate against them (drift detection)
  • Segment Comparison - Compare datasets, cohorts, or time periods (A/B profiling)
  • LLM Integration - Optional Ollama-powered insights + SQL generation (works offline without LLM)
  • JSON Tool Mode - Structured output for MCP tools and AI agents
  • Synthetic Data - Generate realistic test data from profiles
  • Multi-Dataset Registry - Ingest and search across many datasets

Downloads

| Platform | Architecture | Download |
|----------|--------------|----------|
| Windows | x64 | datasummarizer-win-x64.zip |
| Windows | ARM64 | datasummarizer-win-arm64.zip |
| Linux | x64 | datasummarizer-linux-x64.tar.gz |
| Linux | ARM64 | datasummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | datasummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | datasummarizer-osx-arm64.tar.gz |

Quick Start

Default (full profile with LLM narrative - most detailed):

./datasummarizer -f data.csv --model qwen2.5-coder:7b

Super fast mode (stats only, no LLM - fastest):

./datasummarizer -f data.csv --no-llm --fast

Target-aware analysis (analyze feature effects on label):

./datasummarizer -f customers.csv --target Churned --no-llm

Generate and validate constraints (data contracts):

# Generate constraints from production data
./datasummarizer validate --source production.csv --target test.csv \
  --generate-constraints --output constraints.json --no-llm

# Validate new data against constraints
./datasummarizer validate --source production.csv --target new_batch.csv \
  --constraints constraints.json --format markdown --strict

Compare datasets (segment comparison):

./datasummarizer segment --segment-a production.csv --segment-b synthetic.csv --format markdown

Generate synthetic data:

./datasummarizer profile -f original.csv --output profile.json
./datasummarizer synth --profile profile.json --synthesize-to synthetic.csv --synthesize-rows 10000

JSON output for AI agents/pipelines:

./datasummarizer tool -f data.csv --target Revenue

With LLM (Enhanced Insights)

# Install Ollama from https://ollama.ai, then:
ollama pull qwen2.5-coder:7b && ollama serve

./datasummarizer -f data.csv --model qwen2.5-coder:7b

Supported Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| CSV | .csv | Auto-detects delimiter |
| Excel | .xlsx, .xls | Specify sheet with -s |
| Parquet | .parquet | Full support |
| JSON | .json | Auto-infers schema |

Statistics Calculated

Numeric Columns:

  • Mean, Median, StdDev, MAD (robust)
  • Quartiles (Q25, Q50, Q75), IQR
  • Skewness, Kurtosis
  • Coefficient of Variation
  • Outlier count (IQR method)
  • Zero count (for sparse data)
  • Distribution type (Normal, Uniform, Skewed, Bimodal)
  • Trend detection (with R²)

Categorical Columns:

  • Unique count, Top values
  • Mode, Imbalance ratio
  • Shannon Entropy (information content)
  • Ordinal detection

Text Columns:

  • Length stats (min, max, avg)
  • Empty string count
  • Pattern detection (Email, URL, Phone, UUID, etc.)
  • Novel pattern discovery with regex generation

DateTime Columns:

  • Date range and span
  • Time series granularity
  • Gap detection
  • Seasonality detection

Prerequisites

For basic profiling: None - works completely standalone!

For LLM-enhanced insights: Ollama with any model

Full Changelog: datasummarizer-v0.3.0...datasummarizer-v0.3.1

DataSummarizer v0.3.0-alpha1

20 Dec 16:09

DataSummarizer v0.3.0-alpha1

Statistics-first data profiling with DuckDB - compute facts first, then optionally narrate with a local LLM.

Full Documentation | Introduction Article

Key Features

  • DuckDB-First - Stream data directly from files, handles datasets larger than RAM
  • Statistical Profiling - Row/column counts, nulls, uniques, quantiles, distributions, correlations
  • Smart Alerts - Target imbalance, potential leakage, outliers, zero-inflation, data quality issues
  • Target Analysis - Feature effects with Cohen's d and rate deltas (supervised analysis without modeling)
  • Pattern Detection - Email, UUID, phone, novel patterns + distribution shape labeling
  • Constraint Validation - Auto-generate data contracts and validate against them (drift detection)
  • Segment Comparison - Compare datasets, cohorts, or time periods (A/B profiling)
  • LLM Integration - Optional Ollama-powered insights + SQL generation (works offline without LLM)
  • JSON Tool Mode - Structured output for MCP tools and AI agents
  • Synthetic Data - Generate realistic test data from profiles
  • Multi-Dataset Registry - Ingest and search across many datasets

Downloads

| Platform | Architecture | Download |
|----------|--------------|----------|
| Windows | x64 | datasummarizer-win-x64.zip |
| Windows | ARM64 | datasummarizer-win-arm64.zip |
| Linux | x64 | datasummarizer-linux-x64.tar.gz |
| Linux | ARM64 | datasummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | datasummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | datasummarizer-osx-arm64.tar.gz |

Quick Start

Default (full profile with LLM narrative - most detailed):

./datasummarizer -f data.csv --model qwen2.5-coder:7b

Super fast mode (stats only, no LLM - fastest):

./datasummarizer -f data.csv --no-llm --fast

Target-aware analysis (analyze feature effects on label):

./datasummarizer -f customers.csv --target Churned --no-llm

Generate and validate constraints (data contracts):

# Generate constraints from production data
./datasummarizer validate --source production.csv --target test.csv \
  --generate-constraints --output constraints.json --no-llm

# Validate new data against constraints
./datasummarizer validate --source production.csv --target new_batch.csv \
  --constraints constraints.json --format markdown --strict

Compare datasets (segment comparison):

./datasummarizer segment --segment-a production.csv --segment-b synthetic.csv --format markdown

Generate synthetic data:

./datasummarizer profile -f original.csv --output profile.json
./datasummarizer synth --profile profile.json --synthesize-to synthetic.csv --synthesize-rows 10000

JSON output for AI agents/pipelines:

./datasummarizer tool -f data.csv --target Revenue

With LLM (Enhanced Insights)

# Install Ollama from https://ollama.ai, then:
ollama pull qwen2.5-coder:7b && ollama serve

./datasummarizer -f data.csv --model qwen2.5-coder:7b

Supported Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| CSV | .csv | Auto-detects delimiter |
| Excel | .xlsx, .xls | Specify sheet with -s |
| Parquet | .parquet | Full support |
| JSON | .json | Auto-infers schema |

Statistics Calculated

Numeric Columns:

  • Mean, Median, StdDev, MAD (robust)
  • Quartiles (Q25, Q50, Q75), IQR
  • Skewness, Kurtosis
  • Coefficient of Variation
  • Outlier count (IQR method)
  • Zero count (for sparse data)
  • Distribution type (Normal, Uniform, Skewed, Bimodal)
  • Trend detection (with R²)

Categorical Columns:

  • Unique count, Top values
  • Mode, Imbalance ratio
  • Shannon Entropy (information content)
  • Ordinal detection

Text Columns:

  • Length stats (min, max, avg)
  • Empty string count
  • Pattern detection (Email, URL, Phone, UUID, etc.)
  • Novel pattern discovery with regex generation

DateTime Columns:

  • Date range and span
  • Time series granularity
  • Gap detection
  • Seasonality detection

Prerequisites

For basic profiling: None - works completely standalone!

For LLM-enhanced insights: Ollama with any model

Full Changelog: datasummarizer-v0.1.0...datasummarizer-v0.3.0-alpha1

DataSummarizer v0.3.0

20 Dec 16:20

DataSummarizer v0.3.0

Statistics-first data profiling with DuckDB - compute facts first, then optionally narrate with a local LLM.

Full Documentation | Introduction Article

Key Features

  • DuckDB-First - Stream data directly from files, handles datasets larger than RAM
  • Statistical Profiling - Row/column counts, nulls, uniques, quantiles, distributions, correlations
  • Smart Alerts - Target imbalance, potential leakage, outliers, zero-inflation, data quality issues
  • Target Analysis - Feature effects with Cohen's d and rate deltas (supervised analysis without modeling)
  • Pattern Detection - Email, UUID, phone, novel patterns + distribution shape labeling
  • Constraint Validation - Auto-generate data contracts and validate against them (drift detection)
  • Segment Comparison - Compare datasets, cohorts, or time periods (A/B profiling)
  • LLM Integration - Optional Ollama-powered insights + SQL generation (works offline without LLM)
  • JSON Tool Mode - Structured output for MCP tools and AI agents
  • Synthetic Data - Generate realistic test data from profiles
  • Multi-Dataset Registry - Ingest and search across many datasets

Downloads

| Platform | Architecture | Download |
|----------|--------------|----------|
| Windows | x64 | datasummarizer-win-x64.zip |
| Windows | ARM64 | datasummarizer-win-arm64.zip |
| Linux | x64 | datasummarizer-linux-x64.tar.gz |
| Linux | ARM64 | datasummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | datasummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | datasummarizer-osx-arm64.tar.gz |

Quick Start

Default (full profile with LLM narrative - most detailed):

./datasummarizer -f data.csv --model qwen2.5-coder:7b

Super fast mode (stats only, no LLM - fastest):

./datasummarizer -f data.csv --no-llm --fast

Target-aware analysis (analyze feature effects on label):

./datasummarizer -f customers.csv --target Churned --no-llm

Generate and validate constraints (data contracts):

# Generate constraints from production data
./datasummarizer validate --source production.csv --target test.csv \
  --generate-constraints --output constraints.json --no-llm

# Validate new data against constraints
./datasummarizer validate --source production.csv --target new_batch.csv \
  --constraints constraints.json --format markdown --strict

Compare datasets (segment comparison):

./datasummarizer segment --segment-a production.csv --segment-b synthetic.csv --format markdown

Generate synthetic data:

./datasummarizer profile -f original.csv --output profile.json
./datasummarizer synth --profile profile.json --synthesize-to synthetic.csv --synthesize-rows 10000

JSON output for AI agents/pipelines:

./datasummarizer tool -f data.csv --target Revenue

With LLM (Enhanced Insights)

# Install Ollama from https://ollama.ai, then:
ollama pull qwen2.5-coder:7b && ollama serve

./datasummarizer -f data.csv --model qwen2.5-coder:7b

Supported Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| CSV | .csv | Auto-detects delimiter |
| Excel | .xlsx, .xls | Specify sheet with -s |
| Parquet | .parquet | Full support |
| JSON | .json | Auto-infers schema |

Statistics Calculated

Numeric Columns:

  • Mean, Median, StdDev, MAD (robust)
  • Quartiles (Q25, Q50, Q75), IQR
  • Skewness, Kurtosis
  • Coefficient of Variation
  • Outlier count (IQR method)
  • Zero count (for sparse data)
  • Distribution type (Normal, Uniform, Skewed, Bimodal)
  • Trend detection (with R²)

Categorical Columns:

  • Unique count, Top values
  • Mode, Imbalance ratio
  • Shannon Entropy (information content)
  • Ordinal detection

Text Columns:

  • Length stats (min, max, avg)
  • Empty string count
  • Pattern detection (Email, URL, Phone, UUID, etc.)
  • Novel pattern discovery with regex generation

DateTime Columns:

  • Date range and span
  • Time series granularity
  • Gap detection
  • Seasonality detection

Prerequisites

For basic profiling: None - works completely standalone!

For LLM-enhanced insights: Ollama with any model

Full Changelog: datasummarizer-v0.1.0...datasummarizer-v0.3.0

DocSummarizer v3.1.6

19 Dec 02:55

DocSummarizer v3.1.6

Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.

Full Documentation | Blog Post

What's New

Summarization Modes:

  • Bert Mode - Pure extractive summarization, completely offline, no LLM needed (~3-5s)
  • 🚀 BertRag Mode - Production-grade BERT extraction → retrieval → LLM synthesis
  • 🤖 Auto Mode - Smart mode selection based on document characteristics

Key Features:

  • 📦 ONNX Embeddings - Zero-config local embeddings (auto-downloads ~23MB model)
  • 📚 ZIP Archive Support - Summarize Project Gutenberg books directly from ZIP
  • 🎯 Adaptive Retrieval - Auto-scales context based on document size and type
  • 📊 14 Templates - prose, brief, executive, bookreport, technical, academic, and more
  • 🔍 Template Benchmarking - Compare multiple templates on the same document
  • 🌐 Playwright Support - Summarize JavaScript-rendered pages (SPAs, React apps)
  • 🎨 Spectre.Console UI - Beautiful progress bars and tables

Downloads

| Platform | Architecture | Download |
|----------|--------------|----------|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |

Quick Start

Fully Offline (Bert mode - no dependencies):

./docsummarizer -f document.md -m Bert

ONNX model auto-downloads on first run (~23MB). Works with Markdown/TXT files.

With LLM (better quality):

# Install Ollama from https://ollama.ai, then:
ollama pull llama3.2:3b && ollama serve

./docsummarizer -f document.pdf

Example Commands

# Offline mode - no LLM needed
./docsummarizer -f doc.md -m Bert

# Best quality with LLM
./docsummarizer -f doc.pdf -m BertRag

# Summarize a Gutenberg book from ZIP
./docsummarizer -f pg1234.zip -t bookreport

# Compare templates on same document
./docsummarizer benchmark-templates -f doc.pdf -t "brief,prose,executive"

# Ask questions about a document
./docsummarizer -f manual.pdf --query "How do I install?"

# JSON output for AI agents
./docsummarizer tool -f contract.pdf

# List available templates
./docsummarizer templates

Prerequisites

For Bert mode: None! Works completely offline after first model download.

For LLM modes (Auto, BertRag): Ollama - ollama pull llama3.2:3b && ollama serve

For PDF/DOCX/PPTX/XLSX/Images: Docling - docker run -p 5001:5001 quay.io/docling-project/docling-serve

Supported Formats

| Format | Requirements |
|--------|--------------|
| Markdown (.md), Text (.txt) | None - works offline |
| ZIP archives (.zip) | None - extracts text/HTML automatically |
| PDF, DOCX, PPTX, XLSX | Docling service |
| Images (PNG, JPG, TIFF) | Docling service (OCR) |
| HTML, URLs | Built-in (optional Playwright for JS-rendered pages) |

Full Changelog: docsummarizer-v3.1.4...docsummarizer-v3.1.6

DocSummarizer v3.1.5

19 Dec 02:48

DocSummarizer v3.1.5

Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.

Full Documentation | Blog Post

What's New

Summarization Modes:

  • Bert Mode - Pure extractive summarization, completely offline, no LLM needed (~3-5s)
  • 🚀 BertRag Mode - Production-grade BERT extraction → retrieval → LLM synthesis
  • 🤖 Auto Mode - Smart mode selection based on document characteristics

Key Features:

  • 📦 ONNX Embeddings - Zero-config local embeddings (auto-downloads ~23MB model)
  • 📚 ZIP Archive Support - Summarize Project Gutenberg books directly from ZIP
  • 🎯 Adaptive Retrieval - Auto-scales context based on document size and type
  • 📊 14 Templates - prose, brief, executive, bookreport, technical, academic, and more
  • 🔍 Template Benchmarking - Compare multiple templates on the same document
  • 🌐 Playwright Support - Summarize JavaScript-rendered pages (SPAs, React apps)
  • 🎨 Spectre.Console UI - Beautiful progress bars and tables

Downloads

| Platform | Architecture | Download |
|----------|--------------|----------|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |

Quick Start

Fully Offline (Bert mode - no dependencies):

./docsummarizer -f document.md -m Bert

ONNX model auto-downloads on first run (~23MB). Works with Markdown/TXT files.

With LLM (better quality):

# Install Ollama from https://ollama.ai, then:
ollama pull llama3.2:3b && ollama serve

./docsummarizer -f document.pdf

Example Commands

# Offline mode - no LLM needed
./docsummarizer -f doc.md -m Bert

# Best quality with LLM
./docsummarizer -f doc.pdf -m BertRag

# Summarize a Gutenberg book from ZIP
./docsummarizer -f pg1234.zip -t bookreport

# Compare templates on same document
./docsummarizer benchmark-templates -f doc.pdf -t "brief,prose,executive"

# Ask questions about a document
./docsummarizer -f manual.pdf --query "How do I install?"

# JSON output for AI agents
./docsummarizer tool -f contract.pdf

# List available templates
./docsummarizer templates

Prerequisites

For Bert mode: None! Works completely offline after first model download.

For LLM modes (Auto, BertRag): Ollama - ollama pull llama3.2:3b && ollama serve

For PDF/DOCX/PPTX/XLSX/Images: Docling - docker run -p 5001:5001 quay.io/docling-project/docling-serve

Supported Formats

| Format | Requirements |
|--------|--------------|
| Markdown (.md), Text (.txt) | None - works offline |
| ZIP archives (.zip) | None - extracts text/HTML automatically |
| PDF, DOCX, PPTX, XLSX | Docling service |
| Images (PNG, JPG, TIFF) | Docling service (OCR) |
| HTML, URLs | Built-in (optional Playwright for JS-rendered pages) |

Full Changelog: docsummarizer-v3.1.4...docsummarizer-v3.1.5

DocSummarizer v3.1.4-rc0

19 Dec 01:51

DocSummarizer v3.1.4-rc0

Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.

Full Documentation | Blog Post

What's New

Summarization Modes:

  • Bert Mode - Pure extractive summarization, completely offline, no LLM needed (~3-5s)
  • 🚀 BertRag Mode - Production-grade BERT extraction → retrieval → LLM synthesis
  • 🤖 Auto Mode - Smart mode selection based on document characteristics

Key Features:

  • 📦 ONNX Embeddings - Zero-config local embeddings (auto-downloads ~23MB model)
  • 📚 ZIP Archive Support - Summarize Project Gutenberg books directly from ZIP
  • 🎯 Adaptive Retrieval - Auto-scales context based on document size and type
  • 📊 14 Templates - prose, brief, executive, bookreport, technical, academic, and more
  • 🔍 Template Benchmarking - Compare multiple templates on the same document
  • 🌐 Playwright Support - Summarize JavaScript-rendered pages (SPAs, React apps)
  • 🎨 Spectre.Console UI - Beautiful progress bars and tables

Downloads

| Platform | Architecture | Download |
|----------|--------------|----------|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |

Quick Start

Fully Offline (Bert mode - no dependencies):

./docsummarizer -f document.md -m Bert

ONNX model auto-downloads on first run (~23MB). Works with Markdown/TXT files.

With LLM (better quality):

# Install Ollama from https://ollama.ai, then:
ollama pull llama3.2:3b && ollama serve

./docsummarizer -f document.pdf

Example Commands

# Offline mode - no LLM needed
./docsummarizer -f doc.md -m Bert

# Best quality with LLM
./docsummarizer -f doc.pdf -m BertRag

# Summarize a Gutenberg book from ZIP
./docsummarizer -f pg1234.zip -t bookreport

# Compare templates on same document
./docsummarizer benchmark-templates -f doc.pdf -t "brief,prose,executive"

# Ask questions about a document
./docsummarizer -f manual.pdf --query "How do I install?"

# JSON output for AI agents
./docsummarizer tool -f contract.pdf

# List available templates
./docsummarizer templates

Prerequisites

For Bert mode: None! Works completely offline after first model download.

For LLM modes (Auto, BertRag): Ollama - ollama pull llama3.2:3b && ollama serve

For PDF/DOCX/PPTX/XLSX/Images: Docling - docker run -p 5001:5001 quay.io/docling-project/docling-serve

Supported Formats

| Format | Requirements |
|--------|--------------|
| Markdown (.md), Text (.txt) | None - works offline |
| ZIP archives (.zip) | None - extracts text/HTML automatically |
| PDF, DOCX, PPTX, XLSX | Docling service |
| Images (PNG, JPG, TIFF) | Docling service (OCR) |
| HTML, URLs | Built-in (optional Playwright for JS-rendered pages) |

Full Changelog: docsummarizer-v3.1.3...docsummarizer-v3.1.4-rc0

DocSummarizer v3.1.4

19 Dec 02:09

DocSummarizer v3.1.4

Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.

Full Documentation | Blog Post

What's New

Summarization Modes:

  • Bert Mode - Pure extractive summarization, completely offline, no LLM needed (~3-5s)
  • 🚀 BertRag Mode - Production-grade BERT extraction → retrieval → LLM synthesis
  • 🤖 Auto Mode - Smart mode selection based on document characteristics

Key Features:

  • 📦 ONNX Embeddings - Zero-config local embeddings (auto-downloads ~23MB model)
  • 📚 ZIP Archive Support - Summarize Project Gutenberg books directly from ZIP
  • 🎯 Adaptive Retrieval - Auto-scales context based on document size and type
  • 📊 14 Templates - prose, brief, executive, bookreport, technical, academic, and more
  • 🔍 Template Benchmarking - Compare multiple templates on the same document
  • 🌐 Playwright Support - Summarize JavaScript-rendered pages (SPAs, React apps)
  • 🎨 Spectre.Console UI - Beautiful progress bars and tables

Downloads

| Platform | Architecture | Download |
|----------|--------------|----------|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |

Quick Start

Fully Offline (Bert mode - no dependencies):

./docsummarizer -f document.md -m Bert

ONNX model auto-downloads on first run (~23MB). Works with Markdown/TXT files.

With LLM (better quality):

# Install Ollama from https://ollama.ai, then:
ollama pull llama3.2:3b && ollama serve

./docsummarizer -f document.pdf

Example Commands

# Offline mode - no LLM needed
./docsummarizer -f doc.md -m Bert

# Best quality with LLM
./docsummarizer -f doc.pdf -m BertRag

# Summarize a Gutenberg book from ZIP
./docsummarizer -f pg1234.zip -t bookreport

# Compare templates on same document
./docsummarizer benchmark-templates -f doc.pdf -t "brief,prose,executive"

# Ask questions about a document
./docsummarizer -f manual.pdf --query "How do I install?"

# JSON output for AI agents
./docsummarizer tool -f contract.pdf

# List available templates
./docsummarizer templates

Prerequisites

For Bert mode: None! Works completely offline after first model download.

For LLM modes (Auto, BertRag): Ollama - ollama pull llama3.2:3b && ollama serve

For PDF/DOCX/PPTX/XLSX/Images: Docling - docker run -p 5001:5001 quay.io/docling-project/docling-serve

Supported Formats

| Format | Requirements |
|--------|--------------|
| Markdown (.md), Text (.txt) | None - works offline |
| ZIP archives (.zip) | None - extracts text/HTML automatically |
| PDF, DOCX, PPTX, XLSX | Docling service |
| Images (PNG, JPG, TIFF) | Docling service (OCR) |
| HTML, URLs | Built-in (optional Playwright for JS-rendered pages) |

Full Changelog: docsummarizer-v3.1.4-rc0...docsummarizer-v3.1.4

DocSummarizer v3.1.3

19 Dec 00:38

DocSummarizer v3.1.3

Turn documents or URLs into evidence-grounded summaries — for humans or AI agents — without sending anything to the cloud.

Full Documentation | Blog Post

What's New

Summarization Modes:

  • Bert Mode - Pure extractive summarization, completely offline, no LLM needed (~3-5s)
  • 🚀 BertRag Mode - Production-grade BERT extraction → retrieval → LLM synthesis
  • 🤖 Auto Mode - Smart mode selection based on document characteristics

Key Features:

  • 📦 ONNX Embeddings - Zero-config local embeddings (auto-downloads ~23MB model)
  • 📚 ZIP Archive Support - Summarize Project Gutenberg books directly from ZIP
  • 🎯 Adaptive Retrieval - Auto-scales context based on document size and type
  • 📊 14 Templates - prose, brief, executive, bookreport, technical, academic, and more
  • 🔍 Template Benchmarking - Compare multiple templates on the same document
  • 🌐 Playwright Support - Summarize JavaScript-rendered pages (SPAs, React apps)
  • 🎨 Spectre.Console UI - Beautiful progress bars and tables

Downloads

| Platform | Architecture | Download |
|----------|--------------|----------|
| Windows | x64 | docsummarizer-win-x64.zip |
| Windows | ARM64 | docsummarizer-win-arm64.zip |
| Linux | x64 | docsummarizer-linux-x64.tar.gz |
| Linux | ARM64 | docsummarizer-linux-arm64.tar.gz |
| macOS | x64 (Intel) | docsummarizer-osx-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | docsummarizer-osx-arm64.tar.gz |

Quick Start

Fully Offline (Bert mode - no dependencies):

./docsummarizer -f document.md -m Bert

ONNX model auto-downloads on first run (~23MB). Works with Markdown/TXT files.

With LLM (better quality):

# Install Ollama from https://ollama.ai, then:
ollama pull llama3.2:3b && ollama serve

./docsummarizer -f document.pdf

Example Commands

# Offline mode - no LLM needed
./docsummarizer -f doc.md -m Bert

# Best quality with LLM
./docsummarizer -f doc.pdf -m BertRag

# Summarize a Gutenberg book from ZIP
./docsummarizer -f pg1234.zip -t bookreport

# Compare templates on same document
./docsummarizer benchmark-templates -f doc.pdf -t "brief,prose,executive"

# Ask questions about a document
./docsummarizer -f manual.pdf --query "How do I install?"

# JSON output for AI agents
./docsummarizer tool -f contract.pdf

# List available templates
./docsummarizer templates

Prerequisites

For Bert mode: None! Works completely offline after first model download.

For LLM modes (Auto, BertRag): Ollama - ollama pull llama3.2:3b && ollama serve

For PDF/DOCX/PPTX/XLSX/Images: Docling - docker run -p 5001:5001 quay.io/docling-project/docling-serve

Supported Formats

| Format | Requirements |
|--------|--------------|
| Markdown (.md), Text (.txt) | None - works offline |
| ZIP archives (.zip) | None - extracts text/HTML automatically |
| PDF, DOCX, PPTX, XLSX | Docling service |
| Images (PNG, JPG, TIFF) | Docling service (OCR) |
| HTML, URLs | Built-in (optional Playwright for JS-rendered pages) |

Full Changelog: docsummarizer-v3.1.2...docsummarizer-v3.1.3