A batch content summarizer that processes YouTube videos, blog articles, and local subtitle files (SRT/VTT/etc.) using AI-powered summarization through the fabric tool.
This tool helps you process a backlog of content by automatically generating structured markdown summaries. It extracts transcripts from YouTube videos, fetches blog content, processes local subtitle files, and creates comprehensive summaries using multiple AI patterns.
- Batch Processing: Process multiple entries from a single batch file
- Multi-Format Support: Handles YouTube videos, blog articles, and local subtitle files
- AI-Powered Summaries: Uses fabric's AI patterns for intelligent summarization
- Multiple Summary Types: Generates different perspectives on content:
  - General summary
  - YouTube-specific summary (for videos and subtitles)
  - Extracted wisdom and insights
- Rich Metadata Extraction (YouTube only):
  - Channel information with modern handle format (`@username`)
  - Video descriptions from creators
  - Auto-generated Table of Contents (TOC)
- Local Subtitle Flow: Recursively process folders of `.srt`/`.vtt`/`.sub`/`.sbv`/`.txt` files without YouTube lookups
- Resilient to Fabric Output Drift: Shared retry + pseudo-header promotion so dropped `#` prefixes never break TOC anchors
- Organized Output: Structured folder hierarchy for easy navigation
- Legacy Patcher: Patch existing files that are missing channel info or descriptions, and self-heal pseudo-headers
- Content Upgrader: Upgrade older note formats to current template with validation + retry
- Comprehensive Reporting: Detailed statistics and success metrics
- Error Resilient: Continues processing even if individual entries fail
- Python 3.x
- fabric - AI-powered text processing tool
  - Must be installed and configured with API access
  - Requires patterns: `summarize`, `youtube_summary`, `extract_wisdom`
- yt-dlp - YouTube metadata extraction
  - Used to extract channel information from YouTube videos
  - Authenticates via `--cookies-from-browser` to handle age-gated, members-only, or rate-limited videos. Default browser is `chrome`; override with env var `YTDLP_COOKIES_BROWSER` (e.g. `firefox`, `brave`, `edge`, `safari`) or set it to an empty string to disable cookie pulling.
The toolkit pulls YouTube auth cookies from a real browser profile via `yt-dlp --cookies-from-browser`. Set this up once before running:
- Install the browser matching `YTDLP_COOKIES_BROWSER` (default `chrome`). On macOS: `brew install --cask google-chrome` (or `firefox`, `brave-browser`, `microsoft-edge`). Safari is built-in.
- Sign into YouTube (i.e. into your Google account) in that browser at https://youtube.com. The cookie jar must contain a live YouTube session; fresh installs with no login produce empty cookies, and yt-dlp will silently fall back to anonymous requests, defeating the point.
- Use the default profile unless you override it. yt-dlp reads the browser's default profile by default; if you keep YouTube login in a non-default Chrome profile, set `YTDLP_COOKIES_BROWSER="chrome:Profile 1"` (yt-dlp accepts a `BROWSER[:PROFILE]` form).
- macOS Keychain prompt (Chrome / Edge / Brave only): the first time yt-dlp reads cookies, macOS pops a Keychain prompt asking to release the "Chrome Safe Storage" password. Click Always Allow so subsequent runs are non-interactive. Firefox and Safari do not require this.
- Linux Chrome / Brave: Chromium-family browsers may need to be closed for yt-dlp to read the cookie SQLite DB on Linux (file lock). Firefox can be read while open. macOS Chrome can be read while open on recent yt-dlp.
- Stay logged in: if Google logs you out (password change, 2FA reset, browser cookie clear), re-login in the browser before next run.
To disable cookie pulling entirely (e.g. CI without a browser), export `YTDLP_COOKIES_BROWSER=""`. Public videos still work without cookies.
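The environment-variable rules above can be sketched as a small helper. This is a minimal illustration, not the toolkit's actual code: the function name `cookie_args` and the `env` parameter are hypothetical, and only the documented behavior (default `chrome`, `BROWSER[:PROFILE]` passed through, empty string disables cookies) is assumed.

```python
import os

DEFAULT_BROWSER = "chrome"  # documented default for YTDLP_COOKIES_BROWSER


def cookie_args(env=None):
    """Return the yt-dlp cookie arguments implied by YTDLP_COOKIES_BROWSER.

    Hypothetical helper: `env` is any mapping, defaulting to os.environ.
    """
    env = os.environ if env is None else env
    # Unset -> default browser; empty string -> cookie pulling disabled.
    browser = env.get("YTDLP_COOKIES_BROWSER", DEFAULT_BROWSER).strip()
    if not browser:
        return []
    # The value is passed through unchanged, so "chrome:Profile 1" also works.
    return ["--cookies-from-browser", browser]


print(cookie_args({}))
print(cookie_args({"YTDLP_COOKIES_BROWSER": "chrome:Profile 1"}))
print(cookie_args({"YTDLP_COOKIES_BROWSER": ""}))
```

The returned list would be appended to a `yt-dlp` command line; returning an empty list keeps the anonymous path for public videos.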
- Clone this repository
- Install Python dependencies:
pip install -r requirements.txt
- Ensure fabric is installed and configured:
# Install fabric (follow fabric's installation guide)
# https://github.com/danielmiessler/fabric
Process a batch file containing multiple entries:
python content_summary_toolkit.py batch_entries.txt
Process a single YouTube video:
python youtube_summary_generator.py '[Learn RAG From Scratch](https://www.youtube.com/watch?v=sVcwVQRHIc8)'
Process a single blog article:
python blog_summary_generator.py '[Article Title](https://example.com/article)'
Recursively summarize every subtitle file under a directory, with no YouTube fetching, no channel lookup, and no description. Outputs `{name}.summary.md` alongside each source file, where `{name}` is the full source filename (e.g. `01 - Intro.srt.summary.md`):
# Process a whole course folder (all .srt/.vtt/.sub/.sbv/.txt)
python subtitle_summary_generator.py /path/to/subtitles
# Dry-run — list what would be processed
python subtitle_summary_generator.py /path/to/subtitles --dry-run
# Overwrite existing .summary.md outputs
python subtitle_summary_generator.py /path/to/subtitles --overwrite --verbose
# Limit to specific extensions
python subtitle_summary_generator.py /path/to/subtitles --extensions .srt .vtt
Each `.summary.md` contains the standard TOC + 3 fabric sections (summarize / youtube_summary / extract_wisdom). Previously produced `.summary.txt` files from older versions are also skipped on re-scan.
Upgrade existing YouTube notes in an Obsidian vault to the current template — detects partial/legacy shapes and runs only the missing fabric patterns:
# Dry-run classification
python youtube_content_upgrader.py --folder /path/to/Youtube --dry-run
# Process one category, limited to N files for testing
python youtube_content_upgrader.py --folder /path/to/Youtube --category 2 --limit 5
# Process every upgradable file
python youtube_content_upgrader.py --folder /path/to/Youtube --category all --verbose
Patch existing YouTube summary files with missing channel info and video descriptions:
# Patch all files in default folder (output/yt_generated/)
python youtube_summary_patcher.py
# Preview changes without modifying files
python youtube_summary_patcher.py --dry-run --verbose
# Update only channel info (skip video descriptions)
python youtube_summary_patcher.py --skip-description
# Patch files in custom folder
python youtube_summary_patcher.py --folder /path/to/folder
Create a text file with entries in markdown link format. The processor supports:
Supported Entries:
- YouTube videos: `[Video Title](https://youtube.com/watch?v=...)`
- Blog articles: `[Article Title](https://example.com/article)`
- Markdown headers: `# Section Name` (skipped)
- Commentary: `\# This is a comment` (skipped)
- Separators: `---` (skipped)
- Empty lines (skipped)
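The entry rules above can be sketched as a line classifier. This is an illustrative sketch only: `classify_line`, the regular expression, and the host check used to tell YouTube links from blog links are assumptions, not the toolkit's actual parsing logic.

```python
import re
from urllib.parse import urlparse

# A whole-line markdown link: [Title](http(s)://url)
LINK = re.compile(r"^\[(?P<title>.+)\]\((?P<url>https?://[^\s)]+)\)$")


def classify_line(line):
    """Return 'youtube', 'blog', 'skipped', or 'invalid' for one batch line."""
    text = line.strip()
    # Empty lines, separators, headers, and escaped-hash comments are skipped.
    if not text or text == "---" or text.startswith("#") or text.startswith("\\#"):
        return "skipped"
    match = LINK.match(text)
    if match is None:
        return "invalid"
    # Assumption: YouTube entries are recognized by the URL's host name.
    host = (urlparse(match["url"]).hostname or "").lower()
    if host in ("youtu.be", "youtube.com") or host.endswith(".youtube.com"):
        return "youtube"
    return "blog"


for sample in [
    "# AI and Machine Learning",
    "[Learn RAG From Scratch](https://www.youtube.com/watch?v=sVcwVQRHIc8)",
    "[Article Title](https://example.com/article)",
    "not a markdown link",
]:
    print(classify_line(sample), "<-", sample)
```

Lines classified as `invalid` correspond to the "Invalid format" counter in the batch report.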
Example batch_entries.txt:
# AI and Machine Learning
[Learn RAG From Scratch – Python AI Tutorial from a LangChain Engineer](https://www.youtube.com/watch?v=sVcwVQRHIc8)
[Article - More of Silicon Valley is building on free Chinese AI](https://www.nbcnews.com/tech/innovation/silicon-valley-building-free-chinese-ai-rcna242430)
---
# Web Development
\# This is a note to self - check this later
[Building Modern Web Apps](https://example.com/modern-web-apps)
The URL-driven flows (YouTube + blog) create an organized folder structure under the current directory:
output/
├── subtitle/            # YouTube transcripts
│   └── {video-title}.txt
├── yt_generated/        # YouTube summaries
│   └── {video-title}.md
├── blog/                # Fetched blog content
│   └── {article-title}.md
└── blog_generated/      # Blog summaries
    └── {article-title}.md
The local subtitle flow (subtitle_summary_generator.py) writes its
output in-place next to each source file — no central folder:
/path/to/subtitles/
├── 01 - Intro.srt
├── 01 - Intro.srt.summary.md      ← generated
├── 02 - Setup.srt
├── 02 - Setup.srt.summary.md      ← generated
└── subfolder/
    ├── 03 - Next.vtt
    └── 03 - Next.vtt.summary.md   ← generated
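The in-place layout and skip behavior can be sketched as follows. This is a rough illustration under stated assumptions, not the script's real implementation: the names `summary_path` and `plan_subtitle_run` are hypothetical, and the legacy output is assumed to be named `{source}.summary.txt`.

```python
from pathlib import Path

SUBTITLE_EXTENSIONS = {".srt", ".vtt", ".sub", ".sbv", ".txt"}


def summary_path(source):
    """Output path written next to the source: '<source name>.summary.md'."""
    return source.with_name(source.name + ".summary.md")


def plan_subtitle_run(root, overwrite=False, extensions=SUBTITLE_EXTENSIONS):
    """Return (to_process, skipped) lists of subtitle files under root."""
    to_process, skipped = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        # Assumption: a legacy output such as 'x.srt.summary.txt' is not an input.
        if path.name.endswith(".summary.txt"):
            continue
        legacy = path.with_name(path.name + ".summary.txt")
        done = summary_path(path).exists() or legacy.exists()
        (skipped if done and not overwrite else to_process).append(path)
    return to_process, skipped
```

Calling `plan_subtitle_run("/path/to/subtitles")` and printing the first list is roughly what a `--dry-run` would report; passing `overwrite=True` moves already-summarized files back into the work list.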
Each generated YouTube summary contains:
- Channel author link (name and URL extracted via yt-dlp)
  - Prefers modern handle format (`@username`) over legacy channel ID
- Link to original video
- Table of Contents - Auto-generated from section headers
- Video Description - Original description from the creator
- General summary - AI-generated overview
- YouTube-specific summary - Detailed breakdown
- Extracted wisdom - Key insights and takeaways
Example Output Structure:
[Channel Name](https://www.youtube.com/@channelname)
[Link](https://www.youtube.com/watch?v=VIDEO_ID)
---
### TOC
- [[#ONE SENTENCE SUMMARY]]
- [[#Summary: Title]]
- [[#SUMMARY]]
---
Video description text here...
---
# ONE SENTENCE SUMMARY:
...
---
---
---
# Summary: Title
...
---
---
---
# SUMMARY
...
Each generated blog summary contains:
- Link to original article
- General summary
- Extracted wisdom
All summaries automatically filter out AI thinking process (<think> tags).
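A minimal sketch of this kind of filtering, assuming well-formed `<think>...</think>` pairs. The name `strip_think_blocks` is hypothetical and this is not the toolkit's `filter_think_sections` implementation; an unclosed `<think>` tag is left untouched here.

```python
import re

# Non-greedy match so several separate <think> blocks are removed independently.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL | re.IGNORECASE)


def strip_think_blocks(text):
    """Remove every <think>...</think> block and the whitespace right after it."""
    return THINK_BLOCK.sub("", text)


raw = "<think>Plan the summary.\nCheck length.</think>\n# SUMMARY\nBody text."
print(strip_think_blocks(raw))
```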
After batch processing completes, you'll see a comprehensive report:
==================================================
Batch Processing Summary
==================================================
Total lines: 25
YouTube processed: 8
Blog processed: 12
Skipped: 3
Invalid format: 1
Errors: 1
Success rate: 95.2%
Total time: 5 min 23.45 sec
==================================================
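The sample figures are consistent with a success rate computed over attempted entries only (processed plus errors), leaving skipped and invalid lines out. This formula is inferred from the numbers in the report above, not taken from the toolkit's source, so treat it as an assumption:

```python
def success_rate(youtube_ok, blog_ok, errors):
    """Percentage of attempted entries that were processed successfully."""
    attempted = youtube_ok + blog_ok + errors
    if attempted == 0:
        return 0.0
    return 100.0 * (youtube_ok + blog_ok) / attempted


# 8 YouTube + 12 blog successes and 1 error: 20 / 21 attempted entries.
print(f"Success rate: {success_rate(8, 12, 1):.1f}%")  # prints "Success rate: 95.2%"
```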
The project is one shared helper module plus six entry-point scripts:
- `fabric_utils.py`: Shared helper module (internal)
  - `filter_think_sections`: strip `<think>...</think>` blocks
  - `extract_first_level1_header`: read first `#` header from markdown
  - `generate_toc`: build `### TOC` with `[[#header]]` wikilinks
  - `promote_pseudo_header`: recover when fabric drops the leading `#` on a heading
  - `run_command`: shell runner returning `(success, output)` with optional timeout
  - `run_fabric_with_retry`: runs a fabric pattern, retries up to `MAX_FABRIC_ATTEMPTS=3` when validation fails, falls back to `promote_pseudo_header` to recover deterministic dropouts. Accepts a pluggable validator so each tool can enforce its own quality bar.
- `content_summary_toolkit.py`: Main orchestrator
  - Parses batch files
  - Routes entries to appropriate generators (YouTube vs blog)
  - Tracks statistics and generates reports
- `youtube_summary_generator.py`: YouTube processor
  - Extracts channel information using `yt-dlp`
  - Extracts video descriptions using `yt-dlp --get-description`
  - Downloads transcripts using `fabric -y`
  - Runs 3 fabric patterns with retry + pseudo-header fallback
  - Generates TOC from section headers
  - Creates structured markdown output with full metadata
- `blog_summary_generator.py`: Blog processor
  - Fetches blog content using `fabric -u`
  - Runs 2 fabric patterns with retry + pseudo-header fallback
  - Creates structured markdown output
- `subtitle_summary_generator.py`: Local subtitle processor
  - Recursively scans a folder for `.srt`/`.sub`/`.vtt`/`.sbv`/`.txt`
  - Skips files that already have `.summary.md` (or legacy `.summary.txt`) alongside them
  - Runs 3 fabric patterns with retry + pseudo-header fallback
  - Writes `{source}.summary.md` in place
  - Flags: `--overwrite`, `--dry-run`, `--verbose`, `--extensions`
- `youtube_content_upgrader.py`: Content upgrader
  - Classifies existing notes into categories (old-fabric, bare-content, near-compliant)
  - Runs only the missing fabric patterns per category
  - Enforces strict per-pattern validation (minimum line counts, required sub-headers)
  - Falls back to pseudo-header promotion after retries exhaust
- `youtube_summary_patcher.py`: Legacy file patcher
  - Patches existing YouTube summary files to current format
  - Adds missing channel information (yt-dlp)
  - Generates TOC from existing headers (shared `generate_toc`)
  - Adds missing video descriptions after TOC
  - Self-heals pseudo-headers on read: if a section body has `ONE SENTENCE SUMMARY:` as plain text, promotes it to `# ONE SENTENCE SUMMARY:` so Obsidian TOC anchors resolve
  - Supports dry-run mode for preview
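As a rough illustration of the TOC step that several of these scripts share, the sketch below collects level-1 headers and emits Obsidian wikilinks. The function names `level1_headers` and `build_toc` are hypothetical, and dropping a trailing colon is an assumption inferred from the example output (`# ONE SENTENCE SUMMARY:` appears in the TOC as `[[#ONE SENTENCE SUMMARY]]`); the real `generate_toc` may behave differently.

```python
def level1_headers(markdown):
    """Return the text of every level-1 ('# ') header, in document order."""
    headers = []
    for line in markdown.splitlines():
        if line.startswith("# "):
            # Assumption: a trailing colon is dropped from the anchor text.
            headers.append(line[2:].strip().rstrip(":").strip())
    return headers


def build_toc(markdown):
    """Build a '### TOC' block of wikilinks, or '' when no headers exist."""
    headers = level1_headers(markdown)
    if not headers:
        return ""
    return "\n".join(["### TOC"] + [f"- [[#{h}]]" for h in headers])


note = "# ONE SENTENCE SUMMARY:\n...\n---\n# Summary: Title\n...\n---\n# SUMMARY\n..."
print(build_toc(note))
```

An empty result is the symptom described in the next section: when fabric drops the leading `#`, no header is found and the TOC comes out truncated.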
Fabric's LLM output is non-deterministic and occasionally drops the leading `#` from the top heading of a section, which used to silently produce truncated TOCs with broken anchors. The toolkit now defends in depth:
- Retry: each fabric call is run up to `MAX_FABRIC_ATTEMPTS=3` times (configurable) until the output passes validation (default: contains a level-1 header; per-tool validators may add stricter checks such as minimum line counts).
- Pseudo-header promotion: if retries exhaust, the first heading-shaped uppercase line (`ONE SENTENCE SUMMARY:`, `SUMMARY`, etc.) is promoted to `# ...` so TOC generation succeeds.
- Self-heal on patch: `youtube_summary_patcher.py` runs the same promotion pass over existing files on disk, so legacy notes with dropped `#` prefixes get fixed the next time the patcher visits them.
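The retry-then-promote flow described above can be sketched as follows. This is a simplified illustration, not the code in `fabric_utils.py`: the names `run_with_retry` and `promote_first_pseudo_header`, the `produce` callable standing in for a fabric invocation, and the regular expression that decides what counts as "heading-shaped" are all assumptions.

```python
import re

MAX_ATTEMPTS = 3  # mirrors the documented MAX_FABRIC_ATTEMPTS default

# Assumed shape of a pseudo-header: an all-uppercase line, optional trailing colon.
PSEUDO_HEADER = re.compile(r"^[A-Z][A-Z0-9 '&/-]{2,}:?$")


def has_level1_header(text):
    """Default validator: the output contains at least one '# ' header."""
    return any(line.startswith("# ") for line in text.splitlines())


def promote_first_pseudo_header(text):
    """Prefix the first heading-shaped uppercase line with '# '."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if PSEUDO_HEADER.match(line.strip()):
            lines[i] = "# " + line.strip()
            break
    return "\n".join(lines)


def run_with_retry(produce, validate=has_level1_header, max_attempts=MAX_ATTEMPTS):
    """Call produce() until validate passes; otherwise repair the last output."""
    output = ""
    for _ in range(max_attempts):
        output = produce()
        if validate(output):
            return output
    # Retries exhausted: fall back to pseudo-header promotion.
    return promote_first_pseudo_header(output)


# Simulated pattern run that always drops the leading '#'.
print(run_with_retry(lambda: "ONE SENTENCE SUMMARY:\nFabric summarizes content."))
```

Passing a stricter `validate` callable (for example one that also checks a minimum line count) is how a per-tool quality bar could plug into the same loop.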
The processor is designed to be resilient:
- Individual entry failures don't stop batch processing
- All errors are collected and reported at the end
- Invalid format lines are logged and skipped
- Network issues are caught and reported
- Fabric output defects are retried and auto-repaired where possible
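The "continue on failure, report at the end" behavior listed above can be sketched as a simple loop. This is a generic pattern shown under assumptions: `process_batch` and the `handler` callable are hypothetical names, not the orchestrator's actual API.

```python
def process_batch(entries, handler):
    """Run handler on every entry; collect failures instead of stopping."""
    processed, errors = [], []
    for entry in entries:
        try:
            handler(entry)
            processed.append(entry)
        except Exception as exc:  # one bad entry must not abort the batch
            errors.append((entry, str(exc)))
    return processed, errors


def demo_handler(entry):
    # Stand-in for a real generator call; fails on one entry to show the flow.
    if "broken" in entry:
        raise RuntimeError("network error")


done, failed = process_batch(["video-a", "broken-link", "article-b"], demo_handler)
print(done)    # entries that completed
print(failed)  # (entry, message) pairs reported at the end
```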
- Content Curation: Process your reading/watching backlog
- Research: Quickly extract insights from multiple sources
- Knowledge Management: Build a searchable library of summaries
- Learning: Review key points from educational content
- Content Creation: Gather research for articles or videos
This project uses detailed specifications in the specs/ folder:
- `specs/top_level.md` - Batch processor specification
- `specs/youtube_summary_generator.md` - YouTube processing specification
- `specs/blog_summary_generator.md` - Blog processing specification
- `specs/youtube_summary_patcher.md` - YouTube patcher specification
When adding a new fabric-based generator, import from fabric_utils.py
rather than re-implementing the helpers — that keeps retry behavior,
TOC formatting, and pseudo-header recovery consistent across the toolkit.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Built with fabric by Daniel Miessler