A guide to creating AI-generated transcripts with summaries and key quotes from talks and podcasts using Whisper and Claude.
See https://wesmckinney.com/presentations for examples of the finished result.
This workflow takes a video or podcast URL and produces a formatted markdown transcript with:
- YAML frontmatter for metadata
- A first-person summary of key topics
- Extracted "money quotes"
- Full verbatim transcript with speaker labels
Install the required tools:

```bash
# yt-dlp for downloading audio from YouTube/Vimeo/etc.
brew install yt-dlp

# ffmpeg for audio compression (if needed)
brew install ffmpeg

# Python 3.8+ with these packages
pip install openai pyyaml
```

You will also need an OpenAI API key for Whisper transcription, set as the `OPENAI_API_KEY` environment variable.
First, extract metadata from the video URL:
```bash
yt-dlp --print "%(title)s|||%(upload_date)s|||%(duration)s" "VIDEO_URL"
```

**Important:** The upload date may differ from the actual talk date. Verify the actual event date from the video description or event website.
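If you want to script the metadata step, the `|||`-delimited line can be split into frontmatter-ready fields. A minimal sketch (`parse_metadata` is a hypothetical helper name; it assumes yt-dlp's default `YYYYMMDD` upload-date format):

```python
def parse_metadata(line: str) -> dict:
    """Split yt-dlp's |||-delimited output into frontmatter-ready fields."""
    title, upload_date, duration = line.strip().split("|||")
    # yt-dlp prints upload_date as YYYYMMDD; reformat to YYYY-MM-DD.
    # Remember: this is the UPLOAD date, which may differ from the talk date.
    date = f"{upload_date[:4]}-{upload_date[4:6]}-{upload_date[6:]}"
    return {"title": title, "upload_date": date, "duration_s": int(duration)}
```

The key is deliberately named `upload_date` rather than `date` so you don't accidentally copy it into the frontmatter without verifying the actual talk date.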
```bash
# Download as MP3 (quality 5 is a good balance of size/quality)
yt-dlp -x --audio-format mp3 --audio-quality 5 -o "/tmp/talk-audio.%(ext)s" "VIDEO_URL"
```

Or use the provided script:

```bash
./scripts/download-audio.sh "VIDEO_URL" "output-name"
```

The Whisper API has a 25MB file size limit. Check the size and compress if needed:
```bash
ls -lh /tmp/talk-audio.mp3

# If > 25MB, compress:
ffmpeg -i /tmp/talk-audio.mp3 -b:a 64k -ac 1 /tmp/talk-audio-compressed.mp3
```

Transcribe using the OpenAI Whisper API:

```bash
python scripts/transcribe.py /tmp/talk-audio.mp3 > /tmp/raw-transcript.txt
```

Or manually via the API:
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("/tmp/talk-audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text",
    )

print(transcript)
```

Always save the raw transcript before formatting:

```bash
cp /tmp/raw-transcript.txt transcripts/raw/YYYY-MM-DD-event-slug-raw.txt
```

Use Claude to create a first-person summary. Provide the raw transcript and this prompt:
```
Please analyze this transcript and create:

1. A SUMMARY section written in first person ("I", "my") that covers the key
   topics discussed. Organize into subsections if there are distinct topics.
   Be factual - avoid grandiose language like "groundbreaking" or "revolutionary".

2. A KEY QUOTES section with 3-5 impactful direct quotes from the transcript,
   formatted as blockquotes with context.

3. A cleaned TRANSCRIPT with speaker names in bold followed by colons.

Raw transcript:
[paste transcript here]
```
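If you prefer to script this step rather than paste into a chat, a minimal sketch using the `anthropic` Python package is shown below. The model name and the `format_transcript`/`build_prompt` helper names are assumptions, not part of the original workflow; substitute a current model:

```python
FORMAT_PROMPT = (
    "Please analyze this transcript and create:\n"
    "1. A SUMMARY section in first person covering the key topics (factual,\n"
    "   no grandiose language).\n"
    "2. A KEY QUOTES section with 3-5 direct quotes as blockquotes with context.\n"
    "3. A cleaned TRANSCRIPT with speaker names in bold followed by colons.\n\n"
    "Raw transcript:\n{transcript}"
)


def build_prompt(raw: str) -> str:
    """Fill the formatting prompt with the raw transcript text."""
    return FORMAT_PROMPT.format(transcript=raw)


def format_transcript(raw: str, model: str = "claude-sonnet-4-5") -> str:
    # Requires `pip install anthropic` and ANTHROPIC_API_KEY in the environment.
    import anthropic

    client = anthropic.Anthropic()
    msg = client.messages.create(
        model=model,
        max_tokens=8192,
        messages=[{"role": "user", "content": build_prompt(raw)}],
    )
    return msg.content[0].text
```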
Create the file at `transcripts/YYYY-MM-DD-event-slug.md`:

```markdown
---
title: "Talk Title"
date: YYYY-MM-DD
event: "Event Name"
location: "City, State/Country"
video_url: "https://..."
video_type: "Talk"
transcribed: YYYY-MM-DD
---

*This transcript and summary were AI-generated and may contain errors.*

## Summary

[First-person summary here]

## Key Quotes

> "Quote text here" — Context or speaker

## Transcript

**Speaker Name:** Dialogue text...

**Other Speaker:** Response text...
```

See `templates/transcript-template.md` for a complete template.
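If you script file creation, `pyyaml` (already in the prerequisites) can render the frontmatter block. A sketch with a hypothetical `render_frontmatter` helper:

```python
import yaml


def render_frontmatter(meta: dict) -> str:
    """Render a dict as a YAML frontmatter block, preserving field order."""
    return "---\n" + yaml.safe_dump(meta, sort_keys=False) + "---\n"
```

Note that `safe_dump` will quote values like `2024-05-15` that would otherwise be parsed as YAML dates, which is harmless for this workflow.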
If you're integrating with a Quarto blog like the original, add an entry like this to your listing metadata:

```yaml
- date: 'YYYY-MM-DD'
  type: podcast # or: talk, interview, keynote, tutorial
  role: guest # for podcasts: guest or co-host
  event: "Event Name"
  title: "Talk Title"
  location: Remote
  links:
    - type: Video
      url: https://...
```

The date must match the transcript filename exactly for auto-linking to work. Preview before publishing:

```bash
quarto preview
# Check that transcript renders correctly
# Verify links work
```

Filename pattern: `YYYY-MM-DD-event-slug.md`
- Use the actual talk date, not the upload date
- Use lowercase with hyphens for the slug
- Keep slugs short but descriptive
Examples:

- `2024-05-15-talk-python-to-me-pandas.md`
- `2023-09-20-pycon-keynote.md`
- `2022-03-21-gresearch-interview.md`
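The naming rules above can be encoded in a small helper (`make_filename` is a hypothetical name, not one of the repo's scripts):

```python
import re
from datetime import date


def make_filename(talk_date: date, event_name: str) -> str:
    """Build a YYYY-MM-DD-event-slug.md filename: lowercase, hyphenated."""
    slug = re.sub(r"[^a-z0-9]+", "-", event_name.lower()).strip("-")
    return f"{talk_date.isoformat()}-{slug}.md"


# Example: make_filename(date(2023, 9, 20), "PyCon Keynote")
# -> "2023-09-20-pycon-keynote.md"
```

Pass the actual talk date, not the upload date, as `talk_date`.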
| Field | Required | Description |
|---|---|---|
| `title` | Yes | Talk or episode title |
| `date` | Yes | Actual talk date (YYYY-MM-DD) |
| `event` | Yes | Event, conference, or podcast name |
| `location` | Yes | City, State/Country or "Remote" |
| `video_url` | No* | URL to video/podcast |
| `video_type` | No* | Talk, Keynote, Podcast, Interview, Tutorial |
| `slides_url` | No* | URL to slides (if no video) |
| `transcribed` | No | Date transcript was created |

\*Use either `video_url` + `video_type` OR `slides_url`, not both.
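These field rules can be checked mechanically before publishing. A sketch of a validator (`validate_frontmatter` is a hypothetical helper name):

```python
REQUIRED_FIELDS = ("title", "date", "event", "location")


def validate_frontmatter(meta: dict) -> list:
    """Return a list of problems; an empty list means the metadata passes."""
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in meta]
    has_video = "video_url" in meta
    if has_video and "slides_url" in meta:
        errors.append("use either video_url + video_type OR slides_url, not both")
    if has_video and "video_type" not in meta:
        errors.append("video_url requires video_type")
    return errors
```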
- First person: Write as if you gave the talk ("I discussed...", "My approach...")
- Factual: Focus on what was actually said, not interpretation
- No puffery: Avoid "groundbreaking", "revolutionary", "transformative", etc.
- Organized: Use subsections (`### Topic`) for distinct themes
Example:

```markdown
## Summary

Brief overview paragraph of the main topics covered.

### First Major Topic

Details about this topic...

### Second Major Topic

Details about this topic...

## Key Quotes

> "The exact quote from the transcript" — Context about when/why this was said

> "Another impactful quote" — Speaker attribution if multiple speakers
```

For the transcript itself:

- Speaker names in bold followed by a colon
- Each speaker turn on its own paragraph
- Preserve natural speech (can clean up minor filler words)
- Use *[brackets]* for non-speech elements: *[laughter]*, *[applause]*
Example:

```markdown
## Transcript

**Host:** Welcome to the show. Today we're talking about data science.

**Guest:** Thanks for having me. I'm excited to discuss this topic.

*[Brief pause]*

**Host:** Let's start with your background.
```

Before publishing, verify:
- Filename follows the `YYYY-MM-DD-event-slug.md` pattern
- Date is the actual talk date (not upload date)
- All required YAML fields present
- AI disclaimer included
- Summary is in first person
- Summary avoids grandiose language
- Key quotes use blockquote format
- Speaker names are bolded in transcript
- No obvious transcription errors or gaps
- Raw transcript saved in the `raw/` folder
Compress audio to reduce file size:

```bash
ffmpeg -i input.mp3 -b:a 64k -ac 1 output.mp3
```

If the Whisper output has gaps (repeated symbols, [inaudible]):
- Re-run transcription on that section
- Manually transcribe from video
- Note gaps with `[inaudible]` markers
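The compression step can be wrapped in a helper that only re-encodes when the file exceeds the Whisper API's 25MB limit. A sketch (`ensure_under_limit` is a hypothetical name):

```python
import os
import subprocess

WHISPER_LIMIT = 25 * 1024 * 1024  # Whisper API's 25MB upload limit


def ensure_under_limit(path: str) -> str:
    """Return a path to an audio file under the limit, compressing if needed."""
    if os.path.getsize(path) <= WHISPER_LIMIT:
        return path
    compressed = path.replace(".mp3", "-compressed.mp3")
    subprocess.run(
        ["ffmpeg", "-y", "-i", path, "-b:a", "64k", "-ac", "1", compressed],
        check=True,
    )
    return compressed
```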
```
your-project/
├── transcripts/
│   ├── raw/                    # Raw transcript backups
│   │   └── YYYY-MM-DD-*.txt
│   ├── _metadata.yml           # Quarto metadata (optional)
│   ├── transcript-styles.css   # Custom styles (optional)
│   └── YYYY-MM-DD-*.md         # Formatted transcripts
├── scripts/
│   ├── download-audio.sh
│   ├── transcribe.py
│   └── format-transcript.py
└── templates/
    └── transcript-template.md
```
- Whisper API: ~$0.006 per minute of audio
- Claude: Varies by usage for summary generation
A 1-hour talk costs approximately $0.36 for Whisper transcription.
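That per-minute rate makes cost estimation a one-liner (the rate was current when this guide was written; check OpenAI's pricing page before relying on it):

```python
WHISPER_PER_MIN = 0.006  # USD per minute of audio (verify against current pricing)


def whisper_cost(duration_s: int) -> float:
    """Estimated Whisper transcription cost in USD for an audio duration."""
    return round(duration_s / 60 * WHISPER_PER_MIN, 2)


# Example: a 1-hour talk -> whisper_cost(3600) gives 0.36
```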
These scripts and templates are provided as-is for educational purposes.