A Go-based tool for improving YouTube auto-generated subtitles using Google's Gemini AI. This project helps convert YouTube's raw srv3 subtitle files into well-formatted SRT files with natural sentence breaks and proper timing.
YouTube's automatic subtitles (in srv3 format) often contain word-by-word timing data but lack proper sentence structure. This tool:
- Downloads YouTube videos with auto-generated subtitles
- Processes the subtitle data through Google's Gemini AI
- Creates properly formatted SRT files with natural language structure
- Maintains accurate timing synchronization with the video
- Video Download: Integrated YouTube video downloading via go-ytdlp
- AI-Enhanced Subtitles: Uses Gemini AI to improve subtitle readability
- Batch Processing: Handles large subtitles by processing them in manageable batches
- Debug Options: Includes debug mode for troubleshooting
- Two Command Tools:
  - yt_enhancer: Download videos and process subtitles in one step
  - convert_srt: Process existing srv3 files to SRT format
- Go 1.18 or higher
- Google Gemini API key
- Clone the repository
- Create a .env file with your Gemini API key:
GEMINI_API_KEY=your_api_key_here
- Build the tools:
go build -o bin/yt_enhancer ./cmd/yt_enhancer
go build -o bin/convert_srt ./cmd/convert_srt

./bin/yt_enhancer "https://www.youtube.com/watch?v=VIDEO_ID" [custom_filename]
This will:
- Download the YouTube video
- Extract auto-generated subtitles
- Process them through Gemini API
- Generate an SRT file
./bin/convert_srt [-env=.env] [-o=output.srt] [-debug] [-debug-dir=debug] input.srv3
Options:
- -env: Path to environment file (default: .env)
- -o: Output file path (default: same as input with .srt extension)
- -debug: Enable debug mode
- -debug-dir: Directory to store debug files (default: debug)
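The options above map naturally onto Go's standard flag package. The following is a minimal sketch of how such a command line could be parsed, not the project's actual code: the options struct, its field names, and the parseOptions helper are hypothetical. Note that the flag package stops parsing at the first non-flag argument, which matches the usage line above where input.srv3 comes last.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// options mirrors the documented convert_srt flags. The struct and its
// field names are illustrative, not taken from the project source.
type options struct {
	envPath  string
	output   string
	debug    bool
	debugDir string
	input    string
}

// parseOptions parses args (without the program name) into options.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("convert_srt", flag.ContinueOnError)
	fs.StringVar(&o.envPath, "env", ".env", "path to environment file")
	fs.StringVar(&o.output, "o", "", "output file path")
	fs.BoolVar(&o.debug, "debug", false, "enable debug mode")
	fs.StringVar(&o.debugDir, "debug-dir", "debug", "directory to store debug files")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if fs.NArg() != 1 {
		return o, fmt.Errorf("expected exactly one input file, got %d", fs.NArg())
	}
	o.input = fs.Arg(0)
	if o.output == "" {
		// Default: same path as the input, with the extension swapped for .srt.
		o.output = strings.TrimSuffix(o.input, filepath.Ext(o.input)) + ".srt"
	}
	return o, nil
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

Using a dedicated FlagSet rather than the global flag set keeps the parsing logic easy to call from tests with an arbitrary argument slice.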
- Subtitle Extraction: Parses the srv3 XML file to extract word-level timing data
- Batch Processing: Divides large subtitle files into manageable batches
- AI Processing: Sends word timings to Gemini API for intelligent sentence formation
- Timing Adjustment: Calculates appropriate display durations for each subtitle
- SRT Generation: Creates properly formatted SRT files with exact timing information
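The first two steps above (extraction and batching) can be sketched with the standard encoding/xml package. This is an illustration under stated assumptions, not the project's parser: it assumes the common srv3 layout in which each p element carries a start time t in milliseconds and each child s element carries an optional offset t relative to its paragraph. The Word type and the parseWords and batches helpers are hypothetical names, and the Gemini request itself is deliberately left out.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Assumed srv3 layout: <timedtext><body><p t="..." d="..."><s t="...">word</s></p></body></timedtext>
type timedText struct {
	Paragraphs []paragraph `xml:"body>p"`
}

type paragraph struct {
	Start    int64     `xml:"t,attr"` // paragraph start, in ms
	Segments []segment `xml:"s"`
}

type segment struct {
	Offset int64  `xml:"t,attr"` // offset from paragraph start, in ms (0 if absent)
	Text   string `xml:",chardata"`
}

// Word is one spoken word with its absolute start time (hypothetical type).
type Word struct {
	Text    string
	StartMs int64
}

// parseWords flattens the paragraph/segment tree into word-level timings.
func parseWords(data []byte) ([]Word, error) {
	var tt timedText
	if err := xml.Unmarshal(data, &tt); err != nil {
		return nil, err
	}
	var words []Word
	for _, p := range tt.Paragraphs {
		for _, s := range p.Segments {
			text := strings.TrimSpace(s.Text)
			if text == "" {
				continue // skip empty segments
			}
			words = append(words, Word{Text: text, StartMs: p.Start + s.Offset})
		}
	}
	return words, nil
}

// batches splits words into consecutive chunks of at most size words.
func batches(words []Word, size int) [][]Word {
	if size <= 0 {
		return nil
	}
	var out [][]Word
	for start := 0; start < len(words); start += size {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		out = append(out, words[start:end])
	}
	return out
}

const sample = `<timedtext format="3"><body>
<p t="1000" d="4000"><s>hello</s><s t="400"> world</s><s t="900"> again</s></p>
<p t="5000" d="2000"><s>next</s></p>
</body></timedtext>`

func main() {
	words, err := parseWords([]byte(sample))
	if err != nil {
		panic(err)
	}
	for i, b := range batches(words, 3) {
		fmt.Printf("batch %d: %+v\n", i, b)
	}
}
```

Each batch would then be sent to the model in its own request, which keeps individual prompts small for long videos.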
- cmd/: Command-line tools
  - yt_enhancer/: Video download and subtitle processor
  - convert_srt/: Standalone srv3 to SRT converter
- pkg/: Core functionality
  - config/: Configuration handling
  - gemini/: Gemini API client
  - models/: Data structures
  - parser/: srv3 XML parsing
  - subtitle/: SRT file generation
The tool converts raw word timings from YouTube into readable subtitles with natural sentence breaks while maintaining synchronization with the video content.
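To make the output side concrete, here is a minimal sketch of SRT generation. The SRT layout itself (a 1-based index, a HH:MM:SS,mmm --> HH:MM:SS,mmm line, the text, then a blank line) is standard. The Cue type, the helper names, and the cueEnd heuristic for display duration are assumptions made for illustration and are not taken from the project.

```go
package main

import (
	"fmt"
	"strings"
)

// Cue is one subtitle entry with times in milliseconds (hypothetical type).
type Cue struct {
	StartMs, EndMs int64
	Text           string
}

// formatTimestamp renders milliseconds as the SRT timestamp HH:MM:SS,mmm.
func formatTimestamp(ms int64) string {
	if ms < 0 {
		ms = 0
	}
	return fmt.Sprintf("%02d:%02d:%02d,%03d", ms/3600000, ms/60000%60, ms/1000%60, ms%1000)
}

// cueEnd holds a cue for tailMs after its last word starts, but never past
// the start of the next cue. A negative nextStartMs means there is no next
// cue. This heuristic is an assumption, not the project's timing logic.
func cueEnd(lastWordMs, nextStartMs, tailMs int64) int64 {
	end := lastWordMs + tailMs
	if nextStartMs >= 0 && end > nextStartMs {
		end = nextStartMs
	}
	return end
}

// writeSRT serializes cues in order, numbering them from 1.
func writeSRT(cues []Cue) string {
	var b strings.Builder
	for i, c := range cues {
		fmt.Fprintf(&b, "%d\n%s --> %s\n%s\n\n",
			i+1, formatTimestamp(c.StartMs), formatTimestamp(c.EndMs), c.Text)
	}
	return b.String()
}

func main() {
	cues := []Cue{
		{StartMs: 1000, EndMs: cueEnd(1900, 5000, 1000), Text: "Hello world again."},
		{StartMs: 5000, EndMs: cueEnd(5000, -1, 1000), Text: "Next."},
	}
	fmt.Print(writeSRT(cues))
}
```

Working in integer milliseconds throughout avoids rounding drift between the word timings in the source file and the timestamps written to the SRT file.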
This project is licensed under the MIT License.
This project was developed with assistance from GitHub Copilot and other AI tools. The code, documentation, and project structure were created using AI-powered pair programming techniques.