@vasanthrpjan1-boop

This PR builds upon #2696 and significantly enhances the real-time transcription example with production-ready features.

New Features

Voice Activity Detection (VAD)

  • Only transcribes when speech is detected, saving compute resources
  • Configurable energy threshold (--energy-threshold)
  • Automatic speech segmentation based on silence detection
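As a rough illustration of energy-based VAD with silence-based segmentation, here is a minimal sketch. It is not the code from `examples/real_time_transcription.py`; the function names, the default threshold, and the `max_silence_frames` parameter are assumptions made for this example.

```python
import math
from typing import Iterable, List, Tuple


def rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_speech(
    frames: Iterable[List[float]],
    energy_threshold: float = 0.01,  # assumed default; the PR exposes --energy-threshold
    max_silence_frames: int = 3,     # hypothetical: silent frames that close a segment
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices (end exclusive) of detected speech."""
    segments: List[Tuple[int, int]] = []
    start = None       # index of the first voiced frame in the open segment
    last_voiced = -1   # index of the most recent voiced frame
    silence = 0        # consecutive silent frames seen inside the open segment
    for i, frame in enumerate(frames):
        if rms(frame) >= energy_threshold:
            if start is None:
                start = i
            last_voiced = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= max_silence_frames:
                # Enough silence: close the segment at the last voiced frame.
                segments.append((start, last_voiced + 1))
                start = None
                silence = 0
    if start is not None:
        # Audio ended while a segment was still open.
        segments.append((start, last_voiced + 1))
    return segments
```

Only the frames inside the returned segments would need to be passed to the transcription model; everything else is skipped.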

Word-Level Timestamps

  • --word-timestamps flag shows timing for each word
  • Useful for subtitling and precise audio alignment

Speaker Change Detection (Experimental)

  • --detect-speakers provides hints when speaker changes are detected
  • Based on pause pattern analysis
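The description does not spell out the pause heuristic, so the following is only one plausible reading: flag a possible speaker change whenever the gap between two consecutive speech segments exceeds a threshold. The function name and the `min_pause` value are hypothetical.

```python
from typing import List, Tuple


def speaker_change_hints(
    segments: List[Tuple[float, float]],
    min_pause: float = 1.5,  # hypothetical threshold in seconds
) -> List[int]:
    """Return indices of segments that follow a pause of at least min_pause seconds.

    Each segment is a (start_seconds, end_seconds) pair, ordered by time.
    """
    hints = []
    for i in range(1, len(segments)):
        previous_end = segments[i - 1][1]
        current_start = segments[i][0]
        if current_start - previous_end >= min_pause:
            hints.append(i)
    return hints
```

A heuristic like this can only produce hints, which fits the experimental label: a long pause does not guarantee a new speaker, and quick turn-taking would go undetected.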

Audio Device Selection

  • --list-devices to show available microphones
  • --device-id to select a specific input device

Enhanced User Experience

  • Live audio level visualization with color-coded bar
  • Beautiful terminal UI with box-drawn headers
  • Duplicate transcript filtering using similarity scoring
  • Transcript saving with optional timestamps (--output, --timestamps)
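Duplicate filtering by similarity score could look roughly like the sketch below, which uses `difflib.SequenceMatcher` from the standard library. Whether the PR uses `difflib`, and the `0.85` threshold, are assumptions; the helper names are made up for this example.

```python
from difflib import SequenceMatcher
from typing import List


def is_duplicate(candidate: str, previous: str, threshold: float = 0.85) -> bool:
    """True if two transcripts are similar enough to count as repeats."""
    a = candidate.strip().lower()
    b = previous.strip().lower()
    if not a or not b:
        return False
    # ratio() returns a similarity score in [0, 1].
    return SequenceMatcher(None, a, b).ratio() >= threshold


def filter_duplicates(transcripts: List[str], threshold: float = 0.85) -> List[str]:
    """Drop any transcript that closely matches the last one kept."""
    kept: List[str] = []
    for text in transcripts:
        if kept and is_duplicate(text, kept[-1], threshold):
            continue
        kept.append(text)
    return kept
```

Comparing only against the most recently kept transcript keeps the check cheap in a streaming loop, at the cost of missing repeats that are separated by other text.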

Usage Examples

Basic usage

python examples/real_time_transcription.py

With word timestamps

python examples/real_time_transcription.py --word-timestamps

Save transcript with timestamps

python examples/real_time_transcription.py --output notes.txt --timestamps

Use specific model and language

python examples/real_time_transcription.py --model small --language es

Changes

  • examples/real_time_transcription.py - Complete rewrite with new features
  • README.md - Updated documentation with usage examples
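For reviewers who want to see the command-line surface in one place, the flags named in this description could be declared with `argparse` along these lines. Only the flag names come from the description; the types, defaults, and help strings are assumptions, and the script's actual parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags named in the PR description (types and defaults assumed)."""
    parser = argparse.ArgumentParser(description="Real-time transcription (sketch)")
    parser.add_argument("--model", default="base", help="model size (default assumed)")
    parser.add_argument("--language", default=None, help="language code, e.g. es")
    parser.add_argument("--energy-threshold", type=float, default=0.01,
                        help="VAD energy threshold (default assumed)")
    parser.add_argument("--word-timestamps", action="store_true",
                        help="show timing for each word")
    parser.add_argument("--detect-speakers", action="store_true",
                        help="print speaker-change hints (experimental)")
    parser.add_argument("--list-devices", action="store_true",
                        help="list available input devices")
    parser.add_argument("--device-id", type=int, default=None,
                        help="input device index (integer type assumed)")
    parser.add_argument("--output", default=None, help="file to save the transcript to")
    parser.add_argument("--timestamps", action="store_true",
                        help="include timestamps in the saved transcript")
    return parser


# Parse an explicit argument list that mirrors one of the usage examples.
args = build_parser().parse_args(["--model", "small", "--language", "es"])
print(args.model, args.language, args.word_timestamps)
```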

Aditi-M007 and others added 2 commits November 29, 2025 20:57
…amps, and more

Enhanced real_time_transcription.py:
- Voice Activity Detection (VAD) for efficient speech segmentation
- CLI arguments for model, language, device, output file
- Audio device selection with --list-devices
- Live audio level visualization with color-coded bar
- Word-level timestamps (--word-timestamps)
- Speaker change detection hints (--detect-speakers)
- Duplicate transcript filtering using similarity scoring
- Beautiful terminal UI with box-drawn headers

Updated README.md:
- Documented new real-time transcription features
- Added quick start examples for common use cases