# LyriChroma

A powerful tool to convert audio files into videos with auto-generated, synchronized, and highlighted subtitles (karaoke style).

## Features
- Auto Transcription: Uses `faster-whisper` for high-accuracy speech-to-text with word-level timestamps.
- Dynamic Backgrounds: Moving multi-color gradient with selectable palettes (plasma/aurora effects).
- Karaoke Subtitles: Real-time highlighting of spoken words.
- Customizable: Adjust font size, color, highlight color, and position.
- Transcript Editing: Export transcripts to JSON for manual correction, then re-import to generate the perfect video.
## Installation

This project can be installed in two ways:

### Via pip

```bash
pip install lyrichroma
```

### From source

This project is managed with uv.

```bash
# Install uv if you haven't (https://github.com/astral-sh/uv)
pip install uv

# Clone and sync dependencies
uv sync
```

## Usage

Convert audio to video with a black background:

```bash
uv run lyrichroma --input audio.mp3 --output video.mp4
```

Generate a video with a cool moving background:

```bash
uv run lyrichroma --input audio.mp3 --output video.mp4 --bg-type dynamic --bg-value aurora
```

Available palettes: `aurora` (default), `sunset`, `ocean`, `cyberpunk`, `forest`.
## Transcript Editing

For cases where automatic transcription isn't perfect, LyriChroma provides a transcript editing workflow:

1. Generate the transcript JSON (skip video generation):

   ```bash
   uv run lyrichroma --input audio.mp3 --save-transcript transcript.json
   ```

2. Edit `transcript.json` manually to correct any transcription errors or adjust timing.

3. Generate the video from the edited transcript:

   ```bash
   uv run lyrichroma --input audio.mp3 --output video.mp4 --load-transcript transcript.json
   ```
This workflow is especially useful when you need perfect accuracy for professional use cases, or when dealing with specialized vocabulary, names, or accents that ASR systems might not recognize correctly.
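Corrections can also be scripted. The sketch below patches a misrecognized word while leaving timestamps untouched; note that the `segments`/`words`/`word` field names are assumptions based on typical Whisper-style output, not LyriChroma's documented schema — inspect the file produced by `--save-transcript` before relying on them.

```python
import json
from pathlib import Path

# A stand-in transcript; the real file comes from --save-transcript.
# NOTE: the "segments"/"words"/"word"/"start"/"end" keys are assumed,
# Whisper-style field names -- check your actual transcript.json.
sample = {
    "segments": [
        {"words": [
            {"word": "Hello", "start": 0.0, "end": 0.4},
            {"word": "wrold", "start": 0.4, "end": 0.8},  # ASR typo
        ]}
    ]
}
path = Path("transcript.json")
path.write_text(json.dumps(sample, indent=2), encoding="utf-8")

# Fix the misrecognized word while keeping its timestamps intact.
transcript = json.loads(path.read_text(encoding="utf-8"))
for segment in transcript["segments"]:
    for word in segment["words"]:
        if word["word"] == "wrold":
            word["word"] = "world"
path.write_text(json.dumps(transcript, indent=2), encoding="utf-8")
```

Because the word-level timestamps are preserved, the karaoke highlighting stays in sync after the edit.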
## Customization

Customize the subtitle appearance:

```bash
uv run lyrichroma \
  --input audio.mp3 \
  --output video.mp4 \
  --font-size 60 \
  --font-color "#FFFFFF" \
  --highlight-color "#FF0000" \
  --text-y 0.8
```

## CLI Arguments

| Argument | Description | Default |
|---|---|---|
| `--input`, `-i` | Input audio file path | Required |
| `--output`, `-o` | Output video file path | Optional |
| `--bg-type` | `color`, `image`, or `dynamic` | `color` |
| `--bg-value` | Hex color (`#000000`), image path, or palette name (`aurora`, `sunset`, `ocean`, `cyberpunk`, `forest`) | `#000000` or `aurora` |
| `--model-size` | Whisper model size (`tiny`, `base`, `large-v2`, etc.) | `base` |
| `--save-transcript` | Path to save JSON transcript | None |
| `--load-transcript` | Path to load JSON transcript | None |
| `--font-size` | Subtitle font size | `40` |
| `--font-color` | Text color | `white` |
| `--highlight-color` | Highlight color for active word | `yellow` |
| `--text-y` | Vertical position (`0.0` = top, `1.0` = bottom) | `0.8` |
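To build intuition for `--text-y`, the sketch below assumes a simple linear mapping from the fraction to a pixel row; this is how such options are typically implemented, but it is an illustration, not LyriChroma's actual rendering code.

```python
def text_y_to_pixels(text_y: float, frame_height: int) -> int:
    """Map a --text-y fraction (0.0 = top, 1.0 = bottom) to a
    vertical pixel coordinate. Linear mapping assumed."""
    if not 0.0 <= text_y <= 1.0:
        raise ValueError("text_y must be between 0.0 and 1.0")
    return round(text_y * frame_height)


# The default --text-y of 0.8 on a 1080-pixel-tall frame lands
# the subtitles in the lower fifth of the picture:
y = text_y_to_pixels(0.8, 1080)
```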
## Testing

Run unit tests:

```bash
uv run pytest
```

## Custom ASR Providers

LyriChroma supports a pluggable ASR architecture; by default it uses `faster-whisper`. You can implement your own ASR provider by subclassing `lyrichroma.asr.base.ASRProvider` and passing it to the `Transcriber`.
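A custom provider might look like the following. The `transcribe` method name and the word/timestamp return type are assumptions, since the exact interface lives in `lyrichroma.asr.base`; a stand-in base class is defined here so the sketch is self-contained — check the real `ASRProvider` before subclassing.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with timing (stand-in for LyriChroma's type)."""
    text: str
    start: float  # seconds
    end: float    # seconds


class ASRProvider(ABC):
    """Stand-in for lyrichroma.asr.base.ASRProvider; the real
    interface may differ -- check the source before subclassing."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> list[Word]:
        ...


class DummyASRProvider(ASRProvider):
    """A trivial provider returning a fixed transcript -- useful for
    exercising the video pipeline without loading a speech model."""

    def transcribe(self, audio_path: str) -> list[Word]:
        return [Word("hello", 0.0, 0.5), Word("world", 0.5, 1.0)]


provider = DummyASRProvider()
words = provider.transcribe("audio.mp3")
```

In the real project you would subclass the actual `lyrichroma.asr.base.ASRProvider` instead of the stand-in and pass the instance to the `Transcriber`.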
## Development

- Install Dependencies:

  ```bash
  uv sync
  ```

- Run Tests:

  ```bash
  uv run pytest
  ```

- Format & Lint:

  ```bash
  uv run ruff check .
  uv run black .
  ```