yomisub is a desktop application for automatically generating subtitles for video files using AI-powered transcription with OpenAI Whisper. It is designed to assist with language learning by enabling users to watch foreign television shows and movies with native-language subtitles, even when official subtitles are unavailable.
The application includes:
- A GTK-based graphical user interface
- File picker for selecting video files
- Audio track selection for videos with multiple language tracks
- Whisper model selection (e.g., tiny, base, medium, large-v2)
- Subtitle generation in .srt format, saved alongside the video file
This tool was created to support language immersion and comprehension, particularly for learners of Japanese.
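The features listed above suggest a pipeline of roughly three steps: extract the selected audio track, transcribe it, and write the result as an .srt file next to the video. The following is a minimal sketch of the first and last steps, not yomisub's actual code. The function names `build_extract_command`, `format_timestamp`, `segments_to_srt`, and `srt_path_for` are hypothetical, and the segment shape (dicts with `start`, `end`, and `text` keys) is assumed to match the `segments` list returned by Whisper's `transcribe()`.

```python
from pathlib import Path


def build_extract_command(video: str, audio_index: int, wav_out: str) -> list[str]:
    """Build an ffmpeg command that extracts one audio track as 16 kHz mono WAV.

    audio_index counts audio streams only (0 is the first audio track).
    """
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-map", f"0:a:{audio_index}",  # select the Nth audio stream of input 0
        "-vn",                          # drop the video stream
        "-ac", "1",                     # downmix to mono
        "-ar", "16000",                 # resample to 16 kHz
        wav_out,
    ]


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with start, end, text) as SRT text."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{number}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


def srt_path_for(video: str) -> Path:
    """Return the .srt path that sits alongside the given video file."""
    return Path(video).with_suffix(".srt")
```

In such a sketch, the command list would be passed to `subprocess.run(cmd, check=True)`, the resulting WAV file would be transcribed with `whisper.load_model(name).transcribe(wav_path)`, and the returned `segments` would be written to `srt_path_for(video)`.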
Requirements:
- Python 3.11
Tested on Debian with the following installation process:
sudo apt update
# install system dependencies
sudo apt install python3.11-venv ffmpeg python3-gi python3-gi-cairo gir1.2-gtk-3.0 libgirepository1.0-dev
# create a venv that can see system site packages (needed for python3-gi)
python3 -m venv .venv --system-site-packages
# activate the venv
source .venv/bin/activate
# test that gi can now be imported while in the venv
python -c "import gi"
# install pip dependencies
pip install -r requirements.txt
# run the application
python3 yomisub.py
Choose a model that fits within your GPU's VRAM. Smaller models are faster but less accurate.
Whisper downloads the chosen model to ~/.cache/whisper the first time it is used.
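As a rough guide to the VRAM advice above, the sketch below picks the largest model that fits a given memory budget. The figures are the approximate requirements listed in the openai-whisper README and should be treated as assumptions, including the assumption that large-v2 needs about the same as large. `pick_model` is a hypothetical helper, not part of yomisub.

```python
# Approximate VRAM needs in GB, ordered from smallest to largest model.
# These values are assumptions taken from the openai-whisper README.
APPROX_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v2", 10),
]


def pick_model(available_gb: float) -> str:
    """Return the largest model whose approximate requirement fits."""
    choice = None
    for name, needed in APPROX_VRAM_GB:
        if needed <= available_gb:
            choice = name  # later entries are larger, so keep the last fit
    if choice is None:
        raise ValueError(f"no model fits in {available_gb} GB")
    return choice
```

For example, `pick_model(8)` would select `medium`. Actual memory use also depends on the audio and decoding settings, so it is worth leaving some headroom.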
This project is open source under the MIT License. See LICENSE.txt for full details.
© 2025 Sean Esopenko