English | 简体中文
Whisper48 is an easy-to-use tool for automatically generating accurate subtitles from videos and audio files. It uses advanced AI models to:
- ✅ Convert speech to text automatically
- ✅ Generate accurate word-level timestamps (when each word is spoken)
- ✅ Support multiple languages (Japanese, Chinese, English, French, German, Spanish, Italian, Portuguese, Russian, and more)
- ✅ Work completely in the cloud (no installation required!)
Perfect for: Video creators, translators, content producers, podcast editors, and anyone who needs to transcribe audio without tedious manual work.
The easiest way to use Whisper48 is through Google Colab - a free cloud service that runs the software for you. No installation needed!
1. Prepare Your Files
- Have your audio or video file ready (MP3, WAV, MP4, MKV, etc.)
- Upload it to your Google Drive (if you don't have one, create a free Google account)
- Place your file in any folder in Google Drive - the notebook will help you find it
2. Open the Notebook
- Click this link: WhisperX48 on Google Colab
- Google will open the notebook in your browser
3. Configure GPU (Important!)
- At the top of the page, click `Runtime → Change runtime type`
- Select GPU as the hardware accelerator
- Click `Save`
- This gives you free computing power to process your files faster
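If you want to double-check that the GPU is actually available before running the heavier cells, you can run this optional check in any Colab code cell. It assumes PyTorch, which the standard Colab runtime ships with:

```python
# Optional sanity check: confirm Colab gave you a GPU runtime.
# PyTorch comes preinstalled on standard Colab images.
import torch

if torch.cuda.is_available():
    print("GPU ready:", torch.cuda.get_device_name(0))
else:
    print("No GPU found - check Runtime → Change runtime type")
```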
4. Run Each Step in Order
- Each cell (code block) has clear instructions in both Chinese and English
- Click the ▶ button (or press Ctrl+Enter) to run each cell one by one
- Wait for it to finish before moving to the next one
- The notebook will guide you through:
- Connecting to Google Drive
- Installing required software
- Selecting your media file
- Configuring settings (language, model size, etc.)
- Processing and downloading your subtitles
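One of these cells connects to Google Drive; under the hood it essentially runs the standard Colab mount call shown below. You normally don't need to type this yourself; it is included only so you know what the authorization prompt is about:

```python
# Roughly what the "Connect to Google Drive" cell does:
# mount your Drive so the notebook can read your media files.
from google.colab import drive

drive.mount('/content/drive')  # opens a Google authorization prompt
# After mounting, your files appear under /content/drive/MyDrive/
```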
5. Download Your Subtitles
- After processing completes, the subtitle file (`.srt` format) will automatically download
- Use this file in any video editor (Adobe Premiere, DaVinci Resolve, CapCut, etc.)
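For reference, an `.srt` file is plain text: numbered blocks, each with a start/end time and a subtitle line (the text below is just an illustration):

```
1
00:00:00,000 --> 00:00:02,500
Hello everyone, welcome to the show.

2
00:00:02,500 --> 00:00:05,000
Today we are talking about subtitles.
```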
- Audio Quality: Clearer audio = better results. If possible, use the original audio track rather than compressed versions
- Model Size: Choose `large-v2` or `large-v3` for best accuracy. Larger models take longer but are more accurate
- Language Selection: Select the correct language of your content for better accuracy
- Chunk Size: Leave at 5 seconds (default) unless you know what you're doing
| Issue | Solution |
|---|---|
| "GPU not found" error | Go to Runtime → Change runtime type and make sure GPU is selected |
| Running very slowly | GPU memory issue - reduce batch_size in advanced settings, or use a smaller model |
| Subtitles have poor timing | Try using large-v3 model instead of large-v2, or check if audio quality is good |
| File not appearing in browser | Run the file browser cell again (step 1.3) to refresh the file list |
Requirements:
- Advanced technical knowledge
- Python programming experience
- Powerful GPU (NVIDIA recommended)
- 30+ GB of free disk space
Instructions: Try running this script on your own computer. You will need to install PyTorch, WhisperX, FFmpeg, and other dependencies first.
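As a rough sketch only (based on the publicly documented WhisperX API, not on Whisper48's notebook code; exact argument names can vary between WhisperX versions), a minimal local transcription-plus-alignment run looks something like this after installing the dependencies:

```python
# Minimal local sketch using the upstream WhisperX API (not Whisper48's notebook code).
# Assumes: pip install whisperx, FFmpeg on PATH, and an NVIDIA GPU with CUDA.
import whisperx

device = "cuda"            # use "cpu" if you have no GPU (much slower)
audio_file = "input.mp4"   # hypothetical example path

# Step 1: speech recognition with a Whisper model
model = whisperx.load_model("large-v2", device, compute_type="float16")
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=16)

# Step 2: word-level timestamp alignment (wav2vec 2.0 based)
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)

print(aligned["segments"][0])  # segments now carry per-word start/end times
```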
WhisperX Technology: Unlike standard Whisper, WhisperX provides word-level timestamps:
- Standard Whisper: "Hello world" (0:00-0:05)
- WhisperX: "Hello" (0:00-0:02) "world" (0:02-0:05)
This precision allows for better subtitle synchronization and more professional results.
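Concretely, after alignment each segment carries a list of words with their own start and end times. For the example above, the aligned data looks roughly like this (illustrative only; field names may differ slightly between WhisperX versions):

```python
# Illustrative only: the shape of word-level output after alignment.
segment = {
    "start": 0.0,
    "end": 5.0,
    "text": "Hello world",
    "words": [
        {"word": "Hello", "start": 0.0, "end": 2.0},
        {"word": "world", "start": 2.0, "end": 5.0},
    ],
}
```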
- 🇯🇵 Japanese (日本語)
- 🇨🇳 Chinese (中文)
- 🇬🇧 English
- 🇫🇷 French
- 🇩🇪 German
- 🇪🇸 Spanish
- 🇮🇹 Italian
- 🇵🇹 Portuguese
- 🇷🇺 Russian
- And many more...
- Audio: MP3, WAV, M4A, AAC, FLAC
- Video: MP4, MKV, TS, FLV
For a comparison between WhisperX and other models, see: https://www.bilibili.com/video/BV1RFa5zhEeG/
Whisper48 uses a 3-step process:
- Speech Recognition: OpenAI's Whisper model converts audio to text
- Timestamp Alignment: WhisperX uses wav2vec2.0 to align text with precise word timings
- Subtitle Generation: Automatic subtitle formatting and download
The entire process runs on Google Colab's free GPU servers - no installation needed on your computer!
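Of these three steps, the last one is plain text formatting. As a simplified sketch (not the notebook's actual code), turning a list of timed segments into `.srt` text can be done like this:

```python
# Simplified sketch of step 3 (subtitle formatting), not Whisper48's actual code.
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:00:02,500."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """segments: list of dicts with 'start', 'end', and 'text' keys."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Example usage:
print(segments_to_srt([{"start": 0.0, "end": 2.5, "text": "Hello"}]))
```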
- Website: Detailed guides and FAQ: ifeimi.github.io/whisper48
- Email: Contact the developer at yfwu0202 AT gmail DOT com
- GitHub Issues: Report bugs or request features on the project's issue page
This project is based on excellent open-source software:
- WhisperX: Improves OpenAI's Whisper with accurate word-level timestamps
- OpenAI Whisper: State-of-the-art speech recognition model
- faster-whisper: Optimized Whisper implementation
- wav2vec2.0: Speech model for timestamp alignment
Whisper48 started as a fork from N46Whisper and has been significantly modified to:
- Use more accurate Whisper-based models (WhisperX)
- Improve Japanese language support
- Provide better timestamp accuracy
- Add more language options
- Simplify the user interface
We encourage contributions to both this project and the upstream projects!
This project is released under the MIT License. See LICENSE.md for details.
- Original concept: N46Whisper by Ayanaminn
- Improvements: WhisperX integration and optimization
- Code improvements: Bilingual documentation and interface enhancements
You are free to:
- ✅ Use for personal and commercial projects
- ✅ Modify and redistribute
- ✅ Study and learn from the code
You must:
- ✅ Include the license text
- ✅ Credit the original authors
Have questions or suggestions? Get in touch!
- Email: ifeimi48 AT gmail DOT com
- GitHub Issues: Report bugs or request features
- Website: ifeimi.github.io/whisper48
I would love to hear your feedback and help with any questions!
Last updated: January 2026