WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Updated Apr 4, 2026 - Python
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, a websocket server/client, and 12 programming languages.
A PyTorch-based Speech Toolkit
Multilingual Voice Understanding Model
This is a Python API that lets you get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike Selenium-based solutions!
🤖 wukong-robot is a simple, flexible, and elegant Chinese voice-dialogue robot / smart speaker project. It supports ChatGPT multi-turn conversation and may be the first open-source smart speaker project to support brain-computer interaction.
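HTML5 JS audio recording in mp3, wav, ogg, webm, amr, g711a, and g711u formats. Supports PC and some Android and iOS web browsers, Hybrid Apps (Android and iOS app source code provided), and WeChat. Provides ASR speech-to-text, an H5 voice call/chat example, and DTMF encoding/decoding.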
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Production First and Production Ready End-to-End Speech Recognition Toolkit
The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more!
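Streamer-Sales (Sales Champion): a sales-livestreamer LLM that, given a product's features, presents the product from the angle of stimulating the user's desire to buy. Includes a detailed data-generation pipeline. Also integrates LMDeploy accelerated inference, RAG (retrieval-augmented generation), TTS (text-to-speech), digital human generation, an Agent that queries the web for real-time information, ASR (speech-to-text), a Vue-based frontend, a FastAPI backend, and Docker-compose packaging and deployment.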
OpenAI Whisper ASR Webservice API
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
faster_whisper GUI with PySide6
Lingvo
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.