A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Updated May 23, 2026 - Python
Fun-ASR is a large end-to-end speech recognition model released by Tongyi Lab.
High-performance Google Colab Notebook for fast & accurate audio transcription/translation using OpenAI Whisper. Accelerated on TPUs with PyTorch/XLA. Features an interactive UI for model selection, multi-language support, and long-form audio processing.
Point of Interest Error Rate (PIER) Metric for Code-Switching ASR: A specialized evaluation metric designed to focus on critical points in multilingual speech recognition, providing a more accurate analysis of code-switched utterances.
Real-time transformer-based ASR supporting 100+ languages - Google Cloud integration with noise cancellation & low-latency optimization
AISRT - local AI subtitle generator that converts video/audio to SRT, with multilingual ASR, timestamp alignment, GUI/CLI batch processing, and local SRT translation.