A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Updated Jan 7, 2026 - Python
PyTorch implementation of "Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss" (ICASSP 2020)
A curated list of awesome papers on contextualizing E2E ASR outputs
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
An implementation of RNN-Transducer loss in TF-2.0.
I'm building an end-to-end Vietnamese speech recognition system. I'll deploy it into production with the help of Flask, uWSGI, Nginx, and AWS ...
FunASR real-time speech recognition edition: recognizes microphone input and audio played on the computer; voice typing software for PCs.
Pure PyTorch implementation of the loss described in "Online Segment to Segment Neural Transduction" https://arxiv.org/abs/1609.08194
Enhance speech recognition with GLM-ASR-Nano-2512, a high-performance model excelling in dialect support and low-volume audio accuracy.
Deep learning-based subtitle generation model that processes audio datasets to generate accurate text transcriptions. Includes audio feature extraction, encoder-decoder architecture, training pipelines, and evaluation metrics for subtitle alignment.
Create and manage SPL tokens on the Solana blockchain with ease, using our Next.js-based launchpad for streamlined token and liquidity management.
Deploy a simplified voice synthesis service with Fun-CosyVoice3-0.5B-2512, featuring real-time audio output and advanced performance optimizations.