Starred repositories
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
"Paper2Slides: From Paper to Presentation in One Click"
MOSS-Speech is a true speech-to-speech large language model without text guidance.
MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…
We Speech Toolkit, an LLM-based Speech Toolkit for Speech Understanding, Generation, and Interaction
Automatic Korean word spacing with Python
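Live-stream recording software with looped, unattended monitoring and simultaneous multi-streamer recording, supporting 40+ platforms including Douyin, TikTok, YouTube, Kuaishou, Huya, Douyu, Bilibili, Xiaohongshu, pandatv, sooplive, flextv, popkontv, twitcasting, winktv, Baidu, Weibo, Kugou, 17Live, Twitch, Acfun, CHZZK, and shopee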
Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.
Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
My solutions to DLFC - Deep Learning: Foundations and Concepts
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Efficient audio understanding with general audio captions
FOSS Korean dictionary by the National Institute of Korean Language
A fast and lightweight Python-based CTC beam search decoder for speech recognition.
T-one is a high-performance streaming ASR pipeline for Russian, specialized for the telephony domain.
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
✨✨Latest Advances on Multimodal Large Language Models
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Text-audio foundation model from Boson AI
Chinese speech pretrained models