- Northwestern Polytechnical University (2016-2023) && WeNet Community (2022 - now)
- Suzhou, Jiangsu
- https://scholar.google.com/citations?user=WUiXR1cAAAAJ&hl=en
Stars
A streaming audio reader, processor, and writer built on top of soundfile and PyAV (bindings for FFmpeg)
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
A PyTorch-based knowledge distillation toolkit for natural language processing
A framework for efficient model inference with omni-modality models
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
An N-gram punctuator for Chinese and English.
Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.
We Speech Toolkit, an LLM-based Speech Toolkit for Speech Understanding, Generation, and Interaction
MiMo-Audio: Audio Language Models are Few-Shot Learners
A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
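AIInfra (AI infrastructure) covers the AI system stack, from underlying hardware such as chips up to the upper-level software stack, that supports training and inference of large AI models.
Use HuggingFace's official download tool to download at high speed from mirror sites.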
✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
A curated list of resources in audio question answering and related areas. :-)
This package aims at simplifying the download of the strong version of AudioSet dataset.
A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
Efficient audio understanding with general audio captions
Text-audio foundation model from Boson AI