Starred repositories
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
Minimal reproduction of DeepSeek R1-Zero
Pocket Flow: Codebase to Tutorial
A lightweight real-time captioning application for macOS, powered by whisper.cpp and DeepSeek-V3.
Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.
Ke-Omni-R is an advanced audio reasoning model that achieved SOTA on MMAU
An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching
An OpenAI API compatible text to speech server using Coqui AI's xtts_v2 and/or piper tts as the backend.
The Python library for real-time communication
Easyvideotrans backend. https://easyvideotrans.com/
Official implementation for EMNLP 2023 paper "Non-autoregressive Streaming Transformer for Simultaneous Translation"
AcademiCodec: An Open Source Audio Codec Model for Academic Research
✨✨Latest Advances on Multimodal Large Language Models
MARS5 speech model (TTS) from CAMB.AI
A generative speech model for daily dialogue.
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Implementation of CTC alignment-based single step non-autoregressive transformer
A unified interface for multiple Text-to-Speech (TTS) providers.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Vector (and Scalar) Quantization, in PyTorch