yuekaizhang

Yuekai Zhang yuekaizhang

144 followers · 26 following

@NVIDIA
Shanghai, CN
https://scholar.google.com/citations?user=YGmuq3UAAAAJ&hl=en

Achievements

x3 x2

Starred repositories

stepfun-ai / Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,205 87 Updated Sep 22, 2025

GeeeekExplorer / nano-vllm

Nano vLLM

Python 8,627 1,046 Updated Nov 3, 2025

qiancheng0 / ToolRL

Python 378 30 Updated Oct 16, 2025

Jiayi-Pan / TinyZero

Minimal reproduction of DeepSeek R1-Zero

Python 12,377 1,522 Updated Apr 24, 2025

The-Pocket / PocketFlow-Tutorial-Codebase-Knowledge

Pocket Flow: Codebase to Tutorial

Python 11,715 1,336 Updated Oct 24, 2025

nvidia-china-sae / mair-hub

Jupyter Notebook 62 17 Updated Nov 6, 2025

ZuoFuhong / subtitle

A lightweight real-time captioning application for macOS, powered by whisper.cpp and DeepSeek-V3.

C++ 23 2 Updated Oct 11, 2025

inclusionAI / Ming

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

Jupyter Notebook 532 43 Updated Oct 30, 2025

shuaijiang / Ke-Omni-R

Ke-Omni-R is an advanced audio reasoning model that achieved SOTA on MMAU

Python 54 1 Updated Jun 11, 2025

souzatharsis / podcastfy

An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI

Python 5,606 648 Updated Oct 31, 2025

remsky / Kokoro-FastAPI

Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching

Python 3,921 651 Updated Nov 5, 2025

matatonic / openedai-speech

An OpenAI API compatible text to speech server using Coqui AI's xtts_v2 and/or piper tts as the backend.

Python 829 122 Updated Feb 2, 2025

speaches-ai / speaches

Python 2,577 326 Updated Nov 10, 2025

gradio-app / fastrtc

The python library for real-time communication

JavaScript 4,390 410 Updated Sep 19, 2025

sutro-planet / easyvideotrans

Easyvideotrans backend. https://easyvideotrans.com/

Python 478 40 Updated Nov 5, 2025

limboinf / xiaoyuzhoufm

xiaoyuzhou fm audio downloader.

Python 44 10 Updated Mar 12, 2025

ictnlp / NAST

Official implementation for EMNLP 2023 paper "Non-autoregressive Streaming Transformer for Simultaneous Translation"

Python 10 1 Updated Oct 19, 2023

yangdongchao / AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Python 655 80 Updated Dec 27, 2023

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook 5,278 531 Updated Sep 23, 2025

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

16,661 1,074 Updated Nov 9, 2025

Camb-ai / MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI

Jupyter Notebook 2,804 246 Updated Aug 1, 2024

SpeechTranslation / GigaS2S

10 2 Updated Jan 22, 2023

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 38,137 4,135 Updated Jul 6, 2025

HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

JavaScript 25,380 3,177 Updated Nov 10, 2025

Diamondfan / cassnat_asr

Implementation of CTC alignment-based single step non-autoregressive transformer

Python 14 1 Updated Jun 2, 2023

frostming / tetos

A unified interface for multiple Text-to-Speech (TTS) providers.

Python 277 17 Updated Jan 8, 2025

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,492 766 Updated May 27, 2025

BriansIDP / WhisperBiasing

Jupyter Notebook 87 7 Updated Jul 31, 2025

lucidrains / vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Python 3,674 298 Updated Nov 9, 2025

okio-ai / nendo

The Nendo AI Audio Tool Suite

Python 215 12 Updated Apr 25, 2024

Lists (16)

Align

api

ASR

CUDA/Onnx/Trt/Triton

llm

mark

NLU

Others

RL

Separation

Server

Speaker

Translation

TTS

Useful

VAD

Starred repositories

singing-voice