yuekaizhang

Starred repositories


Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python · 1,205 stars · 87 forks · Updated Sep 22, 2025

Nano vLLM

Python · 8,627 stars · 1,046 forks · Updated Nov 3, 2025
Python · 378 stars · 30 forks · Updated Oct 16, 2025

Minimal reproduction of DeepSeek R1-Zero

Python · 12,377 stars · 1,522 forks · Updated Apr 24, 2025

Pocket Flow: Codebase to Tutorial

Python · 11,715 stars · 1,336 forks · Updated Oct 24, 2025
Jupyter Notebook · 62 stars · 17 forks · Updated Nov 6, 2025

A lightweight real-time captioning application for macOS, powered by whisper.cpp and DeepSeek-V3.

C++ · 23 stars · 2 forks · Updated Oct 11, 2025

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

Jupyter Notebook · 532 stars · 43 forks · Updated Oct 30, 2025

Ke-Omni-R is an advanced audio reasoning model that achieved state-of-the-art (SOTA) results on MMAU.

Python · 54 stars · 1 fork · Updated Jun 11, 2025

An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI

Python · 5,606 stars · 648 forks · Updated Oct 31, 2025

Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model, with CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching

Python · 3,921 stars · 651 forks · Updated Nov 5, 2025

An OpenAI API-compatible text-to-speech server using Coqui AI's xtts_v2 and/or Piper TTS as the backend.

Python · 829 stars · 122 forks · Updated Feb 2, 2025
Python · 2,577 stars · 326 forks · Updated Nov 10, 2025

The Python library for real-time communication

JavaScript · 4,390 stars · 410 forks · Updated Sep 19, 2025

Easyvideotrans backend. https://easyvideotrans.com/

Python · 478 stars · 40 forks · Updated Nov 5, 2025

Xiaoyuzhou FM audio downloader.

Python · 44 stars · 10 forks · Updated Mar 12, 2025

Official implementation of the EMNLP 2023 paper "Non-autoregressive Streaming Transformer for Simultaneous Translation"

Python · 10 stars · 1 fork · Updated Oct 19, 2023

AcademiCodec: An Open Source Audio Codec Model for Academic Research

Python · 655 stars · 80 forks · Updated Dec 27, 2023

Material for gpu-mode lectures

Jupyter Notebook · 5,278 stars · 531 forks · Updated Sep 23, 2025

✨✨ Latest Advances in Multimodal Large Language Models

16,661 stars · 1,074 forks · Updated Nov 9, 2025

MARS5 speech model (TTS) from CAMB.AI

Jupyter Notebook · 2,804 stars · 246 forks · Updated Aug 1, 2024

A generative speech model for daily dialogue.

Python · 38,137 stars · 4,135 forks · Updated Jul 6, 2025

Label Studio is a multi-type data labeling and annotation tool with a standardized output format

JavaScript · 25,380 stars · 3,177 forks · Updated Nov 10, 2025

Implementation of a CTC alignment-based single-step non-autoregressive transformer

Python · 14 stars · 1 fork · Updated Jun 2, 2023

A unified interface for multiple Text-to-Speech (TTS) providers.

Python · 277 stars · 17 forks · Updated Jan 8, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python · 9,492 stars · 766 forks · Updated May 27, 2025
Jupyter Notebook · 87 stars · 7 forks · Updated Jul 31, 2025

Vector (and Scalar) Quantization, in PyTorch

Python · 3,674 stars · 298 forks · Updated Nov 9, 2025

The Nendo AI Audio Tool Suite

Python · 215 stars · 12 forks · Updated Apr 25, 2024