zyy-fc

yeah, I have got it!

7 followers · 20 following

Stars

forfrt / SteerMoE

SteerMoE: Efficient Audio-Language Models with Preserved Reasoning Capabilities

Python 5 Updated Oct 8, 2025

Yixing-Li / Continuous-Speech-Tokenizer

[NAACL 2025 Findings] Continuous Speech Tokenizer in Text To Speech

6 1 Updated Feb 7, 2025

NKU-HLT / DIFFA

[AAAI 2026] DIFFA: Large Language Diffusion Models Can Listen and Understand

Python 30 1 Updated Nov 10, 2025

stepfun-ai / Step-Audio-EditX

A powerful 3B-parameter, LLM-based Reinforcement Learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech

Python 435 21 Updated Nov 13, 2025

jindongli-Ai / LLM-Discrete-Tokenization-Survey

The official GitHub page for the survey paper "Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey". The paper is under review.

70 1 Updated Aug 9, 2025

XiaomiMiMo / MiMo-Audio-Training

Python 83 9 Updated Oct 16, 2025

VectorSpaceLab / OmniGen2

OmniGen2: Exploration to Advanced Multimodal Generation.

Jupyter Notebook 3,936 7 Updated Sep 30, 2025

k2-fsa / ZipVoice

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 700 93 Updated Nov 12, 2025

Alpha-VLLM / Lumina-Image-2.0

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

Python 818 57 Updated Nov 3, 2025

haidog-yaqub / MeanFlow

Unofficial PyTorch implementation of the paper "Mean Flows for One-step Generative Modeling" by Geng et al.

Python 928 55 Updated Oct 16, 2025

MoonshotAI / Kimi-Audio

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,347 312 Updated Jun 21, 2025

nari-labs / dia

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 18,806 1,635 Updated Jul 6, 2025

halsay / ASR-TTS-paper-daily

Updates ASR papers every day

Python 366 18 Updated Nov 13, 2025

WeichenFan / CFG-Zero-star

Official repo for CFG-Zero*

Python 676 23 Updated May 2, 2025

index-tts / index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 15,208 1,739 Updated Nov 7, 2025

bytedance / MegaTTS3

Python 6,023 463 Updated Aug 29, 2025

QwenLM / Qwen2.5-Omni

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,784 297 Updated Jun 12, 2025