
SteerMoE: Efficient Audio-Language Models with Preserved Reasoning Capabilities

Python 5 Updated Oct 8, 2025

[NAACL 2025 Findings] Continuous Speech Tokenizer in Text To Speech

6 1 Updated Feb 7, 2025

[AAAI 2026] DIFFA: Large Language Diffusion Models Can Listen and Understand

Python 30 1 Updated Nov 10, 2025

A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech

Python 435 21 Updated Nov 13, 2025

The official GitHub page for the survey paper "Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey". The paper is under review.

70 1 Updated Aug 9, 2025

OmniGen2: Exploration to Advanced Multimodal Generation.

Jupyter Notebook 3,936 7 Updated Sep 30, 2025

Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

Python 700 93 Updated Nov 12, 2025

Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

Python 818 57 Updated Nov 3, 2025

PyTorch implementation (unofficial) of the paper "Mean Flows for One-step Generative Modeling" by Geng et al.

Python 928 55 Updated Oct 16, 2025

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,347 312 Updated Jun 21, 2025

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 18,806 1,635 Updated Jul 6, 2025

ASR papers, updated every day

Python 366 18 Updated Nov 13, 2025

Official repo for CFG-Zero*

Python 676 23 Updated May 2, 2025

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 15,208 1,739 Updated Nov 7, 2025
Python 6,023 463 Updated Aug 29, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,784 297 Updated Jun 12, 2025

Towards Human-Sounding Speech

Python 5,718 485 Updated May 6, 2025

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 437 60 Updated Jul 10, 2025

Open-source framework for conversational voice AI agents

C 8,549 998 Updated Nov 13, 2025

Dippy Synthetic Speech Subnet

Python 17 3 Updated Sep 11, 2025

Generative models for conditional audio generation

Python 3,498 392 Updated Oct 9, 2025

Best-practice TTS based on BERT and VITS with some Natural Speech features from Microsoft; supports ONNX streaming output!

Python 1,211 177 Updated Feb 5, 2024

A generative speech model for daily dialogue.

Python 38,147 4,137 Updated Jul 6, 2025

SALMONN family: A suite of advanced multi-modal LLMs

1,352 111 Updated Sep 28, 2025

A Framework for Speech, Language, Audio, Music Processing with Large Language Model

Python 917 97 Updated Oct 24, 2025

✨✨Latest Advances on Multimodal Large Language Models

16,682 1,075 Updated Nov 12, 2025

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Python 1,078 164 Updated Oct 13, 2025

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 8,438 798 Updated Mar 15, 2025