- Zhejiang University
- https://liuhuadai.github.io
Stars
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Open SoundStream-ish VAE codecs for downstream neural audio synthesis
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation"
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Educational implementation of the Discrete Flow Matching paper
Krea Realtime 14B. An open-source realtime AI video model.
Official implementation of "Continuous Autoregressive Language Models"
[NeurIPS 2025] Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
Official implementation of "UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing"
A comprehensive list of papers on the definition of World Models and the use of World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, code, and related websites.
PyTorch code and models for VJEPA2 self-supervised learning from video.
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Official implementation of "DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training".
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
The official repo for SpaceVista: All-Scale Visual Spatial Reasoning from mm to km.
Unified automatic quality assessment for speech, music, and sound.
Scalable and memory-optimized training of diffusion models
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
VGGSounder, a multi-label audio-visual classification dataset with modality annotations.
Reference PyTorch implementation and models for DINOv3
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.