liuhuadai

Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.

Python 688 67 Updated Dec 25, 2025

Perceptual Quality Estimator for speech and audio

C++ 855 139 Updated May 17, 2025

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,098 255 Updated Jan 5, 2026

Open SoundStream-ish VAE codecs for downstream neural audio synthesis.

Python 120 10 Updated Jun 12, 2023
Python 835 74 Updated Dec 9, 2025
84 3 Updated Dec 12, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,826 1,525 Updated Jan 4, 2026

Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation"

Python 285 6 Updated Nov 19, 2025

Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud. It can understand text, audio, images, and video, and generate speech in real time.

Jupyter Notebook 3,276 204 Updated Jan 8, 2026

Educational implementation of the Discrete Flow Matching paper

Jupyter Notebook 128 7 Updated Aug 26, 2024

Krea Realtime 14B: an open-source real-time AI video model.

Python 455 26 Updated Nov 13, 2025

Official implementation of "Continuous Autoregressive Language Models"

Python 693 83 Updated Dec 1, 2025

[NeurIPS 2025] Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

Python 35 3 Updated Oct 29, 2025

Official implementation of "UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing"

Python 123 4 Updated Nov 21, 2025

A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…

1,097 28 Updated Jan 16, 2026

PyTorch code and models for VJEPA2 self-supervised learning from video.

Python 2,800 295 Updated Aug 28, 2025

Contexts Optical Compression

Python 22,060 2,009 Updated Oct 25, 2025

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,699 58 Updated Dec 26, 2025

Official implementation of "DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training".

Python 199 14 Updated Nov 5, 2025

Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

Python 25 3 Updated Oct 1, 2024

The official repo for SpaceVista: All-Scale Visual Spatial Reasoning from mm to km.

Python 37 1 Updated Oct 13, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 659 49 Updated Jun 5, 2025

Scalable and memory-optimized training of diffusion models

Python 1,322 142 Updated Jun 4, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Python 26,176 1,844 Updated Jan 9, 2026

Open-Source Frontier Voice AI

Python 20,343 2,252 Updated Dec 17, 2025

VGGSounder, a multi-label audio-visual classification dataset with modality annotations.

Jupyter Notebook 12 Updated Oct 8, 2025

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 9,326 697 Updated Nov 20, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,543 130 Updated Jan 17, 2026

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,483 480 Updated Aug 7, 2024