Starred repositories
Unified Controllable Visual Generation Model
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL 2025.
Official repo of “From Masks to Worlds: A Hitchhiker’s Guide to World Models”.
[CVPR 2024 Highlight] PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets
TempFlow-GRPO (Temporal Flow GRPO), a principled GRPO framework that captures and exploits the temporal structure inherent in flow-based generation.
PyTorch DTensor-native training library for LLMs/VLMs with out-of-the-box (OOTB) Hugging Face support
A comprehensive JAX/NNX library for diffusion and flow matching generative algorithms, featuring DiT (Diffusion Transformer) and its variants as the primary backbone with support for ImageNet train…
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Open-source framework for the research and development of foundation models.
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
Lynx: Towards High-Fidelity Personalized Video Generation
Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Fully Open Framework for Democratized Multimodal Training
A 3DGS (3D Gaussian Splatting) viewer built on Three.js, with features for marking, measurement, text watermarks, etc.
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
Official repository for the UAE paper, unified-GRPO, and unified-Bench
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models