[ICML 2026] ByteDance's All-in-One Video Generation Model for Human-Object Interaction Video Generation
Updated May 19, 2026 - Python
PyTorch implementation of the paper "M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis"