- MaskGIT: Masked Generative Image Transformer [CVPR 2022]
- Muse: Text-To-Image Generation via Masked Generative Transformers [ICML 2023]
- [🌟] Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [ICLR 2025]
- Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer
- Di[𝙼]O: Distilling Masked Diffusion Models into One-step Generator [ICCV 2025]
- [🌟] Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model [ICLR 2026]
- DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer [ICCV 2025]
- MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
- Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
- [🌟] Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
- Token Painter: Training-Free Text-Guided Image Inpainting via Mask Autoregressive Models
- TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion
- OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows
- Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces [ICML 2025]
- Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy [NeurIPS 2025]
- [🌟] From Masks to Worlds: A Hitchhiker's Guide to World Models
- Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
- Accelerating Inference of Masked Image Generators via Reinforcement Learning
- Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models
- Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation
- MaskFocus: Focusing Policy Optimization on Critical Steps for Masked Image Generation
- Co-GRPO: Co-Optimized Group Relative Policy Optimization for Masked Diffusion Model
More papers are coming soon! See MeissonFlow Research (Organization Card) for more about our vision.
Welcome to the official repository of Muddit — a next-generation foundation model in the Meissonic family, built upon discrete diffusion for unified and efficient multimodal generation.
Unlike traditional autoregressive methods, Muddit leverages discrete diffusion (a.k.a. MaskGIT-style masking) as its core mechanism — enabling fast, parallel decoding across modalities.
While most unified models are still rooted in language priors, Muddit is developed from a visual-first perspective for scalable and flexible generation.
Muddit (512) and Muddit Plus (1024) aim to handle diverse tasks across modalities, such as text generation, image generation, and vision-language reasoning, within a single architecture and decoding paradigm.
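To make the decoding paradigm concrete, below is a minimal, illustrative sketch of MaskGIT-style parallel decoding. The `model` interface, `mask_id` token, and cosine unmasking schedule are assumptions for illustration, not Muddit's actual API.

```python
# Illustrative sketch of MaskGIT-style parallel decoding (not Muddit's API).
import math
import torch

@torch.no_grad()
def parallel_decode(model, seq_len, mask_id, steps=12, device="cpu"):
    # Start from a fully masked sequence and unmask it over `steps` passes.
    tokens = torch.full((1, seq_len), mask_id, dtype=torch.long, device=device)
    for t in range(steps):
        logits = model(tokens)                    # (1, seq_len, vocab_size)
        conf, pred = logits.softmax(-1).max(-1)   # per-token confidence + argmax
        still_masked = tokens == mask_id
        # Tentatively fill every masked position with its prediction.
        tokens = torch.where(still_masked, pred, tokens)
        # Cosine schedule: how many tokens should remain masked after this step.
        n_mask = math.floor(seq_len * math.cos(math.pi / 2 * (t + 1) / steps))
        if n_mask > 0:
            # Keep already-committed tokens out of the re-masking pool.
            conf = conf.masked_fill(~still_masked, float("inf"))
            # Re-mask the n_mask least-confident predictions.
            idx = conf.topk(n_mask, largest=False).indices
            tokens[0, idx[0]] = mask_id
    return tokens
```

Each step commits only the most confident predictions and re-masks the rest, so a full sequence is produced in a handful of forward passes rather than one pass per token, which is what enables the fast, parallel decoding described above.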
For inference, please refer to the demo app at https://huggingface.co/spaces/MeissonFlow/muddit/blob/main/app.py.
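For convenience, here is a minimal, untested sketch of running that demo Space locally via huggingface_hub; the repo_id is taken from the link above, and we assume the Space ships a requirements.txt alongside its app.py entry point.

```python
# Hedged sketch: fetch the demo Space and launch its app.py locally.
import subprocess
import sys

from huggingface_hub import snapshot_download

# Download the Space repository (repo_id taken from the link above).
space_dir = snapshot_download(repo_id="MeissonFlow/muddit", repo_type="space")

# Install the Space's dependencies, then run its entry point.
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
               cwd=space_dir, check=True)
subprocess.run([sys.executable, "app.py"], cwd=space_dir, check=True)
```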
To train Muddit, follow these steps:

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Prepare your dataset and a dataset class following the format in dataset_utils.py and train_meissonic.py.
- Modify the training script (train/train_unified.sh) with your dataset path.
- Start training:

  ```bash
  bash train/train_unified.sh
  ```

Note: For custom datasets, you will likely need to implement your own dataset class; a hypothetical skeleton is sketched below.
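As a starting point, here is a hypothetical skeleton of such a dataset class. The folder layout (one .txt caption per .jpg image) and the returned field names are assumptions; adapt them to match the format expected in dataset_utils.py.

```python
# Hypothetical skeleton of a custom text-image dataset; the folder layout
# (one .txt caption per .jpg image) and field names are assumptions.
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

class MyTextImageDataset(Dataset):
    def __init__(self, root, transform=None):
        self.paths = sorted(Path(root).glob("*.jpg"))
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        path = self.paths[idx]
        image = Image.open(path).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        # Caption is assumed to live next to the image with the same stem.
        caption = path.with_suffix(".txt").read_text().strip()
        return {"image": image, "caption": caption}
```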
If you find this work helpful, please consider citing:
```bibtex
@article{shi2025muddit,
  title={Muddit: Liberating generation beyond text-to-image with a unified discrete diffusion model},
  author={Shi, Qingyu and Bai, Jinbin and Zhao, Zhuoran and Chai, Wenhao and Yu, Kaidong and Wu, Jianzong and Song, Shuangyong and Tong, Yunhai and Li, Xiangtai and Li, Xuelong and others},
  journal={arXiv preprint arXiv:2505.23606},
  year={2025}
}
```

Made with ❤️ by MeissonFlow Research.