Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
- MaskGIT: Masked Generative Image Transformer [CVPR 2022]
- Muse: Text-To-Image Generation via Masked Generative Transformers [ICML 2023]
- Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [ICLR 2025]
- Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer
- Di[M]O: Distilling Masked Diffusion Models into One-step Generator [ICCV 2025]
- Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
- DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer [ICCV 2025]
- MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
- Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
- Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
- Token Painter: Training-Free Text-Guided Image Inpainting via Mask Autoregressive Models
- TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion
- OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows
- Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces [ICML 2025]
- Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy [NeurIPS 2025]
- From Masks to Worlds: A Hitchhiker's Guide to World Models
- Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
More papers are coming soon! See MeissonFlow Research (Organization Card) for more about our vision.
Meissonic is a non-autoregressive masked image modeling (MIM) text-to-image synthesis model that generates high-resolution images and is designed to run on consumer graphics cards.
Key Features:
- 🖼️ High-resolution image generation (up to 1024x1024)
- 💻 Designed to run on consumer GPUs
- 🎨 Versatile applications: text-to-image, image-to-image
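As a masked generative transformer, Meissonic predicts all discrete image tokens in parallel and refines them over a small number of steps, rather than decoding tokens one at a time. The sketch below illustrates the generic MaskGIT-style confidence-based decoding loop that this model family builds on; the `model` callable, mask-token id, and cosine schedule are illustrative placeholders, not Meissonic's actual API.

```python
import torch

# Minimal sketch of MaskGIT-style iterative decoding (illustrative, not Meissonic's API).
# `model` is assumed to map a (1, seq_len) token grid to logits over the codebook;
# mask_id is the extra "masked" token appended after the vocab_size codebook entries.
@torch.no_grad()
def masked_decode(model, seq_len=1024, vocab_size=8192, mask_id=8192, steps=64):
    tokens = torch.full((1, seq_len), mask_id, dtype=torch.long)  # start fully masked
    for step in range(steps):
        logits = model(tokens)                        # (1, seq_len, vocab_size)
        probs = logits.softmax(dim=-1)
        confidence, sampled = probs.max(dim=-1)       # per-position best guess
        still_masked = tokens == mask_id
        # Cosine schedule: fraction of positions that stay masked after this step.
        frac = torch.cos(torch.tensor((step + 1) / steps * torch.pi / 2))
        keep_masked = int(seq_len * frac)
        # Never re-mask positions that were committed in earlier steps.
        conf = confidence.masked_fill(~still_masked, float("inf"))
        tokens = torch.where(still_masked, sampled, tokens)
        if keep_masked > 0:
            # Re-mask the lowest-confidence predictions; commit the rest.
            idx = conf.argsort(dim=-1)[:, :keep_masked]
            tokens.scatter_(1, idx, mask_id)
    return tokens  # discrete image tokens, to be decoded by a VQ decoder
```

Because every step refines the whole token grid at once, the number of network evaluations is fixed by the schedule (e.g., 64 steps) instead of growing with the number of tokens, which is what makes this family efficient at high resolution.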
Installation:

```bash
git clone https://github.com/viiika/Meissonic/
cd Meissonic
conda create --name meissonic python
conda activate meissonic
pip install -r requirements.txt
```

Install diffusers from source:

```bash
git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .
```

Launch the Gradio demo:

```bash
python app.py
```

Run command-line inference:

```bash
python inference.py --prompt "Your creative prompt here"
```

Inpainting and outpainting:

```bash
python inpaint.py --mode inpaint --input_image path/to/image.jpg
python inpaint.py --mode outpaint --input_image path/to/image.jpg
```

Optimize performance with FP8 quantization:
Requirements:
- CUDA 12.4
- PyTorch 2.4.1
- TorchAO
Note: Windows users should install TorchAO with:

```bash
pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cpu
```

Command-line inference:

```bash
python inference_fp8.py --quantization fp8
```

Gradio demo for FP8 (select the quantization method under Advanced settings):

```bash
python app_fp8.py
```

| Precision (Steps=64, Resolution=1024x1024) | Batch Size=1 (Avg. Time) | Memory Usage |
|---|---|---|
| FP32 | 13.32s | 12GB |
| FP16 | 12.35s | 9.5GB |
| FP8 | 12.93s | 8.7GB |
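For reference, the snippet below is a rough sketch of how TorchAO weight-only float8 quantization can be applied to a PyTorch module before inference; inference_fp8.py may wire this up differently, and the toy `nn.Sequential` model is a stand-in for the actual Meissonic transformer.

```python
import torch
from torch import nn
from torchao.quantization import quantize_, float8_weight_only

# Toy stand-in for the Meissonic transformer; the real script presumably
# quantizes the pipeline's transformer module the same way.
model = nn.Sequential(
    nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)
).cuda().half()

# Swap each Linear's weights to float8 in place (weight-only quantization).
quantize_(model, float8_weight_only())

with torch.inference_mode():
    y = model(torch.randn(1, 1024, device="cuda", dtype=torch.float16))
print(y.shape)  # torch.Size([1, 1024])
```

Weight-only quantization shrinks the weights while activations and compute stay in fp16, which is consistent with the table above: noticeably lower memory, roughly unchanged latency.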
To train Meissonic, follow these steps:

1. Install dependencies:

   ```bash
   cd train
   pip install -r requirements.txt
   ```

2. Download the Meissonic base model from Hugging Face.

3. Prepare your dataset:
   - Use the sample dataset: MeissonFlow/splash
   - Or prepare your own dataset and dataset class, following the format in line 100 of dataset_utils.py and lines 656-680 of train_meissonic.py
   - Modify train.sh with your dataset path

4. Start training:

   ```bash
   bash train/train.sh
   ```

Note: For custom datasets, you'll likely need to implement your own dataset class; a sketch follows below.
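For illustration, here is a minimal sketch of such a dataset class, assuming a folder of images plus a metadata.jsonl file with one {"file_name": ..., "caption": ...} record per line; the layout, keys, and returned dict are assumptions that should be matched against the format referenced in dataset_utils.py and train_meissonic.py.

```python
import json
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class MyImageCaptionDataset(Dataset):
    """Hypothetical image-caption dataset: a directory of images plus a
    metadata.jsonl file. Adjust the keys and the returned dict to whatever
    train_meissonic.py actually expects."""

    def __init__(self, root, resolution=1024):
        self.root = Path(root)
        with open(self.root / "metadata.jsonl") as f:
            self.records = [json.loads(line) for line in f]
        self.transform = transforms.Compose([
            transforms.Resize(resolution),
            transforms.CenterCrop(resolution),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        image = Image.open(self.root / rec["file_name"]).convert("RGB")
        return {"image": self.transform(image), "caption": rec["caption"]}
```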
If you find this work helpful, please consider citing:
```bibtex
@article{bai2024meissonic,
  title={Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis},
  author={Bai, Jinbin and Ye, Tian and Chow, Wei and Song, Enxin and Chen, Qing-Guo and Li, Xiangtai and Dong, Zhen and Zhu, Lei and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2410.08261},
  year={2024}
}
```

We thank the community and contributors for their invaluable support in developing Meissonic. We thank [email protected] for making the Meissonic demo. We thank @NewGenAI and @飛鷹さん@自称文系プログラマの勉強 for making YouTube tutorials. We thank @pprp for the FP8 and INT4 quantization, @camenduru for the Jupyter tutorial, and @chenxwh for the Replicate demo and API. We thank Collov Labs for reproducing Monetico, and Shitong et al. for identifying effective design choices for enhancing visual quality.
Made with ❤️ by MeissonFlow Research