How to Run Qwen-Image-2512 Locally in ComfyUI
Step-by-step tutorial for running Qwen-Image-2512 on your local device with ComfyUI.
Qwen-Image-2512 is the December update to Qwen's text-to-image foundational models. It is the top-performing open-source diffusion model, and this guide will teach you how to run it locally via Unsloth GGUFs and ComfyUI.
Qwen-Image-2512 features more realistic-looking people, richer details in landscapes and textures, and more accurate text rendering. Uploads: GGUF • FP8
The quants use the Unsloth Dynamic methodology, which upcasts important layers to higher precision to recover more accuracy. Thank you to Qwen for allowing Unsloth day-0 support.
ComfyUI Tutorial
You don't need a GPU to run the model; a CPU with enough RAM will work. For best results, ensure your total usable memory (RAM + VRAM, or unified memory) is larger than the GGUF size; e.g. the 4-bit (Q4_K_M) unsloth/Qwen-Image-Edit-2512-GGUF is 13.1 GB, so you should have 13.2+ GB of combined memory.
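If you're unsure how much memory you have, the two commands below are one way to check; they assume a Linux machine with an NVIDIA GPU (on macOS, check unified memory in About This Mac instead):
free -h   # total system RAM
nvidia-smi --query-gpu=memory.total --format=csv,noheader   # VRAM per GPU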
ComfyUI is an open-source diffusion model GUI, API, and backend that uses a node-based (graph/flowchart) interface. This guide focuses on machines with CUDA, but the instructions for Apple silicon or CPU-only setups are similar.
#1. Install & Setup
To install ComfyUI, you can download the desktop app on Windows or Mac devices here. Otherwise, to set up ComfyUI for running GGUF models, run the following:
mkdir comfy_ggufs
cd comfy_ggufs
python -m venv .venv
source .venv/bin/activate
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
cd custom_nodes
git clone https://github.com/city96/ComfyUI-GGUF
cd ComfyUI-GGUF
pip install -r requirements.txt
cd ../..
#2. Download Models
Diffusion pipelines typically need three models: a Variational AutoEncoder (VAE) that maps between image pixel space and latent space, a text encoder that turns the prompt into input embeddings, and the diffusion transformer itself. You can find all Unsloth diffusion GGUFs in our Collection here.
Both the diffusion model and the text encoder can be in GGUF format, while the VAE typically stays in safetensors. According to Qwen's repo, the text encoder is Qwen2.5-VL, not Qwen3-VL. Let's download the models we will use:
See GGUF uploads for: Qwen-Image-2512, Qwen-Image-Edit-2511, and Qwen-Image-Layered
The format of the VAE and diffusion model may differ from the diffusers checkpoints if you use checkpoints other than the ones above. Only use checkpoints that are compatible with ComfyUI.
These files must be in the correct folders for ComfyUI to see them. In addition, the vision tower stored in the mmproj file must use the same prefix as the text encoder.
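As a sketch, the commands below pull the files with huggingface-cli (installed with the huggingface_hub package) straight into folders ComfyUI scans; run them from the ComfyUI directory. The repo names, exact filenames, and folder layout inside the repos are illustrative assumptions, so check the Unsloth collection linked above. On recent ComfyUI versions, models/unet and models/clip also work in place of models/diffusion_models and models/text_encoders.
pip install huggingface_hub
# diffusion transformer (GGUF) -> models/diffusion_models
huggingface-cli download unsloth/Qwen-Image-2512-GGUF qwen-image-2512-Q4_K_M.gguf --local-dir models/diffusion_models
# text encoder (GGUF) plus its mmproj file with the same prefix -> models/text_encoders
huggingface-cli download unsloth/Qwen2.5-VL-7B-Instruct-GGUF Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf --local-dir models/text_encoders
# VAE (safetensors), e.g. qwen_image_vae.safetensors -> models/vae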
Download reference images to be used later as well:
#3. Workflow and Hyperparameters
For more info you can also view our detailed Workflow and Hyperparameters Guide.
Navigate to the main ComfyUI directory and run:
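A minimal launch looks like this (assuming the virtual environment from step 1 is still active; --listen is only needed if other machines must reach the server):
python main.py
# or: python main.py --listen 0.0.0.0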
This will launch a web server that you can access at http://127.0.0.1:8188. If you are running this in the cloud, you'll need to make sure port forwarding is set up so you can access it from your local machine.
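One common option is an SSH tunnel from your local machine; user@remote-host below is a placeholder for your cloud instance:
ssh -L 8188:127.0.0.1:8188 user@remote-host
# then browse to http://127.0.0.1:8188 locally as usual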
Workflows are saved as JSON, either embedded in output images (PNG metadata) or as separate .json files. You can:
Drag & drop an image into ComfyUI to load its workflow
Export/import workflows via the menu
Share workflows as JSON files
Below are two examples of Qwen-Image-2512 and Qwen-Image-Edit-2511 JSON files which you can download and use:
For this JSON file, we are creating a non-realistic, cartoonish character. For more realistic results, skip keywords like "photorealistic", "digital rendering", or "3d render" and use terms like "photograph" instead.
Instead of setting up the workflow from scratch, you can download the workflow here.
Load it into the browser page by clicking the Comfy logo -> File -> Open, then choose the unsloth_qwen_image_2512.json file you just downloaded. It should look like the below:


This workflow is based on the official ComfyUI published workflow, except that it uses the GGUF loader extension and is simplified to illustrate text-to-image functionality.
#4. Inference
ComfyUI is highly customizable. You can mix models and create extremely complex pipelines. For a basic text-to-image setup we need to load the models, specify the prompt and image details, and decide on a sampling strategy.
Upload Models + Set Prompt
We already downloaded the models, so we just need to pick the correct ones. For Unet Loader pick qwen-image-2512-Q4_K_M.gguf, for CLIPLoader pick Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf, and for Load VAE pick qwen_image_vae.safetensors.
You can set any prompt you'd like, and also specify a negative prompt. The negative prompt helps by telling the model what to steer away from.
Image Size + Sampler Parameters
The Qwen Image model series supports different image sizes. You can make rectangular images by setting different width and height values. For sampler parameters, you can experiment with samplers other than euler, and with more or fewer sampling steps. The workflow has steps set to 40, but for quick tests 20 might be good enough. Change the control after generate setting from randomize to fixed if you want to see how different settings change the output for the same seed.
Run
Click Run and an image will be generated in about 1 minute (around 30 seconds for 20 steps), depending on your hardware. That output image can be saved. The interesting part is that the metadata for the entire ComfyUI workflow is saved inside the image, so you can share it and anyone can see how it was created by loading it back into the UI.
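If you want to peek at that metadata outside the UI, here is a quick sketch using Pillow (already pulled in by ComfyUI's requirements); the filename assumes ComfyUI's default output naming, and the workflow is typically stored as a PNG text chunk named workflow:
python -c "from PIL import Image; print(Image.open('output/ComfyUI_00001_.png').info.get('workflow'))"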

Multi Reference Generation
A key feature of Qwen-Image-Edit-2511 is multi-reference generation, where you can supply multiple images to help control generation. This time, load unsloth_qwen_image_edit_2511.json. We will use most of the same models, but switch the unet from qwen-image-2512-Q4_K_M.gguf to qwen-image-edit-2511-Q4_K_M.gguf. The other difference is the extra nodes for selecting the reference images we downloaded earlier. You'll notice the prompt refers to both image 1 and image 2, which act as prompt anchors for the reference images. Once loaded, click Run, and you'll see an output that places our two unique sloth characters together while preserving their likeness.



stable-diffusion.cpp
If you want to run the model in stable-diffusion.cpp, you can follow our guide here.