SpotEdit: Selective Region Editing in Diffusion Transformers
Zhibin Qin1, Zhenxiong Tan1, Zeqing Wang1, Songhua Liu2, Xinchao Wang1
1 National University of Singapore
2 Shanghai Jiao Tong University
SpotEdit is a training-free, region-aware framework for instruction-based image editing with Diffusion Transformers (DiTs).
While most image editing tasks only modify small local regions, existing diffusion-based editors regenerate the entire image at every denoising step, leading to redundant computation and potential degradation in preserved areas. SpotEdit follows a simple principle: edit only what needs to be edited.
SpotEdit dynamically identifies non-edited regions during the diffusion process and skips unnecessary computation for these regions, while maintaining contextual coherence for edited regions through adaptive feature fusion.
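The selective-computation idea can be sketched as a toy example (this is an illustration of the principle only, not SpotEdit's actual implementation; the helper names `estimate_edit_mask` and `selective_step` are hypothetical, and the real method operates on DiT tokens inside the denoising loop):

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_edit_mask(latent_prev, latent_curr, threshold=0.05):
    # Tokens whose latent changed noticeably between steps are treated as edited.
    delta = np.abs(latent_curr - latent_prev).mean(axis=-1)
    return delta > threshold

def selective_step(latent, cached_out, edit_mask, denoise_fn):
    # Run the (expensive) denoiser only where the mask is set; reuse the cache elsewhere.
    out = cached_out.copy()
    full = denoise_fn(latent)  # stand-in: a real system would evaluate only the masked tokens
    out[edit_mask] = full[edit_mask]
    return out

# Toy latent: 16 tokens x 4 channels; only tokens 4..7 are being edited.
prev = rng.normal(size=(16, 4))
curr = prev.copy()
curr[4:8] += 0.5  # simulated change in the edited region

mask = estimate_edit_mask(prev, curr)
cached = np.zeros_like(curr)
out = selective_step(curr, cached, mask, lambda z: z * 2.0)
print(mask[:10])  # tokens 4..7 flagged as edited
```

In the actual framework, skipping the non-edited tokens is what saves computation, while the edited tokens still attend to cached context features to stay coherent with the preserved regions.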
```shell
conda create -n spotedit python=3.10
conda activate spotedit
pip install -r requirements.txt
```

- For the Flux-Kontext base model: `example\flux.ipynb`
- For the Qwen-Image-Edit base model: `example\qwen.ipynb`
- Experiments and test examples are conducted at a resolution of 1024×1024. We recommend setting both the input and output image sizes to 1024×1024 when running SpotEdit.
- SpotEdit is not intended for global edits that affect most or all regions of the image, such as full-scene style transfer or global color changes. In these cases, SpotEdit cannot reliably identify non-edited regions, and thus falls back to computation that is effectively equivalent to the original full-image diffusion process.
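To match the recommended resolution, inputs can be center-cropped and resized to 1024×1024 before editing. A minimal preprocessing sketch using Pillow (the `prepare_input` helper is illustrative, not part of SpotEdit's API):

```python
from PIL import Image

def prepare_input(img, size=1024):
    # Center-crop to a square, then resize to the recommended resolution.
    img = img.convert("RGB")
    w, h = img.size
    s = min(w, h)
    box = ((w - s) // 2, (h - s) // 2, (w + s) // 2, (h + s) // 2)
    return img.crop(box).resize((size, size), Image.LANCZOS)

# Demo with a synthetic 1536x1024 image instead of a file on disk.
demo = Image.new("RGB", (1536, 1024), "white")
out = prepare_input(demo)
print(out.size)  # (1024, 1024)
```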
@article{qin2025spotedit,
  title={SpotEdit: Selective Region Editing in Diffusion Transformers},
  author={Qin, Zhibin and Tan, Zhenxiong and Wang, Zeqing and Liu, Songhua and Wang, Xinchao},
  journal={arXiv preprint arXiv:2512.22323},
  year={2025}
}

