- [2025.07.30] We upload benchmark results and the FluxKontext fine-tune-only T5 checkpoint.
- [2025.07.27] We release GPT-Image-Edit, a state-of-the-art image editing model, together with 1.5M high-quality editing samples. All data, models, training code, and evaluation code are open-sourced. Our code is based on UniWorld-V1; thanks to the authors of UniWorld-V1. Check our report for more details. Welcome to watch this repository for the latest updates.
1. Set up environment
git clone https://github.com/wyhlovecpp/GPT-Image-Edit.git
cd GPT-Image-Edit
conda create -n univa python=3.10 -y
conda activate univa
pip install -r requirements.txt
pip install flash_attn --no-build-isolation
2. Download pretrained checkpoint
huggingface-cli download --resume-download UCSC-VLAA/gpt-image-edit-training --local-dir ${MODEL_PATH}
huggingface-cli download --resume-download black-forest-labs/FLUX.1-Kontext-dev --local-dir ${FLUX_PATH}
3. Run with CLI
MODEL_PATH="path/to/model/final_checkpoint"
FLUX_PATH="path/to/flux"
CUDA_VISIBLE_DEVICES=0 python -m univa.serve.cli \
--model_path ${MODEL_PATH} \
--flux_path ${FLUX_PATH}
4. Run with Gradio Web Interface
We highly recommend trying out our web demo with the following command.
python app.py --model_path ${MODEL_PATH} --flux_path ${FLUX_PATH}
5. Run with Gradio Serve
CUDA_VISIBLE_DEVICES=0 python -m univa.serve.gradio_web_server \
--model_path ${MODEL_PATH} \
--flux_path ${FLUX_PATH}
Download the data from UCSC-VLAA/GPT-Image-Edit-1.5M. The dataset consists of two parts: source images and annotation metadata. The JSON files are under UCSC-VLAA/gpt-image-edit-training/training_json.
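For example, both parts can be fetched with huggingface-cli; the local directories below are placeholders, so adjust them to match your data.txt layout:

```bash
# Source images (dataset repo)
huggingface-cli download --repo-type dataset --resume-download UCSC-VLAA/GPT-Image-Edit-1.5M --local-dir data
# Annotation JSON files (training repo)
huggingface-cli download --resume-download UCSC-VLAA/gpt-image-edit-training --include "training_json/*" --local-dir .
```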
Prepare a data.txt file in the following format:
- The first column is the root path to the images.
- The second column is the corresponding annotation JSON file.
- The third column indicates whether to enable the region-weighting strategy. We use `false` in our training setting.
We have prepared a data.txt file for gpt-edit for your reference.
`data.txt` for gpt-edit
data/gpt-edit/hqedit/edit,training_json/hqedit_gpt_edit.json,false
data/gpt-edit/hqedit/generate,training_json/hqedit_gpt_generate.json,false
data/gpt-edit/omniedit,training_json/omniedit_gpt.json,false
data/gpt-edit/omniedit,training_json/omniedit_gpt_rewrite.json,false
data/gpt-edit/omniedit/complex-edit,training_json/complexedit_gpt.json,false
data/gpt-edit/ultraedit,training_json/ultraedit_gpt.json,false
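As a quick sanity check before training, you can verify that every path listed in data.txt actually exists. This is a minimal sketch; it assumes the paths in data.txt are relative to the directory you launch training from:

```bash
# Check that each image root and annotation JSON in data.txt is present
while IFS=, read -r img_root json_path region_weight; do
  [ -d "$img_root" ]  || echo "missing image root: $img_root"
  [ -f "$json_path" ] || echo "missing annotation json: $json_path"
done < data.txt
```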
Download black-forest-labs/FLUX.1-Kontext-dev to $FLUX_PATH.
Download Qwen/Qwen2.5-VL-7B-Instruct to $QWENVL_PATH. We also support other sizes of Qwen2.5-VL.
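For example, both can be fetched with huggingface-cli, following the same pattern as above (the local paths are placeholders):

```bash
FLUX_PATH="path/to/flux"
QWENVL_PATH="path/to/qwen2.5-vl-7b"
huggingface-cli download --resume-download black-forest-labs/FLUX.1-Kontext-dev --local-dir ${FLUX_PATH}
huggingface-cli download --resume-download Qwen/Qwen2.5-VL-7B-Instruct --local-dir ${QWENVL_PATH}
```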
SAVE_PATH="path/to/save/UniWorld-Qwen2.5-VL-7B-Instruct-FLUX.1-dev-fp32"
python scripts/make_univa_qwen2p5vl_weight.py \
--origin_flux_ckpt_path $FLUX_PATH \
--origin_qwenvl_ckpt_path $QWENVL_PATH \
--save_path ${SAVE_PATH}
You need to set pretrained_lvlm_name_or_path to ${SAVE_PATH} in flux_qwen2p5vl_7b_vlm_stage1_512.yaml.
We recommend using the pretrained weight from LanguageBind/UniWorld-V1/stage1.
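As a minimal sketch, the relevant line in flux_qwen2p5vl_7b_vlm_stage1_512.yaml would look like the following (only this field is shown; keep the surrounding keys as shipped):

```yaml
# flux_qwen2p5vl_7b_vlm_stage1_512.yaml (excerpt)
pretrained_lvlm_name_or_path: path/to/save/UniWorld-Qwen2.5-VL-7B-Instruct-FLUX.1-dev-fp32  # ${SAVE_PATH} from the merge step, or LanguageBind/UniWorld-V1/stage1
```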
# stage 1
# if use prodigy, pip install prodigy
bash scripts/denoiser/flux_qwen2p5vl_7b_vlm_stage1_512.sh
You need to set pretrained_mlp2_path, which is trained in stage 1, or use the pretrained weight from LanguageBind/UniWorld-V1/stage1.
For training with 512×512 images (batch size 1), training consumes about 78 GB on 1 node (8 GPUs).
Setting ema_pretrained_lvlm_name_or_path: null can save memory if you want to train at a higher resolution (e.g., 1024×1024) or with a larger batch size. Using more nodes can also save memory, because we use ZeRO-2 for the main model in stage 2.
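A minimal sketch of the corresponding stage 2 config fields; the mlp2 checkpoint path below is a hypothetical placeholder, so point it at your own stage 1 output or the UniWorld-V1 weight:

```yaml
# stage 2 config (excerpt; filename assumed to match the launch script below)
pretrained_mlp2_path: path/to/stage1/output/mlp2_checkpoint  # placeholder; use your stage 1 result or LanguageBind/UniWorld-V1/stage1
ema_pretrained_lvlm_name_or_path: null                       # null saves memory at 1024x1024 or with larger batch sizes
```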
# stage 2
bash scripts/denoiser/flux_qwen2p5vl_7b_vlm_stage2_1024.sh
ImgEdit
cd univa/eval/imgedit
# follow the instruction in univa/eval/imgedit/README.md
GEdit
cd univa/eval/gdit
# follow the instruction in univa/eval/gdit/README.md
complex-edit
cd univa/eval/complex-edit
# follow the instruction in univa/eval/complex-edit/README.md
OmniContext
cd univa/eval/omnicontext
# follow the instruction in univa/eval/omnicontext/README.md
| Model | BG Change | Color Alt. | Mat. Mod. | Motion | Portrait | Style | Add | Remove | Replace | Text | Tone | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Open-Sourced Models | ||||||||||||
| AnyEdit | 4.31 | 4.25 | 2.64 | 0.67 | 1.90 | 1.95 | 3.72 | 3.75 | 3.23 | 0.77 | 4.21 | 2.85 |
| MagicBrush | 6.17 | 5.41 | 4.75 | 1.55 | 2.90 | 4.10 | 5.53 | 4.13 | 5.10 | 1.33 | 5.07 | 4.19 |
| Instruct-Pix2Pix | 3.94 | 5.40 | 3.52 | 1.27 | 2.62 | 4.39 | 3.07 | 1.50 | 3.48 | 1.13 | 5.10 | 3.22 |
| OmniGen | 5.23 | 5.93 | 5.44 | 3.12 | 3.17 | 4.88 | 6.33 | 6.35 | 5.34 | 4.31 | 4.96 | 5.01 |
| Step1X-Edit | 7.03 | 6.26 | 6.46 | 3.66 | 5.23 | 7.24 | 7.17 | 6.42 | 7.39 | 7.40 | 6.62 | 6.44 |
| Bagel | 7.44 | 6.99 | 6.26 | 5.09 | 4.82 | 6.04 | 7.94 | 7.37 | 7.31 | 7.16 | 6.17 | 6.60 |
| Bagel-thinking | 7.22 | 7.24 | 6.69 | 7.12 | 6.03 | 6.17 | 7.93 | 7.44 | 7.45 | 3.61 | 6.36 | 6.66 |
| Ovis-U1 | 7.49 | 6.88 | 6.21 | 4.79 | 5.98 | 6.46 | 7.49 | 7.25 | 7.27 | 4.48 | 6.31 | 6.42 |
| OmniGen2 | - | - | - | - | - | - | - | - | - | - | - | 6.42 |
| Step1X-Edit (v1.1) | 7.45 | 7.38 | 6.95 | 4.73 | 4.70 | 7.11 | 8.20 | 7.59 | 7.80 | 7.91 | 6.85 | 6.97 |
| FluxKontext dev | 7.06 | 7.03 | 5.52 | 5.62 | 4.68 | 5.55 | 6.95 | 6.76 | 6.13 | 6.10 | 7.48 | 6.26 |
| Proprietary Models | ||||||||||||
| Gemini | 7.11 | 7.14 | 6.47 | 5.67 | 3.99 | 4.95 | 8.12 | 6.89 | 7.41 | 6.85 | 7.01 | 6.51 |
| Doubao | 8.07 | 7.36 | 7.20 | 5.38 | 6.28 | 7.20 | 8.05 | 7.71 | 7.87 | 4.01 | 7.67 | 6.98 |
| GPT-4o | 6.96 | 6.85 | 7.10 | 5.41 | 6.74 | 7.44 | 7.51 | 8.73 | 8.55 | 8.45 | 8.69 | 7.49 |
| Ours | 7.80 | 7.54 | 7.12 | 7.75 | 7.09 | 6.74 | 8.04 | 7.95 | 7.17 | 5.45 | 6.95 | 7.24 |
| Method | IF (Instruction Following) | IP (Identity Preservation) | PQ (Perceptual Quality) | Overall |
|---|---|---|---|---|
| AnyEdit | 1.60 | 8.15 | 7.25 | 5.67 |
| UltraEdit | 6.56 | 5.93 | 7.29 | 6.59 |
| OmniGen | 6.25 | 6.42 | 7.54 | 6.74 |
| FluxKontext Dev | 8.56 | 8.39 | 8.51 | 8.49 |
| Imagen3 | 7.56 | 6.55 | 7.67 | 7.26 |
| SeedEdit | 8.49 | 6.91 | 8.74 | 8.04 |
| GPT-4o | 9.29 | 7.51 | 9.47 | 8.76 |
| Ours | 8.99 | 8.41 | 8.93 | 8.78 |
| Model | Add | Adjust | Extract | Replace | Remove | Background | Style | Hybrid | Action | Overall |
|---|---|---|---|---|---|---|---|---|---|---|
| MagicBrush | 2.84 | 1.58 | 1.51 | 1.97 | 1.58 | 1.75 | 2.38 | 1.62 | 1.22 | 1.90 |
| Instruct-Pix2Pix | 2.45 | 1.83 | 1.44 | 2.01 | 1.50 | 1.44 | 3.55 | 1.20 | 1.46 | 1.88 |
| AnyEdit | 3.18 | 2.95 | 1.88 | 2.47 | 2.23 | 2.24 | 2.85 | 1.56 | 2.65 | 2.45 |
| UltraEdit | 3.44 | 2.81 | 2.13 | 2.96 | 1.45 | 2.83 | 3.76 | 1.91 | 2.98 | 2.70 |
| OmniGen | 3.47 | 3.04 | 1.71 | 2.94 | 2.43 | 3.21 | 4.19 | 2.24 | 3.38 | 2.96 |
| Step1X-Edit | 3.88 | 3.14 | 1.76 | 3.40 | 2.41 | 3.16 | 4.63 | 2.64 | 2.52 | 3.06 |
| ICEdit | 3.58 | 3.39 | 1.73 | 3.15 | 2.93 | 3.08 | 3.84 | 2.04 | 3.68 | 3.05 |
| BAGEL | 3.56 | 3.31 | 1.70 | 3.30 | 2.62 | 3.24 | 4.49 | 2.38 | 4.17 | 3.20 |
| UniWorld-V1 | 3.82 | 3.64 | 2.27 | 3.47 | 3.24 | 2.99 | 4.21 | 2.96 | 2.74 | 3.26 |
| OmniGen2 | 3.57 | 3.06 | 1.77 | 3.74 | 3.20 | 3.57 | 4.81 | 2.52 | 4.68 | 3.44 |
| Ovis-U1 | 4.13 | 3.62 | 2.98 | 4.45 | 4.06 | 4.22 | 4.69 | 3.45 | 4.61 | 4.00 |
| FluxKontext dev | 3.76 | 3.45 | 2.15 | 3.98 | 2.94 | 3.78 | 4.38 | 2.96 | 4.26 | 3.52 |
| GPT-4o | 4.61 | 4.33 | 2.90 | 4.35 | 3.66 | 4.57 | 4.93 | 3.96 | 4.89 | 4.20 |
| Ours | 4.07 | 3.79 | 2.04 | 4.13 | 3.89 | 3.90 | 4.84 | 3.04 | 4.52 | 3.80 |
- UniWorld-V1: UniWorld-V1 is a unified framework for understanding, generation, and editing.
- ImgEdit: ImgEdit is a large-scale, high-quality image-editing dataset comprising 1.2 million carefully curated edit pairs and a comprehensive benchmark for image editing.
- Complex-edit: Complex-Edit is a benchmark for complex image editing.
- Qwen2.5-VL: The new flagship vision-language model of Qwen.
- FLUX.1-Kontext-dev: A state-of-the-art image editing model.
- Step1X-Edit: A state-of-the-art image editing model and a comprehensive benchmark for image editing.
- OmniGen2: A state-of-the-art image editing model and a comprehensive benchmark for image editing.
- See LICENSE for details. The FLUX Kontext weights fall under the FLUX.1 Kontext [dev] Non-Commercial License.
@article{wang2025gpt,
title={GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset},
author={Wang, Yuhan and Yang, Siwei and Zhao, Bingchen and Zhang, Letian and Liu, Qing and Zhou, Yuyin and Xie, Cihang},
journal={arXiv preprint arXiv:2507.21033},
year={2025}
}