🎨ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies
ComplexBench-Edit is an image-editing benchmark specifically designed to assess performance on complex instructions involving multiple combined and dependent modifications. It systematically evaluates how well models handle both parallel and, critically, chain-dependent instructions. Furthermore, we propose a novel vision-consistency evaluation method that excludes the influence of modified content by assessing consistency only in the remaining, unaltered regions. We also introduce a simple yet powerful Chain-of-Thought (CoT) based approach for image editing.
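To illustrate the idea of region-masked consistency scoring, here is a minimal, hypothetical sketch. It is not the benchmark's actual metric (see `evaluation/` for that); it simply uses masked mean absolute pixel error as a stand-in, and the function name `masked_consistency` and the score normalization are our own assumptions.

```python
def masked_consistency(src, edited, edit_mask):
    """Score consistency between two images, ignoring edited regions.

    src, edited: 2-D lists (H x W) of pixel intensities in [0, 255].
    edit_mask:   2-D list of bools; True marks edited pixels to exclude.
    Returns a score in [0, 1]; 1.0 means all unaltered pixels match.
    """
    total, count = 0.0, 0
    for s_row, e_row, m_row in zip(src, edited, edit_mask):
        for s, e, m in zip(s_row, e_row, m_row):
            if not m:  # only compare pixels outside the edited region
                total += abs(s - e) / 255.0
                count += 1
    return 1.0 - total / count if count else 0.0

src    = [[10, 20], [30, 40]]
edited = [[10, 200], [30, 40]]            # only the top-right pixel changed
mask   = [[False, True], [False, False]]  # that pixel is masked out
print(masked_consistency(src, edited, mask))  # 1.0: unaltered pixels identical
```

Because the changed pixel is masked out, the score reflects only the unaltered background, which is exactly the property the benchmark's vision-consistency evaluation targets.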
- [2025.6.3] We release comparison cases between different baselines and GPT-4o.
- [2025.6.2] We release the source images and editing instructions of the ComplexBench-Edit benchmark.
- [2025.6.1] We release the evaluation code.
- Clone the repository:

```shell
git clone https://github.com/llllly26/ComplexBench-Edit
cd ComplexBench-Edit
```

- Install dependencies:

```shell
pip install -r requirements.txt
```
- Download datasets: The source images can be downloaded from [ Here ]; put them in the `data/more-object-no-multi3` directory. An overview of the data layout:
ComplexBench-Edit/
├── LICENSE
├── README.md
├── baselines/ # Contains implementations of some baseline models
│ ├── icedit.py
├── data/ # Contains benchmark images and instructions in json file.
│ ├── instructions/
│ │ ├── COCO-obj-attr-global/
│ │ ├── COCO-three-obj/
│ │ ├── COCO-two-obj-one-attr/
│ │ ├── three-chain/
│ │ └── two-chain/
│ ├── more-object-no-multi3/
├── edited-image/ # Stores editing images of models
│ └── Gemini/ # Example: Images edited by Gemini
└── evaluation/ # Contains evaluation scripts and prompts
├── count_score.py
├── eval-detection.py
├── eval_prompt/ # Evaluation prompts
├── final_score.py
├── get-bbox.py
├── ins_eval.py
└── read.txt
For the evaluation of all baselines, we use the demo-code parameters provided in their respective original repositories. We thank all the authors.
Example of running a baseline:

```shell
python .\baselines\icedit.py
```

Example of running the instruction-following evaluation:

```shell
python .\evaluation\ins_eval.py --results_folder ".\edited-image\Gemini\COCO-three-obj\testResults_42" --json_path ".\data\COCO-three-obj\final_update_v2.json" --output_dir ".\edited-image\Gemini\COCO-three-obj\testResults_42_eval_v3_thinking_01_21"
```

Here, we showcase several examples from our ComplexBench-Edit benchmark. The image demonstrates the evaluation results of leading instruction-driven editing methods, including GPT-4o.

If you find this work useful for your research, please give it a star ⭐ and consider citing:

```bibtex
@article{wang2025complexbench,
  title={ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies},
  author={Wang, Chenglin and Zhou, Yucheng and Wang, Qianning and Wang, Zhe and Zhang, Kai},
  journal={arXiv preprint arXiv:2506.12830},
  year={2025}
}
```
