
GenEvolve

Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

👥 Authors

Sixiang Chen1,2, Zhaohu Xing1, Tian Ye1, Xinyu Geng3, Yunlong Lin, Jianyu Lai1,2, Xuanhua He3, Fuxiang Zhai1, Jialin Gao4,‡, Lei Zhu1,3,†

1The Hong Kong University of Science and Technology (Guangzhou)

2Meituan

3The Hong Kong University of Science and Technology

4National University of Singapore

Project Leader: Junfeng Luo (Meituan)


GenEvolve teaser

The same trained agent policy paired with two reference-conditioned generators ⟶
Qwen-Image-Edit (open) · Nano Banana Pro (strong)

🌟 What is GenEvolve?

GenEvolve formulates open-ended image generation as a tool-orchestrated visual trajectory. The agent gathers external textual evidence, retrieves visual references, performs internal knowledge activation through callable generation skills, and synthesizes a prompt-reference program $z = (g, R)$ that any reference-conditioned generator can render.

The released GenEvolve policy is based on Qwen3-VL-8B and is designed to be generator-transferable: the same agent output can be rendered by the open Qwen-Image-Edit backend or by a stronger proprietary renderer such as Nano Banana Pro.

🎁 What's released

Component Where
🧠 Trained agent policy GenEvolve (Qwen3-VL-8B-based) πŸ€— MeiGen-AI/GenEvolve
⚑ Standalone inference runtime (GenEvolveAgent, OpenAI-compatible) this repo
πŸ› οΈ Three tools (search, image_search, query_knowledge) this repo
πŸ“š The eight skill markdown files used at training time this repo
🎨 Reference-conditioned generator wrappers (Qwen-Image-Edit + Nano Banana Pro) this repo
πŸ“¦ SFT trajectories (9,000 records) πŸ€— MeiGen-AI/GenEvolve-Data-Bench / GenEvolve-Data-SFT/
🎯 Self-evolution prompts + GT images (3,175 records) πŸ€— MeiGen-AI/GenEvolve-Data-Bench / GenEvolve-Data-RL/
πŸ“Š Held-out evaluation benchmark (594 prompts + GT images) πŸ€— MeiGen-AI/GenEvolve-Data-Bench / GenEvolve-Bench/

📋 Requirements

The main GenEvolve runtime environment covers policy serving, agent rollouts, tool execution, and benchmark inference. A full image-generation pipeline needs more than this one process: for reproducible Qwen rendering, run Qwen-Image-Edit as a separate FastAPI/diffusers service and point GenEvolve at it with --service-url.

Main GenEvolve runtime - genevolve

Use this environment for the released agent code path: serving GenEvolve, running the agent, calling tools, using the Nano client, and calling a Qwen service endpoint. Install it once using the Quickstart commands below.

| Component | Version | Notes |
| --- | --- | --- |
| Python | 3.11 | |
| CUDA stack | CUDA 12.x | our logs used PyTorch CUDA 12.8 wheels |
| torch / torchvision | 2.8.0 / 0.23.0 | |
| transformers | 4.57.1 | |
| vllm | 0.11.0 | |
| ray | 2.54.1 | |
| flash-attn | 2.8.3 | |

This environment does not install or launch external services such as Qwen-Image-Edit, Serper, or the Google image API. Those are configured separately.

External services

| Service | Variable | Used for |
| --- | --- | --- |
| serper.dev | SERPER_API_KEY | required for search and image_search |
| Google Generative Language API | GOOGLE_API_KEY or GEMINI_API_KEY | only for --backend nano-banana-pro |
| Qwen-Image-Edit FastAPI service | --service-url | only for --backend qwen-image-edit-service |

Qwen-Image-Edit service environment

For Qwen rendering, use a separate service environment instead of mixing the diffusion stack into the vLLM server. A typical working stack is Python 3.11, PyTorch/torchvision 2.6.0/0.21.0 with CUDA 12.4 wheels, diffusers>=0.38, transformers>=4.57, accelerate, fastapi, uvicorn, pillow, and requests.

conda create -n qwenimage python=3.11 -y
conda activate qwenimage
pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu124
pip install "diffusers>=0.38" "transformers>=4.57" accelerate fastapi uvicorn pillow requests

Start any Qwen-Image-Edit FastAPI service compatible with POST /generate; a common deployment is one Qwen pipeline per visible GPU, with one HTTP endpoint such as http://host:8001. GenEvolve sends requests with --backend qwen-image-edit-service --service-url http://host:8001.

🚀 Quickstart

1. Install the main GenEvolve runtime

git clone https://github.com/MeiGen-AI/GenEvolve.git
cd GenEvolve

conda create -n genevolve python=3.11 -y
conda activate genevolve
pip install -U pip setuptools wheel packaging psutil ninja
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install --no-build-isolation -r requirements.txt
pip install -e .

This installs only the main GenEvolve runtime: vLLM serving, the agent tools, and lightweight generator clients/wrappers. It does not install or start the separate Qwen-Image-Edit service; set up that service from the Qwen environment section above when using --backend qwen-image-edit-service.

2. Serve the released checkpoint

Put the Hugging Face checkpoint directory in MODEL_PATH. The serving scripts support both tensor parallelism (TP) and data parallel replicas (DP).

  • TP shards one model replica across multiple GPUs.
  • DP launches multiple model replicas to improve throughput for many concurrent prompts.
  • Total GPU usage is TP × DP.
  • Use a larger DP when scripts/run_agent.py --parallel is large and each request fits on one GPU.
  • Use a larger TP when one model replica needs more memory or longer context than one GPU can provide.

# Single GPU / single replica.
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=1 bash scripts/serve_vllm.sh

# Higher throughput on one 8-GPU node: 8 replicas, one GPU per replica.
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=1 DP=8 bash scripts/serve_vllm.sh

# If one replica needs more memory: 4 replicas, two GPUs per replica.
MODEL_PATH=MeiGen-AI/GenEvolve PORT=8000 TP=2 DP=4 bash scripts/serve_vllm.sh

For example, TP=8 DP=1 is one model replica sharded over 8 GPUs. It is not 8 independent services. For throughput on one 8-GPU node, prefer TP=1 DP=8 if the model fits on one GPU; use TP=2 DP=4 or TP=4 DP=2 when each replica needs multiple GPUs.
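
The TP × DP arithmetic above can be sketched in a few lines of Python. This is only an illustration of the counting rule; the helper names `parallel_layouts` and `pick_layout` are hypothetical and not part of the repository, and a real deployment may add further constraints on which TP values are valid.

```python
def parallel_layouts(num_gpus: int) -> list:
    """Return every (TP, DP) pair whose product uses exactly num_gpus GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be positive")
    return [(tp, num_gpus // tp) for tp in range(1, num_gpus + 1) if num_gpus % tp == 0]


def pick_layout(num_gpus: int, gpus_per_replica: int) -> tuple:
    """Pick the smallest TP that covers one replica, which maximizes DP."""
    for tp, dp in parallel_layouts(num_gpus):
        if tp >= gpus_per_replica:
            return tp, dp
    raise ValueError("one replica needs more GPUs than are available")


# On one 8-GPU node: list all layouts, then choose one for a 2-GPU replica.
layouts = parallel_layouts(8)
chosen = pick_layout(8, 2)
print(layouts)
print(chosen)
```

The chosen pair maps directly onto the TP and DP variables passed to scripts/serve_vllm.sh.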

3. Run an end-to-end example

export SERPER_API_KEY=<your_key>             # required for search and image_search
export GOOGLE_API_KEY=<your_key>             # or GEMINI_API_KEY; only for Nano Banana Pro

python examples/quickstart.py \
    --backend nano-banana-pro \
    --base-url http://localhost:8000/v1 \
    --model GenEvolve \
    --prompt "A 1990s travel-magazine cover of two backpackers in front of the Eiffel Tower at golden hour, the title \"PARIS\" rendered in bold serif type." \
    --output paris.png

For the open-generator path, use --backend qwen-image-edit-service with one or more Qwen-Image-Edit service endpoints:

python examples/quickstart.py \
    --backend qwen-image-edit-service \
    --service-url http://your-qwen-service:8001 \
    --base-url http://localhost:8000/v1 \
    --model GenEvolve \
    --output paris_qwen.png

--backend qwen-image-edit is kept only as a local diffusers debug path when the Qwen-Image-Edit dependencies are installed in the active environment.

4. Batch pipeline

The agent rollout and the heavy image rendering are split into two stages so they can run on different machines.

# Stage 1: agent rollouts -> results.json.
python scripts/run_agent.py \
    --input examples/example_prompts.jsonl \
    --output-dir runs/example \
    --base-url http://localhost:8000/v1 \
    --model GenEvolve \
    --parallel 4

# Stage 2a: render through one or more Qwen-Image-Edit services.
# Repeating --service-url enables round-robin dispatch; --parallel sends
# concurrent requests so multiple service workers can be used.
python scripts/generate_images.py \
    --input runs/example/results.json \
    --output-dir runs/example_qwen_service \
    --backend qwen-image-edit-service \
    --service-url http://your-qwen-service-1:8001 \
    --service-url http://your-qwen-service-2:8001 \
    --parallel 8

# Stage 2b: render with Nano Banana Pro.
python scripts/generate_images.py \
    --input runs/example/results.json \
    --output-dir runs/example_nano \
    --backend nano-banana-pro \
    --parallel 4
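
As a rough illustration of what round-robin dispatch with concurrent requests means in Stage 2a, the sketch below cycles jobs over a list of service URLs and runs them in a thread pool. It is not the code in scripts/generate_images.py: `assign_round_robin`, `render_all`, and `fake_render` are hypothetical names, and the stand-in renderer does no HTTP at all.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def assign_round_robin(jobs, service_urls):
    """Pair each job with a service URL, cycling through the URLs in order."""
    if not service_urls:
        raise ValueError("at least one service URL is required")
    return list(zip(jobs, cycle(service_urls)))


def render_all(jobs, service_urls, render_fn, parallel=8):
    """Run render_fn(job, url) concurrently; results keep the input order."""
    pairs = assign_round_robin(jobs, service_urls)
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(lambda pair: render_fn(*pair), pairs))


def fake_render(job, url):
    """Stand-in renderer; a real one would send the job to the service."""
    return f"{job}@{url}"


urls = ["http://your-qwen-service-1:8001", "http://your-qwen-service-2:8001"]
results = render_all(["a", "b", "c"], urls, fake_render, parallel=2)
print(results)
```

With two URLs, the first and third jobs go to the first service and the second job goes to the second, which is why repeating --service-url only helps when --parallel is greater than one.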

Current script support:

| Stage | Script | Scaling knobs |
| --- | --- | --- |
| Agent model serving | scripts/serve_vllm.sh | TP, DP, PORT, MAX_MODEL_LEN, MODEL_PATH |
| Agent rollouts | scripts/run_agent.py | --parallel, --base-url, --model |
| Remote Qwen rendering | scripts/generate_images.py --backend qwen-image-edit-service | repeat --service-url and set --parallel |
| Local Qwen debug rendering | scripts/generate_images.py --backend qwen-image-edit | single local process; requires a Qwen-compatible diffusers environment |
| Nano rendering | scripts/generate_images.py --backend nano-banana-pro | --parallel, subject to API quota/rate limits |

5. Benchmark scoring

To reproduce benchmark metrics, download the public dataset and pass the benchmark JSONL directly to the agent runner. The public benchmark uses question as the prompt field; scripts/run_agent.py accepts both question and prompt, preserves extra fields such as gt_image, eval_type, category, and difficulty, and the rendering script copies them into its output results.json.

The scorer in scripts/evaluate_images.py is the paper-compatible Gemini judge: it uses the same rubric prompt, the same image order (Image 1 = generated, Image 2 = GT), the same OpenAI-compatible multimodal chat-completions call, and the same score normalization and weighted overall formula used for the reported benchmark numbers. No service endpoint or API key is hard-coded.

Public benchmark row format:

{"id": "0", "question": "A detailed image-generation request...", "gt_image": "images/case_00000.jpg", "eval_type": "Knowledge-Anchored", "category": "architecture_landmark", "difficulty": "hard"}
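
A minimal sketch of reading such a row in your own code, accepting either question or prompt and keeping the extra fields. The function name `normalize_record` is hypothetical, and preferring question when both keys are present is an assumption; the README does not say how scripts/run_agent.py resolves that case.

```python
import json


def normalize_record(line: str) -> dict:
    """Parse one JSONL row and expose its prompt under a single key."""
    row = json.loads(line)
    # Assumption: prefer "question" and fall back to "prompt".
    prompt = row.get("question") or row.get("prompt")
    if not prompt:
        raise ValueError("row has neither 'question' nor 'prompt'")
    # Everything else (gt_image, eval_type, category, difficulty, ...) is kept.
    extras = {k: v for k, v in row.items() if k not in ("question", "prompt")}
    return {"prompt": prompt, **extras}


example = (
    '{"id": "0", "question": "A detailed image-generation request...", '
    '"gt_image": "images/case_00000.jpg", "eval_type": "Knowledge-Anchored", '
    '"category": "architecture_landmark", "difficulty": "hard"}'
)
record = normalize_record(example)
print(record["prompt"], record["gt_image"])
```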

Run the same two-stage pipeline, then score the rendered images with Gemini:

huggingface-cli download MeiGen-AI/GenEvolve-Data-Bench \
    --repo-type dataset \
    --local-dir ./GenEvolve-Data-Bench

# Stage 1: agent rollouts.
python scripts/run_agent.py \
    --input ./GenEvolve-Data-Bench/GenEvolve-Bench/test.jsonl \
    --output-dir runs/bench_agent \
    --base-url http://localhost:8000/v1 \
    --model GenEvolve \
    --parallel 16

# Stage 2: render images, for example through Qwen-Image-Edit services.
python scripts/generate_images.py \
    --input runs/bench_agent/results.json \
    --output-dir runs/bench_qwen \
    --backend qwen-image-edit-service \
    --service-url http://your-qwen-service:8001 \
    --parallel 16

# Stage 3: Gemini judge.
# Use an OpenAI-compatible Gemini chat-completions endpoint.
export OPENAI_API_KEY=<your_eval_api_key>
export OPENAI_API_BASE=<your_openai_compatible_base_url>
python scripts/evaluate_images.py \
    --results runs/bench_qwen/results.json \
    --gt-root ./GenEvolve-Data-Bench/GenEvolve-Bench \
    --model gemini-3.1-pro-preview \
    --max-workers 16 \
    --rpm 60 \
    --resume

scripts/evaluate_images.py writes:

| File | Contents |
| --- | --- |
| results_eval.json | per-sample judge output and rationale |
| summary.json | aggregate metrics |
| summary.csv | the same metrics in table form |

results_eval.json also appends benchmark split summaries such as eval_type:Knowledge-Anchored, eval_type:Quality-Anchored, and overall_avg.

The reported metrics are faithfulness, visual_correctness, text_accuracy, aesthetics, and the weighted overall score:

overall = 0.1 * faithfulness
        + 0.4 * visual_correctness
        + 0.4 * text_accuracy
        + 0.1 * aesthetics

overall_missing_zero keeps the full denominator and treats missing or failed cases as zero. The summary also reports metrics by eval_type, category, and difficulty when those fields are present.
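
The weighted formula and the missing-as-zero average can be written out as below. This is a sketch of the arithmetic only, not the scorer in scripts/evaluate_images.py: it assumes the four metric scores are already normalized to a common scale, and it represents a missing or failed case as None, which is an illustrative choice.

```python
WEIGHTS = {
    "faithfulness": 0.1,
    "visual_correctness": 0.4,
    "text_accuracy": 0.4,
    "aesthetics": 0.1,
}


def overall(scores: dict) -> float:
    """Weighted overall score for one sample, using the weights listed above."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def overall_missing_zero(per_sample: list) -> float:
    """Mean overall score where a missing or failed sample (None) counts as zero."""
    if not per_sample:
        return 0.0
    total = sum(overall(s) for s in per_sample if s is not None)
    return total / len(per_sample)  # full denominator, including failures


sample = {
    "faithfulness": 1.0,
    "visual_correctness": 0.5,
    "text_accuracy": 0.5,
    "aesthetics": 1.0,
}
print(overall(sample))
print(overall_missing_zero([sample, None]))
```

Because visual_correctness and text_accuracy carry 0.4 each, they account for 80% of the overall score, and one failed case out of two halves the missing-as-zero average.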

🧩 Optional Python Usage

If you only want to run the provided scripts, you can skip this section. This is for users who want to call the agent and renderer directly from their own Python pipeline instead of going through scripts/run_agent.py and scripts/generate_images.py.

from genevolve import GenEvolveAgent
from genevolve.generator import QwenImageEditServiceGenerator  # or NanoBananaProGenerator

agent = GenEvolveAgent(
    model="GenEvolve",
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)
result = agent.run("A cyberpunk version of the Sydney Opera House at sunset.")

# z = (gen_prompt, reference_images)
print(result.gen_prompt)
for r in result.reference_images:
    print(r["img_id"], r["local_path"], r["note"])

backend = QwenImageEditServiceGenerator(["http://your-qwen-service:8001"])
image = backend.generate(
    result.gen_prompt,
    [r["local_path"] for r in result.reference_images if r.get("local_path")],
)
image.save("opera.png")

🧠 Method overview

GenEvolve method overview

For a user request $x$, the agent samples a multi-turn trajectory

$$\tau = (a_1, o_1, \ldots, a_T, o_T, z), \qquad z = (g, R),$$

where each $a_t$ is one of the three actions below and $o_t$ is the corresponding observation. The downstream generator renders $\hat{y} = G(g, R)$.

| Tool | Role | Output |
| --- | --- | --- |
| search(queries) | External textual evidence - entities, dates, facts. | Markdown digest. |
| image_search(query) | Visual references; each result gets a unique IMG_### id. | Image list with local paths. |
| query_knowledge(skill_name) | Internal knowledge activation - invokes one of the eight callable generation skills. | Skill instructions in Markdown. |

The final answer is a JSON object, the prompt-reference program:

{
  "gen_prompt": "... a targeted instruction that refers to references by ordinal phrases ('the first reference image', 'the second reference image') ...",
  "reference_images": [
    {"img_id": "IMG_001", "note": "what to copy from this reference"}
  ]
}
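
One way a consumer might turn this JSON into inputs for a generator is sketched below. The helper `resolve_program` and the `search_results` mapping from img_id to local path are hypothetical; the example file path is also made up. The point is that the order of reference_images must be preserved, because gen_prompt refers to references by ordinal position.

```python
import json


def resolve_program(answer_json: str, search_results: dict) -> tuple:
    """Turn the final JSON answer into (gen_prompt, ordered reference paths).

    search_results maps an img_id such as "IMG_001" to a local file path;
    it stands in for whatever image_search returned earlier in the rollout.
    """
    program = json.loads(answer_json)
    gen_prompt = program["gen_prompt"]
    paths = []
    for ref in program.get("reference_images", []):
        img_id = ref["img_id"]
        if img_id not in search_results:
            raise KeyError(f"{img_id} was never returned by image_search")
        paths.append(search_results[img_id])
    # List order matters: the prompt names references by ordinal position.
    return gen_prompt, paths


answer = json.dumps({
    "gen_prompt": "Use the first reference image for the tower silhouette.",
    "reference_images": [{"img_id": "IMG_001", "note": "tower silhouette"}],
})
# Hypothetical local path, for illustration only.
prompt, refs = resolve_program(answer, {"IMG_001": "/tmp/genevolve_images/IMG_001.jpg"})
print(prompt, refs)
```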

📦 Data

We release the training data and benchmark in one Hugging Face dataset repository: MeiGen-AI/GenEvolve-Data-Bench. The full trajectory data is too large for GitHub, but it can be downloaded in one line via 🤗 datasets or huggingface-cli.

| Dataset | Records | Size | Purpose |
| --- | --- | --- | --- |
| GenEvolve-Data-SFT/ | 9,000 records | ~7.4 GB | Multi-turn tool-orchestrated trajectories used for the SFT cold start. Each record: messages (chat-format ReAct trajectory ending in <answer>{gen_prompt, reference_images}) + images (reference jpegs). |
| GenEvolve-Data-RL/ | 3,175 records | ~680 MB | Open-ended user requests paired with curated GT images. Used for GRPO + Visual Experience Distillation, where multiple agent rollouts per prompt are scored against the GT. |
| GenEvolve-Bench/ | 594 prompts | ~120 MB | Held-out evaluation benchmark. Contains both Knowledge-Anchored (335) and Quality-Anchored (259) tracks plus per-prompt category, difficulty, and skill metadata. |

Quick load

pip install -U huggingface_hub datasets

huggingface-cli download MeiGen-AI/GenEvolve-Data-Bench \
    --repo-type dataset \
    --local-dir ./GenEvolve-Data-Bench

from datasets import load_dataset

repo_id = "MeiGen-AI/GenEvolve-Data-Bench"

bench = load_dataset(repo_id, "bench", split="test")
print(bench[0]["question"], bench[0]["gt_image"])

rl = load_dataset(repo_id, "rl", split="train")
sft = load_dataset(repo_id, "sft", split="train")
print(sft[0]["messages"])
print(sft[0]["images"])

All paths inside the datasets are relative, for example images/case_00512.jpg or images/traj_00213/IMG_001.jpg; resolve them against the dataset directory you downloaded to. Per-dataset usage notes live on each dataset's Hub page.
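
Resolving a relative path from a record might look like the sketch below. The helper `resolve_dataset_path` is hypothetical, and which subdirectory a given path is relative to is an assumption; here the benchmark subdirectory is used, matching the --gt-root value in the benchmark scoring commands.

```python
from pathlib import Path


def resolve_dataset_path(dataset_dir: str, relative_path: str) -> Path:
    """Join a relative path from a record onto the local dataset directory."""
    rel = Path(relative_path)
    if rel.is_absolute():
        raise ValueError(f"expected a relative path, got {relative_path!r}")
    return Path(dataset_dir) / rel


# Assumption: benchmark image paths are relative to the GenEvolve-Bench folder.
bench_dir = "./GenEvolve-Data-Bench/GenEvolve-Bench"
gt_path = resolve_dataset_path(bench_dir, "images/case_00512.jpg")
print(gt_path)
```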

The full training scripts are not included in this repository, but the released SFT/RL datasets, model weights, tools, and runtime let you reproduce the path from a user request to a rendered image.

🖼️ Visual results

Qualitative comparison

The same GenEvolve policy paired with two different reference-conditioned generators. Orange marks external/uncommon knowledge, blue marks internal generation-knowledge requirements.

🎨 Extended gallery - paired with Nano Banana Pro

GenEvolve + Nano Banana Pro gallery

Additional qualitative results of GenEvolve with Nano Banana Pro as the downstream renderer. The agent autonomously orchestrates search, reference selection, and skill activation across diverse open-ended categories: spatial layout, text rendering, quantity counting, attribute binding, anatomy/pose, creative transfer, material physics, and aesthetic drawing.

🎨 Extended gallery - paired with Qwen-Image-Edit (open)

GenEvolve + Qwen-Image-Edit gallery

The same trained agent policy paired with the open-source Qwen-Image-Edit-2511 renderer. Consistent quality across both generators demonstrates that GenEvolve learns generator-transferable tool orchestration rather than overfitting to one specific renderer.

⚙️ Configuration

| Variable | Purpose | Default |
| --- | --- | --- |
| OPENAI_BASE_URL | OpenAI-compatible chat-completions endpoint | http://localhost:8000/v1 |
| OPENAI_API_KEY | API key for the inference server or the OpenAI-compatible evaluator endpoint | EMPTY for local inference |
| OPENAI_API_BASE | OpenAI-compatible Gemini judge endpoint used by scripts/evaluate_images.py | provider-specific |
| SERPER_API_KEY | serper.dev key for text and image search | required |
| SERPER_BASE_URL | Override for Serper-compatible gateways | https://google.serper.dev |
| IMAGE_DOWNLOAD_DIR | Local cache for image_search downloads | /tmp/genevolve_images |
| GOOGLE_API_KEY / GEMINI_API_KEY | Google Generative Language API key | required for Nano backend |
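
If you drive the agent from your own Python pipeline, the defaults and required keys above could be checked up front with something like the sketch below. `load_settings` is a hypothetical helper, not a function exported by the genevolve package, and the way the repository itself reads these variables may differ.

```python
import os

# Defaults copied from the configuration table above.
DEFAULTS = {
    "OPENAI_BASE_URL": "http://localhost:8000/v1",
    "OPENAI_API_KEY": "EMPTY",
    "SERPER_BASE_URL": "https://google.serper.dev",
    "IMAGE_DOWNLOAD_DIR": "/tmp/genevolve_images",
}


def load_settings(env=None, backend="qwen-image-edit-service") -> dict:
    """Collect settings from env, apply the defaults, and check required keys."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    if not env.get("SERPER_API_KEY"):
        raise RuntimeError("SERPER_API_KEY is required for search and image_search")
    settings["SERPER_API_KEY"] = env["SERPER_API_KEY"]
    if backend == "nano-banana-pro":
        # Either variable name is accepted for the Nano backend.
        google_key = env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY")
        if not google_key:
            raise RuntimeError("set GOOGLE_API_KEY or GEMINI_API_KEY for the Nano backend")
        settings["GOOGLE_API_KEY"] = google_key
    return settings


# A plain dict stands in for os.environ so the example is reproducible.
settings = load_settings({"SERPER_API_KEY": "demo-key"})
print(settings["OPENAI_BASE_URL"])
```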

🧯 Troubleshooting

Symptom Check
search / image_search returns authentication errors Set SERPER_API_KEY or configure SERPER_BASE_URL for your internal Serper-compatible gateway.
Agent cannot connect to the model Confirm the vLLM server is running and OPENAI_BASE_URL or --base-url ends with /v1.
Qwen local renderer fails at import time Use a separate Qwen-Image-Edit service environment and call it with qwen-image-edit-service; avoid mixing incompatible xformers / flash-attn combinations into the renderer env.
Qwen renderer says it needs a reference image Qwen-Image-Edit is reference-conditioned; rerun the agent or use Nano Banana Pro for no-reference prompts.
evaluate_images.py cannot find GT images Keep gt_image in each input record and pass --gt-root pointing to the downloaded benchmark directory.
flash-attn build fails Install a PyTorch/CUDA wheel first, then run pip install flash-attn==2.8.3 --no-build-isolation.
Batch rendering resumes after interruption scripts/generate_images.py writes results.json incrementally under the output directory.

🗂️ Repository layout

genevolve/
├── genevolve/
│   ├── agent.py               # GenEvolveAgent: ReAct loop on top of an OpenAI-compatible server
│   ├── system_prompt.py       # system prompt used by the released agent
│   ├── knowledge_tool.py      # query_knowledge: eight callable generation skills
│   ├── tools/web_search.py    # search + image_search (Serper-compatible)
│   ├── generator.py           # Qwen-Image-Edit + Nano Banana Pro backends
│   └── knowledge/skills/      # skill markdown files
├── scripts/
│   ├── serve_vllm.sh          # serve the checkpoint with vLLM
│   ├── run_agent.py           # batch agent rollouts -> results.json
│   ├── generate_images.py     # render images from results.json
│   └── evaluate_images.py     # Gemini judge scoring and metric summary
├── examples/
│   ├── quickstart.py          # single-prompt end-to-end example
│   └── example_prompts.jsonl
├── assets/                    # README figures
├── requirements.txt
├── setup.py
└── README.md

🙏 Acknowledgements

We thank the authors and maintainers of Gen-Searcher, Qwen3-VL, Qwen-Image-Edit, vLLM, Serper.dev, and the Google Generative Language API.

📝 Citation

@misc{chen2026genevolveselfevolvingimagegeneration,
      title={GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation}, 
      author={Sixiang Chen and Zhaohu Xing and Tian Ye and Xinyu Geng and Yunlong Lin and Jianyu Lai and Xuanhua He and Fuxiang Zhai and Jialin Gao and Lei Zhu},
      year={2026},
      eprint={2605.21605},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.21605}, 
}

📜 License

Code is released under the Apache 2.0 license. Released model weights inherit the upstream license of Qwen3-VL-8B-Instruct. Search results returned by Serper.dev and images rendered by Nano Banana Pro / Qwen-Image-Edit are governed by the respective upstream service terms.
