Create environment and install dependencies.
```bash
conda create -n MM python=3.8
pip install -r requirements.txt
```
We host the MJ-Bench dataset on Hugging Face. You should first request access on the dataset page; access is granted automatically. Then you can simply load the dataset via:
```python
from datasets import load_dataset

dataset = load_dataset("MJ-Bench/MJ-Bench")

# use streaming mode to load on the fly
dataset = load_dataset("MJ-Bench/MJ-Bench", streaming=True)
```
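As a quick sanity check, a minimal sketch like the one below lists the available splits and the fields of the first example in each. It only relies on the standard `datasets` streaming API and makes no assumptions about MJ-Bench's field names; you must already have been granted access to the dataset.

```python
from datasets import load_dataset

# Streaming mode fetches examples on the fly instead of downloading everything.
dataset = load_dataset("MJ-Bench/MJ-Bench", streaming=True)

# Print each split name together with the keys of its first example.
for split_name, split in dataset.items():
    first_example = next(iter(split))
    print(split_name, list(first_example.keys()))
```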
`config/config.yaml` contains the configuration for the three types of multimodal judges that you want to evaluate. You can copy the default configuration to a new file and modify `model_path` and `api_key` to match your own environment. If you add new models, make sure you also add the `load_model` and `get_score` functions in the corresponding files under `reward_models/`.
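As a rough illustration of what these two hooks might look like for a score-based judge, here is a hedged sketch. The file name `reward_models/my_reward_model.py`, the argument names, and the CLIP-style scoring are all assumptions for illustration; the exact signatures expected by this repository may differ.

```python
# reward_models/my_reward_model.py -- hypothetical example file

def load_model(config):
    """Load a CLIP-style scoring model from the path given in the config (assumed keys)."""
    from transformers import CLIPModel, CLIPProcessor
    model = CLIPModel.from_pretrained(config["model_path"])
    processor = CLIPProcessor.from_pretrained(config["model_path"])
    return {"model": model, "processor": processor}


def get_score(bundle, prompt, image):
    """Return a scalar image-text alignment score for one (prompt, PIL image) pair."""
    inputs = bundle["processor"](text=[prompt], images=image, return_tensors="pt", padding=True)
    outputs = bundle["model"](**inputs)
    # For CLIP, logits_per_image[0, 0] is the similarity of the image to the prompt.
    return outputs.logits_per_image[0, 0].item()
```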
To get the inference result from a multimodal judge, simply run:

```bash
python inference.py --model [MODEL_NAME] --config_path [CONFIG_PATH] --dataset [DATASET] --perspective [PERSPECTIVE] --save_dir [SAVE_DIR] --threshold [THRESHOLD] --multi_image [MULTI_IMAGE] --prompt_template_path [PROMPT_PATH]
```
where:
- `MODEL_NAME` is the name of the reward model to evaluate;
- `CONFIG_PATH` is the path to the configuration file;
- `DATASET` is the dataset to evaluate on (default is `MJ-Bench/MJ-Bench`);
- `PERSPECTIVE` is the data subset to evaluate (e.g. alignment, safety, quality, bias);
- `SAVE_DIR` is the directory to save the results;
- `THRESHOLD` is the preference threshold for the score-based RMs (i.e. `image_0` is preferred only if `score(image_0) - score(image_1) > THRESHOLD`; see the sketch after this list);
- `MULTI_IMAGE` indicates whether to input multiple images (only closed-source VLMs and some open-source VLMs support this);
- `PROMPT_PATH` is the path to the prompt template for the VLM judges (it needs to be consistent with `MULTI_IMAGE`).
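To make the role of `THRESHOLD` concrete, here is a minimal sketch of the preference rule stated above for score-based reward models. Only the `image_0` condition is given above, so the symmetric `image_1` case and the tie case below are assumptions made for this sketch.

```python
def score_based_preference(score_0: float, score_1: float, threshold: float) -> str:
    """Decide the preference between two images from a score-based reward model."""
    # Stated rule: image_0 is preferred only if score(image_0) - score(image_1) > THRESHOLD.
    if score_0 - score_1 > threshold:
        return "image_0"
    # Assumed symmetric handling of image_1 and of ties (not specified above).
    if score_1 - score_0 > threshold:
        return "image_1"
    return "tie"


# Example: 0.82 - 0.75 = 0.07 > 0.05, so image_0 is preferred.
print(score_based_preference(0.82, 0.75, 0.05))
```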
If you find MJ-Bench useful, please consider citing:

```bibtex
@article{chen2024mj,
  title={MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?},
  author={Chen, Zhaorun and Du, Yichao and Wen, Zichen and Zhou, Yiyang and Cui, Chenhang and Weng, Zhenzhen and Tu, Haoqin and Wang, Chaoqi and Tong, Zhengwei and Huang, Qinglan and others},
  journal={arXiv preprint arXiv:2407.04842},
  year={2024}
}
```