MJ-Bench

Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"

If you like our project, please consider giving us a star ⭐ 🥹🙏


Setup

Installation

Create environment and install dependencies.

conda create -n MM python=3.8
conda activate MM
pip install -r requirements.txt

Dataset Preparation

We host the MJ-Bench dataset on Hugging Face. You should first request access on the dataset page; access is granted automatically. Then you can load the dataset via:

from datasets import load_dataset
dataset = load_dataset("MJ-Bench/MJ-Bench")
# use streaming mode to load on the fly
dataset = load_dataset("MJ-Bench/MJ-Bench", streaming=True)
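
To sanity-check what you loaded, you can peek at one example (the split name below is an assumption; inspect the returned object for the actual splits and field layout):

# peek at the first example of the streaming dataset
# ("train" is an assumed split name; check list(dataset) for the real splits)
example = next(iter(dataset["train"]))
print(example.keys())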

Judge Model Configuration

config/config.yaml contains the configuration for the three types of multimodal judges you can evaluate. You can copy the default configuration to a new file and modify the model_path and api_key for your own environment. If you add new models, make sure you also add the load_model and get_score functions in the corresponding files under reward_models/, as sketched below.
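
As a rough illustration, a new score-based judge would expose something like the following (a minimal sketch; the function signatures, config layout, and scoring logic are all assumptions, so follow the existing files under reward_models/ for the real interface):

import yaml

def load_model(config_path, model_name):
    # Read this judge's entry from the YAML config.
    # (Layout assumed here: one top-level key per model; check config/config.yaml.)
    with open(config_path) as f:
        config = yaml.safe_load(f)
    model_config = config[model_name]
    # ... instantiate the real model from model_config["model_path"] / model_config["api_key"] ...
    return model_config

def get_score(model, prompt, image_path):
    # Return a scalar preference score for (prompt, image).
    # Placeholder body: a real judge runs the model here and returns its score.
    return 0.0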

Judge Model Evaluation

To get the inference result from a multimodal judge, simply run

python inference.py --model [MODEL_NAME] --config_path [CONFIG_PATH] --dataset [DATASET] --perspective [PERSPECTIVE] --save_dir [SAVE_DIR] --threshold [THRESHOLD] --multi_image [MULTI_IMAGE] --prompt_template_path [PROMPT_PATH]

where:

- MODEL_NAME is the name of the reward model to evaluate;
- CONFIG_PATH is the path to the configuration file;
- DATASET is the dataset to evaluate on (default: MJ-Bench/MJ-Bench);
- PERSPECTIVE is the data subset to evaluate (e.g. alignment, safety, quality, bias);
- SAVE_DIR is the directory to save the results;
- THRESHOLD is the preference threshold for the score-based RMs (i.e., image_0 is preferred only if score(image_0) - score(image_1) > THRESHOLD);
- MULTI_IMAGE indicates whether to feed multiple images to the judge at once (only closed-source VLMs and some open-source VLMs support this);
- PROMPT_PATH is the path to the prompt template for the VLM judges (it must be consistent with MULTI_IMAGE).
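
For example, an evaluation run might look like the following (the model name here is a placeholder; use a name defined in your config file):

python inference.py \
    --model MODEL_FROM_YOUR_CONFIG \
    --config_path config/config.yaml \
    --dataset MJ-Bench/MJ-Bench \
    --perspective alignment \
    --save_dir results \
    --threshold 0.0

With --threshold 0.0, image_0 is preferred whenever score(image_0) - score(image_1) > 0.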

Citation

@article{chen2024mj,
  title={MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?},
  author={Chen, Zhaorun and Du, Yichao and Wen, Zichen and Zhou, Yiyang and Cui, Chenhang and Weng, Zhenzhen and Tu, Haoqin and Wang, Chaoqi and Tong, Zhengwei and Huang, Qinglan and others},
  journal={arXiv preprint arXiv:2407.04842},
  year={2024}
}
