TL;DR 📖: How can AI models be truly aligned with multifaceted human aesthetic judgment? This paper identifies zero-shot reasoning with MLLMs as a promising solution. Extensive experiments reveal that zero-shot aesthetic reasoning outperforms SOTA image aesthetic assessment models, even without any training.

🔥🔥: Jul 05: This work has been accepted to ACM MM 2025!
The 1,000 content images, 1,000 style images, and the stylized images from 10 models can be downloaded from the links below: Download Link: Baidu NetDisk | Google Drive
Please download and extract the dataset, then place it under the data/ folder. The evaluation code and guide will be released soon.
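As a convenience, the extraction step can be sketched in Python. This is a minimal sketch, not part of the released code: it assumes the download is a single zip archive, and the file name `dataset.zip`, the helper names `extract_dataset` and `count_images`, and the list of image suffixes are all hypothetical. It extracts the archive into data/ and counts image files per folder, so the totals can be compared with the numbers stated above.

```python
import zipfile
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to match the actual files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def extract_dataset(archive_path, data_dir="data"):
    """Extract a zip archive into data_dir and return the target directory."""
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return target


def count_images(data_dir="data"):
    """Count image files per parent folder, keyed by path relative to data_dir."""
    root = Path(data_dir)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            counts[str(path.parent.relative_to(root))] += 1
    return dict(counts)


if __name__ == "__main__":
    archive = Path("dataset.zip")  # hypothetical file name, not from the repository
    if archive.exists():
        extract_dataset(archive)
        for folder, n in sorted(count_images().items()):
            print(f"{folder}: {n} images")
```

If the archive uses a different format or folder layout, only the `archive` path and `IMAGE_SUFFIXES` should need to change.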
We propose ArtCoT to enhance the inference-time reasoning capability of MLLMs. An example conversation is provided below. A detailed quantitative comparison can be found in the paper. The full responses from the MLLMs in our experiments will also be released to facilitate further research.