🎨 Multimodal LLMs Can Reason about Aesthetics in Zero-Shot | Paper | arXiv

TL;DR 📖: How can AI models be truly aligned with multi-faceted human aesthetic judgment? This paper identifies zero-shot reasoning with MLLMs as a promising solution. Extensive experiments reveal that zero-shot aesthetic reasoning outperforms SOTA image aesthetic assessment models, even without any training.

Update

🔥🔥 Jul 05: This work has been accepted to ACM MM 2025!

The FineArtBench Dataset

The 1,000 content images, 1,000 style images, and stylized images from 10 models can be downloaded from the links below:

Download Link: 百度网盘 (Baidu NetDisk) | Google Drive

Download and extract the dataset, then place it under the data/ folder. The evaluation code and guide will be released soon.
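Until the official evaluation code is released, a quick check can confirm that the archive was extracted into the right place. The snippet below is a minimal sketch, not part of this repository. The helper name `count_images` and the list of image extensions are assumptions, and it makes no assumption about subfolder names inside data/. It only counts image files per top-level subfolder.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust to match the extracted files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def count_images(root):
    """Count image files under `root`, grouped by top-level subfolder.

    Files that sit directly in `root` are grouped under ".".
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            rel = path.relative_to(root)
            top = rel.parts[0] if len(rel.parts) > 1 else "."
            counts[top] += 1
    return dict(counts)


if __name__ == "__main__":
    # Run from the repository root after extracting the dataset into data/.
    for folder, n in sorted(count_images("data").items()):
        print(f"{folder}: {n} images")
```

If the counts look far off from the 1,000 content and 1,000 style images described above, the archive was probably extracted to a different location or only partially downloaded.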

ArtCoT

We propose ArtCoT to enhance the inference-time reasoning capability of MLLMs. An example conversation is provided below. A detailed quantitative comparison can be found in the paper. The full responses from the MLLMs in our experiments will also be released to facilitate further research.
