Article: Zijian Zhao*, Dian Jin, Zijing Zhou, "Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach" (under review)
Dataset: amaai-lab/MidiCaps (Hugging Face Datasets)
Please rename train.json to meta.txt.
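The rename can also be scripted. The following is a minimal sketch, assuming train.json sits in a local dataset directory; the helper name `rename_metadata` and the directory argument are hypothetical and not part of this repository.

```python
from pathlib import Path


def rename_metadata(dataset_dir, src_name="train.json", dst_name="meta.txt"):
    """Rename the metadata file inside dataset_dir and return the new path.

    dataset_dir is an assumed local download location, not a path
    defined by this repository.
    """
    src = Path(dataset_dir) / src_name
    dst = src.with_name(dst_name)
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found")
    if dst.exists():
        # Refuse to overwrite an existing meta.txt.
        raise FileExistsError(f"{dst} already exists")
    src.rename(dst)
    return dst
```

Calling `rename_metadata("path/to/dataset")` leaves the file contents unchanged; only the file name differs afterwards.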
The data processing code is based on jwdj/EasyABC: EasyABC (github.com).
python main.py

@misc{zhao2025zeroeffortimagetomusicgenerationinterpretable,
title={Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach},
author={Zijian Zhao and Dian Jin and Zijing Zhou},
year={2025},
eprint={2509.22378},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2509.22378},
}
Some websites provide abc2midi and midi2abc conversion services: