DAMO: Decoding by Accumulating Activations Momentum for Mitigating Hallucinations in Vision-Language Models (ICLR 2025)
This paper aims to mitigate hallucinations in Vision-Language Models (VLMs) by accumulating visual information from earlier layers, where we found that the correct information often appears. By refining activations throughout inference, DAMO preserves essential visual semantics, leading to more accurate and reliable predictions.
Here is the paper link: https://openreview.net/forum?id=JUr0YOMvZA
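To give a rough feel for the accumulation idea described above, here is a minimal sketch of a generic momentum update over per-layer activation updates. It is not the exact DAMO rule, which is defined in the paper and in our llava_llama.py. The function names, the coefficient `beta`, and the toy vectors are all hypothetical and only for illustration.

```python
def accumulate_momentum(layer_updates, beta=0.9):
    """Blend each layer's activation update with a running momentum term.

    layer_updates: list of per-layer update vectors (lists of floats).
    beta: hypothetical momentum coefficient in [0, 1); not taken from the paper.
    Returns one smoothed update vector per layer.
    """
    if not layer_updates:
        return []
    momentum = [0.0] * len(layer_updates[0])
    smoothed = []
    for update in layer_updates:
        # Exponential moving average: keep part of what earlier layers
        # contributed and mix in the current layer's update.
        momentum = [beta * m + (1.0 - beta) * u for m, u in zip(momentum, update)]
        smoothed.append(momentum)
    return smoothed


def run_layers(hidden, layer_updates, beta=0.9):
    """Apply the smoothed updates to a hidden state, one layer at a time."""
    for step in accumulate_momentum(layer_updates, beta):
        hidden = [h + s for h, s in zip(hidden, step)]
    return hidden


if __name__ == "__main__":
    # Two early layers agree, then a late layer pushes the other way.
    toy_updates = [[1.0], [1.0], [-1.0]]
    print(accumulate_momentum(toy_updates, beta=0.5))
    print(run_layers([0.0], toy_updates, beta=0.5))
```

In this toy run the last layer's reversal is damped because the momentum term still carries the direction of the earlier layers, which is the intuition behind keeping early visual information alive during decoding.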
- Please create the LLaVA 1.5 env from the official repo: https://github.com/haotian-liu/LLaVA.git
- Please replace LLaVA/llava/model/language_model/llava_llama.py with ours.
- Please replace LLaVA/llava/model/llava_arch.py with ours.
- Please replace LLaVA/llava/eval/run_llava.py with ours.
- Then, for the MME benchmark, you can run the following to evaluate:
  CUDA_VISIBLE_DEVICES=0 python llava_mme.py --output_dir DAMO
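The three file replacements in the steps above can also be scripted. The sketch below is one possible helper, not part of this repo: the function name `install_damo_files` is hypothetical, and it assumes our replacement files sit at the top level of the DAMO checkout under the same file names as their LLaVA counterparts. Adjust the source paths if your layout differs.

```python
import shutil
from pathlib import Path

# Files inside the LLaVA checkout that the setup steps say to replace.
TARGETS = [
    "llava/model/language_model/llava_llama.py",
    "llava/model/llava_arch.py",
    "llava/eval/run_llava.py",
]


def install_damo_files(damo_dir, llava_dir):
    """Copy the DAMO versions of the three files over the LLaVA originals.

    damo_dir: checkout of this repo (assumed to hold the files at top level).
    llava_dir: checkout of https://github.com/haotian-liu/LLaVA.git
    Returns the list of destination paths that were overwritten.
    """
    damo_dir, llava_dir = Path(damo_dir), Path(llava_dir)
    copied = []
    for rel in TARGETS:
        src = damo_dir / Path(rel).name  # assumption: flat layout in the DAMO repo
        dst = llava_dir / rel
        if not src.is_file():
            raise FileNotFoundError(f"missing replacement file: {src}")
        if not dst.parent.is_dir():
            raise FileNotFoundError(f"not a LLaVA checkout? missing {dst.parent}")
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

For example, `install_damo_files(".", "../LLaVA")` would be the call when run from this repo with LLaVA cloned next to it. Back up the original LLaVA files first if you want to switch back later.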
@inproceedings{wangdamo,
title={DAMO: Decoding by Accumulating Activations Momentum for Mitigating Hallucinations in Vision-Language Models},
author={Wang, Kaishen and Gu, Hengrui and Gao, Meijun and Zhou, Kaixiong},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025}
}
