AImageLab - University of Modena and Reggio Emilia, Modena
Recurrence Meets Transformers for Universal Multimodal Retrieval
[BMVC 2025] Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
[CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
This repository contains a curated list of research papers and resources on saliency and scanpath prediction, human attention, and human visual search
[ECCV 2024] BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
[BMVC 2024 Oral ✨] Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
[ICCVW 2025] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
PyTorch code for the ECCVW 2022 paper "Consistency-based Self-supervised Learning for Temporal Anomaly Localization"