Stars
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
A frontier collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
[NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning
[ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning
FuxiaoLiu / awesome-Large-MultiModal-Hallucination
Forked from xieyuquanxx/awesome-Large-MultiModal-Hallucination
😎 Up-to-date & curated list of awesome LMM hallucination papers, methods & resources.
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
[CSCWD] Towards Generic Anomaly Detection and Understanding: Large-scale Visual-linguistic Model (GPT-4V) Takes the Lead.
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Open-source datasets for the paper "Fairness in Graph Mining: A Survey".
Open-source code for "Graph Neural Networks with Adaptive Frequency Response Filter".
Open-source code for the paper "EDITS: Modeling and Mitigating Data Bias for Graph Neural Networks".
Open-source Library PyGDebias: Graph Datasets and Fairness-Aware Graph Mining Algorithms
The repository for the survey paper "Survey on Large Language Models Factuality: Knowledge, Retrieval and Domain-Specificity"
An automatic MLLM hallucination detection framework
CoNLI: a plug-and-play framework for ungrounded hallucination detection and reduction
A paper list on multimodal and large language models, used only to record papers I read from the daily arXiv for personal reference.
The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"
Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models"
The repository for the survey "A Survey on Image-Text Multimodal Models".
A survey of multimodal learning research.