NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors
This is the official repository for No-Language-Hallucination Decoding (NoLan), a simple, training-free decoding framework designed to mitigate object hallucinations in Large Vision-Language Models (LVLMs) by dynamically suppressing language priors.
- ⭐️ Paper released.
- 🚀 Code released.
Object hallucination is a critical issue in LVLMs, where models generate objects that do not appear in the image.
Given an LVLM, an image, and a text prompt, NoLan contrasts the next-token distribution conditioned on the full multimodal input with the distribution conditioned on the text alone; the gap between the two exposes the language prior. The suppression strength is modulated by the KL divergence between the two distributions: NoLan dynamically increases suppression when the KL divergence is small, i.e., when the multimodal and text-only distributions are similar, which is precisely when hallucination risk is high and visual grounding needs to be restored.
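The idea above can be sketched in a few lines of PyTorch. This is a minimal, hypothetical illustration of KL-modulated contrastive decoding, not the official implementation: the function name `nolan_logits`, the `alpha`/`beta` hyperparameters, and the `alpha / (kl + beta)` weighting schedule are all illustrative assumptions; please refer to the paper for the exact formulation.

```python
import torch
import torch.nn.functional as F

def nolan_logits(logits_mm, logits_txt, alpha=1.0, beta=0.1):
    """Illustrative sketch of KL-modulated suppression of language priors.

    logits_mm:  next-token logits from the multimodal (image + text) input
    logits_txt: next-token logits from the text-only input
    """
    log_p_mm = F.log_softmax(logits_mm, dim=-1)
    log_p_txt = F.log_softmax(logits_txt, dim=-1)
    # KL(p_mm || p_txt): small when the model is effectively ignoring the image
    kl = torch.sum(log_p_mm.exp() * (log_p_mm - log_p_txt), dim=-1, keepdim=True)
    # Suppression grows as the two distributions get closer
    # (illustrative weighting, not the paper's exact schedule)
    weight = alpha / (kl + beta)
    return log_p_mm + weight * (log_p_mm - log_p_txt)
```

Tokens favored by the text-only distribution but not by the multimodal one are pushed down, and the push is strongest exactly when the two distributions nearly coincide.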
```shell
conda create -yn nolan python=3.9
conda activate nolan
pip install -r requirements.txt
```

NoLan operates during inference and can be seamlessly integrated into autoregressive LVLMs such as:
- LLaVA-1.5
- InstructBLIP
- Qwen-VL
- Add the following at the beginning of the start-up script:

```python
from nolan_utils.nolan_sample import evolve_nolan_sampling
evolve_nolan_sampling()
```

The `evolve_nolan_sampling` function replaces the sampling function in the `transformers` library. The modified sampling function adds an option for visual contrastive decoding while keeping the rest unchanged.
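The replacement works by monkey-patching: the library's sampling method is saved, wrapped, and reassigned in place, so every model inheriting it picks up the new behavior. The stand-in class and argument names below are illustrative, not the actual `nolan_utils` code:

```python
# Minimal sketch of the monkey-patching pattern behind evolve_nolan_sampling
# (names here are illustrative stand-ins, not the real transformers API).

class GenerationMixin:                 # stand-in for the library's mixin
    def sample(self, logits):
        # greedy stand-in for the real sampling loop
        return max(range(len(logits)), key=logits.__getitem__)

_original_sample = GenerationMixin.sample

def sample_with_cd(self, logits, logits_cd=None, cd_alpha=1.0):
    # Apply a contrastive correction before delegating to the original
    # sampler; fall back to plain sampling when no text-only logits are
    # supplied, so unrelated generate() calls keep working.
    if logits_cd is not None:
        logits = [l - cd_alpha * lc for l, lc in zip(logits, logits_cd)]
    return _original_sample(self, logits)

def evolve_nolan_sampling():
    GenerationMixin.sample = sample_with_cd
```

Because the patch is applied to the class, it must run once, before any model is instantiated, which is why the import goes at the top of the start-up script.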
- Slightly modify `llava_llama.py`:

  a. Add the NoLan decoding parameters to the `LlavaLlamaForCausalLM` class's `forward` function to avoid exceptions in `model.generate`.

  b. Add the `prepare_inputs_for_generation_cd` function.
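The two modifications amount to widening one signature and mirroring one helper. The toy class below sketches the shape of the change only; the real `LlavaLlamaForCausalLM` lives in LLaVA's `llava_llama.py`, and the bodies here are placeholders:

```python
class LlavaLlamaForCausalLM:           # toy stand-in for the real class
    def forward(self, input_ids=None, images=None,
                input_ids_cd=None, cd_alpha=None, cd_beta=None, **kwargs):
        # (a) The cd_* arguments are accepted (and ignored here) so that
        # model.generate can pass them through without raising TypeError;
        # the patched sampling loop is what actually consumes them.
        return {"input_ids": input_ids}

    def prepare_inputs_for_generation_cd(self, input_ids_cd, **kwargs):
        # (b) Mirror of prepare_inputs_for_generation for the text-only
        # branch: it must not attach image features, so the resulting
        # distribution reflects the language prior alone.
        return {"input_ids": input_ids_cd, **kwargs}
```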
- Tokenize the multimodal and text-only inputs:

```python
# return_tensors="pt" already yields a (1, seq_len) batch, so no extra unsqueeze is needed
input_ids_cd = tokenizer.encode(prompt_cd, return_tensors="pt").cuda()
input_ids = tokenizer.encode(prompt, return_tensors="pt").cuda()
```

- Set the hyperparameters in the `generate` function:

```python
output_ids = model.generate(
    input_ids,
    images=image_tensor.unsqueeze(0).half().cuda(),
    input_ids_cd=input_ids_cd,
    cd_alpha=args.cd_alpha,
    cd_beta=args.cd_beta,
    do_sample=True)
```
Please refer to our paper for detailed experimental results.
If you find our project helpful, please consider starring the repository and citing our paper as follows:
@article{ren2026nolan,
  author  = {Lingfeng Ren and Weihao Yu and Runpeng Yu and Xinchao Wang},
  title   = {NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors},
  year    = {2026},
  journal = {arXiv preprint arXiv:2602.22144},
  url     = {https://arxiv.org/abs/2602.22144}
}
- VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
- Contrastive Decoding: Open-ended Text Generation as Optimization
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
- LLaVA 1.5: Improved Baselines with Visual Instruction Tuning

Thanks for their awesome work!