NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors

This is the official repository for No-Language-Hallucination Decoding (NoLan), a simple, training-free decoding framework designed to mitigate object hallucinations in Large Vision-Language Models (LVLMs) by dynamically suppressing language priors.

🔥 Update

  • ⭐️ Paper released.
  • 🚀 Code released.

🎯 Overview

(Figure: overview of the NoLan framework.)

Object hallucination is a critical issue in LVLMs, where models generate objects that do not appear in the image.

Given an LVLM, an image $v$, and a textual question $x$, NoLan mitigates hallucinations by contrasting the output distributions produced from the multimodal input $(v, x)$ and from the unimodal (text-only) input $x$. Step 2 in the figure can also be simplified by fixing $\alpha$ to a constant value of $1$.

NoLan dynamically increases suppression when the KL divergence between the multimodal and text-only distributions is small, thus restoring visual grounding.

Concretely, let $l_m$ and $l_u$ denote the next-token distributions produced from the multimodal and text-only inputs, respectively. We define:

$$\gamma = \frac{D_{KL}(l_m \| l_u) + D_{KL}(l_u \| l_m)}{2}$$

$$\alpha = \beta \times (\tanh(1/\gamma) + 1)$$

This dynamically increases suppression when multimodal and text-only distributions are similar — precisely when hallucination risk is high.
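The $\gamma$/$\alpha$ computation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's implementation; `logits_m` and `logits_u` stand for the multimodal and text-only next-token logits:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def nolan_alpha(logits_m, logits_u, beta=0.5, eps=1e-8):
    """Dynamic suppression strength alpha from the symmetrized KL
    divergence between multimodal and text-only distributions."""
    p, q = softmax(logits_m), softmax(logits_u)
    kl_pq = float(np.sum(p * np.log((p + eps) / (q + eps))))
    kl_qp = float(np.sum(q * np.log((q + eps) / (p + eps))))
    gamma = 0.5 * (kl_pq + kl_qp)
    # Small gamma (similar distributions) -> tanh(1/gamma) -> 1, so
    # alpha -> 2*beta: suppression is strongest exactly when the image
    # is being ignored. Large gamma -> alpha -> beta.
    return beta * (np.tanh(1.0 / max(gamma, eps)) + 1.0)

# Identical distributions: gamma = 0, so alpha hits its maximum 2*beta.
l = np.array([1.0, 2.0, 3.0])
print(nolan_alpha(l, l, beta=0.5))  # → 1.0
```

Presumably $\beta$ here corresponds to the `cd_beta` hyperparameter passed to `generate` in the usage section.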

🕹️ Usage

Environment Setup

conda create -yn nolan python=3.9
conda activate nolan
pip install -r requirements.txt

🛠 How to Integrate NoLan into LVLMs

NoLan operates during inference and can be seamlessly integrated into autoregressive LVLMs such as:

  • LLaVA-1.5
  • InstructBLIP
  • Qwen-VL

  1. Add the following at the beginning of the start-up script:
from nolan_utils.nolan_sample import evolve_nolan_sampling
evolve_nolan_sampling()

The evolve_nolan_sampling function replaces the sampling function in the transformers library. The modified sampling function includes an option for visual contrastive decoding, while keeping the rest unchanged.
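The pattern behind this replacement is standard monkey-patching: keep a reference to the original method, wrap it with the contrastive branch, and assign the wrapper back onto the class. A toy sketch with a stand-in `Sampler` class (not transformers' actual sampling code) and a VCD-style adjustment $(1+\alpha)\,l_m - \alpha\, l_u$; the exact NoLan update may differ:

```python
class Sampler:
    """Stand-in for transformers' sampling routine (illustration only)."""
    def sample(self, logits):
        # greedy pick for simplicity
        return max(range(len(logits)), key=logits.__getitem__)

def evolve_sampling():
    """Swap Sampler.sample for a contrastive variant, keeping the
    original reachable -- the same trick evolve_nolan_sampling uses
    on the transformers library."""
    original = Sampler.sample  # keep a handle to the unpatched method
    def patched(self, logits, logits_cd=None, alpha=1.0):
        if logits_cd is not None:
            # suppress the text-only (language-prior) branch
            logits = [(1 + alpha) * lm - alpha * lu
                      for lm, lu in zip(logits, logits_cd)]
        return original(self, logits)
    Sampler.sample = patched

evolve_sampling()
s = Sampler()
# Text-only logits strongly favor token 1, so suppression flips the pick.
print(s.sample([0.8, 0.9, 0.1], logits_cd=[0.1, 1.5, 0.0]))  # → 0
```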

  2. Slightly modify llava_llama.py:

    a. Add NoLan decoding parameters to the forward function of the LlavaLlamaForCausalLM class so that model.generate does not raise exceptions on the extra keyword arguments.

    b. Add the prepare_inputs_for_generation_cd function.

  3. Tokenize the multimodal and text-only inputs:

input_ids_cd = tokenizer.encode(prompt_cd, return_tensors="pt").cuda()
input_ids = tokenizer.encode(prompt, return_tensors="pt").cuda()

  4. Set the hyperparameters in the generate function:
output_ids = model.generate(
    input_ids,
    images=image_tensor.unsqueeze(0).half().cuda(),
    input_ids_cd=input_ids_cd,
    cd_alpha=args.cd_alpha,
    cd_beta=args.cd_beta,
    do_sample=True)
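The text-only `prompt_cd` used above is the prompt with its image reference removed. A hypothetical helper sketching this: the `<image>` placeholder token and the template are assumptions, and the NoLan repository may construct the unimodal prompt differently:

```python
# Hypothetical helper: build the unimodal (text-only) prompt that NoLan
# contrasts against by stripping the image placeholder token.
# "<image>" is LLaVA's placeholder; the actual template may differ.
def make_text_only_prompt(prompt: str, image_token: str = "<image>") -> str:
    return prompt.replace(image_token, "").strip()

prompt = "<image>\nIs there a dog in the image?"
prompt_cd = make_text_only_prompt(prompt)
print(prompt_cd)  # → Is there a dog in the image?
```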

🏅 Experiments

  • The efficacy of NoLan on POPE

  • The efficacy of NoLan on MME

  • Please refer to our paper for detailed experimental results.

📌 Examples

(Figures: qualitative examples.)

📑 Citation

If you find our project helpful, please consider starring the repository and citing our paper as follows:

@article{ren2026nolan,
  author  = {Lingfeng Ren and Weihao Yu and Runpeng Yu and Xinchao Wang},
  title   = {NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors},
  journal = {arXiv preprint arXiv:2602.22144},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.22144}
}

📝 Related Projects

  • VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
  • Contrastive Decoding: Open-ended Text Generation as Optimization
  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
  • Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
  • LLaVA 1.5: Improved Baselines with Visual Instruction Tuning

Thanks for their awesome work.
