- [17 Oct 2025] 🔩 CCD has been upgraded to support view classification for chest X-rays — see the Supported Expert Models section for details.
- [06 Oct 2025] 🎮 The online demo is available at Hugging Face Spaces. Feel free to try it out!
- [30 Sep 2025] 🗂️ The processed test data for quick start are now available — enjoy exploring with the provided guidelines!
- [27 Sep 2025] ⛳ Our preprint is now live on arXiv — check it out for details.
Multimodal large language models (MLLMs) are advancing radiology by combining image and text understanding, but they often generate inaccurate or unsupported clinical details, known as medical hallucinations. We propose Clinical Contrastive Decoding (CCD), a training-free and retrieval-free inference framework that integrates structured clinical signals from task-specific radiology expert models. CCD reduces hallucinations and improves clinical accuracy without changing the base model. Experiments show that CCD boosts performance across multiple datasets and models, offering a practical way to make radiology MLLMs more reliable.
- ⛏️ Installation
- ⚡ Quick Start
- 🛠️ Advanced Usage
- 🗂️ Dataset
- 📊 Evaluation
- 📝 Citation
- 📚 Acknowledgments
- 📜 License
- 🧰 Intended Use
Tip
Use `uv` for installation; it's faster and more reliable than `pip`.
Install the latest version directly from GitHub for quick setup:
uv pip install git+https://github.com/X-iZhang/CCD.git
Note
Requirements: Python 3.9 or later, and a CUDA-compatible GPU (recommended)
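To verify the installation, you can try importing the package's entry points (a minimal check; `ccd_eval` and `run_eval` are the functions used in the examples below):
python -c "from ccd import ccd_eval, run_eval; print('CCD is ready')"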
If you plan to modify the code or contribute to the project, you can clone the repository and install it in editable mode:
- Clone the repository and navigate to the project folder
git clone https://github.com/X-iZhang/CCD.git
cd CCD
- Set up the environment and install in editable mode
conda create -n CCD python=3.10 -y
conda activate CCD
pip install uv # enable uv support
uv pip install -e .
🔄 Upgrade to the latest code base
git pull
uv pip install -e .
You can perform inference directly from the command line using our CLI tool:
python -m ccd.run_ccd \
--model-path "X-iZhang/libra-maira-2" \
--image "./path/to/Chest_Xray.jpg" \
--question "Is there evidence of any abnormalities?" \
--max-new-tokens 128
Optional arguments:

| Argument | Description | Default |
|---|---|---|
| `--alpha` | Clinical guidance weight (range: 0.0–1.0) | `0.5` |
| `--beta` | Expert token weight (range: 0.0–1.0) | `0.5` |
| `--gamma` | Token bias magnitude (choices: 2, 5, or 10) | `10` |
| `--expert-model` | Choice of expert model: `"DenseNet"` or `"MedSiglip"` | `DenseNet` |
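For example, a sketch combining the optional flags with the base command (the flag names come from the table above; the specific values are illustrative):
python -m ccd.run_ccd \
    --model-path "X-iZhang/libra-maira-2" \
    --image "./path/to/Chest_Xray.jpg" \
    --question "Is there evidence of any abnormalities?" \
    --max-new-tokens 128 \
    --alpha 0.7 \
    --beta 0.3 \
    --gamma 5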
You can run inference programmatically using the `ccd_eval` function from `ccd/run_ccd.py`.
After installing this repository, you can easily launch a model (either your own trained model or ours) locally or in Google Colab.
from ccd import ccd_eval
# Run CCD inference on a chest X-ray
output = ccd_eval(
model_path="X-iZhang/libra-maira-2", # or your custom radiology MLLM
image="./path/to/Chest_Xray.jpg",
question="Describe the findings in this chest X-ray.",
alpha=0.5, # Clinical guidance weight
beta=0.5, # Expert token weight
gamma=10, # Token bias magnitude
temperature=0.9, # Sampling temperature
top_p=0.9, # Nucleus sampling probability
top_k=50, # Top-k sampling
expert_model="DenseNet", # or "MedSiglip"
max_new_tokens=256
)
print(output)
💡 You can also use `run_eval` to test the original model output (without CCD).
from ccd import run_eval
# Run standard inference without CCD
output = run_eval(
model_path="X-iZhang/libra-maira-2",
image="./path/to/Chest_Xray.jpg",
question="Describe the findings in this chest X-ray.",
max_new_tokens=128,
num_beams=1
)
print(output)
You can launch the Gradio demo locally with:
python -m ccd.app
- Or try it directly on 🤗 Hugging Face Spaces.
Once the Gradio web interface is launched, you can open it using the URL printed on your screen. You will notice that both the default MAIRA-2 model and the expert models are ready for setup, with more models available in the list. Simply upload a chest X-ray image, enter your question, and click `🚀 Generate` to view the results!
CCD is compatible with any radiology MLLM that follows the Libra/LLaVA architecture:
Note
To switch MLLMs, simply set the `--model-path` argument (CLI) or the `model_path` parameter (Python) to one of the following checkpoints.
| Model | Checkpoint |
|---|---|
| Libra-v1.0-7B | `X-iZhang/libra-v1.0-7b` |
| Libra-v1.0-3B | `X-iZhang/libra-v1.0-3b` |
| MAIRA-2 | `X-iZhang/libra-maira-2` |
| LLaVA-Med-v1.5 | `X-iZhang/libra-llava-med-v1.5-mistral-7b` |
| LLaVA-Rad | `X-iZhang/libra-llava-rad` |
| Med-CXRGen-F | `X-iZhang/Med-CXRGen-F` |
| Med-CXRGen-I | `X-iZhang/Med-CXRGen-I` |
Warning
Models adapted from the Libra repository are intended for demonstration purposes only. For accurate evaluation, please refer to the original model weights and configuration settings, particularly the chat template.
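For example, to run the same query with a different backbone, here is a sketch reusing `ccd_eval` from the Quick Start with the LLaVA-Rad checkpoint listed above:
from ccd import ccd_eval

# Identical call to the Quick Start example; only the checkpoint changes
output = ccd_eval(
    model_path="X-iZhang/libra-llava-rad",
    image="./path/to/Chest_Xray.jpg",
    question="Describe the findings in this chest X-ray.",
    max_new_tokens=256
)
print(output)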
CCD integrates the following expert models for clinical signal extraction:
Note
To switch expert models, simply set the `--expert-model` argument (CLI) or the `expert_model` parameter (Python) to one of the following names.
| Model | Checkpoint | Note |
|---|---|---|
| DenseNet | `torchxrayvision/densenet121-res224-chex` | CheXpert (Stanford) |
| MedSiglip | `google/medsiglip-448` | Variant of SigLIP |
| View Model | `ChestViewSplit` | 'Frontal' or 'Lateral' |
Tip
DenseNet has been upgraded to work together with the view classification expert model, which helps the system identify the view position of chest X-rays ('Frontal' or 'Lateral') and thereby improves the accuracy of report generation. MedSigLIP has also been configured accordingly. The design is inspired by the MAIRA-2 chat template.
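For example, to switch from the default DenseNet to MedSigLIP from the command line (the flag and model name are taken from the table above):
python -m ccd.run_ccd \
    --model-path "X-iZhang/libra-maira-2" \
    --image "./path/to/Chest_Xray.jpg" \
    --question "Describe the findings in this chest X-ray." \
    --expert-model "MedSiglip"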
- `alpha` (0.0–1.0): Weight for clinical guidance text
  - Higher = more influence from expert-generated guidance
  - Recommended: 0.3–0.7
- `beta` (0.0–1.0): Weight for direct token biasing
  - Higher = stronger push toward clinical terminology
  - Recommended: 0.3–0.7
- `gamma` (2, 5, or 10): Maximum token bias magnitude
  - 2: subtle influence
  - 5: moderate influence
  - 10: strong influence (default)
Tip
These parameters can be set beyond the recommended range for adversarial testing to observe CCD’s behaviour under extreme conditions.
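As a concrete sketch, here is a `ccd_eval` call using values inside the recommended ranges (parameter names as documented above; the chosen values are illustrative):
from ccd import ccd_eval

output = ccd_eval(
    model_path="X-iZhang/libra-maira-2",
    image="./path/to/Chest_Xray.jpg",
    question="Describe the findings in this chest X-ray.",
    alpha=0.4,            # moderate clinical guidance (recommended 0.3-0.7)
    beta=0.6,             # slightly stronger token biasing (recommended 0.3-0.7)
    gamma=10,             # strong token bias cap (default)
    max_new_tokens=256
)
print(output)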
CCD supports multiple medical imaging datasets commonly used in radiology research:
- MIMIC-CXR — Chest X-ray images with corresponding radiology reports.
- IU-Xray — Chest X-ray dataset with structured annotations.
- CheXpert Plus — Large-scale dataset for chest X-ray interpretation.
- Medical-CXR-VQA — A dataset for visual question answering in chest X-rays.
Note
To facilitate hands-on testing, we provide pre-processed test splits for MIMIC-CXR, IU-Xray, CheXpert Plus and Medical-CXR-VQA, available on Hugging Face Collections.
Warning
Carefully read the `READMEs`; the image quality of these datasets has been compressed for efficient storage and sharing. Use the original datasets for evaluation.
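Once downloaded, a sketch of loading one of these test splits with the Hugging Face `datasets` library (the repository identifier below is hypothetical; substitute the actual name from the Hugging Face Collections page):
from datasets import load_dataset

# Hypothetical dataset identifier; replace with the real one from the collection
test_split = load_dataset("X-iZhang/CCD-test-splits", split="test")
print(test_split[0])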
For evaluating generated reports, we recommend using RadEval — a unified framework for radiology text evaluation that integrates multiple standard metrics. Details can be found in the GitHub repository.
You can install RadEval via pip:
pip install RadEval
Tip
RadEval supports metrics such as BLEU, ROUGE, BERTScore, CheXbert F1, and RadGraph F1, making it ideal for comprehensive evaluation of radiology report generation models.
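A minimal sketch of scoring generated reports against references, assuming the `RadEval` evaluator interface (the constructor flags and call signature here are assumptions; check the RadEval repository for the exact API):
from RadEval import RadEval

refs = ["No evidence of acute cardiopulmonary process."]
hyps = ["No acute cardiopulmonary abnormality is identified."]

# Assumed flag names for enabling a subset of the supported metrics
evaluator = RadEval(do_bleu=True, do_rouge=True, do_bertscore=True)
results = evaluator(refs=refs, hyps=hyps)
print(results)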
If you find our paper and code useful in your research and applications, please cite using this BibTeX:
@misc{zhang2025ccdmitigatinghallucinationsradiology,
title={CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding},
author={Xi Zhang and Zaiqiao Meng and Jake Lever and Edmond S. L. Ho},
year={2025},
eprint={2509.23379},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.23379},
}
This project builds upon the following outstanding open-source works:
- Libra — A flexible toolkit supporting multiple radiology LLM backbones, covering the full pipeline from training to inference.
- TorchXRayVision — A library for chest X-ray datasets and models.
- MedSigLIP — A medical variant of SigLIP (Sigmoid Loss for Language-Image Pre-training).
- RadEval — A unified framework for radiology text evaluation.
We thank the authors for their valuable contributions to the medical AI community.
This project is licensed under the MIT License - see the LICENSE file for details.
CCD is designed to assist clinical practitioners, researchers, and medical trainees in generating and analysing chest X-ray reports, with a focus on temporal reasoning and context-aware description of radiological findings.
- 🩺 Clinical Decision Support — Produces preliminary findings or comparative analyses that can aid radiologists in drafting and reviewing reports.
- 🎓 Educational Tool — Demonstrates example interpretations and temporal progressions for teaching radiology residents and students.
- 🔬 Research Utility — Enables investigation of automated report generation, visual-language alignment, and temporal feature learning in medical imaging.
Important
All outputs must be reviewed and validated by qualified radiologists or medical professionals before informing any clinical decision.
Limitations and Recommendations
- Data Bias — Performance may degrade on underrepresented populations or rare disease categories.
- Clinical Oversight — CCD is a supportive system, not a replacement for professional medical judgment.
- Temporal Sensitivity — Although TAC (Temporal Alignment Connector) enhances temporal alignment, subtle or atypical longitudinal changes may remain unrecognised.
- Generalisation — Performance may vary on image types or clinical contexts not present in the training distribution.
Ethical Considerations
- Patient Privacy — All input data must be fully de-identified and compliant with HIPAA, GDPR, or equivalent local regulations.
- Responsible Deployment — CCD’s outputs may contain inaccuracies; users should interpret them with appropriate caution.
- Accountability — The responsibility for clinical verification and safe deployment lies with the end-user organisation or researcher.
Disclaimer
This model and accompanying tools are intended solely for research and educational purposes.
CCD is not approved by the FDA, CE, or other regulatory authorities for clinical use.
For medical diagnosis or treatment decisions, please consult a licensed healthcare professional.