- [17 Oct 2025] 🔩 CCD has been upgraded to support view classification for chest X-rays — see the Supported Expert Models section for details.
- [06 Oct 2025] 🎮 The online demo is available at Hugging Face Spaces. Feel free to try it out!
- [30 Sep 2025] 🗂️ The processed test data for quick start are now available — enjoy exploring with the provided guidelines!
- [27 Sep 2025] ⛳ Our preprint is now live on arXiv — check it out for details.
Multimodal large language models (MLLMs) are advancing radiology by combining image and text understanding, but they often generate inaccurate or unsupported clinical details, known as medical hallucinations. We propose Clinical Contrastive Decoding (CCD), a training-free and retrieval-free inference framework that integrates structured clinical signals from task-specific radiology expert models. CCD reduces hallucinations and improves clinical accuracy without changing the base model. Experiments show that CCD boosts performance across multiple datasets and models, offering a practical way to make radiology MLLMs more reliable.
- ⛏️ Installation
- ⚡ Quick Start
- 🛠️ Advanced Usage
- 🗂️ Dataset
- 📊 Evaluation
- 📝 Citation
- 📚 Acknowledgments
- 📜 License
- 🧰 Intended Use
Tip
Use `uv` for installation; it's faster and more reliable than `pip`.
Install the latest version directly from GitHub for quick setup:
uv pip install git+https://github.com/X-iZhang/CCD.git
Note
Requirements: Python 3.9 or later, and a CUDA-compatible GPU (recommended)
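To verify the installation, you can try importing the package's entry points (a minimal check; `ccd_eval` and `run_eval` are the functions used in the examples below):
python -c "from ccd import ccd_eval, run_eval; print('CCD is ready')"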
If you plan to modify the code or contribute to the project, you can clone the repository and install it in editable mode:
- Clone the repository and navigate to the project folder
git clone https://github.com/X-iZhang/CCD.git
cd CCD
- Set up the environment and install in editable mode
conda create -n CCD python=3.10 -y
conda activate CCD
pip install uv # enable uv support
uv pip install -e .
🔄 Upgrade to the latest code base
git pull
uv pip install -e .
You can perform inference directly from the command line using our CLI tool:
python -m ccd.run_ccd \
--model-path "X-iZhang/libra-maira-2" \
--image "./path/to/Chest_Xray.jpg" \
--question "Is there evidence of any abnormalities?" \
--max-new-tokens 128
Optional arguments:

| Argument | Description | Default |
|---|---|---|
| `--alpha` | Clinical guidance weight (range: 0.0–1.0) | `0.5` |
| `--beta` | Expert token weight (range: 0.0–1.0) | `0.5` |
| `--gamma` | Token bias magnitude (choices: 2, 5, or 10) | `10` |
| `--expert-model` | Choice of expert model: `"DenseNet"` or `"MedSiglip"` | `DenseNet` |
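For example, a sketch combining the optional flags with the base command (the flag names come from the table above; the specific values are illustrative):
python -m ccd.run_ccd \
    --model-path "X-iZhang/libra-maira-2" \
    --image "./path/to/Chest_Xray.jpg" \
    --question "Is there evidence of any abnormalities?" \
    --max-new-tokens 128 \
    --alpha 0.7 \
    --beta 0.3 \
    --gamma 5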
You can run inference programmatically using the `ccd_eval` function from `ccd/run_ccd.py`.
After installing this repository, you can easily launch a model (either your own trained model or ours) locally or in Google Colab.
from ccd import ccd_eval
# Run CCD inference on a chest X-ray
output = ccd_eval(
model_path="X-iZhang/libra-maira-2", # or your custom radiology MLLM
image="./path/to/Chest_Xray.jpg",
question="Describe the findings in this chest X-ray.",
alpha=0.5, # Clinical guidance weight
beta=0.5, # Expert token weight
gamma=10, # Token bias magnitude
temperature=0.9, # Sampling temperature
top_p=0.9, # Nucleus sampling probability
top_k=50, # Top-k sampling
expert_model="DenseNet", # or "MedSiglip"
max_new_tokens=256
)
print(output)
💡 You can also use `run_eval` to test the original model output (without CCD).
from ccd import run_eval
# Run standard inference without CCD
output = run_eval(
model_path="X-iZhang/libra-maira-2",
image="./path/to/Chest_Xray.jpg",
question="Describe the findings in this chest X-ray.",
max_new_tokens=128,
num_beams=1
)
print(output)
You can launch the Gradio demo locally with:
python -m ccd.app
- Or try it directly on 🤗 Hugging Face Spaces.
Once the Gradio web interface is launched, you can open it using the URL printed on your screen. You will notice that both the default MAIRA-2 model and the expert models are ready for setup, with more models available in the list. Simply upload a chest X-ray image, enter your question, and click `🚀 Generate` to view the results!
CCD is compatible with any radiology MLLM that follows the Libra/LLaVA architecture:
Note
To switch MLLMs, simply set the `--model-path` argument (CLI) or the `model_path` parameter (Python) to one of the following checkpoints.
| Model | Checkpoint |
|---|---|
| Libra-v1.0-7B | `X-iZhang/libra-v1.0-7b` |
| Libra-v1.0-3B | `X-iZhang/libra-v1.0-3b` |
| MAIRA-2 | `X-iZhang/libra-maira-2` |
| LLaVA-Med-v1.5 | `X-iZhang/libra-llava-med-v1.5-mistral-7b` |
| LLaVA-Rad | `X-iZhang/libra-llava-rad` |
| Med-CXRGen-F | `X-iZhang/Med-CXRGen-F` |
| Med-CXRGen-I | `X-iZhang/Med-CXRGen-I` |
Warning
Models adapted from the Libra repository are intended for demonstration purposes only. For accurate evaluation, please refer to the original model weights and configuration settings, particularly the chat template.
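For example, to run the same query with a different backbone, here is a sketch reusing `ccd_eval` from the Quick Start with the LLaVA-Rad checkpoint listed above:
from ccd import ccd_eval

# Identical call to the Quick Start example; only the checkpoint changes
output = ccd_eval(
    model_path="X-iZhang/libra-llava-rad",
    image="./path/to/Chest_Xray.jpg",
    question="Describe the findings in this chest X-ray.",
    max_new_tokens=256
)
print(output)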
CCD integrates the following expert models for clinical signal extraction:
Note
To switch expert models, simply set the `--expert-model` argument (CLI) or the `expert_model` parameter (Python) to one of the following names.
| Model | Checkpoint | Note |
|---|---|---|
| DenseNet | `torchxrayvision/densenet121-res224-chex` | CheXpert (Stanford) |
| MedSiglip | `google/medsiglip-448` | Variant of SigLIP |
| View Model | `ChestViewSplit` | 'Frontal' or 'Lateral' |
Tip
DenseNet has been upgraded to work together with the view classification expert model, which helps the system identify the view position of chest X-rays ('Frontal' or 'Lateral') and thereby improves the accuracy of report generation. MedSigLIP has also been configured accordingly. The design is inspired by the MAIRA-2 chat template.
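For example, to switch from the default DenseNet to MedSigLIP from the command line (the flag and model name are taken from the table above):
python -m ccd.run_ccd \
    --model-path "X-iZhang/libra-maira-2" \
    --image "./path/to/Chest_Xray.jpg" \
    --question "Describe the findings in this chest X-ray." \
    --expert-model "MedSiglip"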
- `alpha` (0.0–1.0): Weight for clinical guidance text
  - Higher = more influence from expert-generated guidance
  - Recommended: 0.3–0.7
- `beta` (0.0–1.0): Weight for direct token biasing
  - Higher = stronger push toward clinical terminology
  - Recommended: 0.3–0.7
- `gamma` (2, 5, or 10): Maximum token bias magnitude
  - 2: subtle influence
  - 5: moderate influence
  - 10: strong influence (default)
Tip
These parameters can be set beyond the recommended range for adversarial testing to observe CCD’s behaviour under extreme conditions.
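As a concrete sketch, here is a `ccd_eval` call using values inside the recommended ranges (parameter names as documented above; the chosen values are illustrative):
from ccd import ccd_eval

output = ccd_eval(
    model_path="X-iZhang/libra-maira-2",
    image="./path/to/Chest_Xray.jpg",
    question="Describe the findings in this chest X-ray.",
    alpha=0.4,            # moderate clinical guidance (recommended 0.3-0.7)
    beta=0.6,             # slightly stronger token biasing (recommended 0.3-0.7)
    gamma=10,             # strong token bias cap (default)
    max_new_tokens=256
)
print(output)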
CCD supports multiple medical imaging datasets commonly used in radiology research:
- MIMIC-CXR — Chest X-ray images with corresponding radiology reports.
- IU-Xray — Chest X-ray dataset with structured annotations.
- CheXpert Plus — Large-scale dataset for chest X-ray interpretation.
- Medical-CXR-VQA — A dataset for visual question answering in chest X-rays.
Note
To facilitate hands-on testing, we provide pre-processed test splits for MIMIC-CXR, IU-Xray, CheXpert Plus and Medical-CXR-VQA, available on Hugging Face Collections.
Warning
Carefully read the `READMEs`; the image quality of these datasets has been compressed for efficient storage and sharing. Use the original datasets for evaluation.
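Once downloaded, a sketch of loading one of these test splits with the Hugging Face `datasets` library (the repository identifier below is hypothetical; substitute the actual name from the Hugging Face Collections page):
from datasets import load_dataset

# Hypothetical dataset identifier; replace with the real one from the collection
test_split = load_dataset("X-iZhang/CCD-test-splits", split="test")
print(test_split[0])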
For evaluating generated reports, we recommend using RadEval — a unified framework for radiology text evaluation that integrates multiple standard metrics. Details can be found in the GitHub repository.
You can install RadEval via pip:
pip install RadEval
Tip
RadEval supports metrics such as BLEU, ROUGE, BERTScore, CheXbert F1, and RadGraph F1, making it ideal for comprehensive evaluation of radiology report generation models.
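A minimal sketch of scoring generated reports against references, assuming the `RadEval` evaluator interface (the constructor flags and call signature here are assumptions; check the RadEval repository for the exact API):
from RadEval import RadEval

refs = ["No evidence of acute cardiopulmonary process."]
hyps = ["No acute cardiopulmonary abnormality is identified."]

# Assumed flag names for enabling a subset of the supported metrics
evaluator = RadEval(do_bleu=True, do_rouge=True, do_bertscore=True)
results = evaluator(refs=refs, hyps=hyps)
print(results)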
If you find our paper and code useful in your research and applications, please cite using this BibTeX:
@misc{zhang2025ccdmitigatinghallucinationsradiology,
title={CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding},
author={Xi Zhang and Zaiqiao Meng and Jake Lever and Edmond S. L. Ho},
year={2025},
eprint={2509.23379},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.23379},
}
This project builds upon the following outstanding open-source works:
- Libra — A flexible toolkit supporting multiple radiology LLM backbones, covering the full pipeline from training to inference.
- TorchXRayVision — A library for chest X-ray datasets and models.
- MedSigLIP — A medical variant of SigLIP (Sigmoid Loss for Language-Image Pre-training).
- RadEval — A unified framework for radiology text evaluation.
We thank the authors for their valuable contributions to the medical AI community.
This project is licensed under the MIT License - see the LICENSE file for details.
CCD is designed to assist clinical practitioners, researchers, and medical trainees in generating and analysing chest X-ray reports, with a focus on temporal reasoning and context-aware description of radiological findings.
- 🩺 Clinical Decision Support — Produces preliminary findings or comparative analyses that can aid radiologists in drafting and reviewing reports.
- 🎓 Educational Tool — Demonstrates example interpretations and temporal progressions for teaching radiology residents and students.
- 🔬 Research Utility — Enables investigation of automated report generation, visual-language alignment, and temporal feature learning in medical imaging.
Important
All outputs must be reviewed and validated by qualified radiologists or medical professionals before informing any clinical decision.
Limitations and Recommendations
- Data Bias — Performance may degrade on underrepresented populations or rare disease categories.
- Clinical Oversight — CCD is a supportive system, not a replacement for professional medical judgment.
- Temporal Sensitivity — Although TAC (Temporal Alignment Connector) enhances temporal alignment, subtle or atypical longitudinal changes may remain unrecognised.
- Generalisation — Performance may vary on image types or clinical contexts not present in the training distribution.
Ethical Considerations
- Patient Privacy — All input data must be fully de-identified and compliant with HIPAA, GDPR, or equivalent local regulations.
- Responsible Deployment — CCD’s outputs may contain inaccuracies; users should interpret them with appropriate caution.
- Accountability — The responsibility for clinical verification and safe deployment lies with the end-user organisation or researcher.
Disclaimer
This model and accompanying tools are intended solely for research and educational purposes.
CCD is not approved by the FDA, CE, or other regulatory authorities for clinical use.
For medical diagnosis or treatment decisions, please consult a licensed healthcare professional.