The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable report generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, Structural Entities extraction and patient indications Incorporation (SEI), for chest X-ray report generation. Specifically, we employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports and improve the quality of factual entity sequences. By aligning X-ray images with factual entity sequences in reports, this reduces the noise in the subsequent cross-modal alignment module, thereby enhancing the precision of cross-modal alignment and further aiding the model in gradient-free retrieval of similar historical cases. Subsequently, we propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications. This process allows the text decoder to attend to discriminative features of X-ray images, assimilate historical diagnostic information from similar cases, and understand the examination intentions of patients, which in turn helps the text decoder produce high-quality reports. Experiments conducted on MIMIC-CXR validate the superiority of SEI over state-of-the-art approaches on both natural language generation and clinical efficacy metrics.
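For a concrete picture of the fusion step described above, the sketch below shows one way a text decoder can attend over the concatenation of image features, retrieved-case features, and indication features via cross-attention. The dimensions, module structure, and names are illustrative assumptions, not the exact SEI architecture:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Illustrative cross-attention fusion over three evidence sources."""
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, decoder_states, image_feats, case_feats, indication_feats):
        # One memory holding image patches, similar historical cases,
        # and the patient-specific indication.
        memory = torch.cat([image_feats, case_feats, indication_feats], dim=1)
        fused, _ = self.attn(query=decoder_states, key=memory, value=memory)
        return fused

# Toy shapes: batch of 2, 512-dim features.
fusion = CrossModalFusion()
out = fusion(torch.randn(2, 10, 512),   # decoder token states
             torch.randn(2, 49, 512),   # image patch features
             torch.randn(2, 20, 512),   # retrieved-case features
             torch.randn(2, 8, 512))    # indication features
print(out.shape)  # torch.Size([2, 10, 512])
```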
- 2024-09-09: Uploaded the poster.
- 2024-09-19: Updated the repository to make it easier to use.
- 2024-09-19: Updated the generated reports for the MIMIC-CXR test set.
- torch==2.1.2+cu118
- transformers==4.23.1
- torchvision==0.16.2+cu118
- radgraph==0.09
- Due to the specific environment of RadGraph, please refer to `knowledge_encoder/factual_serialization.py` for the environment of the structural entities extraction approach. A quick version check is sketched below.
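A quick sanity check that the pinned versions are active in the current environment (a minimal sketch; the `+cu118` build suffix may differ on your machine):

```python
# Print the installed versions of the pinned dependencies.
import torch
import torchvision
import transformers

print(torch.__version__)          # expected: 2.1.2+cu118
print(torchvision.__version__)    # expected: 0.16.2+cu118
print(transformers.__version__)   # expected: 4.23.1
print(torch.cuda.is_available())  # True if the CUDA build is usable
```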
You can download checkpoints of SEI as follows:
- For MIMIC-CXR, you can download checkpoints from Baidu Netdisk (its extraction code is `MK13`) and Hugging Face 🤗.
- For MIMIC-CXR, you can download the medical images from PhysioNet.
- You can download the medical reports from Google Drive. Note that you need to apply with your PhysioNet license, and a toy case is provided in `knowledge_encoder/case.json`.
- Configure the RadGraph environment based on `knowledge_encoder/factual_serialization.py`.

  **Environment setting.** Basic setup (one-time activity):

  a. Clone the DyGIE++ repository from [here](https://github.com/dwadden/dygiepp). This repository is managed by Wadden et al., authors of the paper *Entity, Relation, and Event Extraction with Contextualized Span Representations*: `git clone https://github.com/dwadden/dygiepp.git`

  b. Navigate to the root of the repo on your system and use the following commands to set up the conda environment: `conda create --name dygiepp python=3.7`, `conda activate dygiepp`, `cd dygiepp`, `pip install -r requirements.txt`, `conda develop .` (adds DyGIE to your PYTHONPATH)

  c. Activate the conda environment: `conda activate dygiepp`

  Notably, for our RadGraph environment, you can refer to `knowledge_encoder/radgraph_requirements.yml`.
- Configure `radgraph_path` and `ann_path` in `knowledge_encoder/see.py`. The `ann_path` corresponds to the local file path of the medical reports, and `radgraph_path` can be obtained directly from PhysioNet.
- Run `knowledge_encoder/see.py` to extract the factual entity sequence for each report.
- Finally, `annotation.json` becomes `mimic_cxr_annotation_sen.json`, whose name is set by the `new_ann_file_name` variable in `see.py`. (An extraction sketch follows this item.)
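  To illustrate what SEE produces, below is a minimal sketch that turns a RadGraph-style annotation (the `entities` dict with `tokens`, `label`, and `start_ix` fields used by the official RadGraph release) into an ordered factual entity sequence. The function name and the label filter are illustrative assumptions; the actual logic lives in `knowledge_encoder/see.py`:

  ```python
  def factual_entity_sequence(annotation: dict) -> str:
      """Order RadGraph entities by position and join their tokens,
      keeping only factual (anatomy/observation) entities."""
      entities = annotation["entities"]
      ordered = sorted(entities.values(), key=lambda e: e["start_ix"])
      # Which labels to keep is an assumption; RadGraph uses
      # ANAT-DP, OBS-DP, OBS-DA, and OBS-U.
      keep = {"ANAT-DP", "OBS-DP", "OBS-DA", "OBS-U"}
      return " ".join(e["tokens"] for e in ordered if e["label"] in keep)

  example = {"entities": {
      "1": {"tokens": "lungs", "label": "ANAT-DP", "start_ix": 1},
      "2": {"tokens": "clear", "label": "OBS-DP", "start_ix": 3},
  }}
  print(factual_entity_sequence(example))  # -> "lungs clear"
  ```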
- Run `bash pretrain_mimic_cxr.sh` to pretrain a model on the MIMIC-CXR data (note that `mimic_cxr_ann_path` is `mimic_cxr_annotation_sen.json`).
- Configure the `--load` argument in `pretrain_inference_mimic_cxr.sh`. Note that this argument should point to the pre-trained model from the first stage.
- Run `bash pretrain_inference_mimic_cxr.sh` to retrieve similar historical cases for each sample, forming `mimic_cxr_annotation_sen_best_reports_keywords_20.json` (i.e., `mimic_cxr_annotation_sen.json` becomes this `*.json` file). A retrieval sketch follows this item.
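  The retrieval is gradient-free: with the aligned encoders frozen, each study's image embedding is compared with the training corpus embeddings, and the top-k most similar cases are recorded (k = 20 matches the `keywords_20` filename). A minimal cosine-similarity sketch; tensor names and shapes are illustrative assumptions:

  ```python
  import torch
  import torch.nn.functional as F

  @torch.no_grad()  # gradient-free: no backprop through the encoders
  def retrieve_topk(query_embeds, corpus_embeds, k=20):
      """Indices of the k most similar corpus entries per query."""
      q = F.normalize(query_embeds, dim=-1)
      c = F.normalize(corpus_embeds, dim=-1)
      sims = q @ c.t()                      # cosine similarity matrix
      return sims.topk(k, dim=-1).indices   # (num_queries, k)

  idx = retrieve_topk(torch.randn(4, 512), torch.randn(100, 512))
  print(idx.shape)  # torch.Size([4, 20])
  ```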
- Extract and preprocess the indication section in the radiology reports (a normalization sketch follows this item).

  a. Configure `ann_path` and `report_dir` in `knowledge_encoder/preprocessing_indication_section.py`; the `ann_path` value is `mimic_cxr_annotation_sen_best_reports_keywords_20.json`. Note that `report_dir` can be downloaded from PhysioNet.

  b. Run `knowledge_encoder/preprocessing_indication_section.py`, forming `mimic_cxr_annotation_sen_best_reports_keywords_20_all_components_with_fs_v0227.json`.
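  For intuition, MIMIC-CXR indication sections contain de-identification placeholders (runs of underscores) and irregular whitespace; the sketch below shows a cleanup along those lines. The regexes and the function name are illustrative assumptions, not the repository's exact preprocessing:

  ```python
  import re

  def clean_indication(text: str) -> str:
      """Illustrative normalization of a MIMIC-CXR indication section."""
      text = re.sub(r"_{2,}", "", text)  # drop de-identification blanks like "___"
      text = re.sub(r"\s+", " ", text)   # collapse whitespace and newlines
      return text.strip()

  print(clean_indication("___F with  cough.\n// Eval for pneumonia"))
  # -> "F with cough. // Eval for pneumonia"
  ```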
- Configure the `--load` argument in `finetune_mimic_cxr.sh`. Note that this argument should point to the pre-trained model from the first stage. Furthermore, `mimic_cxr_ann_path` is `mimic_cxr_annotation_sen_best_reports_keywords_20_all_components_with_fs_v0227.json`.
- Download these checkpoints. Notably, `chexbert.pth` and `radgraph` are used to calculate CE metrics, while `bert-base-uncased` and `scibert_scivocab_uncased` are pre-trained models for the cross-modal fusion network and the text encoder. Put these checkpoints in the same local directory (e.g., `/home/data/checkpoints`) and configure the `--ckpt_zoo_dir /home/data/checkpoints` argument in `finetune_mimic_cxr.sh`. A layout check is sketched after the table below.
| Checkpoint | Variable name | Download |
|---|---|---|
| chexbert.pth | chexbert_path | here | 
| bert-base-uncased | bert_path | huggingface | 
| radgraph | radgraph_path | PhysioNet | 
| scibert_scivocab_uncased | scibert_path | huggingface | 
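A quick check that `--ckpt_zoo_dir` contains the expected entries (a sketch based on the table above, assuming the checkpoints sit directly under that directory; not part of the repository):

```python
from pathlib import Path

ckpt_zoo_dir = Path("/home/data/checkpoints")  # value passed to --ckpt_zoo_dir
expected = ["chexbert.pth", "bert-base-uncased", "radgraph", "scibert_scivocab_uncased"]

for name in expected:
    status = "ok" if (ckpt_zoo_dir / name).exists() else "MISSING"
    print(f"{name}: {status}")
```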
- Run `bash finetune_mimic_cxr.sh` to fine-tune the model to generate reports based on similar historical cases and patient-specific indications.
- You must download the medical images, their corresponding reports (i.e., `mimic_cxr_annotation_sen_best_reports_keywords_20_all_components_with_fs_v0227.json`), and the checkpoint (i.e., `SEI-1-finetune-model-best.pth`) from the Datasets and Checkpoints sections, respectively.
- Configure the `--load` and `--mimic_cxr_ann_path` arguments in `test_mimic_cxr.sh`.
- Run `bash test_mimic_cxr.sh` to generate reports based on similar historical cases. (A checkpoint-inspection sketch follows this item.)
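  If you want to inspect the released checkpoint manually, the standard PyTorch pattern below applies; the top-level key layout is an assumption, since it depends on how the trainer saved the file:

  ```python
  import torch

  # Load on CPU so no GPU is needed just to inspect the checkpoint.
  ckpt = torch.load("SEI-1-finetune-model-best.pth", map_location="cpu")

  # Many trainers save a dict such as {"state_dict": ..., "epoch": ...};
  # print the top-level keys to see what this file actually contains.
  print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
  ```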
- Results on MIMIC-CXR are presented as follows:
- Next, the code for this project will be streamlined.
If you use or extend our work, please cite our paper at MICCAI 2024.
@InProceedings{liu-sei-miccai-2024,
      author={Liu, Kang and Ma, Zhuoqi and Kang, Xiaolu and Zhong, Zhusi and Jiao, Zhicheng and Baird, Grayson and Bai, Harrison and Miao, Qiguang},
      title={Structural Entities Extraction and Patient Indications Incorporation for Chest X-Ray Report Generation},
      booktitle={Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
      year={2024},
      publisher={Springer Nature Switzerland},
      address={Cham},
      pages={433--443},
      isbn={978-3-031-72384-1},
      doi={10.1007/978-3-031-72384-1_41}
}
- R2Gen: some code is adapted from R2Gen [1].
- R2GenCMN: some code is adapted from R2GenCMN [2].
- MGCA: some code is adapted from MGCA [3].
[1] Chen, Z., Song, Y., Chang, T.H., Wan, X., 2020. Generating radiology reports via memory-driven transformer, in: EMNLP, pp. 1439–1449.
[2] Chen, Z., Shen, Y., Song, Y., Wan, X., 2021. Cross-modal memory networks for radiology report generation, in: ACL, pp. 5904–5914.
[3] Wang, F., Zhou, Y., Wang, S., Vardhanabhuti, V., Yu, L., 2022. Multi-granularity cross-modal alignment for generalized medical visual representation learning, in: NeurIPS, pp. 33536–33549.