A PyTorch implementation of *On the Automatic Generation of Medical Imaging Reports* (Jing et al., ACL 2018); see the paper for details of the model.

The table below reports results from the checkpoint `only_training/only_training/20180528-02:44:52/`:
| Mode | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE | CIDEr |
|---|---|---|---|---|---|---|---|
| Train | 0.386 | 0.275 | 0.215 | 0.176 | 0.187 | 0.369 | 1.075 |
| Val | 0.303 | 0.182 | 0.118 | 0.077 | 0.143 | 0.256 | 0.214 |
| Test | 0.316 | 0.190 | 0.123 | 0.081 | 0.148 | 0.264 | 0.221 |
| Paper | 0.517 | 0.386 | 0.306 | 0.247 | 0.217 | 0.447 | 0.327 |
To train a model, run `trainer.py`:

```
usage: trainer.py [-h] [--patience PATIENCE] [--mode MODE]
                  [--vocab_path VOCAB_PATH] [--image_dir IMAGE_DIR]
                  [--caption_json CAPTION_JSON]
                  [--train_file_list TRAIN_FILE_LIST]
                  [--val_file_list VAL_FILE_LIST] [--resize RESIZE]
                  [--crop_size CROP_SIZE] [--model_path MODEL_PATH]
                  [--load_model_path LOAD_MODEL_PATH]
                  [--saved_model_name SAVED_MODEL_NAME] [--momentum MOMENTUM]
                  [--visual_model_name VISUAL_MODEL_NAME] [--pretrained]
                  [--classes CLASSES]
                  [--sementic_features_dim SEMENTIC_FEATURES_DIM] [--k K]
                  [--attention_version ATTENTION_VERSION]
                  [--embed_size EMBED_SIZE] [--hidden_size HIDDEN_SIZE]
                  [--sent_version SENT_VERSION]
                  [--sentence_num_layers SENTENCE_NUM_LAYERS]
                  [--dropout DROPOUT] [--word_num_layers WORD_NUM_LAYERS]
                  [--batch_size BATCH_SIZE] [--learning_rate LEARNING_RATE]
                  [--epochs EPOCHS] [--clip CLIP] [--s_max S_MAX]
                  [--n_max N_MAX] [--lambda_tag LAMBDA_TAG]
                  [--lambda_stop LAMBDA_STOP] [--lambda_word LAMBDA_WORD]

optional arguments:
  -h, --help            show this help message and exit
  --patience PATIENCE
  --mode MODE
  --vocab_path VOCAB_PATH
                        the path for vocabulary object
  --image_dir IMAGE_DIR
                        the path for images
  --caption_json CAPTION_JSON
                        path for captions
  --train_file_list TRAIN_FILE_LIST
                        the train array
  --val_file_list VAL_FILE_LIST
                        the val array
  --resize RESIZE       size for resizing images
  --crop_size CROP_SIZE
                        size for randomly cropping images
  --model_path MODEL_PATH
                        path for saving trained models
  --load_model_path LOAD_MODEL_PATH
                        the path of the loaded model
  --saved_model_name SAVED_MODEL_NAME
                        the name of the saved model
  --momentum MOMENTUM
  --visual_model_name VISUAL_MODEL_NAME
                        CNN model name
  --pretrained          not using pretrained model when training
  --classes CLASSES
  --sementic_features_dim SEMENTIC_FEATURES_DIM
  --k K
  --attention_version ATTENTION_VERSION
  --embed_size EMBED_SIZE
  --hidden_size HIDDEN_SIZE
  --sent_version SENT_VERSION
  --sentence_num_layers SENTENCE_NUM_LAYERS
  --dropout DROPOUT
  --word_num_layers WORD_NUM_LAYERS
  --batch_size BATCH_SIZE
  --learning_rate LEARNING_RATE
  --epochs EPOCHS
  --clip CLIP           gradient clip, -1 means no clip (default: 0.35)
  --s_max S_MAX
  --n_max N_MAX
  --lambda_tag LAMBDA_TAG
  --lambda_stop LAMBDA_STOP
  --lambda_word LAMBDA_WORD
```
To evaluate a trained model, run `tester.py`:

```
usage: tester.py [-h] [--model_dir MODEL_DIR] [--image_dir IMAGE_DIR]
                 [--caption_json CAPTION_JSON] [--vocab_path VOCAB_PATH]
                 [--file_lits FILE_LITS] [--load_model_path LOAD_MODEL_PATH]
                 [--resize RESIZE] [--cam_size CAM_SIZE]
                 [--generate_dir GENERATE_DIR] [--result_path RESULT_PATH]
                 [--result_name RESULT_NAME] [--momentum MOMENTUM]
                 [--visual_model_name VISUAL_MODEL_NAME] [--pretrained]
                 [--classes CLASSES]
                 [--sementic_features_dim SEMENTIC_FEATURES_DIM] [--k K]
                 [--attention_version ATTENTION_VERSION]
                 [--embed_size EMBED_SIZE] [--hidden_size HIDDEN_SIZE]
                 [--sent_version SENT_VERSION]
                 [--sentence_num_layers SENTENCE_NUM_LAYERS]
                 [--dropout DROPOUT] [--word_num_layers WORD_NUM_LAYERS]
                 [--s_max S_MAX] [--n_max N_MAX] [--batch_size BATCH_SIZE]
                 [--lambda_tag LAMBDA_TAG] [--lambda_stop LAMBDA_STOP]
                 [--lambda_word LAMBDA_WORD]

optional arguments:
  -h, --help            show this help message and exit
  --model_dir MODEL_DIR
  --image_dir IMAGE_DIR
                        the path for images
  --caption_json CAPTION_JSON
                        path for captions
  --vocab_path VOCAB_PATH
                        the path for vocabulary object
  --file_lits FILE_LITS
                        the path for test file list
  --load_model_path LOAD_MODEL_PATH
                        the path of the loaded model
  --resize RESIZE       size for resizing images
  --cam_size CAM_SIZE
  --generate_dir GENERATE_DIR
  --result_path RESULT_PATH
                        the path for storing results
  --result_name RESULT_NAME
                        the name of results
  --momentum MOMENTUM
  --visual_model_name VISUAL_MODEL_NAME
                        CNN model name
  --pretrained          not using pretrained model when training
  --classes CLASSES
  --sementic_features_dim SEMENTIC_FEATURES_DIM
  --k K
  --attention_version ATTENTION_VERSION
  --embed_size EMBED_SIZE
  --hidden_size HIDDEN_SIZE
  --sent_version SENT_VERSION
  --sentence_num_layers SENTENCE_NUM_LAYERS
  --dropout DROPOUT
  --word_num_layers WORD_NUM_LAYERS
  --s_max S_MAX
  --n_max N_MAX
  --batch_size BATCH_SIZE
  --lambda_tag LAMBDA_TAG
  --lambda_stop LAMBDA_STOP
  --lambda_word LAMBDA_WORD
```
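For example (all paths below are placeholders; note that the test-list flag is spelled `--file_lits` in the code):

```
python tester.py --model_dir report_models/<your_run> \
                 --load_model_path <your_checkpoint> \
                 --file_lits data/test_data.txt \
                 --result_path results
```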
`tester.py` provides three methods:

- `test()`: compute the loss.
- `generate()`: generate a caption for each image and save the results (JSON) to `os.path.join(model_dir, result_path)`.
- `sample(img_name)`: generate a caption for a single image together with its heatmap (CAM).
Run `python2 metric_performance.py` to compute the evaluation metrics:

```
usage: metric_performance.py [-h] [--result_path RESULT_PATH]

optional arguments:
  -h, --help            show this help message and exit
  --result_path RESULT_PATH
```
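For example, pointing it at the JSON file written by `generate()` (the filename below is a placeholder):

```
python2 metric_performance.py --result_path results/results.json
```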
Open `review_captions.ipynb` in Jupyter to review the captions the model generated for each image.
To visualize the training procedure, change `tensorboard --logdir report_models` in `tensorboard.sh` to point at your own saved-model directory, then run:

```
./tensorboard.sh
```
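For reference, `tensorboard.sh` presumably amounts to a one-line wrapper like the sketch below; only the `tensorboard --logdir report_models` command is confirmed by this README.

```
#!/bin/sh
# Launch TensorBoard on the directory holding saved models.
# Replace report_models with your own model directory.
tensorboard --logdir report_models
```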
All models in `utils/models` are implemented in a basic version; more powerful model structures should be able to improve the performance. So enjoy your work ^_^