9/17/24, 8:56 PM GitHub - ambrzeski/kaggle-rsna-2019 (public repository)
Kaggle RSNA 2019 - 8th place solution
Project is authored by BrainScan.ai team (Adam Brzeski, Maciej Budys, Tomasz Gilewicz,
Szymon Papierowski) and Dmytro Poplavskiy.
Solution overview: link
Reproduction instructions:
* Data conversion
* Dataset files
* Training models 1/2 (Dmytro's models)
* Training models 2/2 (BrainScan models)
* Generating predictions for challenge data 1/2 (Dmytro's models)
* Generating predictions for challenge data 2/2 (BrainScan models)
* Second level model and generating final predictions
* Testing on new data
Data conversion
The project source operates on a converted form of the challenge data, with a different
directory structure and image format. The directory structure generated by the instructions
below contains symlinks to the original data, so the original data must stay in its original
location after the conversion is complete.
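Because the converted tree only links to the original DICOM files, moving or deleting the originals leaves dangling links. The following is a minimal sketch of that behaviour using only the Python standard library. The file names are hypothetical and the code is not part of the project's scripts.

```python
import os
import tempfile


def link_is_valid(link_path):
    """Return True if link_path is a symlink whose target still exists."""
    # os.path.exists follows symlinks, so it is False for a dangling link.
    return os.path.islink(link_path) and os.path.exists(link_path)


def demo():
    """Create a file and a symlink to it, then move the file away."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.dcm")    # hypothetical source file
        link = os.path.join(tmp, "converted_link.dcm")  # hypothetical link in the new tree
        with open(original, "wb") as f:
            f.write(b"dicom bytes")
        os.symlink(original, link)
        before = link_is_valid(link)
        # The link stores only a path, so moving the original breaks it.
        os.rename(original, os.path.join(tmp, "moved.dcm"))
        after = link_is_valid(link)
        return before, after
```

On a system that supports symlinks, `demo()` reports a valid link before the move and a broken one afterwards.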
1. Download the Kaggle data, including: stage_1_train_images, stage_1_test_images,
stage_2_test_images, stage_1_train.csv and stage_2_train.csv.
2. Open project config file located at rsna19/config.py:
class config:
    train_dir = '/kolos/storage/ct/data/rsna/stage_1_train_images'
    test_dir = '/kolos/storage/ct/data/rsna/stage_1_test_images'
    test2_dir = '/kolos/storage/ct/data/rsna/stage_2_test_images'
    labels_path = "/kolos/storage/ct/data/rsna/stage_1_train.csv"
    data_root = "/kolos/m2/ct/data/rsna/"
    # Used for Dmytro's models
    checkpoints_dir = "../output/checkpoints"
    tensorboard_dir = "../output/tensorboard"
    oof_dir = "../output/oof"
    # Used for Brainscan models
    model_outdir = '/kolos/m2/ct/models/classification/rsna/'
Modify the train_dir, test_dir, test2_dir and labels_path variables so that they point to
the corresponding data paths on your drive. Also set the data_root variable to the output
directory where the converted data should be saved. Finally, set the model output paths to
the desired locations; they are used later during training.
3. Next, run the conversion scripts listed below to perform the full conversion. The
scripts can be run right away if you open the project in PyCharm. Otherwise you may
need to add the project package to PYTHONPATH:
export PYTHONPATH="$PYTHONPATH:/{path}/{to}/kaggle-rsna-2019/rsna19/"
Finally, run the scripts below (please keep the order):
$ python rsna19/data/scripts/create_dataframe.py
# Create new directory structure and symlinks to original dicoms
$ python rsna19/data/scripts/create_symlinks.py
# Convert dicom images (slices) to npy arrays and png images
$ python rsna19/data/scripts/convert_dataset.py
$ python rsna19/data/scripts/prepare_3d_data.py
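Before launching the conversion, it can help to confirm that the input paths set in step 2 exist. This is a minimal sketch, assuming the config is a class with the attribute names shown in rsna19/config.py; the helper `missing_input_paths` is hypothetical and not part of the project.

```python
import os


def missing_input_paths(cfg, names=("train_dir", "test_dir", "test2_dir", "labels_path")):
    """Return the names of config attributes whose paths do not exist on disk.

    `cfg` is any object exposing the listed attributes. A missing attribute
    is treated like a missing path.
    """
    return [name for name in names if not os.path.exists(getattr(cfg, name, ""))]
```

An empty list means every input path was found. Output locations such as data_root are deliberately not checked, since the conversion may create them.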
As a result of the conversion, for each examination a set of subdirs will be created:
* /dicom - original dicom files
* /png - png images with drawn labels for easier viewing and browsing
* /npy - slices saved as numpy arrays for faster loading during training (>3x faster)
* /3d - transformed slices that are used for actual trainings; transforms include fixing scan
gantry tilt and a 400x400 crop in x and y dimensions around the volume center of mass
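The crop around the center of mass can be illustrated in two dimensions. This is a simplified sketch in pure Python, not the project's prepare_3d_data.py: it ignores the gantry tilt correction, works on a single 2D image given as a list of lists, and assumes the image is at least as large as the crop. Both function names are hypothetical.

```python
def center_of_mass_2d(image):
    """Intensity-weighted centre (row, col) of a 2D list of non-negative values."""
    total = sum(sum(row) for row in image)
    if total == 0:
        # No signal at all: fall back to the geometric centre.
        return (len(image) - 1) / 2, (len(image[0]) - 1) / 2
    row_c = sum(r * sum(row) for r, row in enumerate(image)) / total
    col_c = sum(c * v for row in image for c, v in enumerate(row)) / total
    return row_c, col_c


def crop_window(center, size, extent):
    """Start/stop indices of a `size`-wide window around `center`.

    The window is shifted to stay inside [0, extent); this assumes
    extent >= size (a smaller image would need padding instead).
    """
    start = int(round(center - size / 2))
    start = max(0, min(start, extent - size))
    return start, start + size
```

For a 512x512 slice, `crop_window(row_c, 400, 512)` and `crop_window(col_c, 400, 512)` would give the row and column ranges of the 400x400 crop.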
Dataset files
In the rsna19/data/csv directory you can find a set of .csv dataset files defining train/val/test
splits, cross-validation splits and sample labels. The dataset files were generated using the
rsna19/data/notebooks/generate_folds.ipynb notebook. The most significant dataset files are:
* 5fold.csv - stage 1 training data, split into 5 folds
* 5folds-rev3.csv - same as above, but labels for 196 are modified based on our manual
annotation of scans
* 5fold-test.csv - stage 2 training data (stage 1 extended with stage 1 test data), split into 5
folds - unfortunately, as it turned out later, patient leaks between folds occurred in this
split
* 5folds-test-rev3.csv - same as above, but labels for 196 are modified based on our
manual annotation of scans
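A patient leak of the kind mentioned above means that scans of one patient end up in more than one fold. The sketch below shows one way such a leak could be detected. It assumes each sample can be reduced to a (patient_id, fold) pair; these field names are illustrative and are not taken from the project's csv files.

```python
from collections import defaultdict


def find_patient_leaks(rows):
    """Map each patient seen in more than one fold to the sorted list of those folds.

    `rows` is an iterable of (patient_id, fold) pairs.
    """
    folds_by_patient = defaultdict(set)
    for patient_id, fold in rows:
        folds_by_patient[patient_id].add(fold)
    return {p: sorted(f) for p, f in folds_by_patient.items() if len(f) > 1}
```

An empty result means every patient is confined to a single fold.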
Training models 1/2 (Dmytro’s models)
For training stage 1 models run the following commands for folds 0-4:
$ python models/clf2D/train.py train --model resnet18_400 --fold 0
$ python models/clf2D/train.py train --model resnet34_400_5_planes_combine_last_var_dr0 --fold 0
$ python models/clf2D/train_segmentation.py train --model resnet18_384_5_planes_bn_f8 --fold 0
$ python models/clf3D/train_3d.py train --model dpn68_384_5_planes_combine_last --fold 0
$ python models/clf2D/train.py train --model airnet50_384 --fold 0 --apex
For stage 2 models run the following commands for folds 0-4:
$ python models/clf2D/train.py train --model se_preresnext26b_400 --fold 0 --apex
$ python models/clf2D/train.py train --model resnext50_400 --fold 0 --apex
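Each command above is written for fold 0 and has to be repeated for folds 0-4. The following sketch builds the per-fold command lines for one model without running them. The helper `train_commands` is hypothetical; the script path, the `train` subcommand and the `--model`, `--fold` and `--apex` flags are the ones used in the commands above.

```python
import shlex


def train_commands(script, model, folds=range(5), extra=()):
    """Build one training command line per fold (folds 0-4 by default)."""
    return [
        shlex.join(["python", script, "train", "--model", model, "--fold", str(fold), *extra])
        for fold in folds
    ]


# Print the five commands for one of the stage 1 models.
for line in train_commands("models/clf2D/train.py", "resnet18_400"):
    print(line)
```

The printed lines can be pasted into a shell or passed to a job scheduler; pass `extra=("--apex",)` for the models that are trained with that flag.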
Training models 2/2 (BrainScan models)
First you need to train baseline models, which are used for initializing weights in the final
trainings. Trainings are run with the rsna19/models/clf2Dc/train.py script, with the appropriate
config imported at the top of the file in place of the default 'clf2Dc'. For example, to
train the 'clf2Dc_resnet34_3c' config, change:
from rsna19.configs.clf2Dc import Config
to:
from rsna19.configs.clf2Dc_resnet34_3c import Config
Each training must also be repeated 5 times with different 'val_folds' attributes (from [0] to
[4]), modified in the appropriate config files. You can also change the GPU used for training
with the 'gpu' attribute.
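Editing the import by hand for every config is how the project is set up. As an alternative, the config could be selected by name with `importlib`. This is only a sketch of that idea, not something train.py does: it assumes each config module under rsna19.configs exposes a class named Config (as in the import lines above), and both helper functions are hypothetical.

```python
import importlib


def config_module_path(config_name, package="rsna19.configs"):
    """Dotted module path for a config file name given without '.py'."""
    return f"{package}.{config_name}"


def load_config(config_name):
    """Import the config module and return its Config class.

    Requires the rsna19 package to be importable (see the PYTHONPATH note).
    """
    module = importlib.import_module(config_module_path(config_name))
    return module.Config
```

With such a helper, a training script could take the config name as a command-line argument instead of requiring a source edit per run.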
Configs to train for baseline models:
* clf2Dc_resnet34_3c.py
* clf2Dc_resnet50_3c_384.py
Then final models that we trained for stage 1 can be trained using configs:
* clf2Dc_resnet34_3x3.py
* clf2Dc_resnet50_7c_400.py
In stage 2 we trained two additional models (make sure to set 5fold-test.csv for both
'train_dataset_file' and 'val_dataset_file' in config files):
* clf2Dc_resnet34_3x3_2.py
* clf2Dc_resnet34_3x3_5_slices.py
Generating predictions for challenge data 1/2 (Dmytro's models)
Run the following set of commands for folds 0-4:
# Stage 1 models out-of-fold predictions
$ python models/clf2D/predict.py predict_oof --model resnet18_400 --epoch 6 --fold 0 --mode all
$ python models/clf2D/predict.py predict_oof --model resnet34_400_5_planes_combine_last_var_dr0 --epoch 7 --fold 0 --mode all
$ python models/clf3D/predict.py predict_oof --model dpn68_384_5_planes_combine_last --epoch 72 --fold 0 --mode all
$ python models/clf2D/predict.py predict_oof --model resnet18_384_5_planes_bn_f8 --epoch 6 --fold 0 --mode all
$ python models/clf2D/predict.py predict_oof --model airnet50_384 --epoch 6 --fold 0 --mode all
# Stage 1 models test predictions
$ python models/clf2D/predict.py predict_test --model resnet18_400 --epoch 6 --fold 0 --mode all
$ python models/clf2D/predict.py predict_test --model resnet34_400_5_planes_combine_last_var_dr0 --epoch 7 --fold 0 --mode all
$ python models/clf3D/predict.py predict_test --model dpn68_384_5_planes_combine_last --epoch 72 --fold 0 --mode all
$ python models/clf2D/predict.py predict_test --model resnet18_384_5_planes_bn_f8 --epoch 6 --fold 0 --mode all
$ python models/clf2D/predict.py predict_test --model airnet50_384 --epoch 6 --fold 0 --mode all
# Stage 2 models out-of-fold predictions
$ python models/clf2D/predict.py predict_oof --model se_preresnext26b_400 --epoch 6 --fold 0 --mode all
$ python models/clf2D/predict.py predict_oof --model resnext50_400 --epoch 6 --fold 0 --mode all
# Stage 2 models test predictions
$ python models/clf2D/predict.py predict_test --model se_preresnext26b_400 --epoch 6 --fold 0 --mode all
$ python models/clf2D/predict.py predict_test --model resnext50_400 --epoch 6 --fold 0 --mode all
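The prediction commands follow one pattern: every model is run with its own epoch, for both `predict_oof` and `predict_test`, on folds 0-4. The sketch below enumerates that full set of command lines without executing anything. The model names, scripts and epochs are as read from the listing above; the helper `predict_commands` and the two tables are hypothetical conveniences.

```python
import shlex

# (script, model, epoch) triples as read from the command listing.
STAGE1 = [
    ("models/clf2D/predict.py", "resnet18_400", 6),
    ("models/clf2D/predict.py", "resnet34_400_5_planes_combine_last_var_dr0", 7),
    ("models/clf3D/predict.py", "dpn68_384_5_planes_combine_last", 72),
    ("models/clf2D/predict.py", "resnet18_384_5_planes_bn_f8", 6),
    ("models/clf2D/predict.py", "airnet50_384", 6),
]
STAGE2 = [
    ("models/clf2D/predict.py", "se_preresnext26b_400", 6),
    ("models/clf2D/predict.py", "resnext50_400", 6),
]


def predict_commands(models, folds=range(5)):
    """Build out-of-fold and test prediction commands for every model and fold."""
    commands = []
    for action in ("predict_oof", "predict_test"):
        for script, model, epoch in models:
            for fold in folds:
                commands.append(shlex.join([
                    "python", script, action,
                    "--model", model,
                    "--epoch", str(epoch),
                    "--fold", str(fold),
                    "--mode", "all",
                ]))
    return commands
```

Calling `predict_commands(STAGE1 + STAGE2)` yields 70 command lines: 7 models, 2 actions and 5 folds.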
Generating predictions for challenge data 2/2 (BrainScan models)
Calculate model predictions including TTAs by running:
$ python rsna19/models/clf2Dc/predict.py
Second level model and generating final predictions
1. Make sure that all models are copied to common directory, so that the directory structure
matches the following form:
{model_outdir}/{model_name}/{fold}/predictions/{predictions.csv}
Set path to this directory in rsna19/configs/base_config.Config.model_outdir.
2. Generate the dataset for the second level model by running the following script for each
of 5 folds (fold name can be specified in rsna19/configs/second_level.py):
$ python rsna19/models/second_level/dataset2.py
3. In rsna19/configs/second_level.py:
* set models_root to path containing all trained models;
* set cache_dir to some path where data used by second level model will be saved.
4. Run rsna19/models/second_level/dataset2.py for 5 folds (set appropriate fold in
rsna19/configs/second_level.py).
5. Run rsna19/models/second_level/second_level.ipynb notebook to train L2 model and
save csv with predictions on test set.
6. Run rsna19/data/generate_submission.py script to generate submission from the
obtained csv.
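Step 1 relies on every model having its predictions in the expected place, so a quick check before building the second level dataset can save a failed run. The sketch below lists missing prediction files. It assumes fold directories are named "0" to "4" and that the file is called predictions.csv; both are assumptions about how the placeholders in the path pattern are filled, and `missing_predictions` is a hypothetical helper.

```python
from pathlib import Path


def missing_predictions(model_outdir, model_names, folds=range(5), filename="predictions.csv"):
    """List expected prediction files that are absent under model_outdir.

    Expected layout: {model_outdir}/{model_name}/{fold}/predictions/{filename}
    """
    root = Path(model_outdir)
    expected = (
        root / name / str(fold) / "predictions" / filename
        for name in model_names
        for fold in folds
    )
    return [path for path in expected if not path.is_file()]
```

An empty list means all models and folds are in place; otherwise the returned paths show exactly which predictions still need to be generated or copied.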
Testing on new data
Unfortunately, as of now we don't provide a script to generate predictions on new data
directly. However, if you can save the new data in the same format as challenge data, you can
use the instructions above to preprocess the data and run the inference.
Specifically, you need to take the following steps:
* set the path to the new test data directory in 'test_dir' in 'rsna19/config.py'
* continue with data conversion steps
* generate new test csv file using rsna19/data/notebooks/generate_folds.ipynb notebook
* update test csv file path in appropriate prediction scripts
* run the predictions steps using the new test csv file
Hardware & OS
Server 1:
* Intel Core i7-6850K CPU @ 3.60GHz, 6 cores
* 64GB RAM
* 4x Nvidia Titan Xp
Server 2:
* Intel Core i7-6850K CPU @ 3.60GHz, 6 cores
* 64GB RAM
* 4x Nvidia Titan X
Server 3:
* Intel Core i5-3570 CPU @ 3.40GHz, 4 cores
* 32GB RAM
* 2x Nvidia Titan X
Server 4:
* 2x Xeon E5-2667 v2
* 384 GB RAM
* 4x 1080 Ti
Server 5:
* AMD TR 1950x
* 128 GB RAM
* 2x 2080 Ti, 1x 1080 Ti
OS:
* Brainscan.ai team: Ubuntu 16.04
2 Menten Danlanieline hese