This repo contains the code for "Model Agnostic Interpretability for Multiple Instance Learning".
Paper: https://arxiv.org/abs/2201.11701
Executable scripts can be found in the scripts directory.
Source code can be found in the src directory.
One copy of each trained model can be found in the models directory.
Outputs from experiments can be found in the out directory.
Results can be found in the results directory.
We use five custom dataset implementations:
mnist_bags.py, crc_dataset.py, sival_dataset.py, musk_dataset.py and tef_dataset.py;
all of them inherit from mil_dataset.py.
Rather than returning a single instance, each dataset returns a bag of instances together with a single bag-level label.
Dataset sources:
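To illustrate what "a bag of instances and a single label" means in practice, here is a minimal sketch of a bag-level dataset. It assumes a plain Python class following the map-style `__len__`/`__getitem__` protocol that PyTorch datasets use; the names `MilDataset`, `ThresholdBagsDataset` and the threshold labelling rule are hypothetical and do not reproduce the actual classes in mil_dataset.py.

```python
from typing import List, Sequence, Tuple

Instance = Sequence[float]
Bag = List[Instance]


class MilDataset:
    """Hypothetical base class: each item is a whole bag plus one bag-level label."""

    def __init__(self, bags: Sequence[Sequence[Instance]], labels: Sequence[int]):
        if len(bags) != len(labels):
            raise ValueError("expected exactly one label per bag")
        self.bags: List[Bag] = [list(bag) for bag in bags]
        self.labels: List[int] = list(labels)

    def __len__(self) -> int:
        return len(self.bags)

    def __getitem__(self, idx: int) -> Tuple[Bag, int]:
        # Return every instance in the bag (bags may differ in size) and a single label
        return self.bags[idx], self.labels[idx]


class ThresholdBagsDataset(MilDataset):
    """Hypothetical subclass using the standard MIL assumption: a bag is positive
    if at least one instance is positive (here, first feature above a threshold)."""

    def __init__(self, bags: Sequence[Sequence[Instance]], threshold: float = 0.5):
        labels = [int(any(inst[0] > threshold for inst in bag)) for bag in bags]
        super().__init__(bags, labels)


ds = ThresholdBagsDataset([[[0.1, 0.2], [0.9, 0.0]], [[0.2, 0.3]]])
bag, label = ds[0]
print(len(ds), len(bag), label)  # 2 2 1
```

Because bags vary in size, batching them with a PyTorch `DataLoader` typically needs `batch_size=1` or a custom `collate_fn`; check how the repo's own loaders handle this.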
- SIVAL: http://pages.cs.wisc.edu/~bsettles/data/
- MNIST: https://pytorch.org/vision/stable/datasets.html#mnist
- CRC: https://warwick.ac.uk/fac/cross_fac/tia/data/crchistolabelednucleihe/
- Musk: https://archive.ics.uci.edu/ml/datasets/Musk+%28Version+2%29
- Tiger, Elephant and Fox: http://www.cs.columbia.edu/~andrews/mil/datasets.html
The models are implemented in src/model.
We provide trained versions of these models in the models directory.
The training scripts are in scripts/train.
These can be used to train either a single model or multiple models.
The models were tuned using the scripts in scripts/tune.
The interpretability functionality can be found in the src/interpretability directory.
The methods are implemented in src/interpretability/instance_attribution.
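As a rough illustration of what "model agnostic" instance attribution can look like, the sketch below scores instances using only a function that maps a bag to a prediction, so any MIL model can be plugged in. It shows two simple perturbation-based scores (single-instance and leave-one-out); the function names and `toy_max_model` are hypothetical, and this is not a copy of the methods in src/interpretability/instance_attribution.

```python
from typing import Callable, List, Sequence

Instance = Sequence[float]
Bag = Sequence[Instance]
BagModel = Callable[[Bag], float]  # any model mapping a bag to a score


def single_instance_scores(model: BagModel, bag: Bag) -> List[float]:
    # Score each instance by the prediction on a bag that contains only that instance
    return [model([inst]) for inst in bag]


def leave_one_out_scores(model: BagModel, bag: Bag) -> List[float]:
    # Score each instance by how much the bag prediction drops when it is removed
    full_pred = model(bag)
    scores = []
    for i in range(len(bag)):
        reduced = [inst for j, inst in enumerate(bag) if j != i]
        scores.append(full_pred - model(reduced))
    return scores


def toy_max_model(bag: Bag) -> float:
    # Hypothetical stand-in model: max of the first feature (0.0 for an empty bag)
    return max((inst[0] for inst in bag), default=0.0)


bag = [[0.1], [0.9], [0.3]]
print(single_instance_scores(toy_max_model, bag))  # [0.1, 0.9, 0.3]
print(leave_one_out_scores(toy_max_model, bag))
```

The two scores fail in different ways: single-instance scoring ignores interactions between instances, while leave-one-out assigns zero to redundant instances (for a bag of two identical positive instances under `toy_max_model`, removing either one leaves the prediction unchanged).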
Our experiment scripts can be found in scripts/experiments.
These produce the sample size figures found in the paper.
The output scripts can be found in scripts/out.
These produce the interpretability outputs found in the paper.
The milli_weights_plot file produces the plots for the MILLI curve and integral.
All paths are relative to the root of the repository, so scripts must be run from the repository root.
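If a script is launched from the wrong directory, relative paths such as models/ or out/ silently point elsewhere. A minimal guard a script could call before doing any work is sketched below; it assumes the repository root can be recognised by containing both the src and scripts directories, and the helper names are hypothetical rather than part of this repo.

```python
import os


def is_repo_root(path: str = ".") -> bool:
    # Hypothetical heuristic: treat a directory as the repo root if it holds both src/ and scripts/
    return all(os.path.isdir(os.path.join(path, d)) for d in ("src", "scripts"))


def require_repo_root(path: str = ".") -> None:
    # Fail fast instead of reading or writing files relative to the wrong directory
    if not is_repo_root(path):
        raise RuntimeError(
            "Run this script from the repository root so relative paths "
            "such as models/ and out/ resolve correctly"
        )
```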
Required libraries can be found in requirements.txt.