Hi, good to see you here! 👋
This is code for "Active Testing: Sample-Efficient Model Evaluation".
Please cite our paper if you find this helpful:
```
@article{kossen2021active,
    title={{A}ctive {T}esting: {S}ample-{E}fficient {M}odel {E}valuation},
    author={Kossen, Jannik and Farquhar, Sebastian and Gal, Yarin and Rainforth, Tom},
    journal={arXiv:2103.05331},
    year={2021}
}
```
The `requirements.txt` can be used to set up a Python environment for this codebase.
You can do this, for example, with conda:

```
conda create -n isactive python=3.8
conda activate isactive
pip install -r requirements.txt
```
- To reproduce a figure of the paper, first run the appropriate experiments:

  ```
  sh reproduce/experiments/figure-X.sh
  ```

- Then create the plots with the Jupyter notebook at `notebooks/plots_paper.ipynb`.
  (The notebook lets you conveniently select which plots to recreate.)
- This should put the plots into `notebooks/plots/`.
- In the above, replace `X` by
  - `123` for Figures 1, 2, 3
  - `4` for Figure 4
  - `5` for Figure 5
  - `6` for Figure 6
  - `7` for Figure 7
## Other notes
- Synthetic data experiments do not require GPUs and should run on pretty much all recent hardware.
- All other plots, realistically speaking, require GPUs.
- We are also happy to share a 4 GB file with results from all experiments presented in the paper.
- You may want to produce plots 7 and 8 for experiment setups other than the ones in the paper, i.e. for experiments you have already computed.
- Some experiments, e.g. those for Figures 4 or 6, may run a really long time on a single GPU. It may be good to
  - execute the scripts in the sh-files in parallel on multiple GPUs,
  - start multiple runs in parallel and then combine experiments (see below), or
  - end the runs early / decrease the number of total runs (this can be very reasonable; look at the config files in `conf/paper` to modify this property, e.g. via a command-line override as sketched below).
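
Hydra supports command-line overrides of config values, so, assuming `n_runs` is exposed at the top level of the experiment config (check the configs in `conf/paper` for the actual key), reducing the number of runs could plausibly look like:

```
# Hypothetical override -- adjust to match the actual config layout.
python main.py n_runs=10
```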
- If you want to understand the code, we give a good strategy for approaching it below. (Also, start with the synthetic data experiments: they have less complex code!)
- `main.py` is the main entry point into this codebase.
  - It executes a total of `n_runs` active testing experiments for a fixed setup (see the sketch below).
  - Each experiment:
    - Trains (or loads) one main model.
    - This model can then be evaluated with a variety of acquisition strategies.
    - Risk estimates are then computed for points/weights from all acquisition strategies for all risk estimators.
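
To make that structure concrete, here is a minimal, runnable sketch of this control flow. Everything in it (function names, strategy and estimator labels) is an illustrative stand-in, not the repo's actual API.

```python
# Illustrative sketch of main.py's control flow -- all names are stand-ins.

def train_or_load_model(run):
    """Stand-in for training (or loading) the main model of a run."""
    return f"model-{run}"

def acquire(model, strategy):
    """Stand-in: select test points via `strategy`, return (loss, weight) pairs."""
    return [(0.5, 1.0), (0.3, 0.8)]

def estimate_risk(samples, estimator):
    """Stand-in: compute a weighted risk estimate from the acquired samples."""
    return sum(loss * weight for loss, weight in samples) / len(samples)

n_runs = 3
strategies = ["random", "surrogate-based"]  # hypothetical labels
estimators = ["naive", "unbiased"]          # hypothetical labels

for run in range(n_runs):             # repeat the experiment n_runs times
    model = train_or_load_model(run)  # one main model per run
    for strategy in strategies:       # every acquisition strategy
        samples = acquire(model, strategy)
        for estimator in estimators:  # every risk estimator
            print(run, strategy, estimator, estimate_risk(samples, estimator))
```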
- This repository uses `Hydra` to manage configs (a sketch of a typical Hydra entry point follows below).
  - Look at `conf/config.yaml` or one of the experiments in `conf/...` for default configs and hyperparameters.
  - Experiments are autologged and results saved to `./output/`.
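
For orientation, a Hydra entry point typically looks like the sketch below. This shows the general Hydra pattern under the assumption that `main.py` follows it; it is not a copy of the actual file.

```python
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # Hydra composes conf/config.yaml with any command-line overrides
    # and passes the merged config in as `cfg`.
    print(cfg)

if __name__ == "__main__":
    main()
```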
- See `notebooks/eplore_experiment.ipynb` for some example code on how to evaluate custom experiments.
  - The evaluations use `activetesting.visualize.Visualiser`, which implements visualisation methods.
  - Give it a `path` to an experiment in `output/path/to/experiment` and explore the methods.
  - If you want to combine data from multiple runs, give it a list of paths (see the example below).
  - I prefer to load this in Jupyter Notebooks, but hey, everybody's different.
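
For instance, a notebook session might look like the following; the paths are placeholders for runs in your own `output/` directory.

```python
from activetesting.visualize import Visualiser

# Single experiment: pass the path to one run.
vis = Visualiser('output/path/to/experiment')

# Multiple runs: pass a list of paths to combine their data.
vis = Visualiser(['output/run-a', 'output/run-b'])
```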
## A guide to the code
- `main.py` runs repeated experiments and orchestrates the whole shebang.
  - It iterates through all `n_runs` and `acquisition strategies`.
- `experiment.py` handles a single experiment.
  - It combines the `model`, `dataset`, `acquisition strategy`, and `risk estimators`.
- `datasets.py`, `aquisition.py`, `loss.py`, `risk_estimators.py` all contain exactly what you would expect!
- `hoover.py` is a logging module.
- `models/` contains all models, scikit-learn and PyTorch.
  - In `sk2torch.py` we have some code that wraps torch models in a way that lets them be used as scikit-learn models from the outside (a toy illustration follows below).
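
To illustrate that idea, here is a toy sketch (not the repo's actual `sk2torch.py`) of wrapping a torch forward pass behind scikit-learn's `predict`/`predict_proba` interface:

```python
import numpy as np
import torch

class TorchAsSklearn:
    """Toy wrapper exposing a torch classifier through an sklearn-style API."""

    def __init__(self, torch_model: torch.nn.Module):
        self.model = torch_model

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        # Accept numpy input like an sklearn classifier, run the torch
        # forward pass, and return class probabilities as numpy.
        with torch.no_grad():
            logits = self.model(torch.as_tensor(X, dtype=torch.float32))
        return torch.softmax(logits, dim=-1).numpy()

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.predict_proba(X).argmax(axis=-1)
```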
Thanks for stopping by!
If you find anything wrong with the code, please contact us.
We are happy to answer any questions related to the code and project.