This repo contains the data and tools needed to process data for Beanstalk. The dataset can be found on Zenodo here. See the artifact overview for usage instructions and the dataset explainer for details on the dataset structure, if necessary.
Python with the JAX, matplotlib, and tqdm packages is required.
A GPU backend for JAX is not required, but it is highly recommended so that the processing scripts run quickly. If you have these already, you can skip the rest of this section.
The environment is packaged as a Docker build for consistency:
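To decide whether the Docker setup can be skipped, the following minimal sketch checks that the three packages named above are importable and reports which backend JAX selects. The helper names `missing_packages` and `jax_backend` are illustrative and not part of this repo.

```python
import importlib.util

# Package names taken from the requirements listed above.
REQUIRED = ("jax", "matplotlib", "tqdm")


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def jax_backend():
    """Return JAX's default backend name (e.g. 'cpu' or 'gpu'), or None if JAX is absent."""
    if importlib.util.find_spec("jax") is None:
        return None
    import jax  # imported lazily so the check still works without JAX installed

    return jax.default_backend()


if __name__ == "__main__":
    print("missing packages:", missing_packages() or "none")
    print("JAX backend:", jax_backend())
```

If the reported backend is `cpu`, the scripts should still run, only more slowly.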
# Defaults to GPU enabled systems.
# Use the 'Dockerfile.cpu' build file if using CPU fallback
docker build -t beanstalk .
If using the Docker environment, run this in the root of this repo directory before proceeding with the following steps:
# If using CPU fallback, remove '--gpus all'
docker run -it --gpus all --rm -v "`pwd`:/home/evaluator" beanstalk
Extract {data,summary,simulations}.zip packaged in Zenodo/Github releases to the root directory of this repo.
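As a quick sanity check after extraction, this sketch lists which of the expected directories are still missing from the repo root. It assumes each archive unpacks into a directory of the same name (data, summary, simulations); `missing_dirs` is a hypothetical helper, not a script shipped with the repo.

```python
from pathlib import Path

# Assumed layout: each .zip extracts to a same-named directory in the repo root.
EXPECTED_DIRS = ("data", "summary", "simulations")


def missing_dirs(repo_root=".", expected=EXPECTED_DIRS):
    """Return the expected directory names that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    print("missing directories:", ", ".join(missing) if missing else "none")
```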
To generate figures, run ./gen_figures.sh (any runtime warnings can be ignored). The output should mirror figures.zip.
The directories data, summary, and simulations extracted above can be reproduced from the raw cluster data by following these steps:
- Extract Raw Data from data-raw.zip to the root directory of the repo.
- Data: Run ./gen_data.sh (approx. 2 min to run).
- Summary: Run ./summarize.sh (approx. 2 min to run on GPU).
- Simulations: Run ./run_simulations.sh (approx. 20 min to run 10000 replicates on GPU). If necessary, the number of replicates can be set with the first argument to the script.
- Figures: Run ./gen_figures.sh (as in the previous section).
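The script steps above can also be chained from Python. The sketch below only assembles and (optionally) runs the four scripts in the listed order, passing the replicate count as the first argument to run_simulations.sh as described. `pipeline_commands` and `run_pipeline` are hypothetical names, and the raw data is assumed to be extracted already.

```python
import subprocess


def pipeline_commands(replicates=None):
    """Return the reproduction steps as argument lists, in the order listed above."""
    simulate = ["./run_simulations.sh"]
    if replicates is not None:
        # Per the steps above, the replicate count is the script's first argument.
        simulate.append(str(replicates))
    return [
        ["./gen_data.sh"],
        ["./summarize.sh"],
        simulate,
        ["./gen_figures.sh"],
    ]


def run_pipeline(replicates=None, dry_run=True):
    """Print each step; execute it too when dry_run is False."""
    for cmd in pipeline_commands(replicates):
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True aborts the pipeline at the first failing script.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(replicates=1000)  # dry run: prints the commands only
```

Calling `run_pipeline(replicates=1000, dry_run=False)` from the repo root would execute the steps with fewer replicates, which may be useful on a CPU backend.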
NOTES:
- GPU support for JAX is recommended; CPU backends can be used as an alternative, but may take significantly longer to execute.
- The manage.py script manages all data scripts (see -h option for more information on what parameters can be configured for experiments).