This repository contains the code to reproduce the analyses and figures from our publication on using Gaussian process models to infer sequence-function relationships.
To run our code on your own data, please refer to our EpiK repository.
- Zhou J., Martí-Gómez C., Petti S., McCandlish D.M. (2025). Learning sequence-function relationships with scalable, interpretable Gaussian processes. bioRxiv. doi: 10.1101/2025.08.15.670613
The repository is organized into the following folders:
- data: Contains input data from external sources required for all calculations.
- scripts: Contains scripts necessary to reproduce the analyses in the paper.
  - cluster: Includes bash scripts for submitting GPU jobs to our cluster.
  - figures: Contains Python scripts for generating all figures.
  - process: Includes Python scripts for data preprocessing and for processing files in the output folder to create figures.
  - scratch: Contains older scripts used during exploratory analyses. These scripts may not align with the current repository structure and are retained for record-keeping purposes.
- results: Contains result files used for figure generation.
  - output: Stores intermediate output files generated during model fitting.
  - figures: Contains all figure panels created using the scripts.
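A quick way to confirm that a local checkout matches the layout described above is to list which expected folders are missing. The sketch below assumes that `cluster`, `figures`, `process`, and `scratch` sit under `scripts`, and that `output` and `figures` sit under `results`. The name `missing_folders` is illustrative and not part of the repository.

```python
from pathlib import Path

# Folder layout as described in this README (the nesting is an assumption).
EXPECTED_FOLDERS = [
    "data",
    "scripts/cluster",
    "scripts/figures",
    "scripts/process",
    "scripts/scratch",
    "results/output",
    "results/figures",
]


def missing_folders(repo_root, expected=EXPECTED_FOLDERS):
    """Return the expected folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the root of the cloned repository.
    for name in missing_folders("."):
        print(f"missing: {name}")
```

An empty result means every folder in the assumed layout is present.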
Create a new Conda environment:
conda create -n epik python=3.8

Activate the environment and install dependencies:
conda activate epik
git clone https://github.com/cmarti/epik.git
cd epik
python setup.py install
cd ..
git clone https://github.com/cmarti/gpmap-tools.git
cd gpmap-tools
python setup.py install
cd ..

Download the repository:
git clone git@github.com:cmarti/epik_analyses.git

This repository provides all the code and data required to reproduce the figures from our study. The scripts for figure generation are located in the scripts/figures folder. To generate the figures, run the make_figures.sh script:
bash make_figures.sh

The make_analysis.sh script outlines the steps to run various scripts for reproducing the analyses. Note that some jobs or scripts depend on the completion of other jobs.
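Because some steps depend on the output of earlier ones, it can help to run them strictly in order and stop at the first failure. The following is a minimal sketch of that idea, not the contents of make_analysis.sh. The step commands are placeholders and `run_in_order` is a hypothetical helper.

```python
import subprocess
import sys


def run_in_order(commands):
    """Run commands one at a time; stop at the first non-zero exit code.

    Returns the number of commands that completed successfully.
    """
    completed = 0
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"step failed: {cmd}", file=sys.stderr)
            break
        completed += 1
    return completed


if __name__ == "__main__":
    # Placeholder steps: replace with the commands listed in make_analysis.sh.
    steps = [
        [sys.executable, "-c", "print('step 1: preprocess')"],
        [sys.executable, "-c", "print('step 2: fit model')"],
    ]
    run_in_order(steps)
```

When jobs are submitted to a cluster scheduler instead of being run locally, the same ordering is usually better expressed through the scheduler's own job-dependency mechanism.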
NOTE: We use EpiK on V100 GPUs in the Cold Spring Harbor Laboratory high-performance computing cluster. The provided scripts in scripts/cluster are tailored to our system and may require adaptation for use in other computing environments.
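Before adapting the cluster scripts to another system, it may be worth checking whether an NVIDIA GPU is visible at all. The sketch below only looks for the `nvidia-smi` tool and asks it to list devices. It assumes the NVIDIA driver utilities are installed and does not call any EpiK function. `visible_gpus` is a hypothetical helper name.

```python
import shutil
import subprocess


def visible_gpus():
    """Return the GPU description lines reported by `nvidia-smi -L`.

    Returns an empty list if nvidia-smi is not installed or fails.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus:
        print("\n".join(gpus))
    else:
        print("No NVIDIA GPU detected via nvidia-smi.")
```

An empty list does not prove that no GPU exists, only that `nvidia-smi` could not report one from the current shell.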