Learning sequence-function relationships with scalable, interpretable Gaussian processes

This repository contains the code to reproduce the analyses and figures from our publication on using Gaussian process models to infer sequence-function relationships.

For running our code on your own data, please refer to our EpiK repository.

  • Zhou J., Martí-Gómez C., Petti S., McCandlish D.M. (2025). Learning sequence-function relationships with scalable, interpretable Gaussian processes. bioRxiv. doi: 10.1101/2025.08.15.670613

Repository Structure

The repository is organized into the following folders:

  • data: Contains input data from external sources required for all calculations.
  • scripts: Contains scripts necessary to reproduce the analyses in the paper.
    • cluster: Includes bash scripts for submitting GPU jobs to our cluster.
    • figures: Contains Python scripts for generating all figures.
    • process: Includes Python scripts that preprocess the input data and process files in the output folder for figure generation.
    • scratch: Contains older scripts used during exploratory analyses. These scripts may not align with the current repository structure and are retained for record-keeping purposes.
  • results: Contains result files used for figure generation.
  • output: Stores intermediate output files generated during model fitting.
  • figures: Contains all figure panels created using the scripts.

Requirements

Dependencies

Setting Up the Environment

Create a new Conda environment:

conda create -n epik python=3.8

Activate the environment and install dependencies:

conda activate epik

git clone https://github.com/cmarti/epik.git
cd epik
python setup.py install
cd ..

git clone https://github.com/cmarti/gpmap-tools.git
cd gpmap-tools
python setup.py install
cd ..
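
As a quick sanity check after the two installs, the sketch below tests whether each package can be imported from the active environment. The import names `epik` and `gpmap` are assumptions based on the repository names and are not confirmed here; adjust them if the packages expose different module names.

```shell
# Report whether a Python module can be imported in the active environment.
# Prints "<name>: ok" or "<name>: missing" and never aborts the shell.
check_module() {
    if python -c "import $1" >/dev/null 2>&1; then
        echo "$1: ok"
    else
        echo "$1: missing"
    fi
}

# Module names below are assumed, not taken from the packages' documentation.
check_module epik
check_module gpmap
```

A "missing" result usually means the environment was not activated before running `python setup.py install`.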

Download the repository:

git clone git@github.com:cmarti/epik_analyses.git
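
Once the clone finishes, the folder layout described under Repository Structure can be checked with a small helper. This is only a sketch: the function name `missing_folders` is made up for illustration, and it assumes the clone lives in `epik_analyses` under the current directory.

```shell
# List the expected top-level folders that are absent from a checkout.
# Prints one folder name per line, or nothing when all are present.
missing_folders() {
    repo_dir=$1
    for name in data scripts results output figures; do
        if [ ! -d "$repo_dir/$name" ]; then
            echo "$name"
        fi
    done
}

# Assumes the repository was cloned into ./epik_analyses.
missing_folders epik_analyses
```

Empty output means every folder from the list is in place.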

Figures

This repository provides all the code and data required to reproduce the figures from our study. The scripts for figure generation are located in the scripts/figures folder. To generate the figures, run the make_figures.sh script:

bash make_figures.sh

Computational Analysis

The make_analysis.sh script outlines the steps to run various scripts for reproducing the analyses. Note that some jobs or scripts depend on the completion of other jobs.
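
Because later steps depend on earlier ones, it can help to stop at the first failure when running steps locally. The sketch below shows one way to do that; the helper `run_in_order` and the placeholder commands are hypothetical and do not reproduce the contents of make_analysis.sh.

```shell
# Run each command in order and stop at the first failure, so that
# later steps never run on incomplete inputs.
run_in_order() {
    for step in "$@"; do
        echo "running: $step"
        if ! sh -c "$step"; then
            echo "failed: $step" >&2
            return 1
        fi
    done
}

# Placeholder steps (hypothetical); substitute the commands from make_analysis.sh.
run_in_order "echo preprocess" "echo fit" "echo summarize"
```

This only covers sequential local runs. Jobs submitted to a cluster scheduler would need the scheduler's own dependency mechanism instead.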

NOTE: We use EpiK on V100 GPUs in the Cold Spring Harbor Laboratory high-performance computing cluster. The provided scripts in scripts/cluster are tailored to our system and may require adaptation for use in other computing environments.
