G-ReInCATALiZE

GPU-accelerated Reinforcement learning-enabled Integrated Combination of Automated Transformer-based Approaches with Ligand binding and 3D prediction for Enzyme Evolution

About

G-ReInCATALiZE is a computational pipeline for in-silico enzyme evolution, developed by the CCBIO team at the Zurich University of Applied Sciences (ZHAW) in Wädenswil. It combines transformer-based mutation prediction, reinforcement learning, structure relaxation, and GPU-accelerated docking to search for enzyme mutations that improve catalytic activity toward specific target substrates.

Key Capabilities:

Search for improved enzyme mutants, starting from a wild-type enzyme, for targeted ligand transformations
GPU-accelerated molecular docking and structure prediction
Reinforcement learning-guided mutation optimization
Transformer-based mutation effect prediction using ESM2 models

Architecture

The pipeline integrates four main computational components as shown in the overview diagram:

DeepMut - ESM2 transformer-based semi-rational multi-site mutation prediction
RESIDORA - Deep reinforcement learning using Proximal Policy Optimization (PPO) for residue substitution
Pyroprolex - PyRosetta-based local mutant structure relaxation
GAESP - GPU-accelerated enzyme-substrate docking pipeline using Vina-GPU

The system uses an actor-critic reinforcement learning architecture to iteratively improve enzyme candidates through mutation and structural evaluation cycles.
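The mutate-and-evaluate cycle described above can be sketched as a toy loop. This is only an illustration under stated assumptions, not the pipeline's implementation: `propose_mutation`, `apply_mutation`, `score_candidate`, and `evolve` are hypothetical names, the random proposal stands in for the RESIDORA policy, the scoring stub stands in for Pyroprolex relaxation followed by GAESP docking, and simple greedy acceptance replaces PPO training.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def propose_mutation(seq, rng):
    """Hypothetical stand-in for the policy: one random substitution."""
    pos = rng.randrange(len(seq))
    new_aa = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[pos]])
    return pos, new_aa


def apply_mutation(seq, pos, new_aa):
    """Return a copy of seq with the residue at pos (0-based) replaced."""
    return seq[:pos] + new_aa + seq[pos + 1:]


def score_candidate(seq):
    """Hypothetical stand-in for relaxation + docking; higher is better.

    Toy objective only: counts tryptophan residues.
    """
    return seq.count("W")


def evolve(wild_type, steps, seed=0):
    """Greedy mutate-and-evaluate loop (assumption: not PPO)."""
    rng = random.Random(seed)
    best_seq, best_score = wild_type, score_candidate(wild_type)
    for _ in range(steps):
        pos, new_aa = propose_mutation(best_seq, rng)
        candidate = apply_mutation(best_seq, pos, new_aa)
        score = score_candidate(candidate)
        if score > best_score:  # keep only strict improvements
            best_seq, best_score = candidate, score
    return best_seq, best_score


if __name__ == "__main__":
    # Toy ten-residue sequence, used purely for illustration.
    print(evolve("MSTETLRLQK", steps=200))
```

In an actual actor-critic setup the acceptance rule would be replaced by a learned policy and value function updated from the evaluation reward; the loop structure (propose, mutate, evaluate, feed back) is the part this sketch is meant to show.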

Getting Started

Prerequisites

Docker with GPU support
NVIDIA Container Toolkit for GPU acceleration
NVIDIA GPU with CUDA support
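A quick pre-flight check for the prerequisites above might look like the following sketch. It assumes the Docker CLI is installed as `docker` and the NVIDIA driver utilities provide `nvidia-smi`; the helper name `check_prerequisites` is hypothetical. Finding both executables on the PATH does not by itself prove that containers can access the GPU.

```python
import shutil


def check_prerequisites(tools=("docker", "nvidia-smi")):
    """Return a mapping of tool name -> whether it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```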

Quick Start with Docker

Build the container:

docker build --platform linux/amd64 -t gaesp .

Run with GPU support:

docker run -d --gpus all --name gaesp-container -p 80:80 gaesp

Execute the pipeline:

# Standard execution
PYTHONPATH=. python src/main.py --config src/CONFIG/config_default.yaml

# Debug mode
PYTHONPATH=. python src/main.py --config src/CONFIG/config_debug.yaml

Usage

Configuration

The pipeline is driven by YAML configuration files in src/CONFIG/. Key configuration sections:

globalConfig - Transformer models, GPU settings, file paths
gaespConfig - Wildtype sequence, structure paths, docking parameters
pyroprolexConfig - PyRosetta mutation settings

Example Configuration

config1:
  globalConfig:
    transformerName: facebook/esm2_t6_8M_UR50D
    transformerDevice: cuda:0
    gpu_vina: true

  gaespConfig:
    wildTypeAASeq: "MSTETLRLQKARATEEGLAFETPGGLT..."
    wildTypeStructurePath: data/raw/enzyme.pdb
    reference_ligand: data/raw/reference/ligand.pdb
    boxSize: 15
    num_modes: 5
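Before launching a run, it can help to confirm that a configuration contains the expected sections. The sketch below works on an already-parsed configuration (for example the result of PyYAML's `yaml.safe_load`, assuming the project uses PyYAML) and mirrors the example above as a plain dictionary. The helper `missing_sections` and the choice of which sections count as required are assumptions for illustration, not part of the pipeline.

```python
# Sections assumed to be required here, based on the example configuration.
REQUIRED_SECTIONS = ("globalConfig", "gaespConfig")


def missing_sections(config, required=REQUIRED_SECTIONS):
    """Return the required section names that are absent from config."""
    return [name for name in required if name not in config]


# Parsed form of the example configuration (sequence left truncated as given).
example = {
    "config1": {
        "globalConfig": {
            "transformerName": "facebook/esm2_t6_8M_UR50D",
            "transformerDevice": "cuda:0",
            "gpu_vina": True,
        },
        "gaespConfig": {
            "wildTypeAASeq": "MSTETLRLQKARATEEGLAFETPGGLT...",
            "wildTypeStructurePath": "data/raw/enzyme.pdb",
            "reference_ligand": "data/raw/reference/ligand.pdb",
            "boxSize": 15,
            "num_modes": 5,
        },
    }
}

if __name__ == "__main__":
    for name, cfg in example.items():
        print(name, "missing:", missing_sections(cfg))
```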

Running Tests

pytest tests/

Development Setup

Local Development (Advanced)

The Docker container includes a complex molecular biology software stack. For local development, you'll need:

Core Dependencies

Python 3.8+ with Poetry for dependency management
PyTorch with CUDA support
Transformers library for ESM2 models

Molecular Biology Tools

Boost Library 1.82.0
Vina-GPU-2.0 for GPU-accelerated docking
AutoDock-Vina for scripting support
OpenBabel for chemical structure processing
PyRosetta for protein structure manipulation
ADFRsuite-1.0 for additional docking utilities

Installation Commands

Boost Library:

wget https://boostorg.jfrog.io/artifactory/main/release/1.82.0/source/boost_1_82_0.tar.bz2
tar --bzip2 -xf boost_1_82_0.tar.bz2
cd boost_1_82_0
sudo ./bootstrap.sh
sudo ./b2 install

Vina-GPU-2.0:

git clone https://github.com/DeltaGroupNJUPT/Vina-GPU-2.0
# Configure Makefile with appropriate paths
cd Vina-GPU-2.0/Vina-GPU+
make clean && make source

OpenBabel:

conda install -c conda-forge openbabel

PyRosetta: Follow instructions at https://www.pyrosetta.org/downloads

Package Management

# Install dependencies
poetry install

# Add new dependency
poetry add <package_name>

# Code formatting (Black configured for 79 character lines)
black src/

License

Distributed under the MIT License.

Contact

CCBIO Team - Zurich University of Applied Sciences (ZHAW), Wädenswil

Developed for computational enzyme engineering and in-silico directed evolution research.
