Codebase for Germinal, a broadly enabling generative pipeline for efficient generation of epitope-targeted de novo antibodies.

cytokineking/germinal

 
 

Germinal: Efficient generation of epitope-targeted de novo antibodies

Germinal is a pipeline for designing de novo antibodies against specified epitopes on target proteins. The pipeline follows a 3-step process: hallucination based on ColabDesign, selective sequence redesign with AbMPNN, and cofolding with a structure prediction model. Germinal is capable of designing both nanobodies and scFvs against user-specified residues on target proteins.

We describe Germinal in the preprint: "Efficient generation of epitope-targeted de novo antibodies with Germinal"

⚠️ We are still actively working on code improvements.

  • We strongly recommend using AF3 for design filtering, as done in the paper, because the filters are calibrated only for AF3 confidence metrics. We are actively working to add Chai-calibrated thresholds for commercial users. Until then, running Germinal with structure_model: "chai" rather than structure_model: "af3" should be considered experimental and may yield lower passing rates.
  • While nanobody design is fully functional, we are still calibrating weightings and filters for scFv, so that functionality should also be considered experimental.
  • As recommended in the preprint, we suggest performing a small parameter sweep before launching full sampling runs. This is especially important when working with a new target or selecting a new epitope. In configs/run/vhh_paper.yaml and configs/run/scfv_paper.yaml, we provide the parameters that we used for PD-L1 nanobody generation in our paper. In configs/run/vhh.yaml and configs/run/scfv.yaml, we provide a set of reasonable default parameters to use as a starting point for parameter sweep experiments. Parameters can be configured from the command line. For example, you can set weights_beta and weights_plddt with the following command:
python run_germinal.py weights_beta=0.3 weights_plddt=1.0
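A small sweep can be scripted by generating one command per parameter combination. The following is a minimal sketch, not part of the repository: the sweep values are hypothetical placeholders, and only the parameter names weights_beta and weights_plddt come from the command above.

```python
from itertools import product

# Hypothetical sweep values; choose ranges appropriate for your target.
SWEEP = {
    "weights_beta": [0.1, 0.3, 0.5],
    "weights_plddt": [0.5, 1.0],
}


def sweep_commands(sweep, script="run_germinal.py"):
    """Return one command (as an argument list) per parameter combination."""
    keys = list(sweep)
    commands = []
    for values in product(*(sweep[k] for k in keys)):
        overrides = [f"{key}={value}" for key, value in zip(keys, values)]
        commands.append(["python", script, *overrides])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed or submitted to a scheduler.
    for cmd in sweep_commands(SWEEP):
        print(" ".join(cmd))
```

Printing the commands rather than launching them keeps the sketch side-effect free; each line can then be run directly or submitted as a separate job.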

Setup

Requirements

Prerequisites:

System Requirements:

  • GPU: NVIDIA GPU with CUDA support
  • Memory: 40GB+ VRAM*
  • Storage (recommended): 50GB+ space for results

*The pipeline has been tested on: A100 40GB, H100 40GB MIG, L40S 48GB, A100 80GB, and H100 80GB. These runs tested a 130 amino acid target with a 131 amino acid nanobody. For larger runs, we recommend 60GB+ VRAM.
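To check a machine against the VRAM guidance above, the total memory of each GPU can be read from nvidia-smi. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the 1 GB tolerance is an assumption to allow for cards that report slightly less than their nominal capacity.

```python
import subprocess


def parse_vram_mib(nvidia_smi_output):
    """Parse per-GPU total memory (MiB) from nvidia-smi CSV output."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def gpus_meeting_requirement(vram_mib, required_gb=40, tolerance_gb=1.0):
    """Return indices of GPUs whose memory is within tolerance of the requirement."""
    threshold_mib = (required_gb - tolerance_gb) * 1024
    return [index for index, mib in enumerate(vram_mib) if mib >= threshold_mib]


def query_vram_mib():
    """Ask nvidia-smi for the total memory of each visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_vram_mib(result.stdout)
```

On a GPU node, `gpus_meeting_requirement(query_vram_mib())` lists the devices that satisfy the 40GB guideline; pass `required_gb=60` for larger runs.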

Installation

  1. Ensure you have an NVIDIA GPU with a recent driver (recommended CUDA 12+). You can verify with:

    nvidia-smi
  2. Install Miniconda or Anaconda if not already available.

  3. Follow the instructions in environment_setup.md

  4. Copy AlphaFold-Multimer parameters to params/ and untar them. Alternatively, you can run the following lines inside params/ to download and untar:

    aria2c -x 16 https://storage.googleapis.com/alphafold/alphafold_params_2022-12-06.tar
    tar -xf alphafold_params_2022-12-06.tar -C .
  5. Activate the environment:

    conda activate germinal
  6. (Optional) Run validation at any time to ensure all packages have installed correctly:

    python validate_install.py
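After step 4, a quick way to confirm that the archive was unpacked is to list the weight files under params/. The sketch below assumes the parameters unpack to .npz files; it does not check for specific file names and is not a substitute for validate_install.py.

```python
from pathlib import Path


def list_param_files(params_dir="params"):
    """Return sorted names of .npz files found anywhere under params_dir."""
    root = Path(params_dir)
    if not root.is_dir():
        return []
    return sorted(path.name for path in root.rglob("*.npz"))
```

An empty list from `list_param_files()` suggests that the download or the untar step did not complete.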

Notes:

  • AlphaFold-Multimer and AlphaFold3 parameters are large and must be downloaded manually.

Docker

Germinal can be run using Docker:

docker build -t germinal .
docker run -it --rm --gpus all \
  -v "$PWD/results:/workspace/results" \
  -v "$PWD/pdbs:/workspace/pdbs" \
  germinal bash

Germinal can also be run with Singularity (shown) or Apptainer:

mkdir -p results
singularity pull germinal.sif docker://jwang003/germinal:latest
singularity shell --nv \
  --bind "$PWD/results:/workspace/results" \
  --bind "$PWD/pdbs:/workspace/pdbs" \
  --pwd /workspace \
  germinal.sif

Note: The pull may appear to hang at "Creating SIF file...". If so, check whether it has finished by running singularity exec germinal.sif python -c "print('ok')"

Volumes are mounted to save generated input complexes and results from sampling.

Once inside the container:

python run_germinal.py

Usage

Quick Start

The main entry point to the pipeline is run_germinal.py. Germinal uses Hydra for orchestrating different configurations. An example main configuration file is located in configs/config.yaml. This yaml file contains high level run parameters as well as pointers to more granular configuration settings.

These detailed options are stored in four main settings files:

  • Main run settings: configs/run/vhh.yaml
  • Target settings: configs/target/[your_target].yaml
  • Post-hallucination (initial) filters: configs/filter/initial/default.yaml
  • Final filters: configs/filter/final/default.yaml

Configuration Structure

configs/
├── config.yaml              # Main configuration yaml
├── run/                     # Main run settings
│   ├── vhh.yaml             # VHH (nanobody) specific settings
│   └── scfv.yaml            # scFv specific settings
├── target/                  # Target protein configurations
│   └── pdl1.yaml            # PDL1 target example
└── filter/                  # Filter configurations
    ├── initial/
    │   └── default.yaml     # Post-hallucination (initial) filters
    └── final/
        ├── default.yaml     # Final acceptance filters
        └── scfv.yaml        # Final filters for scfv runs
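With this layout, a choice such as run=scfv corresponds to the file configs/run/scfv.yaml. The sketch below illustrates that mapping and reports any selected files that are missing. The helper names are hypothetical, and DEFAULT_CHOICES is an assumption based on the default run (VHH, PD-L1, default filters), not a copy of config.yaml.

```python
from pathlib import Path

# Assumed defaults: VHH run, PD-L1 target, default initial and final filters.
DEFAULT_CHOICES = {
    "run": "vhh",
    "target": "pdl1",
    "filter/initial": "default",
    "filter/final": "default",
}


def config_file_for(group, option, config_root="configs"):
    """Map a config group choice such as run=scfv to its YAML path."""
    return Path(config_root) / group / f"{option}.yaml"


def missing_configs(choices, config_root="configs"):
    """Return the paths of selected config files that do not exist."""
    paths = [config_file_for(group, option, config_root) for group, option in choices.items()]
    return [path.as_posix() for path in paths if not path.is_file()]
```

Running `missing_configs(DEFAULT_CHOICES)` from the repository root before launching a job is a cheap way to catch a misspelled target or filter name.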

In general, the main run settings and filters should stay the same and can be run as defaults unless you are experimenting. To design nanobodies targeting PD-L1, simply run:

python run_germinal.py

To design scFvs targeting PD-L1, run:

python run_germinal.py run=scfv filter.initial=scfv

If you wish to change the configuration of runs, you can:

  • create an entirely new config yaml
  • swap one of the four main settings files
  • pass specific overrides

Basic Usage

Run with defaults (VHH + PDL1 + default filters):

python run_germinal.py

Switch to scFv:

python run_germinal.py run=scfv

Use different target:

python run_germinal.py target=my_target

Use a different config file with Hydra:

python run_germinal.py --config-name new_config

CLI Overrides

Hydra provides powerful CLI override capabilities. You can override any parameter in any configuration file.

Note

Settings in the configs/run/ folder live in the global namespace, so overrides do not need a run prefix. See the example below.

Basic parameter overrides:

# Override trajectory limits
python run_germinal.py max_trajectories=100 max_passing_designs=50

# Override experiment settings
python run_germinal.py experiment_name=my_experiment run_config=test_run

# Override loss weights. Note: no run prefix since run settings are global
python run_germinal.py weights_plddt=1.5 weights_iptm=0.8 

Filter threshold overrides:

# Make initial filters less stringent
python run_germinal.py filter.initial.clashes.value=2

# Adjust final filter thresholds
python run_germinal.py filter.final.external_plddt.value=0.9 filter.final.external_iptm.value=0.8

# Change filter operators
python run_germinal.py filter.final.sc_rmsd.operator='<=' filter.final.sc_rmsd.value=5.0

Target configuration overrides:

# Change target hotspots
python run_germinal.py target.target_hotspots="A26,A30,A36,A44"

# Use different PDB file
python run_germinal.py target.target_pdb_path="pdbs/my_target.pdb" target.target_name="my_target"

Complex multi-parameter overrides:

# Complete scFv run with custom settings
python run_germinal.py \
  run=scfv \
  target=pdl1 \
  max_trajectories=500 \
  experiment_name="scfv_pdl1_test" \
  target.target_hotspots="A37,A39,A41" \
  filter.final.external_plddt.value=0.85 \
  weights_iptm=1.0
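When overrides are assembled programmatically, a nested dictionary can be flattened into the dotted key=value form used above. This is an illustrative sketch with a hypothetical helper name; it does not handle values that contain commas or spaces, which may need additional quoting.

```python
def flatten_overrides(config, prefix=""):
    """Flatten a nested dict into dotted key=value override strings."""
    overrides = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse so nested keys become filter.final.<metric>.value, etc.
            overrides.extend(flatten_overrides(value, dotted))
        else:
            overrides.append(f"{dotted}={value}")
    return overrides


settings = {
    "run": "scfv",
    "max_trajectories": 500,
    "filter": {"final": {"external_plddt": {"value": 0.85}}},
    "weights_iptm": 1.0,
}

command = ["python", "run_germinal.py", *flatten_overrides(settings)]
print(" ".join(command))
```

Keeping the settings in one dictionary makes it easier to log exactly which overrides were used for a given run.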

Target Configuration

For each new target, you will need to define a target settings yaml file that contains all relevant information about the target protein. Here is an example:

target_name: "pdl1"
target_pdb_path: "pdbs/pdl1.pdb"
target_chain: "A"
binder_chain: "B"
target_hotspots: "25,26,39,41"
dimer: false  # support coming soon!
length: 133
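A new target file with the same keys can be generated from a short script. The sketch below is a hypothetical helper that only mirrors the fields of the example above; it passes values through unchanged and does not validate them against the PDB file.

```python
def render_target_yaml(name, pdb_path, target_chain, binder_chain, hotspots, length, dimer=False):
    """Render a target settings file using the keys from the example above."""
    hotspot_text = ",".join(str(residue) for residue in hotspots)
    lines = [
        f'target_name: "{name}"',
        f'target_pdb_path: "{pdb_path}"',
        f'target_chain: "{target_chain}"',
        f'binder_chain: "{binder_chain}"',
        f'target_hotspots: "{hotspot_text}"',
        f"dimer: {str(dimer).lower()}",
        f"length: {length}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical target; write the output to configs/target/my_target.yaml.
    print(render_target_yaml("my_target", "pdbs/my_target.pdb", "A", "B", [25, 26, 39, 41], 133))
```

Saving the result as configs/target/my_target.yaml allows it to be selected with target=my_target.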

Filters Configuration

There are two sets of filters: post-hallucination (initial) filters and final filters. The post-hallucination filters are applied after the hallucination step to determine which sequences proceed to the redesign step. This filter set is a subset of the final filters, which are applied at the end of the pipeline to determine the passing antibody sequences. Here is an example of the post-hallucination filters:

clashes: {'value': 1, 'operator': '<'}
cdr3_hotspot_contacts: {'value': 0, 'operator': '>'}
percent_interface_cdr: {'value': 0.5, 'operator': '>'}
interface_shape_comp: {'value': 0.6, 'operator': '>'}
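Each filter pairs a threshold with a comparison operator. The sketch below shows one way such rules could be evaluated against a design's metrics; it is not the repository's implementation. The metric values are hypothetical, and treating a missing metric as a failure is an assumption.

```python
import operator

# Map operator strings from the filter config to comparison functions.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "==": operator.eq,
}


def failed_filters(metrics, filters):
    """Return the names of filters that the metrics do not satisfy."""
    failed = []
    for name, rule in filters.items():
        compare = OPERATORS[rule["operator"]]
        # Assumption: a metric that was never computed counts as a failure.
        if name not in metrics or not compare(metrics[name], rule["value"]):
            failed.append(name)
    return failed


initial_filters = {
    "clashes": {"value": 1, "operator": "<"},
    "cdr3_hotspot_contacts": {"value": 0, "operator": ">"},
    "percent_interface_cdr": {"value": 0.5, "operator": ">"},
    "interface_shape_comp": {"value": 0.6, "operator": ">"},
}

# Hypothetical metric values for a single design.
metrics = {
    "clashes": 0,
    "cdr3_hotspot_contacts": 3,
    "percent_interface_cdr": 0.72,
    "interface_shape_comp": 0.55,
}

print(failed_filters(metrics, initial_filters))  # ['interface_shape_comp']
```

A design passes a filter set when the returned list is empty; here the shape complementarity of 0.55 falls below the 0.6 threshold.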

Output Format

Germinal generates organized output directories:

runs/your_target_nb_20240101_120000/
├── final_config.yaml           # Complete run configuration after overrides
├── trajectories/               # Results for trajectories which pass hallucination but fail the first set of filters
│   ├── structures/     
│   ├── plots/            
│   └── designs.csv      
├── redesign_candidates/        # Results for trajectories which are AbMPNN redesigned but fail the final set of filters
│   ├── structures/          
│   └── designs.csv           
├── accepted/                   # Antibodies that pass all filters
│   ├── structures/          
│   └── designs.csv           
├── all_trajectories.csv        # Main CSV containing designs in all three folders above
└── failure_counts.csv          # CSV logging # trajectories failing each step of hallucination

Key Output Files:

  • accepted/structures/*.pdb - Final antibody-antigen structure for passing antibody designs.
  • all_trajectories.csv - Complete list of designs that passed hallucination, their in silico metrics, which stage they reached, and the pdb path to the designed structure.
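The output layout above can be inspected with a few lines of standard-library code. The helper names in this sketch are hypothetical, and the row counter deliberately avoids assuming any column names in the CSV files.

```python
import csv
from pathlib import Path


def accepted_structures(run_dir):
    """Return sorted paths of PDB files for designs that passed all filters."""
    return sorted(Path(run_dir).glob("accepted/structures/*.pdb"))


def count_designs(csv_path):
    """Count data rows in a designs CSV without assuming any column names."""
    with open(csv_path, newline="") as handle:
        return sum(1 for _ in csv.DictReader(handle))
```

For example, `count_designs(run_dir / "all_trajectories.csv")` gives the number of designs that passed hallucination, and `len(accepted_structures(run_dir))` gives the number that passed every filter.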

Important Notes and Tips for Design

Hallucination is inherently expensive. Designing against a 130 residue target takes anywhere from 2-8 minutes for a nanobody design iteration on an H100 80GB GPU, depending on which stage the designed sequence reaches. For 40GB GPUs or scFvs, this number is around 50% larger.
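As a back-of-the-envelope illustration of these timings, the sketch below converts minutes per iteration into iterations per GPU hour. It reads "around 50% larger" as a 1.5x multiplier on iteration time, which is an interpretation rather than a measured figure.

```python
def iterations_per_hour(minutes_per_iteration, slowdown=1.0):
    """Convert a per-iteration time (minutes) into iterations per GPU hour."""
    return 60.0 / (minutes_per_iteration * slowdown)


# Nanobody design on an H100 80GB: 2 to 8 minutes per iteration.
h100_fast = iterations_per_hour(2)  # 30.0
h100_slow = iterations_per_hour(8)  # 7.5

# 40GB GPUs or scFvs: assumed to take about 1.5x as long.
slow_gpu_fast = iterations_per_hour(2, slowdown=1.5)  # 20.0
slow_gpu_slow = iterations_per_hour(8, slowdown=1.5)  # 5.0
```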

During sampling, we typically run antibody generation until there are around 1,000 passing designs against the specified target and observe a success rate of around 0.5 - 1 per GPU hour. Of those, we typically select the top 40-50 sequences for experimental testing based on a combination of in silico metrics described in the preprint. While in silico success rates vary wildly across targets, we estimate that 200-400 H100 80GB GPU hours of sampling are typically enough to generate ~200 successful designs and some functional antibodies.
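The relationship between success rate and compute budget is simple division. The sketch below reproduces the 200-400 GPU hour estimate for ~200 designs at 0.5 - 1 passing designs per GPU hour; the GPU count used for the wall-clock figure is a hypothetical example.

```python
def gpu_hours_for(n_designs, designs_per_gpu_hour):
    """Estimate the GPU hours needed to reach a number of passing designs."""
    return n_designs / designs_per_gpu_hour


def wall_clock_hours(gpu_hours, n_gpus):
    """Estimate elapsed hours when the sampling is split evenly across GPUs."""
    return gpu_hours / n_gpus


best_case = gpu_hours_for(200, 1.0)   # 200.0 GPU hours
worst_case = gpu_hours_for(200, 0.5)  # 400.0 GPU hours

# Hypothetical example: 8 GPUs sampling in parallel.
elapsed = wall_clock_hours(worst_case, 8)  # 50.0 hours
```

Because success rates vary widely across targets, measuring the rate in a short pilot run before committing to a full budget is likely to give a better estimate than these defaults.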

The best design parameters differ for each target and antibody type! If you are experiencing low success rates, we recommend tweaking the interface confidence weights (ipTM / iPAE), the structure-based weights (helix, beta, framework loss), or the IgLM weights defined in iglm_scale. Filters are easy to change in the filter configurations. To add or remove filters in the initial and final filtering rounds, edit the corresponding filter configuration: a new filter is an entry named after the intended metric that specifies the threshold value and the operator (<, >, =, etc.).

More tips coming soon!

Bugfix Changelog

  • 9/25/25: Import fix for local colabdesign module (commit 8b5b655, PR #8)
  • 9/25/25: A metric meant only for tracking, external_i_pae, was erroneously used as a filter (commit 49be2e9, issue #7)
  • 9/26/25: Resolved an error which caused passing runs to crash at the final stage due to a misnamed variable (commit 9292e1e, issue #11)

Citation

If you use Germinal in your research, please cite:

@article{mille-fragoso_efficient_2025,
	title = {Efficient generation of epitope-targeted de novo antibodies with Germinal},
	author = {Mille-Fragoso, Luis Santiago and Wang, John N. and Driscoll, Claudia L. and Dai, Haoyu and Widatalla, Talal M. and Zhang, Xiaowe and Hie, Brian L. and Gao, Xiaojing J.},
	url = {https://www.biorxiv.org/content/10.1101/2025.09.19.677421v1},
	doi = {10.1101/2025.09.19.677421},
	publisher = {bioRxiv},
	year = {2025},
}

Acknowledgments

Germinal builds upon the foundational work of previous hallucination-based protein design pipelines such as ColabDesign and BindCraft, and this codebase incorporates code from both repositories. We are grateful to the developers of these tools for making them available to the research community.

Related Work: If you use components of this pipeline, please also cite the underlying methods:

Community Acknowledgments

  • @cytokineking — for helping raise numerous bugs to our attention

License

This repository is licensed under the Apache License 2.0.

External Dependencies

Some components require separate licenses that are not included in this repository:

  • IgLM: Provided under a non-commercial academic license from Johns Hopkins University.
    See their documentation for details.
  • PyRosetta: Provided by the Rosetta Commons and University of Washington under a non-commercial, non-profit license.
    PyRosetta cannot be redistributed and must be obtained separately.
    Commercial use requires a separate license. See https://www.pyrosetta.org.
