Germinal is a pipeline for designing de novo antibodies against specified epitopes on target proteins. The pipeline follows a 3-step process: hallucination based on ColabDesign, selective sequence redesign with AbMPNN, and cofolding with a structure prediction model. Germinal is capable of designing both nanobodies and scFvs against user-specified residues on target proteins.
We describe Germinal in the preprint: "Efficient generation of epitope-targeted de novo antibodies with Germinal"
- We strongly recommend using AF3 for design filtering, as done in the paper, because the filters are calibrated only for AF3 confidence metrics. We are actively working to add Chai-calibrated thresholds for commercial users. Until then, running Germinal with structure_model: "chai" rather than structure_model: "af3" should be considered experimental and may yield lower passing rates.
- While nanobody design is fully functional, we are still calibrating weightings and filters for scFvs, so that functionality should also be considered experimental.
- As recommended in the preprint, we suggest performing a small parameter sweep before launching full sampling runs. This is especially important when working with a new target or selecting a new epitope. In configs/run/vhh_paper.yaml and configs/run/scfv_paper.yaml, we provide the parameters that we used for PD-L1 nanobody generation in our paper. In configs/run/vhh.yaml and configs/run/scfv.yaml, we provide a set of reasonable default parameters to use as a starting point for parameter sweep experiments. Parameters can be configured from the command line; for example, you can set weights_beta and weights_plddt with the following command:
python run_germinal.py weights_beta=0.3 weights_plddt=1.0
- Setup
- Usage
- Output Format
- Tips for Design
- Bugfix Changelog
- Citation
- Acknowledgments
- Community Acknowledgments
Prerequisites:
- PyRosetta (academic license required)
- ColabDesign/AlphaFold-Multimer parameters (see the installation steps below for a command-line download)
- AlphaFold3 parameters (optional)
- JAX with GPU support
System Requirements:
- GPU: NVIDIA GPU with CUDA support
- Memory: 40GB+ VRAM*
- Storage (recommended): 50GB+ space for results
*The pipeline has been tested on: A100 40GB, H100 40GB MIG, L40S 48GB, A100 80GB, and H100 80GB. These runs tested a 130 amino acid target with a 131 amino acid nanobody. For larger runs, we recommend 60GB+ VRAM.
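As a quick pre-flight check against the VRAM guidance above, the sketch below asks nvidia-smi for each GPU's total memory and compares it with the 40GB minimum (or 60GB+ for larger runs). It is not part of Germinal and the helper names are hypothetical. It compares in decimal gigabytes because the totals nvidia-smi reports for nominal "40GB" cards can fall slightly below 40 GiB.

```python
from __future__ import annotations

import shutil
import subprocess

# Hypothetical helpers (not part of Germinal) for checking GPU memory.
BYTES_PER_MIB = 2**20


def parse_memory_totals(csv_text: str) -> list[int]:
    """Parse `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`
    output (one MiB value per GPU); non-numeric lines such as "[N/A]" are skipped."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip().isdigit()]


def meets_requirement(total_mib: int, required_gb: float = 40.0) -> bool:
    """Compare a MiB total against a requirement in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_mib * BYTES_PER_MIB >= required_gb * 1e9


def query_gpu_memory_mib() -> list[int]:
    """Return total memory (MiB) per visible GPU, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_memory_totals(result.stdout)


if __name__ == "__main__":
    for idx, mib in enumerate(query_gpu_memory_mib()):
        if meets_requirement(mib, 60.0):
            status = "meets 60GB+ (larger runs)"
        elif meets_requirement(mib, 40.0):
            status = "meets the 40GB+ minimum"
        else:
            status = "below the 40GB minimum"
        print(f"GPU {idx}: {mib} MiB, {status}")
```

Passing the README's own thresholds (40 and 60) as arguments keeps the check easy to adjust if your target or antibody format needs more memory.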
- Ensure you have an NVIDIA GPU with a recent driver (recommended CUDA 12+). You can verify with:
nvidia-smi
- Install Miniconda or Anaconda if not already available.
- Follow the instructions in environment_setup.md.
- Copy AlphaFold-Multimer parameters to params/ and untar them. Alternatively, you can run the following lines inside params/ to download and untar:
aria2c -x 16 https://storage.googleapis.com/alphafold/alphafold_params_2022-12-06.tar
tar -xf alphafold_params_2022-12-06.tar -C .
- Activate the environment:
conda activate germinal
- (Optional) Run validation at any time to ensure all packages have installed correctly:
python validate_install.py
Notes:
- AlphaFold-Multimer and AlphaFold3 parameters are large and must be downloaded manually.
Germinal can be run using Docker:
docker build -t germinal .
docker run -it --rm --gpus all \
-v "$PWD/results:/workspace/results" \
-v "$PWD/pdbs:/workspace/pdbs" \
germinal bash
It can also be run with Singularity (shown) or Apptainer:
mkdir -p results
singularity pull germinal.sif docker://jwang003/germinal:latest
singularity shell --nv \
--bind "$PWD/results:/workspace/results" \
--bind "$PWD/pdbs:/workspace/pdbs" \
--pwd /workspace \
germinal.sif
Note: Pulling may hang on "Creating SIF file...". If so, check whether the command has finished with:
singularity exec germinal.sif python -c "print('ok')"
Volumes are mounted to save generated input complexes and results from sampling.
Once inside the container:
python run_germinal.py
The main entry point to the pipeline is run_germinal.py. Germinal uses Hydra for orchestrating different configurations. An example main configuration file is located in configs/config.yaml. This YAML file contains high-level run parameters as well as pointers to more granular configuration settings.
These detailed options are stored in four main settings files:
- Main run settings: configs/run/vhh.yaml
- Target settings: configs/target/[your_target].yaml
- Post-hallucination (initial) filters: configs/filter/initial/default.yaml
- Final filters: configs/filter/final/default.yaml
configs/
├── config.yaml # Main configuration yaml
├── run/ # Main run settings
│ ├── vhh.yaml # VHH (nanobody) specific settings
│ └── scfv.yaml # scFv specific settings
├── target/ # Target protein configurations
│ └── pdl1.yaml # PDL1 target example
└── filter/ # Filter configurations
├── initial/
│ └── default.yaml # Post-hallucination (initial) filters
└── final/
├── default.yaml # Final acceptance filters
└── scfv.yaml # Final filters for scfv runs
In general, the main run settings and filters should stay the same and can be run as defaults unless you are experimenting. To design nanobodies targeting PD-L1, simply run:
python run_germinal.py
To design scFvs targeting PD-L1, run:
python run_germinal.py run=scfv filter.initial=scfv
If you wish to change the configuration of runs, you can:
- create an entirely new config yaml
- swap one of the four main settings files
- pass specific overrides
Run with defaults (VHH + PDL1 + default filters):
python run_germinal.py
Switch to scFv:
python run_germinal.py run=scfv
Use a different target:
python run_germinal.py target=my_target
Use a different config file with Hydra:
python run_germinal.py --config-name new_config
Hydra provides powerful CLI override capabilities. You can override any parameter in any configuration file.
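Because any parameter can be overridden from the command line, the small parameter sweep recommended at the top of this README is easy to script. The following is a minimal sketch, assuming the run-level parameters weights_beta and weights_plddt and the experiment_name and max_trajectories overrides used in this README; the grid values and the sweep_commands helper are hypothetical, and the script only prints commands rather than launching them.

```python
from __future__ import annotations

import itertools
import shlex

# Hypothetical sweep grid: parameter names come from this README,
# the values are placeholders to adapt to your target.
SWEEP_GRID = {
    "weights_beta": [0.1, 0.3],
    "weights_plddt": [1.0, 1.5],
}


def sweep_commands(
    grid: dict[str, list],
    base_name: str = "sweep",
    max_trajectories: int = 20,
) -> list[str]:
    """Return one run_germinal.py command per combination (Cartesian product) in the grid."""
    names = list(grid)
    commands = []
    for i, values in enumerate(itertools.product(*(grid[name] for name in names))):
        # Run-level settings live in the global namespace, so no "run." prefix.
        overrides = [f"{name}={value}" for name, value in zip(names, values)]
        overrides.append(f"experiment_name={base_name}_{i}")
        overrides.append(f"max_trajectories={max_trajectories}")
        commands.append(shlex.join(["python", "run_germinal.py", *overrides]))
    return commands


if __name__ == "__main__":
    for command in sweep_commands(SWEEP_GRID):
        print(command)
```

Each printed line can be run directly or submitted to a job scheduler. A full grid grows multiplicatively, so keeping each axis to two or three values (and max_trajectories small) keeps the sweep cheap.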
Note
Settings in the configs/run/ folder use the global namespace and do not need a run prefix when overridden. See the example below.
Basic parameter overrides:
# Override trajectory limits
python run_germinal.py max_trajectories=100 max_passing_designs=50
# Override experiment settings
python run_germinal.py experiment_name=my_experiment run_config=test_run
# Override loss weights. Note: no run prefix since run settings are global
python run_germinal.py weights_plddt=1.5 weights_iptm=0.8
Filter threshold overrides:
# Make initial filters less stringent
python run_germinal.py filter.initial.clashes.value=2
# Adjust final filter thresholds
python run_germinal.py filter.final.external_plddt.value=0.9 filter.final.external_iptm.value=0.8
# Change filter operators
python run_germinal.py filter.final.sc_rmsd.operator='<=' filter.final.sc_rmsd.value=5.0
Target configuration overrides:
# Change target hotspots
python run_germinal.py target.target_hotspots="A26,A30,A36,A44"
# Use different PDB file
python run_germinal.py target.target_pdb_path="pdbs/my_target.pdb" target.target_name="my_target"
Complex multi-parameter overrides:
# Complete scFv run with custom settings
python run_germinal.py \
run=scfv \
target=pdl1 \
max_trajectories=500 \
experiment_name="scfv_pdl1_test" \
target.target_hotspots="A37,A39,A41" \
filter.final.external_plddt.value=0.85 \
weights_iptm=1.0
For each new target, you will need to define a target settings YAML file that contains all relevant information about the target protein. Here is an example:
target_name: "pdl1"
target_pdb_path: "pdbs/pdl1.pdb"
target_chain: "A"
binder_chain: "B"
target_hotspots: "25,26,39,41"
dimer: false # support coming soon!
length: 133
There are two sets of filters: post-hallucination (initial) filters and final filters. The post-hallucination filters are applied after the hallucination step to determine which sequences proceed to the redesign step. This filter set is a subset of the final filters, which are applied at the end of the pipeline to determine passing antibody sequences. Here is an example of the post-hallucination filters:
clashes: {'value': 1, 'operator': '<'}
cdr3_hotspot_contacts: {'value': 0, 'operator': '>'}
percent_interface_cdr: {'value': 0.5, 'operator': '>'}
interface_shape_comp: {'value': 0.6, 'operator': '>'}
Germinal generates organized output directories:
runs/your_target_nb_20240101_120000/
├── final_config.yaml # Complete run configuration after overrides
├── trajectories/ # Results for trajectories which pass hallucination but fail the first set of filters
│ ├── structures/
│ ├── plots/
│ └── designs.csv
├── redesign_candidates/ # Results for trajectories which are AbMPNN redesigned but fail the final set of filters
│ ├── structures/
│ └── designs.csv
├── accepted/ # Antibodies that pass all filters
│ ├── structures/
│ └── designs.csv
├── all_trajectories.csv # Main CSV containing designs in all three folders above
└── failure_counts.csv # CSV logging the number of trajectories that fail each step of hallucination
Key Output Files:
- accepted/structures/*.pdb - Final antibody-antigen structures for passing antibody designs.
- all_trajectories.csv - Complete list of designs that passed hallucination, their in silico metrics, which stage they reached, and the PDB path to the designed structure.
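To get a quick overview of a finished run, you could tally all_trajectories.csv by stage with the standard library, as in the sketch below. The column name "stage" is a placeholder assumption, since the exact CSV header is not documented here: check the header of your own file and pass the right name. The helper names are hypothetical; the file and folder layout follows the directory tree above.

```python
from __future__ import annotations

import csv
from collections import Counter
from collections.abc import Iterable
from pathlib import Path


def count_by_column(lines: Iterable[str], column: str = "stage") -> Counter:
    """Tally CSV rows by the value in `column` ("stage" is an assumed column name)."""
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found; header is {reader.fieldnames}")
    return Counter(row[column] for row in reader)


def summarize_run(run_dir: str | Path, column: str = "stage") -> tuple[Counter, int]:
    """Return (rows per `column` value in all_trajectories.csv, number of accepted PDB files)."""
    run_dir = Path(run_dir)
    with open(run_dir / "all_trajectories.csv", newline="") as fh:
        counts = count_by_column(fh, column)
    n_accepted = len(list((run_dir / "accepted" / "structures").glob("*.pdb")))
    return counts, n_accepted
```

For example, summarize_run("runs/your_target_nb_20240101_120000") would return the per-stage counts alongside the number of structures in accepted/structures/.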
Hallucination is inherently expensive. Against a 130-residue target, a single nanobody design iteration takes 2-8 minutes on an H100 80GB GPU, depending on which stage the designed sequence reaches. On 40GB GPUs or for scFvs, this takes roughly 50% longer.
During sampling, we typically run antibody generation until there are around 1,000 passing designs against the specified target, and we observe a success rate of around 0.5-1 passing designs per GPU hour. Of those, we typically select the top 40-50 sequences for experimental testing based on a combination of in silico metrics described in the preprint. While in silico success rates vary widely across targets, we estimate that 200-400 H100 80GB GPU hours of sampling are typically enough to generate ~200 successful designs and some functional antibodies.
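The figures above support a quick budget estimate. The sketch below simply encodes them: 2-8 minutes per nanobody iteration on an H100 80GB GPU, about 50% longer on 40GB GPUs or for scFvs, and 0.5-1 passing designs per GPU hour. The helper names are hypothetical, the wall-clock helper assumes independent sampling runs spread evenly over several GPUs, and real success rates vary widely by target.

```python
from __future__ import annotations

# Figures quoted in this README for a ~130-residue target.
NANOBODY_ITERATION_MINUTES = (2.0, 8.0)  # per design iteration, H100 80GB
SLOWDOWN_40GB_OR_SCFV = 1.5              # "around 50% larger"
PASSING_PER_GPU_HOUR = (0.5, 1.0)        # observed in silico success rate


def iteration_minutes(slower_setting: bool = False) -> tuple[float, float]:
    """(min, max) minutes per iteration; slower_setting covers 40GB GPUs or scFvs."""
    factor = SLOWDOWN_40GB_OR_SCFV if slower_setting else 1.0
    low, high = NANOBODY_ITERATION_MINUTES
    return low * factor, high * factor


def gpu_hours_for(n_passing: int) -> tuple[float, float]:
    """(best, worst) GPU hours to reach n_passing designs at the quoted success rates."""
    low_rate, high_rate = PASSING_PER_GPU_HOUR
    return n_passing / high_rate, n_passing / low_rate


def wall_clock_hours(gpu_hours: float, n_gpus: int) -> float:
    """Wall-clock hours if independent sampling runs are spread evenly over n_gpus."""
    return gpu_hours / n_gpus


if __name__ == "__main__":
    best, worst = gpu_hours_for(200)
    print(f"~200 passing designs: {best:.0f}-{worst:.0f} GPU hours")
    print(f"on 8 GPUs: {wall_clock_hours(best, 8):.0f}-{wall_clock_hours(worst, 8):.0f} hours wall clock")
```

For ~200 passing designs this reproduces the 200-400 GPU-hour range stated above, or roughly 25-50 hours of wall-clock time if the work is split across 8 GPUs.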
Best design parameters differ for each target and antibody type! If you are experiencing low success rates, we recommend tweaking interface confidence weights (ipTM / iPAE), structure-based weights (helix, beta, framework loss), or the IgLM weights defined in iglm_scale. Filters can be changed in the filter configuration files. To add or remove filters from the initial and final filtering rounds, simply create a new filter with the same name as the intended metric and specify the threshold value and the operator (<, >, =, etc.).
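Each filter entry pairs a metric name with a threshold value and a comparison operator. The minimal sketch below shows how such a table could be evaluated against one design's metrics, using the post-hallucination filters shown earlier. It is an illustration, not Germinal's own filtering code: the OPERATORS table, the rule that a missing metric counts as a failure, and the helper name are assumptions.

```python
from __future__ import annotations

import operator

# Hypothetical operator table; this README mentions <, >, = (and '<=' in an override).
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "==": operator.eq,
}

# The post-hallucination filters shown earlier in this README.
INITIAL_FILTERS = {
    "clashes": {"value": 1, "operator": "<"},
    "cdr3_hotspot_contacts": {"value": 0, "operator": ">"},
    "percent_interface_cdr": {"value": 0.5, "operator": ">"},
    "interface_shape_comp": {"value": 0.6, "operator": ">"},
}


def failed_filters(metrics: dict[str, float], filters: dict[str, dict]) -> list[str]:
    """Return names of filters a design fails; a missing metric counts as a failure (assumption)."""
    failures = []
    for name, spec in filters.items():
        compare = OPERATORS[spec["operator"]]
        if name not in metrics or not compare(metrics[name], spec["value"]):
            failures.append(name)
    return failures
```

A design passes a filter round when failed_filters returns an empty list; returning the failing names, rather than a single boolean, also makes it easy to count which filters reject the most designs.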
More tips coming soon!
- 9/25/25: Import fix for local colabdesign module (commit 8b5b655, pr #8)
- 9/25/25: A metric meant for tracking purposes, external_i_pae, was erroneously set to be used as a filter (commit 49be2e9, issue #7)
- 9/26/25: Resolved an error which caused passing runs to crash at the final stage due to a misnamed variable (commit 9292e1e, issue #11)
If you use Germinal in your research, please cite:
@article{mille-fragoso_efficient_2025,
title = {Efficient generation of epitope-targeted de novo antibodies with Germinal},
author = {Mille-Fragoso, Luis Santiago and Wang, John N. and Driscoll, Claudia L. and Dai, Haoyu and Widatalla, Talal M. and Zhang, Xiaowe and Hie, Brian L. and Gao, Xiaojing J.},
url = {https://www.biorxiv.org/content/10.1101/2025.09.19.677421v1},
doi = {10.1101/2025.09.19.677421},
publisher = {bioRxiv},
year = {2025},
}
Germinal builds upon the foundational work of previous hallucination-based protein design pipelines such as ColabDesign and BindCraft, and this codebase incorporates code from both repositories. We are grateful to the developers of these tools for making them available to the research community.
Related Work: If you use components of this pipeline, please also cite the underlying methods:
- ColabDesign: https://github.com/sokrypton/ColabDesign
- IgLM: https://github.com/Graylab/IgLM
- Chai-1: https://github.com/chaidiscovery/chai-lab
- AlphaFold3: https://github.com/google-deepmind/alphafold3
- AbMPNN: Dreyer, F. A., Cutting, D., Schneider, C., Kenlay, H. & Deane, C. M. Inverse folding for antibody sequence design using deep learning. (2023).
- PyRosetta: https://www.pyrosetta.org/
- @cytokineking: for helping bring numerous bugs to our attention
This repository is licensed under the Apache License 2.0.
Some components require separate licenses that are not included in this repository:
- IgLM: Provided under a non-commercial academic license from Johns Hopkins University. See their documentation for details.
- PyRosetta: Provided by the Rosetta Commons and University of Washington under a non-commercial, non-profit license.
PyRosetta cannot be redistributed and must be obtained separately.
Commercial use requires a separate license. See https://www.pyrosetta.org.