Alignment and allele calling for immunoglobulin (IG) and T‑cell receptor (TCR) repertoires.
- Quick Start
- Selecting a Model
- Model Bundles
- Running Predictions
- Multi-Chain Details
- SavedModel Export & Fine-Tuning
- Parameters Reference
- Docker (Advanced)
- Development & Contribution
- Data & Citation
- License
- Contact
Pull image:
docker pull thomask90/alignair:latest
List bundled pretrained models:
docker run --rm -it thomask90/alignair:latest list-pretrained
Example output:
Bundle Type SeqLen Chains Status
IGH_S5F_576 single_chain 576 - OK
IGH_S5F_576_Extended single_chain 576 - OK
IGL_S5F_576 multi_chain 576 BCR_LIGHT_LAMBDA,BCR_LIGHT_KAPPA OK
TCRB_UNIFORM_576 single_chain 576 - OK
Optional flags:
docker run --rm -it thomask90/alignair:latest list-pretrained --show-files
docker run --rm -it thomask90/alignair:latest list-pretrained --json-output
Run heavy chain (extended):
docker run --rm -v /path/to/input:/data -v /path/to/output:/out \
thomask90/alignair:latest run \
--model-dir=/app/pretrained_models/IGH_S5F_576_Extended \
--genairr-dataconfig=HUMAN_IGH_EXTENDED \
--sequences=/data/sequences.csv \
--save-path=/out \
--translate-to-asc
Windows (PowerShell) path example:
docker run --rm `
-v C:/Users/you/Datasets:/data `
-v C:/Users/you/Downloads:/out `
thomask90/alignair:latest run `
--model-dir=/app/pretrained_models/IGH_S5F_576_Extended `
--genairr-dataconfig=HUMAN_IGH_EXTENDED `
--sequences=/data/sequences.csv `
--save-path=/out
Output file: /out/<input_basename>_alignairr_results.csv
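The bash and PowerShell commands above differ only in host-path syntax and line continuations. If you drive the container from Python instead, a sketch like the one below builds the argument list once. `mount` and `alignair_run_cmd` are hypothetical helper names, not part of AlignAIR, and only flags shown in this README are used. `PureWindowsPath` rewrites backslashes as forward slashes, while POSIX paths such as `/path/to/input` pass through unchanged.

```python
import shlex
from pathlib import PureWindowsPath

IMAGE = "thomask90/alignair:latest"


def mount(host_dir: str, container_dir: str) -> str:
    """Format a -v value; Windows backslashes become forward slashes."""
    return f"{PureWindowsPath(host_dir).as_posix()}:{container_dir}"


def alignair_run_cmd(input_dir, output_dir, model_dir, dataconfig, sequences="sequences.csv"):
    """Build the `docker run ... run` argument list used in the examples above."""
    return [
        "docker", "run", "--rm",
        "-v", mount(input_dir, "/data"),
        "-v", mount(output_dir, "/out"),
        IMAGE, "run",
        f"--model-dir={model_dir}",
        f"--genairr-dataconfig={dataconfig}",
        f"--sequences=/data/{sequences}",
        "--save-path=/out",
    ]


cmd = alignair_run_cmd(
    r"C:\Users\you\Datasets", r"C:\Users\you\Downloads",
    "/app/pretrained_models/IGH_S5F_576_Extended", "HUMAN_IGH_EXTENDED",
)
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` executes it without going through a shell, so neither bash nor PowerShell quoting rules apply. Append optional flags such as `--translate-to-asc` to the list as needed.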
Install from source:
git clone https://github.com/MuteJester/AlignAIR.git
cd AlignAIR
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e .
python app.py list-pretrained --root checkpoints
python app.py run --model-dir=checkpoints/IGH_S5F_576 --sequences=tests/data/test/sample_igh_extended.csv --save-path=tmp_out
Python requirement: >=3.9, <3.12.
| Bundle | Type | Max Seq Len | Chains (multi) | Path |
|---|---|---|---|---|
| IGH_S5F_576 | single_chain | 576 | - | /app/pretrained_models/IGH_S5F_576 |
| IGH_S5F_576_Extended | single_chain | 576 | - | /app/pretrained_models/IGH_S5F_576_Extended |
| IGL_S5F_576 | multi_chain | 576 | Lambda,Kappa | /app/pretrained_models/IGL_S5F_576 |
| TCRB_UNIFORM_576 | single_chain | 576 | - | /app/pretrained_models/TCRB_UNIFORM_576 |
Choose:
- Standard heavy chain: IGH_S5F_576
- Extended heavy chain: IGH_S5F_576_Extended
- Lambda + Kappa light chains (with chain-type classification): IGL_S5F_576
- TCR beta: TCRB_UNIFORM_576
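For scripted runs, the choice above can be kept in a small lookup. This is a sketch: bundle names and container paths come from the table, and the dataconfig pairings come from the run examples in this README. The README does not name a dataconfig for TCRB_UNIFORM_576, so the hypothetical `model_args` helper raises instead of silently falling back to the HUMAN_IGH_OGRDB default.

```python
# Pretrained bundle -> --genairr-dataconfig value, as used in this README's examples.
# None marks a gap: the README does not state the dataconfig for that bundle.
BUNDLE_DATACONFIG = {
    "IGH_S5F_576": "HUMAN_IGH_OGRDB",
    "IGH_S5F_576_Extended": "HUMAN_IGH_EXTENDED",
    "IGL_S5F_576": "HUMAN_IGL_OGRDB,HUMAN_IGK_OGRDB",  # Lambda first, as in training
    "TCRB_UNIFORM_576": None,
}
CONTAINER_ROOT = "/app/pretrained_models"


def model_args(bundle: str) -> list[str]:
    """Return --model-dir and --genairr-dataconfig flags for a bundled model."""
    if bundle not in BUNDLE_DATACONFIG:
        raise KeyError(f"unknown bundle: {bundle}")
    dataconfig = BUNDLE_DATACONFIG[bundle]
    if dataconfig is None:
        # Omitting the flag would fall back to HUMAN_IGH_OGRDB, an IGH dataconfig.
        raise ValueError(f"no dataconfig documented for {bundle}; pass one explicitly")
    return [f"--model-dir={CONTAINER_ROOT}/{bundle}", f"--genairr-dataconfig={dataconfig}"]
```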
List programmatically:
docker run --rm thomask90/alignair:latest list-pretrained
Bundle layout:
model_dir/
  config.json
  dataconfig.pkl
  training_meta.json      # optional
  VERSION
  fingerprint.txt
  saved_model/
  checkpoint.weights.h5   # optional (fine-tuning)
  README.md               # optional
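A quick pre-flight check for a bundle directory can compare its contents against the layout above. The sketch below (`check_bundle` is a hypothetical helper) only tests for presence of the listed entries; it does not validate their contents or the fingerprint, which AlignAIR checks itself on load.

```python
from pathlib import Path

# Entries from the bundle layout above.
REQUIRED_FILES = ("config.json", "dataconfig.pkl", "VERSION", "fingerprint.txt")
REQUIRED_DIRS = ("saved_model",)
OPTIONAL_FILES = ("training_meta.json", "checkpoint.weights.h5", "README.md")


def check_bundle(model_dir) -> dict:
    """Report missing required entries and which optional files are present."""
    root = Path(model_dir)
    missing = [f for f in REQUIRED_FILES if not (root / f).is_file()]
    missing += [d + "/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    optional = [f for f in OPTIONAL_FILES if (root / f).is_file()]
    return {"missing": missing, "optional_present": optional}
```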
Why bundles:
- Reproducibility: model structure and dataconfig are stored together
- Integrity check via fingerprint
- Single directory: metadata + SavedModel (+ optional weights)
Legacy non‑bundle checkpoints still load; prefer bundles for new work.
Create during training (example):
trainer.train(..., save_pretrained=True)
Load in Python:
from AlignAIR.Models.SingleChainAlignAIR.SingleChainAlignAIR import SingleChainAlignAIR
model = SingleChainAlignAIR.from_pretrained('path/to/bundle')
CLI:
python app.py run --model-dir=path/to/bundle --genairr-dataconfig=HUMAN_IGH_OGRDB --sequences=input.csv --save-path=out
Integrity: loading raises an error if the bundle's fingerprint does not match.
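The fingerprint check follows the usual content-hash pattern: hash the files that define the model, store the digest, and compare on load. The sketch below illustrates that pattern only. Which files AlignAIR hashes, and how, is not documented here, so `content_fingerprint` and `verify` are hypothetical and should not be expected to reproduce a bundle's fingerprint.txt.

```python
import hashlib
from pathlib import Path


def content_fingerprint(model_dir, files=("config.json", "dataconfig.pkl")) -> str:
    """Hypothetical fingerprint: SHA-256 over the named files in a fixed order."""
    digest = hashlib.sha256()
    for name in files:
        data = (Path(model_dir) / name).read_bytes()
        # Prefix each file's bytes with its name and length, so moving
        # bytes from one file to another changes the digest.
        digest.update(name.encode("utf-8"))
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def verify(model_dir, expected: str) -> None:
    """Raise if the recomputed fingerprint differs from the stored one."""
    actual = content_fingerprint(model_dir)
    if actual != expected:
        raise ValueError(f"fingerprint mismatch: expected {expected}, got {actual}")
```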
Migrate a legacy model to a bundle:
legacy_model.save_pretrained('new_bundle_dir')
Run predictions:
python app.py run \
--model-dir=checkpoints/IGH_S5F_576 \
--genairr-dataconfig=HUMAN_IGH_OGRDB \
--sequences=tests/data/test/sample_igh_extended.csv \
--save-path=tmp_out
Output naming: <input_basename>_alignairr_results.csv, written to the --save-path directory.
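To locate the results file from a script, the naming rule above can be reproduced with pathlib. This assumes `<input_basename>` means the input file name without its final extension (`Path.stem`); `results_path` is a hypothetical helper.

```python
from pathlib import Path


def results_path(sequences: str, save_path: str) -> Path:
    """Expected AlignAIR output file for a given input, per the naming rule above."""
    # Assumption: <input_basename> is the file name minus its last extension.
    return Path(save_path) / f"{Path(sequences).stem}_alignairr_results.csv"
```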
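Thresholds and caps: allele probabilities are first filtered by the per-gene threshold (--v/d/j-allele-threshold); the cap (--v/d/j-cap) is then applied to limit how many of the passing alleles are retained.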
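The order matters: the threshold decides which alleles qualify, and the cap only trims the qualifying set. Below is a minimal sketch of that order for one gene, assuming a probability equal to the threshold passes and survivors are ranked by probability. `select_alleles` is hypothetical, and AlignAIR's handling of ties, or of calls where nothing passes, may differ (this sketch simply returns an empty list).

```python
def select_alleles(probs: dict[str, float], threshold: float, cap: int) -> list[str]:
    """Threshold first, then keep at most `cap` alleles, highest probability first."""
    # Step 1: drop alleles below the threshold.
    passed = [(allele, p) for allele, p in probs.items() if p >= threshold]
    # Step 2: rank the survivors and apply the cap.
    passed.sort(key=lambda item: item[1], reverse=True)
    return [allele for allele, _ in passed[:cap]]


# Toy V-gene probabilities (made up), with the default V threshold 0.75 and cap 3.
toy = {"IGHV1-2*02": 0.95, "IGHV1-2*04": 0.81, "IGHV1-2*06": 0.78,
       "IGHV1-3*01": 0.76, "IGHV3-23*01": 0.10}
print(select_alleles(toy, threshold=0.75, cap=3))
```

With these toy values four alleles pass 0.75, and the cap keeps the top three.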
Optional flags:
- --translate-to-asc: report ASC allele labels
- --airr-format: write output in the AIRR schema
Multi-chain mode is triggered when --genairr-dataconfig has more than one comma-separated entry.
Example (Lambda + Kappa):
--genairr-dataconfig=HUMAN_IGL_OGRDB,HUMAN_IGK_OGRDB
The entry order must match the order used in training (Lambda first). The output includes a chain_type column.
Single chain: supply one built-in identifier or the path to a dataconfig pickle.
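The multi-chain trigger described above amounts to counting comma-separated entries. A sketch, assuming surrounding whitespace is ignored (the CLI's own parsing is not documented here); `parse_dataconfig_arg` and `is_multi_chain` are hypothetical helpers. Entry order is preserved, since it must match training.

```python
def parse_dataconfig_arg(value: str) -> list[str]:
    """Split a --genairr-dataconfig value into its entries, preserving order."""
    entries = [part.strip() for part in value.split(",") if part.strip()]
    if not entries:
        raise ValueError("--genairr-dataconfig needs at least one entry")
    return entries


def is_multi_chain(value: str) -> bool:
    """Multi-chain mode: more than one comma-separated entry."""
    return len(parse_dataconfig_arg(value)) > 1
```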
SavedModel is under saved_model/ inside a bundle.
Export (via bundle creation):
model.save_pretrained('bundle_dir')
Direct export:
model.export_saved_model('export_dir/saved_model')
Load SavedModel:
import tensorflow as tf
sm = tf.saved_model.load('bundle_dir/saved_model')
serving_fn = sm.signatures['serving_default']
Bundle vs SavedModel:
- General use and metadata: the bundle path
- Serving stack: saved_model/
- Fine-tuning: rebuild the model and load checkpoint.weights.h5 if present
Fine-tuning steps:
from pathlib import Path
from AlignAIR.Serialization.io import load_bundle
from AlignAIR.Models.SingleChainAlignAIR.SingleChainAlignAIR import SingleChainAlignAIR
bundle = Path('bundle')
cfg, dataconfig, meta = load_bundle(bundle)
# Rebuild the architecture from the bundle's config and dataconfig
model = SingleChainAlignAIR(max_seq_length=cfg.max_seq_length, dataconfig=dataconfig)
# The weights file is optional in a bundle; load it only when present
weights = bundle / 'checkpoint.weights.h5'
if weights.exists():
    # HDF5 weight loading returns None, so there is no status object to call .expect_partial() on
    model.load_weights(str(weights))
Core flags:
| Flag | Description | Default |
|---|---|---|
| --model-dir | Model bundle directory | (required) |
| --model-checkpoint | Legacy checkpoint directory | None |
| --genairr-dataconfig | Built-in name(s) or path(s); comma-separated for multi-chain | HUMAN_IGH_OGRDB |
| --sequences | Input CSV/TSV/FASTA | (required) |
| --save-path | Output directory | (required) |
| --batch-size | Batch size | 2048 |
Thresholds and caps:
| Flag | Description | Default |
|---|---|---|
| --v-allele-threshold | V allele threshold | 0.75 |
| --d-allele-threshold | D allele threshold | 0.30 |
| --j-allele-threshold | J allele threshold | 0.80 |
| --v-cap / --d-cap / --j-cap | Max retained alleles | 3 |
Output options:
| Flag | Description | Default |
|---|---|---|
| --translate-to-asc | Translate allele names | False |
| --airr-format | AIRR schema output | False |
| --fix-orientation | Orientation correction | True |
Advanced:
| Flag | Description |
|---|---|
| --config-file | YAML with parameters |
| --custom-orientation-pipeline-path | Custom orientation pipeline |
| --custom-genotype | Genotype file for likelihood adjustment |
| --save-predict-object | Persist internal object (debug) |
Full help: docker run thomask90/alignair:latest run --help
Custom bundle:
docker run --rm -v /models/my_bundle:/bundle -v /data:/data -v /out:/out \
thomask90/alignair:latest run \
--model-dir=/bundle \
--sequences=/data/sequences.csv \
--genairr-dataconfig=HUMAN_IGH_OGRDB \
--save-path=/out
List mounted directory:
docker run --rm -v /models:/extra thomask90/alignair:latest list-pretrained --root /extra --json-output > bundles.json
JSON inspection:
docker run --rm thomask90/alignair:latest list-pretrained --json-output | jq '.[].name'
Windows paths: prefer forward slashes in -v mounts, and ensure drive sharing is enabled in Docker Desktop.
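The `jq '.[].name'` filter in this section can be mirrored in Python when jq is not available. The sketch assumes, as that filter implies, that `list-pretrained --json-output` prints a JSON array of objects each carrying a `name` key; other fields are ignored. `bundle_names` and `bundle_names_from_file` are hypothetical helpers.

```python
import json
from pathlib import Path


def bundle_names(json_text: str) -> list[str]:
    """Python equivalent of jq '.[].name' on list-pretrained --json-output."""
    return [entry["name"] for entry in json.loads(json_text)]


def bundle_names_from_file(path) -> list[str]:
    """Read a file written with `list-pretrained --json-output > bundles.json`."""
    return bundle_names(Path(path).read_text(encoding="utf-8"))
```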
Quick commands:
pip install -e .
pytest -q
python app.py list-pretrained --root checkpoints
python app.py run --model-dir=checkpoints/IGH_S5F_576 --sequences=tests/data/test/sample_igh_extended.csv --save-path=tmp_out
Workflow:
- Create a branch
- Add or update tests
- Run pytest -q
- Open a PR
License: GPLv3 (see LICENSE).
Citation DOI:
doi:10.5281/zenodo.15687939
GPL v3.0 or later (see LICENSE).
Issues: GitHub issues
Email: [email protected]
Site: https://alignair.ai