Open development of genomic language models — data, modeling, and evaluation.
Inspired by Marin.
Experiments are tracked as GitHub issues; see the experiment-labeled issues.
Variant effect prediction leaderboards (under construction): openathena.ai/marin-dna.
uv sync

Optional installs (all opt-in)
| Selector | Purpose |
|---|---|
| `--group dev` | Pre-commit, ruff, pytest, snakefmt. |
| `--extra marin` | marin / marin-levanter / marin-iris / marin-zephyr / marin-rigging, for marin-launched DNA experiments under `experiments/`. Lives as an extra (not a group) so iris workers can install it via `uv sync --extra marin`. |
| `--group enhancer-classification` | AlphaGenome-Pytorch, Lightning, py2bit, for the enhancer-classification training path. |
| `--group alphagenome-eval` | AlphaGenome, for AlphaGenome eval pipelines. |
| `--group aws-cli` | awscli, for snakemake rules that shell out to `aws s3 cp` (e.g. `evals/ldscore_download`). |
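
Each selector in the table is passed to `uv sync` as a flag. As a rough illustration only, the sketch below assembles such a command and rejects the one combination this README rules out (the `marin` extra together with the `aws-cli` group). The function name `build_sync_cmd` is hypothetical and not part of this repo; it prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not shipped with the repo):
# build a `uv sync` command from selectors and refuse to combine
# `--extra marin` with `--group aws-cli`.
build_sync_cmd() {
    has_marin=0
    has_awscli=0
    prev=""
    # Look at each flag together with the value that follows it.
    for arg in "$@"; do
        case "$prev $arg" in
            "--extra marin") has_marin=1 ;;
            "--group aws-cli") has_awscli=1 ;;
        esac
        prev="$arg"
    done
    if [ "$has_marin" -eq 1 ] && [ "$has_awscli" -eq 1 ]; then
        echo "error: --extra marin and --group aws-cli are mutually exclusive" >&2
        return 1
    fi
    # Print instead of executing, so the command can be inspected first.
    if [ "$#" -eq 0 ]; then
        echo "uv sync"
    else
        echo "uv sync $*"
    fi
}
```

For example, `build_sync_cmd --extra marin --extra tpu` prints `uv sync --extra marin --extra tpu`, while `build_sync_cmd --extra marin --group aws-cli` prints an error to stderr and returns a non-zero status.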
The `marin` extra and the `aws-cli` group are mutually exclusive (awscli pins
fsspec/s3fs to versions older than marin requires). For TPU training under marin,
also pass `--extra tpu`:
uv sync --extra marin --extra tpu

# Install dev dependencies and pre-commit hooks
uv sync --group dev
uv run pre-commit install
# Run quality checks
uv run pre-commit run
# Run tests
uv run pytest

See AGENTS.md.
Join the Marin Discord; MarinDNA discussion happens in the #dna channel.
If you find datasets, models, or experiments from this repo useful, please cite:
MarinDNA: open development of genomic language models. Open Athena, 2026. https://github.com/Open-Athena/marin-dna
BibTeX:
@misc{marin-dna,
title = {MarinDNA: open development of genomic language models},
author = {{Open Athena}},
year = {2026},
url = {https://github.com/Open-Athena/marin-dna},
}