Computational methods for research informatics and genomics. Includes code examples from bennettwaxse.com and shared analysis tools.
This repository contains analysis pipelines and tools for working with NIH's All of Us Research Program data, including:
- Genomics - Variant analysis, ancestry inference, PCA workflows (PLINK2, Hail)
- HPV Research - OMOP-based cohort construction
- N3C/RECOVER - Long COVID phenotyping algorithms
- Reference Materials - All of Us data dictionaries, PheCode mappings, utilities
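The PCA workflows listed under Genomics build on PLINK2. As a minimal sketch (not the repository's actual pipeline), a PLINK2 PCA run on a .pgen/.pvar/.psam fileset can be wrapped in Python as below. The file prefixes are hypothetical, and the helper names plink2_pca_command and run_pca are illustrative only.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Tuple


def plink2_pca_command(pfile_prefix: str, out_prefix: str, n_pcs: int = 10,
                       plink2: str = "plink2") -> List[str]:
    """Build a PLINK2 command computing the top `n_pcs` principal components.

    `pfile_prefix` names a .pgen/.pvar/.psam fileset (hypothetical path).
    """
    if n_pcs < 1:
        raise ValueError("n_pcs must be a positive integer")
    return [plink2, "--pfile", pfile_prefix, "--pca", str(n_pcs), "--out", out_prefix]


def run_pca(pfile_prefix: str, out_prefix: str, n_pcs: int = 10) -> Tuple[Path, Path]:
    """Run PLINK2 PCA and return the paths of the output files it writes."""
    cmd = plink2_pca_command(pfile_prefix, out_prefix, n_pcs)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    # PLINK2 writes sample scores to .eigenvec and eigenvalues to .eigenval
    return Path(f"{out_prefix}.eigenvec"), Path(f"{out_prefix}.eigenval")
```

Keeping command construction separate from execution makes the exact PLINK2 invocation easy to log and inspect. In practice, variants are usually LD-pruned before PCA; that step is omitted here.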
Code is designed for the All of Us Researcher Workbench:
- Legacy Workbench (current) - Full genomics support
- Verily Workbench (new) - See _reference/verily/ for setup
- Requires Google Cloud Platform (BigQuery, Cloud Storage, Dataproc)
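OMOP-based cohort construction, as in the HPV pipelines, comes down to SQL run in BigQuery against the CDR. A minimal sketch, assuming the Legacy Workbench convention of a WORKSPACE_CDR environment variable holding the CDR's "project.dataset" name (the Verily Workbench may expose this differently). The helper names and any concept IDs passed in are hypothetical placeholders, not the repository's phenotype definitions.

```python
import os
from typing import Iterable, Optional


def cdr_table(table: str, cdr: Optional[str] = None) -> str:
    """Return a fully qualified, backtick-quoted CDR table name.

    Assumes WORKSPACE_CDR is set to "project.dataset" (Legacy Workbench).
    """
    cdr = cdr or os.environ["WORKSPACE_CDR"]
    return f"`{cdr}.{table}`"


def condition_cohort_sql(concept_ids: Iterable[int], cdr: Optional[str] = None) -> str:
    """Build SQL selecting distinct persons with any of the given condition concepts."""
    # int() guards against injecting arbitrary text into the IN list
    ids = [int(c) for c in concept_ids]
    if not ids:
        raise ValueError("at least one concept ID is required")
    id_list = ", ".join(str(c) for c in ids)
    return (
        "SELECT DISTINCT person_id\n"
        f"FROM {cdr_table('condition_occurrence', cdr)}\n"
        f"WHERE condition_concept_id IN ({id_list})"
    )


def run_query(sql: str):
    """Execute SQL with the google-cloud-bigquery client; return a pandas DataFrame."""
    from google.cloud import bigquery  # imported lazily; needs GCP credentials

    client = bigquery.Client()
    return client.query(sql).to_dataframe()
```

On the workbench, run_query(condition_cohort_sql([...])) would return one row per matching person_id. Real phenotype definitions usually also expand concepts through the OMOP concept_ancestor table; that step is omitted here.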
Repository layout:
genomics/ # Genomic analysis pipelines (PLINK2, Hail, phetk)
hpv/ # HPV cohort construction
nc3/ # N3C RECOVER Long COVID algorithm
_reference/ # Reference data and utilities
├─ verily/ # Verily Workbench setup
├─ all_of_us_tables/ # CDR data dictionaries
└─ phecode/ # PheCode mappings
Each directory contains both .py scripts and .ipynb notebooks (in notebooks/ subdirectories).
- Review CLAUDE.md files - Each directory has guidance for working with that code
- Set up environment - For Verily Workbench, run _reference/verily/00_setup_workspace.ipynb
- Choose a template - Use existing scripts as starting points for your analysis
- Never share counts < 20 - Display them as "< 20" in all outputs
- Never commit patient data - See .gitignore for protected file types
- Follow data use agreements - All analyses must comply with All of Us policies
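The small-count rule can be enforced in code before any table leaves the workbench. A minimal sketch, assuming counts arrive as a mapping from label to integer; the function name suppress_small_counts is hypothetical, and the default threshold mirrors the < 20 rule stated above. It masks every count below the threshold, zero included.

```python
from typing import Dict, Mapping, Union


def suppress_small_counts(counts: Mapping[str, int],
                          threshold: int = 20) -> Dict[str, Union[int, str]]:
    """Replace each count below `threshold` with the string "< {threshold}".

    Counts at or above the threshold are passed through unchanged.
    """
    masked: Dict[str, Union[int, str]] = {}
    for label, n in counts.items():
        masked[label] = f"< {threshold}" if n < threshold else n
    return masked
```

For example, suppress_small_counts({"cases": 7, "controls": 1342}) returns {"cases": "< 20", "controls": 1342}. For a pandas column, the same rule can be applied with Series.map.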
This repository includes CLAUDE.md files for use with Claude Code. They describe each directory's architecture, workflows, and platform-specific patterns.
See CONTRIBUTING.md for guidelines on contributing to this repository.
See LICENSE for details.