GPU-based whole genome analysis pipeline
input: decompressed FASTQ files from Nucleus
↓
1. quality control (FastQC / MultiQC on raw reads; VerifyBamID / Qualimap on BAMs after step 2)
↓
2. alignment to T2T-CHM13 (BWA-MEM2 / NVIDIA Parabricks)
↓
3. alignment to the HPRC pangenome graph (vg giraffe / minigraph)
↓
4. variant calling (DeepVariant or GATK HaplotypeCaller)
↓
VCF
↓
5. structural + copy-number variants (Manta / Lumpy / CNVnator)
↓
6. variant annotation + clinical databases (VEP / SnpEff / CADD / REVEL / ClinVar / PharmGKB)
↓
7. comparative population + ancestry analysis (PLINK / ADMIXTURE / KING / EIGENSOFT)
↓
8. evolutionary constraint & cross-species comparison (PhyloP / PhastCons / Foldseek)
↓
9. predictive modeling & CRISPR design
├─ expression & variant effects (Enformer v2 / xTrimoGene / AlphaMissense / EVE)
└─ edit simulation (CRISPRitz / DeepCRISPR / BE-Deep / PrimeDesign)
↓
10. protein structure & drug docking (ColabFold / Foldseek / AutoDock Vina)
↓
11. reporting dashboard (MultiQC + summary UI graphics)
↓
output: complete genome interpretation (annotated variants, ancestry report, functional predictions, molecular effects, and editable targets)
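
The flow above can be sketched as an ordered list of stages with a check that each stage only consumes artifacts produced earlier. This is a minimal illustration, not the pipeline's actual implementation: the stage names follow the diagram, but the `Stage` class, the `validate` helper, and every artifact label (`fastq`, `bam`, `vcf`, and so on) are assumptions introduced here to make the handoffs explicit.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Stage:
    number: int
    name: str
    consumes: FrozenSet[str]  # assumed artifact labels, not stated in the diagram
    produces: FrozenSet[str]


def stage(number, name, consumes, produces):
    """Small helper so the table below stays readable."""
    return Stage(number, name, frozenset(consumes), frozenset(produces))


# Stage names mirror the diagram; the consumes/produces sets are hypothetical.
PIPELINE = [
    stage(1, "quality control", {"fastq"}, {"qc_report"}),
    stage(2, "alignment to T2T-CHM13", {"fastq"}, {"bam"}),
    stage(3, "alignment to HPRC pangenome graph", {"fastq"}, {"graph_alignment"}),
    stage(4, "variant calling", {"bam"}, {"vcf"}),
    stage(5, "structural + copy-number variants", {"bam"}, {"sv_vcf"}),
    stage(6, "variant annotation + clinical databases", {"vcf", "sv_vcf"}, {"annotated_vcf"}),
    stage(7, "population + ancestry analysis", {"vcf"}, {"ancestry_report"}),
    stage(8, "evolutionary constraint", {"annotated_vcf"}, {"constraint_scores"}),
    stage(9, "predictive modeling & CRISPR design", {"annotated_vcf"},
          {"functional_predictions", "editable_targets"}),
    stage(10, "protein structure & drug docking", {"annotated_vcf"}, {"molecular_effects"}),
    stage(11, "reporting dashboard",
          {"qc_report", "annotated_vcf", "ancestry_report",
           "functional_predictions", "molecular_effects", "editable_targets"},
          {"report"}),
]


def validate(pipeline: List[Stage],
             initial: FrozenSet[str] = frozenset({"fastq"})) -> List[Tuple[int, str]]:
    """Return (stage number, missing artifact) pairs; an empty list means the order is consistent."""
    available = set(initial)
    problems = []
    for st in pipeline:
        # Anything a stage needs must already exist when the stage starts.
        for artifact in sorted(st.consumes - available):
            problems.append((st.number, artifact))
        available |= st.produces
    return problems


if __name__ == "__main__":
    for st in PIPELINE:
        print(f"{st.number:>2}. {st.name}: "
              f"{', '.join(sorted(st.consumes))} -> {', '.join(sorted(st.produces))}")
    print("ordering problems:", validate(PIPELINE))
```

Running the check on a reordered or truncated list shows which handoffs break. For example, dropping the two alignment stages leaves variant calling without a `bam` input under these assumed labels.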