Welcome to the GitHub page for the "Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes".
The tutorials directory contains everything you need to apply the MNV discovery and annotation pipeline to your own dataset.
It also allows users to reproduce the main figures and most of the supplementary figures in the gnomAD MNV preprint.
Specifically, the tutorials directory consists of six Jupyter notebooks:

- identify_mnv.ipynb explains how to extract MNVs from a VCF (or a Hail matrix table).
- annotate_mnv.ipynb explains how to annotate the functional consequences and categories of MNVs.
- functional_impact.ipynb explains how to generate per-gene (or per-individual) statistics.
- global_mechanisms.ipynb explains how to analyze the genome-wide generation mechanisms of MNVs.
- per_region_mechanisms.ipynb explains how to partition the genome into different functional categories and analyze MNVs across those categories.
- phase_sensitivity.ipynb (WIP) explains how the phasing sensitivity analysis could be performed.
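To make the identification step concrete, the core idea can be sketched in plain Python. This is a minimal sketch, not the Hail code used in identify_mnv.ipynb: it assumes biallelic SNVs with phased genotypes, and it assumes an MNV is a pair of SNVs at most 2 bp apart that sit on the same haplotype (in cis) in one individual. The `PhasedSNV` record and the `in_cis` / `find_mnv_pairs` functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PhasedSNV:
    """Hypothetical record for one biallelic SNV call in one sample."""
    chrom: str
    pos: int               # 1-based genomic position
    ref: str
    alt: str
    gt: Tuple[int, int]    # phased genotype, e.g. (0, 1) for "0|1"


def in_cis(a: PhasedSNV, b: PhasedSNV) -> bool:
    """True if at least one haplotype carries the alt allele (index 1) at both sites."""
    return any(x == 1 and y == 1 for x, y in zip(a.gt, b.gt))


def find_mnv_pairs(
    calls: Dict[str, List[PhasedSNV]], max_distance: int = 2
) -> List[Tuple[str, PhasedSNV, PhasedSNV]]:
    """Return (sample, first SNV, second SNV) for every in-cis pair within max_distance bp."""
    pairs = []
    for sample, snvs in calls.items():
        # Sorting by (chrom, pos) keeps each chromosome contiguous and positions ascending.
        ordered = sorted(snvs, key=lambda v: (v.chrom, v.pos))
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                # Stop scanning once we leave the chromosome or the distance window.
                if b.chrom != a.chrom or b.pos - a.pos > max_distance:
                    break
                if b.pos > a.pos and in_cis(a, b):
                    pairs.append((sample, a, b))
    return pairs
```

For example, SNVs at positions 100 and 101 that are both `0|1` in one sample form a pair, whereas `0|1` at position 100 and `1|0` at position 102 (in trans) do not. A real pipeline would also need to handle multiallelic sites and unphased genotypes, which this sketch ignores.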
The table below lists which figures and tables in the preprint/paper are generated by each notebook. Entries in parentheses are related to the notebook but are not generated by it exactly.
| notebook | main figure | supplementary figure | supplementary table | supplementary file |
|---|---|---|---|---|
| identify_mnv.ipynb | (1a,b) | (11) | ||
| annotate_mnv.ipynb | 2a | 1 | ||
| functional_impact.ipynb | 2 | 2 | 1 | |
| global_mechanisms.ipynb | 3, 4a | 3-19 | 3 | |
| per_region_mechanisms.ipynb | 4b-d | 20 | 2 | |
| phase_sensitivity.ipynb | 1 | 1 | | |
Since most of the analysis was performed in Hail, we recommend that users who are not familiar with Hail visit the Hail tutorial page.
(The analysis was performed using Hail version 0.2.11, and we recommend installing this specific version to run the MNV analysis, e.g. with the command `pip install hail==0.2.11`.)
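Because the notebooks were run with this exact Hail release, it may help to confirm which version your environment actually has before running them. Below is a minimal sketch using only the Python standard library (`importlib.metadata`, Python 3.8+), falling back to the `importlib_metadata` backport on older interpreters; the helper names `installed_version` and `has_pinned_hail` are hypothetical.

```python
try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python < 3.8: requires the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

from typing import Optional

PINNED_HAIL_VERSION = "0.2.11"  # version recommended in this README


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def has_pinned_hail(expected: str = PINNED_HAIL_VERSION) -> bool:
    """True only if the installed `hail` distribution matches `expected` exactly."""
    return installed_version("hail") == expected
```

One way to use it is to call `has_pinned_hail()` at the top of a notebook and, if it returns `False`, reinstall with `pip install hail==0.2.11`.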
All the scripts used in the gnomAD MNV paper are stored in the code directory.
However, because the gnomAD sample-level data and the exome data from rare disease families are not publicly available,
most of these scripts cannot simply be run on your local machine.
The util directory contains some of the functions used in the analysis.