MSpangepop is a workflow for simulating pangenome variation graphs from coalescent simulations.
A simplified description of the algorithm can be found here.
The official MSpangepop repository can be found at the INRAE forge.
A GitHub mirror can be found at INRAE GitHub.
The mirror is especially useful for users without a Renater account who want to submit issues.
| Stage | Process | Scripts | Rules |
|---|---|---|---|
| 1. Setup | → Validate FASTA/YAML → Expand configs → Create index | input_index.py, sample_ranges.py, recap.py | setup |
| 2. Msprime Simulation | → Build demographic model → Run msprime → Generate visualizations | msprime_simulation.py, visualizer_arg.py, visualizer_tree.py | msprime_simulation, visualization |
| 3. Preprocessing | → Split by locus → Traverse trees in preorder → Define SV type, length, and position | coalescent_traversal.py, draw_variants.py, split_recombination.py | coalescent_traversal, draw_variants, split_recombination |
| 4. Graph Creation | Initialize: build locus ancestral graphs; Mutate: apply variants using the MSpangepop library; Save: assign IDs → merge subgraphs → lint → export chopped graph | graph_creation.py, graph_utils.py, matrix.py | graph_creation |
| 5. Unchop | `vg unchop` command | - | graph_merging |
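Stage 3 traverses each local coalescent tree in preorder. The key property is that a variant placed on the branch above a node is inherited by every sample below that node. The sketch below illustrates that idea on a hypothetical toy tree stored as a plain `{parent: [children]}` dictionary; it is not the code of `coalescent_traversal.py`. msprime itself returns tskit tree sequences, whose trees can be walked in the same order with `Tree.nodes(order="preorder")`.

```python
from collections.abc import Iterator

# Hypothetical toy genealogy: each internal node maps to its children.
# Leaves (samples) are the nodes that never appear as keys.
Tree = dict[int, list[int]]


def preorder(tree: Tree, root: int) -> Iterator[int]:
    """Yield nodes parent-before-children (iterative, no recursion limit)."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(tree.get(node, [])))


def carriers(tree: Tree, mutated_node: int) -> list[int]:
    """Samples inheriting a variant placed on the branch above `mutated_node`."""
    return [n for n in preorder(tree, mutated_node) if n not in tree]


# Samples 0-3, internal nodes 4 and 5, root 6.
toy_tree: Tree = {6: [4, 5], 4: [0, 1], 5: [2, 3]}
print(list(preorder(toy_tree, 6)))  # [6, 4, 0, 1, 5, 2, 3]
print(carriers(toy_tree, 5))        # [2, 3]
```

A variant on the branch above the root's child 5 is therefore shared by samples 2 and 3 only, which is how a single mutation event becomes a variant shared by several lineages.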
- Clone the Git repository:
```shell
git clone https://forge.inrae.fr/pangepop/MSpangepop
```
- Create an environment for Snakemake (from the provided env file):
```shell
conda env create -n wf_env -f dependencies/wf_env.yaml
```
Two elements are needed to run the simulation:
- The `masterconfig` (see Master Configuration)
- The `demographic_file` (see Demographic model configuration)
Edit the `masterconfig` file in the `.config/` directory with your sample information (see Master Configuration):
```shell
nano .config/masterconfig.yaml
```
Example config:
```yaml
samples:
  test_run:
    model: "simulation_data/Panmictic_Model.json"
    replicates: 1
```
- `model` is the demographic scenario the simulation runs on. You can create your own or adapt the ones in `./simulation_data` (see Demographic model configuration).
- ⚠️ Don't want to create your own model? ⚠️ Use the provided `Panmictic_Model.json`: simply edit it to specify your genome, then adjust `mutation_rate` and `recombination_rate` (start with low values).
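For reference, the example config has this shape once parsed: a `samples` mapping whose entries each carry a `model` path and a `replicates` count. The hypothetical checker below (not part of MSpangepop) works on the dictionary that PyYAML's `yaml.safe_load` would return; the rules it enforces, a `.json` model file and a positive integer `replicates`, are assumptions drawn from the example rather than documented requirements.

```python
from pathlib import Path


def check_samples(config: dict) -> list[str]:
    """Hypothetical sanity check of the `samples` section.

    Returns human-readable problems; an empty list means none were found.
    The rules below are illustrative assumptions, not MSpangepop's own logic.
    """
    samples = config.get("samples")
    if not isinstance(samples, dict) or not samples:
        return ["'samples' must be a non-empty mapping"]
    problems = []
    for name, entry in samples.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be a mapping")
            continue
        model = entry.get("model")
        if not isinstance(model, str) or Path(model).suffix != ".json":
            problems.append(f"{name}: 'model' must point to a .json file")
        reps = entry.get("replicates")
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(reps, int) or isinstance(reps, bool) or reps < 1:
            problems.append(f"{name}: 'replicates' must be a positive integer")
    return problems


# Same structure as the example masterconfig above, as a Python dict.
example = {
    "samples": {
        "test_run": {
            "model": "simulation_data/Panmictic_Model.json",
            "replicates": 1,
        }
    }
}
print(check_samples(example))  # []
```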
- Run the workflow:
```shell
sbatch mspangepop dry   # Check for warnings
sbatch mspangepop run   # Then launch the run
```
NB: If your account name cannot be determined automatically, add it to the `.config/snakemake/profiles/slurm/config.yaml` file.
NB: Use the following command to see job names (`-A` filters by SLURM account, so `$user` must hold your account name):
```shell
squeue --format="%.10i %.9P %.6j %.10k %.8u %.2t %.10M %.6D %.20R" -A $user
```
To run the workflow locally (on a single node, without SLURM):
```shell
./mspangepop dry        # Check for warnings
./mspangepop local-run  # Then launch the run
```
```
mspangepop [dry|run|local-run|dag|rulegraph|unlock|touch] [additional snakemake args]
    dry        - run in dry-run mode
    run        - run the workflow with SLURM
    local-run  - run the workflow locally (on a single node)
    dag        - generate the directed acyclic graph for the workflow
    rulegraph  - generate the rulegraph for the workflow
    unlock     - unlock the directory if snakemake crashed
    touch      - tell snakemake that all files are up to date (use with caution)
    [additional snakemake args] - any snakemake argument, e.g. --until msprime_simulation
```
MSpangepop implements graph path operations to add variants by modifying how lineages traverse the graph.
Core Features:
- Multi-path targeting - Operations apply to single or multiple lineage paths simultaneously, enabling both unique and shared variants
- Orientation-aware - All operations preserve node directionality: each edge records which side of the source node it leaves from and which side of the target node it enters, giving the four link orientations (++, +-, -+, --)
- Composable - Operations can be nested and overlapping (e.g., deletion within inversion), representing complex compound variants
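To make the orientation rule concrete, here is a minimal sketch assuming each edge stores the side it leaves the source node from and the side it enters the target node through. The `Edge` class and `to_gfa_link` function are hypothetical; the mapping follows the GFA1 convention, where a `+` source is left through its end, a `-` source through its start, a `+` target is entered through its start and a `-` target through its end.

```python
from dataclasses import dataclass
from typing import Literal

Side = Literal["start", "end"]


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge: the side we leave `src` from, the side we enter `dst` by."""
    src: str
    src_exit: Side
    dst: str
    dst_entry: Side


def to_gfa_link(edge: Edge) -> str:
    """Render the edge as a GFA1 L-line with a blunt (0M) overlap.

    Leaving through the end means `src` was read forward (+); leaving
    through the start means it was read in reverse (-). Entering `dst`
    through its start is forward (+); through its end is reverse (-).
    """
    src_orient = "+" if edge.src_exit == "end" else "-"
    dst_orient = "+" if edge.dst_entry == "start" else "-"
    return f"L\t{edge.src}\t{src_orient}\t{edge.dst}\t{dst_orient}\t0M"


# The four exit/entry combinations yield the ++, +-, -+ and -- links.
for exit_side in ("end", "start"):
    for entry_side in ("start", "end"):
        print(to_gfa_link(Edge("1", exit_side, "2", entry_side)))
```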
MSpangepop's path operations modify paths through existing nodes rather than altering the graph structure, preserving shared sequences while creating alternative routes for different lineages. New nodes (e.g., for insertions) are generated using a first-order Markov model to produce realistic sequences.
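A first-order Markov model draws each base conditioned only on the previous base, so generated sequences reproduce the dinucleotide composition of whatever the model was trained on. The sketch below fits transition probabilities from a toy training string (with add-one smoothing) and samples from them with a seeded `random.Random`. The training sequence, the smoothing, and the uniform choice of the first base are assumptions for illustration; this section does not say how MSpangepop parameterizes its model.

```python
import random
from collections import Counter, defaultdict

BASES = "ACGT"


def fit_transitions(seq: str) -> dict[str, dict[str, float]]:
    """Estimate P(next base | current base) from dinucleotide counts.

    Add-one (Laplace) smoothing keeps every transition possible even if
    it never occurs in the training sequence.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for cur, nxt in zip(seq, seq[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for cur in BASES:
        total = sum(counts[cur][b] + 1 for b in BASES)
        probs[cur] = {b: (counts[cur][b] + 1) / total for b in BASES}
    return probs


def generate(probs: dict[str, dict[str, float]], length: int, rng: random.Random) -> str:
    """Sample a sequence in which each base depends only on the previous one."""
    if length <= 0:
        return ""
    out = [rng.choice(BASES)]  # assumption: uniform choice of the first base
    while len(out) < length:
        row = probs[out[-1]]
        out.append(rng.choices(BASES, weights=[row[b] for b in BASES])[0])
    return "".join(out)


training = "ACGTTGCAACGTACGGTACCATGCAT"  # toy stand-in for a reference sequence
model = fit_transitions(training)
print(generate(model, 20, random.Random(42)))
```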
| Operation | Function | Used for | Path change |
|---|---|---|---|
| `bypass(a,b)` | Skip nodes a to b | Deletions | Creates shortcut edge |
| `loop(a,b)` | Duplicate nodes a to b | Tandem duplications | Adds loop-back edge + repeat |
| `invert(a,b)` | Reverse nodes a to b | Inversions | Flips path direction |
| `swap(a,node)` | Replace node at position a | SNPs | Substitutes single node |
| `paste(a,a+1,nodes)` | Insert between adjacent nodes | Insertions | Adds new node sequence |
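The table's operations can be illustrated on a single lineage path stored as a list of `(node_id, orientation)` steps. This is a hypothetical sketch, not the MSpangepop library's API: positions are assumed to be 0-based indices into the path, ranges `a..b` are inclusive, and each function returns a new path instead of editing a graph. `apply_to` shows multi-path targeting, and the example nests a deletion inside an inversion.

```python
Step = tuple[str, str]  # (node_id, orientation "+" or "-")
Path = list[Step]


def flip(step: Step) -> Step:
    """Reverse the orientation of a single path step."""
    node, orient = step
    return (node, "-" if orient == "+" else "+")


def bypass(path: Path, a: int, b: int) -> Path:
    """Deletion: skip positions a..b, so step a-1 links straight to b+1."""
    return path[:a] + path[b + 1 :]


def loop(path: Path, a: int, b: int) -> Path:
    """Tandem duplication: walk a..b, loop back to a, and walk a..b again."""
    return path[: b + 1] + path[a : b + 1] + path[b + 1 :]


def invert(path: Path, a: int, b: int) -> Path:
    """Inversion: traverse a..b backwards, flipping every orientation."""
    return path[:a] + [flip(s) for s in reversed(path[a : b + 1])] + path[b + 1 :]


def swap(path: Path, a: int, node: Step) -> Path:
    """SNP: route through `node` instead of the step at position a."""
    return path[:a] + [node] + path[a + 1 :]


def paste(path: Path, a: int, a_next: int, nodes: list[Step]) -> Path:
    """Insertion: splice new steps between adjacent positions a and a_next."""
    if a_next != a + 1:
        raise ValueError("paste expects adjacent positions")
    return path[: a + 1] + nodes + path[a + 1 :]


def apply_to(paths: dict[str, Path], targets: set[str], op, *args) -> dict[str, Path]:
    """Multi-path targeting: apply `op` only to the named lineage paths."""
    return {name: op(p, *args) if name in targets else p for name, p in paths.items()}


ref: Path = [(str(i), "+") for i in range(1, 6)]  # 1+ 2+ 3+ 4+ 5+
paths = {"lin_a": ref, "lin_b": ref, "lin_c": ref}

# Shared inversion of positions 1..3 on two lineages, then a nested
# deletion inside the inverted segment on lin_a only.
paths = apply_to(paths, {"lin_a", "lin_b"}, invert, 1, 3)
paths = apply_to(paths, {"lin_a"}, bypass, 2, 2)
print(paths["lin_a"])  # [('1', '+'), ('4', '-'), ('2', '-'), ('5', '+')]
print(paths["lin_b"])  # [('1', '+'), ('4', '-'), ('3', '-'), ('2', '-'), ('5', '+')]
print(paths["lin_c"] == ref)  # True
```

In a real graph, every new adjacency a rewritten path creates (for example `1+` followed by `4-` after the inversion) also needs a matching orientation-aware edge; the sketch tracks only the paths.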
MSpangepop is developed at INRAE as part of the PangenOak project.