mappgene is a SARS-CoV-2 genomic sequence analysis pipeline designed for high-performance parallel computing. It primarily wraps iVar (https://github.com/andersen-lab/ivar) and LoFreq (https://github.com/CSB5/lofreq), and adds a collection of scripts for deployment in almost any Linux environment.
Inputs: .fastq.gz
Outputs: .vcf and .variants.tsv
- Python 3.7+
- Singularity
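The two requirements above can be checked before installing. The following is a minimal sketch, not part of mappgene: the function name `check_prerequisites` is hypothetical, and it assumes the Singularity executable is named `singularity` and is on your `PATH`.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 7), container_cmd="singularity"):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper for illustration only. It assumes the container
    runtime is invoked as `singularity`; adjust `container_cmd` otherwise.
    """
    problems = []
    # sys.version_info compares element-wise against a (major, minor) tuple.
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required")
    # shutil.which returns None when the executable is not found on PATH.
    if shutil.which(container_cmd) is None:
        problems.append(f"'{container_cmd}' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Run it with `python3` on the machine where you plan to use mappgene; no output means both requirements appear to be met.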
pip3 install mappgene
singularity pull library://khyox/mappgene/image.sif:latest
mappgene <SUBJECT.FASTQ.GZ>
Check that mappgene works on your system by running it on the example input data.
mappgene --test
You can specify multiple subjects with explicit paths or Unix-style globbing:
mappgene <SUBJECT1.FASTQ.GZ> <SUBJECT2.FASTQ.GZ> <SUBJECT3.FASTQ.GZ>
mappgene <SUBJECT_DIR>/*.fastq.gz
If two input files have matching names that end in _R1.fastq.gz and _R2.fastq.gz, mappgene assumes they are a deinterleaved pair.
mappgene <SUBJECT>_R1.fastq.gz <SUBJECT>_R2.fastq.gz
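To preview which of your files would be treated as pairs under the naming rule above, something like the following can help. It is a sketch of the rule as described here, not mappgene's actual implementation; `group_reads` is a hypothetical name, and the sketch assumes both mates sit in the same directory.

```python
from pathlib import Path

R1_SUFFIX = "_R1.fastq.gz"
R2_SUFFIX = "_R2.fastq.gz"


def group_reads(paths):
    """Split FASTQ paths into paired and unpaired inputs.

    Returns (pairs, singles). `pairs` maps the shared path prefix to an
    (R1, R2) tuple; `singles` lists every path without a matching mate.
    Illustration only: this mirrors the naming rule in the text and is an
    assumption about behavior, not code taken from mappgene.
    """
    r1, r2, singles = {}, {}, []
    for path in map(Path, paths):
        text = str(path)
        if text.endswith(R1_SUFFIX):
            r1[text[: -len(R1_SUFFIX)]] = path
        elif text.endswith(R2_SUFFIX):
            r2[text[: -len(R2_SUFFIX)]] = path
        else:
            singles.append(path)

    pairs = {}
    for prefix in sorted(set(r1) | set(r2)):
        if prefix in r1 and prefix in r2:
            pairs[prefix] = (r1[prefix], r2[prefix])
        else:
            # Only one mate is present, so treat it as a single input.
            singles.append(r1[prefix] if prefix in r1 else r2[prefix])
    return pairs, singles
```

For example, `group_reads(sorted(str(p) for p in Path("<SUBJECT_DIR>").glob("*.fastq.gz")))` lists the pairs and the leftover single files in a directory.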
Multiple subjects can be run on distributed systems using Slurm or Flux.
mappgene --slurm -n 1 -b mybank -p mypartition <SUBJECT.FASTQ.GZ>
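When launching from a script, the Slurm invocation above can be assembled as an argument list. This is a minimal sketch that uses only the flags shown in the example (`--slurm`, `-n`, `-b`, `-p`); the function name `slurm_command` and its parameter names are hypothetical, and the meaning of each flag should be confirmed with `mappgene --help`.

```python
import shlex


def slurm_command(inputs, bank, partition, n=1):
    """Build the argument list for the Slurm example shown in the text.

    Hypothetical helper: it only rearranges the flags from the documented
    example and does not validate them against mappgene itself.
    """
    if not inputs:
        raise ValueError("at least one input file is required")
    argv = ["mappgene", "--slurm", "-n", str(n), "-b", bank, "-p", partition]
    argv.extend(str(path) for path in inputs)
    return argv


def as_shell_line(argv):
    """Render an argument list as a single, safely quoted shell line."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting problems with unusual file names, and `as_shell_line` is convenient for logging the exact command that was launched.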
mappgene --help
Mappgene is distributed under the terms of the BSD-3 License.
LLNL-CODE-821512