  __  __  ____  _   _ _____
 |  \/  |/ __ \| \ | |_   _|
 | \  / | |  | |  \| | | |
 | |\/| | |  | | . ` | | |
 | |  | | |__| | |\  |_| |_
 |_|  |_|\____/|_| \_|_____|
ver 0.2.2
A Pangenomics Index for Finding MEMs.
The MONI index uses prefix-free parsing of the text [2][3] to build the Burrows-Wheeler Transform (BWT) of the reference genomes, the suffix array (SA) samples at the beginning and at the end of each run of the BWT, and the threshold positions of [1].
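To make the notion of a BWT run concrete, here is a toy sketch that builds the BWT of a short string by sorting all of its rotations and then counts the runs. This is only an illustration: it is not how MONI builds the BWT (MONI uses prefix-free parsing), and the helper names `bwt` and `bwt_runs` are hypothetical.

```shell
# Toy illustration only: sorting all rotations needs quadratic space and is
# NOT the prefix-free-parsing construction used by MONI.
# bwt and bwt_runs are hypothetical helper names.
bwt() {
  # Append '$' as the end-of-text marker, emit every rotation,
  # sort the rotations bytewise, and keep the last character of each.
  printf '%s\n' "$1" | awk '{
    t = $0 "$"; n = length(t)
    for (i = 1; i <= n; i++) print substr(t, i) substr(t, 1, i - 1)
  }' | LC_ALL=C sort | awk '{ printf "%s", substr($0, length($0), 1) } END { printf "\n" }'
}

bwt_runs() {
  # Count maximal runs of equal characters in the given string.
  printf '%s\n' "$1" | awk '{
    runs = 0; prev = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c != prev) runs++
      prev = c
    }
    print runs
  }'
}

bwt banana                  # prints annb$aa
bwt_runs "$(bwt banana)"    # prints 5
```

The fewer runs the BWT has, the fewer SA samples need to be stored, since samples are kept only at run boundaries.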
MONI is available on Docker:
docker pull maxrossi91/moni:v0.2.2
docker run maxrossi91/moni:v0.2.2 moni -h
If using Singularity:
singularity pull moni_sif docker://maxrossi91/moni:v0.2.2
./moni_sif moni --help
We provide MONI as a .deb package:
wget https://github.com/maxrossi91/moni/releases/download/v0.2.2/moni_v0.2.2_amd64.deb
sudo dpkg -i moni_v0.2.2_amd64.deb
moni -h
We provide MONI as a Linux .sh installer:
wget https://github.com/maxrossi91/moni/releases/download/v0.2.2/moni_v0.2.2-Linux.sh
chmod +x moni_v0.2.2-Linux.sh
./moni_v0.2.2-Linux.sh
moni -h
We provide MONI as a pre-compiled .tar.gz archive:
wget https://github.com/maxrossi91/moni/releases/download/v0.2.2/moni_v0.2.2-Linux.tar.gz
tar -xzvf moni_v0.2.2-Linux.tar.gz
moni_v0.2.2-Linux/bin/moni -h
apt-get update
apt-get install -y build-essential cmake git python3 zlib1g-dev
git clone https://github.com/maxrossi91/moni
cd moni
mkdir build
cd build; cmake -DCMAKE_INSTALL_PREFIX=<path/to/install/prefix> ..
make
Replace <path/to/install/prefix> with your preferred install path. If not specified, the install path is /usr/bin by default.
make install
usage: moni build [-h] -r REFERENCE [-w WSIZE] [-p MOD] [-t THREADS] [-k] [-v]
[-f] [--moni-ms] [--spumoni]
-h, --help show this help message and exit
-r REFERENCE, --reference REFERENCE
reference file name (default: None)
-o OUTPUT, --output OUTPUT
output directory path (default: same as reference)
-w WSIZE, --wsize WSIZE
sliding window size (default: 10)
-p MOD, --mod MOD hash modulus (default: 100)
-t THREADS, --threads THREADS
number of helper threads (default: 0)
-k keep temporary files (default: False)
-v verbose (default: False)
-f read fasta (default: False)
-g GRAMMAR, --grammar GRAMMAR
select the grammar [plain, shaped] (default: plain)
usage: moni ms [-h] -i INDEX -p PATTERN [-o OUTPUT] [-t THREADS]
-h, --help show this help message and exit
-i INDEX, --index INDEX
reference index base name (default: None)
-p PATTERN, --pattern PATTERN
the input query (default: None)
-o OUTPUT, --output OUTPUT
output directory path (default: .)
-t THREADS, --threads THREADS
number of helper threads (default: 1)
-g GRAMMAR, --grammar GRAMMAR
select the grammar [plain, shaped] (default: plain)
usage: moni mems [-h] -i INDEX -p PATTERN [-o OUTPUT] [-e] [-s] [-t THREADS]
-h, --help show this help message and exit
-i INDEX, --index INDEX
reference index base name (default: None)
-p PATTERN, --pattern PATTERN
the input query (default: None)
-o OUTPUT, --output OUTPUT
output directory path (default: .)
-e, --extended-output
output MEM occurrence in the reference (default: False)
-s, --sam-output
output MEM in a SAM formatted file. (default: False)
-t THREADS, --threads THREADS
number of helper threads (default: 1)
-g GRAMMAR, --grammar GRAMMAR
select the grammar [plain, shaped] (default: plain)
usage: moni extend [-h] -i INDEX -p PATTERN [-o OUTPUT] [-t THREADS] [-b BATCH] [-g GRAMMAR] [-L EXTL] [-A SMATCH] [-B SMISMATCH] [-O GAPO] [-E GAPE]
optional arguments:
-h, --help show this help message and exit
-i INDEX, --index INDEX
reference index folder (default: None)
-p PATTERN, --pattern PATTERN
the input query (default: None)
-o OUTPUT, --output OUTPUT
output directory path (default: .)
-t THREADS, --threads THREADS
number of helper threads (default: 1)
-b BATCH, --batch BATCH
number of reads per thread batch (default: 100)
-g GRAMMAR, --grammar GRAMMAR
select the grammar [plain, shaped] (default: plain)
-L EXTL, --extl EXTL length of reference substring for extension (default: 100)
-A SMATCH, --smatch SMATCH
match score value (default: 2)
-B SMISMATCH, --smismatch SMISMATCH
mismatch penalty value (default: 4)
-O GAPO, --gapo GAPO comma-separated gap open penalty values (default: 4,13)
-E GAPE, --gape GAPE comma-separated gap extension penalty values (default: 2,1)
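The gap options each take two values, which suggests a two-piece affine gap model. The sketch below works through the arithmetic under that assumption, where a gap of length k costs min(O1 + k*E1, O2 + k*E2). The formula is an assumption, not something stated in this README, and `gap_cost` is a hypothetical helper.

```shell
# Assumption: two-piece affine gap model, cost(k) = min(O1 + k*E1, O2 + k*E2).
# This README does not state the formula; gap_cost is a hypothetical helper.
gap_cost() {
  # $1: gap length; $2,$3: first (open, extend); $4,$5: second (open, extend)
  a=$(( $2 + $1 * $3 ))
  b=$(( $4 + $1 * $5 ))
  if [ "$a" -le "$b" ]; then echo "$a"; else echo "$b"; fi
}

# Default values from the option list: -O 4,13 and -E 2,1
for k in 1 9 20; do
  printf 'gap length %d -> penalty %d\n' "$k" "$(gap_cost "$k" 4 2 13 1)"
done
```

Under this assumption, short gaps are charged by the first pair (4 + 2k) and long gaps by the second pair (13 + k); with the defaults the two costs are equal at k = 9.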
moni build -r data/SARS-CoV2/SARS-CoV2.1k.fa.gz -o sars-cov2 -f
It produces three files in the current folder: sars-cov2.plain.slp, sars-cov2.thrbv.ms, and sars-cov2.idx. They contain, respectively, the grammar, the run-length BWT (RLBWT) together with the thresholds, and the starting position and name of each FASTA sequence in the reference file.
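Before querying, it can be useful to confirm that the build step left all three index files in place. The following is a minimal sketch, assuming the default plain grammar and the file suffixes of the build example; `check_index` is a hypothetical helper, not part of MONI.

```shell
# check_index is a hypothetical helper; the three suffixes are those
# produced by the build example (default "plain" grammar assumed).
check_index() {
  # $1: index base name, e.g. sars-cov2
  rc=0
  for suffix in plain.slp thrbv.ms idx; do
    if [ ! -s "$1.$suffix" ]; then
      echo "missing or empty: $1.$suffix" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage sketch (requires moni to be installed):
# moni build -r data/SARS-CoV2/SARS-CoV2.1k.fa.gz -o sars-cov2 -f
# check_index sars-cov2 && echo "index ready"
```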
Compute the matching statistics of reads.fastq.gz against SARS-CoV2.1k.fa.gz in the data/SARS-CoV2 folder:
moni ms -i sars-cov2 -p data/SARS-CoV2/reads.fastq.gz -o reads
It produces two output files in the current folder, reads.lengths and reads.pointers, which store the lengths and the positions of the matching statistics of the reads against the reference in a FASTA-like format.
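As a sketch of how the lengths file might be post-processed, the snippet below reports the longest matching-statistic length per read. It assumes each record is a `>` header line followed by one line of space-separated integers; that layout is an assumption based on the "FASTA-like" description, and the sample data and the helper name `longest_match` are made up for illustration.

```shell
# Assumption: each record is a '>' header line followed by one line of
# space-separated integers. The sample below is made up for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>read1
3 2 5 4 3 2 1
>read2
1 1 2 1
EOF

longest_match() {
  # Print "<read name> <largest matching-statistic length>" per record.
  awk '
    /^>/ { name = substr($0, 2); next }
    {
      max = 0
      for (i = 1; i <= NF; i++) if ($i + 0 > max) max = $i + 0
      print name, max
    }
  ' "$1"
}

longest_match "$sample"
rm -f "$sample"
```

On a real run the argument would be reads.lengths; check the actual file layout before relying on this parser.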
moni mems -i sars-cov2 -p data/SARS-CoV2/reads.fastq.gz -o reads
It produces one output file, reads.mems, in the current folder, which stores the MEMs reported as pairs of position and length in a FASTA-like format.
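A common follow-up is to keep only MEMs above a length cutoff. The sketch below assumes each record is a `>` header line followed by whitespace-separated `(position,length)` tokens; this token layout is an assumption, so inspect a real reads.mems file first. `filter_mems` is a hypothetical helper.

```shell
# Assumption: records look like
#   >read1
#   (0,12) (15,40)
# i.e. '(position,length)' tokens. filter_mems is a hypothetical helper.
filter_mems() {
  # $1: MEMs file; $2: minimum MEM length.
  # Prints "<read name> <position> <length>" for every MEM that is kept.
  awk -v min="$2" '
    /^>/ { name = substr($0, 2); next }
    {
      for (i = 1; i <= NF; i++) {
        tok = $i
        gsub(/[()]/, "", tok)        # "(15,40)" -> "15,40"
        split(tok, pair, ",")
        if (pair[2] + 0 >= min + 0) print name, pair[1], pair[2]
      }
    }
  ' "$1"
}

# Usage sketch: filter_mems reads.mems 20
```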
moni extend -i sars-cov2 -p data/SARS-CoV2/reads.fastq.gz -o reads
It produces one output file, reads.sam, in the current folder, which stores the information of the MEM extensions in SAM format.
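Since the output is SAM, standard text tools can summarize it. The sketch below counts records whose FLAG field (column 2) does not have the 0x4 "unmapped" bit set, relying only on the SAM format itself; `count_mapped` is a hypothetical helper, and which flags MONI actually emits is not covered here.

```shell
# count_mapped is a hypothetical helper. It relies only on the SAM format:
# header lines start with '@', and FLAG (column 2) bit 0x4 means "unmapped".
count_mapped() {
  awk -F '\t' '
    /^@/ { next }                             # skip header lines
    { if (int($2 / 4) % 2 == 0) mapped++ }    # bit 0x4 clear -> mapped
    END { print mapped + 0 }
  ' "$1"
}

# Usage sketch: count_mapped reads.sam
```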
If you use this tool in an academic setting, please cite the following papers:
@article{RossiOLGB21,
author = { Massimiliano Rossi and
Marco Oliva and
Ben Langmead and
Travis Gagie and
Christina Boucher},
title = {MONI: A Pangenomics Index for Finding Maximal Exact Matches},
booktitle = {Research in Computational Molecular Biology - 25th Annual
International Conference, {RECOMB} 2021, Padova, Italy},
journal = {Journal of Computational Biology},
volume = {29},
number = {2},
pages = {169--187},
year = {2022},
publisher = {Mary Ann Liebert, Inc., publishers 140 Huguenot Street, 3rd Floor New~…}
}
- Christina Boucher
- Travis Gagie
- Ben Langmead
- Massimiliano Rossi
Moni is the Finnish word for multi.
[1] Hideo Bannai, Travis Gagie, and Tomohiro I, "Refining the r-index", Theoretical Computer Science, 812 (2020), pp. 96–108.
[2] Christina Boucher, Travis Gagie, Alan Kuhnle and Giovanni Manzini, "Prefix-Free Parsing for Building Big BWTs", In Proc. of the 18th International Workshop on Algorithms in Bioinformatics (WABI 2018).
[3] Christina Boucher, Travis Gagie, Alan Kuhnle, Ben Langmead, Giovanni Manzini, and Taher Mun. "Prefix-free parsing for building big BWTs.", Algorithms for Molecular Biology 14, no. 1 (2019): 13.