Tool for generating synthetic next-generation sequencing reads with same nucleotide distribution as model reads.
boquila 0.6.1
Generate NGS reads with same nucleotide distribution as input file
Generated reads will be written to stdout
By default input and output format is FASTQ
USAGE:
boquila [FLAGS] [OPTIONS] <src>
ARGS:
<src> Model file
FLAGS:
--fasta Change input and output format to FASTA
--setQual Use given Quality score with parameter 'qual' for all simulated reads.
-h, --help Print help information
--inseqFasta Change the input sequencing format to FASTA
-V, --version Print version information
OPTIONS:
--bed <FILE> File name in which the simulated reads will be saved in BED format
--inseq <FILE> Input sequencing reads to be used instead of reference genome
--kmer <INT> Kmer size to be used while calculating frequency [default: 1]
--ref <FILE> Reference FASTA
--regions <FILE> RON formatted file containing genomic regions that generated reads will
be selected from
--seed <INT> Random number seed. If not provided system's default source of entropy
will be used instead.
--sens <INT> Sensitivity of selected reads.
If some positions are predominated by specific nucleotides, increasing
this value can make simulated reads more similar to input reads.
However runtime will also increase linearly.
[possible values: 10-100] [default: 20]
--qual <QUAL> Quality score to be applied to to each position for all reads.
'setQual' flag should be present in order it to work
Has no effect if input reads are not in FASTQ format. [default: I]
Generated reads will be written to stdout in FASTA or FASTQ format.
If --bed option is provided, generated reads also will be written to given file in BED6 format.
Sample regions file for Homo sapiens (human) genome assembly GRCh38 (hg38) is provided as GRCh38.ron
If Input Sequencing reads will be used for simulation, they should be provided with --inseq argument, instead of using --ref and --regions.
More detailed example can be found in the examples directory
Simple usage
boquila input_reads.fq --ref ref_genome.fa --regions GRCh38.ron > out.fq
Using seed for RNG
boquila input_reads.fq --ref ref_genome.fa --regions GRCh38.ron --seed 7 > out.fq
Using reads that are in FASTA format
boquila input_reads.fa --fasta --ref ref_genome.fa --regions GRCh38.ron > out.fa
Saving output in BED format
boquila input_reads.fq --ref ref_genome.fa --regions GRCh38.ron --bed out.bed > out.fq
Using Input Sequencing instead of reference genome
boquila input_reads.fq --inseq inputseq_reads.fq > out.fq
Using Input Sequencing reads which are in FASTA format
boquila input_reads.fq --inseqFasta --inseq inputseq_reads.fa > out.fq
- boquila is available via bioconda and can be easily installed via
conda install boquila -c biocondaOr via Rust toolchain
Next two methods require Cargo the Rust package manager, which should be installed automatically while installing Rust
boquila is written in Rust, so you'll need to grab a Rust installation in order to install or compile it.
The current minimum Rust version is 1.55.0
- Installing with
cargoCargo will build and install the binary, by default to$HOME/.cargo/bin/
$ cargo install --branch main --git https://github.com/CompGenomeLab/boquila.git boquila
-
Building from source For convenience, you can copy the executable
./target/release/boquilato some directory in yourPATH.- Clone the repository
git clone https://github.com/CompGenomeLab/boquila.git
- Then build with
cargocd boquila cargo build --release ./target/release/boquila --version 0.6.1
- Clone the repository