Introduction

ADRSM (Ancient DNA Read Simulator for Metagenomics) is a tool designed to simulate the paired-end sequencing of a metagenomic community. ADRSM allows you to control precisely the amount of DNA from each organism in the community, which can be used to benchmark different metagenomics methods.

Dependencies

Conda

Installation

conda install -c maxibor adrsm

Usage

adrsm -d ./data/genomes ./data/short_genome_list.csv

Output

metagenome.{1,2}.fastq : Simulated paired-end reads
stats.csv : Statistics of the simulated metagenome (organism, percentage of the organism's DNA in the metagenome)

Help

$ adrsm --help
usage: ADRSM v0.6 [-h] [-d DIRECTORY] [-r READLENGTH] [-n NBINOM]
                  [-fwd FWDADAPT] [-rev REVADAPT] [-e ERROR] [-p GEOM_P]
                  [-m MIN] [-M MAX] [-o OUTPUT] [-q QUALITY] [-s STATS]
                  [-se SEED] [-t THREADS]
                  confFile

Ancient DNA Read Simulator for Metagenomics

positional arguments:
  confFile       path to configuration file

optional arguments:
  -h, --help     show this help message and exit
  -d DIRECTORY   path to genome directory. Default = .
  -r READLENGTH  Average read length. Default = 76
  -n NBINOM      n parameter for Negative Binomial insert length distribution.
                 Default = 8
  -fwd FWDADAPT  Forward adaptor. Default = AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
                 NNNNNNATCTCGTATGCCGTCTTCTGCTTG
  -rev REVADAPT  Reverse adaptor. Default =
                 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
  -e ERROR       Illumina sequecing error. Default = 0.01
  -p GEOM_P      Geometric distribution parameter for deamination. Default =
                 0.5
  -m MIN         Deamination substitution base frequency. Default = 0.001
  -M MAX         Deamination substitution max frequency. Default = 0.3
  -o OUTPUT      Output file basename. Default = ./metagenome.*
  -q QUALITY     Base quality encoding. Default = d (PHRED+64)
  -s STATS       Statistic file. Default = stats.csv
  -se SEED       Seed for random generator. Default = 7357
  -t THREADS     Number of threads for parallel processing. Default = 2

Genome directory

Each genome FASTA file must be named after its organism (see data/genomes for an example).

Configuration file (`confFile`)

The configuration file is a .csv file with one line per genome, giving the genome file name, the mean insert size, the expected genome coverage, and whether deamination is simulated. Example short_genome_list.csv:

| genome (mandatory) | insert_size (mandatory) | coverage (mandatory) | deamination (mandatory) |
| --- | --- | --- | --- |
| Agrobacterium_tumefaciens.fa | 47 | 0.1 | yes |
| Bacillus_anthracis.fa | 48 | 0.2 | no |
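
As an illustration of the four columns in the example above, here is a minimal Python sketch that checks one configuration line before it is handed to ADRSM. It is not part of ADRSM: the names `GenomeEntry` and `parse_conf_line` are hypothetical, and the comma delimiter is an assumption based on the .csv extension.

```python
from typing import NamedTuple


class GenomeEntry(NamedTuple):
    """One configuration line (hypothetical helper type, not part of ADRSM)."""
    genome: str
    insert_size: int
    coverage: float
    deamination: bool


def parse_conf_line(line, delimiter=","):
    """Parse and validate one data line of the configuration file.

    Assumption: fields are comma-separated and appear in the order
    genome, insert_size, coverage, deamination.
    """
    fields = [field.strip() for field in line.split(delimiter)]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    genome, insert_size, coverage, deamination = fields
    if deamination.lower() not in ("yes", "no"):
        raise ValueError(f"deamination must be 'yes' or 'no': {deamination!r}")
    entry = GenomeEntry(
        genome=genome,
        insert_size=int(insert_size),
        coverage=float(coverage),
        deamination=deamination.lower() == "yes",
    )
    if entry.insert_size <= 0 or entry.coverage <= 0:
        raise ValueError(f"insert_size and coverage must be positive: {line!r}")
    return entry


if __name__ == "__main__":
    print(parse_conf_line("Agrobacterium_tumefaciens.fa,47,0.1,yes"))
```

A check like this catches a missing column or a malformed number before a simulation is started.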

Note on Coverage

Because of sequencing errors and the random choice of inserts, the real coverage might differ slightly from the target coverage (fig 1).

Figure 1: Coverage plot for simulated sequencing of Elephas maximus mitochondria. Aligned with Bowtie2 (default parameters). Read length = 76, insert length = 200.
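
To get a feel for the numbers involved, the sketch below applies the standard depth-of-coverage arithmetic (sequenced bases divided by genome length) to a paired-end run. This is an assumption for illustration, not ADRSM's own computation: the function names and the 16,000 bp genome length are hypothetical, and each mate is assumed to contribute at most `min(read_length, insert_size)` genomic bases.

```python
import math


def expected_read_pairs(genome_length, coverage, read_length=76, insert_size=200):
    """Number of read pairs needed to reach a target mean depth.

    Assumption: each mate contributes min(read_length, insert_size)
    genomic bases, so a pair contributes twice that.
    """
    bases_per_pair = 2 * min(read_length, insert_size)
    return math.ceil(coverage * genome_length / bases_per_pair)


def realized_coverage(n_pairs, genome_length, read_length=76, insert_size=200):
    """Mean depth obtained from a whole number of read pairs."""
    bases_per_pair = 2 * min(read_length, insert_size)
    return n_pairs * bases_per_pair / genome_length


if __name__ == "__main__":
    genome_length = 16_000  # hypothetical genome length, in bp
    pairs = expected_read_pairs(genome_length, coverage=10)
    print(pairs, realized_coverage(pairs, genome_length))
```

Because the number of pairs must be a whole number, the realized mean depth already differs slightly from the target before any sequencing error or random insert placement is taken into account.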

Note on Deamination simulation

Deamination is modeled using a geometric distribution. With the default parameters, the substitution frequency is depicted in fig 2:

Figure 2: Substitution frequency.

For each nucleotide, a random number Pu is sampled from a uniform distribution (with support [0, 1]) and compared to the corresponding value Pg of the rescaled geometric PMF at this nucleotide.
If Pg >= Pu, the base is substituted (fig 3).

Figure 3: Substitutions distribution along a DNA insert, with default parameters.
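
The comparison between Pg and Pu described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ADRSM's implementation: the function names are hypothetical, positions are counted from one end of the insert only, and the geometric PMF is assumed to be rescaled linearly so that its values span the `-m` (0.001) to `-M` (0.3) range.

```python
import random


def geometric_pmf(k, p):
    """Geometric PMF for k = 1, 2, ...: probability of first success on trial k."""
    return (1 - p) ** (k - 1) * p


def rescaled_pmf(length, p=0.5, fmin=0.001, fmax=0.3):
    """Per-position substitution probability Pg along an insert.

    Assumption: linear min-max rescaling of the PMF onto [fmin, fmax].
    """
    raw = [geometric_pmf(k, p) for k in range(1, length + 1)]
    lo, hi = min(raw), max(raw)
    if hi == lo:  # single-base insert: nothing to rescale
        return [fmax] * length
    return [fmin + (x - lo) / (hi - lo) * (fmax - fmin) for x in raw]


def flag_substitutions(length, p=0.5, fmin=0.001, fmax=0.3, seed=7357):
    """Return one boolean per position: True where Pg >= Pu."""
    rng = random.Random(seed)
    probs = rescaled_pmf(length, p, fmin, fmax)
    # Pu is drawn from a uniform distribution for each nucleotide
    return [pg >= rng.random() for pg in probs]


if __name__ == "__main__":
    flags = flag_substitutions(47)
    print([i for i, hit in enumerate(flags) if hit])
```

With these assumptions the first position is substituted with probability close to 0.3 and the probability decays toward 0.001 further into the insert, which matches the shape described for fig 2.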
