MAGICIAN is a tool for easily generating simulated metagenome-assembled genomes from a user-determined "community".
MAGICIAN is a Snakemake pipeline that uses conda to manage dependencies, so its main requirements are Snakemake and conda/mamba.
It is also necessary to install a fork of CAMISIM 1.2 in order to use custom error profiles. This can be done by running:
git clone https://github.com/KatSteinke/CAMISIM
In order to get started with MAGICIAN, simply clone the repository:
git clone https://github.com/KatSteinke/magician
You will also have to adapt the Snakefile.
When using your own copy of CAMISIM, set CAMISIM_DIR to the directory in which you installed CAMISIM.
MAGICIAN requires the following files to run:
- genome sequences of the organisms the simulated community should consist of, in GenBank or FASTA format
- a tab-separated file of sample distributions named `sample_distributions.tsv` in the directory where you wish to simulate your communities. The first column lists the paths to the genomes, the second lists the sequence type (chromosome or plasmid), and all subsequent columns list community names and the relative abundance of each sequence in that community:
| genomes | seq_type | community1 | community2 | ... |
|------------------|------------|------------|------------|-----|
| /path/to/genome1 | chromosome | 1 | 1.5 | |
| /path/to/genome2 | chromosome | 1 | 0 | |
| /path/to/plasmid | plasmid | 1 | 1 | |
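
A file like the one shown above can also be written programmatically. The following is a minimal sketch using Python's standard `csv` module. The helper name `write_sample_distributions` is hypothetical and not part of MAGICIAN, the header names are copied from the example table, and the paths and abundances are the placeholder values from that table.

```python
import csv
from pathlib import Path


def write_sample_distributions(rows, communities, out_dir="."):
    """Write sample_distributions.tsv into out_dir.

    Hypothetical helper, not part of MAGICIAN.
    rows: list of (genome_path, seq_type, abundances) tuples,
          with one abundance per community.
    communities: list of community names used as column headers.
    """
    out_path = Path(out_dir) / "sample_distributions.tsv"
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Header layout follows the example table: genomes, seq_type, communities
        writer.writerow(["genomes", "seq_type", *communities])
        for genome_path, seq_type, abundances in rows:
            if seq_type not in ("chromosome", "plasmid"):
                raise ValueError(f"Unknown sequence type: {seq_type}")
            if len(abundances) != len(communities):
                raise ValueError(
                    f"Expected {len(communities)} abundances for {genome_path}"
                )
            writer.writerow([genome_path, seq_type, *abundances])
    return out_path


# Placeholder paths and abundances taken from the example table
example_rows = [
    ("/path/to/genome1", "chromosome", [1, 1.5]),
    ("/path/to/genome2", "chromosome", [1, 0]),
    ("/path/to/plasmid", "plasmid", [1, 1]),
]

if __name__ == "__main__":
    write_sample_distributions(example_rows, ["community1", "community2"])
```

Run as a script, this writes `sample_distributions.tsv` to the current directory; pass a different `out_dir` to write it into the directory where the communities should be simulated.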
MAGICIAN is started using run_magician.py:
run_magician.py [-h] [--profile_type {mbarc,hi,mi,hi150,own}]
[--profile_name PROFILE_NAME]
[--profile_readlength PROFILE_READLENGTH]
[--insert_size INSERT_SIZE] [--cluster CLUSTER]
target
--snake_flags "--cores [N_CORES] [SNAKE_FLAGS...]"
- `target`: the desired output file or rule. To run the entire workflow for all communities in `sample_distributions.tsv`, specify `all_bin_summaries` as the target. To run the workflow for a single community, give `summaries/bin_summary_[COMMUNITY].xlsx`, replacing `[COMMUNITY]` with the name of the community you wish to simulate.
- `--snake_flags`: the flags to be passed on to Snakemake, enclosed in double quotes. As a minimum, this means `"-n "` for a dry run or `"--cores [N_CORES]"` (with `[N_CORES]` being the number of cores Snakemake should use) for an actual run. To use conda or mamba, specify `--use-conda` (and `--conda-frontend conda` if required). For all other flags, refer to Snakemake's documentation.
- `--profile_type`: the error profile CAMISIM should use for ART. This defaults to CAMISIM's default of `mbarc`; other choices are `hi`, `mi`, `hi150` and `own`. The last allows users to specify their own profiles.
- `--profile_name`: required when specifying one's own profile. This is the base path to the forward/reverse reads' error profiles (e.g. `path/to/custom/profile_R` if forward and reverse reads are located at `path/to/custom/profile_R1.txt` and `path/to/custom/profile_R2.txt` respectively).
- `--profile_readlength`: the read length used for the custom error profile; required when specifying one's own error profile.
- `--insert_size`: mean insert size for read simulation (defaults to 270 bp).
- `--cluster`: when using Snakemake's cluster mode, supply the command for submitting jobs as you would with Snakemake.
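
To illustrate how these arguments fit together, here is a minimal Python sketch that assembles a `run_magician.py` call as an argument list. The helper names (`community_target`, `profile_files`, `build_magician_command`) are hypothetical and not part of MAGICIAN, and invoking the script through `python` is an assumption; only the flags described above are used.

```python
import shlex


def community_target(community):
    """Target path for a single community, following the pattern described above."""
    return f"summaries/bin_summary_{community}.xlsx"


def profile_files(profile_name):
    """Forward/reverse error profile paths implied by a --profile_name base path."""
    return f"{profile_name}1.txt", f"{profile_name}2.txt"


def build_magician_command(target, snake_flags, profile_type="mbarc",
                           profile_name=None, profile_readlength=None,
                           insert_size=None, cluster=None):
    """Assemble an argument list for run_magician.py (hypothetical helper)."""
    if profile_type not in ("mbarc", "hi", "mi", "hi150", "own"):
        raise ValueError(f"Unknown profile type: {profile_type}")
    # A custom profile needs both its base path and its read length
    if profile_type == "own" and (profile_name is None or profile_readlength is None):
        raise ValueError("profile_type 'own' needs profile_name and profile_readlength")

    # Assumption: the script is run through the Python interpreter
    command = ["python", "run_magician.py", "--profile_type", profile_type]
    if profile_type == "own":
        command += ["--profile_name", profile_name,
                    "--profile_readlength", str(profile_readlength)]
    if insert_size is not None:
        command += ["--insert_size", str(insert_size)]
    if cluster is not None:
        command += ["--cluster", cluster]
    command.append(target)
    # All Snakemake flags travel as one quoted string
    command += ["--snake_flags", snake_flags]
    return command


if __name__ == "__main__":
    cmd = build_magician_command("all_bin_summaries", "--cores 4 --use-conda")
    # shlex.join quotes the Snakemake flags so the line can be pasted into a shell
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run`, or printed with `shlex.join` to inspect the command before running it. Here `4` is only an example core count.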