Bactopia v2 Overview
With tremendous effort by @Mxrcon and @abhi18av, the foundation for migrating Bactopia to DSL2 has been laid out. This transition represents the key milestone to push Bactopia to v2! (Super excited about this Davi and Abhinav!)
By switching to DSL2, the door for creating custom Bactopia workflows has been opened. For example, let's say you have some Staphylococcus aureus samples, and you want to run Bactopia and then the Bactopia Tool staph-typer. Instead of running these as two separate steps, with DSL2 we can create a sub-workflow (e.g. Staphopia) that will automatically run Bactopia and staph-typer. In other words, we can start creating organism-specific sub-workflows, as well as sub-workflows that only include certain steps such as assembly.
I think this is also a good time to start cleaning up some things and adding features that will make long-term maintenance more sustainable.
House Cleaning
These are meant to reduce the burden of maintaining Bactopia long-term. They are really about standardizing things in such a way that we can automate them. For example, printing usage across each of the workflows can be configured through config files (e.g. the nf-core JSON schema). There are also a lot of shared functions for checking inputs, creating channels, etc. These duplications are no longer necessary in DSL2.
- Automate Version and Citation tracking
  - Add meta.yml to every module
    - Use nf-core/modules meta.yml as a template
    - Integrate this into bactopia citations
    - Integrate this into bactopia versions (now handled by versions.yml)
- Reduce code duplication
  - Schema for printing Usage (using modified nf-core pipelines lib/*)
  - Input file checking (using modified nf-core pipelines lib/*)
  - Merging tables: add csvtk/concat module (nf-core/modules#785)
  - Input validation (e.g. type checking) (using modified nf-core pipelines lib/*)
- Organize DSL2 structure
  - Create structure to follow (using structure loosely based off nf-core dsl2 pipelines)
- Fixes
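To make the schema idea a bit more concrete, here is a rough sketch of how one JSON description could drive both the usage text and the type checking. It is plain Python for readability, not the Groovy code in nf-core's lib/, and the schema contents (parameter names, descriptions, defaults) are made up for this example.

```python
import json

# Hypothetical, trimmed-down schema in the spirit of an nf-core style
# JSON schema. Parameter names and defaults are illustrative only.
SCHEMA = json.loads("""
{
  "properties": {
    "samples": {"type": "string", "description": "File listing the samples to process"},
    "cpus": {"type": "integer", "description": "CPUs per process", "default": 4},
    "skip_compression": {"type": "boolean", "description": "Keep outputs uncompressed", "default": false}
  },
  "required": ["samples"]
}
""")

# Map JSON schema type names to Python types.
TYPES = {"string": str, "integer": int, "boolean": bool}


def usage(schema):
    """Build the usage text from the schema instead of hand-writing it."""
    lines = []
    for name, spec in schema["properties"].items():
        default = f" (default: {spec['default']})" if "default" in spec else ""
        lines.append(f"  --{name:<18} {spec['description']}{default}")
    return "\n".join(lines)


def validate(params, schema):
    """Return a list of error strings; an empty list means params are valid."""
    errors = []
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"missing required parameter: --{name}")
    for name, value in params.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unknown parameter: --{name}")
            continue
        # bool is a subclass of int in Python, so only accept it for booleans.
        is_bool = isinstance(value, bool)
        ok = isinstance(value, TYPES[spec["type"]]) and (
            spec["type"] == "boolean" or not is_bool
        )
        if not ok:
            errors.append(
                f"--{name} should be {spec['type']}, got {type(value).__name__}"
            )
    return errors


if __name__ == "__main__":
    print(usage(SCHEMA))
    print(validate({"cpus": "8"}, SCHEMA))
```

The point is that every workflow can share `usage` and `validate` and only differ in the schema file it ships, which is where the duplication goes away.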
Additional Features
- Support for Nanopore reads
- Drop support for some uncompressed outputs (e.g. assemblies)
  - Defaults to compressed outputs, --skip_compression disables this feature
- GenBank compatible assembly
  - Currently does not like gnl|
- Outputs from the tutorial https://doi.org/10.6084/m9.figshare.17097156.v1
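One practical consequence of compressed-by-default outputs is that downstream scripts may see either a gzipped or a plain file, depending on whether --skip_compression was used. A minimal sketch of handling both cases in Python is below. The helper names are hypothetical and this is not code from Bactopia itself.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_text(path):
    """Open a file as text, transparently handling gzip compression."""
    # Sniff the magic bytes rather than trusting the file extension.
    with open(path, "rb") as handle:
        compressed = handle.read(2) == GZIP_MAGIC
    return gzip.open(path, "rt") if compressed else open(path, "r")


def count_contigs(fasta_path):
    """Count FASTA records by counting header lines."""
    with open_text(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

With this, `count_contigs` gives the same answer for an assembly whether or not it was compressed.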
Implement pytest for testing
I'd like to create a suite of tests that are run with pytest and pytest-workflow. The nf-core/modules team has a testing framework that can be extended to Bactopia.
- Set up Test-Data repo - Done! https://github.com/bactopia/bactopia-tests
- Set up walkthrough for testing - Done! https://github.com/bactopia/bactopia/tree/dsl2/tests
- Add tests to GitHub Actions - Done! https://github.com/bactopia/bactopia/actions/runs/1256507737
- Create Tests for Bactopia Modules
  - annotate_genome
  - antimicrobial_resistance
  - ariba_analysis
  - assemble_genome
  - assembly_qc
  - blast
  - call_variants
  - count_31mers (merged into minmer_sketch)
  - download_references (merged into call_variants)
  - estimate_genome_size (merged into gather_samples)
  - fastq_status (merged into gather_samples)
  - gather_samples
  - mapping_query
  - minmer_query
  - minmer_sketch
  - qc_reads
  - sequence_type
- Create Tests for Bactopia Tool Modules
  - agrvate
  - bakta
  - ectyper
  - emmtyper
  - eggnog
  - fastani
  - hicap
  - ismapper
  - kleborate
  - lissero
  - mashtree
  - meningotype
  - ngmaster
  - pangenome
  - seqsero2
  - spatyper
  - staph-typer
  - staphopiasccmec
  - tbprofiler
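For a sense of what a per-module test boils down to, here is a small sketch of an output check written as a plain pytest test. pytest-workflow itself describes tests in YAML files, so this is only an illustration of the idea. The output path is hypothetical, and the file is created by hand as a stand-in for actually running the module.

```python
from pathlib import Path


def missing_outputs(workdir, expected):
    """Return the expected relative paths that are absent or empty under workdir."""
    root = Path(workdir)
    missing = []
    for rel in expected:
        target = root / rel
        if not target.is_file() or target.stat().st_size == 0:
            missing.append(rel)
    return missing


def test_assemble_genome_outputs(tmp_path):
    """pytest-style test; tmp_path is pytest's built-in temporary directory fixture."""
    # Hypothetical output location, for illustration only.
    expected = ["sample01/assembly/sample01.fna.gz"]

    # Stand-in for running the module: create the file it would produce.
    out = tmp_path / expected[0]
    out.parent.mkdir(parents=True)
    out.write_bytes(b"placeholder")

    assert missing_outputs(tmp_path, expected) == []
```

Each module in the lists above would get its own list of expected files, and the same check can be reused across all of them.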
Convert some processes to nf-core/modules
There are a few tools used by Bactopia that are the only tool in their process. Most of these tools are in the Bactopia Tools. I think it's best that these tools be transferred to nf-core/modules. Many of these will need to be added to nf-core, but nf-core is in need of some bacterial genomic tool love, so it's OK!
- Need to be added (25 total)
  - agrvate: module for agrvate (nf-core/modules#693)
  - bakta: add bakta module (nf-core/modules#1085)
  - clonalframe: add clonalframeml module (nf-core/modules#974)
  - ectyper: add ectyper module (nf-core/modules#948)
  - eggnog_mapper: add eggnog-mapper module (nf-core/modules#1020)
  - emmtyper: add emmtyper module (nf-core/modules#1028)
  - fastani: module fastani (nf-core/modules#695)
  - fastq-scan: add module for fastq-scan (nf-core/modules#935)
  - gtdb: Add gtdbtk/classifywf module (nf-core/modules#765)
  - hicap: add hicap module (nf-core/modules#772)
  - ismapper: add ismapper module (nf-core/modules#773)
  - kleborate: Add module kleborate (nf-core/modules#711)
  - lissero: add lissero module (nf-core/modules#1026)
  - mashtree: add mashtree module (nf-core/modules#767)
  - meningotype: add meningotype module (nf-core/modules#1022)
  - ngmaster: add ngmaster module (nf-core/modules#1024)
  - phyloflash: Add phyloflash module (nf-core/modules#786)
  - pirate: add pirate module (nf-core/modules#777)
  - roary: add roary module (nf-core/modules#776)
  - scoary: add scoary module (nf-core/modules#1034)
  - seqsero2: add seqsero2 module (nf-core/modules#1016)
  - snp-dists: module: snp-dists (nf-core/modules#697)
  - spatyper: add spatyper module (nf-core/modules#784)
  - staphopia-sccmec: add staphopia-sccmec module (nf-core/modules#702)
  - tb-profiler: add module for tbprofiler (nf-core/modules#947)
Curated Datasets
I think one of the best features of Bactopia is the ability to include public datasets. This works great for general datasets, but organism-specific datasets are kind of lost. I think it would be great to start a set of curated datasets that users can add data to.
Here's an example of a curated Staphylococcus aureus Bactopia Dataset. This dataset can easily be imported, allowing users to rapidly analyze their samples with a curated dataset specific to their organism.
I think it would also be nice if these curated datasets included SRA accessions linked to publications. But this exceeds my capabilities and would require extensive community support.
Species-Specific Workflows
With DSL2, we can create species-specific workflows by combining the main Bactopia workflow with some Bactopia Tools. The main example, which will also act as a proof of concept, will be Staphopia. Staphopia is essentially Bactopia + the Bactopia Tool staph-typer.
- Create a Staphopia workflow