gaga2 requires a working nextflow installation (v20.4+).
Other dependencies:
- bbmap
- fastqc
- figaro
- R v4+ with dada2, devtools, tidyverse, and cowplot installed
For convenience, gaga2 comes with a Singularity container with all dependencies installed.
singularity pull oras://ghcr.io/zellerlab/gaga2:latest
gaga2 takes as input Illumina paired-end 16S amplicon sequences (e.g. sequenced on a MiSeq).
- Read files need to be named according to the typical pattern
<prefix=sample_id>_R?[12].{fastq,fq,fastq.gz,fq.gz}. They should, but don't have to, be arranged in a sample-based directory structure:
<project_dir> (aka "input_dir")
|___ <sample_1>
|    |____ <sample_1_forward_reads>
|    |____ <sample_1_reverse_reads>
|
|___ <sample_2>
|    |____ <empty samples will be ignored>
|
|___ <sample_n>
     |____ <sample_n_forward_reads>
     |____ <sample_n_reverse_reads>
A flat directory structure (with all read files in the same directory) or a deeply branched one (with read files scattered over multiple levels) should also work.
If gaga2 preprocesses the reads, it will automatically use _R1/2 endings internally.
- If input reads have already been preprocessed, you can set the --preprocessed flag. In this case, gaga2 will do no preprocessing at all and instruct dada2 to perform no trimming. Otherwise, gaga2 will assess the read lengths for uniformity. If read lengths differ within and between samples, preprocessing with figaro is not possible and dada2 will be run without trimming.
- Samples with fewer than 110 reads after dada2 preprocessing will be discarded.
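To check file names before a run, the naming pattern described above can be tested with a few lines of POSIX shell. This is only a sketch: `classify_read` is a hypothetical helper that is not part of gaga2, and it assumes the pattern means "sample id, underscore, optional R, mate number 1 or 2, then one of the four listed extensions".

```shell
# classify_read: hypothetical helper, not part of gaga2.
# Prints "<sample_id> <mate>" if the file name follows the pattern
# <sample_id>_R?[12].{fastq,fq,fastq.gz,fq.gz}; returns non-zero otherwise.
classify_read() {
    name=$(basename "$1")
    # Strip one of the accepted extensions.
    case "$name" in
        *.fastq.gz) stem=${name%.fastq.gz} ;;
        *.fq.gz)    stem=${name%.fq.gz} ;;
        *.fastq)    stem=${name%.fastq} ;;
        *.fq)       stem=${name%.fq} ;;
        *)          return 1 ;;
    esac
    # The remaining stem must end in _R1, _R2, _1 or _2.
    case "$stem" in
        ?*_R[12]) mate=${stem##*_R}; sample=${stem%_R[12]} ;;
        ?*_[12])  mate=${stem##*_};  sample=${stem%_[12]} ;;
        *)        return 1 ;;
    esac
    printf '%s %s\n' "$sample" "$mate"
}

classify_read /data/sampleA/sampleA_R1.fastq.gz   # prints: sampleA 1
classify_read sampleB_2.fq                        # prints: sampleB 2
classify_read notes.txt || echo "notes.txt does not match"
```

Running it over every file in the project directory (for example via find) gives a quick list of names that gaga2 would presumably not recognise as read files.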
gaga2 can be run directly from GitHub.
nextflow run zellerlab/gaga2 <parameters>
To obtain a newer version, first run
nextflow pull zellerlab/gaga2
In addition, you should obtain a copy of the run.config from the gaga2 GitHub repo and modify it according to your environment.
- --input_dir is the project directory mentioned above.
- --output_dir will be created automatically.
- --amplicon_length is derived from your experiment parameters (this is not the read length, but the length of the amplicon).
- --single_end is only required for single-end libraries (auto-detection of library type is in progress).
- --min_overlap of read pairs is 20bp by default.
- --primers <comma-separated-list-of-primer-sequences> or --left_primer and --right_primer. If primer sequences are provided via --primers, gaga2 will use bbduk to remove the primers and any upstream sequences, such as adapters, based on the primer sequences. If non-zero primer lengths are provided instead (via --left_primer and --right_primer), figaro will take those into account when determining the best trim positions.
- --preprocessed will prevent any further preprocessing by gaga2. This flag should only be used if the read data is reliably clean.
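Putting the parameters together, a call might look like the sketch below. The paths and the amplicon length of 465 are placeholder values, and passing the edited run.config via Nextflow's -c option is an assumption about how the config is meant to be supplied. The command is only assembled and printed here, so it can be reviewed or pasted into a job script.

```shell
# Placeholder values; replace them with those of your own project.
INPUT_DIR=/path/to/project_dir
OUTPUT_DIR=/path/to/output_dir
AMPLICON_LENGTH=465   # hypothetical example value, depends on your primers
MIN_OVERLAP=20        # the default stated above, spelled out for clarity

# Assemble the command piece by piece.
# Assumption: run.config is your edited copy in the current directory.
GAGA2_CMD="nextflow run zellerlab/gaga2 -c run.config"
GAGA2_CMD="$GAGA2_CMD --input_dir $INPUT_DIR --output_dir $OUTPUT_DIR"
GAGA2_CMD="$GAGA2_CMD --amplicon_length $AMPLICON_LENGTH --min_overlap $MIN_OVERLAP"

# Print the command for review instead of launching it.
echo "$GAGA2_CMD"
```

Primer options (--primers, or --left_primer and --right_primer) and --single_end or --preprocessed would be appended in the same way where they apply.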
- The old gaga2 version can be run with source /g/scb2/zeller/schudoma/software/wrappers/gaga2_wrapper before submitting the job to the cluster.
- Please report issues/requests/feedback in the GitHub issue tracker.
- If you want to run gaga2 on the cluster, nextflow alone requires >=5GB memory just for 'managing'.