Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
[Figure: first part of the pipeline]
[Figure: second part of the pipeline]
The pipeline is built using Nextflow and processes data using the following steps:
- Merge per-lane FASTQ files with the nf-core/cat_fastq module.
- Report raw read quality with FastQC.
- (optional) Remove reads that do not start with G.
- Trim adapters with TrimGalore.
- Report trimmed read quality with FastQC.
- (optional; done by default) Trim the first G in forward reads with cutadapt.
- (optional) Build a STAR or bowtie2 index of the reference genome FASTA file, if the index is not provided. Building the STAR index requires a genome annotation in GTF format.
- Map trimmed reads onto the genome and filter alignments. If using STAR, retain only the reads with at most 2 alignments (done within the STAR alignment module); if using bowtie2, retain only the reads with $MAPQ \geq 20$ using samtools view.
- Convert wigs to bigWigs using the UCSC wigtobigwig module.
- (optional) Remove PCR and optical duplicate reads with samtools markdup. See below for details.
- Sort the obtained BAM files using samtools sort.
- Index the sorted BAM files with samtools index.
- Assess mapping quality using samtools stats, samtools flagstat and samtools idxstats.
- MultiQC - Aggregate report describing results and QC from the mapping part of the pipeline.
- Create a BSgenome package for the reference genome, if the package is not available.
- Create a CAGEexp object and call TSSs with CAGEr using a BSgenome package for the respective genome. If reads were mapped with STAR, use bigWig files as input for CAGEr; if reads were mapped with bowtie2, use MAPQ-filtered and sorted BAM files as CAGEr input.
- Analysis of CAGE reads according to the manual of CAGEr. The final output is a markdown document summarizing the results and QC, as well as tracks (bed and bigWig files), a set of intermediate RDS files, stand-alone plots (all shown or referenced in the report), and data tables.
- Pipeline information - Report metrics generated during the workflow execution.
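The two G-related steps above (optionally dropping reads that do not start with G, then trimming the first G) can be illustrated with a small Python sketch. This is only a toy model of the logic, not the pipeline's implementation: the pipeline itself uses cutadapt for trimming, and its exact settings are not shown here. The function names `parse_fastq`, `starts_with_g`, and `trim_first_g` are hypothetical, as are the example reads.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality); the '+' separator line is dropped
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from an uncompressed FASTQ stream, 4 lines per read."""
    it = iter(lines)
    # zip over the same iterator four times groups consecutive lines
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def starts_with_g(record: FastqRecord) -> bool:
    """True if the read sequence begins with G."""
    return record[1].startswith("G")


def trim_first_g(record: FastqRecord) -> FastqRecord:
    """Remove a leading G and its quality value; leave other reads unchanged."""
    header, seq, qual = record
    if seq.startswith("G"):
        return header, seq[1:], qual[1:]
    return record


# Hypothetical reads, for illustration only
example = [
    "@read1\n", "GACGT\n", "+\n", "IIIII\n",
    "@read2\n", "TACGT\n", "+\n", "IIIII\n",
]
records = list(parse_fastq(example))
kept = [r for r in records if starts_with_g(r)]   # optional filtering step
trimmed = [trim_first_g(r) for r in kept]         # default trimming step
print(trimmed)
```

In this sketch the quality string is trimmed together with the sequence so that both stay the same length, which any downstream FASTQ consumer expects.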
First, prepare a samplesheet with your input data that looks as follows:

sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz

Each row represents a FASTQ file (single-end) or a pair of FASTQ files (paired-end).
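Before launching a run, it can help to sanity-check the samplesheet. The following Python sketch reads a samplesheet with the three columns shown above and labels each row as single-end or paired-end. It assumes that a single-end row leaves `fastq_2` empty, which is the usual nf-core convention but is not stated here; the function `read_samplesheet` and the second example row (`CONTROL_REP2`) are hypothetical, and the pipeline's own validation may differ.

```python
import csv
import io
from typing import Dict, List, TextIO

REQUIRED_COLUMNS = ["sample", "fastq_1", "fastq_2"]


def read_samplesheet(handle: TextIO) -> List[Dict[str, str]]:
    """Parse a samplesheet and annotate each row with its library layout."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != REQUIRED_COLUMNS:
        raise ValueError(f"expected header {REQUIRED_COLUMNS}, got {reader.fieldnames}")
    rows = []
    # Data rows start on line 2, after the header
    for line_no, row in enumerate(reader, start=2):
        if not row["sample"] or not row["fastq_1"]:
            raise ValueError(f"line {line_no}: 'sample' and 'fastq_1' are required")
        # Assumption: an empty fastq_2 marks a single-end sample
        layout = "paired-end" if row["fastq_2"] else "single-end"
        rows.append({**row, "layout": layout})
    return rows


example = io.StringIO(
    "sample,fastq_1,fastq_2\n"
    "CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz\n"
    "CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,\n"  # hypothetical single-end row
)
rows = read_samplesheet(example)
for row in rows:
    print(row["sample"], row["layout"])
```

To check a real file, pass an open file handle instead of the `io.StringIO` example, for instance `read_samplesheet(open("samplesheet.csv", newline=""))`.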
Now, you can run the pipeline using:
nextflow run ComputationalRegulatoryGenomicsICL/customcage \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--gtf example.gtf \
--outdir <OUTDIR>

Warning
Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
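As an illustration of the -params-file route, the sketch below writes the three parameters from the command above into a JSON file and assembles the matching `nextflow run` invocation without executing it. Nextflow accepts JSON or YAML for -params-file. The helper names (`write_params_file`, `build_command`), the `docker` profile choice, and the `results` output directory are assumptions made for this example.

```python
import json
import shlex
import tempfile
from pathlib import Path
from typing import Dict, List

PIPELINE = "ComputationalRegulatoryGenomicsICL/customcage"


def write_params_file(path: Path, params: Dict[str, str]) -> Path:
    """Write pipeline parameters as JSON for Nextflow's -params-file option."""
    path.write_text(json.dumps(params, indent=2) + "\n")
    return path


def build_command(params_file: Path, profile: str) -> List[str]:
    """Assemble the nextflow command line; nothing is executed here."""
    return [
        "nextflow", "run", PIPELINE,
        "-profile", profile,
        "-params-file", str(params_file),
    ]


# Keys mirror the --input, --gtf and --outdir flags; values are placeholders
params = {
    "input": "samplesheet.csv",
    "gtf": "example.gtf",
    "outdir": "results",  # hypothetical output directory
}

with tempfile.TemporaryDirectory() as tmp:
    params_file = write_params_file(Path(tmp) / "params.json", params)
    written = json.loads(params_file.read_text())
    command = build_command(params_file, profile="docker")

print(shlex.join(command))
```

Keeping parameters in a file like this makes a run easier to reproduce than a long command line, while profiles and resource settings stay in config files passed with -c.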
For extended documentation about the input parameters and usage, please visit the usage documentation. For details about the outputs, see the outputs page.
nf-core/customcage has been developed by Sviatoslav Sidorov (@sidorov-si), Katalin Ferenc (@ferenckata), Damir Baranasic (@da-bar), Elena Gómez-Marín (@ElenaGoMa), and Pavel Nikitin (@nikitin-p).
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.