MobiCT - ctDNA Analysis Pipeline

CI currently running - Use the main branch for now.

Introduction

MobiCT is an analysis pipeline for detecting SNVs (Single Nucleotide Variants) and small InDels in circulating tumor DNA (ctDNA) obtained through non-invasive liquid biopsy. It is intended to support diagnostic, prognostic, and therapeutic decisions in precision oncology.

The pipeline is built with Nextflow, a workflow tool that runs tasks portably across multiple compute environments. Dependencies can be provided through Conda, Docker, or Singularity (see Profiles), which simplifies installation and makes results easier to reproduce.

Pipeline Summary

MobiCT performs the following key steps:

Quality Control: Raw read quality assessment using fastp
Alignment: Read mapping to reference genome using BWA-MEM
Deduplication: PCR duplicate removal using Picard/fgbio
Variant Calling: SNV and InDel detection using VarDict
Annotation: Variant annotation using Ensembl VEP
Quality Metrics: Comprehensive QC metrics generation
Reporting: MultiQC report generation

Quick Start

Install Nextflow (>=20.04.0)

Create a conda environment with required tools:

conda create -n mobict -c conda-forge -c bioconda \
  gatk4 fgbio bwa fastp samtools picard vardict ensembl-vep

Download the pipeline:

git clone https://github.com/SimCab-CHU/MobiCT.git
cd MobiCT

Test the pipeline with a minimal dataset:

conda activate mobict
nextflow run MobiCT.nf -c nextflow.config --input test_data

Usage

Typical command

nextflow -log /path/to/output/my.log run MobiCT.nf -c nextflow.config

Configuration

Before running the pipeline, edit the nextflow.config file to specify:

Input FASTQ file paths
Output directory
Reference genome path
Target intervals/BED files
Resource allocation
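
As an alternative to editing nextflow.config in place, run-specific values can be kept in a separate parameters file. The sketch below assumes that the options listed under Parameters (`--input`, `--outdir`, `--genome`, `--intervals`) map to Nextflow params of the same names, which is how Nextflow treats double-dash options in general but is not confirmed for every MobiCT setting. The function name `write_params_file`, the file name `my_params.yaml`, and all paths are hypothetical placeholders.

```shell
# Write a YAML parameters file holding the four core options.
# Usage: write_params_file <yaml_file> <fastq_dir> <outdir> <genome_fasta> <bed>
write_params_file() {
    cat > "$1" <<EOF
input: '$2'
outdir: '$3'
genome: '$4'
intervals: '$5'
EOF
}

# Example call (hypothetical paths):
# write_params_file my_params.yaml /data/fastq ./results /ref/GRCh38.fa /ref/targets.bed
# nextflow run MobiCT.nf -c nextflow.config -params-file my_params.yaml
```

Keeping the values in their own file leaves the shipped nextflow.config untouched, so each run can be traced back to the parameters file it used.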

Input Requirements

The pipeline expects:

FASTQ files: Paired-end sequencing data from ctDNA samples
Reference genome: Human reference genome (e.g., GRCh38)
Target intervals: BED file defining regions of interest
VEP database: Pre-downloaded VEP cache and databases
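
A quick pre-flight check can catch a wrong path before a long run starts. The following is a minimal sketch, not part of MobiCT: the function name `check_inputs` is hypothetical, and it assumes FASTQ files are gzip-compressed with a `.fastq.gz` or `.fq.gz` extension. It does not verify that reads are correctly paired, since the expected file-naming scheme is not stated here.

```shell
# Report every missing input instead of stopping at the first one.
# Usage: check_inputs <fastq_dir> <genome_fasta> <intervals_bed> <vep_cache_dir>
check_inputs() {
    rc=0
    # At least one gzip-compressed FASTQ file must sit directly in the directory.
    if ! find "$1" -maxdepth 1 \( -name '*.fastq.gz' -o -name '*.fq.gz' \) 2>/dev/null | grep -q .; then
        echo "No FASTQ files (*.fastq.gz, *.fq.gz) found in: $1" >&2
        rc=1
    fi
    # Reference and BED must exist and be non-empty; the VEP cache must be a directory.
    [ -s "$2" ] || { echo "Reference genome missing or empty: $2" >&2; rc=1; }
    [ -s "$3" ] || { echo "Target intervals BED missing or empty: $3" >&2; rc=1; }
    [ -d "$4" ] || { echo "VEP cache directory not found: $4" >&2; rc=1; }
    return "$rc"
}

# Example call (hypothetical paths):
# check_inputs /data/fastq /ref/GRCh38.fa /ref/targets.bed /path/to/vep/cache
```

The function returns a non-zero status if anything is missing, so it can gate the nextflow command with `&&`.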

Reference Data Preparation

Download reference genome (GRCh38 recommended):

wget http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

Download VEP databases (see VEP documentation):

vep_install -a cf -s homo_sapiens -y GRCh38 -c /path/to/vep/cache
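
The downloaded FASTA is gzip-compressed, and aligners and Picard-style tools generally need an uncompressed, indexed reference. Whether MobiCT builds these indexes itself is not stated here, so treat the following as a sketch of the usual manual preparation rather than a required step. The function names `prepare_reference` and `run` are hypothetical; setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Decompress a gzipped FASTA and build the common index files next to it.
# Usage: prepare_reference <reference.fa.gz>
prepare_reference() {
    fasta=${1%.gz}                                   # path without the .gz suffix
    run gunzip -k "$1" &&                            # decompress, keep the archive
    run bwa index "$fasta" &&                        # BWA index files
    run samtools faidx "$fasta" &&                   # FASTA index (.fai)
    run samtools dict -o "${fasta%.fa}.dict" "$fasta"  # sequence dictionary
}

# Example call:
# DRY_RUN=1 prepare_reference Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
```

Running with `DRY_RUN=1` first is a cheap way to confirm the derived file names before starting the BWA index, which can take a long time on a full human genome.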

Output

Results are organized in the specified output directory:

outdir/
├── sample1/
│   ├── sample1.deduplicated.bam
│   ├── sample1.deduplicated.bam.bai
│   ├── sample1.annotated.vcf
│   ├── sample1.HsMetrics.1.txt
│   ├── sample1.HsMetrics.3.txt
│   └── sample1.QC.bcftools_stats.stats
├── sample2/
│   └── ...
└── multiqc/
    └── multiqc_report.html

Output Files Description

.deduplicated.bam: Aligned, deduplicated BAM file
.annotated.vcf: Variant calls with functional annotations
.HsMetrics.*.txt: Hybrid selection metrics from Picard
.QC.bcftools_stats.stats: Variant calling statistics
multiqc_report.html: Comprehensive quality control report
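
After a run, it can be useful to confirm that every sample produced the files listed above. The sketch below assumes the exact layout shown in the Output tree (one folder per sample, file names prefixed with the sample name, and a separate multiqc folder); the function name `check_outputs` is hypothetical.

```shell
# List expected result files that are missing under an output directory.
# Usage: check_outputs <outdir>
check_outputs() {
    outdir=$1
    rc=0
    for sample_dir in "$outdir"/*/; do
        [ -d "$sample_dir" ] || continue
        sample=$(basename "$sample_dir")
        [ "$sample" = multiqc ] && continue      # report folder, not a sample
        for suffix in deduplicated.bam deduplicated.bam.bai annotated.vcf \
                      HsMetrics.1.txt HsMetrics.3.txt QC.bcftools_stats.stats; do
            if [ ! -e "$sample_dir$sample.$suffix" ]; then
                echo "missing: $sample/$sample.$suffix"
                rc=1
            fi
        done
    done
    # The MultiQC report is shared by all samples.
    if [ ! -e "$outdir/multiqc/multiqc_report.html" ]; then
        echo "missing: multiqc/multiqc_report.html"
        rc=1
    fi
    return "$rc"
}

# Example call:
# check_outputs ./results
```

The function prints one line per missing file and returns a non-zero status if anything is absent, so it can be used in a post-run script.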

Parameters

Core Options

Parameter	Description	Default
`--input`	Path to input FASTQ files	Required
`--outdir`	Output directory	`./results`
`--genome`	Reference genome path	Required
`--intervals`	Target intervals BED file	Required

Resource Options

Parameter	Description	Default
`--max_cpus`	Maximum number of CPUs	16
`--max_memory`	Maximum memory allocation	'128.GB'
`--max_time`	Maximum time per job	'240.h'
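
On a workstation smaller than the defaults assume, the resource options can be lowered on the command line. As a minimal sketch, the hypothetical helper `resource_flags` below builds the three flags and caps the CPU request at the number of processors reported by `getconf`; the memory and time values are passed through unchanged and use the same unit suffixes as the defaults above.

```shell
# Print resource flags, capping the CPU count at what this machine has.
# Usage: resource_flags <wanted_cpus> <memory_gb> <hours>
resource_flags() {
    wanted_cpus=$1
    # Fall back to 1 if the processor count cannot be determined.
    available=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$wanted_cpus" -lt "$available" ]; then
        cpus=$wanted_cpus
    else
        cpus=$available
    fi
    printf '%s\n' "--max_cpus $cpus --max_memory $2.GB --max_time $3.h"
}

# Example call (left unquoted on purpose so the flags split into words):
# nextflow run MobiCT.nf -c nextflow.config $(resource_flags 8 32 48)
```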

Profiles

The pipeline supports different execution profiles:

conda: Use Conda for dependency management
docker: Use Docker containers
singularity: Use Singularity containers
test: Run with test dataset

Example:

nextflow run MobiCT.nf -profile conda,test

Test Data

Raw sequencing data (FASTQ files) for the commercial controls used in the study are available from the NCBI SRA under BioProject accession PRJNA1209006.

Citations

If you use MobiCT for your analysis, please cite:

MobiCT: ctDNA Analysis Pipeline

[Publication DOI will be added]

Tools Citations

This pipeline uses several bioinformatics tools. Please also cite:

Nextflow: Di Tommaso P, et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316–319 (2017)
BWA: Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
VarDict: Lai Z, et al. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Research 44, e108 (2016)
VEP: McLaren W, et al. The Ensembl Variant Effect Predictor. Genome Biology 17, 122 (2016)
MultiQC: Ewels P, et al. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016)

Credits

MobiCT was developed by the SimCab team at CHU.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For questions and support:

Create an issue on GitHub
Contact the development team

Changelog

Version 1.0.0

Initial release
Support for SNV and small InDel detection in ctDNA
Integrated quality control and reporting
VEP-based variant annotation
