Thanks to visit codestin.com
Credit goes to github.com

Skip to content

SimCab-CHU/MobiCT

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MobiCT - ctDNA Analysis Pipeline

Nextflow Run with conda Run with docker DOI

CI currently running - Use the main branch for now.

Introduction

MobiCT is an analysis pipeline designed for detecting SNVs (Single Nucleotide Variants) and small InDels in circulating tumor DNA (ctDNA) obtained through non-invasive liquid biopsy. The pipeline serves diagnostic, prognostic, and therapeutic purposes in precision oncology.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute environments in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible.

Pipeline Summary

MobiCT performs the following key steps:

  1. Quality Control: Raw read quality assessment using FastP
  2. Alignment: Read mapping to reference genome using BWA-MEM
  3. Deduplication: PCR duplicate removal using Picard/fgbio
  4. Variant Calling: SNV and InDel detection using VarDict
  5. Annotation: Variant annotation using Ensembl VEP
  6. Quality Metrics: Comprehensive QC metrics generation
  7. Reporting: MultiQC report generation

Quick Start

  1. Install Nextflow (>=20.04.0)

  2. Create a conda environment with required tools:

    conda create -n mobict -c conda-forge -c bioconda \
      gatk4 fgbio bwa fastp samtools picard vardict ensembl-vep
  3. Download the pipeline:

    git clone https://github.com/SimCab-CHU/MobiCT.git
    cd MobiCT
  4. Test the pipeline with minimal dataset:

    conda activate mobict
    nextflow run MobiCT.nf -c nextflow.config --input test_data

Usage

Typical command

nextflow -log /path/to/output/my.log run MobiCT.nf -c nextflow.config

Configuration

Before running the pipeline, edit the nextflow.config file to specify:

  • Input FASTQ files paths
  • Output directory
  • Reference genome path
  • Target intervals/BED files
  • Resource allocation

Input Requirements

The pipeline expects:

  • FASTQ files: Paired-end sequencing data from ctDNA samples
  • Reference genome: Human reference genome (e.g., GRCh38)
  • Target intervals: BED file defining regions of interest
  • VEP database: Pre-downloaded VEP cache and databases

Reference Data Preparation

  1. Download reference genome (GRCh38 recommended):

    wget http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
  2. Download VEP databases (see VEP documentation):

    vep_install -a cf -s homo_sapiens -y GRCh38 -c /path/to/vep/cache

Output

Results are organized in the specified output directory:

outdir/
├── sample1/
│   ├── sample1.deduplicated.bam
│   ├── sample1.deduplicated.bam.bai
│   ├── sample1.annotated.vcf
│   ├── sample1.HsMetrics.1.txt
│   ├── sample1.HsMetrics.3.txt
│   └── sample1.QC.bcftools_stats.stats
├── sample2/
│   └── ...
└── multiqc/
    └── multiqc_report.html

Output Files Description

  • .deduplicated.bam: Aligned, deduplicated BAM file
  • .annotated.vcf: Variant calls with functional annotations
  • .HsMetrics.*.txt: Hybrid selection metrics from Picard
  • .QC.bcftools_stats.stats: Variant calling statistics
  • multiqc_report.html: Comprehensive quality control report

Parameters

Core Options

Parameter Description Default
--input Path to input FASTQ files Required
--outdir Output directory ./results
--genome Reference genome path Required
--intervals Target intervals BED file Required

Resource Options

Parameter Description Default
--max_cpus Maximum number of CPUs 16
--max_memory Maximum memory allocation '128.GB'
--max_time Maximum time per job '240.h'

Profiles

The pipeline supports different execution profiles:

  • conda: Use Conda for dependency management
  • docker: Use Docker containers
  • singularity: Use Singularity containers
  • test: Run with test dataset

Example:

nextflow run MobiCT.nf -profile conda,test

Test Data

Raw sequencing data (FASTQ files) of commercial controls used in the study are available at: NCBI SRA: PRJNA1209006

Citations

If you use MobiCT for your analysis, please cite:

MobiCT: ctDNA Analysis Pipeline

[Publication DOI will be added]

Tools Citations

This pipeline uses several bioinformatics tools. Please also cite:

  • Nextflow: Paolo Di Tommaso, et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316–319 (2017)
  • BWA: Li H. and Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-1760 (2009)
  • VarDict: Zhongwu Lai, et al. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Research 44, e108 (2016)
  • VEP: McLaren W, et al. The Ensembl Variant Effect Predictor. Genome Biology 17, 122 (2016)
  • MultiQC: Philip Ewels, et al. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047-3048 (2016)

Credits

MobiCT was developed by the SimCab team at CHU.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For questions and support:

  • Create an issue on GitHub
  • Contact the development team

Changelog

Version 1.0.0

  • Initial release
  • Support for SNV and small InDel detection in ctDNA
  • Integrated quality control and reporting
  • VEP-based variant annotation

About

ctDNA Analysis pipeline. Version 1.0.0

Resources

License

MIT, MIT licenses found

Licenses found

MIT
LICENSE
MIT
License.md

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Nextflow 69.5%
  • Python 27.5%
  • HTML 3.0%