catg-umag/ig-ont-variants

Nanopore Amplicon Variant Detection Pipeline

Nextflow pipeline for quality filtering, mapping, and AID-motif variant detection in Oxford Nanopore amplicon sequencing data. It is designed specifically for immunoglobulin rearrangements, to detect somatic hypermutation patterns, particularly the C>T and G>A transitions associated with Activation-Induced Deaminase (AID) activity.

Workflow

  1. Reference Preparation: Converts AB1 files to FASTA format (EMBOSS seqret)
  2. Quality Filtering: Filters reads by quality and length (fastp)
  3. Read Mapping: Aligns reads to reference sequences (minimap2)
  4. Variant Calling: Pileup-based variant detection with dual frequency filters
    • Requires a minimum read depth and a minimum alternative allele frequency
    • Alt dominance filter: the alternative allele must account for at least a minimum fraction of all non-reference bases at the position
    • Tracks variants in individual reads for per-sequence analysis
  5. AID Motif Annotation: Annotates variants with AID motif patterns (WRCY, WA, RCG) and spot types
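To illustrate the motif annotation step, here is a minimal Python sketch that checks whether a mutated position sits on the targeted base of a degenerate motif. It is not the pipeline's script: the function names are hypothetical, and it assumes the targeted base is the C of WRCY, the G of its reverse complement RGYW, the A of WA, and the T of its reverse complement TW. RCG and the spot-type labels are not modeled here because their exact definitions are not given in this README.

```python
# Standard IUPAC ambiguity codes needed for these motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "R": "AG", "Y": "CT"}

# Assumed motif definitions: motif -> 0-based index of the targeted base.
MOTIFS = {
    "WRCY": 2,  # C targeted on the reference strand
    "RGYW": 1,  # reverse complement of WRCY, G targeted
    "WA": 1,    # A targeted (assumption)
    "TW": 0,    # reverse complement of WA, T targeted
}


def matches_motif(seq: str, pos: int, motif: str, target: int) -> bool:
    """Return True if `motif` occurs in `seq` with its targeted base at `pos` (0-based)."""
    start = pos - target
    if start < 0 or start + len(motif) > len(seq):
        return False
    return all(seq[start + i] in IUPAC[code] for i, code in enumerate(motif))


def annotate_position(seq: str, pos: int) -> list[str]:
    """List every assumed motif whose targeted base falls on `pos`."""
    seq = seq.upper()
    return [m for m, t in MOTIFS.items() if matches_motif(seq, pos, m, t)]


if __name__ == "__main__":
    ref = "AAGCTA"  # AGCT is both a WRCY and an RGYW site
    print(annotate_position(ref, 3))  # ['WRCY'] (the C)
    print(annotate_position(ref, 2))  # ['RGYW'] (the G)
```

Matching each motif at a fixed offset, rather than scanning the whole sequence, keeps the check local to the variant position, which is what a per-variant annotation needs.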

Input

CSV sample sheet with columns: sample, fastq_file, reference_ab1 (optional), reference_fa (optional). Each sample requires either an AB1 or a FASTA reference. If a sample has multiple FASTQ files, provide the reference for only one of them.
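The reference rule can be checked before launching a run. Below is a minimal Python sketch; the check_sample_sheet helper is hypothetical and not part of the pipeline. It groups rows by sample and flags samples with no reference, or with references on more than one row, assuming an empty cell means "not provided".

```python
import csv
import io


def check_sample_sheet(text: str) -> list[str]:
    """Return human-readable problems found in a sample sheet given as CSV text."""
    ref_rows: dict[str, int] = {}  # sample -> number of rows that supply a reference
    for row in csv.DictReader(io.StringIO(text)):
        has_ref = bool((row.get("reference_ab1") or "").strip()
                       or (row.get("reference_fa") or "").strip())
        ref_rows[row["sample"]] = ref_rows.get(row["sample"], 0) + int(has_ref)

    problems = []
    for sample, n in ref_rows.items():
        if n == 0:
            problems.append(f"{sample}: no reference_ab1 or reference_fa given")
        elif n > 1:
            problems.append(f"{sample}: reference given on {n} rows, expected 1")
    return problems


# Hypothetical sheet: file names are placeholders, not the bundled example.
SHEET = """sample,fastq_file,reference_ab1,reference_fa
s1,s1_run1.fastq.gz,s1_ref.ab1,
s1,s1_run2.fastq.gz,,
s2,s2.fastq.gz,,
"""

if __name__ == "__main__":
    for problem in check_sample_sheet(SHEET):
        print(problem)  # s2: no reference_ab1 or reference_fa given
```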

Output

By default, results are stored in the results/ directory with the following structure:

  • BAM files: Sorted and indexed alignment files (results/bam/)
  • QC reports: fastp quality control reports in JSON and HTML formats (results/fastp/)
  • Variant tables:
    • Per-position variants with AID motif annotations (results/variants/*_variants.csv)
    • Per-sequence variant summaries with C>T and G>A counts (results/variants/*_variants_per_seq.csv)
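The per-sequence summary reports C>T and G>A counts; a G>A change on the reference strand is the same C>T deamination event seen on the complementary strand. A minimal Python sketch of such a summary, assuming per-read variants are already available as (ref, alt) base pairs keyed by read ID. The input layout and the count_transitions name are hypothetical and do not describe the actual columns of *_variants_per_seq.csv.

```python
def count_transitions(
    read_variants: dict[str, list[tuple[str, str]]],
) -> dict[str, dict[str, int]]:
    """Count C>T and G>A substitutions in each read; other changes are ignored."""
    summary = {}
    for read_id, variants in read_variants.items():
        counts = {"C>T": 0, "G>A": 0}
        for ref, alt in variants:
            change = f"{ref.upper()}>{alt.upper()}"
            if change in counts:
                counts[change] += 1
        summary[read_id] = counts
    return summary


if __name__ == "__main__":
    reads = {
        "read1": [("C", "T"), ("G", "A"), ("A", "G")],  # A>G is not counted
        "read2": [("C", "T"), ("C", "T")],
    }
    print(count_transitions(reads))
    # {'read1': {'C>T': 1, 'G>A': 1}, 'read2': {'C>T': 2, 'G>A': 0}}
```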

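Requirements

  • Nextflow (>=24.04)
  • Docker or Apptainer/Singularity, to run the tools listed under Tools and Versions
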
Usage

At minimum, you need to provide a sample sheet CSV file via --samples_file that lists the paths to your FASTQ and reference files. The example_input/ folder contains an example samples.csv file and example data.

To run the pipeline, use one of the following commands (adjust paths as needed):

# With Docker
nextflow run main.nf -profile docker --samples_file example_input/samples.csv

# With Apptainer/Singularity
nextflow run main.nf -profile apptainer --samples_file example_input/samples.csv

Configuration

The pipeline can be configured via command-line parameters. The following parameters are available:

Parameter              Default  Description
--samples_file         -        Path to input CSV sample sheet (required)
--output_dir           results  Output directory for results
--qf_min_length        500      Minimum read length after quality filtering
--qf_max_length        750      Maximum read length after quality filtering
--qf_min_quality       13       Minimum base quality score (Phred)
--qf_cut_mean_quality  15       Mean quality threshold for sliding-window trimming
--var_min_depth        100      Minimum read depth to call a variant
--var_min_alt_af       0.005    Minimum alternative allele frequency: alt_count / total_depth
--var_min_alt_dom      0.75     Minimum alt dominance among non-reference bases: alt_count / (total_depth - ref_count)
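The depth requirement and the two frequency thresholds combine per position. As a minimal Python sketch (the call_site name is hypothetical, and it assumes counts cover A/C/G/T only, with the most frequent non-reference base taken as the alternative allele), a pileup column passes when depth, alt_count / total_depth and alt_count / (total_depth - ref_count) all meet their minimums. Defaults mirror the table.

```python
from typing import Optional


def call_site(counts: dict[str, int], ref: str,
              min_depth: int = 100,
              min_alt_af: float = 0.005,
              min_alt_dom: float = 0.75) -> tuple[Optional[str], bool]:
    """Return (alt_base, passed) for one pileup column.

    `counts` maps base -> number of reads showing it at this position
    (assumed: A/C/G/T only, indels not counted).
    """
    total = sum(counts.values())
    ref_count = counts.get(ref, 0)
    non_ref = {b: n for b, n in counts.items() if b != ref and n > 0}
    if not non_ref:
        return None, False  # no alternative allele observed
    # Most frequent non-reference base; ties resolve to the first in dict order.
    alt = max(non_ref, key=non_ref.get)
    alt_count = non_ref[alt]
    alt_af = alt_count / total                 # compared with --var_min_alt_af
    alt_dom = alt_count / (total - ref_count)  # compared with --var_min_alt_dom
    passed = (total >= min_depth
              and alt_af >= min_alt_af
              and alt_dom >= min_alt_dom)
    return alt, passed


if __name__ == "__main__":
    print(call_site({"A": 5, "C": 880, "G": 3, "T": 112}, ref="C"))  # ('T', True)
    print(call_site({"A": 40, "C": 900, "G": 30, "T": 30}, ref="C"))  # ('A', False)
```

The second column fails only on dominance: A reaches 4% frequency, but the non-reference reads are split across three bases, so A accounts for just 40% of them.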

Tools and Versions

Tool           Version  Purpose
Nextflow       >=24.04  Workflow management
fastp          0.23.4   Quality filtering and trimming
EMBOSS seqret  6.6.0    AB1 to FASTA conversion
minimap2       2.26     Read alignment (map-ont preset)
samtools       1.17     BAM file manipulation and indexing
pysam          0.21.0   Python library for variant calling
Python         3.11     Custom variant detection script
