Nextflow pipeline for quality filtering, mapping, and AID-motif variant detection in Oxford Nanopore amplicon sequencing data. Specifically designed for immunoglobulin rearrangements to detect somatic hypermutation patterns, particularly C>T and G>A transitions associated with Activation-Induced Deaminase (AID) activity.
- Reference Preparation: Converts AB1 files to FASTA format
- Quality Filtering: Filters reads by quality and length (fastp)
- Read Mapping: Aligns reads to reference sequences (minimap2)
- Variant Calling: Pileup-based variant detection with dual frequency filters
  - Requires minimum read depth and allele frequency
  - Alt dominance filter: alternative allele must dominate among non-reference bases
  - Tracks variants in individual reads for per-sequence analysis
- AID Motif Annotation: Annotates variants with AID motif patterns (WRCY, WA, RCG) and spot types
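
To illustrate what motif annotation involves, the following is a minimal sketch of a WRCY hotspot check using IUPAC codes (W = A/T, R = A/G, Y = C/T). It is not the pipeline's own code: the function names, the zero-based position convention, and the choice to also test the reverse-complement motif RGYW are assumptions made for this example, and the WA and RCG motifs and the spot types are not covered.

```python
# Hypothetical helper names; the pipeline's actual annotation logic may differ.
# IUPAC codes needed for the WRCY / RGYW motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "R": "AG", "Y": "CT"}


def matches(seq, motif):
    """True if seq has the motif's length and every base fits its IUPAC code."""
    return len(seq) == len(motif) and all(b in IUPAC[m] for b, m in zip(seq, motif))


def in_wrcy_hotspot(ref, pos):
    """True if ref[pos] (zero-based) is the C of WRCY or the G of RGYW."""
    ref = ref.upper()
    # Forward strand: the C is the third base of WRCY, so the window starts at pos - 2.
    fwd = pos >= 2 and matches(ref[pos - 2:pos + 2], "WRCY")
    # Reverse strand: the G is the second base of RGYW, so the window starts at pos - 1.
    rev = pos >= 1 and matches(ref[pos - 1:pos + 3], "RGYW")
    return fwd or rev
```

For example, in the reference `AGCT` both the C at position 2 (WRCY) and the G at position 1 (RGYW) are flagged, because the sequence is its own reverse complement.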
The input is a CSV sample sheet with the columns `sample`, `fastq_file`, `reference_ab1` (optional), and `reference_fa` (optional). Each sample requires either an AB1 or a FASTA reference. If a sample has multiple FASTQ files, provide the reference for only one of them.
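
The rules above can be checked before launching a run. The sketch below is a standalone helper, not part of the pipeline; the function name `validate_sample_sheet` and the error messages are made up for illustration. It assumes one FASTQ file per row and does not flag a row that fills in both reference columns, since the expected handling of that case is not stated.

```python
import csv
import io


def validate_sample_sheet(handle):
    """Return a list of error messages; an empty list means the sheet passed."""
    errors = []
    refs_per_sample = {}
    reader = csv.DictReader(handle)
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        sample = (row.get("sample") or "").strip()
        fastq = (row.get("fastq_file") or "").strip()
        ab1 = (row.get("reference_ab1") or "").strip()
        fasta = (row.get("reference_fa") or "").strip()
        if not sample or not fastq:
            errors.append(f"line {line_no}: sample and fastq_file are required")
            continue
        # Count how many rows of this sample carry a reference.
        refs_per_sample[sample] = refs_per_sample.get(sample, 0) + bool(ab1 or fasta)
    for sample, n_refs in refs_per_sample.items():
        if n_refs == 0:
            errors.append(f"{sample}: no AB1 or FASTA reference given")
        elif n_refs > 1:
            errors.append(f"{sample}: reference given on {n_refs} rows, expected 1")
    return errors


if __name__ == "__main__":
    # Hypothetical sheet contents, used only to demonstrate the check.
    demo = io.StringIO(
        "sample,fastq_file,reference_ab1,reference_fa\n"
        "s1,a.fastq,ref.ab1,\n"
        "s1,b.fastq,,\n"
        "s2,c.fastq,,\n"
    )
    print(validate_sample_sheet(demo))
```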
By default, results are stored in the results/ directory with the following structure:
- BAM files: Sorted and indexed alignment files (`results/bam/`)
- QC reports: fastp quality control reports in JSON and HTML formats (`results/fastp/`)
- Variant tables:
  - Per-position variants with AID motif annotations (`results/variants/*_variants.csv`)
  - Per-sequence variant summaries with C>T and G>A counts (`results/variants/*_variants_per_seq.csv`)
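
As a rough illustration of what a per-sequence C>T and G>A count means, the sketch below compares one read against the reference base by base. It is a simplification rather than the pipeline's method: it assumes the read is already aligned to the reference without gaps (equal-length strings), and `count_transitions` is a hypothetical name.

```python
from collections import Counter


def count_transitions(ref, read):
    """Count C>T and G>A mismatches between a reference and one gapless aligned read."""
    counts = Counter()
    for ref_base, read_base in zip(ref.upper(), read.upper()):
        if ref_base != read_base:
            counts[f"{ref_base}>{read_base}"] += 1
    # Report only the two AID-associated transitions; a missing key counts as 0.
    return {"C>T": counts["C>T"], "G>A": counts["G>A"]}
```

Real alignments contain insertions and deletions, so a production version would walk the aligned pairs from the BAM record instead of zipping two strings.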
- Nextflow (>=24.04). See Nextflow installation guide for instructions.
- Docker or Apptainer/Singularity (for containerized execution). See Docker quick start guide or Apptainer quick start guide for installation instructions.
You need to provide at least a sample sheet CSV file via --samples_file, where you specify the paths to your FASTQ files and reference files.
In the example_input/ folder you can find an example samples.csv file and example data.
To run the pipeline, use one of the following commands (adjust paths as needed):

    # With Docker
    nextflow run main.nf -profile docker --samples_file example_input/samples.csv

    # With Apptainer/Singularity
    nextflow run main.nf -profile apptainer --samples_file example_input/samples.csv

The pipeline can be configured via command-line parameters. The following parameters are available:

| Parameter | Default | Description |
|---|---|---|
| `--samples_file` | - | Path to input CSV sample sheet |
| `--output_dir` | results | Output directory for results |
| `--qf_min_length` | 500 | Minimum read length after quality filtering |
| `--qf_max_length` | 750 | Maximum read length after quality filtering |
| `--qf_min_quality` | 13 | Minimum base quality score (Phred) |
| `--qf_cut_mean_quality` | 15 | Mean quality threshold for sliding window trimming |
| `--var_min_depth` | 100 | Minimum read depth to call a variant |
| `--var_min_alt_af` | 0.005 | Minimum alternative allele frequency (alt_count / total_depth) |
| `--var_min_alt_dom` | 0.75 | Minimum alt dominance among non-ref bases (alt / (total - ref)) |
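
The three variant-calling thresholds combine as sketched below, using the formulas from the parameter descriptions and the default values as keyword defaults. This is an illustrative reimplementation, not the pipeline's script: `passes_filters` is a hypothetical name, and whether the real comparisons are inclusive (`>=`) or strict is an assumption.

```python
def passes_filters(ref_count, alt_count, total_depth,
                   min_depth=100, min_alt_af=0.005, min_alt_dom=0.75):
    """Apply the depth, allele-frequency, and alt-dominance filters to one position."""
    if total_depth < min_depth:
        return False
    non_ref = total_depth - ref_count
    if non_ref <= 0:
        # No non-reference bases at this position, so there is nothing to call.
        return False
    alt_af = alt_count / total_depth   # alt_count / total_depth
    alt_dom = alt_count / non_ref      # alt / (total - ref)
    return alt_af >= min_alt_af and alt_dom >= min_alt_dom
```

For a position with depth 1000, 980 reference bases, and 18 reads carrying the same alternative base, the allele frequency is 0.018 and the dominance is 18 / 20 = 0.9, so the variant passes. With only 10 of the 20 non-reference reads supporting that base, the dominance drops to 0.5 and the variant is rejected, which is how the dominance filter suppresses positions where errors are spread across several bases.
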
| Tool | Version | Purpose |
|---|---|---|
| Nextflow | >=24.04 | Workflow management |
| fastp | 0.23.4 | Quality filtering and trimming |
| EMBOSS seqret | 6.6.0 | AB1 to FASTA conversion |
| minimap2 | 2.26 | Read alignment (map-ont preset) |
| samtools | 1.17 | BAM file manipulation and indexing |
| pysam | 0.21.0 | Python library for variant calling |
| Python | 3.11 | Custom variant detection script |