NanoSwe is a preliminary analysis toolkit for experiments that involve sequencing data from the Oxford Nanopore Technologies (ONT) PromethION device. It has also been used for other long-read SweGen data (e.g. PacBio).
| Purpose | Program | 
|---|---|
| Quality Control | NanoPlot and NanoComp | 
| Mapping to the reference | Minimap2 2.14 | 
| Sorting, Indexing, and calculating statistics | Samtools 1.9 | 
| Subsampling | Sambamba 0.7.1 | 
| BAM QC Statistics | Qualimap 2.2.1 | 
| Structural Variant Calling | Sniffles 1.0.10 | 
| Data Extraction (VCF Files only) | bcftools 1.9 | 
| Finding intersection in genomic regions | Survivor 1.0.7 | 
| Evaluation of SVs | Survivor 1.0.7 and surpyvor 0.5.0 | 
| Removing control DNA sequences | NanoLyse | 
| Trimming Short Reads | BBMap/BBTools | 
| Homology Detection | Blast 2.7.1+ | 
| Data Visualisation | R version 3.5.3. See the `scRipts` directory for information on the libraries/packages used. | 
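
As a rough illustration of how the mapping, sorting, indexing and structural variant calling tools in the table can be chained, here is a minimal Python sketch that only builds the shell commands as strings and does not run them. The function name `build_commands`, the file names, the thread count, the `map-ont` preset and the `--MD` flag are assumptions made for this example; the commands actually used in the analyses are the ones recorded in commands.md.

```python
import shlex


def build_commands(reference, fastq, prefix, threads=4):
    """Return shell commands for mapping, sorting, indexing and SV calling.

    Hypothetical helper: names and option values are illustrative only.
    """
    ref, fq = shlex.quote(reference), shlex.quote(fastq)
    bam = shlex.quote(f"{prefix}.sorted.bam")
    vcf = shlex.quote(f"{prefix}.vcf")
    return [
        # Map reads with the nanopore preset and pipe SAM output straight
        # into samtools sort; --MD writes the MD tag (assumed to be needed
        # by Sniffles 1.x).
        f"minimap2 -a -x map-ont --MD -t {threads} {ref} {fq} "
        f"| samtools sort -@ {threads} -o {bam} -",
        # Index the coordinate-sorted BAM file.
        f"samtools index {bam}",
        # Call structural variants from the sorted BAM file.
        f"sniffles -m {bam} -v {vcf}",
    ]


if __name__ == "__main__":
    for command in build_commands(
        "reference_genome.fna", "fastq_0.fastq", "bam_files/sample"
    ):
        print(command)
```

Building the commands as strings keeps the sketch safe to run anywhere; the printed lines can be reviewed before being pasted into a shell or a Snakefile rule.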
Example tree structure of nanopore sequencing data files
├── /basecalled/<sample>/<flowcell>/
│   ├── fastq_0.fastq
│   ├── fastq_850.fastq
│   ├── sequencing_summary_0.txt
│   ├── sequencing_summary_850.txt
│   └── reads (1)
│       ├── 0 (2)
│       │   ├── file_read_1_ch_90_strand.fast5
│       │   ├── file_read_41_ch_40_strand2.fast5
│       │   └── file_read_300_ch_40_strand2.fast5
│       └── 850
│           ├── file_read_1000_ch_200_strand.fast5
│           ├── file_read_9000_ch_100_strand.fast5
│           └── file_read_95000_ch_1000_strand2.fast5
└── /bin/
(1) Each folder contains ~8000 fast5 files.
(2) fast5 files are named e.g. PCT0001_YYYYMMDD_0001A20B002222C_{flowcell}_sequencing_run_{library_full_name}__read_{number}_ch_{number}_strand.fast5
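
The naming pattern in note (2) ends with a read number and a channel number, which can be pulled out with a regular expression. The sketch below is only an illustration: the helper `parse_fast5_name` is hypothetical, and it assumes that every file name ends in `_read_{number}_ch_{number}_strand.fast5`, optionally with digits after `strand` as in the example tree.

```python
import re

# Matches the tail of a fast5 file name, e.g. "_read_41_ch_40_strand2.fast5".
FAST5_NAME = re.compile(
    r"_read_(?P<read>\d+)_ch_(?P<channel>\d+)_strand\d*\.fast5$"
)


def parse_fast5_name(filename):
    """Return (read_number, channel_number), or None if the name does not match."""
    match = FAST5_NAME.search(filename)
    if match is None:
        return None
    return int(match["read"]), int(match["channel"])


if __name__ == "__main__":
    print(parse_fast5_name("file_read_41_ch_40_strand2.fast5"))
```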
Example tree structure of data organisation
├── /basecalled/<sample>/<flowcell>/
│   ├── FASTQ_files
│   │   ├── fastq_0.fastq
│   │   └── fastq_850.fastq
│   ├── sequencing_summary
│   │   ├── sequencing_summary_0.txt
│   │   └── sequencing_summary_850.txt
│   ├── reads *
│   │   ├── 0 *
│   │   │   ├── file_read_1_ch_90_strand.fast5
│   │   │   ├── file_read_41_ch_40_strand2.fast5
│   │   │   └── file_read_300_ch_40_strand2.fast5
│   │   └── 850
│   │       ├── file_read_1000_ch_200_strand.fast5
│   │       ├── file_read_9000_ch_100_strand.fast5
│   │       └── file_read_95000_ch_1000_strand2.fast5
│   └── <sample>_analysis
│       ├── reference_genome.fna
│       ├── reference_genome.fna.fai
│       ├── Snakefile
│       ├── /bam_files/
│       ├── /vcf_files/
│       └── /logs/
└── /bin/
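
Moving from the first layout to the organised one mostly means grouping the FASTQ and sequencing summary files into their own folders and creating the analysis directories. A minimal Python sketch of that step is shown below, assuming the file name patterns from the example trees; the function `organise_flowcell` is hypothetical and is not part of this repository. The reference genome and Snakefile still have to be placed in the analysis folder by hand.

```python
from pathlib import Path
import shutil


def organise_flowcell(flowcell_dir, sample):
    """Group FASTQ and summary files and create the analysis folders.

    Hypothetical helper based on the example trees; returns the analysis path.
    """
    root = Path(flowcell_dir)
    targets = {
        "fastq_*.fastq": root / "FASTQ_files",
        "sequencing_summary_*.txt": root / "sequencing_summary",
    }
    for pattern, destination in targets.items():
        destination.mkdir(exist_ok=True)
        # Collect matches first so moving files does not disturb the listing.
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                shutil.move(str(path), str(destination / path.name))

    # Create <sample>_analysis with its output subfolders.
    analysis = root / f"{sample}_analysis"
    for name in ("bam_files", "vcf_files", "logs"):
        (analysis / name).mkdir(parents=True, exist_ok=True)
    return analysis
```

The `reads` folder is left untouched, since it has the same position in both layouts.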
./scRipts - R scripts created for visualisation of long-read data.
commands.md - Tool commands used for different analyses.
- SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population
- De novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data
- Multi-platform discovery of haplotype-resolved structural variation in human genomes
- Which human reference genome to use?
- Structural variants identified by Oxford Nanopore PromethION sequencing of the human genome
- The thesis
- Evaluating nanopore sequencing data processing pipelines for structural variation identification
If you plan to use this repository as a guide, please mention the link https://github.com/Nazeeefa/NanoSwe as acknowledgment. To cite our publication, use one of the formats shown below, or visit citeas.org to choose a different format. Thank you.
Fatima N, Petri A, Gyllensten U, Feuk L, Ameur A. Evaluation of Single-Molecule Sequencing Technologies for Structural Variant Detection in Two Swedish Human Genomes. Genes. 2020; 11(12):1444.
Fatima, Nazeefa; Petri, Anna; Gyllensten, Ulf; Feuk, Lars; Ameur, Adam. 2020. "Evaluation of Single-Molecule Sequencing Technologies for Structural Variant Detection in Two Swedish Human Genomes." Genes 11, no. 12: 1444.