Telomore is a tool for identifying and extracting telomeric sequences from Oxford Nanopore or Illumina sequencing reads of Streptomycetes spp. that have been excluded from a de novo assembly. It processes sequencing data to extend assemblies, generate quality control (QC) maps, and produce finalized assemblies with the telomere/recessed bases included.
Telomore does not identify linear contigs itself; it relies on the user to provide that information in the header of the reference FASTA file.
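Since the linear/non-linear status is read from the FASTA headers, it can help to list the headers before a run. The snippet below is a minimal sketch using grep on a small demo file; the file contents and contig names are made up for illustration, and the exact annotation Telomore expects in the header is not shown here.

```shell
# Create a tiny demo FASTA (hypothetical contents) so the commands below run as-is.
demo_fasta="$(mktemp)"
printf '>contig_1 example header\nACGTACGT\n>contig_2 example header\nGGCCGGCC\n' > "$demo_fasta"

# Print every header line with its line number to check how each contig is annotated.
grep -n '^>' "$demo_fasta"

# Count the contigs in the file.
n_contigs="$(grep -c '^>' "$demo_fasta")"
echo "contigs: $n_contigs"
```

For a real assembly, replace the demo file with the path to your own reference.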
telomore --mode <mode> --reference <reference.fasta> [options]
Required Arguments
- --mode: Specify the sequencing platform. Options: nanopore or illumina.
- --reference: Path to the reference genome file in FASTA format.
Nanopore-Specific Arguments
- --single: Path to a single gzipped FASTQ file containing Nanopore reads.
Illumina-Specific Arguments
- --read1: Path to gzipped FASTQ file for Illumina read 1.
- --read2: Path to gzipped FASTQ file for Illumina read 2.
Optional Arguments
- --coverage_threshold: Set the coverage at which consensus trimming stops (default: coverage=5 for ONT reads and coverage=1 for Illumina reads).
- --quality_threshold: Set the Q-score required to count a read position in the coverage calculation during consensus trimming (default: Q-score=10 for ONT reads and Q-score=30 for Illumina reads).
- --threads: Number of threads to use (default: 1).
- --keep: Retain intermediate files (default: False).
- --quiet: Suppress console logging.
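As an illustration of how the arguments above combine, here is a sketch of one Nanopore and one Illumina invocation. All file names are hypothetical, and the sketch assumes that --keep and --quiet are plain switches that take no value. The commands are only printed (a dry run), so the block runs without Telomore installed.

```shell
# Hypothetical input file names; replace with your own paths.
reference="assembly.fasta"
ont_reads="reads.fastq.gz"
r1="reads_R1.fastq.gz"
r2="reads_R2.fastq.gz"

# Nanopore run: one gzipped FASTQ, 8 threads, keep intermediate files.
nanopore_cmd="telomore --mode nanopore --reference $reference --single $ont_reads --threads 8 --keep"

# Illumina run: paired gzipped FASTQ files, console logging suppressed.
illumina_cmd="telomore --mode illumina --reference $reference --read1 $r1 --read2 $r2 --threads 8 --quiet"

# Dry run: print the commands for review. Paste one into your shell to execute it.
echo "$nanopore_cmd"
echo "$illumina_cmd"
```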
The process is as follows:
- Map Reads: Reads are mapped against all contigs in the reference using either minimap2 or Bowtie2.
- Extract Extending Reads: Reads that map to the ends of linear contigs and extend past them are extracted.
- Build Consensus: The terminal extending reads from each end are used to construct a consensus using either lamassemble or mafft + EMBOSS cons.
- Align and Attach Consensus: The consensus for each end is aligned to the reference and used to extend it.
- Trim Extended Replicon: In a final step, all terminally mapped reads are mapped to the new extended reference and used to trim away spurious sequence based on read support.
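The idea behind the final trimming step can be sketched with a toy coverage table. This is not Telomore's implementation: it ignores the Q-score filter and simply finds the outermost positions whose depth reaches the coverage threshold, which is where trimming from each end would stop. The depth values are made up, laid out in the three-column format (contig, position, depth) that `samtools depth` prints.

```shell
# Made-up per-position coverage for a 10 bp extended contig.
# printf reuses the format string for each (position, depth) pair.
depth_table="$(mktemp)"
printf 'contig_1\t%s\t%s\n' 1 0 2 1 3 2 4 5 5 7 6 9 7 8 8 6 9 3 10 1 > "$depth_table"

# Coverage required to stop trimming (5 is the documented ONT default).
threshold=5

# Record the first and last positions whose depth meets the threshold.
keep_range="$(awk -v min="$threshold" '
    $3 >= min { if (first == "") first = $2; last = $2 }
    END { if (first != "") print first "-" last }
' "$depth_table")"

echo "keep positions: $keep_range"
```

Positions outside the reported range lack sufficient read support in this toy example and would be trimmed away.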
At the end of a run Telomore produces the following outputs:
├── {fasta_basename}_{seqtype}_telomore
│ ├── {contig_name}_telomore_extended.fasta
│ ├── {contig_name}_telomore_ext_{seqtype}.log
│ ├── {contig_name}_telomore_QC.bam
│ ├── {contig_name}_telomore_QC.bam.bai
│ ├── {contig_name}_telomore_untrimmed.fasta
│ └── {fasta_basename}_telomore.fasta
└── telomore.log # log containing run information.
In the folder there are a number of files generated for each contig considered:
| File Name | Description |
|---|---|
| {contig_name}_telomore_extended.fasta | Original contig sequence + added terminal bases - trimmed bases |
| {contig_name}_telomore_ext_{seqtype}.log | Log containing information about bases added, bases trimmed off, and the final result. |
| {contig_name}_telomore_QC.bam | BAM file containing terminal reads mapped to {contig_name}_telomore_extended.fasta. Useful for manual inspection of the extension. |
| {contig_name}_telomore_QC.bam.bai | Index file for the corresponding BAM file. |
| {contig_name}_telomore_untrimmed.fasta | Original contig sequence + added terminal bases |
Additionally, {fasta_basename}_telomore.fasta collects all tagged linear contigs as they appear in {contig_name}_telomore_extended.fasta, together with all non-linear contigs, in the order they appear in the original file.
Inspecting the {contig_name}_telomore_QC.bam file in IGV (Integrative Genomics Viewer) can be informative when evaluating the extended contig.
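After a run, it can be useful to confirm that every per-contig file from the layout above was written. The helper below is a hypothetical convenience function, not part of Telomore; it only checks the file names listed in the output tree, and the contig name and seqtype value in the example call are assumptions.

```shell
# check_telomore_outputs DIR CONTIG SEQTYPE
# Prints each expected per-contig file that is missing from DIR and
# returns non-zero if at least one file is absent.
check_telomore_outputs() {
    dir="$1"; contig="$2"; seqtype="$3"
    missing=0
    for f in \
        "${contig}_telomore_extended.fasta" \
        "${contig}_telomore_ext_${seqtype}.log" \
        "${contig}_telomore_QC.bam" \
        "${contig}_telomore_QC.bam.bai" \
        "${contig}_telomore_untrimmed.fasta"
    do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example call with hypothetical names:
# check_telomore_outputs assembly_nanopore_telomore contig_1 nanopore
```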
- Bowtie2
- EMBOSS tools (specifically cons)
- Lamassemble
- LAST-DB
- Mafft
- Minimap2, version 2.25 or higher
- Samtools
These can be installed using the conda recipe in this repo:
conda env create -f environment.yml -y
The repo can then be downloaded using git clone, the conda environment activated, and the tool installed:
# Activate telomore conda env
conda activate telomore
# Clone telomore repo
git clone https://github.com/dalofa/telomore && cd telomore
# Install package
pip install -e '.[dev]'
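Once the environment is active, a quick check that the tools are reachable can save a failed run. The sketch below uses the POSIX `command -v` lookup; the executable names are assumed from the dependency list (for example, cons for EMBOSS and lastdb for LAST), and report_missing_tools is a hypothetical helper, not something Telomore provides.

```shell
# report_missing_tools NAME...
# Prints each executable name that cannot be found on PATH.
report_missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names assumed for the listed dependencies, plus telomore itself.
missing="$(report_missing_tools bowtie2 cons lamassemble lastdb mafft minimap2 samtools telomore)"

if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All tools found."
fi
```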