optDNTRA is a comprehensive command-line tool designed to optimize transcript assemblies generated by RNA-seq assemblers (e.g., Trinity). It streamlines data processing for both single-end and paired-end RNA-seq datasets and provides extensive quality control and assessment capabilities. The tool is fully integrated with Snakemake, ensuring reproducible and scalable workflows.
To install the required dependencies for optDNTRA, we recommend using Mamba, a fast and robust package manager that serves as a drop-in replacement for Conda. Mamba significantly accelerates the installation process and handles complex dependency resolution more efficiently.
To set up optDNTRA, first ensure you have Mamba installed. Then, clone the repository and create an environment as follows:
git clone https://github.com/zywu2002/optDNTRA.git
cd optDNTRA
mamba env create -f environment.yml
mamba activate optDNTRA
export PATH=/path/to/optDNTRA/directory:$PATH
Note: Before running the optDNTRA Snakemake pipeline, you need to configure the defaults.yml file. This file defines the necessary paths and parameters for the workflow.
Here's a summary of the available command-line arguments:
$ optDNTRA.py -h
usage: optDNTRA.py [-h] -c defaults.yml -t transcripts.fasta [-f reads.fq] [-1 reads_1.fq] [-2 reads_2.fq] [-s samples.tab] [-o optDNTRA_out] [-r reference.fasta] [-se] [-ss {F,R,RF,FR}] [--trim] [--qc] [--buscoAsmt] [--omarkAsmt] [--emapperAnno] [-v] [-p THREADS] [--snakemakeOptions SNAKEMAKEOPTIONS]
optDNTRA: Optimization of De Novo Transcriptome RNA-seq Assembly
For RNA-seq input data:
If Paired-end:
-1 <string>, --left <string> Left reads for paired-end RNA-seq data (.fq, .fastq, .fq.gz, .fastq.gz)
-2 <string>, --right <string> Right reads for paired-end RNA-seq data (.fq, .fastq, .fq.gz, .fastq.gz)
Or Single-end:
-f <string>, --fastq <string> Single-end RNA-seq reads (.fq, .fastq, .fq.gz, .fastq.gz)
Or
-s <string>, --sampleSheet <string> Sample metadata file for multi-sample analysis (tab-delimited)
Example:
cond_A cond_A_rep1 A_rep1_left.fq A_rep1_right.fq
cond_A cond_A_rep2 A_rep2_left.fq A_rep2_right.fq
cond_B cond_B_rep1 B_rep1_left.fq B_rep1_right.fq
cond_B cond_B_rep2 B_rep2_left.fq B_rep2_right.fq
For single-end, remove the 4th column in the text file.
options:
-h, --help show this help message and exit
-c defaults.yml, --config defaults.yml
Configuration file for workflow parameters (YAML format)
-t transcripts.fasta, --transcript transcripts.fasta
Input transcript assembly file (.fa, .fasta)
-f reads.fq, --fastq reads.fq
Single-end RNA-seq reads (.fq, .fastq, .fq.gz, .fastq.gz)
-1 reads_1.fq, --left reads_1.fq
Left reads for paired-end RNA-seq data (.fq, .fastq, .fq.gz, .fastq.gz)
-2 reads_2.fq, --right reads_2.fq
Right reads for paired-end RNA-seq data (.fq, .fastq, .fq.gz, .fastq.gz)
-s samples.tab, --sampleSheet samples.tab
Sample metadata file for multi-sample analysis (tab-delimited)
-o optDNTRA_out, --outDir optDNTRA_out
Output directory for results (default: optDNTRA_out)
-r reference.fasta, --reference reference.fasta
Reference transcriptome for comparison (.fa, .fasta)
-se, --singleEnd Set input data as single-end reads (default: paired-end)
-ss {F,R,RF,FR}, --ss-lib-type {F,R,RF,FR}
Library strand specificity: F/R (single-end), RF/FR (paired-end)
--trim Enable adapter trimming and quality filtering
--qc Enable quality control reports for input reads
--buscoAsmt Enable BUSCO completeness assessment
--omarkAsmt Enable OMArk protein domain assessment
--emapperAnno Enable functional annotation with eggNOG-mapper
-v, --verbose Enable verbose logging output
-p THREADS, --threads THREADS
Number of CPU threads to use (default: 1)
--snakemakeOptions SNAKEMAKEOPTIONS
Additional Snakemake options (e.g., '--dryrun --rerun-incomplete')
Thank you for using optDNTRA (Optimization of De Novo Transcriptome RNA-seq Assembly)!
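The sample sheet described in the help text is easy to get wrong by hand, for example by typing spaces where tabs are expected. The following is a minimal Python sketch for generating one. It is not part of optDNTRA: the function name `format_sample_sheet` is hypothetical, and it assumes the column order shown in the example above (condition, replicate, left reads, and an optional fourth column for right reads).

```python
def format_sample_sheet(rows):
    """Return tab-delimited sample sheet text (hypothetical helper, not part of optDNTRA).

    Each row is (condition, replicate, left_reads, right_reads) for paired-end
    data, or (condition, replicate, reads) for single-end data.
    """
    rows = [tuple(str(field) for field in row) for row in rows]
    widths = {len(row) for row in rows}
    # Every row must be consistently single-end (3 columns) or paired-end (4 columns).
    if widths not in ({3}, {4}):
        raise ValueError("every row needs 3 (single-end) or 4 (paired-end) columns")
    for row in rows:
        for field in row:
            # Empty fields or embedded tabs/newlines would shift the columns.
            if not field or "\t" in field or "\n" in field:
                raise ValueError(f"invalid field in row: {row!r}")
    return "".join("\t".join(row) + "\n" for row in rows)


if __name__ == "__main__":
    sheet = format_sample_sheet([
        ("cond_A", "cond_A_rep1", "A_rep1_left.fq", "A_rep1_right.fq"),
        ("cond_A", "cond_A_rep2", "A_rep2_left.fq", "A_rep2_right.fq"),
        ("cond_B", "cond_B_rep1", "B_rep1_left.fq", "B_rep1_right.fq"),
        ("cond_B", "cond_B_rep2", "B_rep2_left.fq", "B_rep2_right.fq"),
    ])
    print(sheet, end="")
```

Saving the returned text to a file such as samples.tab and passing it with `-s samples.tab` should then match the format in the help text, assuming the read files exist at the listed paths.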
For a practical usage scenario, consider the following command to run optDNTRA on paired-end reads with trimming and quality control enabled:
optDNTRA.py \
--config defaults.yml \
--transcript transcripts.fasta \
--left left_reads.fq \
--right right_reads.fq \
--outDir optDNTRA_out \
--trim \
--qc \
--threads 8
Upon completion, optDNTRA generates several output files in the specified directory. The key output files include:
- $outDir/results/02-optimization/03-transEvidence/transcript.flt.final.fa: Optimized transcript assemblies with low-quality or redundant sequences removed, providing a refined set of transcripts for downstream analysis.
- $outDir/results/02-optimization/03-transEvidence/transcript.flt.final.pep: Protein FASTA file derived from optimized transcript assemblies.
- $outDir/results/03-assessment/Busco/ (if enabled): Contains completeness scores of the transcriptome based on Benchmarking Universal Single-Copy Orthologs.
- $outDir/results/03-assessment/Omark/ (if enabled): Contains functional assessment results.
- $outDir/results/04-annotation/transAsm.emapper.annotations (if enabled): EggNOG-mapper functional annotations mapped to transcripts.
All output files are organized in a structured results folder, ensuring easy access to quality control metrics, optimized transcripts, and assessment data for further analysis or reporting.
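After a run, it can be useful to confirm that the key outputs listed above were actually produced. The sketch below is one way to do that in Python. It is not shipped with optDNTRA: the function names `expected_outputs` and `missing_outputs` are hypothetical, and the paths are taken only from the list above, so other files the pipeline may write are not checked.

```python
from pathlib import Path


def expected_outputs(out_dir, busco=False, omark=False, emapper=False):
    """List the key output paths documented above (hypothetical helper)."""
    results = Path(out_dir) / "results"
    evidence = results / "02-optimization" / "03-transEvidence"
    paths = [
        evidence / "transcript.flt.final.fa",   # optimized transcripts
        evidence / "transcript.flt.final.pep",  # derived protein sequences
    ]
    # Optional outputs exist only when the matching flag was passed to the run.
    if busco:
        paths.append(results / "03-assessment" / "Busco")
    if omark:
        paths.append(results / "03-assessment" / "Omark")
    if emapper:
        paths.append(results / "04-annotation" / "transAsm.emapper.annotations")
    return paths


def missing_outputs(out_dir, **enabled):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_outputs(out_dir, **enabled) if not path.exists()]


if __name__ == "__main__":
    # Assumes the default output directory and a run with --buscoAsmt.
    for path in missing_outputs("optDNTRA_out", busco=True):
        print(f"missing: {path}")
```

An empty result from `missing_outputs` only indicates that the documented paths exist; it does not validate their contents.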
If you use optDNTRA in your research, please cite:
- Xue Hao-Chen, Xu Zhou-Geng, Liu Yu-Jie & Wang Jia-Wei. (2025). A unified cell atlas of vascular plants reveals cell-type foundational genes and accelerates gene discovery. Cell. DOI: 10.1016/j.cell.2025.07.036