optDNTRA: Optimization of De Novo Transcriptome RNA-seq Assembly

Overview

optDNTRA is a command-line tool for optimizing transcript assemblies produced by de novo RNA-seq assemblers (e.g., Trinity). It handles both single-end and paired-end RNA-seq datasets and offers optional read quality control, assembly assessment (BUSCO, OMArk), and functional annotation (eggNOG-mapper). The workflow runs on Snakemake, which keeps analyses reproducible and scalable.

Installation

We recommend installing the dependencies for optDNTRA with Mamba, a drop-in replacement for Conda that resolves and installs complex dependency sets considerably faster.

To set up optDNTRA, first ensure you have Mamba installed. Then, clone the repository and create an environment as follows:

git clone https://github.com/zywu2002/optDNTRA.git
cd optDNTRA
mamba env create -f environment.yml
mamba activate optDNTRA
export PATH=/path/to/optDNTRA/directory:$PATH
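
After the export line, it can help to confirm that the script is actually discoverable on PATH. The following is a minimal sketch using only the Python standard library; the helper name `find_tool` is hypothetical and not part of optDNTRA.

```python
import shutil


def find_tool(name="optDNTRA.py"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_tool()
    if location is None:
        print("optDNTRA.py not found on PATH; re-check the export line")
    else:
        print(f"optDNTRA.py found at {location}")
```

Note that `shutil.which` only reports files that are executable, so a missing executable bit on the script would also show up as "not found".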

Usage

Note

Before running the optDNTRA Snakemake pipeline, you need to configure the defaults.yml file. This file defines the necessary paths and parameters for the workflow.

Here's a summary of the available command-line arguments:

$ optDNTRA.py -h


usage: optDNTRA.py [-h] -c defaults.yml -t transcripts.fasta [-f reads.fq] [-1 reads_1.fq] [-2 reads_2.fq] [-s samples.tab] [-o optDNTRA_out] [-r reference.fasta] [-se] [-ss {F,R,RF,FR}] [--trim] [--qc] [--buscoAsmt] [--omarkAsmt] [--emapperAnno] [-v] [-p THREADS] [--snakemakeOptions SNAKEMAKEOPTIONS]


optDNTRA: Optimization of De Novo Transcriptome RNA-seq Assembly


For RNA-seq input data:
    If Paired-end:
        -1 <string>, --left <string>         Left reads for paired-end RNA-seq data (.fq, .fastq, .fq.gz, .fastq.gz)
        -2 <string>, --right <string>        Right reads for paired-end RNA-seq data (.fq, .fastq, .fq.gz, .fastq.gz)
    Or Single-end:
        -f <string>, --fastq <string>        Single-end RNA-seq reads (.fq, .fastq, .fq.gz, .fastq.gz)
    Or
        -s <string>, --sampleSheet <string>  Sample metadata file for multi-sample analysis (tab-delimited)
        Example:
            cond_A    cond_A_rep1    A_rep1_left.fq    A_rep1_right.fq
            cond_A    cond_A_rep2    A_rep2_left.fq    A_rep2_right.fq
            cond_B    cond_B_rep1    B_rep1_left.fq    B_rep1_right.fq
            cond_B    cond_B_rep2    B_rep2_left.fq    B_rep2_right.fq
        For single-end, remove the 4th column in the text file.


options:
  -h, --help            show this help message and exit
  -c defaults.yml, --config defaults.yml
                        Configuration file for workflow parameters (YAML format)
  -t transcripts.fasta, --transcript transcripts.fasta
                        Input transcript assembly file (.fa, .fasta)
  -f reads.fq, --fastq reads.fq
                        Single-end RNA-seq reads (.fq, .fastq, .fq.gz, .fastq.gz)
  -1 reads_1.fq, --left reads_1.fq
                        Left reads for paired-end RNA-seq data (.fq, .fastq, .fq.gz, .fastq.gz)
  -2 reads_2.fq, --right reads_2.fq
                        Right reads for paired-end RNA-seq data (.fq, .fastq, .fq.gz, .fastq.gz)
  -s samples.tab, --sampleSheet samples.tab
                        Sample metadata file for multi-sample analysis (tab-delimited)
  -o optDNTRA_out, --outDir optDNTRA_out
                        Output directory for results (default: optDNTRA_out)
  -r reference.fasta, --reference reference.fasta
                        Reference transcriptome for comparison (.fa, .fasta)
  -se, --singleEnd      Set input data as single-end reads (default: paired-end)
  -ss {F,R,RF,FR}, --ss-lib-type {F,R,RF,FR}
                        Library strand specificity: F/R (single-end), RF/FR (paired-end)
  --trim                Enable adapter trimming and quality filtering
  --qc                  Enable quality control reports for input reads
  --buscoAsmt           Enable BUSCO completeness assessment
  --omarkAsmt           Enable OMArk protein domain assessment
  --emapperAnno         Enable functional annotation with eggNOG-mapper
  -v, --verbose         Enable verbose logging output
  -p THREADS, --threads THREADS
                        Number of CPU threads to use (default: 1)
  --snakemakeOptions SNAKEMAKEOPTIONS
                        Additional Snakemake options (e.g., '--dryrun --rerun-incomplete')

Thank you for using optDNTRA (Optimization of De Novo Transcriptome RNA-seq Assembly)!
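
The sample sheet layout shown in the help text can also be generated programmatically. The sketch below is illustrative only: it assumes the four tab-separated columns are condition, replicate name, left reads, and right reads (inferred from the example, not confirmed by the tool's documentation), and the function name `format_sample_sheet` is hypothetical.

```python
def format_sample_sheet(samples, single_end=False):
    """Render sample records as tab-delimited text, one sample per line.

    Each record is (condition, replicate, left_reads, right_reads).
    For single-end data only the first three columns are written,
    matching the help text's advice to remove the 4th column.
    """
    width = 3 if single_end else 4
    lines = []
    for record in samples:
        if len(record) < width:
            raise ValueError(
                f"expected {width} columns, got {len(record)}: {record}"
            )
        lines.append("\t".join(record[:width]))
    return "".join(line + "\n" for line in lines)


# Same file names as the example in the help text.
SAMPLES = [
    ("cond_A", "cond_A_rep1", "A_rep1_left.fq", "A_rep1_right.fq"),
    ("cond_A", "cond_A_rep2", "A_rep2_left.fq", "A_rep2_right.fq"),
    ("cond_B", "cond_B_rep1", "B_rep1_left.fq", "B_rep1_right.fq"),
    ("cond_B", "cond_B_rep2", "B_rep2_left.fq", "B_rep2_right.fq"),
]

if __name__ == "__main__":
    # Print the sheet; redirect the output to a file such as samples.tab.
    print(format_sample_sheet(SAMPLES), end="")
```

The resulting file can then be passed to optDNTRA.py with -s / --sampleSheet.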

Example Command

The following command runs optDNTRA on paired-end reads with adapter trimming and quality control enabled, using 8 threads:

optDNTRA.py \
 --config defaults.yml \
 --transcript transcripts.fasta \
 --left left_reads.fq \
 --right right_reads.fq \
 --outDir optDNTRA_out \
 --trim \
 --qc \
 --threads 8
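
When optDNTRA is launched from a script, the same command can be assembled as an argument list. The sketch below uses only options that appear in the help text; the function name `build_command` is hypothetical, and passing --singleEnd together with --fastq is an assumption about how single-end mode is selected.

```python
import shlex

# Strand-specificity values as listed in the help text.
SINGLE_END_LIB_TYPES = {"F", "R"}
PAIRED_END_LIB_TYPES = {"RF", "FR"}


def build_command(config, transcript, left=None, right=None, fastq=None,
                  out_dir="optDNTRA_out", threads=1, lib_type=None,
                  trim=False, qc=False):
    """Assemble an optDNTRA.py argument list from the documented options."""
    cmd = ["optDNTRA.py", "--config", config, "--transcript", transcript]
    if fastq is not None:
        if left or right:
            raise ValueError("give single-end or paired-end reads, not both")
        # Assumption: single-end runs need --singleEnd alongside --fastq.
        cmd += ["--fastq", fastq, "--singleEnd"]
        allowed = SINGLE_END_LIB_TYPES
    elif left and right:
        cmd += ["--left", left, "--right", right]
        allowed = PAIRED_END_LIB_TYPES
    else:
        raise ValueError("paired-end input needs both left and right reads")
    if lib_type is not None:
        if lib_type not in allowed:
            raise ValueError(f"lib type {lib_type!r} does not match the read layout")
        cmd += ["--ss-lib-type", lib_type]
    cmd += ["--outDir", out_dir, "--threads", str(threads)]
    if trim:
        cmd.append("--trim")
    if qc:
        cmd.append("--qc")
    return cmd


if __name__ == "__main__":
    cmd = build_command("defaults.yml", "transcripts.fasta",
                        left="left_reads.fq", right="right_reads.fq",
                        threads=8, trim=True, qc=True)
    # Print the shell-quoted command instead of running it.
    print(shlex.join(cmd))
```

To actually run the pipeline, the list can be handed to `subprocess.run(cmd, check=True)`; printing it first makes it easy to review what will be executed.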

Output

On completion, optDNTRA writes its results to the output directory given by --outDir (shown as $outDir below). The key outputs are:

$outDir/results/02-optimization/03-transEvidence/transcript.flt.final.fa - Optimized transcript assemblies with low-quality or redundant sequences removed, providing a refined set of transcripts for downstream analysis.
$outDir/results/02-optimization/03-transEvidence/transcript.flt.final.pep - Protein FASTA file derived from optimized transcript assemblies.
$outDir/results/03-assessment/Busco/ (if enabled) - Contains completeness scores of the transcriptome based on Benchmarking Universal Single-Copy Orthologs.
$outDir/results/03-assessment/Omark/ (if enabled) - Contains OMArk proteome quality assessment results (completeness and consistency of the predicted proteins).
$outDir/results/04-annotation/transAsm.emapper.annotations (if enabled) - eggNOG-mapper functional annotations mapped to transcripts.

All outputs are organized under $outDir/results in numbered subdirectories, one per stage (for example 02-optimization, 03-assessment, and 04-annotation), so optimized transcripts, assessment results, and annotations are easy to locate for further analysis or reporting.
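
As a quick post-run check, the key output paths listed above can be tested for existence and the optimized transcripts counted. This is a minimal sketch, not part of optDNTRA: the helper names `key_outputs` and `count_fasta_records` are hypothetical, and the output directory is assumed to be the default optDNTRA_out.

```python
from pathlib import Path


def key_outputs(out_dir):
    """Map short labels to the key output paths listed in this section."""
    results = Path(out_dir) / "results"
    evidence = results / "02-optimization" / "03-transEvidence"
    return {
        "transcripts": evidence / "transcript.flt.final.fa",
        "proteins": evidence / "transcript.flt.final.pep",
        "busco": results / "03-assessment" / "Busco",
        "omark": results / "03-assessment" / "Omark",
        "annotations": results / "04-annotation" / "transAsm.emapper.annotations",
    }


def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


if __name__ == "__main__":
    outputs = key_outputs("optDNTRA_out")
    for label, path in outputs.items():
        # Optional steps (BUSCO, OMArk, eggNOG-mapper) are missing if not enabled.
        status = "present" if path.exists() else "missing"
        print(f"{label:12s} {status:8s} {path}")
    fasta = outputs["transcripts"]
    if fasta.is_file():
        with fasta.open() as handle:
            print("optimized transcripts:", count_fasta_records(handle))
```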

Citation

If you use optDNTRA in your research, please cite:

Xue Hao-Chen, Xu Zhou-Geng, Liu Yu-Jie & Wang Jia-Wei. (2025). A unified cell atlas of vascular plants reveals cell-type foundational genes and accelerates gene discovery. Cell. DOI: 10.1016/j.cell.2025.07.036
