inDAGO

A Shiny app for dual and bulk RNA‑sequencing analysis

👀 Overview

inDAGO supports both dual and bulk RNA-seq workflows within a single, user-friendly Shiny interface.

For dual RNA-seq, users can choose between two alignment strategies:

  • Sequential mapping — reads are mapped separately to each reference genome
  • Combined mapping — reads are aligned once to a merged reference genome

Figure: Overview of the inDAGO dual RNA-seq workflow. The workflow supports both sequential and combined mapping approaches and consists of seven steps. Steps 1, 2, 5, 6, and 7 are common to both approaches, whereas Steps 3 and 4 differ.

Step 1: Quality control of raw mixed reads (organism A + organism B, FASTQ format) using the Biostrings and ShortRead packages; visualizations are produced with ggplot2 and custom R scripts.
Step 2: Filtering of raw mixed reads using Biostrings and ShortRead.
Step 3: Genome indexing of reference sequences (FASTA) performed with Rsubread. In the sequential approach, each organism is indexed separately; in the combined approach, a concatenated genome is indexed once.
Step 4: Alignment of filtered reads, manipulation of SAM/BAM files, and in-silico discrimination of mixed transcripts using Rsubread, Rsamtools, and base R functions. The sequential approach performs two mappings (one per organism), while the combined approach performs a single mapping followed by computational read separation.
Step 5: Assignment and summarization of mapped reads for each organism using Rsubread.
Step 6: Exploration of summarized counts through statistical and graphical analysis using ggplot2, pheatmap, Hmisc, and RNAseQC.
Step 7: Identification of differentially expressed genes (DEGs) with edgeR and HTSFilter.

In the figure, the two organisms' genomes are shown in different colors (yellow and blue). In steps where the organisms are processed separately, their genomes appear as separate blocks; in steps where they are processed together, the blocks are connected.
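
The combined strategy in Steps 3 and 4 (one merged reference, one alignment, then computational read separation) can be illustrated with a small base R sketch. This README does not describe how inDAGO labels or separates reads internally, so the helper `tag_fasta()`, the `orgA`/`orgB` prefixes, and the toy data below are assumptions made purely for illustration.

```r
# Hypothetical helper: prefix every FASTA header with an organism tag so that
# sequences remain distinguishable after the two references are concatenated.
tag_fasta <- function(lines, tag) {
  is_header <- startsWith(lines, ">")
  lines[is_header] <- paste0(">", tag, "_", substring(lines[is_header], 2))
  lines
}

# Toy references standing in for the two organisms' genome FASTA files.
org_a <- c(">chr1", "ACGTACGT", ">chr2", "GGCCGGCC")
org_b <- c(">chr1", "TTTTAAAA")

# Combined approach, Step 3: one merged reference that would be indexed once.
combined <- c(tag_fasta(org_a, "orgA"), tag_fasta(org_b, "orgB"))

# Combined approach, Step 4: after a single alignment, each read is assigned
# to an organism based on the name of the reference sequence it mapped to.
# The alignment table here is made up; real data would come from a BAM file.
alignments <- data.frame(
  read = c("r1", "r2", "r3", "r4"),
  rname = c("orgA_chr1", "orgB_chr1", "orgA_chr2", "orgB_chr1"),
  stringsAsFactors = FALSE
)
alignments$organism <- sub("_.*$", "", alignments$rname)
reads_by_organism <- split(alignments$read, alignments$organism)
```

Tagging the headers before concatenation is what makes the later separation a simple lookup; in the sequential strategy no tagging is needed because each organism has its own index and its own alignment run.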

Figure: Overview of the inDAGO bulk RNA-seq workflow. The bulk RNA-seq workflow follows seven key steps, mirroring the dual workflow but focused on a single organism.

Step 1: Quality control of raw reads.
Step 2: Filtering of low-quality sequences.
Step 3: Genome indexing of the reference genome (FASTA).
Step 4: Alignment of reads to the reference.
Step 5: Summarization of mapped reads by biological unit (e.g., gene).
Step 6: Statistical exploration and visualization of read counts.
Step 7: Identification of differentially expressed genes (DEGs).

The bulk RNA-seq workflow uses the same core set of R packages as the dual pipeline, ensuring consistency and reproducibility across analyses.
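
For readers who want to see what Steps 3 to 5 look like outside the app, the following is a minimal sketch using Rsubread directly. It is not inDAGO's internal code: the wrapper name `run_bulk_steps()`, the file paths, and the argument values are placeholders, and inDAGO's own defaults may differ.

```r
# Sketch of Steps 3-5 for a single sample. All file paths are placeholders,
# and the argument values are illustrative rather than inDAGO's defaults.
run_bulk_steps <- function(genome_fasta, reads_fastq, annotation_gtf,
                           index_base = "ref_index",
                           bam_out = "sample.bam",
                           threads = 2) {
  # Step 3: build the Rsubread index from the reference FASTA.
  Rsubread::buildindex(basename = index_base, reference = genome_fasta)

  # Step 4: align the filtered reads and write a BAM file.
  Rsubread::align(index = index_base, readfile1 = reads_fastq,
                  output_file = bam_out, nthreads = threads)

  # Step 5: summarize mapped reads per gene using a GTF annotation.
  fc <- Rsubread::featureCounts(files = bam_out,
                                annot.ext = annotation_gtf,
                                isGTFAnnotationFile = TRUE,
                                nthreads = threads)
  fc$counts
}
```

Calling `run_bulk_steps("genome.fa", "sample.fastq.gz", "genes.gtf")` with real files would return a gene-by-sample count matrix, which is the input that Steps 6 and 7 operate on.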

The interface walks you step‑by‑step through the entire analysis, from raw reads to publication‑ready plots, and lets you:

  • Download intermediate results at each step
  • Export high‑quality figures directly for your manuscript

Thanks to optimized, parallelized code, inDAGO runs efficiently on a standard laptop (16 GB RAM), so you don’t need access to a high‑performance cluster.

🔧 Key Modules

  1. Quality Control
    Generates quality control metrics and graphical plots.

Figure: Quality Control Module Outputs. This figure presents key quality control plots generated by inDAGO: (A) average base quality line plot; (B) sequence length distribution; (C) GC content distribution across reads; (D) base quality boxplot showing average and variation per base position; (E) base composition line plot; and (F) base composition area chart across the dataset. Together, these visualizations provide a comprehensive assessment of the sequencing quality and the overall characteristics of the raw read data.

  2. Sequence Pre-processing
    Read trimming, low-quality filtering, and adapter removal
  3. Genome indexing
    Index genome or genomes according to the selected approach (bulk or dual RNA-seq)
  4. Reference-based Alignment
    Align reads according to the selected approach (bulk or dual RNA-seq)
  5. Read Count Summarization
    Generate gene or transcript level count matrices
  6. Exploratory Data Analysis
    PCA, MDS, heatmaps, and more.

Figure: Exploratory Data Analysis Module Outputs. This figure presents key exploratory data analysis plots generated by inDAGO: (A) Principal Component Analysis (PCA) plot; (B) Multi-Dimensional Scaling (MDS) plot; (C) gene expression boxplot; (D) library size bar plot; (E) gene expression heatmap; (F) correlation heatmap; and (G) saturation plot. Together, these visualizations provide a comprehensive overview of the exploratory data analysis results and the underlying characteristics of the count data.

  7. Differentially Expressed Genes (DEGs) analysis
    Identify differentially expressed genes/transcripts across comparisons

Figure: Differentially Expressed Genes (DEGs) Module Outputs. This figure presents key DEG analysis plots generated by inDAGO: (A) volcano plot; and (B) UpSet plot. Together, these visualizations provide a comprehensive overview of the differential expression analysis results and highlight key transcriptional changes between conditions.
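
To give a sense of the quantities behind the exploratory plots (library sizes, PCA, sample correlation), here is a small base R sketch on simulated counts. It only illustrates the general technique; the toy matrix, the log2 CPM transformation with an offset of 1, and the use of `prcomp()` are assumptions and not necessarily what inDAGO computes internally.

```r
set.seed(1)  # reproducible toy data

# Toy count matrix: 200 genes x 6 samples (simulated, not real data).
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("S", 1:6)))

# Library sizes (the quantity shown in a library size bar plot).
lib_size <- colSums(counts)

# log2 counts per million, with an offset of 1 to avoid log(0).
log_cpm <- log2(t(t(counts) / lib_size) * 1e6 + 1)

# PCA on samples: transpose so that rows are samples and columns are genes.
pca <- prcomp(t(log_cpm), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample-to-sample correlation (the basis of a correlation heatmap).
sample_cor <- cor(log_cpm)
```

Plotting `pca$x[, 1:2]` gives a basic PCA scatter of the samples, and `var_explained` supplies the percentages usually shown on the axis labels.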



💻 INSTALLATION GUIDE: R AND RSTUDIO

1. Install R

Official site: CRAN R Project

| OS | Command or Link |
| --- | --- |
| Windows | Download R for Windows and run the .exe installer. |
| macOS | Download R for macOS and run the .pkg installer. |

2. Install RStudio (Posit Desktop)

Official site: Posit RStudio Desktop

| OS | Command or Link |
| --- | --- |
| Windows | Download the .exe installer and run it. |
| macOS | Download the .dmg installer and drag RStudio into Applications. |

3. Verify installation

R --version
Rscript -e 'cat(R.version.string, "\n")'

💻 INSTALLATION GUIDE: INDAGO

How to install inDAGO from CRAN or GitHub

Install the Bioconductor dependencies

# Install Bioconductor dependencies if you don't have them yet
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
bioc_pac <- c(
  "XVector",
  "ShortRead",
  "S4Vectors",
  "rtracklayer",
  "Rsubread",
  "Rsamtools",
  "limma",
  "HTSFilter",
  "edgeR",
  "Biostrings",
  "BiocGenerics"
) 
for (pac in bioc_pac) {
  if (!requireNamespace(pac, quietly = TRUE))
    BiocManager::install(pac)
}

Install inDAGO from GitHub

# Install devtools if you don't have it yet
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")

# Install inDAGO from GitHub
devtools::install_github("inDAGOverse/inDAGO")

Install inDAGO from CRAN

# Install inDAGO from CRAN
install.packages("inDAGO")
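
After installation, it can be useful to confirm that the main dependencies are actually loadable before launching the app. The snippet below is an optional sketch; `check_installed()` is a hypothetical helper written for this example and is not part of inDAGO, and the package list is a subset of the dependencies named in this README.

```r
# Report which of the given packages can be loaded from the current library.
check_installed <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

required <- c("ShortRead", "Rsubread", "Rsamtools", "edgeR", "HTSFilter",
              "Biostrings", "inDAGO")
status <- check_installed(required)

# Names of packages that still need to be installed (empty if none).
missing_pkgs <- names(status)[!status]
```

If `missing_pkgs` is not empty, rerunning the corresponding installation step above for those packages should resolve it.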

🚀 HOW TO LOAD AND LAUNCH THE APP

# Load and launch the app
library(inDAGO)
inDAGO::inDAGO()

⚙️ TIPS FOR A SEAMLESS EXECUTION

To keep long-running steps, such as reference-based alignment, from being interrupted:

💤 Disable sleep mode to keep your system active.

💡 Reduce screen brightness to save power.

These simple precautions can help avoid incomplete runs and unnecessary power consumption.

👥 AUTHORS & ACKNOWLEDGEMENTS

If you find this code useful in your research, please cite:

Aufiero G, Fruggiero C and D’Agostino N (2025) inDAGO: a user-friendly interface for seamless dual and bulk RNA-Seq analysis. Front. Bioinform. 5:1696823. doi: 10.3389/fbinf.2025.1696823
