A repo for tools and workflows used by RAWG

This repo contains the RNA-Seq analysis tools we have packaged as CWL scripts. Below is a list of all the tools we currently have, with the input and output names for each CWL script.

Mini documentation for each tool

Every effort has been made to standardise input and output names, but there is still plenty of room for improvement. Suggestions and comments are very welcome!

Specify the inputs and outputs for each cwl script

ballgown

inputs:

input_script: R ballgown script.
tablemaker_output: directory of the tablemaker output
metadata: metadata csv file.
condition: string giving the condition of interest in the metadata columns
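As an illustration, the ballgown inputs listed above could be supplied in a CWL job file along these lines. This is only a sketch: the file name, every path and the condition value are hypothetical, and it assumes that input_script and metadata are File inputs, tablemaker_output is a Directory input and condition is a plain string.

```yaml
# ballgown-job.yml (hypothetical file name; all paths are placeholders)
input_script:
  class: File
  path: scripts/ballgown.R        # assumed location of the R script
tablemaker_output:
  class: Directory
  path: results/tablemaker        # directory produced by the tablemaker step
metadata:
  class: File
  path: metadata.csv
condition: treatment              # assumed name of a column in metadata.csv
```

A job file like this would be passed to a CWL runner together with the ballgown tool description.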

outputs:

DGE_res.csv
DTE_res.csv

cuffdiff

inputs:

output: name for output directory
threads: number of cores to use
label: the factors of the condition of interest
FDR: false discovery rate threshold below which a result is labelled significant; could default to 0.05.
merged_gtf: merged gtf annotation file in gtf format
condition1_files: cxb files of all samples for condition 1. cuffquant output
condition2_files: cxb files of all samples for condition 2. cuffquant output
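The two condition inputs take several files each, so a job file for cuffdiff might look like the sketch below. The types are assumptions and are not taken from the CWL script: condition1_files and condition2_files are treated as arrays of File, label as a comma-separated string of condition names, and FDR as a float. All names and paths are placeholders.

```yaml
# cuffdiff-job.yml (hypothetical file name; all paths are placeholders)
output: cuffdiff_out
threads: 4
label: "control,treated"          # assumed format: one label per condition
FDR: 0.05
merged_gtf:
  class: File
  path: cuffmerge_out/merged.gtf
condition1_files:                 # cuffquant output for condition 1
  - class: File
    path: cuffquant/control_1/abundances.cxb
  - class: File
    path: cuffquant/control_2/abundances.cxb
condition2_files:                 # cuffquant output for condition 2
  - class: File
    path: cuffquant/treated_1/abundances.cxb
  - class: File
    path: cuffquant/treated_2/abundances.cxb
```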

outputs:

output: named after inputs.output. Directory containing all cuffdiff output files

can add gene_exp.diff output only.

cufflinks

inputs:

gtf: annotation file in gtf format
output: name for output directory
threads: number of threads to use.
bam: bam file

outputs:

output: named after output_dir. Directory with all cufflinks output files
transcripts.gtf

cuffmerge

inputs:

output: string stating the output directory name
gtf: annotation file in gtf format
threads: number of threads to use.
fasta: fasta file used in indexing
cufflinks_output: transcripts.gtf files generated by cufflinks

outputs:

output: output directory named after inputs.output
merged.gtf

cuffnorm

inputs:

output: name for output directory
threads: number of threads to use.
merged_gtf: merged gtf annotation file in gtf format
condition1_files: cxb files of all samples for condition 1. cuffquant output
condition2_files: cxb files of all samples for condition 2. cuffquant output

outputs:

output: named after inputs.output. File containing the normalised gene count matrix

cuffquant

inputs:

output: name for output directory
threads: number of threads to use.
merged_gtf: merged gtf annotation file in gtf format
bam: bam file

outputs:

output: named after inputs.output. Directory containing all quant files
abundances.cxb

DESeq2

inputs:

input_script: R DESeq2 script.
count_matrix: gene count matrix file
metadata: metadata csv file.

outputs:

DGE_results.csv

DEXSeq

inputs:

input_script: R DEXSeq script
count_matrix: exon count matrix file
gff: directory containing gff file from htseq prepare
metadata: metadata csv file.
threads: number of threads to use

outputs:

DEE_results.csv

edger

inputs:

input_script: R edger script
condition: string giving the condition of interest in the metadata columns
count_matrix: gene counts matrix
metadata: metadata csv file.

outputs:

DGE_res.csv

featurecounts

inputs:

input_script: R featurecounts script
bam_files: all bam files
gtf: annotation file in gtf format
threads: number of threads to use.
metadata: metadata file, used to look up the libType of each sample.

outputs:

gene_count_matrix.csv

fgsea

inputs:

input_script: R fgsea script
de_results: DGE res file
gene_set: file containing gene set information in long form.

outputs:

gsea_res.csv

hisat2_align

inputs:

threads: number of threads
index_directory: path to hisat2 index
first_pair: first fastq file
second_pair: second fastq file
output: output name
XSTag: Tag to use ("--dta" or "--dta-cufflinks")
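For a paired-end sample, the hisat2_align inputs above might be filled in as follows. This sketch assumes index_directory is a Directory input, the two fastq inputs are File inputs, and XSTag is passed as a string. The sample name and all paths are invented for the example.

```yaml
# hisat2-align-job.yml (hypothetical file name; all paths are placeholders)
threads: 8
index_directory:
  class: Directory
  path: hisat2_index              # output of the hisat2_build step
first_pair:
  class: File
  path: fastq/sample1_R1.fastq.gz
second_pair:
  class: File
  path: fastq/sample1_R2.fastq.gz
output: sample1                   # assumed to be used as the output basename
XSTag: "--dta-cufflinks"          # or "--dta"
```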

outputs:

hisat2_align_out: all output files
sam_output: sam files

hisat2_build

inputs:

fasta: fasta file
threads: number of threads to use.
output: name to use for output

outputs:

output: index files with a basename given by inputs.output
log

htseq_count

inputs:

input_script: python htseq_count script
pairedend: logical
stranded: logical
input_format: bam or sam
sorted_by: pos
gff: gff file from htseq_prepare
bam: bam or sam file
outname: name for outfile

outputs:

outname: named after inputs.outname

htseq_prepare

inputs:

input_script: python input script
gtf: gtf annotation file
gff_name: name for gff output

outputs:

gff_name: gff file named after inputs.gff_name

hypergeo

inputs:

input_script: R HyperGeo_Script script
de_res: DGE res file
gene_set: file containing gene set information in long form.

outputs:

hypergeo_res.csv

miso_index

inputs:

gtf: gtf file
output: output name for output directory

outputs:

index_dir: output directory named after inputs.index_dir

miso_run

inputs:

threads: number of threads to use
lib_type: paired-end or not (note: either rename this to pairedend, or rename all the others to lib_type)
index_directory: Directory from miso_index output
bam: bam file
read_len: read length
gtf: gtf file
min_exon_size: minimum exon size to use
output: output name

outputs:

out_dir: directory named after inputs.out_dir

prepDE

inputs:

input_script: python prepDE script
stringtie_out: gtf files from stringtie

outputs:

gene_count_matrix.csv
transcript_count_matrix.csv

salmon_count

inputs:

input_script: R count script
gtf: annotation file in gtf format
metadata: metadata.csv
quant_results: directory with subdirectories of outputs from salmon_quant

outputs:

gene_abundance_matrix.csv
gene_count_matrix.csv (the one used for DGE analysis, e.g. DESeq2)
gene_length_matrix.csv

salmon_index

inputs:

fasta: fasta files
index_type: type of index to build, either "fmd" or "quasi"
threads: number of threads to use
output: directory output name

outputs:

index_name: output directory with name inputs.index_name

salmon_quant

inputs:

index_directory: directory of salmon index
threads: number of threads to use
output: output name
first_end_fastq: first-pair fastq file (paired-end only)
second_end_fastq: second-pair fastq file (paired-end only)
single_fastq: single fastq file (single-end only)
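Since the fastq inputs depend on the library layout, a paired-end job file would presumably set the two paired inputs and leave single_fastq out. The sketch below assumes the unused fastq input is optional in the CWL script, which is not confirmed here. All paths are placeholders.

```yaml
# salmon-quant-job.yml, paired-end case (hypothetical file name and paths)
index_directory:
  class: Directory
  path: salmon_index              # output of the salmon_index step
threads: 8
output: sample1_quant
first_end_fastq:
  class: File
  path: fastq/sample1_R1.fastq.gz
second_end_fastq:
  class: File
  path: fastq/sample1_R2.fastq.gz
# single_fastq is omitted; a single-end run would set it instead of the two inputs above
```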

outputs:

out_dir: Directory with name of inputs.out_dir

samtools

inputs:

action: samtools command (default is sort)
sortby: how to sort the bam file
threads: number of additional threads to use (total threads minus 1)
samfile: sam file
outfilename: output file name

outputs:

outfilename: bam file with name inputs.outfilename

STAR_index

inputs:

threads: number of threads
Mode: how to run STAR (use genomeGenerate)
output: directory name to use
fasta: fasta files
gtf: gtf file

outputs:

index_out: index files

STAR_readmap

inputs:

threads: number of threads to use
genomeDir: directory name to use
readFilesIn: fastq files
outFileNamePrefix: output name for directory

outputs:

star_read_out: every output file
sam_output: sam file

stringtie

inputs:

bam: bam file
threads: number of threads to use.
gtf: gtf file
output: output name for file

outputs:

outfilename: output file with name inputs.outfilename

tablemaker

inputs:

threads: number of threads to use.
merged_gtf: merged gtf from cuffmerge
bam: bam files to use
output: output name for directory.

outputs:

output: Directory with name inputs.output
