This repo contains the RNASeq analysis tools we packaged as CWL scripts. Below is a list of all the tools we currently have, with the input and output names of each CWL script.
We have made our best effort to standardise input and output names, but there is still plenty of room for improvement. Suggestions and comments are very welcome!
-
inputs:
input_script: R ballgown script.
tablemaker_output: directory of the tablemaker output
metadata: metadata csv file.
condition: string giving the condition of interest in the metadata columns
outputs:
DGE_res.csv
DTE_res.csv
-
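The `condition` input of the ballgown tool above (and of the edgeR tool) must name a column in the metadata CSV. The sketch below checks this before a run and lists the levels of that column. It assumes the metadata CSV has a header row. `condition_levels` is a hypothetical helper and is not part of this repo.

```python
import csv
from typing import Iterable, List


def condition_levels(lines: Iterable[str], condition: str) -> List[str]:
    """Return the distinct values of the `condition` column, in first-seen order.

    Assumes (hypothetically) a metadata CSV with a header row, one row per sample.
    """
    reader = csv.DictReader(lines)
    # fieldnames is None for an empty file; otherwise it holds the header row.
    if reader.fieldnames is None or condition not in reader.fieldnames:
        raise KeyError(f"condition column {condition!r} not found in metadata header")
    levels: List[str] = []
    for row in reader:
        value = row[condition]
        if value not in levels:
            levels.append(value)
    return levels
```

Example: `condition_levels(open("metadata.csv", newline=""), "treatment")`. Here `treatment` stands in for whatever column your metadata uses.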
inputs:
output: name for output directory
threads: number of cores to use
label: the factors of the condition of interest
FDR: the false discovery rate threshold at which results are labelled significant; could default to 0.05.
merged_gtf: merged gtf annotation file in gtf format
condition1_files: cxb files of all samples for condition 1. cuffquant output
condition2_files: cxb files of all samples for condition 2. cuffquant output
outputs:
output: named after inputs.output. Directory containing all cuffdiff output files (gene_exp.diff could be added as a separate output).
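The cuffdiff tool above returns the whole output directory. If only `gene_exp.diff` is needed, it can be re-thresholded at a chosen FDR in a post-processing step. The sketch below assumes the standard tab-separated `gene_exp.diff` header, which includes the `status` and `q_value` columns. It keeps only tested rows (`status == "OK"`) whose q-value is at or below the cut-off. `significant_genes` is a hypothetical helper, not part of the CWL tool.

```python
import csv
from typing import Dict, Iterable, List


def significant_genes(lines: Iterable[str], fdr: float = 0.05) -> List[Dict[str, str]]:
    """Return gene_exp.diff rows that were tested and have q_value <= fdr."""
    reader = csv.DictReader(lines, delimiter="\t")
    hits = []
    for row in reader:
        # Rows with any status other than "OK" (e.g. NOTEST, LOWDATA) were not tested.
        if row["status"] != "OK":
            continue
        if float(row["q_value"]) <= fdr:
            hits.append(row)
    return hits
```

Example: `significant_genes(open("cuffdiff_out/gene_exp.diff", newline=""), fdr=0.05)`. The directory name here is a placeholder for whatever `inputs.output` was set to.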
-
inputs:
gtf: annotation file in gtf format
output: name for output directory
threads: number of threads to use.
bam: bam file
outputs:
output: named after output_dir. Directory with all cufflinks output files
transcripts.gtf
-
inputs:
output: string stating the output directory name
gtf: annotation file in gtf format
threads: number of threads to use.
fasta: fasta file used in indexing
cufflinks_output: transcripts.gtf files generated by cufflinks
outputs:
output: output directory named after inputs.output
merged.gtf
-
inputs:
output: name for output directory
threads: number of threads to use.
merged_gtf: merged gtf annotation file in gtf format
condition1_files: cxb files of all samples for condition 1. cuffquant output
condition2_files: cxb files of all samples for condition 2. cuffquant output
outputs:
output: named after inputs.output. Contains the normalised gene count matrix
-
inputs:
output: name for output directory
threads: number of threads to use.
merged_gtf: merged gtf annotation file in gtf format
bam: bam file
outputs:
output: named after inputs.output. Directory containing all quant files
abundances.cxb
-
inputs:
input_script: R DESeq2 script.
count_matrix: gene count matrix file
metadata: metadata csv file.
outputs:
DGE_results.csv
-
inputs:
input_script: R DEXSeq script
count_matrix: exon count matrix file
gff: directory containing gff file from htseq prepare
metadata: metadata csv file.
threads: number of threads to use
outputs:
DEE_results.csv
-
inputs:
input_script: R edgeR script
condition: string giving the condition of interest in the metadata columns
count_matrix: gene counts matrix
metadata: metadata csv file.
outputs:
DGE_res.csv
-
inputs:
input_script: R featurecounts script
bam_files: all bam files
gtf: annotation in gtf format
threads: number of threads to use.
metadata: used to determine the libType of each sample.
outputs:
gene_count_matrix.csv
-
inputs:
input_script: R fgsea script
de_results: DGE res file
gene_set: file containing gene set information in long form.
outputs:
gsea_res.csv
-
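The fgsea and HyperGeo tools above both take `gene_set` in long form, with one row per (gene set, gene) pair. The R scripts presumably group these rows into one gene list per set, which is the shape fgsea's `pathways` argument expects. The sketch below shows that grouping in Python so a gene set file can be checked before a run. The header names `gene_set` and `gene` are assumptions; adjust them to match your file. `read_long_gene_sets` is a hypothetical helper.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def read_long_gene_sets(
    lines: Iterable[str], set_col: str = "gene_set", gene_col: str = "gene"
) -> Dict[str, Set[str]]:
    """Group a long-form CSV (one row per set/gene pair) into {set name: genes}.

    The column names are assumed defaults, not names fixed by the CWL tools.
    Duplicate rows collapse because each set is stored as a Python set.
    """
    sets: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(lines):
        sets[row[set_col]].add(row[gene_col])
    return dict(sets)
```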
inputs:
threads: number of threads
index_directory: path to hisat2 index
first_pair: first fastq file
second_pair: second fastq file
output: output name
XSTag: Tag to use ("--dta" or "--dta-cufflinks")
outputs:
hisat2_align_out: all output files
sam_output: sam files
-
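The inputs of the hisat2 alignment tool above can be supplied to a CWL runner as a job file. In CWL job files, file and directory inputs are objects with a `class` ("File" or "Directory") and a `path`. The sketch below builds such a job and rejects any `XSTag` value other than the two listed. Two things here are assumptions: that `index_directory` is typed as a CWL Directory (the README only says "path to hisat2 index"), and the file name `hisat2_align.cwl` in the usage line.

```python
import json

# The two values this README lists for the XSTag input.
ALLOWED_XS_TAGS = {"--dta", "--dta-cufflinks"}


def cwl_file(path: str) -> dict:
    """CWL job-file representation of a File input."""
    return {"class": "File", "path": path}


def cwl_dir(path: str) -> dict:
    """CWL job-file representation of a Directory input."""
    return {"class": "Directory", "path": path}


def hisat2_align_job(index_dir, first_fastq, second_fastq, output, threads=4, xs_tag="--dta"):
    """Build a job dict using the input names documented for the hisat2 alignment tool."""
    if xs_tag not in ALLOWED_XS_TAGS:
        raise ValueError(f"XSTag must be one of {sorted(ALLOWED_XS_TAGS)}, got {xs_tag!r}")
    return {
        "threads": threads,
        "index_directory": cwl_dir(index_dir),  # assumed to be a Directory input
        "first_pair": cwl_file(first_fastq),
        "second_pair": cwl_file(second_fastq),
        "output": output,
        "XSTag": xs_tag,
    }
```

Write the job with `json.dump(job, fh, indent=2)`, then run something like `cwltool hisat2_align.cwl job.json`.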
inputs:
fasta: fasta file
threads: number of threads to use.
output: name to use for output
outputs:
output: samfile with a basename given from inputs.output
log
-
inputs:
input_script: python htseq_count script
pairedend: logical
stranded: logical
input_format: bam or sam
sorted_by: pos
gff: gff file from htseq_annotation
bam: bam or sam file
outname: name for outfile
outputs:
outname: named after inputs.outname
-
inputs:
input_script: python input script
gtf: gtf annotation file
gff_name: name for gff output
outputs:
gff_name: gff file named after inputs.gff_name
-
inputs:
input_script: R HyperGeo_Script script
de_res: DGE res file
gene_set: file containing gene set information in long form.
outputs:
hypergeo_res.csv
-
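The HyperGeo tool above performs a gene set over-representation test. Its exact method lives in the R script and is not documented here. The usual form is a one-sided hypergeometric test. With N genes in the universe, K of them in the gene set, n DE genes, and k DE genes in the set, the p-value is P(X >= k), which R computes as `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`. The sketch below computes the same tail exactly with `math.comb`. Treat it as an illustration of the standard test, not a transcription of the repo's script.

```python
from math import comb


def hypergeo_pvalue(overlap: int, n_de: int, set_size: int, universe: int) -> float:
    """One-sided P(X >= overlap) for X ~ Hypergeometric(universe, set_size, n_de).

    overlap:  DE genes that are in the gene set (k)
    n_de:     DE genes in total (n)
    set_size: genes in the gene set (K)
    universe: genes tested in total (N)
    """
    if not (0 <= set_size <= universe and 0 <= n_de <= universe):
        raise ValueError("need 0 <= set_size <= universe and 0 <= n_de <= universe")
    total = comb(universe, n_de)
    upper = min(n_de, set_size)
    # Sum the exact probability of each outcome i >= overlap; math.comb returns 0
    # when asked for more items than available, so impossible outcomes add nothing.
    tail = sum(
        comb(set_size, i) * comb(universe - set_size, n_de - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return tail / total
```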
inputs:
gtf: gtf file
output: output name for output directory
outputs:
index_dir: output directory named after inputs.index_dir
-
inputs:
threads: number of threads to use
lib_type: whether the library is paired end or not (for consistency, either rename this input to pairedend or rename the other tools' paired-end inputs to lib_type)
index_directory: Directory from miso_index output
bam: bam file
read_len: read length
gtf: gtf file
min_exon_size: minimum exon size to use
output: output name
outputs:
out_dir: directory named after inputs.out_dir
-
inputs:
input_script: python prepDE script
stringtie_out: gtf files from stringtie
outputs:
gene_count_matrix.csv
transcript_count_matrix.csv
-
inputs:
input_script: R count script
gtf: annotation file in gtf format
metadata: metadata.csv
quant_results: directory with subdirectories of outputs from salmon_quant
outputs:
gene_abundance_matrix.csv
gene_count_matrix.csv (this one is used for DGE analysis, e.g. DESeq2)
gene_length_matrix.csv
-
inputs:
fasta: fasta files
index_type: one of two index types, "fmd" or "quasi"
threads: number of threads to use
output: directory output name
outputs:
index_name: output directory with name inputs.index_name
-
inputs:
index_directory: directory of salmon index
threads: number of threads to use
output: output name
first_end_fastq: if paired end, first pair fastq file
second_end_fastq: if paired end, second pair fastq file
single_fastq: if single end, single fastq file
outputs:
out_dir: Directory with name of inputs.out_dir
-
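The salmon quant tool above takes either the two paired-end FASTQ inputs or `single_fastq`, never a mix. The sketch below builds a CWL job dict under that rule and fails early on an invalid combination. It uses the CWL job-file convention of `{"class": "File", "path": ...}` objects. Typing `index_directory` as a CWL Directory is an assumption, and `salmon_quant_job` is a hypothetical helper.

```python
from typing import Optional


def salmon_quant_job(
    index_dir: str,
    output: str,
    threads: int = 4,
    first_end_fastq: Optional[str] = None,
    second_end_fastq: Optional[str] = None,
    single_fastq: Optional[str] = None,
) -> dict:
    """Build a job dict for the salmon quant tool in either paired- or single-end mode."""
    paired = first_end_fastq is not None or second_end_fastq is not None
    if paired and single_fastq is not None:
        raise ValueError("give either the paired-end FASTQs or single_fastq, not both")
    if paired and (first_end_fastq is None or second_end_fastq is None):
        raise ValueError("paired-end mode needs both first_end_fastq and second_end_fastq")
    if not paired and single_fastq is None:
        raise ValueError("no FASTQ input given")

    job = {
        "index_directory": {"class": "Directory", "path": index_dir},  # assumed Directory
        "threads": threads,
        "output": output,
    }
    if paired:
        job["first_end_fastq"] = {"class": "File", "path": first_end_fastq}
        job["second_end_fastq"] = {"class": "File", "path": second_end_fastq}
    else:
        job["single_fastq"] = {"class": "File", "path": single_fastq}
    return job
```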
inputs:
action: samtools command (default is sort)
sortby: how to sort the bam file
threads: number of additional threads to use, i.e. total threads - 1
samfile: sam file
outfilename: output file name
outputs:
outfilename: bam file with name inputs.outfilename
-
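The `threads` input of the samtools tool above counts additional threads, so a budget of N cores should be passed as N - 1. The sketch below does that conversion while building a job dict from the documented input names. It leaves `sortby` unset unless a value is given, because the accepted values are not listed here. `samtools_sort_job` is a hypothetical helper.

```python
from typing import Optional


def samtools_sort_job(
    samfile: str,
    outfilename: str,
    total_threads: int,
    sortby: Optional[str] = None,
    action: str = "sort",
) -> dict:
    """Build a job dict for the samtools tool from a total core budget."""
    if total_threads < 1:
        raise ValueError("total_threads must be at least 1")
    job = {
        "action": action,
        # The tool's threads input is *additional* threads: total - 1.
        "threads": total_threads - 1,
        "samfile": {"class": "File", "path": samfile},
        "outfilename": outfilename,
    }
    if sortby is not None:
        job["sortby"] = sortby  # passed through unchanged; valid values not documented here
    return job
```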
inputs:
threads: number of threads
Mode: how to run STAR (use genomeGenerate)
output: directory name to use
fasta: fasta files
gtf: gtf file
outputs:
index_out: index files
-
inputs:
threads: number of threads to use
genomeDir: directory name to use
readFilesIn: fastq files
outFileNamePrefix: output name for directory
outputs:
star_read_out: all output files
sam_output: sam file
-
inputs:
bam: bam file
threads: number of threads to use.
gtf: gtf file
output: output name for file
outputs:
outfilename: output file with name inputs.outfilename
-
inputs:
threads: number of threads to use.
merged_gtf: merged gtf from cuffmerge
bam: bam files to use
output: output name for directory.
outputs:
output: Directory with name inputs.output