General repository to share code
This repository is a central place for members of ICG to share and collaborate on code. To help others find code that is useful to them, please provide a short description of what each of your scripts does.
Before you push any local commits to this repository, run git pull first! This can prevent all sorts of problems, such as branch conflicts.
Commonly used perun submission scripts for specific software can be found under perun_scripts
compare_assemblies.py - Visually compare sequence similarity between contigs of two assemblies
repeatmaskerOut_to_gff3.sh - Convert a RepeatMasker .out output file to GFF3 format with colors for IGV
add_intron_features.py - Add intron features to a GFF3 file that doesn't already have them explicitly defined
add_intergenic_space_features.py - Add intergenic space features to a GFF3 file that doesn't already have them explicitly defined
fix_genes_with_false_introns.py - Flag introns with poor RNAseq splicing coverage, remove them, and re-predict the gene, mRNA, exon, and CDS features in the regions where the false introns used to be, using a simple ORF finder
aa_recoding.pl - Recode your amino acid alignments into Dayhoff4, Dayhoff6, SR4, or HP
alignment_pruner.pl - Remove the most compositionally heterogeneous sites of your alignment
count_tripartitions.py - Decompose a tree or set of trees (e.g. bootstrap trees or MCMC trees) into a set of tripartitions/nodes that are counted
parse_tripartition_counts.py - Read tripartition files generated by count_tripartitions.py to check counts for certain tripartitions of interest
tre_make_splits.pl - Decompose a tree or set of trees (e.g. bootstrap trees or MCMC trees) into a set of splits/bipartitions that are counted
parseSplitcounts.pl - Read split files generated by tre_make_splits.pl to check counts for certain splits of interest
tre_discordance_two.pl - Compare two sets of splits, generated by tre_make_splits.pl, and compute a phylogenetic discordance score
render_tree.py - Read a Newick tree file, root it, color clades, add a title, and export to PDF, PNG, or JPEG
concatenateRenameAlignment.pl - Concatenate a set of protein multiple sequence alignments into a concatenated supermatrix alignment
plot_pb_stats.R - Parse the PhyloBayes-MPI trace files and output Tracer-like plots for log likelihood, tree length, alpha parameter and number of categories
TreeFINISHER_ete3_v1-1.py - (Unfinished and not well tested!) Use the ETE3 toolkit to process an input Newick tree: colour branches by taxonomic group, rename taxa, and output as SVG
colorFastq.pl - Print a FASTQ file to your terminal screen, but phred symbols become colored bars
NCBI_genome_download.py - Download genomes from NCBI GenBank by providing a list of genome accession numbers
NCBI_genome_download_using_centrifuge.sh - Use the centrifuge-download command to download NCBI genomes, selected by domain(s)
getCodingDensity.pl - Read a GenBank file of a genome and estimate its coding density
getIntergenicSpace.pl - Read a GenBank file of a genome and estimate its average intergenic space
getRecords.pl - Use NCBI's E-Utilities to request and receive a GenBank record on the command line
fastaNamesSizes.pl - Calculate the length of every entry in a FASTA file, as well as the mean length and standard deviation
sequenceCutter.pl - Select a contig or sequence in your FASTA file, and extract a subsequence, discarding the rest
splitMultiFasta.pl - Splits a single multiFASTA file into multiple singleFASTA files
calcCARSC.py - Calculate the mean number of Carbon Atoms per Residue Side Chain in a protein FASTA file
calcNARSC.py - Calculate the mean number of Nitrogen Atoms per Residue Side Chain in a protein FASTA file
Remove_short_contigs_fasta_files_in_a_fold.py - Remove short contigs from all FASTA files in a folder
Extract_contigs_left2remains.py - Extract the desired contigs to one file and write the remaining contigs to another file
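
To give a feel for what the contig-filtering scripts at the end of this list do, here is a minimal Python sketch of splitting FASTA records by length. It is not the code of Remove_short_contigs_fasta_files_in_a_fold.py or Extract_contigs_left2remains.py; the function names (read_fasta, split_by_length) and the min_length threshold are assumptions made up for illustration, so check the scripts themselves for their real options and behaviour.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records: Iterable[Record], min_length: int):
    """Return (kept, rest): contigs with length >= min_length, and all others."""
    kept: List[Record] = []
    rest: List[Record] = []
    for header, seq in records:
        (kept if len(seq) >= min_length else rest).append((header, seq))
    return kept, rest


# Small in-memory example; min_length=10 is an arbitrary illustrative cutoff.
example = [">contig_1", "ACGTACGTAC", "GT", ">contig_2", "ACG"]
kept, rest = split_by_length(read_fasta(example), min_length=10)
print([h for h, _ in kept])  # → ['contig_1']
print([h for h, _ in rest])  # → ['contig_2']
```

To run the same logic on a real file, pass an open file handle instead of the list, e.g. read_fasta(open("assembly.fasta")), where assembly.fasta is a placeholder name.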