icg-shared-scripts

General repository to share code

This repository is a central place for members of ICG to share and collaborate on code. To make it easy for others to find code that is useful to them, please provide a short description of what each of your scripts does.

Important!

Before you push any local commits to this repository, do a git pull first! This prevents all sorts of problems, such as branch conflicts.

Perun submission scripts

Commonly used Perun submission scripts for particular software can be found under perun_scripts

Genome assembly

compare_assemblies.py - Visually compare sequence similarity between contigs of two assemblies

Genome annotation

repeatmaskerOut_to_gff3.sh - Convert a RepeatMasker .out output file to GFF3 format with colors for IGV

add_intron_features.py - Add intron features to a GFF3 file that doesn't already have them explicitly defined

add_intergenic_space_features.py - Add intergenic space features to a GFF3 file that doesn't already have them explicitly defined

fix_genes_with_false_introns.py - Flag introns with poor RNAseq splicing coverage, remove them, and use a simple ORF finder to re-predict gene, mRNA, exon and CDS features in the regions where the false introns used to be
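As a rough illustration of what adding implicit intron features involves, the sketch below derives intron intervals from the gaps between consecutive exons of one transcript. It is not the code from add_intron_features.py: the function name `introns_from_exons` and the plain `(start, end)` tuples are assumptions made for this example, and GFF3 parsing and writing are left out.

```python
def introns_from_exons(exons):
    """Return intron intervals lying between consecutive exons.

    `exons` is a list of (start, end) tuples using 1-based inclusive
    coordinates, as in GFF3. This is a hypothetical helper for illustration.
    """
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Only emit an intron when there is at least one base between exons.
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


exons = [(100, 200), (301, 400), (501, 650)]
print(introns_from_exons(exons))  # [(201, 300), (401, 500)]
```

In a real GFF3 file the exons would first have to be grouped by their parent mRNA, and the strand and sequence ID copied onto each new intron line.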

Phylogenetics

aa_recoding.pl - Recode your amino acid coding alignments into Dayhoff4, Dayhoff6, SR4, HP

alignment_pruner.pl - Remove the most compositionally heterogeneous sites of your alignment

count_tripartitions.py - Decompose a tree or set of trees (e.g. bootstrap trees or MCMC trees) into a set of tripartitions/nodes that are counted

parse_tripartition_counts.py - Read tripartition files generated by count_tripartitions.py to check counts for certain tripartitions of interest

tre_make_splits.pl - Decompose a tree or set of trees (e.g. bootstrap trees or MCMC trees) into a set of splits/bipartitions that are counted

parseSplitcounts.pl - Read split files generated by tre_make_splits.pl to check counts for certain splits of interest

tre_discordance_two.pl - Compare two sets of splits, generated by tre_make_splits.pl, and compute a phylogenetic discordance score

render_tree.py - Read a Newick tree file, root it, color clades, add a title, and export to PDF, PNG or JPEG

concatenateRenameAlignment.pl - Concatenate a set of protein multiple sequence alignments into a concatenated supermatrix alignment

plot_pb_stats.R - Parse the PhyloBayes-MPI trace files and output Tracer-like plots for log likelihood, tree length, alpha parameter and number of categories

TreeFINISHER_ete3_v1-1.py - (Unfinished and not well tested!) Use the ETE3 toolkit to process an input Newick tree: colour branches by taxonomic group, rename taxa, and output as SVG
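To give an idea of what amino acid recoding (as done by aa_recoding.pl) means, here is a minimal sketch of Dayhoff6 recoding. It assumes the commonly used six Dayhoff groups and uses the digits 0 to 5 as output symbols; the symbols and options that aa_recoding.pl actually uses may differ, and the names `DAYHOFF6_GROUPS` and `recode` are made up for this example.

```python
# Commonly used Dayhoff6 grouping of the 20 amino acids (an assumption here;
# check aa_recoding.pl for the exact groups and output symbols it uses).
DAYHOFF6_GROUPS = ["AGPST", "DENQ", "HKR", "ILMV", "FWY", "C"]

# Map every amino acid to the index of its group, written as a digit.
DAYHOFF6 = {aa: str(i) for i, group in enumerate(DAYHOFF6_GROUPS) for aa in group}


def recode(seq, table=DAYHOFF6):
    """Recode an aligned protein sequence; gaps and unknown symbols are kept."""
    return "".join(table.get(ch, ch) for ch in seq.upper())


print(recode("MKT-AC"))  # 320-05
```

Leaving gaps and ambiguous characters untouched keeps the alignment columns in register, which matters when the recoded alignment is passed on to a tree inference program.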

Sequencing reads analysis

colorFastq.pl - Print a FASTQ file to your terminal screen, with the Phred quality symbols shown as colored bars
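The quality symbols in a FASTQ file encode Phred scores as ASCII characters. The sketch below shows the decoding step that any such visualisation has to perform, assuming the standard Phred+33 offset. It is not taken from colorFastq.pl, and the function names are hypothetical.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    return [ord(symbol) - offset for symbol in quality]


def error_probability(score):
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-score / 10)


print(phred_scores("!5I"))  # [0, 20, 40]
```

A score of 20 corresponds to an error probability of 1 in 100, and a score of 40 to 1 in 10,000.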

Download GenBank genomes

NCBI_genome_download.py - Download genomes from NCBI GenBank by giving a list of genome accession numbers

NCBI_genome_download_using_centrifuge.sh - Use the centrifuge-download command to download NCBI genomes, based on domain(s)

Analyze GenBank files

getCodingDensity.pl - Read a GenBank file of a genome and estimate its coding density

getIntergenicSpace.pl - Read a GenBank file of a genome and estimate its average intergenic space

getRecords.pl - Use NCBI's E-Utilities to request and receive a GenBank record on the command line
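Coding density is usually taken to be the fraction of the genome covered by coding features. The sketch below shows one way to compute it once the coding intervals have been extracted from a GenBank file. It is an illustration under that assumption, not the method used by getCodingDensity.pl: the function name `coding_density` is made up, and the GenBank parsing is not shown.

```python
def coding_density(features, genome_length):
    """Fraction of the genome covered by at least one coding feature.

    `features` is a list of (start, end) tuples in 1-based inclusive
    coordinates. Overlapping features are merged so that shared bases
    are counted only once.
    """
    merged = []
    for start, end in sorted(features):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if necessary.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    coding_bases = sum(end - start + 1 for start, end in merged)
    return coding_bases / genome_length


features = [(1, 100), (51, 150), (301, 400)]
print(coding_density(features, 1000))  # 0.25
```

The gaps between the merged intervals are the intergenic regions, so the same merging step could also serve as the starting point for an average intergenic space estimate.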

FASTA file processing

fastaNamesSizes.pl - Calculate the length of every entry in the FASTA file, as well as mean length and standard deviation

sequenceCutter.pl - Select a contig or sequence in your FASTA file, and extract a subsequence, discarding the rest

splitMultiFasta.pl - Split a single multi-FASTA file into multiple single-FASTA files

calcCARSC.py - Calculate the mean number of Carbon Atoms per Residue Side Chain in a protein FASTA file

calcNARSC.py - Calculate the mean number of Nitrogen Atoms per Residue Side Chain in a protein FASTA file

Remove_short_contigs_fasta_files_in_a_fold.py - Remove short contigs from all FASTA files in a folder

Extract_contigs_left2remains.py - Extract the desired contigs and write the remaining contigs to another file
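For readers who want to see what a per-entry length summary such as the one produced by fastaNamesSizes.pl involves, here is a minimal Python sketch. It is not a port of that script: the function name `fasta_lengths` is hypothetical, the input is an in-memory example rather than a file, and it assumes the sample standard deviation, which may not match what the Perl script reports.

```python
import statistics


def fasta_lengths(lines):
    """Return a dict mapping each FASTA entry name to its sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the entry name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


example = """>contig1 some description
ACGTACGT
ACGT
>contig2
ACGT
""".splitlines()

lengths = fasta_lengths(example)
values = list(lengths.values())
mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation (an assumption)

for entry, length in lengths.items():
    print(entry, length)
print("mean:", mean, "sd:", round(sd, 3))
```

To run it on a real file, pass an open file handle instead of the example lines, for instance `fasta_lengths(open("assembly.fasta"))`, where the file name is only a placeholder.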
