Extracts FASTA or FASTQ records from a BAM file, with options to output pairs, singles, or both
Runs blat in parallel on a multi-core machine.
Takes a FASTA file and an interval file as inputs. Masks the intervals in the FASTA file as upper or lower case
- fasta_extract_lc.pl
- fasta_extract_uc.pl
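As a rough illustration of interval masking, here is a minimal Python sketch. It is not the script's actual code: the function name `mask_intervals` is hypothetical, and it assumes intervals are 1-based and inclusive, which the description above does not state.

```python
def mask_intervals(seq, intervals, to_lower=True):
    """Return seq with each interval converted to lower (or upper) case.

    Assumption: intervals are (start, end) pairs, 1-based and inclusive.
    """
    chars = list(seq)
    for start, end in intervals:
        if start > end:  # tolerate reversed intervals
            start, end = end, start
        # Clamp to the sequence bounds and convert to 0-based indices
        for i in range(max(start, 1) - 1, min(end, len(chars))):
            chars[i] = chars[i].lower() if to_lower else chars[i].upper()
    return "".join(chars)


masked = mask_intervals("ACGTACGTAC", [(3, 5), (9, 8)])
print(masked)  # ACgtaCGtaC
```

Passing `to_lower=False` gives the upper-case variant on a lower-case input.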
Takes two sets of BLAST tabular results. Separates contigs into two files; contigs that are too close to call (below the minimum bit score difference) go into a third file
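The bit score comparison could be sketched as follows in Python. This is an assumption-laden sketch rather than the script itself: the function names and the default `min_diff` are hypothetical, contigs are assumed to be the query (column 1), and the bit score is read from column 12 of the standard 12-column BLAST tabular format.

```python
def best_bitscores(lines):
    """Map each contig (column 1) to its best bit score (column 12)."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        contig, score = cols[0], float(cols[11])
        best[contig] = max(score, best.get(contig, 0.0))
    return best


def split_by_bitscore(best_a, best_b, min_diff=50.0):
    """Assign contigs to set A, set B, or 'too close to call'."""
    only_a, only_b, unclear = [], [], []
    for contig in sorted(set(best_a) | set(best_b)):
        a = best_a.get(contig, 0.0)  # a missing hit counts as score 0
        b = best_b.get(contig, 0.0)
        if abs(a - b) < min_diff:
            unclear.append(contig)
        elif a > b:
            only_a.append(contig)
        else:
            only_b.append(contig)
    return only_a, only_b, unclear


set_a = best_bitscores(["c1\ts1\t99\t100\t0\t0\t1\t100\t1\t100\t1e-50\t500\n"])
set_b = best_bitscores(["c1\ts9\t80\t100\t20\t0\t1\t100\t1\t100\t1e-10\t120\n"])
print(split_by_bitscore(set_a, set_b))  # (['c1'], [], [])
```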
no description
handy for checking "completeness" etc. (e.g. tblastn protein sequences against a test genome: how many proteins are present at 70% completeness...)
no description
converts clc assembly_table -a full-text alignments into a 3-column tabular format giving, for the reference: CONTIGNAME START END
use instead of contig_stats.pl
no description
no description
for pulling out reads from a FASTA or FASTQ file based on a regexp, include lists, or exclude lists
Usage: -l [2|4] -p "pattern1" -p "pattern2" -p "pattern3"
same as fgrep, but written in Perl. Search for fgrep.pl in the thesis GitHub repository
Uses a map file of old_value/new_value pairs. Replaces all instances of old_value with new_value in the STDIN stream (tokens are delimited by spaces/punctuation). Possible to-do: add a -d delimiter option like in fgrep.pl
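A minimal Python sketch of this kind of whole-token replacement is shown below. The function name `replace_tokens` is hypothetical, and treating every run of word characters (`\w+`) as a token is an assumption about what "delimited by spaces/punctuation" means in the real script.

```python
import re


def replace_tokens(text, mapping):
    """Replace whole tokens found in mapping; leave everything else as is.

    Assumption: a token is a run of word characters (letters, digits, '_'),
    so spaces and punctuation act as delimiters.
    """
    return re.sub(r"\w+", lambda m: mapping.get(m.group(0), m.group(0)), text)


mapping = {"geneA": "X1", "geneB": "Y2"}
print(replace_tokens("geneA,geneB geneAB;geneA", mapping))  # X1,Y2 geneAB;X1
```

Because only whole tokens are looked up, `geneAB` is left untouched even though it begins with `geneA`.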
Check exact description
no description
outputs GFF3 containing only the contigs that are specified in a file
Test it. Check with Tim. Compare to Unix join
single-linkage clustering. Consider using MCL or some other, cleverer clusterer instead
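For reference, single-linkage clustering over a list of linked pairs amounts to finding connected components. The Python sketch below uses a small union-find structure; it is an illustration only, the function name `single_linkage` is hypothetical, and the input format (pairs of linked items) is an assumption.

```python
def single_linkage(pairs):
    """Group items so that any two linked items share a cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two clusters

    clusters = {}
    for item in list(parent):
        clusters.setdefault(find(item), set()).add(item)
    return sorted(sorted(members) for members in clusters.values())


print(single_linkage([("a", "b"), ("b", "c"), ("d", "e")]))
# [['a', 'b', 'c'], ['d', 'e']]
```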
pick_random.pl = picks random blocks of lines (4 for FASTQ, 8 for paired FASTQ, 2 for single-line FASTA, etc.)
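The block-wise sampling idea can be sketched in Python as follows. This is not pick_random.pl itself: the function name, the `seed` parameter, and the choice to keep the selected blocks in their original order are all assumptions, and the sketch holds the whole input in memory.

```python
import random


def pick_random_blocks(lines, block_size, n, seed=None):
    """Pick up to n whole blocks of block_size lines, in original order."""
    # Split into complete blocks; a trailing partial block is dropped
    blocks = [lines[i:i + block_size]
              for i in range(0, len(lines) - block_size + 1, block_size)]
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(blocks)), min(n, len(blocks))))
    return [blocks[i] for i in chosen]


fastq = []
for i in range(5):
    fastq += [f"@read{i}", "ACGT", "+", "IIII"]

for block in pick_random_blocks(fastq, block_size=4, n=2, seed=1):
    print("\n".join(block))
```

Using a block size of 8 on interleaved paired FASTQ keeps both mates of a pair together.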
Removes blast tabular rows if they overlap given intervals
Merges sequence intervals (reorders each so that st < en). Starts are 1-based, not 0-based
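A minimal Python sketch of interval merging is given below. The function name `merge_intervals` is hypothetical, and it assumes that only overlapping intervals are merged, so adjacent ones such as (1, 4) and (5, 10) stay separate; the real script may treat adjacency differently.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals with inclusive coordinates."""
    # Reorder each interval so that start <= end, then sort by start
    ordered = sorted((min(s, e), max(s, e)) for s, e in intervals)
    merged = []
    for start, end in ordered:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


print(merge_intervals([(10, 5), (1, 3), (3, 4), (20, 25)]))
# [(1, 4), (5, 10), (20, 25)]
```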
no description
no description
For each taxon ID, gets all the gid (GenBank ID) from the gi_taxid dmp files. See https://www.wiki.ed.ac.uk/display/BlaxterLab/Local+blast+against+subsets+of+NCBI+databases+based+on+taxid for how to use
Generates basic stats for a list of numbers that can be piped in, e.g. less file.txt | awk_stats.sh -
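For comparison, the same kind of summary can be sketched in Python with the standard library. Which statistics awk_stats.sh actually reports is not stated here, so the set below (count, min, max, sum, mean, median) and the function name `basic_stats` are assumptions.

```python
import statistics


def basic_stats(numbers):
    """Return a few summary statistics for a non-empty list of numbers."""
    values = sorted(numbers)
    return {
        "n": len(values),
        "min": values[0],
        "max": values[-1],
        "sum": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


print(basic_stats([3, 1, 4, 1, 5]))
```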