Codestin Search App

scoresite README

Scoresite searches FASTA formatted DNA sequences for regulator sites defined by a case matrix. Case matrices consist of a template and the probability that a given base will be found at that position.

For every substring of the input sequence of the same length as the template, scoresite:

generates both a forward and reverse complement sequence to test against the suppliced template/matrix, and for each:
calculates the number of mismatches from the template.
calculates the score by adding together the probabilities of the actual bases found and dividing that sum by the length of the template.
For both the forward and reverse complement sequences, if the number of mismatches is less that the maximum mismatches specified, AND/OR the score is greater than the minimum score specified, it prints out the "hit." The choice between AND and OR depends on what is specified in the config file for that regulator, or if it is overridden on the command line.

The "hit" output format is: regulator name: As specified in the config file, e.g. "re1-jizu". actual sequence: Sequence of hit from file. score: Between 0.0 and 1.0. mismatches: Mismatches between sequence and template. position: Position of "leftmost" base on either strand. strand: Which strand (plus or minus) the hit was on. FASTA header: The header of the input sequence the hit was in.

NOTES

Calculation of mismatches and scores are de-coupled. There is no validation of the case matrix read from the config file, i.e. you could create completely inconsistent matrices and scoresite will use them. The useful aspect of this behavior involves the use of IUPAC ambiguity codes in the template, where you can define what to consider a mismatch, but still encode preferences in the case matrix probabilities. E.g., you could specify a K in the template (which encodes G or T) and specify different score preferences: 0.60/0.40 : G/T. One could even use K and yet specify a set of partially inconsistent probabilities: 0.40/0.40/0.20: G/T/C.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
test		test
.gitignore		.gitignore
AUTHORS		AUTHORS
COPYING		COPYING
INSTALL		INSTALL
README		README
README.md		README.md
TODO		TODO
argtable-2.4.tar.gz		argtable-2.4.tar.gz
bad.conf		bad.conf
conf_file.c		conf_file.c
defunct.conf		defunct.conf
freq_norm.c		freq_norm.c
makefile		makefile
parse_args.c		parse_args.c
regood.conf		regood.conf
regulators.c		regulators.c
scoresite.c		scoresite.c
scoresite.conf		scoresite.conf
scoresite.h		scoresite.h
transfac.conf		transfac.conf
utils.c		utils.c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

scoresite README

NOTES

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

Uh oh!

License

Uh oh!

jhover/scoresite

Folders and files

Latest commit

History

Repository files navigation

scoresite README

NOTES

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages