seqtk telo Add possibility for usage of fractions in penalty #222

@Ardnij123

Hello!
I think it would be nice to be able to specify the penalty as a fractional number.

I am creating a tool for querying subtelomeres in humans by first finding the telomeres and then using the reverse complement of the BED file. When I run seqtk telo with default parameters on, for instance, chr14_MATERNAL of assembly hg002v1.1, it correctly identifies the first 1649 bases as telomeric sequence, but I am left with the sequence shown in [1]. I think we can all agree that the first 4 rows are certainly still telomeric repeats, maybe even the first 12 rows.

My proposed solution is to either:

  1. add the possibility of specifying the penalty as a fractional number, or
  2. add an (integer) parameter specifying the value added to the score when the motif matches the sequence.

To illustrate a possible implementation, I will start with the changes needed for the second proposed solution. This version is quite simple: it would suffice to add the parameter and use it in every place where 1 is currently added for a matching motif. The ratio of penalty to match score would then act as a new fractional penalty (with max drop and min score rescaled accordingly).
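A minimal sketch of the idea, in Python rather than C for brevity. This is not seqtk's actual code: the function `telomere_prefix_end`, its parameter names, and the simplified scan (one score update per window, stop once the score falls more than `max_drop` below its best value) are assumptions made only to show how an integer match score gives an effective fractional penalty.

```python
def motif_rotations(motif):
    """All cyclic rotations of the motif, e.g. CCCTAA, CCTAAC, CTAACC, ..."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def telomere_prefix_end(seq, motif, match, penalty, max_drop, min_score):
    """Return the end of the telomeric prefix of seq, or 0 if none is found.

    Hypothetical, simplified scoring: every window that equals a rotation of
    the motif adds `match`, every other window subtracts `penalty`.
    All arithmetic stays on integers.
    """
    k = len(motif)
    rotations = motif_rotations(motif)
    score = best = best_end = 0
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in rotations:
            score += match
        else:
            score -= penalty
        if score > best:
            best, best_end = score, i + k
        elif best - score > max_drop:
            break  # score dropped too far below the best value seen
    return best_end if best >= min_score else 0


# One canonical repeat followed by two variant repeats, as in rows of [1].
seq = ("CCCTAA" + "CCCCAA" + "CCCCAA") * 20

# match=1, penalty=1: the variant repeats quickly end the scan.
short = telomere_prefix_end(seq, "CCCTAA", match=1, penalty=1,
                            max_drop=10, min_score=3)

# match=3, penalty=1 behaves like a penalty of 1/3; max drop and min score
# are multiplied by 3 so they keep their original meaning.
long_ = telomere_prefix_end(seq, "CCCTAA", match=3, penalty=1,
                            max_drop=30, min_score=9)
print(short, long_)
```

With the larger match score the scan tolerates the interleaved variant repeats and extends much further, while the inner loop still only adds and subtracts integers.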

The first proposed solution could be implemented in a similar way. It would suffice to read the penalty as a fractional number and rescale the parameters mentioned above so that the penalty becomes an integer (e.g. by multiplying all parameters by 100 and flooring them).
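The rescaling step could look roughly like the following sketch (again Python for brevity). The function name `rescale_parameters`, the returned keys, and the example values are hypothetical; only the "multiply by 100 and floor" rule comes from the proposal above. The penalty is parsed from text with `Decimal` so that a value such as 0.29 is not floored to 28 by binary floating-point rounding.

```python
import math
from decimal import Decimal


def rescale_parameters(penalty_text, max_drop, min_score, scale=100):
    """Convert a fractional penalty into all-integer scoring parameters.

    The match score becomes `scale` instead of 1, the penalty becomes
    floor(penalty * scale), and max drop and min score are multiplied by
    `scale` so that their meaning relative to a match is unchanged.
    """
    penalty = math.floor(Decimal(penalty_text) * scale)
    return {
        "match": scale,
        "penalty": penalty,
        "max_drop": max_drop * scale,
        "min_score": min_score * scale,
    }


# Illustrative values only, not taken from seqtk.
params = rescale_parameters("0.25", max_drop=2000, min_score=300)
print(params)
```

Flooring with a scale of 100 keeps two decimal places of the penalty; anything finer is silently dropped, which is probably acceptable for this use.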

From what I remember from our C classes (haha, joke, C does not have classes), the proposed solutions should have little to no impact on program speed, as the program would still work on integers.

[1]: start of sequence after telomere removal

>chr14_MATERNAL_START
ACCCCAACCCTAACCCCAACCCCAACCCTAACCCCAACCCCAACCCCAACCCCAACCCCA
ACCCCAACCCCAACCCCAACCCCAACCCCAACCCTAACCCCAACCCCAACCCCAACCCCA
ACCCTAACCCTAACCCTAACCCTAACCCCAACCCTAACCCTAACCCTAACCCTAACCCCA
ACCCTAACCCCAACCCTAACCCTAACCCCAACCCTAACCCCAACCCTAACCCTAACCCCA
ACCCTAACCCTAACCCCAACCCGAACCCTAACCCCAACCCGAACCCTAACCCCAACCCGA
ACCCGAACCCCAACCCCGACCCCGACCCCGACCCCGACCCCGACCCCGACCCCGACCCCG
ACCCCGACCCCGACCCCGACCCCGACCCCGACCCTAACCCCGACCCCGACCCCGACCCCG
ACCCCGACCCCGACCCTGACCTTAACCCCCTAACCCTGACTTTAACCCCCTAACCCTGAC
CCTGACCCTGACCCTGACCCTGACCCTGACCCTGACCCTGACCTTAACCCCCTAACCCTG
ACCTTAACCCCCTAACCCTAACGCTGACCCTAACCCTGACCCTGACCCTGACCCTGACCC
TGACCCTAACCCGAACCCGAACCCGAACCCTAACCCTGACCCTGACCCTGACCCTGACCC
TGACCCTCACCCTCACCCTCACCCTCACCCTCACCCTCACCCTCACCCTCACTCTGACCC
TAACTGCCATGCATCAGGGTCAGGGGGAGGTCTTGTTACAACACAGATCTGCGGATCTCC
GGGGCTTGATTGTGGCAAGGATGCTGCTGGTGTCAAAACCACAACGTGGGAACCACAGAA
CCACTGGTTGGTTTTCAGTATTTCAGTGTATACAATTCCTAATATATCTGGCCAAGAAAA
CTTGTAAGTTCTTAGATTGTCCCAAAGGTGGCGCATGAAATTAAATCAGGAGAACAGTTT
CCTACGAGGTGTAGCCTGGGAACGTTGGGGGTGACTGATGGAAAGGAGGAGTGAAGCTCC
GCCCTTTCCGCTGCGAGGCTGCGCCCGAGGCTATTTAAACCCACCCTGGCTGGCCTGTAC
TCAGATCTTCGCAGAGCGGAGCAGCGGCCGGAGCGTTTGGAGGACTCTGCCTGGACTTGG
AGCTCACAGCGTCTTGCGACTTGGAAGCGGATTCAGAGGACAGGACAGAACACTTGGGCA
AGTGAATCTCTGTCTGTCTGTCTGTCTCATTGGTTGGTTTATTTCCATTTTCTTAAGGAG
CACATACCTCACACCACACACACACAAACACACACACACACACACATGCACGCACGCGCA
CACACACACACACACACACACACACACACACACACACTAATTCATTCTGCGGGTTAGGAA
ATTAGTAGGGGTCTCTGGGAGCTGCAGGTTTCCTAATCATGTCTGCACCTAAGAACAGTA
GGGTCTTGTCTGGCTCTTCTTATGAACGGTCCCCCAGCCCGGACTCCCCAAGATCCATGC
...
