Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

althonos
Copy link
Contributor

Hi again!

Overview

I noticed while addressing #100 that the GC% computed for genes in the reverse strand was often wrong by a small margin, despite the correct gene sequence being extracted, pointing at an indexing error. After checking for out-of-bound reads, I noticed that in calc_orf_gc the loop would read past sequence end in the following part:

      for(j = last[fr]+3; j <= nod[i].ndx; j+=3)
        gc[fr] += is_gc(seq, j) + is_gc(seq, j+1) + is_gc(seq, j+2);

Indeed, on the reverse strand, last[fr] is set to the index of the STOP codon; because for reverse-strand codon this is always the index of the last nucleotide, not the first, the iteration should start 1 nucleotide later, not 3.

Fix

Start the iteration at the right coordinates πŸ˜„

Example

Taking the same contig CAKWEX010000332.1 as in #100, I ran Prodigal on both the contig and its reverse complement; the genes predicted in both cases matched in sequences, but the GC% didn't match; namely, the GC content was wrong when the genes were on the reverse strand (i changed the gc_cont precision so that the difference is easier to see):

               gene strand    true_gc    fwd_gc    bwd_gc
CAKWEX010000332.1_1     -1  63.592526  0.635925  0.635925
CAKWEX010000332.1_2     -1  58.043118  0.582090  0.580431
CAKWEX010000332.1_3     -1  60.073260  0.601954  0.600733

After applying the fix, the GC-content is consistent independently of whether the gene is on the direct or reverse strand:

               gene strand    true_gc    fwd_gc    bwd_gc
CAKWEX010000332.1_1     -1  63.592526  0.635925  0.635925
CAKWEX010000332.1_2     -1  58.043118  0.580431  0.580431
CAKWEX010000332.1_3     -1  60.073260  0.600733  0.600733

@althonos
Copy link
Contributor Author

althonos commented Sep 7, 2023

@hyattpd : Do you think you could merge #100 and this PR as well? Thanks πŸ˜ƒ

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant