QUAST: add ANI column #3091

Merged: vladsavelyev merged 5 commits into MultiQC:main from schorlton-bugseq:assembly_ani on Feb 21, 2025

@schorlton-bugseq

Contributor

When comparing a genome against a reference with QUAST, it is helpful to know how similar that genome is to the reference. QUAST reports this in the "# mismatches per 100 kbp" and "# indels per 100 kbp" fields. Here we combine these two fields to calculate an average nucleotide identity (ANI) for a new column, which is simpler and more interpretable.

I acknowledge that sequence identity is complicated and has several competing definitions (https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity). However, the outputs available from QUAST make the choice easier. Here I implement the gap-compressed identity, which the author of the above blog post finds most compelling. Although it differs slightly from the BLAST ANI definition used in the literature for prokaryotic species definition (which is not gap-compressed), I think it is sufficient for most applications. The column description also explicitly states that it is gap-compressed.
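
For illustration, the calculation described above can be sketched as follows. This is a minimal sketch rather than the code merged in this PR: the function name `approx_identity_percent` is hypothetical, and it assumes that QUAST's indel count reflects indel events (so each gap counts once, as in gap-compressed identity) and that both rates are expressed per 100 kbp of aligned sequence.

```python
from typing import Optional


def approx_identity_percent(
    mismatches_per_100kbp: Optional[float],
    indels_per_100kbp: Optional[float],
) -> Optional[float]:
    """Approximate gap-compressed identity (%) from two QUAST rate fields.

    Hypothetical helper for illustration only; not part of MultiQC or QUAST.
    Returns None when either input is missing (e.g. no reference was given).
    """
    if mismatches_per_100kbp is None or indels_per_100kbp is None:
        return None
    # Differences per 100,000 aligned bases: each mismatch and each indel
    # event is counted once.
    differences = mismatches_per_100kbp + indels_per_100kbp
    return 100.0 * (1.0 - differences / 100_000)


# 250 mismatches + 50 indels per 100 kbp gives roughly 99.7% identity.
print(approx_identity_percent(250.0, 50.0))
```

Because both inputs are already normalized per 100 kbp, no alignment length is needed; the trade-off is that the result is only as precise as the rounded rates QUAST reports.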

Thanks for your consideration!

  • This comment contains a description of changes (with reason)

@vladsavelyev

Contributor

Thanks for the suggestion!

Having this metric sounds reasonable, but should it rather be a part of QUAST? Their team is very responsive, if you want to take a stab at a pull request there.

For MultiQC, we can attempt to parse it from the QUAST report, and for older versions we could keep your code here to calculate it if it's missing.
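
The fallback suggested above could look roughly like the sketch below. Everything here is an assumption for illustration: QUAST did not report this metric at the time of this conversation, so the key `"ANI (%)"` is a made-up placeholder, and `ani_for_sample` is a hypothetical helper rather than MultiQC code.

```python
from typing import Mapping, Optional


def ani_for_sample(
    row: Mapping[str, object],
    reported_key: str = "ANI (%)",  # placeholder name; not a real QUAST field
) -> Optional[float]:
    """Prefer an identity value reported by QUAST, else derive it.

    `row` is assumed to be one sample's parsed QUAST report, keyed by the
    field names QUAST prints.
    """
    reported = row.get(reported_key)
    if reported is not None:
        return float(reported)

    # Older reports: fall back to the two per-100-kbp rate fields.
    mismatches = row.get("# mismatches per 100 kbp")
    indels = row.get("# indels per 100 kbp")
    if mismatches is None or indels is None:
        return None
    # (mismatches + indels) / 100,000 bases, expressed as a percentage.
    return 100.0 - (float(mismatches) + float(indels)) / 1000.0
```

With this shape, reports from a newer QUAST would pass through unchanged, while older reports would still get a derived value whenever both rate fields are present.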

vladsavelyev added this to the v1.27.2 milestone on Feb 21, 2025
vladsavelyev added the "waiting: changes" label (Issue / PR is on hold, waiting for requested changes) on Feb 21, 2025
@vladsavelyev

Contributor

I coordinated with the QUAST author to add this metric - but it might take some time, so feel free to file a PR there!

And thanks for this PR here - happy to merge it now.

vladsavelyev removed the "waiting: changes" label (Issue / PR is on hold, waiting for requested changes) on Feb 21, 2025
vladsavelyev merged commit 4b93122 into MultiQC:main on Feb 21, 2025
@schorlton-bugseq

Contributor Author

Thanks @vladsavelyev for your feedback, and for accepting this proposal even given its limitations! You are absolutely right that it would be better to upstream this and report it from QUAST. I opened a PR there (ablab/quast#279), and I am happy for you to revert this one if that gets accepted.
