schorlton-bugseq · 2025-02-15T05:15:19Z

When comparing a genome against a reference with QUAST, it is helpful to know how similar the two are. QUAST reports this through the "# mismatches per 100 kbp" and "# indels per 100 kbp" fields. This PR combines the two fields into a new average nucleotide identity (ANI) column, which is simpler and more interpretable.

I acknowledge that sequence identity is complicated and has several definitions (https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity). However, the outputs available from QUAST make the choice easier. Here I implement the gap-compressed identity, which the author of the blog post above finds most compelling. Although it differs slightly from the BLAST ANI definition used in the literature for prokaryotic species definition (which is not gap-compressed), I think it is sufficient for most applications. The column description also explicitly states that it is gap-compressed.
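For readers who want the arithmetic spelled out, here is a minimal sketch of how the two QUAST fields could be combined into a percent identity. It is not the code from this PR: the function name `ani_from_quast` is hypothetical, and it assumes both rates are expressed per 100,000 aligned bases, so the result is an approximation of the gap-compressed identity.

```python
def ani_from_quast(mismatches_per_100kbp: float, indels_per_100kbp: float) -> float:
    """Approximate percent identity from QUAST's per-100-kbp difference rates.

    Hypothetical helper for illustration only; assumes both inputs are
    counts per 100,000 aligned bases.
    """
    differences_per_100kbp = mismatches_per_100kbp + indels_per_100kbp
    # 100 kbp is 100,000 bases, so dividing a per-100-kbp count by 1,000
    # converts it to a percentage of aligned bases.
    return 100.0 - differences_per_100kbp / 1000.0


if __name__ == "__main__":
    # Example: 250 mismatches and 50 indels per 100 kbp.
    print(round(ani_from_quast(250.0, 50.0), 4))
```

Under this sketch each indel counts as one difference, just like a mismatch, which is what makes the result gap-compressed if the indel field counts events rather than inserted or deleted bases.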

Thanks for your consideration!

This comment contains a description of changes (with reason)

vladsavelyev · 2025-02-21T09:06:42Z

Thanks for the suggestion!

Having this metric sounds reasonable, but should it rather be part of QUAST? Their team is very responsive, if you want to take a stab at a pull request there.

For MultiQC, we can attempt to parse it from the QUAST report, and for older versions we could keep your code here to calculate it if it's missing.
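The fallback described above could look roughly like the sketch below. This is an illustration under assumptions, not MultiQC's actual code: `resolve_ani` and the `ANI_KEY` column name are hypothetical (QUAST does not report such a field at the time of this thread), while the two per-100-kbp field names are the ones quoted in the PR description.

```python
from typing import Dict, Optional

# Hypothetical name for a future QUAST-reported ANI field.
ANI_KEY = "ANI (%)"


def resolve_ani(row: Dict[str, float]) -> Optional[float]:
    """Prefer an ANI value reported by QUAST; otherwise derive it.

    Returns None when neither the reported value nor both input
    fields are available.
    """
    if ANI_KEY in row:
        # Newer report: trust the value parsed from QUAST itself.
        return row[ANI_KEY]
    try:
        mismatches = row["# mismatches per 100 kbp"]
        indels = row["# indels per 100 kbp"]
    except KeyError:
        # Older report without the required fields: nothing to compute.
        return None
    # Per-100-kbp counts divided by 1,000 give a percentage.
    return 100.0 - (mismatches + indels) / 1000.0
```

Checking for the reported value first means the derived number is only used for older QUAST reports, so the two sources never disagree within one sample.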

vladsavelyev · 2025-02-21T14:02:21Z

I coordinated with the QUAST author to add this metric - but it might take some time, so feel free to file a PR there!

And thanks for this PR here - happy to merge it now.

schorlton-bugseq · 2025-02-22T05:10:41Z

Thanks @vladsavelyev for your feedback, and for accepting this proposal even with its limitations! You are absolutely right that it would be better to upstream this and report it from QUAST. I opened a PR there (ablab/quast#279), and I'm happy for you to revert this one if that gets accepted.

Sam Chorlton added 3 commits February 14, 2025 21:05

Add ANI column

c6e6bbb

Add in gap-compressed

d48ecac

format fix

1bb0675

vladsavelyev added the module: enhancement label Feb 21, 2025

vladsavelyev added this to the v1.27.2 milestone Feb 21, 2025

vladsavelyev added the waiting: changes label (Issue / PR is on hold, waiting for requested changes) Feb 21, 2025

vladsavelyev added 2 commits February 21, 2025 12:45

Check if ANI present

113b557

Merge branch 'main' into assembly_ani

1d37f82

vladsavelyev removed the waiting: changes label (Issue / PR is on hold, waiting for requested changes) Feb 21, 2025

vladsavelyev merged commit 4b93122 into MultiQC:main Feb 21, 2025

schorlton-bugseq mentioned this pull request Feb 22, 2025

Add ANI field ablab/quast#279

Open

QUAST: add ANI column #3091

vladsavelyev merged 5 commits into MultiQC:main from schorlton-bugseq:assembly_ani

schorlton-bugseq commented Feb 15, 2025

Uh oh!

vladsavelyev commented Feb 21, 2025

Uh oh!

vladsavelyev commented Feb 21, 2025

Uh oh!

schorlton-bugseq commented Feb 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants
