Memory-Bound and Taxonomy-Aware K-Mer Selection for Ultra-Large Reference Libraries

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14758))

Abstract

Classifying sequencing reads based on \(k\)-mer matches to a reference library is widely used in applications such as taxonomic profiling. Given the ever-increasing number of publicly available genomes, it is increasingly infeasible to keep all, or even a majority, of their \(k\)-mers in memory. Thus, there is a growing need for methods that select a subset of \(k\)-mers while accounting for taxonomic relationships. We propose \(k\)-mer RANKer (KRANK), a method that uses a set of heuristics to efficiently and effectively select a size-constrained subset of \(k\)-mers from a diverse and imbalanced taxonomy that suffers from biased sampling. Empirical evaluations demonstrate that a fraction of all \(k\)-mers in large reference libraries can achieve accuracy comparable to that of the full set.
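
To make the setting concrete, the following is a minimal Python sketch of \(k\)-mer-based read classification against a size-constrained library. It does not reproduce KRANK's heuristics, which are not described on this page. The round-robin selection across taxa is only a naive stand-in, and the function names (`kmers`, `build_library`, `classify`), the toy sequences, and the taxon labels are all assumptions made for illustration.

```python
from collections import Counter
from itertools import zip_longest


def kmers(seq, k):
    """Return every length-k substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_library(references, k, budget):
    """Map at most `budget` k-mers to a taxon label.

    Hypothetical stand-in for a real selection heuristic: k-mers are
    taken round-robin across taxa so that a heavily sampled taxon cannot
    use up the whole budget. K-mers seen in more than one taxon are
    skipped because they do not discriminate between taxa.
    """
    per_taxon = {}
    for taxon, seq in references.items():
        # dict.fromkeys removes duplicates while preserving order.
        per_taxon[taxon] = list(dict.fromkeys(kmers(seq, k)))
    # Count in how many taxa each k-mer occurs.
    owners = Counter(km for kms in per_taxon.values() for km in kms)

    library = {}
    for round_ in zip_longest(*per_taxon.values()):
        for taxon, km in zip(per_taxon, round_):
            if km is None or owners[km] > 1:
                continue
            if len(library) >= budget:
                return library
            library[km] = taxon
    return library


def classify(read, library, k):
    """Assign the read to the taxon with the most k-mer matches."""
    votes = Counter(library[km] for km in kmers(read, k) if km in library)
    return votes.most_common(1)[0][0] if votes else None


# Toy usage with made-up sequences and labels.
references = {"taxon_A": "ACGTACGGTCA", "taxon_B": "TTGACCATGCA"}
library = build_library(references, k=4, budget=6)
print(classify("GTACGT", library, k=4))
```

Lowering `budget` shrinks the library's memory footprint at the cost of fewer possible matches per read, which is the trade-off a principled selection method tries to manage.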

Availability

The tool is available at https://github.com/bo1929/KRANK, and the data are available at https://github.com/bo1929/shared.KRANK. The full paper is available at http://doi.org/10.1101/2024.02.12.580015.

Author information

Corresponding author

Correspondence to Siavash Mirarab.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Şapcı, A.O.B., Mirarab, S. (2024). Memory-Bound and Taxonomy-Aware K-Mer Selection for Ultra-Large Reference Libraries. In: Ma, J. (eds) Research in Computational Molecular Biology. RECOMB 2024. Lecture Notes in Computer Science, vol 14758. Springer, Cham. https://doi.org/10.1007/978-1-0716-3989-4_26

  • DOI: https://doi.org/10.1007/978-1-0716-3989-4_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-1-0716-3988-7

  • Online ISBN: 978-1-0716-3989-4

  • eBook Packages: Computer Science, Computer Science (R0)
