
#record #sorting #fastq #fasta #fasta-sequence #dedup #acceleration #sorter #spillover #dryice

spillover-bio

Genomics-focused disk-spilling sort pipeline for FASTQ/FASTA sequence records

4 releases

Uses new Rust 2024

0.1.3 Mar 28, 2026
0.1.2 Mar 24, 2026
0.1.1 Mar 24, 2026
0.1.0 Mar 24, 2026

#533 in Biology

MIT license

200KB
4K SLoC

Genomics-opinionated disk-spilling sort for FASTQ/FASTA records.

spillover-bio builds on the generic spillover crate to provide a ready-to-use external sorter for sequence records. It supplies genomics-specific sort keys and sequence-focused dedup strategies on sorted output, and it uses the dryice format for temporary on-disk storage.
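For readers new to external sorting, the general idea behind a disk-spilling sorter can be sketched as two phases: cut the input into sorted runs, then k-way merge the runs. The sketch below is an illustration only and does not use the spillover or spillover-bio API. The function names `make_sorted_runs` and `merge_runs` are hypothetical, and in-memory vectors stand in for the runs that a real sorter would spill to temporary files.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Phase 1: split the input into sorted runs of at most `run_len` items.
/// In a real disk-spilling sorter each run would be written to a temp file;
/// here the runs stay in memory (an assumption made to keep the sketch short).
pub fn make_sorted_runs(input: &[String], run_len: usize) -> Vec<Vec<String>> {
    assert!(run_len > 0, "run_len must be positive");
    input
        .chunks(run_len)
        .map(|chunk| {
            let mut run = chunk.to_vec();
            run.sort();
            run
        })
        .collect()
}

/// Phase 2: k-way merge of sorted runs using a min-heap.
/// The heap holds the current head of every run, tagged with its run index.
pub fn merge_runs(runs: Vec<Vec<String>>) -> Vec<String> {
    let mut iters: Vec<_> = runs.into_iter().map(|r| r.into_iter()).collect();
    let mut heap = BinaryHeap::new();

    // Seed the heap with the first item of each non-empty run.
    for (idx, it) in iters.iter_mut().enumerate() {
        if let Some(item) = it.next() {
            heap.push(Reverse((item, idx)));
        }
    }

    let mut out = Vec::new();
    // Pop the smallest head, then refill from the run it came from.
    while let Some(Reverse((item, idx))) = heap.pop() {
        out.push(item);
        if let Some(next) = iters[idx].next() {
            heap.push(Reverse((next, idx)));
        }
    }
    out
}
```

Calling `make_sorted_runs(&records, 2)` followed by `merge_runs(runs)` yields the same order as sorting the whole input at once, while only one run needs to be sorted in memory at a time.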


spillover-bio

spillover-bio is a genomics-focused sorting crate built on top of spillover.

It provides a ready-to-use disk-spilling sort pipeline for FASTQ/FASTA-style sequence records, with practical defaults for:

  • sequence-aware sort orders (including quality tie-breaking)
  • dryice-backed temporary storage codecs
  • keyed merge acceleration for large spill-heavy workloads
  • optional adjacent deduplication strategies
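To make the first and last items above concrete, here is a minimal in-memory sketch of a sequence-aware sort order with quality tie-breaking, followed by adjacent deduplication on the sorted output. It is not the crate's API: the `SeqRecord` type, the `by_sequence_then_quality` comparator, and the `sort_and_dedup` function are hypothetical names, and "higher summed quality first" is an assumed tie-breaking rule chosen for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical in-memory record, not the crate's actual record type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SeqRecord {
    pub name: String,
    pub sequence: Vec<u8>,
    pub quality: Vec<u8>,
}

/// Sum of the raw quality bytes; used only to break ties (assumed rule).
fn total_quality(r: &SeqRecord) -> u64 {
    r.quality.iter().map(|&q| u64::from(q)).sum()
}

/// Order by sequence bytes; among equal sequences, higher total quality first.
pub fn by_sequence_then_quality(a: &SeqRecord, b: &SeqRecord) -> Ordering {
    a.sequence
        .cmp(&b.sequence)
        .then_with(|| total_quality(b).cmp(&total_quality(a)))
}

/// Sort, then drop adjacent records that share a sequence.
/// Because equal sequences are neighbours after sorting and the best-quality
/// record comes first, the survivor of each group is the best-quality one.
pub fn sort_and_dedup(mut records: Vec<SeqRecord>) -> Vec<SeqRecord> {
    records.sort_by(by_sequence_then_quality);
    // `dedup_by` removes `next` when the closure returns true for (next, kept).
    records.dedup_by(|next, kept| next.sequence == kept.sequence);
    records
}
```

Adjacent deduplication only needs to compare each record with its predecessor, which is why it pairs naturally with sorted output: it runs in a single streaming pass and does not require a hash set of everything seen so far.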

For project-level context, architecture, and runnable examples, see the main repository README:

For API documentation:

Dependencies

~2.6–8.5MB
~182K SLoC