#genomics #bioinformatics #dna-sequence #k-mer #dna

seqkmer

High-performance FASTA/FASTQ IO and minimizer-based k-mer analysis utilities for Rust bioinformatics pipelines

6 releases

0.1.5 Nov 1, 2025
0.1.4 Nov 1, 2025
0.1.3 Oct 29, 2025
0.1.2 Jan 29, 2025
0.1.0 Aug 20, 2024

#126 in Biology


Used in 2 crates

MIT license

97KB
2K SLoC

Seqkmer

Seqkmer is a Rust library for high-throughput sequence IO and k-mer-based analysis. It provides fast readers for FASTA/FASTQ (including gzipped streams), minimizer scanning over k-mers, and utilities for parallelising bulk sequence processing.

Highlights

  • Universal FASTX readers: Seamlessly handle FASTA, FASTQ, interleaved paired-end, and dual-file paired-end datasets through a unified API. Automatic format detection and transparent gzip support are included.
  • Quality-aware FASTQ parsing: Optional quality-score thresholds to soft-mask low-quality bases while preserving original sequence layout.
  • Buffered & streaming modes: Choose between streaming (FastaReader, FastqReader) or buffered variants (BufferFastaReader) depending on your throughput/memory trade-offs.
  • Minimizer-based k-mer scanning: The mmscanner module exposes scan_sequence and MinimizerIterator for fast k-mer/minimizer enumeration with configurable windows.
  • Parallel orchestration: Utilities in the parallel module coordinate multi-threaded reading and processing pipelines using scoped thread pools.
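To make the quality-aware parsing bullet concrete, here is a minimal standalone sketch of threshold-based masking. It is not Seqkmer's implementation: the function name mask_low_quality, the Phred+33 encoding, and the caller-supplied mask byte are all assumptions made for illustration.

```rust
/// Hypothetical helper (not part of Seqkmer): replace every base whose
/// Phred score is below `min_q` with `mask`, keeping the sequence length
/// unchanged. Assumes Phred+33 quality encoding.
fn mask_low_quality(seq: &[u8], qual: &[u8], min_q: u8, mask: u8) -> Vec<u8> {
    assert_eq!(seq.len(), qual.len(), "sequence and quality lengths differ");
    seq.iter()
        .zip(qual)
        .map(|(&base, &q)| {
            // Convert the ASCII quality character to a Phred score.
            let phred = q.saturating_sub(33);
            if phred < min_q { mask } else { base }
        })
        .collect()
}
```

With the sequence ACGT, the quality string I#I5 and a threshold of 18, only the second base (Phred 2) is masked, so the output has the same length and coordinates as the input. A threshold of 0 masks nothing in this sketch.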

Getting Started

Add Seqkmer to your project:

cargo add seqkmer

Reading FASTA or FASTQ

use seqkmer::{FastxReader, OptionPair, Reader};
use std::path::Path;

fn main() -> std::io::Result<()> {
    // Single FASTQ file (auto-detects FASTA vs FASTQ and gzip)
    let path = Path::new("tests/data/test.fastq");
    let mut reader = FastxReader::from_paths(OptionPair::Single(path), 0, 18)?;

    while let Some(batch) = reader.next()? {
        for entry in batch {
            println!(
                "[{}] {} (len={})",
                entry.header.format as u8,
                entry.header.id,
                entry.body.single().unwrap().len()
            );
        }
    }

    Ok(())
}

For paired-end data, provide a pair of paths. Interleaved FASTQ is detected automatically; separate R1/R2 files are also supported:

let paths = OptionPair::Pair(
    Path::new("reads_R1.fastq"),
    Path::new("reads_R2.fastq"),
);
let mut reader = FastxReader::from_paths(paths, 0, 0)?;
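How Seqkmer detects interleaved input is not described here. One common heuristic, sketched below with standard-library code only, is to compare the first two read IDs after stripping a trailing /1 or /2 and any comment field. The helper names mate_base and looks_interleaved are hypothetical.

```rust
/// Hypothetical helper: reduce a FASTQ read ID to its mate-independent base
/// name by dropping the comment (text after whitespace) and a /1 or /2 suffix.
fn mate_base(id: &str) -> &str {
    let name = id.split_whitespace().next().unwrap_or("");
    name.strip_suffix("/1")
        .or_else(|| name.strip_suffix("/2"))
        .unwrap_or(name)
}

/// Heuristic only: treat the input as interleaved when the first two records
/// share the same base name. Real data can defeat this check.
fn looks_interleaved(first_id: &str, second_id: &str) -> bool {
    let (a, b) = (mate_base(first_id), mate_base(second_id));
    !a.is_empty() && a == b
}
```

Both the older read1/1 and read1/2 naming and the newer "read1 1:N:0:..." comment style reduce to the same base name under this sketch.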

K-mer Minimizer Scanning

use seqkmer::{scan_sequence, Meros, MinimizerIterator};
use seqkmer::reader::Reader;

fn main() -> std::io::Result<()> {
    let meros = Meros::new(15, 5, Some(0), None, None); // (k, window, seed, min, max)
    let mut reader = seqkmer::FastaReader::from_path("tests/data/test.fasta", 0)?;

    while let Some(batch) = reader.next()? {
        for record in batch {
            // Collect every minimizer found in this record.
            let minimizers: Vec<_> = scan_sequence(&record, &meros).collect();
            println!("{} -> {} minimizers", record.header.id, minimizers.len());
        }
    }

    Ok(())
}
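For readers new to minimizers, the following standalone sketch shows the classic (w, k) definition: in every window of w consecutive k-mers, keep the smallest one. It assumes plain lexicographic ordering with leftmost tie-breaking. Seqkmer's own ordering, hashing, and handling of reverse complements may differ, and the function minimizers is hypothetical, not part of the crate.

```rust
/// Hypothetical illustration (not Seqkmer's API): return the (position, k-mer)
/// pairs of the lexicographically smallest k-mer in each window of `w`
/// consecutive k-mers, reporting each selected position once in a row.
fn minimizers(seq: &[u8], k: usize, w: usize) -> Vec<(usize, &[u8])> {
    let mut out: Vec<(usize, &[u8])> = Vec::new();
    if k == 0 || w == 0 || seq.len() < k + w - 1 {
        return out; // not enough sequence for a single full window
    }
    // All overlapping k-mers, indexed by their start position.
    let kmers: Vec<&[u8]> = seq.windows(k).collect();
    for start in 0..=(kmers.len() - w) {
        // Leftmost smallest k-mer inside the current window.
        let mut best = start;
        for i in (start + 1)..(start + w) {
            if kmers[i] < kmers[best] {
                best = i;
            }
        }
        // Adjacent windows often share a minimizer; skip the repeat.
        if out.last().map(|&(pos, _)| pos) != Some(best) {
            out.push((best, kmers[best]));
        }
    }
    out
}
```

Calling minimizers(b"ACGTACGT", 3, 2) selects the k-mers at positions 0, 1, 2 and 4. The last two windows both pick the ACG at position 4, which is why it appears only once.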

Parallel Pipelines

Use read_parallel when you need to map a function across batches using multiple threads:

use seqkmer::{read_parallel, FastaReader, Meros, ParallelResult, Reader};

fn main() -> std::io::Result<()> {
    let meros = Meros::new(11, 3, Some(0), None, None);
    let mut reader = FastaReader::from_path("tests/data/test.fasta", 0)?;

    read_parallel(
        &mut reader,
        4, // threads
        &meros,
        |seqs| seqs.len(), // work: count sequences per batch
        |result: &mut ParallelResult<usize>| {
            let mut total = 0;
            while let Some(count) = result.next() {
                total += count.unwrap();
            }
            println!("processed {} sequences", total);
        },
    )?;

    Ok(())
}
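As background on the scoped-thread pattern, here is a minimal sketch that uses only std::thread::scope to map a counting function over in-memory batches. It does not reproduce read_parallel or its internals; the function count_sequences_parallel and the use of Vec<String> batches are assumptions for illustration.

```rust
use std::thread;

/// Hypothetical illustration (not Seqkmer's API): count sequences across
/// batches by splitting the batches among up to `threads` scoped workers.
fn count_sequences_parallel(batches: &[Vec<String>], threads: usize) -> usize {
    let threads = threads.max(1);
    // Ceiling division; at least 1 so that `chunks` never receives 0.
    let chunk_len = ((batches.len() + threads - 1) / threads).max(1);
    thread::scope(|scope| {
        // Scoped threads may borrow `batches` because they are joined
        // before `scope` returns.
        let handles: Vec<_> = batches
            .chunks(chunk_len)
            .map(|part| {
                scope.spawn(move || part.iter().map(|batch| batch.len()).sum::<usize>())
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}
```

Scoped threads avoid the 'static bound of thread::spawn, so workers can read borrowed data without cloning or reference counting. A streaming reader would hand out batches incrementally rather than slicing a pre-loaded vector as this sketch does.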

Feature Overview

Module             Purpose
fasta              FASTA readers (streaming + buffered)
fastq              FASTQ reader with automatic interleaved detection and quality masking
fastx              Format-agnostic wrapper over FASTA/FASTQ readers
reader             Misc IO utilities (gzip detection, trim helpers, file format detection)
parallel           Threaded reader orchestration using scoped thread pools
mmscanner          Minimizer scanning over DNA sequences
feat               K-mer feature helper types (Meros, constants)
utils::OptionPair  Helper enum for representing single vs paired resources
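The gzip and file format detection mentioned for the reader module can be illustrated with a small standalone sketch. It relies on two well-known conventions: gzip streams begin with the bytes 0x1f 0x8b, and FASTA and FASTQ records begin with '>' and '@' respectively. The names SeqFormat, is_gzip and sniff_format are hypothetical and are not taken from Seqkmer.

```rust
/// Hypothetical format tag used only in this sketch.
#[derive(Debug, PartialEq, Eq)]
enum SeqFormat {
    Fasta,
    Fastq,
    Unknown,
}

/// A gzip stream starts with the two magic bytes 0x1f 0x8b.
fn is_gzip(bytes: &[u8]) -> bool {
    bytes.starts_with(&[0x1f, 0x8b])
}

/// Guess the format from the first non-whitespace byte of decompressed data.
fn sniff_format(bytes: &[u8]) -> SeqFormat {
    match bytes.iter().copied().find(|b| !b.is_ascii_whitespace()) {
        Some(b'>') => SeqFormat::Fasta,
        Some(b'@') => SeqFormat::Fastq,
        _ => SeqFormat::Unknown,
    }
}
```

In practice the gzip check would run on the raw file header first, and the format check on the first bytes after decompression.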

Testing

All functionality is covered by unit and doc tests. Run the full suite with:

cargo test

License

Seqkmer is distributed under the terms of the MIT License.

Dependencies

~695KB
~11K SLoC