#minimap2 #short-read #aligner #rsomics #cli #fasta #fastq #cargo-workspace #bioinformatics #bam

bin+lib rsomics-minimap2

Long/short-read aligner — CLI wrapper of minimap2 FFI bindings (Quadrant ②)

1 unstable release

Uses new Rust 2024

new 0.1.0 May 18, 2026

#532 in Biology

MIT/Apache

82KB
1.5K SLoC

rsomics-world

A Cargo workspace of single-binary CLI tools that displace the C/Python/R-era bioinformatics toolchain with modern Rust: fearless parallelism, explicit SIMD, a sane installer (cargo install rsomics-<name>), and no build-system ceremony.

Status: 101 published crates on crates.io, 82 tool binaries across FASTQ (18), BED (19), FASTA (15), VCF (12), GFF (8), BAM (5), and alignment (minimap2 FFI). 97% of tools have automated compat tests against upstream binaries (seqkit, bedtools, samtools, fastp, etc.).

Most upstream tools are single-threaded, memory-inefficient, and written in 2005-era C or pure R. Modern multicore + SIMD + GPU resources sit idle. The goal of rsomics-* is to put them to work, tool by tool, while staying binary-compatible with upstream so existing pipelines can swap in piecewise.

Architecture

Two layers, one workspace, one git repo. See CONVENTIONS.md for the rules; the short version:

  • crates/foundation/: Layer A, library-only primitives (IO, intervals, k-mers, FM-index, alignment cores, stats). A crate is in A iff ≥ 2 tools depend on it.
  • crates/tools/<domain>/: Layer B, where each crate is one operation (rsomics-fastq-trim, rsomics-fasta-stats, rsomics-bam-view, …). The partition is per-function, not per-upstream-binary: Swiss-army-knife tools like samtools get split into view / sort / index / markdup / … crates.
  • Dependency direction is B → A → external, enforced. A never depends on B; B never depends on B; sharing happens through A.
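The layering rules above can be sketched as a small check over a dependency graph. This is an illustrative sketch only, not the workspace's actual enforcement mechanism (the README does not say how the rule is enforced). The crate names, the edge list, and the function names `layering_violations` and `underused_foundation` are all hypothetical. Crates missing from the layer map are assumed to be external, and the edge list is assumed to contain no duplicates.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Layer {
    Foundation, // Layer A: crates/foundation/
    Tool,       // Layer B: crates/tools/<domain>/
}

/// Returns every edge that points at a tool crate. Both stated
/// prohibitions (A never depends on B, B never depends on B) reduce
/// to "nothing may depend on a Layer B crate".
fn layering_violations<'a>(
    layers: &HashMap<&'a str, Layer>,
    deps: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    deps.iter()
        .copied()
        .filter(|(_, to)| layers.get(to) == Some(&Layer::Tool))
        .collect()
}

/// Returns foundation crates with fewer than two tool dependents,
/// which the "in A iff >= 2 tools depend on it" rule would flag.
fn underused_foundation<'a>(
    layers: &HashMap<&'a str, Layer>,
    deps: &[(&'a str, &'a str)],
) -> Vec<&'a str> {
    let mut flagged: Vec<&'a str> = layers
        .iter()
        .filter(|(_, layer)| **layer == Layer::Foundation)
        .map(|(name, _)| *name)
        .filter(|name| {
            let tool_dependents = deps
                .iter()
                .filter(|(from, to)| to == name && layers.get(from) == Some(&Layer::Tool))
                .count();
            tool_dependents < 2
        })
        .collect();
    flagged.sort(); // HashMap iteration order is unspecified
    flagged
}

fn main() {
    // Hypothetical crates and edges, for illustration only.
    let layers = HashMap::from([
        ("foundation-io", Layer::Foundation),
        ("foundation-kmers", Layer::Foundation),
        ("tool-trim", Layer::Tool),
        ("tool-stats", Layer::Tool),
    ]);
    let deps = [
        ("tool-trim", "foundation-io"),
        ("tool-stats", "foundation-io"),
        ("tool-stats", "foundation-kmers"),
        ("tool-trim", "tool-stats"),       // B -> B: forbidden
        ("foundation-io", "external-dep"), // A -> external: allowed
    ];
    println!("violations: {:?}", layering_violations(&layers, &deps));
    println!("underused:  {:?}", underused_foundation(&layers, &deps));
}
```

With this made-up graph, the first check reports the tool-to-tool edge and the second flags foundation-kmers, which has only one tool dependent.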

Per-domain planning lives under docs/:

docs/
├── 00-overview/             Vision, principles, ecosystem survey, benchmarking
├── 01-foundations/          IO formats, compression, indexing, data structures
├── 02-genomics/             DNA alignment, assembly, variant calling, annotation
├── 03-transcriptomics/      Bulk RNA-seq alignment, quantification, DE, splicing
├── 04-single-cell/          scRNA, scATAC, trajectory, integration, spatial
├── 05-epigenomics/          Peak calling, methylation, Hi-C, footprinting
├── 06-metagenomics/         Classification, profiling, MAG assembly, amplicon
├── 07-proteomics-structure/ MS, structure prediction, docking
├── 08-phylogenetics-popgen/ MSA, trees, population genetics
└── 09-workflow-utility/     Workflow engines, containers, visualisation

Each module's README.md is the scope file; the topic files inside carry TODO checklists using the entry schema in CONVENTIONS.md. TODO.md is the flat aggregated view across modules.

Status

Public monorepo workspace. Published pilots:

  • rsomics-fasta-stats — Rust port of seqkit stats (FASTA subset), cargo install rsomics-fasta-stats.
  • rsomics-fastq-trim — fastp's adapter / poly-G / poly-X / fixed-length trim hot path, cargo install rsomics-fastq-trim.

Foundation primitives live under crates/foundation/ (rsomics-common, rsomics-help). The full ~150-crate partition catalog lives in docs/ and TODO.md.

Why Rust?

See docs/00-overview/motivation.md. Short version: memory safety, fearless parallelism (rayon), modern packaging (cargo), explicit SIMD (std::simd), and a maturing scientific stack (ndarray, polars, arrow, candle) make Rust a credible host language for the next generation of bioinformatics tools. Much of the current toolchain is still written in 2005-era C with hand-rolled allocators and brittle Autotools builds.

How to read this repo

  • Start with docs/00-overview/ for context and principles.
  • Browse module READMEs for scope.
  • TODO.md is the cross-module checklist.
  • CONVENTIONS.md covers architecture, the TODO schema, the external-dependency quadrants, license + clean-room rules, and the four first-class platform targets.

Dependencies

~7–13MB
~278K SLoC