1 unstable release
Uses new Rust 2024
| new 0.1.0 | May 15, 2026 |
|---|
#866 in Biology
13KB
281 lines
rsomics-world
A Cargo workspace of single-binary CLI tools that displace the C/Python/R-era
bioinformatics toolchain with modern Rust: fearless parallelism, explicit
SIMD, a sane installer (cargo install rsomics-<name>), no build-system
ceremony.
Most upstream tools are single-threaded, memory-inefficient, and written in
2005-era C or pure R. Modern multicore + SIMD + GPU resources sit idle. The
goal of rsomics-* is to put them to work, tool by tool, while staying
binary-compatible with upstream so existing pipelines can swap in piecewise.
Architecture
Two layers, one workspace, one git repo. See CONVENTIONS.md
for the rules; the short version:
crates/foundation/— Layer A, library-only primitives (IO, intervals, k-mers, FM-index, alignment cores, stats). A crate is in A iff ≥ 2 tools depend on it.crates/tools/— Layer B, each crate is one installable binary (rsomics-fastp,rsomics-bam,rsomics-bwa, …).- Dependency direction is B → A → external, enforced. A never depends on B; B never depends on B; sharing happens through A.
Per-domain planning lives under docs/:
docs/
├── 00-overview/ Vision, principles, ecosystem survey, benchmarking
├── 01-foundations/ IO formats, compression, indexing, data structures
├── 02-genomics/ DNA alignment, assembly, variant calling, annotation
├── 03-transcriptomics/ Bulk RNA-seq alignment, quantification, DE, splicing
├── 04-single-cell/ scRNA, scATAC, trajectory, integration, spatial
├── 05-epigenomics/ Peak calling, methylation, Hi-C, footprinting
├── 06-metagenomics/ Classification, profiling, MAG assembly, amplicon
├── 07-proteomics-structure/ MS, structure prediction, docking
├── 08-phylogenetics-popgen/ MSA, trees, population genetics
└── 09-workflow-utility/ Workflow engines, containers, visualisation
Each module's README.md is the scope file; the topic files inside carry
TODO checklists using the entry schema in CONVENTIONS.md.
TODO.md is the flat aggregated view across modules.
Status
Public monorepo workspace, Phase 2 in progress. The first tool
(rsomics-fastp) lives at crates/tools/rsomics-fastp/;
foundation primitives live under crates/foundation/. Once rsomics-common
and rsomics-fastp are published, both will install via
cargo install rsomics-<name>.
Why Rust?
See docs/00-overview/motivation.md. Short
version: memory safety, fearless parallelism (rayon), modern packaging
(cargo), explicit SIMD (std::simd), and a maturing scientific stack
(ndarray, polars, arrow, candle) make Rust a credible host language
for the next generation of bioinformatics tools — much of which is still
written in 2005-era C with hand-rolled allocators and brittle Autotools
builds.
How to read this repo
- Start with
docs/00-overview/for context and principles. - Browse module READMEs for scope.
TODO.mdis the cross-module checklist.CONVENTIONS.mdcovers architecture, the TODO schema, the external-dependency quadrants, license + clean-room rules, and the four first-class platform targets.
Dependencies
~115–480KB
~11K SLoC