§oxbow
oxbow reads genomic data formats 🧬 as Apache Arrow 🏹.
With the oxbow Rust library, you can serialize native formats into Arrow IPC, stream larger-than-memory files as Arrow RecordBatches with zero-copy over FFI, and more!
⚠️ The Rust API is under active development and is not yet stable. The API may change in future releases.
§Features
- 🚀 Supports commonly used file formats from the htslib/GA4GH and the UCSC ecosystems.
- 🔍 Support for compression, indexing, column projection, and genomic range querying.
- 🔧 Support for nested fields and complex, typed schemas (e.g., SAM tags, VCF INFO and FORMAT fields, AutoSql, etc.).
§Scanners
Scanners are the main interface for reading files. Each scanner is a parser for a specific
format and provides scanning methods that return an iterator implementing the
arrow::record_batch::RecordBatchReader trait.
§Sequence formats
§Alignment formats
- sam: Scan SAM files as Arrow RecordBatches.
- bam: Scan BAM files as Arrow RecordBatches.
- cram: Scan CRAM files as Arrow RecordBatches.
§Variant formats
§Interval feature formats
- bed: Scan BED files as Arrow RecordBatches.
- gtf: Scan GTF files as Arrow RecordBatches.
- gff: Scan GFF files as Arrow RecordBatches.
§UCSC Big Binary Indexed (BBI) formats
- bigbed: Scan BigBed files as Arrow RecordBatches.
- bigwig: Scan BigWig files as Arrow RecordBatches.
- BBI zoom: Scan zoom level summary statistics from BigWig/BigBed as Arrow RecordBatches.
§License
Licensed under MIT or Apache-2.0.