Version 0.1.0 (Nov 5, 2025), 1 unstable release
#483 in Biology, 127 downloads per month, used in 2 crates
1MB, 25K SLoC
onecode-rs
Rust bindings for ONEcode, a simple and efficient data representation format for genomic data.
Overview
ONEcode is a data representation framework designed primarily for genomic data, providing both human-readable ASCII and compressed binary file versions with strongly typed data.
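In the human-readable ASCII form, each record is one text line: a single type character followed by space-separated fields, with strings and lists written as a length and then their contents. The schema lines in the examples, such as `P 3 seq`, use the same layout. The sketch below formats such lines by hand, assuming exactly that layout. `Field` and `format_ascii_line` are hypothetical names, not part of this crate, and the sketch ignores the header and provenance lines a real file also carries.

```rust
/// Hypothetical field kinds used only for this formatting sketch.
pub enum Field<'a> {
    Int(i64),
    Str(&'a str),
    IntList(&'a [i64]),
}

/// Formats one ASCII line: the type character, then space-separated fields.
/// Strings and lists are written as "<length> <contents>" (assumed layout).
pub fn format_ascii_line(line_type: char, fields: &[Field]) -> String {
    let mut out = line_type.to_string();
    for field in fields {
        match field {
            Field::Int(v) => out.push_str(&format!(" {}", v)),
            // Length prefix is the byte length, which equals the character
            // count for plain ASCII strings such as DNA.
            Field::Str(s) => out.push_str(&format!(" {} {}", s.len(), s)),
            Field::IntList(xs) => {
                out.push_str(&format!(" {}", xs.len()));
                for x in xs.iter() {
                    out.push_str(&format!(" {}", x));
                }
            }
        }
    }
    out
}
```

For example, an integer record becomes `T 42` and a DNA record becomes `S 4 acgt`; the binary format stores the same information in compressed form.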
This library provides safe, idiomatic Rust bindings to the ONEcode C library.
Features
- ✅ Read and write ONE files in both ASCII and binary formats
- ✅ Schema validation and creation
- ✅ Provenance and reference tracking
- ✅ Type-safe access to fields (integers, reals, characters, strings, lists)
- ✅ File navigation and statistics
- ✅ Sequence name extraction from embedded GDB in alignment files
- ✅ RAII-based resource management
- ✅ Fully thread-safe: concurrent operations are supported
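Type-safe field access means that asking for an integer from a field that holds a real yields an error or `None` instead of silently reinterpreted bits. The sketch below models that idea in plain Rust. `FieldValue` and its accessors are hypothetical illustrations, not this crate's types; the crate's own field type enumeration is `OneType`.

```rust
/// Hypothetical model of a decoded ONE field; the variants mirror the kinds
/// listed above (integers, reals, characters, strings, lists).
#[derive(Debug, Clone, PartialEq)]
pub enum FieldValue {
    Int(i64),
    Real(f64),
    Char(char),
    Str(String),
    IntList(Vec<i64>),
}

impl FieldValue {
    /// Some(value) only when the field really holds an integer.
    pub fn as_int(&self) -> Option<i64> {
        match self {
            FieldValue::Int(v) => Some(*v),
            _ => None,
        }
    }

    /// Some(value) only when the field really holds a real.
    pub fn as_real(&self) -> Option<f64> {
        match self {
            FieldValue::Real(v) => Some(*v),
            _ => None,
        }
    }

    /// Some(value) only when the field really holds a character.
    pub fn as_char(&self) -> Option<char> {
        match self {
            FieldValue::Char(c) => Some(*c),
            _ => None,
        }
    }

    /// Borrows the string contents without copying them.
    pub fn as_str(&self) -> Option<&str> {
        match self {
            FieldValue::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }

    /// Borrows the list elements as a slice.
    pub fn as_int_list(&self) -> Option<&[i64]> {
        match self {
            FieldValue::IntList(v) => Some(v.as_slice()),
            _ => None,
        }
    }
}
```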
Requirements
System Dependencies
This library uses bindgen to generate Rust bindings from C headers, which requires clang/libclang:
Ubuntu/Debian:
sudo apt-get install llvm-dev libclang-dev clang
Fedora/RHEL:
sudo dnf install clang-devel llvm-devel
macOS:
xcode-select --install # Usually already installed
Arch Linux:
sudo pacman -S clang
For more details, see the bindgen requirements documentation.
Installation
Add this to your Cargo.toml:
[dependencies]
onecode = { git = "https://github.com/pangenome/onecode-rs" }
Usage
Reading a ONE file
use onecode::OneFile;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = OneFile::open_read("data.1seq", None, None, 1)?;
    // Read through the file; read_line() returns '\0' at end of file
    loop {
        let line_type = file.read_line();
        if line_type == '\0' {
            break; // End of file
        }
        match line_type {
            'S' => {
                // Access DNA sequence data
                println!("Sequence line");
            }
            'I' => {
                // Read field 0 of the current line as an integer
                println!("ID: {}", file.int(0));
            }
            _ => {}
        }
    }
    Ok(())
}
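The loop relies on `read_line` returning `'\0'` once the input is exhausted, then dispatching on the line type. To show that sentinel-driven pattern without the C library, the sketch below uses a hypothetical in-memory `AsciiLines` reader over simplified ASCII text. It treats the first character of each line as its type and skips the header and schema handling a real reader performs.

```rust
/// Hypothetical in-memory reader mimicking the read_line contract above:
/// each call advances one line and returns its type character, or '\0'
/// when no lines remain.
pub struct AsciiLines<'a> {
    lines: std::str::Lines<'a>,
    current: &'a str,
}

impl<'a> AsciiLines<'a> {
    pub fn new(text: &'a str) -> Self {
        AsciiLines { lines: text.lines(), current: "" }
    }

    pub fn read_line(&mut self) -> char {
        // Skip blank lines; report '\0' at end of input.
        for line in self.lines.by_ref() {
            if let Some(c) = line.chars().next() {
                self.current = line;
                return c;
            }
        }
        '\0'
    }

    /// Parses whitespace-separated field `i` (after the type char) as an integer.
    pub fn int(&self, i: usize) -> Option<i64> {
        self.current.split_whitespace().nth(i + 1)?.parse().ok()
    }
}

/// Counts 'S' lines and collects the integer from each 'I' line, using the
/// same loop-and-match shape as the example above.
pub fn summarize(text: &str) -> (usize, Vec<i64>) {
    let mut reader = AsciiLines::new(text);
    let mut sequences = 0;
    let mut ids = Vec::new();
    loop {
        let line_type = reader.read_line();
        if line_type == '\0' {
            break;
        }
        match line_type {
            'S' => sequences += 1,
            'I' => {
                if let Some(v) = reader.int(0) {
                    ids.push(v);
                }
            }
            _ => {}
        }
    }
    (sequences, ids)
}
```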
Writing a ONE file
use onecode::{OneFile, OneSchema};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a schema
    let schema_text = "P 3 tst\nO T 1 3 INT\n";
    let schema = OneSchema::from_text(schema_text)?;
    // Open file for writing
    let mut writer = OneFile::open_write_new(
        "output.1tst",
        &schema,
        "tst",
        false, // ASCII format
        1,     // single-threaded
    )?;
    // Add provenance
    writer.add_provenance("myprogram", "1.0", "example command")?;
    // Write data
    writer.set_int(0, 42);
    writer.write_line('T', 0, None);
    // File is automatically closed on drop
    Ok(())
}
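Closing on drop is the RAII pattern listed under Features: a `Drop` implementation releases the resource when the value goes out of scope, so an early return through `?` cannot leak an open file. A minimal sketch of the pattern, using a hypothetical `FakeHandle` and a shared counter in place of a real C file pointer:

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a C file handle; `closed` counts how many
/// times the "close" step has run so the behaviour can be observed.
pub struct FakeHandle {
    closed: Rc<Cell<u32>>,
}

impl FakeHandle {
    pub fn open(closed: Rc<Cell<u32>>) -> Result<FakeHandle, String> {
        Ok(FakeHandle { closed })
    }
}

impl Drop for FakeHandle {
    fn drop(&mut self) {
        // A real binding would call the C library's close function here.
        self.closed.set(self.closed.get() + 1);
    }
}

/// Opens a handle, then fails. `_handle` (unlike a bare `_`) keeps the value
/// alive until the function returns, at which point it is dropped and closed.
pub fn work_that_fails(closed: Rc<Cell<u32>>) -> Result<(), String> {
    let _handle = FakeHandle::open(closed)?;
    Err("simulated write error".to_string())
}
```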
Creating schemas from text
use onecode::OneSchema;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Define schema inline
    let schema_text = r#"
P 3 seq
O S 1 3 DNA
D I 1 3 INT
"#;
    let schema = OneSchema::from_text(schema_text)?;
    // Use schema for file operations
    Ok(())
}
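Each schema line is itself written in the ONE ASCII layout: a kind character (`P` for the primary file type, `O` for the object line type, `D` for other line types), then for `O`/`D` lines the line type being defined, a field count, and each field type name as a length-prefixed string. A minimal sketch that reads that layout, assuming only the three kinds used above; `SchemaLine` and `parse_schema_line` are hypothetical and are not this crate's parser.

```rust
/// Hypothetical representation of one schema line.
#[derive(Debug, PartialEq)]
pub enum SchemaLine {
    /// `P <len> <name>`: the primary file type, e.g. "seq".
    Primary(String),
    /// `O`/`D <char> <n> (<len> <type>)*`: a line type and its field types.
    LineType { object: bool, line_type: char, fields: Vec<String> },
}

/// Reads a length-prefixed string such as "3 INT" and checks the length.
fn take_string<'a, I: Iterator<Item = &'a str>>(tokens: &mut I) -> Option<String> {
    let len: usize = tokens.next()?.parse().ok()?;
    let s = tokens.next()?;
    if s.len() == len {
        Some(s.to_string())
    } else {
        None
    }
}

/// Returns None for blank, unknown, or malformed lines.
pub fn parse_schema_line(line: &str) -> Option<SchemaLine> {
    let mut tokens = line.split_whitespace();
    let kind = tokens.next()?;
    match kind {
        "P" => take_string(&mut tokens).map(SchemaLine::Primary),
        "O" | "D" => {
            let line_type = tokens.next()?.chars().next()?;
            let n: usize = tokens.next()?.parse().ok()?;
            let mut fields = Vec::with_capacity(n);
            for _ in 0..n {
                fields.push(take_string(&mut tokens)?);
            }
            Some(SchemaLine::LineType { object: kind == "O", line_type, fields })
        }
        _ => None,
    }
}
```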
Getting file statistics
use onecode::OneFile;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = OneFile::open_read("data.1seq", None, None, 1)?;
    // Get statistics for a line type
    let (count, max_length, total_length) = file.stats('S')?;
    println!(
        "Sequences: {}, Max length: {}, Total: {}",
        count, max_length, total_length
    );
    Ok(())
}
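The triple returned by `stats` is read here as: the number of lines of that type, the longest list on any single line, and the total list length across all of them (for `S` lines, the longest and combined sequence lengths). That interpretation is an assumption. The sketch below shows the arithmetic over a hypothetical slice of per-line list lengths; `stats_from_lengths` is not a crate function.

```rust
/// Computes (count, max_length, total_length) from the list length carried
/// by each line of one type, mirroring the assumed meaning of `stats`.
pub fn stats_from_lengths(lengths: &[u64]) -> (usize, u64, u64) {
    let count = lengths.len();
    // An empty input has no longest list; report 0 in that case.
    let max_length = lengths.iter().copied().max().unwrap_or(0);
    let total_length: u64 = lengths.iter().sum();
    (count, max_length, total_length)
}
```

Under this reading, `total_length / count` gives the mean sequence length when `count` is non-zero.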
Working with alignment files (.1aln) and sequence names
Alignment files can contain embedded genome database (GDB) information, mapping sequence IDs to names:
use onecode::OneFile;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = OneFile::open_read("alignments.1aln", None, None, 1)?;
    // Get all sequence names (efficient for multiple lookups)
    let seq_names = file.get_all_sequence_names();
    println!("Found {} sequences", seq_names.len());
    // Read alignments and resolve sequence names
    loop {
        let line_type = file.read_line();
        if line_type == '\0' {
            break;
        }
        if line_type == 'A' {
            let query_id = file.int(0);
            let target_id = file.int(3);
            if let (Some(query_name), Some(target_name)) =
                (seq_names.get(&query_id), seq_names.get(&target_id))
            {
                println!("Alignment: {} vs {}", query_name, target_name);
            }
        }
    }
    Ok(())
}
Or look up individual names on-demand:
use onecode::OneFile;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut file = OneFile::open_read("alignments.1aln", None, None, 1)?;
    // Get a specific sequence name by ID
    if let Some(name) = file.get_sequence_name(5) {
        println!("Sequence 5: {}", name);
    }
    Ok(())
}
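The map-based approach is the one recommended above for multiple lookups: it is built once, and each later lookup is a hash probe. The per-ID call suits one-off queries. The sketch below shows the resolution step with a hypothetical `resolve_pair` helper. It assumes the map is a `HashMap<i64, String>`, which is consistent with how the example calls `get(&query_id)` but is not confirmed by the crate docs.

```rust
use std::collections::HashMap;

/// Hypothetical helper: resolves a (query_id, target_id) pair to names,
/// returning None if either ID is missing from the map.
pub fn resolve_pair<'m>(
    names: &'m HashMap<i64, String>,
    query_id: i64,
    target_id: i64,
) -> Option<(&'m str, &'m str)> {
    let query = names.get(&query_id)?;
    let target = names.get(&target_id)?;
    Some((query.as_str(), target.as_str()))
}
```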
API Documentation
Full API documentation is available via cargo doc:
cargo doc --open
Key types:
- OneFile - Main file handle for reading/writing ONE files
- OneSchema - Schema definition and validation
- OneError - Error types
- OneType - Field type enumeration
Building
The library uses bindgen to generate bindings from the C headers automatically and the cc crate to compile the C library.
cargo build --release
Testing
All tests pass when run concurrently under cargo's default parallel test runner:
cargo test
Test suite includes:
- 9 basic functionality tests
- 3 sequence name extraction tests
- 4 thread-safety stress tests (10-50 concurrent threads)
- 2 doc tests
Thread Safety
✅ Fully thread-safe! The library supports concurrent operations without any restrictions.
The upstream ONEcode C library has been updated to use thread-local storage for all global state, making it safe for concurrent use from multiple threads. All operations, including schema creation, file reading, and error handling, work correctly under concurrent load.
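Thread-local storage gives each thread its own copy of what used to be process-wide state, so one thread's last error message cannot overwrite another's. The same technique in Rust, as an illustration rather than the C library's code; `LAST_ERROR`, `set_error`, `last_error` and `run_threads` are hypothetical names:

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Each thread gets its own independent slot for the last error message.
    static LAST_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

pub fn set_error(msg: &str) {
    LAST_ERROR.with(|e| *e.borrow_mut() = Some(msg.to_string()));
}

pub fn last_error() -> Option<String> {
    LAST_ERROR.with(|e| e.borrow().clone())
}

/// Spawns `n` threads that each record a distinct error and read it back.
/// Because the slot is thread-local, every thread sees only its own message.
pub fn run_threads(n: usize) -> Vec<(String, Option<String>)> {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            thread::spawn(move || {
                let msg = format!("error from thread {}", i);
                set_error(&msg);
                // Yield so other threads get a chance to interleave.
                thread::yield_now();
                (msg, last_error())
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

No lock is taken on the error slot, which is why this approach avoids synchronization overhead.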
Architecture
The library is organized into several modules:
- ffi - Raw FFI bindings generated by bindgen
- error - Rust error types and Result wrapper
- types - Rust-friendly type definitions
- file - Safe OneFile wrapper with RAII resource management
- schema - OneSchema management and validation
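This split follows a common layering for C bindings: raw calls report failure through null pointers or status codes, and the safe layer converts those into `Result` values before user code sees them. A self-contained sketch of that conversion, with a hypothetical `raw_open` standing in for a bindgen-generated function (it does not call the real C library). `OpenError` and `SafeFile` are illustrative names, distinct from the crate's `OneError` and `OneFile`, and `Drop` handling is omitted.

```rust
use std::fmt;

mod ffi {
    /// Hypothetical stand-in for a generated C call: returns a non-zero
    /// "handle" on success, or 0 (playing the role of NULL) on failure.
    pub fn raw_open(path: &str) -> usize {
        if path.ends_with(".1seq") {
            1
        } else {
            0
        }
    }
}

/// Error type the safe layer exposes instead of raw status values.
#[derive(Debug, PartialEq)]
pub enum OpenError {
    OpenFailed(String),
}

impl fmt::Display for OpenError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OpenError::OpenFailed(path) => write!(f, "failed to open {}", path),
        }
    }
}

impl std::error::Error for OpenError {}

/// Safe wrapper: a value of this type exists only if opening succeeded.
#[derive(Debug)]
pub struct SafeFile {
    handle: usize,
}

impl SafeFile {
    pub fn open(path: &str) -> Result<SafeFile, OpenError> {
        match ffi::raw_open(path) {
            0 => Err(OpenError::OpenFailed(path.to_string())),
            handle => Ok(SafeFile { handle }),
        }
    }

    pub fn handle(&self) -> usize {
        self.handle
    }
}
```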
Integration with ONEcode
The C library is included as a git subtree in the ONEcode/ directory and compiled automatically during the build process.
To update the ONEcode subtree:
git subtree pull --prefix ONEcode https://github.com/thegenemyers/ONEcode.git main --squash
Performance
- Zero-copy access to data where possible
- Supports parallel reading/writing with configurable thread count
- Binary format provides efficient compression
- Thread-safe without synchronization overhead
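Zero-copy access means an accessor returns a borrow into a buffer the handle already owns rather than allocating a new copy on each call. A minimal illustration with a hypothetical `LineBuffer`, not a crate type; the borrow checker also prevents the buffer from being modified while such a slice is alive.

```rust
/// Hypothetical reader buffer holding the current line's sequence bytes.
pub struct LineBuffer {
    data: Vec<u8>,
}

impl LineBuffer {
    pub fn new(data: &[u8]) -> Self {
        LineBuffer { data: data.to_vec() }
    }

    /// Zero-copy: borrows the bytes already held by the buffer.
    pub fn dna(&self) -> &[u8] {
        &self.data
    }

    /// Copying alternative: allocates a fresh Vec on every call.
    pub fn dna_owned(&self) -> Vec<u8> {
        self.data.clone()
    }
}
```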
License
This Rust wrapper is licensed under MIT OR Apache-2.0.
The ONEcode C library has its own license - see ONEcode/ for details.
Contributing
Contributions are welcome! Please ensure tests pass before submitting PRs:
cargo test
cargo clippy
cargo fmt
Acknowledgments
ONEcode was developed by Gene Myers and Richard Durbin. This Rust wrapper builds on their excellent work to provide safe, idiomatic Rust bindings.
Dependencies: ~0–2.2MB, ~43K SLoC