5 releases
| 0.2.0 | Dec 2, 2025 |
|---|---|
| 0.1.3 | Dec 2, 2025 |
| 0.1.2 | Dec 2, 2025 |
| 0.1.1 | Dec 1, 2025 |
| 0.1.0 | Nov 22, 2025 |
#56 in Database implementations
Used in avila
155KB
3.5K
SLoC
avila-arrow
Zero-copy columnar format with scientific extensions, SIMD acceleration, and native compression.
🚀 Features
- 11 Primitive Arrays: Int8-64, UInt8-64, Float32/64, Boolean, UTF-8
- 4 Scientific Arrays: Quaternions, Complex64, Tensor4D, Spinors
- 25+ Compute Operations: Aggregations, filters, comparisons, sorting, arithmetic (SIMD)
- SIMD Acceleration: AVX2-optimized operations up to 35x faster
- Native Compression: RLE, Delta, Dictionary, Bit-Packing (125x compression!)
- Zero External Dependencies: Only
byteorderrequired - AvilaDB Native: Direct integration with AvilaDB
- Production Ready: 80+ tests passing, proven benchmarks
🎯 Unique in the World
Avila-arrow is the only columnar format with:
- Native Scientific Types: QuaternionArray (SLERP), ComplexArray (FFT), Tensor4D (GR), Spinors (QM)
- Native Compression: 125x RLE, 16x Bit-Packing, 4x Delta - zero external dependencies
- AVX2 SIMD: 35x speedup for compute operations
📦 Installation
[dependencies]
avila-arrow = "0.1"
🔥 Quick Start
use avila_arrow::{Schema, Field, DataType, RecordBatch};
use avila_arrow::array::Int64Array;
use avila_arrow::compute::*;
// Create schema
let schema = Schema::new(vec![
Field::new("id", DataType::Int64),
Field::new("value", DataType::Float64),
]);
// Create arrays
let ids = Int64Array::from(vec![1, 2, 3, 4, 5]);
let values = Float64Array::from(vec![10.0, 20.0, 30.0, 40.0, 50.0]);
// Compute operations
let sum = sum_f64(&values);
let mean = mean_f64(&values).unwrap();
let filtered = filter_f64(&values, >_f64(&values, 25.0))?;
println!("Sum: {}, Mean: {}, Filtered: {:?}", sum, mean, filtered.values());
🧪 Scientific Computing
use avila_arrow::scientific::*;
// Quaternion arrays for spacecraft orientation
let q1 = Quaternion::from_axis_angle([0.0, 0.0, 1.0], PI / 2.0);
let q2 = Quaternion::from_axis_angle([0.0, 0.0, 1.0], PI);
let array1 = QuaternionArray::new(vec![q1; 1000]);
let array2 = QuaternionArray::new(vec![q2; 1000]);
// SLERP interpolation for smooth rotation
let interpolated = array1.slerp(&array2, 0.5).unwrap();
// Complex arrays for FFT
let signal = ComplexArray::new(vec![
Complex64::new(1.0, 0.0),
Complex64::new(0.0, 1.0),
]);
let magnitudes = signal.magnitude();
let phases = signal.phase();
🗜️ Native Compression (Zero External Dependencies!)
use avila_arrow::compression::*;
// RLE: 125x compression for repeated values
let data = vec![42u8; 10000];
let encoded = rle::encode(&data).unwrap();
// 10000 bytes -> 80 bytes!
// Delta: 4x compression for timestamps
let timestamps: Vec<i64> = (0..10000).map(|i| 1700000000 + i * 1000).collect();
let encoded = delta::encode_i64(×tamps).unwrap();
// Bit-Packing: 16x compression for small integers
let small_ints: Vec<i64> = (0..10000).map(|i| i % 16).collect();
let bit_width = bitpack::detect_bit_width(&small_ints); // 4 bits
let packed = bitpack::pack(&small_ints, bit_width).unwrap();
// Dictionary: Optimal for low cardinality
let mut encoder = DictionaryEncoderI64::new();
for i in 0..10000 {
encoder.encode(i % 20); // Only 20 unique values
}
let (dict, indices) = encoder.finish();
Compression Benchmarks
| Codec | Best For | Compression Ratio | Example |
|---|---|---|---|
| RLE | Repeated values | 125x | [1,1,1,...] |
| Bit-Pack | Small integers (0-15) | 16x | Flags, counters |
| Delta | Sequential data | 4x | Timestamps, IDs |
| Dictionary | Low cardinality | 1-10x | Categories, enums |
All codecs are 100% native Rust - no external dependencies!
⚡ SIMD Performance
Avila-arrow uses AVX2 intrinsics for hardware-accelerated operations with proven speedups:
use avila_arrow::compute::*;
let data = Float64Array::from((0..1_000_000).map(|i| i as f64).collect::<Vec<_>>());
// Automatically uses SIMD when AVX2 is available
let sum = sum_f64(&data); // 4.24x faster than scalar
📊 Benchmarks (100K-1M elements)
Basic Operations:
| Operation | Size | Scalar | SIMD | Speedup |
|---|---|---|---|---|
| Sum | 100K | 61.4μs | 14.5μs | 4.24x |
| Add | 10K | 38.8μs | 4.4μs | 8.81x |
| Multiply | 100 | 856ns | 24.4ns | 35x |
| Subtract | 1K | 4.64μs | 611ns | 7.59x |
| Divide | 10K | 76.3μs | 34.7μs | 2.20x |
| Sqrt | 1M | 8.67ms | 4.98ms | 1.74x |
| FMA | 10K | 54.9μs | 9.43μs | 5.82x |
Complex Pipelines (3 operations):
| Size | Scalar | SIMD | Speedup |
|---|---|---|---|
| 10K | 99.3μs | 24.7μs | 4.02x |
| 100K | 1.03ms | 586μs | 1.75x |
| 1M | 12.0ms | 10.8ms | 1.11x |
Memory Throughput:
| Elements | Scalar | SIMD | Speedup |
|---|---|---|---|
| 100K | 61.4μs | 14.5μs | 4.24x |
| 1M | 721μs | 292μs | 2.47x |
Note: Benchmarks run on Intel AVX2 CPU. SIMD excels at small-medium datasets (100-100K). For 10M+ elements, consider parallel processing.
🎓 Examples
See examples/ directory:
basic.rs- Arrays and RecordBatchscientific.rs- Quaternions, Complex, Tensorscompression.rs- Native compression codecs (125x!)ipc.rs- Serialization (coming soon)
Run with:
cargo run --example compression
🧬 Use Cases
- Aerospace: Spacecraft orientation tracking with quaternions
- Signal Processing: FFT analysis with complex arrays
- Physics: Relativistic simulations with tensors
- Quantum Computing: State vectors with spinors
- Data Analytics: High-performance columnar analytics
🛠️ Features
[dependencies.avila-arrow]
version = "0.1"
features = ["scientific", "compression", "ipc"]
scientific(default): Scientific array typescompression: Compression supportipc: Arrow IPC formataviladb: AvilaDB integration
📈 Roadmap
- Primitive arrays (Int8-64, UInt8-64, Float32/64)
- Scientific arrays (Quaternion, Complex, Tensor4D, Spinor)
- Compute kernels (sum, mean, filter, sort, arithmetic)
- SIMD acceleration (AVX2 with sub, div, sqrt, fma)
- Native compression (RLE, Delta, Dictionary, Bit-Packing) 🆕
- Comprehensive benchmarks (35x compute, 125x compression)
- Arrow IPC format compatibility
- GPU acceleration (CUDA/ROCm)
- Distributed computing support
- AVX-512 support for next-gen CPUs
🤝 Contributing
Contributions welcome! Please open an issue or PR.
📄 License
Dual licensed under MIT OR Apache-2.0.
🌟 Credits
Built with ❤️ by avilaops for the Brazilian scientific computing community.
Status: v0.2.0 - 80+ tests passing ✅ | 35x SIMD ✅ | 125x Compression ✅
Dependencies
~175KB