
#parallel-execution #deferred-execution #simd #rayon

bin+lib leopard_vec

A high-performance parallelized vector container with deferred execution for bulk parallel operations

1 unstable release

new 0.1.0 Nov 30, 2025

#274 in Concurrency

Apache-2.0

66KB
824 lines

Leopard Vec

A high-performance parallelized vector container library for Rust with deferred execution.

Overview

Leopard Vec provides LVec, a parallel vector container that records operations and executes them in a single bulk parallel pass. Batching the operations this way is intended to reduce thread pool overhead, which makes the design well suited to compute-intensive workloads.

Key Features

  • Deferred Execution: All operations are recorded between q.start() and q.end(), then executed in one parallel batch
  • Type-Agnostic Queue: A single LQueue can manage LVec<T> of different types (u32, f64, etc.)
  • SIMD-Style Masking: Boolean masks with blend, select, and masked operations
  • Operator Overloading: Natural syntax with +, -, *, / operators
  • Dependency Graph: Operations are automatically ordered based on data dependencies
  • Zero-Copy Where Possible: Uses Arc for efficient data sharing
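
The first two features can be illustrated with a minimal, std-only sketch. It is not the crate's implementation: `SketchQueue` and `run_demo` are hypothetical names, and the queue simply stores type-erased closures, which is one plausible way for a single queue to hold work on buffers of different element types. It is single-threaded (`Rc`/`RefCell`) to keep the idea visible.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a recording queue: it stores type-erased
/// closures and runs them only when `end` is called.
struct SketchQueue {
    recording: bool,
    ops: Vec<Box<dyn FnOnce()>>,
}

impl SketchQueue {
    fn new() -> Self {
        SketchQueue { recording: false, ops: Vec::new() }
    }

    fn start(&mut self) {
        self.recording = true;
    }

    /// Record an operation without running it.
    fn record(&mut self, op: impl FnOnce() + 'static) {
        assert!(self.recording, "record() called outside start()/end()");
        self.ops.push(Box::new(op));
    }

    /// Run every recorded operation in recording order; returns how many ran.
    fn end(&mut self) -> usize {
        self.recording = false;
        let ops = std::mem::take(&mut self.ops);
        let count = ops.len();
        for op in ops {
            op();
        }
        count
    }
}

fn run_demo() -> (Vec<u32>, Vec<f64>, usize) {
    let ints = Rc::new(RefCell::new(Vec::<u32>::new()));
    let floats = Rc::new(RefCell::new(Vec::<f64>::new()));

    let mut q = SketchQueue::new();
    q.start();

    // One queue, two element types: each closure hides its own type.
    let ints_handle = Rc::clone(&ints);
    q.record(move || ints_handle.borrow_mut().extend((0..4u32).map(|i| i * 2)));

    let floats_handle = Rc::clone(&floats);
    q.record(move || floats_handle.borrow_mut().extend((0..4).map(|i| i as f64 * 0.5)));

    // Nothing has run yet: both buffers are still empty here.
    assert!(ints.borrow().is_empty() && floats.borrow().is_empty());

    let executed = q.end();
    let ints_out = ints.borrow().clone();
    let floats_out = floats.borrow().clone();
    (ints_out, floats_out, executed)
}

fn main() {
    let (ints, floats, executed) = run_demo();
    println!("ran {executed} ops: {ints:?} {floats:?}");
}
```

The buffers stay empty until `end()` runs, which mirrors the recorded-then-executed behaviour described for `q.start()` and `q.end()`.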

Installation

Add to your Cargo.toml:

[dependencies]
leopard_vec = "0.1.0"

Quick Start

use leopard_vec::{LQueue, LVec, LMask};

fn main() {
    // Create a queue
    let q = LQueue::new();

    // Create vectors (not yet initialized)
    let x: LVec<f64> = q.lvec_with_capacity(1000);
    let y: LVec<f64> = q.lvec_with_capacity(1000);

    // Start recording operations
    q.start();

    // All operations are recorded, not executed
    let x = x.fill_with(|i| i as f64);
    let y = y.fill_with(|i| (i * 2) as f64);
    let z = &x * &y + &x;  // Operator overloading works!

    // Execute all operations in one parallel batch
    q.end();

    // Retrieve results
    let result = z.materialize().unwrap();
    println!("z[0..5] = {:?}", &result[0..5]);
}

API Overview

LQueue

The operation queue that records and executes operations.

let q = LQueue::new();

// Create vectors
let vec: LVec<f64> = q.lvec();                    // Default capacity (128)
let vec: LVec<f64> = q.lvec_with_capacity(1000);  // Custom capacity

// Recording session
q.start();           // Begin recording
// ... operations ...
q.end();             // Execute all recorded operations

q.is_recording();    // Check if currently recording

LVec Operations

All operations must be called between q.start() and q.end().

Initialization

let a = vec.fill(42.0);                    // Fill with constant
let b = vec.fill_with(|i| i as f64);       // Fill with closure

Transformation

let mapped = a.map(|_i, val| val * 2.0);   // Transform each element (index unused here)

let branched = a.map_where(
    |_i, val| *val > 10.0,  // Condition
    |_i, val| val * 2.0,    // If true
    |_i, val| val + 1.0,    // If false
);
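
To make the branching semantics concrete, here is a sketch of what `map_where` computes, written against a plain slice. `map_where_sketch` is a hypothetical helper, not part of leopard_vec; it assumes each closure receives the element index and a reference to the value, as the snippet above suggests, and it runs sequentially and eagerly.

```rust
/// Apply `if_true` where `cond` holds and `if_false` elsewhere.
/// Hypothetical helper on a plain slice; not part of leopard_vec.
fn map_where_sketch<T, C, A, B>(data: &[T], cond: C, if_true: A, if_false: B) -> Vec<T>
where
    C: Fn(usize, &T) -> bool,
    A: Fn(usize, &T) -> T,
    B: Fn(usize, &T) -> T,
{
    data.iter()
        .enumerate()
        .map(|(i, val)| if cond(i, val) { if_true(i, val) } else { if_false(i, val) })
        .collect()
}

fn main() {
    let a = vec![5.0_f64, 10.0, 20.0];
    let out = map_where_sketch(
        &a,
        |_i, val| *val > 10.0, // Condition
        |_i, val| val * 2.0,   // If true
        |_i, val| val + 1.0,   // If false
    );
    println!("{out:?}");
}
```

Only the last element exceeds 10.0, so it is doubled while the first two are incremented.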

Arithmetic Operators

let sum = &a + &b;
let diff = &a - &b;
let prod = &a * &b;
let quot = &a / &b;
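
One way operators can record work instead of performing it is to have them build an expression tree that is evaluated later. The sketch below shows that general technique with std only; `Expr`, `Lazy`, and `evaluate` are hypothetical names, and nothing here is taken from the crate's source.

```rust
use std::ops::{Add, Mul};
use std::rc::Rc;

/// Hypothetical expression node: operators build a tree instead of computing.
#[derive(Debug)]
enum Expr {
    Data(Vec<f64>),
    Add(Rc<Expr>, Rc<Expr>),
    Mul(Rc<Expr>, Rc<Expr>),
}

/// Cheap handle to a node; cloning shares the node rather than copying data.
#[derive(Debug, Clone)]
struct Lazy(Rc<Expr>);

impl Lazy {
    fn data(values: Vec<f64>) -> Self {
        Lazy(Rc::new(Expr::Data(values)))
    }

    /// Walk the recorded tree and compute the result.
    fn evaluate(&self) -> Vec<f64> {
        eval(&self.0)
    }
}

fn eval(e: &Expr) -> Vec<f64> {
    match e {
        Expr::Data(v) => v.clone(),
        Expr::Add(l, r) => eval(l).into_iter().zip(eval(r)).map(|(a, b)| a + b).collect(),
        Expr::Mul(l, r) => eval(l).into_iter().zip(eval(r)).map(|(a, b)| a * b).collect(),
    }
}

// Operators on references leave both operands usable afterwards.
impl Add for &Lazy {
    type Output = Lazy;
    fn add(self, rhs: Self) -> Lazy {
        Lazy(Rc::new(Expr::Add(Rc::clone(&self.0), Rc::clone(&rhs.0))))
    }
}

impl Mul for &Lazy {
    type Output = Lazy;
    fn mul(self, rhs: Self) -> Lazy {
        Lazy(Rc::new(Expr::Mul(Rc::clone(&self.0), Rc::clone(&rhs.0))))
    }
}

// Needed for chains such as `&x * &y + &x`, where the left operand is owned.
impl Add<&Lazy> for Lazy {
    type Output = Lazy;
    fn add(self, rhs: &Lazy) -> Lazy {
        &self + rhs
    }
}

fn main() {
    let x = Lazy::data(vec![0.0, 1.0, 2.0, 3.0]);
    let y = Lazy::data(vec![0.0, 2.0, 4.0, 6.0]);
    let z = &x * &y + &x; // Only builds the tree
    println!("{:?}", z.evaluate()); // Computation happens here
}
```

Implementing the traits for references is what allows the `&a + &b` syntax without consuming the operands.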

Masking Operations

// `len` is the number of elements the mask covers
let mask = LMask::from_fn(len, |i| i >= 5);

let blended = a.blend(&b, &mask);          // a where false, b where true
let selected = LVec::select(&mask, &a, &b); // Same as blend
let applied = a.masked_apply(&mask, |_i, v| v * 2.0);
let filled = a.masked_fill(&mask, 999.0);
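
The blend rule stated in the comment above ("a where false, b where true") can be written out on plain slices as follows. `blend_sketch` and `masked_fill_sketch` are hypothetical helpers; in particular, the assumption that `masked_fill` writes the value where the mask is true is a guess at the semantics, not something the README states.

```rust
/// Element-wise blend: take `a[i]` where the mask is false and `b[i]` where it is true.
fn blend_sketch<T: Copy>(a: &[T], b: &[T], mask: &[bool]) -> Vec<T> {
    assert!(a.len() == b.len() && a.len() == mask.len(), "length mismatch");
    a.iter()
        .zip(b)
        .zip(mask)
        .map(|((&x, &y), &m)| if m { y } else { x })
        .collect()
}

/// Assumed semantics: replace `a[i]` with `value` where the mask is true.
fn masked_fill_sketch<T: Copy>(a: &[T], mask: &[bool], value: T) -> Vec<T> {
    assert_eq!(a.len(), mask.len(), "length mismatch");
    a.iter().zip(mask).map(|(&x, &m)| if m { value } else { x }).collect()
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [10.0, 20.0, 30.0, 40.0];
    let mask: Vec<bool> = (0..4).map(|i| i >= 2).collect();
    println!("{:?}", blend_sketch(&a, &b, &mask));
    println!("{:?}", masked_fill_sketch(&a, &mask, 999.0));
}
```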

LMask

Boolean masks for conditional operations.

// Creation
let mask = LMask::new(len, true);           // All true
let mask = LMask::from_fn(len, |i| i % 2 == 0);  // Pattern

// Logical operations
let and = &mask_a & &mask_b;
let or = &mask_a | &mask_b;
let xor = &mask_a ^ &mask_b;
let not = !&mask_a;

// Inspection
mask.len();
mask.as_slice();
mask[index];
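
For readers unfamiliar with how `&mask_a & &mask_b` can compile, the sketch below shows the standard-library operator traits implemented for references to a small mask type. `MaskSketch` is a hypothetical stand-in, not the crate's `LMask`, and only AND and NOT are shown; OR and XOR would follow the same pattern with `BitOr` and `BitXor`.

```rust
use std::ops::{BitAnd, Not};

/// Hypothetical stand-in for a boolean mask; not the crate's `LMask`.
#[derive(Debug, Clone, PartialEq)]
struct MaskSketch(Vec<bool>);

impl MaskSketch {
    fn from_fn(len: usize, f: impl Fn(usize) -> bool) -> Self {
        MaskSketch((0..len).map(f).collect())
    }
}

// Implementing the operator for references leaves both operands usable afterwards.
impl BitAnd for &MaskSketch {
    type Output = MaskSketch;
    fn bitand(self, rhs: Self) -> MaskSketch {
        assert_eq!(self.0.len(), rhs.0.len(), "length mismatch");
        MaskSketch(self.0.iter().zip(&rhs.0).map(|(&l, &r)| l && r).collect())
    }
}

impl Not for &MaskSketch {
    type Output = MaskSketch;
    fn not(self) -> MaskSketch {
        MaskSketch(self.0.iter().map(|&b| !b).collect())
    }
}

fn main() {
    let even = MaskSketch::from_fn(4, |i| i % 2 == 0); // true, false, true, false
    let upper = MaskSketch::from_fn(4, |i| i >= 2);    // false, false, true, true
    println!("{:?}", &even & &upper);
    println!("{:?}", !&even);
}
```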

Retrieving Results

After q.end(), use materialize() to get the computed data:

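if let Some(data) = result.materialize() {
    // Print at most the first 10 elements; a fixed 0..10 range would panic on shorter vectors
    println!("{:?}", &data[..data.len().min(10)]);
}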
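
Why would `materialize()` return an `Option`? One plausible reason is that a pending handle has no data until the queue has executed. The sketch below models that with a write-once slot from the standard library; `PendingSketch` and `fulfil` are hypothetical names, and the crate may store its results quite differently.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical pending handle: the slot stays empty until execution fills it.
struct PendingSketch<T> {
    slot: Arc<OnceLock<Arc<Vec<T>>>>,
}

impl<T> PendingSketch<T> {
    fn new() -> Self {
        PendingSketch { slot: Arc::new(OnceLock::new()) }
    }

    /// Called by the executor once the data has been computed.
    /// Returns false if a result was already stored.
    fn fulfil(&self, data: Vec<T>) -> bool {
        self.slot.set(Arc::new(data)).is_ok()
    }

    /// `None` before execution, a cheap `Arc` clone of the data afterwards.
    fn materialize(&self) -> Option<Arc<Vec<T>>> {
        self.slot.get().cloned()
    }
}

fn main() {
    let z: PendingSketch<f64> = PendingSketch::new();
    println!("before: {:?}", z.materialize().is_some());
    z.fulfil(vec![0.0, 3.0, 10.0]);
    if let Some(data) = z.materialize() {
        // Clamp the range so short vectors do not cause an out-of-bounds panic.
        println!("after: {:?}", &data[..data.len().min(10)]);
    }
}
```

Cloning the `Arc` hands out shared access to the same allocation, so retrieving a result does not copy the vector.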

How It Works

  1. Recording Phase: Between q.start() and q.end(), operations create "pending" LVec instances and push operation descriptors to the queue.

  2. Dependency Analysis: The queue builds a dependency graph to determine execution order.

  3. Parallel Execution: At q.end(), operations are executed level-by-level. Each operation uses Rayon's into_par_iter() for parallel element-wise computation.

  4. Result Retrieval: materialize() returns an Option holding the computed Arc<Vec<T>> for any pending LVec.
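
Steps 2 and 3 describe level-by-level execution. A common way to derive such levels is to give each operation a level one higher than the deepest of its inputs; operations on the same level do not depend on each other. The sketch below shows that technique under the assumption that operations are recorded after their inputs. `dependency_levels` is a hypothetical function, and the crate's actual scheduler may differ.

```rust
/// Group operations into levels: an operation's level is one more than the
/// highest level among its dependencies (0 if it has none).
/// `deps[i]` lists the indices of the operations that operation `i` reads from.
/// Assumes every dependency index is smaller than `i` (recorded earlier).
fn dependency_levels(deps: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let mut level_of = vec![0usize; deps.len()];
    let mut levels: Vec<Vec<usize>> = Vec::new();
    for (op, inputs) in deps.iter().enumerate() {
        let level = inputs
            .iter()
            .map(|&d| {
                assert!(d < op, "dependency must be recorded earlier");
                level_of[d] + 1
            })
            .max()
            .unwrap_or(0);
        level_of[op] = level;
        if levels.len() <= level {
            levels.resize_with(level + 1, Vec::new);
        }
        levels[level].push(op);
    }
    levels
}

fn main() {
    // Operations from the Quick Start example:
    //   0: x = fill_with(..)   1: y = fill_with(..)
    //   2: t = x * y           3: z = t + x
    let deps = vec![vec![], vec![], vec![0, 1], vec![2, 0]];
    for (level, ops) in dependency_levels(&deps).iter().enumerate() {
        println!("level {level}: ops {ops:?}");
    }
}
```

Here the two fills share level 0, the product sits on level 1, and the final sum on level 2, so an executor could run each level only after the previous one has finished.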

Performance Benefits

  • Minimized Thread Pool Overhead: One parallel execution context instead of many
  • Optimized Scheduling: Dependency-aware execution order
  • Memory Efficiency: Arc-based sharing avoids unnecessary copies
  • Bulk Operations: Better cache utilization through batched execution
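
The Arc-based sharing point can be demonstrated with the standard library alone. In this sketch, worker threads receive cloned `Arc` handles to the inputs, so the input vectors are never copied. `parallel_product` is a hypothetical function, and `std::thread` is used only as a stand-in for Rayon: spawning threads on every call is exactly the kind of cost a reused thread pool avoids.

```rust
use std::sync::Arc;
use std::thread;

/// Multiply two shared buffers element-wise using up to `workers` threads.
/// Each thread receives cloned `Arc` handles, so the input data is never copied.
fn parallel_product(a: &Arc<Vec<f64>>, b: &Arc<Vec<f64>>, workers: usize) -> Vec<f64> {
    assert_eq!(a.len(), b.len(), "length mismatch");
    let workers = workers.max(1);
    // Ceiling division, clamped to 1 so `step_by` never receives 0.
    let chunk = ((a.len() + workers - 1) / workers).max(1);
    let handles: Vec<_> = (0..a.len())
        .step_by(chunk)
        .map(|start| {
            let (a, b) = (Arc::clone(a), Arc::clone(b));
            thread::spawn(move || {
                let end = (start + chunk).min(a.len());
                (start..end).map(|i| a[i] * b[i]).collect::<Vec<f64>>()
            })
        })
        .collect();
    // Joining in spawn order keeps the output in index order.
    handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker panicked"))
        .collect()
}

fn main() {
    let x = Arc::new((0..8).map(|i| i as f64).collect::<Vec<_>>());
    let y = Arc::new(vec![2.0; 8]);
    let alias = Arc::clone(&x);
    println!("same allocation: {}", Arc::ptr_eq(&x, &alias));
    println!("{:?}", parallel_product(&x, &y, 3));
}
```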

Requirements

  • Rust 1.70+
  • Rayon 1.8+

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Attribution Requirement

If you use this library in your project, you must include attribution in your documentation. Example:

This project uses leopard_vec by Majid Abdelilah
https://github.com/MajidAbdelilah/Leopard_rust

Author

Majid Abdelilah - GitHub

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

Dependencies

~1.5MB
~24K SLoC