branches provides branch prediction hints, control flow assumptions, abort, and manual data prefetch (read & write) helpers for performance optimization, using stable Rust primitives where available and falling back to core::intrinsics on nightly.
To use branches, run the following command:

cargo add branches

For a no_std environment, disable the default features with the following command:

cargo add branches --no-default-features

The following functions are provided by branches:
- likely(b: bool) -> bool: Returns the input value and hints to the compiler that the condition is likely to be true.
- unlikely(b: bool) -> bool: Returns the input value and hints to the compiler that the condition is unlikely to be true.
- assume(b: bool): Assumes that the input condition is always true and causes undefined behavior if it is not. On stable Rust, this function uses core::hint::unreachable_unchecked() to achieve the same effect.
- abort(): Aborts the execution of the process immediately and without any cleanup.
- prefetch_read_data<T, const LOCALITY: i32>(addr: *const T): Hints the CPU to load the data at addr into cache for an upcoming read. LOCALITY selects cache behavior (e.g. 0 = L1, 1 = L2, 2 = L3, other = non-temporal or arch default).
- prefetch_write_data<T, const LOCALITY: i32>(addr: *const T): Hints the CPU to load a cache line for an upcoming write. Same LOCALITY semantics as above.
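To make the semantics of the hint functions concrete, here is a minimal std-only sketch of how such hints can be expressed on stable Rust. The names `cold_path`, `likely_sketch`, `unlikely_sketch`, `assume_sketch`, and `checked_div` are hypothetical and exist only for this illustration. The `#[cold]` technique is one common approach and is an assumption here, not necessarily what branches does internally; the `assume` variant follows the `unreachable_unchecked()` behavior described above.

```rust
// Hypothetical stand-ins for illustration; the crate's internals may differ.

/// Empty function marked cold: a branch that calls it is hinted as rarely taken.
#[cold]
#[inline]
fn cold_path() {}

/// Sketch of a `likely`-style hint: returns `b` unchanged.
#[inline(always)]
pub fn likely_sketch(b: bool) -> bool {
    if !b {
        cold_path();
    }
    b
}

/// Sketch of an `unlikely`-style hint: returns `b` unchanged.
#[inline(always)]
pub fn unlikely_sketch(b: bool) -> bool {
    if b {
        cold_path();
    }
    b
}

/// Sketch of an `assume`-style hint built on `unreachable_unchecked`.
///
/// # Safety
/// The caller must guarantee that `cond` is true; otherwise behavior is undefined.
#[inline(always)]
pub unsafe fn assume_sketch(cond: bool) {
    if !cond {
        // SAFETY: the caller promised that `cond` holds, so this is never reached.
        unsafe { core::hint::unreachable_unchecked() }
    }
}

/// Usage: a division whose error path is marked as unlikely.
pub fn checked_div(a: u32, b: u32) -> Option<u32> {
    if unlikely_sketch(b == 0) {
        return None;
    }
    // SAFETY: `b != 0` was checked just above.
    unsafe { assume_sketch(b != 0) };
    Some(a / b)
}
```

The hint functions never change the value of the condition, so removing them must leave program behavior identical. Only the `assume`-style function can introduce undefined behavior, and only when its condition is false.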
Guidelines:
- Only prefetch a small distance ahead (tune empirically).
- Too-far or excessive prefetching can evict useful cache lines.
- Never rely on prefetch for correctness; it is purely a performance hint.
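The guidelines above can be sketched as a loop whose prefetch distance is a compile-time parameter, so that it can be tuned by measurement. To stay dependency-free, this sketch uses a hypothetical local helper `prefetch_read` built on `core::arch::x86_64::_mm_prefetch` (a no-op on other architectures); with branches you would call prefetch_read_data in its place. The function name `sum_with_prefetch`, the `LINES_AHEAD` parameter, and the 128-byte line size are assumptions made for illustration.

```rust
/// Hypothetical helper: issue a read prefetch on x86_64.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
fn prefetch_read<T>(ptr: *const T) {
    use core::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
    // SAFETY: a prefetch is only a hint; it does not fault on any address.
    unsafe { _mm_prefetch::<_MM_HINT_T0>(ptr as *const i8) };
}

/// Fallback for other architectures: do nothing.
#[cfg(not(target_arch = "x86_64"))]
#[inline(always)]
fn prefetch_read<T>(_ptr: *const T) {}

// Assumed cache-line size; adjust for the target CPU.
const CACHE_LINE_BYTES: usize = 128;
const ELEMS_PER_LINE: usize = CACHE_LINE_BYTES / core::mem::size_of::<u64>();

/// Sums `data`, issuing one prefetch per cache-line-sized chunk,
/// `LINES_AHEAD` chunks in front of the chunk being processed.
pub fn sum_with_prefetch<const LINES_AHEAD: usize>(data: &[u64]) -> u64 {
    let mut sum = 0u64;
    for (n, chunk) in data.chunks(ELEMS_PER_LINE).enumerate() {
        // `get` returns None near the end, so no out-of-bounds reference is formed.
        if let Some(ahead) = data.get((n + LINES_AHEAD) * ELEMS_PER_LINE) {
            prefetch_read(ahead as *const u64);
        }
        for &x in chunk {
            sum = sum.wrapping_add(x);
        }
    }
    sum
}
```

The result is the same for every value of `LINES_AHEAD`, which reflects the last guideline: the prefetch affects only timing, never correctness. Trying a few small distances such as `sum_with_prefetch::<1>` and `sum_with_prefetch::<4>` under a benchmark is one way to tune it.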
Here's an example of how you can use likely to optimize a function:
use branches::likely;

pub fn factorial(n: usize) -> usize {
    if likely(n > 1) {
        n * factorial(n - 1)
    } else {
        1
    }
}

Manual prefetch in a loop:
use branches::{prefetch_read_data, prefetch_write_data};

pub fn accumulate(a: &[u64], out: &mut [u64]) -> u64 {
    // Prefetch the start of both buffers. Pass the data pointers: `&a` would
    // point at the slice reference itself rather than at its elements.
    prefetch_read_data::<_, 0>(a.as_ptr());
    prefetch_write_data::<_, 0>(out.as_ptr());
    let mut sum = 0u64;
    let len = a.len().min(out.len());
    // Process in cache-line sized blocks (assume 128-byte cache line)
    const CACHE_LINE_BYTES: usize = 128;
    const ELEMS_PER_LINE: usize = CACHE_LINE_BYTES / core::mem::size_of::<u64>();
    let mut i = 0;
    while i < len {
        // Prefetch next cache line (read + future write)
        let next = i + ELEMS_PER_LINE;
        if next < len {
            prefetch_read_data::<_, 0>(&a[next]);
            prefetch_write_data::<_, 0>(&out[next]);
        }
        // Inner loop over one cache line
        let end = next.min(len);
        // The compiler can (partially) unroll this inner loop because (end - i)
        // is bounded by ELEMS_PER_LINE. For the final, shorter chunk (< ELEMS_PER_LINE)
        // it emits the scalar fallback.
        for j in i..end {
            sum += a[j];
            out[j] = sum;
        }
        i = end;
    }
    sum
}

By correctly using the functions provided by branches, you can achieve a 10-20% improvement in the performance of your algorithms, although the actual gain depends on the workload and the target CPU, so measure before and after.
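One way to check a figure like 10-20% on your own workload is a small timing harness. The following is a minimal std-only sketch; `time_it` is a hypothetical helper introduced for this illustration and is not part of branches.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the last result together with
/// the total elapsed wall-clock time.
pub fn time_it<R>(iters: u32, mut f: impl FnMut() -> R) -> (R, Duration) {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    // `black_box` discourages the optimizer from discarding the work.
    let mut last = black_box(f());
    for _ in 1..iters {
        last = black_box(f());
    }
    (last, start.elapsed())
}
```

A comparison would time the unhinted and the hinted variant of the same function with identical inputs and iteration counts, then compare the two durations. Wall-clock timings like this are noisy, so repeat the runs, and prefer a dedicated benchmarking harness for anything beyond a quick check.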
branches is licensed under the MIT license. See the LICENSE file for more information.