perf[buffer]: iteration for fallible operations with validity #8120

Open
joseph-isaacs wants to merge 25 commits into develop from ji/fast-iter-valid
@joseph-isaacs

@joseph-isaacs joseph-isaacs commented May 27, 2026

Contributor

Currently we (and arrow) handle fallible operations with scalar (non-SIMD) code.

This PR adds a trait and methods for fast SIMD checked operations (including cast). I verified elsewhere that checked_add benefits as well.
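
To illustrate the general idea behind the description above, here is a minimal sketch of a checked narrowing cast that records overflow in a flag and tests it once per chunk, instead of branching on every element. The function names and the 64-lane chunk width are assumptions made for illustration, not the PR's actual API, and whether the inner loop vectorizes depends on the compiler and target.

```rust
/// Hypothetical helper (not from the PR): cast i64 -> i32, failing if any value overflows.
/// Overflow is accumulated in a flag and tested once per chunk, so the inner loop
/// has no early exit.
pub fn checked_cast_chunked(src: &[i64]) -> Option<Vec<i32>> {
    const LANES: usize = 64; // assumed chunk width, for illustration only
    let mut out = vec![0i32; src.len()];
    for (src_chunk, out_chunk) in src.chunks(LANES).zip(out.chunks_mut(LANES)) {
        let mut overflow = false;
        for (dst, &v) in out_chunk.iter_mut().zip(src_chunk) {
            let narrowed = v as i32; // wrapping cast, validated on the next line
            overflow |= i64::from(narrowed) != v;
            *dst = narrowed;
        }
        if overflow {
            return None;
        }
    }
    Some(out)
}

/// Scalar baseline for comparison: one fallible conversion (and branch) per element.
pub fn checked_cast_scalar(src: &[i64]) -> Option<Vec<i32>> {
    src.iter().map(|&v| i32::try_from(v).ok()).collect()
}
```

Both functions return the same result for the same input; the chunked form only changes where the failure check happens.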

@codspeed-hq

codspeed-hq Bot commented May 27, 2026


Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 400 untouched benchmarks
⏩ 1155 skipped benchmarks (see footnote 1)


Comparing ji/fast-iter-valid (49ec12a) with develop (31fda42)

Footnotes

  1. 1155 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Signed-off-by: Joe Isaacs <[email protected]>
Signed-off-by: Joe Isaacs <[email protected]>
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
@joseph-isaacs joseph-isaacs changed the title faster iteration infra perf[buffer]: iteration for fallible operations with validity May 27, 2026
@joseph-isaacs joseph-isaacs marked this pull request as ready for review May 27, 2026 15:13
@joseph-isaacs

Contributor Author

Open question is where to put this code?

f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
f
Signed-off-by: Joe Isaacs <[email protected]>
@joseph-isaacs joseph-isaacs added the changelog/performance A performance improvement label May 27, 2026
@robert3005

Contributor

Sounds like we want a crate in between the array and vortex-buffer or this could be a feature flag in vortex-buffer

@joseph-isaacs

Contributor Author

I was thinking vortex-compute, which only works with dtype, buffers and Rust native types?

I cannot remember if this was a problem before?

f
Signed-off-by: Joe Isaacs <[email protected]>
@joseph-isaacs joseph-isaacs requested a review from robert3005 May 28, 2026 11:30
@robert3005

Contributor

there was no problem with vortex-compute iirc. There were some constructs that made it hard in the past but I think we're fine now.

@github-actions

Contributor

This PR has been marked as stale because it has been open for 14 days with no activity. Please comment or remove the stale label if you wish to keep it active, otherwise it will be closed in 7 days

@github-actions github-actions Bot added the stale This PR is stale and will be auto-closed soon label Jun 12, 2026
@joseph-isaacs joseph-isaacs removed the stale This PR is stale and will be auto-closed soon label Jun 12, 2026
@joseph-isaacs joseph-isaacs requested a review from a team June 12, 2026 11:23
Comment thread vortex-array/benches/cast_primitive.rs
/// The kernels in this crate require this trait instead of `Iterator` so that lane
/// reads carry no inter-iteration data dependency — the autovectorizer treats each
/// lane independently.
pub trait IndexedSource {

Contributor

These traits really look like Buf/BufMut from bytes which we already implement for buffers and are implemented for similar things

Contributor Author

I think the random access read/write
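
For context on the distinction being discussed: `Buf` from the `bytes` crate is a cursor-style API (`remaining`, `chunk`, `advance`), whereas the doc comment in the diff describes reads addressed by lane index. The sketch below shows one possible shape of an index-addressed source. The trait name `IndexedSourceSketch`, its methods, and `map_into` are hypothetical and are not the PR's definitions.

```rust
/// Hypothetical index-addressed source (an assumption, not the PR's `IndexedSource`).
pub trait IndexedSourceSketch {
    type Item: Copy;
    /// Number of lanes available.
    fn len(&self) -> usize;
    /// Read lane `i`; callers must keep `i < len()`.
    fn get(&self, i: usize) -> Self::Item;
}

impl<T: Copy> IndexedSourceSketch for [T] {
    type Item = T;
    fn len(&self) -> usize {
        <[T]>::len(self)
    }
    fn get(&self, i: usize) -> T {
        self[i]
    }
}

/// Each output lane depends only on the input lane with the same index,
/// so no state is carried from one iteration to the next.
pub fn map_into<S, O>(src: &S, out: &mut [O], f: impl Fn(S::Item) -> O)
where
    S: IndexedSourceSketch + ?Sized,
{
    assert_eq!(src.len(), out.len(), "source and output lengths must match");
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = f(src.get(i));
    }
}
```

A call such as `map_into(data.as_slice(), &mut out[..], |v: u8| u16::from(v) * 2)` reads and writes by position rather than by advancing a cursor.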

///
/// Use this to drive a binary kernel from two columns. Length equality is enforced
/// at construction.
pub struct LaneZip<A, B>(pub A, pub B);

Contributor

Seems like a good candidate for implementing Iterator and ExactSizeIterator
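
A minimal sketch of what this suggestion could look like, assuming a slice-backed pair. `LaneZipSketch` and `LaneZipIter` are hypothetical names; the PR's `LaneZip<A, B>` is generic over its own source types and may not fit this shape.

```rust
/// Hypothetical slice-backed pair whose lengths are checked at construction.
pub struct LaneZipSketch<'a, A, B> {
    a: &'a [A],
    b: &'a [B],
}

impl<'a, A, B> LaneZipSketch<'a, A, B> {
    /// Returns `None` when the two columns differ in length.
    pub fn new(a: &'a [A], b: &'a [B]) -> Option<Self> {
        (a.len() == b.len()).then(|| Self { a, b })
    }

    pub fn iter(&self) -> LaneZipIter<'a, A, B> {
        LaneZipIter { a: self.a, b: self.b, pos: 0 }
    }
}

pub struct LaneZipIter<'a, A, B> {
    a: &'a [A],
    b: &'a [B],
    pos: usize,
}

impl<'a, A: Copy, B: Copy> Iterator for LaneZipIter<'a, A, B> {
    type Item = (A, B);

    fn next(&mut self) -> Option<(A, B)> {
        if self.pos == self.a.len() {
            return None;
        }
        let item = (self.a[self.pos], self.b[self.pos]);
        self.pos += 1;
        Some(item)
    }

    // An exact size hint is what makes the ExactSizeIterator impl below valid.
    fn size_hint(&self) -> (usize, Option<usize>) {
        let remaining = self.a.len() - self.pos;
        (remaining, Some(remaining))
    }
}

impl<'a, A: Copy, B: Copy> ExactSizeIterator for LaneZipIter<'a, A, B> {}
```

Because the lengths are validated once in `new`, `next` only needs to check one bound.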

/// `impl<S: IndexedSource> IndexedSourceExt for S` below. Bring the trait into
/// scope (`use vortex_compute::lane_kernels::IndexedSourceExt;`) to call
/// them with method syntax: `values.try_map_masked_into(&mask, &mut out, f)`.
pub trait IndexedSourceExt: IndexedSource + Sized {

Contributor

Can we skip the ext traits? just increases the mental load of tracking where things happen IMO

Contributor

What's another way of doing this?

Contributor

why can't it be default functions on the core trait?

Contributor Author

I really don't think it's worth having default functions here; we likely want more of these.

Contributor

not sure I understand
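
The two options discussed in this thread can be compared side by side. The sketch below uses a deliberately tiny hypothetical trait (`Source`, `SourceExt`, `sum_default` and `sum_ext` are illustrative names, not the PR's) to show a default method on the core trait next to an extension trait with a blanket impl.

```rust
/// Hypothetical core trait, reduced to two required methods.
pub trait Source {
    fn len(&self) -> usize;
    fn get(&self, i: usize) -> i64;

    /// Option A: a default method on the core trait.
    /// Implementors get it for free and may override it.
    fn sum_default(&self) -> i64 {
        (0..self.len()).map(|i| self.get(i)).sum()
    }
}

/// Option B: an extension trait with a blanket impl.
/// Callers must import `SourceExt`, and implementors cannot override `sum_ext`.
pub trait SourceExt: Source {
    fn sum_ext(&self) -> i64 {
        (0..self.len()).map(|i| self.get(i)).sum()
    }
}

impl<S: Source + ?Sized> SourceExt for S {}

impl Source for Vec<i64> {
    fn len(&self) -> usize {
        self.as_slice().len()
    }
    fn get(&self, i: usize) -> i64 {
        self[i]
    }
}
```

Both calls behave identically for a plain implementor; the practical differences are the extra import and whether an implementor is allowed to specialize the method.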


let mut values = self;
let len = values.len();
let chunks_count = len / 64;

Contributor

I think making 64 a const here will make it much clearer (because you can document it once), and also give us the ability to do more complex cfg based on CPU features (there is similar code in Arrow).
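
A sketch of the suggestion, under stated assumptions: the constant name `CHUNK_LANES`, the helper `chunk_layout`, and the `avx512f` branch with its value of 128 are placeholders chosen for illustration, not tuning recommendations and not taken from the PR or from Arrow.

```rust
/// Number of lanes processed per chunk, documented in one place.
/// The cfg split shows where a CPU-feature-specific width could go;
/// the values here are placeholders.
#[cfg(target_feature = "avx512f")]
pub const CHUNK_LANES: usize = 128;
#[cfg(not(target_feature = "avx512f"))]
pub const CHUNK_LANES: usize = 64;

/// Splits `len` into the number of full chunks and the length of the remainder.
pub fn chunk_layout(len: usize) -> (usize, usize) {
    (len / CHUNK_LANES, len % CHUNK_LANES)
}
```

Code that previously wrote `len / 64` would then refer to `CHUNK_LANES`, so the width and its rationale change in one place.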

// Skip the fallible kernel when type widening or (cached) min/max prove every value fits.
let target_dtype = DType::Primitive(T::PTYPE, Nullability::NonNullable);
let infallible = casts_losslessly_to(F::PTYPE, T::PTYPE)
|| cached_values_fit_in(array, &target_dtype) == Some(true);

Contributor

nit - unwrap_or_default
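
For reference, `Option<bool>::unwrap_or_default()` yields `false` for `None`, so it is equivalent to comparing against `Some(true)`. The function names below are hypothetical stand-ins for the expression in the diff.

```rust
/// Form used in the diff: compare the cached answer against `Some(true)`.
pub fn infallible_eq(lossless: bool, cached_fit: Option<bool>) -> bool {
    lossless || cached_fit == Some(true)
}

/// Suggested form: `None` falls back to `bool::default()`, which is `false`.
pub fn infallible_default(lossless: bool, cached_fit: Option<bool>) -> bool {
    lossless || cached_fit.unwrap_or_default()
}
```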

Mask::AllTrue(_) => BufferMut::try_from_trusted_len_iter(

// Returns `true` if every value of `from` is representable in `to` without loss.
fn casts_losslessly_to(from: PType, to: PType) -> bool {

Contributor

doesn't need to be a function

Contributor Author

I prefer this, only because the body doesn't easily read like that?
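
To make the rule in the doc comment concrete, here is a sketch restricted to integer types. `IntKind` and `casts_losslessly` are hypothetical; the PR's function takes `PType`, whose full set of variants (for example floats) is not covered here.

```rust
/// Hypothetical stand-in for the integer subset of a primitive type enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IntKind {
    U8, U16, U32, U64,
    I8, I16, I32, I64,
}

impl IntKind {
    fn bits(self) -> u32 {
        match self {
            IntKind::U8 | IntKind::I8 => 8,
            IntKind::U16 | IntKind::I16 => 16,
            IntKind::U32 | IntKind::I32 => 32,
            IntKind::U64 | IntKind::I64 => 64,
        }
    }

    fn is_signed(self) -> bool {
        matches!(self, IntKind::I8 | IntKind::I16 | IntKind::I32 | IntKind::I64)
    }
}

/// Returns `true` if every value of `from` is representable in `to`.
pub fn casts_losslessly(from: IntKind, to: IntKind) -> bool {
    match (from.is_signed(), to.is_signed()) {
        // Same signedness: the target must be at least as wide.
        (false, false) | (true, true) => to.bits() >= from.bits(),
        // Unsigned to signed: the target needs one extra bit, so strictly wider.
        (false, true) => to.bits() > from.bits(),
        // Signed to unsigned: negative values never fit.
        (true, false) => false,
    }
}
```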


// If F and T have the same byte width, try to take unique ownership of the buffer.
let same_bit_width = F::PTYPE.byte_width() == T::PTYPE.byte_width();
let owned: Option<BufferMut<F>> = if same_bit_width {

Contributor

nit same_bit_width.then(...)
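
A sketch of the `bool::then` form, with `Arc<Vec<u32>>` standing in for the shared buffer (an assumption; the diff's `BufferMut<F>` and the body of its `if` are not shown). When the closure itself returns an `Option`, as a fallible "take unique ownership" step would, `.then(..)` yields a nested `Option`, so `.flatten()` is needed.

```rust
use std::sync::Arc;

/// Hypothetical helper: take unique ownership only when the widths match
/// and no other reference to the buffer exists.
pub fn take_if_same_width(same_width: bool, buf: Arc<Vec<u32>>) -> Option<Vec<u32>> {
    // `then` runs the closure only when `same_width` is true.
    same_width.then(|| Arc::try_unwrap(buf).ok()).flatten()
}
```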

w
Signed-off-by: Joe Isaacs <[email protected]>
Comment thread vortex-compute/src/lib.rs
@@ -0,0 +1,15 @@
// SPDX-License-Identifier: Apache-2.0

Contributor

Should this be in vortex-buffer?

Contributor Author

I did think about putting this there, but it just seemed like the wrong place

w
Signed-off-by: Joe Isaacs <[email protected]>
Labels

changelog/performance A performance improvement

3 participants