perf[buffer]: iteration for fallible operations with validity#8120
joseph-isaacs wants to merge 25 commits into
Merging this PR will not alter performance
Signed-off-by: Joe Isaacs <[email protected]>
4b444dd to 72bca8b
The open question is where to put this code.
Sounds like we want a crate in between the array and vortex-buffer, or this could be a feature flag in vortex-buffer.
I was thinking I cannot remember if this was a problem before?
There was no problem with vortex-compute, IIRC. Some constructs made it hard in the past, but I think we're fine now.
This PR has been marked as stale because it has been open for 14 days with no activity. Please comment or remove the stale label if you wish to keep it active, otherwise it will be closed in 7 days.
/// The kernels in this crate require this trait instead of `Iterator` so that lane
/// reads carry no inter-iteration data dependency; the autovectorizer treats each
/// lane independently.
pub trait IndexedSource {
These traits really look like Buf/BufMut from bytes which we already implement for buffers and are implemented for similar things
I think the random access read/write
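For readers following the thread, here is a minimal sketch of the kind of index-addressed source the doc comment describes. The trait name `LaneSource`, its methods, and `map_into` are assumptions made for illustration; the PR's actual `IndexedSource` method set is not shown in this conversation.

```rust
/// Hypothetical stand-in for the PR's `IndexedSource`.
/// Each lane is read by index, so iteration `i` does not depend on iteration `i - 1`.
pub trait LaneSource {
    type Item: Copy;

    /// Number of lanes available.
    fn len(&self) -> usize;

    /// Read lane `i`. Callers must keep `i < self.len()`.
    fn get(&self, i: usize) -> Self::Item;
}

impl<T: Copy> LaneSource for [T] {
    type Item = T;

    fn len(&self) -> usize {
        <[T]>::len(self)
    }

    fn get(&self, i: usize) -> T {
        self[i]
    }
}

/// Apply `f` to every lane of `src`, writing the results into `out`.
pub fn map_into<S, O>(src: &S, out: &mut [O], f: impl Fn(S::Item) -> O)
where
    S: LaneSource + ?Sized,
{
    assert_eq!(src.len(), out.len(), "source and output must have equal length");
    for i in 0..src.len() {
        out[i] = f(src.get(i));
    }
}
```

Whether the compiler actually vectorizes such a loop depends on the closure and the target, so this should be read as the shape of the idea rather than a performance claim.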
///
/// Use this to drive a binary kernel from two columns. Length equality is enforced
/// at construction.
pub struct LaneZip<A, B>(pub A, pub B);
Seems like a good candidate for implementing Iterator and ExactSizeIterator
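A rough sketch of what that suggestion could look like for a zip over two slices. `SliceZip`, its fields, and its `new` constructor are hypothetical and are not the PR's `LaneZip`; the sketch only shows that a length-checked zip can report an exact remaining length.

```rust
/// Hypothetical slice-backed zip; not the PR's `LaneZip`.
pub struct SliceZip<'a, A, B> {
    a: &'a [A],
    b: &'a [B],
    pos: usize,
}

impl<'a, A: Copy, B: Copy> SliceZip<'a, A, B> {
    /// Returns `None` when the two columns differ in length.
    pub fn new(a: &'a [A], b: &'a [B]) -> Option<Self> {
        (a.len() == b.len()).then_some(Self { a, b, pos: 0 })
    }
}

impl<'a, A: Copy, B: Copy> Iterator for SliceZip<'a, A, B> {
    type Item = (A, B);

    fn next(&mut self) -> Option<(A, B)> {
        // `get` returns `None` once `pos` reaches the end, which ends iteration.
        let item = (*self.a.get(self.pos)?, *self.b.get(self.pos)?);
        self.pos += 1;
        Some(item)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // Lengths are equal by construction, so one side is enough.
        let remaining = self.a.len() - self.pos;
        (remaining, Some(remaining))
    }
}

// The size hint above is exact, so the default `len()` is correct.
impl<'a, A: Copy, B: Copy> ExactSizeIterator for SliceZip<'a, A, B> {}
```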
/// `impl<S: IndexedSource> IndexedSourceExt for S` below. Bring the trait into
/// scope (`use vortex_compute::lane_kernels::IndexedSourceExt;`) to call
/// them with method syntax: `values.try_map_masked_into(&mask, &mut out, f)`.
pub trait IndexedSourceExt: IndexedSource + Sized {
Can we skip the ext traits? just increases the mental load of tracking where things happen IMO
What's another way of doing this?
Why can't it be default functions on the core trait?
I really don't think it's worth having default functions here; we likely want more of these.
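To make the two options in this thread concrete, here is a small sketch with made-up names (`Lanes`, `LanesExt`, `sum_default`, `sum_ext`); none of them come from the PR. A default method lives on the core trait and can be overridden by implementors, while an extension trait with a blanket impl keeps the core trait small but has to be imported at the call site.

```rust
/// Hypothetical core trait with one provided (default) method.
pub trait Lanes {
    fn len(&self) -> usize;
    fn get(&self, i: usize) -> i64;

    /// Option 1: a default method on the core trait. Implementors may override it.
    fn sum_default(&self) -> i64 {
        (0..self.len()).map(|i| self.get(i)).fold(0, i64::wrapping_add)
    }
}

/// Option 2: an extension trait. The blanket impl below means no
/// implementor can override `sum_ext`.
pub trait LanesExt: Lanes + Sized {
    fn sum_ext(&self) -> i64 {
        (0..self.len()).map(|i| self.get(i)).fold(0, i64::wrapping_add)
    }
}

impl<S: Lanes> LanesExt for S {}

impl Lanes for Vec<i64> {
    fn len(&self) -> usize {
        Vec::len(self)
    }

    fn get(&self, i: usize) -> i64 {
        self[i]
    }
}
```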
let mut values = self;
let len = values.len();
let chunks_count = len / 64;
I think making 64 a const here will make it much clearer (because you can document it once), and it gives us the ability to do more complex cfg-based selection on CPU features (there is similar code in Arrow).
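A sketch of the suggestion, assuming the 64 corresponds to one `u64` word of a validity bitmap (one bit per lane); that reading, the constant name `LANES_PER_CHUNK`, and the `masked_sum` kernel are all assumptions made for illustration.

```rust
/// Lanes processed per chunk. Assumed here to match one `u64` validity word,
/// one bit per lane. Documented once instead of repeating a bare `64`.
pub const LANES_PER_CHUNK: usize = 64;

/// Sum the values whose validity bit is set. Bit `lane` of `validity[c]`
/// covers `values[c * LANES_PER_CHUNK + lane]`.
pub fn masked_sum(values: &[i64], validity: &[u64]) -> i64 {
    assert!(
        validity.len() * LANES_PER_CHUNK >= values.len(),
        "validity bitmap is too short"
    );
    let mut total = 0i64;
    // `chunks` yields a shorter final chunk, which covers the remainder lanes.
    for (chunk, &word) in values.chunks(LANES_PER_CHUNK).zip(validity) {
        for (lane, &v) in chunk.iter().enumerate() {
            // Multiply by 0 or 1 instead of branching on the validity bit.
            let bit = ((word >> lane) & 1) as i64;
            total = total.wrapping_add(v * bit);
        }
    }
    total
}
```

With a named constant, a `cfg`-gated alternative value per target feature becomes a one-line change, which is the flexibility the comment above is asking for.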
// Skip the fallible kernel when type widening or (cached) min/max prove every value fits.
let target_dtype = DType::Primitive(T::PTYPE, Nullability::NonNullable);
let infallible = casts_losslessly_to(F::PTYPE, T::PTYPE)
|| cached_values_fit_in(array, &target_dtype) == Some(true);
Mask::AllTrue(_) => BufferMut::try_from_trusted_len_iter(
// Returns `true` if every value of `from` is representable in `to` without loss.
fn casts_losslessly_to(from: PType, to: PType) -> bool {
This doesn't need to be a function.
I prefer this, if only because the body doesn't easily read like that on its own.
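For context on what such a check has to encode, here is a sketch of the integer-only rule using a made-up `IntKind` enum rather than Vortex's `PType`. The body of the PR's `casts_losslessly_to` is not shown in this thread, so this is an illustration of the rule, not a copy of it, and it ignores floats entirely.

```rust
/// Hypothetical integer type tag; not Vortex's `PType`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IntKind {
    U8,
    U16,
    U32,
    U64,
    I8,
    I16,
    I32,
    I64,
}

impl IntKind {
    fn is_signed(self) -> bool {
        matches!(self, IntKind::I8 | IntKind::I16 | IntKind::I32 | IntKind::I64)
    }

    fn bits(self) -> u32 {
        match self {
            IntKind::U8 | IntKind::I8 => 8,
            IntKind::U16 | IntKind::I16 => 16,
            IntKind::U32 | IntKind::I32 => 32,
            IntKind::U64 | IntKind::I64 => 64,
        }
    }
}

/// Returns `true` if every value of `from` is representable in `to`.
pub fn widens_losslessly(from: IntKind, to: IntKind) -> bool {
    match (from.is_signed(), to.is_signed()) {
        // Same signedness: the target only needs to be at least as wide.
        (false, false) | (true, true) => to.bits() >= from.bits(),
        // Unsigned to signed: the target needs one extra bit for the sign.
        (false, true) => to.bits() > from.bits(),
        // Signed to unsigned: negative values never fit.
        (true, false) => false,
    }
}
```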
// If F and T have the same byte width, try to take unique ownership of the buffer.
let same_bit_width = F::PTYPE.byte_width() == T::PTYPE.byte_width();
let owned: Option<BufferMut<F>> = if same_bit_width {
nit: `same_bit_width.then(...)`
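A small sketch of that nit, using `Rc<Vec<u32>>` as a stand-in for a shared buffer because the body of the original `if` is elided in the diff. The function name and the use of `Rc::try_unwrap` are assumptions. One thing the sketch surfaces: if the branch body already returns an `Option`, `bool::then` produces a nested `Option` and needs a `flatten()`.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for "take unique ownership if the widths match".
/// Returns the owned buffer only when `same_width` holds and nobody else
/// holds a reference to it.
pub fn owned_if_same_width(same_width: bool, shared: Rc<Vec<u32>>) -> Option<Vec<u32>> {
    // Equivalent to: if same_width { Rc::try_unwrap(shared).ok() } else { None }
    same_width
        .then(|| Rc::try_unwrap(shared).ok())
        .flatten()
}
```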
@@ -0,0 +1,15 @@
// SPDX-License-Identifier: Apache-2.0
Should this be in vortex-buffer?
I did think about putting this there, but it just seemed like the wrong place.
Currently we (and Arrow) handle fallible operations with scalar (non-SIMD) code.
This PR adds a trait and methods for fast SIMD checked operations (including cast). I have also verified elsewhere that `checked_add` benefits.
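As a rough illustration of the idea, and not the PR's implementation, a checked narrowing cast can be written so the loop body has no early exit: every lane is cast unconditionally, a failure flag is accumulated, and the error is raised once at the end. The function name `try_cast_masked` and the use of a `&[bool]` validity slice are assumptions made to keep the sketch self-contained.

```rust
/// Hypothetical checked cast from `i64` to `i32` that honours validity.
/// Invalid (null) lanes may hold arbitrary values and never cause a failure.
/// Returns `None` if any valid lane does not fit in `i32`.
pub fn try_cast_masked(values: &[i64], valid: &[bool]) -> Option<Vec<i32>> {
    assert_eq!(values.len(), valid.len(), "values and validity must align");
    let mut out = vec![0i32; values.len()];
    let mut failed = false;
    for ((dst, &v), &is_valid) in out.iter_mut().zip(values).zip(valid) {
        // Truncating cast: always succeeds, may change the value.
        let narrowed = v as i32;
        // A lane fails only if it is valid and the round trip loses information.
        failed |= is_valid & (i64::from(narrowed) != v);
        *dst = narrowed;
    }
    // Single check after the loop instead of a branch per lane.
    (!failed).then_some(out)
}
```

The trade-off is that the whole input is processed even when the first lane already overflows, in exchange for a loop body the autovectorizer has a better chance of handling.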