Use Vec directly in builders #7984
Use Vec directly in generic_bytes_builder
Use Vec directly in builders
Hi @alamb and @Dandandan, this PR is ready for review. I have changed the implementation to use [...]. I have tested the performance for arrow-array locally: half improved, half regressed.
  self.null_buffer_builder.append_n_non_nulls(len);
- self.values_builder.append_trusted_len_iter(iter);
+ self.values_builder.extend(iter);
There is no exactly matching function on Vec that checks the upper bound of the iterator. But this function already checks the upper bound of size_hint on line 294, so I think it's okay to replace it with extend directly.
If the iter actually implements the stdlib internal TrustedLen trait then this should get nicely optimized. The only caller inside the codebase seems to be in arrow-select with a Range iterator, which implements that trait.
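A minimal sketch of the pattern discussed in this thread, using only the standard library. The helper name append_with_known_len is hypothetical and not part of arrow-rs; it only mirrors the idea of checking the upper bound from size_hint before handing the iterator to Vec::extend.

```rust
/// Hypothetical helper (not an arrow-rs API): appends every item of `iter`
/// to `values`, requiring the iterator to report an upper bound first.
fn append_with_known_len(values: &mut Vec<u32>, iter: impl Iterator<Item = u32>) {
    // `size_hint` returns (lower, Option<upper>); a Range reports an exact upper bound.
    let (_, upper) = iter.size_hint();
    let len = upper.expect("iterator must report an upper bound");

    // Reserve once up front, then let `extend` write the items.
    values.reserve(len);
    values.extend(iter);
}

fn main() {
    let mut values = vec![7_u32];
    // A Range iterator, like the caller in arrow-select mentioned above.
    append_with_known_len(&mut values, 0..4);
    println!("{values:?}");
}
```

Whether extend takes a specialized fast path for a given iterator is a standard library implementation detail, so this sketch makes no performance claim.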
  pub fn append_null(&mut self) {
      self.values_builder
-         .append_slice(&vec![0u8; self.value_length as usize][..]);
+         .extend_from_slice(&vec![0u8; self.value_length as usize][..]);
You can avoid the temporary Vec now by using values_builder.resize(values_builder.len() + value_length, 0_u8) or values_builder.extend(std::iter::repeat_n(0_u8, len)).
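The two alternatives suggested above could look roughly like this on a plain Vec<u8>. The function names are hypothetical, and the second variant uses repeat(..).take(..), which yields the same items as repeat_n on toolchains where repeat_n is not yet available.

```rust
use std::iter;

/// Hypothetical sketch: append `value_length` zero bytes by resizing in place.
fn append_null_with_resize(values: &mut Vec<u8>, value_length: usize) {
    let new_len = values.len() + value_length;
    values.resize(new_len, 0_u8);
}

/// Hypothetical sketch: append `value_length` zero bytes from an iterator,
/// without allocating a temporary Vec.
fn append_null_with_extend(values: &mut Vec<u8>, value_length: usize) {
    values.extend(iter::repeat(0_u8).take(value_length));
}

fn main() {
    let mut a = vec![1_u8, 2];
    let mut b = a.clone();

    append_null_with_resize(&mut a, 3);
    append_null_with_extend(&mut b, 3);

    // Both approaches leave the same bytes in the buffer.
    assert_eq!(a, b);
    println!("{a:?}");
}
```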
- let offsets_builder = BufferBuilder::<T::Offset>::new_from_buffer(offsets_buffer);
- let value_builder = BufferBuilder::<u8>::new_from_buffer(value_buffer);
+ let offsets_builder = Vec::from(offsets_buffer.typed_data::<T::Offset>());
+ let value_builder = Vec::from(value_buffer.as_slice());
This seems to do a copy?
Replace it with ScalarBuffer instead
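To illustrate the reviewer's concern with standard library types only: building a Vec from a borrowed slice allocates and copies, while moving an owned Vec keeps the same allocation. This is a sketch with hypothetical function names; it does not use or model the arrow-rs buffer types named in the diff.

```rust
/// Hypothetical sketch: `Vec::from(&[u8])` allocates a new buffer and copies the bytes.
fn copy_from_slice(source: &[u8]) -> Vec<u8> {
    Vec::from(source)
}

/// Hypothetical sketch: taking the Vec by value moves ownership without copying the bytes.
fn take_ownership(source: Vec<u8>) -> Vec<u8> {
    source
}

fn main() {
    let source = vec![1_u8, 2, 3, 4];

    // Same contents, but a different heap allocation.
    let copied = copy_from_slice(&source);
    assert_eq!(copied, source);
    assert_ne!(copied.as_ptr(), source.as_ptr());

    // Moving keeps the original heap allocation.
    let original_ptr = source.as_ptr();
    let moved = take_ownership(source);
    assert_eq!(moved.as_ptr(), original_ptr);
}
```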
  null_buffer: Option<MutableBuffer>,
  ) -> Self {
- let values_builder = BufferBuilder::<T::Native>::new_from_buffer(values_buffer);
+ let values_builder = Vec::from(values_buffer.typed_data::<T::Native>());
Does a copy?
Hi @jhorstmann @Dandandan, thank you very much for the review! I have addressed the comments. However, I noticed that this implementation will fail some tests in
Are there any benchmarks that would be good for me to run for this PR?
I think
Thank you @liamzwbao, @Dandandan, and @jhorstmann. This PR looks good to me.
I queued up some benchmarks to see if we can see any improvements
arrow-select/src/concat.rs
Outdated
  let data = a.to_data();
  assert_eq!(data.buffers()[0].len(), 440);
- assert_eq!(data.buffers()[0].capacity(), 448); // Nearest multiple of 64
+ assert_eq!(data.buffers()[0].capacity(), 440); // Nearest multiple of 64
These comments no longer seem correct, but the tests seem to be an improvement.
Removed outdated comments
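For context on the numbers in that test: the old expected capacity of 448 is 440 rounded up to the next multiple of 64, as the removed comment said. The helper below is a hypothetical stand-in for that rounding, not the arrow-rs function itself, and the Vec check only relies on the standard library guarantee that with_capacity reserves at least the requested amount.

```rust
/// Hypothetical helper: round `n` up to the next multiple of 64.
fn round_up_to_multiple_of_64(n: usize) -> usize {
    (n + 63) / 64 * 64
}

fn main() {
    // 440 bytes of data rounds up to 448, the value the old assertion expected.
    println!("{}", round_up_to_multiple_of_64(440));

    // A Vec does not apply that rounding; it only guarantees at least the requested capacity.
    let values: Vec<u8> = Vec::with_capacity(440);
    assert!(values.capacity() >= 440);
}
```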
🤖: Benchmark completed
Thanks again @liamzwbao and @Dandandan and @jhorstmann
…mum size for the buffer
Which issue does this PR close?
Use Vec directly in builders #7383.
Rationale for this change
Use Vec instead of builders for primitive type
What changes are included in this PR?
Are these changes tested?
Covered by existing tests
Are there any user-facing changes?
No