
@liamzwbao (Contributor) commented Jul 24, 2025

Which issue does this PR close?

Rationale for this change

Use Vec directly inside the builders, instead of BufferBuilder, for primitive types.
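As a rough illustration of this rationale, here is a minimal sketch of a primitive builder that keeps its values in a plain Vec<T> rather than a BufferBuilder<T>. VecPrimitiveBuilder and its methods are hypothetical, not the arrow-rs API, and validity is tracked as a Vec<bool> for readability, whereas arrow-rs uses a bit-packed null buffer builder.

```rust
/// Hypothetical, simplified primitive builder: values live in a plain Vec<T>.
/// Validity is a Vec<bool> here purely for clarity (arrow-rs bit-packs it).
struct VecPrimitiveBuilder<T: Copy + Default> {
    values: Vec<T>,
    validity: Vec<bool>,
}

impl<T: Copy + Default> VecPrimitiveBuilder<T> {
    fn with_capacity(capacity: usize) -> Self {
        Self {
            values: Vec::with_capacity(capacity),
            validity: Vec::with_capacity(capacity),
        }
    }

    fn append_value(&mut self, v: T) {
        self.values.push(v);
        self.validity.push(true);
    }

    /// A null still occupies a value slot; T::default() is the placeholder.
    fn append_null(&mut self) {
        self.values.push(T::default());
        self.validity.push(false);
    }

    fn len(&self) -> usize {
        self.values.len()
    }

    /// Hands back the accumulated values and validity, leaving the builder empty.
    fn finish(&mut self) -> (Vec<T>, Vec<bool>) {
        (
            std::mem::take(&mut self.values),
            std::mem::take(&mut self.validity),
        )
    }
}
```

Because finish uses std::mem::take, the builder is left empty and can be reused after producing its output.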

What changes are included in this PR?

Are these changes tested?

Covered by existing tests

Are there any user-facing changes?

No

@github-actions bot added the arrow label (Changes to the arrow crate) Jul 24, 2025
@liamzwbao changed the title to "Use Vec directly in generic_bytes_builder" Jul 24, 2025
@liamzwbao changed the title from "Use Vec directly in generic_bytes_builder" to "Use Vec directly in builders" Jul 24, 2025
@liamzwbao marked this pull request as ready for review July 24, 2025 03:21
@liamzwbao (Contributor Author)

Hi @alamb and @Dandandan, this PR is ready for review. I have changed the implementation to use Vec in a few builders, though I'm not sure whether all of them are appropriate to migrate to this new implementation.

I have tested the performance of arrow-array locally: about half of the benchmarks improved and half regressed.


 self.null_buffer_builder.append_n_non_nulls(len);
-self.values_builder.append_trusted_len_iter(iter);
+self.values_builder.extend(iter);
Contributor Author

Vec has no exact equivalent that also checks the iterator's upper bound. However, this function already checks the upper bound of size_hint on line 294, so I think it's okay to replace it with extend directly.

Contributor

If the iter actually implements the stdlib internal TrustedLen trait then this should get nicely optimized. The only caller inside the codebase seems to be in arrow-select with a Range iterator, which implements that trait.
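The pattern discussed in this thread, checking the size_hint upper bound and then handing the iterator to Vec::extend, might look roughly like the sketch below. append_trusted_len_iter here is a hypothetical free function rather than the arrow-rs method, and a Vec<bool> stands in for the real null buffer builder.

```rust
/// Hypothetical helper: append every item of an iterator whose size_hint
/// upper bound is exact (a "trusted length"), marking each slot as valid.
fn append_trusted_len_iter<T, I>(values: &mut Vec<T>, validity: &mut Vec<bool>, iter: I)
where
    I: IntoIterator<Item = T>,
{
    let iter = iter.into_iter();
    // Mirror the upper-bound check performed before extending.
    let len = iter
        .size_hint()
        .1
        .expect("iterator must report an upper bound");
    let start = values.len();
    // Vec::extend uses size_hint to reserve space; std can optimize further
    // for iterators implementing its internal TrustedLen trait (e.g. Range).
    values.extend(iter);
    // Guard against an iterator whose size_hint did not match its real length.
    assert_eq!(values.len() - start, len, "iterator length did not match size_hint");
    validity.resize(validity.len() + len, true);
}
```

With a Range, as used by the arrow-select caller mentioned above, the upper bound is exact and the final assertion always holds.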

 pub fn append_null(&mut self) {
     self.values_builder
-        .append_slice(&vec![0u8; self.value_length as usize][..]);
+        .extend_from_slice(&vec![0u8; self.value_length as usize][..]);
Contributor

You can avoid the temporary Vec now by using values_builder.resize(values_builder.len() + value_length as usize, 0_u8) or values_builder.extend(std::iter::repeat_n(0_u8, value_length as usize)).
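A minimal sketch of this suggestion, assuming a hypothetical FixedSizeValues type in which every value occupies exactly value_length bytes: append_null grows the Vec in place with resize instead of allocating a temporary vec![0u8; n] and copying it.

```rust
/// Hypothetical fixed-size binary values buffer: every value is exactly
/// `value_length` bytes long.
struct FixedSizeValues {
    values: Vec<u8>,
    value_length: i32,
}

impl FixedSizeValues {
    fn append_value(&mut self, value: &[u8]) {
        assert_eq!(value.len(), self.value_length as usize, "wrong value length");
        self.values.extend_from_slice(value);
    }

    /// Appends `value_length` zero bytes in place, with no temporary Vec.
    fn append_null(&mut self) {
        let new_len = self.values.len() + self.value_length as usize;
        self.values.resize(new_len, 0_u8);
        // Equivalent alternative on Rust 1.82+:
        // self.values.extend(std::iter::repeat_n(0_u8, self.value_length as usize));
    }
}
```

Both forms write the zero padding directly into the existing allocation; resize has the advantage of working on older toolchains.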

-let offsets_builder = BufferBuilder::<T::Offset>::new_from_buffer(offsets_buffer);
-let value_builder = BufferBuilder::<u8>::new_from_buffer(value_buffer);
+let offsets_builder = Vec::from(offsets_buffer.typed_data::<T::Offset>());
+let value_builder = Vec::from(value_buffer.as_slice());
Contributor

Doesn't this do a copy?

@liamzwbao (Contributor Author) Jul 26, 2025

Replaced it with ScalarBuffer instead.

     null_buffer: Option<MutableBuffer>,
 ) -> Self {
-    let values_builder = BufferBuilder::<T::Native>::new_from_buffer(values_buffer);
+    let values_builder = Vec::from(values_buffer.typed_data::<T::Native>());
Contributor

Does this do a copy?
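The copy questions in this review come down to how Vec::from(&[T]) behaves: it allocates a fresh buffer and clones every element, whereas taking ownership of an existing Vec reuses its allocation. The sketch below, with hypothetical helper names, demonstrates the difference by comparing data pointers; it does not reproduce the arrow-rs Buffer or ScalarBuffer code the PR ended up using.

```rust
/// Hypothetical: build from a borrowed slice. Vec::from(&[T]) allocates a
/// new buffer and clones every element, i.e. it copies.
fn builder_from_slice<T: Clone>(data: &[T]) -> Vec<T> {
    Vec::from(data)
}

/// Hypothetical: build from an owned Vec. Moving it transfers the existing
/// heap allocation, so no element is copied.
fn builder_from_owned<T>(data: Vec<T>) -> Vec<T> {
    data
}
```

A zero-copy constructor therefore needs to take ownership of the underlying allocation (or keep a shared, immutable view of it) rather than start from a borrowed slice.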

@liamzwbao (Contributor Author) commented Jul 27, 2025

Hi @jhorstmann @Dandandan, thank you very much for the review! I have addressed the comments.

However, I noticed that this implementation fails some tests in concat.rs because the capacity of a Vec is not rounded up to a multiple of 64 the way MutableBuffer's capacity is. I have updated the tests in this PR for reference and would appreciate suggestions on how to address this.
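The failing assertions concern capacity rounding. The sketch below shows the round-up-to-64 arithmetic that the old concat.rs test comment ("Nearest multiple of 64") attributes to MutableBuffer; round_up_to_multiple_of_64 is a hypothetical helper name. A Vec, by contrast, only guarantees that Vec::with_capacity(n) yields a capacity of at least n.

```rust
/// Hypothetical helper: round `n` up to the next multiple of 64.
fn round_up_to_multiple_of_64(n: usize) -> usize {
    // Adding 63 and clearing the low six bits rounds up; the addition is
    // checked so a huge request panics instead of silently wrapping.
    n.checked_add(63).expect("capacity overflow") & !63
}
```

For the 440-byte buffer in the test, this gives 448, which is where the old expected capacity came from.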

@alamb (Contributor) commented Aug 1, 2025

Are there any benchmarks that would be good for me to run for this PR?

@liamzwbao (Contributor Author) commented Aug 4, 2025

> Are there any benchmarks that would be good for me to run for this PR?

I think arrow-array is a good candidate.

@alamb (Contributor) left a comment

Thank you @liamzwbao, @Dandandan and @jhorstmann -- this PR looks good to me.

I queued up some benchmarks to see if we can see any improvements

 let data = a.to_data();
 assert_eq!(data.buffers()[0].len(), 440);
-assert_eq!(data.buffers()[0].capacity(), 448); // Nearest multiple of 64
+assert_eq!(data.buffers()[0].capacity(), 440); // Nearest multiple of 64
Contributor

These comments no longer seem correct, but the tests seem to be an improvement.

Contributor Author

Removed outdated comments

@alamb (Contributor) commented Aug 6, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue-7383 (e0e2da9) to 16794ab diff
BENCH_NAME=filter_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench filter_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=issue-7383
Results will be posted here when complete

@alamb (Contributor) commented Aug 6, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue-7383 (e0e2da9) to 16794ab diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=issue-7383
Results will be posted here when complete

@alamb (Contributor) commented Aug 6, 2025

🤖: Benchmark completed


group                                                                     issue-7383                             main
-----                                                                     ----------                             ----
take bool 1024                                                            1.00   1384.7±3.74ns        ? ?/sec    1.01   1400.1±6.37ns        ? ?/sec
take bool 512                                                             1.00    721.0±0.61ns        ? ?/sec    1.00    719.6±1.01ns        ? ?/sec
take bool null indices 1024                                               1.00   1167.4±3.40ns        ? ?/sec    1.10   1285.8±8.48ns        ? ?/sec
take bool null values 1024                                                1.00      2.9±0.01µs        ? ?/sec    1.00      2.9±0.01µs        ? ?/sec
take bool null values null indices 1024                                   1.00      2.3±0.01µs        ? ?/sec    1.01      2.3±0.01µs        ? ?/sec
take check bounds i32 1024                                                1.00  1184.4±14.89ns        ? ?/sec    1.07   1270.7±2.73ns        ? ?/sec
take check bounds i32 512                                                 1.00    719.5±5.20ns        ? ?/sec    1.01    726.0±0.97ns        ? ?/sec
take i32 1024                                                             1.02    707.1±1.17ns        ? ?/sec    1.00    692.8±6.76ns        ? ?/sec
take i32 512                                                              1.01    439.2±1.13ns        ? ?/sec    1.00    436.0±0.84ns        ? ?/sec
take i32 null indices 1024                                                1.03    826.9±1.52ns        ? ?/sec    1.00    801.4±1.31ns        ? ?/sec
take i32 null values 1024                                                 1.03      2.1±0.00µs        ? ?/sec    1.00      2.1±0.00µs        ? ?/sec
take i32 null values null indices 1024                                    1.08      2.1±0.00µs        ? ?/sec    1.00   1931.8±3.39ns        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           1.05      8.4±0.02µs        ? ?/sec    1.00      8.0±0.03µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.02      9.8±0.12µs        ? ?/sec    1.00      9.6±0.09µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.01     20.4±0.10µs        ? ?/sec    1.00     20.2±0.13µs        ? ?/sec
take str 1024                                                             1.00     11.1±0.06µs        ? ?/sec    1.01     11.2±0.05µs        ? ?/sec
take str 512                                                              1.02      5.5±0.02µs        ? ?/sec    1.00      5.4±0.01µs        ? ?/sec
take str null indices 1024                                                1.12      7.8±0.03µs        ? ?/sec    1.00      7.0±0.03µs        ? ?/sec
take str null indices 512                                                 1.10      3.8±0.01µs        ? ?/sec    1.00      3.5±0.01µs        ? ?/sec
take str null values 1024                                                 1.00      8.8±0.07µs        ? ?/sec    1.02      9.0±0.05µs        ? ?/sec
take str null values null indices 1024                                    1.02      7.0±0.02µs        ? ?/sec    1.00      6.9±0.02µs        ? ?/sec
take stringview 1024                                                      1.00    842.8±1.52ns        ? ?/sec    1.01    850.1±1.56ns        ? ?/sec
take stringview 512                                                       1.01    477.2±0.92ns        ? ?/sec    1.00    472.7±1.03ns        ? ?/sec
take stringview null indices 1024                                         1.00  1400.5±22.78ns        ? ?/sec    1.01  1414.5±24.00ns        ? ?/sec
take stringview null indices 512                                          1.00    685.0±2.01ns        ? ?/sec    1.01    693.7±3.16ns        ? ?/sec
take stringview null values 1024                                          1.00      2.1±0.00µs        ? ?/sec    1.03      2.1±0.01µs        ? ?/sec
take stringview null values null indices 1024                             1.00      2.4±0.01µs        ? ?/sec    1.02      2.5±0.02µs        ? ?/sec
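Assuming the first number in each column is that run's time divided by the faster run's time (so the faster side reads 1.00), the factors can be reproduced as sketched below. relative_factors is a hypothetical helper. Rows whose times are displayed rounded to 0.1 µs, such as take str null indices 1024 (7.8 / 7.0 ≈ 1.11 versus the listed 1.12), will not reproduce exactly.

```rust
/// Hypothetical helper: express two timings relative to the faster one,
/// assuming that is how the table's leading factors are computed.
fn relative_factors(branch_ns: f64, main_ns: f64) -> (f64, f64) {
    let fastest = branch_ns.min(main_ns);
    (branch_ns / fastest, main_ns / fastest)
}
```

For example, take bool null indices 1024 gives 1285.8 / 1167.4 ≈ 1.10 for main, matching the table.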

@alamb (Contributor) commented Aug 6, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue-7383 (e0e2da9) to 16794ab diff
BENCH_NAME=array_from_vec
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench array_from_vec
BENCH_FILTER=
BENCH_BRANCH_NAME=issue-7383
Results will be posted here when complete

@alamb (Contributor) commented Aug 6, 2025

🤖: Benchmark completed


group                              issue-7383                             main
-----                              ----------                             ----
array_from_vec 128                 1.00    152.0±0.34ns        ? ?/sec    1.01    153.8±0.30ns        ? ?/sec
array_from_vec 256                 1.00    161.0±1.43ns        ? ?/sec    1.01    162.6±0.26ns        ? ?/sec
array_from_vec 512                 1.00    212.4±0.41ns        ? ?/sec    1.00    212.6±0.23ns        ? ?/sec
array_string_from_vec 128          1.00   1115.8±1.52ns        ? ?/sec    1.09   1217.6±2.62ns        ? ?/sec
array_string_from_vec 256          1.00   1939.2±2.78ns        ? ?/sec    1.07      2.1±0.00µs        ? ?/sec
array_string_from_vec 512          1.00      3.3±0.00µs        ? ?/sec    1.02      3.4±0.01µs        ? ?/sec
decimal128_array_from_vec 32768    1.00     99.2±0.50µs        ? ?/sec    1.00     99.1±0.22µs        ? ?/sec
decimal256_array_from_vec 32768    1.00      3.8±0.02µs        ? ?/sec    1.00      3.9±0.02µs        ? ?/sec
struct_array_from_vec 1024         1.01      8.8±0.02µs        ? ?/sec    1.00      8.7±0.02µs        ? ?/sec
struct_array_from_vec 128          1.02   1941.8±5.96ns        ? ?/sec    1.00   1894.6±3.08ns        ? ?/sec
struct_array_from_vec 256          1.00      2.9±0.01µs        ? ?/sec    1.03      2.9±0.01µs        ? ?/sec
struct_array_from_vec 512          1.00      4.8±0.01µs        ? ?/sec    1.00      4.8±0.01µs        ? ?/sec

@alamb merged commit 8d6fbfb into apache:main Aug 11, 2025
26 checks passed
@alamb (Contributor) commented Aug 11, 2025

Thanks again @liamzwbao and @Dandandan and @jhorstmann

@liamzwbao deleted the issue-7383 branch August 12, 2025 03:39
timsaucer added a commit to rerun-io/lance that referenced this pull request Sep 19, 2025
timsaucer added a commit to rerun-io/lance that referenced this pull request Sep 23, 2025
Labels: arrow (Changes to the arrow crate), performance
Projects: None yet
Development: Successfully merging this pull request may close these issues:
Consider using Vec directly in builders
4 participants