aggregate fns to have grouped aggregate kernels for sum and count #8314
Signed-off-by: Onur Satici <[email protected]>
joseph-isaacs
left a comment
This seems like a good idea
Merging this PR will improve performance by ×4.3
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | `count_i32_clustered_nulls` | 641.9 µs | 55.2 µs | ×12 |
| ⚡ | Simulation | `sum_i32_clustered_nulls` | 545.4 µs | 71.8 µs | ×7.6 |
| ⚡ | Simulation | `sum_i32_nullable_all_valid` | 335.5 µs | 73.4 µs | ×4.6 |
| ⚡ | Simulation | `count_varbinview` | 252.5 µs | 75.8 µs | ×3.3 |
| ⚡ | Simulation | `encode_varbin[(1000, 2)]` | 175.4 µs | 158 µs | +11.06% |
Comparing os/grouped-agg (5d009d0) with develop (1082a5d)
    /// Explicit ranges extracted from a list-view array.
    ListView {
        /// The `(offset, size)` ranges.
        ranges: Vec<(usize, usize)>,
I tried making this lazy, but it didn't help much. It is faster, but it's not clear whether it is faster across the board.
Summary
Lets aggregate functions run grouped aggregate kernels before falling back to the path that slices the input group by group.
Implements count for all arrays and sum for primitives.
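To illustrate the idea in the summary, here is a minimal sketch of grouped `sum` and `count` computed directly over `(offset, size)` ranges in one pass per group, without materializing a sliced array for each group. The function names and the `Option<i32>` representation of nullable values are assumptions made for this example only; they are not the kernels added by this PR.

```rust
/// Hypothetical grouped sum: one result per `(offset, size)` range.
/// `values[i]` is `None` when element `i` is null; nulls are skipped.
fn grouped_sum(values: &[Option<i32>], ranges: &[(usize, usize)]) -> Vec<i64> {
    ranges
        .iter()
        .map(|&(offset, size)| {
            values[offset..offset + size]
                .iter()
                .flatten() // drops `None` entries
                .map(|&v| i64::from(v)) // widen to avoid i32 overflow
                .sum::<i64>()
        })
        .collect()
}

/// Hypothetical grouped count: number of non-null elements per range.
fn grouped_count(values: &[Option<i32>], ranges: &[(usize, usize)]) -> Vec<u64> {
    ranges
        .iter()
        .map(|&(offset, size)| {
            values[offset..offset + size]
                .iter()
                .filter(|v| v.is_some())
                .count() as u64
        })
        .collect()
}
```

Both functions borrow the underlying values once and index into them per range, which is the property the grouped path relies on; the fallback would instead build a separate slice for every group and aggregate each one independently.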
API Changes
Grouped aggregate kernels now receive a single `GroupedArray` enum, covering `ListViewArray` and `FixedSizeListArray`, instead of exposing separate methods for each list representation.
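As a rough sketch of why a single enum simplifies the kernel interface, the example below uses a stand-in type named `GroupedInput`. It is hypothetical and is not the PR's `GroupedArray`; its variants, fields, and the `count_valid_per_group` function are assumptions chosen only to show one entry point handling both list representations.

```rust
/// Hypothetical stand-in for a grouped input; not the real `GroupedArray`.
enum GroupedInput {
    /// Every group has the same width, so ranges are implied by position.
    FixedSizeList { list_size: usize, num_groups: usize },
    /// Explicit `(offset, size)` ranges, as in a list-view layout.
    ListView { ranges: Vec<(usize, usize)> },
}

impl GroupedInput {
    /// Normalizes either representation into `(offset, size)` ranges.
    fn ranges(&self) -> Vec<(usize, usize)> {
        match self {
            GroupedInput::FixedSizeList { list_size, num_groups } => (0..*num_groups)
                .map(|g| (g * *list_size, *list_size))
                .collect(),
            GroupedInput::ListView { ranges } => ranges.clone(),
        }
    }
}

/// One kernel entry point for both representations:
/// counts the valid (non-null) elements in each group.
fn count_valid_per_group(validity: &[bool], groups: &GroupedInput) -> Vec<u64> {
    groups
        .ranges()
        .into_iter()
        .map(|(offset, size)| {
            validity[offset..offset + size]
                .iter()
                .filter(|&&is_valid| is_valid)
                .count() as u64
        })
        .collect()
}
```

With this shape, a kernel author writes one function and matches on the enum, rather than implementing one method per list type. A real kernel could also specialize the fixed-size case instead of expanding it into ranges as this sketch does.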