Move execute-parent kernels into session registry #8482
Merging this PR will improve performance by 22.35%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | take_10k_random | 197.7 µs | 253.7 µs | -22.08% |
| ❌ | Simulation | take_10k_contiguous | 217.6 µs | 274.4 µs | -20.68% |
| ❌ | Simulation | patched_take_10k_contiguous_patches | 232.1 µs | 288.8 µs | -19.64% |
| ❌ | Simulation | patched_take_10k_random | 244.5 µs | 301.2 µs | -18.83% |
| ❌ | Simulation | encode_varbin[(1000, 2)] | 239.6 µs | 294.6 µs | -18.67% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(1000, 10)] | 162.7 µs | 191.1 µs | -14.86% |
| ❌ | Simulation | varbinview_large | 112.6 µs | 130.8 µs | -13.93% |
| ❌ | Simulation | decompress_rd[f64, (100000, 0.01)] | 890.1 µs | 1,020.4 µs | -12.77% |
| ❌ | Simulation | decompress_rd[f64, (100000, 0.1)] | 890.2 µs | 1,020.4 µs | -12.76% |
| ❌ | Simulation | bench_many_codes_few_values[1024] | 467.5 µs | 526.3 µs | -11.17% |
| ❌ | Simulation | and_true_constant | 14.9 µs | 16.8 µs | -11.1% |
| ❌ | Simulation | or_false_constant | 14.9 µs | 16.8 µs | -10.92% |
| ⚡ | Simulation | search_index_mixed_out_of_range | 255.9 µs | 78.9 µs | ×3.2 |
| ⚡ | Simulation | search_index_above_max | 255.7 µs | 78.9 µs | ×3.2 |
| ⚡ | Simulation | search_index_below_min | 255.7 µs | 78.9 µs | ×3.2 |
| ⚡ | Simulation | search_index_full_range_random | 255.9 µs | 79 µs | ×3.2 |
| ⚡ | Simulation | search_index_in_range | 256.1 µs | 79.2 µs | ×3.2 |
| ⚡ | Simulation | chunked_dict_primitive_into_canonical[u32, (1000, 10, 10)] | 185.4 µs | 87.5 µs | ×2.1 |
| ⚡ | Simulation | chunked_bool_canonical_into[(10, 1000)] | 751.4 µs | 492.8 µs | +52.47% |
| ⚡ | Simulation | chunked_opt_bool_canonical_into[(10, 1000)] | 866.6 µs | 572.3 µs | +51.42% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed.
Comparing ngates/session-parent-kernels (b7a1dd9) with develop (aef6307)
Force-pushed from 61b7163 to 78d6bfd
Signed-off-by: Nicholas Gates <[email protected]>
Initialize direct encoding benchmark sessions with their crate-level kernel registrations so they exercise the session execute-parent path instead of fallback materialization. Cache the ArrayKernels handle in ExecutionCtx to avoid repeated session lookups while trying execute-parent kernels. Signed-off-by: Nicholas Gates <[email protected]>
Force-pushed from 78d6bfd to 644e00d
…ent-kernels Signed-off-by: Nicholas Gates <[email protected]>
Conflicts:
encodings/bytebool/src/kernel.rs
vortex-array/src/arrays/bool/vtable/kernel.rs
Avoid the per-execution parent kernel cache now that ExecutionCtx already holds the session snapshot, and make patch index lookup use the primitive fast path directly. Signed-off-by: "Nicholas Gates" <[email protected]>
Store vortex-array's built-in kernel registry on ArraySession so sessions that install the built-in array encodings also get their matching execute-parent kernels. Signed-off-by: "Nicholas Gates" <[email protected]>
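The snapshot idea in the two commit messages above can be sketched roughly as follows. This is a minimal illustration only: `Session`, `ExecutionCtx`, `KernelMap` and `demo_snapshot` here are hypothetical stand-ins written for this example, not the real vortex-array types, and the copy-on-register strategy is an assumption rather than something the commits state.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-ins; not the real `ArraySession` / `ExecutionCtx`.
type KernelFn = fn(i64) -> i64;
type KernelMap = HashMap<&'static str, KernelFn>;

#[derive(Default)]
struct Session {
    // Assumption: the map is replaced wholesale on registration, so a
    // context can hold an immutable snapshot behind an `Arc`.
    kernels: RwLock<Arc<KernelMap>>,
}

impl Session {
    fn register(&self, name: &'static str, kernel: KernelFn) {
        let mut guard = self.kernels.write().unwrap();
        // Copy the current map, add the kernel, then publish the new map.
        let mut next: KernelMap = (**guard).clone();
        next.insert(name, kernel);
        *guard = Arc::new(next);
    }

    /// Takes the lock once and clones the `Arc` once per context.
    fn create_ctx(&self) -> ExecutionCtx {
        let guard = self.kernels.read().unwrap();
        ExecutionCtx { kernels: Arc::clone(&*guard) }
    }
}

struct ExecutionCtx {
    kernels: Arc<KernelMap>,
}

impl ExecutionCtx {
    /// Lookups read the snapshot directly; no session lock is taken here.
    fn run(&self, name: &str, input: i64) -> Option<i64> {
        self.kernels.get(name).map(|kernel| kernel(input))
    }
}

fn demo_snapshot() -> (Option<i64>, Option<i64>, Option<i64>) {
    let session = Session::default();
    session.register("double", |x| x * 2);
    let ctx = session.create_ctx();
    // Registered after `ctx` took its snapshot, so `ctx` does not see it.
    session.register("negate", |x| -x);
    let later = session.create_ctx();
    (ctx.run("double", 21), ctx.run("negate", 1), later.run("negate", 1))
}
```

In this sketch a context never observes kernels registered after it was created, which is what makes a separate per-execution cache unnecessary; whether the real session behaves the same way is not stated in the commits.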
@claude review it
Claude finished @gatesn's task in 8m 1s

PR Review: Move execute-parent kernels into session registry

I traced the refactor end-to-end: the removed […]

Findings

1. Stale developer docs reference the removed […]

2. Lost unconditional debug invariant on execute_parent results (low, debug-only)

The deleted […]

vortex_ensure!(result.len() == parent.len(), "Executed parent canonical length mismatch");
vortex_ensure!(result.dtype() == parent.dtype(), "Executed parent canonical dtype mismatch");

The new dispatch ( […]

3. Precedence change for execute-parent overrides (behavioral note, low impact)

Previously the executor tried session-registered […]

Verified (no action needed)

Performance

The snapshot-per-ctx approach plus reduced dispatch overhead is reflected in CodSpeed's net +24.5%. The 7 flagged regressions (bitpacking […]

Checks not run

This is a read-only review; I did not run […]
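One way the length and dtype invariant from finding 2 could be re-applied at a single dispatch site is sketched below. `DType`, `Canonical`, `check_execute_parent_result` and `dispatch` are hypothetical names invented for this illustration, and the check is unconditional here; whether the real code should gate it on debug builds is left open.

```rust
// Hypothetical stand-ins for an array's dtype and canonical form.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    Bool,
    I64,
}

#[derive(Debug, Clone, PartialEq)]
struct Canonical {
    len: usize,
    dtype: DType,
}

/// Verify that a kernel's result can stand in for the parent it replaces.
fn check_execute_parent_result(parent: &Canonical, result: &Canonical) -> Result<(), String> {
    if result.len != parent.len {
        return Err("Executed parent canonical length mismatch".to_string());
    }
    if result.dtype != parent.dtype {
        return Err("Executed parent canonical dtype mismatch".to_string());
    }
    Ok(())
}

/// Run one kernel; `Ok(None)` means the kernel declined and the caller
/// should try the next kernel or fall back.
fn dispatch(
    parent: &Canonical,
    kernel: fn(&Canonical) -> Option<Canonical>,
) -> Result<Option<Canonical>, String> {
    let Some(result) = kernel(parent) else {
        return Ok(None);
    };
    // Checking here covers every registered kernel in one place.
    check_execute_parent_result(parent, &result)?;
    Ok(Some(result))
}

fn good_kernel(parent: &Canonical) -> Option<Canonical> {
    Some(parent.clone())
}

fn bad_kernel(parent: &Canonical) -> Option<Canonical> {
    // Deliberately returns the wrong length to trigger the check.
    Some(Canonical { len: parent.len + 1, dtype: parent.dtype.clone() })
}

fn declining_kernel(_parent: &Canonical) -> Option<Canonical> {
    None
}
```

Placing the check in the dispatcher rather than in each kernel keeps the invariant independent of how many kernels a session registers.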
Removes the ScalarFnVTable::execute_parent function in favor of purely session-registered kernels.
We already had some kernels on the session, and some on the scalar function. This unifies everything on the session.
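As a rough picture of what "everything on the session" could look like, here is a minimal sketch of a registry that dispatches execute-parent kernels by encoding pair. All names (`KernelRegistry`, `Array`, `ExecuteParentFn`, `negate_over_constant`) are hypothetical and chosen for this example; keying by (parent, child) encoding is an assumption, not a description of the actual vortex-array registry.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; these are not Vortex's real types.
type EncodingId = &'static str;

#[derive(Debug, Clone, PartialEq)]
struct Array {
    encoding: EncodingId,
    values: Vec<i64>,
}

/// A kernel executes `parent` using one of its children, or declines with `None`.
type ExecuteParentFn = fn(parent: &Array, child: &Array) -> Option<Array>;

#[derive(Default)]
struct KernelRegistry {
    // Assumption: kernels are keyed by (parent encoding, child encoding).
    kernels: HashMap<(EncodingId, EncodingId), Vec<ExecuteParentFn>>,
}

impl KernelRegistry {
    fn register(&mut self, parent: EncodingId, child: EncodingId, kernel: ExecuteParentFn) {
        self.kernels.entry((parent, child)).or_default().push(kernel);
    }

    /// Try kernels in registration order; the first one returning `Some` wins.
    fn execute_parent(&self, parent: &Array, child: &Array) -> Option<Array> {
        self.kernels
            .get(&(parent.encoding, child.encoding))?
            .iter()
            .find_map(|kernel| kernel(parent, child))
    }
}

// Example kernel: negate a "constant" child without materializing the parent.
fn negate_over_constant(_parent: &Array, child: &Array) -> Option<Array> {
    Some(Array {
        encoding: "constant",
        values: child.values.iter().map(|v| -v).collect(),
    })
}

fn demo_dispatch() -> (Option<Vec<i64>>, bool) {
    let mut registry = KernelRegistry::default();
    registry.register("negate", "constant", negate_over_constant);

    let parent = Array { encoding: "negate", values: vec![] };
    let child = Array { encoding: "constant", values: vec![1, 2, 3] };
    let unrelated = Array { encoding: "dict", values: vec![] };

    let hit = registry.execute_parent(&parent, &child).map(|a| a.values);
    // No kernel is registered for this pair, so a caller would fall back.
    let miss = registry.execute_parent(&parent, &unrelated).is_none();
    (hit, miss)
}
```

With a single registry there is one lookup path and one precedence order to reason about, instead of one on the scalar function and another on the session.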
I may well follow up by removing ScalarFnVTable::execute also in favor of these kernels.
A definite follow-up is to make the structure of our plugin crates uniform. Sometimes kernels live in compute, kernel or vtable modules.

Closes #8401