do not merge: onpair dfa #8361
Evaluate `prefix%` and `%needle%` LIKE patterns directly on OnPair compressed code streams, mirroring the FSST DFA pushdown. Each u16 code is lifted to a byte-level DFA transition (KMP for contains, linear for prefix) by feeding its dictionary token's bytes through the byte table; scanning a row's codes is then one table lookup per code and is exactly equivalent to byte-level matching over the decompressed row.

OnPair has no escape code (the trainer always emits all 256 single-byte tokens), so the DFA is strictly simpler than FSST's: no escape sentinel and no escape table. Unsupported pattern shapes (`_`, suffix, ILIKE, needles beyond the u8 state space) return `None` and fall back to decompression.

Wires `LikeExecuteAdaptor(OnPair)` into the parent kernel set. Adds unit tests plus a randomised cross-check against ground-truth `starts_with` / `contains` over 600 rows and 14 needles.

Signed-off-by: Joe Isaacs <[email protected]>
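A minimal sketch of the lifting step, assuming a flat `u8` state table indexed as `table[state * 256 + byte]` and a dictionary given as `Vec<Vec<u8>>` whose first 256 entries are the single-byte tokens. `ByteDfa`, `CodeDfa`, and their methods are hypothetical names, not the PR's `FlatContainsDfa`/`FlatPrefixDfa`, and for simplicity the sketch rejects empty patterns as well as patterns too long for `u8` states.

```rust
/// Hypothetical byte-level DFA: `table[state * 256 + byte]` is the next state.
/// `accept` is absorbing; a prefix DFA also has an absorbing `fail` state.
struct ByteDfa {
    table: Vec<u8>,
    accept: u8,
    fail: Option<u8>,
}

impl ByteDfa {
    /// KMP automaton for `%needle%`: states 0..=n, where n is accept.
    fn contains(needle: &[u8]) -> Option<Self> {
        let n = needle.len();
        if n == 0 || n >= u8::MAX as usize {
            return None; // sketch only handles 1..=254 bytes (states fit in u8)
        }
        let mut table = vec![0u8; (n + 1) * 256];
        table[n * 256..].fill(n as u8); // accept is absorbing
        table[needle[0] as usize] = 1;
        let mut x = 0; // KMP restart state
        for j in 1..n {
            // On a mismatch, state j behaves like restart state x.
            table.copy_within(x * 256..(x + 1) * 256, j * 256);
            table[j * 256 + needle[j] as usize] = (j + 1) as u8;
            x = table[x * 256 + needle[j] as usize] as usize;
        }
        Some(ByteDfa { table, accept: n as u8, fail: None })
    }

    /// Linear automaton for `prefix%`: accept = n, fail = n + 1.
    fn prefix(prefix: &[u8]) -> Option<Self> {
        let n = prefix.len();
        if n == 0 || n >= u8::MAX as usize {
            return None;
        }
        let (accept, fail) = (n as u8, n as u8 + 1);
        let mut table = vec![fail; (n + 2) * 256]; // any unexpected byte fails
        for (j, &b) in prefix.iter().enumerate() {
            table[j * 256 + b as usize] = j as u8 + 1;
        }
        table[n * 256..(n + 1) * 256].fill(accept); // accept is absorbing
        Some(ByteDfa { table, accept, fail: Some(fail) })
    }

    /// Lift to code level: feed each token's bytes through the byte table.
    fn lift(&self, tokens: &[Vec<u8>]) -> CodeDfa {
        let n_states = self.table.len() / 256;
        let n_codes = tokens.len();
        let mut table = vec![0u8; n_states * n_codes];
        for s in 0..n_states {
            for (c, tok) in tokens.iter().enumerate() {
                table[s * n_codes + c] = tok
                    .iter()
                    .fold(s as u8, |st, &b| self.table[st as usize * 256 + b as usize]);
            }
        }
        CodeDfa { table, n_codes, accept: self.accept, fail: self.fail }
    }
}

/// Code-level DFA: one table lookup per u16 code.
struct CodeDfa {
    table: Vec<u8>,
    n_codes: usize,
    accept: u8,
    fail: Option<u8>,
}

impl CodeDfa {
    fn matches(&self, codes: &[u16]) -> bool {
        let mut s = 0u8; // start state; never accept because the pattern is non-empty
        for &c in codes {
            s = self.table[s as usize * self.n_codes + c as usize];
            if s == self.accept {
                return true;
            }
            if Some(s) == self.fail {
                return false;
            }
        }
        false
    }
}
```

Because the code table is built by running the byte DFA over each token, a row scan over codes visits exactly the states the byte DFA would reach at token boundaries, which is why it agrees with matching on the decompressed bytes.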
Add a divan microbenchmark comparing the compressed-domain LIKE pushdown against the decompress-and-match fallback on a 200k-row OnPair-encoded URL column. On this corpus the pushdown is ~1.9-2.2x faster for prefix and ~2.4-3.3x faster for contains.

Two benchmark-enablement knobs:

- `VORTEX_ONPAIR_LIKE_PUSHDOWN=0` forces the OnPair LikeKernel to decline (fall back to decompression), so the same binary can A/B the pushdown end-to-end without a rebuild. The variable is read only once.
- `CLICKBENCH_PARTITIONS=N` caps how many ClickBench shards are fetched and queried, for local/iterative runs (the full suite still defaults to 100).

Signed-off-by: Joe Isaacs <[email protected]>
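A minimal sketch of how such knobs could be parsed, assuming that an unset variable or any value other than `0` leaves the pushdown on, and that an unset or malformed `CLICKBENCH_PARTITIONS` means the default of 100. The function names are hypothetical, not the PR's code; `OnceLock` provides the "read once" behavior.

```rust
use std::sync::OnceLock;

/// Hypothetical parser: pushdown stays enabled unless the knob is exactly "0".
fn pushdown_enabled_from(value: Option<&str>) -> bool {
    value != Some("0")
}

/// Hypothetical accessor: reads `VORTEX_ONPAIR_LIKE_PUSHDOWN` once per process.
fn like_pushdown_enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| {
        let value = std::env::var("VORTEX_ONPAIR_LIKE_PUSHDOWN").ok();
        pushdown_enabled_from(value.as_deref())
    })
}

/// Hypothetical parser for `CLICKBENCH_PARTITIONS`: unset or unparsable means 100.
fn clickbench_partitions_from(value: Option<&str>) -> usize {
    value.and_then(|v| v.trim().parse().ok()).unwrap_or(100)
}
```

Caching the decision in a static keeps the environment lookup off the kernel's hot path, at the cost of not reacting to changes made after the first call.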
Select the DFA variant once in `OnPairMatcher::scan_to_bitbuf` instead of re-matching the matcher enum per row through a closure, mark the concrete `FlatContainsDfa::matches` and `FlatPrefixDfa::matches` methods `#[inline]`, and walk row offsets with a running cursor. This lets the row scan monomorphise and inline the DFA step. Controlled microbench (same machine, back-to-back): contains pushdown ~1.16-1.26x faster (e.g. `%bonprix%` 1.84ms -> 1.46ms), prefix marginally faster.

Also add an instrumented characterization test proving where the pushdown actually fires through the execution engine: bare OnPair and Dict(OnPair) both route the predicate to the kernel, but Dict(Shared(OnPair)) (the shape a dict-encoded column takes when read back from a multi-chunk file) does not, because `Shared` has no parent-reduce forwarding and canonicalizes (decompresses) instead. This is why the compressed-domain LIKE pushdown does not move end-to-end ClickBench/TPC-H numbers, and the same limitation affects FSST identically.

Signed-off-by: Joe Isaacs <[email protected]>
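The dispatch change can be pictured with the sketch below: the enum is matched once per scan, and each arm calls a generic row loop, so the compiler emits one specialized loop per DFA type and can inline `matches` into it. All names (`FlatDfa`, `ContainsDfa`, `PrefixDfa`, `Matcher`, `scan_rows`) are hypothetical simplifications, and a `Vec<bool>` stands in for the real bit buffer.

```rust
/// Flat code-level DFA: `table[state * n_codes + code]` is the next state.
struct FlatDfa {
    table: Vec<u8>,
    n_codes: usize,
    accept: u8,
}

impl FlatDfa {
    #[inline]
    fn step(&self, s: u8, code: u16) -> u8 {
        self.table[s as usize * self.n_codes + code as usize]
    }
}

/// `%needle%`: accept is absorbing, so a row stops as soon as it is reached.
struct ContainsDfa(FlatDfa);

/// `prefix%`: additionally has an absorbing fail state.
struct PrefixDfa {
    dfa: FlatDfa,
    fail: u8,
}

trait RowMatch {
    fn matches(&self, codes: &[u16]) -> bool;
}

impl RowMatch for ContainsDfa {
    #[inline]
    fn matches(&self, codes: &[u16]) -> bool {
        let mut s = 0u8; // start state, assumed not to be accept
        for &c in codes {
            s = self.0.step(s, c);
            if s == self.0.accept {
                return true;
            }
        }
        false
    }
}

impl RowMatch for PrefixDfa {
    #[inline]
    fn matches(&self, codes: &[u16]) -> bool {
        let mut s = 0u8;
        for &c in codes {
            s = self.dfa.step(s, c);
            if s == self.dfa.accept {
                return true;
            }
            if s == self.fail {
                return false;
            }
        }
        false
    }
}

enum Matcher {
    Contains(ContainsDfa),
    Prefix(PrefixDfa),
}

impl Matcher {
    /// Match the enum once per scan, not once per row.
    fn scan(&self, codes: &[u16], offsets: &[usize]) -> Vec<bool> {
        match self {
            Matcher::Contains(d) => scan_rows(d, codes, offsets),
            Matcher::Prefix(d) => scan_rows(d, codes, offsets),
        }
    }
}

/// Generic over the concrete DFA, so each arm gets its own monomorphised loop.
fn scan_rows<D: RowMatch>(dfa: &D, codes: &[u16], offsets: &[usize]) -> Vec<bool> {
    let Some((&first, rest)) = offsets.split_first() else {
        return Vec::new();
    };
    let mut out = Vec::with_capacity(rest.len());
    let mut start = first; // running cursor: row i spans offsets[i]..offsets[i + 1]
    for &end in rest {
        out.push(dfa.matches(&codes[start..end]));
        start = end;
    }
    out
}
```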
A dict-encoded string column reads back as `Dict(codes, Shared(values))`. `Shared` (which dedups the decoded dictionary across row splits) has no parent-reduce forwarding, so a predicate pushed to the values (`like(Shared(onpair))`) canonicalizes (decompresses) the source instead of reaching the OnPair/FSST LIKE kernel. Because the filter path's `values_array_uncanonical` reused the projection's `Shared`-wrapped cache, any query that both projects and filters the same column (e.g. ClickBench Q22's `MIN(URL)` + `WHERE URL LIKE`) silently lost the pushdown.

Give the predicate path its own bare (non-`Shared`) values cache, built on the same underlying read as the `Shared` projection cache (values are read once). Projection keeps `Shared` for cross-split decode reuse; predicates get bare values so the optimizer can push them into the values encoding.

Verified end-to-end on a ClickBench shard (OnPair-encoded `URL`):

- Q22-shape (filter + project URL): kernel firings 0 -> 44, query faster.
- `count(*)` filter: still 44 firings, result unchanged.
- Q34 (GROUP BY URL, pure decode): unchanged (no decode-cache regression).

Also retarget the OnPair characterization test's comment at this layout fix (the array-level `Shared`-blocks-pushdown behavior it pins is what motivates applying predicates to bare values).

Signed-off-by: Joe Isaacs <[email protected]>
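A minimal sketch of the cache split, assuming a toy `Array` enum in place of Vortex's array types and a closure in place of the file read. `DictValuesCache`, `for_projection`, and `for_predicate` are hypothetical names; the point is that both views share one `OnceLock`-guarded read, and only the projection view is wrapped in `Shared`.

```rust
use std::sync::{Arc, OnceLock};

/// Toy stand-in for an array tree (hypothetical, not Vortex's types).
#[derive(Debug)]
enum Array {
    /// Encoded dictionary values, e.g. OnPair codes.
    OnPair(Vec<u16>),
    /// Wrapper that dedups decoding across row splits but blocks pushdown.
    Shared(Arc<Array>),
}

/// Hypothetical per-column cache: one read, two views.
struct DictValuesCache<F: Fn() -> Array> {
    read_values: F,
    bare: OnceLock<Arc<Array>>,
    shared: OnceLock<Arc<Array>>,
}

impl<F: Fn() -> Array> DictValuesCache<F> {
    fn new(read_values: F) -> Self {
        Self { read_values, bare: OnceLock::new(), shared: OnceLock::new() }
    }

    /// The single underlying read, shared by both views.
    fn bare(&self) -> Arc<Array> {
        self.bare.get_or_init(|| Arc::new((self.read_values)())).clone()
    }

    /// Projection path: wrap in `Shared` so decoded values are reused across splits.
    fn for_projection(&self) -> Arc<Array> {
        self.shared
            .get_or_init(|| Arc::new(Array::Shared(self.bare())))
            .clone()
    }

    /// Predicate path: bare values, so a LIKE can reach the encoding's kernel.
    fn for_predicate(&self) -> Arc<Array> {
        self.bare()
    }
}
```

In this sketch, projecting and filtering the same column triggers one read: the predicate sees `OnPair(..)` directly, while the projection sees `Shared(OnPair(..))` wrapping the same `Arc`.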
The per-call DFA table was the dominant cost of the LIKE pushdown on dict-encoded columns (~17% of ClickBench Q21 in a samply profile): it built an `n_states x n_codes` table, computing a transition for every one of the (up to 4096) dictionary tokens, even though the needle/prefix can only interact with the tokens that contain one of its bytes.

A token whose bytes are all absent from the pattern drives the byte table to the same reset state from every *live* state (a non-needle byte falls back to 0 via KMP from any non-accept state; a non-prefix byte fails), and the accept/fail rows are never read because the scan returns the instant it reaches either state. So such a token's whole column is just the skip value.

Pre-fill the table with the skip value and only compute columns for codes containing a pattern byte; for those, read the token once while advancing all `n_states` start states in lockstep (a per-byte gather).

Build-heavy microbench (build + 4k-row scan): ~1.3-1.6x faster, more for rare-byte needles (most tokens skipped), less for common-byte needles like `%google%` on URLs. Randomized ground-truth fuzz test still passes.

Signed-off-by: Joe Isaacs <[email protected]>
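The sparse build for the contains case can be sketched as follows, assuming a KMP byte table laid out as `byte[state * 256 + b]` with the accept state last. The skip value here is KMP's reset state 0 (a prefix DFA would use its fail state instead). The helper names are hypothetical, and the per-byte gather is written as a plain inner loop over all start states rather than a SIMD gather.

```rust
/// Hypothetical helper: KMP byte table for `%needle%`, indexed `t[s * 256 + b]`.
/// States 0..n are live; state n is accept (absorbing).
fn kmp_byte_table(needle: &[u8]) -> Option<Vec<u8>> {
    let n = needle.len();
    if n == 0 || n >= u8::MAX as usize {
        return None; // states must fit in u8
    }
    let mut t = vec![0u8; (n + 1) * 256];
    t[n * 256..].fill(n as u8);
    t[needle[0] as usize] = 1;
    let mut x = 0; // KMP restart state
    for j in 1..n {
        t.copy_within(x * 256..(x + 1) * 256, j * 256);
        t[j * 256 + needle[j] as usize] = (j + 1) as u8;
        x = t[x * 256 + needle[j] as usize] as usize;
    }
    Some(t)
}

/// Sparse lift to a code table `table[s * n_codes + c]`.
/// Tokens with no needle byte keep the skip value (reset state 0) in every row.
fn lift_sparse(needle: &[u8], byte: &[u8], tokens: &[Vec<u8>]) -> Vec<u8> {
    const SKIP: u8 = 0;
    let n_states = needle.len() + 1;
    let n_codes = tokens.len();
    let mut in_needle = [false; 256];
    for &b in needle {
        in_needle[b as usize] = true;
    }
    let mut table = vec![SKIP; n_states * n_codes];
    let mut states = vec![0u8; n_states];
    for (c, tok) in tokens.iter().enumerate() {
        if !tok.iter().any(|&b| in_needle[b as usize]) {
            continue; // whole column stays the skip value
        }
        // Read the token once, advancing every start state in lockstep.
        for (s, st) in states.iter_mut().enumerate() {
            *st = s as u8;
        }
        for &b in tok {
            for st in states.iter_mut() {
                *st = byte[*st as usize * 256 + b as usize];
            }
        }
        for (s, &end) in states.iter().enumerate() {
            table[s * n_codes + c] = end;
        }
    }
    table
}

/// Row scan: must return on accept, because skipped columns hold SKIP
/// even in the accept row, which is therefore never read.
fn scan_contains(table: &[u8], n_codes: usize, accept: u8, codes: &[u16]) -> bool {
    let mut s = 0u8;
    for &c in codes {
        s = table[s as usize * n_codes + c as usize];
        if s == accept {
            return true;
        }
    }
    false
}
```

The early return on accept is what makes the pre-filled accept row safe: it holds wrong values for skipped codes, but the scan never looks them up.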
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | bitwise_not_vortex_buffer_mut[128] | 215.3 ns | 244.4 ns | -11.93% |
| ⚡ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 352.4 µs | 299.7 µs | +17.58% |
Comparing claude/relaxed-goodall-e3s5pr (5257888) with develop (0dd6db7)
Polar Signals Profiling Results: Latest Run
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨
This PR has been marked as stale because it has been open for 14 days with no activity. Please comment or remove the stale label if you wish to keep it active; otherwise it will be closed in 7 days.
Summary
Closes: #000
Testing