[experiment] Layout Reader #8518
Signed-off-by: "Nicholas Gates" <[email protected]>
Signed-off-by: Nicholas Gates <[email protected]>
Checkpoint of in-progress V2 ScanNode work (segment scheduling driver, scheduled segment source, scan scheduler) so agent fixes can be integrated on a clean base. Reviewed/benchmarked state.

Signed-off-by: Nicholas Gates <[email protected]>
The scan2 StructScanNode single-field fast paths (single get_item and single-referenced-field expressions) routed straight to the child scan node, bypassing the parent struct's validity mask. Projecting one field out of a nullable struct therefore returned the child's own values and validity with no parent null mask applied, producing wrong nulls (and a non-nullable result where a nullable one was expected).

Mirror the v1 struct reader's `array.mask(validity)` behaviour: add a small MaskScanNode that reads an input value and the struct's non-nullable boolean validity child and produces `mask(input, validity)`. Wrap the single-field fast-path results in MaskScanNode when the struct is nullable. The full push_struct path already threads validity through StructValueScanNode, so it is unchanged.

Add a V1-vs-V2 differential test harness in vortex-file that scans the same ScanRequest through both paths and asserts equality across flat (nullable + non-nullable), chunked, dict-encoded, zoned, and nested nullable-struct fixtures, plus ports of the v1 struct-null regression tests (test_struct_layout_nulls / test_struct_layout_nested) to the V2 path. Before the fix the five nested-nullable-struct cases failed with "expected i32?, actual i32"; after the fix all 18 cases pass.

Signed-off-by: Nicholas Gates <[email protected]>
Co-Authored-By: Claude Opus 4.8 <[email protected]>
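To make the masking step concrete, here is a minimal sketch in plain Rust that does not use the Vortex API. The function name `mask_with_parent`, the `Option<i32>` model of a nullable field, and the convention that `true` in the validity slice means "struct row is present" are all assumptions made for illustration; the real MaskScanNode operates on Vortex arrays.

```rust
/// Illustrative only: apply a parent struct's validity to a projected field.
///
/// `child[i]` is the field's own value (with its own nulls) for row `i`.
/// `parent_valid[i]` is assumed to be `true` when struct row `i` is non-null.
/// A row is null in the output if either the field or the parent struct is null.
fn mask_with_parent(child: &[Option<i32>], parent_valid: &[bool]) -> Vec<Option<i32>> {
    assert_eq!(
        child.len(),
        parent_valid.len(),
        "field and validity must cover the same rows"
    );
    child
        .iter()
        .zip(parent_valid)
        // Keep the child's value only where the parent struct is present.
        .map(|(value, &valid)| if valid { *value } else { None })
        .collect()
}
```

Note that the output is nullable even when the field itself has no nulls, which is the dtype change ("expected i32?, actual i32") that the differential tests caught.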
…filter-first

Port of the V1 multi-conjunct filter behavior to the V2 PartitionWorkScheduler driver:

(1) sort filter conjuncts cheapest-first in PreparedScanNodeFile::try_new so expensive residuals (e.g. FSST LIKE) run after cheap selective ones;
(2) when the demanded-row density falls below EXPR_EVAL_THRESHOLD (0.2), read the residual predicate with selection=need so the leaf returns the compacted array and the expression evaluates over only the demanded rows, scattering the verdict back via Mask::intersect_by_rank.

Adds V1-vs-V2 differential cases (low- and high-density multi-conjunct) and a predicate_cost unit test. Improves ClickBench multi-conjunct filters (q22 701->547ms, q23 now < V1). A separate single-LIKE FSST amplification (q21) remains and is tracked separately.

Signed-off-by: Nicholas Gates <[email protected]>
Co-Authored-By: Claude Opus 4.8 <[email protected]>
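The two steps above can be sketched in plain Rust as follows. This is not the Vortex implementation: `Conjunct`, its `cost` field, `density`, and `eval_residual` are hypothetical names, masks are modelled as `&[bool]`, and `intersect_by_rank` here is a simplified stand-in for what `Mask::intersect_by_rank` is described as doing. Only the 0.2 threshold and the cheapest-first ordering come from the commit message.

```rust
/// Threshold named in the commit message: below this demanded-row density,
/// the residual predicate is evaluated over compacted rows only.
const EXPR_EVAL_THRESHOLD: f64 = 0.2;

/// Hypothetical filter conjunct with an estimated evaluation cost.
#[derive(Debug, Clone, PartialEq)]
struct Conjunct {
    name: &'static str,
    cost: u32,
}

/// Order conjuncts so cheap ones run first (stable for equal costs).
fn sort_cheapest_first(mut conjuncts: Vec<Conjunct>) -> Vec<Conjunct> {
    conjuncts.sort_by_key(|c| c.cost);
    conjuncts
}

/// Fraction of rows still demanded by earlier conjuncts.
fn density(need: &[bool]) -> f64 {
    if need.is_empty() {
        return 0.0;
    }
    need.iter().filter(|&&b| b).count() as f64 / need.len() as f64
}

/// Scatter a verdict computed over the demanded rows back to full length:
/// the k-th set bit of `need` survives only if `verdict[k]` is true.
fn intersect_by_rank(need: &[bool], verdict: &[bool]) -> Vec<bool> {
    let mut rank = 0;
    need.iter()
        .map(|&demanded| {
            if demanded {
                let keep = verdict[rank];
                rank += 1;
                keep
            } else {
                false
            }
        })
        .collect()
}

/// Evaluate a residual predicate, choosing the sparse or dense path by density.
fn eval_residual(need: &[bool], values: &[i32], pred: impl Fn(i32) -> bool) -> Vec<bool> {
    if density(need) < EXPR_EVAL_THRESHOLD {
        // Sparse: compact to the demanded rows, evaluate, then scatter back.
        let compacted: Vec<i32> = values
            .iter()
            .zip(need)
            .filter(|(_, n)| **n)
            .map(|(&v, _)| v)
            .collect();
        let verdict: Vec<bool> = compacted.into_iter().map(&pred).collect();
        intersect_by_rank(need, &verdict)
    } else {
        // Dense: evaluate every row and intersect with the demanded mask.
        values.iter().zip(need).map(|(&v, &n)| n && pred(v)).collect()
    }
}
```

Both paths return the same mask; the sparse path only changes how many rows the expensive predicate touches.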
V2 parallelizes the join probe, aggregate, and Arrow decode ACROSS DataFusion partitions (V1 instead fans one partition into many split tasks). When a query projected a heavily-encoded column (e.g. a single RunEnd chunk for lineitem.l_orderkey), the opener fed split_aligned_row_range coarse chunk boundaries, which collapsed every byte-range file_group onto one partition and serialized the probe ~2-wide (TPC-H q4 ran 2.6x slower than V1).

Feed split_aligned_row_range the scan's own morsel ranges instead: the read-column chunk hints, or the 100k-row fallback when a read column is a single chunk (mirroring PreparedScanNodeFile::splits). Each morsel lands wholly in one partition, so the scan spreads across all of DataFusion's byte-range file_groups with no collapse and no chunk straddling a partition boundary. The assignment is contiguous per partition, so it is correct even when the scan output must preserve order.

Also run the Vortex->Arrow conversion on the runtime CPU pool (handle.spawn_cpu + buffered/buffer_unordered) so decode fans out within a partition rather than running serially on the consumer poll thread.

TPC-H SF1 (datafusion-bench, VORTEX_SCAN_IMPL=v2): q4 goes from 2.6x slower than V1 to faster than V1; overall ~parity.

Signed-off-by: Nicholas Gates <[email protected]>
Co-Authored-By: Claude Opus 4.8 <[email protected]>
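One way to picture the morsel-aligned assignment is the sketch below. It is an assumption-laden illustration, not the real `split_aligned_row_range`: the function names, the proportional mapping from a morsel's start row to a byte position, and the signature are all hypothetical. Only the 100k-row fallback and the "each morsel lands wholly in one partition, contiguously" property come from the commit message.

```rust
use std::ops::Range;

/// Fallback morsel size mentioned in the commit message.
const FALLBACK_MORSEL_ROWS: u64 = 100_000;

/// Cut `row_count` rows into fixed-size morsels (the last one may be short).
fn fallback_morsels(row_count: u64) -> Vec<Range<u64>> {
    (0..row_count)
        .step_by(FALLBACK_MORSEL_ROWS as usize)
        .map(|start| start..(start + FALLBACK_MORSEL_ROWS).min(row_count))
        .collect()
}

/// Hypothetical assignment rule: a partition that owns `bytes` of a file of
/// `file_size` bytes takes every morsel whose start row, projected
/// proportionally into byte space, falls inside `bytes`.
///
/// Because morsels are sorted and the projection is monotonic, the morsels a
/// partition owns are contiguous, so they collapse into one row range.
/// Returns `None` when the partition owns no morsel.
fn partition_rows(
    morsels: &[Range<u64>],
    row_count: u64,
    file_size: u64,
    bytes: Range<u64>,
) -> Option<Range<u64>> {
    if row_count == 0 {
        return None;
    }
    let owned: Vec<&Range<u64>> = morsels
        .iter()
        .filter(|m| {
            // Widen to u128 so the multiplication cannot overflow.
            let pos = (m.start as u128 * file_size as u128 / row_count as u128) as u64;
            bytes.contains(&pos)
        })
        .collect();
    Some(owned.first()?.start..owned.last()?.end)
}
```

With finer morsels, a single-chunk column no longer forces every byte range onto the same partition, because each byte range now has morsel starts that project into it.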
…H_FULL_PLAN

With --show-metrics and VORTEX_BENCH_FULL_PLAN=1, print the DataFusion EXPLAIN ANALYZE-style annotated plan (elapsed_compute / output_rows per operator) to stderr, to localize where wall time goes across scan, HashJoin build/probe, and aggregate.

Signed-off-by: Nicholas Gates <[email protected]>
Co-Authored-By: Claude Opus 4.8 <[email protected]>
Rename the runtime scan node API to ScanPlan and move the plan and segment primitives into vortex-scan. Layout v2 now expands directly through layout.new_scan_plan with a plan ScanRequest, and the docs describe the v2 path as the layout scan model.

Signed-off-by: "Nicholas Gates" <[email protected]>
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨
Benchmarks: FineWeb NVMe (base)
Verdict: No clear signal (medium confidence)
datafusion / vortex-file-compressed (0.925x ➖, 1↑ 2↓)
datafusion / parquet (0.984x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.287x ❌, 2↑ 6↓)
duckdb / parquet (1.003x ➖, 0↑ 0↓)
File Size Changes (3 files changed, -46.3% overall, 1↑ 2↓)
Benchmarks: TPC-H SF=1 on NVME (base)
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.942x ➖, 6↑ 3↓)
datafusion / parquet (1.014x ➖, 0↑ 2↓)
datafusion / arrow (0.972x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.996x ➖, 3↑ 3↓)
duckdb / parquet (1.003x ➖, 2↑ 3↓)
File Size Changes (17 files changed, -44.4% overall, 4↑ 13↓)
Benchmarks: TPC-DS SF=1 on NVME (base)
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.982x ➖, 9↑ 7↓)
datafusion / parquet (1.000x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.010x ➖, 9↑ 19↓)
duckdb / parquet (0.994x ➖, 0↑ 0↓)
File Size Changes (30 files changed, -43.4% overall, 3↑ 27↓)
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨
Benchmarks: FineWeb S3 (base)
Verdict: Likely regression (environment too noisy confidence)
datafusion / vortex-file-compressed (1.025x ➖, 2↑ 5↓)
datafusion / parquet (0.761x ➖, 3↑ 0↓)
duckdb / vortex-file-compressed (1.325x ❌, 2↑ 6↓)
duckdb / parquet (0.817x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on NVME (base)
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.029x ➖, 2↑ 8↓)
datafusion / parquet (1.035x ➖, 0↑ 0↓)
datafusion / arrow (1.031x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.149x ❌, 0↑ 17↓)
duckdb / parquet (1.010x ➖, 0↑ 1↓)
File Size Changes (47 files changed, -44.5% overall, 12↑ 35↓)
Benchmarks: TPC-H SF=1 on S3 (base)
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.918x ➖, 3↑ 2↓)
datafusion / parquet (0.796x ➖, 5↑ 0↓)
duckdb / vortex-file-compressed (1.016x ➖, 1↑ 0↓)
duckdb / parquet (0.875x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics (base)
Verdict: Likely regression (high confidence)
duckdb / vortex-file-compressed (7.956x ❌, 0↑ 11↓)
duckdb / parquet (1.118x ❌, 0↑ 8↓)
File Size Changes (3 files changed, -32.3% overall, 0↑ 3↓)
On what feels like the 27th time I've explored this space, I think I might finally be getting somewhere.
This design essentially pulls out a scan engine. Layouts are just one way to take serialized arrays and construct a ScanPlan; in theory we could build a ScanPlan by hand or by any other means.
A ScanPlan node can accept push-down of various operations:
This plan can then be used to answer different types of questions:
[more description to come]