Upgrade DataFusion fork to 53 #56
a79aff5 to 5f59c7b
// RecordBatch::try_new_with_options checks that if the schema is NOT NULL
// the array cannot contain nulls, amongst other checks.
let (_stream_schema, arrays, num_rows) = b.into_parts();
//

Ok(None)
}

/// Returns `Ok(None)` when the file is not inside a valid partition path

{
// Use SortMergeJoin if hash join is not preferred
let join_on_len = join_on.len();
// Derive sort options from the left input's existing ordering

);
check_if_same_properties!(self, children);
Ok(Arc::new(InterleaveExec::try_new(children)?))
// Optimizer rewrites can change child partitioning after InterleaveExec
c7ba34f Fix: InterleaveExec fallback to UnionExec when children partitioning diverges
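The fallback named in this commit can be pictured with a toy model. The sketch below does not use DataFusion's types: `Partitioning`, `Combined`, `can_interleave`, and `choose_combiner` are hypothetical stand-ins defined here for illustration. It assumes the rule is "interleave only when every child reports the same hash partitioning, otherwise fall back to a plain union", which may differ in detail from what the commit actually does.

```rust
/// Toy stand-in for a child's output partitioning (not DataFusion's type).
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    Hash { keys: Vec<String>, partitions: usize },
    RoundRobin(usize),
}

/// Which operator the toy planner picks to combine the children.
#[derive(Debug, PartialEq)]
enum Combined {
    Interleave,
    Union,
}

/// Assumed rule: interleaving is only valid when the first child is
/// hash-partitioned and every other child has exactly the same partitioning.
fn can_interleave(children: &[Partitioning]) -> bool {
    let Some((first, rest)) = children.split_first() else {
        return false; // no children: nothing to interleave
    };
    matches!(first, Partitioning::Hash { .. }) && rest.iter().all(|p| p == first)
}

/// Hypothetical fallback: if an optimizer rewrite left the children with
/// diverging partitioning, use a union instead of failing.
fn choose_combiner(children: &[Partitioning]) -> Combined {
    if can_interleave(children) {
        Combined::Interleave
    } else {
        Combined::Union
    }
}
```

In this model, two children hashed on the same key with the same partition count keep the interleave, while a child that was rewritten to round-robin partitioning forces the union.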
zhuqi-lucas left a comment
Three branch-52 unit tests are missing from the PR head:
test_done_drains_buffered_rows (the sort-merge happy-path drain; the drain_in_progress_on_done field is preserved, but only the error path is tested),
test_no_extra_spm_from_output_requirement_single_partition (SPM idempotency: "don't re-add SPM when one already exists"), and
test_sort_pushdown_adds_spm_for_single_partition_requirement (the new output_requirement_adds_merge_after_partition_preserving_sort covers a similar scenario, but starts from UnionExec rather than SPM + Sort(preserve=true) + Repartition).
Should we add those tests back to avoid regressions?
Commit 05a6c45 (apache#21947), "Skip unnecessary plan rebuild in adjust_input_keys_ordering for non-join plans", is in branch-52 and should be landed here too.
…oin plans (apache#21947)

Closes apache#21946

`adjust_input_keys_ordering` returns `Transformed::yes` unconditionally in the default else branch, even when `requirements.data` is empty and no changes were made. This triggers unnecessary `with_new_children` rebuilds on every node in the plan tree for non-join/non-aggregate queries. For plans with custom `ExecutionPlan` nodes whose `with_new_children` is expensive (e.g. nodes that re-evaluate cost functions on rebuild), this causes significant overhead.

Add an early return with `Transformed::no` when `requirements.data.is_empty()` in the default else branch of `adjust_input_keys_ordering`. This skips the unnecessary plan tree rebuild for simple scan/filter/limit plans that have no join key reordering requirements.

Yes, two unit tests added:
- `adjust_input_keys_ordering_no_transform_for_scan`: verifies a bare parquet scan returns `Transformed::no`
- `adjust_input_keys_ordering_no_transform_for_filter_scan`: verifies a filter→scan tree returns `Transformed::no` via `transform_down`

No. This is a performance optimization that does not change query results or plan structure.

(cherry picked from commit 05a6c45)
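The early return described in this commit message can be illustrated with a small, self-contained model. The types below are toy stand-ins, not DataFusion's: `Transformed` here only mimics the "value plus changed flag" idea, and `Requirements` and `adjust_default_branch` are hypothetical names. The reset done on the non-empty path is an assumption made for the sketch, not a claim about the real function body.

```rust
/// Toy stand-in for a "value plus did-it-change flag" wrapper.
#[derive(Debug, PartialEq)]
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    /// The value was rewritten; callers must rebuild the parent node.
    fn yes(data: T) -> Self {
        Self { data, transformed: true }
    }
    /// The value is unchanged; callers can skip the rebuild.
    fn no(data: T) -> Self {
        Self { data, transformed: false }
    }
}

/// Hypothetical requirements carrier: a plan label plus pending join keys.
#[derive(Debug, Clone, PartialEq)]
struct Requirements {
    plan: String,
    data: Vec<String>,
}

/// Sketch of the default branch with the early return added.
fn adjust_default_branch(mut requirements: Requirements) -> Transformed<Requirements> {
    if requirements.data.is_empty() {
        // Nothing to reorder, so report "no change" and avoid a rebuild.
        return Transformed::no(requirements);
    }
    // Assumed for illustration: the pending keys are dropped at this node.
    requirements.data.clear();
    Transformed::yes(requirements)
}
```

With this shape, a scan or filter node that carries no join-key requirements comes back flagged as unchanged, so a tree traversal has no reason to call an expensive rebuild on its parent.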
@zhuqi-lucas Nice, I added them back
LGTM
Summary