xudong963 · 2026-05-17T17:00:16Z

Summary

Upgrade the Massive DataFusion fork to the DataFusion 53 line.
Carry forward the branch-52 fork-specific fixes into the 53-compatible source tree.
Fix the CI failures from the previous DF53 upgrade attempt (force_hash_collisions, clippy, and order.slt plan expectations).

xudong963 · 2026-05-18T03:13:23Z

-                        // RecordBatch::try_new_with_options checks that if the schema is NOT NULL
-                        // the array cannot contain nulls, amongst other checks.
-                        let (_stream_schema, arrays, num_rows) = b.into_parts();
+                        //


0a0302b Restore DF 51 SchemaAdapter cast behaviour in ParquetOpener (#45)

xudong963 · 2026-05-18T03:15:51Z

    Ok(None)
 }

+/// Returns `Ok(None)` when the file is not inside a valid partition path


b4dbb6a Skip files outside partition structure in hive-partitioned listing tables (#51)
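The partition-path check in #51 can be illustrated with a small standalone sketch. It assumes a plain `col=value/` directory layout, and `parse_partition_values` is a hypothetical helper that uses `Option` in place of the fork's `Result<Option<_>>`; it is not the code from the commit.

```rust
/// Hypothetical helper: extracts hive partition values for `file_path`,
/// which must live under `table_root`. Returns `None` (the analogue of the
/// fork's `Ok(None)`) when the file is not inside a `col=value/` directory
/// for every expected partition column, so the caller can skip the file.
fn parse_partition_values(
    table_root: &str,
    file_path: &str,
    partition_cols: &[&str],
) -> Option<Vec<String>> {
    // Require the file to sit strictly below the table root
    // (so "table2/..." does not match a root of "table").
    let root = table_root.trim_end_matches('/');
    let relative = file_path.strip_prefix(root)?.strip_prefix('/')?;

    // Keep only the directory components; the last segment is the file name.
    let mut dirs: Vec<&str> = relative.split('/').collect();
    dirs.pop();

    // Hive layout: exactly one directory per partition column.
    if dirs.len() != partition_cols.len() {
        return None;
    }

    let mut values = Vec::with_capacity(partition_cols.len());
    for (dir, col) in dirs.iter().zip(partition_cols) {
        // Each directory must be `<col>=<value>` for the expected column, in order.
        let (key, value) = dir.split_once('=')?;
        if key != *col {
            return None;
        }
        values.push(value.to_string());
    }
    Some(values)
}
```

A file such as `s3://bucket/table/tmp/part-0.parquet` under a `year`-partitioned table yields `None` here instead of an error, which is the "skip rather than fail" behaviour the commit title describes.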

xudong963 · 2026-05-18T03:16:35Z

                {
-                    // Use SortMergeJoin if hash join is not preferred
-                    let join_on_len = join_on.len();
+                    // Derive sort options from the left input's existing ordering


8ca2242 feat: derive SMJ sort options from left child during plan creation (#43)
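A rough sketch of what deriving SMJ sort options from the left child could look like, assuming the join keys are compared position by position against the left input's existing ordering. `SortOptions`, `derive_join_sort_options`, and the ascending/nulls-first fallback are hypothetical stand-ins chosen for illustration, not the fork's implementation.

```rust
/// Hypothetical stand-in for sort options (direction + null placement).
#[derive(Debug, Clone, Copy, PartialEq)]
struct SortOptions {
    descending: bool,
    nulls_first: bool,
}

impl Default for SortOptions {
    // Assumed fallback when the left input gives no usable ordering.
    fn default() -> Self {
        SortOptions { descending: false, nulls_first: true }
    }
}

/// Picks one `SortOptions` per join key. While the left input's ordering
/// matches the join keys position by position, reuse that ordering's
/// options; after the first mismatch, use the default for the remaining keys.
fn derive_join_sort_options(
    left_ordering: &[(String, SortOptions)],
    join_keys: &[String],
) -> Vec<SortOptions> {
    let mut prefix_matches = true;
    join_keys
        .iter()
        .enumerate()
        .map(|(i, key)| match left_ordering.get(i) {
            Some((col, opts)) if prefix_matches && col == key => *opts,
            _ => {
                prefix_matches = false;
                SortOptions::default()
            }
        })
        .collect()
}
```

Reusing the left side's options presumably lets an already-sorted left input feed the join without an extra sort; the strict prefix match is one conservative way to decide when that applies.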

xudong963 · 2026-05-18T03:19:15Z

-        );
-        check_if_same_properties!(self, children);
-        Ok(Arc::new(InterleaveExec::try_new(children)?))
+        // Optimizer rewrites can change child partitioning after InterleaveExec


c7ba34f Fix: InterleaveExec fallback to UnionExec when children partitioning diverges
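The InterleaveExec fallback can be modelled as follows. This is a simplified, hypothetical sketch: `Partitioning` and `CombineNode` are toy types, and the rule that interleaving requires identical hash partitioning on every child is stated here as an assumption for illustration.

```rust
/// Simplified, hypothetical model of a child's output partitioning.
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    /// Hash-partitioned on the named columns into the given number of partitions.
    Hash(Vec<String>, usize),
    RoundRobin(usize),
}

#[derive(Debug, PartialEq)]
enum CombineNode {
    Interleave,
    Union,
}

/// Interleaving is only valid when every child is hash-partitioned
/// identically (same keys, same partition count), so partition i of each
/// child holds the same hash bucket.
fn can_interleave(children: &[Partitioning]) -> bool {
    let Some(first) = children.first() else {
        return false;
    };
    matches!(first, Partitioning::Hash(..)) && children.iter().all(|p| p == first)
}

/// Called when children are replaced (e.g. by an optimizer rewrite):
/// keep the interleave if it is still valid, otherwise fall back to a union.
fn rebuild_with_children(children: &[Partitioning]) -> CombineNode {
    if can_interleave(children) {
        CombineNode::Interleave
    } else {
        CombineNode::Union
    }
}
```

Falling back to a union rather than returning an error keeps the plan valid: a union makes no assumption about how its children are partitioned.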

xudong963 · 2026-05-18T03:27:40Z

74772cb Fix memory reservation starvation in sort-merge
8dcb444 Cherry-pick: Fix sort merge interleave overflow
795aa28 Cherry pick sort merge fixes 52

zhuqi-lucas

Three branch-52 unit tests are missing from the PR head:

test_done_drains_buffered_rows (sort-merge happy-path drain: the drain_in_progress_on_done field is preserved, but only the error path is tested),

test_no_extra_spm_from_output_requirement_single_partition (SPM idempotency: "don't re-add SPM when one already exists"), and

test_sort_pushdown_adds_spm_for_single_partition_requirement (the new output_requirement_adds_merge_after_partition_preserving_sort covers a similar scenario, but it starts from UnionExec rather than SPM + Sort(preserve=true) + Repartition).

I'm not sure whether we need to add these tests back to guard against regressions?

zhuqi-lucas

commit 05a6c45 (apache#21947)

"Skip unnecessary plan rebuild in adjust_input_keys_ordering for non-join plans"

This commit is in branch-52 and should be landed here as well.

05a6c45

…oin plans (apache#21947)

Closes apache#21946

`adjust_input_keys_ordering` returns `Transformed::yes` unconditionally in the default else branch, even when `requirements.data` is empty and no changes were made. This triggers unnecessary `with_new_children` rebuilds on every node in the plan tree for non-join/non-aggregate queries. For plans with custom `ExecutionPlan` nodes whose `with_new_children` is expensive (e.g. nodes that re-evaluate cost functions on rebuild), this causes significant overhead.

Add an early return with `Transformed::no` when `requirements.data.is_empty()` in the default else branch of `adjust_input_keys_ordering`. This skips the unnecessary plan tree rebuild for simple scan/filter/limit plans that have no join key reordering requirements.

Tests: yes, two unit tests added:
- `adjust_input_keys_ordering_no_transform_for_scan`: verifies a bare parquet scan returns `Transformed::no`
- `adjust_input_keys_ordering_no_transform_for_filter_scan`: verifies a filter→scan tree returns `Transformed::no` via `transform_down`

User-facing changes: no. This is a performance optimization that does not change query results or plan structure.

(cherry picked from commit 05a6c45)
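The effect of the early return can be shown with a toy model of the rewrite. `Transformed`, `PlanContext`, `adjust_default_branch`, and `count_rebuilds` are hypothetical stand-ins (DataFusion's real `Transformed` belongs to its tree-node API and carries more state); only the `is_empty()` check mirrors the change in the commit message.

```rust
/// Hypothetical stand-in for a tree-node rewrite result: the (possibly
/// updated) node plus a flag telling the traversal whether it changed.
#[derive(Debug)]
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    fn yes(data: T) -> Self {
        Transformed { data, transformed: true }
    }
    fn no(data: T) -> Self {
        Transformed { data, transformed: false }
    }
}

/// Hypothetical plan-node context; `required_keys` plays the role of
/// `requirements.data` (join-key ordering requirements from the parent).
#[derive(Debug, Clone)]
struct PlanContext {
    name: String,
    required_keys: Vec<String>,
}

/// Default else branch with the early return.
fn adjust_default_branch(mut ctx: PlanContext) -> Transformed<PlanContext> {
    if ctx.required_keys.is_empty() {
        // No reordering requirement: report "unchanged" so the traversal
        // does not rebuild this node via with_new_children.
        return Transformed::no(ctx);
    }
    // Placeholder for the real handling: this sketch just clears the
    // requirements and reports a change.
    ctx.required_keys.clear();
    Transformed::yes(ctx)
}

/// Counts the nodes that would trigger a rebuild.
fn count_rebuilds(nodes: Vec<PlanContext>) -> usize {
    nodes
        .into_iter()
        .map(adjust_default_branch)
        .filter(|t| t.transformed)
        .count()
}
```

With no requirements anywhere, a filter→scan plan reports zero rebuilds, which is the per-node `with_new_children` overhead the cherry-picked commit removes.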

xudong963 · 2026-05-20T06:15:46Z

@zhuqi-lucas Nice, I added them back

zhuqi-lucas · 2026-05-20T06:26:00Z

LGTM

github-actions bot added the documentation, physical-expr, development-process, sql, logical-expr, optimizer, core, substrait, sqllogictest, ffi, physical-plan, datasource, functions, proto, common, execution, spark, and catalog labels on May 17, 2026

xudong963 force-pushed the branch-53 branch from 05a6c45 to eae7bf4 on May 17, 2026 17:09

Upgrade Massive fork fixes to DataFusion 53 (5f59c7b)

xudong963 force-pushed the massive-upgrade-53-atlas branch from a79aff5 to 5f59c7b on May 17, 2026 17:10

github-actions bot removed the documentation, sql, substrait, ffi, proto, execution, and spark labels on May 17, 2026

Skip custom hash assertion for forced collisions (f069fa6)

xudong963 changed the title from "Upgrade Massive DataFusion fork to 53" to "Upgrade DataFusion fork to 53" on May 18, 2026

xudong963 commented May 18, 2026

zhuqi-lucas reviewed May 20, 2026

zhuqi-lucas and others added 3 commits May 20, 2026 13:58

Add missing branch-52 regression tests (2914bbc)

Restore branch-52 testing submodule pointer (9729d6d)

Update astral-tokio-tar for security audit (21ed8ff)

zhuqi-lucas approved these changes May 20, 2026

xudong963 merged commit d66824f into branch-53 May 20, 2026
63 checks passed


Upgrade DataFusion fork to 53 #56

xudong963 merged 6 commits into branch-53 from massive-upgrade-53-atlas

xudong963 commented May 17, 2026 (edited)

zhuqi-lucas left a comment

zhuqi-lucas left a comment (edited)

xudong963 commented May 20, 2026

zhuqi-lucas commented May 20, 2026

2 participants
