fhan688 · 2026-06-20T05:15:30Z

Purpose

Linked issue: #3278

This PR adds predicate pushdown support for the Hudi lake source on top of the existing Hudi split planner and source reader work.

The sorted reader part is intentionally not included because the current Hudi reader path does not guarantee that emitted records are ordered by primary key. Declaring SortedRecordReader without that guarantee could break batch union read correctness. Hudi primary-key batch union read can stay unsupported for now, consistent with the current Iceberg behavior.

Brief change log

Add FlussToHudiExpressionPredicateConverter to convert Fluss predicates to Hudi ExpressionPredicates.
- Supports =, !=, <, <=, >, >=, IN, NOT IN, AND, and OR.
- Keeps unsupported predicates as remaining filters.
- Handles Hudi metadata-column offset when mapping Fluss field indexes to Hudi schema fields.
- Uses the actual Avro field position when creating Flink FieldReferenceExpression.
- Builds binary AND/OR trees to match Hudi's predicate evaluation behavior.
Wire predicate pushdown into HudiLakeSource.
- Converts accepted filters in withFilters.
- Stores converted Hudi predicates and passes them to HudiRecordReader.
- Avoids loading Hudi schema for empty filter lists.
Add unit tests for:
- Hudi predicate conversion.
- Hudi source empty-filter behavior.
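
A minimal sketch of the conversion flow listed above, using stand-in types rather than the real Fluss `Predicate` or Hudi `ExpressionPredicates` classes. The `Pred` class, the string-based output, and the fixed `METADATA_FIELD_COUNT` of 5 are all assumptions made for illustration; the PR itself resolves the actual Avro field position instead of adding a constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Illustrative only: stand-in types, not the Fluss or Hudi classes. */
final class PredicatePushdownSketch {

    /** Hypothetical predicate node: a leaf (op + field index) or an AND/OR with children. */
    static final class Pred {
        final String op;
        final int fieldIndex;      // leaf only: index in the source-side schema
        final List<Pred> children; // compound only

        private Pred(String op, int fieldIndex, List<Pred> children) {
            this.op = op;
            this.fieldIndex = fieldIndex;
            this.children = children;
        }

        static Pred leaf(String op, int fieldIndex) {
            return new Pred(op, fieldIndex, new ArrayList<Pred>());
        }

        static Pred compound(String op, Pred... children) {
            return new Pred(op, -1, new ArrayList<Pred>(Arrays.asList(children)));
        }
    }

    static final Set<String> SUPPORTED_LEAF_OPS =
            new HashSet<String>(Arrays.asList("=", "!=", "<", "<=", ">", ">=", "IN", "NOT IN"));

    /** Assumption: metadata columns precede user columns in the target schema. */
    static final int METADATA_FIELD_COUNT = 5;

    /** Returns a textual form of the converted predicate, or empty if any part is unsupported. */
    static Optional<String> convert(Pred p) {
        if (p.children.isEmpty()) {
            if (!SUPPORTED_LEAF_OPS.contains(p.op)) {
                return Optional.empty();
            }
            // Shift the source-side index past the metadata columns.
            int targetIndex = p.fieldIndex + METADATA_FIELD_COUNT;
            return Optional.of("f" + targetIndex + " " + p.op + " ?");
        }
        if (!p.op.equals("AND") && !p.op.equals("OR")) {
            return Optional.empty();
        }
        // Fold n-ary AND/OR into a left-deep binary tree.
        String folded = null;
        for (Pred child : p.children) {
            Optional<String> converted = convert(child);
            if (!converted.isPresent()) {
                return Optional.empty();
            }
            folded = (folded == null)
                    ? converted.get()
                    : "(" + folded + " " + p.op + " " + converted.get() + ")";
        }
        return Optional.ofNullable(folded);
    }

    /** Splits filters into converted (accepted) ones and untouched remaining ones. */
    static void partition(List<Pred> filters, List<String> accepted, List<Pred> remaining) {
        for (Pred filter : filters) {
            Optional<String> converted = convert(filter);
            if (converted.isPresent()) {
                accepted.add(converted.get());
            } else {
                remaining.add(filter);
            }
        }
    }
}
```

In this sketch a compound predicate is rejected as a whole when any child is unsupported, so it ends up in the remaining list. Whether the real converter is that conservative is not stated in the change log.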

Tests

mvn -pl fluss-lake/fluss-lake-hudi -Dcheckstyle.skip=true spotless:apply
mvn -pl fluss-lake/fluss-lake-hudi -am clean test -DskipITs -Dcheckstyle.skip=true -DfailIfNoTests=false
git diff --check

API and Format

No public API or storage format change.

Documentation

No documentation change. This is an internal Hudi lake source implementation improvement.

Copilot

Pull request overview

This PR completes the Hudi lake source read path in fluss-lake-hudi by adding predicate pushdown (converting Fluss predicates to Hudi ExpressionPredicates) and introducing a SortedRecordReader implementation for union read over primary-key (MOR) Hudi tables.

Changes:

Added FlussToHudiExpressionPredicateConverter and wired filter pushdown into HudiLakeSource.
Added HudiSortedRecordReader to expose primary-key ordering for union read on MOR tables.
Added/updated unit tests for predicate conversion, comparator behavior, and empty-filter behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

File	Description
fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/source/HudiLakeSource.java	Stores converted Hudi predicates from `withFilters`, passes them to readers, and uses `HudiSortedRecordReader` for MOR tables.
fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/source/HudiSortedRecordReader.java	New `SortedRecordReader` wrapper that exposes a primary-key comparator and delegates to `HudiRecordReader`.
fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/utils/FlussToHudiExpressionPredicateConverter.java	New converter from Fluss `Predicate` to Hudi `ExpressionPredicates` with metadata-field offset handling.
fluss-lake/fluss-lake-hudi/src/test/java/org/apache/fluss/lake/hudi/source/HudiLakeSourceTest.java	Updates filter-pushdown test for empty-filter behavior.
fluss-lake/fluss-lake-hudi/src/test/java/org/apache/fluss/lake/hudi/source/HudiSortedRecordReaderTest.java	Adds tests for primary-key comparator ordering and error cases.
fluss-lake/fluss-lake-hudi/src/test/java/org/apache/fluss/lake/hudi/utils/FlussToHudiExpressionPredicateConverterTest.java	Adds tests for supported/unsupported predicate conversion and compound predicate references.

luoyuxia · 2026-06-20T05:52:03Z

+import java.util.List;
+
+/** Hudi record reader that exposes primary-key ordering for Fluss union read. */
+public class HudiSortedRecordReader implements SortedRecordReader {


Does the underlying Hudi record reader guarantee that the records it emits are ordered? If not, we can just remove HudiSortedRecordReader, as with Iceberg. SortedRecordReader is only needed for batch union read of primary-key tables, so we can leave it unsupported. That's fine; we don't support it for Iceberg either.

Thanks for pointing this out. You are right that the current Hudi reader path does not guarantee records are emitted in primary-key order.

SortedRecordReader requires read() to return records ordered by order(), and SortMergeReader relies on each lake iterator being sorted. Since HudiRecordReader currently delegates to Hudi file-slice readers directly, we cannot safely claim that contract here.

I removed HudiSortedRecordReader and its tests in the latest update. This PR now only keeps Hudi predicate pushdown support. Hudi primary-key batch union read can stay unsupported for now, consistent with the current Iceberg behavior.
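
To illustrate why the ordering contract matters, here is a generic two-way merge over integer keys. It is not the Fluss SortMergeReader; the class and method names are hypothetical. It only shows that a merge step assuming sorted inputs silently produces out-of-order output when one input is unsorted.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a textbook two-way merge, not the Fluss implementation. */
final class SortMergeContractSketch {

    /** Merges two key lists; the result is ascending only if both inputs already are. */
    static List<Integer> merge(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<Integer>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            // Only the current heads are compared, so earlier misordering is never detected.
            if (left.get(i) <= right.get(j)) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) {
            out.add(left.get(i++));
        }
        while (j < right.size()) {
            out.add(right.get(j++));
        }
        return out;
    }

    /** Returns true if the keys are in non-decreasing order. */
    static boolean isAscending(List<Integer> keys) {
        for (int k = 1; k < keys.size(); k++) {
            if (keys.get(k - 1) > keys.get(k)) {
                return false;
            }
        }
        return true;
    }
}
```

With sorted inputs `[1, 3, 5]` and `[2, 4]` the merged result is ascending. With an unsorted first input such as `[3, 1, 5]` the merge completes without error but the result is no longer ascending, which is the kind of silent correctness problem the reply above wants to avoid.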

luoyuxia

@fhan688 Thanks for the PR. LGTM overall. Just one question.

luoyuxia · 2026-06-20T09:43:16Z


    @Override
    public Planner<HudiSplit> createPlanner(PlannerContext context) throws IOException {
        return new HudiSplitPlanner(hudiConfig, tablePath, context.snapshotId());


Do we need to pass the predicates to the planner to reduce the number of planned splits?
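
One way to picture what planner-side pruning could look like is a range check per split. This is a hypothetical sketch: the `Split` class, its min/max fields, and `pruneGreaterThan` are invented for illustration and do not reflect HudiSplitPlanner or any Hudi API. The thread does not say how, or whether, the planner would use the predicates.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical split pruning by a per-split value range. */
final class SplitPruningSketch {

    /** Hypothetical split carrying min/max statistics for one column. */
    static final class Split {
        final String name;
        final long min;
        final long max;

        Split(String name, long min, long max) {
            this.name = name;
            this.min = min;
            this.max = max;
        }
    }

    /** Keeps only splits that may contain a value strictly greater than the bound. */
    static List<Split> pruneGreaterThan(List<Split> splits, long bound) {
        List<Split> kept = new ArrayList<Split>();
        for (Split split : splits) {
            // If even the maximum is not above the bound, no row in the split can match.
            if (split.max > bound) {
                kept.add(split);
            }
        }
        return kept;
    }
}
```

Pruning of this kind is only safe when the statistics are reliable, and the reader still has to apply the predicate to the rows of every kept split.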

[lake/hudi] Support predicate pushdown and sorted source reader

797c74b

luoyuxia requested a review from Copilot June 20, 2026 05:43

Copilot started reviewing on behalf of luoyuxia June 20, 2026 05:43

Copilot AI reviewed Jun 20, 2026

luoyuxia reviewed Jun 20, 2026

fhan688 changed the title ~~[lake/hudi] Support predicate pushdown and sorted source reader~~ [lake/hudi] Support predicate pushdown Jun 20, 2026

[lake/hudi] rm sorted source reader

966541b

luoyuxia reviewed Jun 20, 2026

[lake/hudi] Support predicate pushdown #3501

fhan688 wants to merge 2 commits into apache:main from fhan688:Introduce-Hudi-source-sortedReader-and-PredicateConverter

3 participants
