[lake/hudi] Support predicate pushdown #3501

Open
fhan688 wants to merge 2 commits into apache:main from fhan688:Introduce-Hudi-source-sortedReader-and-PredicateConverter

Conversation

fhan688 commented Jun 20, 2026

Contributor

Purpose

Linked issue: #3278

This PR adds predicate pushdown support for the Hudi lake source on top of the existing Hudi split planner and source reader work.

The sorted reader part is intentionally not included because the current Hudi reader path does not guarantee that emitted records are ordered by primary key. Declaring SortedRecordReader without that guarantee could break batch union read correctness. Hudi primary-key batch union read can stay unsupported for now, consistent with the current Iceberg behavior.

Brief change log

  • Add FlussToHudiExpressionPredicateConverter to convert Fluss predicates to Hudi ExpressionPredicates.

    • Supports =, !=, <, <=, >, >=, IN, NOT IN, AND, and OR.
    • Keeps unsupported predicates as remaining filters.
    • Handles Hudi metadata-column offset when mapping Fluss field indexes to Hudi schema fields.
    • Uses the actual Avro field position when creating Flink FieldReferenceExpression.
    • Builds binary AND/OR trees to match Hudi's predicate evaluation behavior.
  • Wire predicate pushdown into HudiLakeSource.

    • Converts accepted filters in withFilters.
    • Stores converted Hudi predicates and passes them to HudiRecordReader.
    • Avoids loading Hudi schema for empty filter lists.
  • Add unit tests for:

    • Hudi predicate conversion.
    • Hudi source empty-filter behavior.
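
The conversion steps in the change log can be illustrated with a minimal, self-contained sketch. It does not use the Fluss or Hudi classes: Pred, Result, convert, and withFilters are hypothetical stand-ins, the pushed-down predicate is rendered as a plain string, and the metadata-column count is passed in as a parameter rather than read from a real schema. IN and NOT IN are omitted for brevity, and a compound predicate with any unconvertible child is rejected as a whole, which is an assumption of this sketch rather than a statement about the PR's behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Illustrative stand-ins only; these are not the Fluss or Hudi classes. */
final class PushdownSketch {

    /** Hypothetical source-side predicate: a leaf comparison or an AND/OR over children. */
    static final class Pred {
        final String op;           // "=", "<", "AND", "OR", "LIKE", ...
        final int fieldIndex;      // index in the user-visible schema (leaves only)
        final Object literal;      // comparison literal (leaves only)
        final List<Pred> children; // operands (AND/OR only)

        private Pred(String op, int fieldIndex, Object literal, List<Pred> children) {
            this.op = op;
            this.fieldIndex = fieldIndex;
            this.literal = literal;
            this.children = children;
        }

        static Pred leaf(String op, int fieldIndex, Object literal) {
            return new Pred(op, fieldIndex, literal, new ArrayList<Pred>());
        }

        static Pred compound(String op, Pred... children) {
            return new Pred(op, -1, null, Arrays.asList(children));
        }
    }

    /** Accepted filters (rendered as strings) and the filters left for the engine. */
    static final class Result {
        final List<String> pushed = new ArrayList<>();
        final List<Pred> remaining = new ArrayList<>();
    }

    static final List<String> SUPPORTED_LEAF_OPS = Arrays.asList("=", "!=", "<", "<=", ">", ">=");

    /** Returns the converted predicate, or empty if any part of it is unsupported. */
    static Optional<String> convert(Pred p, int metadataFieldCount) {
        if (p.op.equals("AND") || p.op.equals("OR")) {
            String acc = null;
            for (Pred child : p.children) {
                Optional<String> converted = convert(child, metadataFieldCount);
                if (!converted.isPresent()) {
                    return Optional.empty();
                }
                // Fold n-ary AND/OR into a left-deep binary tree.
                acc = (acc == null)
                        ? converted.get()
                        : "(" + acc + " " + p.op + " " + converted.get() + ")";
            }
            return Optional.ofNullable(acc);
        }
        if (!SUPPORTED_LEAF_OPS.contains(p.op)) {
            return Optional.empty();
        }
        // Shift the user-schema index past the metadata columns of the lake schema.
        int lakeIndex = p.fieldIndex + metadataFieldCount;
        return Optional.of("$" + lakeIndex + " " + p.op + " " + p.literal);
    }

    /** Splits the filters into pushed-down predicates and remaining filters. */
    static Result withFilters(List<Pred> filters, int metadataFieldCount) {
        Result result = new Result();
        for (Pred filter : filters) {
            Optional<String> converted = convert(filter, metadataFieldCount);
            if (converted.isPresent()) {
                result.pushed.add(converted.get());
            } else {
                result.remaining.add(filter);
            }
        }
        return result;
    }
}
```

With an offset of 5 (assumed here for the Hudi metadata columns), a three-way AND over user fields 0, 1, and 2 is rendered as a nested binary tree over positions 5, 6, and 7, while a LIKE filter is returned in remaining.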

Tests

  • mvn -pl fluss-lake/fluss-lake-hudi -Dcheckstyle.skip=true spotless:apply
  • mvn -pl fluss-lake/fluss-lake-hudi -am clean test -DskipITs -Dcheckstyle.skip=true -DfailIfNoTests=false
  • git diff --check

API and Format

No public API or storage format change.

Documentation

No documentation change. This is an internal Hudi lake source implementation improvement.

Copilot AI left a comment

Contributor

Pull request overview

This PR completes the Hudi lake source read path in fluss-lake-hudi by adding predicate pushdown (converting Fluss predicates to Hudi ExpressionPredicates) and introducing a SortedRecordReader implementation for union read over primary-key (MOR) Hudi tables.

Changes:

  • Added FlussToHudiExpressionPredicateConverter and wired filter pushdown into HudiLakeSource.
  • Added HudiSortedRecordReader to expose primary-key ordering for union read on MOR tables.
  • Added/updated unit tests for predicate conversion, comparator behavior, and empty-filter behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File: Description

  • fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/source/HudiLakeSource.java: Stores converted Hudi predicates from withFilters, passes them to readers, and uses HudiSortedRecordReader for MOR tables.
  • fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/source/HudiSortedRecordReader.java: New SortedRecordReader wrapper that exposes a primary-key comparator and delegates to HudiRecordReader.
  • fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/utils/FlussToHudiExpressionPredicateConverter.java: New converter from Fluss Predicate to Hudi ExpressionPredicates with metadata-field offset handling.
  • fluss-lake/fluss-lake-hudi/src/test/java/org/apache/fluss/lake/hudi/source/HudiLakeSourceTest.java: Updates filter-pushdown test for empty-filter behavior.
  • fluss-lake/fluss-lake-hudi/src/test/java/org/apache/fluss/lake/hudi/source/HudiSortedRecordReaderTest.java: Adds tests for primary-key comparator ordering and error cases.
  • fluss-lake/fluss-lake-hudi/src/test/java/org/apache/fluss/lake/hudi/utils/FlussToHudiExpressionPredicateConverterTest.java: Adds tests for supported/unsupported predicate conversion and compound predicate references.

import java.util.List;

/** Hudi record reader that exposes primary-key ordering for Fluss union read. */
public class HudiSortedRecordReader implements SortedRecordReader {

Contributor

Does the underlying Hudi record reader ensure that the emitted records are ordered? If not, we can just remove HudiSortedRecordReader, as with Iceberg. SortedRecordReader is only needed for batch union read of primary-key tables. We can leave it unsupported. That's fine; we don't support it for Iceberg either.

Contributor Author

Thanks for pointing this out. You are right that the current Hudi reader path does not guarantee records are emitted in primary-key order.

SortedRecordReader requires read() to return records ordered by order(), and SortMergeReader relies on each lake iterator being sorted. Since HudiRecordReader currently delegates to Hudi file-slice readers directly, we cannot safely claim that contract here.

I removed HudiSortedRecordReader and its tests in the latest update. This PR now only keeps Hudi predicate pushdown support. Hudi primary-key batch union read can stay unsupported for now, consistent with the current Iceberg behavior.
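
To make the ordering contract concrete, here is a minimal two-way sort-merge sketch. It is not the Fluss SortMergeReader: Row, merge, and nextOrNull are hypothetical names, keys are plain ints, and the only rule modeled is that on equal keys the log record replaces the lake record. The point it illustrates is that this kind of merge only deduplicates correctly when both inputs arrive in key order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

/** Illustrative two-way merge; a simplified stand-in, not the Fluss implementation. */
final class SortMergeSketch {

    /** Hypothetical record with a primary key and a value. */
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    private static <T> T nextOrNull(Iterator<T> it) {
        return it.hasNext() ? it.next() : null;
    }

    /** Merges two inputs that are assumed to be sorted by key; the log wins on equal keys. */
    static List<Row> merge(Iterator<Row> lake, Iterator<Row> log) {
        Comparator<Row> order = Comparator.comparingInt(r -> r.key);
        List<Row> out = new ArrayList<>();
        Row a = nextOrNull(lake);
        Row b = nextOrNull(log);
        while (a != null && b != null) {
            int cmp = order.compare(a, b);
            if (cmp < 0) {
                out.add(a);
                a = nextOrNull(lake);
            } else if (cmp > 0) {
                out.add(b);
                b = nextOrNull(log);
            } else {
                // Same key on both sides: keep the newer log record, drop the lake one.
                out.add(b);
                a = nextOrNull(lake);
                b = nextOrNull(log);
            }
        }
        // Drain whichever side still has records.
        for (; a != null; a = nextOrNull(lake)) {
            out.add(a);
        }
        for (; b != null; b = nextOrNull(log)) {
            out.add(b);
        }
        return out;
    }
}
```

If the lake side emits keys 1, 2, 3 in order and the log side holds an update for key 2, the merge yields one row per key. If the lake side emits 3, 2, 1 instead, the update for key 2 is emitted first and the stale lake row for key 2 is emitted again later, so the result contains a duplicate key.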

fhan688 changed the title from "[lake/hudi] Support predicate pushdown and sorted source reader" to "[lake/hudi] Support predicate pushdown" on Jun 20, 2026

luoyuxia left a comment

Contributor

@fhan688 Thanks for the PR. LGTM overall. Just one question:


@Override
public Planner<HudiSplit> createPlanner(PlannerContext context) throws IOException {
    return new HudiSplitPlanner(hudiConfig, tablePath, context.snapshotId());

Contributor

Do we need to pass the predicates to the planner to reduce the number of planned splits?
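
For context, plan-time pruning of the kind this question refers to could look roughly like the sketch below. Everything in it is hypothetical: Split, plan, and the partition-path filter are stand-ins and do not reflect HudiSplitPlanner or any Hudi API. It assumes the pushed-down predicate can be evaluated against a split's partition path alone, which would not hold for predicates on non-partition columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative plan-time pruning; not the HudiSplitPlanner from this PR. */
final class SplitPruningSketch {

    /** Hypothetical split: one file group inside one partition. */
    static final class Split {
        final String partitionPath;
        final String fileId;

        Split(String partitionPath, String fileId) {
            this.partitionPath = partitionPath;
            this.fileId = fileId;
        }
    }

    /** Keeps only the splits whose partition path passes the filter. */
    static List<Split> plan(List<Split> candidates, Predicate<String> partitionFilter) {
        List<Split> kept = new ArrayList<>();
        for (Split split : candidates) {
            if (partitionFilter.test(split.partitionPath)) {
                kept.add(split);
            }
        }
        return kept;
    }
}
```

Splits dropped at this stage are never assigned to a reader, whereas reader-side pushdown alone still opens every split and filters rows inside it.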
