[lake/hudi] Support predicate pushdown #3501
Pull request overview
This PR completes the Hudi lake source read path in fluss-lake-hudi by adding predicate pushdown (converting Fluss predicates to Hudi ExpressionPredicates) and introducing a SortedRecordReader implementation for union read over primary-key (MOR) Hudi tables.
Changes:
- Added `FlussToHudiExpressionPredicateConverter` and wired filter pushdown into `HudiLakeSource`.
- Added `HudiSortedRecordReader` to expose primary-key ordering for union read on MOR tables.
- Added/updated unit tests for predicate conversion, comparator behavior, and empty-filter behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/source/HudiLakeSource.java | Stores converted Hudi predicates from withFilters, passes them to readers, and uses HudiSortedRecordReader for MOR tables. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/source/HudiSortedRecordReader.java | New SortedRecordReader wrapper that exposes a primary-key comparator and delegates to HudiRecordReader. |
| fluss-lake/fluss-lake-hudi/src/main/java/org/apache/fluss/lake/hudi/utils/FlussToHudiExpressionPredicateConverter.java | New converter from Fluss Predicate to Hudi ExpressionPredicates with metadata-field offset handling. |
| fluss-lake/fluss-lake-hudi/src/test/java/org/apache/fluss/lake/hudi/source/HudiLakeSourceTest.java | Updates filter-pushdown test for empty-filter behavior. |
| fluss-lake/fluss-lake-hudi/src/test/java/org/apache/fluss/lake/hudi/source/HudiSortedRecordReaderTest.java | Adds tests for primary-key comparator ordering and error cases. |
| fluss-lake/fluss-lake-hudi/src/test/java/org/apache/fluss/lake/hudi/utils/FlussToHudiExpressionPredicateConverterTest.java | Adds tests for supported/unsupported predicate conversion and compound predicate references. |
```java
import java.util.List;

/** Hudi record reader that exposes primary-key ordering for Fluss union read. */
public class HudiSortedRecordReader implements SortedRecordReader {
```
Does the underlying Hudi record reader guarantee that emitted records are ordered? If not, we can just remove `HudiSortedRecordReader`, as we did for Iceberg. `SortedRecordReader` is only needed for batch union read of primary-key tables, so we can leave it unsupported. That's fine; we don't support it for Iceberg either.
Thanks for pointing this out. You are right that the current Hudi reader path does not guarantee that records are emitted in primary-key order.
`SortedRecordReader` requires `read()` to return records ordered by `order()`, and `SortMergeReader` relies on each lake iterator being sorted. Since `HudiRecordReader` currently delegates directly to Hudi file-slice readers, we cannot safely claim that contract here.
I removed `HudiSortedRecordReader` and its tests in the latest update. This PR now only keeps Hudi predicate pushdown support. Hudi primary-key batch union read can stay unsupported for now, consistent with the current Iceberg behavior.
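To illustrate why the sorted contract matters, here is a minimal two-way sort-merge sketch. It is not Fluss's `SortMergeReader`: the class and method names are hypothetical, records are reduced to comparable keys, and deletes are not modeled. Because the merge only ever advances the smaller head, one out-of-order record from either input would be emitted in the wrong position or would miss its matching key, so the sketch checks the precondition and throws rather than silently producing wrong results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class SortMergeSketch {

    /**
     * Hypothetical two-way sort-merge in the spirit of a union read: on equal keys the
     * log-side record replaces the lake-side one. Correct only if BOTH inputs are sorted
     * by cmp; the precondition is checked while advancing each input.
     */
    static <T> List<T> merge(Iterator<T> lake, Iterator<T> log, Comparator<T> cmp) {
        List<T> out = new ArrayList<>();
        T a = advance(lake, null, cmp);
        T b = advance(log, null, cmp);
        while (a != null || b != null) {
            if (b == null || (a != null && cmp.compare(a, b) < 0)) {
                // Lake key is smaller (or log is exhausted): emit it as-is.
                out.add(a);
                a = advance(lake, a, cmp);
            } else if (a == null || cmp.compare(a, b) > 0) {
                // Log key is smaller (or lake is exhausted): emit it as-is.
                out.add(b);
                b = advance(log, b, cmp);
            } else {
                // Same key on both sides: keep the newer log record, drop the lake one.
                out.add(b);
                a = advance(lake, a, cmp);
                b = advance(log, b, cmp);
            }
        }
        return out;
    }

    /** Returns the next element (null at end); throws if the input goes backwards. */
    private static <T> T advance(Iterator<T> it, T prev, Comparator<T> cmp) {
        if (!it.hasNext()) {
            return null;
        }
        T next = it.next();
        if (prev != null && cmp.compare(prev, next) > 0) {
            throw new IllegalStateException("input not sorted: " + prev + " before " + next);
        }
        return next;
    }
}
```

For example, merging lake keys `1, 3, 5` with log keys `2, 3, 6` yields `1, 2, 3, 5, 6`, with key `3` taken from the log side. Feeding an unsorted lake input such as `3, 1` is rejected instead of being merged incorrectly.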
```java
@Override
public Planner<HudiSplit> createPlanner(PlannerContext context) throws IOException {
    return new HudiSplitPlanner(hudiConfig, tablePath, context.snapshotId());
```
Do we need to pass the predicates to the planner to reduce the number of planned splits?
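One way a planner can use pushed-down predicates is to skip splits whose column statistics rule out any match. The sketch below assumes hypothetical per-split min/max statistics for a single `long` column (`SplitStats`, `Op`, and `prune` are illustrative names, not Hudi or Fluss APIs) and ignores nulls. Pruning is conservative: a split is kept whenever it might contain a matching row, so readers still need to apply the filter row by row.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPruningSketch {

    /** Hypothetical per-split min/max statistics for one long column. */
    static final class SplitStats {
        final String splitId;
        final long min;
        final long max;

        SplitStats(String splitId, long min, long max) {
            this.splitId = splitId;
            this.min = min;
            this.max = max;
        }
    }

    /** Hypothetical comparison operators for "column OP literal". */
    enum Op { EQ, LT, GT }

    /** Returns false only if NO value in [min, max] can satisfy "column op literal". */
    static boolean mightMatch(SplitStats s, Op op, long literal) {
        switch (op) {
            case EQ:
                return s.min <= literal && literal <= s.max;
            case LT:
                return s.min < literal;
            case GT:
                return s.max > literal;
            default:
                throw new IllegalArgumentException("unsupported op: " + op);
        }
    }

    /** Keeps the ids of splits that may contain matching rows, in input order. */
    static List<String> prune(List<SplitStats> splits, Op op, long literal) {
        List<String> kept = new ArrayList<>();
        for (SplitStats s : splits) {
            if (mightMatch(s, op, literal)) {
                kept.add(s.splitId);
            }
        }
        return kept;
    }
}
```

With splits `a=[0,10]`, `b=[11,20]`, `c=[21,30]`, the predicate `col > 15` keeps only `b` and `c`, so split `a` is never planned.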
Purpose
Linked issue: #3278
This PR adds predicate pushdown support for the Hudi lake source on top of the existing Hudi split planner and source reader work.
The sorted reader part is intentionally not included because the current Hudi reader path does not guarantee that emitted records are ordered by primary key. Declaring `SortedRecordReader` without that guarantee could break batch union read correctness. Hudi primary-key batch union read can stay unsupported for now, consistent with the current Iceberg behavior.
Brief change log
- Add `FlussToHudiExpressionPredicateConverter` to convert Fluss predicates to Hudi `ExpressionPredicates`.
  - Support `=`, `!=`, `<`, `<=`, `>`, `>=`, `IN`, `NOT IN`, `AND`, and `OR`.
  - `FieldReferenceExpression`.
  - `AND`/`OR` trees to match Hudi's predicate evaluation behavior.
- Wire predicate pushdown into `HudiLakeSource.withFilters`.
  - Pass the converted predicates to `HudiRecordReader`.
- Add unit tests for:
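A minimal sketch of this conversion shape, under stated assumptions: the predicate classes are simplified, hypothetical stand-ins for Fluss predicates (not the real Fluss API), the output is rendered as a string instead of Hudi `ExpressionPredicates` objects, and the field offset assumes the five standard Hudi metadata columns precede the user columns. It also shows a common rule for partial pushdown, which is an assumption rather than necessarily what this PR does: an unconvertible conjunct can be dropped from an `AND`, but an unconvertible branch makes the whole `OR` unpushable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class PredicateConversionSketch {

    /** Assumed: Hudi's 5 standard metadata columns (_hoodie_commit_time ... _hoodie_file_name) come first. */
    static final int HUDI_META_FIELD_COUNT = 5;

    static final Set<String> SUPPORTED_OPS =
            new HashSet<>(Arrays.asList("=", "!=", "<", "<=", ">", ">=", "IN", "NOT IN"));

    /** Hypothetical, simplified stand-in for a Fluss predicate tree. */
    interface Pred {}

    static final class Leaf implements Pred {
        final int fieldIndex; // index in the Fluss row type
        final String op;
        final Object literal;

        Leaf(int fieldIndex, String op, Object literal) {
            this.fieldIndex = fieldIndex;
            this.op = op;
            this.literal = literal;
        }
    }

    static final class Compound implements Pred {
        final boolean isAnd; // true = AND, false = OR
        final List<Pred> children;

        Compound(boolean isAnd, Pred... children) {
            this.isAnd = isAnd;
            this.children = Arrays.asList(children);
        }
    }

    /** Converts to a string-rendered target expression; empty means "cannot push down". */
    static Optional<String> convert(Pred p) {
        if (p instanceof Leaf) {
            Leaf leaf = (Leaf) p;
            if (!SUPPORTED_OPS.contains(leaf.op)) {
                return Optional.empty();
            }
            // Shift the Fluss field index past the (assumed) Hudi metadata columns.
            int hudiIndex = leaf.fieldIndex + HUDI_META_FIELD_COUNT;
            return Optional.of("$" + hudiIndex + " " + leaf.op + " " + leaf.literal);
        }
        Compound c = (Compound) p;
        List<String> converted = new ArrayList<>();
        for (Pred child : c.children) {
            Optional<String> r = convert(child);
            if (r.isPresent()) {
                converted.add(r.get());
            } else if (!c.isAnd) {
                // OR: one unconvertible branch makes the whole disjunction unsafe to push.
                return Optional.empty();
            }
            // AND: an unconvertible conjunct is dropped; the pushed filter becomes looser,
            // so the full predicate must still be evaluated after the scan.
        }
        if (converted.isEmpty()) {
            return Optional.empty();
        }
        // Fold the n-ary children into a left-deep chain of binary AND/OR nodes.
        String op = c.isAnd ? " AND " : " OR ";
        String acc = converted.get(0);
        for (int i = 1; i < converted.size(); i++) {
            acc = "(" + acc + op + converted.get(i) + ")";
        }
        return Optional.of(acc);
    }
}
```

For example, `convert(new Compound(true, new Leaf(0, "=", 1), new Leaf(1, "<", 3)))` yields `($5 = 1 AND $6 < 3)`.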
Tests
- `mvn -pl fluss-lake/fluss-lake-hudi -Dcheckstyle.skip=true spotless:apply`
- `mvn -pl fluss-lake/fluss-lake-hudi -am clean test -DskipITs -Dcheckstyle.skip=true -DfailIfNoTests=false`
- `git diff --check`

API and Format
No public API or storage format change.
Documentation
No documentation change. This is an internal Hudi lake source implementation improvement.