Spark, Avro: Add support for row lineage in Avro reader #13070
Conversation
Force-pushed from 70bc444 to c85f6b5
Review thread on spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/ManifestFileBean.java (resolved)
Review thread on ...ons/src/test/java/org/apache/iceberg/spark/extensions/TestRowLevelOperationsWithLineage.java (outdated, resolved)
Force-pushed from c85f6b5 to b860df4
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
Force-pushed from 0b9123c to 75fe689
LGTM, just a couple of nit comments.
Force-pushed from 75fe689 to 83fd548
```java
addFileFieldReadersToPlan(readPlan, record.getFields(), fieldReaders, idToPos, idToConstant);
addMissingFileReadersToPlan(readPlan, idToPos, expected, idToConstant, convert);
return readPlan;
```
I didn't end up refactoring the way we do in ParquetValueReaders; that version ended up being hard to read given how Avro positions work. Instead, I split the logic into two separate methods: the first adds the file field readers to the read plan, and the second then adds the missing readers to the plan. That also reduces cyclomatic complexity.
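The two-pass shape described above can be sketched roughly as follows. This is a simplified, hypothetical illustration, not Iceberg's actual code: the class, record, and method names here are invented for the sketch, and the real planner works over Avro schemas and reader objects rather than bare field ids.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: build a read plan in two passes, first readers for
// fields present in the data file, then readers for expected fields that are
// missing from the file (constants, defaults, metadata columns).
class ReadPlanBuilder {
  // A planned reader: the output position it writes to and how it reads.
  record PlannedReader(int pos, String source) {}

  static List<PlannedReader> buildReadPlan(
      List<Integer> fileFieldIds,          // field ids present in the data file
      Set<Integer> expectedIds,            // field ids the query expects
      Map<Integer, Integer> idToPos,       // field id -> output position
      Map<Integer, Object> idToConstant) { // field id -> constant value, if any
    List<PlannedReader> readPlan = new ArrayList<>();
    addFileFieldReaders(readPlan, fileFieldIds, idToPos);
    addMissingFieldReaders(readPlan, fileFieldIds, expectedIds, idToPos, idToConstant);
    return readPlan;
  }

  // Pass 1: fields that exist in the file are read directly from the file.
  private static void addFileFieldReaders(
      List<PlannedReader> readPlan, List<Integer> fileFieldIds, Map<Integer, Integer> idToPos) {
    for (int id : fileFieldIds) {
      Integer pos = idToPos.get(id);
      if (pos != null) {
        readPlan.add(new PlannedReader(pos, "file"));
      }
    }
  }

  // Pass 2: expected fields absent from the file get constant readers.
  private static void addMissingFieldReaders(
      List<PlannedReader> readPlan,
      List<Integer> fileFieldIds,
      Set<Integer> expectedIds,
      Map<Integer, Integer> idToPos,
      Map<Integer, Object> idToConstant) {
    Set<Integer> inFile = new HashSet<>(fileFieldIds);
    for (int id : expectedIds) {
      if (!inFile.contains(id) && idToConstant.containsKey(id)) {
        readPlan.add(new PlannedReader(idToPos.get(id), "constant"));
      }
    }
  }
}
```

Keeping each pass in its own method means neither loop needs nested branching on whether a field came from the file, which is where the cyclomatic-complexity savings comes from.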
Force-pushed from 83fd548 to 5ba1dae
Some related tests appear to be failing after the refactoring update; taking a look.
Still figuring out what the issue is. Even after reverting to the un-refactored code, some of the Spark 3.4 Avro tests still fail with the following. Note that the same Spark 3.5 tests consistently pass.
Force-pushed from 5ba1dae to 541fae0
Thanks @amogh-jahagirdar for the contribution and @nastra for the review.
Hi @amogh-jahagirdar, we hit the same exception in our tests (in Trino) after upgrading to an Iceberg version containing these changes. As far as I understand this change, you read row_id either as a constant or by computing it as first_row_id + position. In this case our test fails, because the previous version simply read the field from the file instead. Maybe you can suggest a way to handle this? Big thanks in advance.
To be more specific, the reader we use doesn't contain this information, so we always end up with a reader that cannot read row_id from the file, instead of one which actually could.
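The inheritance rule being discussed above can be sketched as follows: a row's _row_id is either the value materialized in the data file, or, when the file does not store one, derived as first_row_id + the row's position in the file. This is an illustrative sketch only; the class and method names (RowIdReader, read) are invented here and are not Iceberg's actual API.

```java
// Hypothetical sketch of row-lineage inheritance for _row_id:
// prefer the value written in the file; otherwise derive it from the
// file's assigned first_row_id plus the row's position.
class RowIdReader {
  private final Long firstRowId; // null when the file has no assigned first_row_id

  RowIdReader(Long firstRowId) {
    this.firstRowId = firstRowId;
  }

  // materializedRowId is the value physically stored in the file, if any;
  // position is the row's ordinal position within the file.
  Long read(Long materializedRowId, long position) {
    if (materializedRowId != null) {
      return materializedRowId; // value written in the file wins
    } else if (firstRowId != null) {
      return firstRowId + position; // inherit: first_row_id + position
    } else {
      return null; // lineage not available for this file
    }
  }
}
```

This is why a reader that is not given the file's first_row_id (or the materialized column) can only produce nulls: both branches that yield a value depend on information supplied from outside the Avro records themselves.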
Sorry for the disturbance, I found the correct API; we can use it to adjust all the needed parameters.
This change adds support for row lineage inheritance in the Avro reader.