Core, Spark: Preserve the relative path in RewriteTablePathUtil on staging #13645
cc @pvary
I think we can update the pull request title to be something better, like "[Core, Spark] Preserve the relative path in RewriteTablePathUtil on staging"
core/src/main/java/org/apache/iceberg/RewriteTablePathUtil.java
core/src/test/java/org/apache/iceberg/RewriteTablePathUtilTest.java
core/src/test/java/org/apache/iceberg/RewriteTablePathUtilTest.java
Looks reasonable to me. I don't think this will impact existing users, since all staging file paths are written to file_list_location, and the distributed copy relies on the content of that file listing when replicating to the destination.
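For readers following along, here is a minimal sketch of the idea in the PR title: keeping a file's path relative to the table location when computing its staging location, instead of keeping only the file name. It is not the code in RewriteTablePathUtil. The class name, method names, and example paths are all hypothetical, and the flattening variant is included only as a contrast, on the assumption that name collisions between nested directories are the kind of problem the change guards against.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the Iceberg RewriteTablePathUtil implementation.
final class StagingPathSketch {
  private static final String SEPARATOR = "/";

  // Flattening: keep only the file name, so files that share a name collide.
  static String flatStagingPath(String originalPath, String stagingDir) {
    String fileName = originalPath.substring(originalPath.lastIndexOf(SEPARATOR) + 1);
    return withTrailingSeparator(stagingDir) + fileName;
  }

  // Preserving: keep the path relative to the source prefix under the staging dir.
  static String relativeStagingPath(
      String originalPath, String sourcePrefix, String stagingDir) {
    String prefix = withTrailingSeparator(sourcePrefix);
    if (!originalPath.startsWith(prefix)) {
      throw new IllegalArgumentException(originalPath + " is not under " + sourcePrefix);
    }
    return withTrailingSeparator(stagingDir) + originalPath.substring(prefix.length());
  }

  private static String withTrailingSeparator(String path) {
    return path.endsWith(SEPARATOR) ? path : path + SEPARATOR;
  }

  public static void main(String[] args) {
    // Made-up locations, used only for illustration.
    String source = "s3://bucket/table";
    String staging = "s3://bucket/staging";
    List<String> files =
        List.of(source + "/metadata/a/file.avro", source + "/metadata/b/file.avro");

    Set<String> flat = new LinkedHashSet<>();
    Set<String> relative = new LinkedHashSet<>();
    for (String file : files) {
      flat.add(flatStagingPath(file, staging));
      relative.add(relativeStagingPath(file, source, staging));
    }
    System.out.println("flattened: " + flat);
    System.out.println("relative:  " + relative);
  }
}
```

In this sketch the two input files share a name but live in different directories, so the flattened set ends up with one entry while the relative set keeps two.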
@szehon-ho can you help approve the CI workflows?
@szehon-ho could you re-run the CI plz?
@ebyhr: Does the |
I think on this topic, would it be a good idea to slightly modify the documentation for this procedure?
I don't know what the fix for the CI is; locally it is fine.
@Elbehery Looks like CI is failing on spotless. I would double-check whether any files changed as a result of running the linter (the spotlessApply command). If files changed on disk but are not yet reflected in the patch, add them and push a new commit; that should let the next CI run pass the build checks.
@dramaticlly I ran it, no diff at all :'(
@dramaticlly any other suggestions?
@Elbehery You can reproduce CI failures by the below command:
|
yes this helps 👍🏽
I think I got it now, it passes locally. Would you kindly run the CI again?
sorry for the delay, approved workflow. Code changes look ok to me, thanks! some minor items
assertThat(addedDataFiles).hasSize(writeParallelism);
// verify there is no overlap in min-max stats range
- if (writeParallelism > 1) {
+ if (writeParallelism > 2) {
is this unrelated?
the test is perma failing, and this is the fix
please correct me if i am wrong
Generally speaking, if there is an unrelated but flaky test, a maintainer can help rerun it to get CI passing, or a separate PR can fix the problem. From what I can tell, this test is configured with writeParallelism set to either 1 or 2, so changing the if condition to > 2 means the check is skipped entirely.
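To make the point about the guard concrete, here is a small standalone sketch. It is a hypothetical stand-in and not the Flink test itself. It assumes only what the comment above states, namely that the test runs with writeParallelism of 1 or 2, and it lists which of those values would still reach the overlap check under each threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the parameterized test discussed above.
final class GuardSketch {
  // Returns the parallelism values for which a guard of the form
  // "writeParallelism > threshold" would let the overlap check run.
  static List<Integer> valuesThatRunCheck(int[] parallelismValues, int threshold) {
    List<Integer> result = new ArrayList<>();
    for (int writeParallelism : parallelismValues) {
      if (writeParallelism > threshold) {
        result.add(writeParallelism);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    int[] configured = {1, 2}; // the values named in the review comment
    System.out.println("> 1 runs the check for: " + valuesThatRunCheck(configured, 1));
    System.out.println("> 2 runs the check for: " + valuesThatRunCheck(configured, 2));
  }
}
```

With a threshold of 2 no configured value reaches the check, so the assertion would pass without verifying anything.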
If this test is permanently failing then this is new.
I have seen flaky tests, and @stevenzwu is planning to take care of some of them. The current failures seem unrelated.
I tried the testRangeDistributionStatisticsMigration test in both TestFlinkIcebergSinkDistributionMode and TestFlinkIcebergSinkV2DistributionMode with Flink 2.0 in IntelliJ. In both cases, it fails consistently.
Yes, this test is somewhat flaky, as reported in issue #11835, but it is definitely something new that it fails consistently now.
merged. @Elbehery please rebase your PR
done 👍🏽
@Elbehery I think you want to remove this change flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2DistributionMode.java from your PR
fixed 👍🏽
sorry i forgot to revert this :)
import org.junit.jupiter.api.Test;

public class TestRewriteTablePathUtil {
can you also add some tests in TestRewriteTablePathsAction, to test this nested directory case end-to-end?
@Elbehery can you ensure this comment is addressed as well.
updated 👍🏽
f6ec05c to e872c2e
could you run the CI again? cc @dramaticlly
@szehon-ho hello, could u plz re-run the CI?
Signed-off-by: Mustafa Elbehery <[email protected]>
@szehon-ho hello, could u plz re-run the CI?
BTW, the failure in the Spark integration CI job is due to the JDK version.
🥳 🥳 🥳 🥳 🥳 🥳
@dramaticlly the CI is green, any more reviews?
cc @nastra
Merged, thanks @Elbehery and all for the extra reviews!
fixes #13630