Conversation

Elbehery
Contributor

fixes #13630

@Elbehery
Contributor Author

cc @pvary

Contributor

@dramaticlly dramaticlly left a comment

I think we can give the pull request a more descriptive title, such as "[Core, Spark] Preserve the relative path in RewriteTablePathUtil on staging".

@Elbehery Elbehery changed the title Iceberg 13630 Iceberg-13630[Core]: Preserve the relative path in RewriteTablePathUtil on staging Jul 23, 2025
Contributor

@dramaticlly dramaticlly left a comment

Looks reasonable to me. I don't think this will impact existing users, since all staging file paths are written to file_list_location, and the distributed copy relies on the contents of that file listing when replicating to the destination.
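
For readers following along, the idea in the proposed title can be illustrated with a small sketch. This is not the RewriteTablePathUtil code: the class and method names (StagingPathSketch, flatStagingPath, relativeStagingPath) and the example paths are hypothetical, and the sketch assumes the collision comes from two files that share a name but live in different directories under the table root.

```java
final class StagingPathSketch {
  // Flat staging (hypothetical): keeps only the file name, so files with the
  // same name in different directories map to the same staging location.
  static String flatStagingPath(String originalPath, String stagingDir) {
    String fileName = originalPath.substring(originalPath.lastIndexOf('/') + 1);
    return withTrailingSlash(stagingDir) + fileName;
  }

  // Relative staging (hypothetical): keeps the path below the source prefix,
  // so nested files stay distinct in the staging directory.
  static String relativeStagingPath(
      String originalPath, String sourcePrefix, String stagingDir) {
    String prefix = withTrailingSlash(sourcePrefix);
    if (!originalPath.startsWith(prefix)) {
      // Assumption: paths outside the source prefix are rejected in this sketch.
      throw new IllegalArgumentException(
          "Path " + originalPath + " is not under " + sourcePrefix);
    }
    return withTrailingSlash(stagingDir) + originalPath.substring(prefix.length());
  }

  static String withTrailingSlash(String dir) {
    return dir.endsWith("/") ? dir : dir + "/";
  }

  public static void main(String[] args) {
    String first = "s3://bucket/table/metadata/a/file.parquet";
    String second = "s3://bucket/table/metadata/b/file.parquet";
    String prefix = "s3://bucket/table";
    String staging = "s3://bucket/staging";

    // Both inputs collapse to the same flat staging path.
    System.out.println(flatStagingPath(first, staging));
    System.out.println(flatStagingPath(second, staging));

    // The relative variant keeps them apart.
    System.out.println(relativeStagingPath(first, prefix, staging));
    System.out.println(relativeStagingPath(second, prefix, staging));
  }
}
```

In this sketch the two flat paths are identical, which is the kind of clash that could surface as a FileAlreadyExistsException, while the relative paths differ.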

@dramaticlly
Contributor

@szehon-ho can you help approve the CI workflows?

@Elbehery
Contributor Author

@szehon-ho could you please re-run the CI?

@pvary
Contributor

pvary commented Jul 24, 2025

@ebyhr: Does the RewriteTablePathUtil handle files out of the table root? Is this solution working there as well?

@hpinca98

While we are on this topic, would it be a good idea to slightly modify the documentation for this procedure?
Currently it states "Stages a copy of the Iceberg table's metadata files", but the procedure also stages deletion files. I don't think it stages anything other than these. What do you think?

@Elbehery
Contributor Author

I don't know what the fix for the CI is.

Locally it passes:

export JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home && ./gradlew spotlessCheck

BUILD SUCCESSFUL

@Elbehery
Contributor Author

Elbehery commented Jul 24, 2025

./gradlew spotlessApply
Starting a Gradle Daemon, 2 incompatible Daemons could not be reused, use --status for details
Configuration on demand is an incubating feature.

[Incubating] Problems report is available at: file:///Users/melbeher/go/src/apache/iceberg/build/reports/problems/problems-report.html

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.14.3/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 4s
71 actionable tasks: 71 up-to-date

@dramaticlly
Contributor

@Elbehery
Usually I run ./gradlew -DallVersions revapi spotlessApply checkStyleMain checkStyleTest before committing.

It looks like CI is failing on spotless, so I would double-check whether any files changed as a result of running the linter (the spotlessApply command). If files changed on disk but the changes are not yet in the patch, add them and push a new commit; the next CI run should then pass the build checks.

@Elbehery
Contributor Author

@dramaticlly I ran it, and there is no diff at all :'(

./gradlew -DallVersions revapi spotlessApply checkStyleMain checkStyleTest
Starting a Gradle Daemon (subsequent builds will be faster)
Configuration on demand is an incubating feature.

> Task :iceberg-api:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-api:testJar
file '/Users/melbeher/go/src/apache/iceberg/build/iceberg-build.properties' will be copied to 'iceberg-build.properties', overwriting file '/Users/melbeher/go/src/apache/iceberg/api/build/resources/test/iceberg-build.properties', which has already been copied there.

> Task :iceberg-core:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-azure:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

> Task :iceberg-aliyun:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

> Task :iceberg-gcp:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

> Task :iceberg-nessie:compileJava
Note: /Users/melbeher/go/src/apache/iceberg/nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

> Task :iceberg-aws:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-parquet:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-core:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-data:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

> Task :iceberg-aws:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

> Task :iceberg-hive-metastore:compileTestJava
Note: /Users/melbeher/go/src/apache/iceberg/hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: /Users/melbeher/go/src/apache/iceberg/hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-mr:compileJava
Note: /Users/melbeher/go/src/apache/iceberg/mr/src/main/java/org/apache/iceberg/mr/mapred/MapredIcebergInputFormat.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-data:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-parquet:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

> Task :iceberg-kafka-connect:iceberg-kafka-connect:compileJava
Note: /Users/melbeher/go/src/apache/iceberg/kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/channel/CommitterImpl.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

> Task :iceberg-flink:iceberg-flink-2.0:compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-flink:iceberg-flink-2.0:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-spark:iceberg-spark-4.0_2.13:compileScala
[Warn] : javac: [options] system modules path not set in conjunction with -source 11

> Task :iceberg-spark:iceberg-spark-4.0_2.13:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :iceberg-spark:iceberg-spark-extensions-4.0_2.13:compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

[Incubating] Problems report is available at: file:///Users/melbeher/go/src/apache/iceberg/build/reports/problems/problems-report.html

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.14.3/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 1m 4s
213 actionable tasks: 122 executed, 5 from cache, 86 up-to-date
melbeher@melbeher-mac iceberg % git status
On branch iceberg-13630
Your branch and 'origin/iceberg-13630' have diverged,
and have 7 and 1 different commits each, respectively.
  (use "git pull" if you want to integrate the remote branch with yours)

nothing to commit, working tree clean

@Elbehery
Contributor Author

@dramaticlly any other suggestions?

@ebyhr
Contributor

ebyhr commented Jul 25, 2025

@Elbehery You can reproduce the CI failures with the command below:

./gradlew -DsparkVersions=3.5 :iceberg-spark:iceberg-spark-3.5_2.12:spotlessJavaCheck 

@Elbehery
Contributor Author

yes this helps 👍🏽

@Elbehery
Contributor Author

I think I got it now; it passes locally.

Would you kindly run the CI again?

Member

@szehon-ho szehon-ho left a comment

Sorry for the delay, I approved the workflow. The code changes look OK to me, thanks! Some minor items:

  assertThat(addedDataFiles).hasSize(writeParallelism);
  // verify there is no overlap in min-max stats range
- if (writeParallelism > 1) {
+ if (writeParallelism > 2) {
Member

is this unrelated?

Contributor Author

The test is permanently failing, and this is the fix.

Please correct me if I am wrong.

Contributor

@dramaticlly dramaticlly Jul 29, 2025

Generally speaking, if there is an unrelated but flaky test, a maintainer can rerun it to get CI passing, or the problem can be fixed in a separate PR. From what I can tell, this test is configured with writeParallelism set to either 1 or 2, so changing the if condition to > 2 effectively skips this verification.
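
A minimal sketch of that point, assuming the test is parameterized only with the values 1 and 2 as described in this comment. The class and method names (GuardSketch, checkedParallelisms) are hypothetical and are not taken from the Flink test.

```java
import java.util.ArrayList;
import java.util.List;

final class GuardSketch {
  // Returns the configured parallelism values for which a guard of the form
  // "if (writeParallelism > threshold)" would let the verification run.
  static List<Integer> checkedParallelisms(int[] configured, int threshold) {
    List<Integer> checked = new ArrayList<>();
    for (int writeParallelism : configured) {
      if (writeParallelism > threshold) {
        checked.add(writeParallelism);
      }
    }
    return checked;
  }

  public static void main(String[] args) {
    int[] configured = {1, 2}; // assumption: the only values the test uses

    // With "> 1" the verification runs for parallelism 2.
    System.out.println("threshold 1: " + checkedParallelisms(configured, 1));

    // With "> 2" no configured value passes the guard.
    System.out.println("threshold 2: " + checkedParallelisms(configured, 2));
  }
}
```

Under that assumption, raising the threshold to 2 leaves the list empty, so the overlap verification never executes.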

Contributor

If this test is permanently failing, then this is new.
I have seen flaky tests, and @stevenzwu is planning to take care of some of them. The current failures seem unrelated.

Contributor

@stevenzwu stevenzwu Jul 30, 2025

I tried the testRangeDistributionStatisticsMigration test in both TestFlinkIcebergSinkDistributionMode and TestFlinkIcebergSinkV2DistributionMode with Flink 2.0 in IntelliJ. In both cases, it fails consistently.

Yes, this test is somewhat flaky, as reported in issue #11835, but failing consistently is definitely something new.

Contributor

merged. @Elbehery please rebase your PR

Contributor Author

done 👍🏽

Contributor

@Elbehery I think you want to remove the change to flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkV2DistributionMode.java from your PR.

Contributor Author

fixed 👍🏽

sorry i forgot to revert this :)


import org.junit.jupiter.api.Test;

public class TestRewriteTablePathUtil {
Member

@szehon-ho szehon-ho Jul 28, 2025

Can you also add some tests in TestRewriteTablePathsAction to cover this nested directory case end to end?

Contributor

@Elbehery can you ensure this comment is addressed as well?

Contributor Author

@Elbehery Elbehery Aug 1, 2025

updated 👍🏽

@nastra nastra changed the title Iceberg-13630[Core]: Preserve the relative path in RewriteTablePathUtil on staging Core, Spark: Preserve the relative path in RewriteTablePathUtil on staging Jul 29, 2025
@Elbehery Elbehery force-pushed the iceberg-13630 branch 3 times, most recently from f6ec05c to e872c2e Compare July 31, 2025 08:04
@Elbehery
Contributor Author

could you run the CI again

cc @dramaticlly

@Elbehery
Contributor Author

Elbehery commented Aug 2, 2025

@szehon-ho hello

Could you please re-run the CI?

@Elbehery
Contributor Author

Elbehery commented Aug 3, 2025

@szehon-ho hello

Could you please re-run the CI?

@Elbehery
Contributor Author

Elbehery commented Aug 3, 2025

By the way, the failure in the Spark integration CI job is due to the JDK version:

JavaVersion javaVersion = JavaVersion.current()
if (javaVersion != JavaVersion.VERSION_17 && javaVersion != JavaVersion.VERSION_21) {
  throw new GradleException("Spark 4.0 build requires JDK 17 or 21 but was executed with JDK " + javaVersion)
}
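
To see which JDK a local run actually uses, the Gradle check quoted above can be mirrored in plain Java. This is only a sketch: the class and method names (JdkCheckSketch, isSupportedForSpark4) are hypothetical, and the accepted versions 17 and 21 are copied from the snippet above.

```java
final class JdkCheckSketch {
  // Mirrors the condition in the Gradle snippet: only JDK 17 and 21 pass.
  static boolean isSupportedForSpark4(int featureVersion) {
    return featureVersion == 17 || featureVersion == 21;
  }

  public static void main(String[] args) {
    // Runtime.version().feature() returns the major version of the running JDK
    // (available since Java 10).
    int current = Runtime.version().feature();
    if (isSupportedForSpark4(current)) {
      System.out.println("JDK " + current + " is accepted by the Spark 4.0 check");
    } else {
      System.out.println("JDK " + current + " would be rejected by the Spark 4.0 check");
    }
  }
}
```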

@Elbehery
Contributor Author

Elbehery commented Aug 4, 2025

🥳 🥳 🥳 🥳 🥳 🥳

@Elbehery
Contributor Author

Elbehery commented Aug 4, 2025

@dramaticlly the CI is green

any more reviews ?

@Elbehery
Contributor Author

Elbehery commented Aug 5, 2025

cc @nastra

@nastra nastra requested a review from szehon-ho August 5, 2025 14:19
@szehon-ho szehon-ho merged commit 7ffc718 into apache:main Aug 5, 2025
42 checks passed
@szehon-ho
Member

Merged, thanks @Elbehery and all for extra reviews !

@Elbehery Elbehery deleted the iceberg-13630 branch August 5, 2025 21:26
Development

Successfully merging this pull request may close these issues.

RewriteTablePaths throws FileAlreadyExistsException
7 participants