Spark 4.0: Fix source location in stats file copy plan in RewriteTablePathSparkAction #13881
@dramaticlly @szehon-ho - Could you take a look please?
// Verify the source path points to the actual source location, not staging
assertThat(statsFilePathPair._1())
    .startsWith(sourceTableLocation)
    .as("Statistics file source should point to source table location");
Just FYI: the .as() call always needs to come before the final assertion, as otherwise it's ignored.
Thanks for catching this. I updated the tests.
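To illustrate why the ordering matters, here is a minimal sketch of a fluent assertion that evaluates each check eagerly. `DescribedStringAssert` and `DescriptionOrderDemo` are hypothetical stand-ins written for this illustration; they are not AssertJ's implementation, only a model of the behavior described in the comment above: a description set after a failing check is never reached.

```java
// Hypothetical stand-in for a fluent assertion API; this is NOT AssertJ's code.
final class DescribedStringAssert {
  private final String actual;
  private String description = "";

  DescribedStringAssert(String actual) {
    this.actual = actual;
  }

  // Stores the description for any check evaluated after this call.
  DescribedStringAssert as(String text) {
    this.description = text;
    return this;
  }

  // Evaluated eagerly: a failure is thrown here, before later chained calls run.
  DescribedStringAssert startsWith(String prefix) {
    if (!actual.startsWith(prefix)) {
      throw new AssertionError(
          "[" + description + "] expected <" + actual + "> to start with <" + prefix + ">");
    }
    return this;
  }
}

final class DescriptionOrderDemo {
  // Runs a chain and returns the failure message it produced, if any.
  static String failureMessage(Runnable chain) {
    try {
      chain.run();
      return "no failure";
    } catch (AssertionError e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    String path = "/tmp/staging/metadata/file.stats"; // made-up example path

    // Description set too late: the check fails before as() is ever called.
    System.out.println(failureMessage(
        () -> new DescribedStringAssert(path).startsWith("/warehouse/source").as("stats source")));

    // Description set first: the failure message carries it.
    System.out.println(failureMessage(
        () -> new DescribedStringAssert(path).as("stats source").startsWith("/warehouse/source")));
  }
}
```

In this sketch the first chain fails with an empty description and the second fails with the description attached, which is the difference the review comment points out.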
Thanks @anuragmantri, LGTM with some test suggestions.
Pair.of(
    RewriteTablePathUtil.stagingPath(before.path(), sourcePrefix, stagingDir),
    after.path()));
result.add(Pair.of(before.path(), after.path()));
Good catch. I think we don't need to open and rewrite the content of the stats file, so it only needs to be copied from source to target, instead of from staging to target.
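A rough sketch of that idea, under stated assumptions: because a stats file is copied byte for byte, each copy-plan entry can pair the original source path with its target path, with no staging path involved. The class `StatsCopyPlanSketch`, the helper `rewritePrefix`, and the example paths are all hypothetical. In the actual change the target comes from `after.path()`; the prefix swap below is only a stand-in for that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the RewriteTablePathSparkAction implementation.
final class StatsCopyPlanSketch {

  // Assumed helper: swaps the source table prefix for the target prefix.
  static String rewritePrefix(String path, String sourcePrefix, String targetPrefix) {
    if (!path.startsWith(sourcePrefix)) {
      throw new IllegalArgumentException("Path " + path + " is not under " + sourcePrefix);
    }
    return targetPrefix + path.substring(sourcePrefix.length());
  }

  // Builds (source -> target) entries. The key is the original file location,
  // not a staging location, because the file content is copied unchanged.
  static List<Map.Entry<String, String>> statsCopyPlan(
      List<String> statsFilePaths, String sourcePrefix, String targetPrefix) {
    List<Map.Entry<String, String>> plan = new ArrayList<>();
    for (String path : statsFilePaths) {
      plan.add(Map.entry(path, rewritePrefix(path, sourcePrefix, targetPrefix)));
    }
    return plan;
  }

  public static void main(String[] args) {
    // Made-up example locations.
    List<Map.Entry<String, String>> plan =
        statsCopyPlan(
            List.of("s3://src/tbl/metadata/snap-1.stats"), "s3://src/tbl", "s3://dst/tbl");
    for (Map.Entry<String, String> task : plan) {
      System.out.println(task.getKey() + " -> " + task.getValue());
    }
  }
}
```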
assertThat(statsFilePathPair._1())
    .as("Statistics file source should point to source table location")
    .startsWith(sourceTableLocation);
assertThat(statsFilePathPair._1())
    .as("Statistics file source should NOT point to staging directory")
    .doesNotContain("staging");
nit: those can be combined
assertThat(statsFilePathPair._1())
    .as("Statistics file source should point to source table location, not staging")
    .startsWith(sourceTableLocation)
    .doesNotContain("staging");
Done
Tuple2<String, String> statsFilePathPair = null;
for (Tuple2<String, String> pathPair : filesToMove) {
  if (pathPair._1().endsWith(".stats")) {
    statsFilePathPair = pathPair;
    break;
  }
}
nit: can also be replaced with stream
Tuple2<String, String> statsFilePathPair =
    filesToMove.stream()
        .filter(pathPair -> pathPair._1().endsWith(".stats"))
        .findFirst()
        .orElse(null);
Done
Thanks @anuragmantri, do you plan to backport the change to other Spark versions once merged?
Yes, I will create a PR for Spark 3.4 and 3.5 after this is merged.
Merged. Thanks @anuragmantri for the PR! Thanks @dramaticlly @nastra for the review!
The statsFileCopyPlan() incorrectly uses the staging directory instead of the source directory for stats files.