
anuragmantri
Contributor

@anuragmantri anuragmantri commented Aug 21, 2025

The statsFileCopyPlan() incorrectly uses the staging directory instead of the source directory as the copy source for statistics files.

@github-actions github-actions bot added the spark label Aug 21, 2025
@anuragmantri
Contributor Author

@dramaticlly @szehon-ho - Could you take a look please?

// Verify the source path points to the actual source location, not staging
assertThat(statsFilePathPair._1())
    .startsWith(sourceTableLocation)
    .as("Statistics file source should point to source table location");
Contributor


Just FYI: the .as() call must always come before the final assertion, otherwise it's ignored.
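To illustrate the ordering rule, here is a minimal sketch built on a hypothetical MiniAssert class (not AssertJ itself; all names are made up for the example). Chained calls run left to right, so when startsWith fails it throws before a trailing .as() ever runs, and when it passes there is no later assertion left for the description to apply to.

```java
// Hypothetical mini fluent assertion (not AssertJ) showing why the description must be
// set before the check: a failing check throws immediately, so a later .as() never runs.
class MiniAssert {
  private final String actual;
  private String description = "";

  private MiniAssert(String actual) {
    this.actual = actual;
  }

  static MiniAssert assertThat(String actual) {
    return new MiniAssert(actual);
  }

  // Records the description used by any checks that come AFTER it in the chain.
  MiniAssert as(String description) {
    this.description = description;
    return this;
  }

  // Fails with whatever description has been recorded so far.
  MiniAssert startsWith(String prefix) {
    if (!actual.startsWith(prefix)) {
      throw new AssertionError(
          "[" + description + "] expected <" + actual + "> to start with <" + prefix + ">");
    }
    return this;
  }
}
```

With .as() last, the failure message carries an empty description; with .as() first, it carries the intended text.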

Contributor Author

@anuragmantri anuragmantri Aug 21, 2025


Thanks for catching this. I updated the tests.

Contributor

@dramaticlly dramaticlly left a comment


Thanks @anuragmantri, LGTM with some test suggestions.

-    Pair.of(
-        RewriteTablePathUtil.stagingPath(before.path(), sourcePrefix, stagingDir),
-        after.path()));
+ result.add(Pair.of(before.path(), after.path()));
Contributor


Good catch. I think we don't need to open and rewrite the content of the stats file, so it only needs to be copied from source to target, instead of from staging to target.
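A minimal sketch of the corrected behaviour, assuming a simplified model of the copy plan: statsFileCopyPlan and replacePrefix below are hypothetical stand-ins written for illustration, not Iceberg's RewriteTablePathUtil API. Because the stats file content is not rewritten, nothing is written to the staging directory for it, so the copy source is the original path and only the target is rebased onto the new table location.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Iceberg's actual code): builds (copySource, copyTarget)
// pairs for statistics files that are copied byte-for-byte without rewriting.
class StatsCopyPlanSketch {

  // Rebases a path from sourcePrefix onto targetPrefix; rejects paths outside sourcePrefix.
  static String replacePrefix(String path, String sourcePrefix, String targetPrefix) {
    if (!path.startsWith(sourcePrefix)) {
      throw new IllegalArgumentException("Path " + path + " does not start with " + sourcePrefix);
    }
    return targetPrefix + path.substring(sourcePrefix.length());
  }

  static List<Map.Entry<String, String>> statsFileCopyPlan(
      List<String> statsFilePaths, String sourcePrefix, String targetPrefix) {
    List<Map.Entry<String, String>> result = new ArrayList<>();
    for (String path : statsFilePaths) {
      // Stats files are not rewritten, so no staging copy exists:
      // copy straight from the original source path to the rebased target path.
      result.add(Map.entry(path, replacePrefix(path, sourcePrefix, targetPrefix)));
    }
    return result;
  }
}
```

In this model, only files whose content is rewritten (e.g. metadata that embeds absolute paths) would be paired with a staging path; files copied verbatim skip staging entirely.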

Comment on lines 1006 to 1011
assertThat(statsFilePathPair._1())
    .as("Statistics file source should point to source table location")
    .startsWith(sourceTableLocation);
assertThat(statsFilePathPair._1())
    .as("Statistics file source should NOT point to staging directory")
    .doesNotContain("staging");
Contributor


nit: those can be combined

   assertThat(statsFilePathPair._1())
       .as("Statistics file source should point to source table location, not staging")
       .startsWith(sourceTableLocation)
       .doesNotContain("staging");

Contributor Author


Done

Comment on lines 995 to 1001
Tuple2<String, String> statsFilePathPair = null;
for (Tuple2<String, String> pathPair : filesToMove) {
  if (pathPair._1().endsWith(".stats")) {
    statsFilePathPair = pathPair;
    break;
  }
}
Contributor


nit: this can also be replaced with a stream

Tuple2<String, String> statsFilePathPair =
    filesToMove.stream()
        .filter(pathPair -> pathPair._1().endsWith(".stats"))
        .findFirst()
        .orElse(null);
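For reference, a self-contained sketch checking that the loop and the suggested stream give the same result. To keep it runnable with only the JDK, Map.Entry stands in for scala.Tuple2 (getKey() plays the role of _1()); the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Sketch: Map.Entry stands in for scala.Tuple2 so the example needs only the JDK.
class FindStatsPair {

  // Loop form, as in the original test: first pair whose source path ends in ".stats".
  static Map.Entry<String, String> withLoop(List<Map.Entry<String, String>> filesToMove) {
    Map.Entry<String, String> statsFilePathPair = null;
    for (Map.Entry<String, String> pathPair : filesToMove) {
      if (pathPair.getKey().endsWith(".stats")) {
        statsFilePathPair = pathPair;
        break;
      }
    }
    return statsFilePathPair;
  }

  // Stream form suggested in review: same first match, or null when none exists.
  static Map.Entry<String, String> withStream(List<Map.Entry<String, String>> filesToMove) {
    return filesToMove.stream()
        .filter(pathPair -> pathPair.getKey().endsWith(".stats"))
        .findFirst()
        .orElse(null);
  }
}
```

findFirst() respects the list's encounter order, so both versions return the first .stats pair, and orElse(null) preserves the loop's null-when-absent behaviour.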

Contributor Author


Done

@dramaticlly
Contributor

Thanks @anuragmantri, do you plan to backport the change to other Spark versions once merged?

@anuragmantri
Contributor Author

do you plan to backport the change to other Spark versions once merged?

Yes, I will create a PR for Spark 3.4 and 3.5 after this is merged.

@huaxingao huaxingao merged commit b82dac4 into apache:main Aug 22, 2025
27 checks passed
@huaxingao
Contributor

Merged. Thanks @anuragmantri for the PR! Thanks @dramaticlly @nastra for the review!
