Enforce delta.checkpoint.writeStatsAsJson and delta.checkpoint.writeStatsAsStruct option in Delta Lake #13331

Merged: 2 commits, Sep 6, 2022

@ebyhr (Member) commented Jul 25, 2022

Description

Enforce the `delta.checkpoint.writeStatsAsJson` and `delta.checkpoint.writeStatsAsStruct` table properties in Delta Lake
Fixes #12031

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# Delta Lake
* Enforce `delta.checkpoint.writeStatsAsJson` and `delta.checkpoint.writeStatsAsStruct` table properties. ({issue}`12031`)

@cla-bot bot added the cla-signed label Jul 25, 2022
@ebyhr force-pushed the ebi/delta-checkpoint-writer branch 3 times, most recently from d331f3e to 3eb81a3 (July 26, 2022 01:21)
@ebyhr self-assigned this Jul 26, 2022
@ebyhr force-pushed the ebi/delta-checkpoint-writer branch from 3eb81a3 to 28bc1d0 (July 26, 2022 03:24)
@ebyhr force-pushed the ebi/delta-checkpoint-writer branch from 28bc1d0 to fa3662c (July 26, 2022 12:32)
@ebyhr force-pushed the ebi/delta-checkpoint-writer branch 3 times, most recently from 69e7bf7 to ed7f6fe (July 27, 2022 01:48)

@ebyhr (Member Author) commented Jul 27, 2022

Fixing CI failures.

@ebyhr force-pushed the ebi/delta-checkpoint-writer branch from ed7f6fe to 79af9a1 (July 27, 2022 05:18)
@ebyhr force-pushed the ebi/delta-checkpoint-writer branch from 682e9f2 to fccc7ae (August 2, 2022 08:40)

@ebyhr (Member Author) commented Aug 2, 2022

Still work in progress, but let me push to check CI results.

@ebyhr force-pushed the ebi/delta-checkpoint-writer branch 6 times, most recently from ed10956 to 307e863 (August 9, 2022 12:31)

@github-actions bot left a comment

Found commits that should not be merged: 3 commit(s) that need to be squashed.

@ebyhr force-pushed the ebi/delta-checkpoint-writer branch from 9d0e9f4 to 9e91970 (August 22, 2022 11:38)

@alexjo2144 (Member) commented

Generally looks good to me, but take a look at the test failures and merge conflicts.

@ebyhr force-pushed the ebi/delta-checkpoint-writer branch 2 times, most recently from e62fa9c to 82e50e6 (August 23, 2022 23:10)

@findepi (Member) commented Aug 24, 2022

There are CI failures in Delta tests.

@ebyhr (Member Author) commented Aug 25, 2022

Let me request review after fixing the failures.

@ebyhr force-pushed the ebi/delta-checkpoint-writer branch from db03308 to f7b48a9 (August 26, 2022 08:06)

        return (long) floatToRawIntBits((float) (double) jsonValue);
    }
    if (type == DOUBLE) {
        return jsonValue;

Member

(double) -- verify it's a double.
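
One way to read this comment: the `DOUBLE` branch returns the raw JSON value without checking its runtime type. A plain `(double)` cast would already fail with a `ClassCastException` on a non-`Double`. The sketch below shows an explicit check with a clearer message instead. The class and method names are hypothetical and are not taken from the PR.

```java
final class DoubleStatCheck
{
    private DoubleStatCheck() {}

    // Hypothetical helper: fails fast with a readable message
    // when the parsed JSON value is not a java.lang.Double.
    static double requireDouble(Object jsonValue)
    {
        if (!(jsonValue instanceof Double)) {
            String actual = jsonValue == null ? "null" : jsonValue.getClass().getName();
            throw new IllegalArgumentException("Expected a Double but got: " + actual);
        }
        // Safe after the check; unboxes to a primitive double.
        return (Double) jsonValue;
    }
}
```

Calling `DoubleStatCheck.requireDouble("1.5")` throws, because a JSON string is not accepted as a double statistic in this sketch.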

    }
    if (type instanceof DecimalType) {
        BigDecimal decimal;
        checkArgument(jsonValue instanceof String || jsonValue instanceof Double, "Value must be instance of String or Double");

Member

Why do we need to accept both forms?
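
For context, the two forms allowed by the `checkArgument` call could each be turned into a `BigDecimal` roughly as follows. This is an illustrative sketch only: the class and method names are hypothetical, the PR's elided conversion code may differ, and the sketch does not explain why both forms occur in practice.

```java
import java.math.BigDecimal;

final class DecimalStatParser
{
    private DecimalStatParser() {}

    // Hypothetical helper: accepts a decimal statistic given either as a JSON string or as a JSON number.
    static BigDecimal toBigDecimal(Object jsonValue)
    {
        if (jsonValue instanceof String) {
            // Exact: the textual representation is parsed digit by digit.
            return new BigDecimal((String) jsonValue);
        }
        if (jsonValue instanceof Double) {
            // BigDecimal.valueOf goes through Double.toString, which avoids the long
            // binary expansion produced by new BigDecimal(double).
            return BigDecimal.valueOf((Double) jsonValue);
        }
        throw new IllegalArgumentException("Value must be instance of String or Double");
    }
}
```

A value that has passed through a `Double` may already have lost precision, which is one reason a reviewer might question accepting that form.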

        return Decimals.encodeScaledValue(decimal, ((DecimalType) type).getScale());
    }
    if (type instanceof VarcharType) {
        return jsonValue;

Member

jsonValue is likely String and we need a Slice

    }

    if (isShortDecimal(type)) {
        return Decimals.encodeShortScaledValue(decimal, ((DecimalType) type).getScale());

Member

Use enhanced instanceof instead of a cast here
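
The suggestion refers to pattern matching for `instanceof` (Java 16 and later), which binds the narrowed value to a variable and removes the separate cast. The sketch below contrasts the two styles. `Type`, `DecimalType`, and `VarcharType` here are minimal hypothetical stand-ins so the example compiles on its own. They are not Trino's classes.

```java
final class EnhancedInstanceofSketch
{
    private EnhancedInstanceofSketch() {}

    // Hypothetical stand-ins, just enough to demonstrate the language feature.
    interface Type {}

    static final class DecimalType
            implements Type
    {
        private final int scale;

        DecimalType(int scale)
        {
            this.scale = scale;
        }

        int getScale()
        {
            return scale;
        }
    }

    static final class VarcharType
            implements Type {}

    // Before: test with instanceof, then cast again to reach getScale().
    static int scaleWithCast(Type type)
    {
        if (type instanceof DecimalType) {
            return ((DecimalType) type).getScale();
        }
        return -1;
    }

    // After: the pattern variable carries the narrowed type, so no cast is needed.
    static int scaleWithPattern(Type type)
    {
        if (type instanceof DecimalType decimalType) {
            return decimalType.getScale();
        }
        return -1;
    }
}
```

Both methods behave identically; the second simply removes the duplicated type name and the cast.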

    BlockBuilder singleRowBlockWriter = blockBuilder.beginBlockEntry();
    for (int i = 0; i < values.size(); ++i) {
        Type fieldType = fieldTypes.get(i);
        Object fieldValue = jsonValueToTrinoValue(fieldType, values.get(rowType.getFields().get(i).getName().orElseThrow()));

Member

Add a message in orElseThrow
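
The no-argument `Optional.orElseThrow()` throws a `NoSuchElementException` with a generic message. The overload that takes an exception supplier lets the failure say which field was unnamed. A minimal sketch, with a hypothetical helper name and message wording:

```java
import java.util.Optional;

final class FieldNameSketch
{
    private FieldNameSketch() {}

    // Hypothetical helper: unwraps an optional row field name, reporting the field position on failure.
    static String requireFieldName(Optional<String> name, int position)
    {
        return name.orElseThrow(() ->
                new IllegalArgumentException("Row field at position " + position + " has no name"));
    }
}
```

With this shape, a stack trace from a malformed row type points at the offending field rather than at a bare "No value present".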

    BlockBuilder singleRowBlockWriter = blockBuilder.beginBlockEntry();
    for (int i = 0; i < values.size(); ++i) {
        Type fieldType = fieldTypes.get(i);
        Object fieldValue = jsonValueToTrinoValue(fieldType, values.get(rowType.getFields().get(i).getName().orElseThrow()));

Member

Should we validate that values contains no other entries than the ones we expected?
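
A validation like the one asked about could compare the keys of `values` against the expected field names before the loop runs. The sketch below is one possible shape under that assumption. The class name, method name, and error message are hypothetical, and the thread does not say whether the PR adopted such a check.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class StatsKeyValidation
{
    private StatsKeyValidation() {}

    // Hypothetical helper: rejects a stats map that carries keys outside the expected row fields.
    static void checkNoUnexpectedKeys(Map<String, ?> values, Collection<String> expectedFieldNames)
    {
        // Copy the keys so the caller's map is not modified; TreeSet keeps the message deterministic.
        Set<String> unexpected = new TreeSet<>(values.keySet());
        unexpected.removeAll(expectedFieldNames);
        if (!unexpected.isEmpty()) {
            throw new IllegalArgumentException("Unexpected fields in row value: " + unexpected);
        }
    }
}
```

Note that this only catches extra keys. Missing keys would still surface later as `null` lookups, so a stricter variant might compare the two sets for equality.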

@ebyhr force-pushed the ebi/delta-checkpoint-writer branch from a1dfa67 to 8dfcbdd (August 30, 2022 01:08)
@ebyhr force-pushed the ebi/delta-checkpoint-writer branch from 8dfcbdd to 4f6ee1d (September 1, 2022 08:34)

@alexjo2144 (Member) left a comment

Do we have follow up issues for row type stats, and maybe one for being able to set these properties from Trino?

@findepi (Member) left a comment

just skimming

    Map<String, Object> jsonValues = new HashMap<>();
    for (Map.Entry<String, Object> value : values.entrySet()) {
        Type type = columnTypeMapping.get(value.getKey());
        // TODO: Add support for row type

@ebyhr (Member Author) commented Sep 6, 2022

> Do we have follow up issues for row type stats, and maybe one for being able to set these properties from Trino?

Filed #13996 and #13997

@ebyhr merged commit 4852d25 into master Sep 6, 2022
@ebyhr deleted the ebi/delta-checkpoint-writer branch September 6, 2022 01:36
@ebyhr mentioned this pull request Sep 6, 2022
@github-actions bot added this to the 395 milestone Sep 6, 2022

Successfully merging this pull request may close these issues.

DeltaLake determine stats format in checkpoint based on the table configuration
5 participants