@stevenzwu (Contributor) commented Aug 21, 2025

In the field, we have seen some writers produce non-conforming metadata in manifest Avro files:

partition-spec: {"spec-id":0,"fields":[]}

The spec wording probably caused the misinterpretation that this field holds the JSON serialization of the whole partition spec. However, both the spec and the Java reference implementation mean only the partition fields array of the partition spec, because the spec ID is encoded as a separate metadata field in the Avro file:

.meta("partition-spec", PartitionSpecParser.toJsonFields(spec))
.meta("partition-spec-id", String.valueOf(spec.specId()))

This PR clarifies this metadata field for manifest Avro files.
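To make the difference between the two shapes concrete, here is a minimal sketch that classifies a `partition-spec` metadata value as either the conforming fields array or the non-conforming whole-spec object shown above. `PartitionSpecMetadataCheck` and `Shape` are hypothetical names, not iceberg-core APIs, and the sketch assumes the value is well-formed JSON, so inspecting the first non-whitespace character is enough; a real reader would parse the value with a JSON library.

```java
import java.util.Map;

/**
 * Hypothetical helper (not an Iceberg API): classifies the "partition-spec"
 * key-value metadata of a manifest Avro file. Assumes the value is well-formed
 * JSON, so the first non-whitespace character distinguishes array from object.
 */
final class PartitionSpecMetadataCheck {

  enum Shape { FIELDS_ARRAY, WHOLE_SPEC_OBJECT, MISSING, UNKNOWN }

  private PartitionSpecMetadataCheck() {}

  static Shape classify(Map<String, String> avroMetadata) {
    String value = avroMetadata.get("partition-spec");
    if (value == null) {
      return Shape.MISSING;
    }
    String trimmed = value.strip();
    if (trimmed.startsWith("[")) {
      // Conforming: only the partition fields array, e.g. [] for an unpartitioned table.
      return Shape.FIELDS_ARRAY;
    }
    if (trimmed.startsWith("{")) {
      // Non-conforming: the whole spec object, e.g. {"spec-id":0,"fields":[]}.
      return Shape.WHOLE_SPEC_OBJECT;
    }
    return Shape.UNKNOWN;
  }

  public static void main(String[] args) {
    System.out.println(classify(Map.of("partition-spec", "[]")));
    System.out.println(classify(Map.of("partition-spec", "{\"spec-id\":0,\"fields\":[]}")));
  }
}
```

Running `main` prints `FIELDS_ARRAY` followed by `WHOLE_SPEC_OBJECT`.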

@github-actions bot added the Specification label (Issues that may introduce spec changes.) Aug 21, 2025
format/spec.md
| v1 | v2 | Key | Value |
|------------|------------|---------------------|----------------------------------------------------------------------------------------------------|
| _required_ | _required_ | `schema` | JSON representation of the table schema at the time the manifest was written |
| _optional_ | _required_ | `schema-id` | ID of the schema used to write the manifest as a string |
| _required_ | _required_ | `partition-spec` | JSON representation of the partition fields array of the partition spec used to write the manifest |
Contributor

[doubt-1] How is the spec ID inferred when `partition-spec-id` is absent (since it is optional in v1)?

[discuss] Is it worth calling out that the PartitionSpec is first converted to an unbound partition spec, so what is put here is the unbound version of each partition field (where the transform is just a string) rather than a Transform object?
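To illustrate what an unbound partition field looks like on the wire, here is a minimal sketch that serializes fields whose transform is kept as a plain string (for example `day` or `bucket[16]`) into the fields-array JSON stored under `partition-spec`. The `UnboundFieldSketch` record is hypothetical; the JSON keys follow the spec's partition field JSON (`source-id`, `field-id`, `name`, `transform`), and the string escaping is deliberately minimal.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: a partition field whose transform is just a string. */
record UnboundFieldSketch(int sourceId, int fieldId, String name, String transform) {

  // Minimal JSON string escaping for backslashes and quotes only;
  // control characters are not handled in this sketch.
  private static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  String toJson() {
    return "{\"source-id\":" + sourceId
        + ",\"field-id\":" + fieldId
        + ",\"name\":" + quote(name)
        + ",\"transform\":" + quote(transform) + "}";
  }

  /** Serializes only the fields array: no wrapping object and no "spec-id". */
  static String fieldsArrayJson(List<UnboundFieldSketch> fields) {
    return fields.stream()
        .map(UnboundFieldSketch::toJson)
        .collect(Collectors.joining(",", "[", "]"));
  }
}
```

For example, `fieldsArrayJson(List.of(new UnboundFieldSketch(4, 1000, "ts_day", "day")))` returns `[{"source-id":4,"field-id":1000,"name":"ts_day","transform":"day"}]`, and an unpartitioned spec yields `[]`.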

Contributor Author

The manifest reader in iceberg-core only uses the Avro metadata as a fallback, because the manifest file entry (from the manifest list) already contains the partition_spec_id.

https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestReader.java#L126-L130
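The lookup order described above can be sketched as follows: prefer the `partition_spec_id` carried by the manifest list entry, then fall back to the `partition-spec-id` Avro metadata. `SpecIdResolver` is a hypothetical name, and the final default of 0 when neither source is present (possible for v1 manifests, where the metadata key is optional) is an assumption of this sketch, not something stated in this thread; the linked `ManifestReader` code is the authority on the actual behavior.

```java
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical sketch of resolving the partition spec ID for a manifest. */
final class SpecIdResolver {

  // Assumption: default used when no spec ID is recorded anywhere.
  static final int ASSUMED_DEFAULT_SPEC_ID = 0;

  private SpecIdResolver() {}

  static int resolve(OptionalInt fromManifestList, Map<String, String> avroMetadata) {
    // 1. The manifest list entry is the primary source.
    if (fromManifestList.isPresent()) {
      return fromManifestList.getAsInt();
    }
    // 2. Fall back to the key-value metadata in the manifest Avro file.
    String fromMetadata = avroMetadata.get("partition-spec-id");
    if (fromMetadata != null) {
      return Integer.parseInt(fromMetadata.strip());
    }
    // 3. Neither is present: use the assumed default.
    return ASSUMED_DEFAULT_SPEC_ID;
  }
}
```

Keeping the manifest list as the primary source means a missing or malformed `partition-spec-id` in the Avro file only matters when a manifest is read on its own.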

Good call-out on the unbound transform part. Maybe I can add a link to the spec:
https://iceberg.apache.org/spec/#partition-specs

Contributor

@Fokko left a comment

I've also seen this in the past. I don't see any harm in making this more explicit 👍

Co-authored-by: Fokko Driesprong <[email protected]>
@RussellSpitzer (Member)

> I've also seen this in the past. I don't see any harm in making this more explicit 👍

Could you share who was generating the files? We had a user report this but never figured out the origin.

Contributor

@singhpk234 left a comment

LGTM too, thanks @stevenzwu !

Contributor

@huaxingao left a comment

LGTM

@huaxingao merged commit 9f26691 into apache:main Aug 26, 2025
2 checks passed
@huaxingao (Contributor)

Merged. Thanks @stevenzwu for the PR! Thanks @singhpk234 @Fokko @RussellSpitzer for the review!

5 participants