Build: Bump Parquet-Java to 1.16.0 #13941

Fokko · 2025-08-28T16:52:14Z

Parquet 1.16.0 is out: https://lists.apache.org/thread/hs3tkc3k9vtcogq08yk7zl2p855voyvc

Fokko · 2025-09-02T07:35:56Z

Looked into the failed test, and it looks like it splits a single partition into two files:

Turns out there is a small difference in the size due to:

➜  parquet-java git:(d5f86d7c) ✗ git bisect bad                                                                              
d5f86d7c0e9894510e8af6dfd37444843e6d1bc4 is the first bad commit
commit d5f86d7c0e9894510e8af6dfd37444843e6d1bc4
Author: Gang Wu <[email protected]>
Date:   Tue Jan 21 16:18:19 2025 +0800

    GH-3133: Fix SizeStatistics to handle omitted histogram (#3134)

 .../apache/parquet/column/statistics/SizeStatistics.java |  6 ++++--
 .../parquet/column/statistics/TestSizeStatistics.java    | 16 ++++++++++++++++
 .../format/converter/ParquetMetadataConverter.java       | 10 ++++++++--

build.gradle

stevenzwu · 2025-09-02T16:12:57Z

spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java

+    List<FileScanTask> files =
+        StreamSupport.stream(table.newScan().planFiles().spliterator(), false)
+            .collect(Collectors.toList());
+    assertThat(files.size()).as("Did not have the expected number of files").isEqualTo(numExpected);


Nit: I thought the style below is the preferred assertion style, since it gives a better error message

assertThat(files).as("Did not have the expected number of files").hasSize(numExpected);

I changed this to easily set a breakpoint and inspect files. I left it like that since it might be helpful in the future.

ok. I will leave it up to you.

Changing the assertion still lets you set a breakpoint and inspect files, since the files are collected before the assertion; new lines 2171-2173 stay as they are in this PR.

List<String> list = Arrays.asList("a", "b", "c");
assertThat(list).hasSize(4);

The code above fails with the following helpful error message:

Expected size: 4 but was: 3 in: ["a", "b", "c"]
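
To make the difference between the two assertion styles concrete, here is a minimal stdlib-only sketch. It does not use AssertJ; the class and both helper methods are hypothetical and only mimic the idea that an assertion on the collection itself can report the contents, while an assertion on the size alone can only report two numbers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: these helpers are NOT AssertJ's API.
final class SizeAssertionSketch {

  // Mimics asserting on files.size(): only the two numbers are available for the message.
  static String checkSizeOnly(int actualSize, int expected) {
    return actualSize == expected
        ? "ok"
        : "expected: " + expected + " but was: " + actualSize;
  }

  // Mimics asserting on the collection: the message can also list the elements.
  static String checkCollection(List<?> actual, int expected) {
    return actual.size() == expected
        ? "ok"
        : "Expected size: " + expected + " but was: " + actual.size() + " in: " + actual;
  }

  public static void main(String[] args) {
    List<String> list = Arrays.asList("a", "b", "c");
    System.out.println(checkSizeOnly(list.size(), 4));
    System.out.println(checkCollection(list, 4));
  }
}
```

In both styles the list is built before the check runs, so a breakpoint on the assertion line can still inspect it.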

parquet/src/main/java/org/apache/iceberg/parquet/TypeToMessageType.java

kevinjqliu · 2025-09-02T17:48:10Z

Could not find org.apache.parquet:parquet-avro:1.16.0.
Searched in the following locations:
- https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-avro/1.16.0/parquet-avro-1.16.0.pom
- file:/home/runner/.m2/repository/org/apache/parquet/parquet-avro/1.16.0/parquet-avro-1.16.0.pom

1.16.0 is not here yet: https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-avro/

1.16.0RC2 already has 3 binding votes, just not officially released yet
https://lists.apache.org/thread/rb0gorvx1lysch6yxks72h94kqhsp719
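
The first location in the error above follows the standard Maven repository layout, which is why a release that has not been published yet shows up as a missing POM. The following sketch derives that URL from the coordinates; the class and method names are hypothetical and are not part of Gradle or Maven.

```java
// Hypothetical helper: maps "group:artifact:version" to the POM URL in a Maven-layout repository.
final class MavenPomUrl {

  static String pomUrl(String repoBase, String coordinates) {
    String[] parts = coordinates.split(":");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected group:artifact:version, got: " + coordinates);
    }
    String groupPath = parts[0].replace('.', '/'); // dots in the group id become directories
    String artifact = parts[1];
    String version = parts[2];
    String base = repoBase.endsWith("/") ? repoBase : repoBase + "/";
    return base + groupPath + "/" + artifact + "/" + version + "/"
        + artifact + "-" + version + ".pom";
  }

  public static void main(String[] args) {
    System.out.println(
        pomUrl("https://repo.maven.apache.org/maven2", "org.apache.parquet:parquet-avro:1.16.0"));
  }
}
```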

spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/data/ParquetWithSparkSchemaVisitor.java

stevenzwu · 2025-09-02T18:21:43Z

spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/data/ParquetWithSparkSchemaVisitor.java

-        // added to Parquet
-        // Preconditions.checkArgument(
-        //  sType instanceof VariantType, "Invalid variant: %s is not a VariantType", sType);
+      } else if (sType instanceof VariantType


why do we need to allow both?

Ideally we don't need both, but if there is any data that has been written without the annotation, then we can fall back to the Iceberg schema. The important part is that we set the annotations in TypeToMessageType.java.

but if there is any data that has been written without the annotation, then we can fall back to the Iceberg schema.

Do we want to support this scenario?

Ideally not :)

I assume this is for us supporting any Variants written by Spark in 4.0, otherwise we couldn't possibly import them right?

Ah, I didn't think of that use-case. Spark 4 was released before the annotation, so I think you're right there 👍

I believe in the Iceberg Schema / Spark Type first and if the annotation is missing, I think we should still just read and be happy :)

@Fokko can we also add a comment to explain why we are checking both? Other non-primitive types (like list and map) only check the annotation.

Agree that we need to handle both cases (the annotation is missing but sType is a variant, and the logical type annotation is variant) to support the existing data.

Also, should we switch to falling back to the old way, as in

LogicalTypeAnnotation.variantType(Variant.VARIANT_SPEC_VERSION).equals(annotation) || sType instanceof VariantType.

Updated and added the comments 👍
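
The check discussed in this thread can be sketched with stand-in types as follows. This is not the code from the PR: `SparkType`, `VariantType`, `StringType`, and `VariantAnnotation` are hypothetical placeholders for the Spark and Parquet classes, and the value of `VARIANT_SPEC_VERSION` is an assumption made for illustration.

```java
// Hypothetical stand-ins; the real types live in Spark and Parquet-Java.
final class VariantCheckSketch {

  interface SparkType {}

  static final class VariantType implements SparkType {}

  static final class StringType implements SparkType {}

  // Assumed value, for illustration only.
  static final int VARIANT_SPEC_VERSION = 1;

  // Placeholder for the Parquet variant logical type annotation.
  static final class VariantAnnotation {
    final int specVersion;

    VariantAnnotation(int specVersion) {
      this.specVersion = specVersion;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof VariantAnnotation && ((VariantAnnotation) o).specVersion == specVersion;
    }

    @Override
    public int hashCode() {
      return Integer.hashCode(specVersion);
    }
  }

  // Annotation first; fall back to the Spark type for data written without the annotation.
  static boolean isVariant(Object annotation, SparkType sType) {
    return new VariantAnnotation(VARIANT_SPEC_VERSION).equals(annotation)
        || sType instanceof VariantType;
  }

  public static void main(String[] args) {
    System.out.println(isVariant(new VariantAnnotation(1), new VariantType())); // annotated
    System.out.println(isVariant(null, new VariantType())); // written without the annotation
    System.out.println(isVariant(null, new StringType())); // not a variant
  }
}
```

Calling `equals` on the freshly built annotation keeps the check null-safe when the file carries no annotation at all, which is the case for the older data mentioned above.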

stevenzwu

LGTM. Let's see how CI goes once the Parquet binaries are released.

kevinjqliu · 2025-09-03T02:23:21Z

1.16 is released! https://lists.apache.org/thread/nf0m7z256gtq16m3by78mf4w6tpffdqh
https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-avro/1.16.0/

kevinjqliu · 2025-09-03T02:25:10Z

push an empty commit to trigger ci, it takes a while :)

aihuaxu · 2025-09-03T03:21:01Z

spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/data/ParquetWithSparkSchemaVisitor.java

-        // added to Parquet
-        // Preconditions.checkArgument(
-        //  sType instanceof VariantType, "Invalid variant: %s is not a VariantType", sType);
+      } else if (sType instanceof VariantType


Agree that we need to handle both cases (the annotation is missing but sType is a variant, and the logical type annotation is variant) to support the existing data.

Also, should we switch to falling back to the old way, as in

LogicalTypeAnnotation.variantType(Variant.VARIANT_SPEC_VERSION).equals(annotation) || sType instanceof VariantType.

api/src/main/java/org/apache/iceberg/variants/Variant.java

kevinjqliu

LGTM!

stevenzwu · 2025-09-03T13:55:51Z

thanks @Fokko for the contribution, and @aihuaxu @RussellSpitzer @nastra @kevinjqliu for the reviews

RussellSpitzer · 2025-09-03T15:29:47Z

push an empty commit to trigger ci, it takes a while :)

Note that you can "re-run" failed tests from the GitHub workflows UI too (if you are on the PMC)

Fokko · 2025-09-03T15:41:37Z

(if you are on the PMC)

I think it should also be available for committers.

Parquet: Test out the Parquet-Java 1.16.0 release

9d97155

github-actions bot added spark build labels Aug 28, 2025

nastra approved these changes Aug 28, 2025

Merge branch 'main' of github.com:apache/iceberg into fd-test-parquet…

4d005ad

…-1-16

ebyhr mentioned this pull request Sep 1, 2025

Test: Upgrade to Parquet 1.16.0rc [DO NOT MERGE] #13971

Closed

Merge branch 'main' of github.com:apache/iceberg into fd-test-parquet…

5e6d668

…-1-16

:/

6309f60

Fokko force-pushed the fd-test-parquet-1-16 branch from 66bd095 to 6309f60 Compare September 2, 2025 09:15

Set the annotation

5df9a4b

github-actions bot added the parquet label Sep 2, 2025

Fokko commented Sep 2, 2025

build.gradle

Remove staging

871040c

Fokko marked this pull request as ready for review September 2, 2025 16:01

stevenzwu reviewed Sep 2, 2025

parquet/src/main/java/org/apache/iceberg/parquet/TypeToMessageType.java

Fokko added 2 commits September 2, 2025 19:46

Introduce constant for the Variant version

96fddd4

Merge branch 'main' of github.com:apache/iceberg into fd-test-parquet…

da30312

…-1-16

Merge branch 'fd-test-parquet-1-16' of github.com:Fokko/iceberg into …

a2558cc

…fd-test-parquet-1-16

github-actions bot added the API label Sep 2, 2025

Fokko changed the title ~~Parquet: Test out the Parquet-Java 1.16.0 release~~ Build: Bump Parquet-Java to 1.16.0 Sep 2, 2025

stevenzwu reviewed Sep 2, 2025

spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/data/ParquetWithSparkSchemaVisitor.java

stevenzwu reviewed Sep 2, 2025

Allow for both

7a2bf7d

Fokko force-pushed the fd-test-parquet-1-16 branch from f960d54 to 7a2bf7d Compare September 2, 2025 19:18

Fokko added 2 commits September 2, 2025 21:29

Revert relying on Iceberg schema

8259b45

Improve error message

fa18217

stevenzwu approved these changes Sep 2, 2025

Add sType instanceof VariantType

3b4ba3e

trigger ci

2e77e6c

aihuaxu reviewed Sep 3, 2025

kevinjqliu approved these changes Sep 3, 2025

Fokko added 3 commits September 3, 2025 07:17

Merge branch 'main' of github.com:apache/iceberg into fd-test-parquet…

a86fe2a

…-1-16

Thanks Aihua

d6edcb1

Merge branch 'fd-test-parquet-1-16' of github.com:Fokko/iceberg into …

bcf6fd4

…fd-test-parquet-1-16

stevenzwu approved these changes Sep 3, 2025

stevenzwu merged commit 12ab7fc into apache:main Sep 3, 2025
43 checks passed

Fokko deleted the fd-test-parquet-1-16 branch September 3, 2025 15:41
