Add missing LZ4 and ZSTD compression codec classes #25021

Open: ZacBlanco wants to merge 1 commit into master from the upstream-iceberg-compression-codecs branch

Contributor

@ZacBlanco ZacBlanco commented Apr 30, 2025

Description

These codecs are available in the writers but don't seem to have been configured correctly, so writing tables with them previously threw errors. This change enables LZ4 and ZSTD compression for Parquet writers in Iceberg and Hive.

Motivation and Context

When users set the compression_codec session property or the *.compression-codec connector property to LZ4 or ZSTD with Parquet as the default file format, table creation failed because the codec was null inside HiveCompressionCodec, which the Iceberg connector also uses. I couldn't find a good reason for keeping these null, so I populated the correct enum variants and added tests to ensure they work. Since this code is shared between the Iceberg and Hive connectors, I added tests for different file format and compression codec variants to ensure compatibility across all of the potential configuration combinations.
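
The failure mode described above can be sketched as follows. This is a minimal illustration, not the actual HiveCompressionCodec source: the class name CodecMappingSketch, the enums ParquetCodec and TableCodec, the resolve helper, and the UNMAPPED entry are all hypothetical names introduced here. It assumes a connector-level enum whose entries each carry an optional Parquet codec, where an entry left as null fails at lookup time.

```java
import java.util.Optional;

class CodecMappingSketch {
    // Hypothetical stand-in for the codec names a Parquet writer understands.
    enum ParquetCodec { UNCOMPRESSED, SNAPPY, GZIP, LZ4, ZSTD }

    // Hypothetical connector-level enum; NOT the real HiveCompressionCodec.
    enum TableCodec {
        NONE(ParquetCodec.UNCOMPRESSED),
        SNAPPY(ParquetCodec.SNAPPY),
        GZIP(ParquetCodec.GZIP),
        // Leaving these two as null would reproduce the reported failure.
        LZ4(ParquetCodec.LZ4),
        ZSTD(ParquetCodec.ZSTD),
        // Illustration only: an entry that was never given a Parquet codec.
        UNMAPPED(null);

        private final ParquetCodec parquetCodec;

        TableCodec(ParquetCodec parquetCodec) {
            this.parquetCodec = parquetCodec;
        }

        Optional<ParquetCodec> parquetCodec() {
            return Optional.ofNullable(parquetCodec);
        }
    }

    // Fails with a descriptive message when the mapping is missing.
    static ParquetCodec resolve(TableCodec codec) {
        return codec.parquetCodec().orElseThrow(() ->
                new IllegalArgumentException("No Parquet codec configured for " + codec));
    }

    public static void main(String[] args) {
        System.out.println(resolve(TableCodec.ZSTD));
        try {
            resolve(TableCodec.UNMAPPED);
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Wrapping the nullable field in an Optional at the accessor is one way to make a missing mapping fail loudly at the lookup site; whether the real enum does this is not stated in this PR.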

Impact

  • Users can now set compression_codec to LZ4 or ZSTD when creating Iceberg tables with Parquet as the default file format.
  • The Pagefile format now supports the LZ4 and ZSTD compression codecs.

Test Plan

  • New test matrix for supported file formats and compression codecs in Hive and Iceberg connectors
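
A test matrix of this kind can be sketched as a cross product of file formats and codecs. The sketch below is an assumption about the general shape only, not the tests added in this PR: the class CodecMatrixSketch, its enums, and the writeSucceeds callback are hypothetical, and the enums list only the formats and codecs named in this description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

class CodecMatrixSketch {
    // Only the formats and codecs mentioned in this PR; a real matrix would list more.
    enum Format { PARQUET, PAGEFILE }
    enum Codec { LZ4, ZSTD }

    // Every format/codec pair, in declaration order.
    static List<String> combinations() {
        List<String> pairs = new ArrayList<>();
        for (Format format : Format.values()) {
            for (Codec codec : Codec.values()) {
                pairs.add(format + "/" + codec);
            }
        }
        return pairs;
    }

    // Runs the hypothetical write check for each pair and collects the failing ones.
    static List<String> failures(BiPredicate<Format, Codec> writeSucceeds) {
        List<String> failed = new ArrayList<>();
        for (Format format : Format.values()) {
            for (Codec codec : Codec.values()) {
                if (!writeSucceeds.test(format, codec)) {
                    failed.add(format + "/" + codec);
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println(combinations());
        // Placeholder check that always passes; a real test would create and write a table.
        System.out.println(failures((format, codec) -> true));
    }
}
```

Collecting failures instead of stopping at the first one makes it easier to see at a glance which combinations are unsupported.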

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

Iceberg Connector Changes
* Add support for ZSTD and LZ4 compression codecs in Parquet format

Hive Connector Changes
* Add support for ZSTD and LZ4 compression codecs in Parquet format
* Add support for ZSTD and LZ4 compression codecs in Pagefile format

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Apr 30, 2025
@ZacBlanco ZacBlanco changed the title from "[Iceberg[ Add support for LZ4 and ZSTD compression codecs" to "[Iceberg] Add support for LZ4 and ZSTD compression codecs" Apr 30, 2025
@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-compression-codecs branch from 4f83cd2 to 6c8b2a9 April 30, 2025 22:04
@ZacBlanco ZacBlanco marked this pull request as ready for review May 1, 2025 23:42
@ZacBlanco ZacBlanco requested review from hantangwangd and a team as code owners May 1, 2025 23:42
@ZacBlanco ZacBlanco requested a review from jaystarshot May 1, 2025 23:42
@prestodb-ci prestodb-ci requested review from a team, infvg and pramodsatya and removed request for a team May 1, 2025 23:42
Member

@hantangwangd hantangwangd left a comment

Change looks good to me, just one nit.

@@ -58,6 +57,7 @@ public final class IcebergSessionProperties
     private static final String MINIMUM_ASSIGNED_SPLIT_WEIGHT = "minimum_assigned_split_weight";
     private static final String NESSIE_REFERENCE_NAME = "nessie_reference_name";
     private static final String NESSIE_REFERENCE_HASH = "nessie_reference_hash";
+    public static final String COMPRESSION_CODEC = "compression_codec";
Suggested change
-    public static final String COMPRESSION_CODEC = "compression_codec";
+    static final String COMPRESSION_CODEC = "compression_codec";

nit: default visibility seems enough here.

These codecs are available in the writers, but don't seem to have been
configured correctly. Trying to write tables with these formats
previously threw errors. This change enables LZ4 and ZSTD compression
for Parquet writers in Hive and Iceberg
@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-compression-codecs branch from 6c8b2a9 to 97778fe May 2, 2025 23:37
@ZacBlanco ZacBlanco changed the title from "[Iceberg] Add support for LZ4 and ZSTD compression codecs" to "Add missing LZ4 and ZSTD compression codec classes" May 2, 2025
Labels
from:IBM PR from IBM
4 participants