Feat/iceberg advanced partitioning #3053
Hi @rakesh-tmdc, thanks for the contribution, this looks good and useful. In dlt+ we already have an If you're up for it, we can extract the adapter and have your changes delegate partition spec parsing/validation to it to keep behavior consistent across catalogs.

Thanks @burnash, glad to hear this is useful! Extracting the iceberg_adapter and its partition helpers into open source sounds like a great idea. I’d definitely prefer to reuse that instead of duplicating logic. Once it’s available, I can rework my PR so that partition spec parsing/validation delegates to the adapter, which should keep things consistent. Just let me know when/where the adapter lands, and I’ll update accordingly.
- Add support for advanced partition transforms (year, month, day, hour, bucket, truncate)
- Implement explicit partition ordering via index property
- Add custom partition naming support
- Implement priority system: advanced partitioning overrides legacy partition: True
- Add comprehensive validation for partition specifications
- Add graceful error handling for PyIceberg limitations
- Add performance optimization with early exit for non-partitioned schemas
- Update schema typing to support dict/list partition syntax
- Add pyiceberg-core>=0.6.0 dependency for advanced transforms
- Add comprehensive test suite with 22+ test cases covering all scenarios

Backward compatible: existing partition: True syntax continues to work.
Resolves partition ordering limitations in Iceberg table format.
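To make the validation bullet above more concrete, here is a minimal sketch of what checking a single column's partition hint could look like. It is not the code from this PR: the helper name `validate_partition_hint` and the exact error messages are hypothetical, and only the keys that appear in this thread (`type`, `index`, `name`, `bucket_count`) are checked.

```python
from typing import Any, Dict

# Transform names listed in the commit message, plus "identity".
ALLOWED_TRANSFORMS = {"identity", "year", "month", "day", "hour", "bucket", "truncate"}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_partition_hint(column: str, hint: Any) -> Dict[str, Any]:
    """Hypothetical helper: return a normalized hint dict or raise ValueError."""
    # Legacy syntax: `partition: True` is treated as an identity partition.
    if hint is True:
        return {"type": "identity"}
    if not isinstance(hint, dict):
        raise ValueError(f"{column}: partition hint must be True or a dict")

    transform = hint.get("type")
    if transform not in ALLOWED_TRANSFORMS:
        raise ValueError(f"{column}: unknown partition type {transform!r}")

    # Bucket partitions need a positive bucket count.
    if transform == "bucket":
        count = hint.get("bucket_count")
        if not _is_int(count) or count <= 0:
            raise ValueError(f"{column}: bucket_count must be a positive integer")

    # The index, when given, must be an integer so partitions can be ordered.
    if "index" in hint and not _is_int(hint["index"]):
        raise ValueError(f"{column}: index must be an integer")

    return dict(hint)
```

Raising early with the column name in the message keeps a bad hint from reaching PyIceberg, where the failure would likely be harder to trace back to the schema.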
Hi @rakesh-tmdc, I've ported the Now that we have the adapter in place, here's what needs to happen next to complete this PR:
You can also test backward compatibility with the older way of defining an identity partition by running: where Let me know if you have any questions!
Hey Team,
I’ve been using dlt for the past 3–4 months, mostly with Apache Iceberg as the destination. Recently, I needed support for Iceberg partitioning, especially for more advanced use cases like time and bucket partitions.
I’ve implemented support for these in a way that’s fully compatible with existing column-level partition configurations:
Still works with earlier formats like:
    {
      "region": { "partition": true },
      "category": { "partition": true }
    }

Now also supports advanced options like:

    {
      "date_added": { "partition": { "type": "year", "index": 1, "name": "yearly_partition" } },
      "user_id": { "partition": { "type": "bucket", "index": 2, "bucket_count": 32, "name": "user_bucket" } },
      "region": { "partition": { "type": "identity", "index": 3 } }
    }

Would love feedback from the team!
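For reviewers, here is a rough sketch of how the column hints from the description could be resolved into an ordered partition list. The function `order_partitions` is hypothetical and not part of dlt or PyIceberg. Two things in it are assumptions rather than behavior confirmed by this PR: columns without an `index` are placed after indexed ones in declaration order, and the default partition name is the column name for identity and `<column>_<type>` otherwise.

```python
from typing import Any, Dict, List, Tuple

PartitionEntry = Tuple[str, str, str]  # (column, transform, partition name)


def order_partitions(columns: Dict[str, Dict[str, Any]]) -> List[PartitionEntry]:
    """Hypothetical helper: resolve column hints into an ordered partition list."""
    indexed: List[Tuple[int, PartitionEntry]] = []
    unindexed: List[PartitionEntry] = []

    for column, props in columns.items():
        hint = props.get("partition")
        if not hint:
            continue  # column is not partitioned
        if hint is True:
            hint = {"type": "identity"}  # legacy syntax

        transform = hint["type"]
        # Assumed default naming; an explicit "name" always wins.
        default_name = column if transform == "identity" else f"{column}_{transform}"
        entry = (column, transform, hint.get("name") or default_name)

        if "index" in hint:
            indexed.append((hint["index"], entry))
        else:
            unindexed.append(entry)

    # Explicit indexes decide the order; sort is stable for equal indexes.
    indexed.sort(key=lambda pair: pair[0])
    return [entry for _, entry in indexed] + unindexed


# The advanced example from the description, declared out of order on purpose.
columns = {
    "region": {"partition": {"type": "identity", "index": 3}},
    "user_id": {"partition": {"type": "bucket", "index": 2, "bucket_count": 32, "name": "user_bucket"}},
    "date_added": {"partition": {"type": "year", "index": 1, "name": "yearly_partition"}},
}
print(order_partitions(columns))
# [('date_added', 'year', 'yearly_partition'), ('user_id', 'bucket', 'user_bucket'), ('region', 'identity', 'region')]
```

Keeping the ordering step separate from building the actual Iceberg partition spec means the same resolved list can be checked in tests without a catalog.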