1.12.1
Core Library
Quality of Life (fixing annoying little things)
- 2529-INFO_TABLES_QUERY_THRESHOLD-as-parameter-from-config by @amirdataops in #2600
- warn when resolving configs or secrets with placeholder values by @djudjuu in #2636
- Prevent unnecessary preliminary connection in dataset by @sh-rp in #2645
- QoL: warning with hint to provide data types for columns with exclusively None values by @anuunchin in #2633
- Fix issue 2690: switch to packaging to remove warning on `import dlt` by @djudjuu in #2707
- qol: exception formatting by @zilto in #2715
- Regular and standalone resources are now the same thing. Both provide nice typed callables, can be renamed, and allow secrets and configs to be injected in the same way, also when part of an inner function. This unifies injection behavior for all our decorators.
In the example below, (1) the `access_token` secret is allowed in the inner resource and (2) the `limit` argument with a default will be injected from e.g. the `LIMIT` env variable, which was skipped before:
```py
@dlt.source
def source():
    @dlt.resource(write_disposition="merge", primary_key="_id")
    def documents(access_token=dlt.secrets.value, limit=10):
        yield from generate_json_like_data(access_token, limit)

    return documents
```
- You can now return data from resources instead of yielding single items. We do not recommend this for code readability: `dlt` always wraps resources in generators, so your return will be converted to a yield.
- To return a `DltResource` from a resource function you must explicitly type the return value:
```py
@dlt.resource
def rv_resource(name: str) -> DltResource:
    return dlt.resource([1, 2, 3], name=name, primary_key="value")
```
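The return-to-yield conversion works because dlt wraps every resource function in a generator. A simplified, stdlib-only model of that wrapping (for illustration only, not dlt's actual implementation; `wrap_resource` is a hypothetical name) could look like this:

```python
import inspect
from typing import Any, Callable, Iterator


def wrap_resource(fn: Callable[..., Any]) -> Callable[..., Iterator[Any]]:
    """Toy model: make a plain `return data` behave like a yield.

    Generator functions pass through unchanged; regular functions have
    their return value re-emitted from a one-item generator.
    """
    if inspect.isgeneratorfunction(fn):
        return fn

    def gen(*args: Any, **kwargs: Any) -> Iterator[Any]:
        # the returned object is emitted as a single yielded item
        yield fn(*args, **kwargs)

    return gen


def docs_return():
    return [{"id": 1}, {"id": 2}]


def docs_yield():
    yield {"id": 1}
    yield {"id": 2}


# a returning resource yields its return value as one batch
assert list(wrap_resource(docs_return)()) == [[{"id": 1}, {"id": 2}]]
# a yielding resource is left untouched
assert list(wrap_resource(docs_yield)()) == [{"id": 1}, {"id": 2}]
```

Either shape ends up consumable as a generator downstream, which is why returning a list is merely a readability concern rather than an error.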
- Normalizes config resolve behavior: default values can be overridden from providers but explicit values cannot.
⚠️ Previously, if those were instances of base configurations, behavior was inconsistent (explicit values were treated like defaults).
⚠️ If a native value is found for a config and it does not accept native values, config resolution will fail; previously it was ignored.
- We use custom, consistent `wrap` and `unwrap` of functions. Our decorators preserve both the typing and the runtime `signature` of decorated functions. `makefun` got removed.
- If `Incremental` initializes from another `Incremental` as a native value, it copies the original type correctly.
- `dlt.resource` can define a configuration section (also using lambdas).
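The normalized precedence rule (explicit beats provider, provider beats default) can be illustrated with a toy resolver; `resolve_value` is a hypothetical helper written for this example, not part of dlt's API:

```python
from typing import Any, Dict, Optional

# sentinel to distinguish "not passed" from explicit None
_UNSET = object()


def resolve_value(
    key: str,
    explicit: Any = _UNSET,
    default: Any = _UNSET,
    provider: Optional[Dict[str, Any]] = None,
) -> Any:
    """Toy model of normalized config resolution precedence."""
    if explicit is not _UNSET:
        return explicit  # explicit value: providers cannot override it
    if provider is not None and key in provider:
        return provider[key]  # provider (env/toml) overrides the default
    if default is not _UNSET:
        return default
    raise LookupError(f"config value {key!r} could not be resolved")


env = {"limit": 100}
assert resolve_value("limit", default=10, provider=env) == 100  # default overridden
assert resolve_value("limit", explicit=5, provider=env) == 5    # explicit wins
assert resolve_value("limit", default=10) == 10                 # fallback to default
```

Under the old behavior, explicit base-configuration instances were sometimes treated like defaults, i.e. they could be silently overridden by a provider; the release makes the explicit branch always win.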
Bugfixes and improvements
- Added write_disposition to sql table config
- Added primary_key and merge_key to sql table config
- Feat: Support clustered tables with custom column order in BigQuery destination by @hsm207 in #2638
- Feat: Add configuration propagation to deltalake.write_deltalake (#2629) by @gaatjeniksaan in #2640
- Add support for creating custom integer-range partition table in BigQuery by @hsm207 in #2676
- Upsert merge strategy for iceberg by @anuunchin in #2671
- Feat/add athena database location option by @eric-pinkham-rw in #2708
- motherduck destination config improvement: uppercase env var by @djudjuu in #2703
- adds parquet support to postgres via adbc by @rudolfix in #2685
- 2681 - fixes null on non null column arrow by @rudolfix in #2721
- removes cffi version of psycopg2
- mssql and snowflake bugfixes by @rudolfix in #2756
- support for deltalake 1.0 by @rudolfix in #2721
- allows to skip input data deduplication on `delete-insert` merge to decrease query cost in #2721
- allows to configure configs and pragmas for duckdb, improves sql_client, tests by @rudolfix in #2730
- logs resolved traces thread-wise, clears log between pipeline runs by @rudolfix in #2730
Chores & tech debt
We are switching to uv in the coming days, and:
- Simplify workflow files by @sh-rp in #2663
- fix/2677: remove recursive filewatching by @zilto in #2678
- QoL: improved `__repr__()` for public interface by @zilto in #2630
- fix: incrementally watch files by @zilto in #2697
- Simplify pipeline test utils by @sh-rp in #2566 (we use data access and `dataset` for testing now)
- added constants for `load_id` col in `_dlt_loads` table by @zilto in #2729
- Update github workflow setup by @sh-rp in #2728
- fixes leaking datasets tests by @rudolfix in #2730
🧪 Upgrades to data access
- Normalize model files by @sh-rp in #2507
- dlt.transformation implementation by @sh-rp, @zilto and @anuunchin in #2528
- [transformations] decouples sqlglot lineage and schema generation from destination identifiers by @rudolfix in #2705
- All SQL queries are destination agnostic. For example:
  - Column lineage is computed and inferred; `x-annotation` hints are propagated.
  - `SqlModel` represents a SQL query and is processed in the extract, normalize, and load steps.
  - You can use `scalar()` on data access expressions, e.g.:
```py
# get latest processed package id
max_load_id = pipeline.dataset()._dlt_loads.load_id.max().scalar()
```
🧪 Cool experimental stuff:
Check out our new embedded pipeline explorer app
- dlt marimo app pre-release version by @sh-rp in #2662
- add marimo flag to show command by @sh-rp in #2741
```sh
dlt pipeline <name> show --marimo
dlt pipeline <name> show --marimo --edit
```
Use the `--edit` option to enable Notebook/edit mode in Marimo + a very cool Ibis dataset explorer.
Docs
- docs: dlt+ iceberg destination partitioning by @burnash in #2686
- docs: fix invalid bigquery reference in athena destination by @goober in #2700
- docs: rest_api: clarify dlt resource and rest_api specific parameters by @burnash in #2710
- docs: plus: add merge strategies for dlt+ Iceberg destination by @burnash in #2749
- rest_api: document pagination hierarchy and add tests by @burnash in #2745
- docs: add session parameter to rest_api client configuration by @burnash in #2746
- docs: fix incorrect github_source function calls in tutorial by @axelearning in #2768
We updated contribution guidelines
- By default we do not accept more destinations (except a few like DuckLake or Trino)
- Each PR needs a test and (possibly) docs entry
New Contributors
- @amirdataops made their first contribution in #2600
- @goober made their first contribution in #2700
- @eric-pinkham-rw made their first contribution in #2708
- @axelearning made their first contribution in #2768
Full Changelog: 1.11.0...1.12.0