Releases: dlt-hub/dlt

1.17.1

02 Oct 16:53
646f4b5

This patch release mostly addresses bugs and inconsistencies found in the new ducklake destination. The most significant change renames catalog_name to ducklake_name in the destination.ducklake configuration (#3153).
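For configurations kept as parsed TOML or plain dictionaries, the rename can be applied mechanically. The helper below is a minimal sketch, not part of dlt: the function name rename_ducklake_key is hypothetical, and because these notes do not say at which nesting level the key lives, it renames catalog_name anywhere under destination.ducklake and leaves other destinations untouched.

```python
from typing import Any

OLD_KEY = "catalog_name"
NEW_KEY = "ducklake_name"


def rename_ducklake_key(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `config` with catalog_name renamed under destination.ducklake."""

    def _rename(node: Any) -> Any:
        if not isinstance(node, dict):
            return node
        renamed: dict[str, Any] = {}
        for key, value in node.items():
            if key == OLD_KEY:
                # If the new key is already set, keep it and drop the old one.
                if NEW_KEY in node:
                    continue
                renamed[NEW_KEY] = _rename(value)
            else:
                renamed[key] = _rename(value)
        return renamed

    result = dict(config)
    destination = dict(result.get("destination", {}))
    if "ducklake" in destination:
        destination["ducklake"] = _rename(destination["ducklake"])
        result["destination"] = destination
    return result
```

The input dictionary is not modified. The credentials nesting used when trying this out is only an illustration of a nested layout, not a statement about where dlt expects the key.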

Core Library

  • fix(ducklake): 3140 disambiguate config key and default values by @zilto in #3153
  • fix: explicitly replace postgresql by postgres in ATTACH (ducklake) by @zilto in #3148
  • fix: 3139 pass SQLAlchemy credentials to f-string (ducklake) by @zilto in #3150
  • rest_api: remove unused exceptions.py by @burnash in #3143
  • fix incorrect export by @chulkilee in #3120
  • Fix/3123 close sqlalchemy cursor by @rudolfix in #3136
  • fix: 3145 add read_parquet(use_arrow: bool) by @zilto in #3149
  • restclient: add support for data parameter in RESTClient and rest_api by @burnash in #3134
  • Feat: Custom metrics in the incremental transform by @anuunchin in #3117

Chores

Docs

New Contributors

Full Changelog: 1.17.0...1.17.1

1.17.0

24 Sep 13:41
4361afe

Core Library

  • dashboard: fixes file opener on WSL by @rudolfix in #3076
  • (bugfix) persist incremental initial value by @rudolfix in #3075
  • (QoL) sets explicit timeouts on trackers by @rudolfix in #3074
  • restclient: misc Paginators improvements by @burnash in #2924
  • Improved pipeline attach command and Dashboard launcher extensions by @sh-rp in #3060
  • Fix parameter reference in IncrementalCursorPathHasValueNone exceptio… by @rik-adegeest in #3070
  • fix: convert local file path to posix before PUT to Databricks destination by @AndreiBondarenko in #3086
  • Fix/67 (relational normalizer) ignore None if child table exists by @sh-rp in #3048
  • Fix/3047 prevent same naming for staging and final datasets by @alkaline-0 in #3096
  • fix: fixed error in import of BaseOperator in airflow_helper.py (#2601) by @ianedmundson1 in #3043
  • makes root key propagation more selective, fallbacks for 2nd degree nesting #2737
  • allows to limit by row count #2737
  • enables ordering of results in filesystem via Incremental sort_order #2737
  • cli: updated error in the dlt pipeline show command by @burnash in #3095
  • Fix: sql_client.raise_database_error creates circular __cause__ dependency by @anuunchin in #3111
  • Feat: allowing custom metrics to be added to dlt resources and transform steps by @anuunchin in #3078
  • feat: ducklake destination (all buckets and catalog combinations supported) by @zilto in #3015

Chores

  • repo(ci): disable docker container autorestart by @zilto in #3083
  • Don't echo pypi token to console on library publish by @sh-rp in #3089
  • Improve pipeline dashboard test coverage by @sh-rp in #3091
  • Run common and dashboard tests also with newest available allowed packages for all deps by @sh-rp in #3100
  • Docs Cloudflare worker deployment by @sh-rp in #3105
  • Updates CONTRIBUTING.md and README.md to remove outdated information and add more info by @sh-rp in #3101
  • Docs docusaurus / cloudflare fixes by @sh-rp in #3114

Verified Sources

Docs

  • explains various backfilling options for sql_database and filesystem with examples and additional tests by @rudolfix in #2737
  • ducklake destination documentation by @rudolfix in #3015

New Contributors

Full Changelog: 1.16.0...1.17.0

1.16.0

10 Sep 07:29
71a2fbb

Notable changes in this release

Core Library

  • fully support naive and tz-aware timestamp/time data types by @rudolfix in #2570
  • Dashboard updates and fixes by @sh-rp in #3055
  • Fix: Max table nesting is ignored for the first run when import schema path is specified by @anuunchin in #2992
  • fix: avoid private interfaces; explicit compiler mapping by @zilto in #2966
  • Refactor transformations by @sh-rp in #2970
  • Dashboard Improvements by @sh-rp in #2965
  • fix: top level relation by @zilto in #2983
  • fix: MissingDependencyException should inherit ImportError by @zilto in #2977
  • Add remaining paramiko connect params to SFTP filesystem by @AyushPatel101 in #2823
  • Feat: dataset access telemetry by @anuunchin in #3056
  • feat: dlt.Schema.to_dot() graphviz export by @zilto in #2959
  • fix: avoid setting "None" string for aws session token by @tpulmano in #2978
  • fix: dlt.Pipeline.__repr__ by @zilto in #3022
  • pip install marimo -> dlt[workspace] by @djudjuu in #3035
  • fix: improve type hints for dataset and relation by @zilto in #2997
  • Small dashboard fixes by @sh-rp in #3036
  • feat: dlt widgets for marimo by @zilto in #3021
  • feat(dataset): simplify public interface for dlt.Dataset and dlt.Relation by @zilto in #3059

Docs

Tests

  • re-enable python 3.10 common tests by @sh-rp in #2979
  • repo: use ruff check for linting by @zilto in #2967
  • Use license command for testing dlt+ installation by @sh-rp in #3026
  • add up to date check for uv lockfile as first lint step by @sh-rp in #3052

Misc

New Contributors

Full Changelog: 1.15.0...1.16.0

1.15.0

05 Aug 18:06
273420b

Breaking changes

This version adds the .gz extension to compressed files. This applies to filesystem destinations, the internal working directory, and staging locations used to feed other destinations. A few practical hints:

  • Existing filesystem destinations continue to store files without the .gz extension and are not affected by the change (for backwards compatibility, existing datasets keep the behavior in which the extension is not added).
  • Compressed files uploaded to staging destinations now have the .gz extension, including when dlt is configured to keep data in the stage.
  • This does not apply to parquet files.
  • More information can be found in the filesystem destination docs: https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#file-compression
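Because existing datasets keep compressed files without the extension while new ones add .gz, code that reads these files directly cannot rely on the file name alone. The following is a minimal sketch of a reader that checks the gzip magic bytes instead. The helper name open_load_file is hypothetical and not part of dlt, and it assumes the files are UTF-8 text (for example jsonl or csv).

```python
import gzip
from typing import IO

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_load_file(path: str) -> IO[str]:
    """Open a file as UTF-8 text, transparently decompressing gzip content."""
    # Sniff the first two bytes rather than trusting the extension.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Used as a context manager (with open_load_file(path) as f: ...), the same call reads compressed files with or without the .gz extension as well as uncompressed files.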

Core Library

  • [Databricks destination] Adding comment and tags for table and columns and applying primary and foreign key constraints in Unity Catalog by @bayees in #2674
  • feat - add crlf support for csv exports by @7amza79 in #2783
  • feat: add has_more boolean flag logic to RESTClient OffsetPaginator by @michaelconan in #2817
  • rest_api: fix: make ProcessingSteps filter and map fields optional by @burnash in #2913
  • Enable and test python 3.14 support by @sh-rp in #2789
  • removes init files from dlt tables in filesystem by @rudolfix in #2868
  • restclient: json param range paginator by @Giackgamba in #2917
  • fix sync destination warning logging call by @sh-rp in #2927
  • fix: missing __repr__ for @dlt.transformation by @zilto in #2940
  • fix: restclient: handle null data in response by @burnash in #2936
  • Fix: saving compressed load files with .gz extension by @anuunchin in #2835
  • fix: prevent DuplicateSchema error when using public schema in Redshift by @franloza in #2953
  • feat: Schema.to_dbml(), auto export schemas in dbml format by @zilto in #2929
  • QoL: improve DataValidationError output: use identifying columns if present by @djudjuu in #2915
  • callback collector by @djudjuu in #2922
  • skips inferring incomplete column when already incomplete by @rudolfix in #2935
  • 2946 sqlalchemy destination fixes (full support for mssq, partial for trino) by @rudolfix in #2951
  • adds precision to _dlt_load_id and _dlt_id columns by @rudolfix in #2951
  • adds json field support for mssql by @rudolfix in #2951
  • fixes clickhouse temporary table engine not propagate to nodes (failed merges fix) by @rudolfix in #2951
  • fixes BIGQUERY numeric creation (when scale was set to 0) by @rudolfix in #2951
  • fix: replace arrow2 with arrow backend for connectorx, enables newest connectorx versions by @zilto in #2933
  • AI Command: extended with IDEs (rules for all major IDEs are supported) by @anuunchin in #2937
  • duckdb bumped to 1.3.2, iceberg scanners updated by @rudolfix in #2958
  • Feat: Allow control over streamed_exec in delta merge upsert by @anuunchin in #2961
  • fix failing top level module imports on projects in dirs that start with a dot by @sh-rp in #2963

Docs

New Contributors

Full Changelog: 1.14.1...1.15.0

1.14.1

16 Jul 20:45
01d3242

Breaking Changes
If you used pipeline.dataset() with ibis syntax to write queries, please read below:

Core Library

  • fix filesystem config section by @sh-rp in #2865
  • fix: typing for updated datasets and relations Protocols by @zilto in #2870
  • Add workspace extra and rename marimo app to "pipeline dashboard" by @sh-rp in #2876
  • rest_api: Redact secrets in logs, add configurable response body in errors by @burnash in #2867
  • fixes range_start=open in incremental by @rudolfix in #2873
  • feat: autocompletion added for dataset and relation when in Notebook by @zilto in #2891
  • Fix logger.isEnabledFor() TypeError by @burnash in #2882
  • fixes arrow/pandas dependencies in extras and dep groups by @rudolfix in #2895

Docs

  • simplify playground setup cell by @sh-rp in #2857
  • do not run lancedb custom destination example test on forked subprocess by @djudjuu in #2854
  • Added troubleshooting steps for Databricks and other minor updates by @dat-a-man in #2871
  • add ibis dataset migration guide by @sh-rp in #2874
  • docs: adds documentation for column subset selection in sql_database source by @franloza in #2869

New Contributors

Full Changelog: 1.12.3...1.14.1

1.13.0

08 Jul 19:44
4ed21fd

Core library features

  • Extend CSV quoting options in CsvWriter by @burnash in #2810
  • rest_api: add HeaderCursorPaginator to configuration by @burnash in #2798
  • rest_api: Raise ValueError for incorrect auth config types by @burnash in #2799
  • feat(athena): apply lakeformation tags on database (cont.) by @rudolfix in #2808
  • Add sock argument for SFTPCredentials by @AyushPatel101 in #2803
  • Psycopg2SqlClient: accept extra options by @nicob3y in #2755
  • Update fruitshop source with slightly more data and setup that enables star schema demonstration by @sh-rp in #2845
  • return latest step info by @djudjuu in #2829
  • change secrets.toml file to sources.rest_api_pipeline.github by @kaliole in #2849
  • Chore: Pyiceberg's python contsraint moved from project wide constraints by @anuunchin in #2839

Internals

Cli

Docs

New Contributors

Full Changelog: 1.12.3...1.13.0

1.12.3

25 Jun 19:17
bc82cd0

Core Library

  • (feat) allows to add SQL statements to schema migration executed after tables were created/altered by @rudolfix in #2791
  • Detect whether query just filters rows or is more complex with sqlglot by @anuunchin in #2619
  • (QoL):adds str and repr to dataset and relation by @rudolfix in #2796
  • fix: added @dlt.transformation to __all__ by @zilto in #2797
  • Fix: Null column type not inferred info/warning floods the output by @anuunchin in #2800
  • rest_api: allow processing multiple DltResource instances by @burnash in #2807
  • marimo app updates by @sh-rp in #2778
  • Hotix - fix marimo start command by @sh-rp in #2812

CI

  • enable linting on python 3.13 by @sh-rp in #2790
  • run all common tests with --resolution lowest-direct on uv sync by @sh-rp in #2787

1.12.2a0

24 Jun 08:24
6138ef0

Pre-release

This is a pre-release of dlt and our first build with the uv package manager.

1.12.1

18 Jun 20:01
91420d7

Core Library

Quality of Life (fixing annoying little things)

  • 2529-INFO_TABLES_QUERY_THRESHOLD-as-paramterer-from-config by @amirdataops in #2600
  • warn when resolving configs or secrets with placeholder values by @djudjuu in #2636
  • Prevent unecessary preliminary connection in dataset by @sh-rp in #2645
  • QoL: warning with hint to provide data types for columns with exclusively None values by @anuunchin in #2633
  • Fix issue 2690: switch to packaging to remove warning on import dlt by @djudjuu in #2707
  • qol: exception formatting by @zilto in #2715
  • Regular and standalone resources are now the same thing. Both provide typed callables, can be renamed, and inject secrets and configs in the same way, including when defined as an inner function. This unifies injection behavior across all our decorators.
    In the example below, (1) the access_token secret is allowed in the inner resource, and (2) the limit argument with a default is injected from, e.g., the LIMIT env variable, which was skipped before.
import dlt

@dlt.source
def source():
    @dlt.resource(write_disposition="merge", primary_key="_id")
    def documents(access_token=dlt.secrets.value, limit=10):
        # generate_json_like_data stands in for your own data generator
        yield from generate_json_like_data(access_token, limit)

    return documents

⚠️ We still do not recommend defining parametrized inner resources.

  • You can now return data from a resource instead of yielding single items. We do not recommend this, for the sake of code readability. dlt always wraps resources in generators, so your return will be converted to a yield.
  • To return a DltResource from a resource function you must explicitly type the return value:
import dlt
from dlt.sources import DltResource

@dlt.resource
def rv_resource(name: str) -> DltResource:
    return dlt.resource([1, 2, 3], name=name, primary_key="value")
  • Config resolution behavior is normalized: default values can be overridden by providers, but explicit values cannot.
  • ⚠️ Previously, if those values were instances of base configurations, behavior was inconsistent (explicit values were treated like defaults).
  • ⚠️ If a native value is found for a config that does not accept native values, config resolution now fails; previously the value was ignored.
  • We use a custom, consistent wrap and unwrap of functions. Our decorators preserve both the typing and the runtime signature of decorated functions. makefun was removed.
  • If an Incremental is initialized from another Incremental as a native value, it copies the original type correctly.
  • dlt.resource can define a configuration section (also using lambdas).
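The precedence rule for config resolution described in this list (explicit values are final, providers override defaults) can be illustrated in plain Python. This is a simplified sketch of the rule only, not dlt's resolver: resolve_value and the dictionary-based providers are hypothetical stand-ins.

```python
from typing import Any, Mapping, Sequence

# Sentinel that distinguishes "not passed" from a legitimate None value.
_MISSING = object()


def resolve_value(
    key: str,
    providers: Sequence[Mapping[str, Any]],
    default: Any = _MISSING,
    explicit: Any = _MISSING,
) -> Any:
    """Resolve `key` with the precedence: explicit value, then providers, then default."""
    # 1. A value passed explicitly by the caller is never overridden.
    if explicit is not _MISSING:
        return explicit
    # 2. Providers are consulted in order and take precedence over the default.
    for provider in providers:
        if key in provider:
            return provider[key]
    # 3. Otherwise fall back to the default from the function signature.
    if default is not _MISSING:
        return default
    raise KeyError(f"no value found for config key {key!r}")
```

With providers = [{"limit": 5}], resolving "limit" with default=10 gives the provider's 5, while passing explicit=10 keeps 10 regardless of what the providers contain.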

Bugfixes and improvements

  • feat: Expand sql table resource config by @xneg in #2396
  • Added write_disposition to sql table config
  • Added primary_key and merge_key to sql table config
  • Feat: Support clustered tables with custom column order in BigQuery destination by @hsm207 in #2638
  • Feat: Add configuration propagation to deltalake.write_deltalake (#2629) by @gaatjeniksaan in #2640
  • Add support for creating custom integer-range partition table in BigQuery by @hsm207 in #2676
  • Upsert merge strategy for iceberg by @anuunchin in #2671
  • Feat/add athena dabatase location option by @eric-pinkham-rw in #2708
  • motherduck destination config improvement: uppercase env var by @djudjuu in #2703
  • adds parquet support to postgres via adbc by @rudolfix in #2685
  • 2681 - fixes null on non null column arrow by @rudolfix in #2721
  • removes cffi version of psycopg2
  • mssql and snowflake bugfixes by @rudolfix in #2756
  • support for deltalake 1.0 @rudolfix in #2721
  • allows to skip input data deduplication on delete-insert merge to decrease query cost in #2721
  • allows to configure configs and pragmas for duckdb, improves sql_client, tests @rudolfix in #2730
  • logs resolved traces thread-wise, clears log between pipeline runs @rudolfix in #2730

Chores & tech debt
We will switch to uv in the coming days, and:

  • Simplify workflow files by @sh-rp in #2663
  • fix/2677: remove recursive filewatching by @zilto in #2678
  • QoL: improved __repr__() for public interface by @zilto in #2630
  • fix: incrementally watch files by @zilto in #2697
  • Simplify pipeline test utils by @sh-rp in #2566 (we use data access and dataset for testing now)
  • added constants for load_id col in _dlt_loads table by @zilto in #2729
  • Update github workflow setup by @sh-rp in #2728
  • fixes leaking datasets tests by @rudolfix in #2730

🧪 Upgrades to data access

  • All SQL queries are destination agnostic. For example
  • Column lineage is computed and inferred. x-annotation hints are propagated
  • SqlModel represents a SQL query; it is processed in the extract and normalize steps and loaded in the load step
  • You can use scalar() on data access expressions, e.g.:
# get latest processed package id
max_load_id = pipeline.dataset()._dlt_loads.load_id.max().scalar()

🧪 Cool experimental stuff:

Check out our new embedded pipeline explorer app

dlt pipeline <name> show --marimo
dlt pipeline <name> show --marimo --edit

Use the --edit option to enable notebook/edit mode in marimo, plus a very cool Ibis dataset explorer.

Docs

  • docs: dlt+ iceberg destination partitioning by @burnash in #2686
  • docs: fix invalid bigquery reference in athena destination by @goober in #2700
  • docs: rest_api: clarify dlt resource and rest_api specific parameters by @burnash in #2710
  • docs: plus: add merge strategies for dlt+ Iceberg destination by @burnash in #2749
  • rest_api: document pagination hierarchy and add tests by @burnash in #2745
  • docs: add session parameter to rest_api client configuration by @burnash in #2746
  • docs: fix incorrect github_source function calls in tutorial by @axelearning in #2768

We updated contribution guidelines

  • By default we do not accept more destinations (except a few like DuckLake or Trino)
  • Each PR needs a test and (possibly) docs entry

New Contributors

Full Changelog: 1.11.0...1.12.0

1.12.0

17 Jun 21:43
f6a8f65

Important

We yanked this release from PyPI after discovering that the minimum allowed version of sqlglot could prevent dlt from being imported. This release has been replaced by version 1.12.1, which includes the same release notes.
