-
Couldn't load subscription status.
- Fork 2.7k
Add support for MERGE INTO
#18135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for MERGE INTO
#18135
Conversation
…te more fine-grained
| match_initialized = false; | ||
| } | ||
| // now slice and call sink for each of the match conditions | ||
| for (; match_idx < 3; match_idx++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is with this magic number?
I assume it's UPDATE, INSERT, DELETE ?
Seems to be this:
match_actions = {MergeActionCondition::WHEN_MATCHED, MergeActionCondition::WHEN_NOT_MATCHED_BY_TARGET,
MergeActionCondition::WHEN_NOT_MATCHED_BY_SOURCE};So we could just use match_actions.size() ?
Add support for `MERGE INTO` (duckdb/duckdb#18135)
Add support for `MERGE INTO` (duckdb/duckdb#18135)
Follow-up from #18135 This PR adds support for the `RETURNING` statement to `MERGE INTO`. The additional virtual column `merge_action` can be used to obtain the merge action that was performed (`UPDATE`, `DELETE` or `INSERT`). For example: ```sql CREATE TABLE Stock(item_id int, balance int); WITH initial_stocks(item_id, volume) AS (VALUES (10, 2200), (20, 1900)) MERGE INTO Stock USING initial_stocks USING (item_id) WHEN MATCHED THEN UPDATE SET balance = balance + volume WHEN NOT MATCHED THEN INSERT VALUES (initial_stocks.item_id, initial_stocks.volume) RETURNING merge_action, *; ┌──────────────┬─────────┬─────────┐ │ merge_action │ item_id │ balance │ │ varchar │ int32 │ int32 │ ├──────────────┼─────────┼─────────┤ │ INSERT │ 10 │ 2200 │ │ INSERT │ 20 │ 1900 │ └──────────────┴─────────┴─────────┘ WITH new_stocks(item_id, volume) AS (VALUES (20, 2200), (30, 1900)) MERGE INTO Stock USING new_stocks USING (item_id) WHEN MATCHED THEN UPDATE SET balance = balance + volume WHEN NOT MATCHED THEN INSERT VALUES (new_stocks.item_id, new_stocks.volume) RETURNING merge_action, *; ┌──────────────┬─────────┬─────────┐ │ merge_action │ item_id │ balance │ │ varchar │ int32 │ int32 │ ├──────────────┼─────────┼─────────┤ │ UPDATE │ 20 │ 4100 │ │ INSERT │ 30 │ 1900 │ └──────────────┴─────────┴─────────┘ ```
|
This is awesome. Been waiting on merge on the duck for a while. |
#18135 introduced the `MERGE INTO` operator. Merge allows us to do upserts - similar to `ON CONFLICT`, but without requiring the target table to have an index. The merge operator has a separate implementation that leans on the existing insert/update/delete operators. The `ON CONFLICT` instead has a fully custom implementation that inlines upserting into the `PhysicalInsert`. This has a number of issues: * The `ON CONFLICT` code is complex, and we now have an alternative code path that can be used to achieve the same functionality. Maintaining both is sub-optimal. * The `ON CONFLICT` code always uses the index for detecting conflicts - effectively forcing an index join to be used This PR unifies the implementation of these two by rewriting the `ON CONFLICT` clause in the binder to an identical `MERGE` statement, i.e. this: ```sql INSERT OR REPLACE INTO tbl FROM source; ``` Becomes (under the hood): ```sql MERGE INTO tbl USING (SELECT DISTINCT ON (key) * FROM source) ON (tbl.key = source.key) WHEN MATCHED THEN UPDATE WHEN NOT MATCHED THEN INSERT ``` ### Behavioral Changes #### Multi-Row Updates The previous upsert implementation threw an error if the same row was updated multiple times in the same statement. This functionality has been removed - this is now allowed. #### WHERE clause changes The ON CONFLICT clause in Postgres has the ability to specify a WHERE clause after the indexed columns as part of the conflict target, e.g.: ```sql ON CONFLICT (a) WHERE index_predicate [conflict_action] ``` This predicate is used to refer to partial indexes. > index_predicate > Used to allow inference of partial unique indexes. Any indexes that satisfy the predicate (which need not actually be partial indexes) can be inferred. Follows CREATE INDEX format. SELECT privilege on any column appearing within index_predicate is required. DuckDB does not support partial indexes. However, this WHERE clause was instead mistakenly used as a WHERE clause on the condition. There is a second WHERE clause that can be specified as part of DO UPDATE SET that is instead used for this: ```sql DO UPDATE SET ... [ WHERE condition ] ``` After this PR, we now throw an error if the WHERE clause is used as part of the conflict target as we don't support partial indexes - and instead only support WHERE as part of DO UPDATE SET. #### WHERE fallback In the previous implementation, if the WHERE clause was not met, we would fallback to inserting. This is not particularly useful, as this would always result in a constraint violation. After all: the `CONFLICT` clause matched for that particular row. This also does not match Postgres' behavior - they fall back to `DO NOTHING` (i.e. ignoring the row). We now implement the same fallback as Postgres. We also no longer support WHERE as part of `DO NOTHING`, as that does not have any purpose anymore (the fallback would also be `DO NOTHING`). ### (De)serialization As we do the rewrite in the binder, this does not apply when deserializing plans from older versions. As such in order to support deserializing older plans, for now the old code is still kept in. It should just not be triggered anymore except when deserializing old plans. ### Benchmarks The `MERGE` code-path automatically switches between an index join (using pushdown) and a full hash join - allowing it to be much faster for bulk appends. Below is a benchmark running `INSERT OR REPLACE` on a full TPC-H SF1 lineitem table with primary key constraint defined: ```sql INSERT OR REPLACE INTO target_lineitem FROM lineitem ``` | main | New | |-------|------| | 23.2s | 2.3s |
|
Thanks @Mytherin. One question though: Wouldn’t |
bump iceberg to latest main [chore] Fix amalgamation build in progress_bar (duckdb/duckdb#18910) Bump inet & aws (duckdb/duckdb#18899) fix: refine query ETA display and Kalman filter stability (duckdb/duckdb#18880) Bump httpfs to v1.4-andium branch (duckdb/duckdb#18898) Encryption now encoded as a bit, centralizing in set/getter (duckdb/duckdb#18897) Add callback for when an extension fails to load, and also log this (duckdb/duckdb#18894) Keep base data scan state alive in ColumnData::Update call (duckdb/duckdb#18893) Expected errors 2053 (duckdb/duckdb#18892) fixing auto-specifying ciphers and remove double storage (duckdb/duckdb#18891) Add rowsort to upsert_default.test (duckdb/duckdb#18890) bump aws and iceberg (duckdb/duckdb#18889) [chore] Bump config test/configs/compressed_in_memory.json to new format (duckdb/duckdb#18888) [Dev] Fix footgun in `string_t::SetSizeAndFinalize` (duckdb/duckdb#18885) Json: no reinterpret<size_t*> (duckdb/duckdb#18886) [C API] Result schema of prepared statements (duckdb/duckdb#18779) Add `COPY (FORMAT BLOB)` to Andium too :^) (duckdb/duckdb#18884) Avoid automatically checkpointing if the database instance has been invalidated (duckdb/duckdb#18881) Update spatial+vss+sqlsmith in preparation for v1.4 (duckdb/duckdb#18882) Internal duckdb/duckdb#5796: Window Progress (duckdb/duckdb#18860) [Test] Small fixes to concurrent attach/detach test (duckdb/duckdb#18862) Update ducdkb iceberg hash (duckdb/duckdb#18873) Storage fuzzing + several fixes (duckdb/duckdb#18876) Bump mbedtls to v3.6.4 (duckdb/duckdb#18871) [minor] Incompatible DB error message: add newline (duckdb/duckdb#18861) Bump & remove patches for delta, avro, excel, encodings, fts (duckdb/duckdb#18869) Add a FORCE_DEBUG flag to force `-DDEBUG`, similar to FORCE_ASSERT (duckdb/duckdb#18872) Expected errors 2053 (duckdb/duckdb#18864) update duckdb azure extension ref for 1.4.0 (duckdb/duckdb#18868) Hold segment lock during GetColumnSegmentInfo (duckdb/duckdb#18859) Centralize attached database paths in a DatabaseFilePathManager which is shared across databases created through the same DBInstanceCache (duckdb/duckdb#18857) Add more encryption modes CTR and CBC (duckdb/duckdb#18619) Bump Ducklake (duckdb/duckdb#18825) No more `wal_encryption` flag (duckdb/duckdb#18851) Fix `NULL` path for `json_each`/`json_tree` (duckdb/duckdb#18852) Make ATTACH OR REPLACE atomic, keep list of used databases in MetaTransaction (duckdb/duckdb#18850) WAL <> DB File Match Fixes (duckdb/duckdb#18849) Add test_env to unit tester (duckdb/duckdb#18847) Merge ossivalis into main (duckdb/duckdb#18844) Bump MySQL/Postgres/SQLite (duckdb/duckdb#18848) Add OnBeginExtensionLoad callback (duckdb/duckdb#18842) Ignore null verification for statistics on structs (duckdb/duckdb#18813) Document storage version flag in CLI + minor rendering fix (duckdb/duckdb#18841) Add the `VARIANT` LogicalType (duckdb/duckdb#18609) Avoid printing '99 hours', given in most cases that means estimate is… (duckdb/duckdb#18839) Don't notify Py pkg when override git describe is set (duckdb/duckdb#18843) Add support for reading/writing native parquet geometry types (duckdb/duckdb#18832) Fix/run function in transaction (duckdb/duckdb#18741) Avoid expensive checkpoints and write amplification by appending row groups, and limiting vacuum operations for the last number of row groups (duckdb/duckdb#18829) fix: silence warnings about signed/unsigned conversions. (duckdb/duckdb#18835) fix: sanitize input for enable_logging (duckdb/duckdb#18830) Ensure a WAL file matches the DB file and checkpoint iteration (duckdb/duckdb#18823) Fix: Preserve database configuration flags for tab completion in DuckDB shell (duckdb/duckdb#18482) Extensions.yml: Pass down save_cache to inner workflows (duckdb/duckdb#18828) Fix format-fix runs on Linux (duckdb/duckdb#18827) Re-add accidentally removed check if copy_from is supported (duckdb/duckdb#18824) [Fix] Bug in fixed-size buffer when throwing out-of-memory (duckdb/duckdb#18769) For BC reasons - keep VARINT as alias for BIGNUM (duckdb/duckdb#18821) Add callback to get a list of copy options, use this to provide suggestions and to erase options from import that are only used during exporting (duckdb/duckdb#18812) fix: Add COLLATE NOCASE support to strpos function (duckdb/duckdb#18819) [CI] install libcurl4-openssl-dev with apt-get (duckdb/duckdb#18811) Provide failing file name in Parquet reader error messages (duckdb/duckdb#18814) Test runner: Expand '{UUID}' into a random UUID (duckdb/duckdb#18809) Expected errors 2053 (duckdb/duckdb#18810) Add support for non-aggregate window functions (duckdb/duckdb#18788) Fix some unindented interactions between `EMPTY_RESULT_PULLUP` and `MATERIALIZED` CTEs (duckdb/duckdb#18805) Typed macro parameters (duckdb/duckdb#18786) Internal duckdb/duckdb#3273: Hashed Sort Callbacks (duckdb/duckdb#18796) [chore] Fixup tidy-check on src/logging/log_manager.cpp by passing const & (duckdb/duckdb#18801) Fixup progress_bar: avoid converting doubles into int32_t unchecked (duckdb/duckdb#18800) Issue duckdb/duckdb#18767: Ignore Timestamp Offsets (duckdb/duckdb#18794) bump httpfs so it includes curl option (duckdb/duckdb#18691) Improve autocomplete suggestions (duckdb/duckdb#18773) Remove everything python-package related (duckdb/duckdb#18789) Support expressions as COPY file target (duckdb/duckdb#18795) fix: coalesce query progress updates to reduce terminal writes (duckdb/duckdb#18672) Task Scheduler: track exact task count, and re-signal on dequeue failure if there are tasks left (duckdb/duckdb#18792) Internal duckdb/duckdb#3273: Parallel Window Masks (duckdb/duckdb#18731) fix: improve speed of GetValue() for STRUCT type (duckdb/duckdb#18785) Add `memory_limit` parameter to `benchmark_runner`/`test_runner.py` (duckdb/duckdb#18790) Treat ENABLE_EXTENSION_AUTOINSTALL as the BOOL that it is (duckdb/duckdb#18778) Move row id logic to separate RowIdColumnData class instead of inlining it into the RowGroup (duckdb/duckdb#18780) Improve error messages for merge / vector reference (duckdb/duckdb#18777) Use microsecond resolution for printing the current timestamp (duckdb/duckdb#18776) Add `file_size_bytes` (de-)serialization (duckdb/duckdb#18775) Propagate `DUCKDB_*_VERSION` in extensions and tests (duckdb/duckdb#18774) [Test Fix] Forward output to file (duckdb/duckdb#18772) [CI] Adjust test configs post logger PR (duckdb/duckdb#18771) Revert "Use 1-based indexing for SQL-based JSON array extraction" (duckdb/duckdb#18758) Make `duckdb_log` return a TIMESTAMP_TZ (duckdb/duckdb#18768) Fix Path Typo in Extension's CMake Warning Message (duckdb/duckdb#18766) Fix index resolution when querying table with index via view (duckdb/duckdb#18319) Fix radix partitioning with more than 10 bits (duckdb/duckdb#18761) Add support for auto-globbing within a directory: if no matches are found for a specific path, we retry with `/**/*.[ext]` appended (duckdb/duckdb#18760) Refactor read_blob and read_text to use MultiFileFunction. (duckdb/duckdb#18706) Add missing expected errors to the test cases (next chunk) (duckdb/duckdb#18753) Minor logging fixes and more benchmarking (duckdb/duckdb#18755) Extensions.yml should also check converted_to_draft (duckdb/duckdb#18754) [Profiling] Add Profiling to Write Function (duckdb/duckdb#18724) Fixing lazy polars execution on query result (duckdb/duckdb#18749) Remove separate WAL encryption flag (duckdb/duckdb#18750) Add leak suppressions to nightly runs (duckdb/duckdb#18748) Append using a SQL query, instead of directly appending to a base table, and support user-provided queries through the QueryAppender (duckdb/duckdb#18738) removed placeholder client directories for node and jdbc, its been > 1 yr (duckdb/duckdb#18757) Add missing expected errors to the test cases (duckdb/duckdb#18746) Add OS X notarization for DuckDB CLI and libduckdb.dylib (duckdb/duckdb#18747) Use correct type for pushing collations in subqueries (duckdb/duckdb#18744) Merge ossivalis into main (duckdb/duckdb#18719) Secrets: if serialization_type is not specified, assume it's a key value secret (duckdb/duckdb#18743) [C API] Function to set a copy callback for bind data (duckdb/duckdb#18739) fix timetravel for default tables (duckdb/duckdb#18240) [unittest] SkipLoggingSameError() to make unittester report one failure per case (duckdb/duckdb#18270) Use 1-based indexing for SQL-based JSON array extraction (duckdb/duckdb#18735) Add (CSV) file logger (duckdb/duckdb#17692) feat: enhance .tables command with schema disambiguation and filtering (duckdb/duckdb#18641) Internal duckdb/duckdb#5669: Loop Join Thresholds (duckdb/duckdb#18733) Fix PIVOT in multiple statements (duckdb/duckdb#18729) Minor fixes for other catalogs - mostly checking `IsDuckTable()` for unsupported operations (duckdb/duckdb#18720) Added support for blob<->uuid conversions (duckdb/duckdb#18027) #Fix 18558: add row_group scan fast path (duckdb/duckdb#18686) Improved grammar generation script (duckdb/duckdb#18716) Correctly throw an error when too few columns are supplied in MERGE INTO INSERT (duckdb/duckdb#18715) [Profiling] Add Profiling to Read Function (duckdb/duckdb#18661) Fix issue with materialized CTE optimization in flatten_dependent_join (duckdb/duckdb#18714) Add Option to Allocate Using an Arena in `string_t` (duckdb/duckdb#17992) Internal duckdb/duckdb#3273: Hashed Sort States (duckdb/duckdb#18690) Python-style positional/named arguments for macro's (duckdb/duckdb#18684) [Fix] Correctly handle table and index chunks in WAL replay buffering (duckdb/duckdb#18700) Make ART construction iterative via ARTBuilder (duckdb/duckdb#18702) Correctly handle collations for IN (subquery) (duckdb/duckdb#18698) Hold row group lock for entire call of MoveToCollection (duckdb/duckdb#18694) Expected errors 2053 (duckdb/duckdb#18695) Issue duckdb/duckdb#18457: DateTrunc Simplification Warnings (duckdb/duckdb#18687) [Python SQLLogicTest] Add `test/sql/pragma/profiling/test_profiling_all.test` to the SKIPPED_TESTS set (duckdb/duckdb#18689) Make sure parse errors are wrapped in ErrorData (duckdb/duckdb#18682) Internal duckdb/duckdb#5366: Window State Arguments (duckdb/duckdb#18676) Expected errors 2053 (duckdb/duckdb#14213) Add `date_trunc()` simplification rules (duckdb/duckdb#18457) Fix the issue where delta_for isn't used in bitpacking when for is unavailable (duckdb/duckdb#18616) fix error message related to wrong memory unit (duckdb/duckdb#18671) Grab lock and double-check that column is not loaded in MoveToCollection (duckdb/duckdb#18677) Correctly allocate uncompressed string data in ZSTD for many giant strings (duckdb/duckdb#18678) Internal duckdb/duckdb#5662: IEJoin Test Plans (duckdb/duckdb#18680) [ Python SQLLogic Tester ] Add `MERGE_INTO` to `statement.type` enum in `result.py` (duckdb/duckdb#18675) Internal duckdb/duckdb#5366: Window Interrupt Arguments (duckdb/duckdb#18651) Correctly set weights in reservoir sample when switch to slow sampling (duckdb/duckdb#18563) [Dev] Add script to create patch from changes in an extension repository (duckdb/duckdb#18620) Python test runner: Fix hash comparison error output (duckdb/duckdb#18626) [CI] skip building encodings extension in InvokeCI (duckdb/duckdb#18655) CLI: Make ETA more of an estimate, and support large_row_rendering for footers (duckdb/duckdb#18656) Merge ossivalis into main (duckdb/duckdb#18644) Python test runner: Fix result check for `COPY ... RETURN_STATS` queries (duckdb/duckdb#18625) Add 1.4 release codename (duckdb/duckdb#18652) Change arrow() to export record batch reader (duckdb/duckdb#18642) bump spatial (on main) (duckdb/duckdb#18197) bump avro to v1.4 (duckdb/duckdb#18434) Make more configs into generic settings (duckdb/duckdb#18592) Add "Hash Zero" verification CI run (duckdb/duckdb#18623) feat: add ETA to progress bar in DuckDB CLI (duckdb/duckdb#18575) wrap httplib ::max() call in `WIN_32` check (duckdb/duckdb#18590) [ART] ART::Erase refactoring (duckdb/duckdb#18595) [CSV Sniffer] Fix type detection issue with union and empty columns (duckdb/duckdb#18606) Add Field IDS to multi file reader for positional deletes (duckdb/duckdb#18617) Re-add `hugeint` to `__internal_compress_string` (duckdb/duckdb#18622) Adjust filter pushdown to latest polars release (duckdb/duckdb#18624) parquet/parquet_multi_file_info.cpp: fix move from stack (duckdb/duckdb#18634) Issue duckdb/duckdb#18631: Streaming Windowed Quantile (duckdb/duckdb#18636) Fix serialization backwards compatability for varargs functions (duckdb/duckdb#18596) [Profiling] Add client context into more read functions (duckdb/duckdb#18514) [CI] Don't zip and upload Code Coverage tests results when Code Coverage got cancelled (duckdb/duckdb#18607) [Test] Fix test case and a benchmark (duckdb/duckdb#18610) Update README.md (duckdb/duckdb#18614) correctly setting log transaction id in ThreadContext (duckdb/duckdb#18536) [Fix] Hidden test failure in test_struct_update.test (duckdb/duckdb#18598) Increment storage version to enable `DICT_FSST` in benchmark file (duckdb/duckdb#18588) fix hidden merge conflict (duckdb/duckdb#18589) Adds a function for updating and adding values in a struct (duckdb/duckdb#15533) Pushdown filters on coalesced outer join keys compared for equality under the join condition (duckdb/duckdb#18169) fix: libduckdb.so missing soversion (duckdb/duckdb#18305) String dictionary hash cache (duckdb/duckdb#18580) Force `LIST`/`ARRAY` child vectors on a Parquet single page (duckdb/duckdb#18578) fix: use thousands separator and decimal for row counts in`duckbox` output format (duckdb/duckdb#18564) Flip left/right delim join based on cardinalities (duckdb/duckdb#18552) [Fix] Adjust shrink threshold back to original count > SHRINK_THRESHOLD (duckdb/duckdb#18582) [CSV Sniffer] Fixing bug of not properly setting skipped rows from sniffer (duckdb/duckdb#18555) fix: add formatting to explain row counts (duckdb/duckdb#18566) Delete FUNDING.json Update FUNDING.json Create FUNDING.json [Indexes] Buffer-managed indexes part 3: segment handle for Node48 and Node256 (duckdb/duckdb#18567) Rename the Varint type to Bignum (duckdb/duckdb#18547) Add compile option standalone-debug for clang (duckdb/duckdb#17433) Fixing compilation with -std=cpp23 (duckdb/duckdb#18557) [easy] [no-op] Minor optimization on iterator lookup (duckdb/duckdb#15349) optimize/parquet: generate movable types for parquet (duckdb/duckdb#18510) Check if `heap_block_ids` is empty before getting start/end when destroying chunks in `TupleDataCollection` (duckdb/duckdb#18556) Implement special-case `VARCHAR` to `JSON[]` casts and vice versa (duckdb/duckdb#18541) [ART] Node::Free refactoring (duckdb/duckdb#18544) [Fix] Follow-up PR to only delete unique row IDs (duckdb/duckdb#18545) Restore missing `test/configs/small_block_size.json` file (duckdb/duckdb#18507) Unittester: Add the `--sort-style` parameter that allows a fallback comparison where results are sorted according to a given sort-style (duckdb/duckdb#18542) Allow overriding openssl version for FIPS compliance (duckdb/duckdb#18499) fix: improve handling variant nulls and nested types (duckdb/duckdb#18538) Add support for explicit clean-up routine in test config, and exit multi-statement execution when an error is encountered (duckdb/duckdb#18539) Use global index, not local id when creating filters in `MultiFileColumnMapper` (duckdb/duckdb#18537) Add `StatementVerifier` for `EXPLAIN` (duckdb/duckdb#18529) Add CAPI to retrieve client context for table functions (duckdb/duckdb#18520) fix: support both field orders for variant struct (duckdb/duckdb#18532) [Varint] Negation, Subtraction and Over/under-flow checking (duckdb/duckdb#18477) ALP test: skip TPC-DS 67 - it is not consistent with floating point numbers (duckdb/duckdb#18528) Consistently detect JSON schema indepent of number of threads (duckdb/duckdb#18522) Internal duckdb/duckdb#16560: Numeric TRUNC Precision (duckdb/duckdb#18511) Dynamically determine dictionary size limit in Parquet writer (if unset) (duckdb/duckdb#18356) Fix incorrect character encoding in GetLastErrorAsString on Windows (duckdb/duckdb#18431) Fix: Write the salt together with the HT offset when determining the value for key comparison (duckdb/duckdb#18374) When tracking evicted_data_per_tag, track actual size on disk after temp file compression (duckdb/duckdb#18521) Adding WITH ORDINALITY to DuckDB (duckdb/duckdb#16581) ParserException for Pragma with named parameters (duckdb/duckdb#18506) Temporarily excluding `Build Pyodide wheel` for Python 3.11 because it fails to build `WASM` wheels (duckdb/duckdb#18508) Remove `immediate_transaction_mode` from DB config options (duckdb/duckdb#18516) Allow expressions to be used in ATTACH / COPY options (duckdb/duckdb#18515) Fix several bugs/fuzzer issues (duckdb/duckdb#18503) Fix: Remove overly strict assertion on empty string value (duckdb/duckdb#18504) Change UNICODE to UTF8 (duckdb/duckdb#17586) Merge ossivalis (duckdb/duckdb#18502) fix: add missing space in AttachInfo::ToString() (duckdb/duckdb#18500) julia: config improvements (duckdb/duckdb#17585) [Profiling] Add client context into read functions (duckdb/duckdb#18438) Fix accidental internal exception in type transformation (duckdb/duckdb#18492) add delta linux back to ci (duckdb/duckdb#18491) Change ctrl-a/ctrl-e to move to start/end of line, not buffer (duckdb/duckdb#18490) Unify `ON CONFLICT` and `MERGE INTO` (duckdb/duckdb#18480) Internal duckdb/duckdb#5384: Window Sorting Polish (duckdb/duckdb#18484) re-nable extensions in invokeci (duckdb/duckdb#18476) SUM and + Operator for Varints (duckdb/duckdb#18424) Internal duckdb/duckdb#5366: WindowDeltaScanner (duckdb/duckdb#18468) Merge ossivalis (duckdb/duckdb#18456) Bump postgres to latest main (duckdb/duckdb#18464) Internal duckdb/duckdb#5385: WindowMergeSortTree Sort Update (duckdb/duckdb#18461) Add support for generic settings, and move many settings over to generic settings (duckdb/duckdb#18447) Buffer index appends during WAL replay (duckdb/duckdb#18313) Internal duckdb/duckdb#5384: WindowDistinctAggregator Sort Update (duckdb/duckdb#18442) Add support for "template" types (duckdb/duckdb#18410) Update pyodide build to 0.28.0 (duckdb/duckdb#18446) Parquet: add row-group ordinal during writing encryption (duckdb/duckdb#18433) Include pyodide build configuration (duckdb/duckdb#18183) [Fix] Block size nightly (duckdb/duckdb#18425) Internal duckdb/duckdb#5368: WindowNaiveAggregator Sort Update (duckdb/duckdb#18409) Internal duckdb/duckdb#5367: SortedAggregateFunction Sort Update (duckdb/duckdb#18408) Refactor extension CI to use extension-ci-tools (duckdb/duckdb#18361) Correctly fetch only base column data in ColumnData::FetchUpdateData (duckdb/duckdb#18423) feat: remove anything following `?` in database name (duckdb/duckdb#18417) Merge `v1.3-ossivalis` in `main` (duckdb/duckdb#18401) Add support for table_constraints of AdbcConnectionGetObjects() (duckdb/duckdb#18181) Add DuckLake back in (duckdb/duckdb#18405) Internal duckdb/duckdb#5294: TIME_NS C API (duckdb/duckdb#18215) Remove incorrect assertion (duckdb/duckdb#18404) [ Python SQLLogic Tester ] Add `MERGE_INTO` statement to duckdb python (duckdb/duckdb#18402) CI: Add separate job for discussion mirroring (duckdb/duckdb#18407) Wrap runner.ExecuteFile, otherwise cleanup is not properly performed (duckdb/duckdb#18400) Internal duckdb/duckdb#3273: Window Hashed Sort (duckdb/duckdb#18337) Store extra metadata blocks in RowGroupPointer, and only flush dirty Metadata blocks (duckdb/duckdb#18398) CI: Fix Discussion mirroring (duckdb/duckdb#18397) Record whether or not cross products are implicit or not, and use this for converting queries back to SQL (duckdb/duckdb#18394) Correct and consistent integer arithmetic error messages (duckdb/duckdb#18393) Re-use metadata of unaltered row groups when checkpointing a table (duckdb/duckdb#18395) Approx database count system function (duckdb/duckdb#18392) Re-use table metadata when table is not altered during checkpoint (duckdb/duckdb#18390) Bump httpfs (duckdb/duckdb#18388) Uncomment skipped decimal REE tests (duckdb/duckdb#18372) Re-enable but deprecate CORE_EXTENSIONS in CMakeLists.txt (duckdb/duckdb#18377) Add missing ninja to workflow file (duckdb/duckdb#18373) Merge `v1.3-ossivalis` into `main` (duckdb/duckdb#18364) Pass `AttachOptions` to `attach` method, and turn `StorageExtensionInfo` into an `optional_ptr` (duckdb/duckdb#18368) Split up out-of-tree extensions into separate files, and allow out-of-tree extensions to be built using BUILD_EXTENSIONS={ext_name} (duckdb/duckdb#18357) Python external dispatch param fixes (duckdb/duckdb#18359) Revert "[unittest] - fix doubled error headers on `Unexpected failure`" (duckdb/duckdb#18355) Add support for checkpointing in-memory tables (duckdb/duckdb#18348) [C API] Expose expressions and use them in scalar function binding (duckdb/duckdb#18142) Extend PEG parser grammar (duckdb/duckdb#18221) [unittest] - fix doubled error headers on `Unexpected failure` (duckdb/duckdb#18314) Fix condition indexes in join filter pushdown (duckdb/duckdb#18341) download Real Nest data in quiet mode (duckdb/duckdb#18346) Fix debug error in join order optimizer (duckdb/duckdb#18344) Aarch64 backport (duckdb/duckdb#18345) add the from-table-function as parameter to copy-from-bind (duckdb/duckdb#18004) feat: making Parquet write RowGroup.total_compressed_size (duckdb/duckdb#18307) Make storage-version a test parameter (duckdb/duckdb#18324) New Arrow C-API (duckdb/duckdb#18246) feat: Parquet extension add row_group_compressed_size (duckdb/duckdb#18294) Merge ossivalis into main (duckdb/duckdb#18272) SHOW TABLES FROM <qualified_name> (duckdb/duckdb#18179) Add target for installing Python deps. (duckdb/duckdb#18285) Use `FromEpochSeconds` instead of `FromTimeT` in `FileSystem::GetLastModifiedTime` (duckdb/duckdb#18281) [Fix] Adjust test to run with different block sizes (duckdb/duckdb#18277) Use DuckDB cast infrastructure in fmt for new uhugeint/hugeint code (duckdb/duckdb#18275) Use set for row ID scanning during index scans (duckdb/duckdb#18274) Add support for RETURNING to MERGE INTO (duckdb/duckdb#18271) Support HUGEINT in printf and format (duckdb/duckdb#13277) Expanded autocomplete suggestions (duckdb/duckdb#18243) [Parquet] Add read support for the `VARIANT` LogicalType (with shredded encoding) (duckdb/duckdb#18224) Reduce copy in Vector::Reinterpret (duckdb/duckdb#18264) Fixes for gcc 15 (duckdb/duckdb#18261) Fix dictionary-related assertions (duckdb/duckdb#18260) Allow for static libs from extension dependencies to be bundled (duckdb/duckdb#18226) disable WebAssembly duckdb-wasm builds job in NightlyTests triggered by 'workflow_dispatch' event (duckdb/duckdb#18129) Bunch of loosely connected test/CI fixes (duckdb/duckdb#18254) update run_extension_medata_tests.sh (duckdb/duckdb#17976) fixes for some minor llvm 20 complaints (duckdb/duckdb#18257) Fix integer overflow in sequence vector (duckdb/duckdb#18245) Add type safety to `FlatVector::GetData<T>`, `ConstantVector::GetData<T>` and `UnifiedVectorFormat::GetData<T>` (duckdb/duckdb#18256) Slightly higher memory limit for test (duckdb/duckdb#18235) Improve descriptions of thresholds vars affecting join algorithm selection (duckdb/duckdb#17377) Add support for geoarrow encoded geometries in geoparquet files. (duckdb/duckdb#17942) Dictionary functions (duckdb/duckdb#18127) Better `NULL` handling in `TupleDataLayout` (duckdb/duckdb#18069) Track `DataChunk` memory usage in various places (duckdb/duckdb#18191) [Parquet] Add read support for the `VARIANT` LogicalType (duckdb/duckdb#18187) Bugduckdb/duckdb#18163 Fix STDDEV_SAMP undeterminism (duckdb/duckdb#18210) Internal duckdb/duckdb#5264: NLJ Not Distinct (duckdb/duckdb#18216) Improve Parquet reader `NULL` statistics and compress all-`NULL` columns using `CompressedMaterialization` (duckdb/duckdb#18217) Get type of encoded `SortKey` from `TupleDataLayout` (duckdb/duckdb#18218) ci(pyodide): enable WASM exceptions on the latest pyodide build (duckdb/duckdb#18173) Temporary file encryption (duckdb/duckdb#18208) More internal-linkage (duckdb/duckdb#18177) Two-rowID-leaf support in the conflict manager and general refactoring (duckdb/duckdb#18194) [Parquet][Dev] Update the vendored `parquet.thrift` to `3ce0760` (duckdb/duckdb#18195) Parquet reader logging (duckdb/duckdb#18172) Merge `v1.3-ossivalis` into `main` (duckdb/duckdb#18188) [Profiling] Move the client context into more write functions (duckdb/duckdb#17875) Check if `GetLastSegment` is not `nullptr` in `ColumnData::RevertAppend` (duckdb/duckdb#18171) Reduce lock contention for the instance cache (duckdb/duckdb#18079) fix bug with allowed_paths (duckdb/duckdb#18176) Avoid `realloc` in CSV writer (duckdb/duckdb#18174) fix typo (duckdb/duckdb#18165) Resolve some small build issues (duckdb/duckdb#18162) Implement `replace_type` function (duckdb/duckdb#18077) Issue duckdb/duckdb#17683: TIME_NS Compilation (duckdb/duckdb#18053) Add support for AdbcConnectionGetObjects(table_type) (duckdb/duckdb#18066) Detect when updates have no effect, and skip performing the actual updates if we encounter these nop updates (duckdb/duckdb#18144) Add support for `MERGE INTO` (duckdb/duckdb#18135) Improve sort key comparison performance (duckdb/duckdb#18131) set ::error:: annotations for test runners (duckdb/duckdb#18072) Internal duckdb/duckdb#3273: Window Task Generation (duckdb/duckdb#18113) Update description of 'arrow_lossless_conversion' (duckdb/duckdb#18046) [chore] Merge v1.3-ossivalis on main (duckdb/duckdb#18109) ci: build duckdb against the latest emscripten (duckdb/duckdb#18110) Don't throw `InternalException` in `Sort::Sink` (duckdb/duckdb#18105) TPC-DS: Use BIGINT fields (duckdb/duckdb#18098) [CI] don't run jobs on draft PRs (duckdb/duckdb#18016) Fix correlated subquery unnest fail (duckdb/duckdb#18092) [CSV Reader] Prohibit options delim and sep in same read_csv call (duckdb/duckdb#18096) Add start/end offset percentage options to Python test runner (duckdb/duckdb#18091) Avoid running DraftPR.yml until timeout if token is missing (duckdb/duckdb#18090) Unittest: Configure skip error messages (duckdb/duckdb#18087) Switch to Optional for type hints in polars lazy dataframe function (duckdb/duckdb#18078) Issue duckdb/duckdb#18071: Temporal inf -inf (duckdb/duckdb#18083) Fix some scaling issues (duckdb/duckdb#17985) Unittester: add `on_new_connection` + `on_load` + `skip_tests` options (duckdb/duckdb#18042) Use `timestamp_t` instead of `time_t` for file last modified time (duckdb/duckdb#18037) Add support for class-based expression iteration (duckdb/duckdb#18070) fix star expr exclude error (duckdb/duckdb#18063) Adding WAL encryption (duckdb/duckdb#17955) Avoid adding commands read from a file to the shell history (duckdb/duckdb#18057) Remove match-case statements from polars_io.py (duckdb/duckdb#18052) Merge ossivalis into main (duckdb/duckdb#18036) Add ppc64le spin-wait instruction (duckdb/duckdb#17837) Unittest: Add skip_compiled option that can be used to skip built-in C++ tests (duckdb/duckdb#18034) [Explain] Add the YAML format for EXPLAIN statements (duckdb/duckdb#17572) Remove Linux (32 Bit) job (duckdb/duckdb#18012) [Chore] Minor conflict manager refactoring (duckdb/duckdb#18015) Fix duckdb/duckdb#18007: correctly execute expressions with pivot operator (duckdb/duckdb#18020) c-api to copy vector with selection (duckdb/duckdb#17870) Add support to produce Polars Lazy Dataframes (duckdb/duckdb#17947) Implement consumption and production of Arrow Binary View (duckdb/duckdb#17975) Rework extension loading to go through thread-safe ExtensionManager (duckdb/duckdb#17994) Issue duckdb/duckdb#5123: make_timestamp_ms (duckdb/duckdb#17908) Flag to disable database invalidation (duckdb/duckdb#17938) [Fix] Reset profiling info before preparing a query (duckdb/duckdb#17940) Issue duckdb/duckdb#5144: AsOf Join Threshold (duckdb/duckdb#17979) [CI] Skip some workflows when updating out of tree extensions SHA (duckdb/duckdb#17949) Merge v1.3-ossivalis into main (duckdb/duckdb#17973) [nested] Allow fixed-size arrays to be unnested (duckdb/duckdb#17968) Unit Tester Configuration (duckdb/duckdb#17972) [Nested] Optimize structs in `LIST_VALUE` (duckdb/duckdb#17169) Enable building spatial and encodings extensions (duckdb/duckdb#17960) [Nested] Add `struct_position` and `struct_contains` functions (duckdb/duckdb#17819) Visual Studio 17 (2022) fixes (duckdb/duckdb#17948) [CI Nightly Fix] Skip logging test if not standard block size (duckdb/duckdb#17957) Add v1.3-ossivalis to Cross version workflow (duckdb/duckdb#17906) Unittester failures summary (duckdb/duckdb#16833) Block based encryption (duckdb/duckdb#17275) Do not dispatch JDBC/ODBC jobs in release CI runs (duckdb/duckdb#17937) fix use after free in adbc on invalid stmt (duckdb/duckdb#17927) Fix empty BP block when writing parquet (duckdb/duckdb#17929) Leverage `VectorType` in `ColumnDataCollection` (duckdb/duckdb#17881) Merge v1.3 into main (duckdb/duckdb#17897) Make CTE Materialization the Default Instead of Inlining (duckdb/duckdb#17459) Use an arena linked list for the physical operator children (duckdb/duckdb#17748) Reword GenAI policy (duckdb/duckdb#17895) Issue duckdb/duckdb#17861: FILL Argument Types (duckdb/duckdb#17888) Update function descriptions and examples for list, array, lambda functions (duckdb/duckdb#17886) Add GenAI policy (duckdb/duckdb#17882) Issue duckdb/duckdb#17849: Test FILL Duplicates (duckdb/duckdb#17869) Add STRUCT to MAP cast function (duckdb/duckdb#17799) Issue duckdb/duckdb#17040: FILL Secondary Sorts (duckdb/duckdb#17821) Issue duckdb/duckdb#17153: Window Order Columns (duckdb/duckdb#17835) julia: add missing methods from C-API (duckdb/duckdb#17733) Function Serialization: adapt to removal of overloads by explicitly casting if argument types have changed (duckdb/duckdb#17864) [Indexes] Buffer-managed indexes part 2: segment handle for base nodes (duckdb/duckdb#17828) duckdb/duckdb#17853 Enable flexible page sizes and update Android NDK to r27 in workflow. (duckdb/duckdb#17854) Internal duckdb/duckdb#4991: Remove Epoch_MS(MS) (duckdb/duckdb#17816) Add `duckdb_type` column to parquet_schema (duckdb/duckdb#17852) Merge v1.3 into main (duckdb/duckdb#17851) Fix ICE with Windows ARM64 (duckdb/duckdb#17844) fix: escape using_columns on JoinRef::ToString (duckdb/duckdb#17839) Merge130 (duckdb/duckdb#17833) Replace string for const data ptr in encryption api (duckdb/duckdb#17825) Pushdown pivot filter (duckdb/duckdb#17801) Merge v1.3 into main (duckdb/duckdb#17806) Add qualified parameter to Python GetTableNames API (duckdb/duckdb#17797) Fix propagatesNullValues for case expr (duckdb/duckdb#17796) [Profiling] Propagate the ClientContext into file handle writes (duckdb/duckdb#17754) Ensure we use the same layout in `RadixPartitionedHashTable` and `GroupedAggregateHashTable` (duckdb/duckdb#17790) [Julia] api docs improvements (duckdb/duckdb#15645) [Indexes] Buffer-managed indexes part 1: segment handles (duckdb/duckdb#17758) Mark Upper/LowerComparisonType as const (duckdb/duckdb#17773) Support glibc 2.28 environments (duckdb/duckdb#17776) Pass `ExtensionLoader` when loading extensions, change extension entry function (duckdb/duckdb#17772) Expose file_size_bytes and footer_size in parquet_file_metadata (duckdb/duckdb#17750) [CAPI] Expose ErrorData (duckdb/duckdb#17722) Rename decorator from test_nulls to null_test_parameters (duckdb/duckdb#17760) re-add httpfs apply_patches (duckdb/duckdb#17755) Deprecate windows-2019 runners (duckdb/duckdb#17745) csv_scanner: correct code comment (duckdb/duckdb#17735) Adding additional authenticated data for encryption (duckdb/duckdb#17508) [SQLLogicTester] Introduce `reset label <query label>` in the tester (duckdb/duckdb#17729) Fix windows-2025 build errors (duckdb/duckdb#17726) Aggregation performance (duckdb/duckdb#17718) fix linux extension ci (duckdb/duckdb#17720) Correctly setting the delim offset (duckdb/duckdb#17716) Sorting followup (duckdb/duckdb#17717) Revert "set default for MAIN_BRANCH_VERSIONING to false" (duckdb/duckdb#17708) ClientBufferManager wrapper to access the client context in the buffer manager (duckdb/duckdb#17699) Issue duckdb/duckdb#17040: FILL Window Function (duckdb/duckdb#17686) Merge v1.3-ossivalis into main (duckdb/duckdb#17690) New Sorting Implementation (duckdb/duckdb#17584) Output hashes in unittest and fix order (duckdb/duckdb#17664) Enable profiling output for all operator types (duckdb/duckdb#17665) [C API] Expose duckdb_scalar_function_bind_get_extra_info (duckdb/duckdb#17666) Add rowsort in generate_series test duckdb/duckdb#43 (duckdb/duckdb#17675) bump DuckDB_jll to v1.3.0 (duckdb/duckdb#17677) C API tidying (duckdb/duckdb#17623) fix extension troubleshooting link (duckdb/duckdb#17616) Move query profiler's EndQuery after commit/rollback (duckdb/duckdb#17595) Update function descriptions and examples (duckdb/duckdb#17132) Add support for ToSqlString for union types (duckdb/duckdb#17513) Remove redundant code path in the ConflictManager (duckdb/duckdb#17562) change exception type to not be an internal exception (duckdb/duckdb#17551) Python package devexp improvements (duckdb/duckdb#17483)
This PR adds support for the SQL standard
MERGE INTOstatement.The
MERGE INTOstatement allows the user to perform inserts, updates or deletes conditionally depending on what data is present in the source or target.It is useful to e.g. allow for upserts without requiring an index to be present.
Examples
Suppose we keep track of a list of items in a warehouse, and we buy or sell items.
When buying more items, we either insert them to the table (if they don't exist yet), or we increment the count.
We can model this with a
MERGE INTOstatement with the following clauses:When selling items, we either delete the item if there are none left, or we decrement the count.
In addition, we want to print an error message if we are selling items we don't have.
If we try to sell the same items again, we get the following error as we ran out of books to sell:
Syntax
The complete syntax for
MERGE INTOis as follows:MatchClauses
MergeAction
We support all SQL standard aspects of the
MERGE INTOoperator, together with several extensions also supported by other systems, in addition to some of our own. The extensions we support are:WHEN NOT MATCHED BY SOURCE / BY TARGET- also supported by Postgres, DataBricksDO NOTHINGas an explicit action - also supported by PostgresUPDATE SET * / INSERT *- also supported by DataBricksUPDATE / INSERT BY NAME- DuckDB extensionERROR [ error_message ]- DuckDB extensionUSINGclause for join - DuckDB extensionWHEN NOT MATCHED BY SOURCE / WHEN NOT MATCHED BY TARGET
By default, and as specified in the SQL standard,
WHEN NOT MATCHEDis triggered for a row in the source that is not found in the target table we are merging into. However, in certain cases, we also want to trigger an action when there is a row in the target that has not found a match in the source. This can be specified using theWHEN NOT MATCHED BY SOURCEcondition.For example, if we want to use the
MERGE INTOstatement to make two tables fully equivalent, we can use the following merge into statement:This statement effectively means:
sourceandtarget, update the values intargetso that they are equivalent to that insourcesourcethat does not exist intarget, insert ittargetthat does not exist insource, delete itThe presence of the various
WHEN NOT MATCHEDclause effectively determines the join type we need to use to perform the merge:WHEN NOT MATCHED BY TARGET: LEFT OUTER JOINWHEN NOT MATCHED BY SOURCE: RIGHT OUTER JOININSERT [ BY NAME / BY POSITION ], UPDATE [ BY NAME / BY POSITION]
INSERTandUPDATEby default need a list of clauses, e.g.INSERT VALUES (col1, col2)orUPDATE SET col1 = val, col2 = val.As an extension, DuckDB also supports automatically generating these from the source table.
This is similar to how
INSERTfrom a query works, e.g.:This is also supported for
UPDATE, where we can doUPDATEorUPDATE BY NAMEto update all columns in the target table either by position or by name from the source table.