Re-use metadata of unaltered row groups when checkpointing a table #18395
Merged
Conversation
Mytherin added a commit that referenced this pull request on Jul 25, 2025:
Store extra metadata blocks in RowGroupPointer, and only flush dirty Metadata blocks (#18398)

Follow-up from #18395. This PR adds storage for `extra_metadata_blocks` in the row-group metadata when the latest storage version (v1.4.0) is used. This is a list of metadata blocks that are referenced by the row-group metadata but **not** present in the list of data pointers (`data_pointers`). These blocks can be present when either (1) columns are very wide, e.g. due to deep nesting, or (2) the final column pointer "crosses" the metadata block threshold, since `data_pointers` points only to the beginning of the column metadata. Usually this list is empty, so storing it takes up little extra space. Its presence lets us re-use metadata more efficiently, because we know exactly which metadata blocks a row group points to without doing any additional deserialization.

### Only Flush Dirty Metadata Blocks

Previously, our metadata manager flushed all metadata blocks, incurring a lot of unnecessary I/O now that we re-use metadata extensively. This PR reworks that so we keep track of which blocks are dirty and flush only those.

### Performance

Running the benchmark in #18395 again, we now get the following timings:

| Operation  | v1.3.2 | Re-Use | New   |
|------------|--------|--------|-------|
| Checkpoint | 0.5s   | 0.11s  | 0.04s |
| Full       | 0.63s  | 0.13s  | 0.07s |
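The dirty-block bookkeeping described above can be sketched in a few lines. This is a minimal sketch under assumed names; `MetadataBlock`, `Write`, and `Flush` here are hypothetical and do not match DuckDB's actual `MetadataManager` interface.

```c++
// Minimal sketch of the "only flush dirty metadata blocks" idea, assuming a
// hypothetical MetadataManager; names do not match DuckDB's real interface.
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

struct MetadataBlock {
    int64_t block_id;
    std::vector<uint8_t> data; // serialized row-group metadata
};

class MetadataManager {
public:
    // Any write marks the touched block as dirty.
    void Write(int64_t block_id, std::vector<uint8_t> data) {
        blocks[block_id] = MetadataBlock{block_id, std::move(data)};
        dirty.insert(block_id);
    }

    // Checkpoint: flush only the blocks that changed since the last flush.
    // Row groups whose metadata is re-used never dirty their blocks, so
    // re-used metadata incurs no write I/O at all.
    void Flush() {
        for (const auto block_id : dirty) {
            WriteToDisk(blocks[block_id]);
        }
        dirty.clear();
    }

private:
    void WriteToDisk(const MetadataBlock &block) { /* block I/O elided */ }

    std::unordered_map<int64_t, MetadataBlock> blocks;
    std::unordered_set<int64_t> dirty;
};
```

With this structure, a checkpoint that re-uses the metadata of unaltered row groups never calls `Write` for their blocks, so `Flush` skips them entirely.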
krlmlr added three commits to krlmlr/duckdb-r that referenced this pull request on Jul 26, 2025, each titled:
Re-use metadata of unaltered row groups when checkpointing a table (duckdb/duckdb#18395)
Tmonster added a commit to Tmonster/duckdb-r that referenced this pull request on Sep 8, 2025:
bump iceberg to latest main [chore]
The squashed commit message enumerates every upstream duckdb/duckdb PR pulled in by the bump. Among them are this pull request, "Re-use metadata of unaltered row groups when checkpointing a table" (duckdb/duckdb#18395), and its follow-up "Store extra metadata blocks in RowGroupPointer, and only flush dirty Metadata blocks" (duckdb/duckdb#18398); the remaining several hundred entries are unrelated to this work.
Mytherin added a commit that referenced this pull request on Sep 19, 2025:
…erialize, and add logging to checkpoints (#19055)
This PR fixes an issue where column segments would not be re-aligned correctly upon `Deserialize` when the row group itself was re-aligned due to a vacuum operation. The issue could occur following an optimization in #18395 that postpones deserializing column data in `RowGroup::MoveToCollection`, and it surfaced as an internal exception on checkpoint: vacuuming row groups and then loading previously unloaded segments would trigger it. In addition, this PR adds logging to vacuum/checkpoint operations.
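To make the failure mode concrete, here is a hedged sketch of the re-alignment that a postponed load has to perform; `RowGroup`, `SegmentPointer`, and `serialized_start` are hypothetical stand-ins rather than DuckDB's actual classes.

```c++
// Hedged sketch of the re-alignment problem. When deserialization of column
// data is postponed (the #18395 optimization), a vacuum may move the row
// group before its segments are loaded, so the load must re-apply the shift
// between the serialized start and the current start.
#include <cstdint>
#include <vector>

struct SegmentPointer {
    uint64_t row_start; // absolute row offset recorded at serialization time
};

struct RowGroup {
    uint64_t start;            // current start; a vacuum may change this
    uint64_t serialized_start; // start at the time the metadata was written
    std::vector<SegmentPointer> segments;

    // Lazily materialize segments: without the shift below, segments keep
    // the offsets they were serialized with and end up misaligned after a
    // vacuum, which is the kind of inconsistency #19055 fixes.
    void LoadSegments() {
        const int64_t shift =
            static_cast<int64_t>(start) - static_cast<int64_t>(serialized_start);
        for (auto &segment : segments) {
            segment.row_start = static_cast<uint64_t>(
                static_cast<int64_t>(segment.row_start) + shift);
        }
    }
};
```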
Mytherin added a commit that referenced this pull request on Nov 3, 2025:
…r, as that pointer might not always be valid (#19588)
When the new [experimental metadata re-use](#18395) is enabled, the metadata of *some* row groups may be re-used while that of others is rewritten. This can leave linked lists of metadata blocks with invalid references. For example, when writing a number of row groups, we might get this layout:

```
METADATA BLOCK 1
ROW GROUP 1
ROW GROUP 2 (pt 1)
NEXT BLOCK: 2
->
METADATA BLOCK 2
ROW GROUP 2 (pt 2)
ROW GROUP 3
```

Metadata is stored in a linked list (block 1 -> block 2), but we do not need to traverse this list fully: we store pointers to individual row groups and can start reading from their positions. Now suppose we re-use the metadata of `ROW GROUP 1`, but not of the other row groups (because, e.g., they have been updated). Since `ROW GROUP 1` is fully contained in `METADATA BLOCK 1`, we can garbage-collect `METADATA BLOCK 2`, leaving the following metadata block:

```
METADATA BLOCK 1
ROW GROUP 1
ROW GROUP 2 (pt 1)
NEXT BLOCK: 2
```

We can still safely read this block and the metadata for `ROW GROUP 1`; **however**, the block now references a metadata block that is no longer valid and may have been garbage-collected. This revealed a problem in the `MetadataReader`: when pointed at a block, the current implementation would eagerly resolve the metadata location of *the next block*. That is normally harmless, but with these invalid chains it can try to resolve a block that has already been freed, triggering an internal exception:

```
Failed to load metadata pointer (id %llu, idx %llu, ptr %llu)
```

This PR resolves the issue by making the `MetadataReader` lazy: instead of eagerly resolving the next pointer, it only does so when actually required.
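A minimal sketch of the eager-versus-lazy distinction, under hypothetical types rather than the actual `duckdb::MetadataReader`:

```c++
// Sketch of eager vs. lazy next-pointer resolution; hypothetical types, not
// DuckDB's real MetadataReader. The lazy variant only follows the chain when
// the current block is exhausted, so a dangling NEXT pointer left behind by
// partial metadata re-use is never touched unless the read crosses into it.
#include <algorithm>
#include <cstdint>
#include <stdexcept>

struct MetadataPointer {
    int64_t block_id = -1;
    bool IsValid() const { return block_id >= 0; }
};

class MetadataReader {
public:
    explicit MetadataReader(MetadataPointer start) : current(start) {
        // The eager (old) behavior would already resolve the next block here:
        //   next = ResolveNextPointer(current); // throws if already freed
    }

    void ReadData(uint8_t *target, uint64_t len) {
        while (len > 0) {
            if (offset == BLOCK_SIZE) {
                // Lazy (new) behavior: resolve the next block only when the
                // read actually needs it.
                current = ResolveNextPointer(current);
                offset = 0;
            }
            const uint64_t to_read = std::min(len, BLOCK_SIZE - offset);
            // ... copy to_read bytes from the current block into target ...
            offset += to_read;
            target += to_read;
            len -= to_read;
        }
    }

private:
    static constexpr uint64_t BLOCK_SIZE = 262144; // illustrative block size

    MetadataPointer ResolveNextPointer(MetadataPointer ptr) {
        if (!ptr.IsValid()) { // freed or garbage-collected block
            throw std::runtime_error("Failed to load metadata pointer");
        }
        MetadataPointer next;
        next.block_id = ptr.block_id + 1; // placeholder chain walk
        return next;
    }

    MetadataPointer current;
    uint64_t offset = 0;
};
```

Because the lazy variant resolves `NEXT BLOCK: 2` only when a read crosses the block boundary, reading just `ROW GROUP 1` from the truncated chain above never touches the garbage-collected block.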
mach-kernel added a commit to spiceai/duckdb that referenced this pull request on Nov 14, 2025:
Squashed commit of the following: commit 68d7555f68bd25c1a251ccca2e6338949c33986a Merge: 3d4d568674 9c6efc7d89 Author: Mark <[email protected]> Date: Tue Nov 11 11:59:30 2025 +0100 Fix minor crypto issues (#19716) commit 3d4d568674d1e05d221e8326c0d180336c350f18 Merge: 7386b4485d 0dea05daf8 Author: Mark <[email protected]> Date: Tue Nov 11 10:58:18 2025 +0100 Logs to be case-insensitive also at enable_logging callsite (#19734) Currently `CALL enable_logging('http');` would succeed, but then select an empty subset of the available logs (`http` != `HTTP`), due to a quirk in the code. This PR fixes that up. commit 7386b4485d23bc99c9f6efab6ce0e33ecc23222b Merge: 1ef3444f09 d4a77c801b Author: Mark <[email protected]> Date: Tue Nov 11 09:28:13 2025 +0100 Add explicit Initialize(HTTPParam&) method to HTTPClient (#19723) This allow explicit re-initialization of specific parts of HTTPClient(s) This diff would allow patterns such reusing partially constructed (but properly re-initialized) HTTPClient objects ```c++ struct CrossQueryState { // in some state kept around unique_ptr<HTTPClient>& client; }; void SomeFunction() { // ... http_util.Request(get_request, client); // some more logic, same query http_util.Request(get_request, client); } void SomeOtherFunction() { // Re-initialize part of the client, given some settings might have changed auto http_params = HTTPParams(http_util) client->Initialize(http_params); // ... http_util.Request(get_request, client); // some more logic, same query http_util.Request(get_request, client); } ``` Note that PR is fully opt-in from users, while if you implement a file-system abstraction inheriting from HTTPClient you should get a compiler error pointing to implementing the relevant function. commit 9c6efc7d89ee5ca60598c7e43778c0e9b34b266b Author: Mark <[email protected]> Date: Tue Nov 11 08:09:02 2025 +0100 Fix typo commit e52f71387731da1202fc33755922999a472218a1 Author: Mark <[email protected]> Date: Tue Nov 11 08:08:32 2025 +0100 Add require to test commit 1ef3444f09b1df6e4a7cc3ad1d67868ecaa1a6a4 Merge: 8090b8d52e dff5b7f608 Author: Mark <[email protected]> Date: Tue Nov 11 08:07:17 2025 +0100 Bump the Postgres scanner extension (#19730) commit 0dea05daf823237a2de28ec7c0fec53dbb006475 Author: Carlo Piovesan <[email protected]> Date: Tue Nov 11 06:42:36 2025 +0100 Logs to be case-insensitive also at enable_logging callsite commit 8090b8d52ed6bfd31b72013f6800cea89539cc2f Merge: 6667c7a3ec 5e9f88863f Author: Mark <[email protected]> Date: Mon Nov 10 21:34:42 2025 +0100 [Dev] Fix assertion failure for empty ColumnData serialization (#19713) The `PersistentColumnData` constructor asserts that the pointers aren't empty. 
This assertion will fail if we try to serialize the child of a list, if all lists are empty (as the child will be entirely empty then) Backported fix for problem found by: #19674 commit 6667c7a3ecdc56cc144a9bcf8601001af66e6839 Merge: 3f0ad6958f 4a0f4b0b38 Author: Mark <[email protected]> Date: Mon Nov 10 21:32:58 2025 +0100 Bump httpfs and resume testing on Windows (#19714) commit dff5b7f608b732a0e7c5d9a68e7e8d7db3c48478 Author: Mytherin <[email protected]> Date: Mon Nov 10 21:31:46 2025 +0100 Bump the Postgres scanner extension commit 0e3d0b5af535fcde90d272d95b1d08cb5fb12d15 Author: Sam Ansmink <[email protected]> Date: Mon Nov 10 21:26:43 2025 +0100 remove deleted file from patch commit ffb7be7cc5f27d9945d6868f76ef769a3f8a43d4 Merge: 2142f0b10d 3f0ad6958f Author: Sam Ansmink <[email protected]> Date: Mon Nov 10 21:13:17 2025 +0100 Merge branch 'v1.4-andium' into fix-crypto-issue commit 2142f0b10db72b89c9101fa65ead619182f8e5d1 Author: Sam Ansmink <[email protected]> Date: Mon Nov 10 20:55:18 2025 +0100 fix duplicate job id commit 0a225cb99a130c2b1635d6ced03bc37f01ff9436 Author: Sam Ansmink <[email protected]> Date: Mon Nov 10 20:52:40 2025 +0100 fix ci for encryption commit 3f0ad6958f1952a083bc499fc147f69504a3c6d2 Merge: f3fb834ef7 a1eeb0df6f Author: Mark <[email protected]> Date: Mon Nov 10 20:09:11 2025 +0100 Fix #19700: correctly sort output selection vector in nested selection operations (#19718) Fixes #19700 This probably should be maintained during the actual select - but for now just sorting it afterwards solves the issue. commit f3fb834ef7153b90ef3908eb51a5b85efa580ca5 Merge: 7333a0ae84 c8ddca6f3c Author: Mark <[email protected]> Date: Mon Nov 10 20:09:03 2025 +0100 Fix #19355: correctly resolve subquery in MERGE INTO action condition (#19720) Fixes #19355 commit 7333a0ae84d51729fffe91e67f12c3cee526af2a Merge: 95fcb8f188 6595848a27 Author: Mark <[email protected]> Date: Mon Nov 10 16:46:31 2025 +0100 Bump: delta, ducklake, httpfs (#19715) This PR bumps the following extensions: - `delta` from `0747c23791` to `6515bb2560` - `ducklake` from `022cfb1373` to `77f2512a67` - `httpfs` from `b80c680f86` to `041a782b0b` commit 35f98411037cb0499e236d0cbe20d6b3a0dcc43f Author: Sam Ansmink <[email protected]> Date: Mon Nov 10 14:51:29 2025 +0100 install curl commit d4a77c801bb1a88e634c12bc64e185ef2f147d2d Author: Carlo Piovesan <[email protected]> Date: Mon Nov 10 14:37:42 2025 +0100 Add explicit Initialize(HTTPParams&) method to HTTPClient This allow explicit re-initialization of specific parts of HTTPClient(s) commit 6595848a27bd7fb271c63a99551d8326417320dd Author: Sam Ansmink <[email protected]> Date: Mon Nov 10 11:30:34 2025 +0100 bump extensions commit 7a7726214c86267d476a2edbc68656ebd6253fe8 Author: Sam Ansmink <[email protected]> Date: Mon Nov 10 11:28:32 2025 +0100 fix: ci issues commit 4a0f4b0b38b9d5660c8a5c848d8a1c71bc3220de Author: Carlo Piovesan <[email protected]> Date: Mon Nov 10 11:07:58 2025 +0100 Bump httpfs and resume testing on Windows commit 5e9f88863f5f519620ae01f4ff873f6a2869343f Author: Tishj <[email protected]> Date: Mon Nov 10 10:58:03 2025 +0100 conditionally create the PersistentColumnData, if there are no segments (as could be the case for a list's child), there won't be any data pointers commit 95fcb8f18819b1a77df079a7fcb753a8c2f52844 Merge: 396c86228b 4f3df42f20 Author: Laurens Kuiper <[email protected]> Date: Mon Nov 10 10:50:38 2025 +0100 Bump: aws, ducklake, httpfs, iceberg (#19654) This PR bumps the following extensions: - `aws` from `18803d5e55` to 
`55bf3621fb` - `ducklake` from `2554312f71` to `022cfb1373` - `httpfs` from `8356a90174` to `b80c680f86` - `iceberg` from `5e22d03133` to `db7c01e92` commit c8ddca6f3c32aa0d3a9536371f9e3ca8cb00753e Author: Mytherin <[email protected]> Date: Mon Nov 10 09:19:31 2025 +0100 Fix #19355: correctly resolve subquery in MERGE INTO action condition commit a1eeb0df6ffc2f129638a2dfaab9a70720c8db1b Author: Mytherin <[email protected]> Date: Mon Nov 10 09:00:35 2025 +0100 Fix #19700: correctly sort output selection vector in nested selection operations commit 396c86228bda46929560affde7effdbab7d4e905 Merge: e3d242509e e501fcbd1a Author: Mark <[email protected]> Date: Sat Nov 8 17:34:13 2025 +0100 Add missing query location to blob cast (#19689) commit e3d242509e5710314921a0d7debd0bedb4d10a3e Merge: 7ce99bc041 1ba198d711 Author: Mark <[email protected]> Date: Sat Nov 8 17:34:04 2025 +0100 Add request timing to HTTP log (#19691) Demo: ```SQL D call enable_logging('HTTP'); D from read_csv_auto('s3://duckdblabs-testing/test.csv'); D select request.type, request.url, request.start_time, request.duration_ms from duckdb_logs_parsed('HTTP'); ┌─────────┬────────────────────────────────────────────────────────────────┬───────────────────────────────┬─────────────┐ │ type │ url │ start_time │ duration_ms │ │ varchar │ varchar │ timestamp with time zone │ int64 │ ├─────────┼────────────────────────────────────────────────────────────────┼───────────────────────────────┼─────────────┤ │ HEAD │ https://duckdblabs-testing.s3.us-east-1.amazonaws.com/test.csv │ 2025-11-07 10:17:56.052202+00 │ 417 │ │ GET │ https://duckdblabs-testing.s3.us-east-1.amazonaws.com/test.csv │ 2025-11-07 10:17:56.478847+00 │ 104 │ └─────────┴────────────────────────────────────────────────────────────────┴───────────────────────────────┴─────────────┘ ``` commit ae518d0a4e439f80c768388fab8f51d667f7e4b7 Author: Sam Ansmink <[email protected]> Date: Fri Nov 7 13:58:22 2025 +0100 minor ci fixes commit e501fcbd1af58cf147b80051b38ddf815d5e1b8c Author: Mytherin <[email protected]> Date: Fri Nov 7 12:37:21 2025 +0100 move commit bc1a683d10150dfe15f2f4d69e505f6337c4fc27 Author: Sam Ansmink <[email protected]> Date: Fri Nov 7 11:22:35 2025 +0100 only load httpfs if necessary commit 1ba198d71106a851fb8234ccfb208ec66b0e1d17 Author: Sam Ansmink <[email protected]> Date: Fri Nov 7 11:16:38 2025 +0100 fix: check if logger exists commit f22e9a06ef6e1b6c999e8c7389b05e40ae9032fc Author: Sam Ansmink <[email protected]> Date: Fri Nov 7 11:13:11 2025 +0100 add test for http log timing commit f474ba123485377f94e5b57600fb720733050c98 Author: Sam Ansmink <[email protected]> Date: Fri Nov 7 11:00:19 2025 +0100 add http timings to logger commit 02bb5d19b9fc7a702184ffcf7d9688b88f54071a Author: Mytherin <[email protected]> Date: Fri Nov 7 09:30:04 2025 +0100 Add query location to blob cast commit 7ce99bc04130615dfc3a39dfb79177a8942fefba Merge: 1555b0488e aea843492d Author: Laurens Kuiper <[email protected]> Date: Fri Nov 7 09:22:48 2025 +0100 Fix InsertRelation on attached database (#19583) Fixes https://github.com/duckdb/duckdb/issues/18396 Related PR in duckdb-python: https://github.com/duckdb/duckdb-python/pull/155 commit 1555b0488e322998e6fd06cc47e1909c7bb4eba4 Merge: 783f08ffd8 98e2c4a75f Author: Laurens Kuiper <[email protected]> Date: Fri Nov 7 08:31:05 2025 +0100 Log total probe matches in hash join (#19683) This is usually evident from the number of tuples coming out of a join, but it can be hard to understand what's going on when doing a `LEFT`/`RIGHT`/`OUTER` 
join. This PR adds one log call at the end of the hash join to report how many probe matches there were. ```sql D CALL enable_logging('PhysicalOperator'); ┌─────────┐ │ Success │ │ boolean │ ├─────────┤ │ 0 rows │ └─────────┘ D SELECT count(*) FROM range(3_000_000) t1(i) LEFT JOIN range(1_000_000, 2_000_000) t2(i) USING (i); ┌────────────────┐ │ count_star() │ │ int64 │ ├────────────────┤ │ 3000000 │ │ (3.00 million) │ └────────────────┘ D CALL disable_logging(); ┌─────────┐ │ Success │ │ boolean │ ├─────────┤ │ 0 rows │ └─────────┘ D SELECT info.total_probe_matches::BIGINT total_probe_matches FROM duckdb_logs_parsed('PhysicalOperator') WHERE class = 'PhysicalHashJoin' AND event = 'GetData'; ┌─────────────────────┐ │ total_probe_matches │ │ int64 │ ├─────────────────────┤ │ 1000000 │ │ (1.00 million) │ └─────────────────────┘ ``` Here we are able to see that the hash join produced 1M matches, but emitted 3M tuples. commit 783f08ffd89b1d1290b2d3dec0b3ba12d8c233bf Merge: 6c6af22ea4 1d5c9f5f3d Author: Laurens Kuiper <[email protected]> Date: Thu Nov 6 15:57:35 2025 +0100 Fixup linking for LLVM (#19668) See conversation at https://github.com/llvm/llvm-project/issues/77653 This allows again: ``` brew install llvm CMAKE_LLVM_PATH=/opt/homebrew/Cellar/llvm/21.1.5 GEN=ninja make ``` to just work. Arguably very limited, but can as well be fixed. commit 6c6af22ea45effc67dc9e76feec3fb73208750bb Merge: 2892abafa7 f483e95d1c Author: Laurens Kuiper <[email protected]> Date: Thu Nov 6 15:56:49 2025 +0100 Categorize ParseLogMessage as CAN_THROW_RUNTIME_ERROR (#19672) Currently we rely on filtering on query type AND executing scalar function `parse_duckdb_log_message` to not be reordered. This is somehow brittle, and have found locally cases where this cause problems that will result in wrong casts, such as: ``` Conversion Error: Type VARCHAR with value 'ColumnDataCheckpointer FinalAnalyze(COMPRESSION_UNCOMPRESSED) result for main.big.0(VALIDITY): 15360' can't be cast to the destination type STRUCT(metric VARCHAR, "value" VARCHAR) ``` Looking at the executed plan, it would look like: ``` ┌─────────────┴─────────────┐ │ FILTER │ │ ──────────────────── │ │ ((type = 'Metrics') AND │ │ (struct_extract │ │ (parse_duckdb_log_message(│ │ 'Metrics', message), │ │ 'metric') = 'CPU_TIME')) │ │ │ │ ~0 rows │ └─────────────┬─────────────┘ ``` Tagging `parse_duckdb_log_message` as potentially throwing on some input avoids reordering, and avoid the problem while improving the usability of logs. An alternative solution would be use explicit DefaultTryCast (instead of TryCast), at https://github.com/duckdb/duckdb/blob/v1.4-andium/src/function/scalar/system/parse_log_message.cpp#L70, either allow to solve the problem. commit 98e2c4a75f816eae6ef2893bbb581c9913293f2a Author: Laurens Kuiper <[email protected]> Date: Thu Nov 6 15:35:25 2025 +0100 log total probe matches in hash join commit 2892abafa772fffc4402e5125cf16a26c094cb44 Merge: ecc73b2b4b 488069ec8d Author: Laurens Kuiper <[email protected]> Date: Thu Nov 6 14:21:05 2025 +0100 duckdb_logs_parsed to do case-insensitive matching (#19669) This is something me and @Tmonster bumped into while helping a customer debugging an issue. I think it's more intuitive and friendly that user facing functions are case insensitive, given that is the general user expectation around SQL. I am not sure `ILIKE` is the best way to do so (an alternative would be filtering on `lower(1) = lower(2)`). 
Note that passing `%` signs is currently checked elsewhere, for example: ```sql SELECT message FROM duckdb_logs_parsed('query%') WHERE starts_with(message, 'SELECT 1'); ``` would throw ``` Invalid Input Error: structured_log_schema: 'query%' not found ``` (while `querylog` already work, see test case, given there case-insensitivity comparison was already used) commit aea843492da3f40c30e6e88c12eb6da690348f2e Author: Evert Lammerts <[email protected]> Date: Thu Nov 6 11:40:11 2025 +0100 review feedback commit 094a54b890a2466aad743b1c372809849cdef283 Author: Evert Lammerts <[email protected]> Date: Sat Nov 1 11:22:34 2025 +0100 Fix InsertRelation on attached database commit 4f3df42f208d5e6dc602d2e688911ef13758d3aa Author: Sam Ansmink <[email protected]> Date: Thu Nov 6 11:31:58 2025 +0100 bump iceberg further commit f483e95d1c3983c2ba5758ebba1272f7ff12cd0d Author: Carlo Piovesan <[email protected]> Date: Fri Oct 31 12:25:01 2025 +0100 Improve tests using now working FROM duckdb_logs_parsed() commit 6554c84a73b6c7857d2ec5ebf6f2019ceb56e6dc Author: Carlo Piovesan <[email protected]> Date: Tue Nov 4 12:56:31 2025 +0100 parse_logs_message might throw commit 488069ec8d726d3b19093e8d57101c6c6af8910b Author: Carlo Piovesan <[email protected]> Date: Thu Nov 6 09:29:49 2025 +0100 duckdb_logs_parsed to do case-insensitive matching commit 1d5c9f5f3d18c73e27b0bc4353d549680c5c82d5 Author: Carlo Piovesan <[email protected]> Date: Thu Nov 6 09:13:41 2025 +0100 Fixup linking for LLVM See conversation at https://github.com/llvm/llvm-project/issues/77653 commit ecc73b2b4b10beb175968e55e24e69241d00df1b Merge: 2d69f075ee 4cb677238f Author: Mark <[email protected]> Date: Thu Nov 6 08:58:09 2025 +0100 Always remember extra_metadata_blocks when checkpointing (#19639) This is a follow-up to https://github.com/duckdb/duckdb/pull/19588, adding the following: - Reenables block verification in a new test configuration. It further adds new checks to ensure that the metadata blocks that the RowGroup references after checkpointing corresponds to those that it would see if it were to reload them from disk. This verification would have caught the issue addressed by https://github.com/duckdb/duckdb/pull/19588 - Adds a small tweak in `MetadataWriter::SetWrittenPointers`. This ensures that the table writer does not track an `extra_metadata_block` that did not ever receive any writes as part of that rowgroup (as it immediately skipped to next block when calling `writer.GetMetaBlockPointer()` after `writer.StartWritingColumns`). With the added verification, not having this tweak fails e.g. the following test: ``` test/sql/storage/compression/bitpacking/bitpacking_compression_ratio.test_slow CREATE TABLE test_bitpacked AS SELECT i//2::INT64 AS i FROM range(0, 120000000) tbl(i); ================================================================================ TransactionContext Error: Failed to commit: Failed to create checkpoint because of error: Reloading blocks just written does not yield same blocks: Written: {block_id: 2 index: 32 offset: 0}, {block_id: 2 index: 33 offset: 8}, Read: {block_id: 2 index: 33 offset: 8}, Read Detailed: {block_id: 2 index: 33 offset: 8}, Start pointers: {block_id: 2 index: 33 offset: 8}, Metadata blocks: {block_id: 2 index: 32 offset: 0}, ``` - Ensures that we always update `extra_metadata_blocks` after checkpointing a rowgroup. This speeds up subsequent checkpoints significantly. 
Right now, if you have a large legacy database, and don't update these old rowgroups, this field is kept as is, and every checkpoint needs to recompute it (even if the database isn't reloaded). Making sure we always have `RowGroup::has_metadata_blocks == true` after each checkpoint, even in case of metadata reuse, will both benefit checkpointing for databases in old storage formats, as well as when starting to use newer storage format on large legacy databases. - Only tangentially related to the issue / PR, but while debugging I noticed that the `deletes_is_loaded` variable is not correctly initialized in all RowGroup constructors (can also be triggered with the assertion I added in `RowGroup::HasChanges()`) commit 46028940c8e429739e73f4d345ec3cab5eb5b01c Author: Sam Ansmink <[email protected]> Date: Wed Nov 5 19:33:58 2025 +0100 bump extension entries commit 2d69f075ee91c42ad4fe4208a4d1f06d0034faff Merge: 7043621a83 e3fb2eb884 Author: Laurens Kuiper <[email protected]> Date: Wed Nov 5 15:27:27 2025 +0100 Enable running all extensions tests as part of the build step (#19631) This is enabled via https://github.com/duckdb/extension-ci-tools/pull/278, that introduced a way to hook into running tests for all extension of a given configuration (as opposed to a single one). Also few minor fixes I bumped into: * disable unused platforms from the external extension builds * remove `[persistence]` tests to be always run * enable `vortex` tests * avoid `httpfs` tests on Windows, to be reverted in a follow up commit 4cb677238f7f4ad4d747f1a1045396fd74765724 Merge: b48cd982e0 7043621a83 Author: Yannick Welsch <[email protected]> Date: Wed Nov 5 14:59:47 2025 +0100 Merge remote-tracking branch 'origin/v1.4-andium' into yw/metadata-reuse-tweaks commit b48cd982e0c59a03cf78a37175ba7272438c2525 Author: Yannick Welsch <[email protected]> Date: Wed Nov 5 14:59:34 2025 +0100 newline commit 490411ab5ae614064e3e4fa94f631dcbbeea68d8 Author: Sam Ansmink <[email protected]> Date: Wed Nov 5 13:55:19 2025 +0100 fix: add more places to securely clear key from memory commit e3fb2eb8843f9ff90ad29fd69938ee6961b644dc Author: Carlo Piovesan <[email protected]> Date: Wed Nov 5 11:07:40 2025 +0100 Avoid testing httpfs on Windows (fix incoming) commit e719c837851f016ea614b28380685de8794ccf39 Author: Carlo Piovesan <[email protected]> Date: Wed Nov 5 11:04:57 2025 +0100 Revert "Add ducklake tests" This reverts commit b77a9615117de845fa48463f09be20a89dea7434. commit 4242618a8d43c2004f55b27b63535ad979302e92 Author: Sam Ansmink <[email protected]> Date: Wed Nov 5 11:03:48 2025 +0100 only autoload if crypto util is not set commit 19232fc414dc7f861dcbad788ba5466d10c27a67 Author: Sam Ansmink <[email protected]> Date: Wed Nov 5 10:14:12 2025 +0100 bump extensions commit 7043621a83d1be17ba6b278f0f7a3ec65df98d93 Merge: db845b80c7 3584a93938 Author: Laurens Kuiper <[email protected]> Date: Wed Nov 5 09:18:39 2025 +0100 Bump MySQL scanner (#19643) Updating the MySQL scanner to include the time zone handling fix to duckdb/duckdb-mysql#166. commit db845b80c76452054e26cf7a2d715769592de925 Merge: f50618b48c 7eccc643ae Author: Laurens Kuiper <[email protected]> Date: Wed Nov 5 09:15:52 2025 +0100 Remove `FlushAll` from `DETACH` (#19644) This was initially added to reduce RSS after `DETACH`ing, but it is now creating a large bottleneck for workloads that aggressively `ATTACH`/`DETACH`. RSS will be freed by further allocation activity, or when `SET allocator_background_threads=true;` is enabled. 
commit 4978ccd8ec15e7631fd9ed741d338da663b0ff48
Author: Sam Ansmink <[email protected]>
Date: Tue Nov 4 16:34:16 2025 +0100

fix: add patch file

commit 6ec168d508d9395306b29c62cb0b163b6a77bafb
Author: Sam Ansmink <[email protected]>
Date: Tue Nov 4 16:13:18 2025 +0100

format

commit 67ec072c0ea6a237213f680709773e1342b11065
Author: Sam Ansmink <[email protected]>
Date: Tue Nov 4 15:59:04 2025 +0100

fix: tests

commit 7eccc643ae57a76a49e61b905f9a9a1857a00084
Author: Laurens Kuiper <[email protected]>
Date: Tue Nov 4 15:47:29 2025 +0100

remove flush all from detach

commit 3584a93938a4852b0510b0c3d6b3bb13861c4147
Author: Alex Kasko <[email protected]>
Date: Tue Nov 4 14:33:21 2025 +0000

Bump MySQL scanner

Updating the MySQL scanner to include the time zone handling fix from duckdb/duckdb-mysql#166.

commit 250b917ed6f423b56efbd855b2359a498fe2ef8d
Author: Sam Ansmink <[email protected]>
Date: Tue Nov 4 14:41:32 2025 +0100

fix: various issues with encryption

commit f50618b48c3dd04f77ae557e3bb4863f96f74a76
Merge: 66100df7ae 8257973295
Author: Mark <[email protected]>
Date: Tue Nov 4 14:26:16 2025 +0100

Fix #19455: correctly extract root table in merge into when running a join that contains single-sided predicates that are transformed into filters (#19637)

Fixes #19455

commit 82579732952d68dec2b2a44cc1ca04243ac57151
Merge: 6efd4a4fde 66100df7ae
Author: Mytherin <[email protected]>
Date: Tue Nov 4 14:25:42 2025 +0100

Merge branch 'v1.4-andium' into mergeintointernalerror

commit 66100df7aeb321d37f2434416df59dc274948987
Merge: d54d36faae c53eb7a562
Author: Mark <[email protected]>
Date: Tue Nov 4 14:24:10 2025 +0100

Detect invalid merge into action and throw exception (#19636)

`WHEN NOT MATCHED (BY TARGET)` cannot be combined with `DELETE` or `UPDATE`, since there are no rows in the target table to delete or update. This PR ensures we throw an error when this is attempted.
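To make the rejected shape concrete, a minimal sketch; the table names are hypothetical and not taken from the PR's tests:

```sql
CREATE TABLE tgt (id INTEGER, v INTEGER);
CREATE TABLE src (id INTEGER, v INTEGER);

-- Valid: a non-matching source row creates a new target row.
MERGE INTO tgt USING src ON tgt.id = src.id
    WHEN NOT MATCHED THEN INSERT (id, v) VALUES (src.id, src.v);

-- Now raises an error: there is no target row to DELETE when nothing matched.
MERGE INTO tgt USING src ON tgt.id = src.id
    WHEN NOT MATCHED THEN DELETE;
```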
commit ca88f5b2cf9480ac8e57f436fbc89d327d19422a
Author: Yannick Welsch <[email protected]>
Date: Tue Nov 4 10:57:57 2025 +0100

Use reserve instead

commit 133a15ee61a64a831de46e4407f38d8bdd7b71f5
Author: Carlo Piovesan <[email protected]>
Date: Tue Nov 4 10:45:20 2025 +0100

Move also [persistence] tests back under ENABLE_UNITTEST_CPP_TESTS

commit eb322ce251b5c4347650afc455171d862c51bf34
Author: Carlo Piovesan <[email protected]>
Date: Tue Nov 4 10:41:40 2025 +0100

Switch PR runs from wasm_mvp to wasm_eh

commit 9c5f82fa358fcf236cff21499351c1e739ca032a
Author: Carlo Piovesan <[email protected]>
Date: Tue Nov 4 10:40:15 2025 +0100

Currently no external extension works on wasm or windows or musl

To be expanded once that changes

commit d54d36faae00120f548b39d1e21d93ca25f17087
Merge: 97fdeddb2b c01c994085
Author: Laurens Kuiper <[email protected]>
Date: Tue Nov 4 09:03:51 2025 +0100

Bump: spatial (#19620)

This PR bumps the following extensions:
- `spatial` from `61ede09bec` to `d83faf88cd`

commit 6efd4a4fde180bf7d9c433977921818e5465c92a
Author: Mytherin <[email protected]>
Date: Tue Nov 4 08:13:56 2025 +0100

Fix #19455: correctly extract root table in merge into when running a join that contains single-sided predicates that are transformed into filters

commit c53eb7a56266157f0e9d97bd91be0d36285ec38b
Author: Mytherin <[email protected]>
Date: Tue Nov 4 08:01:24 2025 +0100

Detect invalid merge into action and throw exception

commit 97fdeddb2bd5c34862afd30177c9184f51f6dccd
Merge: a0a46d6ed0 87193fd5ab
Author: Mark <[email protected]>
Date: Tue Nov 4 07:48:43 2025 +0100

Try to prevent overshooting of `FILE_SIZE_BYTES` by pre-emptively increasing bytes written in Parquet writer (#19622)

Helps with #19552, but doesn't fully fix the problem. We should look into a more robust fix for v1.5.0, but not for a bugfix release.

commit a0a46d6ed06dd962a4d6eeb01f3e14f8b275cec4
Merge: 73c0d0db15 3838c4a1ed
Author: Mark <[email protected]>
Date: Tue Nov 4 07:48:27 2025 +0100

Increase cast-cost of old-style implicit cast to string (#19621)

This PR fixes https://github.com/duckdb/duckdb-python/issues/148

The issue is that `list_extract` now has two overloads, one for a templated list `LIST<T>` and one for concrete `VARCHAR` inputs. When binding a function, we add a really high cost to selecting a templated overload, to ensure we always pick something more specific if available. With our current casting rules, we are unable to cast `VARCHAR[]` to `VARCHAR`, and therefore fall back to the list template as expected. But old-style casting rules allow `VARCHAR[]` to `VARCHAR` (with a high cost penalty), and that cost is still lower than the cost of casting to the template - even though the template would be the better alternative. With old-style casting we basically always have a lower-cost "fallback" option than selecting a template overload. While we should overhaul our casting system to evaluate the cast cost along more axes than just "score", this PR fixes this specific case by simply cranking up the cost of old-style implicit to-string casts.
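For illustration, the two `list_extract` overloads the description refers to can be exercised as follows (a sketch of the expected binder behavior, not code from this PR):

```sql
SELECT list_extract([10, 20, 30], 2);  -- 20: binds the templated LIST(T) overload
SELECT list_extract('hello', 2);       -- 'e': binds the concrete VARCHAR overload
```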
commit c01c99408526b3c0d698028083481301af069824
Author: Max Gabrielsson <[email protected]>
Date: Mon Nov 3 22:42:37 2025 +0100

extension entries

commit b77a9615117de845fa48463f09be20a89dea7434
Author: Carlo Piovesan <[email protected]>
Date: Mon Nov 3 17:35:39 2025 +0100

Add ducklake tests

commit bd58abcdfb4485a1a9dbb750bd0587803fd1c559
Author: Carlo Piovesan <[email protected]>
Date: Mon Nov 3 17:35:15 2025 +0100

Load vortex tests

commit 62fe1bff77a60fd690b9911aa7a38b7bc197f865
Author: Carlo Piovesan <[email protected]>
Date: Mon Nov 3 22:08:43 2025 +0100

Pass down extensions_test_selection -> complete

commit e2604e6f5259453f482e0c49ca10520e89ddf269
Author: Yannick Welsch <[email protected]>
Date: Mon Nov 3 19:18:47 2025 +0100

Always has_metadata_blocks after checkpoint

commit 73c0d0db15621d3d1c2936816becf27e2c41e2ab
Merge: 286924e634 b518b2aa0b
Author: Mark <[email protected]>
Date: Mon Nov 3 18:24:26 2025 +0100

Improve error message around compression type deprecation/availability checks (#19619)

This PR fixes https://github.com/duckdblabs/duckdb-internal/issues/6436

The old code kept only a list of "deprecated" types and returned a boolean, losing the context of whether the compression type was available at one point and is now deprecated, OR is newly introduced and not yet available in the storage version that is currently used.

commit 0e5a33dae35aab5209a8e959cf48d7525fa7ec8d
Author: Yannick Welsch <[email protected]>
Date: Thu Oct 30 19:28:54 2025 +0100

Verify blocks

commit 286924e6348723138ca4dfd55b749d847bce59a9
Merge: 535f905874 c248313a1d
Author: Mark <[email protected]>
Date: Mon Nov 3 17:12:32 2025 +0100

bump iceberg (#19618)

commit 87193fd5abf342d6ddce9d984e69007a4ccdc7d2
Author: Laurens Kuiper <[email protected]>
Date: Mon Nov 3 14:43:08 2025 +0100

try to prevent overshooting by pre-emptively increasing write size

commit 3838c4a1edd83dc1373b6077dc6ee478bb996e50
Author: Max Gabrielsson <[email protected]>
Date: Mon Nov 3 13:55:53 2025 +0100

increase fallback string cast cost

commit 535f90587495e0c8f5974a0968b06b15ad01b32e
Merge: d643cefe13 06df593c60
Author: Laurens Kuiper <[email protected]>
Date: Mon Nov 3 13:49:57 2025 +0100

[DevEx] Improve error message when FROM clause is omitted (#18995)

This PR fixes #18954

If the "similar bindings" list is entirely empty, that means there are no bindings at all, which can only happen if the FROM clause is entirely missing.

commit 9268637337a21b9c03fdc7dceb0a88fbbe001a73
Author: Max Gabrielsson <[email protected]>
Date: Mon Nov 3 12:35:30 2025 +0100

bump extensions

commit d643cefe13de6873f6fb0ecc0bca1c14111cde11
Merge: 5f8cf7d7f8 c6434fd89a
Author: Mark <[email protected]>
Date: Mon Nov 3 12:28:33 2025 +0100

Avoid eagerly resolving the next on-disk pointer in the MetadataReader, as that pointer might not always be valid (#19588)

When enabling the new [experimental metadata re-use](https://github.com/duckdb/duckdb/pull/18395), it is possible for metadata of *some* row groups to be re-used. This can cause linked lists of metadata blocks to contain invalid references. For example, when writing a bunch of row groups, we might get this layout:

```
METADATA BLOCK 1
  ROW GROUP 1
  ROW GROUP 2 (pt 1)
  NEXT BLOCK: 2
->
METADATA BLOCK 2
  ROW GROUP 2 (pt 2)
  ROW GROUP 3
```

Metadata is stored in a linked list (block 1 -> block 2) - but we don't need to traverse this linked list fully. We store pointers to individual row groups, and can start reading from their position.

Now suppose we re-use the metadata of `ROW GROUP 1`, but not that of the other row groups (because e.g. they have been updated / changed). Since `ROW GROUP 1` is fully contained in `METADATA BLOCK 1`, we can garbage collect `METADATA BLOCK 2`, leaving the following metadata block:

```
METADATA BLOCK 1
  ROW GROUP 1
  ROW GROUP 2 (pt 1)
  NEXT BLOCK: 2
```

Now we can safely read this block and read the metadata for `ROW GROUP 1`, **however**, this block contains a reference to a metadata block that is no longer valid and might have been garbage collected.

This revealed a problem in the `MetadataReader`. In the current implementation, when pointing it towards a block, it would eagerly try to figure out the metadata location of *the next block*. This is normally not a problem; however, with these invalid chains, we might try to resolve a block that has already been freed up, causing an internal exception to trigger:

```
Failed to load metadata pointer (id %llu, idx %llu, ptr %llu)
```

This PR resolves the issue by making the MetadataReader lazy. Instead of eagerly resolving the next pointer, we only do so when it is actually required.

commit b518b2aa0b06372d583fb203f5cae0011a53a87f
Author: Tishj <[email protected]>
Date: Mon Nov 3 12:24:43 2025 +0100

enum util fix

commit 5f8cf7d7f81981f4b2355959257fa82982c3dd11
Merge: 407720a348 2cdc7f922b
Author: Laurens Kuiper <[email protected]>
Date: Mon Nov 3 12:22:52 2025 +0100

add vortex external extension (#19580)

commit 7c2353cb06d867813b7725f893a6b1092821c807
Author: Tishj <[email protected]>
Date: Mon Nov 3 11:21:32 2025 +0100

differentiate between deprecated/not available yet in the check, to improve error reporting

commit c248313a1dd40f1569b608b80bdec1229de0b6b4
Author: Tmonster <[email protected]>
Date: Mon Nov 3 10:54:40 2025 +0100

bump iceberg

commit 407720a34804f0da61d5ba6645c3c44ec6ddf0d8
Merge: 7764771eaa d4fb98d454
Author: Mark <[email protected]>
Date: Sun Nov 2 15:01:29 2025 +0100

Wal index deletes (#19477)

This adds support for buffering and replaying Index delete operations during WAL replay. During WAL replay, index operations are buffered since the Indexes are not bound yet. During Index binding, the buffered operations are applied to the Index.

UnboundIndex is modified to support buffering delete operations on top of inserts. BoundIndex::ApplyBufferedAppends is changed to BoundIndex::ApplyBufferedReplays, which supports replaying both inserts and deletes.

Documentation along the relevant code paths is added, clarifying the ordering of mapped_column_ids and the index_chunks being buffered. Before, the mapping could be in any order, since it only came from Index insert paths. Now, buffering can come from both insert and delete paths, so both need to make sure to buffer index chunks and the mappings in the same order (which is just the sorted order of the physical Index column IDs).

There is also a bug fix for buffering index data on a table with generated columns: the table chunk created for replaying buffered operations previously contained all column types, including generated columns, whereas now it only contains the physical column layout that is needed for index operations. (ART Index operations take a chunk of data in which only the index columns contain any data, and the non-indexed columns are empty.)

A catch block is added to Transaction CleanupState::Flush, which was silently throwing away any failures (and which caught this WAL replay issue in the first place).

Also, some test coverage for ART duplicate rowids was added, along with a LookupInLeaf function that allows searching for a rowid in a Leaf that is either inlined or a gate node to a nested ART. @taniabogatsch

commit c6434fd89a7391e428f2cb31e6e3d676d5257b0d
Author: Mytherin <[email protected]>
Date: Sun Nov 2 14:54:33 2025 +0100

Fix lock order inversion

commit eb514c01e4ea4ad434fb87fde70307f64992d52a
Merge: 2f3d2db509 7764771eaa
Author: Mytherin <[email protected]>
Date: Sun Nov 2 09:45:34 2025 +0100

Merge branch 'v1.4-andium' into metadatareusefixes

commit 7764771eaa654cb44f5c731e99f5d989951aefb8
Merge: 9ea6e07a29 fc2bf610d0
Author: Mark <[email protected]>
Date: Sun Nov 2 09:44:54 2025 +0100

Skip compiling remote optimizer test when TSAN is enabled (#19590)

This test uses `fork`, which seems to mess up the thread sanitizer, causing strange errors to occur sporadically.

commit fc2bf610d0c9851d1e3f6ad273dcfb47b6ec60a6
Author: Mytherin <[email protected]>
Date: Sat Nov 1 23:27:43 2025 +0100

Skip compiling entirely

commit a68390e2b1a6f09b899d248881d331e5dbbab89a
Author: Mytherin <[email protected]>
Date: Sat Nov 1 23:23:18 2025 +0100

Skip fork test with tsan

commit 2f3d2db50968fd917f253c2c34cf488290dadfa4
Author: Mytherin <[email protected]>
Date: Sat Nov 1 15:27:51 2025 +0100

Avoid eagerly resolving the next on-disk pointer in the MetadataReader, as that pointer might not always be valid

commit 9ea6e07a290db878c9da097d407b3a866c43c8e0
Merge: 5f1ce8ba5c a740840f97
Author: Mark <[email protected]>
Date: Sat Nov 1 09:25:59 2025 +0100

Fix edge case in uncompressed validity scan with offset and fix off-by-one in ArrayColumnData::Select (#19567)

This PR fixes an off-by-one in the consecutive-array-scan optimization implemented in https://github.com/duckdb/duckdb/pull/16356, as well as an edge case in our uncompressed validity data scan. Fixes https://github.com/duckdb/duckdb/issues/19377

I can't figure out how to write a test for this; it seems like no matter what I do, I'm unable to replicate the same storage characteristics as the database file provided in the issue above. In the repro we do a scan+skip+scan, where part of the first `validity_t` in the second scan contains a bunch of zeroes at the positions "before" the scan window that remain even after shifting. I've solved it by setting all lower bits up to `result_idx` in the first `validity_t` we scan, but I'm not sure if this is the most elegant solution. Strangely enough, if we remove all bitwise logic and just do the same "fall-back" logic as ifdef:ed for `VECTOR_SIZE < 128`, it all works, so the issue has to be in the bit manipulation.

commit 5f1ce8ba5c0000770412b35a763af417f8fb2b90
Merge: be0142d4ee dbe272dff0
Author: Mark <[email protected]>
Date: Sat Nov 1 09:22:00 2025 +0100

[v1.4-andium] Add Profiler output to logger interface (#19572)

This is https://github.com/duckdb/duckdb/pull/19546 backported to the `v1.4-andium` branch, see the conversation there.

---

The idea is: if both the profiler and the logger are enabled, then you can also access profiler output via the logger. This is on top of / independent of the current choices for where to output the profiler (JSON / graphviz / query-tree / ...). While this might be somewhat wasteful, it allows for an easier PR and leaves unopinionated what the SQL interface should be. Also, given that the ToLog() call is inexpensive (in particular if the logger is disabled), and that it's unclear whether the logger alone can satisfy the profiler's necessities, I think going additive is the best path here.
Demo:

```sql
ATTACH 'my_db.db';
USE my_db;
---- enable profiling to json file
PRAGMA profiling_output = 'profiling_output.json';
PRAGMA enable_profiling = 'json';
---- enable logging (to in-memory table)
CALL enable_logging();
----
CREATE TABLE small AS FROM range(100);
CREATE TABLE medium AS FROM range(10000);
CREATE TABLE big AS FROM range(1000000);
PRAGMA disable_profiling;
SELECT query_id, type, metric, value
FROM duckdb_logs_parsed('Metrics')
WHERE metric == 'CPU_TIME';
```

This will result in, for example:

```
┌──────────┬─────────┬──────────┬───────────────────────┐
│ query_id │  type   │  metric  │         value         │
│  uint64  │ varchar │ varchar  │        varchar        │
├──────────┼─────────┼──────────┼───────────────────────┤
│ 10       │ Metrics │ CPU_TIME │ 8.1041e-05            │
│ 11       │ Metrics │ CPU_TIME │ 0.0002499510000000001 │
│ 12       │ Metrics │ CPU_TIME │ 0.02776677799999981   │
└──────────┴─────────┴──────────┴───────────────────────┘
```

A more complex example: with the duckdb cli, execute:

```sql
PRAGMA profiling_output = 'metrics_folder/tmp_profiling_output.json';
PRAGMA enable_profiling = 'json';
CALL enable_logging(storage='file', storage_path='./metrics_folder');
--- arbitrary queries
CREATE TABLE small AS FROM range(100);
CREATE TABLE medium AS FROM range(10000);
CREATE TABLE big AS FROM range(1000000);
```

then close and restart the duckdb cli, and query what's persisted in the `metrics_folder` folder:

```sql
PRAGMA disable_profiling;
CALL enable_logging(storage='file', storage_path='./metrics_folder');
SELECT queries.message, metrics.metric, TRY_CAST(metrics.value AS DOUBLE) AS value
FROM duckdb_logs_parsed('QueryLog') queries, duckdb_logs_parsed('Metrics') metrics
WHERE queries.query_id = metrics.query_id AND metrics.metric = 'CPU_TIME';
```

```
┌─────────────────────────────────────────────┬──────────┬─────────────────────────────────────┐
│                   message                   │  metric  │ TRY_CAST(metrics."value" AS DOUBLE) │
│                   varchar                   │ varchar  │               double                │
├─────────────────────────────────────────────┼──────────┼─────────────────────────────────────┤
│ CREATE TABLE small AS FROM range(100);      │ CPU_TIME │ 8.1041e-05                          │
│ CREATE TABLE medium AS FROM range(10000);   │ CPU_TIME │ 0.0002499510000000001               │
│ CREATE TABLE big AS FROM range(1000000);    │ CPU_TIME │ 0.02776677799999981                 │
└─────────────────────────────────────────────┴──────────┴─────────────────────────────────────┘
```

commit be0142d4ee0385262520ae2488e8dd11ac213735
Merge: b68a1696de 7df4151c0d
Author: Mark <[email protected]>
Date: Sat Nov 1 09:21:19 2025 +0100

fix inconsistent behavior in remote read_file/blob, and prevent union… (#19531)

Closes https://github.com/duckdb/duckdb-fuzzer/issues/4208
Closes https://github.com/duckdb/duckdb/issues/19090

Our remote filesystem doesn't actually check that files exist when "globbing" a non-glob pattern. Now we check that the file exists in the read_blob/text function, even if we just access the file name. The diff is a bit bigger because I also moved a bunch of templated stuff into the cpp file.
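As an illustration of the changed behavior, assuming a remote file that does not exist (the URL is a placeholder):

```sql
-- With the fix, a missing remote file errors out even if only metadata
-- columns are projected; previously the existence check could be skipped.
SELECT filename, size
FROM read_blob('https://example.com/does-not-exist.bin');
```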
commit 06df593c60bb22973642d776c1c3c3aca85ee0d6
Author: Tishj <[email protected]>
Date: Fri Oct 31 15:26:18 2025 +0100

fix up tests

commit 2cdc7f922bde5550aa1ecd24dabf23b05fbf202b
Author: Sam Ansmink <[email protected]>
Date: Fri Oct 31 15:10:31 2025 +0100

add vortex external extension

commit b68a1696de1a603b59e39efc25da7fc2826a3135
Merge: 8169d4f15c 9414882f7f
Author: Mark <[email protected]>
Date: Fri Oct 31 13:45:16 2025 +0100

Release relevant tests to still be run on all builds (#19559)

I would propose, at least for the Linux builds, to add back a minimal amount of tests on release builds as well. At a minimum they will ensure that:

* for a given release, the corresponding storage_version is valid
* for a minor release, the corresponding name has been set

There are more tests that we might consider basic enough AND connected to behaviour specific to a release that we might want to add to the `release` tag.

Fixes https://github.com/duckdb/duckdb/issues/19354 (together with https://github.com/duckdb/duckdb/pull/19525, which actually added the name).

Note that given the current release process happens in advance, eventual test failures are annoying but not fatal, though they will require changes to code. I am not sure if it's worth having a `keep_going_in_all_cases` option, basically turning the boolean into a set, but I think that can be done when the need arises.

commit 8169d4f15cf556d0ca0ec68d9c876c2bb84aae09
Merge: d9028d09d5 6e2c195859
Author: Mark <[email protected]>
Date: Fri Oct 31 13:44:30 2025 +0100

Fix race condition between `Append` and `Scan` (#19571)

Update `ColumnData::count` only after actually `Append`ing the data, to avoid a race condition with `Scan`. See https://github.com/duckdb/duckdb/issues/19570 for details.

commit d4fb98d45409bcaaf8c3030c7aa7e40b1f60b9d1
Merge: 0743b590d3 d9028d09d5
Author: Artjom Plaunov <[email protected]>
Date: Fri Oct 31 11:16:23 2025 +0100

Merge remote-tracking branch 'upstream/v1.4-andium' into wal-index-deletes

commit 0743b590d361041cc167f0634250f78c20f4d332
Author: Artjom Plaunov <[email protected]>
Date: Fri Oct 31 11:15:04 2025 +0100

remove C++ test, add extra interleaved index replay SQL test

commit 5ca334715faa6c871c8e96029c142aacf53969a7
Author: Tishj <[email protected]>
Date: Fri Oct 31 10:43:03 2025 +0100

fix up tests

commit 0a6b5fb4919a8092b38e19051a9286eeaaeb392c
Merge: 0b1f0e320a d9028d09d5
Author: Tishj <[email protected]>
Date: Fri Oct 31 10:38:56 2025 +0100

Merge branch 'v1.4-andium' into missing_from_clause_better_error

commit a740840f9772a1702a5ffeec43694c48be3526c5
Author: Max Gabrielsson <[email protected]>
Date: Thu Oct 30 18:04:39 2025 +0100

fix consecutive array range calculation, fix validity scanning when bits before result offset are null

commit 6e2c195859a496f1f98c20fd887fac944ba0e344
Author: zhangxizhe <[email protected]>
Date: Fri Oct 31 13:43:19 2025 +0800

Update `ColumnData::count` only after actually `Append`ing the data, to avoid a race condition with `Scan`. See issue #19570 for details.

commit d9028d09d56640599dd8307dd9ae6c8837267e9f
Merge: 307f9b41ff 6bc51dd58e
Author: Laurens Kuiper <[email protected]>
Date: Fri Oct 31 08:47:10 2025 +0100

Disable jemalloc on BSD (#19560)

Fixes https://github.com/duckdb/duckdb/issues/14363

commit dbe272dff0a63d0d01269cee05945a0b016d219f
Author: Carlo Piovesan <[email protected]>
Date: Wed Oct 29 23:51:42 2025 +0100

Add Profiler output to logger interface

The idea is: if both the profiler and the logger are enabled, then you can also access profiler output via the logger. This is on top of / independent of the current choices for where to output the profiler (JSON / graphviz / query-tree / ...). While this might be somewhat wasteful, it allows for an easier PR and leaves unopinionated what the SQL interface should be. Also, given that the ToLog() call is inexpensive (in particular if the logger is disabled), and that it's unclear whether the logger alone can satisfy the profiler's necessities, I think going additive is the best path here.

Demo:

```sql
ATTACH 'my_db.db';
USE my_db;
---- enable profiling to json file
PRAGMA profiling_output = 'profiling_output.json';
PRAGMA enable_profiling = 'json';
---- enable logging (to in-memory table)
CALL enable_logging();
----
CREATE TABLE small AS FROM range(1000);
CREATE TABLE medium AS FROM range(1000000);
CREATE TABLE big AS FROM range(1000000000);
PRAGMA disable_profiling;
SELECT * EXCLUDE timestamp
FROM duckdb_logs()
WHERE type == 'Metrics'
ORDER BY message.split(',')[1], context_id;
```

This will result in, for example:

```
┌────────────┬─────────┬───────────┬────────────────────────────────────────────────────────────┐
│ context_id │  type   │ log_level │                          message                           │
│   uint64   │ varchar │  varchar  │                          varchar                           │
├────────────┼─────────┼───────────┼────────────────────────────────────────────────────────────┤
│         39 │ Metrics │ INFO      │ {'metric': CHECKPOINT_LATENCY, 'value': 0.0}               │
│         44 │ Metrics │ INFO      │ {'metric': CHECKPOINT_LATENCY, 'value': 0.0}               │
│         49 │ Metrics │ INFO      │ {'metric': CHECKPOINT_LATENCY, 'value': 0.017832}          │
│         39 │ Metrics │ INFO      │ {'metric': COMMIT_WRITE_WAL_LATENCY, 'value': 0.000305292} │
│         44 │ Metrics │ INFO      │ {'metric': COMMIT_WRITE_WAL_LATENCY, 'value': 0.003793958} │
│         49 │ Metrics │ INFO      │ {'metric': COMMIT_WRITE_WAL_LATENCY, 'value': 0.0}         │
│         39 │ Metrics │ INFO      │ {'metric': CPU_TIME, 'value': 0.000110209}                 │
│         44 │ Metrics │ INFO      │ {'metric': CPU_TIME, 'value': 0.009471759999999997}        │
│         49 │ Metrics │ INFO      │ {'metric': CPU_TIME, 'value': 8.241736770029297}           │
│          · │    ·    │    ·      │                              ·                             │
│          · │    ·    │    ·      │                              ·                             │
│          · │    ·    │    ·      │                              ·                             │
│         39 │ Metrics │ INFO      │ {'metric': SYSTEM_PEAK_BUFFER_MEMORY, 'value': 36864}      │
│         44 │ Metrics │ INFO      │ {'metric': SYSTEM_PEAK_BUFFER_MEMORY, 'value': 6625280}    │
│         49 │ Metrics │ INFO      │ {'metric': SYSTEM_PEAK_BUFFER_MEMORY, 'value': 63510528}   │
│         39 │ Metrics │ INFO      │ {'metric': TOTAL_BYTES_WRITTEN, 'value': 0}                │
│         44 │ Metrics │ INFO      │ {'metric': TOTAL_BYTES_WRITTEN, 'value': 262144}           │
│         49 │ Metrics │ INFO      │ {'metric': TOTAL_BYTES_WRITTEN, 'value': 12587008}         │
│          · │    ·    │    ·      │                              ·                             │
│          · │    ·    │    ·      │                              ·                             │
│          · │    ·    │    ·      │                              ·                             │
├────────────┴─────────┴───────────┴────────────────────────────────────────────────────────────┤
│ 57 rows (? shown)                                                                   4 columns │
└───────────────────────────────────────────────────────────────────────────────────────────────┘
```

commit 307f9b41ff0464dba0e0f2504c75747c7ead2ecc
Merge: 1cba2e741b 08bf725300
Author: Mark <[email protected]>
Date: Thu Oct 30 15:03:25 2025 +0100

[ported from main] Fix bug initializing std::vector for column names (#19555)

This 4-line fix was merged into main in #19444. It should be in v1.4-andium as well so that it makes it into v1.4.2.

commit 1cba2e741b6622f5be156c061478a6fa66c0f819
Merge: ecb6bfe5b4 80554e4d59
Author: Laurens Kuiper <[email protected]>
Date: Thu Oct 30 14:47:58 2025 +0100

Bugfixes: Parquet JSON+DELTA_LENGTH_BYTE_ARRAY and sorting iterator (#19556)

This PR fixes an issue introduced in v1.4.1 with the Parquet reader when combining a `JSON` column with `DELTA_LENGTH_BYTE_ARRAY` encoding. The issue was caused by trying to validate an entire block of strings in one go, which is OK for UTF-8, but not for JSON. This PR makes it so that we validate individual strings if the column has the `JSON` type. Fixes https://github.com/duckdb/duckdb/issues/19366

This PR also fixes an issue with the new sorting code, which had an error in the calculation of subtraction under modulo. I've fixed this, and unified the code for `InMemoryBlockIteratorState` and `ExternalBlockIteratorState` with some templating, so now the erroneous calculation should be gone from both state types. Fixes https://github.com/duckdb/duckdb/issues/19498

commit 9414882f7fc81be58af0ec914cbe8c6045af3517
Author: Carlo Piovesan <[email protected]>
Date: Thu Oct 30 12:39:48 2025 +0100

Allow back basic tests also in release mode

commit 2987acd0d19656e583f30447a91852793ef188f7
Author: Carlo Piovesan <[email protected]>
Date: Thu Oct 30 12:36:32 2025 +0100

Add test on codename being registered, and tag it as release

commit 6bc51dd58edaf76725810b595a5300044749c0cf
Author: Laurens Kuiper <[email protected]>
Date: Thu Oct 30 13:24:45 2025 +0100

disable jemalloc BSD

commit 80554e4d592ec793676a80b180469a572a247f2a
Merge: 5974ef8c03 ecb6bfe5b4
Author: Laurens Kuiper <[email protected]>
Date: Thu Oct 30 09:57:58 2025 +0100

Merge branch 'v1.4-andium' into bugfixes_v1.4

commit 08bf725300335d34f05cd6f6f508f78ef57c477b
Author: Curt Hagenlocher <[email protected]>
Date: Fri Oct 17 14:08:52 2025 -0700

Fix bug initializing std::vector for column names

commit ecb6bfe5b483ffd1a2a490275b48ec91501680c4
Merge: 09a36d2f73 94471b8e04
Author: Hannes Mühleisen <[email protected]>
Date: Thu Oct 30 09:01:41 2025 +0200

Follow up to staging move (#19551)

Follow-up to #19539; CF does not like AWS regions.

commit 94471b8e0472a2507623b2408808156f6ddde764
Author: Hannes Mühleisen <[email protected]>
Date: Thu Oct 30 07:49:34 2025 +0200

this region does not exist in cf

commit 09a36d2f73d1b2f93682e315761bb3c4973f8ac9
Merge: a23f54fb54 c2a4fc29dc
Author: Mark <[email protected]>
Date: Wed Oct 29 21:51:05 2025 +0100

[Dev] Disable the use of `ZSTD` if the block_manager is the `InMemoryBlockManager` (#19543)

This PR fixes https://github.com/duckdblabs/duckdb-internal/issues/6319

This has to be done because the InMemoryBlockManager doesn't support GetFreeBlockId, which is required by the zstd compression method. I couldn't produce a test for this because I can't reproduce the problem in the unittester, only in the CLI (I assume the storage version prevents in-memory compression?).
commit c2a4fc29dceb617c80ab9156d84f2320add29542
Author: Tishj <[email protected]>
Date: Wed Oct 29 16:37:20 2025 +0100

add test for disabled zstd compression in memory

commit 5974ef8c03afcd01df670a42dd7be0bbb2a6c6ff
Author: Laurens Kuiper <[email protected]>
Date: Wed Oct 29 16:34:54 2025 +0100

properly set file path in test

commit a35ba26f267eca2fb144e07b14706af2b96270a8
Author: Tishj <[email protected]>
Date: Wed Oct 29 15:19:03 2025 +0100

disable the use of ZSTD if the block_manager is the InMemoryBlockManager, since it doesn't support GetFreeBlockId

commit fd85508aa0065a18180a6f9af1d4c66842b28964
Author: Laurens Kuiper <[email protected]>
Date: Wed Oct 29 15:08:06 2025 +0100

re-add missing initialization

commit a23f54fb54c686614cdaf547778b4c6f47bcbf5c
Merge: f2e48a73d4 ab586dfaf6
Author: Hannes Mühleisen <[email protected]>
Date: Wed Oct 29 14:52:40 2025 +0200

Creating separate OSX cli binaries for each arch (#19538)

Also no longer adding the shared library three times because of symlinks.

commit f2e48a73d42ce538706529e51aec54cfd9f96d84
Merge: 5a6521ca7e ccefe12386
Author: Hannes Mühleisen <[email protected]>
Date: Wed Oct 29 14:51:26 2025 +0200

Moving staging to cf and uploading to install bucket (#19539)

This adds a custom endpoint for staging uploads so we can move to R2 for this. We also add functionality to upload to the R2 bucket behind `install.duckdb.org`. Once merged, I will update/add the following secrets:

- `S3_DUCKDB_STAGING_ENDPOINT`
- `S3_DUCKDB_STAGING_ID`
- `S3_DUCKDB_STAGING_KEY`
- `DUCKDB_INSTALL_S3_ENDPOINT`
- `DUCKDB_INSTALL_S3_ID`
- `DUCKDB_INSTALL_S3_SECRET`

commit f5bc9796be79b602ed1892484e060f0e79083610
Author: Laurens Kuiper <[email protected]>
Date: Wed Oct 29 13:43:05 2025 +0100

nicer templating and less code duplication

commit ccefe12386007dd65fae1fe3ff1d65bcb45df44d
Author: Hannes Mühleisen <[email protected]>
Date: Wed Oct 29 14:18:15 2025 +0200

Update .github/workflows/StagedUpload.yml

Co-authored-by: Carlo Piovesan <[email protected]>

commit 41fc70ae3312599e425d140f7db770f56c2c5c38
Author: Hannes Mühleisen <[email protected]>
Date: Wed Oct 29 14:00:41 2025 +0200

Update .github/workflows/StagedUpload.yml

Co-authored-by: Carlo Piovesan <[email protected]>

commit e8c2d9401b580c64ef5d3cad3cb8d301375ddbd3
Author: Hannes Mühleisen <[email protected]>
Date: Wed Oct 29 12:35:30 2025 +0200

moving staging to cf and uploading to install bucket

commit 7df4151c0d4967e2dd33eff7f426805df3c56442
Author: Max Gabrielsson <[email protected]>
Date: Wed Oct 29 10:58:22 2025 +0100

remove named parameters

commit ab586dfaf6bf58fa8376944e599c51efea462cb8
Author: Hannes Mühleisen <[email protected]>
Date: Wed Oct 29 11:46:18 2025 +0200

creating separate osx cli binaries for each arch

commit 8f30296d7c05c277771bf1fe95b73fafe7fa9d0f
Merge: 5dac9f7504 5a6521ca7e
Author: Artjom Plaunov <[email protected]>
Date: Wed Oct 29 09:39:30 2025 +0100

Merge remote-tracking branch 'upstream/v1.4-andium' into wal-index-deletes

commit 5a6521ca7e744205e4c3b67cab8708e2df87073b
Merge: 8c7210f9b0 601d68526c
Author: Mark <[email protected]>
Date: Wed Oct 29 07:55:06 2025 +0100

Add test that either 'latest' or 'vX.Y.Z' are supported STORAGE_VERSIONs (#19527)

Connected to https://github.com/duckdb/duckdb/pull/19525; this adds a test that would have triggered. That test is not built when actually building releases, so it's not fool-proof, but I think adding it is helpful. Tested locally to behave as intended both on a dev commit (success) and on a tag (fails, fixed via the linked PR).
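For context, the two accepted forms the test covers might look like this; a sketch, with illustrative file names:

```sql
-- Pin a database file to a concrete released storage format:
ATTACH 'pinned.db' (STORAGE_VERSION 'v1.4.0');
-- Or follow the newest storage format supported by the build:
ATTACH 'newest.db' (STORAGE_VERSION 'latest');
```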
commit 8c7210f9b0270517e1dba11502dc196a3f0cb13c
Merge: 7b5c16f2d5 99f26bde2d
Author: Mark <[email protected]>
Date: Tue Oct 28 18:58:35 2025 +0100

add upcoming patch release to internal versions (#19525)

commit 7b5c16f2d51dda602c9ddfed58d71bb6ae3275a0
Merge: 23228babba 295603915b
Author: Mark <[email protected]>
Date: Tue Oct 28 18:58:16 2025 +0100

Bump multiple extensions (#19522)

This PR bumps the following extensions:
- `avro` from `7b75062f63` to `93da8a19b4`
- `delta` from `03aaf0f073` to `0747c23791`
- `ducklake` from `f134ad86f2` to `2554312f71`
- `iceberg` from `4f3c5499e5` to `30a2c66f10`
- `spatial` from `a6a607fe3a` to `61ede09bec`

commit 23228babba519ec70b183b03ea6bc4457b3ed84c
Merge: 71a64b5ab4 6a38ac0f69
Author: Mark <[email protected]>
Date: Tue Oct 28 18:58:00 2025 +0100

Bump: inet (#19526)

This PR bumps the following extensions:
- `inet` from `f6a2a14f06` to `fe7f60bb60` (patches removed: 1)

commit 067d6eb0d5c56270f1d24951966191d9c12c3008
Author: Max Gabrielsson <[email protected]>
Date: Tue Oct 28 17:33:43 2025 +0100

fix inconsistent behavior in remote read_file/blob, and prevent union_by_name from crashing

commit 601d68526c9e616ff08a0e08d949f00dcfb76060
Author: Carlo Piovesan <[email protected]>
Date: Tue Oct 28 13:11:45 2025 +0100

Add test that either 'latest' or 'vX.Y.Z' are supported STORAGE_VERSIONs

commit c63c5060d01340dc11f39349bf7950fb8eaa455b
Author: Laurens Kuiper <[email protected]>
Date: Tue Oct 28 15:55:12 2025 +0100

fix #19498

commit 7e52dc5a75532c5413088fbb9f90e6a30f9e5d14
Author: Laurens Kuiper <[email protected]>
Date: Tue Oct 28 15:54:56 2025 +0100

add missing test

commit 71a64b5ab4005fd2eb63cb3912403fde29f4d7e0
Merge: 76ee047ce4 3856fa8ea8
Author: Mark <[email protected]>
Date: Tue Oct 28 14:30:18 2025 +0100

Support non-standard NULL in Parquet again (#19523)

https://github.com/duckdb/duckdb/pull/19406 removed support for the non-standard NULL by adding the safe enum casts. Support for this was explicitly added in https://github.com/duckdb/duckdb/pull/11774. We could consider removing support for this - but it shouldn't be done as part of a bug-fix release imo. This also currently breaks merging v1.4 -> main.

commit 05fb1249cab3404bc396ccaee0cdb1959ae11481
Author: Laurens Kuiper <[email protected]>
Date: Tue Oct 28 14:19:50 2025 +0100

fix #19366

commit 5dac9f750490e1ea601b03d8e3d11db7a9cc0197
Merge: 0d4a78c90f 76ee047ce4
Author: Artjom Plaunov <[email protected]>
Date: Tue Oct 28 13:14:30 2025 +0100

Merge remote-tracking branch 'upstream/v1.4-andium' into wal-index-deletes

commit 0d4a78c90f6288abe842afab521ba1e7a075307f
Author: Artjom Plaunov <[email protected]>
Date: Tue Oct 28 13:12:44 2025 +0100

remove int types

commit 6a38ac0f699f2f85adda33d61c94c6ec054d89ca
Author: Sam Ansmink <[email protected]>
Date: Tue Oct 28 13:08:40 2025 +0100

bump extensions

commit 3cd616b89657c5489844d8a76d26169554e5af96
Author: Artjom Plaunov <[email protected]>
Date: Tue Oct 28 12:57:05 2025 +0100

PR review fixes + more C++ test coverage

commit 0fde0c573099c317b0710ed42d87864ee4b75c00
Merge: baa522991e 76ee047ce4
Author: Laurens Kuiper <[email protected]>
Date: Tue Oct 28 12:32:44 2025 +0100

Merge branch 'v1.4-andium' into bugfixes_v1.4

commit 99f26bde2d03e9958ac4bd37f5f8a0ac67b2fcd3
Author: Sam Ansmink <[email protected]>
Date: Tue Oct 28 12:07:39 2025 +0100

add upcoming patch release to internal versions

commit 3856fa8ea82bd8b9c11166102aab602ddf165ee2
Author: Mytherin <[email protected]>
Date: Tue Oct 28 11:19:35 2025 +0100

Support non-standard NULL in Parquet again

commit 295603915b0ab3a1532cbbe6cf9547f9803e3c46
Author: Sam Ansmink <[email protected]>
Date: Tue Oct 28 10:58:22 2025 +0100

bump extensions

commit c1d826f2523bd8454426ad7401665e8e69f9dadc
Author: Artjom Plaunov <[email protected]>
Date: Tue Oct 28 08:55:00 2025 +0100

unnamed name space

commit 76ee047ce45bab9472068ea360f9894a3a456a83
Merge: b62b03c4b3 bd3eb153b1
Author: Laurens Kuiper <[email protected]>
Date: Tue Oct 28 08:34:42 2025 +0100

Make `DatabaseInstance::…
Mytherin
added a commit
that referenced
this pull request
Nov 20, 2025
DuckDB can significantly speed up checkpoints by [reusing existing metadata](#18395) when there have been no changes to a rowgroup. Unfortunately, this does not apply to rowgroups with deletes. As soon as someone has run a simple `select count(*)` query and the deletes are loaded, the metadata-reuse-on-checkpoint optimization stops working. This PR fixes the situation by allowing the optimization to still come into play, even when the deletes are loaded.
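For illustration, a sketch of the scenario described above; the table name and sizes are made up:

```sql
CREATE TABLE t AS SELECT range AS i FROM range(100000000);
DELETE FROM t WHERE i = 42;  -- this row group now carries a persistent delete
CHECKPOINT;

-- after reopening the database:
SELECT count(*) FROM t;      -- loads the delete masks for every row group
INSERT INTO t VALUES (-1);
CHECKPOINT;                  -- previously this rewrote metadata for all row groups;
                             -- with this fix, unchanged row groups are re-used again
```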
philippmd
pushed a commit
to motherduckdb/public-duckdb
that referenced
this pull request
Nov 21, 2025
DuckDB can significantly speed up checkpoints by [reusing existing metadata](duckdb#18395) when there have been no changes to a rowgroup. Unfortunately, this does not apply to rowgroups with deletes. As soon as someone has run a simple `select count(*)` query and the deletes are loaded, the metadata-reuse-on-checkpoint optimization stops working. This PR fixes the situation by allowing the optimization to still come into play, even when the deletes are loaded.
Follow-up from #18390
This PR implements metadata re-use at the row group level: if we are e.g. appending to a large table, we no longer rewrite the metadata of unchanged row groups, and instead refer to the existing metadata on disk. In addition, this PR performs a few fixes for places where we would eagerly load columns unnecessarily.
Performance
Consider a database storing TPC-H SF100. The biggest table (`lineitem`) has 600M rows.
Now insert a single row into the largest table (`lineitem`) and checkpoint. We measure two times: the time of the `CHECKPOINT` command, and the full runtime of opening the database, running the insert + checkpoint, and closing the database.
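For reference, the benchmark setup might be reproduced along these lines; a sketch assuming the `tpch` extension for data generation (the file name is illustrative):

```sql
INSTALL tpch;
LOAD tpch;
ATTACH 'tpch_sf100.db' AS db;
USE db;
CALL dbgen(sf = 100);  -- lineitem ends up with roughly 600M rows

-- the measured workload: a single-row insert followed by a checkpoint
INSERT INTO lineitem SELECT * FROM lineitem LIMIT 1;
CHECKPOINT;
```

In the CLI, `.timer on` can be used to time the `CHECKPOINT` statement itself.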