@Mytherin
Collaborator

The CTE (de)serialization code in v1.4 has a number of issues:

  • It writes `CTEMaterialize` for CTENodes, which older DuckDB versions do not support. This breaks forwards compatibility: older versions cannot read DuckDB files written by v1.4 when those files contain CTEs that are explicitly labeled as `MATERIALIZED` or `NOT MATERIALIZED`.
  • It does not correctly de-duplicate CTENodes from the `CommonTableExpressionMap`, so some old CTEs are no longer readable (as explained here - Rework CTE binding: remove CTENode, and bind CommonTableExpressionMap directly instead #19351).

This has all already been fundamentally fixed in main in #19351. However, in order to also fix this for subsequent v1.4 releases (v1.4.2), this PR patches the serialization code in a lower-risk manner. Effectively:

  • We no longer write CTEMaterialize for CTENodes. Instead, we only write the CTENode if it should be materialized. Otherwise, we don't write it to the file.
  • When reading a `QueryNode`, we immediately perform "re-duplication" by extracting CTENodes from the `CommonTableExpressionMap`. This fixes an issue where we were not correctly re-duplicating at all levels.
  • When deserializing a CTENode, we perform de-duplication of CTEs within its child. This fixes an issue where the above re-duplication could cause the same CTE to appear multiple times, depending on which version wrote the data we are deserializing.
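
The three steps above can be sketched with a toy model. This is a minimal illustration in Python, not DuckDB's actual C++ code: `SelectNode`, `CTENode`, `serialize`, `deserialize`, `strip_cte` and `cte_names` are hypothetical names, a plain dict stands in for the on-disk format, and the exact shape of the real node tree is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SelectNode:
    sql: str
    cte_map: dict = field(default_factory=dict)  # CTE name -> CTE query text


@dataclass
class CTENode:
    name: str
    query: str
    child: object               # a SelectNode or another CTENode
    materialized: bool = False  # hypothetical flag, never written to the "file"


def serialize(node):
    """Write a node tree to a plain dict (stand-in for the file format)."""
    if isinstance(node, CTENode):
        if not node.materialized:
            # Step 1: non-materialized wrappers are not written at all.
            return serialize(node.child)
        # Materialized wrappers are written, but without a materialization field.
        return {"type": "cte", "name": node.name, "query": node.query,
                "child": serialize(node.child)}
    return {"type": "select", "sql": node.sql, "cte_map": dict(node.cte_map)}


def strip_cte(node, name):
    """Step 3: drop wrappers for `name` from a chain of CTENodes."""
    if isinstance(node, CTENode):
        if node.name == name:
            return strip_cte(node.child, name)
        node.child = strip_cte(node.child, name)
    return node


def deserialize(record):
    if record["type"] == "cte":
        # The child was re-duplicated from its map, so it may wrap this CTE again.
        child = strip_cte(deserialize(record["child"]), record["name"])
        return CTENode(record["name"], record["query"], child, materialized=True)
    node = SelectNode(record["sql"], dict(record["cte_map"]))
    # Step 2: "re-duplication" - rebuild one wrapper per map entry.
    result = node
    for name, query in reversed(list(node.cte_map.items())):
        result = CTENode(name, query, result)
    return result


def cte_names(node):
    """Collect wrapper names from the outermost CTENode inwards."""
    names = []
    while isinstance(node, CTENode):
        names.append(node.name)
        node = node.child
    return names


select = SelectNode("SELECT * FROM a, b", {"a": "SELECT 1", "b": "SELECT 2"})
tree = CTENode("a", "SELECT 1", CTENode("b", "SELECT 2", select), materialized=True)
written = serialize(tree)
restored = deserialize(written)
```

In this model, dropping the `strip_cte` call would leave the wrapper for `a` in the restored tree twice: once read from the file and once rebuilt from the map.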

All of this is only necessary for v1.4 - and when we merge v1.4 into main this code should be deleted.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 14, 2025 18:50
@Mytherin Mytherin marked this pull request as ready for review October 14, 2025 19:12
@Mytherin Mytherin merged commit d8ae68f into duckdb:v1.4-andium Oct 15, 2025
94 checks passed
@kryonix
Contributor

kryonix commented Oct 15, 2025

At least the pain is only temporary 🙈

Mytherin added a commit that referenced this pull request Oct 17, 2025
Follow-up from #19393

There are a number of issues still caused by serializing CTE nodes -
this PR makes it so that we only serialize CTE nodes when MATERIALIZED
is explicitly defined, and serialize only the CommonTableExpressionMap
otherwise. In addition, we never deserialize CTENodes anymore - and
always reconstruct them from the CommonTableExpressionMap.
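
The read side of this follow-up could look roughly like the sketch below. It is an illustrative Python model, not the actual DuckDB change: `CTEEntry`, `rebuild_from_map` and `read_node` are hypothetical names, a dict stands in for the file format, and keeping the materialization flag on the map entry is an assumption made so the example is self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class CTEEntry:
    query: str
    materialized: bool = False  # assumption: the map entry carries this flag


@dataclass
class SelectNode:
    sql: str
    cte_map: dict = field(default_factory=dict)  # CTE name -> CTEEntry


@dataclass
class CTENode:
    name: str
    entry: CTEEntry
    child: object  # a SelectNode or another CTENode


def rebuild_from_map(select):
    """Wrap the select in one CTENode per map entry, first entry outermost."""
    node = select
    for name, entry in reversed(list(select.cte_map.items())):
        node = CTENode(name, entry, node)
    return node


def read_node(record):
    # Serialized CTE wrappers are skipped rather than turned into CTENodes.
    while record["type"] == "cte":
        record = record["child"]
    cte_map = {name: CTEEntry(e["query"], e["materialized"])
               for name, e in record["cte_map"].items()}
    return rebuild_from_map(SelectNode(record["sql"], cte_map))


select_record = {
    "type": "select",
    "sql": "SELECT * FROM a, b",
    "cte_map": {
        "a": {"query": "SELECT 1", "materialized": True},
        "b": {"query": "SELECT 2", "materialized": False},
    },
}
# The same query written with and without a CTE wrapper around it.
with_wrapper = {"type": "cte", "name": "a", "child": select_record}

from_wrapper = read_node(with_wrapper)
from_map_only = read_node(select_record)
```

Because the tree is always rebuilt from the map, both inputs produce the same result in this model, whether or not the writer emitted a wrapper.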
Y-- pushed a commit to motherduckdb/public-duckdb that referenced this pull request Oct 17, 2025
…ckdb#19393)

(cherry picked from commit d8ae68f)
Y-- pushed a commit to motherduckdb/public-duckdb that referenced this pull request Oct 17, 2025
This cherry picks the functionality from duckdb#19420

---------

Co-authored-by: Mark <[email protected]>
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Oct 21, 2025
Fixes for CTE (de)serialization compatibility with older versions (duckdb/duckdb#19393)
BUGFIX: Silent failure to write row groups with large lists (duckdb/duckdb#19376)
Throw if non-`VARCHAR` key is passed to `json_object` (duckdb/duckdb#19365)
add test tag support [vfs integration tests p1] (duckdb/duckdb#19331)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Oct 21, 2025

Co-authored-by: krlmlr <[email protected]>
@Mytherin Mytherin deleted the cteserdefixes branch December 4, 2025 11:31