Fix #16836: rewrite main column data in case of an update that only modifies the validity #16851
Merged
Mytherin merged 2 commits into duckdb:v1.2-histrionicus on Mar 27, 2025
…only modifies the validity
…lues when reading files created by older versions of DuckDB
On second thought, I've reworked the PR to revert #15737 instead. While we can fix this on the writing side, we would still produce incorrect results when reading files created by older versions of DuckDB unless we revert that PR. In effect, we can no longer rely on dictionary compression correctly containing …
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request on Apr 8, 2025
Fix duckdb/duckdb#16836: rewrite main column data in case of an update that only modifies the validity (duckdb/duckdb#16851)
Fixes #16836
This regression was caused by #15737
Effectively that change introduced an optimization for dictionary-compressed data where the validity data would be read directly from the dictionary - instead of being read from the separate validity data. This is possible because dictionary-compressed data stores validity data (at offset 0 in the dictionary).
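As a rough illustration of why validity can be recovered from dictionary-compressed data, here is a toy Python model. It is not DuckDB's storage code: the function names and the list-based layout are hypothetical, and the only detail taken from the description above is that the dictionary reserves offset 0 for NULL.

```python
def dict_encode(values):
    """Toy dictionary encoding; slot 0 of the dictionary stands for NULL."""
    dictionary = [None]   # offset 0 is reserved for NULL (as described above)
    index_of = {}         # value -> position in the dictionary
    codes = []            # one dictionary index per row
    validity = []         # separate validity data, one flag per row
    for v in values:
        if v is None:
            codes.append(0)
            validity.append(False)
            continue
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index_of[v])
        validity.append(True)
    return dictionary, codes, validity


def validity_from_codes(codes):
    """The shortcut: a row is valid exactly when its code is not 0."""
    return [c != 0 for c in codes]


dictionary, codes, validity = dict_encode(["a", None, "b", "a"])
```

Right after a write, both sources of validity agree in this model, which is what makes reading it from the dictionary codes tempting.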
However, when doing an UPDATE, we would not rewrite the dictionary data when changing only the validity - which would then cause the dictionary column to no longer contain the new (updated) validity data. The fix here is to also rewrite the main column data when updating the validity data.
Note that we currently do this for all primitive types - we could limit this to compression methods (like dictionary) that need this - but we can leave that for a future PR. (CC @Tishj).
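The failure mode and the write-side fix described here can be sketched with a toy Python model. This is only an illustration under simplifying assumptions: `ToyColumn` and all of its methods are hypothetical names, and the real change operates on DuckDB's column segments rather than Python lists.

```python
class ToyColumn:
    """Toy dictionary-compressed column with separate validity data."""

    def __init__(self, values):
        self._write(values)

    def _write(self, values):
        # (Re)write the main column data and the validity data together.
        self.dictionary = [None]  # offset 0 stands for NULL
        index_of = {}
        self.codes = []
        for v in values:
            if v is None:
                self.codes.append(0)
                continue
            if v not in index_of:
                index_of[v] = len(self.dictionary)
                self.dictionary.append(v)
            self.codes.append(index_of[v])
        self.validity = [v is not None for v in values]

    def scan_separate(self):
        # Read path that consults the separate validity data.
        return [self.dictionary[c] if ok else None
                for c, ok in zip(self.codes, self.validity)]

    def scan_fast(self):
        # Optimized read path: validity is implied by the dictionary code.
        return [self.dictionary[c] for c in self.codes]

    def set_null_validity_only(self, row):
        # Update that touches only the validity data; the codes go stale.
        self.validity[row] = False

    def set_null_with_rewrite(self, row):
        # Write-side fix: also rewrite the main column data.
        values = self.scan_separate()
        values[row] = None
        self._write(values)


stale = ToyColumn(["a", "b"])
stale.set_null_validity_only(1)

fixed = ToyColumn(["a", "b"])
fixed.set_null_with_rewrite(1)
```

In this model, `stale.scan_fast()` still returns the old value for row 1 while `stale.scan_separate()` returns NULL, and `fixed.scan_fast()` returns NULL as expected. A file already written in the stale state stays stale regardless of how later writes behave.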