[Art][Wal]Unbound index allocations #19901
65dca39 to
795dcef
Compare
795dcef to
9e8ded0
Compare
taniabogatsch
left a comment
Hi! Looks great! I just left a bunch of nits and then this is ready to go in from my side. :)
taniabogatsch
left a comment
Just a few more comments / questions.
dc66093 to
b8cc6e8
Compare
b8cc6e8 to
bfe0aad
Compare
taniabogatsch
left a comment
No more comments from my side! Let's run CI? :)
Yep, thanks for the review!
Thanks for the PR! Perhaps a simpler and more efficient solution here could be to share the
We could have two separate We would have the following collections: With the following nodes: This has a number of advantages:
Thanks @Mytherin, that's a great idea, going to rewrite it!
…playing buffered index operations
5647eb0 to
e9376c6
Compare
taniabogatsch
left a comment
Thanks for the changes, looking so shiny now haha - left a few comments. :)
@taniabogatsch Thank you for the review! I will run the CI now.
Looks great, thanks for the changes!
[Art][Wal]Unbound index allocations (duckdb/duckdb#19901) Null assertion on denormalized_table argument (duckdb/duckdb#19947) Co-authored-by: krlmlr <[email protected]>
Follow up to #19477, fix for https://github.com/duckdblabs/duckdb-internal/issues/6613
The previous PR added support for buffering and replaying WAL index deletes. However, it introduced a memory over-allocation issue: the UnboundIndex stored a vector of BufferedIndexData, and each BufferedIndexData held one buffered operation in its own ColumnDataCollection. This was extremely wasteful, because with interleaved operations (insert -> delete -> ...) a single operation to be replayed would be stored in a ColumnDataCollection with an internal allocation of STANDARD_VECTOR_SIZE.
EDIT: See @Mytherin's comment below. This PR fixes the issue by changing the way buffering works: we now use two buffers, one for inserts and another for deletes. Since inserts and deletes may be interleaved, we also need an additional vector that stores the replay operations and their intervals within the respective buffer. This is all stored in BufferedIndexReplays within UnboundIndex.
Buffering data is much simpler now: we append directly to either the insert or the delete ColumnDataCollection, and then either append a ReplayRange node or, if the last node has the same operation type, extend its range.
Replaying is also more efficient, as we now maintain two interleaved scans over the respective contiguous ColumnDataCollections, fetching one DataChunk at a time to replay.
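To make the buffering and replay scheme above concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `BufferedReplays`, `ReplayType`, `Buffer`, and `Replay` are hypothetical names, a plain `std::vector<int64_t>` stands in for each ColumnDataCollection, and replay walks the buffers row by row rather than one DataChunk at a time. Only the idea of two buffers plus a vector of ranges is taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not DuckDB's actual types.
enum class ReplayType : uint8_t { Insert, Delete };

// One contiguous run of operations of the same type.
struct ReplayRange {
    ReplayType type;
    size_t start; // inclusive offset into the buffer for `type`
    size_t end;   // exclusive offset into the buffer for `type`
};

struct BufferedReplays {
    std::vector<int64_t> inserts;    // stands in for the insert ColumnDataCollection
    std::vector<int64_t> deletes;    // stands in for the delete ColumnDataCollection
    std::vector<ReplayRange> ranges; // records the original interleaving

    // Append rows to the matching buffer, then extend the last range if it
    // has the same type, or start a new range otherwise.
    void Buffer(ReplayType type, const std::vector<int64_t> &rows) {
        if (rows.empty()) {
            return;
        }
        auto &buffer = type == ReplayType::Insert ? inserts : deletes;
        const size_t start = buffer.size();
        buffer.insert(buffer.end(), rows.begin(), rows.end());
        if (!ranges.empty() && ranges.back().type == type) {
            ranges.back().end = buffer.size();
        } else {
            ranges.push_back({type, start, buffer.size()});
        }
    }

    // Visit every buffered row in its original order. Each buffer is only
    // ever read front to back, so this amounts to two interleaved scans.
    template <class FN>
    void Replay(FN &&apply) const {
        for (const auto &range : ranges) {
            const auto &buffer = range.type == ReplayType::Insert ? inserts : deletes;
            for (size_t i = range.start; i < range.end; i++) {
                apply(range.type, buffer[i]);
            }
        }
    }
};
```

In this sketch, buffering insert, insert, delete, insert produces three ranges rather than four separate allocations, and the two buffers stay contiguous no matter how often the operation type alternates.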