implement combined union and intersection aggregation operation #810
SUMMARY
Adds a new method to `FastAggregation` that allows a large union, which will later be intersected with a smaller bitmap, to be executed in the context of that later intersection. The motivation is large `IN` clauses intersected with other filters within `WHERE` clauses, e.g.

This could be implemented two ways as a user of the library at present:

`FastAggregation.or`

I benchmarked against each of these. In my simplistic benchmarks there are some cases where the user-side code wins by a little, but there are cases where the new approach wins by a lot. The goal is to provide an implementation of this common combined aggregation that is the quickest on average across scenarios, i.e. the one you would pick if you could only pick one.
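For comparison, one user-side route (build the whole union, as `FastAggregation.or` does, then intersect with the smaller bitmap) looks roughly like the sketch below. It is an illustration under an assumption: `java.util.BitSet` stands in for `RoaringBitmap` so the example is self-contained, and `UnionThenIntersect`/`orThenAnd` are hypothetical names. The point it shows is that the full union is materialized even though most of it may be thrown away by the final `and`.

```java
import java.util.BitSet;

/** Hypothetical baseline: materialize the full union, then intersect. BitSet stands in for RoaringBitmap. */
public final class UnionThenIntersect {
    private UnionThenIntersect() {}

    /** Computes (inClause[0] | ... | inClause[n-1]) & small by building the whole union first. */
    public static BitSet orThenAnd(BitSet small, BitSet... inClause) {
        BitSet union = new BitSet();
        for (BitSet b : inClause) {
            union.or(b); // the union grows to cover every value of every IN-clause bitmap
        }
        union.and(small); // most of the union may be discarded here
        return union;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1);
        a.set(5);
        BitSet b = new BitSet();
        b.set(5);
        b.set(900_000);
        BitSet small = new BitSet();
        small.set(5);
        small.set(7);
        System.out.println(orThenAnd(small, a, b)); // prints {5}
    }
}
```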
The approach is to iterate over the keys of the later-intersected bitmap and perform the union of the other bitmaps only for those keys, before intersecting and appending to the result bitmap. This keeps memory usage bounded: the union for each key is computed into a single 8 kB bitset (2^16 bits), which avoids allocating extra storage in array and run containers and avoids reallocating containers to adaptively choose the best container type; that work is delayed until just before appending.
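The key-driven strategy described above can be sketched as follows. This is a simplified, hypothetical model rather than the PR's code: `ToyBitmap` stands in for `RoaringBitmap` and stores every 16-bit chunk as a 65,536-bit (8 kB) bitset, whereas real Roaring also uses array and run containers and would choose the best container type at append time. Only the keys of the smaller bitmap are visited, each key's union is built in one reused scratch buffer, and a new container is allocated only when a non-empty result is appended.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public final class KeyDrivenOrAnd {
    private KeyDrivenOrAnd() {}

    /** 2^16 bits per container = 1024 longs = 8 kB, one container per 16-bit high key. */
    static final int WORDS = 1 << 10;

    /** Hypothetical toy bitmap: high 16 bits of a value -> 65,536-bit container. */
    public static final class ToyBitmap {
        final TreeMap<Integer, long[]> containers = new TreeMap<>();

        public void add(int value) {
            int low = value & 0xFFFF;
            containers.computeIfAbsent(value >>> 16, k -> new long[WORDS])[low >>> 6] |= 1L << low;
        }

        public boolean contains(int value) {
            long[] c = containers.get(value >>> 16);
            int low = value & 0xFFFF;
            return c != null && (c[low >>> 6] & (1L << low)) != 0;
        }

        public int containerCount() {
            return containers.size();
        }
    }

    /** Computes (inClause[0] | ... | inClause[n-1]) & small, visiting only the keys of small. */
    public static ToyBitmap orThenAnd(ToyBitmap small, ToyBitmap... inClause) {
        ToyBitmap result = new ToyBitmap();
        long[] scratch = new long[WORDS]; // single 8 kB buffer reused for every key
        for (Map.Entry<Integer, long[]> entry : small.containers.entrySet()) {
            int key = entry.getKey();
            Arrays.fill(scratch, 0L);
            boolean touched = false;
            for (ToyBitmap b : inClause) {
                long[] c = b.containers.get(key);
                if (c == null) continue; // this IN-clause bitmap has nothing under this key
                touched = true;
                for (int i = 0; i < WORDS; i++) scratch[i] |= c[i];
            }
            if (!touched) continue; // no union for this key, so nothing to intersect
            long[] smallContainer = entry.getValue();
            boolean nonEmpty = false;
            for (int i = 0; i < WORDS; i++) {
                scratch[i] &= smallContainer[i];
                nonEmpty |= scratch[i] != 0;
            }
            // Allocate only for non-empty results; real Roaring would pick the best container type here.
            if (nonEmpty) result.containers.put(key, scratch.clone());
        }
        return result;
    }

    public static void main(String[] args) {
        ToyBitmap a = new ToyBitmap();
        a.add(1);
        a.add(70_000);
        ToyBitmap b = new ToyBitmap();
        b.add(5);
        b.add(200_000);
        ToyBitmap small = new ToyBitmap();
        small.add(5);
        small.add(70_000);
        small.add(300_000);
        ToyBitmap r = orThenAnd(small, a, b);
        System.out.println(r.contains(5) + " " + r.contains(70_000) + " " + r.contains(1)); // true true false
    }
}
```

Keys present only in the IN-clause bitmaps (here the one holding `200_000`) are never read, which is where the savings come from when the smaller bitmap covers few keys.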
I benchmarked two main cases: one where all the bitmaps are equal, so performance should be similar to `or` followed by `and`, and a contrived case that illustrates the strength of this approach, where the united bitmaps are disjoint from each other (`STEPS`) and the intersected bitmap intersects only one of the bitmaps in the `IN` clause.
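To see why the `STEPS` case favors the key-driven approach, here is a rough, hypothetical cost model that counts how many containers each strategy has to union. It is not the PR's benchmark: `steps`, `unionFirstReads` and `keyDrivenReads` are illustrative names, and the model deliberately ignores container sizes and the (cheap) per-key lookups.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical cost model: counts container unions (one per 16-bit key) for each strategy. */
public final class StepsWorkEstimate {
    private StepsWorkEstimate() {}

    /** Union-first reads every container of every IN-clause bitmap. */
    public static int unionFirstReads(List<Set<Integer>> inClauseKeys) {
        int reads = 0;
        for (Set<Integer> keys : inClauseKeys) reads += keys.size();
        return reads;
    }

    /** Key-driven reads only the containers whose key also appears in the small bitmap. */
    public static int keyDrivenReads(Set<Integer> smallKeys, List<Set<Integer>> inClauseKeys) {
        int reads = 0;
        for (Set<Integer> keys : inClauseKeys) {
            for (int k : smallKeys) {
                if (keys.contains(k)) reads++;
            }
        }
        return reads;
    }

    /** Builds n pairwise-disjoint "steps", each spanning `width` consecutive keys. */
    public static List<Set<Integer>> steps(int n, int width) {
        List<Set<Integer>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Set<Integer> keys = new HashSet<>();
            for (int k = i * width; k < (i + 1) * width; k++) keys.add(k);
            out.add(keys);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Set<Integer>> in = steps(10, 4);           // 10 disjoint bitmaps, 4 containers each
        Set<Integer> small = new HashSet<>(in.get(0));  // small bitmap overlaps only the first step
        System.out.println(unionFirstReads(in));        // 40
        System.out.println(keyDrivenReads(small, in));  // 4
    }
}
```

When all the IN-clause bitmaps are equal and the small bitmap covers every key, the two counts coincide, which is consistent with the expectation that the equal-bitmaps case should perform similarly to `or` followed by `and`.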
Automated Checks
I ran `./gradlew test` and made sure that my PR does not break any unit test.