@richardstartin
Member

SUMMARY

Adds a new method to FastAggregation that computes a large union in the context of the smaller bitmap it will later be intersected with, so the union is only evaluated where it can contribute to the final result. The motivation is large IN clauses intersected with other filters in a WHERE clause, e.g.

select * from table 
where x in (<huge list>)
and <some selective condition>

At present, a user of the library could implement this in two ways:

  1. Materialise the IN clause with FastAggregation.or:
     RoaringBitmap.and(toIntersect, FastAggregation.or(toUnite))
  2. Avoid materialising the IN clause by intersecting the other filter with every bitmap in the IN clause:
    RoaringBitmap result = new RoaringBitmap();
    for (RoaringBitmap bitmap : toUnite) {
      result.or(RoaringBitmap.and(toIntersect, bitmap));
    }
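
Both options compute the same set, because intersection distributes over union. As a minimal sketch of that equivalence, the following uses java.util.BitSet as a stand-in for RoaringBitmap so that it runs without the library on the classpath. The class name UnionThenIntersectSketch and the helper of are hypothetical and are not part of this PR or of the RoaringBitmap API.

```java
import java.util.BitSet;
import java.util.List;

class UnionThenIntersectSketch {

  // Option 1: materialise the whole union, then intersect once.
  static BitSet orThenAnd(BitSet toIntersect, List<BitSet> toUnite) {
    BitSet union = new BitSet();
    for (BitSet bitmap : toUnite) {
      union.or(bitmap);
    }
    union.and(toIntersect);
    return union;
  }

  // Option 2: intersect every operand with the filter, then accumulate.
  static BitSet andEachThenOr(BitSet toIntersect, List<BitSet> toUnite) {
    BitSet result = new BitSet();
    for (BitSet bitmap : toUnite) {
      BitSet partial = (BitSet) bitmap.clone(); // BitSet.and mutates, so copy first
      partial.and(toIntersect);
      result.or(partial);
    }
    return result;
  }

  // Hypothetical helper: build a BitSet from the given values.
  static BitSet of(int... values) {
    BitSet bits = new BitSet();
    for (int value : values) {
      bits.set(value);
    }
    return bits;
  }
}
```

Option 1 pays for a union that may be far larger than the final result, while option 2 pays for one intersection per operand; the benchmarks below compare both against the new method.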

I benchmarked against each of these. In my simplistic benchmarks there are some cases where the end-user code wins by a little, but there are cases where the new approach wins by a lot. The goal is to provide an implementation of this common combined aggregation that you would pick if you could only pick one, because it is the quickest on average across scenarios.

Benchmark                                    (keysInIntersection)  (scenario)  (unionSize)   Mode  Cnt      Score      Error  Units
IntersectionPushdownBenchmark.andEachThenOr                    10       EQUAL           10  thrpt    5   1202.606 ±  517.195  ops/s
IntersectionPushdownBenchmark.orThenAnd                        10       EQUAL           10  thrpt    5   3257.376 ± 2000.025  ops/s
IntersectionPushdownBenchmark.orWithContext                    10       EQUAL           10  thrpt    5   3240.571 ±  118.866  ops/s

IntersectionPushdownBenchmark.andEachThenOr                    10       EQUAL          100  thrpt    5    120.835 ±    4.270  ops/s
IntersectionPushdownBenchmark.orThenAnd                        10       EQUAL          100  thrpt    5    443.392 ±    4.927  ops/s
IntersectionPushdownBenchmark.orWithContext                    10       EQUAL          100  thrpt    5    447.023 ±   20.983  ops/s

IntersectionPushdownBenchmark.andEachThenOr                    10       STEPS           10  thrpt    5  22426.356 ± 7885.007  ops/s
IntersectionPushdownBenchmark.orThenAnd                        10       STEPS           10  thrpt    5  18181.527 ± 1843.511  ops/s
IntersectionPushdownBenchmark.orWithContext                    10       STEPS           10  thrpt    5  20679.570 ±  775.225  ops/s

IntersectionPushdownBenchmark.andEachThenOr                    10       STEPS          100  thrpt    5  22711.372 ± 7253.271  ops/s
IntersectionPushdownBenchmark.orThenAnd                        10       STEPS          100  thrpt    5   5619.156 ±   71.067  ops/s
IntersectionPushdownBenchmark.orWithContext                    10       STEPS          100  thrpt    5  17813.704 ± 1690.157  ops/s

IntersectionPushdownBenchmark.andEachThenOr                   100       EQUAL           10  thrpt    5    135.118 ±    1.941  ops/s
IntersectionPushdownBenchmark.orThenAnd                       100       EQUAL           10  thrpt    5    359.642 ±    3.402  ops/s
IntersectionPushdownBenchmark.orWithContext                   100       EQUAL           10  thrpt    5    328.085 ±   11.337  ops/s

IntersectionPushdownBenchmark.andEachThenOr                   100       EQUAL          100  thrpt    5     12.512 ±    0.234  ops/s
IntersectionPushdownBenchmark.orThenAnd                       100       EQUAL          100  thrpt    5     48.731 ±    0.164  ops/s
IntersectionPushdownBenchmark.orWithContext                   100       EQUAL          100  thrpt    5     49.631 ±    1.425  ops/s

IntersectionPushdownBenchmark.andEachThenOr                   100       STEPS           10  thrpt    5   1742.549 ±   62.649  ops/s
IntersectionPushdownBenchmark.orThenAnd                       100       STEPS           10  thrpt    5   1583.091 ±   16.104  ops/s
IntersectionPushdownBenchmark.orWithContext                   100       STEPS           10  thrpt    5   1157.314 ±   11.057  ops/s

IntersectionPushdownBenchmark.andEachThenOr                   100       STEPS          100  thrpt    5   1697.878 ±   54.095  ops/s
IntersectionPushdownBenchmark.orThenAnd                       100       STEPS          100  thrpt    5    360.201 ±   14.722  ops/s
IntersectionPushdownBenchmark.orWithContext                   100       STEPS          100  thrpt    5   1133.666 ±   36.484  ops/s

IntersectionPushdownBenchmark.andEachThenOr                  1000       EQUAL           10  thrpt    5     13.444 ±    0.155  ops/s
IntersectionPushdownBenchmark.orThenAnd                      1000       EQUAL           10  thrpt    5     35.400 ±    0.775  ops/s
IntersectionPushdownBenchmark.orWithContext                  1000       EQUAL           10  thrpt    5     33.820 ±    1.003  ops/s

IntersectionPushdownBenchmark.andEachThenOr                  1000       EQUAL          100  thrpt    5      1.245 ±    0.011  ops/s
IntersectionPushdownBenchmark.orThenAnd                      1000       EQUAL          100  thrpt    5      4.879 ±    0.060  ops/s
IntersectionPushdownBenchmark.orWithContext                  1000       EQUAL          100  thrpt    5      4.928 ±    0.153  ops/s

IntersectionPushdownBenchmark.andEachThenOr                  1000       STEPS           10  thrpt    5    182.167 ±    4.351  ops/s
IntersectionPushdownBenchmark.orThenAnd                      1000       STEPS           10  thrpt    5    134.971 ±    1.055  ops/s
IntersectionPushdownBenchmark.orWithContext                  1000       STEPS           10  thrpt    5    118.371 ±    1.635  ops/s

IntersectionPushdownBenchmark.andEachThenOr                  1000       STEPS          100  thrpt    5     82.605 ±   36.085  ops/s
IntersectionPushdownBenchmark.orThenAnd                      1000       STEPS          100  thrpt    5      2.289 ±    1.291  ops/s
IntersectionPushdownBenchmark.orWithContext                  1000       STEPS          100  thrpt    5     86.826 ±    2.106  ops/s

The approach is to iterate over the keys of the later-intersected bitmap and perform the union of the other bitmaps only for those keys, before intersecting and appending to the result bitmap. This keeps memory use controlled by performing the union for each key into an 8kB bitset, which avoids allocating extra storage in array and run containers and avoids reallocating containers to adaptively choose the best container type; that work is delayed until just before appending.
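
The per-key loop described above can be sketched roughly as follows. This is a toy model, not the code in this PR: it represents a bitmap as a TreeMap from the high 16 bits of each value to a java.util.BitSet holding the low 16 bits, and the names IntersectionPushdownSketch, bitmapOf and orInContext are hypothetical. Copying the scratch bitset with clone stands in for the step where the real implementation would pick the best container type just before appending.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IntersectionPushdownSketch {

  // 65536 bits = 8 kB, the size of one scratch container.
  static final int CONTAINER_BITS = 1 << 16;

  // Toy bitmap: high 16 bits of each value -> bitset of the low 16 bits.
  static TreeMap<Integer, BitSet> bitmapOf(int... values) {
    TreeMap<Integer, BitSet> bitmap = new TreeMap<>();
    for (int value : values) {
      bitmap.computeIfAbsent(value >>> 16, key -> new BitSet(CONTAINER_BITS))
          .set(value & 0xFFFF);
    }
    return bitmap;
  }

  // Union toUnite only at the keys present in toIntersect, then intersect.
  static TreeMap<Integer, BitSet> orInContext(
      TreeMap<Integer, BitSet> toIntersect, List<TreeMap<Integer, BitSet>> toUnite) {
    TreeMap<Integer, BitSet> result = new TreeMap<>();
    BitSet scratch = new BitSet(CONTAINER_BITS); // allocated once, reused per key
    for (Map.Entry<Integer, BitSet> entry : toIntersect.entrySet()) {
      scratch.clear();
      for (TreeMap<Integer, BitSet> bitmap : toUnite) {
        BitSet container = bitmap.get(entry.getKey());
        if (container != null) {
          scratch.or(container); // keys absent from toIntersect are never visited
        }
      }
      scratch.and(entry.getValue());
      if (!scratch.isEmpty()) {
        // Keys arrive in ascending order, so this is effectively an append.
        result.put(entry.getKey(), (BitSet) scratch.clone());
      }
    }
    return result;
  }
}
```

In this model the working memory is a single 8 kB scratch bitset regardless of how many bitmaps are united, and the work done is bounded by the number of keys in the bitmap being intersected.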

I benchmarked two main cases: one where all the bitmaps are equal (EQUAL), so performance should be similar to orThenAnd, and a contrived case that illustrates the strength of this approach, where the united bitmaps are disjoint from each other (STEPS) and the intersected bitmap only intersects with one of the bitmaps in the IN clause.

Automated Checks

  • I have run ./gradlew test and made sure that my PR does not break any unit test.

@lemire
Member

lemire commented Oct 27, 2025

Can we do better for the name... maybe orThenAnd?

This PR does not have to do it, but we have tried to keep the buffer package at feature parity.

@richardstartin richardstartin force-pushed the rgs/intersection-pushdown branch from e19e261 to b6464ff Compare October 27, 2025 20:47