Comparing changes

The idea is to use a binary instead of lists to hold messages in a quorum queue's incoming queue.

* Enqueue = appending to a binary
* Dequeue = matching the front of a binary

Per message memory usage prior to this commit:

```
erts_debug:size([[1|2]]).
4
erts_debug:size([[1|2], [3|4]]).
8
```

4 words * 8 bytes (on a 64 bit machine) = 32 bytes

Per message memory usage after this commit:

7 bytes to encode the RaIdx + 4 bytes to encode the message size = 11 bytes

The message size encoding can be further optimised to need fewer than 3 bytes most of the time. Hence, let's say 10 bytes per message.

Let's assume we have a 3 node cluster with 500 quorum queues and 200,000 messages per queue. That's 100 million messages across all queues.

Prior to this commit this would require a total of 9.6 GB of memory:
100,000,000 msgs * 32 bytes * 3 nodes = 9.6 GB

After this commit this requires a total of only 3 GB of memory:
100,000,000 msgs * 10 bytes * 3 nodes = 3 GB

If there is a message TTL policy set for these queues, the savings will be even more significant because prior to this commit the per message memory overhead is 6 words * 8 bytes = 48 bytes:

```
erts_debug:size([[1|[2|5000]]]).
6
erts_debug:size([[1|[2|5000]], [3|[4|5000]]]).
12
```

Prior to this commit this would require a total of 14.4 GB of memory:
100,000,000 msgs * 48 bytes * 3 nodes = 14.4 GB

If we assume an additional 6 bytes to encode the expiration timestamp in milliseconds, the binary encoding would require a total of only 4.8 GB of memory:
100,000,000 msgs * 16 bytes * 3 nodes = 4.8 GB

The problem with the binary approach is that appending to a binary at a rate of 100,000 msgs/s is catastrophically slow:

```
java -jar target/perf-test.jar -qq -u qq1 -x 1 -y 0 -C 1000000
```

* sends at 90766 msg/s prior to this commit
* sends at 3435 msg/s after this commit

After this commit, >80% of CPU time is spent in function `__memmove_evex_unaligned_erms` copying binary data around.
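For reference, the binary layout described above (7 bytes for the RaIdx plus 4 bytes for the message size) can be sketched roughly as follows. The module name `binq_sketch` and its function names are hypothetical and not taken from the commit; the sketch only illustrates "enqueue = append, dequeue = match at the front" under the assumption of fixed-width fields.

```erlang
-module(binq_sketch).
-export([new/0, enqueue/3, dequeue/1, overhead_bytes/0]).

%% Assumed fixed-width layout per message:
%% 56 bits (7 bytes) for the Ra index, 32 bits (4 bytes) for the message size.
-define(IDX_BITS, 56).
-define(SIZE_BITS, 32).

%% An empty queue is an empty binary.
new() -> <<>>.

%% Enqueue = append one fixed-width entry to the end of the binary.
enqueue(RaIdx, MsgSize, Q) when is_binary(Q) ->
    <<Q/binary, RaIdx:?IDX_BITS, MsgSize:?SIZE_BITS>>.

%% Dequeue = match one entry at the front of the binary.
dequeue(<<RaIdx:?IDX_BITS, MsgSize:?SIZE_BITS, Rest/binary>>) ->
    {{RaIdx, MsgSize}, Rest};
dequeue(<<>>) ->
    empty.

%% Per message overhead in bytes for this layout.
overhead_bytes() -> (?IDX_BITS + ?SIZE_BITS) div 8.
```

With this layout, `byte_size/1` of the queue binary grows by exactly 11 bytes per enqueued message, which is where the 11 bytes per message figure comes from.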
Appending frequently to the binary becomes even slower as the binary gets longer.

The only practical solution would be a hybrid approach:

* Lists are used by default.
* Lists will always be used for enqueueing messages (i.e. prepending to the list).
* A binary will be created whenever many new messages (e.g. 100,000) have accumulated. Creating such a binary will be relatively fast, as explained in https://www.erlang.org/doc/system/binaryhandling.html : "Appending data to a binary as in the example is efficient because it is specially optimized by the runtime system to avoid copying the Acc binary every time."
* Once the binary has been created, pattern matching at the front, and therefore dequeueing messages, will be fast.
* The binaries themselves will be stored in lists (a list with 10 binaries means ~1 million messages in total in our example).

It's questionable, though, whether such an approach is worth the added complexity given that queues are meant to be kept short anyway.
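A minimal sketch of how such a hybrid queue might look is shown below. It is not the implementation from this commit: the module name `hybrid_sketch`, the state tuple, and the configurable chunk size are all assumptions made for illustration, and the entry layout (56-bit index, 32-bit size) is the same assumed layout as in the byte counts above.

```erlang
-module(hybrid_sketch).
-export([new/1, in/2, out/1]).

%% Hypothetical state: {ChunkSize, Chunks, Tail, TailLen}
%%   Chunks  - list of binaries, oldest first; dequeue matches the head binary
%%   Tail    - list of {RaIdx, MsgSize}, newest first (enqueue = prepend)
%%   TailLen - number of entries currently in Tail
new(ChunkSize) when is_integer(ChunkSize), ChunkSize > 0 ->
    {ChunkSize, [], [], 0}.

%% Enqueue: prepend to the list; pack into a binary once ChunkSize is reached.
in(Entry, {Max, Chunks, Tail, Len}) when Len + 1 >= Max ->
    %% Chunks holds one element per Max messages, so ++ stays cheap here.
    {Max, Chunks ++ [pack([Entry | Tail])], [], 0};
in(Entry, {Max, Chunks, Tail, Len}) ->
    {Max, Chunks, [Entry | Tail], Len + 1}.

%% Dequeue: match at the front of the oldest binary.
out({Max, [<<Idx:56, Size:32, Rest/binary>> | Chunks], Tail, Len}) ->
    Chunks1 = case Rest of
                  <<>> -> Chunks;           % chunk exhausted, drop it
                  _ -> [Rest | Chunks]
              end,
    {{Idx, Size}, {Max, Chunks1, Tail, Len}};
out({_Max, [], [], _Len} = Q) ->
    {empty, Q};
out({Max, [], Tail, _Len}) ->
    %% No packed chunk available: pack the pending tail early and retry.
    out({Max, [pack(Tail)], [], 0}).

%% Build one binary in a single pass, oldest entry first, by appending
%% to an accumulator.
pack(NewestFirst) ->
    lists:foldl(fun({Idx, Size}, Acc) -> <<Acc/binary, Idx:56, Size:32>> end,
                <<>>, lists:reverse(NewestFirst)).
```

For example, `hybrid_sketch:new(100000)` would pack the tail into a binary after every 100,000 enqueued messages, so 1 million messages end up as a list of 10 binaries. Even this small sketch needs three `out/1` clauses and an explicit invariant on the state, which hints at the complexity cost mentioned above.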

Commits on Jun 15, 2025
