The idea is to use a binary instead of lists to hold messages
in a quorum queue's incoming queue.
Enqueue = appending to a binary
Dequeue = matching the front of a binary
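
A minimal sketch of these two operations follows. The module name `binq_sketch` and the fixed record layout (a 7-byte Raft index followed by a 4-byte message size) are illustrative assumptions based on the byte counts given in this message, not necessarily the code in this commit:

```erlang
-module(binq_sketch).
-export([new/0, in/3, out/1]).

%% Hypothetical layout per message: RaIdx as 56 bits, Size as 32 bits.
%% Note: the bit syntax silently truncates integers that do not fit.

%% An empty queue is an empty binary.
new() ->
    <<>>.

%% Enqueue: append one fixed-width record to the end of the binary.
in(RaIdx, Size, Q) when is_binary(Q), is_integer(RaIdx), is_integer(Size) ->
    <<Q/binary, RaIdx:56, Size:32>>.

%% Dequeue: match one record at the front; Rest is a sub binary, not a copy.
out(<<RaIdx:56, Size:32, Rest/binary>>) ->
    {{RaIdx, Size}, Rest};
out(<<>>) ->
    empty.
```

Each record occupies 11 bytes, so `byte_size/1` of the queue divided by 11 gives the number of queued messages.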
Per message memory usage prior to this commit:
```
erts_debug:size([[1|2]]).
4
erts_debug:size([[1|2], [3|4]]).
8
```
4 words * 8 bytes (on a 64 bit machine) = 32 bytes
Per message memory usage after this commit:
7 bytes to encode the RaIdx + 4 bytes to encode the message size = 11 bytes
The message size encoding can be optimised further to take
fewer than 3 bytes most of the time.
Hence, let's say 10 bytes per message.
Let's assume we have a 3 node cluster with 500 quorum queues and
200,000 messages per queue. That's 100 million messages across
all queues.
Prior to this commit this would require a total of 9.6 GB of memory:
100,000,000 msgs * 32 bytes * 3 nodes = 9.6 GB
After this commit this requires a total of only 3 GB of memory:
100,000,000 msgs * 10 bytes * 3 nodes = 3 GB of memory
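
The arithmetic above can be reproduced with a small helper. This is only a sketch: the module and function names are hypothetical, and `overhead_bytes/1` assumes the per-message overhead is the `erts_debug:size/1` word count multiplied by the machine word size:

```erlang
-module(qq_mem_sketch).
-export([overhead_bytes/1, total_bytes/3]).

%% Heap size of a term in bytes: words reported by erts_debug:size/1
%% multiplied by the word size (8 on a 64-bit machine).
overhead_bytes(Term) ->
    erts_debug:size(Term) * erlang:system_info(wordsize).

%% Cluster-wide overhead: every node holds a replica of every message.
total_bytes(Msgs, BytesPerMsg, Nodes) ->
    Msgs * BytesPerMsg * Nodes.
```

`qq_mem_sketch:total_bytes(500 * 200000, 32, 3)` returns 9600000000 (9.6 GB) and `qq_mem_sketch:total_bytes(500 * 200000, 10, 3)` returns 3000000000 (3 GB).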
If there is a message TTL policy set for these queues the savings will
be even more significant because prior to this commit the per message
memory overhead is 6 words * 8 bytes = 48 bytes:
```
erts_debug:size([[1|[2|5000]]]).
6
erts_debug:size([[1|[2|5000]], [3|[4|5000]]]).
12
```
Prior to this commit this would require a total of 14.4 GB of memory:
100,000,000 msgs * 48 bytes * 3 nodes = 14.4 GB
If we assume an additional 6 bytes to encode the expiration timestamp in milliseconds,
the binary encoding would require a total of only 4.8 GB of memory:
100,000,000 msgs * 16 bytes * 3 nodes = 4.8 GB
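
One way a 16-byte record with an expiration timestamp could be laid out is sketched below. The field widths (7 bytes for the index, 3 bytes for the size, 6 bytes for the timestamp) and all names are assumptions chosen to match the 16-byte estimate, not the encoding of this commit:

```erlang
-module(binq_ttl_sketch).
-export([encode/3, decode/1, is_expired/2]).

%% Hypothetical layout: RaIdx:56, Size:24, ExpiryMs:48 = 16 bytes.
%% 48 bits of milliseconds cover several thousand years.
encode(RaIdx, Size, ExpiryMs) ->
    <<RaIdx:56, Size:24, ExpiryMs:48>>.

%% Match one record at the front and return the remaining binary.
decode(<<RaIdx:56, Size:24, ExpiryMs:48, Rest/binary>>) ->
    {{RaIdx, Size, ExpiryMs}, Rest}.

%% A message is expired once the current time has reached its timestamp.
is_expired(ExpiryMs, NowMs) ->
    NowMs >= ExpiryMs.
```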
The problem with the binary approach is that appending to a binary at a
rate of 100,000 msgs/s is catastrophically slow:
```
java -jar target/perf-test.jar -qq -u qq1 -x 1 -y 0 -C 1000000
```
* sends at 90766 msg/s prior to this commit
* sends at 3435 msg/s after this commit
After this commit, >80% of CPU time is spent in function
`__memmove_evex_unaligned_erms` copying binary data around.
Appending frequently to the binary becomes even slower as the binary
gets longer.
The only practical solution would be a hybrid approach:
* lists are used by default
* lists will always be used for enqueueing messages (i.e. prepending to the list)
* a binary will be created whenever many new messages, e.g. 100,000, have
accumulated. Creating such a binary will be relatively fast, as
explained in https://www.erlang.org/doc/system/binaryhandling.html :
"Appending data to a binary as in the example is efficient because it
is specially optimized by the runtime system to avoid copying the Acc
binary every time."
* Once the binary has been created, pattern matching at the front and therefore
dequeueing messages will be fast.
* The binaries themselves will be stored in lists (a list with 10
binaries means ~1 million messages in total in our example).
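
The hybrid approach could be sketched roughly as follows. Everything here is hypothetical (module name, record fields, the 11-byte record layout, and the choice to pack a partially filled list when the consumer catches up); it only illustrates the list-for-enqueue, binary-for-dequeue split described above:

```erlang
-module(hybrid_q_sketch).
-export([new/1, in/3, out/1]).

%% front:  binary currently being dequeued from
%% chunks: packed binaries, oldest first
%% rear:   newly enqueued messages as [RaIdx | Size], newest first
-record(q, {threshold, front = <<>>, chunks = [], rear = [], rear_len = 0}).

new(Threshold) when is_integer(Threshold), Threshold > 0 ->
    #q{threshold = Threshold}.

%% Enqueue always prepends to the list; once Threshold messages have
%% accumulated they are packed into a single binary chunk.
in(RaIdx, Size, #q{rear = Rear, rear_len = Len, threshold = T} = Q)
  when Len + 1 >= T ->
    Chunk = pack([[RaIdx | Size] | Rear]),
    Q#q{chunks = Q#q.chunks ++ [Chunk], rear = [], rear_len = 0};
in(RaIdx, Size, #q{rear = Rear, rear_len = Len} = Q) ->
    Q#q{rear = [[RaIdx | Size] | Rear], rear_len = Len + 1}.

%% Dequeue matches at the front of the current binary, moving on to the
%% next chunk (or packing the pending list) when it is exhausted.
out(#q{front = <<RaIdx:56, Size:32, Rest/binary>>} = Q) ->
    {{RaIdx, Size}, Q#q{front = Rest}};
out(#q{chunks = [Chunk | Chunks]} = Q) ->
    out(Q#q{front = Chunk, chunks = Chunks});
out(#q{rear = [_ | _] = Rear} = Q) ->
    out(Q#q{front = pack(Rear), rear = [], rear_len = 0});
out(#q{}) ->
    empty.

%% Build one binary in a single pass, appending to the accumulator.
pack(NewestFirst) ->
    lists:foldl(fun([RaIdx | Size], Acc) ->
                        <<Acc/binary, RaIdx:56, Size:32>>
                end, <<>>, lists:reverse(NewestFirst)).
```

With a threshold of 100,000, each chunk holds 100,000 messages in about 1.1 MB, and the list of chunks stays short, so appending to it with `++` is cheap.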
It's questionable though whether such an approach is worth the added complexity
given that queues are meant to be kept short anyway.