Support Bulk Queue Operations Directly in the RingBuffer #4585
Conversation
```scala
def zioQueueParallel(): Int = {

  val io = for {
    offers <- IO.forkAll {
```
Use `forkAll_`, since you don't want to measure the overhead of the join concatenation.
I don't think we can use `forkAll_` here, because we need a fiber that we can await to ensure that the work was completed.
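The distinction being debated can be illustrated with a plain-threads analogy (this is not ZIO code; the names here are illustrative). The `forkAll_` variant discards the composite fiber, leaving nothing to await; keeping a joinable handle is what lets the benchmark know the work actually finished:

```scala
// Plain-threads sketch of "fork then join": keeping the handles is what
// allows us to await completion, which is why the discarding variant
// (analogous to forkAll_) cannot be used here.
object ForkJoinSketch {
  def run(parallelism: Int): Int = {
    val counter = new java.util.concurrent.atomic.AtomicInteger(0)

    // "fork": start the workers, keeping handles we can later await
    val workers = List.fill(parallelism)(new Thread(() => { counter.incrementAndGet(); () }))
    workers.foreach(_.start())

    // "join": awaiting the handles guarantees the work completed
    workers.foreach(_.join())
    counter.get()
  }
}
```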
```scala
  val io = for {
    offers <- IO.forkAll {
      List.fill(parallelism) {
```
Probably best to fill this list outside of the benchmark.
Done.
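The suggestion above is a standard benchmarking hygiene point. A minimal sketch of the shape (names are illustrative, not the PR's code): allocate the list once at setup time so the measured method times the queue operations rather than `List.fill` allocation.

```scala
// Sketch of hoisting setup work out of the measured path: `tasks` is built
// once, so the benchmark method only pays for what it is meant to measure.
object BenchmarkSetupSketch {
  val parallelism = 8

  // computed once at setup time, not per benchmark invocation
  val tasks: List[Int] = List.fill(parallelism)(0)

  // the measured method only consumes the pre-built list
  def measured(): Int = tasks.map(_ + 1).sum
}
```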
```scala
          repeat(totalSize * 1 / chunkSize * 1 / parallelism)(zioQ.offerAll(chunk).unit)
        }
      }
    takes <- IO.forkAll {
```
Ditto.
```scala
      }
    }
    takes <- IO.forkAll {
      List.fill(parallelism) {
```
Ditto.
```scala
    else task.flatMap(_ => repeat(task, max - 1))

  val io = for {
    _ <- repeat(zioQ.offerAll(chunk).unit, totalSize / chunkSize)
```
Consider ZIO#repeatN.
This skews the benchmark pretty severely because repeatN inserts a yield between every repetition.
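The shape of the `repeat` helper being defended here can be sketched in plain Scala (the PR's version is monadic over `IO`; this analogy is illustrative): a simple tail-recursive loop repeats the task with no scheduler yield between repetitions, which is the property `ZIO#repeatN` would break.

```scala
// Tail-recursive repeat: runs `task` exactly `max` times back to back,
// with nothing (such as a yield to the scheduler) inserted in between.
object RepeatSketch {
  @annotation.tailrec
  def repeat(task: () => Unit, max: Int): Unit =
    if (max < 1) ()
    else { task(); repeat(task, max - 1) }

  def count(n: Int): Int = {
    var c = 0
    repeat(() => c += 1, n)
    c
  }
}
```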
```scala
      }
    }
    takes <- IO.forkAll {
      List.fill(parallelism) {
```
Ditto.
```scala
while (enqHead < enqTail) {
  val a = iterator.next()
  curIdx = posToIdx(enqHead, capacity)
  buf(curIdx) = a.asInstanceOf[AnyRef]
```
This is tricky logic to verify manually. I suggest we add a stress test that can bombard a ring buffer with concurrent operations (of the newly added variety) and verify invariants. This can be a robust way to catch race conditions in concurrent code.
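The stress-test shape being suggested can be sketched as follows (using `ConcurrentLinkedQueue` as a stand-in for the `RingBuffer`, and a conservation invariant as the property; the actual test in the PR may differ): hammer the structure from many threads, then check that nothing was lost or duplicated.

```scala
// Illustrative stress test: concurrent producers offer a known set of
// elements, concurrent consumers drain, and we verify the conservation
// invariant that every offered element was taken exactly once.
object StressSketch {
  import java.util.concurrent.ConcurrentLinkedQueue
  import java.util.concurrent.atomic.AtomicInteger

  def run(parallelism: Int, perProducer: Int): Boolean = {
    val q     = new ConcurrentLinkedQueue[Integer]()
    val taken = new AtomicInteger(0)

    // phase 1: concurrent producers race to offer their elements
    val producers = List.tabulate(parallelism) { p =>
      new Thread(() => (0 until perProducer).foreach { i => q.offer(p * perProducer + i); () })
    }
    producers.foreach(_.start())
    producers.foreach(_.join())

    // phase 2: concurrent consumers race to drain the queue
    val consumers = List.fill(parallelism)(new Thread(() => {
      var a = q.poll()
      while (a != null) { taken.incrementAndGet(); a = q.poll() }
    }))
    consumers.foreach(_.start())
    consumers.foreach(_.join())

    // invariant: count conserved, queue fully drained
    taken.get() == parallelism * perProducer && q.isEmpty
  }
}
```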
Added.
Overall, this looks great! I love the fact that it does not involve changes to existing code.
I had a few suggestions for improving the benchmark. My main suggestion is that we need at least one stress test that can verify invariants hold under concurrent load on the newly added operations. Then it will be good to merge!
@jdegoes I added concurrency stress tests and refined the benchmarks. Results are similar. Here is current performance. Chunking is marginally faster for parallel operations and significantly faster for sequential operations. Here are the same benchmarks on this branch. The benchmarks without chunking are the same because there is no change to that code. With chunking, performance dramatically increases in both the parallel and sequential cases.
Excellent work! 💪
Operations on the `RingBuffer` take place in three phases: checking the `seq` value of a space to determine whether it is available, doing a compare and swap on `head` or `tail` to claim the space, and updating the `seq` value.

Unfortunately, given the concurrent nature of the `RingBuffer`, there is no way to determine whether a group of spaces is available other than checking each of their `seq` values. However, we can take advantage of bulk operations by doing a compare and swap once to reserve a whole block of spaces after we determine whether they are available. This reduces the number of compare and swap operations from `n` to `1`. The disadvantage is that there is more time between getting the initial state and doing the compare and set, since we have to check the `seq` values in between.

As a baseline, here is the performance of the existing `ZQueue` benchmarks on my local machine for 1,000 elements with no chunking, on `master`. Note that this is with my latest optimizations to `takeBetween`.

Here is the performance, also on `master`, of a new benchmark that is exactly the same except that it uses `offerAll` and `takeBetween` with a chunk size of 10.

Here is the performance of the same benchmark on this branch.
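The bulk-reservation idea described above, reducing `n` compare-and-swap operations to one, can be sketched in miniature (this is not the actual `RingBuffer` code; the `Claims` class and its names are illustrative):

```scala
// Minimal sketch of block reservation: a single CAS on `tail` claims a whole
// block of n slots at once, rather than one CAS per element. In the real
// structure, the seq values of the target spaces would be checked between
// reading `tail` and attempting the CAS.
object BulkReserveSketch {
  import java.util.concurrent.atomic.AtomicLong

  final class Claims {
    private val tail = new AtomicLong(0L)

    // returns the start position of the claimed block,
    // or -1 if another producer won the race
    def tryReserve(n: Int): Long = {
      val t = tail.get()
      if (tail.compareAndSet(t, t + n)) t else -1L
    }
  }

  def demo(): (Long, Long) = {
    val c = new Claims
    // uncontended, successive reservations claim adjacent blocks
    (c.tryReserve(10), c.tryReserve(10))
  }
}
```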
One risk with this strategy is that under very high contention we could get caught repeatedly retrying. It could be interesting to explore tracking how many times we have retried a bulk operation and either reverting to the single-element case or doing exponential backoff on the chunk size we attempt to offer.
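The backoff idea floated above could be sketched like this (a hypothetical shape, not anything in the PR; `attempt` stands in for a bulk reservation attempt at a given size):

```scala
// On each failed bulk attempt, halve the chunk size we try to reserve,
// degrading toward the single-element case under heavy contention.
object ChunkBackoffSketch {
  // `attempt` returns true if the bulk reservation succeeded at that size;
  // returns the chunk size that finally succeeded (or 1, the fallback)
  def offerWithBackoff(chunkSize: Int)(attempt: Int => Boolean): Int = {
    var size = chunkSize
    while (size > 1 && !attempt(size)) size /= 2
    size
  }
}
```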