Improve performance of TReentrantLock
#3099
Conversation
adjustRead was returning the sum of all read locks in the map, but the only places it's used both say it returns the number of locks held by the current fiber. Replaced summing an iterable of the map's values with a single map lookup.
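The fix described above can be sketched in plain Scala. This is an illustrative model only; `ReadLockState` and its field names are hypothetical, not ZIO's actual internals.

```scala
// Illustrative sketch of the adjustRead fix; ReadLockState and readersByFiber
// are hypothetical names, not ZIO's actual internal representation.
final case class ReadLockState(readersByFiber: Map[Long, Int]) {
  // What the old code effectively computed: an O(n) sum over every fiber's count.
  def totalReadLocks: Int = readersByFiber.values.sum

  // What the callers actually document: this fiber's count, a single O(1) lookup.
  def readLocksHeld(fiberId: Long): Int = readersByFiber.getOrElse(fiberId, 0)
}
```

Besides matching the documented behaviour, the single lookup avoids walking the whole map on every acquire/release.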
Force-pushed e613f55 to cf5fe94
@mijicd I've taken a reasonably detailed look at the code (just within …). Apart from the 'fix' I've done with adjustRead, the only other idea I've come up with is to return Unit from all the acquire/release functions. This seems to improve performance - the question is, do we need the number of read/write locks when we create/release them? I imagine the usual way of using the …
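The signature question being discussed can be sketched like this. The traits and names below are purely illustrative assumptions, not ZIO's actual TReentrantLock API.

```scala
// Hypothetical sketch of the two signature options under discussion;
// these traits are illustrative, not ZIO's actual API.
trait UnitAcquire {
  // Proposed shape: acquire returns Unit, so no count needs to be computed
  // on the hot path; callers that want the count query it separately.
  def acquireRead(fiberId: Long): Unit
  def readLocksHeld(fiberId: Long): Int
}

// Minimal in-memory implementation just to make the sketch concrete.
final class SimpleLock extends UnitAcquire {
  private var held: Map[Long, Int] = Map.empty
  def acquireRead(fiberId: Long): Unit =
    held = held.updated(fiberId, held.getOrElse(fiberId, 0) + 1)
  def readLocksHeld(fiberId: Long): Int = held.getOrElse(fiberId, 0)
}
```

The trade-off is exactly the one raised here: returning an Int forces every acquire/release to compute the held count, while returning Unit moves that cost to the callers who actually need it.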
```scala
def writeLocks: STM[Nothing, Int] = data.get.map(_.fold(_ => 0, _.writeLocks))
```
Maybe don't put the scaladoc on a private method.
My intuition is always to run 15+ iterations for both phases; it can help with reducing the noise. Regarding the signature change, I don't think it's unreasonable to use …
@jdegoes what are your thoughts on swapping the return type to …?
@jdegoes The magic @unclebob418 is referring to was the trick with locks we spoke about. My hunch here is that …
I am fine swapping return types to … @mijicd Yes, we should create a ticket for dealing with "retry storms". Honestly, optimizing this lock is going to come down to two things: …
I'm already on retry storming. I'd say point 1 should be handled here, together with changing the signatures as you suggested.
Force-pushed 309d7c7 to 02e48d6
@mijicd @jdegoes I've done some more work on this using the unsafe get and set that @mijicd recently added. It gives a nice performance boost over and above swapping to return …; two alternatives: …
@KamalKang any thoughts on the above? Do you use the lock counts?
@unclebob418 How do these compare to the baseline, pre-optimization?
@jdegoes that's the blue bars; I probably should have called it baseline instead of benchmark, but I didn't :) I reran that when I rebased. It's a significant percentage, but still nothing compared to the Java ReentrantLock (which I also added a benchmark for) or StampedLock.
@unclebob418 Are you open to improving the benchmark in this pull request? There are some problems with it: …

To solve these problems: both the Java and ZIO implementations should read/write lock manually and explicitly, in sequence; and these should be repeated a lot inside the benchmark methods (e.g. 10,000 times). This is separate from JMH iterations. For the STM one, you can stay inside the STM monad, and just commit the repetition. Do these changes make sense?
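The suggested benchmark shape can be sketched as follows. This is a hedged sketch with assumed names, not the PR's actual JMH code, and it uses `java.util.concurrent.locks.ReentrantReadWriteLock` to stand in for the Java side of the comparison.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock

// Sketch of the suggested shape: repeat lock/unlock many times inside a single
// benchmark invocation so per-invocation overhead is amortised. This repeat
// count is separate from JMH's own warmup/measurement iterations.
val Repeats = 10000 // e.g. 10,000 repetitions, per the suggestion above

val javaLock = new ReentrantReadWriteLock()

def javaReadBenchmark(): Unit = {
  var i = 0
  while (i < Repeats) {
    javaLock.readLock().lock()   // explicit, sequential lock ...
    javaLock.readLock().unlock() // ... and unlock; nothing else is measured
    i += 1
  }
}
```

For the STM variant, the same repetition would stay inside the STM transaction and be committed once, as the comment suggests, so that the commit cost is also amortised over the repeated acquisitions.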
@jdegoes I should be able to manage that. Once I've done that, I'll compare Unit/Int return types with the new benchmark, and we can make a call on whether we're willing to sacrifice a bit of performance for better ergonomics. I don't really want to rewrite all the tests unless we're committed to that.
Gut feel is that with fiber overhead reduced in the benchmarks, the disparity will be greater, but it's better to measure these things!
@unclebob418 you can check out the TMap benchmarks, where we added the amortisation @jdegoes mentioned for all single-element operations.
benchmarks/src/main/scala/zio/stm/TReentrantLockBenchmark.scala (outdated, resolved)
```scala
def reentrantLockRead(): Unit =
  for (_ <- calls) {
    reentrantLock.lock()
    doWork()
```
I would delete doWork and doWorkM; we don't want to measure the cost of doing any work, even a function call. We want to measure only the lock/unlock overhead.
Won't this have the effect of significantly reducing contention? The locks will be held for such a short time that they'll rarely conflict.
OK, you're right about that. Let's do ZIO.yieldTo for doWorkM.
For doWork, we are going to have to do a sleep, but the question is how long? If we want it to be fair, it would be the same "size" as doWorkM (which involves submitting work to a thread pool, and having a thread pick up and execute that work).
I did fiddle with this a bit; the main issue is that even the smallest sleep (1 ms) kills so many iterations that we probably need to run the benchmark longer to get an accurate measure. I think we probably need to do something crazy like actually update some data structure. Something like the ZIO "compute pi" exercise: both the readers and writers have to do something to give some level of contention (updating the state, or computing in vs. out for the readers), but that will still be less than 1 ms, and it's a somewhat legitimate use case; the whole reason for having locks is to prevent concurrent updates in this sort of situation. (I'm always suss on benchmarks that don't do anything somewhat useful in the real world.)
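The "do a little real work under the lock" idea could be sketched like this. It is an assumed workload in the spirit of the "compute pi" exercise mentioned above (Monte-Carlo sampling), using Java's ReentrantReadWriteLock as a stand-in; the class and method names are illustrative, not benchmark code from this PR.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock

// Hypothetical workload sketch: writers update shared state, readers compute
// over it, so locks are held long enough to contend without any sleep.
final class PiEstimator {
  private val rw     = new ReentrantReadWriteLock()
  private var inside = 0L // samples that landed inside the quarter circle
  private var total  = 0L // all samples

  // Writer path: record one Monte-Carlo sample under the write lock.
  def record(hit: Boolean): Unit = {
    rw.writeLock().lock()
    try {
      if (hit) inside += 1
      total += 1
    } finally rw.writeLock().unlock()
  }

  // Reader path: compute the current pi estimate under the read lock.
  def estimate: Double = {
    rw.readLock().lock()
    try if (total == 0) 0.0 else 4.0 * inside / total
    finally rw.readLock().unlock()
  }
}
```

Both paths do a small amount of genuinely useful work, so the benchmark measures lock behaviour under realistic contention rather than a millisecond-scale sleep.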
I'm not going to have time for the rest of the week to work on this, so how about we merge these changes (still a significant performance improvement) and revisit the benchmarks when I have time? Created #3469 for benchmark improvements.
Force-pushed 3e9778b to 0447dc0
Force-pushed 2c69f82 to 475d1ee
Force-pushed 475d1ee to 42d6d04
Can you post the results before and after the change, just for reference? Besides that, the changes you made look good, and I agree with the idea to tweak the benchmark in #3469.
Sure, here's before and after @mijicd, and in text: …
@unclebob418 yes :)
#3082