Conversation

@bertlebee
Contributor Author

adjustRead was returning the sum of all read locks in the map, but both places it's used document it as returning the number of locks held by the current fiber. Replaced summing an iterable of the map's values with a single map lookup.
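A rough sketch of the shape of that change; the class, field, and method names here are illustrative stand-ins for the lock's internal read-lock state, not the actual TReentrantLock code:

```scala
import zio.Fiber

// Illustrative only: `readLocks` stands in for the internal map of
// fiber IDs to the number of read locks each fiber holds.
final case class ReadLockState(readLocks: Map[Fiber.Id, Int]) {

  // Before: summed every fiber's count, i.e. the total number of read locks.
  def totalReadLocks: Int = readLocks.values.sum

  // After: a single lookup returning only the current fiber's count,
  // which is what the callers actually need.
  def readLocksHeldBy(fiberId: Fiber.Id): Int =
    readLocks.getOrElse(fiberId, 0)
}
```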

@bertlebee bertlebee force-pushed the improve-treentrantlock-performance branch from e613f55 to cf5fe94 on March 14, 2020 at 20:43
@bertlebee
Contributor Author

@mijicd I've taken a reasonably detailed look at the code (just within TReentrantLock, haven't looked at TRef at all). The adjustRead changes mentioned above seem to have improved performance a bit, but not as much as I originally thought. I'm not really trusting the benchmarks: I was getting some inconsistent results, so I decided to up the warm-up and measurement runs to 20 each and ran it 3 times in a row overnight with nothing else open. As you can see below, the fluctuation in some of the tests is significantly more than the reported error. Any ideas what may be causing this? I have a theory that sometimes contention spirals out of control and may in rare cases significantly slow down throughput (potentially for the rest of the run), but have no idea how I'd test that.
[screenshot: benchmark results for the three overnight runs]

Apart from the 'fix' I've done with adjustRead, the only other idea I've come up with is to return Unit from all the acquire/release functions. This seems to improve performance; the question is, do we need the number of read/write locks when we acquire/release them? I imagine the usual way of using the TReentrantLock is to use writeLock and readLock and not actually care how many read/write locks there are, only that you have one. If you do care for some reason, you could still get the value fairly easily.

*/
def writeLocks: STM[Nothing, Int] = data.get.map(_.fold(_ => 0, _.writeLocks))

/**
Member

Maybe not put the scaladoc on private method.

@mijicd
Member

mijicd commented Mar 17, 2020

My intuition is always to run 15+ iterations for both phases; it can help reduce the noise. Regarding the signature change, I don't think it's unreasonable to use Unit (although a penalty will be paid in obtaining the count).

@bertlebee
Contributor Author

@jdegoes what are your thoughts on swapping the return type to Unit? It seems to improve performance in most (but not all) cases (this is showing totals of 3 runs of 20 iterations each, run overnight with nothing else open).
[screenshot: benchmark totals across the 3 runs]
@mijicd is working on some other magic at a deeper level which will hopefully help a lot more.

@mijicd
Member

mijicd commented Mar 18, 2020

@jdegoes The magic @unclebob418 is referring to is the trick with locks we spoke about. My hunch here is that collect is being dreadful due to retries, but I have no proof of that. Perhaps benchmarking it separately might confirm that (or not)?

@jdegoes
Member

jdegoes commented Mar 18, 2020

I am fine with swapping return types to Any to avoid the overhead of flatMapping to Unit.

@mijicd Yes, we should create a ticket for dealing with "retry storms".

Honestly, optimizing this lock is going to come down to two things:

  1. Minimizing allocations in the happy path, which will require careful inspection and translation from one form to another.
  2. Core STM performance work, which has degraded a bit in recent times due to necessary features like trampolining.
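A minimal sketch of the return-type change under discussion; the trait and method names are illustrative only, not the actual TReentrantLock API:

```scala
import zio.stm.STM

// Illustrative shapes only, not the real TReentrantLock code.
trait AcquireShapes {
  // Current shape: every acquire computes and returns the updated lock count.
  def acquireWrite: STM[Nothing, Int]

  // Proposed shape: returning Any (or Unit) lets callers that only need
  // "I now hold the lock" skip computing the count, and avoids tacking a
  // discarding step onto every transaction just to erase the Int.
  def acquireWriteUncounted: STM[Nothing, Any]
}
```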

@mijicd
Member

mijicd commented Mar 18, 2020

I'm already on retry storming. I'd say point 1 should be handled here, together with changing the signatures as you suggested.

@CLAassistant

CLAassistant commented Mar 20, 2020

CLA assistant check
All committers have signed the CLA.

@bertlebee bertlebee force-pushed the improve-treentrantlock-performance branch 4 times, most recently from 309d7c7 to 02e48d6 on April 21, 2020 at 02:28
@bertlebee
Contributor Author

@mijicd @jdegoes I've done some more work on this using the unsafe get and set that @mijicd recently added. It gives a nice performance boost over and above swapping to return Unit (and I could do it without flatMapping to Unit). The unsafe changes also provide cheaper access to the number of locks. I guess the question is how much performance we are willing to sacrifice for a nicer (and more performant) experience when a user cares about the number of locks.

Two alternatives:

  1. Provide both options (which would require code duplication).
  2. A middle ground where we provide the number of write locks but not read locks (the WriteLock count is available for free; it's the ReadLock count that costs). I don't like the inconsistency here.

Benchmark results:
[screenshot: benchmark results]

@bertlebee
Contributor Author

@KamalKang any thoughts on the above? Do you use the lock counts?

@jdegoes
Member

jdegoes commented Apr 24, 2020

@unclebob418 How do these compare from baseline, pre-optimization?

@bertlebee
Contributor Author

@jdegoes that's the blue bars; I probably should have called it baseline instead of benchmark, but I didn't :) I reran that when I rebased. It's a significant percentage, but still nothing compared to the Java reentrant lock (which I also added a benchmark for) or stamped lock.

@jdegoes
Member

jdegoes commented Apr 25, 2020

@unclebob418 Are you open to improving the benchmark in this pull request?

There are some problems with it:

  1. First, writeLock and readLock are used directly, which utilize ZManaged. That gives you safety, ensuring you release what you acquire, but it also adds overhead, and Java's locks don't have this feature, so it's not a fair comparison. So instead of using functions that use ZManaged, we should acquire and release manually.
  2. Second, for purely functional code, it does not make sense to benchmark one operation in isolation and have JMH re-run it a lot. That's because for JMH to execute a purely functional test, it must call unsafeRun. This function creates a fiber (with a lot of resources) and uses a thread, so you end up measuring the overhead of fibers, and not the overhead of the operation itself.

To solve these problems: both Java and ZIO implementations should read/write lock manually and explicitly, in sequence; and these should be repeated a lot inside the benchmark methods (e.g. 10,000 times). This is separate from JMH iterations.

For the STM one, you can stay inside the STM monad, and just commit the repetition.
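A rough sketch of what that amortised shape might look like for the write lock; this is illustrative only (not the benchmark in this PR), and details such as using `Runtime.default` and building the repetition with a fold are assumptions:

```scala
import org.openjdk.jmh.annotations.{ Benchmark, Scope, State }
import zio.Runtime
import zio.stm.{ STM, TReentrantLock }

@State(Scope.Benchmark)
class AmortizedLockBenchmark {
  private val runtime = Runtime.default
  private val lock    = runtime.unsafeRun(TReentrantLock.make.commit)

  // Acquire and release explicitly (no ZManaged), and repeat the pair many
  // times inside a single STM program, so one unsafeRun/commit is amortised
  // over all 10,000 lock cycles instead of paying fiber start-up per cycle.
  @Benchmark
  def zioWriteLockCycles(): Unit = {
    val once: STM[Nothing, Unit] =
      (lock.acquireWrite *> lock.releaseWrite).unit
    val repeated =
      (1 to 10000).foldLeft[STM[Nothing, Unit]](STM.succeed(()))((acc, _) => acc *> once)
    runtime.unsafeRun(repeated.commit)
  }
}
```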

Do these changes make sense?

@bertlebee
Contributor Author

@jdegoes I should be able to manage that. Once I've done that, I'll compare Unit/Int return types with the new benchmark and we can make a call on whether we're willing to sacrifice a bit of performance for better ergonomics. I don't really want to rewrite all the tests unless we're committed to that.

@bertlebee
Contributor Author

My gut feel is that with fiber overhead reduced in the benchmarks, the disparity will be greater, but it's better to measure these things!

@mijicd
Member

mijicd commented Apr 25, 2020

@unclebob418 you can check out the TMap benchmarks, where we added the amortisation @jdegoes mentioned for all single-element operations.

def reentrantLockRead(): Unit =
  for (_ <- calls) {
    reentrantLock.lock()
    doWork()
    reentrantLock.unlock()
  }
Member

I would delete doWork and doWorkM; we don't want to measure the cost of doing any work, even a function call. We want to measure only the lock/unlock overhead.

Contributor Author

Won't this have the effect of significantly reducing contention? The locks will be held for such a short time that they'll rarely conflict.

Member

OK, you're right about that. Let's do ZIO.yieldTo for doWorkM.

For doWork, we are going to have to do a sleep, but the question is how long? If we want it to be fair, it would be the same "size" as doWorkM (which involves submitting work to a thread pool, and having a thread pick up and execute that work).

Contributor Author

I did fiddle with this a bit; the main issue is that even the smallest sleep (1 ms) kills so many iterations that we'd probably need to run the benchmark longer to get an accurate measure. I think we probably need to do something like actually update a data structure. Something like the ZIO "compute pi" exercise: both the readers and writers have to do something to create some level of contention (updating the state, or computing in vs out for the readers), but that will still take less than 1 ms, and it's a somewhat legitimate use case; the whole reason for having locks is to prevent concurrent updates in this sort of situation (I'm always suspicious of benchmarks that don't do anything somewhat useful in the real world).
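One purely illustrative sketch of that kind of workload, in the spirit of the "compute pi" exercise; this is not the benchmark code in this PR (see #3469 for the follow-up), and all names here are hypothetical:

```scala
import java.util.concurrent.ThreadLocalRandom
import java.util.concurrent.locks.ReentrantReadWriteLock

// Writers add a random sample under the write lock; readers compute the
// current pi estimate under the read lock. Both sides do a small amount of
// real work while holding the lock, so the benchmark keeps some contention.
final class PiEstimate {
  private val rwLock = new ReentrantReadWriteLock()
  private var inside = 0L
  private var total  = 0L

  def addSample(): Unit = {
    rwLock.writeLock().lock()
    try {
      val x = ThreadLocalRandom.current().nextDouble()
      val y = ThreadLocalRandom.current().nextDouble()
      if (x * x + y * y <= 1.0) inside += 1
      total += 1
    } finally rwLock.writeLock().unlock()
  }

  def estimate(): Double = {
    rwLock.readLock().lock()
    try if (total == 0L) 0.0 else 4.0 * inside.toDouble / total
    finally rwLock.readLock().unlock()
  }
}
```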

@bertlebee
Contributor Author

I'm not going to have time to work on this for the rest of the week, so how about we merge these changes (still a significant performance improvement) and revisit the benchmarks when I have time?

Created #3469 for benchmark improvements.

@bertlebee bertlebee force-pushed the improve-treentrantlock-performance branch from 3e9778b to 0447dc0 on April 26, 2020 at 23:48
@bertlebee bertlebee changed the title from "WIP Improve performance of TReentrantLock" to "Improve performance of TReentrantLock" on Apr 27, 2020
@bertlebee bertlebee force-pushed the improve-treentrantlock-performance branch 3 times, most recently from 2c69f82 to 475d1ee on May 2, 2020 at 00:00
@bertlebee bertlebee force-pushed the improve-treentrantlock-performance branch from 475d1ee to 42d6d04 on May 2, 2020 at 23:09
@bertlebee bertlebee requested a review from mijicd May 3, 2020 02:59
@mijicd
Member

mijicd commented May 3, 2020

Can you post the results before and after the change, just for reference? Besides that, the changes you made look good, and I agree with the idea to tweak the benchmark in #3469.

@bertlebee
Contributor Author

bertlebee commented May 3, 2020

Sure, here's before and after @mijicd
[chart: before/after throughput comparison]

and in text

[info] Benchmark                                                            Mode  Cnt        Score       Error  Units      Run        Change
[info] TReentrantLockBenchmark.ZioLockBasic                                thrpt   20   522139.251   11491.685  ops/s        1        before
[info] TReentrantLockBenchmark.ZioLockBasic:zioLockReadGroup               thrpt   20   230120.090    3995.350  ops/s        1        before
[info] TReentrantLockBenchmark.ZioLockBasic:zioLockWriteGroup              thrpt   20   292019.160    8303.931  ops/s        1        before
[info] TReentrantLockBenchmark.ZioLockHighContention                       thrpt   20   629125.356   22723.223  ops/s        1        before
[info] TReentrantLockBenchmark.ZioLockHighContention:zioLockReadGroup3     thrpt   20   304528.013   11244.138  ops/s        1        before
[info] TReentrantLockBenchmark.ZioLockHighContention:zioLockWriteGroup3    thrpt   20   324597.343   11495.718  ops/s        1        before
[info] TReentrantLockBenchmark.ZioLockLowContention                        thrpt   20   600956.357   25310.179  ops/s        1        before
[info] TReentrantLockBenchmark.ZioLockLowContention:zioLockReadGroup1      thrpt   20   475434.587   18639.291  ops/s        1        before
[info] TReentrantLockBenchmark.ZioLockLowContention:zioLockWriteGroup1     thrpt   20   125521.770    6680.379  ops/s        1        before
[info] TReentrantLockBenchmark.ZioLockMediumContention                     thrpt   20   628972.304   10211.665  ops/s        1        before
[info] TReentrantLockBenchmark.ZioLockMediumContention:zioLockReadGroup2   thrpt   20   409846.666    7833.818  ops/s        1        before
[info] TReentrantLockBenchmark.ZioLockMediumContention:zioLockWriteGroup2  thrpt   20   219125.638    2442.258  ops/s        1        before
[info] TReentrantLockBenchmark.ZioLockBasic                                thrpt   20   599344.626   25511.630  ops/s        2         after
[info] TReentrantLockBenchmark.ZioLockBasic:zioLockReadGroup               thrpt   20   299051.943   13191.109  ops/s        2         after
[info] TReentrantLockBenchmark.ZioLockBasic:zioLockWriteGroup              thrpt   20   300292.684   12327.459  ops/s        2         after
[info] TReentrantLockBenchmark.ZioLockHighContention                       thrpt   20   715508.996    9512.984  ops/s        2         after
[info] TReentrantLockBenchmark.ZioLockHighContention:zioLockReadGroup3     thrpt   20   360064.931    4667.825  ops/s        2         after
[info] TReentrantLockBenchmark.ZioLockHighContention:zioLockWriteGroup3    thrpt   20   355444.065    5042.625  ops/s        2         after
[info] TReentrantLockBenchmark.ZioLockLowContention                        thrpt   20   726110.785   20294.538  ops/s        2         after
[info] TReentrantLockBenchmark.ZioLockLowContention:zioLockReadGroup1      thrpt   20   581371.937   15294.006  ops/s        2         after
[info] TReentrantLockBenchmark.ZioLockLowContention:zioLockWriteGroup1     thrpt   20   144738.848    5064.187  ops/s        2         after
[info] TReentrantLockBenchmark.ZioLockMediumContention                     thrpt   20   697996.173   15483.251  ops/s        2         after
[info] TReentrantLockBenchmark.ZioLockMediumContention:zioLockReadGroup2   thrpt   20   468654.310    9765.954  ops/s        2         after
[info] TReentrantLockBenchmark.ZioLockMediumContention:zioLockWriteGroup2  thrpt   20   229341.862    5751.922  ops/s        2         after

@bertlebee
Contributor Author

@jdegoes @mijicd can we please merge this?

@mijicd
Member

mijicd commented May 5, 2020

@unclebob418 yes :)

@mijicd mijicd merged commit b999532 into zio:master May 5, 2020
@bertlebee bertlebee deleted the improve-treentrantlock-performance branch September 21, 2020 23:24