@mijicd
Member

@mijicd mijicd commented Mar 24, 2020

The current implementation of ZSTM suffers from "retry storms." For example, short transactions can force long-running ones to retry.

To verify this behavior, I've set up the following benchmark:

  • make a TRef containing a list of 10000 elements
  • have multiple short transactions updating only the head of the list
  • have one transaction increment all elements
  • race them

Running this benchmark on the current implementation produces the following result:

Benchmark                             Mode  Cnt    Score   Error  Units
STMRetryBenchmark.mixedTransactions  thrpt   15    0.125 ± 0.219  ops/s

The solution to this problem is to count the number of retries and to acquire a global lock once this count exceeds a predefined limit. Holding the global lock should allow a long-running transaction to complete. To avoid paying the high "price" of locking too often, the maximum number of retries should be chosen from the range [10, 100]. Based on the results of my trial benchmarks, I've decided to set it to 10.
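
A minimal sketch of the retry-limit idea, written against plain `java.util.concurrent` rather than ZSTM. It is not the code from this PR: the class name `RetryLimitSketch`, the `update` helper, and the use of a read/write lock as the "global lock" are all assumptions made for illustration. Optimistic attempts commit under the shared read lock; once `MAX_RETRIES` attempts have failed, the transaction takes the exclusive write lock, so no competing commit can invalidate it.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.UnaryOperator;

// Hypothetical illustration; names and structure are not taken from ZIO.
final class RetryLimitSketch {
    // Limit used in the PR description; the rest of this class is an assumption.
    static final int MAX_RETRIES = 10;

    // Stand-in for the global lock: shared for optimistic commits,
    // exclusive for a transaction that has exhausted its retries.
    private static final ReentrantReadWriteLock GLOBAL = new ReentrantReadWriteLock();

    /** Atomically replaces the value in ref with f(value); returns the number of failed attempts. */
    static <A> int update(AtomicReference<A> ref, UnaryOperator<A> f) {
        int retries = 0;
        while (retries < MAX_RETRIES) {
            A seen = ref.get();
            A next = f.apply(seen); // possibly long computation, done without any lock
            GLOBAL.readLock().lock();
            try {
                // Commit only if nobody changed the value in the meantime.
                if (ref.compareAndSet(seen, next)) {
                    return retries;
                }
            } finally {
                GLOBAL.readLock().unlock();
            }
            retries++;
        }
        // Too many conflicts: block all other commits and run to completion.
        GLOBAL.writeLock().lock();
        try {
            ref.set(f.apply(ref.get()));
            return retries;
        } finally {
            GLOBAL.writeLock().unlock();
        }
    }
}
```

In this sketch the cost of the fallback is that every other commit waits for as long as the exclusive lock is held, which is why the limit should not be set too low.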

With this change in place, the benchmark mentioned above produced the following result:

Benchmark                             Mode  Cnt    Score   Error  Units
STMRetryBenchmark.mixedTransactions  thrpt   15  425.153 ± 4.321  ops/s

@mijicd mijicd requested a review from jdegoes March 24, 2020 10:30
@adamgfraser
Contributor

Does there need to be some concept of how long a lock can last? With this change, if one transaction is extremely long, it can prevent all other transactions from executing indefinitely. This seems problematic because a local issue in one transaction can make the entire system of STM transactions involving this reference unresponsive.

@mijicd
Member Author

mijicd commented Mar 24, 2020

@adamgfraser I have to think about that one. It obviously "prioritizes" the ones that were facing numerous retries, but I'm not sure whether we can cut them off safely.

@jdegoes
Member

jdegoes commented Mar 24, 2020

@adamgfraser We should open a ticket for that /cc @mijicd

@jdegoes jdegoes merged commit 896b193 into zio:master Mar 24, 2020
@mijicd mijicd deleted the prevent-retry-storms branch March 24, 2020 13:59