@neko-kai
Member

@neko-kai neko-kai commented Sep 22, 2019

Also, prevent potential volatility problems with accessing interruptStack from multiple threads by moving interruptible state into Suspended

@neko-kai neko-kai requested a review from jdegoes September 22, 2019 17:26
@neko-kai
Member Author

A clean reproduction for testing:

import java.util.concurrent.atomic.{AtomicLong, LongAdder}

import zio.{UIO, _}

import scala.language.reflectiveCalls

object ZioInterruptLeakOrDeadlockRepo extends zio.App {

  val leakedCounter = new AtomicLong(0L)

  val startedCounter = new LongAdder
  val completedCounter = new LongAdder
  val awakeCounter = new LongAdder
  val pendingGauge = new LongAdder

  def run(args: List[String]): ZIO[Environment, Nothing, Int] = {
    val leakOrDeadlockTest = for {
      _ <- UIO {
        startedCounter.increment()
        pendingGauge.increment()
      }

      sleepInterruptFiber <- IO.never.fork

      // This blocks / leaks once every 20,000 rounds or so
      _ <- sleepInterruptFiber.interrupt
        .ensuring(UIO {
          awakeCounter.increment()
          pendingGauge.decrement()
        })

      _ <- UIO(completedCounter.increment())
    } yield ()

    val main = for {
      _ <-
        ZIO.runtime[Any].map(_.Platform.executor.metrics.get)
          .flatMap {
            metrics => UIO(new Thread(() => {
              while (true) {
                println(s"started=${startedCounter.longValue()} awake=${awakeCounter.longValue()} completed=${completedCounter.longValue()} pending=${pendingGauge.longValue()} queued=${metrics.size}")
                Thread.sleep(1000L)
              }
            }).start())
          }

      _ <- leakOrDeadlockTest.forever
    } yield 0

    main
  }
}

@regiskuckaertz
Member

wow, you found it! Awesome detective work 🕵

@neko-kai
Member Author

Race scenario:

fiber1: kill0 atomically updates the state from Running to Running.
fiber2: BEFORE interruptible=false, but AFTER that atomic update, sets state=Suspended and exits.
fiber1: exits, because based on the state it read previously it still thinks fiber2 is Running.
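
To make the scenario above concrete, here is a minimal sketch of the general technique named in the PR description: keeping the `interrupted` flag inside the same atomic reference as the Running/Suspended state, so that "check the flag" and "change the state" cannot be interleaved by another thread. Everything here (`AtomicInterruptSketch`, `Fiber`, `trySuspend`, and the shape of `State`) is hypothetical and heavily simplified; it is not ZIO's actual FiberContext code.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Hypothetical, simplified model; NOT ZIO's actual FiberContext.
object AtomicInterruptSketch {
  sealed trait State
  // `interrupted` is part of the state, so it is always read and written
  // together with the Running/Suspended transition in a single CAS.
  final case class Running(interrupted: Boolean) extends State
  final case class Suspended(interruptible: Boolean, interrupted: Boolean) extends State

  final class Fiber {
    private val state = new AtomicReference[State](Running(interrupted = false))

    /** Called by the interrupting thread.
      * Returns true if the caller took over a suspended fiber and must resume it. */
    @tailrec
    final def interrupt(): Boolean = state.get match {
      case s @ Running(_) =>
        // The fiber is running: record the request; it will see it before suspending.
        if (state.compareAndSet(s, Running(interrupted = true))) false else interrupt()
      case s @ Suspended(true, _) =>
        // Suspended and interruptible: the interrupter wins the CAS and wakes it.
        if (state.compareAndSet(s, Running(interrupted = true))) true else interrupt()
      case s @ Suspended(false, _) =>
        // Suspended but uninterruptible: only remember the request.
        if (state.compareAndSet(s, Suspended(interruptible = false, interrupted = true))) false
        else interrupt()
    }

    /** Called by the fiber's own thread before going to sleep.
      * Returns false if it must not suspend (interruption is pending, or it is already suspended). */
    @tailrec
    final def trySuspend(interruptible: Boolean): Boolean = state.get match {
      case Running(true) if interruptible => false
      case s @ Running(interrupted) =>
        // If interrupt() changed the state in between, the CAS fails and we re-check.
        if (state.compareAndSet(s, Suspended(interruptible, interrupted))) true
        else trySuspend(interruptible)
      case _: Suspended => false
    }
  }
}
```

In this sketch, whichever thread loses the compare-and-set re-reads the state, so the interrupter can never act on a stale "Running" observation while the fiber goes to sleep unnoticed.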

@regiskuckaertz
Seems John also found it at the same time, you could say it was a race :D Let's wait for John's feedback first!

}

- case Executing(status, observers0) =>
+ case Executing(status, observers0, _) =>
This extra flag is necessary in your fix. In my fix the model changed a bit, so pushing the "set" prior to the atomic ref update should in theory take care of it. But this looks good for now!

jdegoes
jdegoes previously approved these changes Sep 23, 2019
@jdegoes
Member

jdegoes commented Sep 23, 2019

@neko-kai Can you add the test?

@neko-kai neko-kai force-pushed the feature/Fix-#785-#1441-interrupt-deadlock branch from b83f879 to 069d2f7 Compare September 24, 2019 23:05
@neko-kai
Member Author

@jdegoes I've added a test that just runs IO.fork.interrupt 10K times. Multiplied across test builds, it should probably be able to catch a regression over many builds.

I've also added a benchmark that does the same IO.fork.interrupt and should definitely start hanging on regression. I've also added a benchmark for race that highlights the perf issue in #1441 - our race is quite a bit slower 😱

[info] Benchmark                                (size)   Mode  Cnt    Score     Error  Units
[info] IOEmptyRaceBenchmark.catsEmptyRace         1000  thrpt   10  315.196 ± 398.678  ops/s
[info] IOEmptyRaceBenchmark.monixEmptyRace        1000  thrpt   10  192.894 ±   7.672  ops/s
[info] IOEmptyRaceBenchmark.zioEmptyRace          1000  thrpt   10   33.152 ±   0.572  ops/s
[info] IOEmptyRaceBenchmark.zioTracedEmptyRace    1000  thrpt   10   25.166 ±   1.548  ops/s
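
The "run it 10K times and see whether it hangs" idea behind the regression test can be sketched with the standard library alone. This is a generic harness under stated assumptions, not the test that was added to RTSSpec: `StressSketch` and `firstHang` are hypothetical names, and the `round` function stands in for one fork-and-interrupt iteration.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical helper; the names are illustrative only.
object StressSketch {
  /** Runs `round` up to `times` times, each on the given ExecutionContext.
    * Returns the index of the first round that did not finish within
    * `timeout`, or None if every round completed. Exceptions thrown by
    * `round` propagate to the caller. */
  def firstHang(times: Int, timeout: FiniteDuration)(round: Int => Unit)(
      implicit ec: ExecutionContext
  ): Option[Int] =
    (0 until times).find { i =>
      try {
        Await.result(Future(round(i)), timeout)
        false // finished in time, keep going
      } catch {
        // The hung round keeps occupying its pool thread; we only report it.
        case _: TimeoutException => true
      }
    }
}
```

A rare race that shows up roughly once in 20,000 rounds will still be missed by most individual runs of such a loop, which is why the comment above leans on many builds to catch a regression.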

…riable by moving `interrupted` into atomic state

Also, prevent potential volatility problems with accessing interruptStack from multiple threads by moving `interruptible` state into Suspended
@neko-kai neko-kai force-pushed the feature/Fix-#785-#1441-interrupt-deadlock branch from 069d2f7 to aa06c05 Compare September 25, 2019 22:08
@neko-kai
Member Author

I believe another inconsistency with exitAsync() can happen as witnessed by the following code:

IO.effectAsyncMaybe[Nothing, Unit] {
  k =>
    Future {
      Thread.sleep(200L)
      k(IO.dieMessage("INCOHERENT CONTINUE"))
    }
    Some(IO.unit)
}
  .flatMap(_ => ZIO.never)
  .ensuring(IO.dieMessage("CAN'T HAPPEN!"))
  .uninterruptible

The late callback "overtakes" the fiber from an async that was already "cancelled", but only if the fiber enters suspension at an opportune time.

@neko-kai
Member Author

neko-kai commented Sep 25, 2019

There's also a remaining race with evaluateLater(ZIO.interrupt) in kill0, illustration:

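import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

import zio._
import zio.duration._

object ZioIncoherentInterruptAsyncRepro extends zio.App {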

  def run(args: List[String]): ZIO[Environment, Nothing, Int] = {
    val test =
      ZIO.effectAsync[Environment, Nothing, Unit](k =>
        Future {
          Thread.sleep(200L)
          k(console.putStrLn("INCOHERENT CONTINUE"))
        })
        .ensuring(console.putStrLn("Async finalizer!") *> ZIO.never)
        .ensuring(console.putStrLn("CAN'T HAPPEN!"))

    test.fork
      .flatMap(_.interrupt.delay(50.millis))
      .as(0)
  }
}

Basically, every single point where multi-threaded access can happen in a non-atomic or non-guarded way is a problem (:
Luckily, we only really have async + interrupt as these points.
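
One common way to guard such a point is a one-shot wrapper around the async callback, so that a late resume and a cancellation cannot both take effect. The sketch below only illustrates that general pattern; `OneShotSketch` and `OneShot` are hypothetical names, and this is not the fix that was later discussed in #1842 and #1843.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical guard; illustrative only, not ZIO code.
object OneShotSketch {
  /** Wraps `callback` so that exactly one of resume/cancel wins. */
  final class OneShot[A](callback: A => Unit) {
    private val claimed = new AtomicBoolean(false)

    /** Invokes the callback only if nobody resumed or cancelled before.
      * Returns true if this call delivered the value. */
    def resume(a: A): Boolean =
      if (claimed.compareAndSet(false, true)) { callback(a); true }
      else false

    /** Marks the callback as dead. Returns true if cancellation won the race. */
    def cancel(): Boolean = claimed.compareAndSet(false, true)
  }
}
```

With this guard, a callback that fires 200 ms after the fiber was interrupted would find the slot already claimed and be dropped instead of continuing the fiber.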

@neko-kai
Member Author

Will address these two - #1792 (comment), #1792 (comment) - separately.

@neko-kai neko-kai merged commit 8b44dcb into zio:master Sep 27, 2019
@iravid
Member

iravid commented Sep 27, 2019

@neko-kai Is the performance of race improved after these fixes?

@neko-kai
Member Author

@iravid
Nope, I just added a benchmark that showcases the bad perf. Will open an issue about race perf too!

@iravid
Member

iravid commented Sep 27, 2019

Ah too bad. Ok :-)

@neko-kai
Member Author

@iravid @jdegoes
Race perf moved here – #1844
More async races - #1842, #1843

Twizty pushed a commit to Twizty/zio that referenced this pull request Nov 13, 2019
…riable by moving `interrupted` into atomic state (zio#1792)

* Fix zio#785, Fix zio#1441, prevent race condition on `interrupted` variable by moving `interrupted` into atomic state
Also, prevent potential volatility problems with accessing interruptStack from multiple threads by moving `interruptible` state into Suspended

* return enterAsync after-check, do not memoize `interruptible` in enterAsync loop

* Add benchmarks for fork-interrupt and empty race (fork-interrupt reproduces the deadlock scenario)

* Add regression test in RTSSpec

* Return case class match

* Refactor flatten-effectTotal to effectSuspendTotal

* add missing private in FiberContext