WIP cats-effect Concurrent instance #267

neko-kai · 2018-09-25T11:43:17Z

There are several issues:

- "acquire of bracket is uninterruptible" deadlocks – Fiber.cancel blocks forever
  - it passes if Fiber.cancel implementation is changed from fiber.interrupt to fiber.interrupt.fork.void
  - I guess it's the correct implementation given the test code – mvar.put blocks and is uninterruptible, so waiting on it can't work. @alexandru I guess cancel docs should mention that it should be 'fire-and-forget'.
- next deadlock is in "asyncF registration can be canceled", same test ported to ZIO primitives works correctly
- "async cancelable receives cancel signal" is flaky, though when ported to ZIO primitives it works correctly

alexandru · 2018-09-25T11:51:33Z

"acquire of bracket is uninterruptible" deadlocks – Fiber.cancel blocks forever

– it passes if Fiber.cancel implementation is changed from fiber.interrupt to fiber.interrupt.fork.void
– I guess it's the correct implementation given the test code – mvar.put blocks and is uninterruptible, so waiting on it can't work. @alexandru I guess cancel docs should mention that it should be 'fire-and-forget'.

It's not a fire and forget. I changed both cats.effect.IO and monix.eval.Task to back-pressure on cancel for the finalizers to finish:

That said, fiber.cancel does complete immediately if there are no finalizers installed yet. So if your mvar.put(a2) blocks due to a race condition (i.e. it sees the MVar as being full and waits for the first mvar.get), then there's nothing for fiber.cancel to wait on.

If that's not expected behavior for ZIO, I'd like to know why ... what is your fiber.interrupt waiting on?

neko-kai · 2018-09-25T11:59:18Z

@alexandru It waits for the target fiber to die completely.

That being said, wouldn't .cancel also block if there was a race + installed finalizers? I may be wrong, but it seems like a bit of a gotcha if cancel may or may not block when canceling a diverging thread depending on whether it allocated resources or not?

alexandru · 2018-09-25T12:13:23Z

In my opinion, having .cancel block on anything is generally a bad idea, whether there are finalizers installed or not. It's like waiting for acknowledgement when closing a TCP connection, which has caused endless grief, because when you're cancelling something, you're doing so in response to a race condition. I did modify our IO and Task to back-pressure on the installed finalizers in order to sequence the async finalizers of inner brackets, but with no finalizers installed, there is nothing to wait on.

That said, in this case the behavior of our .cancel has leaked into the law implementation and we'll have to fix it. I certainly don't want to impose any behavior on what cancel should wait on. Sorry about that.

alexandru · 2018-09-25T12:16:06Z

For now I would implement .cancel as fiber.interrupt.fork.void.

I'll open a ticket on the Cats-Effect side and maybe fix that law for the next release.

alexandru · 2018-09-25T12:17:33Z

@Kaishh also, thanks for doing this. Type class laws are hard to come up with because there's a natural tendency to introduce assumptions based on how the test implementation behaves. And this one slipped through.

alexandru · 2018-09-25T12:24:44Z

@Kaishh could the next 2 issues be due to the same issue? Both of them are using Fiber.cancel as well.

jdegoes · 2018-09-25T12:31:12Z

@alexandru The Haskell community lived with a killThread which returned immediately for many years before discovering this property leads to thread leaks (it makes it quite challenging to build applications with deterministic guarantees on thread usage). Now packages like Async can give you a "blocking" version of interruption. You can then regain the original semantic (if you WANT, and you usually don't want) by forking the interruption, which means this newer semantic for interruption is strictly more powerful than the older and troubled fire-and-forget method of interruption.

You can read more here.
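To make the "strictly more powerful" argument concrete, here is a minimal sketch using only the Scala standard library. It assumes a thread with a finalizer as a stand-in for a fiber; `InterruptSketch`, `Worker` and `finishFinalizer` are hypothetical names, not ZIO or cats-effect API. The `interrupt()` result completes only once the worker has fully terminated, so a caller can either wait on it or ignore it and get fire-and-forget behaviour back.

```scala
import java.util.concurrent.CountDownLatch
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object InterruptSketch {

  // Hypothetical stand-in for a fiber: a daemon thread whose finalizer
  // cannot finish until `finishFinalizer()` is called.
  final class Worker {
    private val allowFinalizer = new CountDownLatch(1)
    private val done           = Promise[Unit]()

    private val thread = new Thread(() => {
      try Thread.sleep(Long.MaxValue) // body that never finishes on its own
      catch { case _: InterruptedException => () }
      finally {
        allowFinalizer.await()        // a finalizer that takes a while
        done.success(())
      }
    })
    thread.setDaemon(true)
    thread.start()

    // "Blocking" flavour: the returned Future completes only after the
    // worker, including its finalizer, has terminated.
    def interrupt(): Future[Unit] = { thread.interrupt(); done.future }

    def finishFinalizer(): Unit = allowFinalizer.countDown()
    def terminated: Boolean     = done.isCompleted
  }

  def demo(): (Boolean, Boolean) = {
    val worker   = new Worker
    val stopping = worker.interrupt()          // fire-and-forget: do not wait on the result
    val afterFireAndForget = worker.terminated // finalizer has not been allowed to finish yet
    worker.finishFinalizer()
    Await.result(stopping, 5.seconds)          // blocking flavour: wait for full termination
    (afterFireAndForget, worker.terminated)
  }
}
```

Ignoring the returned `Future` plays the role of `fiber.interrupt.fork.void`, while awaiting it gives the deterministic "the worker is gone" guarantee. The reverse derivation is not possible: a primitive that only fires the signal gives the caller nothing to wait on.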

Btw would you be interested in making the Monix Task implement the Haskell / ZIO semantic? Ordinary Cats IO could stay the way it is if the behavior on blocking is omitted from the laws.

neko-kai · 2018-09-25T13:10:46Z

@alexandru

could the next 2 issues be due to the same issue? Both of them are using Fiber.cancel as well.

The later ones were discovered after changing Fiber.cancel. I think the canceller is not being run for some reason, but it's not obvious to me how to pinpoint it yet, since equivalent implementations with ZIO work.

neko-kai · 2018-09-25T13:13:55Z

I've also changed ContextShift implementation from IO.shift(ec) *> fa to (IO.shift(ec) *> fa).fork.flatMap(_.join) since ContextShift spec evalOn says it should return to the original thread after running the action, which IO.shift alone will not do.

jdegoes · 2018-09-25T13:28:44Z

I've also changed ContextShift implementation from IO.shift(ec) *> fa to (IO.shift(ec) *> fa).fork.flatMap(_.join) since ContextShift spec evalOn says it should return to the original thread after running the action, which IO.shift alone will not do.

Nice. 👍

The later ones were discovered after changing Fiber.cancel. I think the canceller is not being run for some reason, but it's not obvious to me how to pinpoint it yet, since equivalent implementations with ZIO work.

Not waiting for a fiber to terminate before returning could potentially introduce a race condition. If you need help with this one I can try to take a closer look.

neko-kai · 2018-09-25T22:00:50Z

@jdegoes Feel free to if you have the time! I'll have more time closer to the end of the week

alexandru · 2018-09-27T08:38:31Z

@Kaishh for testing purposes I just published a hash version with the laws being fixed, see PR typelevel/cats-effect#376:

1.0.0-1182d8c

Please test with it and see if the laws are still failing.

neko-kai · 2018-09-27T10:20:36Z

@alexandru Still diverging at 'asyncF registration can be canceled' now without .fork.void

jdegoes · 2018-09-28T15:18:21Z

@Kaishh

If you can add this to ZIO in a simplified form with ZIO data types (Promise instead of Deferred).

def asyncFRegisterCanBeCancelled[A](a: A) = {
  val lh = for {
    release <- Deferred[F, A]
    acquire <- Deferred[F, Unit]
    task = F.asyncF[Unit] { _ =>
      F.bracket(acquire.complete(()))(_ => F.never[Unit])(_ => release.complete(a))
    }
    fiber <- F.start(task)
    _ <- acquire.get
    _ <- fiber.cancel
    a <- release.get
  } yield a

  lh <-> F.pure(a)
}

And it fails, then we will investigate and fix ASAP.

The test looks fine to me.

neko-kai · 2018-09-28T21:58:40Z

@jdegoes The IO version of this test has been there since the beginning – https://github.com/scalaz/scalaz-zio/pull/267/files#diff-de7c402961556a213ccd2cf8962f18afR515

The problem is that it works

neko-kai · 2018-10-09T09:48:39Z

Now that I think about it, there's a paradigm mismatch between ZIO and ConcurrentEffect/ContextShift. cats IO's ConcurrentEffect/ContextShift are parameterised by an execution context: as long as you use a pair created from the same ec, your fibers will spawn only on that ec's pool. Frameworks such as fs2 and http4s use this to ensure that they're run within a specific thread pool – when used with ZIO their fibers will at some point unexpectedly shift back to RTS, potentially causing issues.

@alexandru Should this be a part of ConcurrentEffect/ContextShift laws? Libraries are already making this assumption in their APIs, e.g. fs2:

  /**
    * Reads all data synchronously from the file at the specified `java.nio.file.Path`.
    */
  def readAll[F[_]: Sync: ContextShift](path: Path,
                                        blockingExecutionContext: ExecutionContext,
                                        chunkSize: Int)(
      ): Stream[F, Byte]

http4s:

object BlazeClientBuilder {
  def apply[F[_]](
      executionContext: ExecutionContext,
      sslContext: Option[SSLContext] = Some(SSLContext.getDefault)): BlazeClientBuilder[F]

alexandru · 2018-10-09T10:10:43Z

ContextShift.ExecuteOn happened due to the need to execute a certain task on a very specific thread pool. Such tasks involve blocking I/O.

But when you do that, you don’t want to stay on the thread pool meant for blocking I/O for the bind continuation of that task. You want to shift back to some thread pool that you’re using for CPU-bound stuff.

This is a common pattern on the JVM that happens naturally with, say, Future or whatever Java does these days. It’s a problem created by the platform’s ability to block 1:1 threads.
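As a rough illustration of that pattern with plain scala.concurrent.Future (a sketch only; `TwoPools` and the pool names "blocking-io" and "cpu" are made up for the example and are not part of cats-effect, Monix or ZIO): the blocking step runs on its own pool, and the continuation is explicitly scheduled back on the CPU-bound pool.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration._

object TwoPools {

  // Single-threaded pool whose only thread carries the given name,
  // so we can observe where each step actually ran.
  private def pool(name: String): ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(
      Executors.newSingleThreadExecutor(new ThreadFactory {
        def newThread(r: Runnable): Thread = {
          val t = new Thread(r, name)
          t.setDaemon(true)
          t
        }
      })
    )

  // Returns (thread that ran the "blocking" step, thread that ran the continuation).
  def run(): (String, String) = {
    val blockingEc = pool("blocking-io")
    val cpuEc      = pool("cpu")

    val result =
      Future(Thread.currentThread().getName)(blockingEc)           // blocking step on its own pool
        .map(io => (io, Thread.currentThread().getName))(cpuEc)    // continuation back on the CPU pool

    try Await.result(result, 5.seconds)
    finally {
      blockingEc.shutdown()
      cpuEc.shutdown()
    }
  }
}
```

With Future the "shift back" falls out of passing a different ExecutionContext to `map`; an effect type with its own run-loop has to provide an operation for it.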

I did not want executeOn in ContextShift, however people end up doing stupid stuff if the library doesn’t provide a standard solution for this pattern. And unfortunately shift is not enough, because implementations like Monix’s Task end up auto-forking for fairness, so the effects of a shift don’t last long.

neko-kai · 2018-10-09T10:36:48Z

@alexandru What happens in monix if the task inside evalOn keeps flatMapping? Will it auto-fork back at some point, same as shift?

alexandru · 2018-10-09T11:24:58Z

@alexandru What happens in monix if the task inside evalOn keeps flatMapping? Will it auto-fork back at some point, same as shift?

With executeOn, no, Monix can keep the thread pool given until the end because I designed it to have the notion of a "default" thread pool internally, a default which can be overridden.

But yes, what you're describing is what eventually happens with solutions based on shift only, which is why the introduction of executeOn was absolutely necessary.

A solution based on shift, if you can do that, is better than nothing though. At least it switches back to your RTS, plus the blocking stuff will probably be at the beginning of the bind chain.

neko-kai · 2018-10-09T12:17:34Z

@alexandru Ok, do you think it's reasonable to add a law or spec to ContextShift that makes it mandatory for evalOn to not auto-fork back to default thread pool? I think it's important for this to be part of the spec and for ZIO to adhere to it (i.e. add a forkOn combinator), because pretty bad things can happen if blocking loops from code that already implicitly relies on this behaviour ever get shifted back to default thread pool.

alexandru · 2018-10-09T13:41:31Z

@Kaishh currently ContextShift is lawless.

The reason is that it is difficult to describe such laws. We have a TestContext implementation in Cats-Effect that can help detect that a task ends up executed on another thread-pool, but I don't know for example how to test if the shift back to the default happens and keep in mind that the implementation needs to work on JavaScript too.

Plus implementations (like the current ZIO apparently) cannot describe executeOn as it was intended, but an implementation based on shift is better than nothing, because in absence of executeOn that's what people end up doing anyway and they do it poorly.

But I guess we should at least make it clear in the documentation or something like that.

alexandru · 2018-10-09T13:44:05Z

Well, now that I think about it, a solution based on our TestContext can work.

Maybe we'll make it happen. Unfortunately I'm all caught up in Monix development atm, so it's low priority.

jdegoes · 2018-10-09T14:06:24Z

when used with ZIO their fibers will at some point unexpectedly shift back to RTS, potentially causing issues.

This is true but only by changing the yield rate, which defaults to what is essentially "no yielding".

I feel there's a contradiction between async and evalOn. Either the fiber should run on the execution context for the specified IO, or it should jump around as per async, but not both. Otherwise it's poorly defined.

alexandru · 2018-10-09T14:26:09Z

I feel there's a contradiction between async and evalOn. Either the fiber should run on the execution context for the specified IO, or it should jump around as per async, but not both. Otherwise it's poorly defined.

I think that at this point you're bringing your Fiber semantics into the picture unnecessarily.

When blocking I/O is involved, users want to execute that blocking I/O on a separate thread pool, because blocking I/O is a rare operation that also needs protection against thread starvation, and then jump back to the thread pool for CPU-bound operations afterwards. You don't want 3 or more thread pools, you want just 2 at most. That projects end up working with 3 or more in practice is only because of opinionated and poorly defined libraries.

If there is such a thing as the IO's "execution context", then users should be able to take advantage of it. Cats-Effect's IO does not have such an execution context, which is fine, but then its run-loop doesn't auto-fork.

ghostdogpr · 2018-10-28T04:43:44Z

@Kaishh @jdegoes I investigated this a little bit and found out that the ZIO version of asyncFRegisterCanBeCancelled test is wrong. If you fix it, it has the same deadlock issue as the catz test.

Currently:

def testAsyncPureCreationIsInterruptible = {
  val io = for {
    release <- Promise.make[Nothing, Int]
    acquire <- Promise.make[Nothing, Unit]
    task = IO.asyncPure[Nothing, Unit] { _ =>
      IO.bracket(acquire.complete(()))(_ => IO.never)(_ => release.complete(42).void)
    }
    fiber <- task.fork
    _ <- acquire.get
    _ <- fiber.interrupt
    a <- release.get
  } yield a

  unsafeRun(io) must_=== 42
}

The bracket arguments are in the wrong order: IO.bracket(acquire.complete(()))(_ => IO.never)(_ => release.complete(42).void) should be IO.bracket(acquire.complete(()))(_ => release.complete(42).void)(_ => IO.never). I swapped them and now the ZIO test hangs as well.
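For anyone else tripping over this, a small synchronous sketch of why a verbatim port compiles but changes meaning. It assumes only the two argument orders visible in this thread (cats-effect: acquire, use, release; ZIO: acquire, release, use); `BracketOrder`, `bracketCats`, `bracketZio` and `Steps` are toy names written for this example, not the real library methods.

```scala
import scala.collection.mutable.ListBuffer

object BracketOrder {

  // cats-effect-style order: acquire, then use, then release.
  def bracketCats[A, B](acquire: => A)(use: A => B)(release: A => Unit): B = {
    val a = acquire
    try use(a) finally release(a)
  }

  // ZIO-style order as used in this PR: acquire, then release, then use.
  def bracketZio[A, B](acquire: => A)(release: A => Unit)(use: A => B): B = {
    val a = acquire
    try use(a) finally release(a)
  }

  // Records which step ran, in order.
  final class Steps {
    val log = ListBuffer.empty[String]
    def acquire(): Unit       = { log += "acquire"; () }
    val use: Unit => Unit     = _ => { log += "use"; () }
    val release: Unit => Unit = _ => { log += "release"; () }
  }

  def catsOrder(): List[String] = {
    val s = new Steps
    bracketCats(s.acquire())(s.use)(s.release)
    s.log.toList
  }

  // Same positional arguments pasted into the ZIO-style signature:
  // it still type-checks, but the roles of use and release are swapped.
  def zioPortedVerbatim(): List[String] = {
    val s = new Steps
    bracketZio(s.acquire())(s.use)(s.release)
    s.log.toList
  }

  def zioFixed(): List[String] = {
    val s = new Steps
    bracketZio(s.acquire())(s.release)(s.use)
    s.log.toList
  }
}
```

In the verbatim port the intended cleanup runs in the "use" position, which is why the original ZIO test completed `release` straight away and appeared to pass.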

Hope this helps! (with FS2 1.0 and http4s 0.19 both requiring ConcurrentEffect in various places, this ticket is much needed for interop).

neko-kai · 2018-11-30T11:36:09Z

@jdegoes travis passes now

regiskuckaertz

YAY 🎉 This is really awesome.

core/jvm/src/test/scala/scalaz/zio/RTSSpec.scala

interop-cats-laws/src/test/scala/scalaz/zio/interop/catzSpec.scala

interop/jvm/src/main/scala/scalaz/zio/interop/catsjvm.scala

core/shared/src/main/scala/scalaz/zio/IO.scala

interop/jvm/src/main/scala/scalaz/zio/interop/catsjvm.scala

jdegoes · 2018-11-30T15:27:24Z

interop/jvm/src/main/scala/scalaz/zio/interop/catsjvm.scala

 private class CatsEffect extends CatsMonadError[Throwable] with Effect[Task] with CatsSemigroupK[Throwable] with RTS {
-  protected def exitResultToEither[A]: ExitResult[Throwable, A] => Either[Throwable, A] = _.toEither
+  @inline final protected def exitResultToEither[A](e: ExitResult[Throwable, A]): Either[Throwable, A] =
+    e.fold(_.checked[Throwable] match {


This makes sense. Another option would be if there's no checked errors, peeling off the first unchecked error. Not sure it matters much, though.

Matters for existing code that's catching a specific throwable (e.g. doobie). It's hard to decide what would be best here, as on one hand existing code would have no way of dealing with synthetic exceptions from combined Fibers, but OTOH we want to preserve as much information about errors as possible.

jdegoes · 2018-11-30T15:30:11Z

interop/jvm/src/main/scala/scalaz/zio/interop/catsjvm.scala

    release: A => Task[Unit]
-  ): Task[B] = IO.bracket(acquire)(release(_).catchAll(IO.terminate(_)))(use)
+  ): Task[B] =
+    IO.bracket(acquire)(release(_).catchAll(IO.terminate))(use)


io.catchAll(IO.terminate) is so common we could do a method to implement that pattern: io.orDie: IO[Nothing, A]. This way we could make it zero cost. I'll open a ticket if you like the idea.

I like the idea! I've had multiple questions wrt 'how do I make IO[Nothing, ?] from IO[Throwable, ?]'. I think the name might benefit from being long and explicit, e.g. io.dieOnThrowable
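A toy model of what such a method could look like, purely to pin down the intended semantics. `Eff` below is a hypothetical stand-in written for this sketch (a thunk returning Either), not the ZIO IO type, and "terminate" is modelled as simply throwing.

```scala
object OrDieSketch {

  // Toy effect: a thunk that yields either a typed error or a value.
  final case class Eff[+E, +A](run: () => Either[E, A]) {
    // Handle every typed error by switching to another effect.
    def catchAll[E2, A1 >: A](h: E => Eff[E2, A1]): Eff[E2, A1] =
      Eff(() =>
        run() match {
          case Left(e)  => h(e).run()
          case Right(a) => Right(a)
        }
      )
  }

  object Eff {
    def succeed[A](a: A): Eff[Nothing, A] = Eff(() => Right(a))
    def fail[E](e: E): Eff[E, Nothing]    = Eff(() => Left(e))
    // Modelled as an untyped failure: the throwable escapes the error channel.
    def terminate(t: Throwable): Eff[Nothing, Nothing] = Eff(() => throw t)
  }

  implicit final class ThrowableOps[A](private val self: Eff[Throwable, A]) {
    // The pattern from the review comment: catchAll(terminate) under one name.
    def orDie: Eff[Nothing, A] = self.catchAll[Nothing, A](t => Eff.terminate(t))
  }

  def demo(): (Either[Nothing, Int], Boolean) = {
    val ok: Eff[Throwable, Int]  = Eff.succeed(1)
    val bad: Eff[Throwable, Int] = Eff.fail(new IllegalStateException("boom"))
    (ok.orDie.run(), scala.util.Try(bad.orDie.run()).isFailure)
  }
}
```

The point of the sketch is the type: after `orDie` the error channel is `Nothing`, so the only way a Throwable can surface is as a defect. A real implementation could avoid the extra `catchAll` layer, which is the zero-cost part of the suggestion.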

core/shared/src/main/scala/scalaz/zio/IO.scala

jdegoes · 2018-12-01T15:14:24Z

@Kaishh Superb work on this! Thanks for all your help and your patience in getting this in.

neko-kai · 2018-12-01T21:14:14Z

@regiskuckaertz I moved out raceAttempt #409 to remove unnecessary impediments to merging

jdegoes

I'm ready to merge on this if @regiskuckaertz doesn't have any objections! There are a couple conflicts, looks like, but easy to fix.

regiskuckaertz · 2018-12-02T17:12:48Z

🎸 🕺

jdegoes · 2018-12-03T13:02:18Z

jdegoes · 2018-12-03T13:02:37Z

@Kaishh Superb work! And amazing patience. 😆 Thanks for all your work on this one...

alexandru · 2018-12-03T13:04:49Z

👍 nice

ghostdogpr · 2018-12-03T14:37:36Z

interop-cats-laws/src/test/scala/scalaz/zio/interop/catzSpec.scala

      }
  }

+  (1 to 50).foreach { s =>


We don't need to run those 50 times anymore, do we?

Hopefully we don't. But, these tests are cheap enough to run and each iteration gives a slightly higher chance of spotting a regression. I know this is a defensive code smell, but IMHO justified by the domain - these are effectively integration tests, that also highly depend on JVM behavior and on upstream cats-effect.

neko-kai mentioned this pull request Sep 25, 2018

Use cats-effect-1.0.0, add Bracket[Task] #247

Merged

alexandru mentioned this pull request Sep 25, 2018

ConcurrentLaws.acquireIsNotCancelable unreasonable assumption about Fiber.cancel typelevel/cats-effect#375

Closed

neko-kai force-pushed the feature/cats-effect-1.0.0-concurrent branch from 3351dbf to 2bf9060 on September 25, 2018 13:07

neko-kai force-pushed the feature/cats-effect-1.0.0-concurrent branch from 9669fb6 to cc6725d on September 27, 2018 10:19

neko-kai force-pushed the feature/cats-effect-1.0.0-concurrent branch from cc6725d to 1458906 on October 28, 2018 00:12

neko-kai added 5 commits November 30, 2018 10:54

cleanup: Remove non-working ConcurrentEffect stub

6c66774

cleanup: depend on cats-effect-1.0.0 outside of laws, fmt

62c39b2

override defaultHandler to reduce stdout noize

b10d51e

remove erroneous test, remove upTo

ccf36ce

Fix zio#348 Add cats-effecty race. fix tests

3a3c160

neko-kai force-pushed the feature/cats-effect-1.0.0-concurrent branch from 88c627e to 3a3c160 Compare November 30, 2018 10:55

bring back deleted tests

d2e59e6

regiskuckaertz requested changes Nov 30, 2018

jdegoes reviewed Nov 30, 2018

interop/jvm/src/main/scala/scalaz/zio/interop/catsjvm.scala

jdegoes reviewed Nov 30, 2018

interop/jvm/src/main/scala/scalaz/zio/interop/catsjvm.scala

jdegoes reviewed Nov 30, 2018

core/shared/src/main/scala/scalaz/zio/IO.scala

neko-kai added 2 commits December 1, 2018 21:02

address review

c994871

Remove raceAttempt

72af143

jdegoes previously approved these changes Dec 2, 2018

regiskuckaertz previously approved these changes Dec 2, 2018

Merge branch 'master' into feature/cats-effect-1.0.0-concurrent

3fe5fae

neko-kai dismissed stale reviews from regiskuckaertz and jdegoes via 3fe5fae December 3, 2018 08:47

jdegoes approved these changes Dec 3, 2018

jdegoes merged commit 98e0ccf into zio:master Dec 3, 2018

ghostdogpr reviewed Dec 3, 2018
