@neko-kai
Member

There are several issues

@alexandru

Hi @Kaishh,

"acquire of bracket is uninterruptible" deadlocks – Fiber.cancel blocks forever

– it passes if Fiber.cancel implementation is changed from fiber.interrupt to fiber.interrupt.fork.void
– I guess it's the correct implementation given the test code – mvar.put blocks and is uninterruptible, so waiting on it can't work. @alexandru I guess cancel docs should mention that it should be 'fire-and-forget'.

It's not fire-and-forget. I changed both cats.effect.IO and monix.eval.Task so that cancel back-pressures until the finalizers finish.

That said, fiber.cancel does complete immediately if there are no finalizers installed yet. So if your mvar.put(a2) blocks due to a race condition (i.e. it sees the MVar as being full and waits for the first mvar.get), then there's nothing for fiber.cancel to wait on.

If that's not expected behavior for ZIO, I'd like to know why ... what is your fiber.interrupt waiting on?

@neko-kai
Member Author

@alexandru It waits for the target fiber to die completely.

That being said, wouldn't .cancel also block if there was a race + installed finalizers? I may be wrong, but it seems like a bit of a gotcha if cancel may or may not block when canceling a diverging thread depending on whether it allocated resources or not?

@alexandru

As a general opinion, .cancel blocking in general for anything is a bad idea, whether there are finalizers installed or not. It's like waiting for acknowledgement when closing a TCP connection, which has caused endless grief, because when you're cancelling something, you're doing so in response to a race condition. I did modify our IO and Task to back-pressure on the installed finalizers in order to sequence the async finalizers of inner brackets, but with no finalizers installed, there is nothing to wait on.

That said, in this case the behavior of our .cancel has leaked in the law implementation and we'll have to fix it. I certainly don't want to impose any behavior on what cancel should wait on. Sorry about that.

@alexandru

For now I would implement .cancel as fiber.interrupt.fork.void.

I'll open a ticket on the Cats-Effect side and maybe fix that law for the next release.

@alexandru

@Kaishh also, thanks for doing this. Type class laws are hard to come up with because there's a natural tendency to introduce assumptions based on how the test implementation behaves. And this one slipped through.

@alexandru

@Kaishh could the next 2 issues be due to the same issue? Both of them are using Fiber.cancel as well.

@jdegoes
Member

jdegoes commented Sep 25, 2018

@alexandru The Haskell community lived with a killThread which returned immediately for many years before discovering this property leads to thread leaks (it makes it quite challenging to build applications with deterministic guarantees on thread usage). Now packages like Async can give you a "blocking" version of interruption. You can then regain the original semantic (if you WANT, and you usually don't want) by forking the interruption, which means this newer semantic for interruption is strictly more powerful than the older and troubled fire-and-forget method of interruption.

You can read more here.

Btw would you be interested in making the Monix Task implement the Haskell / ZIO semantic? Ordinary Cats IO could stay the way it is if the behavior on blocking is omitted from the laws.
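
The difference between the two interruption semantics discussed above can be sketched with plain JVM threads. This is a toy model under stated assumptions, not ZIO's or cats-effect's implementation: `ToyFiber`, `interrupt`, `interruptFork` and the sleep-based "finalizer" are all hypothetical names introduced only for illustration.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical toy model of a fiber backed by a thread; not a real library API.
object InterruptSketch {
  final class ToyFiber(finalizerMillis: Long) {
    val finalized = new AtomicBoolean(false)
    private val started = new CountDownLatch(1)

    private val thread = new Thread(new Runnable {
      def run(): Unit =
        try {
          started.countDown()
          Thread.sleep(Long.MaxValue) // stands in for a computation that never finishes
        } catch {
          case _: InterruptedException => () // interruption was requested
        } finally {
          Thread.sleep(finalizerMillis) // stands in for a slow finalizer
          finalized.set(true)
        }
    })
    thread.setDaemon(true)
    thread.start()
    started.await()

    // Blocking semantic: returns only after the target has fully terminated.
    def interrupt(): Unit = { thread.interrupt(); thread.join() }

    // Fire-and-forget, recovered by forking the blocking interrupt.
    def interruptFork(): Thread = {
      val t = new Thread(new Runnable { def run(): Unit = interrupt() })
      t.setDaemon(true)
      t.start()
      t
    }
  }

  // True when the finalizer is observed as finished right after a blocking interrupt.
  def finalizedAfterBlockingInterrupt(): Boolean = {
    val fiber = new ToyFiber(200)
    fiber.interrupt()
    fiber.finalized.get
  }
}
```

In this model the forked variant is built from the blocking one, while the reverse is not possible without extra machinery, which is the sense in which the blocking semantic is described as more powerful.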

@neko-kai neko-kai force-pushed the feature/cats-effect-1.0.0-concurrent branch from 3351dbf to 2bf9060 Compare September 25, 2018 13:07
@neko-kai
Member Author

@alexandru

could the next 2 issues be due to the same issue? Both of them are using Fiber.cancel as well.

The later ones are discovered after changing Fiber.cancel. I think the canceller is not being run for some reason, but it's not obvious to me how to pinpoint it yet, since equivalent implementations with ZIO work.

@neko-kai
Member Author

I've also changed ContextShift implementation from IO.shift(ec) *> fa to (IO.shift(ec) *> fa).fork.flatMap(_.join) since ContextShift spec evalOn says it should return to the original thread after running the action, which IO.shift alone will not do.

@jdegoes
Member

jdegoes commented Sep 25, 2018

I've also changed ContextShift implementation from IO.shift(ec) *> fa to (IO.shift(ec) *> fa).fork.flatMap(_.join) since ContextShift spec evalOn says it should return to the original thread after running the action, which IO.shift alone will not do.

Nice. 👍

The later ones are discovered after changing Fiber.cancel. I think the canceller is not being run for some reason, but it's not obvious to me how to pinpoint it yet, since equivalent implementations with ZIO work.

Not waiting for a fiber to terminate before returning could potentially introduce a race condition. If you need help with this one I can try to take a closer look.

@neko-kai
Member Author

@jdegoes Feel free to if you have the time! I'll have more time closer to the end of the week

@alexandru

alexandru commented Sep 27, 2018

@Kaishh for testing purposes I just published a hash version with the laws being fixed, see PR typelevel/cats-effect#376:

1.0.0-1182d8c

Please test with it and see if the laws are still failing.

@neko-kai neko-kai force-pushed the feature/cats-effect-1.0.0-concurrent branch from 9669fb6 to cc6725d Compare September 27, 2018 10:19
@neko-kai
Member Author

@alexandru Still diverging at 'asyncF registration can be canceled' now without .fork.void

@jdegoes
Member

jdegoes commented Sep 28, 2018

@Kaishh

If you can add this to ZIO in a simplified form with ZIO data types (Promise instead of Deferred).

  def asyncFRegisterCanBeCancelled[A](a: A) = {
    val lh = for {
      release <- Deferred[F, A]
      acquire <- Deferred[F, Unit]
      task = F.asyncF[Unit] { _ =>
        F.bracket(acquire.complete(()))(_ => F.never[Unit])(_ => release.complete(a))
      }
      fiber <- F.start(task)
      _ <- acquire.get
      _ <- fiber.cancel
      a <- release.get
    } yield a

    lh <-> F.pure(a)
  }

And it fails, then we will investigate and fix ASAP.

The test looks fine to me.

@neko-kai
Member Author

neko-kai commented Sep 28, 2018

@jdegoes The IO version of this test is there since the beginning – https://github.com/scalaz/scalaz-zio/pull/267/files#diff-de7c402961556a213ccd2cf8962f18afR515

The problem is that it works

@neko-kai
Member Author

neko-kai commented Oct 9, 2018

Now that I think about it, there's a paradigm mismatch between ZIO and ConcurrentEffect/ContextShift. cats IO's ConcurrentEffect/ContextShift are parameterised by an execution context: as long as you use a pair created from the same ec, your fibers will spawn only on that ec's pool. Frameworks such as fs2 and http4s use this to ensure that they run within a specific thread pool – when used with ZIO their fibers will at some point unexpectedly shift back to RTS, potentially causing issues.

@alexandru Should this be a part of ConcurrentEffect/ContextShift laws? Libraries are already making this assumption in their APIs, e.g. fs2:

  /**
    * Reads all data synchronously from the file at the specified `java.nio.file.Path`.
    */
  def readAll[F[_]: Sync: ContextShift](path: Path,
                                        blockingExecutionContext: ExecutionContext,
                                        chunkSize: Int)(
      ): Stream[F, Byte]

http4s:

object BlazeClientBuilder {
  def apply[F[_]](
      executionContext: ExecutionContext,
      sslContext: Option[SSLContext] = Some(SSLContext.getDefault)): BlazeClientBuilder[F]

@alexandru

ContextShift.ExecuteOn happened due to the need to execute a certain task on a very specific thread pool. Such tasks involve blocking I/O.

But when you do that, you don’t want to stay on the thread pool meant for blocking I/O for the bind continuation of that task. You want to shift back to some thread pool that you’re using for CPU-bound stuff.

This is a common pattern on the JVM that happens naturally with, say, Future or whatever Java does these days. It’s a problem created by the platform’s ability to block 1:1 threads.

I did not want executeOn in ContextShift, however people end up doing stupid stuff if the library doesn’t provide a standard solution for this pattern. And unfortunately shift is not enough, because implementations like Monix’s Task end up auto-forking for fairness, so the effect of a shift doesn’t last long.
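
The two-pool pattern described above (blocking I/O on a dedicated pool, the bind continuation back on the CPU-bound pool) can be sketched with `scala.concurrent.Future`, where it happens naturally because every continuation names its own `ExecutionContext`. The pool names and the `namedPool` helper are assumptions made for this illustration; nothing here is part of cats-effect, Monix or ZIO.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object TwoPoolsSketch {
  // Hypothetical helper: a single-threaded pool whose thread carries a recognisable name.
  private def namedPool(name: String): ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, name)
        t.setDaemon(true) // do not keep the JVM alive for the sketch
        t
      }
    }))

  val cpu: ExecutionContext        = namedPool("cpu")
  val blockingIO: ExecutionContext = namedPool("blocking-io")

  // Returns (thread that ran the blocking step, thread that ran the continuation).
  def demo(): (String, String) = {
    val blockingStep = Future(Thread.currentThread().getName)(blockingIO)
    val continuation = blockingStep.map(io => (io, Thread.currentThread().getName))(cpu)
    Await.result(continuation, 5.seconds)
  }
}
```

With a run loop that has no per-step execution context, the shift back has to be provided by the effect type itself, which is the gap executeOn/evalOn is meant to fill.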

@neko-kai
Member Author

neko-kai commented Oct 9, 2018

@alexandru What happens in monix if the task inside evalOn keeps flatMapping? Will it auto-fork back at some point, same as shift?

@alexandru

@alexandru What happens in monix if the task inside evalOn keeps flatMapping? Will it auto-fork back at some point, same as shift?

With executeOn, no, Monix can keep the thread pool given until the end because I designed it to have the notion of a "default" thread pool internally, a default which can be overridden.

But yes, what you're describing is what eventually happens with solutions based on shift only, which is why the introduction of executeOn was absolutely necessary.

A solution based on shift, if you can do that, is better than nothing though. At least it switches back to your RTS, plus the blocking stuff will probably be at the beginning of the bind chain.

@neko-kai
Member Author

neko-kai commented Oct 9, 2018

@alexandru Ok, do you think it's reasonable to add a law or spec to ContextShift that makes it mandatory for evalOn to not auto-fork back to default thread pool? I think it's important for this to be part of the spec and for ZIO to adhere to it (i.e. add a forkOn combinator), because pretty bad things can happen if blocking loops from code that already implicitly relies on this behaviour ever get shifted back to default thread pool.

@alexandru

@Kaishh currently ContextShift is lawless.

The reason is that it is difficult to describe such laws. We have a TestContext implementation in Cats-Effect that can help detect that a task ends up executed on another thread-pool, but I don't know, for example, how to test whether the shift back to the default happens. Keep in mind that the implementation needs to work on JavaScript too.

Plus implementations (like the current ZIO apparently) cannot describe executeOn as it was intended, but an implementation based on shift is better than nothing, because in absence of executeOn that's what people end up doing anyway and they do it poorly.

But I guess we should at least make it clear in the documentation or something like that.

@alexandru

Well, now that I think about it, a solution based on our TestContext can work.

Maybe we'll make it happen. Unfortunately I'm all caught up in Monix development atm, so it's low priority.

@jdegoes
Member

jdegoes commented Oct 9, 2018

when used with ZIO their fibers will at some point unexpectedly shift back to RTS, potentially causing issues.

This is true but only by changing the yield rate, which defaults to what is essentially "no yielding".

I feel there's a contradiction between async and evalOn. Either the fiber should run on the execution context for the specified IO, or it should jump around as per async, but not both. Otherwise it's poorly defined.

@alexandru

I feel there's a contradiction between async and evalOn. Either the fiber should run on the execution context for the specified IO, or it should jump around as per async, but not both. Otherwise it's poorly defined.

I think that at this point you're bringing your Fiber semantics into the picture unnecessarily.

When blocking I/O is involved, users want to execute that blocking I/O on a separate thread-pool, because blocking I/O is a rare operation that also needs protection against thread starvation, and then jump back to the thread pool for CPU-bound operations afterwards. You don't want 3 or more thread-pools; you want 2 at most. When projects end up working with 3 or more in practice, that's only because of opinionated and poorly defined libraries.

If there is such a thing as the IO's "execution context", then users should be able to take advantage of it. Cats-Effect's IO does not have such an execution context, which is fine, but then its run-loop doesn't auto-fork.

@neko-kai neko-kai force-pushed the feature/cats-effect-1.0.0-concurrent branch from cc6725d to 1458906 Compare October 28, 2018 00:12
@ghostdogpr
Member

ghostdogpr commented Oct 28, 2018

@Kaishh @jdegoes I investigated this a little bit and found out that the ZIO version of asyncFRegisterCanBeCancelled test is wrong. If you fix it, it has the same deadlock issue as the catz test.

Currently:

def testAsyncPureCreationIsInterruptible = {
  val io = for {
    release <- Promise.make[Nothing, Int]
    acquire <- Promise.make[Nothing, Unit]
    task = IO.asyncPure[Nothing, Unit] { _ =>
      IO.bracket(acquire.complete(()))(_ => IO.never)(_ => release.complete(42).void)
    }
    fiber <- task.fork
    _ <- acquire.get
    _ <- fiber.interrupt
    a <- release.get
  } yield a

  unsafeRun(io) must_=== 42
}

The bracket arguments are in the wrong order: IO.bracket(acquire.complete(()))(_ => IO.never)(_ => release.complete(42).void) should be IO.bracket(acquire.complete(()))(_ => release.complete(42).void)(_ => IO.never). I tried swapping them and now the ZIO test hangs as well.

Hope this helps! (with FS2 1.0 and http4s 0.19 both requiring ConcurrentEffect in various places, this ticket is much needed for interop).
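
Why the swapped arguments went unnoticed can be shown with a small synchronous model. This is only a sketch: `bracketReleaseFirst` is a hypothetical stand-in that mirrors the acquire/release/use parameter order used by `IO.bracket` in this thread, and it is not the ZIO implementation. Because the release and use functions have the same type here, passing them in cats-effect order (use before release) still compiles.

```scala
import scala.collection.mutable.ListBuffer

object BracketOrderSketch {
  // Hypothetical stand-in with the parameter order acquire, release, use.
  def bracketReleaseFirst[A, B](acquire: => A)(release: A => Unit)(use: A => B): B = {
    val a = acquire
    try use(a) finally release(a)
  }

  // Arguments passed in the intended order: release second, use third.
  def demo(): List[String] = {
    val log = ListBuffer.empty[String]
    def step(name: String): Unit = { log += name; () }
    bracketReleaseFirst(step("acquire"))(_ => step("release"))(_ => step("use"))
    log.toList
  }

  // Arguments passed in cats-effect order (use second, release third):
  // the compiler accepts it, but the function meant as the finalizer runs as the body.
  def demoSwapped(): List[String] = {
    val log = ListBuffer.empty[String]
    def step(name: String): Unit = { log += name; () }
    bracketReleaseFirst(step("acquire"))(_ => step("use"))(_ => step("release"))
    log.toList
  }
}
```

In the original test the same mix-up meant `release.complete(42)` ran as the use action, so the promise was completed without interruption ever being exercised.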

@neko-kai neko-kai force-pushed the feature/cats-effect-1.0.0-concurrent branch from 88c627e to 3a3c160 Compare November 30, 2018 10:55
@neko-kai
Member Author

@jdegoes travis passes now

Member

@regiskuckaertz regiskuckaertz left a comment

YAY 🎉 This is really awesome.

  private class CatsEffect extends CatsMonadError[Throwable] with Effect[Task] with CatsSemigroupK[Throwable] with RTS {
-   protected def exitResultToEither[A]: ExitResult[Throwable, A] => Either[Throwable, A] = _.toEither
+   @inline final protected def exitResultToEither[A](e: ExitResult[Throwable, A]): Either[Throwable, A] =
+     e.fold(_.checked[Throwable] match {
Member

This makes sense. Another option would be if there's no checked errors, peeling off the first unchecked error. Not sure it matters much, though.

Member Author

@neko-kai neko-kai Dec 1, 2018

Matters for existing code that's catching a specific throwable (e.g. doobie). It's hard to decide what would be best here, as on one hand existing code would have no way of dealing with synthetic exceptions from combined Fibers, but OTOH we want to preserve as much information about errors as possible.

      release: A => Task[Unit]
-   ): Task[B] = IO.bracket(acquire)(release(_).catchAll(IO.terminate(_)))(use)
+   ): Task[B] =
+     IO.bracket(acquire)(release(_).catchAll(IO.terminate))(use)
Member

io.catchAll(IO.terminate) is so common we could do a method to implement that pattern: io.orDie: IO[Nothing, A]. This way we could make it zero cost. I'll open a ticket if you like the idea.

Member Author

I like the idea! I've had multiple questions wrt 'how do I make IO[Nothing, ?] from IO[Throwable, ?]'. I think the name might benefit from being long and explicit, e.g. io.dieOnThrowable
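
The proposed combinator can be sketched on a toy effect type. Everything below is hypothetical: `Toy`, its `catchAll`, `Toy.terminate` and `orDie` only model the shape of `io.catchAll(IO.terminate)` as discussed above, with an unrecoverable defect represented as a thrown exception; this is not the ZIO data type or its eventual implementation.

```scala
object OrDieSketch {
  // Hypothetical toy effect: a thunk producing either a typed error or a value.
  final class Toy[+E, +A](val run: () => Either[E, A]) {
    // Recover from every typed error by switching to another effect.
    def catchAll[E2, B >: A](h: E => Toy[E2, B]): Toy[E2, B] =
      new Toy[E2, B](() =>
        run() match {
          case Left(e)  => h(e).run()
          case Right(a) => Right(a)
        })

    // Turn a Throwable error channel into a defect, leaving no typed errors.
    def orDie(implicit ev: E <:< Throwable): Toy[Nothing, A] =
      catchAll[Nothing, A](e => Toy.terminate(ev(e)))
  }

  object Toy {
    def succeed[A](a: A): Toy[Nothing, A] = new Toy[Nothing, A](() => Right(a))
    def fail[E](e: E): Toy[E, Nothing]    = new Toy[E, Nothing](() => Left(e))
    // Stand-in for IO.terminate: the defect is modelled as a thrown exception.
    def terminate(t: Throwable): Toy[Nothing, Nothing] =
      new Toy[Nothing, Nothing](() => throw t)
  }

  val ok: Toy[Throwable, Int]  = Toy.succeed(1)
  val bad: Toy[Throwable, Int] = Toy.fail(new IllegalStateException("boom"))
}
```

Running `ok.orDie` yields the value unchanged, while running `bad.orDie` throws instead of returning a `Left`, so callers of the resulting effect no longer have a typed error to handle.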

@jdegoes
Member

jdegoes commented Dec 1, 2018

@Kaishh Superb work on this! Thanks for all your help and your patience in getting this in.

@neko-kai
Member Author

neko-kai commented Dec 1, 2018

@regiskuckaertz I moved out raceAttempt #409 to remove unnecessary impediments to merging

jdegoes previously approved these changes Dec 2, 2018
Member

@jdegoes jdegoes left a comment

I'm ready to merge on this if @regiskuckaertz doesn't have any objections! There are a couple conflicts, looks like, but easy to fix.

regiskuckaertz previously approved these changes Dec 2, 2018
@regiskuckaertz
Member

🎸 🕺

@neko-kai neko-kai dismissed stale reviews from regiskuckaertz and jdegoes via 3fe5fae December 3, 2018 08:47
@jdegoes jdegoes merged commit 98e0ccf into zio:master Dec 3, 2018
@jdegoes
Member

jdegoes commented Dec 3, 2018

giphy

@jdegoes
Member

jdegoes commented Dec 3, 2018

@Kaishh Superb work! And amazing patience. 😆 Thanks for all your work on this one...

@alexandru

👍 nice

}
}

(1 to 50).foreach { s =>
Member

We don't need to run those 50 times anymore, do we?

Member Author

Hopefully we don't. But, these tests are cheap enough to run and each iteration gives a slightly higher chance of spotting a regression. I know this is a defensive code smell, but IMHO justified by the domain - these are effectively integration tests, that also highly depend on JVM behavior and on upstream cats-effect.
