
Conversation

@LGLO
Contributor

@LGLO LGLO commented Sep 25, 2019

Fixes #1764

I race the test execution against the timeout, and I also race it, using a promise, against the test timeout plus the interruption timeout.

I'm not really proud of the TestAspectSpec additions; any hints there?

Contributor

@adamgfraser adamgfraser left a comment


I think this looks really good! A few minor comments but overall this is great. Thanks for taking the lead on identifying the issue and fixing it!

- import zio.{ clock, Cause, ZIO, ZManaged, ZSchedule }
- import zio.duration.Duration
+ import zio.{ clock, Cause, Promise, ZIO, ZManaged, ZSchedule }
+ import zio.duration.{ Duration, _ }
Contributor

Minor thing, but I think you can just import everything, since you're already doing a wildcard import.

val sequential: TestAspectPoly = executionStrategy(ExecutionStrategy.Sequential)

/**
* An aspect that times out tests using the specified duration.
Contributor

Nice documentation!

- def timeout(duration: Duration): TestAspect[Nothing, Live[Clock], Nothing, Any, Nothing, Any] =
+ def timeout(
+   duration: Duration,
+   interruptDuration: Duration = 10.seconds
Contributor

I wonder if we should make this a little shorter? It's a trade-off: we want to give enough time to successfully interrupt if we can, but we don't want to wait so long that the user just hits CTRL-C, assumes something is wrong with ZIO instead of their code, and misses the error message.

Contributor Author

Yes, I agree. I think if users encounter the warning, they will look up what the second parameter is for and act on it.

for {
  p <- Promise.make[TestFailure[E], TestSuccess[S]]
  _ <- test
        .raceAttempt(Live.withLive(ZIO.fail(timeoutFailure))(_.delay(duration)))
Contributor

Since the failure isn't going to use any mock effects, you could just do live around the whole thing to make this a little shorter, but again, very minor.

        .raceAttempt(Live.withLive(ZIO.fail(timeoutFailure))(_.delay(duration)))
        .foldM(p.fail, p.succeed)
        .fork
  _ <- (Live.withLive(ZIO.unit)(_.delay(duration + interruptDuration)) *> p.fail(interruptionTimeoutFailure)).fork
Contributor

You could just do live(ZIO.sleep(...)) *> p.fail here if you wanted. We should probably interrupt this fiber if the first fiber completes.

isSubtype[T](anything)
)

private def testExecutionFailedWith(spec: ZSpec[Live[Clock], Any, String, Any], pred: Throwable => Boolean) =
Contributor

You could implement this in terms of the forAllTests combinator in TestUtils.scala. Maybe move this there as well so that others can use it in the future?

failed(spec)
}

def timeoutMakesTestsFailAfterGivenDuration: Future[Boolean] =
Contributor

These tests are a little awkward to write because we would really like to use the MockClock here, but we don't want to be using ZIO Test to test ZIO Test. I would shorten the timeout durations: since the effects will never terminate, the actual amount of time doesn't matter even though we are using real time here. You could probably just use 1 nanosecond here and 1 and 2 nanoseconds below.


testExecutionFailedWith(
spec,
cause => cause.isInstanceOf[TestTimeoutException] && cause.getMessage() == "Timeout of 100 ms exceeded."
Contributor

You could use a pattern match here if you want to avoid the isInstanceOf check.
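
A minimal, self-contained sketch of that suggestion. The case class below is a hypothetical stand-in for zio.test.TestTimeoutException (the real class lives in zio-test); the point is only that a pattern match replaces both the isInstanceOf check and the getMessage call:

```scala
// Hypothetical stand-in for zio.test.TestTimeoutException, for illustration only.
final case class TestTimeoutException(message: String) extends Throwable(message)

// One pattern match covers both the type test and the message check.
def isExpectedTimeout(cause: Throwable): Boolean = cause match {
  case TestTimeoutException(msg) => msg == "Timeout of 100 ms exceeded."
  case _                         => false
}
```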

@LGLO LGLO requested a review from adamgfraser September 26, 2019 21:05
        .raceAttempt(Live.live(ZIO.fail(timeoutFailure).delay(duration)))
        .foldM(p.fail, p.succeed)
        .fork
  _ <- (Live.live(ZIO.unit.delay(duration + interruptDuration)) *> p.fail(interruptionTimeoutFailure)).fork
Contributor

Don't we want to interrupt this fiber if the first fiber completes successfully? The Promise can only be completed once, so it won't affect the results, but we will have a lot of fibers sitting out there that aren't doing anything.
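
The one-shot property relied on here can be illustrated with the standard library's scala.concurrent.Promise, which behaves the same way in this respect (an analogy only, not zio.Promise):

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// A promise can only be completed once: whichever racing arm finishes first
// wins, and the later completion attempt is a no-op.
val p      = Promise[Int]()
val first  = p.trySuccess(42)                        // test arm wins
val second = p.tryFailure(new Exception("timeout"))  // timeout arm loses: ignored
val result = Await.result(p.future, 1.second)
```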

Contributor Author

Will do it!

assertM(ZIO.never *> ZIO.unit, equalTo(()))
}: ZSpec[Live[Clock], Any, String, Any]) @@ timeout(1.millis)

failedWith(spec, cause => cause == TestTimeoutException("Timeout of 1 ms exceeded."))
Contributor

Should we reduce this to nanoseconds now that we have the #1839 merged?

Contributor Author

I will reduce.

@LGLO
Contributor Author

LGLO commented Sep 27, 2019

Now it interrupts the test/timeout fibers, but what I don't really like is that it spams the output with fiber traces:

[info] running zio.test.TestMain 
Fiber failed.
A checked error was not handled.
Runtime(Die(zio.test.TestTimeoutException: Timeout of 1 ns exceeded.))

Fiber:7772 was supposed to continue to:
  a future continuation at zio.ZIO.run(ZIO.scala:1110)
  a future continuation at zio.ZIO.bracket_(ZIO.scala:144)

Fiber:7772 execution trace:
  at <unknown>.<unknown>(ZIO.scala:0)
  at zio.internal.FiberContext$InterruptExit$.apply(FiberContext.scala:156)
  at zio.internal.FiberContext$InterruptExit$.apply(FiberContext.scala:147)
  at zio.ZIO.ensuring(ZIO.scala:320)
  at zio.ZIO.onInterrupt(ZIO.scala:620)
  at zio.ZIO.ensuring(ZIO.scala:318)
  at zio.internal.FiberContext$InterruptExit$.apply(FiberContext.scala:147)
  at zio.ZIO$._IdentityFn(ZIO.scala:2541)
  at zio.ZIOFunctions.effectAsyncInterrupt(ZIO.scala:1851)
  at zio.ZIOFunctions.effectAsyncInterrupt(ZIO.scala:1847)
  at zio.ZIOFunctions.effectAsyncInterrupt(ZIO.scala:1847)
  at zio.clock.Clock$Live$$anon$1.sleep(Clock.scala:48)
  at zio.clock.package$.sleep(clock.scala:50)
  at zio.ZIO.bracket_(ZIO.scala:144)
  at zio.internal.FiberContext.evaluateNow(FiberContext.scala:470)
  at zio.test.mock.Live$.live(Live.scala:55)

Fiber:7772 was spawned by:

Fiber:7767 was supposed to continue to:
  a future continuation at zio.ZIO.raceWith(ZIO.scala:946)
  a future continuation at zio.ZIO.raceWith(ZIO.scala:958)
  a future continuation at zio.ZIO.refailWithTrace(ZIO.scala:976)

Fiber:7767 execution trace:
  at zio.ZIO.raceWith(ZIO.scala:945)
  at zio.ZIO.raceWith(ZIO.scala:941)
  at zio.Ref$.make(Ref.scala:169)
  at zio.ZIO.raceWith(ZIO.scala:940)
  at zio.Ref$.make(Ref.scala:169)
  at zio.ZIO.raceWith(ZIO.scala:939)
  at zio.Promise$.make(Promise.scala:239)

Fiber:7767 was spawned by:

Fiber:7765 was supposed to continue to:
  a future continuation at zio.ZIO.raceWith(ZIO.scala:945)
  a future continuation at zio.ZIO.raceWith(ZIO.scala:958)
  a future continuation at zio.ZIO.refailWithTrace(ZIO.scala:976)
  a future continuation at zio.ZIO.run(ZIO.scala:1110)
  a future continuation at zio.ZIO.bracket_(ZIO.scala:144)
  a future continuation at zio.ZIO.run(ZIO.scala:1110)
  a future continuation at zio.ZManaged.use(ZManaged.scala:640)
  a future continuation at zio.test.TestExecutor$.managed(TestExecutor.scala:29)
  a future continuation at zio.test.Spec.foreachExec(Spec.scala:95)
  a future continuation at zio.test.TestUtils$.forAllTests(TestUtils.scala:26)

Fiber:7765 execution trace:
  at zio.ZIO.raceWith(ZIO.scala:941)
  at zio.Ref$.make(Ref.scala:169)
  at zio.ZIO.raceWith(ZIO.scala:940)
  at zio.Ref$.make(Ref.scala:169)
  at zio.ZIO.raceWith(ZIO.scala:939)
  at zio.Promise$.make(Promise.scala:239)
  at zio.ZIO.bracket_(ZIO.scala:144)
  at zio.internal.FiberContext.evaluateNow(FiberContext.scala:470)
  at zio.ZIO.provideSomeManaged(ZIO.scala:757)
  at zio.test.mock.MockEnvironment$.Value(MockEnvironment.scala:54)

Fiber:7765 was spawned by:

Fiber:7764 was supposed to continue to:
  a future continuation at zio.ZIO.toFutureWith(ZIO.scala:1324)
  a future continuation at zio.Runtime.unsafeRunToFuture(Runtime.scala:111)

Fiber:7764 ZIO Execution trace: <empty trace>

Fiber:7764 was spawned by: <empty trace>
Fiber failed.
╥
╠─An error was rethrown with a new trace.
║ 
║ Fiber:7767 was supposed to continue to: <empty trace>
║ 
║ Fiber:7767 execution trace:
║   at zio.ZIO.refailWithTrace(ZIO.scala:976)
║   at zio.ZIO.ensuring(ZIO.scala:315)
║   at zio.ZIO.onInterrupt(ZIO.scala:620)
║   at zio.ZIO.ensuring(ZIO.scala:313)
║   at zio.ZIO.ensuring(ZIO.scala:315)
║   at zio.ZIO.onInterrupt(ZIO.scala:620)
║   at zio.ZIO.ensuring(ZIO.scala:313)
║   at zio.ZIO$._IdentityFn(ZIO.scala:2541)
║   at zio.ZIOFunctions.effectAsyncInterrupt(ZIO.scala:1851)
║   at zio.ZIOFunctions.effectAsyncInterrupt(ZIO.scala:1847)
║   at zio.ZIOFunctions.effectAsyncInterrupt(ZIO.scala:1847)
║   at zio.ZIO.raceWith(ZIO.scala:947)
║   at zio.ZIO.raceWith(ZIO.scala:947)
║   at zio.ZIO.raceWith(ZIO.scala:947)
║   at zio.ZIO.raceWith(ZIO.scala:946)
║   at zio.ZIO.raceWith(ZIO.scala:945)
║   at zio.ZIO.raceWith(ZIO.scala:941)
║   at zio.Ref$.make(Ref.scala:169)
║   at zio.ZIO.raceWith(ZIO.scala:940)
║   at zio.Ref$.make(Ref.scala:169)
║   at zio.ZIO.raceWith(ZIO.scala:939)
║   at zio.Promise$.make(Promise.scala:239)
║ 
║ Fiber:7767 was spawned by:

and so on...

@adamgfraser
Contributor

Yes, I definitely agree. Is it because the Die is traced, so the pattern match here is not working? Does it work if you change cause match to cause.untraced match?

@adamgfraser
Contributor

Why are only some of the tests failing? Is there some nondeterminism now?

@LGLO
Contributor Author

LGLO commented Sep 27, 2019

I don't know the answers to any of these questions. I'll investigate over the weekend. Thanks for the hints!

@adamgfraser
Contributor

@LGLO Spent a little more time on this and I think there are a couple of things going on here.

Regarding the tracing behavior, the same problem actually existed in the commit from before today; we just didn't notice, and it is not an issue with the test reporter. The issue is that in raceAttempt, if the losing fiber fails with an error, that error gets translated into a fiber failure. So for example in:

ZIO.succeed(1).raceAttempt(ZIO.fail("foo"))

the effect will succeed, but the fiber failure will also be reported to the console. So the solution is to use either on the test and put the timeout exception in ZIO.succeed(Left(e)) instead of ZIO.fail(e).
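
The trick of carrying the timeout error in the success channel as a Left and only unwrapping it at the end can be sketched with standard-library Futures. This is an analogy only, not ZIO; the raceWithTimeout helper and its parameters are hypothetical:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Race a "test" future against a timeout arm. The timeout arm completes with
// Left(error) instead of throwing, so the losing side never surfaces a
// reported failure; the caller unwraps the Either afterwards (ZIO's absolve).
def raceWithTimeout[A](test: Future[A], timeoutMs: Long, err: String): Either[String, A] = {
  val timeoutArm: Future[Either[String, A]] = Future {
    Thread.sleep(timeoutMs)
    Left(err) // succeed with Left, don't fail
  }
  val testArm: Future[Either[String, A]] = test.map(Right(_))
  Await.result(Future.firstCompletedOf(Seq(testArm, timeoutArm)), Duration.Inf)
}
```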

The other issue is that when we have a race and the loser is uninterruptible, the fiber for the race itself can't be interrupted while it is trying to cancel the loser. So I think your earlier approach of having the two effects race to set a promise and forking it is much better. That got me to:

for {
  p <- Promise.make[TestFailure[E], TestSuccess[S]]
  _ <- test.either
        .race(Live.live(ZIO.succeed(Left(timeoutFailure)).delay(duration)))
        .flatMap(_.fold(p.fail, p.succeed))
        .fork
  _      <- (Live.live(ZIO.unit.delay(duration + interruptDuration)) *> p.fail(interruptionTimeoutFailure)).fork
  result <- p.await
} yield result

I thought we could handle interrupting the second fiber when it is not needed by moving it before the first fiber in the for comprehension and cancelling it at the same time we fulfilled the promise. But at least in my local testing, when I did that I got a warning about another resource leak, because we are trying to cancel it after the test has already completed. So I'm not sure we can do that part.

@LGLO
Contributor Author

LGLO commented Sep 28, 2019

@adamgfraser I got rid of the warnings and fiber traces.
I changed the timeout to 10 ms for the test with an interruptible test fiber. My hypothesis is that it was failing because the scheduler was running the timeout fiber before the test fiber started and became interruptible.

This time ZIOSpec failed; it seems unrelated.

@adamgfraser
Contributor

This is getting complicated. Can we simplify by using raceWith so we can control how we handle the loser and don't automatically block on trying to interrupt it? How about:

test.either.raceWith(Live.live(ZIO.sleep(duration)))(
  (exit, fiber) => ZIO.done(exit) <* fiber.interrupt,
  (_, fiber) => fiber.interrupt.raceWith(Live.live(ZIO.sleep(interruptDuration)))(
    (_, fiber) => ZIO.succeed(Left(timeoutFailure)) <* fiber.interrupt,
    (_, _) => ZIO.succeed(Left(interruptionTimeoutFailure))
  )
).absolve

I think this will also address the flakiness and let us reduce the timeout back to 2 ns.

@jdegoes
Member

jdegoes commented Sep 30, 2019

@LGLO ZIO#timeout should automatically interrupt after the specified time elapses. What behavior did you observe that led to this re-implementation?

@adamgfraser
Contributor

@jdegoes Currently, if timeout is called on a test that is non-terminating and uninterruptible, it can cause the entire test suite to hang. For example, consider:

testM("test") {
  assertM(ZIO.never.uninterruptible, anything)
} @@ timeout(60.seconds)

This is obviously a contrived example, but during development a user can accidentally write code involving such an effect and under the current behavior it could appear that ZIO Test is not working correctly.

@jdegoes
Member

jdegoes commented Sep 30, 2019

@adamgfraser Excellent point!

I wonder if this (useful) functionality should be built into ZIO, e.g. timeoutFork or something (a timeout that does not wait for the timed-out action to terminate)?

Then we can use that operator here in the implementation of the timeout aspect.

@adamgfraser
Contributor

@jdegoes Yes I think that is a great idea. How about:

  final def timeoutFork(d: Duration): ZIO[R with Clock, E, Either[Fiber[E, A], A]] =
    raceWith(ZIO.sleep(d))(
      (exit, fiber) => ZIO.done(exit).map(Right(_)) <* fiber.interrupt,
      (_, fiber) => fiber.interrupt.flatMap(ZIO.done).fork.map(Left(_))
    )

If the effect is timed out, we give the user back a reference to the fiber interrupting the effect so they can decide what further action to take (e.g. ignore it, wait a certain amount of time for it to succeed, or else do something else).

@LGLO
Contributor Author

LGLO commented Sep 30, 2019

@adamgfraser - your implementation is far more elegant, and I think we could go with it or wait for timeoutFork.
It doesn't fix the flakiness; I observed a 2/100,000 failure rate when testing locally.
My guess is that we time out before the test fiber even enters the uninterruptible section. I don't think we can solve this; it's just a scheduler-internal thing.

@adamgfraser
Contributor

@LGLO I submitted #1856 to add timeoutFork. Would appreciate any thoughts you have. I'm fine with either merging a cleaned up version of this or waiting until we have timeoutFork in, whichever you prefer.

Good thought checking the flakiness with a large number of repetitions. I'm having a hard time replicating it locally using timeoutFork, but maybe it is just very rare. Here is the implementation I am using in TestAspect.timeout:

Live
  .withLive(test)(_.either.timeoutFork(duration).flatMap {
    case Left(fiber) =>
      fiber.join.raceWith(ZIO.sleep(interruptDuration))(
        (_, fiber) => ZIO.succeed(Left(timeoutFailure)) <* fiber.interrupt,
        (_, _) => ZIO.succeed(Left(interruptionTimeoutFailure))
      )
    case Right(result) => ZIO.succeed(result)
  })
  .absolve

I'm testing flakiness using TestUtils.nonFlaky and changing the schedule to recur 100,000 times. Maybe we should check one more time with the final implementation?

@LGLO
Contributor Author

LGLO commented Sep 30, 2019

With parameters 2.nanos and 1.nano I can observe flakiness.
I'm testing it in a cottage-industry way: Future.sequence over 100K runs and forall(_ == true) on the resulting list. I also print the cause if it is not equal to what I expect.
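
That ad-hoc harness can be sketched with standard-library Futures. This is a rough analogy under stated assumptions: nonFlaky here is a hypothetical local helper, not TestUtils.nonFlaky, and the run count is illustrative:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Run the check many times and require every run to pass.
// List.fill re-evaluates its argument, so each element is a fresh Future.
def nonFlaky(runs: Int)(check: () => Future[Boolean]): Boolean = {
  val all = Future.sequence(List.fill(runs)(check()))
  Await.result(all, Duration.Inf).forall(identity)
}
```

In the conversation above the check would be a single execution of the timed-out test, run on the order of 100K times.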

@adamgfraser
Contributor

@LGLO Just submitted a PR to your branch to implement TestAspect#timeout in terms of ZIO#timeoutFork.

Implement TestAspect#timeout Via ZIO#timeoutFork
@LGLO
Contributor Author

LGLO commented Oct 1, 2019

@ghostdogpr Could you take a look?

@LGLO
Contributor Author

LGLO commented Oct 1, 2019

@adamgfraser Thanks for your comments and PR! Now we need someone to review this effort. Could you dismiss your change request?

@adamgfraser adamgfraser self-requested a review October 1, 2019 18:46
@adamgfraser
Contributor

@LGLO Done.

@ghostdogpr ghostdogpr merged commit b1f9abf into zio:master Oct 1, 2019
@LGLO LGLO deleted the 1764-timeout branch October 11, 2019 04:24
Twizty pushed a commit to Twizty/zio that referenced this pull request Nov 13, 2019
* Add interrupt deadline to 'timeout' TestAspect

* Fixed imports. Using Live.live(...). Moved util function to TestUtils.

* failedWith implemented with

* Reduced timeouts to nanoseconds

* Interrupt timeout threads

* Low-level promise operations to avoid warnings.

* Don't try to interrupt completed fiber

* Anti-flakiness
Development

Successfully merging this pull request may close these issues.

'timeout' aspect stalls test if timeout happens with uninterruptible effect

4 participants