
Unclean shutdown as a result of a race between shutdown hooks #9807

@RafalSumislawski


For the purpose of this explanation, I will refer to any shutdown sequence that completes without writing errors to stdout/stderr as a "clean shutdown", and to any that writes such errors as an "unclean shutdown". Clean deallocation of resources is not the focus of this bug report.

Let's have a look at this dead simple ZIO application:

import zio._

object CleanShutdown extends ZIOAppDefault {

  override def run: ZIO[Any, Throwable, Unit] =
    ZIO.never

}

It shuts down cleanly after receiving a SIGTERM/SIGINT. This is welcome behaviour, but it's surprising given the implementation of main in https://github.com/zio/zio/blob/series/2.x/core/jvm-native/src/main/scala/zio/ZIOAppPlatformSpecific.scala

If we follow that code, here's what I would expect to happen:

  1. The main method (which runs on the main thread) executes runtime.unsafe.run
  2. SIGTERM/SIGINT triggers the shutdown hook, which interrupts the fiber
  3. Interruption of the fiber completes, and then the shutdown hook completes
  4. The main thread joins the interrupted fiber, so runtime.unsafe.run returns an Exit that is Failed/Interrupted
  5. getOrThrowFiberFailure throws a FiberFailure
  6. java.lang.ThreadGroup#uncaughtException prints Exception in thread "main" ... to stderr
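The expected sequence can be modelled with plain JVM threads. This is a hypothetical sketch, not ZIO's implementation: a java.lang.Thread stands in for the main fiber, `ExpectedShutdownModel`, `runUntilHook`, `getOrThrow` and the local `FiberFailure` class are made-up names, and the "shutdown hook" is called directly instead of being registered with Runtime.addShutdownHook.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicReference

// Hypothetical plain-JVM model of steps 1-6; this is NOT ZIO's implementation.
// A java.lang.Thread stands in for the main fiber, and `shutdownHook` is called
// directly rather than being registered via Runtime.addShutdownHook.
object ExpectedShutdownModel {
  sealed trait Exit
  case object Succeeded extends Exit
  case object Interrupted extends Exit

  // Stand-in for zio.FiberFailure (hypothetical local class)
  final class FiberFailure(val exit: Exit) extends RuntimeException(s"fiber exited with $exit")

  // Stand-in for getOrThrowFiberFailure: only a successful exit returns normally
  def getOrThrow(exit: Exit): Unit =
    if (exit != Succeeded) throw new FiberFailure(exit)

  def runUntilHook(): Exit = {
    val exit = new AtomicReference[Exit](Succeeded)
    // Step 1: the "fiber" blocks forever, like ZIO.never
    val fiber = new Thread(() =>
      try new CountDownLatch(1).await()
      catch { case _: InterruptedException => exit.set(Interrupted) }
    )
    fiber.start()

    // Steps 2-3: the hook interrupts the fiber and waits until the interruption completes
    def shutdownHook(): Unit = { fiber.interrupt(); fiber.join() }
    shutdownHook()

    // Step 4: the main thread joins the fiber and observes an Interrupted exit
    fiber.join()
    exit.get()
  }

  def main(args: Array[String]): Unit =
    // Step 5: throws FiberFailure; step 6: the default uncaught-exception
    // handling prints `Exception in thread "main" ...` to stderr
    getOrThrow(runUntilHook())
}
```

In this model nothing halts the JVM early, so step 6 always happens; in the real application the JVM may halt right after the last shutdown hook returns, which is what makes the outcome timing-dependent.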

But that's not what we see. In this simple application the ZIO shutdown hook is the slowest (indeed, the only) shutdown hook, so right after it completes (step 3) the JVM halts and the main thread is stopped, fast enough that it never prints that uncaught exception.

But a real-world application may have other shutdown hooks, and those may run longer than the ZIO shutdown hook. We can simulate this with the following application code:

import zio._

object UncleanShutdown extends ZIOAppDefault {

  java.lang.Runtime.getRuntime.addShutdownHook(new Thread {
    override def run(): Unit = {
      Thread.sleep(2000)
    }
  })

  override def run: ZIO[Any, Throwable, Unit] =
    ZIO.uninterruptible(ZIO.sleep(1000.milliseconds)).forever

}

On shutdown this application prints to stderr:

Exception in thread "main" Exception in thread "zio-fiber-163356120" java.lang.InterruptedException: Interrupted by thread "zio-fiber-404068148"
	Suppressed: java.lang.InterruptedException: Interrupted by thread "zio-fiber-404068148"
		Suppressed: java.lang.InterruptedException: Interrupted by thread "zio-fiber-404068148"
			Suppressed: java.lang.InterruptedException: Interrupted by thread "zio-fiber-404068148"

In a real-world application the relative timing of the shutdown hooks is much less predictable, and depending on which one finishes first, we may or may not get an error on shutdown.
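The dependence on timing can be captured in a toy model. This sketch is a deliberate simplification, not ZIO or JVM code: `ShutdownRaceModel`, `errorPrinted` and the `printDelay` parameter are hypothetical, and real hook timings are not fixed numbers. It treats the outcome as a race between the main thread reporting the FiberFailure and the JVM halting once the slowest hook has returned.

```scala
// Hypothetical, deliberately simplified timing model of the race (not measured
// ZIO or JVM behaviour). All times are in milliseconds after SIGTERM/SIGINT.
object ShutdownRaceModel {
  // The JVM halts as soon as the slowest shutdown hook has returned.
  def haltAt(hookFinishTimes: Seq[Long]): Long = hookFinishTimes.max

  // The main thread can only start reporting the FiberFailure once the fiber's
  // interruption has completed, and needs `printDelay` ms to get it to stderr.
  def errorPrinted(interruptDoneAt: Long, printDelay: Long, hookFinishTimes: Seq[Long]): Boolean =
    interruptDoneAt + printDelay < haltAt(hookFinishTimes)

  def main(args: Array[String]): Unit = {
    // CleanShutdown: the ZIO hook is the only hook, so the JVM halts right after it
    println(errorPrinted(interruptDoneAt = 10, printDelay = 5, hookFinishTimes = Seq(10L)))
    // UncleanShutdown: interruption may wait up to 1000 ms for the current
    // uninterruptible sleep, while the other hook keeps the JVM alive for 2000 ms
    println(errorPrinted(interruptDoneAt = 1000, printDelay = 5, hookFinishTimes = Seq(1000L, 2000L)))
  }
}
```

With these made-up numbers the first call prints false and the second prints true, mirroring the two applications above: whether the error appears depends only on which side of the race wins.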

That was a long explanation, so to state the nature of the bug clearly: I would expect the shutdown sequence to produce the same result regardless of the relative timing of shutdown hooks, and I would expect no error to be printed to stderr.

I doubt it matters, but I'm using ZIO 2.1.17 on Temurin-17.0.4.1+1.
