
ZIO 2.0.3 with Http4s and Blaze performance regression #7517

@ollyw

Background

This is a carry-over discussion from Discord. It is not necessarily highlighting a specific bug, merely a difference in out-of-the-box performance between ZIO 1 (with tracing disabled) and ZIO 2.

After porting a ZIO application from ZIO 1.0.16 to ZIO 2.0.3, it became clear that the ported application doesn't perform as well as before. The original application has been running for many months in production, so there is plenty of historical performance data, both from production and from the Gatling-driven performance benchmarking environment.

The application is an Http4s service backed by Blaze. In essence, it receives a JSON payload, parses it, does some computations against some in-memory cached data, generates a JSON response, and finally records some metrics for each request in Datadog via UDP.

The ZIO 1 application has tracing disabled. There are 100 instances of this application in production, so small improvements in CPU usage are valuable.

The service must also respond within a target of 3ms, with a maximum workable response time of about 25ms (the caller has a 50ms timeout including network, etc.). ZGC is used to ensure that GC activity doesn't pause for long. JDK 17 is used. Flags are -XX:ActiveProcessorCount=8 -XX:+AlwaysPreTouch -XX:+UseZGC -Xshare:off -Xlog:async -XX:MaxRAMPercentage=70.0 with 8GB allocated, and a CPU quota of 8 CPUs out of the host's 36 vCPUs.

The issues

The CPU usage of the application has increased considerably. The increase is not uniform: at lower request rates, performance relative to ZIO 1 is worse than when the application is running at 3000 RPS. Each test runs for an hour. See the image for details.

[image: CPU usage comparison, ZIO 1 vs ZIO 2, across request rates]

The CPU usage is one issue, but the bigger concern is that the percentiles (95, 99.9, 99.999, etc.) for request duration are considerably worse. Once the ZIO 2 service is above 70% CPU, it is expected that request times get worse, as 70% is the rule-of-thumb maximum CPU for ZGC to still be able to collect proactively. The issue is that even at low RPS/CPU usage, the response percentiles look bad. This means it will be hard to compensate by just adding more CPU resources.

The last point is that CPU usage has gone up substantially whilst the service is mostly idle. When the service is just receiving healthchecks from the load balancer, there are measurable CPU usage spikes in ZIO 2.

Attempted remedial work

Part 1 - Finding whether anything is blocking, either explicitly or implicitly

We had a hunch that perhaps the autoblocking was somehow involved, so we went looking for blocking code. It turns out the metrics code (loosely based on https://github.com/zio/zio-metrics/) does a blocking write to the UDP port without explicitly running it within a blocking effect. In ZIO 2, this may or may not be detected by the autoblocking, as the blocked duration is pretty short on the happy path (<2ms). We changed the client to non-blocking mode with DatagramChannel.configureBlocking(false) and re-ran the perf tests (see the sketch below). This had a hugely positive effect on the response percentiles throughout the 1hr run, not just during the start-up window when autoblocking might not yet have detected the blocking call. The change brought ZIO 2 halfway back to ZIO 1 (without the patch). Unfortunately/fortunately, when we backported the change to ZIO 1, it also had a positive effect there, although much less of one than it did on ZIO 2.
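
A minimal sketch of the change, assuming a StatsD-style client (the class name, method names, and port handling are illustrative, not our actual metrics code):

```scala
import java.net.InetSocketAddress
import java.nio.ByteBuffer
import java.nio.channels.DatagramChannel
import java.nio.charset.StandardCharsets

import zio._

// Hypothetical StatsD-style UDP metrics client. Opening the channel in
// non-blocking mode means a send can no longer stall a ZIO worker thread.
final class UdpMetricsClient private (channel: DatagramChannel) {
  def send(metric: String): Task[Unit] =
    ZIO.attempt {
      // In non-blocking mode, write returns 0 instead of blocking when the
      // send buffer is full; for fire-and-forget metrics the datagram is
      // simply dropped in that case.
      channel.write(ByteBuffer.wrap(metric.getBytes(StandardCharsets.UTF_8)))
    }.unit
}

object UdpMetricsClient {
  def make(host: String, port: Int): Task[UdpMetricsClient] =
    ZIO.attempt {
      val channel = DatagramChannel.open()
      channel.configureBlocking(false) // the actual one-line fix
      channel.connect(new InetSocketAddress(host, port))
      new UdpMetricsClient(channel)
    }
}
```

Had the channel stayed in blocking mode, the alternative would have been to wrap the write in ZIO.attemptBlocking so the runtime shifts it to the blocking thread pool up front, rather than relying on autoblocking to detect it.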

Part 2 - Removing one of the timeouts in Http4s

We supposed that fewer async calls might improve performance, and one of the request timeouts was optional but used an additional async. Removing it made no difference (a sketch of the kind of middleware involved is below).
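
For context, a timeout like this is typically applied as Http4s middleware. A minimal sketch, assuming http4s 0.23 on Cats Effect 3 with zio-interop-cats (the 5-second value and function name are illustrative; this is not necessarily the exact timeout we removed):

```scala
import org.http4s.HttpApp
import org.http4s.server.middleware.Timeout
import scala.concurrent.duration._
import zio.Task
import zio.interop.catz._

// The Timeout middleware races every request against a timer, which adds an
// extra async step per request; dropping the optional wrapper removes it.
def withRequestTimeout(app: HttpApp[Task]): HttpApp[Task] =
  Timeout(5.seconds)(app)
```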

Part 3 - Changing runtime flags

At John's suggestion, we modified the runtime flags, removing WorkStealing and FiberRoots and leaving only Interruption and CooperativeYielding. This had a very positive effect on response percentiles, bringing stability and almost the same max response times as ZIO 1. We also had the fixes from Part 1 in place during this test.
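
For reference, a minimal sketch of how we understand such a flag configuration can be expressed, assuming ZIO 2's RuntimeFlags patch API and a ZIOAppDefault entry point (Main and run are placeholders; the Http4s/Blaze wiring is elided):

```scala
import zio._

object Main extends ZIOAppDefault {

  // Disable WorkStealing and FiberRoots for the lifetime of the application
  // by patching the runtime flags in the bootstrap layer, leaving the other
  // flags (including Interruption and CooperativeYielding) intact.
  override val bootstrap: ZLayer[ZIOAppArgs, Any, Any] =
    ZLayer.scoped {
      ZIO.withRuntimeFlagsScoped(RuntimeFlags.disable(RuntimeFlag.WorkStealing)) *>
        ZIO.withRuntimeFlagsScoped(RuntimeFlags.disable(RuntimeFlag.FiberRoots))
    }

  // Placeholder for the actual Http4s/Blaze server startup.
  def run: ZIO[Any, Any, Any] = ZIO.unit
}
```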

Next steps

At this point, with the change making UDP non-blocking at the socket level, plus the runtime flag changes, the CPU usage is much closer to that of ZIO 1 without the non-blocking change. The response percentiles appear to be almost as good as ZIO 1's, although next week we will run multiple performance tests to 1) unpack which runtime flag made the difference, and 2) put tighter confidence bounds around the measurements.

We are also happy to try anything else that might improve performance. Sorry this issue doesn't have anything very specific regarding isolating the problems; hopefully those details will come next week.

Thoughts

If the issue is the WorkStealing flag, then perhaps it should be turned off by default, if it performs worse for the majority of use cases (TBD).

Metadata

Labels: performance (Performance and optimization)
