
Prometheus metrics aggregator pegging coderd CPU and potentially causing OOM kill #11775

Closed
@mafredri

Description

In the latest scale test, we ran into coderd restarts. On inspecting the logs, we saw that the OOM killer had been invoked and that the aggregator had logged a lot of messages reading: update queue is full.
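For readers unfamiliar with where a message like "update queue is full" typically comes from, here is a minimal sketch of a bounded update queue with a non-blocking send. This is not the coderd implementation: `aggregator`, `updateRequest`, `enqueue` and `errQueueFull` are hypothetical names chosen for illustration, and the real queue size and drop policy may differ.

```go
package main

import "errors"

// errQueueFull is a hypothetical sentinel error for this sketch.
var errQueueFull = errors.New("update queue is full")

// updateRequest is a hypothetical stand-in for one batch of agent metrics.
type updateRequest struct {
	agentID string
	metrics []string
}

// aggregator owns a bounded queue of pending updates (assumed design).
type aggregator struct {
	updateCh chan updateRequest
}

// newAggregator creates an aggregator whose queue holds at most size requests.
func newAggregator(size int) *aggregator {
	return &aggregator{updateCh: make(chan updateRequest, size)}
}

// enqueue hands the request to the consumer without blocking the caller.
// If the consumer has fallen behind and the buffer is full, the request
// is rejected and the caller can log the error.
func (a *aggregator) enqueue(req updateRequest) error {
	select {
	case a.updateCh <- req:
		return nil
	default:
		return errQueueFull
	}
}
```

With a design like this, a burst of "queue is full" errors means the single consumer is draining the channel more slowly than producers fill it, which would be consistent with the consumer being CPU-bound.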

As can be seen from these graphs, both CPU and memory usage spikes coincided with the restart(s).

image

What can also be observed above is that one coderd instance had its CPU pegged. Upon CPU/trace inspection, the finger is once again pointed at the aggregator:

image

The same is shown in the CPU profile:

image

One code path that executes for a long time (as shown in the trace above) is this loop:

for _, m := range req.metrics {
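The issue does not say why this loop is slow. One common cause of a hot per-metric loop, offered here purely as an assumption, is matching each incoming metric against the stored set with a linear scan, which costs time proportional to incoming metrics times stored metrics. A minimal sketch of the alternative, a store keyed by series identity, follows; `sample`, `seriesKey`, `store` and `apply` are hypothetical names and not taken from coderd.

```go
package main

// sample is a hypothetical metric sample for this sketch.
type sample struct {
	name   string
	labels string // label set pre-joined into one string, for simplicity
	value  float64
}

// seriesKey identifies one time series per agent (assumed identity).
type seriesKey struct {
	agent  string
	name   string
	labels string
}

// store keeps the latest value for every known series.
type store struct {
	series map[seriesKey]float64
}

// newStore returns an empty store.
func newStore() *store {
	return &store{series: make(map[seriesKey]float64)}
}

// apply upserts every sample from one agent. Each map write is O(1)
// on average, so a batch costs time proportional to len(metrics) only,
// regardless of how many series are already stored.
func (s *store) apply(agent string, metrics []sample) {
	for _, m := range metrics {
		s.series[seriesKey{agent: agent, name: m.name, labels: m.labels}] = m.value
	}
}
```

Whether this applies here depends on what the loop body actually does, which only the CPU profile can confirm.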

// ping @mtojek for insights since you worked on the initial feature.

Labels

api (Area: HTTP API), s1 (Bugs that break core workflows. Only humans may set this.)
