Description
Version: v2.28.3 7beb95f
Coder Self-Hosted Service Issue
"coder.service exits and restarts due to pubsub watchdog timeout under system memory pressure"
We observed repeated restarts of coder.service coinciding with periods of
severe system memory pressure. During these events, SSH to the host became
temporarily unavailable, and journald logged sustained memory pressure
messages.
The Coder process did not crash or get OOM-killed. Instead, it exited
intentionally after an internal watchdog triggered:
- The primary error reported was:
  ERROR: Unexpected error, shutting down server: pubsub Watchdog timed out
- The service then attempted a graceful shutdown, but parts of the shutdown
  exceeded expected deadlines:
  API server shutdown took longer than 3s: context deadline exceeded
- systemd recorded a clean exit with status=1/FAILURE and restarted the
  service.
Relevant context:
- Coder is running with the built-in PostgreSQL enabled.
- The host experienced sustained memory pressure around the time of failure, as
  evidenced by repeated:
  systemd-journald: Under memory pressure, flushing caches
- The watchdog timeout appears to be a secondary effect of resource starvation
  rather than a direct fault in pubsub itself.
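To check the memory-pressure hypothesis above on a live host, the kernel's pressure stall information (PSI) can be read directly. This is a minimal sketch, assuming PSI is enabled and exposed at /proc/pressure/memory (not confirmed for this VM). The helper names `parse_psi` and `under_pressure` and the 10.0 threshold are hypothetical, chosen only for illustration.

```python
def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text such as 'some avg10=0.00 avg60=0.00 avg300=0.00 total=0'.

    Returns a mapping like {"some": {"avg10": 0.0, ...}, "full": {...}}.
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        kind, fields = parts[0], parts[1:]
        # Each field has the form key=value, e.g. avg10=12.50
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def under_pressure(psi: dict[str, dict[str, float]], threshold: float = 10.0) -> bool:
    """Return True if the 10-second 'some' stall average meets the threshold.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    return psi.get("some", {}).get("avg10", 0.0) >= threshold
```

On the host, passing the contents of /proc/pressure/memory to `parse_psi` at intervals would show whether the stall averages climb shortly before each watchdog timeout.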
Impact:
- Temporary loss of SSH access to the host.
- Coder service unavailable during shutdown and restart.
- Restart loop occurred multiple times in the same day.
Logs around the failure window are included below for reference.
Here's a list of systemd-journald log lines that could be helpful: clip-20251231-175629.txt
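To show how often each symptom appears in an exported journal, the three messages quoted in this report can be counted with a short script. This is only a sketch: the marker strings are taken from the messages above, while the `MARKERS` table and `count_markers` function are hypothetical names introduced for illustration.

```python
# Substrings taken from the log messages quoted in this report.
MARKERS = {
    "watchdog_timeout": "pubsub Watchdog timed out",
    "slow_shutdown": "API server shutdown took longer than",
    "memory_pressure": "Under memory pressure, flushing caches",
}


def count_markers(lines):
    """Count how many log lines contain each marker substring."""
    counts = {name: 0 for name in MARKERS}
    for line in lines:
        for name, needle in MARKERS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

For example, the lines of a saved journal export (such as the attached text file, opened with `open(path)`) can be passed straight to `count_markers`. A high `memory_pressure` count next to each `watchdog_timeout` would be consistent with the resource-starvation explanation, though it would not prove it.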
As a mitigation, I plan to upgrade the VM this service runs on. It currently runs Ubuntu 24.04 with 1 CPU core, 50 GB of storage, and 2 GB of RAM.