Add Prometheus metrics to distinguish graceful vs ungraceful client-agent disconnects #743

Open
@blink-so

Problem

Customers need to monitor unexpected workspace disconnects so they can set up proactive alerts for service degradation. Current Prometheus metrics don't distinguish between:

  • Expected disconnects: User intentionally closes session/stops workspace
  • Unexpected disconnects: Network issues, agent crashes, infrastructure problems

The existing coderd_agents_connections metric only shows agent-to-coderd connection status, not client-to-agent disconnects that users actually experience.

Current Limitations

  • coderd_agents_connections{status="disconnected"} includes both graceful and ungraceful disconnects
  • coderd_agents_connections{status="timeout"} only covers connection establishment timeouts
  • No metrics track client-to-agent session disconnects (SSH, VS Code, etc.)

Proposed Solution

Add new Prometheus metrics that leverage the coordinator's existing "graceful disconnect" concept:

coderd_agent_client_disconnects_total{type="graceful|ungraceful", agent_name, username, workspace_name}
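To make the label shape concrete, here is a minimal, standard-library-only sketch of how such a counter could behave. It is not Coder's implementation: the `DisconnectCounter` class and its `record` and `render` methods are hypothetical names, and a real change would more likely register a labeled counter with the Prometheus Go client inside coderd. The sketch only assumes that the coordinator can tell the recording code whether a disconnect was graceful.

```python
from collections import Counter

METRIC = "coderd_agent_client_disconnects_total"  # metric name proposed in this issue
LABELS = ("type", "agent_name", "username", "workspace_name")


class DisconnectCounter:
    """Hypothetical in-memory stand-in for a labeled Prometheus counter."""

    def __init__(self):
        self._counts = Counter()

    def record(self, graceful, agent_name, username, workspace_name):
        # Assumption: the caller already knows whether the coordinator
        # saw a graceful disconnect for this client session.
        kind = "graceful" if graceful else "ungraceful"
        self._counts[(kind, agent_name, username, workspace_name)] += 1

    def render(self):
        # Emit Prometheus text exposition format, one line per label set.
        # Label-value escaping is omitted to keep the sketch short.
        lines = [f"# TYPE {METRIC} counter"]
        for key in sorted(self._counts):
            labels = ",".join(f'{name}="{value}"' for name, value in zip(LABELS, key))
            lines.append(f"{METRIC}{{{labels}}} {self._counts[key]}")
        return "\n".join(lines)


if __name__ == "__main__":
    counter = DisconnectCounter()
    counter.record(True, "main", "alice", "dev")    # user closed the session
    counter.record(False, "main", "alice", "dev")   # connection dropped
    counter.record(False, "main", "alice", "dev")
    print(counter.render())
```

Each distinct combination of `type`, `agent_name`, `username`, and `workspace_name` becomes its own time series, which is what lets an alert filter on `type="ungraceful"` alone.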

This would enable customers to:

  • Alert specifically on ungraceful disconnects: rate(coderd_agent_client_disconnects_total{type="ungraceful"}[5m]) > threshold
  • Monitor service health without noise from normal user behavior
  • Distinguish infrastructure issues from user-initiated actions
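As an illustration of the alerting use case above, the expression could be wrapped in a Prometheus alerting rule along these lines. This is a sketch only: the metric does not exist yet, the group name, alert name, `for` duration, and severity are hypothetical, and `0.1` merely stands in for the unspecified threshold.

```yaml
groups:
  - name: coder-client-disconnects            # hypothetical group name
    rules:
      - alert: CoderUngracefulClientDisconnects   # hypothetical alert name
        # 0.1 is a placeholder for "threshold", not a recommended value.
        expr: rate(coderd_agent_client_disconnects_total{type="ungraceful"}[5m]) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Ungraceful client-agent disconnects for workspace {{ $labels.workspace_name }}"
```

Because graceful disconnects are excluded by the label matcher, normal workspace stops would not contribute to this alert.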

Why This Matters

Customers deploying Coder at scale need reliable alerting for actual service degradation. Alerts built on the current metrics also fire on normal workspace stops, and these false positives make it difficult to detect real issues that impact user productivity.

Reference: Customer ticket #3917
