docs/tutorials/best-practices/scale-coder.md (+14 -25)
@@ -18,20 +18,13 @@ and metrics to identify potential bottlenecks before they negatively affect the
 end-user experience and measure the effects of modifications you make to your
 deployment.
 
-**Log output**
-
-- Capture log output from Loki, CloudWatch logs, and other tools on your Coder
-Server instances and external provisioner daemons and store them in a
+- Log output
+- Capture log output from Loki, CloudWatch logs, and other tools on your Coder Server instances and external provisioner daemons and store them in a
 searchable log store.
+- Retain logs for a minimum of thirty days, ideally ninety days. This allows you to look back to see when anomalous behaviors began.
 
-- Retain logs for a minimum of thirty days, ideally ninety days. This allows
-you to look back to see when anomalous behaviors began.
-
-**Metrics**
-
-- Capture infrastructure metrics like CPU, memory, open files, and network I/O
-for all Coder Server, external provisioner daemon, workspace proxy, and
-PostgreSQL instances.
+- Metrics
+- Capture infrastructure metrics like CPU, memory, open files, and network I/O for all Coder Server, external provisioner daemon, workspace proxy, and PostgreSQL instances.
 
 ### Capture Coder server metrics with Prometheus
 
@@ -73,27 +66,23 @@ dashboard, for example, [Grafana](../../admin/monitoring/index.md).
 Configure alerting based on these metrics to ensure you surface problems before
 they affect the end-user experience.
 
-**CPU and Memory Utilization**
+#### CPU and Memory Utilization
 
-- Monitor the utilization as a fraction of the available resources on the
-instance.
+Monitor the utilization as a fraction of the available resources on the instance.
 
-Utilization will vary with use throughout the course of a day, week, and
-longer timelines. Monitor trends and pay special attention to the daily and
-weekly peak utilization. Use long-term trends to plan infrastructure upgrades.
+Utilization will vary with use throughout the course of a day, week, and longer timelines. Monitor trends and pay special attention to the daily and weekly peak utilization. Use long-term trends to plan infrastructure upgrades.
 
-**Tail latency of Coder Server API requests**
+#### Tail latency of Coder Server API requests
 
-- High tail latency can indicate Coder Server or the PostgreSQL database is low
-on resources.
+High tail latency can indicate Coder Server or the PostgreSQL database is low on resources.
 
-Use the `coderd_api_request_latencies_seconds` metric.
+- Use the `coderd_api_request_latencies_seconds` metric.
 
-**Tail latency of database queries**
+#### Tail latency of database queries
 
--High tail latency can indicate the PostgreSQL database is low in resources.
+High tail latency can indicate the PostgreSQL database is low in resources.
 
-Use the `coderd_db_query_latencies_seconds` metric.
+- Use the `coderd_db_query_latencies_seconds` metric.
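
For readers unfamiliar with how a tail latency is derived from the two latency metrics named in this change, here is a minimal Python sketch. It assumes the metrics are exported as Prometheus histograms, that is, as cumulative `_bucket` series keyed by an `le` upper bound. The diff does not state this. The function name `estimate_quantile` and the sample bucket counts are hypothetical and only illustrate the arithmetic. The sketch interpolates linearly inside the bucket that contains the requested rank, the same general approach as PromQL's `histogram_quantile`.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets is a list of (upper_bound_seconds, cumulative_count) pairs sorted
    by upper bound, and the last pair must use math.inf as its upper bound.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with a +Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded

    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if bound == math.inf or in_bucket == 0:
                # Open-ended or empty bucket: fall back to the last finite bound.
                return prev_bound
            # Linear interpolation between the bucket's lower and upper bounds.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts, not real Coder data.
SAMPLE_BUCKETS = [(0.1, 900), (0.5, 980), (1.0, 995), (5.0, 1000), (math.inf, 1000)]

if __name__ == "__main__":
    print(f"p50 = {estimate_quantile(0.50, SAMPLE_BUCKETS):.3f}s")
    print(f"p99 = {estimate_quantile(0.99, SAMPLE_BUCKETS):.3f}s")
```

In this made-up data the median sits in the fastest bucket while the 99th percentile lands in the 0.5 to 1.0 second bucket. A gap like that is what an average hides and a tail-latency alert is meant to surface.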