docs/tutorials/best-practices/scale-coder.md (+67 -75)
@@ -1,34 +1,33 @@
-kl# Scale Coder
+# Scale Coder
 
-December 20, 2024
-
----
-
-This best practice guide helps you prepare a low-scale Coder deployment so that
-it can be scaled up to a high-scale deployment as use grows, and keep it
-operating smoothly with a high number of active users and workspaces.
+This best practice guide helps you prepare a low-scale Coder deployment that you can
+scale up to a high-scale deployment as use grows, and keep it operating smoothly with a
+high number of active users and workspaces.
 
 ## Observability
 
-Observability is one of the most important aspects to a scalable Coder
-deployment.
+Observability is one of the most important aspects to a scalable Coder deployment.
+When you have visibility into performance and usage metrics, you can make informed
+decisions about what changes you should make.
 
 [Monitor your Coder deployment](../../admin/monitoring/index.md) with log output
 and metrics to identify potential bottlenecks before they negatively affect the
 end-user experience and measure the effects of modifications you make to your
 deployment.
 
 - Log output
-- Capture log output from Loki, CloudWatch logs, and other tools on your Coder Server instances and external provisioner daemons and store them in a
-searchable log store.
-- Retain logs for a minimum of thirty days, ideally ninety days. This allows you to look back to see when anomalous behaviors began.
+- Capture log output from Loki, CloudWatch logs, and other tools on your Coder Server
+instances and external provisioner daemons and store them in a searchable log store.
+- Retain logs for a minimum of thirty days, ideally ninety days.
+This allows you to look back to see when anomalous behaviors began.
 
 - Metrics
-- Capture infrastructure metrics like CPU, memory, open files, and network I/O for all Coder Server, external provisioner daemon, workspace proxy, and PostgreSQL instances.
+- Capture infrastructure metrics like CPU, memory, open files, and network I/O for all
+Coder Server, external provisioner daemon, workspace proxy, and PostgreSQL instances.
 
 ### Capture Coder server metrics with Prometheus
 
-To capture metrics from Coder Server and external provisioner daemons with
+Edit your Helm `values.yaml` to capture metrics from Coder Server and external provisioner daemons with
[...]
 Retain metric time series for at least six months. This allows you to see
 performance trends relative to user growth.
 
 For a more comprehensive overview, integrate metrics with an observability
-dashboard, for example,[Grafana](../../admin/monitoring/index.md).
+dashboard like [Grafana](../../admin/monitoring/index.md).
 
 ### Observability key metrics
 
 Configure alerting based on these metrics to ensure you surface problems before
 they affect the end-user experience.
 
-#### CPU and Memory Utilization
-
-Monitor the utilization as a fraction of the available resources on the instance.
-
-Utilization will vary with use throughout the course of a day, week, and longer timelines. Monitor trends and pay special attention to the daily and weekly peak utilization. Use long-term trends to plan infrastructure upgrades.
-
-#### Tail latency of Coder Server API requests
-
-High tail latency can indicate Coder Server or the PostgreSQL database is low on resources.
+- CPU and Memory Utilization
+- Monitor the utilization as a fraction of the available resources on the instance.
 
-- Use the `coderd_api_request_latencies_seconds` metric.
+Utilization will vary with use throughout the course of a day, week, and longer timelines. Monitor trends and pay special attention to the daily and weekly peak utilization. Use long-term trends to plan infrastructure upgrades.
 
-#### Tail latency of database queries
+- Tail latency of Coder Server API requests
+- High tail latency can indicate Coder Server or the PostgreSQL database is low on resources.
+- Use the `coderd_api_request_latencies_seconds` metric.
 
-High tail latency can indicate the PostgreSQL database is low in resources.
-
-- Use the `coderd_db_query_latencies_seconds` metric.
+- Tail latency of database queries
+- High tail latency can indicate the PostgreSQL database is low in resources.
+- Use the `coderd_db_query_latencies_seconds` metric.
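
For readers unfamiliar with "tail latency", here is a minimal sketch of how a high percentile (for example p99) can be estimated from cumulative histogram buckets such as those a latency histogram like `coderd_api_request_latencies_seconds` would expose. The function name `estimate_quantile`, the bucket boundaries, and the counts are hypothetical and chosen for illustration; the linear interpolation within a bucket is an assumption modeled on the common Prometheus-style approach, not Coder's own code.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count) pairs, sorted by
    upper bound and ending with (math.inf, total_count).
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        raise ValueError("buckets must not be empty")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded

    rank = q * total  # position of the target observation
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Target falls in the overflow bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical sample: 1000 requests, most under 100 ms, a few slow ones.
sample_buckets = [(0.1, 900), (0.5, 980), (1.0, 995), (5.0, 1000), (math.inf, 1000)]
p50 = estimate_quantile(0.50, sample_buckets)
p99 = estimate_quantile(0.99, sample_buckets)
```

With this sample the median stays well under 100 ms while the p99 lands between 0.5 s and 1 s, which is the kind of gap that alerting on tail latency is meant to surface.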