docs: add new scaling doc to best practices section (#15904)

EdwardAngert · spikecurtis · web-flow · commit 02d0650ae8aa · 2025-01-21T15:02:02.000-05:00
[preview](https://coder.com/docs/@bp-scaling-coder/tutorials/best-practices/scale-coder)

Co-authored-by: Spike Curtis <spike@coder.com>
diff --git a/docs/admin/infrastructure/scale-testing.md b/docs/admin/infrastructure/scale-testing.md
@@ -5,35 +5,37 @@ without compromising service. This process encompasses infrastructure setup,
 traffic projections, and aggressive testing to identify and mitigate potential
 bottlenecks.
 
-A dedicated Kubernetes cluster for Coder is recommended to configure, host and
+A dedicated Kubernetes cluster for Coder is recommended to configure, host, and
 manage Coder workloads. Kubernetes provides container orchestration
 capabilities, allowing Coder to efficiently deploy, scale, and manage workspaces
 across a distributed infrastructure. This ensures high availability, fault
 tolerance, and scalability for Coder deployments. Coder is deployed on this
 cluster using the
 [Helm chart](../../install/kubernetes.md#4-install-coder-with-helm).
 
+For more information about scaling, see our [Coder scaling best practices](../../tutorials/best-practices/scale-coder.md).
+
 ## Methodology
 
 Our scale tests include the following stages:
 
 1. Prepare environment: create expected users and provision workspaces.
 
-2. SSH connections: establish user connections with agents, verifying their
+1. SSH connections: establish user connections with agents, verifying their
    ability to echo back received content.
 
-3. Web Terminal: verify the PTY connection used for communication with Web
+1. Web Terminal: verify the PTY connection used for communication with Web
    Terminal.
 
-4. Workspace application traffic: assess the handling of user connections with
+1. Workspace application traffic: assess the handling of user connections with
    specific workspace apps, confirming their capability to echo back received
    content effectively.
 
-5. Dashboard evaluation: verify the responsiveness and stability of Coder
+1. Dashboard evaluation: verify the responsiveness and stability of Coder
    dashboards under varying load conditions. This is achieved by simulating user
    interactions using instances of headless Chromium browsers.
 
-6. Cleanup: delete workspaces and users created in step 1.
+1. Cleanup: delete workspaces and users created in step 1.
 
 ## Infrastructure and setup requirements
 
@@ -54,13 +56,16 @@ channel for IDEs with VS Code and JetBrains plugins.
 The basic setup of scale tests environment involves:
 
 1. Scale tests runner (32 vCPU, 128 GB RAM)
-2. Coder: 2 replicas (4 vCPU, 16 GB RAM)
-3. Database: 1 instance (2 vCPU, 32 GB RAM)
-4. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)
+1. Coder: 2 replicas (4 vCPU, 16 GB RAM)
+1. Database: 1 instance (2 vCPU, 32 GB RAM)
+1. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)
+
+The test is deemed successful if:
 
-The test is deemed successful if users did not experience interruptions in their
-workflows, `coderd` did not crash or require restarts, and no other internal
-errors were observed.
+- Users did not experience interruptions in their
+workflows,
+- `coderd` did not crash or require restarts, and
+- No other internal errors were observed.
 
 ## Traffic Projections
 
@@ -90,11 +95,11 @@ Database:
 
 ## Available reference architectures
 
-[Up to 1,000 users](./validated-architectures/1k-users.md)
+- [Up to 1,000 users](./validated-architectures/1k-users.md)
 
-[Up to 2,000 users](./validated-architectures/2k-users.md)
+- [Up to 2,000 users](./validated-architectures/2k-users.md)
 
-[Up to 3,000 users](./validated-architectures/3k-users.md)
+- [Up to 3,000 users](./validated-architectures/3k-users.md)
 
 ## Hardware recommendation
 
@@ -107,7 +112,7 @@ guidance on optimal configurations. A reasonable approach involves using scaling
 formulas based on factors like CPU, memory, and the number of users.
 
 While the minimum requirements specify 1 CPU core and 2 GB of memory per
-`coderd` replica, it is recommended to allocate additional resources depending
+`coderd` replica, we recommend that you allocate additional resources depending
 on the workload size to ensure deployment stability.
 
 #### CPU and memory usage
diff --git a/docs/admin/infrastructure/scale-utility.md b/docs/admin/infrastructure/scale-utility.md
@@ -1,20 +1,23 @@
 # Scale Tests and Utilities
 
-We scale-test Coder with [a built-in utility](#scale-testing-utility) that can
+We scale-test Coder with a built-in utility that can
 be used in your environment for insights into how Coder scales with your
-infrastructure. For scale-testing Kubernetes clusters we recommend to install
+infrastructure. For scale-testing Kubernetes clusters we recommend that you install
 and use the dedicated Coder template,
 [scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner).
 
 Learn more about [Coder’s architecture](./architecture.md) and our
 [scale-testing methodology](./scale-testing.md).
 
+For more information about scaling, see our [Coder scaling best practices](../../tutorials/best-practices/scale-coder.md).
+
 ## Recent scale tests
 
-> Note: the below information is for reference purposes only, and are not
-> intended to be used as guidelines for infrastructure sizing. Review the
-> [Reference Architectures](./validated-architectures/index.md#node-sizing) for
-> hardware sizing recommendations.
+The information in this doc is for reference purposes only, and is not intended
+to be used as guidelines for infrastructure sizing.
+
+Review the [Reference Architectures](./validated-architectures/index.md#node-sizing) for
+hardware sizing recommendations.
 
 | Environment      | Coder CPU | Coder RAM | Coder Replicas | Database          | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested  |
 |------------------|-----------|-----------|----------------|-------------------|-------|-------------------|---------------------------------------|---------------|--------------|
@@ -25,26 +28,32 @@ Learn more about [Coder’s architecture](./architecture.md) and our
 | Kubernetes (GKE) | 4 cores   | 16 GB     | 2              | db-custom-8-30720 | 2000  | 50                | 2000 simulated                        | `v2.8.4`      | Feb 28, 2024 |
 | Kubernetes (GKE) | 2 cores   | 4 GB      | 2              | db-custom-2-7680  | 1000  | 50                | 1000 simulated                        | `v2.10.2`     | Apr 26, 2024 |
 
-> Note: a simulated connection reads and writes random data at 40KB/s per
-> connection.
+> Note: A simulated connection reads and writes random data at 40KB/s per connection.
 
 ## Scale testing utility
 
 Since Coder's performance is highly dependent on the templates and workflows you
 support, you may wish to use our internal scale testing utility against your own
 environments.
 
-> Note: This utility is experimental. It is not subject to any compatibility
-> guarantees, and may cause interruptions for your users. To avoid potential
-> outages and orphaned resources, we recommend running scale tests on a
-> secondary "staging" environment or a dedicated
-> [Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/terraform).
-> Run it against a production environment at your own risk.
+<blockquote class="admonition important">
+
+This utility is experimental.
+
+It is not subject to any compatibility guarantees and may cause interruptions
+for your users.
+To avoid potential outages and orphaned resources, we recommend that you run
+scale tests on a secondary "staging" environment or a dedicated
+[Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/terraform).
+
+Run it against a production environment at your own risk.
+
+</blockquote>
 
 ### Create workspaces
 
 The following command will provision a number of Coder workspaces using the
-specified template and extra parameters.
+specified template and extra parameters:
 
 ```shell
 coder exp scaletest create-workspaces \
@@ -56,8 +65,6 @@ coder exp scaletest create-workspaces \
         --job-timeout 5h \
         --no-cleanup \
         --output json:"${SCALETEST_RESULTS_DIR}/create-workspaces.json"
-
-# Run `coder exp scaletest create-workspaces --help` for all usage
 ```
 
 The command does the following:
@@ -70,6 +77,12 @@ The command does the following:
 1. If you don't want the creation process to be interrupted by any errors, use
    the `--retry 5` flag.
 
+For more built-in `scaletest` options, use the `--help` flag:
+
+```shell
+coder exp scaletest create-workspaces --help
+```
+
 ### Traffic Generation
 
 Given an existing set of workspaces created previously with `create-workspaces`,
@@ -105,7 +118,11 @@ The `workspace-traffic` supports also other modes - SSH traffic, workspace app:
 1. For SSH traffic: Use `--ssh` flag to generate SSH traffic instead of Web
    Terminal.
 1. For workspace app traffic: Use `--app [wsdi|wsec|wsra]` flag to select app
-   behavior. (modes: _WebSocket discard_, _WebSocket echo_, _WebSocket read_).
+   behavior.
+
+   - `wsdi`: WebSocket discard
+   - `wsec`: WebSocket echo
+   - `wsra`: WebSocket read
 
 ### Cleanup
 
diff --git a/docs/manifest.json b/docs/manifest.json
@@ -243,6 +243,11 @@
 							"title": "Scaling Utilities",
 							"description": "Tools to help you scale your deployment",
 							"path": "./admin/infrastructure/scale-utility.md"
+						},
+						{
+							"title": "Scaling best practices",
+							"description": "How to prepare a Coder deployment for scale",
+							"path": "./tutorials/best-practices/scale-coder.md"
 						}
 					]
 				},
@@ -761,16 +766,21 @@
 					"description": "Guides to help you make the most of your Coder experience",
 					"path": "./tutorials/best-practices/index.md",
 					"children": [
-						{
-							"title": "Security - best practices",
-							"description": "Make your Coder deployment more secure",
-							"path": "./tutorials/best-practices/security-best-practices.md"
-						},
 						{
 							"title": "Organizations - best practices",
 							"description": "How to make the best use of Coder Organizations",
 							"path": "./tutorials/best-practices/organizations.md"
 						},
+						{
+							"title": "Scale Coder",
+							"description": "How to prepare a Coder deployment for scale",
+							"path": "./tutorials/best-practices/scale-coder.md"
+						},
+						{
+							"title": "Security - best practices",
+							"description": "Make your Coder deployment more secure",
+							"path": "./tutorials/best-practices/security-best-practices.md"
+						},
 						{
 							"title": "Speed up your workspaces",
 							"description": "Speed up your Coder templates and workspaces",
diff --git a/docs/tutorials/best-practices/scale-coder.md b/docs/tutorials/best-practices/scale-coder.md