From fc8839d63caf491abba1a7826c98d1b882078a6f Mon Sep 17 00:00:00 2001 From: Ben Date: Wed, 30 Nov 2022 21:37:53 +0000 Subject: [PATCH 01/21] docs: scaling Coder --- docs/admin/scale/docker.md | 0 docs/admin/scale/index.md | 37 ++++++++++++++++++++++++++++++++++++ docs/images/icons/growth.svg | 1 + docs/manifest.json | 6 ++++++ 4 files changed, 44 insertions(+) create mode 100644 docs/admin/scale/docker.md create mode 100644 docs/admin/scale/index.md create mode 100644 docs/images/icons/growth.svg diff --git a/docs/admin/scale/docker.md b/docs/admin/scale/docker.md new file mode 100644 index 0000000000000..e69de29bb2d1d diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md new file mode 100644 index 0000000000000..a9ebc63637bf8 --- /dev/null +++ b/docs/admin/scale/index.md @@ -0,0 +1,37 @@ +We regularly scale-test Coder against various reference architectures. Additionally, we provide a [scale testing utility](#scaletest-utility) which can be used in your own environment to give insight on how Coder scales with your deployment's specific templates, images, etc. + +## Reference Architectures + +| Environment | Users | Workspaces | Last tested | Status | +| ----------------------------------------- | ----- | ---------- | ------------ | -------- | +| [Google Kubernetes Engine (GKE)](#) | 100 | 200 | Nov 29, 2022 | Complete | +| [AWS Elastic Kubernetes Service (EKS)](#) | 100 | 200 | Nov 29, 2022 | Complete | +| [Google Compute Engine + Docker](#) | 1000 | 200 | Nov 29, 2022 | Complete | +| [Google Compute Engine + VMs](#) | 1000 | 200 | Nov 29, 2022 | Complete | + +## Scale testing utility + +Since Coder's performance is highly dependent on the templates and workflows you support, we recommend using our scale testing utility against your own environments. + +For example, this command will do the following: + +- create 100 workspaces +- establish a SSH connection to each workspace +- run `sleep 3 && echo hello` on each workspace via the web terminal +- close connections, attempt to delete all workspaces +- return results (e.g. `99 succeeded, 1 failed to connect` ) + +```sh +coder loadtest create-workspaces \ + --count 100 \ + --template "my-custom-template" \ + --parameter image="my-custom-image" \ + --run-command "sleep 3 && echo hello" \ + --connect-timeout "10s" + +# Run `coder scaletest --help` for all usage +``` + +> To avoid user outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment. + +If a test fails, you can leverage Coder's [performance tracing](#) and [prometheus metrics](#) to identify bottlenecks during scale tests. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, etc. 
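As a concrete starting point for "view server logs" above, the sketch below tails coderd logs during a test run on a Kubernetes install. The `coder` namespace and the label selector are assumptions about how the Helm chart was installed, not part of the tested configuration; adjust them for your deployment.

```sh
# Follow coderd logs while a scale test is running (Kubernetes install).
# Namespace and label selector below are illustrative; check your Helm release.
kubectl logs --namespace coder \
  --selector app.kubernetes.io/name=coder \
  --follow
```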
diff --git a/docs/images/icons/growth.svg b/docs/images/icons/growth.svg new file mode 100644 index 0000000000000..2092514651547 --- /dev/null +++ b/docs/images/icons/growth.svg @@ -0,0 +1 @@ + diff --git a/docs/manifest.json b/docs/manifest.json index bac69202bbf1a..edbc66edc970e 100644 --- a/docs/manifest.json +++ b/docs/manifest.json @@ -253,6 +253,12 @@ "icon_path": "./images/icons/plug.svg", "path": "./admin/automation.md" }, + { + "title": "Scaling Coder", + "description": "Reference architecture and load testing tools", + "icon_path": "./images/icons/growth.svg", + "path": "./admin/scale/index.md" + }, { "title": "Audit Logs", "description": "Learn how to use Audit Logs in your Coder deployment", From a587e453a3cd10ccd87947df3d09ba273d2e6bec Mon Sep 17 00:00:00 2001 From: Ben Date: Wed, 30 Nov 2022 21:45:27 +0000 Subject: [PATCH 02/21] change icon --- docs/images/icons/growth.svg | 1 - docs/images/icons/scale.svg | 1 + docs/manifest.json | 2 +- 3 files changed, 2 insertions(+), 2 deletions(-) delete mode 100644 docs/images/icons/growth.svg create mode 100644 docs/images/icons/scale.svg diff --git a/docs/images/icons/growth.svg b/docs/images/icons/growth.svg deleted file mode 100644 index 2092514651547..0000000000000 --- a/docs/images/icons/growth.svg +++ /dev/null @@ -1 +0,0 @@ - diff --git a/docs/images/icons/scale.svg b/docs/images/icons/scale.svg new file mode 100644 index 0000000000000..3807fa5707081 --- /dev/null +++ b/docs/images/icons/scale.svg @@ -0,0 +1 @@ + diff --git a/docs/manifest.json b/docs/manifest.json index edbc66edc970e..f80dc6273625b 100644 --- a/docs/manifest.json +++ b/docs/manifest.json @@ -256,7 +256,7 @@ { "title": "Scaling Coder", "description": "Reference architecture and load testing tools", - "icon_path": "./images/icons/growth.svg", + "icon_path": "./images/icons/scale.svg", "path": "./admin/scale/index.md" }, { From fdacfad7a0b4a0a8a47447f34f757bb16e90309c Mon Sep 17 00:00:00 2001 From: Ben Potter Date: Wed, 30 Nov 2022 13:46:28 -0800 Subject: [PATCH 03/21] Update docs/admin/scale/index.md Co-authored-by: Dean Sheather --- docs/admin/scale/index.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index a9ebc63637bf8..95da3b82f6aa2 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -22,12 +22,11 @@ For example, this command will do the following: - return results (e.g. 
`99 succeeded, 1 failed to connect` ) ```sh -coder loadtest create-workspaces \ +coder scaletest create-workspaces \ --count 100 \ --template "my-custom-template" \ --parameter image="my-custom-image" \ - --run-command "sleep 3 && echo hello" \ - --connect-timeout "10s" + --run-command "sleep 3 && echo hello" # Run `coder scaletest --help` for all usage ``` From 8cd6abb65a94b12485d9df2dbd325f29e937a2e2 Mon Sep 17 00:00:00 2001 From: Ben Potter Date: Wed, 30 Nov 2022 13:46:34 -0800 Subject: [PATCH 04/21] Update docs/admin/scale/index.md Co-authored-by: Dean Sheather --- docs/admin/scale/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index 95da3b82f6aa2..a0ff835f29d59 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -28,7 +28,7 @@ coder scaletest create-workspaces \ --parameter image="my-custom-image" \ --run-command "sleep 3 && echo hello" -# Run `coder scaletest --help` for all usage +# Run `coder scaletest create-workspaces --help` for all usage ``` > To avoid user outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment. From 7637f86150899c2bd8b69676359dbe92d5ec37ae Mon Sep 17 00:00:00 2001 From: Ben Potter Date: Wed, 30 Nov 2022 13:46:39 -0800 Subject: [PATCH 05/21] Update docs/admin/scale/index.md Co-authored-by: Dean Sheather --- docs/admin/scale/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index a0ff835f29d59..ac0d7b7f81d0c 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -19,7 +19,7 @@ For example, this command will do the following: - establish a SSH connection to each workspace - run `sleep 3 && echo hello` on each workspace via the web terminal - close connections, attempt to delete all workspaces -- return results (e.g. `99 succeeded, 1 failed to connect` ) +- return results (e.g. `99 succeeded, 1 failed to connect`) ```sh coder scaletest create-workspaces \ From c1de2b499131d33b7647b957dbeacde81d48ec41 Mon Sep 17 00:00:00 2001 From: Ben Date: Wed, 30 Nov 2022 21:47:38 +0000 Subject: [PATCH 06/21] add prom link --- docs/admin/scale/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index a9ebc63637bf8..389a24dc981d6 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -34,4 +34,4 @@ coder loadtest create-workspaces \ > To avoid user outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment. -If a test fails, you can leverage Coder's [performance tracing](#) and [prometheus metrics](#) to identify bottlenecks during scale tests. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, etc. +If a test fails, you can leverage Coder's [performance tracing](#) and [prometheus metrics](../prometheus.md) to identify bottlenecks during scale tests. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, etc. 
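One way to follow the "staging" advice above is to point the CLI at the staging deployment before kicking off a run, and to start with a small count before scaling up. This is a minimal sketch: the URL and template name are placeholders, and the flags are the same ones shown in the example above.

```sh
# Log in to the staging deployment first so the test never touches production.
coder login https://coder-staging.example.com

# Run a small smoke test before raising --count.
coder scaletest create-workspaces \
  --count 10 \
  --template "my-custom-template" \
  --run-command "sleep 3 && echo hello"
```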
From 1cf65aa28432eb1d8f8cb94f59c78d45f3189a5b Mon Sep 17 00:00:00 2001 From: Ben Date: Wed, 30 Nov 2022 23:05:13 +0000 Subject: [PATCH 07/21] add plumbing for gke doc --- docs/admin/scale/docker.md | 0 docs/admin/scale/gke.md | 50 ++++++++++++++++++++++++++++++++++++++ docs/admin/scale/index.md | 12 ++++----- docs/manifest.json | 9 ++++++- 4 files changed, 64 insertions(+), 7 deletions(-) delete mode 100644 docs/admin/scale/docker.md create mode 100644 docs/admin/scale/gke.md diff --git a/docs/admin/scale/docker.md b/docs/admin/scale/docker.md deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/docs/admin/scale/gke.md b/docs/admin/scale/gke.md new file mode 100644 index 0000000000000..3eb701a801cd4 --- /dev/null +++ b/docs/admin/scale/gke.md @@ -0,0 +1,50 @@ +# Scaling Coder on Google Kubernetes Engine (GKE) + +This is a reference architecture for Coder on [Google Kubernetes Engine](#). We regurily load test these environments with a standard [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) template. + +> Performance and ideal node sizing depends on many factors, including workspace image and the [workspace sizes](https://github.com/coder/coder/issues/3519) you wish to give developers. Use Coder's [scale testing utility](./index.md#scale-testing-utility) to test your own deployment. + +## 50 users + +### Cluster configuration + +- **Autoscaling profile**: `optimize-utilization` + +- **Node pools** + - Default + - **Operating system**: `Ubuntu with containerd` + - **Instance type**: `e2-highcpu-8` + - **Min nodes**: `1` + - **Max nodes**: `4` + +### Coder settings + +- **Replica count**: `1` +- **Provisioner daemons**: `30` +- **Template**: [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) + +## 100 users + +For deployments with 100+ users, we recommend running the Coder server in a separate node pool via taints, tolerations, and nodeselectors. + +### Cluster configuration + +- **Node pools** + - Coder server + - **Instance type**: `e2-highcpu-4` + - **Operating system**: `Ubuntu with containerd` + - **Autoscaling profile**: `optimize-utilization` + - **Min nodes**: `2` + - **Max nodes**: `4` + - Workspaces + - **Instance type**: `e2-highcpu-16` + - **Node**: `Ubuntu with containerd` + - **Autoscaling profile**: `optimize-utilization` + - **Min nodes**: `3` + - **Max nodes**: `10` + +### Coder settings + +- **Replica count**: `4` +- **Provisioner daemons**: `25` +- **Template**: [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index f1dad945fbb8f..81cc1f68928b3 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -2,12 +2,12 @@ We regularly scale-test Coder against various reference architectures. 
Additiona ## Reference Architectures -| Environment | Users | Workspaces | Last tested | Status | -| ----------------------------------------- | ----- | ---------- | ------------ | -------- | -| [Google Kubernetes Engine (GKE)](#) | 100 | 200 | Nov 29, 2022 | Complete | -| [AWS Elastic Kubernetes Service (EKS)](#) | 100 | 200 | Nov 29, 2022 | Complete | -| [Google Compute Engine + Docker](#) | 1000 | 200 | Nov 29, 2022 | Complete | -| [Google Compute Engine + VMs](#) | 1000 | 200 | Nov 29, 2022 | Complete | +| Environment | Users | Last tested | Status | +| ------------------------------------------------- | ------------- | ------------ | -------- | +| [Google Kubernetes Engine (GKE)](./gke.md) | 50, 100, 1000 | Nov 29, 2022 | Complete | +| [AWS Elastic Kubernetes Service (EKS)](./eks.md) | 50, 100, 1000 | Nov 29, 2022 | Complete | +| [Google Compute Engine + Docker](./gce-docker.md) | 15, 50 | Nov 29, 2022 | Complete | +| [Google Compute Engine + VMs](./gce-vms.md) | 1000 | Nov 29, 2022 | Complete | ## Scale testing utility diff --git a/docs/manifest.json b/docs/manifest.json index f80dc6273625b..6937070c744b4 100644 --- a/docs/manifest.json +++ b/docs/manifest.json @@ -257,7 +257,14 @@ "title": "Scaling Coder", "description": "Reference architecture and load testing tools", "icon_path": "./images/icons/scale.svg", - "path": "./admin/scale/index.md" + "path": "./admin/scale/index.md", + "children": [ + { + "title": "GKE", + "description": "Learn how to scale Coder on GKE", + "path": "./admin/scale/gke.md" + } + ] }, { "title": "Audit Logs", From 933beac63b7b9ff85dd2ac2d5f49186f482ee3f1 Mon Sep 17 00:00:00 2001 From: Ben Date: Wed, 30 Nov 2022 23:07:40 +0000 Subject: [PATCH 08/21] add limits/requests --- docs/admin/scale/gke.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/docs/admin/scale/gke.md b/docs/admin/scale/gke.md index 3eb701a801cd4..fe291e10a18bf 100644 --- a/docs/admin/scale/gke.md +++ b/docs/admin/scale/gke.md @@ -22,6 +22,12 @@ This is a reference architecture for Coder on [Google Kubernetes Engine](#). We - **Replica count**: `1` - **Provisioner daemons**: `30` - **Template**: [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) +- **Coder server limits**: + - CPU: `2 cores` + - RAM: `4 GB` +- **Coder server requests**: + - CPU: `2 cores` + - RAM: `4 GB` ## 100 users @@ -48,3 +54,9 @@ For deployments with 100+ users, we recommend running the Coder server in a sepa - **Replica count**: `4` - **Provisioner daemons**: `25` - **Template**: [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) +- **Coder server limits**: + - CPU: `4 cores` + - RAM: `8 GB` +- **Coder server requests**: + - CPU: `4 cores` + - RAM: `8 GB` From b31e8132f38e352360688b7ca9892223aa72e142 Mon Sep 17 00:00:00 2001 From: Ben Date: Thu, 1 Dec 2022 23:18:15 +0000 Subject: [PATCH 09/21] changes from feedback --- docs/admin/scale/index.md | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index 81cc1f68928b3..1374f5315269c 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -13,24 +13,30 @@ We regularly scale-test Coder against various reference architectures. Additiona Since Coder's performance is highly dependent on the templates and workflows you support, we recommend using our scale testing utility against your own environments. 
-For example, this command will do the following: - -- create 100 workspaces -- establish a SSH connection to each workspace -- run `sleep 3 && echo hello` on each workspace via the web terminal -- close connections, attempt to delete all workspaces -- return results (e.g. `99 succeeded, 1 failed to connect`) +The following command will run the same scenario against your own Coder deployment. You can also specify a template name and any parameter values. ```sh coder scaletest create-workspaces \ --count 100 \ --template "my-custom-template" \ --parameter image="my-custom-image" \ - --run-command "sleep 3 && echo hello" + --run-command "sleep 2 && echo hello" # Run `coder scaletest create-workspaces --help` for all usage ``` -> To avoid user outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment. +> To avoid outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment. + +The test does the following: + +- create `n` workspaces +- establish SSH connection to each workspace +- run `sleep 3 && echo hello` on each workspace via the web terminal +- close connections, attempt to delete all workspaces +- return results (e.g. `99 succeeded, 1 failed to connect`) + +Workspace jobs run concurrently, meaning that the test will attempt to connect to each workspace as soon as it is provisioned instead of first waiting for all 100 workspaces to create. + +## Troubleshooting -If a test fails, you can leverage Coder's [performance tracing](#) and [prometheus metrics](../prometheus.md) to identify bottlenecks during scale tests. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, etc. +If a load test fails or if you are experiencing performance issues during day-to-day use, you can leverage Coder's [performance tracing](#) and [prometheus metrics](../prometheus.md) to identify bottlenecks during scale tests. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, etc. From b493aa97f9f7080bf8b04ed85be37c12a4a307ab Mon Sep 17 00:00:00 2001 From: Ben Date: Thu, 1 Dec 2022 23:21:14 +0000 Subject: [PATCH 10/21] change --- docs/admin/scale/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index 1374f5315269c..d0dbf758bfd2e 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -35,7 +35,7 @@ The test does the following: - close connections, attempt to delete all workspaces - return results (e.g. `99 succeeded, 1 failed to connect`) -Workspace jobs run concurrently, meaning that the test will attempt to connect to each workspace as soon as it is provisioned instead of first waiting for all 100 workspaces to create. +Workspace jobs run concurrently, meaning that the test will attempt to connect to each workspace as soon as it is provisioned instead of waiting for all 100 workspaces to create. 
## Troubleshooting From c5f5af4595dda61ddd2bc405ee425e698f6662c4 Mon Sep 17 00:00:00 2001 From: Ben Date: Tue, 20 Dec 2022 13:44:42 +0000 Subject: [PATCH 11/21] simplify --- docs/admin/scale/gke.md | 62 --------------------------------------- docs/admin/scale/index.md | 16 +++++----- docs/manifest.json | 15 ++-------- 3 files changed, 10 insertions(+), 83 deletions(-) delete mode 100644 docs/admin/scale/gke.md diff --git a/docs/admin/scale/gke.md b/docs/admin/scale/gke.md deleted file mode 100644 index fe291e10a18bf..0000000000000 --- a/docs/admin/scale/gke.md +++ /dev/null @@ -1,62 +0,0 @@ -# Scaling Coder on Google Kubernetes Engine (GKE) - -This is a reference architecture for Coder on [Google Kubernetes Engine](#). We regurily load test these environments with a standard [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) template. - -> Performance and ideal node sizing depends on many factors, including workspace image and the [workspace sizes](https://github.com/coder/coder/issues/3519) you wish to give developers. Use Coder's [scale testing utility](./index.md#scale-testing-utility) to test your own deployment. - -## 50 users - -### Cluster configuration - -- **Autoscaling profile**: `optimize-utilization` - -- **Node pools** - - Default - - **Operating system**: `Ubuntu with containerd` - - **Instance type**: `e2-highcpu-8` - - **Min nodes**: `1` - - **Max nodes**: `4` - -### Coder settings - -- **Replica count**: `1` -- **Provisioner daemons**: `30` -- **Template**: [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) -- **Coder server limits**: - - CPU: `2 cores` - - RAM: `4 GB` -- **Coder server requests**: - - CPU: `2 cores` - - RAM: `4 GB` - -## 100 users - -For deployments with 100+ users, we recommend running the Coder server in a separate node pool via taints, tolerations, and nodeselectors. - -### Cluster configuration - -- **Node pools** - - Coder server - - **Instance type**: `e2-highcpu-4` - - **Operating system**: `Ubuntu with containerd` - - **Autoscaling profile**: `optimize-utilization` - - **Min nodes**: `2` - - **Max nodes**: `4` - - Workspaces - - **Instance type**: `e2-highcpu-16` - - **Node**: `Ubuntu with containerd` - - **Autoscaling profile**: `optimize-utilization` - - **Min nodes**: `3` - - **Max nodes**: `10` - -### Coder settings - -- **Replica count**: `4` -- **Provisioner daemons**: `25` -- **Template**: [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) -- **Coder server limits**: - - CPU: `4 cores` - - RAM: `8 GB` -- **Coder server requests**: - - CPU: `4 cores` - - RAM: `8 GB` diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index d0dbf758bfd2e..8deec508c7bb9 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -1,13 +1,13 @@ -We regularly scale-test Coder against various reference architectures. Additionally, we provide a [scale testing utility](#scaletest-utility) which can be used in your own environment to give insight on how Coder scales with your deployment's specific templates, images, etc. +We regularly scale-test Coder with our [scale testing utility](#scaletest-utility). The same utility can be used in your own environment for insights on how Coder scales with your deployment's specific templates, images, etc. 
-## Reference Architectures +## Recent scale tests -| Environment | Users | Last tested | Status | -| ------------------------------------------------- | ------------- | ------------ | -------- | -| [Google Kubernetes Engine (GKE)](./gke.md) | 50, 100, 1000 | Nov 29, 2022 | Complete | -| [AWS Elastic Kubernetes Service (EKS)](./eks.md) | 50, 100, 1000 | Nov 29, 2022 | Complete | -| [Google Compute Engine + Docker](./gce-docker.md) | 15, 50 | Nov 29, 2022 | Complete | -| [Google Compute Engine + VMs](./gce-vms.md) | 1000 | Nov 29, 2022 | Complete | +> This section is incomplete. Stay tuned for reference architectures for up to 3,000 users. + +| Environment | Users | Concurrent builds | Concurrent connections (SSH) | Concurrent connections (web) | Last tested | +| ------------------ | ----- | ----------------- | ---------------------------- | ---------------------------- | ------------ | +| Kubernetes (GKE) | 1000 | 500 | 10,000 | 10,000 | Dec 20, 2022 | +| Docker (Single VM) | 1000 | 500 | 10,000 | 10,000 | Dec 20, 2022 | ## Scale testing utility diff --git a/docs/manifest.json b/docs/manifest.json index 6937070c744b4..caea081cec322 100644 --- a/docs/manifest.json +++ b/docs/manifest.json @@ -1,9 +1,5 @@ { - "versions": [ - "main", - "v0.8.1", - "v0.7.12" - ], + "versions": ["main", "v0.8.1", "v0.7.12"], "routes": [ { "title": "About", @@ -257,14 +253,7 @@ "title": "Scaling Coder", "description": "Reference architecture and load testing tools", "icon_path": "./images/icons/scale.svg", - "path": "./admin/scale/index.md", - "children": [ - { - "title": "GKE", - "description": "Learn how to scale Coder on GKE", - "path": "./admin/scale/gke.md" - } - ] + "path": "./admin/scale/index.md" }, { "title": "Audit Logs", From cdff43bffeaeaecb602ad9a2a9675b56edbebd04 Mon Sep 17 00:00:00 2001 From: Ben Date: Mon, 2 Jan 2023 15:21:23 +0000 Subject: [PATCH 12/21] changes from colin feedback --- docs/admin/scale/index.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index 8deec508c7bb9..ffc61e7b771fe 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -13,19 +13,21 @@ We regularly scale-test Coder with our [scale testing utility](#scaletest-utilit Since Coder's performance is highly dependent on the templates and workflows you support, we recommend using our scale testing utility against your own environments. -The following command will run the same scenario against your own Coder deployment. You can also specify a template name and any parameter values. +The following command will run our scale test against your own Coder deployment. You can also specify a template name and any parameter values. ```sh coder scaletest create-workspaces \ - --count 100 \ - --template "my-custom-template" \ - --parameter image="my-custom-image" \ + --count 1000 \ + --template "kubernetes" \ + --concurrency 0 \ + --cleanup-concurrency 0 \ + --parameter "home_disk_size=10" \ --run-command "sleep 2 && echo hello" # Run `coder scaletest create-workspaces --help` for all usage ``` -> To avoid outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment. +> To avoid potential outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment. 
The test does the following: @@ -39,4 +41,4 @@ Workspace jobs run concurrently, meaning that the test will attempt to connect t ## Troubleshooting -If a load test fails or if you are experiencing performance issues during day-to-day use, you can leverage Coder's [performance tracing](#) and [prometheus metrics](../prometheus.md) to identify bottlenecks during scale tests. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, etc. +If a load test fails or if you are experiencing performance issues during day-to-day use, you can leverage Coder's [prometheus metrics](../prometheus.md) to identify bottlenecks during scale tests. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, etc. From cf15182824b48813f269ac237c5348edc7679bec Mon Sep 17 00:00:00 2001 From: Ben Date: Tue, 17 Jan 2023 19:23:29 +0000 Subject: [PATCH 13/21] more edits from testing --- docs/admin/scale/index.md | 32 ++++++++++++++++++++++++++++---- 1 file changed, 28 insertions(+), 4 deletions(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index ffc61e7b771fe..28c240a438e2d 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -1,12 +1,36 @@ -We regularly scale-test Coder with our [scale testing utility](#scaletest-utility). The same utility can be used in your own environment for insights on how Coder scales with your deployment's specific templates, images, etc. +We regularly scale-test Coder with our [scale testing utility](#scaletest-utility). The same utility can be used in your own environment for insights on how Coder performs with your specific templates, images, etc. -## Recent scale tests +## General concepts + +- **coderd**: Coder’s primary service. Learn more about [Coder’s architecture](../../about/architecture.md) +- **coderd replicas**: Replicas (often via Kubernetes) for high availability, this is an [enterprise feature](../../enterprise.md) +- **concurrent workspace builds**: Workspace operations (e.g. create/stop/delete/apply) across all users +- **concurrent connections**: Any connection to a workspace (e.g. SSH, web terminal, `coder_app`) +- **provisioner daemons**: Number of processes responsible for workspace builds, per coderd replica. + + ```text + 2 coderd replicas * 30 provisioner daemons = 30 max concurrent workspace builds + ``` + +- **scaletest**: Our scale-testing utility, built into the `coder` command line. + +## General recommendations + +### Concurrent workspace builds -> This section is incomplete. Stay tuned for reference architectures for up to 3,000 users. +Workspace builds are CPU-intensive, as it relies on Terraform and the various [Terraform providers](https://registry.terraform.io/browse/providers). When tested with our [kubernetes](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) template, `coderd` will consume roughly 8 cores per 30 concurrent workspace builds. For effective provisioning, our helm chart prefers to schedule [one coderd replica per-node](https://github.com/coder/coder/blob/main/helm/values.yaml#L110-L121). + +To support 120 concurrent workspace builds, for example: + +- Create a cluster/nodepool with three 8-core nodes (AWS: `t3.2xlarge` GCP: `e2-highcpu-8`) +- Run coderd with 4 replicas, 30 provisioner daemons each. 
(`CODER_PROVISIONER_DAEMONS=30`) +- Ensure Coder's [PostgreSQL server](../../admin/configure.md#postgresql-database) can use up to 1.5 cores + +## Recent scale tests | Environment | Users | Concurrent builds | Concurrent connections (SSH) | Concurrent connections (web) | Last tested | | ------------------ | ----- | ----------------- | ---------------------------- | ---------------------------- | ------------ | -| Kubernetes (GKE) | 1000 | 500 | 10,000 | 10,000 | Dec 20, 2022 | +| Kubernetes (GKE) | 3000 | 300 | 10,000 | 10,000 | Jan 10, 2022 | | Docker (Single VM) | 1000 | 500 | 10,000 | 10,000 | Dec 20, 2022 | ## Scale testing utility From 0188d5e5cf33edbad4f8461bb9be83cb1147dc7e Mon Sep 17 00:00:00 2001 From: Ben Date: Tue, 17 Jan 2023 19:40:30 +0000 Subject: [PATCH 14/21] more fixes from Colin feedback --- docs/admin/scale/index.md | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index 28c240a438e2d..0a80db9fa7b72 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -2,19 +2,20 @@ We regularly scale-test Coder with our [scale testing utility](#scaletest-utilit ## General concepts +Coder runs workspace operations in a queue. The number of concurrent builds will be limited to the number of provisioner daemons across all coderd replicas. + +```text +2 coderd replicas * 30 provisioner daemons = 60 max concurrent workspace builds +``` + - **coderd**: Coder’s primary service. Learn more about [Coder’s architecture](../../about/architecture.md) - **coderd replicas**: Replicas (often via Kubernetes) for high availability, this is an [enterprise feature](../../enterprise.md) - **concurrent workspace builds**: Workspace operations (e.g. create/stop/delete/apply) across all users - **concurrent connections**: Any connection to a workspace (e.g. SSH, web terminal, `coder_app`) -- **provisioner daemons**: Number of processes responsible for workspace builds, per coderd replica. - - ```text - 2 coderd replicas * 30 provisioner daemons = 30 max concurrent workspace builds - ``` - +- **provisioner daemons**: Coder runs one workspace build per provisioner daemon. One coderd replica can host many daemons - **scaletest**: Our scale-testing utility, built into the `coder` command line. -## General recommendations +## Infrastructure recommendations ### Concurrent workspace builds @@ -22,7 +23,7 @@ Workspace builds are CPU-intensive, as it relies on Terraform and the various [T To support 120 concurrent workspace builds, for example: -- Create a cluster/nodepool with three 8-core nodes (AWS: `t3.2xlarge` GCP: `e2-highcpu-8`) +- Create a cluster/nodepool with four 8-core nodes (AWS: `t3.2xlarge` GCP: `e2-highcpu-8`) - Run coderd with 4 replicas, 30 provisioner daemons each. 
(`CODER_PROVISIONER_DAEMONS=30`) - Ensure Coder's [PostgreSQL server](../../admin/configure.md#postgresql-database) can use up to 1.5 cores @@ -30,8 +31,8 @@ To support 120 concurrent workspace builds, for example: | Environment | Users | Concurrent builds | Concurrent connections (SSH) | Concurrent connections (web) | Last tested | | ------------------ | ----- | ----------------- | ---------------------------- | ---------------------------- | ------------ | -| Kubernetes (GKE) | 3000 | 300 | 10,000 | 10,000 | Jan 10, 2022 | -| Docker (Single VM) | 1000 | 500 | 10,000 | 10,000 | Dec 20, 2022 | +| Kubernetes (GKE) | 1200 | 120 | 10,000 | 10,000 | Jan 10, 2022 | +| Docker (Single VM) | 500 | 50 | 10,000 | 10,000 | Dec 20, 2022 | ## Scale testing utility From 8a6d6726446582be929fc267b52e0545bc179c04 Mon Sep 17 00:00:00 2001 From: Ben Date: Tue, 17 Jan 2023 19:43:43 +0000 Subject: [PATCH 15/21] clarify providers have different resource requirments --- docs/admin/scale/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index 0a80db9fa7b72..2893f39317a7b 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -19,7 +19,7 @@ Coder runs workspace operations in a queue. The number of concurrent builds will ### Concurrent workspace builds -Workspace builds are CPU-intensive, as it relies on Terraform and the various [Terraform providers](https://registry.terraform.io/browse/providers). When tested with our [kubernetes](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) template, `coderd` will consume roughly 8 cores per 30 concurrent workspace builds. For effective provisioning, our helm chart prefers to schedule [one coderd replica per-node](https://github.com/coder/coder/blob/main/helm/values.yaml#L110-L121). +Workspace builds are CPU-intensive, as it relies on Terraform. Various [Terraform providers](https://registry.terraform.io/browse/providers) have different resource requirements. When tested with our [kubernetes](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) template, `coderd` will consume roughly 8 cores per 30 concurrent workspace builds. For effective provisioning, our helm chart prefers to schedule [one coderd replica per-node](https://github.com/coder/coder/blob/main/helm/values.yaml#L110-L121). To support 120 concurrent workspace builds, for example: From 72ae52774ce3a1fc6a57cc5d50c272a5dc5addc8 Mon Sep 17 00:00:00 2001 From: Ben Date: Tue, 17 Jan 2023 21:32:52 +0000 Subject: [PATCH 16/21] kylecarbs feedback --- docs/admin/scale/index.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index 2893f39317a7b..02948174c2f69 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -1,13 +1,9 @@ -We regularly scale-test Coder with our [scale testing utility](#scaletest-utility). The same utility can be used in your own environment for insights on how Coder performs with your specific templates, images, etc. +We scale-test Coder with the [same utility](#scaletest-utility) that can be used in your environment for insights into how Coder scales with your infrastructure. ## General concepts Coder runs workspace operations in a queue. The number of concurrent builds will be limited to the number of provisioner daemons across all coderd replicas. 
-```text -2 coderd replicas * 30 provisioner daemons = 60 max concurrent workspace builds -``` - - **coderd**: Coder’s primary service. Learn more about [Coder’s architecture](../../about/architecture.md) - **coderd replicas**: Replicas (often via Kubernetes) for high availability, this is an [enterprise feature](../../enterprise.md) - **concurrent workspace builds**: Workspace operations (e.g. create/stop/delete/apply) across all users @@ -15,6 +11,10 @@ Coder runs workspace operations in a queue. The number of concurrent builds will - **provisioner daemons**: Coder runs one workspace build per provisioner daemon. One coderd replica can host many daemons - **scaletest**: Our scale-testing utility, built into the `coder` command line. +```text +2 coderd replicas * 30 provisioner daemons = 60 max concurrent workspace builds +``` + ## Infrastructure recommendations ### Concurrent workspace builds From 0106b3b584990a5dd593985bfe1cb862410f4635 Mon Sep 17 00:00:00 2001 From: Ben Date: Tue, 17 Jan 2023 21:36:16 +0000 Subject: [PATCH 17/21] format --- docs/admin/scale/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index 02948174c2f69..9a78305762352 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -8,7 +8,7 @@ Coder runs workspace operations in a queue. The number of concurrent builds will - **coderd replicas**: Replicas (often via Kubernetes) for high availability, this is an [enterprise feature](../../enterprise.md) - **concurrent workspace builds**: Workspace operations (e.g. create/stop/delete/apply) across all users - **concurrent connections**: Any connection to a workspace (e.g. SSH, web terminal, `coder_app`) -- **provisioner daemons**: Coder runs one workspace build per provisioner daemon. One coderd replica can host many daemons +- **provisioner daemons**: Coder runs one workspace build per provisioner daemon. One coderd replica can host many daemons - **scaletest**: Our scale-testing utility, built into the `coder` command line. ```text From e6077298bbf91c6e3c1a81740d58deaa576879ce Mon Sep 17 00:00:00 2001 From: Ben Date: Wed, 18 Jan 2023 23:30:50 +0000 Subject: [PATCH 18/21] explain concurrency --- docs/admin/scale/index.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md index 9a78305762352..fcef0e255a4c9 100644 --- a/docs/admin/scale/index.md +++ b/docs/admin/scale/index.md @@ -23,16 +23,16 @@ Workspace builds are CPU-intensive, as it relies on Terraform. Various [Terrafor To support 120 concurrent workspace builds, for example: -- Create a cluster/nodepool with four 8-core nodes (AWS: `t3.2xlarge` GCP: `e2-highcpu-8`) +- Create a cluster/nodepool with 4 nodes, 8-core each (AWS: `t3.2xlarge` GCP: `e2-highcpu-8`) - Run coderd with 4 replicas, 30 provisioner daemons each. 
(`CODER_PROVISIONER_DAEMONS=30`) - Ensure Coder's [PostgreSQL server](../../admin/configure.md#postgresql-database) can use up to 1.5 cores ## Recent scale tests -| Environment | Users | Concurrent builds | Concurrent connections (SSH) | Concurrent connections (web) | Last tested | -| ------------------ | ----- | ----------------- | ---------------------------- | ---------------------------- | ------------ | -| Kubernetes (GKE) | 1200 | 120 | 10,000 | 10,000 | Jan 10, 2022 | -| Docker (Single VM) | 500 | 50 | 10,000 | 10,000 | Dec 20, 2022 | +| Environment | Users | Concurrent builds | Concurrent connections (SSH) | Concurrent connections (web) | Coder Version | Last tested | +| ------------------ | ----- | ----------------- | ---------------------------- | ---------------------------- | ------------- | ------------ | +| Kubernetes (GKE) | 1200 | 120 | 10,000 | 10,000 | `v0.14.2` | Jan 10, 2022 | +| Docker (Single VM) | 500 | 50 | 10,000 | 10,000 | `v0.13.4` | Dec 20, 2022 | ## Scale testing utility @@ -56,13 +56,13 @@ coder scaletest create-workspaces \ The test does the following: -- create `n` workspaces -- establish SSH connection to each workspace -- run `sleep 3 && echo hello` on each workspace via the web terminal -- close connections, attempt to delete all workspaces -- return results (e.g. `99 succeeded, 1 failed to connect`) +1. create `1000` workspaces +1. establish SSH connection to each workspace +1. run `sleep 3 && echo hello` on each workspace via the web terminal +1. close connections, attempt to delete all workspaces +1. return results (e.g. `998 succeeded, 2 failed to connect`) -Workspace jobs run concurrently, meaning that the test will attempt to connect to each workspace as soon as it is provisioned instead of waiting for all 100 workspaces to create. +Concurrency is configurable. `concurrency 0` means the scaletest test will attempt to create & connect to all workspaces immediately. 
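To make the concurrency knob concrete, here is a sketch that builds on the example above but ramps more gently; the template name and the limits are illustrative.

```sh
# Build and connect to at most 10 workspaces at a time, and also
# clean up at most 10 at a time once the run finishes.
coder scaletest create-workspaces \
  --count 100 \
  --template "kubernetes" \
  --concurrency 10 \
  --cleanup-concurrency 10 \
  --run-command "sleep 2 && echo hello"
```

With `--concurrency 0`, as in the earlier example, the same run would instead start all 100 builds and connections at once.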
## Troubleshooting From 03653ad5e1e60aea972958ab4a9f14630499d63d Mon Sep 17 00:00:00 2001 From: Ben Date: Wed, 18 Jan 2023 23:33:58 +0000 Subject: [PATCH 19/21] move doc --- docs/admin/{scale/index.md => scale.md} | 0 docs/manifest.json | 2 +- 2 files changed, 1 insertion(+), 1 deletion(-) rename docs/admin/{scale/index.md => scale.md} (100%) diff --git a/docs/admin/scale/index.md b/docs/admin/scale.md similarity index 100% rename from docs/admin/scale/index.md rename to docs/admin/scale.md diff --git a/docs/manifest.json b/docs/manifest.json index 0c2b65efcc8d4..0a50bde3903d8 100644 --- a/docs/manifest.json +++ b/docs/manifest.json @@ -252,7 +252,7 @@ "title": "Scaling Coder", "description": "Reference architecture and load testing tools", "icon_path": "./images/icons/scale.svg", - "path": "./admin/scale/index.md" + "path": "./admin/scale.md" }, { "title": "Audit Logs", From b73dda086900352a730fa0e4d06a1c55d2b0cc1e Mon Sep 17 00:00:00 2001 From: Ben Date: Wed, 18 Jan 2023 23:36:16 +0000 Subject: [PATCH 20/21] consolidate table --- docs/admin/scale.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index fcef0e255a4c9..5cd7bdaef28d2 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -29,10 +29,10 @@ To support 120 concurrent workspace builds, for example: ## Recent scale tests -| Environment | Users | Concurrent builds | Concurrent connections (SSH) | Concurrent connections (web) | Coder Version | Last tested | -| ------------------ | ----- | ----------------- | ---------------------------- | ---------------------------- | ------------- | ------------ | -| Kubernetes (GKE) | 1200 | 120 | 10,000 | 10,000 | `v0.14.2` | Jan 10, 2022 | -| Docker (Single VM) | 500 | 50 | 10,000 | 10,000 | `v0.13.4` | Dec 20, 2022 | +| Environment | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested | +| ------------------ | ----- | ----------------- | ------------------------------------- | ------------- | ------------ | +| Kubernetes (GKE) | 1200 | 120 | 10,000 | `v0.14.2` | Jan 10, 2022 | +| Docker (Single VM) | 500 | 50 | 10,000 | `v0.13.4` | Dec 20, 2022 | ## Scale testing utility From 7d62ee6a11adbd77da7265c0691db9c7189fe640 Mon Sep 17 00:00:00 2001 From: Ben Date: Wed, 18 Jan 2023 23:54:08 +0000 Subject: [PATCH 21/21] fix broken links --- docs/admin/scale.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index 5cd7bdaef28d2..1813c24e430f8 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -4,8 +4,8 @@ We scale-test Coder with the [same utility](#scaletest-utility) that can be used Coder runs workspace operations in a queue. The number of concurrent builds will be limited to the number of provisioner daemons across all coderd replicas. -- **coderd**: Coder’s primary service. Learn more about [Coder’s architecture](../../about/architecture.md) -- **coderd replicas**: Replicas (often via Kubernetes) for high availability, this is an [enterprise feature](../../enterprise.md) +- **coderd**: Coder’s primary service. Learn more about [Coder’s architecture](../about/architecture.md) +- **coderd replicas**: Replicas (often via Kubernetes) for high availability, this is an [enterprise feature](../enterprise.md) - **concurrent workspace builds**: Workspace operations (e.g. create/stop/delete/apply) across all users - **concurrent connections**: Any connection to a workspace (e.g. 
SSH, web terminal, `coder_app`) - **provisioner daemons**: Coder runs one workspace build per provisioner daemon. One coderd replica can host many daemons @@ -25,7 +25,7 @@ To support 120 concurrent workspace builds, for example: - Create a cluster/nodepool with 4 nodes, 8-core each (AWS: `t3.2xlarge` GCP: `e2-highcpu-8`) - Run coderd with 4 replicas, 30 provisioner daemons each. (`CODER_PROVISIONER_DAEMONS=30`) -- Ensure Coder's [PostgreSQL server](../../admin/configure.md#postgresql-database) can use up to 1.5 cores +- Ensure Coder's [PostgreSQL server](./configure.md#postgresql-database) can use up to 1.5 cores ## Recent scale tests @@ -66,4 +66,4 @@ Concurrency is configurable. `concurrency 0` means the scaletest test will attem ## Troubleshooting -If a load test fails or if you are experiencing performance issues during day-to-day use, you can leverage Coder's [prometheus metrics](../prometheus.md) to identify bottlenecks during scale tests. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, etc. +If a load test fails or if you are experiencing performance issues during day-to-day use, you can leverage Coder's [prometheus metrics](./prometheus.md) to identify bottlenecks during scale tests. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, etc.
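As a starting point for the Prometheus suggestion above, the sketch below assumes the metrics endpoint is enabled on the Coder server and listening on its default address. The flag, address, and metric prefix are assumptions that can vary between releases; verify them with `coder server --help` and your deployment's Prometheus docs.

```sh
# Enable the Prometheus endpoint (disabled by default), then spot-check
# coderd metrics while a scale test is running. Address and metric prefix
# below are assumptions; verify against your Coder version.
export CODER_PROMETHEUS_ENABLE=true
export CODER_PROMETHEUS_ADDRESS="127.0.0.1:2112"
coder server

# From another shell on the same host:
curl -fsS http://127.0.0.1:2112/metrics | grep "^coderd_" | head -n 20
```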