From 301cf48b5fdb0ef4e359b47075cca0b854850f9a Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Mon, 18 Mar 2024 13:46:36 +0100 Subject: [PATCH 01/21] deduplicate knowledge scale.md --- docs/admin/scale.md | 102 ++------------------------------------------ 1 file changed, 3 insertions(+), 99 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index 58fcd93373dad..024983bb7a528 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -2,104 +2,8 @@ We scale-test Coder with [a built-in utility](#scale-testing-utility) that can be used in your environment for insights into how Coder scales with your infrastructure. -## General concepts - -Coder runs workspace operations in a queue. The number of concurrent builds will -be limited to the number of provisioner daemons across all coderd replicas. - -- **coderd**: Coder’s primary service. Learn more about - [Coder’s architecture](../about/architecture.md) -- **coderd replicas**: Replicas (often via Kubernetes) for high availability, - this is an [enterprise feature](../enterprise.md) -- **concurrent workspace builds**: Workspace operations (e.g. - create/stop/delete/apply) across all users -- **concurrent connections**: Any connection to a workspace (e.g. SSH, web - terminal, `coder_app`) -- **provisioner daemons**: Coder runs one workspace build per provisioner - daemon. One coderd replica can host many daemons -- **scaletest**: Our scale-testing utility, built into the `coder` command line. - -```text -2 coderd replicas * 30 provisioner daemons = 60 max concurrent workspace builds -``` - -## Infrastructure recommendations - -> Note: The below are guidelines for planning your infrastructure. Your mileage -> may vary depending on your templates, workflows, and users. - -When planning your infrastructure, we recommend you consider the following: - -1. CPU and memory requirements for `coderd`. We recommend allocating 1 CPU core - and 2 GB RAM per `coderd` replica at minimum. See - [Concurrent users](#concurrent-users) for more details. -1. CPU and memory requirements for - [external provisioners](../admin/provisioners.md#running-external-provisioners), - if required. We recommend allocating 1 CPU core and 1 GB RAM per 5 concurrent - workspace builds to external provisioners. Note that this may vary depending - on the template used. See - [Concurrent workspace builds](#concurrent-workspace-builds) for more details. - By default, `coderd` runs 3 integrated provisioners. -1. CPU and memory requirements for the database used by `coderd`. We recommend - allocating an additional 1 CPU core to the database used by Coder for every - 1000 active users. -1. CPU and memory requirements for workspaces created by Coder. This will vary - depending on users' needs. However, the Coder agent itself requires at - minimum 0.1 CPU cores and 256 MB to run inside a workspace. - -### Concurrent users - -We recommend allocating 2 CPU cores and 4 GB RAM per `coderd` replica per 1000 -active users. We also recommend allocating an additional 1 CPU core to the -database used by Coder for every 1000 active users. Inactive users do not -consume Coder resources, although workspaces configured to auto-start will -consume resources when they are built. - -Users' primary mode of accessing Coder will also affect resource requirements. -If users will be accessing workspaces primarily via Coder's HTTP interface, we -recommend doubling the number of cores and RAM allocated per user. 
For example, -if you expect 1000 users accessing workspaces via the web, we recommend -allocating 4 CPU cores and 8 GB RAM. - -Users accessing workspaces via SSH will consume fewer resources, as SSH -connections are not proxied through Coder. - -### Concurrent workspace builds - -Workspace builds are CPU-intensive, as it relies on Terraform. Various -[Terraform providers](https://registry.terraform.io/browse/providers) have -different resource requirements. When tested with our -[kubernetes](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) -template, `coderd` will consume roughly 0.25 cores per concurrent workspace -build. For effective provisioning, our helm chart prefers to schedule -[one coderd replica per-node](https://github.com/coder/coder/blob/main/helm/coder/values.yaml#L188-L202). - -We recommend: - -- Running `coderd` on a dedicated set of nodes. This will prevent other - workloads from interfering with workspace builds. You can use - [node selectors](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector), - or - [taints and tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) - to achieve this. -- Disabling autoscaling for `coderd` nodes. Autoscaling can cause interruptions - for users, see [Autoscaling](#autoscaling) for more details. -- (Enterprise-only) Running external provisioners instead of Coder's built-in - provisioners (`CODER_PROVISIONER_DAEMONS=0`) will separate the load caused by - workspace provisioning on the `coderd` nodes. For more details, see - [External provisioners](../admin/provisioners.md#running-external-provisioners). -- Alternatively, if increasing the number of integrated provisioner daemons in - `coderd` (`CODER_PROVISIONER_DAEMONS>3`), allocate additional resources to - `coderd` to compensate (approx. 0.25 cores and 256 MB per provisioner daemon). - -For example, to support 120 concurrent workspace builds: - -- Create a cluster/nodepool with 4 nodes, 8-core each (AWS: `t3.2xlarge` GCP: - `e2-highcpu-8`) -- Run coderd with 4 replicas, 30 provisioner daemons each. - (`CODER_PROVISIONER_DAEMONS=30`) -- Ensure Coder's [PostgreSQL server](./configure.md#postgresql-database) can use - up to 2 cores and 4 GB RAM +Learn more about [Coder’s architecture](../about/architecture.md) and our +[scale-testing methodology](architectures/index.md#scale-testing-methodology). ## Recent scale tests @@ -228,6 +132,6 @@ an annotation on the coderd deployment. ## Troubleshooting If a load test fails or if you are experiencing performance issues during -day-to-day use, you can leverage Coder's [prometheus metrics](./prometheus.md) +day-to-day use, you can leverage Coder's [Prometheus metrics](./prometheus.md) to identify bottlenecks during scale tests. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, etc. 
From fa60b7c6a46a12b80739e6c5668fe0ea99872361 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Tue, 19 Mar 2024 12:29:26 +0100 Subject: [PATCH 02/21] Upload coder templates --- examples/scaletests/kubernetes-large/main.tf | 82 ++ .../kubernetes-medium-greedy/main.tf | 196 ++++ examples/scaletests/kubernetes-medium/main.tf | 82 ++ .../scaletests/kubernetes-minimal/main.tf | 164 +++ examples/scaletests/kubernetes-small/main.tf | 82 ++ .../kubernetes-with-podmonitor/README.md | 98 ++ .../kubernetes-with-podmonitor/main.tf | 362 +++++++ .../scaletests/scaletest-runner/Dockerfile | 36 + .../scaletests/scaletest-runner/README.md | 9 + examples/scaletests/scaletest-runner/main.tf | 961 ++++++++++++++++++ .../scaletest-runner/metadata_phase.sh | 6 + .../metadata_previous_phase.sh | 6 + .../scaletest-runner/metadata_status.sh | 6 + .../scaletest-runner/scripts/cleanup.sh | 62 ++ .../scaletest-runner/scripts/lib.sh | 313 ++++++ .../scaletest-runner/scripts/prepare.sh | 67 ++ .../scaletest-runner/scripts/report.sh | 109 ++ .../scaletest-runner/scripts/run.sh | 369 +++++++ .../scaletests/scaletest-runner/shutdown.sh | 30 + .../scaletests/scaletest-runner/startup.sh | 181 ++++ 20 files changed, 3221 insertions(+) create mode 100644 examples/scaletests/kubernetes-large/main.tf create mode 100644 examples/scaletests/kubernetes-medium-greedy/main.tf create mode 100644 examples/scaletests/kubernetes-medium/main.tf create mode 100644 examples/scaletests/kubernetes-minimal/main.tf create mode 100644 examples/scaletests/kubernetes-small/main.tf create mode 100644 examples/scaletests/kubernetes-with-podmonitor/README.md create mode 100644 examples/scaletests/kubernetes-with-podmonitor/main.tf create mode 100644 examples/scaletests/scaletest-runner/Dockerfile create mode 100644 examples/scaletests/scaletest-runner/README.md create mode 100644 examples/scaletests/scaletest-runner/main.tf create mode 100755 examples/scaletests/scaletest-runner/metadata_phase.sh create mode 100755 examples/scaletests/scaletest-runner/metadata_previous_phase.sh create mode 100755 examples/scaletests/scaletest-runner/metadata_status.sh create mode 100755 examples/scaletests/scaletest-runner/scripts/cleanup.sh create mode 100644 examples/scaletests/scaletest-runner/scripts/lib.sh create mode 100755 examples/scaletests/scaletest-runner/scripts/prepare.sh create mode 100755 examples/scaletests/scaletest-runner/scripts/report.sh create mode 100755 examples/scaletests/scaletest-runner/scripts/run.sh create mode 100755 examples/scaletests/scaletest-runner/shutdown.sh create mode 100755 examples/scaletests/scaletest-runner/startup.sh diff --git a/examples/scaletests/kubernetes-large/main.tf b/examples/scaletests/kubernetes-large/main.tf new file mode 100644 index 0000000000000..98d5c552f9eaf --- /dev/null +++ b/examples/scaletests/kubernetes-large/main.tf @@ -0,0 +1,82 @@ + terraform { + required_providers { + coder = { + source = "coder/coder" + version = "~> 0.7.0" + } + kubernetes = { + source = "hashicorp/kubernetes" + version = "~> 2.18" + } + } + } + + provider "coder" {} + + provider "kubernetes" { + config_path = null # always use host + } + + data "coder_workspace" "me" {} + + resource "coder_agent" "main" { + os = "linux" + arch = "amd64" + startup_script_timeout = 180 + startup_script = "" + } + + resource "kubernetes_pod" "main" { + count = data.coder_workspace.me.start_count + metadata { + name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + namespace = "coder-big" + labels = { + 
"app.kubernetes.io/name" = "coder-workspace" + "app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + } + } + spec { + security_context { + run_as_user = "1000" + fs_group = "1000" + } + container { + name = "dev" + image = "docker.io/codercom/enterprise-minimal:ubuntu" + image_pull_policy = "Always" + command = ["sh", "-c", coder_agent.main.init_script] + security_context { + run_as_user = "1000" + } + env { + name = "CODER_AGENT_TOKEN" + value = coder_agent.main.token + } + resources { + requests = { + "cpu" = "4" + "memory" = "4Gi" + } + limits = { + "cpu" = "4" + "memory" = "4Gi" + } + } + } + + affinity { + node_affinity { + required_during_scheduling_ignored_during_execution { + node_selector_term { + match_expressions { + key = "cloud.google.com/gke-nodepool" + operator = "In" + values = ["big-workspaces"] + } + } + } + } + } + } + } diff --git a/examples/scaletests/kubernetes-medium-greedy/main.tf b/examples/scaletests/kubernetes-medium-greedy/main.tf new file mode 100644 index 0000000000000..45f5b970d73c7 --- /dev/null +++ b/examples/scaletests/kubernetes-medium-greedy/main.tf @@ -0,0 +1,196 @@ +terraform { + required_providers { + coder = { + source = "coder/coder" + version = "~> 0.7.0" + } + kubernetes = { + source = "hashicorp/kubernetes" + version = "~> 2.18" + } + } +} + +provider "coder" {} + +provider "kubernetes" { + config_path = null # always use host +} + +data "coder_workspace" "me" {} + +resource "coder_agent" "main" { + os = "linux" + arch = "amd64" + startup_script_timeout = 180 + startup_script = "" + + # Greedy metadata (3072 bytes base64 encoded is 4097 bytes). + metadata { + display_name = "Meta 01" + key = "01_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 02" + key = "0_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 03" + key = "03_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 04" + key = "04_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 05" + key = "05_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 06" + key = "06_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 07" + key = "07_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 08" + key = "08_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 09" + key = "09_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 10" + key = "10_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 11" + key = "11_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 12" + key = "12_meta" + script = "dd if=/dev/urandom 
bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 13" + key = "13_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 14" + key = "14_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 15" + key = "15_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } + metadata { + display_name = "Meta 16" + key = "16_meta" + script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" + interval = 1 + timeout = 10 + } +} + +resource "kubernetes_pod" "main" { + count = data.coder_workspace.me.start_count + metadata { + name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + namespace = "coder-big" + labels = { + "app.kubernetes.io/name" = "coder-workspace" + "app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + } + } + spec { + security_context { + run_as_user = "1000" + fs_group = "1000" + } + container { + name = "dev" + image = "docker.io/codercom/enterprise-minimal:ubuntu" + image_pull_policy = "Always" + command = ["sh", "-c", coder_agent.main.init_script] + security_context { + run_as_user = "1000" + } + env { + name = "CODER_AGENT_TOKEN" + value = coder_agent.main.token + } + resources { + requests = { + "cpu" = "2" + "memory" = "2Gi" + } + limits = { + "cpu" = "2" + "memory" = "2Gi" + } + } + } + + affinity { + node_affinity { + required_during_scheduling_ignored_during_execution { + node_selector_term { + match_expressions { + key = "cloud.google.com/gke-nodepool" + operator = "In" + values = ["big-workspaces"] + } + } + } + } + } + } +} diff --git a/examples/scaletests/kubernetes-medium/main.tf b/examples/scaletests/kubernetes-medium/main.tf new file mode 100644 index 0000000000000..b8ce10b4bdb8a --- /dev/null +++ b/examples/scaletests/kubernetes-medium/main.tf @@ -0,0 +1,82 @@ + terraform { + required_providers { + coder = { + source = "coder/coder" + version = "~> 0.7.0" + } + kubernetes = { + source = "hashicorp/kubernetes" + version = "~> 2.18" + } + } + } + + provider "coder" {} + + provider "kubernetes" { + config_path = null # always use host + } + + data "coder_workspace" "me" {} + + resource "coder_agent" "main" { + os = "linux" + arch = "amd64" + startup_script_timeout = 180 + startup_script = "" + } + + resource "kubernetes_pod" "main" { + count = data.coder_workspace.me.start_count + metadata { + name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + namespace = "coder-big" + labels = { + "app.kubernetes.io/name" = "coder-workspace" + "app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + } + } + spec { + security_context { + run_as_user = "1000" + fs_group = "1000" + } + container { + name = "dev" + image = "docker.io/codercom/enterprise-minimal:ubuntu" + image_pull_policy = "Always" + command = ["sh", "-c", coder_agent.main.init_script] + security_context { + run_as_user = "1000" + } + env { + name = "CODER_AGENT_TOKEN" + value = coder_agent.main.token + } + resources { + requests = { + "cpu" = "2" + "memory" = "2Gi" + } + limits = { + "cpu" = "2" + "memory" = "2Gi" + } + } + } + + affinity { + node_affinity { + 
required_during_scheduling_ignored_during_execution { + node_selector_term { + match_expressions { + key = "cloud.google.com/gke-nodepool" + operator = "In" + values = ["big-workspaces"] + } + } + } + } + } + } + } diff --git a/examples/scaletests/kubernetes-minimal/main.tf b/examples/scaletests/kubernetes-minimal/main.tf new file mode 100644 index 0000000000000..6d04fb68a33ed --- /dev/null +++ b/examples/scaletests/kubernetes-minimal/main.tf @@ -0,0 +1,164 @@ +terraform { + required_providers { + coder = { + source = "coder/coder" + version = "~> 0.12.0" + } + kubernetes = { + source = "hashicorp/kubernetes" + version = "~> 2.18" + } + } +} + +provider "coder" {} + +provider "kubernetes" { + config_path = null # always use host +} + +data "coder_workspace" "me" {} + +resource "coder_agent" "m" { + os = "linux" + arch = "amd64" + startup_script_timeout = 180 + startup_script = "" + metadata { + display_name = "CPU Usage" + key = "0_cpu_usage" + script = "coder stat cpu" + interval = 10 + timeout = 1 + } + + metadata { + display_name = "RAM Usage" + key = "1_ram_usage" + script = "coder stat mem" + interval = 10 + timeout = 1 + } +} + +resource "coder_script" "websocat" { + agent_id = coder_agent.m.id + display_name = "websocat" + script = </tmp/code-server.log 2>&1 & + EOT + + # The following metadata blocks are optional. They are used to display + # information about your workspace in the dashboard. You can remove them + # if you don't want to display any information. + # For basic resources, you can use the `coder stat` command. + # If you need more control, you can write your own script. + metadata { + display_name = "CPU Usage" + key = "0_cpu_usage" + script = "coder stat cpu" + interval = 10 + timeout = 1 + } + + metadata { + display_name = "RAM Usage" + key = "1_ram_usage" + script = "coder stat mem" + interval = 10 + timeout = 1 + } + + metadata { + display_name = "Home Disk" + key = "3_home_disk" + script = "coder stat disk --path $${HOME}" + interval = 60 + timeout = 1 + } + + metadata { + display_name = "CPU Usage (Host)" + key = "4_cpu_usage_host" + script = "coder stat cpu --host" + interval = 10 + timeout = 1 + } + + metadata { + display_name = "Memory Usage (Host)" + key = "5_mem_usage_host" + script = "coder stat mem --host" + interval = 10 + timeout = 1 + } + + metadata { + display_name = "Load Average (Host)" + key = "6_load_host" + # get load avg scaled by number of cores + script = </dev/null || return + +get_previous_phase diff --git a/examples/scaletests/scaletest-runner/metadata_status.sh b/examples/scaletests/scaletest-runner/metadata_status.sh new file mode 100755 index 0000000000000..8ec45f0875c1d --- /dev/null +++ b/examples/scaletests/scaletest-runner/metadata_status.sh @@ -0,0 +1,6 @@ +#!/bin/bash + +# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh +. "${SCRIPTS_DIR}/lib.sh" 2>/dev/null || return + +get_status diff --git a/examples/scaletests/scaletest-runner/scripts/cleanup.sh b/examples/scaletests/scaletest-runner/scripts/cleanup.sh new file mode 100755 index 0000000000000..c80982497b5e9 --- /dev/null +++ b/examples/scaletests/scaletest-runner/scripts/cleanup.sh @@ -0,0 +1,62 @@ +#!/bin/bash +set -euo pipefail + +[[ $VERBOSE == 1 ]] && set -x + +# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh +. 
"${SCRIPTS_DIR}/lib.sh" + +event=${1:-} + +if [[ -z $event ]]; then + event=manual +fi + +do_cleanup() { + start_phase "Cleanup (${event})" + coder exp scaletest cleanup \ + --cleanup-job-timeout 2h \ + --cleanup-timeout 5h | + tee "${SCALETEST_RESULTS_DIR}/cleanup-${event}.txt" + end_phase +} + +do_scaledown() { + start_phase "Scale down provisioners (${event})" + maybedryrun "$DRY_RUN" kubectl scale deployment/coder-provisioner --replicas 1 + maybedryrun "$DRY_RUN" kubectl rollout status deployment/coder-provisioner + end_phase +} + +case "${event}" in +manual) + echo -n 'WARNING: This will clean up all scaletest resources, continue? (y/n) ' + read -r -n 1 + if [[ $REPLY != [yY] ]]; then + echo $'\nAborting...' + exit 1 + fi + echo + + do_cleanup + do_scaledown + + echo 'Press any key to continue...' + read -s -r -n 1 + ;; +prepare) + do_cleanup + ;; +on_stop) ;; # Do nothing, handled by "shutdown". +always | on_success | on_error | shutdown) + do_cleanup + do_scaledown + ;; +shutdown_scale_down_only) + do_scaledown + ;; +*) + echo "Unknown event: ${event}" >&2 + exit 1 + ;; +esac diff --git a/examples/scaletests/scaletest-runner/scripts/lib.sh b/examples/scaletests/scaletest-runner/scripts/lib.sh new file mode 100644 index 0000000000000..868dd5c078d2e --- /dev/null +++ b/examples/scaletests/scaletest-runner/scripts/lib.sh @@ -0,0 +1,313 @@ +#!/bin/bash +set -euo pipefail + +# Only source this script once, this env comes from sourcing +# scripts/lib.sh from coder/coder below. +if [[ ${SCRIPTS_LIB_IS_SOURCED:-0} == 1 ]]; then + return 0 +fi + +# Source scripts/lib.sh from coder/coder for common functions. +# shellcheck source=scripts/lib.sh +. "${HOME}/coder/scripts/lib.sh" + +# Make shellcheck happy. +DRY_RUN=${DRY_RUN:-0} + +# Environment variables shared between scripts. +SCALETEST_STATE_DIR="${SCALETEST_RUN_DIR}/state" +SCALETEST_PHASE_FILE="${SCALETEST_STATE_DIR}/phase" +# shellcheck disable=SC2034 +SCALETEST_RESULTS_DIR="${SCALETEST_RUN_DIR}/results" +SCALETEST_LOGS_DIR="${SCALETEST_RUN_DIR}/logs" +SCALETEST_PPROF_DIR="${SCALETEST_RUN_DIR}/pprof" +# https://github.com/kubernetes/kubernetes/issues/72501 :-( +SCALETEST_CODER_BINARY="/tmp/coder-full-${SCALETEST_RUN_ID}" + +mkdir -p "${SCALETEST_STATE_DIR}" "${SCALETEST_RESULTS_DIR}" "${SCALETEST_LOGS_DIR}" "${SCALETEST_PPROF_DIR}" + +coder() { + if [[ ! -x "${SCALETEST_CODER_BINARY}" ]]; then + log "Fetching full coder binary..." + fetch_coder_full + fi + maybedryrun "${DRY_RUN}" "${SCALETEST_CODER_BINARY}" "${@}" +} + +show_json() { + maybedryrun "${DRY_RUN}" jq 'del(.. | .logs?)' "${1}" +} + +set_status() { + dry_run= + if [[ ${DRY_RUN} == 1 ]]; then + dry_run=" (dry-run)" + fi + prev_status=$(get_status) + if [[ ${prev_status} != *"Not started"* ]]; then + annotate_grafana_end "status" "Status: ${prev_status}" + fi + echo "$(date -Ins) ${*}${dry_run}" >>"${SCALETEST_STATE_DIR}/status" + + annotate_grafana "status" "Status: ${*}" + + status_lower=$(tr '[:upper:]' '[:lower:]' <<<"${*}") + set_pod_status_annotation "${status_lower}" +} +lock_status() { + chmod 0440 "${SCALETEST_STATE_DIR}/status" +} +get_status() { + # Order of importance (reverse of creation). + if [[ -f "${SCALETEST_STATE_DIR}/status" ]]; then + tail -n1 "${SCALETEST_STATE_DIR}/status" | cut -d' ' -f2- + else + echo "Not started" + fi +} + +phase_num=0 +start_phase() { + # This may be incremented from another script, so we read it every time. 
+ if [[ -f "${SCALETEST_PHASE_FILE}" ]]; then + phase_num=$(grep -c START: "${SCALETEST_PHASE_FILE}") + fi + phase_num=$((phase_num + 1)) + log "Start phase ${phase_num}: ${*}" + echo "$(date -Ins) START:${phase_num}: ${*}" >>"${SCALETEST_PHASE_FILE}" + + GRAFANA_EXTRA_TAGS="${PHASE_TYPE:-phase-default}" annotate_grafana "phase" "Phase ${phase_num}: ${*}" +} +end_phase() { + phase=$(tail -n 1 "${SCALETEST_PHASE_FILE}" | grep "START:${phase_num}:" | cut -d' ' -f3-) + if [[ -z ${phase} ]]; then + log "BUG: Could not find start phase ${phase_num} in ${SCALETEST_PHASE_FILE}" + return 1 + fi + log "End phase ${phase_num}: ${phase}" + echo "$(date -Ins) END:${phase_num}: ${phase}" >>"${SCALETEST_PHASE_FILE}" + + GRAFANA_EXTRA_TAGS="${PHASE_TYPE:-phase-default}" GRAFANA_ADD_TAGS="${PHASE_ADD_TAGS:-}" annotate_grafana_end "phase" "Phase ${phase_num}: ${phase}" +} +get_phase() { + if [[ -f "${SCALETEST_PHASE_FILE}" ]]; then + phase_raw=$(tail -n1 "${SCALETEST_PHASE_FILE}") + phase=$(echo "${phase_raw}" | cut -d' ' -f3-) + if [[ ${phase_raw} == *"END:"* ]]; then + phase+=" [done]" + fi + echo "${phase}" + else + echo "None" + fi +} +get_previous_phase() { + if [[ -f "${SCALETEST_PHASE_FILE}" ]] && [[ $(grep -c START: "${SCALETEST_PHASE_FILE}") -gt 1 ]]; then + grep START: "${SCALETEST_PHASE_FILE}" | tail -n2 | head -n1 | cut -d' ' -f3- + else + echo "None" + fi +} + +annotate_grafana() { + local tags=${1} text=${2} start=${3:-$(($(date +%s) * 1000))} + local json resp id + + if [[ -z $tags ]]; then + tags="scaletest,runner" + else + tags="scaletest,runner,${tags}" + fi + if [[ -n ${GRAFANA_EXTRA_TAGS:-} ]]; then + tags="${tags},${GRAFANA_EXTRA_TAGS}" + fi + + log "Annotating Grafana (start=${start}): ${text} [${tags}]" + + json="$( + jq \ + --argjson time "${start}" \ + --arg text "${text}" \ + --arg tags "${tags}" \ + '{time: $time, tags: $tags | split(","), text: $text}' <<<'{}' + )" + if [[ ${DRY_RUN} == 1 ]]; then + echo "FAKEID:${tags}:${text}:${start}" >>"${SCALETEST_STATE_DIR}/grafana-annotations" + log "Would have annotated Grafana, data=${json}" + return 0 + fi + if ! resp="$( + curl -sSL \ + --insecure \ + -H "Authorization: Bearer ${GRAFANA_API_TOKEN}" \ + -H "Content-Type: application/json" \ + -d "${json}" \ + "${GRAFANA_URL}/api/annotations" + )"; then + # Don't abort scaletest just because we couldn't annotate Grafana. + log "Failed to annotate Grafana: ${resp}" + return 0 + fi + + if [[ $(jq -r '.message' <<<"${resp}") != "Annotation added" ]]; then + log "Failed to annotate Grafana: ${resp}" + return 0 + fi + + log "Grafana annotation added!" + + id="$(jq -r '.id' <<<"${resp}")" + echo "${id}:${tags}:${text}:${start}" >>"${SCALETEST_STATE_DIR}/grafana-annotations" +} +annotate_grafana_end() { + local tags=${1} text=${2} start=${3:-} end=${4:-$(($(date +%s) * 1000))} + local id json resp + + if [[ -z $tags ]]; then + tags="scaletest,runner" + else + tags="scaletest,runner,${tags}" + fi + if [[ -n ${GRAFANA_EXTRA_TAGS:-} ]]; then + tags="${tags},${GRAFANA_EXTRA_TAGS}" + fi + + if ! id=$(grep ":${tags}:${text}:${start}" "${SCALETEST_STATE_DIR}/grafana-annotations" | sort -n | tail -n1 | cut -d: -f1); then + log "NOTICE: Could not find Grafana annotation to end: '${tags}:${text}:${start}', skipping..." 
+ return 0 + fi + + log "Updating Grafana annotation (end=${end}): ${text} [${tags}, add=${GRAFANA_ADD_TAGS:-}]" + + if [[ -n ${GRAFANA_ADD_TAGS:-} ]]; then + json="$( + jq -n \ + --argjson timeEnd "${end}" \ + --arg tags "${tags},${GRAFANA_ADD_TAGS}" \ + '{timeEnd: $timeEnd, tags: $tags | split(",")}' + )" + else + json="$( + jq -n \ + --argjson timeEnd "${end}" \ + '{timeEnd: $timeEnd}' + )" + fi + if [[ ${DRY_RUN} == 1 ]]; then + log "Would have patched Grafana annotation: id=${id}, data=${json}" + return 0 + fi + if ! resp="$( + curl -sSL \ + --insecure \ + -H "Authorization: Bearer ${GRAFANA_API_TOKEN}" \ + -H "Content-Type: application/json" \ + -X PATCH \ + -d "${json}" \ + "${GRAFANA_URL}/api/annotations/${id}" + )"; then + # Don't abort scaletest just because we couldn't annotate Grafana. + log "Failed to annotate Grafana end: ${resp}" + return 0 + fi + + if [[ $(jq -r '.message' <<<"${resp}") != "Annotation patched" ]]; then + log "Failed to annotate Grafana end: ${resp}" + return 0 + fi + + log "Grafana annotation patched!" +} + +wait_baseline() { + s=${1:-2} + PHASE_TYPE="phase-wait" start_phase "Waiting ${s}m to establish baseline" + maybedryrun "$DRY_RUN" sleep $((s * 60)) + PHASE_TYPE="phase-wait" end_phase +} + +get_appearance() { + session_token=$CODER_USER_TOKEN + if [[ -f "${CODER_CONFIG_DIR}/session" ]]; then + session_token="$(<"${CODER_CONFIG_DIR}/session")" + fi + curl -sSL \ + -H "Coder-Session-Token: ${session_token}" \ + "${CODER_URL}/api/v2/appearance" +} +set_appearance() { + local json=$1 color=$2 message=$3 + + session_token=$CODER_USER_TOKEN + if [[ -f "${CODER_CONFIG_DIR}/session" ]]; then + session_token="$(<"${CODER_CONFIG_DIR}/session")" + fi + newjson="$( + jq \ + --arg color "${color}" \ + --arg message "${message}" \ + '. | .service_banner.message |= $message | .service_banner.background_color |= $color' <<<"${json}" + )" + maybedryrun "${DRY_RUN}" curl -sSL \ + -X PUT \ + -H 'Content-Type: application/json' \ + -H "Coder-Session-Token: ${session_token}" \ + --data "${newjson}" \ + "${CODER_URL}/api/v2/appearance" +} + +namespace() { + cat /var/run/secrets/kubernetes.io/serviceaccount/namespace +} +coder_pods() { + kubectl get pods \ + --namespace "$(namespace)" \ + --selector "app.kubernetes.io/name=coder,app.kubernetes.io/part-of=coder" \ + --output jsonpath='{.items[*].metadata.name}' +} + +# fetch_coder_full fetches the full (non-slim) coder binary from one of the coder pods +# running in the same namespace as the current pod. +fetch_coder_full() { + if [[ -x "${SCALETEST_CODER_BINARY}" ]]; then + log "Full Coder binary already exists at ${SCALETEST_CODER_BINARY}" + return 0 + fi + ns=$(namespace) + if [[ -z "${ns}" ]]; then + log "Could not determine namespace!" + return 1 + fi + log "Namespace from serviceaccount token is ${ns}" + pods=$(coder_pods) + if [[ -z ${pods} ]]; then + log "Could not find coder pods!" + return 1 + fi + pod=$(cut -d ' ' -f 1 <<<"${pods}") + if [[ -z ${pod} ]]; then + log "Could not find coder pod!" 
+ return 1 + fi + log "Fetching full Coder binary from ${pod}" + # We need --retries due to https://github.com/kubernetes/kubernetes/issues/60140 :( + maybedryrun "${DRY_RUN}" kubectl \ + --namespace "${ns}" \ + cp \ + --container coder \ + --retries 10 \ + "${pod}:/opt/coder" "${SCALETEST_CODER_BINARY}" + maybedryrun "${DRY_RUN}" chmod +x "${SCALETEST_CODER_BINARY}" + log "Full Coder binary downloaded to ${SCALETEST_CODER_BINARY}" +} + +# set_pod_status_annotation annotates the currently running pod with the key +# com.coder.scaletest.status. It will overwrite the previous status. +set_pod_status_annotation() { + if [[ $# -ne 1 ]]; then + log "BUG: Must specify an annotation value" + return 1 + else + maybedryrun "${DRY_RUN}" kubectl --namespace "$(namespace)" annotate pod "$(hostname)" "com.coder.scaletest.status=$1" --overwrite + fi +} diff --git a/examples/scaletests/scaletest-runner/scripts/prepare.sh b/examples/scaletests/scaletest-runner/scripts/prepare.sh new file mode 100755 index 0000000000000..90b2dd05f945f --- /dev/null +++ b/examples/scaletests/scaletest-runner/scripts/prepare.sh @@ -0,0 +1,67 @@ +#!/bin/bash +set -euo pipefail + +[[ $VERBOSE == 1 ]] && set -x + +# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh +. "${SCRIPTS_DIR}/lib.sh" + +mkdir -p "${SCALETEST_STATE_DIR}" +mkdir -p "${SCALETEST_RESULTS_DIR}" + +log "Preparing scaletest workspace environment..." +set_status Preparing + +log "Compressing previous run logs (if applicable)..." +mkdir -p "${HOME}/archive" +for dir in "${HOME}/scaletest-"*; do + if [[ ${dir} = "${SCALETEST_RUN_DIR}" ]]; then + continue + fi + if [[ -d ${dir} ]]; then + name="$(basename "${dir}")" + ( + cd "$(dirname "${dir}")" + ZSTD_CLEVEL=12 maybedryrun "$DRY_RUN" tar --zstd -cf "${HOME}/archive/${name}.tar.zst" "${name}" + ) + maybedryrun "$DRY_RUN" rm -rf "${dir}" + fi +done + +log "Creating coder CLI token (needed for cleanup during shutdown)..." + +mkdir -p "${CODER_CONFIG_DIR}" +echo -n "${CODER_URL}" >"${CODER_CONFIG_DIR}/url" + +set +x # Avoid logging the token. +# Persist configuration for shutdown script too since the +# owner token is invalidated immediately on workspace stop. +export CODER_SESSION_TOKEN=${CODER_USER_TOKEN} +coder tokens delete scaletest_runner >/dev/null 2>&1 || true +# TODO(mafredri): Set TTL? This could interfere with delayed stop though. +token=$(coder tokens create --name scaletest_runner) +if [[ $DRY_RUN == 1 ]]; then + token=${CODER_SESSION_TOKEN} +fi +unset CODER_SESSION_TOKEN +echo -n "${token}" >"${CODER_CONFIG_DIR}/session" +[[ $VERBOSE == 1 ]] && set -x # Restore logging (if enabled). + +if [[ ${SCALETEST_PARAM_CLEANUP_PREPARE} == 1 ]]; then + log "Cleaning up from previous runs (if applicable)..." + "${SCRIPTS_DIR}/cleanup.sh" prepare +fi + +log "Preparation complete!" + +PROVISIONER_REPLICA_COUNT="${SCALETEST_PARAM_CREATE_CONCURRENCY:-0}" +if [[ "${PROVISIONER_REPLICA_COUNT}" -eq 0 ]]; then + # TODO(Cian): what is a good default value here? + echo "Setting PROVISIONER_REPLICA_COUNT to 10 since SCALETEST_PARAM_CREATE_CONCURRENCY is 0" + PROVISIONER_REPLICA_COUNT=10 +fi +log "Scaling up provisioners to ${PROVISIONER_REPLICA_COUNT}..." +maybedryrun "$DRY_RUN" kubectl scale deployment/coder-provisioner \ + --replicas "${PROVISIONER_REPLICA_COUNT}" +log "Waiting for provisioners to scale up..." 
+maybedryrun "$DRY_RUN" kubectl rollout status deployment/coder-provisioner diff --git a/examples/scaletests/scaletest-runner/scripts/report.sh b/examples/scaletests/scaletest-runner/scripts/report.sh new file mode 100755 index 0000000000000..0c6a5059ba37d --- /dev/null +++ b/examples/scaletests/scaletest-runner/scripts/report.sh @@ -0,0 +1,109 @@ +#!/bin/bash +set -euo pipefail + +[[ $VERBOSE == 1 ]] && set -x + +status=$1 +shift + +case "${status}" in +started) ;; +completed) ;; +failed) ;; +*) + echo "Unknown status: ${status}" >&2 + exit 1 + ;; +esac + +# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh +. "${SCRIPTS_DIR}/lib.sh" + +# NOTE(mafredri): API returns HTML if we accidentally use `...//api` vs `.../api`. +# https://github.com/coder/coder/issues/9877 +CODER_URL="${CODER_URL%/}" +buildinfo="$(curl -sSL "${CODER_URL}/api/v2/buildinfo")" +server_version="$(jq -r '.version' <<<"${buildinfo}")" +server_version_commit="$(jq -r '.external_url' <<<"${buildinfo}")" + +# Since `coder show` doesn't support JSON output, we list the workspaces instead. +# Use `command` here to bypass dry run. +workspace_json="$( + command coder list --all --output json | + jq --arg workspace "${CODER_WORKSPACE}" --arg user "${CODER_USER}" 'map(select(.name == $workspace) | select(.owner_name == $user)) | .[0]' +)" +owner_name="$(jq -r '.latest_build.workspace_owner_name' <<<"${workspace_json}")" +workspace_name="$(jq -r '.latest_build.workspace_name' <<<"${workspace_json}")" +initiator_name="$(jq -r '.latest_build.initiator_name' <<<"${workspace_json}")" + +bullet='•' +app_urls_raw="$(jq -r '.latest_build.resources[].agents[]?.apps | map(select(.external == true)) | .[] | .display_name, .url' <<<"${workspace_json}")" +app_urls=() +while read -r app_name; do + read -r app_url + bold= + if [[ ${status} != started ]] && [[ ${app_url} = *to=now* ]]; then + # Update Grafana URL with end stamp and make bold. + app_url="${app_url//to=now/to=$(($(date +%s) * 1000))}" + bold='*' + fi + app_urls+=("${bullet} ${bold}${app_name}${bold}: ${app_url}") +done <<<"${app_urls_raw}" + +params=() +header= + +case "${status}" in +started) + created_at="$(jq -r '.latest_build.created_at' <<<"${workspace_json}")" + params=("${bullet} Options:") + while read -r param; do + params+=(" ${bullet} ${param}") + done <<<"$(jq -r '.latest_build.resources[].agents[]?.environment_variables | to_entries | map(select(.key | startswith("SCALETEST_PARAM_"))) | .[] | "`\(.key)`: `\(.value)`"' <<<"${workspace_json}")" + + header="New scaletest started at \`${created_at}\` by \`${initiator_name}\` on ${CODER_URL} (<${server_version_commit}|\`${server_version}\`>)." + ;; +completed) + completed_at=$(date -Iseconds) + header="Scaletest completed at \`${completed_at}\` (started by \`${initiator_name}\`) on ${CODER_URL} (<${server_version_commit}|\`${server_version}\`>)." + ;; +failed) + failed_at=$(date -Iseconds) + header="Scaletest failed at \`${failed_at}\` (started by \`${initiator_name}\`) on ${CODER_URL} (<${server_version_commit}|\`${server_version}\`>)." 
+ ;; +*) + echo "Unknown status: ${status}" >&2 + exit 1 + ;; +esac + +text_arr=( + "${header}" + "" + "${bullet} *Comment:* ${SCALETEST_COMMENT}" + "${bullet} Workspace (runner): ${CODER_URL}/@${owner_name}/${workspace_name}" + "${bullet} Run ID: ${SCALETEST_RUN_ID}" + "${app_urls[@]}" + "${params[@]}" +) + +text= +for field in "${text_arr[@]}"; do + text+="${field}"$'\n' +done + +json=$( + jq -n --arg text "${text}" '{ + blocks: [ + { + "type": "section", + "text": { + "type": "mrkdwn", + "text": $text + } + } + ] + }' +) + +maybedryrun "${DRY_RUN}" curl -X POST -H 'Content-type: application/json' --data "${json}" "${SLACK_WEBHOOK_URL}" diff --git a/examples/scaletests/scaletest-runner/scripts/run.sh b/examples/scaletests/scaletest-runner/scripts/run.sh new file mode 100755 index 0000000000000..47a6042a18598 --- /dev/null +++ b/examples/scaletests/scaletest-runner/scripts/run.sh @@ -0,0 +1,369 @@ +#!/bin/bash +set -euo pipefail + +[[ $VERBOSE == 1 ]] && set -x + +# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh +. "${SCRIPTS_DIR}/lib.sh" + +mapfile -t scaletest_load_scenarios < <(jq -r '. | join ("\n")' <<<"${SCALETEST_PARAM_LOAD_SCENARIOS}") +export SCALETEST_PARAM_LOAD_SCENARIOS=("${scaletest_load_scenarios[@]}") + +log "Running scaletest..." +set_status Running + +start_phase "Creating workspaces" +if [[ ${SCALETEST_PARAM_SKIP_CREATE_WORKSPACES} == 0 ]]; then + # Note that we allow up to 5 failures to bring up the workspace, since + # we're creating a lot of workspaces at once and some of them may fail + # due to network issues or other transient errors. + coder exp scaletest create-workspaces \ + --retry 5 \ + --count "${SCALETEST_PARAM_NUM_WORKSPACES}" \ + --template "${SCALETEST_PARAM_TEMPLATE}" \ + --concurrency "${SCALETEST_PARAM_CREATE_CONCURRENCY}" \ + --timeout 5h \ + --job-timeout 5h \ + --no-cleanup \ + --output json:"${SCALETEST_RESULTS_DIR}/create-workspaces.json" + show_json "${SCALETEST_RESULTS_DIR}/create-workspaces.json" +fi +end_phase + +wait_baseline "${SCALETEST_PARAM_LOAD_SCENARIO_BASELINE_DURATION}" + +non_greedy_agent_traffic_args=() +if [[ ${SCALETEST_PARAM_GREEDY_AGENT} != 1 ]]; then + greedy_agent_traffic() { :; } +else + echo "WARNING: Greedy agent enabled, this may cause the load tests to fail." >&2 + non_greedy_agent_traffic_args=( + # Let the greedy agent traffic command be scraped. + # --scaletest-prometheus-address 0.0.0.0:21113 + # --trace=false + ) + + annotate_grafana greedy_agent "Create greedy agent" + + coder exp scaletest create-workspaces \ + --count 1 \ + --template "${SCALETEST_PARAM_GREEDY_AGENT_TEMPLATE}" \ + --concurrency 1 \ + --timeout 5h \ + --job-timeout 5h \ + --no-cleanup \ + --output json:"${SCALETEST_RESULTS_DIR}/create-workspaces-greedy-agent.json" + + wait_baseline "${SCALETEST_PARAM_LOAD_SCENARIO_BASELINE_DURATION}" + + greedy_agent_traffic() { + local timeout=${1} scenario=${2} + # Run the greedy test for ~1/3 of the timeout. + delay=$((timeout * 60 / 3)) + + local type=web-terminal + args=() + if [[ ${scenario} == "SSH Traffic" ]]; then + type=ssh + args+=(--ssh) + fi + + sleep "${delay}" + annotate_grafana greedy_agent "${scenario}: Greedy agent traffic" + + # Produce load at about 1000MB/s (25MB/40ms). 
+ set +e + coder exp scaletest workspace-traffic \ + --template "${SCALETEST_PARAM_GREEDY_AGENT_TEMPLATE}" \ + --bytes-per-tick $((1024 * 1024 * 25)) \ + --tick-interval 40ms \ + --timeout "$((delay))s" \ + --job-timeout "$((delay))s" \ + --output json:"${SCALETEST_RESULTS_DIR}/traffic-${type}-greedy-agent.json" \ + --scaletest-prometheus-address 0.0.0.0:21113 \ + --trace=false \ + "${args[@]}" + status=${?} + show_json "${SCALETEST_RESULTS_DIR}/traffic-${type}-greedy-agent.json" + + export GRAFANA_ADD_TAGS= + if [[ ${status} != 0 ]]; then + GRAFANA_ADD_TAGS=error + fi + annotate_grafana_end greedy_agent "${scenario}: Greedy agent traffic" + + return "${status}" + } +fi + +run_scenario_cmd() { + local scenario=${1} + shift + local command=("$@") + + set +e + if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 1 ]]; then + annotate_grafana scenario "Load scenario: ${scenario}" + fi + "${command[@]}" + status=${?} + if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 1 ]]; then + export GRAFANA_ADD_TAGS= + if [[ ${status} != 0 ]]; then + GRAFANA_ADD_TAGS=error + fi + annotate_grafana_end scenario "Load scenario: ${scenario}" + fi + exit "${status}" +} + +declare -a pids=() +declare -A pid_to_scenario=() +declare -A failed=() +target_start=0 +target_end=-1 + +if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 1 ]]; then + start_phase "Load scenarios: ${SCALETEST_PARAM_LOAD_SCENARIOS[*]}" +fi +for scenario in "${SCALETEST_PARAM_LOAD_SCENARIOS[@]}"; do + if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then + start_phase "Load scenario: ${scenario}" + fi + + set +e + status=0 + case "${scenario}" in + "SSH Traffic") + greedy_agent_traffic "${SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_DURATION}" "${scenario}" & + greedy_agent_traffic_pid=$! + + target_count=$(jq -n --argjson percentage "${SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_PERCENTAGE}" --argjson num_workspaces "${SCALETEST_PARAM_NUM_WORKSPACES}" '$percentage / 100 * $num_workspaces | floor') + target_end=$((target_start + target_count)) + if [[ ${target_end} -gt ${SCALETEST_PARAM_NUM_WORKSPACES} ]]; then + log "WARNING: Target count ${target_end} exceeds number of workspaces ${SCALETEST_PARAM_NUM_WORKSPACES}, using ${SCALETEST_PARAM_NUM_WORKSPACES} instead." + target_start=0 + target_end=${target_count} + fi + run_scenario_cmd "${scenario}" coder exp scaletest workspace-traffic \ + --template "${SCALETEST_PARAM_TEMPLATE}" \ + --ssh \ + --bytes-per-tick "${SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_BYTES_PER_TICK}" \ + --tick-interval "${SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_TICK_INTERVAL}ms" \ + --timeout "${SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_DURATION}m" \ + --job-timeout "${SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_DURATION}m30s" \ + --output json:"${SCALETEST_RESULTS_DIR}/traffic-ssh.json" \ + --scaletest-prometheus-address "0.0.0.0:${SCALETEST_PROMETHEUS_START_PORT}" \ + --target-workspaces "${target_start}:${target_end}" \ + "${non_greedy_agent_traffic_args[@]}" & + pids+=($!) + if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then + wait "${pids[-1]}" + status=$? + show_json "${SCALETEST_RESULTS_DIR}/traffic-ssh.json" + else + SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) + fi + wait "${greedy_agent_traffic_pid}" + status2=$? + if [[ ${status} == 0 ]]; then + status=${status2} + fi + ;; + "Web Terminal Traffic") + greedy_agent_traffic "${SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_DURATION}" "${scenario}" & + greedy_agent_traffic_pid=$! 
+ + target_count=$(jq -n --argjson percentage "${SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_PERCENTAGE}" --argjson num_workspaces "${SCALETEST_PARAM_NUM_WORKSPACES}" '$percentage / 100 * $num_workspaces | floor') + target_end=$((target_start + target_count)) + if [[ ${target_end} -gt ${SCALETEST_PARAM_NUM_WORKSPACES} ]]; then + log "WARNING: Target count ${target_end} exceeds number of workspaces ${SCALETEST_PARAM_NUM_WORKSPACES}, using ${SCALETEST_PARAM_NUM_WORKSPACES} instead." + target_start=0 + target_end=${target_count} + fi + run_scenario_cmd "${scenario}" coder exp scaletest workspace-traffic \ + --template "${SCALETEST_PARAM_TEMPLATE}" \ + --bytes-per-tick "${SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_BYTES_PER_TICK}" \ + --tick-interval "${SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_TICK_INTERVAL}ms" \ + --timeout "${SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_DURATION}m" \ + --job-timeout "${SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_DURATION}m30s" \ + --output json:"${SCALETEST_RESULTS_DIR}/traffic-web-terminal.json" \ + --scaletest-prometheus-address "0.0.0.0:${SCALETEST_PROMETHEUS_START_PORT}" \ + --target-workspaces "${target_start}:${target_end}" \ + "${non_greedy_agent_traffic_args[@]}" & + pids+=($!) + if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then + wait "${pids[-1]}" + status=$? + show_json "${SCALETEST_RESULTS_DIR}/traffic-web-terminal.json" + else + SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) + fi + wait "${greedy_agent_traffic_pid}" + status2=$? + if [[ ${status} == 0 ]]; then + status=${status2} + fi + ;; + "App Traffic") + greedy_agent_traffic "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_DURATION}" "${scenario}" & + greedy_agent_traffic_pid=$! + + target_count=$(jq -n --argjson percentage "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_PERCENTAGE}" --argjson num_workspaces "${SCALETEST_PARAM_NUM_WORKSPACES}" '$percentage / 100 * $num_workspaces | floor') + target_end=$((target_start + target_count)) + if [[ ${target_end} -gt ${SCALETEST_PARAM_NUM_WORKSPACES} ]]; then + log "WARNING: Target count ${target_end} exceeds number of workspaces ${SCALETEST_PARAM_NUM_WORKSPACES}, using ${SCALETEST_PARAM_NUM_WORKSPACES} instead." + target_start=0 + target_end=${target_count} + fi + run_scenario_cmd "${scenario}" coder exp scaletest workspace-traffic \ + --template "${SCALETEST_PARAM_TEMPLATE}" \ + --bytes-per-tick "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_BYTES_PER_TICK}" \ + --tick-interval "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_TICK_INTERVAL}ms" \ + --timeout "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_DURATION}m" \ + --job-timeout "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_DURATION}m30s" \ + --output json:"${SCALETEST_RESULTS_DIR}/traffic-app.json" \ + --scaletest-prometheus-address "0.0.0.0:${SCALETEST_PROMETHEUS_START_PORT}" \ + --app "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_MODE}" \ + --target-workspaces "${target_start}:${target_end}" \ + "${non_greedy_agent_traffic_args[@]}" & + pids+=($!) + if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then + wait "${pids[-1]}" + status=$? + show_json "${SCALETEST_RESULTS_DIR}/traffic-app.json" + else + SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) + fi + wait "${greedy_agent_traffic_pid}" + status2=$? 
+ if [[ ${status} == 0 ]]; then + status=${status2} + fi + ;; + "Dashboard Traffic") + target_count=$(jq -n --argjson percentage "${SCALETEST_PARAM_LOAD_SCENARIO_DASHBOARD_TRAFFIC_PERCENTAGE}" --argjson num_workspaces "${SCALETEST_PARAM_NUM_WORKSPACES}" '$percentage / 100 * $num_workspaces | floor') + target_end=$((target_start + target_count)) + if [[ ${target_end} -gt ${SCALETEST_PARAM_NUM_WORKSPACES} ]]; then + log "WARNING: Target count ${target_end} exceeds number of workspaces ${SCALETEST_PARAM_NUM_WORKSPACES}, using ${SCALETEST_PARAM_NUM_WORKSPACES} instead." + target_start=0 + target_end=${target_count} + fi + # TODO: Remove this once the dashboard traffic command is fixed, + # (i.e. once images are no longer dumped into PWD). + mkdir -p dashboard + pushd dashboard + run_scenario_cmd "${scenario}" coder exp scaletest dashboard \ + --timeout "${SCALETEST_PARAM_LOAD_SCENARIO_DASHBOARD_TRAFFIC_DURATION}m" \ + --job-timeout "${SCALETEST_PARAM_LOAD_SCENARIO_DASHBOARD_TRAFFIC_DURATION}m30s" \ + --output json:"${SCALETEST_RESULTS_DIR}/traffic-dashboard.json" \ + --scaletest-prometheus-address "0.0.0.0:${SCALETEST_PROMETHEUS_START_PORT}" \ + --target-users "${target_start}:${target_end}" \ + >"${SCALETEST_RESULTS_DIR}/traffic-dashboard-output.log" & + pids+=($!) + popd + if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then + wait "${pids[-1]}" + status=$? + show_json "${SCALETEST_RESULTS_DIR}/traffic-dashboard.json" + else + SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) + fi + ;; + + # Debug scenarios, for testing the runner. + "debug:greedy_agent_traffic") + greedy_agent_traffic 10 "${scenario}" & + pids+=($!) + if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then + wait "${pids[-1]}" + status=$? + else + SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) + fi + ;; + "debug:success") + { + maybedryrun "$DRY_RUN" sleep 10 + true + } & + pids+=($!) + if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then + wait "${pids[-1]}" + status=$? + else + SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) + fi + ;; + "debug:error") + { + maybedryrun "$DRY_RUN" sleep 10 + false + } & + pids+=($!) + if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then + wait "${pids[-1]}" + status=$? + else + SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) + fi + ;; + + *) + log "WARNING: Unknown load scenario: ${scenario}, skipping..." + ;; + esac + set -e + + # Allow targeting to be distributed evenly across workspaces when each + # scenario is run concurrently and all percentages add up to 100. + target_start=${target_end} + + if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 1 ]]; then + pid_to_scenario+=(["${pids[-1]}"]="${scenario}") + # Stagger the start of each scenario to avoid a burst of load and deted + # problematic scenarios. + sleep $((SCALETEST_PARAM_LOAD_SCENARIO_CONCURRENCY_STAGGER_DELAY_MINS * 60)) + continue + fi + + if ((status > 0)); then + log "Load scenario failed: ${scenario} (exit=${status})" + failed+=(["${scenario}"]="${status}") + PHASE_ADD_TAGS=error end_phase + else + end_phase + fi + + wait_baseline "${SCALETEST_PARAM_LOAD_SCENARIO_BASELINE_DURATION}" +done +if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 1 ]]; then + wait "${pids[@]}" + # Wait on all pids will wait until all have exited, but we need to + # check their individual exit codes. 
+ for pid in "${pids[@]}"; do + wait "${pid}" + status=${?} + scenario=${pid_to_scenario[${pid}]} + if ((status > 0)); then + log "Load scenario failed: ${scenario} (exit=${status})" + failed+=(["${scenario}"]="${status}") + fi + done + if ((${#failed[@]} > 0)); then + PHASE_ADD_TAGS=error end_phase + else + end_phase + fi +fi + +if ((${#failed[@]} > 0)); then + log "Load scenarios failed: ${!failed[*]}" + for scenario in "${!failed[@]}"; do + log " ${scenario}: exit=${failed[$scenario]}" + done + exit 1 +fi + +log "Scaletest complete!" +set_status Complete diff --git a/examples/scaletests/scaletest-runner/shutdown.sh b/examples/scaletests/scaletest-runner/shutdown.sh new file mode 100755 index 0000000000000..9e75864d73120 --- /dev/null +++ b/examples/scaletests/scaletest-runner/shutdown.sh @@ -0,0 +1,30 @@ +#!/bin/bash +set -e + +[[ $VERBOSE == 1 ]] && set -x + +# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh +. "${SCRIPTS_DIR}/lib.sh" + +cleanup() { + coder tokens remove scaletest_runner >/dev/null 2>&1 || true + rm -f "${CODER_CONFIG_DIR}/session" +} +trap cleanup EXIT + +annotate_grafana "workspace" "Agent stopping..." + +shutdown_event=shutdown_scale_down_only +if [[ ${SCALETEST_PARAM_CLEANUP_STRATEGY} == on_stop ]]; then + shutdown_event=shutdown +fi +"${SCRIPTS_DIR}/cleanup.sh" "${shutdown_event}" + +annotate_grafana_end "workspace" "Agent running" + +appearance_json="$(get_appearance)" +service_banner_message=$(jq -r '.service_banner.message' <<<"${appearance_json}") +service_banner_message="${service_banner_message/% | */}" +service_banner_color="#4CD473" # Green. + +set_appearance "${appearance_json}" "${service_banner_color}" "${service_banner_message}" diff --git a/examples/scaletests/scaletest-runner/startup.sh b/examples/scaletests/scaletest-runner/startup.sh new file mode 100755 index 0000000000000..3e4eb94f41810 --- /dev/null +++ b/examples/scaletests/scaletest-runner/startup.sh @@ -0,0 +1,181 @@ +#!/bin/bash +set -euo pipefail + +[[ $VERBOSE == 1 ]] && set -x + +if [[ ${SCALETEST_PARAM_GREEDY_AGENT_TEMPLATE} == "${SCALETEST_PARAM_TEMPLATE}" ]]; then + echo "ERROR: Greedy agent template must be different from the scaletest template." >&2 + exit 1 +fi + +if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 1 ]] && [[ ${SCALETEST_PARAM_GREEDY_AGENT} == 1 ]]; then + echo "ERROR: Load scenario concurrency and greedy agent test cannot be enabled at the same time." >&2 + exit 1 +fi + +# Unzip scripts and add to path. +# shellcheck disable=SC2153 +echo "Extracting scaletest scripts into ${SCRIPTS_DIR}..." +base64 -d <<<"${SCRIPTS_ZIP}" >/tmp/scripts.zip +rm -rf "${SCRIPTS_DIR}" || true +mkdir -p "${SCRIPTS_DIR}" +unzip -o /tmp/scripts.zip -d "${SCRIPTS_DIR}" +# Chmod to work around https://github.com/coder/coder/issues/10034 +chmod +x "${SCRIPTS_DIR}"/*.sh +rm /tmp/scripts.zip + +echo "Cloning coder/coder repo..." +if [[ ! -d "${HOME}/coder" ]]; then + git clone https://github.com/coder/coder.git "${HOME}/coder" +fi +(cd "${HOME}/coder" && git fetch -a && git checkout "${SCALETEST_PARAM_REPO_BRANCH}" && git pull) + +# Store the input parameters (for debugging). +env | grep "^SCALETEST_" | sort >"${SCALETEST_RUN_DIR}/environ.txt" + +# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh +. 
"${SCRIPTS_DIR}/lib.sh" + +appearance_json="$(get_appearance)" +service_banner_message=$(jq -r '.service_banner.message' <<<"${appearance_json}") +service_banner_message="${service_banner_message/% | */}" +service_banner_color="#D65D0F" # Orange. + +annotate_grafana "workspace" "Agent running" # Ended in shutdown.sh. + +{ + pids=() + ports=() + declare -A pods=() + next_port=6061 + for pod in $(kubectl get pods -l app.kubernetes.io/name=coder -o jsonpath='{.items[*].metadata.name}'); do + maybedryrun "${DRY_RUN}" kubectl -n coder-big port-forward "${pod}" "${next_port}:6060" & + pids+=($!) + ports+=("${next_port}") + pods[${next_port}]="${pod}" + next_port=$((next_port + 1)) + done + + trap 'trap - EXIT; kill -INT "${pids[@]}"; exit 1' INT EXIT + + while :; do + # Sleep for short periods of time so that we can exit quickly. + # This adds up to ~300 when accounting for profile and trace. + for ((i = 0; i < 285; i++)); do + sleep 1 + done + log "Grabbing pprof dumps" + start="$(date +%s)" + annotate_grafana "pprof" "Grab pprof dumps (start=${start})" + for type in allocs block heap goroutine mutex 'profile?seconds=10' 'trace?seconds=5'; do + for port in "${ports[@]}"; do + tidy_type="${type//\?/_}" + tidy_type="${tidy_type//=/_}" + maybedryrun "${DRY_RUN}" curl -sSL --output "${SCALETEST_PPROF_DIR}/pprof-${tidy_type}-${pods[${port}]}-${start}.gz" "http://localhost:${port}/debug/pprof/${type}" + done + done + annotate_grafana_end "pprof" "Grab pprof dumps (start=${start})" + done +} & +pprof_pid=$! + +logs_gathered=0 +gather_logs() { + if ((logs_gathered == 1)); then + return + fi + logs_gathered=1 + + # Gather logs from all coderd and provisioner instances, and all workspaces. + annotate_grafana "logs" "Gather logs" + podsraw="$( + kubectl -n coder-big get pods -l app.kubernetes.io/name=coder -o name + kubectl -n coder-big get pods -l app.kubernetes.io/name=coder-provisioner -o name || true + kubectl -n coder-big get pods -l app.kubernetes.io/name=coder-workspace -o name | grep "^pod/scaletest-" || true + )" + mapfile -t pods <<<"${podsraw}" + for pod in "${pods[@]}"; do + pod_name="${pod#pod/}" + kubectl -n coder-big logs "${pod}" --since-time="${SCALETEST_RUN_START_TIME}" >"${SCALETEST_LOGS_DIR}/${pod_name}.txt" + done + annotate_grafana_end "logs" "Gather logs" +} + +set_appearance "${appearance_json}" "${service_banner_color}" "${service_banner_message} | Scaletest running: [${CODER_USER}/${CODER_WORKSPACE}](${CODER_URL}/@${CODER_USER}/${CODER_WORKSPACE})!" + +# Show failure in the UI if script exits with error. +on_exit() { + code=${?} + trap - ERR EXIT + set +e + + kill -INT "${pprof_pid}" + + message_color="#4CD473" # Green. + message_status=COMPLETE + if ((code > 0)); then + message_color="#D94A5D" # Red. + message_status=FAILED + fi + + # In case the test failed before gathering logs, gather them before + # cleaning up, whilst the workspaces are still present. + gather_logs + + case "${SCALETEST_PARAM_CLEANUP_STRATEGY}" in + on_stop) + # Handled by shutdown script. + ;; + on_success) + if ((code == 0)); then + set_appearance "${appearance_json}" "${message_color}" "${service_banner_message} | Scaletest ${message_status}: [${CODER_USER}/${CODER_WORKSPACE}](${CODER_URL}/@${CODER_USER}/${CODER_WORKSPACE}), cleaning up..." 
+ "${SCRIPTS_DIR}/cleanup.sh" "${SCALETEST_PARAM_CLEANUP_STRATEGY}" + fi + ;; + on_error) + if ((code > 0)); then + set_appearance "${appearance_json}" "${message_color}" "${service_banner_message} | Scaletest ${message_status}: [${CODER_USER}/${CODER_WORKSPACE}](${CODER_URL}/@${CODER_USER}/${CODER_WORKSPACE}), cleaning up..." + "${SCRIPTS_DIR}/cleanup.sh" "${SCALETEST_PARAM_CLEANUP_STRATEGY}" + fi + ;; + *) + set_appearance "${appearance_json}" "${message_color}" "${service_banner_message} | Scaletest ${message_status}: [${CODER_USER}/${CODER_WORKSPACE}](${CODER_URL}/@${CODER_USER}/${CODER_WORKSPACE}), cleaning up..." + "${SCRIPTS_DIR}/cleanup.sh" "${SCALETEST_PARAM_CLEANUP_STRATEGY}" + ;; + esac + + set_appearance "${appearance_json}" "${message_color}" "${service_banner_message} | Scaletest ${message_status}: [${CODER_USER}/${CODER_WORKSPACE}](${CODER_URL}/@${CODER_USER}/${CODER_WORKSPACE})!" + + annotate_grafana_end "" "Start scaletest: ${SCALETEST_COMMENT}" + + wait "${pprof_pid}" + exit "${code}" +} +trap on_exit EXIT + +on_err() { + code=${?} + trap - ERR + set +e + + log "Scaletest failed!" + GRAFANA_EXTRA_TAGS=error set_status "Failed (exit=${code})" + "${SCRIPTS_DIR}/report.sh" failed + lock_status # Ensure we never rewrite the status after a failure. + + exit "${code}" +} +trap on_err ERR + +# Pass session token since `prepare.sh` has not yet run. +CODER_SESSION_TOKEN=$CODER_USER_TOKEN "${SCRIPTS_DIR}/report.sh" started +annotate_grafana "" "Start scaletest: ${SCALETEST_COMMENT}" + +"${SCRIPTS_DIR}/prepare.sh" + +"${SCRIPTS_DIR}/run.sh" + +# Gather logs before ending the test. +gather_logs + +"${SCRIPTS_DIR}/report.sh" completed From 358f99025eb0ff75cb2601fe150f261150f130ae Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Tue, 19 Mar 2024 15:05:11 +0100 Subject: [PATCH 03/21] make fmt --- examples/scaletests/kubernetes-large/main.tf | 134 +++++++++--------- .../kubernetes-medium-greedy/main.tf | 6 +- examples/scaletests/kubernetes-medium/main.tf | 134 +++++++++--------- examples/scaletests/kubernetes-small/main.tf | 134 +++++++++--------- .../kubernetes-with-podmonitor/main.tf | 18 +-- 5 files changed, 213 insertions(+), 213 deletions(-) diff --git a/examples/scaletests/kubernetes-large/main.tf b/examples/scaletests/kubernetes-large/main.tf index 98d5c552f9eaf..352db67bbcf22 100644 --- a/examples/scaletests/kubernetes-large/main.tf +++ b/examples/scaletests/kubernetes-large/main.tf @@ -1,82 +1,82 @@ - terraform { - required_providers { - coder = { - source = "coder/coder" - version = "~> 0.7.0" - } - kubernetes = { - source = "hashicorp/kubernetes" - version = "~> 2.18" - } - } +terraform { + required_providers { + coder = { + source = "coder/coder" + version = "~> 0.7.0" } + kubernetes = { + source = "hashicorp/kubernetes" + version = "~> 2.18" + } + } +} - provider "coder" {} +provider "coder" {} - provider "kubernetes" { - config_path = null # always use host - } +provider "kubernetes" { + config_path = null # always use host +} - data "coder_workspace" "me" {} +data "coder_workspace" "me" {} - resource "coder_agent" "main" { - os = "linux" - arch = "amd64" - startup_script_timeout = 180 - startup_script = "" - } +resource "coder_agent" "main" { + os = "linux" + arch = "amd64" + startup_script_timeout = 180 + startup_script = "" +} - resource "kubernetes_pod" "main" { - count = data.coder_workspace.me.start_count - metadata { - name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" - namespace = "coder-big" - labels = { - 
"app.kubernetes.io/name" = "coder-workspace" - "app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" - } +resource "kubernetes_pod" "main" { + count = data.coder_workspace.me.start_count + metadata { + name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + namespace = "coder-big" + labels = { + "app.kubernetes.io/name" = "coder-workspace" + "app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + } + } + spec { + security_context { + run_as_user = "1000" + fs_group = "1000" + } + container { + name = "dev" + image = "docker.io/codercom/enterprise-minimal:ubuntu" + image_pull_policy = "Always" + command = ["sh", "-c", coder_agent.main.init_script] + security_context { + run_as_user = "1000" } - spec { - security_context { - run_as_user = "1000" - fs_group = "1000" + env { + name = "CODER_AGENT_TOKEN" + value = coder_agent.main.token + } + resources { + requests = { + "cpu" = "4" + "memory" = "4Gi" } - container { - name = "dev" - image = "docker.io/codercom/enterprise-minimal:ubuntu" - image_pull_policy = "Always" - command = ["sh", "-c", coder_agent.main.init_script] - security_context { - run_as_user = "1000" - } - env { - name = "CODER_AGENT_TOKEN" - value = coder_agent.main.token - } - resources { - requests = { - "cpu" = "4" - "memory" = "4Gi" - } - limits = { - "cpu" = "4" - "memory" = "4Gi" - } - } + limits = { + "cpu" = "4" + "memory" = "4Gi" } + } + } - affinity { - node_affinity { - required_during_scheduling_ignored_during_execution { - node_selector_term { - match_expressions { - key = "cloud.google.com/gke-nodepool" - operator = "In" - values = ["big-workspaces"] - } - } + affinity { + node_affinity { + required_during_scheduling_ignored_during_execution { + node_selector_term { + match_expressions { + key = "cloud.google.com/gke-nodepool" + operator = "In" + values = ["big-workspaces"] } } } } } + } +} diff --git a/examples/scaletests/kubernetes-medium-greedy/main.tf b/examples/scaletests/kubernetes-medium-greedy/main.tf index 45f5b970d73c7..a0a5dd8742c56 100644 --- a/examples/scaletests/kubernetes-medium-greedy/main.tf +++ b/examples/scaletests/kubernetes-medium-greedy/main.tf @@ -137,7 +137,7 @@ resource "coder_agent" "main" { script = "dd if=/dev/urandom bs=3072 count=1 status=none | base64" interval = 1 timeout = 10 - } + } } resource "kubernetes_pod" "main" { @@ -184,9 +184,9 @@ resource "kubernetes_pod" "main" { required_during_scheduling_ignored_during_execution { node_selector_term { match_expressions { - key = "cloud.google.com/gke-nodepool" + key = "cloud.google.com/gke-nodepool" operator = "In" - values = ["big-workspaces"] + values = ["big-workspaces"] } } } diff --git a/examples/scaletests/kubernetes-medium/main.tf b/examples/scaletests/kubernetes-medium/main.tf index b8ce10b4bdb8a..5dcd9588c1b33 100644 --- a/examples/scaletests/kubernetes-medium/main.tf +++ b/examples/scaletests/kubernetes-medium/main.tf @@ -1,82 +1,82 @@ - terraform { - required_providers { - coder = { - source = "coder/coder" - version = "~> 0.7.0" - } - kubernetes = { - source = "hashicorp/kubernetes" - version = "~> 2.18" - } - } +terraform { + required_providers { + coder = { + source = "coder/coder" + version = "~> 0.7.0" } + kubernetes = { + source = "hashicorp/kubernetes" + version = "~> 2.18" + } + } +} - provider "coder" {} +provider "coder" {} - provider "kubernetes" { - config_path = null # always 
use host - } +provider "kubernetes" { + config_path = null # always use host +} - data "coder_workspace" "me" {} +data "coder_workspace" "me" {} - resource "coder_agent" "main" { - os = "linux" - arch = "amd64" - startup_script_timeout = 180 - startup_script = "" - } +resource "coder_agent" "main" { + os = "linux" + arch = "amd64" + startup_script_timeout = 180 + startup_script = "" +} - resource "kubernetes_pod" "main" { - count = data.coder_workspace.me.start_count - metadata { - name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" - namespace = "coder-big" - labels = { - "app.kubernetes.io/name" = "coder-workspace" - "app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" - } +resource "kubernetes_pod" "main" { + count = data.coder_workspace.me.start_count + metadata { + name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + namespace = "coder-big" + labels = { + "app.kubernetes.io/name" = "coder-workspace" + "app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + } + } + spec { + security_context { + run_as_user = "1000" + fs_group = "1000" + } + container { + name = "dev" + image = "docker.io/codercom/enterprise-minimal:ubuntu" + image_pull_policy = "Always" + command = ["sh", "-c", coder_agent.main.init_script] + security_context { + run_as_user = "1000" } - spec { - security_context { - run_as_user = "1000" - fs_group = "1000" + env { + name = "CODER_AGENT_TOKEN" + value = coder_agent.main.token + } + resources { + requests = { + "cpu" = "2" + "memory" = "2Gi" } - container { - name = "dev" - image = "docker.io/codercom/enterprise-minimal:ubuntu" - image_pull_policy = "Always" - command = ["sh", "-c", coder_agent.main.init_script] - security_context { - run_as_user = "1000" - } - env { - name = "CODER_AGENT_TOKEN" - value = coder_agent.main.token - } - resources { - requests = { - "cpu" = "2" - "memory" = "2Gi" - } - limits = { - "cpu" = "2" - "memory" = "2Gi" - } - } + limits = { + "cpu" = "2" + "memory" = "2Gi" } + } + } - affinity { - node_affinity { - required_during_scheduling_ignored_during_execution { - node_selector_term { - match_expressions { - key = "cloud.google.com/gke-nodepool" - operator = "In" - values = ["big-workspaces"] - } - } + affinity { + node_affinity { + required_during_scheduling_ignored_during_execution { + node_selector_term { + match_expressions { + key = "cloud.google.com/gke-nodepool" + operator = "In" + values = ["big-workspaces"] } } } } } + } +} diff --git a/examples/scaletests/kubernetes-small/main.tf b/examples/scaletests/kubernetes-small/main.tf index b11308b4a2ccf..b59e4989544f5 100644 --- a/examples/scaletests/kubernetes-small/main.tf +++ b/examples/scaletests/kubernetes-small/main.tf @@ -1,82 +1,82 @@ - terraform { - required_providers { - coder = { - source = "coder/coder" - version = "~> 0.7.0" - } - kubernetes = { - source = "hashicorp/kubernetes" - version = "~> 2.18" - } - } +terraform { + required_providers { + coder = { + source = "coder/coder" + version = "~> 0.7.0" } + kubernetes = { + source = "hashicorp/kubernetes" + version = "~> 2.18" + } + } +} - provider "coder" {} +provider "coder" {} - provider "kubernetes" { - config_path = null # always use host - } +provider "kubernetes" { + config_path = null # always use host +} - data "coder_workspace" "me" {} +data "coder_workspace" "me" {} - resource 
"coder_agent" "main" { - os = "linux" - arch = "amd64" - startup_script_timeout = 180 - startup_script = "" - } +resource "coder_agent" "main" { + os = "linux" + arch = "amd64" + startup_script_timeout = 180 + startup_script = "" +} - resource "kubernetes_pod" "main" { - count = data.coder_workspace.me.start_count - metadata { - name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" - namespace = "coder-big" - labels = { - "app.kubernetes.io/name" = "coder-workspace" - "app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" - } +resource "kubernetes_pod" "main" { + count = data.coder_workspace.me.start_count + metadata { + name = "coder-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + namespace = "coder-big" + labels = { + "app.kubernetes.io/name" = "coder-workspace" + "app.kubernetes.io/instance" = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + } + } + spec { + security_context { + run_as_user = "1000" + fs_group = "1000" + } + container { + name = "dev" + image = "docker.io/codercom/enterprise-base:ubuntu" + image_pull_policy = "Always" + command = ["sh", "-c", coder_agent.main.init_script] + security_context { + run_as_user = "1000" } - spec { - security_context { - run_as_user = "1000" - fs_group = "1000" + env { + name = "CODER_AGENT_TOKEN" + value = coder_agent.main.token + } + resources { + requests = { + "cpu" = "1" + "memory" = "1Gi" } - container { - name = "dev" - image = "docker.io/codercom/enterprise-base:ubuntu" - image_pull_policy = "Always" - command = ["sh", "-c", coder_agent.main.init_script] - security_context { - run_as_user = "1000" - } - env { - name = "CODER_AGENT_TOKEN" - value = coder_agent.main.token - } - resources { - requests = { - "cpu" = "1" - "memory" = "1Gi" - } - limits = { - "cpu" = "1" - "memory" = "1Gi" - } - } + limits = { + "cpu" = "1" + "memory" = "1Gi" } + } + } - affinity { - node_affinity { - required_during_scheduling_ignored_during_execution { - node_selector_term { - match_expressions { - key = "cloud.google.com/gke-nodepool" - operator = "In" - values = ["big-workspaces"] - } - } + affinity { + node_affinity { + required_during_scheduling_ignored_during_execution { + node_selector_term { + match_expressions { + key = "cloud.google.com/gke-nodepool" + operator = "In" + values = ["big-workspaces"] } } } } } + } +} diff --git a/examples/scaletests/kubernetes-with-podmonitor/main.tf b/examples/scaletests/kubernetes-with-podmonitor/main.tf index 1c6c732377728..722cbe71f7692 100644 --- a/examples/scaletests/kubernetes-with-podmonitor/main.tf +++ b/examples/scaletests/kubernetes-with-podmonitor/main.tf @@ -289,8 +289,8 @@ resource "kubernetes_pod" "main" { } port { container_port = 21112 - name = "prometheus-http" - protocol = "TCP" + name = "prometheus-http" + protocol = "TCP" } } @@ -325,9 +325,9 @@ resource "kubernetes_pod" "main" { required_during_scheduling_ignored_during_execution { node_selector_term { match_expressions { - key = "cloud.google.com/gke-nodepool" + key = "cloud.google.com/gke-nodepool" operator = "In" - values = ["big-misc"] # avoid placing on the same nodes as scaletest workspaces + values = ["big-misc"] # avoid placing on the same nodes as scaletest workspaces } } } @@ -339,21 +339,21 @@ resource "kubernetes_pod" "main" { resource "kubernetes_manifest" "pod_monitor" { count = data.coder_workspace.me.start_count manifest = { - 
apiVersion = "monitoring.coreos.com/v1" - kind = "PodMonitor" + apiVersion = "monitoring.coreos.com/v1" + kind = "PodMonitor" metadata = { namespace = var.namespace - name = "podmonitor-${local.workspace_pod_name}" + name = "podmonitor-${local.workspace_pod_name}" } spec = { selector = { matchLabels = { - "app.kubernetes.io/instance": "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" + "app.kubernetes.io/instance" : "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" } } podMetricsEndpoints = [ { - port = "prometheus-http" + port = "prometheus-http" interval = "15s" } ] From 51f9f0cf35d69ee0480aad2d904fa42be3527ff5 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Tue, 19 Mar 2024 15:29:04 +0100 Subject: [PATCH 04/21] Use mock Grafana url --- examples/scaletests/scaletest-runner/main.tf | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/examples/scaletests/scaletest-runner/main.tf b/examples/scaletests/scaletest-runner/main.tf index 2a6eb8ca21ed5..256b79d96320b 100644 --- a/examples/scaletests/scaletest-runner/main.tf +++ b/examples/scaletests/scaletest-runner/main.tf @@ -44,7 +44,7 @@ locals { scaletest_run_id = "scaletest-${replace(time_static.start_time.rfc3339, ":", "-")}" scaletest_run_dir = "/home/coder/${local.scaletest_run_id}" scaletest_run_start_time = time_static.start_time.rfc3339 - grafana_url = "https://stats.dev.c8s.io" + grafana_url = "https://grafana.corp.tld" grafana_dashboard_uid = "qLVSTR-Vz" grafana_dashboard_name = "coderv2-loadtest-dashboard" } @@ -736,8 +736,7 @@ resource "coder_app" "prometheus" { agent_id = coder_agent.main.id slug = "01-prometheus" display_name = "Prometheus" - // https://stats.dev.c8s.io:9443/classic/graph?g0.range_input=2h&g0.end_input=2023-09-08%2015%3A58&g0.stacked=0&g0.expr=rate(pg_stat_database_xact_commit%7Bcluster%3D%22big%22%2Cdatname%3D%22big-coder%22%7D%5B1m%5D)&g0.tab=0 - url = "https://stats.dev.c8s.io:9443" + url = "https://grafana.corp.tld:9443" icon = "https://prometheus.io/assets/favicons/favicon-32x32.png" external = true } From 4bafa5c64739ab05bffef66ef0c1c70e7add5848 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Wed, 20 Mar 2024 12:47:21 +0100 Subject: [PATCH 05/21] READMEs --- examples/scaletests/kubernetes-large/README.md | 5 +++++ examples/scaletests/kubernetes-medium-greedy/README.md | 5 +++++ examples/scaletests/kubernetes-medium/README.md | 5 +++++ examples/scaletests/kubernetes-minimal/README.md | 5 +++++ examples/scaletests/kubernetes-small/README.md | 5 +++++ 5 files changed, 25 insertions(+) create mode 100644 examples/scaletests/kubernetes-large/README.md create mode 100644 examples/scaletests/kubernetes-medium-greedy/README.md create mode 100644 examples/scaletests/kubernetes-medium/README.md create mode 100644 examples/scaletests/kubernetes-minimal/README.md create mode 100644 examples/scaletests/kubernetes-small/README.md diff --git a/examples/scaletests/kubernetes-large/README.md b/examples/scaletests/kubernetes-large/README.md new file mode 100644 index 0000000000000..2b0ae5cc296be --- /dev/null +++ b/examples/scaletests/kubernetes-large/README.md @@ -0,0 +1,5 @@ +# kubernetes-large + +Provisions a large-sized workspace with no persistent storage. 
+
+_Requires_: `cloud.google.com/gke-nodepool` = `big-workspaces`
diff --git a/examples/scaletests/kubernetes-medium-greedy/README.md b/examples/scaletests/kubernetes-medium-greedy/README.md
new file mode 100644
index 0000000000000..22e94bb262616
--- /dev/null
+++ b/examples/scaletests/kubernetes-medium-greedy/README.md
@@ -0,0 +1,5 @@
+# kubernetes-medium-greedy
+
+Provisions a medium-sized workspace with no persistent storage. Greedy agent variant.
+
+_Requires_: `cloud.google.com/gke-nodepool` = `big-workspaces`
diff --git a/examples/scaletests/kubernetes-medium/README.md b/examples/scaletests/kubernetes-medium/README.md
new file mode 100644
index 0000000000000..e2d5eae983114
--- /dev/null
+++ b/examples/scaletests/kubernetes-medium/README.md
@@ -0,0 +1,5 @@
+# kubernetes-medium
+
+Provisions a medium-sized workspace with no persistent storage.
+
+_Requires_: `cloud.google.com/gke-nodepool` = `big-workspaces`
diff --git a/examples/scaletests/kubernetes-minimal/README.md b/examples/scaletests/kubernetes-minimal/README.md
new file mode 100644
index 0000000000000..c56d3d477f821
--- /dev/null
+++ b/examples/scaletests/kubernetes-minimal/README.md
@@ -0,0 +1,5 @@
+# kubernetes-minimal
+
+Provisions a minimal-sized workspace with no persistent storage.
+
+_Requires_: `cloud.google.com/gke-nodepool` = `big-workspaces`
diff --git a/examples/scaletests/kubernetes-small/README.md b/examples/scaletests/kubernetes-small/README.md
new file mode 100644
index 0000000000000..56efbb98c3cb3
--- /dev/null
+++ b/examples/scaletests/kubernetes-small/README.md
@@ -0,0 +1,5 @@
+# kubernetes-small
+
+Provisions a small-sized workspace with no persistent storage.
+
+_Requires_: `cloud.google.com/gke-nodepool` = `big-workspaces`

From 7dc0d5e27f7a9c1e68dbb1272ef2ee7d547c9bfd Mon Sep 17 00:00:00 2001
From: Marcin Tojek
Date: Wed, 20 Mar 2024 13:05:09 +0100
Subject: [PATCH 06/21] More todos

---
 docs/admin/scale.md | 16 ++++++++++++++++
 examples/scaletests/scaletest-runner/main.tf | 6 +++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/docs/admin/scale.md b/docs/admin/scale.md
index 024983bb7a528..1ecd4a9b2f781 100644
--- a/docs/admin/scale.md
+++ b/docs/admin/scale.md
@@ -91,6 +91,22 @@ coder exp scaletest cleanup
 
 This will delete all workspaces and users with the prefix `scaletest-`.
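After cleanup it can be worth verifying that nothing with the `scaletest-` prefix is left behind. A minimal sketch of such a check, assuming the standard `coder list` and `coder users list` commands of the Coder CLI:

```shell
# List any remaining scaletest workspaces across all owners (requires admin rights).
coder list --all --search "scaletest-"

# List any remaining scaletest users; grep exits non-zero when nothing matches.
coder users list | grep "scaletest-" || true
```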
+## Scale testing template
+
+TODO
+
+### Parameters
+
+TODO
+
+### Kubernetes cluster
+
+TODO
+
+### Observability
+
+TODO Grafana and logs
+
 ## Autoscaling
 
 We generally do not recommend using an autoscaler that modifies the number of
diff --git a/examples/scaletests/scaletest-runner/main.tf b/examples/scaletests/scaletest-runner/main.tf
index 256b79d96320b..2d17c66435f62 100644
--- a/examples/scaletests/scaletest-runner/main.tf
+++ b/examples/scaletests/scaletest-runner/main.tf
@@ -736,9 +736,9 @@ resource "coder_app" "prometheus" {
   agent_id     = coder_agent.main.id
   slug         = "01-prometheus"
   display_name = "Prometheus"
-  url          = "https://grafana.corp.tld:9443"
-  icon         = "https://prometheus.io/assets/favicons/favicon-32x32.png"
-  external     = true
+  url      = "https://grafana.corp.tld:9443"
+  icon     = "https://prometheus.io/assets/favicons/favicon-32x32.png"
+  external = true
 }
 
 resource "coder_app" "manual_cleanup" {

From 7df6afe6b3e786ebd4496f48348a3d9d01678c1d Mon Sep 17 00:00:00 2001
From: Marcin Tojek
Date: Wed, 20 Mar 2024 13:36:02 +0100
Subject: [PATCH 07/21] Move scaletests

---
 docs/admin/scale.md | 4 +++-
 .../templates}/kubernetes-large/README.md | 0
 .../templates}/kubernetes-large/main.tf | 0
 .../templates}/kubernetes-medium-greedy/README.md | 0
 .../templates}/kubernetes-medium-greedy/main.tf | 0
 .../templates}/kubernetes-medium/README.md | 0
 .../templates}/kubernetes-medium/main.tf | 0
 .../templates}/kubernetes-minimal/README.md | 0
 .../templates}/kubernetes-minimal/main.tf | 0
 .../templates}/kubernetes-small/README.md | 0
 .../templates}/kubernetes-small/main.tf | 0
 .../templates}/kubernetes-with-podmonitor/README.md | 0
 .../templates}/kubernetes-with-podmonitor/main.tf | 0
 13 files changed, 3 insertions(+), 1 deletion(-)
 rename {examples/scaletests => scaletest/templates}/kubernetes-large/README.md (100%)
 rename {examples/scaletests => scaletest/templates}/kubernetes-large/main.tf (100%)
 rename {examples/scaletests => scaletest/templates}/kubernetes-medium-greedy/README.md (100%)
 rename {examples/scaletests => scaletest/templates}/kubernetes-medium-greedy/main.tf (100%)
 rename {examples/scaletests => scaletest/templates}/kubernetes-medium/README.md (100%)
 rename {examples/scaletests => scaletest/templates}/kubernetes-medium/main.tf (100%)
 rename {examples/scaletests => scaletest/templates}/kubernetes-minimal/README.md (100%)
 rename {examples/scaletests => scaletest/templates}/kubernetes-minimal/main.tf (100%)
 rename {examples/scaletests => scaletest/templates}/kubernetes-small/README.md (100%)
 rename {examples/scaletests => scaletest/templates}/kubernetes-small/main.tf (100%)
 rename {examples/scaletests => scaletest/templates}/kubernetes-with-podmonitor/README.md (100%)
 rename {examples/scaletests => scaletest/templates}/kubernetes-with-podmonitor/main.tf (100%)

diff --git a/docs/admin/scale.md b/docs/admin/scale.md
index 1ecd4a9b2f781..dd018b98562ec 100644
--- a/docs/admin/scale.md
+++ b/docs/admin/scale.md
@@ -1,6 +1,8 @@
 We scale-test Coder with [a built-in utility](#scale-testing-utility) that can
 be used in your environment for insights into how Coder scales with your
-infrastructure.
+infrastructure. For scale-testing Kubernetes clusters, we recommend installing
+and using the dedicated Coder template,
+[scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner).
 
 Learn more about [Coder’s architecture](../about/architecture.md) and our
 [scale-testing methodology](architectures/index.md#scale-testing-methodology).
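The recommended scaletest-runner template is imported like any other Coder template. A minimal sketch, assuming a local checkout of coder/coder and the standard `coder templates push` command (on older releases, `coder templates create` may be needed for the first import):

```shell
# Import the runner template from a coder/coder checkout.
cd scaletest/templates/scaletest-runner
coder templates push scaletest-runner --directory . --yes
```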
diff --git a/examples/scaletests/kubernetes-large/README.md b/scaletest/templates/kubernetes-large/README.md similarity index 100% rename from examples/scaletests/kubernetes-large/README.md rename to scaletest/templates/kubernetes-large/README.md diff --git a/examples/scaletests/kubernetes-large/main.tf b/scaletest/templates/kubernetes-large/main.tf similarity index 100% rename from examples/scaletests/kubernetes-large/main.tf rename to scaletest/templates/kubernetes-large/main.tf diff --git a/examples/scaletests/kubernetes-medium-greedy/README.md b/scaletest/templates/kubernetes-medium-greedy/README.md similarity index 100% rename from examples/scaletests/kubernetes-medium-greedy/README.md rename to scaletest/templates/kubernetes-medium-greedy/README.md diff --git a/examples/scaletests/kubernetes-medium-greedy/main.tf b/scaletest/templates/kubernetes-medium-greedy/main.tf similarity index 100% rename from examples/scaletests/kubernetes-medium-greedy/main.tf rename to scaletest/templates/kubernetes-medium-greedy/main.tf diff --git a/examples/scaletests/kubernetes-medium/README.md b/scaletest/templates/kubernetes-medium/README.md similarity index 100% rename from examples/scaletests/kubernetes-medium/README.md rename to scaletest/templates/kubernetes-medium/README.md diff --git a/examples/scaletests/kubernetes-medium/main.tf b/scaletest/templates/kubernetes-medium/main.tf similarity index 100% rename from examples/scaletests/kubernetes-medium/main.tf rename to scaletest/templates/kubernetes-medium/main.tf diff --git a/examples/scaletests/kubernetes-minimal/README.md b/scaletest/templates/kubernetes-minimal/README.md similarity index 100% rename from examples/scaletests/kubernetes-minimal/README.md rename to scaletest/templates/kubernetes-minimal/README.md diff --git a/examples/scaletests/kubernetes-minimal/main.tf b/scaletest/templates/kubernetes-minimal/main.tf similarity index 100% rename from examples/scaletests/kubernetes-minimal/main.tf rename to scaletest/templates/kubernetes-minimal/main.tf diff --git a/examples/scaletests/kubernetes-small/README.md b/scaletest/templates/kubernetes-small/README.md similarity index 100% rename from examples/scaletests/kubernetes-small/README.md rename to scaletest/templates/kubernetes-small/README.md diff --git a/examples/scaletests/kubernetes-small/main.tf b/scaletest/templates/kubernetes-small/main.tf similarity index 100% rename from examples/scaletests/kubernetes-small/main.tf rename to scaletest/templates/kubernetes-small/main.tf diff --git a/examples/scaletests/kubernetes-with-podmonitor/README.md b/scaletest/templates/kubernetes-with-podmonitor/README.md similarity index 100% rename from examples/scaletests/kubernetes-with-podmonitor/README.md rename to scaletest/templates/kubernetes-with-podmonitor/README.md diff --git a/examples/scaletests/kubernetes-with-podmonitor/main.tf b/scaletest/templates/kubernetes-with-podmonitor/main.tf similarity index 100% rename from examples/scaletests/kubernetes-with-podmonitor/main.tf rename to scaletest/templates/kubernetes-with-podmonitor/main.tf From 03279ae6f429ba74be752fdf5afd303d9effc075 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Wed, 20 Mar 2024 13:37:07 +0100 Subject: [PATCH 08/21] Move around templates --- .../scaletests/scaletest-runner/Dockerfile | 36 - .../scaletests/scaletest-runner/README.md | 9 - examples/scaletests/scaletest-runner/main.tf | 960 ------------------ .../scaletest-runner/metadata_phase.sh | 6 - .../metadata_previous_phase.sh | 6 - .../scaletest-runner/metadata_status.sh | 6 
- .../scaletest-runner/scripts/cleanup.sh | 62 -- .../scaletest-runner/scripts/lib.sh | 313 ------ .../scaletest-runner/scripts/prepare.sh | 67 -- .../scaletest-runner/scripts/report.sh | 109 -- .../scaletest-runner/scripts/run.sh | 369 ------- .../scaletests/scaletest-runner/shutdown.sh | 30 - .../scaletests/scaletest-runner/startup.sh | 181 ---- scaletest/templates/scaletest-runner/main.tf | 11 +- 14 files changed, 6 insertions(+), 2159 deletions(-) delete mode 100644 examples/scaletests/scaletest-runner/Dockerfile delete mode 100644 examples/scaletests/scaletest-runner/README.md delete mode 100644 examples/scaletests/scaletest-runner/main.tf delete mode 100755 examples/scaletests/scaletest-runner/metadata_phase.sh delete mode 100755 examples/scaletests/scaletest-runner/metadata_previous_phase.sh delete mode 100755 examples/scaletests/scaletest-runner/metadata_status.sh delete mode 100755 examples/scaletests/scaletest-runner/scripts/cleanup.sh delete mode 100644 examples/scaletests/scaletest-runner/scripts/lib.sh delete mode 100755 examples/scaletests/scaletest-runner/scripts/prepare.sh delete mode 100755 examples/scaletests/scaletest-runner/scripts/report.sh delete mode 100755 examples/scaletests/scaletest-runner/scripts/run.sh delete mode 100755 examples/scaletests/scaletest-runner/shutdown.sh delete mode 100755 examples/scaletests/scaletest-runner/startup.sh diff --git a/examples/scaletests/scaletest-runner/Dockerfile b/examples/scaletests/scaletest-runner/Dockerfile deleted file mode 100644 index 9aa016b534a17..0000000000000 --- a/examples/scaletests/scaletest-runner/Dockerfile +++ /dev/null @@ -1,36 +0,0 @@ -# This image is used to run scaletest jobs and, although it is inside -# the template directory, it is built separately and pushed to -# gcr.io/coder-dev-1/scaletest-runner:latest. -# -# Future improvements will include versioning and including the version -# in the template push. - -FROM codercom/enterprise-base:ubuntu - -ARG DEBIAN_FRONTEND=noninteractive - -USER root - -# TODO(mafredri): Remove unneeded dependencies once we have a clear idea of what's needed. -RUN wget --quiet -O /tmp/terraform.zip https://releases.hashicorp.com/terraform/1.5.7/terraform_1.5.7_linux_amd64.zip \ - && unzip /tmp/terraform.zip -d /usr/local/bin \ - && rm /tmp/terraform.zip \ - && terraform --version - -RUN wget --quiet -O /tmp/envsubst "https://github.com/a8m/envsubst/releases/download/v1.2.0/envsubst-$(uname -s)-$(uname -m)" \ - && chmod +x /tmp/envsubst \ - && mv /tmp/envsubst /usr/local/bin - -RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list \ - && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - \ - && apt-get update \ - && apt-get install --yes \ - google-cloud-cli \ - jq \ - kubectl \ - zstd \ - && gcloud --version \ - && kubectl version --client \ - && rm -rf /var/lib/apt/lists/* - -USER coder diff --git a/examples/scaletests/scaletest-runner/README.md b/examples/scaletests/scaletest-runner/README.md deleted file mode 100644 index 6c048211e1ad4..0000000000000 --- a/examples/scaletests/scaletest-runner/README.md +++ /dev/null @@ -1,9 +0,0 @@ ---- -name: Scaletest Runner -description: Run a scaletest. -tags: [local] ---- - -# Scaletest Runner - -Run a scaletest. 
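For reference, the comment at the top of the Dockerfile above notes that the runner image is built and pushed out-of-band. A minimal sketch of that step, assuming Docker and push access to the gcr.io/coder-dev-1 registry:

```shell
# Build and publish the scaletest runner image described in the Dockerfile.
docker build -t gcr.io/coder-dev-1/scaletest-runner:latest .
docker push gcr.io/coder-dev-1/scaletest-runner:latest
```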
diff --git a/examples/scaletests/scaletest-runner/main.tf b/examples/scaletests/scaletest-runner/main.tf deleted file mode 100644 index 2d17c66435f62..0000000000000 --- a/examples/scaletests/scaletest-runner/main.tf +++ /dev/null @@ -1,960 +0,0 @@ -terraform { - required_providers { - coder = { - source = "coder/coder" - version = "~> 0.12" - } - kubernetes = { - source = "hashicorp/kubernetes" - version = "~> 2.22" - } - } -} - -resource "time_static" "start_time" { - # We don't set `count = data.coder_workspace.me.start_count` here because then - # we can't use this value in `locals`, but we want to trigger recreation when - # the scaletest is restarted. - triggers = { - count : data.coder_workspace.me.start_count - token : data.coder_workspace.me.owner_session_token # Rely on this being re-generated every start. - } -} - -resource "null_resource" "permission_check" { - count = data.coder_workspace.me.start_count - - # Limit which users can create a workspace in this template. - # The "default" user and workspace are present because they are needed - # for the plan, and consequently, updating the template. - lifecycle { - precondition { - condition = can(regex("^(default/default|scaletest/runner)$", "${data.coder_workspace.me.owner}/${data.coder_workspace.me.name}")) - error_message = "User and workspace name is not allowed, expected 'scaletest/runner'." - } - } -} - -locals { - workspace_pod_name = "coder-scaletest-runner-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" - workspace_pod_instance = "coder-workspace-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" - workspace_pod_termination_grace_period_seconds = 5 * 60 * 60 # 5 hours (cleanup timeout). - service_account_name = "scaletest-sa" - home_disk_size = 10 - scaletest_run_id = "scaletest-${replace(time_static.start_time.rfc3339, ":", "-")}" - scaletest_run_dir = "/home/coder/${local.scaletest_run_id}" - scaletest_run_start_time = time_static.start_time.rfc3339 - grafana_url = "https://grafana.corp.tld" - grafana_dashboard_uid = "qLVSTR-Vz" - grafana_dashboard_name = "coderv2-loadtest-dashboard" -} - -data "coder_provisioner" "me" { -} - -data "coder_workspace" "me" { -} - -data "coder_parameter" "verbose" { - order = 1 - type = "bool" - name = "Verbose" - default = false - description = "Show debug output." - mutable = true - ephemeral = true -} - -data "coder_parameter" "dry_run" { - order = 2 - type = "bool" - name = "Dry-run" - default = true - description = "Perform a dry-run to see what would happen." - mutable = true - ephemeral = true -} - -data "coder_parameter" "repo_branch" { - order = 3 - type = "string" - name = "Branch" - default = "main" - description = "Branch of coder/coder repo to check out (only useful for developing the runner)." - mutable = true -} - -data "coder_parameter" "comment" { - order = 4 - type = "string" - name = "Comment" - default = "" - description = "Describe **what** you're testing and **why** you're testing it." - mutable = true - ephemeral = true -} - -data "coder_parameter" "create_concurrency" { - order = 10 - type = "number" - name = "Create concurrency" - default = 10 - description = "The number of workspaces to create concurrently." - mutable = true - - # Setting zero = unlimited, but perhaps not a good idea, - # we can raise this limit instead. 
- validation { - min = 1 - max = 100 - } -} - -data "coder_parameter" "job_concurrency" { - order = 11 - type = "number" - name = "Job concurrency" - default = 0 - description = "The number of concurrent jobs (e.g. when producing workspace traffic)." - mutable = true - - # Setting zero = unlimited, but perhaps not a good idea, - # we can raise this limit instead. - validation { - min = 0 - } -} - -data "coder_parameter" "cleanup_concurrency" { - order = 12 - type = "number" - name = "Cleanup concurrency" - default = 10 - description = "The number of concurrent cleanup jobs." - mutable = true - - # Setting zero = unlimited, but perhaps not a good idea, - # we can raise this limit instead. - validation { - min = 1 - max = 100 - } -} - -data "coder_parameter" "cleanup_strategy" { - order = 13 - name = "Cleanup strategy" - default = "always" - description = "The strategy used to cleanup workspaces after the scaletest is complete." - mutable = true - ephemeral = true - option { - name = "Always" - value = "always" - description = "Automatically cleanup workspaces after the scaletest ends." - } - option { - name = "On stop" - value = "on_stop" - description = "Cleanup workspaces when the workspace is stopped." - } - option { - name = "On success" - value = "on_success" - description = "Automatically cleanup workspaces after the scaletest is complete if no error occurs." - } - option { - name = "On error" - value = "on_error" - description = "Automatically cleanup workspaces after the scaletest is complete if an error occurs." - } -} - -data "coder_parameter" "cleanup_prepare" { - order = 14 - type = "bool" - name = "Cleanup before scaletest" - default = true - description = "Cleanup existing scaletest users and workspaces before the scaletest starts (prepare phase)." - mutable = true - ephemeral = true -} - - -data "coder_parameter" "workspace_template" { - order = 20 - name = "workspace_template" - display_name = "Workspace Template" - description = "The template used for workspace creation." - default = "kubernetes-minimal" - icon = "/emojis/1f4dc.png" # Scroll. - mutable = true - option { - name = "Minimal" - value = "kubernetes-minimal" # Feather. - icon = "/emojis/1fab6.png" - description = "Sized to fit approx. 32 per t2d-standard-8 instance." - } - option { - name = "Small" - value = "kubernetes-small" - icon = "/emojis/1f42d.png" # Mouse. - description = "Provisions a small-sized workspace with no persistent storage." - } - option { - name = "Medium" - value = "kubernetes-medium" - icon = "/emojis/1f436.png" # Dog. - description = "Provisions a medium-sized workspace with no persistent storage." - } - option { - name = "Medium (Greedy)" - value = "kubernetes-medium-greedy" - icon = "/emojis/1f436.png" # Dog. - description = "Provisions a medium-sized workspace with no persistent storage. Greedy agent variant." - } - option { - name = "Large" - value = "kubernetes-large" - icon = "/emojis/1f434.png" # Horse. - description = "Provisions a large-sized workspace with no persistent storage." - } -} - -data "coder_parameter" "num_workspaces" { - order = 21 - type = "number" - name = "Number of workspaces to create" - default = 100 - description = "The scaletest suite will create this number of workspaces." 
- mutable = true - - validation { - min = 0 - max = 2000 - } -} - -data "coder_parameter" "skip_create_workspaces" { - order = 22 - type = "bool" - name = "DEBUG: Skip creating workspaces" - default = false - description = "Skip creating workspaces (for resuming failed scaletests or debugging)" - mutable = true -} - - -data "coder_parameter" "load_scenarios" { - order = 23 - name = "Load Scenarios" - type = "list(string)" - description = "The load scenarios to run." - mutable = true - ephemeral = true - default = jsonencode([ - "SSH Traffic", - "Web Terminal Traffic", - "App Traffic", - "Dashboard Traffic", - ]) -} - -data "coder_parameter" "load_scenario_run_concurrently" { - order = 24 - name = "Run Load Scenarios Concurrently" - type = "bool" - default = false - description = "Run all load scenarios concurrently, this setting enables the load scenario percentages so that they can be assigned a percentage of 1-100%." - mutable = true -} - -data "coder_parameter" "load_scenario_concurrency_stagger_delay_mins" { - order = 25 - name = "Load Scenario Concurrency Stagger Delay" - type = "number" - default = 3 - description = "The number of minutes to wait between starting each load scenario when run concurrently." - mutable = true -} - -data "coder_parameter" "load_scenario_ssh_traffic_duration" { - order = 30 - name = "SSH Traffic Duration" - type = "number" - description = "The duration of the SSH traffic load scenario in minutes." - mutable = true - default = 30 - validation { - min = 1 - max = 1440 // 24 hours. - } -} - -data "coder_parameter" "load_scenario_ssh_bytes_per_tick" { - order = 31 - name = "SSH Bytes Per Tick" - type = "number" - description = "The number of bytes to send per tick in the SSH traffic load scenario." - mutable = true - default = 1024 - validation { - min = 1 - } -} - -data "coder_parameter" "load_scenario_ssh_tick_interval" { - order = 32 - name = "SSH Tick Interval" - type = "number" - description = "The number of milliseconds between each tick in the SSH traffic load scenario." - mutable = true - default = 100 - validation { - min = 1 - } -} - -data "coder_parameter" "load_scenario_ssh_traffic_percentage" { - order = 33 - name = "SSH Traffic Percentage" - type = "number" - description = "The percentage of workspaces that should be targeted for SSH traffic." - mutable = true - default = 100 - validation { - min = 1 - max = 100 - } -} - -data "coder_parameter" "load_scenario_web_terminal_traffic_duration" { - order = 40 - name = "Web Terminal Traffic Duration" - type = "number" - description = "The duration of the web terminal traffic load scenario in minutes." - mutable = true - default = 30 - validation { - min = 1 - max = 1440 // 24 hours. - } -} - -data "coder_parameter" "load_scenario_web_terminal_bytes_per_tick" { - order = 41 - name = "Web Terminal Bytes Per Tick" - type = "number" - description = "The number of bytes to send per tick in the web terminal traffic load scenario." - mutable = true - default = 1024 - validation { - min = 1 - } -} - -data "coder_parameter" "load_scenario_web_terminal_tick_interval" { - order = 42 - name = "Web Terminal Tick Interval" - type = "number" - description = "The number of milliseconds between each tick in the web terminal traffic load scenario." 
- mutable = true - default = 100 - validation { - min = 1 - } -} - -data "coder_parameter" "load_scenario_web_terminal_traffic_percentage" { - order = 43 - name = "Web Terminal Traffic Percentage" - type = "number" - description = "The percentage of workspaces that should be targeted for web terminal traffic." - mutable = true - default = 100 - validation { - min = 1 - max = 100 - } -} - -data "coder_parameter" "load_scenario_app_traffic_duration" { - order = 50 - name = "App Traffic Duration" - type = "number" - description = "The duration of the app traffic load scenario in minutes." - mutable = true - default = 30 - validation { - min = 1 - max = 1440 // 24 hours. - } -} - -data "coder_parameter" "load_scenario_app_bytes_per_tick" { - order = 51 - name = "App Bytes Per Tick" - type = "number" - description = "The number of bytes to send per tick in the app traffic load scenario." - mutable = true - default = 1024 - validation { - min = 1 - } -} - -data "coder_parameter" "load_scenario_app_tick_interval" { - order = 52 - name = "App Tick Interval" - type = "number" - description = "The number of milliseconds between each tick in the app traffic load scenario." - mutable = true - default = 100 - validation { - min = 1 - } -} - -data "coder_parameter" "load_scenario_app_traffic_percentage" { - order = 53 - name = "App Traffic Percentage" - type = "number" - description = "The percentage of workspaces that should be targeted for app traffic." - mutable = true - default = 100 - validation { - min = 1 - max = 100 - } -} - -data "coder_parameter" "load_scenario_app_traffic_mode" { - order = 54 - name = "App Traffic Mode" - default = "wsec" - description = "The mode of the app traffic load scenario." - mutable = true - option { - name = "WebSocket Echo" - value = "wsec" - description = "Send traffic to the workspace via the app websocket and read it back." - } - option { - name = "WebSocket Read (Random)" - value = "wsra" - description = "Read traffic from the workspace via the app websocket." - } - option { - name = "WebSocket Write (Discard)" - value = "wsdi" - description = "Send traffic to the workspace via the app websocket." - } -} - -data "coder_parameter" "load_scenario_dashboard_traffic_duration" { - order = 60 - name = "Dashboard Traffic Duration" - type = "number" - description = "The duration of the dashboard traffic load scenario in minutes." - mutable = true - default = 30 - validation { - min = 1 - max = 1440 // 24 hours. - } -} - -data "coder_parameter" "load_scenario_dashboard_traffic_percentage" { - order = 61 - name = "Dashboard Traffic Percentage" - type = "number" - description = "The percentage of users that should be targeted for dashboard traffic." - mutable = true - default = 100 - validation { - min = 1 - max = 100 - } -} - -data "coder_parameter" "load_scenario_baseline_duration" { - order = 100 - name = "Baseline Wait Duration" - type = "number" - description = "The duration to wait before starting a load scenario in minutes." - mutable = true - default = 5 - validation { - min = 0 - max = 60 - } -} - -data "coder_parameter" "greedy_agent" { - order = 200 - type = "bool" - name = "Greedy Agent" - default = false - description = "If true, the agent will attempt to consume all available resources." 
- mutable = true - ephemeral = true -} - -data "coder_parameter" "greedy_agent_template" { - order = 201 - name = "Greedy Agent Template" - display_name = "Greedy Agent Template" - description = "The template used for the greedy agent workspace (must not be same as workspace template)." - default = "kubernetes-medium-greedy" - icon = "/emojis/1f4dc.png" # Scroll. - mutable = true - option { - name = "Minimal" - value = "kubernetes-minimal" # Feather. - icon = "/emojis/1fab6.png" - description = "Sized to fit approx. 32 per t2d-standard-8 instance." - } - option { - name = "Small" - value = "kubernetes-small" - icon = "/emojis/1f42d.png" # Mouse. - description = "Provisions a small-sized workspace with no persistent storage." - } - option { - name = "Medium" - value = "kubernetes-medium" - icon = "/emojis/1f436.png" # Dog. - description = "Provisions a medium-sized workspace with no persistent storage." - } - option { - name = "Medium (Greedy)" - value = "kubernetes-medium-greedy" - icon = "/emojis/1f436.png" # Dog. - description = "Provisions a medium-sized workspace with no persistent storage. Greedy agent variant." - } - option { - name = "Large" - value = "kubernetes-large" - icon = "/emojis/1f434.png" # Horse. - description = "Provisions a large-sized workspace with no persistent storage." - } -} - -data "coder_parameter" "namespace" { - order = 999 - type = "string" - name = "Namespace" - default = "coder-big" - description = "The Kubernetes namespace to create the scaletest runner resources in." -} - -data "archive_file" "scripts_zip" { - type = "zip" - output_path = "${path.module}/scripts.zip" - source_dir = "${path.module}/scripts" -} - -resource "coder_agent" "main" { - arch = data.coder_provisioner.me.arch - dir = local.scaletest_run_dir - os = "linux" - env = { - VERBOSE : data.coder_parameter.verbose.value ? "1" : "0", - DRY_RUN : data.coder_parameter.dry_run.value ? "1" : "0", - CODER_CONFIG_DIR : "/home/coder/.config/coderv2", - CODER_USER_TOKEN : data.coder_workspace.me.owner_session_token, - CODER_URL : data.coder_workspace.me.access_url, - CODER_USER : data.coder_workspace.me.owner, - CODER_WORKSPACE : data.coder_workspace.me.name, - - # Global scaletest envs that may affect each `coder exp scaletest` invocation. - CODER_SCALETEST_PROMETHEUS_ADDRESS : "0.0.0.0:21112", - CODER_SCALETEST_PROMETHEUS_WAIT : "60s", - CODER_SCALETEST_CONCURRENCY : "${data.coder_parameter.job_concurrency.value}", - CODER_SCALETEST_CLEANUP_CONCURRENCY : "${data.coder_parameter.cleanup_concurrency.value}", - - # Expose as params as well, for reporting (TODO(mafredri): refactor, only have one). - SCALETEST_PARAM_SCALETEST_CONCURRENCY : "${data.coder_parameter.job_concurrency.value}", - SCALETEST_PARAM_SCALETEST_CLEANUP_CONCURRENCY : "${data.coder_parameter.cleanup_concurrency.value}", - - # Local envs passed as arguments to `coder exp scaletest` invocations. - SCALETEST_RUN_ID : local.scaletest_run_id, - SCALETEST_RUN_DIR : local.scaletest_run_dir, - SCALETEST_RUN_START_TIME : local.scaletest_run_start_time, - SCALETEST_PROMETHEUS_START_PORT : "21112", - - # Comment is a scaletest param, but we want to surface it separately from - # the rest, so we use a different name. - SCALETEST_COMMENT : data.coder_parameter.comment.value != "" ? 
data.coder_parameter.comment.value : "No comment provided", - - SCALETEST_PARAM_TEMPLATE : data.coder_parameter.workspace_template.value, - SCALETEST_PARAM_REPO_BRANCH : data.coder_parameter.repo_branch.value, - SCALETEST_PARAM_NUM_WORKSPACES : data.coder_parameter.num_workspaces.value, - SCALETEST_PARAM_SKIP_CREATE_WORKSPACES : data.coder_parameter.skip_create_workspaces.value ? "1" : "0", - SCALETEST_PARAM_CREATE_CONCURRENCY : "${data.coder_parameter.create_concurrency.value}", - SCALETEST_PARAM_CLEANUP_STRATEGY : data.coder_parameter.cleanup_strategy.value, - SCALETEST_PARAM_CLEANUP_PREPARE : data.coder_parameter.cleanup_prepare.value ? "1" : "0", - SCALETEST_PARAM_LOAD_SCENARIOS : data.coder_parameter.load_scenarios.value, - SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY : data.coder_parameter.load_scenario_run_concurrently.value ? "1" : "0", - SCALETEST_PARAM_LOAD_SCENARIO_CONCURRENCY_STAGGER_DELAY_MINS : "${data.coder_parameter.load_scenario_concurrency_stagger_delay_mins.value}", - SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_DURATION : "${data.coder_parameter.load_scenario_ssh_traffic_duration.value}", - SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_BYTES_PER_TICK : "${data.coder_parameter.load_scenario_ssh_bytes_per_tick.value}", - SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_TICK_INTERVAL : "${data.coder_parameter.load_scenario_ssh_tick_interval.value}", - SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_PERCENTAGE : "${data.coder_parameter.load_scenario_ssh_traffic_percentage.value}", - SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_DURATION : "${data.coder_parameter.load_scenario_web_terminal_traffic_duration.value}", - SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_BYTES_PER_TICK : "${data.coder_parameter.load_scenario_web_terminal_bytes_per_tick.value}", - SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_TICK_INTERVAL : "${data.coder_parameter.load_scenario_web_terminal_tick_interval.value}", - SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_PERCENTAGE : "${data.coder_parameter.load_scenario_web_terminal_traffic_percentage.value}", - SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_DURATION : "${data.coder_parameter.load_scenario_app_traffic_duration.value}", - SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_BYTES_PER_TICK : "${data.coder_parameter.load_scenario_app_bytes_per_tick.value}", - SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_TICK_INTERVAL : "${data.coder_parameter.load_scenario_app_tick_interval.value}", - SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_PERCENTAGE : "${data.coder_parameter.load_scenario_app_traffic_percentage.value}", - SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_MODE : data.coder_parameter.load_scenario_app_traffic_mode.value, - SCALETEST_PARAM_LOAD_SCENARIO_DASHBOARD_TRAFFIC_DURATION : "${data.coder_parameter.load_scenario_dashboard_traffic_duration.value}", - SCALETEST_PARAM_LOAD_SCENARIO_DASHBOARD_TRAFFIC_PERCENTAGE : "${data.coder_parameter.load_scenario_dashboard_traffic_percentage.value}", - SCALETEST_PARAM_LOAD_SCENARIO_BASELINE_DURATION : "${data.coder_parameter.load_scenario_baseline_duration.value}", - SCALETEST_PARAM_GREEDY_AGENT : data.coder_parameter.greedy_agent.value ? 
"1" : "0", - SCALETEST_PARAM_GREEDY_AGENT_TEMPLATE : data.coder_parameter.greedy_agent_template.value, - - GRAFANA_URL : local.grafana_url, - - SCRIPTS_ZIP : filebase64(data.archive_file.scripts_zip.output_path), - SCRIPTS_DIR : "/tmp/scripts", - } - display_apps { - vscode = false - ssh_helper = false - } - startup_script_timeout = 86400 - shutdown_script_timeout = 7200 - startup_script_behavior = "blocking" - startup_script = file("startup.sh") - shutdown_script = file("shutdown.sh") - - # IDEA(mafredri): It would be pretty cool to define metadata to expect JSON output, each field/item could become a separate metadata item. - # Scaletest metadata. - metadata { - display_name = "Scaletest status" - key = "00_scaletest_status" - script = file("metadata_status.sh") - interval = 1 - timeout = 1 - } - - metadata { - display_name = "Scaletest phase" - key = "01_scaletest_phase" - script = file("metadata_phase.sh") - interval = 1 - timeout = 1 - } - - metadata { - display_name = "Scaletest phase (previous)" - key = "02_scaletest_previous_phase" - script = file("metadata_previous_phase.sh") - interval = 1 - timeout = 1 - } - - # Misc workspace metadata. - metadata { - display_name = "CPU Usage" - key = "80_cpu_usage" - script = "coder stat cpu" - interval = 10 - timeout = 1 - } - - metadata { - display_name = "RAM Usage" - key = "81_ram_usage" - script = "coder stat mem" - interval = 10 - timeout = 1 - } - - metadata { - display_name = "Home Disk" - key = "82_home_disk" - script = "coder stat disk --path $${HOME}" - interval = 60 - timeout = 1 - } - - metadata { - display_name = "CPU Usage (Host)" - key = "83_cpu_usage_host" - script = "coder stat cpu --host" - interval = 10 - timeout = 1 - } - - metadata { - display_name = "Memory Usage (Host)" - key = "84_mem_usage_host" - script = "coder stat mem --host" - interval = 10 - timeout = 1 - } - - metadata { - display_name = "Load Average (Host)" - key = "85_load_host" - # Get load avg scaled by number of cores. 
- script = <<-EOS - echo "`cat /proc/loadavg | awk '{ print $1 }'` `nproc`" | awk '{ printf "%0.2f", $1/$2 }' - EOS - interval = 60 - timeout = 1 - } -} - -module "code-server" { - source = "https://registry.coder.com/modules/code-server" - agent_id = coder_agent.main.id - install_version = "4.8.3" - folder = local.scaletest_run_dir -} - -module "filebrowser" { - source = "https://registry.coder.com/modules/filebrowser" - agent_id = coder_agent.main.id - folder = local.scaletest_run_dir -} - -resource "coder_app" "grafana" { - agent_id = coder_agent.main.id - slug = "00-grafana" - display_name = "Grafana" - url = "${local.grafana_url}/d/${local.grafana_dashboard_uid}/${local.grafana_dashboard_name}?orgId=1&from=${time_static.start_time.unix * 1000}&to=now" - icon = "https://grafana.com/static/assets/img/fav32.png" - external = true -} - -resource "coder_app" "prometheus" { - agent_id = coder_agent.main.id - slug = "01-prometheus" - display_name = "Prometheus" - url = "https://grafana.corp.tld:9443" - icon = "https://prometheus.io/assets/favicons/favicon-32x32.png" - external = true -} - -resource "coder_app" "manual_cleanup" { - agent_id = coder_agent.main.id - slug = "02-manual-cleanup" - display_name = "Manual cleanup" - icon = "/emojis/1f9f9.png" - command = "/tmp/scripts/cleanup.sh manual" -} - -resource "kubernetes_persistent_volume_claim" "home" { - depends_on = [null_resource.permission_check] - metadata { - name = "${local.workspace_pod_name}-home" - namespace = data.coder_parameter.namespace.value - labels = { - "app.kubernetes.io/name" = "coder-pvc" - "app.kubernetes.io/instance" = "coder-pvc-${lower(data.coder_workspace.me.owner)}-${lower(data.coder_workspace.me.name)}" - "app.kubernetes.io/part-of" = "coder" - // Coder specific labels. - "com.coder.resource" = "true" - "com.coder.workspace.id" = data.coder_workspace.me.id - "com.coder.workspace.name" = data.coder_workspace.me.name - "com.coder.user.id" = data.coder_workspace.me.owner_id - "com.coder.user.username" = data.coder_workspace.me.owner - } - annotations = { - "com.coder.user.email" = data.coder_workspace.me.owner_email - } - } - wait_until_bound = false - spec { - access_modes = ["ReadWriteOnce"] - resources { - requests = { - storage = "${local.home_disk_size}Gi" - } - } - } -} - -resource "kubernetes_pod" "main" { - depends_on = [null_resource.permission_check] - count = data.coder_workspace.me.start_count - metadata { - name = local.workspace_pod_name - namespace = data.coder_parameter.namespace.value - labels = { - "app.kubernetes.io/name" = "coder-workspace" - "app.kubernetes.io/instance" = local.workspace_pod_instance - "app.kubernetes.io/part-of" = "coder" - // Coder specific labels. - "com.coder.resource" = "true" - "com.coder.workspace.id" = data.coder_workspace.me.id - "com.coder.workspace.name" = data.coder_workspace.me.name - "com.coder.user.id" = data.coder_workspace.me.owner_id - "com.coder.user.username" = data.coder_workspace.me.owner - } - annotations = { - "com.coder.user.email" = data.coder_workspace.me.owner_email - } - } - # Set the pod delete timeout to termination_grace_period_seconds + 1m. - timeouts { - delete = "${(local.workspace_pod_termination_grace_period_seconds + 120)}s" - } - spec { - security_context { - run_as_user = "1000" - fs_group = "1000" - } - - # Allow this pod to perform scale tests. - service_account_name = local.service_account_name - - # Allow the coder agent to perform graceful shutdown and cleanup of - # scaletest resources. 
We add an extra minute so ensure work - # completion is prioritized over timeout. - termination_grace_period_seconds = local.workspace_pod_termination_grace_period_seconds + 60 - - container { - name = "dev" - image = "gcr.io/coder-dev-1/scaletest-runner:latest" - image_pull_policy = "Always" - command = ["sh", "-c", coder_agent.main.init_script] - security_context { - run_as_user = "1000" - } - env { - name = "CODER_AGENT_TOKEN" - value = coder_agent.main.token - } - env { - name = "CODER_AGENT_LOG_DIR" - value = "${local.scaletest_run_dir}/logs" - } - env { - name = "GRAFANA_API_TOKEN" - value_from { - secret_key_ref { - name = data.kubernetes_secret.grafana_editor_api_token.metadata[0].name - key = "token" - } - } - } - env { - name = "SLACK_WEBHOOK_URL" - value_from { - secret_key_ref { - name = data.kubernetes_secret.slack_scaletest_notifications_webhook_url.metadata[0].name - key = "url" - } - } - } - resources { - requests = { - "cpu" = "250m" - "memory" = "512Mi" - } - } - volume_mount { - mount_path = "/home/coder" - name = "home" - read_only = false - } - dynamic "port" { - for_each = data.coder_parameter.load_scenario_run_concurrently.value ? jsondecode(data.coder_parameter.load_scenarios.value) : [""] - iterator = it - content { - container_port = 21112 + it.key - name = "prom-http${it.key}" - protocol = "TCP" - } - } - } - - volume { - name = "home" - persistent_volume_claim { - claim_name = kubernetes_persistent_volume_claim.home.metadata.0.name - read_only = false - } - } - - affinity { - pod_anti_affinity { - // This affinity attempts to spread out all workspace pods evenly across - // nodes. - preferred_during_scheduling_ignored_during_execution { - weight = 1 - pod_affinity_term { - topology_key = "kubernetes.io/hostname" - label_selector { - match_expressions { - key = "app.kubernetes.io/name" - operator = "In" - values = ["coder-workspace"] - } - } - } - } - } - node_affinity { - required_during_scheduling_ignored_during_execution { - node_selector_term { - match_expressions { - key = "cloud.google.com/gke-nodepool" - operator = "In" - values = ["big-workspacetraffic"] # Avoid placing on the same nodes as scaletest workspaces. - } - } - } - } - } - } -} - -data "kubernetes_secret" "grafana_editor_api_token" { - metadata { - name = "grafana-editor-api-token" - namespace = data.coder_parameter.namespace.value - } -} - -data "kubernetes_secret" "slack_scaletest_notifications_webhook_url" { - metadata { - name = "slack-scaletest-notifications-webhook-url" - namespace = data.coder_parameter.namespace.value - } -} - -resource "kubernetes_manifest" "pod_monitor" { - count = data.coder_workspace.me.start_count - manifest = { - apiVersion = "monitoring.coreos.com/v1" - kind = "PodMonitor" - metadata = { - namespace = data.coder_parameter.namespace.value - name = "podmonitor-${local.workspace_pod_name}" - } - spec = { - selector = { - matchLabels = { - "app.kubernetes.io/instance" : local.workspace_pod_instance - } - } - podMetricsEndpoints = [ - # NOTE(mafredri): We could add more information here by including the - # scenario name in the port name (although it's limited to 15 chars so - # it needs to be short). That said, someone looking at the stats can - # assume that there's a 1-to-1 mapping between scenario# and port. - for i, _ in data.coder_parameter.load_scenario_run_concurrently.value ? 
jsondecode(data.coder_parameter.load_scenarios.value) : [""] : { - port = "prom-http${i}" - interval = "15s" - } - ] - } - } -} diff --git a/examples/scaletests/scaletest-runner/metadata_phase.sh b/examples/scaletests/scaletest-runner/metadata_phase.sh deleted file mode 100755 index 755a8ba084db7..0000000000000 --- a/examples/scaletests/scaletest-runner/metadata_phase.sh +++ /dev/null @@ -1,6 +0,0 @@ -#!/bin/bash - -# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh -. "${SCRIPTS_DIR}/lib.sh" - -get_phase diff --git a/examples/scaletests/scaletest-runner/metadata_previous_phase.sh b/examples/scaletests/scaletest-runner/metadata_previous_phase.sh deleted file mode 100755 index c858687b72ad8..0000000000000 --- a/examples/scaletests/scaletest-runner/metadata_previous_phase.sh +++ /dev/null @@ -1,6 +0,0 @@ -#!/bin/bash - -# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh -. "${SCRIPTS_DIR}/lib.sh" 2>/dev/null || return - -get_previous_phase diff --git a/examples/scaletests/scaletest-runner/metadata_status.sh b/examples/scaletests/scaletest-runner/metadata_status.sh deleted file mode 100755 index 8ec45f0875c1d..0000000000000 --- a/examples/scaletests/scaletest-runner/metadata_status.sh +++ /dev/null @@ -1,6 +0,0 @@ -#!/bin/bash - -# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh -. "${SCRIPTS_DIR}/lib.sh" 2>/dev/null || return - -get_status diff --git a/examples/scaletests/scaletest-runner/scripts/cleanup.sh b/examples/scaletests/scaletest-runner/scripts/cleanup.sh deleted file mode 100755 index c80982497b5e9..0000000000000 --- a/examples/scaletests/scaletest-runner/scripts/cleanup.sh +++ /dev/null @@ -1,62 +0,0 @@ -#!/bin/bash -set -euo pipefail - -[[ $VERBOSE == 1 ]] && set -x - -# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh -. "${SCRIPTS_DIR}/lib.sh" - -event=${1:-} - -if [[ -z $event ]]; then - event=manual -fi - -do_cleanup() { - start_phase "Cleanup (${event})" - coder exp scaletest cleanup \ - --cleanup-job-timeout 2h \ - --cleanup-timeout 5h | - tee "${SCALETEST_RESULTS_DIR}/cleanup-${event}.txt" - end_phase -} - -do_scaledown() { - start_phase "Scale down provisioners (${event})" - maybedryrun "$DRY_RUN" kubectl scale deployment/coder-provisioner --replicas 1 - maybedryrun "$DRY_RUN" kubectl rollout status deployment/coder-provisioner - end_phase -} - -case "${event}" in -manual) - echo -n 'WARNING: This will clean up all scaletest resources, continue? (y/n) ' - read -r -n 1 - if [[ $REPLY != [yY] ]]; then - echo $'\nAborting...' - exit 1 - fi - echo - - do_cleanup - do_scaledown - - echo 'Press any key to continue...' - read -s -r -n 1 - ;; -prepare) - do_cleanup - ;; -on_stop) ;; # Do nothing, handled by "shutdown". -always | on_success | on_error | shutdown) - do_cleanup - do_scaledown - ;; -shutdown_scale_down_only) - do_scaledown - ;; -*) - echo "Unknown event: ${event}" >&2 - exit 1 - ;; -esac diff --git a/examples/scaletests/scaletest-runner/scripts/lib.sh b/examples/scaletests/scaletest-runner/scripts/lib.sh deleted file mode 100644 index 868dd5c078d2e..0000000000000 --- a/examples/scaletests/scaletest-runner/scripts/lib.sh +++ /dev/null @@ -1,313 +0,0 @@ -#!/bin/bash -set -euo pipefail - -# Only source this script once, this env comes from sourcing -# scripts/lib.sh from coder/coder below. -if [[ ${SCRIPTS_LIB_IS_SOURCED:-0} == 1 ]]; then - return 0 -fi - -# Source scripts/lib.sh from coder/coder for common functions. 
-# shellcheck source=scripts/lib.sh -. "${HOME}/coder/scripts/lib.sh" - -# Make shellcheck happy. -DRY_RUN=${DRY_RUN:-0} - -# Environment variables shared between scripts. -SCALETEST_STATE_DIR="${SCALETEST_RUN_DIR}/state" -SCALETEST_PHASE_FILE="${SCALETEST_STATE_DIR}/phase" -# shellcheck disable=SC2034 -SCALETEST_RESULTS_DIR="${SCALETEST_RUN_DIR}/results" -SCALETEST_LOGS_DIR="${SCALETEST_RUN_DIR}/logs" -SCALETEST_PPROF_DIR="${SCALETEST_RUN_DIR}/pprof" -# https://github.com/kubernetes/kubernetes/issues/72501 :-( -SCALETEST_CODER_BINARY="/tmp/coder-full-${SCALETEST_RUN_ID}" - -mkdir -p "${SCALETEST_STATE_DIR}" "${SCALETEST_RESULTS_DIR}" "${SCALETEST_LOGS_DIR}" "${SCALETEST_PPROF_DIR}" - -coder() { - if [[ ! -x "${SCALETEST_CODER_BINARY}" ]]; then - log "Fetching full coder binary..." - fetch_coder_full - fi - maybedryrun "${DRY_RUN}" "${SCALETEST_CODER_BINARY}" "${@}" -} - -show_json() { - maybedryrun "${DRY_RUN}" jq 'del(.. | .logs?)' "${1}" -} - -set_status() { - dry_run= - if [[ ${DRY_RUN} == 1 ]]; then - dry_run=" (dry-run)" - fi - prev_status=$(get_status) - if [[ ${prev_status} != *"Not started"* ]]; then - annotate_grafana_end "status" "Status: ${prev_status}" - fi - echo "$(date -Ins) ${*}${dry_run}" >>"${SCALETEST_STATE_DIR}/status" - - annotate_grafana "status" "Status: ${*}" - - status_lower=$(tr '[:upper:]' '[:lower:]' <<<"${*}") - set_pod_status_annotation "${status_lower}" -} -lock_status() { - chmod 0440 "${SCALETEST_STATE_DIR}/status" -} -get_status() { - # Order of importance (reverse of creation). - if [[ -f "${SCALETEST_STATE_DIR}/status" ]]; then - tail -n1 "${SCALETEST_STATE_DIR}/status" | cut -d' ' -f2- - else - echo "Not started" - fi -} - -phase_num=0 -start_phase() { - # This may be incremented from another script, so we read it every time. 
- if [[ -f "${SCALETEST_PHASE_FILE}" ]]; then - phase_num=$(grep -c START: "${SCALETEST_PHASE_FILE}") - fi - phase_num=$((phase_num + 1)) - log "Start phase ${phase_num}: ${*}" - echo "$(date -Ins) START:${phase_num}: ${*}" >>"${SCALETEST_PHASE_FILE}" - - GRAFANA_EXTRA_TAGS="${PHASE_TYPE:-phase-default}" annotate_grafana "phase" "Phase ${phase_num}: ${*}" -} -end_phase() { - phase=$(tail -n 1 "${SCALETEST_PHASE_FILE}" | grep "START:${phase_num}:" | cut -d' ' -f3-) - if [[ -z ${phase} ]]; then - log "BUG: Could not find start phase ${phase_num} in ${SCALETEST_PHASE_FILE}" - return 1 - fi - log "End phase ${phase_num}: ${phase}" - echo "$(date -Ins) END:${phase_num}: ${phase}" >>"${SCALETEST_PHASE_FILE}" - - GRAFANA_EXTRA_TAGS="${PHASE_TYPE:-phase-default}" GRAFANA_ADD_TAGS="${PHASE_ADD_TAGS:-}" annotate_grafana_end "phase" "Phase ${phase_num}: ${phase}" -} -get_phase() { - if [[ -f "${SCALETEST_PHASE_FILE}" ]]; then - phase_raw=$(tail -n1 "${SCALETEST_PHASE_FILE}") - phase=$(echo "${phase_raw}" | cut -d' ' -f3-) - if [[ ${phase_raw} == *"END:"* ]]; then - phase+=" [done]" - fi - echo "${phase}" - else - echo "None" - fi -} -get_previous_phase() { - if [[ -f "${SCALETEST_PHASE_FILE}" ]] && [[ $(grep -c START: "${SCALETEST_PHASE_FILE}") -gt 1 ]]; then - grep START: "${SCALETEST_PHASE_FILE}" | tail -n2 | head -n1 | cut -d' ' -f3- - else - echo "None" - fi -} - -annotate_grafana() { - local tags=${1} text=${2} start=${3:-$(($(date +%s) * 1000))} - local json resp id - - if [[ -z $tags ]]; then - tags="scaletest,runner" - else - tags="scaletest,runner,${tags}" - fi - if [[ -n ${GRAFANA_EXTRA_TAGS:-} ]]; then - tags="${tags},${GRAFANA_EXTRA_TAGS}" - fi - - log "Annotating Grafana (start=${start}): ${text} [${tags}]" - - json="$( - jq \ - --argjson time "${start}" \ - --arg text "${text}" \ - --arg tags "${tags}" \ - '{time: $time, tags: $tags | split(","), text: $text}' <<<'{}' - )" - if [[ ${DRY_RUN} == 1 ]]; then - echo "FAKEID:${tags}:${text}:${start}" >>"${SCALETEST_STATE_DIR}/grafana-annotations" - log "Would have annotated Grafana, data=${json}" - return 0 - fi - if ! resp="$( - curl -sSL \ - --insecure \ - -H "Authorization: Bearer ${GRAFANA_API_TOKEN}" \ - -H "Content-Type: application/json" \ - -d "${json}" \ - "${GRAFANA_URL}/api/annotations" - )"; then - # Don't abort scaletest just because we couldn't annotate Grafana. - log "Failed to annotate Grafana: ${resp}" - return 0 - fi - - if [[ $(jq -r '.message' <<<"${resp}") != "Annotation added" ]]; then - log "Failed to annotate Grafana: ${resp}" - return 0 - fi - - log "Grafana annotation added!" - - id="$(jq -r '.id' <<<"${resp}")" - echo "${id}:${tags}:${text}:${start}" >>"${SCALETEST_STATE_DIR}/grafana-annotations" -} -annotate_grafana_end() { - local tags=${1} text=${2} start=${3:-} end=${4:-$(($(date +%s) * 1000))} - local id json resp - - if [[ -z $tags ]]; then - tags="scaletest,runner" - else - tags="scaletest,runner,${tags}" - fi - if [[ -n ${GRAFANA_EXTRA_TAGS:-} ]]; then - tags="${tags},${GRAFANA_EXTRA_TAGS}" - fi - - if ! id=$(grep ":${tags}:${text}:${start}" "${SCALETEST_STATE_DIR}/grafana-annotations" | sort -n | tail -n1 | cut -d: -f1); then - log "NOTICE: Could not find Grafana annotation to end: '${tags}:${text}:${start}', skipping..." 
- return 0 - fi - - log "Updating Grafana annotation (end=${end}): ${text} [${tags}, add=${GRAFANA_ADD_TAGS:-}]" - - if [[ -n ${GRAFANA_ADD_TAGS:-} ]]; then - json="$( - jq -n \ - --argjson timeEnd "${end}" \ - --arg tags "${tags},${GRAFANA_ADD_TAGS}" \ - '{timeEnd: $timeEnd, tags: $tags | split(",")}' - )" - else - json="$( - jq -n \ - --argjson timeEnd "${end}" \ - '{timeEnd: $timeEnd}' - )" - fi - if [[ ${DRY_RUN} == 1 ]]; then - log "Would have patched Grafana annotation: id=${id}, data=${json}" - return 0 - fi - if ! resp="$( - curl -sSL \ - --insecure \ - -H "Authorization: Bearer ${GRAFANA_API_TOKEN}" \ - -H "Content-Type: application/json" \ - -X PATCH \ - -d "${json}" \ - "${GRAFANA_URL}/api/annotations/${id}" - )"; then - # Don't abort scaletest just because we couldn't annotate Grafana. - log "Failed to annotate Grafana end: ${resp}" - return 0 - fi - - if [[ $(jq -r '.message' <<<"${resp}") != "Annotation patched" ]]; then - log "Failed to annotate Grafana end: ${resp}" - return 0 - fi - - log "Grafana annotation patched!" -} - -wait_baseline() { - s=${1:-2} - PHASE_TYPE="phase-wait" start_phase "Waiting ${s}m to establish baseline" - maybedryrun "$DRY_RUN" sleep $((s * 60)) - PHASE_TYPE="phase-wait" end_phase -} - -get_appearance() { - session_token=$CODER_USER_TOKEN - if [[ -f "${CODER_CONFIG_DIR}/session" ]]; then - session_token="$(<"${CODER_CONFIG_DIR}/session")" - fi - curl -sSL \ - -H "Coder-Session-Token: ${session_token}" \ - "${CODER_URL}/api/v2/appearance" -} -set_appearance() { - local json=$1 color=$2 message=$3 - - session_token=$CODER_USER_TOKEN - if [[ -f "${CODER_CONFIG_DIR}/session" ]]; then - session_token="$(<"${CODER_CONFIG_DIR}/session")" - fi - newjson="$( - jq \ - --arg color "${color}" \ - --arg message "${message}" \ - '. | .service_banner.message |= $message | .service_banner.background_color |= $color' <<<"${json}" - )" - maybedryrun "${DRY_RUN}" curl -sSL \ - -X PUT \ - -H 'Content-Type: application/json' \ - -H "Coder-Session-Token: ${session_token}" \ - --data "${newjson}" \ - "${CODER_URL}/api/v2/appearance" -} - -namespace() { - cat /var/run/secrets/kubernetes.io/serviceaccount/namespace -} -coder_pods() { - kubectl get pods \ - --namespace "$(namespace)" \ - --selector "app.kubernetes.io/name=coder,app.kubernetes.io/part-of=coder" \ - --output jsonpath='{.items[*].metadata.name}' -} - -# fetch_coder_full fetches the full (non-slim) coder binary from one of the coder pods -# running in the same namespace as the current pod. -fetch_coder_full() { - if [[ -x "${SCALETEST_CODER_BINARY}" ]]; then - log "Full Coder binary already exists at ${SCALETEST_CODER_BINARY}" - return 0 - fi - ns=$(namespace) - if [[ -z "${ns}" ]]; then - log "Could not determine namespace!" - return 1 - fi - log "Namespace from serviceaccount token is ${ns}" - pods=$(coder_pods) - if [[ -z ${pods} ]]; then - log "Could not find coder pods!" - return 1 - fi - pod=$(cut -d ' ' -f 1 <<<"${pods}") - if [[ -z ${pod} ]]; then - log "Could not find coder pod!" 
- return 1 - fi - log "Fetching full Coder binary from ${pod}" - # We need --retries due to https://github.com/kubernetes/kubernetes/issues/60140 :( - maybedryrun "${DRY_RUN}" kubectl \ - --namespace "${ns}" \ - cp \ - --container coder \ - --retries 10 \ - "${pod}:/opt/coder" "${SCALETEST_CODER_BINARY}" - maybedryrun "${DRY_RUN}" chmod +x "${SCALETEST_CODER_BINARY}" - log "Full Coder binary downloaded to ${SCALETEST_CODER_BINARY}" -} - -# set_pod_status_annotation annotates the currently running pod with the key -# com.coder.scaletest.status. It will overwrite the previous status. -set_pod_status_annotation() { - if [[ $# -ne 1 ]]; then - log "BUG: Must specify an annotation value" - return 1 - else - maybedryrun "${DRY_RUN}" kubectl --namespace "$(namespace)" annotate pod "$(hostname)" "com.coder.scaletest.status=$1" --overwrite - fi -} diff --git a/examples/scaletests/scaletest-runner/scripts/prepare.sh b/examples/scaletests/scaletest-runner/scripts/prepare.sh deleted file mode 100755 index 90b2dd05f945f..0000000000000 --- a/examples/scaletests/scaletest-runner/scripts/prepare.sh +++ /dev/null @@ -1,67 +0,0 @@ -#!/bin/bash -set -euo pipefail - -[[ $VERBOSE == 1 ]] && set -x - -# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh -. "${SCRIPTS_DIR}/lib.sh" - -mkdir -p "${SCALETEST_STATE_DIR}" -mkdir -p "${SCALETEST_RESULTS_DIR}" - -log "Preparing scaletest workspace environment..." -set_status Preparing - -log "Compressing previous run logs (if applicable)..." -mkdir -p "${HOME}/archive" -for dir in "${HOME}/scaletest-"*; do - if [[ ${dir} = "${SCALETEST_RUN_DIR}" ]]; then - continue - fi - if [[ -d ${dir} ]]; then - name="$(basename "${dir}")" - ( - cd "$(dirname "${dir}")" - ZSTD_CLEVEL=12 maybedryrun "$DRY_RUN" tar --zstd -cf "${HOME}/archive/${name}.tar.zst" "${name}" - ) - maybedryrun "$DRY_RUN" rm -rf "${dir}" - fi -done - -log "Creating coder CLI token (needed for cleanup during shutdown)..." - -mkdir -p "${CODER_CONFIG_DIR}" -echo -n "${CODER_URL}" >"${CODER_CONFIG_DIR}/url" - -set +x # Avoid logging the token. -# Persist configuration for shutdown script too since the -# owner token is invalidated immediately on workspace stop. -export CODER_SESSION_TOKEN=${CODER_USER_TOKEN} -coder tokens delete scaletest_runner >/dev/null 2>&1 || true -# TODO(mafredri): Set TTL? This could interfere with delayed stop though. -token=$(coder tokens create --name scaletest_runner) -if [[ $DRY_RUN == 1 ]]; then - token=${CODER_SESSION_TOKEN} -fi -unset CODER_SESSION_TOKEN -echo -n "${token}" >"${CODER_CONFIG_DIR}/session" -[[ $VERBOSE == 1 ]] && set -x # Restore logging (if enabled). - -if [[ ${SCALETEST_PARAM_CLEANUP_PREPARE} == 1 ]]; then - log "Cleaning up from previous runs (if applicable)..." - "${SCRIPTS_DIR}/cleanup.sh" prepare -fi - -log "Preparation complete!" - -PROVISIONER_REPLICA_COUNT="${SCALETEST_PARAM_CREATE_CONCURRENCY:-0}" -if [[ "${PROVISIONER_REPLICA_COUNT}" -eq 0 ]]; then - # TODO(Cian): what is a good default value here? - echo "Setting PROVISIONER_REPLICA_COUNT to 10 since SCALETEST_PARAM_CREATE_CONCURRENCY is 0" - PROVISIONER_REPLICA_COUNT=10 -fi -log "Scaling up provisioners to ${PROVISIONER_REPLICA_COUNT}..." -maybedryrun "$DRY_RUN" kubectl scale deployment/coder-provisioner \ - --replicas "${PROVISIONER_REPLICA_COUNT}" -log "Waiting for provisioners to scale up..." 
-maybedryrun "$DRY_RUN" kubectl rollout status deployment/coder-provisioner diff --git a/examples/scaletests/scaletest-runner/scripts/report.sh b/examples/scaletests/scaletest-runner/scripts/report.sh deleted file mode 100755 index 0c6a5059ba37d..0000000000000 --- a/examples/scaletests/scaletest-runner/scripts/report.sh +++ /dev/null @@ -1,109 +0,0 @@ -#!/bin/bash -set -euo pipefail - -[[ $VERBOSE == 1 ]] && set -x - -status=$1 -shift - -case "${status}" in -started) ;; -completed) ;; -failed) ;; -*) - echo "Unknown status: ${status}" >&2 - exit 1 - ;; -esac - -# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh -. "${SCRIPTS_DIR}/lib.sh" - -# NOTE(mafredri): API returns HTML if we accidentally use `...//api` vs `.../api`. -# https://github.com/coder/coder/issues/9877 -CODER_URL="${CODER_URL%/}" -buildinfo="$(curl -sSL "${CODER_URL}/api/v2/buildinfo")" -server_version="$(jq -r '.version' <<<"${buildinfo}")" -server_version_commit="$(jq -r '.external_url' <<<"${buildinfo}")" - -# Since `coder show` doesn't support JSON output, we list the workspaces instead. -# Use `command` here to bypass dry run. -workspace_json="$( - command coder list --all --output json | - jq --arg workspace "${CODER_WORKSPACE}" --arg user "${CODER_USER}" 'map(select(.name == $workspace) | select(.owner_name == $user)) | .[0]' -)" -owner_name="$(jq -r '.latest_build.workspace_owner_name' <<<"${workspace_json}")" -workspace_name="$(jq -r '.latest_build.workspace_name' <<<"${workspace_json}")" -initiator_name="$(jq -r '.latest_build.initiator_name' <<<"${workspace_json}")" - -bullet='•' -app_urls_raw="$(jq -r '.latest_build.resources[].agents[]?.apps | map(select(.external == true)) | .[] | .display_name, .url' <<<"${workspace_json}")" -app_urls=() -while read -r app_name; do - read -r app_url - bold= - if [[ ${status} != started ]] && [[ ${app_url} = *to=now* ]]; then - # Update Grafana URL with end stamp and make bold. - app_url="${app_url//to=now/to=$(($(date +%s) * 1000))}" - bold='*' - fi - app_urls+=("${bullet} ${bold}${app_name}${bold}: ${app_url}") -done <<<"${app_urls_raw}" - -params=() -header= - -case "${status}" in -started) - created_at="$(jq -r '.latest_build.created_at' <<<"${workspace_json}")" - params=("${bullet} Options:") - while read -r param; do - params+=(" ${bullet} ${param}") - done <<<"$(jq -r '.latest_build.resources[].agents[]?.environment_variables | to_entries | map(select(.key | startswith("SCALETEST_PARAM_"))) | .[] | "`\(.key)`: `\(.value)`"' <<<"${workspace_json}")" - - header="New scaletest started at \`${created_at}\` by \`${initiator_name}\` on ${CODER_URL} (<${server_version_commit}|\`${server_version}\`>)." - ;; -completed) - completed_at=$(date -Iseconds) - header="Scaletest completed at \`${completed_at}\` (started by \`${initiator_name}\`) on ${CODER_URL} (<${server_version_commit}|\`${server_version}\`>)." - ;; -failed) - failed_at=$(date -Iseconds) - header="Scaletest failed at \`${failed_at}\` (started by \`${initiator_name}\`) on ${CODER_URL} (<${server_version_commit}|\`${server_version}\`>)." 
- ;; -*) - echo "Unknown status: ${status}" >&2 - exit 1 - ;; -esac - -text_arr=( - "${header}" - "" - "${bullet} *Comment:* ${SCALETEST_COMMENT}" - "${bullet} Workspace (runner): ${CODER_URL}/@${owner_name}/${workspace_name}" - "${bullet} Run ID: ${SCALETEST_RUN_ID}" - "${app_urls[@]}" - "${params[@]}" -) - -text= -for field in "${text_arr[@]}"; do - text+="${field}"$'\n' -done - -json=$( - jq -n --arg text "${text}" '{ - blocks: [ - { - "type": "section", - "text": { - "type": "mrkdwn", - "text": $text - } - } - ] - }' -) - -maybedryrun "${DRY_RUN}" curl -X POST -H 'Content-type: application/json' --data "${json}" "${SLACK_WEBHOOK_URL}" diff --git a/examples/scaletests/scaletest-runner/scripts/run.sh b/examples/scaletests/scaletest-runner/scripts/run.sh deleted file mode 100755 index 47a6042a18598..0000000000000 --- a/examples/scaletests/scaletest-runner/scripts/run.sh +++ /dev/null @@ -1,369 +0,0 @@ -#!/bin/bash -set -euo pipefail - -[[ $VERBOSE == 1 ]] && set -x - -# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh -. "${SCRIPTS_DIR}/lib.sh" - -mapfile -t scaletest_load_scenarios < <(jq -r '. | join ("\n")' <<<"${SCALETEST_PARAM_LOAD_SCENARIOS}") -export SCALETEST_PARAM_LOAD_SCENARIOS=("${scaletest_load_scenarios[@]}") - -log "Running scaletest..." -set_status Running - -start_phase "Creating workspaces" -if [[ ${SCALETEST_PARAM_SKIP_CREATE_WORKSPACES} == 0 ]]; then - # Note that we allow up to 5 failures to bring up the workspace, since - # we're creating a lot of workspaces at once and some of them may fail - # due to network issues or other transient errors. - coder exp scaletest create-workspaces \ - --retry 5 \ - --count "${SCALETEST_PARAM_NUM_WORKSPACES}" \ - --template "${SCALETEST_PARAM_TEMPLATE}" \ - --concurrency "${SCALETEST_PARAM_CREATE_CONCURRENCY}" \ - --timeout 5h \ - --job-timeout 5h \ - --no-cleanup \ - --output json:"${SCALETEST_RESULTS_DIR}/create-workspaces.json" - show_json "${SCALETEST_RESULTS_DIR}/create-workspaces.json" -fi -end_phase - -wait_baseline "${SCALETEST_PARAM_LOAD_SCENARIO_BASELINE_DURATION}" - -non_greedy_agent_traffic_args=() -if [[ ${SCALETEST_PARAM_GREEDY_AGENT} != 1 ]]; then - greedy_agent_traffic() { :; } -else - echo "WARNING: Greedy agent enabled, this may cause the load tests to fail." >&2 - non_greedy_agent_traffic_args=( - # Let the greedy agent traffic command be scraped. - # --scaletest-prometheus-address 0.0.0.0:21113 - # --trace=false - ) - - annotate_grafana greedy_agent "Create greedy agent" - - coder exp scaletest create-workspaces \ - --count 1 \ - --template "${SCALETEST_PARAM_GREEDY_AGENT_TEMPLATE}" \ - --concurrency 1 \ - --timeout 5h \ - --job-timeout 5h \ - --no-cleanup \ - --output json:"${SCALETEST_RESULTS_DIR}/create-workspaces-greedy-agent.json" - - wait_baseline "${SCALETEST_PARAM_LOAD_SCENARIO_BASELINE_DURATION}" - - greedy_agent_traffic() { - local timeout=${1} scenario=${2} - # Run the greedy test for ~1/3 of the timeout. - delay=$((timeout * 60 / 3)) - - local type=web-terminal - args=() - if [[ ${scenario} == "SSH Traffic" ]]; then - type=ssh - args+=(--ssh) - fi - - sleep "${delay}" - annotate_grafana greedy_agent "${scenario}: Greedy agent traffic" - - # Produce load at about 1000MB/s (25MB/40ms). 
- set +e - coder exp scaletest workspace-traffic \ - --template "${SCALETEST_PARAM_GREEDY_AGENT_TEMPLATE}" \ - --bytes-per-tick $((1024 * 1024 * 25)) \ - --tick-interval 40ms \ - --timeout "$((delay))s" \ - --job-timeout "$((delay))s" \ - --output json:"${SCALETEST_RESULTS_DIR}/traffic-${type}-greedy-agent.json" \ - --scaletest-prometheus-address 0.0.0.0:21113 \ - --trace=false \ - "${args[@]}" - status=${?} - show_json "${SCALETEST_RESULTS_DIR}/traffic-${type}-greedy-agent.json" - - export GRAFANA_ADD_TAGS= - if [[ ${status} != 0 ]]; then - GRAFANA_ADD_TAGS=error - fi - annotate_grafana_end greedy_agent "${scenario}: Greedy agent traffic" - - return "${status}" - } -fi - -run_scenario_cmd() { - local scenario=${1} - shift - local command=("$@") - - set +e - if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 1 ]]; then - annotate_grafana scenario "Load scenario: ${scenario}" - fi - "${command[@]}" - status=${?} - if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 1 ]]; then - export GRAFANA_ADD_TAGS= - if [[ ${status} != 0 ]]; then - GRAFANA_ADD_TAGS=error - fi - annotate_grafana_end scenario "Load scenario: ${scenario}" - fi - exit "${status}" -} - -declare -a pids=() -declare -A pid_to_scenario=() -declare -A failed=() -target_start=0 -target_end=-1 - -if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 1 ]]; then - start_phase "Load scenarios: ${SCALETEST_PARAM_LOAD_SCENARIOS[*]}" -fi -for scenario in "${SCALETEST_PARAM_LOAD_SCENARIOS[@]}"; do - if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then - start_phase "Load scenario: ${scenario}" - fi - - set +e - status=0 - case "${scenario}" in - "SSH Traffic") - greedy_agent_traffic "${SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_DURATION}" "${scenario}" & - greedy_agent_traffic_pid=$! - - target_count=$(jq -n --argjson percentage "${SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_PERCENTAGE}" --argjson num_workspaces "${SCALETEST_PARAM_NUM_WORKSPACES}" '$percentage / 100 * $num_workspaces | floor') - target_end=$((target_start + target_count)) - if [[ ${target_end} -gt ${SCALETEST_PARAM_NUM_WORKSPACES} ]]; then - log "WARNING: Target count ${target_end} exceeds number of workspaces ${SCALETEST_PARAM_NUM_WORKSPACES}, using ${SCALETEST_PARAM_NUM_WORKSPACES} instead." - target_start=0 - target_end=${target_count} - fi - run_scenario_cmd "${scenario}" coder exp scaletest workspace-traffic \ - --template "${SCALETEST_PARAM_TEMPLATE}" \ - --ssh \ - --bytes-per-tick "${SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_BYTES_PER_TICK}" \ - --tick-interval "${SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_TICK_INTERVAL}ms" \ - --timeout "${SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_DURATION}m" \ - --job-timeout "${SCALETEST_PARAM_LOAD_SCENARIO_SSH_TRAFFIC_DURATION}m30s" \ - --output json:"${SCALETEST_RESULTS_DIR}/traffic-ssh.json" \ - --scaletest-prometheus-address "0.0.0.0:${SCALETEST_PROMETHEUS_START_PORT}" \ - --target-workspaces "${target_start}:${target_end}" \ - "${non_greedy_agent_traffic_args[@]}" & - pids+=($!) - if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then - wait "${pids[-1]}" - status=$? - show_json "${SCALETEST_RESULTS_DIR}/traffic-ssh.json" - else - SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) - fi - wait "${greedy_agent_traffic_pid}" - status2=$? - if [[ ${status} == 0 ]]; then - status=${status2} - fi - ;; - "Web Terminal Traffic") - greedy_agent_traffic "${SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_DURATION}" "${scenario}" & - greedy_agent_traffic_pid=$! 
- - target_count=$(jq -n --argjson percentage "${SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_PERCENTAGE}" --argjson num_workspaces "${SCALETEST_PARAM_NUM_WORKSPACES}" '$percentage / 100 * $num_workspaces | floor') - target_end=$((target_start + target_count)) - if [[ ${target_end} -gt ${SCALETEST_PARAM_NUM_WORKSPACES} ]]; then - log "WARNING: Target count ${target_end} exceeds number of workspaces ${SCALETEST_PARAM_NUM_WORKSPACES}, using ${SCALETEST_PARAM_NUM_WORKSPACES} instead." - target_start=0 - target_end=${target_count} - fi - run_scenario_cmd "${scenario}" coder exp scaletest workspace-traffic \ - --template "${SCALETEST_PARAM_TEMPLATE}" \ - --bytes-per-tick "${SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_BYTES_PER_TICK}" \ - --tick-interval "${SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_TICK_INTERVAL}ms" \ - --timeout "${SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_DURATION}m" \ - --job-timeout "${SCALETEST_PARAM_LOAD_SCENARIO_WEB_TERMINAL_TRAFFIC_DURATION}m30s" \ - --output json:"${SCALETEST_RESULTS_DIR}/traffic-web-terminal.json" \ - --scaletest-prometheus-address "0.0.0.0:${SCALETEST_PROMETHEUS_START_PORT}" \ - --target-workspaces "${target_start}:${target_end}" \ - "${non_greedy_agent_traffic_args[@]}" & - pids+=($!) - if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then - wait "${pids[-1]}" - status=$? - show_json "${SCALETEST_RESULTS_DIR}/traffic-web-terminal.json" - else - SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) - fi - wait "${greedy_agent_traffic_pid}" - status2=$? - if [[ ${status} == 0 ]]; then - status=${status2} - fi - ;; - "App Traffic") - greedy_agent_traffic "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_DURATION}" "${scenario}" & - greedy_agent_traffic_pid=$! - - target_count=$(jq -n --argjson percentage "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_PERCENTAGE}" --argjson num_workspaces "${SCALETEST_PARAM_NUM_WORKSPACES}" '$percentage / 100 * $num_workspaces | floor') - target_end=$((target_start + target_count)) - if [[ ${target_end} -gt ${SCALETEST_PARAM_NUM_WORKSPACES} ]]; then - log "WARNING: Target count ${target_end} exceeds number of workspaces ${SCALETEST_PARAM_NUM_WORKSPACES}, using ${SCALETEST_PARAM_NUM_WORKSPACES} instead." - target_start=0 - target_end=${target_count} - fi - run_scenario_cmd "${scenario}" coder exp scaletest workspace-traffic \ - --template "${SCALETEST_PARAM_TEMPLATE}" \ - --bytes-per-tick "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_BYTES_PER_TICK}" \ - --tick-interval "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_TICK_INTERVAL}ms" \ - --timeout "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_DURATION}m" \ - --job-timeout "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_DURATION}m30s" \ - --output json:"${SCALETEST_RESULTS_DIR}/traffic-app.json" \ - --scaletest-prometheus-address "0.0.0.0:${SCALETEST_PROMETHEUS_START_PORT}" \ - --app "${SCALETEST_PARAM_LOAD_SCENARIO_APP_TRAFFIC_MODE}" \ - --target-workspaces "${target_start}:${target_end}" \ - "${non_greedy_agent_traffic_args[@]}" & - pids+=($!) - if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then - wait "${pids[-1]}" - status=$? - show_json "${SCALETEST_RESULTS_DIR}/traffic-app.json" - else - SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) - fi - wait "${greedy_agent_traffic_pid}" - status2=$? 
- if [[ ${status} == 0 ]]; then - status=${status2} - fi - ;; - "Dashboard Traffic") - target_count=$(jq -n --argjson percentage "${SCALETEST_PARAM_LOAD_SCENARIO_DASHBOARD_TRAFFIC_PERCENTAGE}" --argjson num_workspaces "${SCALETEST_PARAM_NUM_WORKSPACES}" '$percentage / 100 * $num_workspaces | floor') - target_end=$((target_start + target_count)) - if [[ ${target_end} -gt ${SCALETEST_PARAM_NUM_WORKSPACES} ]]; then - log "WARNING: Target count ${target_end} exceeds number of workspaces ${SCALETEST_PARAM_NUM_WORKSPACES}, using ${SCALETEST_PARAM_NUM_WORKSPACES} instead." - target_start=0 - target_end=${target_count} - fi - # TODO: Remove this once the dashboard traffic command is fixed, - # (i.e. once images are no longer dumped into PWD). - mkdir -p dashboard - pushd dashboard - run_scenario_cmd "${scenario}" coder exp scaletest dashboard \ - --timeout "${SCALETEST_PARAM_LOAD_SCENARIO_DASHBOARD_TRAFFIC_DURATION}m" \ - --job-timeout "${SCALETEST_PARAM_LOAD_SCENARIO_DASHBOARD_TRAFFIC_DURATION}m30s" \ - --output json:"${SCALETEST_RESULTS_DIR}/traffic-dashboard.json" \ - --scaletest-prometheus-address "0.0.0.0:${SCALETEST_PROMETHEUS_START_PORT}" \ - --target-users "${target_start}:${target_end}" \ - >"${SCALETEST_RESULTS_DIR}/traffic-dashboard-output.log" & - pids+=($!) - popd - if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then - wait "${pids[-1]}" - status=$? - show_json "${SCALETEST_RESULTS_DIR}/traffic-dashboard.json" - else - SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) - fi - ;; - - # Debug scenarios, for testing the runner. - "debug:greedy_agent_traffic") - greedy_agent_traffic 10 "${scenario}" & - pids+=($!) - if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then - wait "${pids[-1]}" - status=$? - else - SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) - fi - ;; - "debug:success") - { - maybedryrun "$DRY_RUN" sleep 10 - true - } & - pids+=($!) - if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then - wait "${pids[-1]}" - status=$? - else - SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) - fi - ;; - "debug:error") - { - maybedryrun "$DRY_RUN" sleep 10 - false - } & - pids+=($!) - if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 0 ]]; then - wait "${pids[-1]}" - status=$? - else - SCALETEST_PROMETHEUS_START_PORT=$((SCALETEST_PROMETHEUS_START_PORT + 1)) - fi - ;; - - *) - log "WARNING: Unknown load scenario: ${scenario}, skipping..." - ;; - esac - set -e - - # Allow targeting to be distributed evenly across workspaces when each - # scenario is run concurrently and all percentages add up to 100. - target_start=${target_end} - - if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 1 ]]; then - pid_to_scenario+=(["${pids[-1]}"]="${scenario}") - # Stagger the start of each scenario to avoid a burst of load and deted - # problematic scenarios. - sleep $((SCALETEST_PARAM_LOAD_SCENARIO_CONCURRENCY_STAGGER_DELAY_MINS * 60)) - continue - fi - - if ((status > 0)); then - log "Load scenario failed: ${scenario} (exit=${status})" - failed+=(["${scenario}"]="${status}") - PHASE_ADD_TAGS=error end_phase - else - end_phase - fi - - wait_baseline "${SCALETEST_PARAM_LOAD_SCENARIO_BASELINE_DURATION}" -done -if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 1 ]]; then - wait "${pids[@]}" - # Wait on all pids will wait until all have exited, but we need to - # check their individual exit codes. 
- for pid in "${pids[@]}"; do - wait "${pid}" - status=${?} - scenario=${pid_to_scenario[${pid}]} - if ((status > 0)); then - log "Load scenario failed: ${scenario} (exit=${status})" - failed+=(["${scenario}"]="${status}") - fi - done - if ((${#failed[@]} > 0)); then - PHASE_ADD_TAGS=error end_phase - else - end_phase - fi -fi - -if ((${#failed[@]} > 0)); then - log "Load scenarios failed: ${!failed[*]}" - for scenario in "${!failed[@]}"; do - log " ${scenario}: exit=${failed[$scenario]}" - done - exit 1 -fi - -log "Scaletest complete!" -set_status Complete diff --git a/examples/scaletests/scaletest-runner/shutdown.sh b/examples/scaletests/scaletest-runner/shutdown.sh deleted file mode 100755 index 9e75864d73120..0000000000000 --- a/examples/scaletests/scaletest-runner/shutdown.sh +++ /dev/null @@ -1,30 +0,0 @@ -#!/bin/bash -set -e - -[[ $VERBOSE == 1 ]] && set -x - -# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh -. "${SCRIPTS_DIR}/lib.sh" - -cleanup() { - coder tokens remove scaletest_runner >/dev/null 2>&1 || true - rm -f "${CODER_CONFIG_DIR}/session" -} -trap cleanup EXIT - -annotate_grafana "workspace" "Agent stopping..." - -shutdown_event=shutdown_scale_down_only -if [[ ${SCALETEST_PARAM_CLEANUP_STRATEGY} == on_stop ]]; then - shutdown_event=shutdown -fi -"${SCRIPTS_DIR}/cleanup.sh" "${shutdown_event}" - -annotate_grafana_end "workspace" "Agent running" - -appearance_json="$(get_appearance)" -service_banner_message=$(jq -r '.service_banner.message' <<<"${appearance_json}") -service_banner_message="${service_banner_message/% | */}" -service_banner_color="#4CD473" # Green. - -set_appearance "${appearance_json}" "${service_banner_color}" "${service_banner_message}" diff --git a/examples/scaletests/scaletest-runner/startup.sh b/examples/scaletests/scaletest-runner/startup.sh deleted file mode 100755 index 3e4eb94f41810..0000000000000 --- a/examples/scaletests/scaletest-runner/startup.sh +++ /dev/null @@ -1,181 +0,0 @@ -#!/bin/bash -set -euo pipefail - -[[ $VERBOSE == 1 ]] && set -x - -if [[ ${SCALETEST_PARAM_GREEDY_AGENT_TEMPLATE} == "${SCALETEST_PARAM_TEMPLATE}" ]]; then - echo "ERROR: Greedy agent template must be different from the scaletest template." >&2 - exit 1 -fi - -if [[ ${SCALETEST_PARAM_LOAD_SCENARIO_RUN_CONCURRENTLY} == 1 ]] && [[ ${SCALETEST_PARAM_GREEDY_AGENT} == 1 ]]; then - echo "ERROR: Load scenario concurrency and greedy agent test cannot be enabled at the same time." >&2 - exit 1 -fi - -# Unzip scripts and add to path. -# shellcheck disable=SC2153 -echo "Extracting scaletest scripts into ${SCRIPTS_DIR}..." -base64 -d <<<"${SCRIPTS_ZIP}" >/tmp/scripts.zip -rm -rf "${SCRIPTS_DIR}" || true -mkdir -p "${SCRIPTS_DIR}" -unzip -o /tmp/scripts.zip -d "${SCRIPTS_DIR}" -# Chmod to work around https://github.com/coder/coder/issues/10034 -chmod +x "${SCRIPTS_DIR}"/*.sh -rm /tmp/scripts.zip - -echo "Cloning coder/coder repo..." -if [[ ! -d "${HOME}/coder" ]]; then - git clone https://github.com/coder/coder.git "${HOME}/coder" -fi -(cd "${HOME}/coder" && git fetch -a && git checkout "${SCALETEST_PARAM_REPO_BRANCH}" && git pull) - -# Store the input parameters (for debugging). -env | grep "^SCALETEST_" | sort >"${SCALETEST_RUN_DIR}/environ.txt" - -# shellcheck disable=SC2153 source=scaletest/templates/scaletest-runner/scripts/lib.sh -. 
"${SCRIPTS_DIR}/lib.sh" - -appearance_json="$(get_appearance)" -service_banner_message=$(jq -r '.service_banner.message' <<<"${appearance_json}") -service_banner_message="${service_banner_message/% | */}" -service_banner_color="#D65D0F" # Orange. - -annotate_grafana "workspace" "Agent running" # Ended in shutdown.sh. - -{ - pids=() - ports=() - declare -A pods=() - next_port=6061 - for pod in $(kubectl get pods -l app.kubernetes.io/name=coder -o jsonpath='{.items[*].metadata.name}'); do - maybedryrun "${DRY_RUN}" kubectl -n coder-big port-forward "${pod}" "${next_port}:6060" & - pids+=($!) - ports+=("${next_port}") - pods[${next_port}]="${pod}" - next_port=$((next_port + 1)) - done - - trap 'trap - EXIT; kill -INT "${pids[@]}"; exit 1' INT EXIT - - while :; do - # Sleep for short periods of time so that we can exit quickly. - # This adds up to ~300 when accounting for profile and trace. - for ((i = 0; i < 285; i++)); do - sleep 1 - done - log "Grabbing pprof dumps" - start="$(date +%s)" - annotate_grafana "pprof" "Grab pprof dumps (start=${start})" - for type in allocs block heap goroutine mutex 'profile?seconds=10' 'trace?seconds=5'; do - for port in "${ports[@]}"; do - tidy_type="${type//\?/_}" - tidy_type="${tidy_type//=/_}" - maybedryrun "${DRY_RUN}" curl -sSL --output "${SCALETEST_PPROF_DIR}/pprof-${tidy_type}-${pods[${port}]}-${start}.gz" "http://localhost:${port}/debug/pprof/${type}" - done - done - annotate_grafana_end "pprof" "Grab pprof dumps (start=${start})" - done -} & -pprof_pid=$! - -logs_gathered=0 -gather_logs() { - if ((logs_gathered == 1)); then - return - fi - logs_gathered=1 - - # Gather logs from all coderd and provisioner instances, and all workspaces. - annotate_grafana "logs" "Gather logs" - podsraw="$( - kubectl -n coder-big get pods -l app.kubernetes.io/name=coder -o name - kubectl -n coder-big get pods -l app.kubernetes.io/name=coder-provisioner -o name || true - kubectl -n coder-big get pods -l app.kubernetes.io/name=coder-workspace -o name | grep "^pod/scaletest-" || true - )" - mapfile -t pods <<<"${podsraw}" - for pod in "${pods[@]}"; do - pod_name="${pod#pod/}" - kubectl -n coder-big logs "${pod}" --since-time="${SCALETEST_RUN_START_TIME}" >"${SCALETEST_LOGS_DIR}/${pod_name}.txt" - done - annotate_grafana_end "logs" "Gather logs" -} - -set_appearance "${appearance_json}" "${service_banner_color}" "${service_banner_message} | Scaletest running: [${CODER_USER}/${CODER_WORKSPACE}](${CODER_URL}/@${CODER_USER}/${CODER_WORKSPACE})!" - -# Show failure in the UI if script exits with error. -on_exit() { - code=${?} - trap - ERR EXIT - set +e - - kill -INT "${pprof_pid}" - - message_color="#4CD473" # Green. - message_status=COMPLETE - if ((code > 0)); then - message_color="#D94A5D" # Red. - message_status=FAILED - fi - - # In case the test failed before gathering logs, gather them before - # cleaning up, whilst the workspaces are still present. - gather_logs - - case "${SCALETEST_PARAM_CLEANUP_STRATEGY}" in - on_stop) - # Handled by shutdown script. - ;; - on_success) - if ((code == 0)); then - set_appearance "${appearance_json}" "${message_color}" "${service_banner_message} | Scaletest ${message_status}: [${CODER_USER}/${CODER_WORKSPACE}](${CODER_URL}/@${CODER_USER}/${CODER_WORKSPACE}), cleaning up..." 
- "${SCRIPTS_DIR}/cleanup.sh" "${SCALETEST_PARAM_CLEANUP_STRATEGY}" - fi - ;; - on_error) - if ((code > 0)); then - set_appearance "${appearance_json}" "${message_color}" "${service_banner_message} | Scaletest ${message_status}: [${CODER_USER}/${CODER_WORKSPACE}](${CODER_URL}/@${CODER_USER}/${CODER_WORKSPACE}), cleaning up..." - "${SCRIPTS_DIR}/cleanup.sh" "${SCALETEST_PARAM_CLEANUP_STRATEGY}" - fi - ;; - *) - set_appearance "${appearance_json}" "${message_color}" "${service_banner_message} | Scaletest ${message_status}: [${CODER_USER}/${CODER_WORKSPACE}](${CODER_URL}/@${CODER_USER}/${CODER_WORKSPACE}), cleaning up..." - "${SCRIPTS_DIR}/cleanup.sh" "${SCALETEST_PARAM_CLEANUP_STRATEGY}" - ;; - esac - - set_appearance "${appearance_json}" "${message_color}" "${service_banner_message} | Scaletest ${message_status}: [${CODER_USER}/${CODER_WORKSPACE}](${CODER_URL}/@${CODER_USER}/${CODER_WORKSPACE})!" - - annotate_grafana_end "" "Start scaletest: ${SCALETEST_COMMENT}" - - wait "${pprof_pid}" - exit "${code}" -} -trap on_exit EXIT - -on_err() { - code=${?} - trap - ERR - set +e - - log "Scaletest failed!" - GRAFANA_EXTRA_TAGS=error set_status "Failed (exit=${code})" - "${SCRIPTS_DIR}/report.sh" failed - lock_status # Ensure we never rewrite the status after a failure. - - exit "${code}" -} -trap on_err ERR - -# Pass session token since `prepare.sh` has not yet run. -CODER_SESSION_TOKEN=$CODER_USER_TOKEN "${SCRIPTS_DIR}/report.sh" started -annotate_grafana "" "Start scaletest: ${SCALETEST_COMMENT}" - -"${SCRIPTS_DIR}/prepare.sh" - -"${SCRIPTS_DIR}/run.sh" - -# Gather logs before ending the test. -gather_logs - -"${SCRIPTS_DIR}/report.sh" completed diff --git a/scaletest/templates/scaletest-runner/main.tf b/scaletest/templates/scaletest-runner/main.tf index 42fa785cc4732..2d17c66435f62 100644 --- a/scaletest/templates/scaletest-runner/main.tf +++ b/scaletest/templates/scaletest-runner/main.tf @@ -44,7 +44,7 @@ locals { scaletest_run_id = "scaletest-${replace(time_static.start_time.rfc3339, ":", "-")}" scaletest_run_dir = "/home/coder/${local.scaletest_run_id}" scaletest_run_start_time = time_static.start_time.rfc3339 - grafana_url = "https://stats.dev.c8s.io" + grafana_url = "https://grafana.corp.tld" grafana_dashboard_uid = "qLVSTR-Vz" grafana_dashboard_name = "coderv2-loadtest-dashboard" } @@ -625,6 +625,8 @@ resource "coder_agent" "main" { vscode = false ssh_helper = false } + startup_script_timeout = 86400 + shutdown_script_timeout = 7200 startup_script_behavior = "blocking" startup_script = file("startup.sh") shutdown_script = file("shutdown.sh") @@ -734,10 +736,9 @@ resource "coder_app" "prometheus" { agent_id = coder_agent.main.id slug = "01-prometheus" display_name = "Prometheus" - // https://stats.dev.c8s.io:9443/classic/graph?g0.range_input=2h&g0.end_input=2023-09-08%2015%3A58&g0.stacked=0&g0.expr=rate(pg_stat_database_xact_commit%7Bcluster%3D%22big%22%2Cdatname%3D%22big-coder%22%7D%5B1m%5D)&g0.tab=0 - url = "https://stats.dev.c8s.io:9443" - icon = "https://prometheus.io/assets/favicons/favicon-32x32.png" - external = true + url = "https://grafana.corp.tld:9443" + icon = "https://prometheus.io/assets/favicons/favicon-32x32.png" + external = true } resource "coder_app" "manual_cleanup" { From 4fc71432f05699ff1d9476640734abdd799aef3d Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Wed, 20 Mar 2024 17:49:23 +0100 Subject: [PATCH 09/21] command --- docs/admin/scale.md | 92 ++++++++++++++++++++++++++++----------------- 1 file changed, 57 insertions(+), 35 deletions(-) diff --git 
a/docs/admin/scale.md b/docs/admin/scale.md index dd018b98562ec..f16efaf2d2086 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -10,7 +10,9 @@ Learn more about [Coder’s architecture](../about/architecture.md) and our ## Recent scale tests > Note: the below information is for reference purposes only, and are not -> intended to be used as guidelines for infrastructure sizing. +> intended to be used as guidelines for infrastructure sizing. Review the +> [Reference Architectures](architectures/index.md) for hardware sizing +> recommendations. | Environment | Coder CPU | Coder RAM | Coder Replicas | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested | | ---------------- | --------- | --------- | -------------- | ----------------- | ----- | ----------------- | ------------------------------------- | ------------- | ------------ | @@ -29,58 +31,76 @@ Since Coder's performance is highly dependent on the templates and workflows you support, you may wish to use our internal scale testing utility against your own environments. -> Note: This utility is intended for internal use only. It is not subject to any -> compatibility guarantees, and may cause interruptions for your users. To avoid -> potential outages and orphaned resources, we recommend running scale tests on -> a secondary "staging" environment. Run it against a production environment at -> your own risk. +> Note: This utility is experimental. It is not subject to any compatibility +> guarantees, and may cause interruptions for your users. To avoid potential +> outages and orphaned resources, we recommend running scale tests on a +> secondary "staging" environment or a dedicated +> [Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/templates). +> Run it against a production environment at your own risk. -### Workspace Creation +### Create workspaces -The following command will run our scale test against your own Coder deployment. -You can also specify a template name and any parameter values. +The following command will provision a number of Coder workspaces using the +specified template and extra parameters. ```shell coder exp scaletest create-workspaces \ - --count 1000 \ - --template "kubernetes" \ - --concurrency 0 \ - --cleanup-concurrency 0 \ - --parameter "home_disk_size=10" \ - --run-command "sleep 2 && echo hello" + --retry 5 \ + --count "${SCALETEST_PARAM_NUM_WORKSPACES}" \ + --template "${SCALETEST_PARAM_TEMPLATE}" \ + --concurrency "${SCALETEST_PARAM_CREATE_CONCURRENCY}" \ + --timeout 5h \ + --job-timeout 5h \ + --no-cleanup \ + --output json:"${SCALETEST_RESULTS_DIR}/create-workspaces.json" # Run `coder exp scaletest create-workspaces --help` for all usage ``` -The test does the following: +The command does the following: -1. create `1000` workspaces -1. establish SSH connection to each workspace -1. run `sleep 3 && echo hello` on each workspace via the web terminal -1. close connections, attempt to delete all workspaces -1. return results (e.g. `998 succeeded, 2 failed to connect`) - -Concurrency is configurable. `concurrency 0` means the scaletest test will -attempt to create & connect to all workspaces immediately. - -If you wish to leave the workspaces running for a period of time, you can -specify `--no-cleanup` to skip the cleanup step. You are responsible for -deleting these resources later. +1. 
Create `${SCALETEST_PARAM_NUM_WORKSPACES}` workspaces concurrently + (concurrency level: `${SCALETEST_PARAM_CREATE_CONCURRENCY}`) using the + template `${SCALETEST_PARAM_TEMPLATE}`. +1. Leave workspaces running to use in next steps (`--no-cleanup` option). +1. Store provisioning results in JSON format. +1. If you don't want the creation process to be interrupted by any errors, use + the `--retry 5` flag. ### Traffic Generation Given an existing set of workspaces created previously with `create-workspaces`, -the following command will generate traffic similar to that of Coder's web -terminal against those workspaces. +the following command will generate traffic similar to that of Coder's Web +Terminal against those workspaces. ```shell +# Produce load at about 1000MB/s (25MB/40ms). coder exp scaletest workspace-traffic \ - --byes-per-tick 128 \ - --tick-interval 100ms \ - --concurrency 0 + --template "${SCALETEST_PARAM_GREEDY_AGENT_TEMPLATE}" \ + --bytes-per-tick $((1024 * 1024 * 25)) \ + --tick-interval 40ms \ + --timeout "$((delay))s" \ + --job-timeout "$((delay))s" \ + --scaletest-prometheus-address 0.0.0.0:21113 \ + --target-workspaces "0:100" \ + --trace=false \ + --output json:"${SCALETEST_RESULTS_DIR}/traffic-${type}-greedy-agent.json" ``` -To generate SSH traffic, add the `--ssh` flag. +Traffic generation can be parametrized: + +1. Send `bytes-per-tick` every `tick-interval`. +1. Enable tracing for performance debugging. +1. Target a range of workspaces with `--target-workspaces 0:100`. +1. For dashboard traffic: Target a range of users with `--target-users 0:100`. +1. Store provisioning results in JSON format. + +The `workspace-traffic` supports also other modes - SSH traffic, workspace app: + +1. For SSH traffic: Use `--ssh` flag to generate SSH traffic instead of Web + Terminal. +1. For workspace app traffic: Use `--app [wsdi|wsec|wsra]` flag to select app + behavior. (modes: _WebSocket discard_, _WebSocket echo_, _WebSocket read_). ### Cleanup @@ -88,7 +108,9 @@ The scaletest utility will attempt to clean up all workspaces it creates. If you wish to clean up all workspaces, you can run the following command: ```shell -coder exp scaletest cleanup +coder exp scaletest cleanup \ + --cleanup-job-timeout 2h \ + --cleanup-timeout 15min ``` This will delete all workspaces and users with the prefix `scaletest-`. From feb8f9f5965500207cfe2012730cd8eccb3942fe Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Thu, 21 Mar 2024 12:32:36 +0100 Subject: [PATCH 10/21] WIP --- docs/admin/scale.md | 45 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 42 insertions(+), 3 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index f16efaf2d2086..a8c5a00efcf57 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -117,15 +117,54 @@ This will delete all workspaces and users with the prefix `scaletest-`. ## Scale testing template -TODO +Besides the CLI utility, consider using a dedicated +[scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner) +template for testing large scale Kubernetes clusters. + +The template deploys a main workspace with scripts used to orchestrate Coder to +create workspaces, generate workspace traffic, or load tests workspace apps. 
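+
+For example, pushing the runner template into a deployment generally looks
+like this (the template name, directory layout, and exact flags are
+illustrative and may vary between Coder releases):
+
+```shell
+# Clone the repository that ships the scaletest-runner template.
+git clone https://github.com/coder/coder.git
+cd coder/scaletest/templates/scaletest-runner
+
+# Publish (or update) the template in your Coder deployment.
+coder templates push scaletest-runner --directory .
+```
+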
### Parameters -TODO +The _scaletest-runner_ offers the following configuration options: + +- workspace template selecting Kubernetes cluster size: + minimal/small/medium/large (_default_: minimal) +- number of workspaces +- wait duration between scenarios or staggered approach + +The template exposes parameters to control the traffic dimensions for SSH +connections, workspace apps, and dashboard tests: + +- traffic duration of the load test scenario +- traffic percentage of targeted workspaces +- bytes per tick and tick interval +- _For workspace apps_: modes (echo, read random data, or write and discard) + +Scale testing concurrency can be controlled with the following parameters: + +- enable parallel scenarios - interleave different traffic patterns (SSH, + workspace apps, dashboard traffic, etc.) +- workspace creation concurrency level (_default_: 10) +- job concurrency level - generate workspace traffic using multiple jobs + (_default_: 0) +- cleanup concurrency level ### Kubernetes cluster -TODO +Depending on the traffic projections, operators can deploy different sample +clusters to perform scale tests. It is recommend to learn how to operate the +scaletest-runner before running it against the staging cluster (or production at +your own risk). + +There are a few cluster options available: + +- minimal +- small +- medium +- large + +TODO greedy ### Observability From 01c4297b8fd1595ae7bea33045cf5ab66cb6dedd Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Thu, 21 Mar 2024 13:05:52 +0100 Subject: [PATCH 11/21] Clusters --- docs/admin/scale.md | 53 +++++++++++-------- .../templates/kubernetes-minimal/README.md | 2 +- 2 files changed, 33 insertions(+), 22 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index a8c5a00efcf57..b8ea79dad4c3d 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -117,54 +117,65 @@ This will delete all workspaces and users with the prefix `scaletest-`. ## Scale testing template -Besides the CLI utility, consider using a dedicated +Consider using a dedicated [scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner) -template for testing large scale Kubernetes clusters. +template alongside the CLI utility for testing large-scale Kubernetes clusters. -The template deploys a main workspace with scripts used to orchestrate Coder to -create workspaces, generate workspace traffic, or load tests workspace apps. +The template deploys a main workspace with scripts used to orchestrate Coder, +creating workspaces, generating workspace traffic, or load-testing workspace +apps. 
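+
+Each option described below is passed to the runner scripts as a
+`SCALETEST_PARAM_*` environment variable. As a rough sketch, a script inside
+the runner workspace can snapshot these values for later debugging:
+
+```shell
+# Record the parameters this scaletest run was started with.
+env | grep "^SCALETEST_" | sort > "${SCALETEST_RUN_DIR}/environ.txt"
+```
+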
### Parameters The _scaletest-runner_ offers the following configuration options: -- workspace template selecting Kubernetes cluster size: +- Workspace template selecting Kubernetes cluster size: minimal/small/medium/large (_default_: minimal) -- number of workspaces -- wait duration between scenarios or staggered approach +- Number of workspaces +- Wait duration between scenarios or staggered approach The template exposes parameters to control the traffic dimensions for SSH connections, workspace apps, and dashboard tests: -- traffic duration of the load test scenario -- traffic percentage of targeted workspaces -- bytes per tick and tick interval +- Traffic duration of the load test scenario +- Traffic percentage of targeted workspaces +- Bytes per tick and tick interval - _For workspace apps_: modes (echo, read random data, or write and discard) Scale testing concurrency can be controlled with the following parameters: -- enable parallel scenarios - interleave different traffic patterns (SSH, +- Enable parallel scenarios - interleave different traffic patterns (SSH, workspace apps, dashboard traffic, etc.) -- workspace creation concurrency level (_default_: 10) -- job concurrency level - generate workspace traffic using multiple jobs +- Workspace creation concurrency level (_default_: 10) +- Job concurrency level - generate workspace traffic using multiple jobs (_default_: 0) -- cleanup concurrency level +- Cleanup concurrency level ### Kubernetes cluster Depending on the traffic projections, operators can deploy different sample -clusters to perform scale tests. It is recommend to learn how to operate the +clusters to perform scale tests. It is recommended to learn how to operate the scaletest-runner before running it against the staging cluster (or production at your own risk). -There are a few cluster options available: +There are a few cluster options +[available](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner): -- minimal -- small -- medium -- large +| Cluster size | vCPU | Memory | Persisted storage | Details | +| ------------ | ---- | ------ | ----------------- | ----------------------------------------------------- | +| minimal | 1 | 2 Gi | None | | +| small | 1 | 1 Gi | None | | +| medium | 2 | 2 Gi | None | Medium-sized cluster offers the greedy agent variant. | +| large | 4 | 4 Gi | None | | -TODO greedy +#### Greedy agent + +The greedy agent variant is a template modification that forces the Coder agent +to transmit large metadata (size: 4K) while emitting stats. The transmission of +large chunks puts extra overhead on coderd instances and agents while processing +and storing the data. + +Use this template variant to verify limits of the cluster performance. ### Observability diff --git a/scaletest/templates/kubernetes-minimal/README.md b/scaletest/templates/kubernetes-minimal/README.md index c56d3d477f821..a4e76f8b24611 100644 --- a/scaletest/templates/kubernetes-minimal/README.md +++ b/scaletest/templates/kubernetes-minimal/README.md @@ -1,5 +1,5 @@ # kubernetes-minimal -Provisions a medium-sized workspace with no persistent storage. Greedy agent variant. +Provisions a minimal-sized workspace with no persistent storage. 
_Requires_: `cloud.google.com/gke-nodepool` = `big-workspaces` From e9b7803e6d15dee9cba420afe0a76b11a7284c9e Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Thu, 21 Mar 2024 13:07:31 +0100 Subject: [PATCH 12/21] no pod monitor --- .../kubernetes-with-podmonitor/README.md | 98 ----- .../kubernetes-with-podmonitor/main.tf | 362 ------------------ 2 files changed, 460 deletions(-) delete mode 100644 scaletest/templates/kubernetes-with-podmonitor/README.md delete mode 100644 scaletest/templates/kubernetes-with-podmonitor/main.tf diff --git a/scaletest/templates/kubernetes-with-podmonitor/README.md b/scaletest/templates/kubernetes-with-podmonitor/README.md deleted file mode 100644 index 6c04af8ea6a63..0000000000000 --- a/scaletest/templates/kubernetes-with-podmonitor/README.md +++ /dev/null @@ -1,98 +0,0 @@ ---- -name: Develop in Kubernetes -description: Get started with Kubernetes development. -tags: [cloud, kubernetes] -icon: /icon/k8s.png ---- - -# Getting started - -This template creates a pod running the `codercom/enterprise-base:ubuntu` image. - -## Authentication - -This template can authenticate using in-cluster authentication, or using a kubeconfig local to the -Coder host. For additional authentication options, consult the [Kubernetes provider -documentation](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs). - -### kubeconfig on Coder host - -If the Coder host has a local `~/.kube/config`, you can use this to authenticate -with Coder. Make sure this is done with same user that's running the `coder` service. - -To use this authentication, set the parameter `use_kubeconfig` to true. - -### In-cluster authentication - -If the Coder host runs in a Pod on the same Kubernetes cluster as you are creating workspaces in, -you can use in-cluster authentication. - -To use this authentication, set the parameter `use_kubeconfig` to false. - -The Terraform provisioner will automatically use the service account associated with the pod to -authenticate to Kubernetes. Be sure to bind a [role with appropriate permission](#rbac) to the -service account. For example, assuming the Coder host runs in the same namespace as you intend -to create workspaces: - -```yaml -apiVersion: v1 -kind: ServiceAccount -metadata: - name: coder - ---- -apiVersion: rbac.authorization.k8s.io/v1 -kind: RoleBinding -metadata: - name: coder -subjects: - - kind: ServiceAccount - name: coder -roleRef: - kind: Role - name: coder - apiGroup: rbac.authorization.k8s.io -``` - -Then start the Coder host with `serviceAccountName: coder` in the pod spec. - -### Authenticate against external clusters - -You may want to deploy workspaces on a cluster outside of the Coder control plane. Refer to the [Coder docs](https://coder.com/docs/v2/latest/platforms/kubernetes/additional-clusters) to learn how to modify your template to authenticate against external clusters. - -## Namespace - -The target namespace in which the pod will be deployed is defined via the `coder_workspace` -variable. The namespace must exist prior to creating workspaces. - -## Persistence - -The `/home/coder` directory in this example is persisted via the attached PersistentVolumeClaim. -Any data saved outside of this directory will be wiped when the workspace stops. - -Since most binary installations and environment configurations live outside of -the `/home` directory, we suggest including these in the `startup_script` argument -of the `coder_agent` resource block, which will run each time the workspace starts up. 
- -For example, when installing the `aws` CLI, the install script will place the -`aws` binary in `/usr/local/bin/aws`. To ensure the `aws` CLI is persisted across -workspace starts/stops, include the following code in the `coder_agent` resource -block of your workspace template: - -```terraform -resource "coder_agent" "main" { - startup_script = <<-EOT - set -e - # install AWS CLI - curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" - unzip awscliv2.zip - sudo ./aws/install - EOT -} -``` - -## code-server - -`code-server` is installed via the `startup_script` argument in the `coder_agent` -resource block. The `coder_app` resource is defined to access `code-server` through -the dashboard UI over `localhost:13337`. diff --git a/scaletest/templates/kubernetes-with-podmonitor/main.tf b/scaletest/templates/kubernetes-with-podmonitor/main.tf deleted file mode 100644 index 722cbe71f7692..0000000000000 --- a/scaletest/templates/kubernetes-with-podmonitor/main.tf +++ /dev/null @@ -1,362 +0,0 @@ -terraform { - required_providers { - coder = { - source = "coder/coder" - version = "~> 0.7.0" - } - kubernetes = { - source = "hashicorp/kubernetes" - version = "~> 2.18" - } - } -} - -provider "coder" { -} - -variable "use_kubeconfig" { - type = bool - description = <<-EOF - Use host kubeconfig? (true/false) - - Set this to false if the Coder host is itself running as a Pod on the same - Kubernetes cluster as you are deploying workspaces to. - - Set this to true if the Coder host is running outside the Kubernetes cluster - for workspaces. A valid "~/.kube/config" must be present on the Coder host. - EOF - default = false -} - -variable "namespace" { - type = string - description = "The Kubernetes namespace to create workspaces in (must exist prior to creating workspaces)" -} - -data "coder_parameter" "cpu" { - name = "cpu" - display_name = "CPU" - description = "The number of CPU cores" - default = "2" - icon = "/icon/memory.svg" - mutable = true - option { - name = "2 Cores" - value = "2" - } - option { - name = "4 Cores" - value = "4" - } - option { - name = "6 Cores" - value = "6" - } - option { - name = "8 Cores" - value = "8" - } -} - -data "coder_parameter" "memory" { - name = "memory" - display_name = "Memory" - description = "The amount of memory in GB" - default = "2" - icon = "/icon/memory.svg" - mutable = true - option { - name = "2 GB" - value = "2" - } - option { - name = "4 GB" - value = "4" - } - option { - name = "6 GB" - value = "6" - } - option { - name = "8 GB" - value = "8" - } - option { - name = "16 GB" - value = "16" - } - option { - name = "24 GB" - value = "24" - } -} - -data "coder_parameter" "home_disk_size" { - name = "home_disk_size" - display_name = "Home disk size" - description = "The size of the home disk in GB" - default = "10" - type = "number" - icon = "/emojis/1f4be.png" - mutable = false - validation { - min = 1 - max = 99999 - } -} - -provider "kubernetes" { - # Authenticate via ~/.kube/config or a Coder-specific ServiceAccount, depending on admin preferences - config_path = var.use_kubeconfig == true ? 
"~/.kube/config" : null -} - -data "coder_workspace" "me" {} - -resource "coder_agent" "main" { - os = "linux" - arch = "amd64" - startup_script_timeout = 180 - startup_script = <<-EOT - set -e - - # install and start code-server - curl -fsSL https://code-server.dev/install.sh | sh -s -- --method=standalone --prefix=/tmp/code-server --version 4.11.0 - /tmp/code-server/bin/code-server --auth none --port 13337 >/tmp/code-server.log 2>&1 & - EOT - - # The following metadata blocks are optional. They are used to display - # information about your workspace in the dashboard. You can remove them - # if you don't want to display any information. - # For basic resources, you can use the `coder stat` command. - # If you need more control, you can write your own script. - metadata { - display_name = "CPU Usage" - key = "0_cpu_usage" - script = "coder stat cpu" - interval = 10 - timeout = 1 - } - - metadata { - display_name = "RAM Usage" - key = "1_ram_usage" - script = "coder stat mem" - interval = 10 - timeout = 1 - } - - metadata { - display_name = "Home Disk" - key = "3_home_disk" - script = "coder stat disk --path $${HOME}" - interval = 60 - timeout = 1 - } - - metadata { - display_name = "CPU Usage (Host)" - key = "4_cpu_usage_host" - script = "coder stat cpu --host" - interval = 10 - timeout = 1 - } - - metadata { - display_name = "Memory Usage (Host)" - key = "5_mem_usage_host" - script = "coder stat mem --host" - interval = 10 - timeout = 1 - } - - metadata { - display_name = "Load Average (Host)" - key = "6_load_host" - # get load avg scaled by number of cores - script = < Date: Thu, 21 Mar 2024 15:13:38 +0100 Subject: [PATCH 13/21] Mention graphs --- docs/admin/scale.md | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index b8ea79dad4c3d..b3337afea696f 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -170,16 +170,31 @@ There are a few cluster options #### Greedy agent -The greedy agent variant is a template modification that forces the Coder agent -to transmit large metadata (size: 4K) while emitting stats. The transmission of -large chunks puts extra overhead on coderd instances and agents while processing +The greedy agent variant is a template modification that makes the Coder agent +transmit large metadata (size: 4K) while reporting stats. The transmission of +large chunks puts extra overhead on coderd instances and agents when handling and storing the data. Use this template variant to verify limits of the cluster performance. ### Observability -TODO Grafana and logs +During scale tests, operators can monitor progress using a Grafana dashboard. +Coder offers a comprehensive overview +[dashboard](https://github.com/coder/coder/blob/main/scaletest/scaletest_dashboard.json) +that can seamlessly integrate into the internal Grafana deployment. + +This dashboard provides insights into various aspects, including: + +- Utilization of resources within the Coder control plane (CPU, memory, pods) +- Database performance metrics (CPU, memory, I/O, connections, queries) +- Coderd API performance (requests, latency, error rate) +- Resource consumption within Coder workspaces (CPU, memory, network usage) +- Internal metrics related to provisioner jobs + +It is highly recommended to deploy a solution for centralized log collection and +aggregation. The presence of error logs may indicate an underscaled deployment +of Coder, necessitating action from operators. 
## Autoscaling From 9a53217ab4ed92e4cc294b08be75a84c92469694 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Fri, 22 Mar 2024 08:58:00 +0100 Subject: [PATCH 14/21] Cian's comments --- docs/admin/scale.md | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index b3337afea696f..942f34209ab00 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -35,7 +35,7 @@ environments. > guarantees, and may cause interruptions for your users. To avoid potential > outages and orphaned resources, we recommend running scale tests on a > secondary "staging" environment or a dedicated -> [Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/templates). +> [Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/terraform). > Run it against a production environment at your own risk. ### Create workspaces @@ -82,7 +82,7 @@ coder exp scaletest workspace-traffic \ --timeout "$((delay))s" \ --job-timeout "$((delay))s" \ --scaletest-prometheus-address 0.0.0.0:21113 \ - --target-workspaces "0:100" \ + --target-workspaces "0:100" \ --trace=false \ --output json:"${SCALETEST_RESULTS_DIR}/traffic-${type}-greedy-agent.json" ``` @@ -94,6 +94,8 @@ Traffic generation can be parametrized: 1. Target a range of workspaces with `--target-workspaces 0:100`. 1. For dashboard traffic: Target a range of users with `--target-users 0:100`. 1. Store provisioning results in JSON format. +1. Expose a dedicated Prometheus address (`--scaletest-prometheus-address`) for + scaletest-specific metrics. The `workspace-traffic` supports also other modes - SSH traffic, workspace app: @@ -129,8 +131,9 @@ apps. The _scaletest-runner_ offers the following configuration options: -- Workspace template selecting Kubernetes cluster size: - minimal/small/medium/large (_default_: minimal) +- Workspace size selection: minimal/small/medium/large (_default_: minimal, + which contains just enough resources for a Coder agent to run without + additional workloads) - Number of workspaces - Wait duration between scenarios or staggered approach @@ -161,12 +164,12 @@ your own risk). There are a few cluster options [available](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner): -| Cluster size | vCPU | Memory | Persisted storage | Details | -| ------------ | ---- | ------ | ----------------- | ----------------------------------------------------- | -| minimal | 1 | 2 Gi | None | | -| small | 1 | 1 Gi | None | | -| medium | 2 | 2 Gi | None | Medium-sized cluster offers the greedy agent variant. | -| large | 4 | 4 Gi | None | | +| Workspace size | vCPU | Memory | Persisted storage | Details | +| -------------- | ---- | ------ | ----------------- | ----------------------------------------------------- | +| minimal | 1 | 2 Gi | None | | +| small | 1 | 1 Gi | None | | +| medium | 2 | 2 Gi | None | Medium-sized cluster offers the greedy agent variant. 
| +| large | 4 | 4 Gi | None | | #### Greedy agent From d1b0ddca5ea9fc145add37908c3adafbf997412e Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Fri, 22 Mar 2024 09:26:33 +0100 Subject: [PATCH 15/21] WIP --- docs/admin/scale.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index 942f34209ab00..25779d60de48e 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -156,13 +156,13 @@ Scale testing concurrency can be controlled with the following parameters: ### Kubernetes cluster -Depending on the traffic projections, operators can deploy different sample -clusters to perform scale tests. It is recommended to learn how to operate the -scaletest-runner before running it against the staging cluster (or production at -your own risk). +It is recommended to learn how to operate the _scaletest-runner_ before running +it against the staging cluster (or production at your own risk). Coder provides +different +[workspace configurations](https://github.com/coder/coder/tree/main/scaletest/templates) +that operators can deploy depending on the traffic projections. -There are a few cluster options -[available](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner): +There are a few cluster options available: | Workspace size | vCPU | Memory | Persisted storage | Details | | -------------- | ---- | ------ | ----------------- | ----------------------------------------------------- | From fe4e743ec8da7f3a11d4364f6d806163eab6fe77 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Fri, 22 Mar 2024 09:36:56 +0100 Subject: [PATCH 16/21] Noted --- docs/admin/scale.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index 25779d60de48e..6accd8ce1e5b9 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -195,6 +195,9 @@ This dashboard provides insights into various aspects, including: - Resource consumption within Coder workspaces (CPU, memory, network usage) - Internal metrics related to provisioner jobs +Note: Database metrics are disabled by default and can be enabled by setting the +environment variable `CODER_PROMETHEUS_COLLECT_DB_METRICS` to `true`. + It is highly recommended to deploy a solution for centralized log collection and aggregation. The presence of error logs may indicate an underscaled deployment of Coder, necessitating action from operators. 
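To make the note above concrete, here is a minimal sketch of enabling the Prometheus endpoint together with the off-by-default database metrics before starting `coderd`. The bind address shown is an assumption; adjust it to your deployment, and for Helm-based installs set the equivalent environment variables on the coderd pods instead.

```shell
# Sketch: enable Prometheus metrics, including the off-by-default database metrics.
export CODER_PROMETHEUS_ENABLE=true
export CODER_PROMETHEUS_ADDRESS="0.0.0.0:2112"      # assumed bind address; pick one your scraper can reach
export CODER_PROMETHEUS_COLLECT_DB_METRICS=true     # the flag referenced in the note above
coder server
```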
From e91f036057f0b9a073010353e582b37da277a2b2 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Fri, 22 Mar 2024 11:12:55 +0100 Subject: [PATCH 17/21] Fix --- scaletest/templates/kubernetes-minimal/main.tf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scaletest/templates/kubernetes-minimal/main.tf b/scaletest/templates/kubernetes-minimal/main.tf index 6d04fb68a33ed..3bd56046f400b 100644 --- a/scaletest/templates/kubernetes-minimal/main.tf +++ b/scaletest/templates/kubernetes-minimal/main.tf @@ -152,7 +152,7 @@ resource "kubernetes_deployment" "main" { match_expressions { key = "cloud.google.com/gke-nodepool" operator = "In" - values = ["big-workspaces", "big-workspaces2"] + values = ["big-workspaces"] } } } From e9189e3b5ddce92b31db3c5d1974df1b283946b0 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Fri, 22 Mar 2024 11:24:45 +0100 Subject: [PATCH 18/21] Use template vars --- scaletest/templates/kubernetes-large/main.tf | 8 +++++++- scaletest/templates/kubernetes-medium-greedy/main.tf | 8 +++++++- scaletest/templates/kubernetes-medium/main.tf | 8 +++++++- scaletest/templates/kubernetes-minimal/main.tf | 8 +++++++- scaletest/templates/kubernetes-small/main.tf | 8 +++++++- 5 files changed, 35 insertions(+), 5 deletions(-) diff --git a/scaletest/templates/kubernetes-large/main.tf b/scaletest/templates/kubernetes-large/main.tf index 352db67bbcf22..161d4448bab64 100644 --- a/scaletest/templates/kubernetes-large/main.tf +++ b/scaletest/templates/kubernetes-large/main.tf @@ -17,6 +17,12 @@ provider "kubernetes" { config_path = null # always use host } +variable "kubernetes_nodepool_workspaces" { + description = "Kubernetes nodepool for Coder workspaces" + type = string + default = "big-workspaces" +} + data "coder_workspace" "me" {} resource "coder_agent" "main" { @@ -72,7 +78,7 @@ resource "kubernetes_pod" "main" { match_expressions { key = "cloud.google.com/gke-nodepool" operator = "In" - values = ["big-workspaces"] + values = ["${var.kubernetes_nodepool_workspaces}"] } } } diff --git a/scaletest/templates/kubernetes-medium-greedy/main.tf b/scaletest/templates/kubernetes-medium-greedy/main.tf index a0a5dd8742c56..8a70eced34426 100644 --- a/scaletest/templates/kubernetes-medium-greedy/main.tf +++ b/scaletest/templates/kubernetes-medium-greedy/main.tf @@ -17,6 +17,12 @@ provider "kubernetes" { config_path = null # always use host } +variable "kubernetes_nodepool_workspaces" { + description = "Kubernetes nodepool for Coder workspaces" + type = string + default = "big-workspaces" +} + data "coder_workspace" "me" {} resource "coder_agent" "main" { @@ -186,7 +192,7 @@ resource "kubernetes_pod" "main" { match_expressions { key = "cloud.google.com/gke-nodepool" operator = "In" - values = ["big-workspaces"] + values = ["${var.kubernetes_nodepool_workspaces}"] } } } diff --git a/scaletest/templates/kubernetes-medium/main.tf b/scaletest/templates/kubernetes-medium/main.tf index 5dcd9588c1b33..5e3980a0e252e 100644 --- a/scaletest/templates/kubernetes-medium/main.tf +++ b/scaletest/templates/kubernetes-medium/main.tf @@ -17,6 +17,12 @@ provider "kubernetes" { config_path = null # always use host } +variable "kubernetes_nodepool_workspaces" { + description = "Kubernetes nodepool for Coder workspaces" + type = string + default = "big-workspaces" +} + data "coder_workspace" "me" {} resource "coder_agent" "main" { @@ -72,7 +78,7 @@ resource "kubernetes_pod" "main" { match_expressions { key = "cloud.google.com/gke-nodepool" operator = "In" - values = ["big-workspaces"] + values = 
["${var.kubernetes_nodepool_workspaces}"] } } } diff --git a/scaletest/templates/kubernetes-minimal/main.tf b/scaletest/templates/kubernetes-minimal/main.tf index 3bd56046f400b..7ad97f7a89e85 100644 --- a/scaletest/templates/kubernetes-minimal/main.tf +++ b/scaletest/templates/kubernetes-minimal/main.tf @@ -17,6 +17,12 @@ provider "kubernetes" { config_path = null # always use host } +variable "kubernetes_nodepool_workspaces" { + description = "Kubernetes nodepool for Coder workspaces" + type = string + default = "big-workspaces" +} + data "coder_workspace" "me" {} resource "coder_agent" "m" { @@ -152,7 +158,7 @@ resource "kubernetes_deployment" "main" { match_expressions { key = "cloud.google.com/gke-nodepool" operator = "In" - values = ["big-workspaces"] + values = ["${var.kubernetes_nodepool_workspaces}"] } } } diff --git a/scaletest/templates/kubernetes-small/main.tf b/scaletest/templates/kubernetes-small/main.tf index b59e4989544f5..0c81ba245b1df 100644 --- a/scaletest/templates/kubernetes-small/main.tf +++ b/scaletest/templates/kubernetes-small/main.tf @@ -17,6 +17,12 @@ provider "kubernetes" { config_path = null # always use host } +variable "kubernetes_nodepool_workspaces" { + description = "Kubernetes nodepool for Coder workspaces" + type = string + default = "big-workspaces" +} + data "coder_workspace" "me" {} resource "coder_agent" "main" { @@ -72,7 +78,7 @@ resource "kubernetes_pod" "main" { match_expressions { key = "cloud.google.com/gke-nodepool" operator = "In" - values = ["big-workspaces"] + values = ["${var.kubernetes_nodepool_workspaces}"] } } } From 00eecf5cdf80ad434f46770f4fef92d2cbcfb4b7 Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Fri, 22 Mar 2024 11:27:19 +0100 Subject: [PATCH 19/21] Add note --- docs/admin/scale.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index 6accd8ce1e5b9..8f059c0e86c79 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -171,6 +171,9 @@ There are a few cluster options available: | medium | 2 | 2 Gi | None | Medium-sized cluster offers the greedy agent variant. | | large | 4 | 4 Gi | None | | +Note: Review the selected cluster template and edit the node affinity to match +your setup. + #### Greedy agent The greedy agent variant is a template modification that makes the Coder agent From 6a69f7b65671d70af7b1c56cd7e6fe962b4baabc Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Fri, 22 Mar 2024 11:40:58 +0100 Subject: [PATCH 20/21] Fix: nodepool --- scaletest/templates/kubernetes-large/README.md | 4 +++- scaletest/templates/kubernetes-medium-greedy/README.md | 4 +++- scaletest/templates/kubernetes-medium/README.md | 4 +++- scaletest/templates/kubernetes-minimal/README.md | 4 +++- scaletest/templates/kubernetes-small/README.md | 4 +++- 5 files changed, 15 insertions(+), 5 deletions(-) diff --git a/scaletest/templates/kubernetes-large/README.md b/scaletest/templates/kubernetes-large/README.md index 2b0ae5cc296be..5621780243ada 100644 --- a/scaletest/templates/kubernetes-large/README.md +++ b/scaletest/templates/kubernetes-large/README.md @@ -2,4 +2,6 @@ Provisions a large-sized workspace with no persistent storage. -_Requires_: `cloud.google.com/gke-nodepool` = `big-workspaces` +_Note_: It is assumed you will be running workspaces on a dedicated GKE nodepool. +By default, this template sets a node affinity of `cloud.google.com/gke-nodepool` = `big-workspaces`. +The nodepool affinity can be customized with the variable `kubernetes_nodepool_workspaces`. 
diff --git a/scaletest/templates/kubernetes-medium-greedy/README.md b/scaletest/templates/kubernetes-medium-greedy/README.md index 22e94bb262616..d29c36f10da3a 100644 --- a/scaletest/templates/kubernetes-medium-greedy/README.md +++ b/scaletest/templates/kubernetes-medium-greedy/README.md @@ -2,4 +2,6 @@ Provisions a medium-sized workspace with no persistent storage. Greedy agent variant. -_Requires_: `cloud.google.com/gke-nodepool` = `big-workspaces` +_Note_: It is assumed you will be running workspaces on a dedicated GKE nodepool. +By default, this template sets a node affinity of `cloud.google.com/gke-nodepool` = `big-workspaces`. +The nodepool affinity can be customized with the variable `kubernetes_nodepool_workspaces`. diff --git a/scaletest/templates/kubernetes-medium/README.md b/scaletest/templates/kubernetes-medium/README.md index e2d5eae983114..6f63bfb62c25a 100644 --- a/scaletest/templates/kubernetes-medium/README.md +++ b/scaletest/templates/kubernetes-medium/README.md @@ -2,4 +2,6 @@ Provisions a medium-sized workspace with no persistent storage. -_Requires_: `cloud.google.com/gke-nodepool` = `big-workspaces` +_Note_: It is assumed you will be running workspaces on a dedicated GKE nodepool. +By default, this template sets a node affinity of `cloud.google.com/gke-nodepool` = `big-workspaces`. +The nodepool affinity can be customized with the variable `kubernetes_nodepool_workspaces`. diff --git a/scaletest/templates/kubernetes-minimal/README.md b/scaletest/templates/kubernetes-minimal/README.md index a4e76f8b24611..767570337dbf6 100644 --- a/scaletest/templates/kubernetes-minimal/README.md +++ b/scaletest/templates/kubernetes-minimal/README.md @@ -2,4 +2,6 @@ Provisions a minimal-sized workspace with no persistent storage. -_Requires_: `cloud.google.com/gke-nodepool` = `big-workspaces` +_Note_: It is assumed you will be running workspaces on a dedicated GKE nodepool. +By default, this template sets a node affinity of `cloud.google.com/gke-nodepool` = `big-workspaces`. +The nodepool affinity can be customized with the variable `kubernetes_nodepool_workspaces`. diff --git a/scaletest/templates/kubernetes-small/README.md b/scaletest/templates/kubernetes-small/README.md index 56efbb98c3cb3..df5475bd32d70 100644 --- a/scaletest/templates/kubernetes-small/README.md +++ b/scaletest/templates/kubernetes-small/README.md @@ -2,4 +2,6 @@ Provisions a small-sized workspace with no persistent storage. -_Requires_: `cloud.google.com/gke-nodepool` = `big-workspaces` +_Note_: It is assumed you will be running workspaces on a dedicated GKE nodepool. +By default, this template sets a node affinity of `cloud.google.com/gke-nodepool` = `big-workspaces`. +The nodepool affinity can be customized with the variable `kubernetes_nodepool_workspaces`. From 8d1609075076778197e98093ed475b673b3a121c Mon Sep 17 00:00:00 2001 From: Marcin Tojek Date: Fri, 22 Mar 2024 12:21:18 +0100 Subject: [PATCH 21/21] Try: force make gen --- .github/workflows/ci.yaml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index ad21801cbdab4..8db445e798f42 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -604,6 +604,9 @@ jobs: - name: Setup sqlc uses: ./.github/actions/setup-sqlc + - name: make gen + run: "make --output-sync -j -B gen" + - name: Format run: | cd offlinedocs
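For reference, the CI step added in the final patch can be reproduced locally; the flags are standard GNU make options. This sketch assumes the repository's `gen` target and its toolchain (sqlc and related generators) are installed locally.

```shell
# Sketch: run the same code-generation step locally.
#   --output-sync  group the output of parallel jobs so interleaved logs stay readable
#   -j             run jobs in parallel
#   -B             treat all targets as out of date, forcing regeneration
make --output-sync -j -B gen
```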