docs: add new scaling doc to best practices section #15904

Merged: 18 commits, Jan 21, 2025
37 changes: 21 additions & 16 deletions docs/admin/infrastructure/scale-testing.md
@@ -5,35 +5,37 @@ without compromising service. This process encompasses infrastructure setup,
traffic projections, and aggressive testing to identify and mitigate potential
bottlenecks.

-A dedicated Kubernetes cluster for Coder is recommended to configure, host and
+A dedicated Kubernetes cluster for Coder is recommended to configure, host, and
manage Coder workloads. Kubernetes provides container orchestration
capabilities, allowing Coder to efficiently deploy, scale, and manage workspaces
across a distributed infrastructure. This ensures high availability, fault
tolerance, and scalability for Coder deployments. Coder is deployed on this
cluster using the
[Helm chart](../../install/kubernetes.md#4-install-coder-with-helm).

+For more information about scaling, see our [Coder scaling best practices](../../tutorials/best-practices/scale-coder.md).
+
## Methodology

Our scale tests include the following stages:

1. Prepare environment: create expected users and provision workspaces.

-2. SSH connections: establish user connections with agents, verifying their
+1. SSH connections: establish user connections with agents, verifying their
ability to echo back received content.

-3. Web Terminal: verify the PTY connection used for communication with Web
+1. Web Terminal: verify the PTY connection used for communication with Web
Terminal.

-4. Workspace application traffic: assess the handling of user connections with
+1. Workspace application traffic: assess the handling of user connections with
specific workspace apps, confirming their capability to echo back received
content effectively.

-5. Dashboard evaluation: verify the responsiveness and stability of Coder
+1. Dashboard evaluation: verify the responsiveness and stability of Coder
dashboards under varying load conditions. This is achieved by simulating user
interactions using instances of headless Chromium browsers.

-6. Cleanup: delete workspaces and users created in step 1.
+1. Cleanup: delete workspaces and users created in step 1.

## Infrastructure and setup requirements

@@ -54,13 +56,16 @@ channel for IDEs with VS Code and JetBrains plugins.
The basic setup of scale tests environment involves:

1. Scale tests runner (32 vCPU, 128 GB RAM)
-2. Coder: 2 replicas (4 vCPU, 16 GB RAM)
-3. Database: 1 instance (2 vCPU, 32 GB RAM)
-4. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)
+1. Coder: 2 replicas (4 vCPU, 16 GB RAM)
+1. Database: 1 instance (2 vCPU, 32 GB RAM)
+1. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)

+The test is deemed successful if:
+
-The test is deemed successful if users did not experience interruptions in their
-workflows, `coderd` did not crash or require restarts, and no other internal
-errors were observed.
+- Users did not experience interruptions in their
+workflows,
+- `coderd` did not crash or require restarts, and
+- No other internal errors were observed.
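
Review note: as a rough sanity check on the basic setup listed earlier in this hunk, the component sizes can be summed to see what the test environment needs in total. This is a minimal sketch and not part of the PR. It assumes a single scale tests runner, that each figure in parentheses is per instance, and that 512 MB is written as 0.5 GB. `setup_totals` is a hypothetical helper name.

```shell
#!/bin/sh
# Sum vCPU and RAM across the basic scale-test setup.
# Columns: component, instance count, vCPU per instance, RAM per instance (GB).
# Assumptions: one runner; figures are per instance; 512 MB treated as 0.5 GB.
setup_totals() {
  LC_ALL=C awk -F, '
    { vcpu += $2 * $3; ram += $2 * $4 }
    END { printf "total_vcpu=%g total_ram_gb=%g\n", vcpu, ram }
  ' <<'EOF'
runner,1,32,128
coder,2,4,16
database,1,2,32
provisioner,50,0.5,0.5
EOF
}

setup_totals
```

Under those assumptions, the 50 provisioners together account for 25 vCPU and 25 GB, more CPU than the two `coderd` replicas combined.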

## Traffic Projections

@@ -90,11 +95,11 @@ Database:

## Available reference architectures

-[Up to 1,000 users](./validated-architectures/1k-users.md)
+- [Up to 1,000 users](./validated-architectures/1k-users.md)

-[Up to 2,000 users](./validated-architectures/2k-users.md)
+- [Up to 2,000 users](./validated-architectures/2k-users.md)

-[Up to 3,000 users](./validated-architectures/3k-users.md)
+- [Up to 3,000 users](./validated-architectures/3k-users.md)

## Hardware recommendation

@@ -107,7 +112,7 @@ guidance on optimal configurations. A reasonable approach involves using scaling
formulas based on factors like CPU, memory, and the number of users.

While the minimum requirements specify 1 CPU core and 2 GB of memory per
-`coderd` replica, it is recommended to allocate additional resources depending
+`coderd` replica, we recommend that you allocate additional resources depending
on the workload size to ensure deployment stability.

#### CPU and memory usage
53 changes: 35 additions & 18 deletions docs/admin/infrastructure/scale-utility.md
@@ -1,20 +1,23 @@
# Scale Tests and Utilities

-We scale-test Coder with [a built-in utility](#scale-testing-utility) that can
+We scale-test Coder with a built-in utility that can
be used in your environment for insights into how Coder scales with your
-infrastructure. For scale-testing Kubernetes clusters we recommend to install
+infrastructure. For scale-testing Kubernetes clusters we recommend that you install
and use the dedicated Coder template,
[scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner).

Learn more about [Coder’s architecture](./architecture.md) and our
[scale-testing methodology](./scale-testing.md).

+For more information about scaling, see our [Coder scaling best practices](../../tutorials/best-practices/scale-coder.md).
+
## Recent scale tests

-> Note: the below information is for reference purposes only, and are not
-> intended to be used as guidelines for infrastructure sizing. Review the
-> [Reference Architectures](./validated-architectures/index.md#node-sizing) for
-> hardware sizing recommendations.
+The information in this doc is for reference purposes only, and is not intended
+to be used as guidelines for infrastructure sizing.
+
+Review the [Reference Architectures](./validated-architectures/index.md#node-sizing) for
+hardware sizing recommendations.

| Environment | Coder CPU | Coder RAM | Coder Replicas | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested |
|------------------|-----------|-----------|----------------|-------------------|-------|-------------------|---------------------------------------|---------------|--------------|
@@ -25,26 +28,32 @@ Learn more about [Coder’s architecture](./architecture.md) and our
| Kubernetes (GKE) | 4 cores | 16 GB | 2 | db-custom-8-30720 | 2000 | 50 | 2000 simulated | `v2.8.4` | Feb 28, 2024 |
| Kubernetes (GKE) | 2 cores | 4 GB | 2 | db-custom-2-7680 | 1000 | 50 | 1000 simulated | `v2.10.2` | Apr 26, 2024 |

-> Note: a simulated connection reads and writes random data at 40KB/s per
-> connection.
+> Note: A simulated connection reads and writes random data at 40KB/s per connection.

## Scale testing utility

Since Coder's performance is highly dependent on the templates and workflows you
support, you may wish to use our internal scale testing utility against your own
environments.

-> Note: This utility is experimental. It is not subject to any compatibility
-> guarantees, and may cause interruptions for your users. To avoid potential
-> outages and orphaned resources, we recommend running scale tests on a
-> secondary "staging" environment or a dedicated
-> [Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/terraform).
-> Run it against a production environment at your own risk.
+<blockquote class="admonition important">
+
+This utility is experimental.
+
+It is not subject to any compatibility guarantees and may cause interruptions
+for your users.
+To avoid potential outages and orphaned resources, we recommend that you run
+scale tests on a secondary "staging" environment or a dedicated
+[Kubernetes playground cluster](https://github.com/coder/coder/tree/main/scaletest/terraform).
+
+Run it against a production environment at your own risk.
+
+</blockquote>

### Create workspaces

The following command will provision a number of Coder workspaces using the
-specified template and extra parameters.
+specified template and extra parameters:

```shell
coder exp scaletest create-workspaces \
@@ -56,8 +65,6 @@ coder exp scaletest create-workspaces \
--job-timeout 5h \
--no-cleanup \
--output json:"${SCALETEST_RESULTS_DIR}/create-workspaces.json"
-
-# Run `coder exp scaletest create-workspaces --help` for all usage
```

The command does the following:
@@ -70,6 +77,12 @@ The command does the following:
1. If you don't want the creation process to be interrupted by any errors, use
the `--retry 5` flag.

+For more built-in `scaletest` options, use the `--help` flag:
+
+```shell
+coder exp scaletest create-workspaces --help
+```
+
### Traffic Generation

Given an existing set of workspaces created previously with `create-workspaces`,
@@ -105,7 +118,11 @@ The `workspace-traffic` supports also other modes - SSH traffic, workspace app:
1. For SSH traffic: Use `--ssh` flag to generate SSH traffic instead of Web
Terminal.
1. For workspace app traffic: Use `--app [wsdi|wsec|wsra]` flag to select app
-behavior. (modes: _WebSocket discard_, _WebSocket echo_, _WebSocket read_).
+behavior.
+
+- `wsdi`: WebSocket discard
+- `wsec`: WebSocket echo
+- `wsra`: WebSocket read
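
Review note: the three `--app` abbreviations are easy to mix up in wrapper scripts. The sketch below shows one way a script might validate a mode name before passing it to `--app`, using only the mapping given in the list above. `app_mode_description` is a hypothetical helper for illustration and is not part of the `coder` CLI.

```shell
#!/bin/sh
# Map a --app mode abbreviation to the description given in the doc.
# Hypothetical helper: prints the description, or fails on an unknown mode.
app_mode_description() {
  case "$1" in
    wsdi) echo "WebSocket discard" ;;
    wsec) echo "WebSocket echo" ;;
    wsra) echo "WebSocket read" ;;
    *)    echo "unknown app mode: $1" >&2; return 1 ;;
  esac
}

app_mode_description wsec
```

A wrapper could call the helper first and only run the traffic command when it returns success, so a typo in the mode name fails fast instead of reaching the scale test.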

### Cleanup

20 changes: 15 additions & 5 deletions docs/manifest.json
@@ -243,6 +243,11 @@
"title": "Scaling Utilities",
"description": "Tools to help you scale your deployment",
"path": "./admin/infrastructure/scale-utility.md"
+},
+{
+"title": "Scaling best practices",
+"description": "How to prepare a Coder deployment for scale",
+"path": "./tutorials/best-practices/scale-coder.md"
}
]
},
@@ -761,16 +766,21 @@
"description": "Guides to help you make the most of your Coder experience",
"path": "./tutorials/best-practices/index.md",
"children": [
-{
-"title": "Security - best practices",
-"description": "Make your Coder deployment more secure",
-"path": "./tutorials/best-practices/security-best-practices.md"
-},
{
"title": "Organizations - best practices",
"description": "How to make the best use of Coder Organizations",
"path": "./tutorials/best-practices/organizations.md"
},
+{
+"title": "Scale Coder",
+"description": "How to prepare a Coder deployment for scale",
+"path": "./tutorials/best-practices/scale-coder.md"
+},
+{
+"title": "Security - best practices",
+"description": "Make your Coder deployment more secure",
+"path": "./tutorials/best-practices/security-best-practices.md"
+},
{
"title": "Speed up your workspaces",
"description": "Speed up your Coder templates and workspaces",