chore(docs): tweak replica verbiage on reference architectures #16076


Merged: 1 commit merged into `main` from `scaletesting-typo` on Jan 14, 2025

Conversation

@stirby (Collaborator) commented Jan 8, 2025

A seller noted that the `/` separator made the node count hard to interpret.

```diff
 | Users       | Node capacity       | Replicas                 | GCP             | AWS        | Azure             |
 |-------------|---------------------|--------------------------|-----------------|------------|-------------------|
-| Up to 1,000 | 2 vCPU, 8 GB memory | 1-2 / 1 coderd each      | `n1-standard-2` | `t3.large` | `Standard_D2s_v3` |
+| Up to 1,000 | 2 vCPU, 8 GB memory | 1-2 nodes, 1 coderd each | `n1-standard-2` | `t3.large` | `Standard_D2s_v3` |
```
Member

Suggested change:

```diff
-| Up to 1,000 | 2 vCPU, 8 GB memory | 1-2 nodes, 1 coderd each | `n1-standard-2` | `t3.large` | `Standard_D2s_v3` |
+| Up to 1,000 | 2 vCPU, 8 GB memory | 1-2 nodes                | `n1-standard-2` | `t3.large` | `Standard_D2s_v3` |
```

Is it technically possible to run more than 1 coderd on each node? If yes, does this benefit any of the use cases or customers? Why would someone run multiple coderd on a single node?

Member

> Is it technically possible to run more than 1 coderd on each node?

Yes, this can happen automatically during a rollout or during node unavailability.
Note that we do set a pod anti-affinity rule [1] in our Helm chart to prefer spreading out replicas across multiple nodes.
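For illustration, a "preferred" pod anti-affinity rule of this general shape looks something like the sketch below. The weight, topology key, and labels here are assumptions on my part; the chart's actual rule is at [1]:

```yaml
affinity:
  podAntiAffinity:
    # "Preferred" is a soft rule: the scheduler tries to put replicas on
    # different nodes, but will still co-locate them if no other node fits.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname   # spread across nodes
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/instance
                operator: In
                values:
                  - coder   # assumed release label; the chart's selector may differ
```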

> If yes, does this benefit any of the use cases or customers?
> Why would someone run multiple coderd on a single node?

As far as I'm aware, the main reason to do this would be redundancy in case one or more pods become unavailable for whatever reason.

The only other reason I could imagine for running multiple replicas on a single node is to spread out connections across more coderd replicas to minimize the user-facing impact of a single pod failing. However, this won't protect against a failure of the underlying node.
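As a sketch, running multiple replicas is just a matter of raising the replica count in the Helm values; `replicaCount` is the conventional name for this knob, so check the chart's values.yaml for the actual key:

```yaml
coder:
  # Two coderd replicas; with the preferred anti-affinity rule above,
  # the scheduler will try to place them on different nodes.
  replicaCount: 2
```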

I'll defer to @spikecurtis to weigh in more on the pros and cons of running multiple replicas per node.

[1] https://github.com/coder/coder/blob/main/helm/coder/values.yaml#L223-L237

Contributor

In any reference architecture we should always recommend having 1 coderd per node.

There are generally two reasons to run multiple replicas: fault tolerance and scale.

For fault tolerance, you want the replicas spread out across different failure domains. Having all replicas on the same node means you aren't tolerant of node-level faults. There might still be some residual value in being tolerant to replica-level faults (e.g. software crashes, OOM kills), but most people would rather have the higher fault tolerance.

For scale, coderd is written to take advantage of multiple CPU cores in one process, so there is no scale advantage to putting multiple coderd instances on a single node. In fact, it's likely bad for scale, since you have multiple processes competing for resources plus the extra overhead of coderd-to-coderd communication.
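If someone did want to enforce one coderd per node rather than just prefer it, a required anti-affinity rule is the generic Kubernetes way to express that. This is a hypothetical sketch, not something the chart ships, and the label selector is assumed:

```yaml
affinity:
  podAntiAffinity:
    # "Required" is a hard rule: the scheduler will never place two matching
    # pods on the same node, leaving any extra replica Pending instead.
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: coder   # assumed pod label; match the chart's labels
```

The trade-off is that a hard rule can leave replicas unschedulable during rollouts or node maintenance, which is presumably why the chart defaults to the softer "preferred" variant.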

@stirby merged commit 5380690 into main on Jan 14, 2025 (28 checks passed).
@stirby deleted the scaletesting-typo branch on January 14, 2025.
The github-actions bot locked and limited conversation to collaborators on Jan 14, 2025.