feat: track resource replacements when claiming a prebuilt workspace #17571
base: main
If you’re using prebuilds to speed up provisioning, unexpected replacements will slow down
workspace startup—even when claiming a prebuilt environment.

For tips on preventing replacements and improving claim performance, see [this guide](https://coder.com/docs/TODO).
Relies on #17580
Force-pushed from 4859410 to 634082b.
Force-pushed from 0322146 to 13168f4.
@@ -75,6 +75,7 @@ message CompletedJob {
repeated provisioner.Resource resources = 2;
repeated provisioner.Timing timings = 3;
repeated provisioner.Module modules = 4;
repeated provisioner.ResourceReplacement resourceReplacements = 5;
I believe this is backwards-compatible since `repeated` fields are effectively optional if nil length.
// nolint:gocritic // Necessary to query all the required data.
ctx = dbauthz.AsSystemRestricted(ctx)
// Since this may be called in a fire-and-forget fashion, we need to give up at some point.
trackCtx, trackCancel := context.WithTimeout(ctx, time.Minute)
This is a best-effort attempt to warn operators of this situation; it's ok if it times out, we'll get a log to trace this with.
@@ -258,7 +258,7 @@ func getStateFilePath(workdir string) string {
 }

 // revive:disable-next-line:flag-parameter
-func (e *executor) plan(ctx, killCtx context.Context, env, vars []string, logr logSink, destroy bool) (*proto.PlanComplete, error) {
+func (e *executor) plan(ctx, killCtx context.Context, env, vars []string, logr logSink, metadata *proto.Metadata) (*proto.PlanComplete, error) {
This is the meat of the feature; everything else is just plumbing between system and user eyeball.
level := proto.LogLevel_INFO

// Terraform indicates that a resource will be deleted and recreated by showing the change along with this substring.
if bytes.Contains(line, []byte("# forces replacement")) {
A bit flimsy; open to other ideas.
In any case, this is just sugar. The fact that the plan, with all its drift details, is shown will be sufficient. Highlighting the lines is just a courtesy to the user.
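As a self-contained illustration of the substring check discussed here, the sketch below scans plan output line by line and reports which lines carry the marker. The function name `replacementLines` and the sample plan text are assumptions made for the example; only the `# forces replacement` substring and the `bytes.Contains` call come from the diff.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// Marker that Terraform prints next to an attribute change that forces a replace.
var forcesReplacement = []byte("# forces replacement")

// replacementLines (hypothetical name) returns the 1-based numbers of the lines
// in planOutput that contain the marker.
func replacementLines(planOutput []byte) []int {
	var hits []int
	sc := bufio.NewScanner(bytes.NewReader(planOutput))
	for n := 1; sc.Scan(); n++ {
		if bytes.Contains(sc.Bytes(), forcesReplacement) {
			hits = append(hits, n)
		}
	}
	return hits
}

func main() {
	// Made-up plan excerpt, for illustration only.
	plan := []byte("  # docker_container.workspace must be replaced\n" +
		"      ~ name = \"a\" -> \"b\" # forces replacement\n" +
		"      ~ env  = []\n")
	fmt.Println(replacementLines(plan))
}
```

A caller could raise the log level for the reported lines, which matches the "courtesy highlighting" described in the comment. Matching on human-readable output remains fragile, as the author notes.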
// TrackResourceReplacement handles a pathological situation whereby a terraform resource is replaced due to drift,
// which can obviate the whole point of pre-provisioning a prebuilt workspace.
// See more detail at https://coder.com/docs/admin/templates/extending-templates/prebuilt-workspaces.md#preventing-resource-replacement.
Depends on #17580
If you’re using prebuilds to speed up provisioning, unexpected replacements will slow down
workspace startup—even when claiming a prebuilt environment.

For tips on preventing replacements and improving claim performance, see [this guide](https://coder.com/docs/admin/templates/extending-templates/prebuilt-workspaces.md#preventing-resource-replacement).
Depends on #17580
@@ -13,43 +16,65 @@ import (
"github.com/coder/coder/v2/coderd/prebuilds"
)

const (
Muddies the purpose of the PR a bit, but it was a worthwhile drive-by refactoring given that we're adding a new metric (`MetricResourceReplacementsCount`) and we need to check its value in a test.
Signed-off-by: Danny Kopping <[email protected]>
…replacement(s) Signed-off-by: Danny Kopping <[email protected]>
also a test that was broken from an earlier fix Signed-off-by: Danny Kopping <[email protected]>
Force-pushed from 13168f4 to 70f9a53.
@@ -42,6 +42,11 @@ FROM templates t
WHERE tvp.desired_instances IS NOT NULL -- Consider only presets that have a prebuild configuration.
AND (t.id = sqlc.narg('template_id')::uuid OR sqlc.narg('template_id') IS NULL);

-- name: GetTemplatePresetsByID :one
- Should the name be singular instead of plural?
- Move it to `presets.sql`?
- There is a similar query, `-- name: GetPresetByID :one`; you may consider reusing it.
Thanks! I missed that.
@@ -75,6 +75,7 @@ message CompletedJob {
repeated provisioner.Resource resources = 2;
repeated provisioner.Timing timings = 3;
repeated provisioner.Module modules = 4;
repeated provisioner.ResourceReplacement resourceReplacements = 5;
We discussed on Slack, but these changes need to bump the minor version --- unless it was already bumped since the last release, in which case you need to update the comment describing the version bump, but don't need to bump it twice in a single release.
There's also `prebuild_claim_for_user_id` in `provisionersdk/proto/provisioner.proto`, but I'm trying to see if I can remove this in favour of passing down whether the `workspace-prebuilds` experiment is in use, since it's effectively just being used as a control flag for that in `provisioner/terraform/executor.go`.
Closes coder/internal#369
We can't know a priori whether a replacement (i.e. drift of terraform state leading to a resource needing to be deleted and recreated) will take place; we can only detect it at `plan` time, because the provider decides whether a resource must be replaced and this cannot be inferred through static analysis of the template.

This is likely to be the most common gotcha with prebuilds, since using them effectively requires a slight template modification, so let's head this off before it becomes an issue for customers.
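To illustrate detection at plan time, here is a minimal sketch that reads Terraform's machine-readable plan (as produced by `terraform show -json <planfile>`) and lists resources whose `actions` indicate a delete-and-recreate. It is not presented as the PR's implementation: the type `planJSON`, the helper names, and the resource addresses in the sample are assumptions for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// planJSON models only the fields this sketch needs from Terraform's JSON plan.
type planJSON struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// isReplace reports whether actions describe a replacement, which Terraform
// encodes as the pair delete+create in either order.
func isReplace(actions []string) bool {
	if len(actions) != 2 {
		return false
	}
	return (actions[0] == "delete" && actions[1] == "create") ||
		(actions[0] == "create" && actions[1] == "delete")
}

// replacedAddresses (hypothetical helper) returns the addresses of all
// resources that the plan would replace.
func replacedAddresses(raw []byte) ([]string, error) {
	var p planJSON
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, fmt.Errorf("decode plan: %w", err)
	}
	var out []string
	for _, rc := range p.ResourceChanges {
		if isReplace(rc.Change.Actions) {
			out = append(out, rc.Address)
		}
	}
	return out, nil
}

func main() {
	// Made-up plan fragment; the addresses are illustrative only.
	raw := []byte(`{"resource_changes":[
		{"address":"docker_container.workspace","change":{"actions":["delete","create"]}},
		{"address":"null_resource.setup","change":{"actions":["update"]}}
	]}`)
	addrs, err := replacedAddresses(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(addrs)
}
```

Because the provider computes these actions during planning, this kind of check only works on a plan, never on the template source alone.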
Drift details will now be logged in the workspace build logs:
Plus a notification will be sent to template admins when this situation arises:
A new metric, `coderd_prebuilt_workspaces_resource_replacements_total`, will also increment each time a workspace encounters replacements.

We only track that a resource replacement occurred, not how many. Just one is enough to ruin a prebuild, but we can't know a priori which replacement would cause this.

For example, say we have 2 replacements: a `docker_container` and a `null_resource`; we don't know which one might cause an issue (or indeed if either would), so we just track the replacement.
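The counting rule above (one increment per affected workspace, regardless of how many resources were replaced) can be sketched as follows. An in-process atomic counter stands in for the real metric, and `recordBuild` and the resource addresses are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// replacementBuilds stands in for the real counter metric (assumption: the
// actual metric is registered elsewhere; this sketch only models its semantics).
var replacementBuilds atomic.Int64

// recordBuild (hypothetical) increments the counter at most once per build,
// no matter how many resources were replaced in that build.
func recordBuild(replaced []string) {
	if len(replaced) > 0 {
		replacementBuilds.Add(1)
	}
}

func main() {
	// Two replacements in a single build still count as one occurrence.
	recordBuild([]string{"docker_container.workspace", "null_resource.setup"})
	// A build without replacements leaves the counter untouched.
	recordBuild(nil)
	fmt.Println(replacementBuilds.Load())
}
```

Counting occurrences rather than resources keeps the metric meaningful when it is unclear which individual replacement actually hurts claim time.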