feat(agent/agentcontainers): retry with longer name on failure #18513
Closes coder/internal#732

We now retry (up to 5 times) when creating an agent whose name is derived from the workspace folder and that name is already taken. This flow only ever runs when creating an agent named after the workspace folder; if a deployment uses Terraform or the devcontainer customization, we do not fall back to this approach.
Pull Request Overview
This PR adds a retry mechanism for sub-agent creation when using the workspace-folder-derived name fails due to unique constraint violations, and covers the new behavior with tests.
- Implement retry loop (up to 5 attempts) with increasingly expanded names on unique-constraint errors.
- Track when the workspace-folder name is used (`usingWorkspaceFolderName`) and fall back to `expandedAgentName`.
- Extend the fake client and add tests (`TestSubAgentCreationWithNameRetry` and `TestExpandedAgentName`) to validate collision handling.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| agent/agentcontainers/api.go | Add `usingWorkspaceFolderName`, a retry loop with `expandedAgentName`, and PQ error handling |
| agent/agentcontainers/api_test.go | Enhance `fakeSubAgentClient` for conflict simulation and add retry tests |
| agent/agentcontainers/api_internal_test.go | Add tests for `expandedAgentName` logic at various depths and edge cases |
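The table mentions tests for `expandedAgentName` at various depths. Its body is not shown on this page, so the following is a hypothetical approximation, not the PR's implementation. It assumes the name is built from the last `depth+1` non-empty path components joined by `-`, lowercased, with characters outside `[a-z0-9]` replaced by `-`, and that it falls back to the friendly name when the folder is too shallow. The `depth: 1` case producing `coder-project` matches the test case quoted later in the review.

```go
package main

import (
	"fmt"
	"strings"
)

// expandedAgentNameSketch is a hypothetical approximation of expandedAgentName.
func expandedAgentNameSketch(workspaceFolder, friendlyName string, depth int) string {
	// Split on '/', dropping empty components (e.g. from "///home").
	parts := strings.FieldsFunc(workspaceFolder, func(r rune) bool { return r == '/' })
	if depth+1 > len(parts) {
		// Not enough components to build a longer name (assumed fallback).
		return friendlyName
	}
	name := strings.ToLower(strings.Join(parts[len(parts)-depth-1:], "-"))
	// Replace anything outside [a-z0-9] with '-'.
	name = strings.Map(func(r rune) rune {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			return r
		}
		return '-'
	}, name)
	// Collapse runs of '-' and trim them from both ends.
	name = strings.Join(strings.FieldsFunc(name, func(r rune) bool { return r == '-' }), "-")
	if name == "" {
		return friendlyName
	}
	return name
}

func main() {
	for depth := 0; depth <= 3; depth++ {
		fmt.Println(depth, expandedAgentNameSketch("///home/coder/project", "friendly-fallback", depth))
	}
	// 0 project
	// 1 coder-project
	// 2 home-coder-project
	// 3 friendly-fallback
}
```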
Comments suppressed due to low confidence (1)
agent/agentcontainers/api.go:22

- The retry code uses `errors.As`, but the standard `errors` package is not imported, causing a compile error. Add `import "errors"`.

```go
"github.com/lib/pq"
```
I suggested some changes to state handling and to finalizing the name if creation succeeds (assuming it's not the container's name).
agent/agentcontainers/api.go
```
@@ -591,6 +593,7 @@ func (api *API) processUpdatedContainersLocked(ctx context.Context, updated code
	// agent name based off of the folder name (i.e. no valid characters),
	// we will instead fall back to using the container's friendly name.
	dc.Name = safeAgentName(path.Base(filepath.ToSlash(dc.WorkspaceFolder)), dc.Container.FriendlyName)
	api.usingWorkspaceFolderName[dc.WorkspaceFolder] = struct{}{}
```
Should this check `dc.Name != dc.Container.FriendlyName`? It would also be good to ensure there's no conflict with `api.devcontainerNames`, so that we don't accidentally create this agent before the predetermined one and then trigger a reverse conflict.
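The suggested check could look roughly like this. It is a sketch under assumptions: `shouldTrackFolderName` is a hypothetical helper, and `api.devcontainerNames` is modeled as a `map[string]bool` of names that are already claimed.

```go
package main

import "fmt"

// shouldTrackFolderName reports whether the computed name really came from
// the workspace folder and is safe to claim: it must not be the container's
// friendly name (the fallback) and must not collide with a name reserved by
// another devcontainer.
func shouldTrackFolderName(name, friendlyName string, devcontainerNames map[string]bool) bool {
	if name == friendlyName {
		return false
	}
	return !devcontainerNames[name]
}

func main() {
	reserved := map[string]bool{"api": true}
	fmt.Println(shouldTrackFolderName("project", "eager_turing", reserved))      // true
	fmt.Println(shouldTrackFolderName("eager_turing", "eager_turing", reserved)) // false
	fmt.Println(shouldTrackFolderName("api", "eager_turing", reserved))          // false
}
```

A name that collides with a reserved one would still need a different name before creation; how to pick it is left out of this sketch.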
Oh yep good point. Will do 👍
agent/agentcontainers/api.go
```go
recreateSuccessTimes     map[string]time.Time       // By workspace folder.
recreateErrorTimes       map[string]time.Time       // By workspace folder.
injectedSubAgentProcs    map[string]subAgentProcess // By workspace folder.
usingWorkspaceFolderName map[string]struct{}        // By workspace folder.
```
Suggestion: While `struct{}` works and takes up less space, using a boolean would be a bit cleaner IMO, but up to you.
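For comparison, the two set styles differ mainly in how lookups read. A small illustrative sketch; the helper names are hypothetical:

```go
package main

import "fmt"

// inStructSet needs the comma-ok form: the empty struct value carries no information.
func inStructSet(set map[string]struct{}, key string) bool {
	_, ok := set[key]
	return ok
}

// inBoolSet can use a plain index: missing keys yield false.
func inBoolSet(set map[string]bool, key string) bool {
	return set[key]
}

func main() {
	structSet := map[string]struct{}{"/workspaces/project": {}}
	boolSet := map[string]bool{"/workspaces/project": true}
	fmt.Println(inStructSet(structSet, "/workspaces/project"), inBoolSet(boolSet, "/workspaces/project")) // true true
	fmt.Println(inStructSet(structSet, "/other"), inBoolSet(boolSet, "/other"))                           // false false
}
```

One subtlety of the boolean form: assigning `false` leaves the key in the map (it still counts toward `len`), so `delete` is the way to drop membership in either style.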
Happy to make that change 👍
agent/agentcontainers/api.go
```go
// We increase how much of the workspace folder is used for generating
// the agent name. With each iteration there is greater chance of this
// being successful.
subAgentConfig.Name = expandedAgentName(dc.WorkspaceFolder, dc.Container.FriendlyName, attempt)
```
Assuming this isn't the container's friendly name and creation succeeds on the next iteration, we should write it to `api.devcontainerNames` and avoid re-evaluating it, so that it no longer changes.
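One way to read that suggestion, as a hypothetical sketch: `namer`, `nameFor`, and `finalize` are illustrative, `reserved` stands in for `api.devcontainerNames` (assumed to be a `map[string]bool`), and `chosen` is an assumed per-folder record of the finalized name.

```go
package main

import "fmt"

// namer models the state involved; all names here are hypothetical.
type namer struct {
	reserved map[string]bool   // stand-in for api.devcontainerNames
	chosen   map[string]string // workspace folder -> finalized agent name
}

// nameFor returns the finalized name for folder if there is one, and only
// otherwise calls compute to derive a fresh candidate.
func (n *namer) nameFor(folder string, compute func() string) string {
	if name, ok := n.chosen[folder]; ok {
		return name
	}
	return compute()
}

// finalize records a successfully created name so later passes reuse it.
// The container's friendly name is not recorded.
func (n *namer) finalize(folder, name, friendlyName string) {
	if name == friendlyName {
		return
	}
	n.reserved[name] = true
	n.chosen[folder] = name
}

func main() {
	n := &namer{reserved: map[string]bool{}, chosen: map[string]string{}}
	n.finalize("/home/coder/project", "coder-project", "eager_turing")
	computed := 0
	name := n.nameFor("/home/coder/project", func() string { computed++; return "project" })
	fmt.Println(name, computed, n.reserved["coder-project"]) // coder-project 0 true
}
```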
Sounds good to me
```go
//
// We only care if sub agent creation has failed due to a unique constraint
// violation on the agent name, as we can _possibly_ rectify this.
if !strings.Contains(err.Error(), "workspace agent name") {
```
`errors.As` is a bit tricky but typically works if you have a pointer to the type, like:

```go
myErr := &someError{}
// or maybe: var myErr *someError
if errors.As(err, &myErr) {
	// ...
}
```

It's very possible I got something wrong there as well.
That is what I was trying. I'll try again because I'm not happy with the solution I ended up with.
I'm not entirely sure why it didn't work, but it could be because the error is wrapped in `xerrors.Errorf` and then transported over the wire via drpc. Is it possible that some of that information was lost during serialization and deserialization?
coder/coderd/agentapi/subagent.go, lines 106 to 108 in 03d473f

```go
if err != nil {
	return nil, xerrors.Errorf("insert sub agent: %w", err)
}
```
```go
for attempt := 1; attempt <= maxAttempts; attempt++ {
	if proc.agent, err = client.Create(ctx, subAgentConfig); err == nil {
		break
```
Perhaps we can delete from `usingWorkspaceFolderName` here?
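If the entry is removed once creation succeeds, Go's built-in `delete` keeps the cleanup simple: per the language spec it is a no-op when the key is absent or the map is nil, so no existence check is needed. A minimal sketch with a hypothetical `clearFolderName` helper, assuming the map becomes `map[string]bool` per the earlier boolean suggestion:

```go
package main

import "fmt"

// clearFolderName forgets that folder is using its workspace-folder name,
// for example once creation has succeeded.
func clearFolderName(usingWorkspaceFolderName map[string]bool, folder string) {
	delete(usingWorkspaceFolderName, folder) // no-op if the key is absent or the map is nil
}

func main() {
	using := map[string]bool{"/home/coder/project": true}
	clearFolderName(using, "/home/coder/project")
	clearFolderName(using, "/never/added")      // safe: key absent
	clearFolderName(nil, "/home/coder/project") // safe: nil map
	fmt.Println(len(using)) // 0
}
```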
agent/agentcontainers/api.go
```go
originalName := subAgentConfig.Name
maxAttempts := 5

for attempt := 1; attempt <= maxAttempts; attempt++ {
```
Do we know all sub-agent names for the workspace at this point? If so, could we not do this check in-memory before trying the creation?
Sort of. We know all sub-agent names for this agent. There could be another parent agent that creates dev containers (not really sure why you would, but it is possible).
I'm happy to update the logic slightly to first create a known unique name for this parent agent, and then fall back to adding more context again?
Most workspaces are going to only have a single top-level agent. It makes sense to avoid this round trip if we can.
That's fair. I'll try to update the logic to do the best it can beforehand (and keep this fallback).
This logic is already a fallback from the assumed default of only one devcontainer per workspace (spun up with `devcontainer up` instead of defined in Terraform), so I think it probably isn't going to be a hot path anyway.
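The in-memory pre-check discussed in this thread could be as simple as the following hypothetical sketch (`firstFreeName` and the candidate list are illustrative). It only sees sub-agents known to this parent agent, so a server-side collision is still possible and the retry loop remains the fallback.

```go
package main

import "fmt"

// firstFreeName returns the first candidate, in order of increasing
// expansion, that none of this parent agent's known sub-agents use.
func firstFreeName(candidates []string, known map[string]bool) (string, bool) {
	for _, c := range candidates {
		if !known[c] {
			return c, true
		}
	}
	return "", false
}

func main() {
	known := map[string]bool{"project": true}
	name, ok := firstFreeName([]string{"project", "coder-project", "home-coder-project"}, known)
	fmt.Println(name, ok) // coder-project true
}
```

For the common single-parent workspace this avoids the extra round trip entirely; the server retry then only matters when another parent agent already holds the name.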
```go
{
	name:            "path with multiple leading slashes",
	workspaceFolder: "///home/coder/project",
	friendlyName:    "friendly-fallback",
	depth:           1,
	expected:        "coder-project",
},
```
For funsies, how about a Windows-style path?

`C:\Documents and Settings\My Username\Documents\Code\Some Project Version 3\`

We can skip if you don't think that's valuable.
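One reason such a case might be worth covering: if the name is derived via `path.Base(filepath.ToSlash(...))` as in the line quoted earlier, then on Linux `filepath.ToSlash` leaves backslashes untouched (the OS separator is already `/`), so `path.Base` sees no separators and returns the whole string. The sketch below replaces backslashes explicitly, an illustrative choice rather than what the PR does, to show the difference independent of the host OS.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const winFolder = `C:\Documents and Settings\My Username\Documents\Code\Some Project Version 3\`

// baseRaw applies path.Base directly; package path only treats '/' as a separator.
func baseRaw(p string) string { return path.Base(p) }

// baseNormalized converts backslashes to '/' first, then takes the last element.
func baseNormalized(p string) string { return path.Base(strings.ReplaceAll(p, `\`, "/")) }

func main() {
	fmt.Println(baseRaw(winFolder) == winFolder) // true
	fmt.Println(baseNormalized(winFolder))       // Some Project Version 3
}
```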
Looks great, nice work!