feat: Allow running standalone provisioner daemons #3563

Closed
dwahler wants to merge 10 commits from dwahler/provisionerd

Conversation

Contributor

@dwahler dwahler commented Aug 18, 2022

This PR adds support for running out-of-process provisioner daemon instances, authenticated by tokens (basically the same way we authenticate workspace agents).

Example usage

# Run coderd with the in-process provisioners disabled
$ coder server --address http://127.0.0.1:3000 --provisioner-daemons 0
[...]
$ coder provisioners create foobar
A new provisioner daemon has been registered.

Start the provisioner daemon with the following command:

coder provisioners run --token 033dc6c6-48a6-4316-83e3-536433a57521

$ coder provisioners run --token 033dc6c6-48a6-4316-83e3-536433a57521
[...]

Other notable things about this change:

  • ProvisionerDaemon API resources now have an auth_token field, which is null for in-process provisioners. External connections must include a non-null token in the session_token cookie. To avoid leaking tokens, we now prevent non-owner users from listing provisioner daemons through the API.
  • For more convenient testing, adds the ability to pass flags to coder server using ./scripts/develop.sh -- <flags...>.
  • Since we want to include the echo provisioner when running tests, but not when running in production, the provisioner daemon now performs a Connect RPC as its first action on each connection to coderd. This message is used to register the set of supported provisioner names and store them in the provisioner_daemons.provisioners database field.
  • As a side effect of the previous point, the provisioner_daemons.updated_at field, which was previously never set, is now updated on every provisioner daemon connection.

Fixes #1391, fixes #1392, fixes #1393, fixes #1605

@github-actions

This Pull Request is becoming stale. In order to minimize WIP, prevent merge conflicts and keep the tracker readable, I'm going to close this PR in 3 days if there isn't more activity.

@github-actions github-actions bot added the stale label Aug 26, 2022
@kylecarbs kylecarbs removed the stale label Aug 29, 2022
@dwahler dwahler marked this pull request as ready for review August 30, 2022 00:50
@dwahler dwahler requested a review from a team as a code owner August 30, 2022 00:50
@dwahler dwahler requested review from presleyp and a team and removed request for a team August 30, 2022 00:50
Member

@mafredri mafredri left a comment

Awesome job on this PR! In general I think it looks really good, despite the number of comments I left.

return exitErr
},
}
defaultCacheDir := filepath.Join(os.TempDir(), "coder-cache")
Member

Suggestion (optional): We could consider fixing #2534 (partially) here too? Something like adrg/xdg and using xdg.CacheHome could work (I only took a quick look, and it seems pretty fully-featured).

Then again, perhaps we should do a more thorough fix all at once (i.e., respect XDG elsewhere too, such as for config).

Contributor Author

Great suggestion, but I think this changeset is already kind of big and sprawling as it is (it fixes four separate issues) so I would lean toward doing that in a separate PR.

require.ErrorIs(t, err, context.Canceled, "provisioner command terminated with error")
}()

ctx, cancelFunc := context.WithCancel(context.Background())
Member

Consider allowing tests to time out individually, e.g.:

Suggested change
- ctx, cancelFunc := context.WithCancel(context.Background())
+ ctx, cancelFunc := context.WithTimeout(context.Background(), testutil.WaitLong)

@@ -203,7 +203,8 @@ CREATE TABLE provisioner_daemons (
     created_at timestamp with time zone NOT NULL,
     updated_at timestamp with time zone,
     name character varying(64) NOT NULL,
-    provisioners provisioner_type[] NOT NULL
+    provisioners provisioner_type[] NOT NULL,
+    auth_token uuid
Member

What's the use case for allowing auth_token to be NULL? Do we want to be able to revoke an auth token without deleting the provisioner daemon? Maybe we want to look up which provisioner daemon had a specific auth token at some point? In that case it could make sense to have a deleted bool field instead of a nullable auth token.

Maybe auth tokens should be renewable too? That could also work as a delete/create using the same name (would also leave a "trace" about which auth token has produced what builds).

Another alternative would be to have provisioner_daemon_auth_tokens with fields like daemon_id, token, granted, revoked.

Just putting out some ideas, since I wasn't sure of the purpose of the nullability.

Contributor Author

Whoops, this could have used a better explanation. It doesn't have anything to do with changing or revoking auth tokens.

With this change, we support both in-process and out-of-process provisioners, both of which are persisted in the database. An in-process provisioner has a NULL auth_token to represent the fact that external connections are not allowed to "become" that provisioner. That seemed cleaner than assigning a token that would never be used.
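
To illustrate the semantics described above, here is a minimal sketch in which a nil token can never match a presented credential. The `daemon` type and `findByToken` function are hypothetical simplifications, and the in-memory slice stands in for the database lookup.

```go
package main

import "fmt"

// daemon is a hypothetical, simplified view of a provisioner_daemons row.
type daemon struct {
	Name      string
	AuthToken *string // nil for in-process provisioners
}

// findByToken returns the daemon whose token equals the presented one.
// Daemons with a nil token never match, so an external connection
// cannot "become" an in-process provisioner.
func findByToken(daemons []daemon, presented string) (daemon, bool) {
	for _, d := range daemons {
		if d.AuthToken != nil && *d.AuthToken == presented {
			return d, true
		}
	}
	return daemon{}, false
}

func main() {
	token := "033dc6c6-48a6-4316-83e3-536433a57521"
	daemons := []daemon{
		{Name: "in-process"},                   // AuthToken is nil
		{Name: "foobar", AuthToken: &token}, // registered externally
	}

	d, ok := findByToken(daemons, token)
	fmt.Println(d.Name, ok)

	_, ok = findByToken(daemons, "")
	fmt.Println(ok)
}
```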

api.websocketWaitMutex.Unlock()
defer api.websocketWaitGroup.Done()

conn, err := websocket.Accept(rw, r, &websocket.AcceptOptions{
Member

Do consider that r.Context() is invalid from this point onwards (due to the HTTP hijack). So if it's relied upon for connection closure or cancellation, it will not work.

You can consider using func websocketNetConn to rewire the context below instead of websocket.NetConn(...).

}

errCh := make(chan error, 1)
provisionerDaemon, err := newProvisionerDaemon(ctx, client.ListenProvisionerDaemon, logger, cacheDir, errCh, useEchoProvisioner)
Member

I'm not saying it has to happen here, but since the provisioner has a name, consider using filepath.Join(cacheDir, "provisionerd", name). Perhaps in newProvisionerDaemon.

This will allow multiple provisioners to run on the same machine without potentially breaking terraform init.

return user
}

// ExtractWorkspaceAgent requires authentication using a valid provisioner token.
Member

Suggested change
- // ExtractWorkspaceAgent requires authentication using a valid provisioner token.
+ // ExtractProvisionerDaemon requires authentication using a valid provisioner token.

}
token, err := uuid.Parse(cookie.Value)
if err != nil {
httpapi.Write(rw, http.StatusUnauthorized, codersdk.Response{
Member

I'd consider this a bad request, but perhaps there's a reason it's unauthorized?

Suggested change
- httpapi.Write(rw, http.StatusUnauthorized, codersdk.Response{
+ httpapi.Write(rw, http.StatusBadRequest, codersdk.Response{

Contributor Author

This is mostly just for consistency with ExtractWorkspaceAgent, but I think a 401 error is reasonable here.

From a client's point of view, a token is an opaque string, and our implementation happens to generate tokens that look like UUIDs. Since we return a 401 error if the cookie is a valid UUID but isn't a token that exists in the DB, it makes sense to return the same error if it's not a valid UUID (and therefore can't be a valid token).
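
The behavior argued for above can be sketched as a middleware that answers a missing, malformed, or unknown token with the same 401. This is an illustration only, not the PR's `ExtractProvisionerDaemon`: the regular expression is a standard-library stand-in for `uuid.Parse`, the `knownTokens` map is a hypothetical replacement for the database lookup, and `requireProvisionerToken` and `statusFor` are made-up names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"regexp"
)

// uuidPattern stands in for uuid.Parse so the sketch needs only the
// standard library.
var uuidPattern = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// knownTokens is a hypothetical in-memory replacement for the DB lookup.
var knownTokens = map[string]bool{
	"033dc6c6-48a6-4316-83e3-536433a57521": true,
}

// requireProvisionerToken rejects missing, malformed, and unknown
// tokens with the same 401 response, treating the token as opaque.
func requireProvisionerToken(next http.Handler) http.Handler {
	return http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		cookie, err := r.Cookie("session_token")
		if err != nil || !uuidPattern.MatchString(cookie.Value) || !knownTokens[cookie.Value] {
			http.Error(rw, "Provisioner token is invalid.", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(rw, r)
	})
}

// statusFor sends one request through the middleware and returns the
// resulting status code. An empty token means "send no cookie".
func statusFor(token string) int {
	h := requireProvisionerToken(http.HandlerFunc(func(rw http.ResponseWriter, _ *http.Request) {
		rw.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if token != "" {
		req.AddCookie(&http.Cookie{Name: "session_token", Value: token})
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("not-a-uuid"))                           // 401
	fmt.Println(statusFor("00000000-0000-0000-0000-000000000000")) // 401
	fmt.Println(statusFor("033dc6c6-48a6-4316-83e3-536433a57521")) // 200
}
```

Because the client only ever sees "invalid", the response does not reveal whether a guess was at least well-formed.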

if err != nil {
if errors.Is(err, sql.ErrNoRows) {
httpapi.Write(rw, http.StatusUnauthorized, codersdk.Response{
Message: "Provisioner token is invalid.",
Member

This message is a bit misleading: the token has a valid format, but it is either not registered or has been revoked. Simply saying "Forbidden." could work.

Contributor Author

Like my other comment, this is basically just mimicking the way we handle agent tokens. I think if we get a token that isn't equal to the auth_token of a valid provisioner, we should generate the same error message regardless of whether it happens to be formatted like a UUID.

But I'm totally open to changing the wording if there's a better way to describe that situation than "invalid".

Contributor

In the context of auth, I think "invalid" encompasses both format and non-format problems with the credential. Wording here is fine IMO

Member

@spikecurtis I agree from a security perspective (don't reveal too much), but from a usability perspective I think a more specific message could be more helpful to the user. I'm fine either way, though.

}
conn, res, err := websocket.Dial(ctx, serverURL.String(), &websocket.DialOptions{
HTTPClient: httpClient,
// Need to disable compression to avoid a data-race.
Member

This isn't your code, but I wonder if anyone knows what the data race is with compression enabled? 😄

return err
}

if provisionerDaemon.AuthToken == nil {
Member

Kind of related to my other question about nullability of the auth_token field, but why would this ever be allowed to happen during create?

Contributor Author

Yeah, it shouldn't be possible. A null auth_token would indicate that the provisioner was incorrectly registered as "in-process" rather than "out-of-process". But if that does somehow happen, this is just a sanity check so that we generate a meaningful error message rather than panicking.

Member

Could we make it so the API errors instead (and doesn't return a nullable UUID)? (I think that would be nicer for consumers in general.)

Contributor

@presleyp presleyp left a comment

Frontend ✅

Member

@Emyrk Emyrk left a comment

Just a general question: if coderd or provisionerd goes down, do they reconnect?

}

func (api *API) postProvisionerDaemon(rw http.ResponseWriter, r *http.Request) {
// Create the user on the site.
Member

Incorrect comment

Suggested change
- // Create the user on the site.

@@ -141,6 +141,9 @@ func (p *Server) connect(ctx context.Context) {
if p.isClosed() {
return
}

p.sendConnectRequest(ctx)
Member

Can this fail? If it does, is it ok to just go to the for loop?

@github-actions

github-actions bot commented Sep 7, 2022

This Pull Request is becoming stale. In order to minimize WIP, prevent merge conflicts and keep the tracker readable, I'm going to close this PR in 3 days if there isn't more activity.

@github-actions github-actions bot added the stale label Sep 7, 2022
@dwahler dwahler removed the stale label Sep 9, 2022
@github-actions

This Pull Request is becoming stale. In order to minimize WIP, prevent merge conflicts and keep the tracker readable, I'm going to close this PR in 3 days if there isn't more activity.

@github-actions github-actions bot added the stale label Sep 17, 2022
@bpmct
Member

bpmct commented Sep 19, 2022

Tried it and it worked like a charm (both as a separate process and on a different machine). Some feedback and questions below. These don't have to be addressed in this PR.

Wondering if we should show another message instead of Queued when there are no provisioners present. Maybe even the place in line could help?

codertester@coder-v2:/tmp/docker$ coder templates create
> Create and upload "/tmp/docker"? (yes/no) yes
⧗  Queued 

Is there a way to "assign" a workspace to a provisioner daemon? Some use cases I have in mind:

  • Specific provisioner/Docker host is used for workspaces
  • Template requires a specific provisioner daemon that has credentials

Can we add a coder provisioners ls?

@github-actions github-actions bot removed the stale label Sep 20, 2022
@github-actions

This Pull Request is becoming stale. In order to minimize WIP, prevent merge conflicts and keep the tracker readable, I'm going to close this PR in 3 days if there isn't more activity.

@github-actions github-actions bot added the stale label Sep 27, 2022
@github-actions github-actions bot closed this Sep 30, 2022
@github-actions github-actions bot deleted the dwahler/provisionerd branch March 1, 2023 00:13
Labels
stale
Projects
None yet
7 participants