feat: Add provisionerdaemon to coderd #141

kylecarbs · 2022-02-02T01:21:52Z

Integrates provisionerd into coderd. This modifies workspace history tests to provision resources with Terraform, and ensure the result was successful.

These will be used for streaming logs, checking status, and other operations related to workspace and project history.

Nesting all structs behind their respective structures is leaky, and promotes naming conflicts between handlers. Our HTTP routes cannot have conflicts, so neither should function naming.

codecov · 2022-02-02T01:23:57Z

Codecov Report

Merging #141 (aa760cb) into main (35dd208) will increase coverage by 2.50%.
The diff coverage is 54.24%.

@@            Coverage Diff             @@
##             main     #141      +/-   ##
==========================================
+ Coverage   64.92%   67.43%   +2.50%     
==========================================
  Files          55      101      +46     
  Lines         633     5098    +4465     
  Branches       68       68              
==========================================
+ Hits          411     3438    +3027     
- Misses        210     1349    +1139     
- Partials       12      311     +299

Flag	Coverage Δ
unittest-go-macos-latest	`64.33% <50.82%> (?)`
unittest-go-ubuntu-latest	`66.53% <51.68%> (?)`
unittest-js	`64.92% <ø> (ø)`

Impacted Files	Coverage Δ
coderd/workspaces.go	`66.35% <ø> (ø)`
codersdk/provisioners.go	`0.00% <0.00%> (ø)`
peer/conn.go	`79.32% <0.00%> (ø)`
coderd/provisionerdaemons.go	`47.33% <47.33%> (ø)`
coderd/workspacehistory.go	`55.13% <56.62%> (ø)`
codersdk/projects.go	`70.00% <57.14%> (ø)`
provisionersdk/transport.go	`55.31% <57.14%> (ø)`
provisionerd/provisionerd.go	`71.64% <60.52%> (ø)`
coderd/projecthistory.go	`65.21% <62.50%> (ø)`
httpmw/workspacehistoryparam.go	`68.88% <70.00%> (ø)`
... and 50 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35dd208...aa760cb. Read the comment docs.

coderd/provisionerdaemons.go

bryphe-coder · 2022-02-02T01:32:57Z

coderd/provisioners.go

+	CancelledAt *time.Time               `json:"canceled_at,omitempty"`
+	CompletedAt *time.Time               `json:"completed_at,omitempty"`


Curious why both CancelledAt and CompletedAt are needed? I'm thinking there could be just FinishedAt - and it would be the completion time on success, and cancel time if cancelled.

bryphe-coder · 2022-02-02T01:33:07Z

coderd/provisioners.go

+	ProvisionerJobStatusPending   ProvisionerJobStatus = "pending"
+	ProvisionerJobStatusRunning   ProvisionerJobStatus = "running"
+	ProvisionerJobStatusSucceeded ProvisionerJobStatus = "succeeded"
+	ProvisionerJobStatusFailed    ProvisionerJobStatus = "failed"


Should there be a state for cancelled?

I suppose I'm not sure. I felt like failed would represent cancelled, but I suppose that's a different state. I'm going to add it in!

coderd/workspacehistory.go

bryphe-coder · 2022-02-02T01:49:19Z

coderd/workspacehistory.go

 	WorkspaceID      uuid.UUID                    `json:"workspace_id"`
 	ProjectHistoryID uuid.UUID                    `json:"project_history_id"`
 	BeforeID         uuid.UUID                    `json:"before_id"`
 	AfterID          uuid.UUID                    `json:"after_id"`
 	Transition       database.WorkspaceTransition `json:"transition"`
 	Initiator        string                       `json:"initiator"`
+	Provision        ProvisionerJob               `json:"provision"`


Cool, looks like this type w/ Provision now has most of the stuff I need to power the workspaces timeline in my mock!

bryphe-coder · 2022-02-02T01:50:42Z

coderd/workspacehistory.go

@@ -22,13 +23,13 @@ type WorkspaceHistory struct {
 	ID               uuid.UUID                    `json:"id"`
 	CreatedAt        time.Time                    `json:"created_at"`
 	UpdatedAt        time.Time                    `json:"updated_at"`
-	CompletedAt      time.Time                    `json:"completed_at"`
 	WorkspaceID      uuid.UUID                    `json:"workspace_id"`
 	ProjectHistoryID uuid.UUID                    `json:"project_history_id"`
 	BeforeID         uuid.UUID                    `json:"before_id"`
 	AfterID          uuid.UUID                    `json:"after_id"`
 	Transition       database.WorkspaceTransition `json:"transition"`
 	Initiator        string                       `json:"initiator"`


Not an issue with this PR, but I worry that this schema will make it difficult for a user to change names. That username-change will have to percolate through all these places we're storing their name.

This is a user ID. It's not very clear, so I should add a comment.

We can't use the uuid.UUID type, because we have to support v1 users.

bryphe-coder

Just had a few thoughts and questions that I had while reading through - nothing blocking though.

Nice work on connecting up these pieces!

There is a race condition in the interop between the websocket and `dRPC`: https://github.com/coder/coder/runs/5038545709?check_suite_focus=true#step:7:117 - it seems both the websocket and dRPC feel like they own the `byte[]` being sent between them. This can lead to data races, in which both `dRPC` and the websocket are writing. This is just tracking some experimentation to fix that race condition ## Run results: ## - Run 1: peer test failure - Run 2: peer test failure - Run 3: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040858460?check_suite_focus=true#step:8:45 ``` status code 412: The provided project history is running. Wait for it to complete importing!` ``` - Run 4: `TestWorkspaceHistory/CreateHistory` - https://github.com/coder/coder/runs/5040957999?check_suite_focus=true#step:7:176 ``` workspacehistory_test.go:122: Error Trace: workspacehistory_test.go:122 Error: Condition never satisfied Test: TestWorkspaceHistory/CreateHistory ``` - Run 5: peer failure - Run 6: Pass ✅ - Run 7: Peer failure ## Open Questions: ## ### Is `dRPC` or `websocket` at fault for the data race? It looks like this condition is specifically happening when `dRPC` decides to [`SendError`]). This constructs a new byte payload from [`MarshalError`](https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcwire/error.go#L15) - so `dRPC` has created this buffer and owns it. From `dRPC`'s perspective, the callstack looks like this: - [`sendPacket`](https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcstream/stream.go#L253) - [`writeFrame`](https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcwire/writer.go#L65) - [`AppendFrame`](https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcwire/packet.go#L128) - with finally the data race happening here: ```go // AppendFrame appends a marshaled form of the frame to the provided buffer. func AppendFrame(buf []byte, fr Frame) []byte { ... out := buf out = append(out, control). // <--------- ``` This should be fine, since `dPRC` create this buffer, and is taking the byte buffer constructed from `MarshalError` and tacking a bunch of headers on it to create a proper frame. Once `dRPC` is done writing, it _hangs onto the buffer and resets it here__: https://github.com/storj/drpc/blob/f6e369438f636b47ee788095d3fc13062ffbd019/drpcwire/writer.go#L73 However... the websocket implementation, once it gets the buffer, it runs a `statelessDeflate` [here](https://github.com/nhooyr/websocket/blob/8dee580a7f74cf1713400307b4eee514b927870f/write.go#L180), which compresses the buffer on the fly. This functionality actually [mutates the buffer in place](https://github.com/klauspost/compress/blob/a1a9cfc821f00faf2f5231beaa96244344d50391/flate/stateless.go#L94), which is where get our race. In the case where the `byte[]` aren't being manipulated anywhere else, this compress-in-place operation would be safe, and that's probably the case for most over-the-wire usages. In this case, though, where we're plumbing `dRPC` -> websocket, they both are manipulating it (`dRPC` is reusing the buffer for the next `write`, and `websocket` is compressing on the fly). ### Why does cloning on `Read` fail? Get a bunch of errors like: ``` 2022/02/02 19:26:10 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0 2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF 2022/02/02 19:26:25 [ERR] yamux: Failed to read header: unexpected EOF 2022/02/02 19:26:25 [WARN] yamux: frame for missing stream: Vsn:0 Type:0 Flags:0 StreamID:0 Length:0 ``` # UPDATE: We decided we could disable websocket compression, which would avoid the race because the in-place `deflate` operaton would no longer be run. Trying that out now: - Run 1: ✅ - Run 2: https://github.com/coder/coder/runs/5042645522?check_suite_focus=true#step:8:338 - Run 3: ✅ - Run 4: https://github.com/coder/coder/runs/5042988758?check_suite_focus=true#step:7:168 - Run 5: ✅

Found another data race while running the tests: https://github.com/coder/coder/runs/5044320845?check_suite_focus=true#step:7:83 __Issue:__ There is a race in the p.acquiredJobDone chan - in particular, there can be a case where we're waiting on the channel to finish (in close) with <-p.acquiredJobDone, but in parallel, an acquireJob could've been started, which would create a new channel for p.acquiredJobDone. There is a similar race in `close(..)`ing the channel, which also came up in test runs. __Fix:__ Instead of recreating the channel everytime, we can use `sync.WaitGroup` to accomplish the same functionality - a semaphore to make close wait for the current job to wrap up.

This is an attempted fix for failures like: https://github.com/coder/coder/runs/5043435263?check_suite_focus=true#step:7:32 Looking at the timing of the test: ``` t.go:56: 2022-02-02 21:33:21.964 [DEBUG] (terraform-provisioner) <provision.go:139> ran apply t.go:56: 2022-02-02 21:33:21.991 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running t.go:56: 2022-02-02 21:33:22.050 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running t.go:56: 2022-02-02 21:33:22.090 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running t.go:56: 2022-02-02 21:33:22.140 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running t.go:56: 2022-02-02 21:33:22.195 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running t.go:56: 2022-02-02 21:33:22.240 [DEBUG] (provisionerd) <provisionerd.go:162> skipping acquire; job is already running workspacehistory_test.go:122: Error Trace: workspacehistory_test.go:122 Error: Condition never satisfied Test: TestWorkspaceHistory/CreateHistory ``` It appears that the `terraform apply` job had just finished - with less than a second to spare until our `require.Eventually` completes - but there's still work to be done (ie, collecting the state files). So my suspicion is that terraform might, in some cases, exceed our 5s timeout. Note that in the setup for this test - there is a similar project history wait that waits for 15s, so I borrowed that here. In the future - we can look at potentially using a simple echo provider to exercise this in the unit test, in a way that is more reliable in terms of timing. I'll log an issue to track that.

kylecarbs added 8 commits February 1, 2022 18:19

feat: Add history middleware parameters

ea69016

These will be used for streaming logs, checking status, and other operations related to workspace and project history.

refactor: Move all HTTP routes to top-level struct

7a9f714

Nesting all structs behind their respective structures is leaky, and promotes naming conflicts between handlers. Our HTTP routes cannot have conflicts, so neither should function naming.

Add provisioner daemon routes

33ecab0

Add periodic updates

7987596

Skip pubsub if short

bc27864

Merge branch 'main' into provisionerdaemon

c448e99

Return jobs with WorkspaceHistory

d062b5b

Add endpoints for extracting singular history

0a83477

kylecarbs requested a review from bryphe-coder February 2, 2022 01:21

kylecarbs self-assigned this Feb 2, 2022

kylecarbs force-pushed the provisionerdaemon branch from 6606e01 to 0a1623f Compare February 2, 2022 01:24

bryphe-coder reviewed Feb 2, 2022

View reviewed changes

coderd/provisionerdaemons.go Show resolved Hide resolved

bryphe-coder reviewed Feb 2, 2022

View reviewed changes

coderd/workspacehistory.go Show resolved Hide resolved

bryphe-coder reviewed Feb 2, 2022

View reviewed changes

bryphe-coder approved these changes Feb 2, 2022

View reviewed changes

kylecarbs force-pushed the provisionerdaemon branch 3 times, most recently from cb2733b to 3e1e60c Compare February 2, 2022 03:26

The full end-to-end operation works

e87f31d

kylecarbs force-pushed the provisionerdaemon branch 7 times, most recently from 7eb5519 to 319f640 Compare February 2, 2022 04:15

kylecarbs force-pushed the provisionerdaemon branch 7 times, most recently from 53802fb to f26bdbb Compare February 2, 2022 13:35

Merge branch 'main' into provisionerdaemon

2a76b60

kylecarbs force-pushed the provisionerdaemon branch from f26bdbb to 2a76b60 Compare February 2, 2022 13:45

bryphe-coder and others added 4 commits February 2, 2022 13:32

Merge branch 'main' into provisionerdaemon

aa760cb

kylecarbs enabled auto-merge (squash) February 3, 2022 20:23

kylecarbs merged commit e75bde4 into main Feb 3, 2022

kylecarbs deleted the provisionerdaemon branch February 3, 2022 20:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add provisionerdaemon to coderd #141

feat: Add provisionerdaemon to coderd #141

Uh oh!

kylecarbs commented Feb 2, 2022

Uh oh!

codecov bot commented Feb 2, 2022 •

edited

Loading

Uh oh!

Uh oh!

bryphe-coder Feb 2, 2022

Uh oh!

bryphe-coder Feb 2, 2022

Uh oh!

kylecarbs Feb 2, 2022

Uh oh!

Uh oh!

bryphe-coder Feb 2, 2022

Uh oh!

bryphe-coder Feb 2, 2022

Uh oh!

kylecarbs Feb 2, 2022

Uh oh!

bryphe-coder left a comment

Uh oh!

Uh oh!

		CancelledAt *time.Time `json:"canceled_at,omitempty"`
		CompletedAt *time.Time `json:"completed_at,omitempty"`

feat: Add provisionerdaemon to coderd #141

feat: Add provisionerdaemon to coderd #141

Uh oh!

Conversation

kylecarbs commented Feb 2, 2022

Uh oh!

codecov bot commented Feb 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

bryphe-coder Feb 2, 2022

Choose a reason for hiding this comment

Uh oh!

bryphe-coder Feb 2, 2022

Choose a reason for hiding this comment

Uh oh!

kylecarbs Feb 2, 2022

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bryphe-coder Feb 2, 2022

Choose a reason for hiding this comment

Uh oh!

bryphe-coder Feb 2, 2022

Choose a reason for hiding this comment

Uh oh!

kylecarbs Feb 2, 2022

Choose a reason for hiding this comment

Uh oh!

bryphe-coder left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov bot commented Feb 2, 2022 •

edited

Loading