Member

@Kira-Pilot Kira-Pilot commented Apr 10, 2023

Trying to address #5800 and #6241

Need some help figuring out the best approach here. We've added a new transition, restart, and attempted to amend api.postWorkspaceBuilds so that if a restart is requested, we insert two workspace builds: one for stop and one for start.

I'm worried Provisioner isn't set up to handle this scenario. @mafredri pointed out that if we create two jobs, we may have separate daemons execute each simultaneously.

@Kira-Pilot Kira-Pilot marked this pull request as draft April 10, 2023 19:53
@Kira-Pilot Kira-Pilot requested a review from mafredri April 10, 2023 20:13
Member

@mafredri mafredri left a comment

@Kira-Pilot I think your approach here can work, but ultimately I wonder if we should take a slightly different one.

My thought is that we would introduce a new transition state for workspaces called restart, alongside the existing definitions:

    type WorkspaceTransition string

    const (
        WorkspaceTransitionStart  WorkspaceTransition = "start"
        WorkspaceTransitionStop   WorkspaceTransition = "stop"
        WorkspaceTransitionDelete WorkspaceTransition = "delete"
    )

The motivation behind a new state is that right now, the restart action depends on the Coder API staying up and can be interrupted. Consider issuing a restart, which begins by stopping the workspace. If the coder server is restarted (or updated) while the workspace is stopping, the workspace remains in the stopped state.

A new restart state would let the requested transition be fully recorded, so that we can ensure its completion.

Do you have any thoughts on this @kylecarbs?

Member

@mtojek mtojek left a comment

Side note: the abort transaction error mentioned in the GitHub issue can be fixed by increasing the build number for the "start" job.
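A minimal sketch of the build-number point, for illustration only. The `Build` type and the `expandRestart` helper are hypothetical and are not taken from the coder codebase; the sketch also assumes that the error comes from the stop and start builds being inserted with the same build number.

```go
package main

import "fmt"

// WorkspaceTransition mirrors the type quoted earlier in this thread.
type WorkspaceTransition string

const (
	WorkspaceTransitionStart WorkspaceTransition = "start"
	WorkspaceTransitionStop  WorkspaceTransition = "stop"
)

// Build is a hypothetical, simplified stand-in for a workspace build row.
type Build struct {
	Number     int32
	Transition WorkspaceTransition
}

// expandRestart (hypothetical helper) turns one restart request into a stop
// build followed by a start build. The start build gets the next build number
// after the stop build, so the two inserts never share a number.
func expandRestart(lastBuildNumber int32) []Build {
	return []Build{
		{Number: lastBuildNumber + 1, Transition: WorkspaceTransitionStop},
		{Number: lastBuildNumber + 2, Transition: WorkspaceTransitionStart},
	}
}

func main() {
	for _, b := range expandRestart(7) {
		fmt.Printf("build %d: %s\n", b.Number, b.Transition)
	}
}
```

Distinct build numbers only address the insert error; they do not stop two daemons from running the stop and start jobs at the same time.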

Regarding the overall design, the "restart" transition is purely virtual, and could do the trick as long as the provisionerserver doesn't acquire conflicting jobs. With the PR as it stands, stop and start are executed at the same time:

Screenshot 2023-04-13 at 10 47 48

and the "stop" job fails:

Screenshot 2023-04-13 at 10 48 19

... or does nothing:

Screenshot 2023-04-13 at 10 50 49

Fun fact:

a similar inconsistency can be observed for the "start" job. It can just add a new resource:

Screenshot 2023-04-13 at 10 52 20

... or add the resource and destroy the old one (depending on the race with "stop" job):

Screenshot 2023-04-13 at 10 53 12

This inconsistency is kind of funny, because for some (all?) providers like Docker it's absolutely fine to just call /workspacebuilds with transition=start. I'm afraid that to fully solve the race, you need to implement mutual exclusion between provisioner jobs, which means tweaking the provisionerserver. I strongly recommend doing that in a separate PR.
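One way to picture mutual exclusion between provisioner jobs is the sketch below. It is an in-memory illustration under stated assumptions, not the provisionerserver's actual acquisition logic: the `Job` type, the status values, and the `acquire` function are all hypothetical, and the rule assumed here is "at most one running job per workspace".

```go
package main

import "fmt"

// JobStatus is a hypothetical, simplified job state.
type JobStatus string

const (
	JobPending JobStatus = "pending"
	JobRunning JobStatus = "running"
)

// Job is a hypothetical, simplified provisioner job.
type Job struct {
	ID          int
	WorkspaceID string
	Status      JobStatus
}

// acquire returns the first pending job (in queue order) whose workspace has
// no job currently running, and marks it as running. It returns nil when no
// job is eligible.
func acquire(queue []*Job) *Job {
	busy := make(map[string]bool)
	for _, j := range queue {
		if j.Status == JobRunning {
			busy[j.WorkspaceID] = true
		}
	}
	for _, j := range queue {
		if j.Status == JobPending && !busy[j.WorkspaceID] {
			j.Status = JobRunning
			return j
		}
	}
	return nil
}

func main() {
	queue := []*Job{
		{ID: 1, WorkspaceID: "ws-a", Status: JobPending}, // stop
		{ID: 2, WorkspaceID: "ws-a", Status: JobPending}, // start
	}
	first := acquire(queue)  // picks the stop job
	second := acquire(queue) // start is blocked while stop is running
	fmt.Println(first.ID, second == nil)
}
```

The sketch is single-threaded. With several daemons, the eligibility check and the status update would have to happen atomically, for example inside one database transaction.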

@mafredri Please double-check if this all makes sense 👍

@mafredri
Member

@mtojek great insights. I like your proposal of limiting the acquisition of conflicting jobs in the provisioner(s). One thing that worries me, though: if stop fails, we should probably abort the start job? But this brings me to another idea for potentially solving the conflict:

Provisioner job dependencies. I.e. a job could depend on another job. Meaning we wouldn't pick up a job if it's dependent on a job that hasn't completed. A toggle for requiring success or simply completion (to continue after fail) of the dependent job could also be added. (Motivation for toggle: Possibility to implement "force restart".)
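The dependency idea, including the success-versus-completion toggle, could look roughly like the sketch below. Everything in it is hypothetical: the `Job` fields `DependsOn` and `RequireSuccess`, the status values, and the `canAcquire` function are illustrative names, not an existing provisioner API.

```go
package main

import "fmt"

// JobStatus is a hypothetical, simplified job state.
type JobStatus string

const (
	JobPending   JobStatus = "pending"
	JobRunning   JobStatus = "running"
	JobSucceeded JobStatus = "succeeded"
	JobFailed    JobStatus = "failed"
)

// Job is a hypothetical provisioner job with an optional dependency.
type Job struct {
	ID             int
	Status         JobStatus
	DependsOn      int  // 0 means the job has no dependency
	RequireSuccess bool // false: run once the dependency completed, even if it failed
}

// canAcquire reports whether a daemon may pick up the job.
func canAcquire(j Job, byID map[int]Job) bool {
	if j.Status != JobPending {
		return false
	}
	if j.DependsOn == 0 {
		return true
	}
	dep, ok := byID[j.DependsOn]
	if !ok {
		return false // unknown dependency: stay conservative
	}
	switch dep.Status {
	case JobSucceeded:
		return true
	case JobFailed:
		return !j.RequireSuccess // the toggle that would enable "force restart"
	default:
		return false // dependency is still pending or running
	}
}

func main() {
	jobs := map[int]Job{
		1: {ID: 1, Status: JobFailed},                                      // stop
		2: {ID: 2, Status: JobPending, DependsOn: 1, RequireSuccess: true}, // start
		3: {ID: 3, Status: JobPending, DependsOn: 1},                       // forced start
	}
	fmt.Println(canAcquire(jobs[2], jobs), canAcquire(jobs[3], jobs))
}
```

In this sketch a strict start job stays blocked after a failed stop, while a start job without `RequireSuccess` becomes eligible as soon as the stop job has completed.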

@mtojek
Member

mtojek commented Apr 13, 2023

> Provisioner job dependencies. I.e. a job could depend on another job. Meaning we wouldn't pick up a job if it's dependent on a job that hasn't completed. A toggle for requiring success or simply completion (to continue after fail) of the dependent job could also be added. (Motivation for toggle: Possibility to implement "force restart".)

It is a valid point and good motivation for feature expansion. I'm a bit concerned about the provisioner job being left in the queue forever (if the dependent job failed), so maybe provisionerserver can pick it up and simply discard it.

@mafredri
Member

> It is a valid point and good motivation for feature expansion. I'm a bit concerned about the provisioner job being left in the queue forever (if the dependent job failed), so maybe provisionerserver can pick it up and simply discard it.

Yeah, that's about how I'd imagine it would behave as well. Or rather than discard, mark the dependent job as failed too.
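Marking blocked jobs as failed, rather than leaving them queued forever, might be sketched as follows. The `Job` type and the `failDependents` function are hypothetical, and the sketch assumes every dependency is strict, meaning the dependent job needs its dependency to succeed.

```go
package main

import "fmt"

// JobStatus is a hypothetical, simplified job state.
type JobStatus string

const (
	JobPending JobStatus = "pending"
	JobFailed  JobStatus = "failed"
)

// Job is a hypothetical provisioner job with an optional strict dependency.
type Job struct {
	ID        int
	Status    JobStatus
	DependsOn int // 0 means the job has no dependency
	Error     string
}

// failDependents marks every pending job that depends, directly or
// transitively, on a failed job as failed too, so nothing stays queued
// forever. It returns how many jobs it marked.
func failDependents(jobs []*Job) int {
	byID := make(map[int]*Job, len(jobs))
	for _, j := range jobs {
		byID[j.ID] = j
	}
	marked := 0
	// Repeat until a full pass changes nothing, which covers chains of
	// dependencies regardless of the order of the slice.
	for changed := true; changed; {
		changed = false
		for _, j := range jobs {
			if j.Status != JobPending || j.DependsOn == 0 {
				continue
			}
			if dep, ok := byID[j.DependsOn]; ok && dep.Status == JobFailed {
				j.Status = JobFailed
				j.Error = fmt.Sprintf("dependency %d failed", dep.ID)
				marked++
				changed = true
			}
		}
	}
	return marked
}

func main() {
	jobs := []*Job{
		{ID: 1, Status: JobFailed},                // stop failed
		{ID: 2, Status: JobPending, DependsOn: 1}, // start waits on stop
	}
	n := failDependents(jobs)
	fmt.Println(n, jobs[1].Status, jobs[1].Error)
}
```

Recording the reason on the dependent job keeps the failure visible to the user instead of silently discarding the start job.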

@Kira-Pilot
Member Author

@mafredri and @mtojek - thanks so much for your feedback! I spoke with @sreya and we've decided to close this PR for now in favor of #7137, which is an entirely FE solution. The concern is that this feature is quite small and does not warrant a Provisioner update. That might change in the future, in which case I'd resurrect this implementation. Let me know if you have any concerns!

@Kira-Pilot Kira-Pilot closed this Apr 14, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Apr 14, 2023
@mafredri
Member

@Kira-Pilot sounds good. That is what the CLI currently does too, so we should deal with it at some point, but it doesn't have to be now. 👍🏻

@Kira-Pilot Kira-Pilot deleted the workspace-restart-button/kira-pilot branch September 25, 2023 15:19