Workspace restart button/kira pilot #7070
Conversation
@Kira-Pilot I think your approach here can work, but ultimately I wonder if we should take a slightly different one.
My thought is that we would introduce a new transition state for workspaces called `restart` (coder/codersdk/workspacebuilds.go, lines 14 to 20 in 4dd5d79):

```go
type WorkspaceTransition string

const (
	WorkspaceTransitionStart  WorkspaceTransition = "start"
	WorkspaceTransitionStop   WorkspaceTransition = "stop"
	WorkspaceTransitionDelete WorkspaceTransition = "delete"
)
```
The motivation behind a new state is that right now, the restart action is dependent on the Coder API and can be interrupted. Consider what happens if we issue a `restart` and begin by stopping: while the workspace is stopping, the coder server is restarted (or updated), and the workspace then remains in the stopped state. The new restart state would allow the full transition to be registered up front so that we can ensure its completion.
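A minimal sketch of what that could look like in codersdk; the `WorkspaceTransitionRestart` constant is the proposed addition, not part of the current API:

```go
package codersdk

type WorkspaceTransition string

const (
	WorkspaceTransitionStart  WorkspaceTransition = "start"
	WorkspaceTransitionStop   WorkspaceTransition = "stop"
	WorkspaceTransitionDelete WorkspaceTransition = "delete"
	// WorkspaceTransitionRestart is the proposed addition: recorded up
	// front so a stop-then-start sequence survives a coderd restart and
	// can be resumed to completion.
	WorkspaceTransitionRestart WorkspaceTransition = "restart"
)
```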
Do you have any thoughts on this @kylecarbs?
Side note: the abort transaction error mentioned in the GitHub issue can be fixed by increasing the build number for the "start" job.
Regarding the overall design, the "restart" transition is purely virtual, and could do the trick if the provisionerserver doesn't acquire jobs that are conflicting. Otherwise, `stop` and `start` are executed at the same time, and the "stop" job either fails or does nothing (screenshots omitted).
Fun fact: a similar inconsistency can be observed for the "start" job. It can just add a new resource, or add the resource and destroy the old one, depending on the race with the "stop" job (screenshots omitted).
This inconsistency is kind of funny, because for some (all?) providers like Docker it's absolutely fine to just call `/workspacebuilds` with `transition=start`. I'm afraid that to fully solve the racing challenge, you need to implement mutual exclusion between provisioner jobs, which means tweaking the provisionerserver (see the sketch below). I strongly recommend doing that in a separate PR.
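To make the mutual-exclusion idea concrete, here is a minimal in-memory sketch. All names here are hypothetical; in the real provisionerserver, job acquisition goes through the database, so the guard would belong in the acquire query rather than in process memory:

```go
package provisionerserver

import "sync"

// workspaceLocks is a hypothetical per-workspace guard: a provisioner
// daemon may only acquire a job if no other job for the same workspace
// is currently in flight.
type workspaceLocks struct {
	mu   sync.Mutex
	busy map[string]bool // workspace ID -> job in flight
}

func newWorkspaceLocks() *workspaceLocks {
	return &workspaceLocks{busy: make(map[string]bool)}
}

// tryAcquire reports whether the workspace was free and is now claimed.
// A false result means a conflicting job is running and this job should
// stay queued.
func (l *workspaceLocks) tryAcquire(workspaceID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.busy[workspaceID] {
		return false
	}
	l.busy[workspaceID] = true
	return true
}

// release frees the workspace once its job completes or fails.
func (l *workspaceLocks) release(workspaceID string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.busy, workspaceID)
}
```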
@mafredri Please double-check if this all makes sense 👍
@mtojek great insights. I like your proposal of limiting the acquisition of conflicting jobs in the provisioner(s). One thing this makes me think of, though, is provisioner job dependencies: a job could depend on another job, meaning we wouldn't pick up a job if the job it depends on hasn't completed. A toggle for requiring success or simply completion (to continue after failure) of the depended-on job could also be added. (Motivation for the toggle: the possibility to implement "force restart".) A sketch of the idea follows below.
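As an illustration of that dependency idea, here is a minimal sketch; the fields and statuses are hypothetical, not the current provisioner job schema:

```go
package provisionerserver

// provisionerJob is a hypothetical, trimmed-down job record used only
// to illustrate job dependencies.
type provisionerJob struct {
	ID             string
	DependsOn      string // ID of the job this one waits for ("" = none)
	RequireSuccess bool   // if false, completion (even failure) is enough
}

type jobStatus int

const (
	statusPending jobStatus = iota
	statusRunning
	statusSucceeded
	statusFailed
)

// canAcquire reports whether a job is eligible to run given the status
// of the job it depends on. Setting RequireSuccess to false is what
// would enable a "force restart": the start job runs even if the stop
// job failed.
func canAcquire(job provisionerJob, statusOf func(id string) jobStatus) bool {
	if job.DependsOn == "" {
		return true
	}
	switch statusOf(job.DependsOn) {
	case statusSucceeded:
		return true
	case statusFailed:
		return !job.RequireSuccess
	default:
		return false // dependency still pending or running
	}
}
```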
It is a valid point and good motivation for feature expansion. I'm a bit concerned about a provisioner job being left in the queue forever (if the job it depends on failed), so maybe provisionerserver can pick it up and simply discard it.
Yeah, that's about how I'd imagine it would behave as well. Or rather than discard, mark the dependent job as failed too.
@mafredri and @mtojek - thanks so much for your feedback! I spoke with @sreya and we've decided to close this PR for now in favor of #7137, which is an entirely front-end solution. The concern is that this feature is quite small and does not warrant a provisioner update. That might change in the future, in which case I'd resurrect this implementation. Let me know if you have any concerns!
@Kira-Pilot sounds good. That is what the CLI currently does too, so we should deal with it at some point, but it doesn't have to be now. 👍🏻
Trying to address #5800 and #6241.

Need some help figuring out the best approach here. We've added a new transition, `restart`, and attempted to amend `api.postWorkspaceBuilds` such that if a `restart` is requested, we insert two workspace builds: one for `stop` and one for `start` (a rough sketch of that shape follows below). I'm worried Provisioner isn't set up to handle this scenario: @mafredri pointed out that if we create two jobs, we may have separate daemons execute each simultaneously.
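For reference, a hypothetical sketch of the two-build insert described above; the names and types are stand-ins, not the real coderd database layer. Giving each build its own consecutive number is also what avoids the abort-transaction error on the build-number constraint mentioned earlier:

```go
package coderd

// buildInserter is a hypothetical stand-in for the transactional
// database layer used by postWorkspaceBuilds.
type buildInserter interface {
	InsertBuild(workspaceID, transition string, buildNumber int) error
}

// insertRestartBuilds sketches how a restart request could enqueue a
// stop build and a start build atomically. Reusing a build number would
// violate the unique (workspace_id, build_number) constraint and abort
// the transaction, hence the +1 and +2.
func insertRestartBuilds(tx buildInserter, workspaceID string, latestBuildNumber int) error {
	if err := tx.InsertBuild(workspaceID, "stop", latestBuildNumber+1); err != nil {
		return err
	}
	return tx.InsertBuild(workspaceID, "start", latestBuildNumber+2)
}
```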