Handle Kubernetes Pod provision failure due to resource constraints #5783
Comments
An interesting problem here... this might be better solved by interfacing directly with the Kubernetes API instead, and maybe leads to #5321. It's difficult to surface good error messages with Terraform because there are so many scenarios. Would y'all maintain a custom provisioner if it was some very basic JavaScript like:
We'd likely maintain a set of helpers that would make this easier, or use Pulumi, which would entirely automate most of that scaffold code away.
Hmmm thanks @kylecarbs, we'll have a think about it some more. The other thing is being able to do preflight checks to block a user before they press the button or run the command.
@dcarrion87 we now have an endpoint that can be used to send custom startup logs for a pod. While this might not be ideal for your case, you could jank together a way to ship Kubernetes events. Curious for your thoughts.
@kylecarbs not 100% certain what you mean by this. Is there a link with an example or doc? This is still a problem for us and we're still thinking of ways of solving it without creating too much tech debt. We have created a dashboard to show resource availability, but blocking in Coder would be ideal since getting users to eyeball a dashboard is asking too much. The biggest pain point for us around this is that people aren't cleaning up their failures and are leaving resources pending in Kubernetes (I've raised another issue about this).
Closing as duplicate of #7576
Because a Kubernetes cluster has finite resources, there are scenarios where users try to provision a workspace and the pod sits in a Pending state.
The pod stays pending indefinitely, Terraform eventually fails with "context deadline exceeded", and the workspace goes into a failed state.
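To make the failure mode concrete, here is a minimal sketch of how a pending pod that the scheduler could not place might be told apart from one that is just slow to start. It assumes you already have the pod object as parsed JSON (for example from `kubectl get pod <name> -o json`). The helper name `unschedulable_message` and the sample pod below are hypothetical and only for illustration; this is not something Coder or the Terraform provider ships.

```python
from typing import Optional


def unschedulable_message(pod: dict) -> Optional[str]:
    """Return the scheduler's message if the pod is Pending and unschedulable.

    Hypothetical helper: inspects the standard `status.phase` and
    `status.conditions` fields of a Pod object and returns None otherwise.
    """
    status = pod.get("status") or {}
    if status.get("phase") != "Pending":
        return None
    for cond in status.get("conditions") or []:
        # The scheduler records a failed placement as PodScheduled=False
        # with reason "Unschedulable".
        if (
            cond.get("type") == "PodScheduled"
            and cond.get("status") == "False"
            and cond.get("reason") == "Unschedulable"
        ):
            return cond.get("message", "pod is unschedulable")
    return None


# Illustrative sample only; the message text is made up for this sketch.
SAMPLE_POD = {
    "status": {
        "phase": "Pending",
        "conditions": [
            {
                "type": "PodScheduled",
                "status": "False",
                "reason": "Unschedulable",
                "message": "0/3 nodes are available: 3 Insufficient cpu.",
            }
        ],
    }
}

if __name__ == "__main__":
    print(unschedulable_message(SAMPLE_POD))
```

A check like this could in principle run while provisioning waits, so the workspace fails quickly with the scheduler's own message instead of a generic timeout. How it would be wired into a template or provisioner is left open here.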