
Handle Kubernetes Pod provision failure due to resource constraints #5783


Closed
dcarrion87 opened this issue Jan 19, 2023 · 5 comments

@dcarrion87 (Contributor)

dcarrion87 commented Jan 19, 2023

Because a Kubernetes cluster has finite resources, there are scenarios where a user tries to provision a workspace and the pod sits in a Pending state.

The pod stays Pending forever, Terraform eventually fails with a context deadline exceeded error, and the workspace goes into a failed state.

  • Are there plans to provide a preflight check that blocks the user from provisioning?
  • Is there a way to interface with Coder to signal a resource-constraint issue during workspace creation?
  • How can we automatically clean up failed workspaces in Coder so the pod does not sit in Pending forever?
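For illustration, a preflight check along the lines of the first bullet could boil down to comparing the workspace's resource request against per-node headroom. This is a minimal sketch with hard-coded inputs; in practice the free-capacity figures would come from the Kubernetes API (each node's status.allocatable minus the requests of pods scheduled on it), and hasCapacity is a hypothetical helper, not an existing Coder or Kubernetes function.

```javascript
// Parse a CPU quantity like '500m' or '2' into millicores.
function cpuMillis(q) {
  return q.endsWith('m') ? parseInt(q, 10) : parseFloat(q) * 1000;
}

// Parse a memory quantity like '512Mi' or '4Gi' into bytes.
function memBytes(q) {
  const units = { Ki: 2 ** 10, Mi: 2 ** 20, Gi: 2 ** 30 };
  const m = q.match(/^(\d+)(Ki|Mi|Gi)?$/);
  return parseInt(m[1], 10) * (units[m[2]] || 1);
}

// Return true if at least one node has headroom for the request;
// a scheduler needs the whole request to fit on a single node.
function hasCapacity(nodesFree, request) {
  return nodesFree.some(
    (n) =>
      cpuMillis(n.cpu) >= cpuMillis(request.cpu) &&
      memBytes(n.memory) >= memBytes(request.memory)
  );
}

// Hard-coded free capacity per node, for illustration only.
const free = [
  { cpu: '1500m', memory: '2Gi' },
  { cpu: '250m', memory: '512Mi' },
];
console.log(hasCapacity(free, { cpu: '1', memory: '1Gi' })); // true
console.log(hasCapacity(free, { cpu: '2', memory: '1Gi' })); // false
```

A real check would also need to account for scheduling constraints (taints, affinity, node selectors), which is part of why surfacing this generically is hard.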
@dcarrion87 changed the title from "Preflight checks and Kubernetes pod provisioning failure handling" to "Preflight checks and Kubernetes pod provisioning failure handling due to resource constraints" on Jan 19, 2023
@kylecarbs (Member)

An interesting problem here... this might be better solved by interfacing directly with the Kubernetes API instead, and maybe leads to #5321.

It's difficult to surface good error messages with Terraform because there are so many scenarios.

Would y'all maintain a custom provisioner if it was some very basic JavaScript like:

// 'k8s' is a hypothetical helper module for this sketch
const k8s = require('k8s');

// Pod manifest for the workspace
const podManifest = {
  apiVersion: 'v1',
  kind: 'Pod',
  metadata: {
    name: 'my-pod'
  },
  spec: {
    containers: [{
      name: 'my-container',
      image: 'nginx:latest'
    }]
  }
};

// Client pointed at the cluster's API server
const k8sApi = k8s.api({
  endpoint: 'http://localhost:8080',
  auth: {
    bearer: 'YOUR_BEARER_TOKEN'
  }
});

// Create the pod and surface the result (or a useful error) directly
k8sApi.post('/api/v1/namespaces/default/pods', podManifest, (err, res) => {
  if (err) {
    console.log(`Error: ${err}`);
  } else {
    console.log(`Pod created: ${res}`);
  }
});

We'd likely maintain a set of helpers to make this easier, or use Pulumi, which would automate away most of that scaffolding.

@dcarrion87 (Contributor, Author)

Hmmm, thanks @kylecarbs, we'll have a think about it some more.

The other thing is being able to do preflight checks to block a user before they press the button or run the command.

@kylecarbs (Member)

@dcarrion87 we now have an endpoint that can be used to send custom startup logs for a pod. While this might not be ideal for your case, you could jank together a way to ship Kubernetes events. Curious for your thoughts.
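Janking that together might look roughly like the sketch below: watch the pod's Kubernetes events and forward each one as a startup-log line. The event shape mirrors `kubectl get events -o json` items, but the endpoint path and token header in shipEvents are placeholders standing in for Coder's startup-log API, not a confirmed signature.

```javascript
// Turn a Kubernetes event into a one-line startup-log entry.
function formatEvent(ev) {
  return `[${ev.type}] ${ev.reason}: ${ev.message}`;
}

// Hypothetical shipper: push formatted lines at Coder's startup-log
// endpoint. URL path and auth header are placeholders for illustration.
async function shipEvents(events, coderUrl, token) {
  const logs = events.map((ev) => ({ output: formatEvent(ev) }));
  await fetch(`${coderUrl}/api/v2/workspaceagents/me/startup-logs`, {
    method: 'PATCH',
    headers: {
      'Coder-Session-Token': token,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ logs })
  });
}

// Example: the scheduler event a Pending pod typically produces.
const pending = {
  type: 'Warning',
  reason: 'FailedScheduling',
  message: '0/3 nodes are available: insufficient memory.'
};
console.log(formatEvent(pending));
// [Warning] FailedScheduling: 0/3 nodes are available: insufficient memory.
```

Surfacing the FailedScheduling reason this way would at least tell the user *why* the workspace is stuck, even without a preflight block.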

@kylecarbs changed the title from "Preflight checks and Kubernetes pod provisioning failure handling due to resource constraints" to "Handle Kubernetes Pod provision failure due to resource constraints" on Apr 3, 2023
@dcarrion87 (Contributor, Author)

dcarrion87 commented Apr 3, 2023

@kylecarbs not 100% certain what you mean by this. Is there a link with an example or doc?

This is still a problem for us, and we're still thinking of ways to solve it without creating too much tech debt. We have created a dashboard to show resource availability, but blocking in Coder would be ideal, since getting users to eyeball a dashboard is too much to ask.

The biggest pain point for us is that people aren't cleaning up their failures and are leaving resources pending in Kubernetes (I've raised another issue about this).
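Until something lands upstream, the cleanup side could be scripted: list workspaces, pick the ones that have sat in a failed state past a grace period, and delete them. A minimal sketch of the selection step, assuming workspace records shaped like the output of a hypothetical `coder list --output json` (the `status` and `lastTransition` field names here are illustrative, not Coder's actual schema):

```javascript
// Given workspace records, return the names of those that have been
// in a failed state longer than graceMs, as candidates for deletion.
function staleFailures(workspaces, now, graceMs) {
  return workspaces
    .filter((w) => w.status === 'failed')
    .filter((w) => now - Date.parse(w.lastTransition) > graceMs)
    .map((w) => w.name);
}

// Illustrative fleet: ws-a failed >24h ago, ws-c failed recently.
const fleet = [
  { name: 'ws-a', status: 'failed', lastTransition: '2023-01-18T00:00:00Z' },
  { name: 'ws-b', status: 'running', lastTransition: '2023-01-18T00:00:00Z' },
  { name: 'ws-c', status: 'failed', lastTransition: '2023-01-19T08:00:00Z' }
];
const now = Date.parse('2023-01-19T09:00:00Z');
console.log(staleFailures(fleet, now, 24 * 3600 * 1000)); // [ 'ws-a' ]
```

The grace period avoids deleting a workspace the user is actively retrying; the returned names would then be fed to whatever delete command or API call the deployment uses.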

@bpmct (Member)

bpmct commented Jun 21, 2023

Closing as duplicate of #7576

@bpmct closed this as not planned (duplicate) on Jun 21, 2023