Need more visibility into agent issues when workspaces are running in kubernetes #17994

rhysduggan5 · 2025-05-22T08:37:52Z

rhysduggan5
May 22, 2025

This is very similar to #15867, which was closed for lack of information.

This yellow warning can appear for a number of reasons. Some of them have nothing to do with the agent being unable to connect; instead, the pod that the agent is connecting to is slow to start up, for a variety of reasons.

In the example I have screenshotted, the agent cannot connect to this pod (this is a k8s deployment) for 2 reasons:

1. There are not enough nodes with enough room to support the pod, so before it can deploy, the cluster needs to create a new node and then deploy the pod on that node (this can sometimes take a minute or 2).
2. Once the pod is deployed, if the image it is using is not cached (which will always be the case if the node was just created), the pod cannot start until the image has been pulled from the relevant registry. In this case, the image is just under 50GB, so it takes a fair amount of time.

The problem is that there is absolutely no visibility into this in the Coder UI. The only way to know that this is happening is to have direct access to the cluster and look at the events raised against the pod.
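For anyone who does have cluster access, both delays surface as events on the pod. Here is a minimal Python sketch of turning those events into plain explanations. It assumes the events were exported with something like `kubectl get events -o json`, that the cluster uses the cluster autoscaler (which reports `TriggeredScaleUp`), and that each event carries a `lastTimestamp`. The `explain_startup_delay` helper and the sample data are made up for illustration.

```python
# Event reasons reported by the scheduler, the cluster autoscaler, and the kubelet.
DELAY_REASONS = {
    "FailedScheduling": "no node has room for the pod yet",
    "TriggeredScaleUp": "the cluster autoscaler is adding a node",
    "Pulling": "the image is being pulled from the registry",
}


def explain_startup_delay(events):
    """Return readable reasons, oldest first, for a pod that has not started."""
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    ordered = sorted(events, key=lambda e: e["lastTimestamp"])
    return [
        f'{e["lastTimestamp"]} {e["reason"]}: {DELAY_REASONS[e["reason"]]}'
        for e in ordered
        if e["reason"] in DELAY_REASONS
    ]


# Hypothetical sample, shaped like the "items" list in the kubectl JSON output.
sample_events = [
    {"reason": "Pulling", "lastTimestamp": "2025-05-22T08:32:10Z"},
    {"reason": "FailedScheduling", "lastTimestamp": "2025-05-22T08:30:00Z"},
    {"reason": "Scheduled", "lastTimestamp": "2025-05-22T08:32:05Z"},
    {"reason": "TriggeredScaleUp", "lastTimestamp": "2025-05-22T08:30:05Z"},
]

for line in explain_startup_delay(sample_events):
    print(line)
```

The `Scheduled` event is dropped because it does not explain a delay; the other three are printed in chronological order.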

What would be ideal is a way to configure the expected startup time of a workspace, based on internal knowledge of how long it could take. Alternatively, show users exactly why the workspace is taking a while. Even just bubbling up the events on the pod the agent is trying to connect to would be an improvement.
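To make the first idea concrete, here is a rough Python sketch of what a configurable startup window could look like. Everything in it is hypothetical: `agent_banner`, `expected_startup_s`, and the banner names are not existing Coder settings, just an illustration of the behaviour I'm after.

```python
def agent_banner(elapsed_s, expected_startup_s, agent_connected):
    """Pick the banner to show for a workspace, given how long it has been starting."""
    if agent_connected:
        return "connected"
    if elapsed_s <= expected_startup_s:
        # Still inside the window the template author said to expect.
        return "starting"
    # Only warn once the workspace has taken longer than expected.
    return "warning"


# Hypothetical template with a very large image: allow up to 30 minutes.
EXPECTED_STARTUP_S = 30 * 60

print(agent_banner(5 * 60, EXPECTED_STARTUP_S, agent_connected=False))
print(agent_banner(45 * 60, EXPECTED_STARTUP_S, agent_connected=False))
```

With a window like this, a slow image pull would show as a normal startup rather than a warning, and the warning would be reserved for workspaces that are genuinely overdue.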

matifali · 2025-05-22T10:50:32Z

matifali
May 22, 2025
Maintainer

Part of #15423

matifali · 2025-05-22T10:54:47Z

matifali
May 22, 2025
Maintainer

Hi @rhysduggan5, thanks for submitting the issue.

> There are not enough nodes with enough room to support the pod, so before it can deploy, the cluster needs to create a new node and then deploy the pod on that node (this can sometimes take a minute or 2)

> Once the pod is deployed, if the image it is using is not cached (which will always be the case if the node was just created), the pod cannot start until the image has been pulled from the relevant registry. In this case, the image is just under 50GB, so it takes a fair amount of time.

Both of these issues can be resolved by using the Kubernetes Logging integration. It's a small service that you deploy in the same cluster as the workspace pods. I agree that we need to improve the troubleshooting experience for other agent connection issues.

It's a chicken-and-egg problem, as the agent is the one sending logs from the workspace to be displayed in the Dashboard. Until the agent starts, we have no visibility into what's happening.
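To illustrate the idea of reporting from outside the workspace, here is a minimal Python sketch of a forwarder that turns pod events into log lines. It is not the integration's actual code: `forward_new_events`, the `send` callback, and the sample events are assumptions made for illustration. Only the event fields (`metadata.uid`, `type`, `reason`, `message`, `lastTimestamp`) follow the Kubernetes Event object.

```python
def forward_new_events(events, seen_uids, send):
    """Send each event not seen before as one log line; return how many were sent."""
    sent = 0
    for event in sorted(events, key=lambda e: e["lastTimestamp"]):
        uid = event["metadata"]["uid"]
        if uid in seen_uids:
            continue  # already forwarded on an earlier poll
        seen_uids.add(uid)
        send(f'[{event["type"]}] {event["reason"]}: {event["message"]}')
        sent += 1
    return sent


# Hypothetical events for a workspace pod; the messages are placeholders.
pod_events = [
    {
        "metadata": {"uid": "b"},
        "type": "Normal",
        "reason": "Pulling",
        "message": "placeholder: image pull started",
        "lastTimestamp": "2025-05-22T08:32:10Z",
    },
    {
        "metadata": {"uid": "a"},
        "type": "Warning",
        "reason": "FailedScheduling",
        "message": "placeholder: no node available",
        "lastTimestamp": "2025-05-22T08:30:00Z",
    },
]

log_lines = []
seen = set()
forward_new_events(pod_events, seen, log_lines.append)  # first poll sends both
forward_new_events(pod_events, seen, log_lines.append)  # second poll sends nothing new
print("\n".join(log_lines))
```

Because the forwarder runs next to the pod rather than inside it, it can report scheduling and image-pull progress before the agent exists. The `seen` set keeps repeated polls from sending the same event twice.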

rhysduggan5 · 2025-05-22T14:09:42Z

rhysduggan5
May 22, 2025
Author

I have just installed the Kubernetes Logging integration service (which I found shortly after submitting this issue), and it has completely solved the problem. For my users, seeing a big orange box with no context is a really poor experience, whereas seeing a black box with logs progressing is "mentally" a much more positive one. Thanks for the tip!

matifali · 2025-05-22T17:55:40Z

matifali
May 22, 2025
Maintainer

I will move this to a discussion for others to discover.
