Improve agent connection troubleshooting #15423
Comments
Related to #6711. I spent a little time on that issue not long ago and we decided it wouldn't be unreasonable for the agent script to attempt to log errors to … If there's a network or DNS issue, I don't think we're going to be able to propagate that to the UI in any way. If the script can't download the agent binary, it's not going to be able to post logs to the agent route. In any case, I think we should definitely improve the log output to make some of these common cases easily identifiable and fixable. *We'd also need to un-deprecate the route, as there's no way we can do a dRPC call from a bash script.
Nice. It seems like a combination of logs sent to the server, improved logging of the provisioner script, and better documentation is the key here.
Yep, ideally, we can have better hints in the logs of our agent download & install script or even hint that there is a DNS error 🤞🏼
I think it'd be awesome to drill down a bit more on the current gaps in the docs and UI.
@EdwardAngert, could you help find the shortcomings in our getting-started docs?
From #15462:
We discussed this today in stand-up. There are a few different sorts of issues here:
With regard to 1), we don't have many options for surfacing errors here.
Did we also discuss how we can improve the docs and UX to point people to docs, or is this already sufficient?
No, we were mostly focused on how to improve the troubleshooting situation. We didn't focus much on docs.
Problem Description
Coder agents sometimes fail to connect to the Coder server due to a variety of issues, including network restrictions (e.g., DNS issues, firewalls), missing permissions (e.g., CAP_NET_ADMIN), OS or architecture mismatches, and missing tools for downloading the agent binary. Currently, there’s limited guidance in the UI to help users diagnose and resolve these issues effectively, leading to delays in troubleshooting.
For example, failures in the agent bootstrap script can result in non-connecting agents without a clear indication of the root cause. When checking the workspace logs, e.g. with
docker logs <container name or container id>
a typical DNS failure log might look like this:
Desired Solution
Implement enhanced diagnostics and UI hints that provide actionable guidance to users based on the detected issue. By giving users specific suggestions directly in the UI, they can resolve connectivity issues faster and with less frustration. This includes:
Enhanced Error Logging and Diagnostics
… (curl or wget), with instructions on how to install the required tool.
UI Hints for Diagnosed Issues [1]
… “It appears there’s a DNS or firewall issue preventing the agent from connecting to the server. Learn more about network configuration.”
… (curl, wget) are unavailable, suggest a hint: “Required download tool not found. Please install either curl or wget.”
… “This environment may be unsupported. Review supported OS and architectures.”
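To make the proposed hints concrete, here is a rough sketch of how a bootstrap script might detect a missing download tool, recognize a DNS failure in download output, and print the matching hint. This is not the actual Coder agent script: the function names (find_download_tool, classify_log_line, hint_for) and the category names are hypothetical. The matched patterns are the messages curl and GNU wget print when a hostname cannot be resolved; other wget builds (e.g. BusyBox) word this differently and would need their own patterns.

```shell
#!/bin/sh
# Hypothetical sketch only; names and categories are assumptions,
# not part of the existing Coder bootstrap script.

# Print the first available download tool, or return 1 if none is installed.
find_download_tool() {
  for tool in curl wget; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  return 1
}

# Classify one line of download output into a failure category.
# Only DNS resolution errors from curl and GNU wget are recognized here.
classify_log_line() {
  case "$1" in
    *"Could not resolve host"*|*"unable to resolve host address"*)
      echo network ;;
    *)
      echo unknown ;;
  esac
}

# Map a failure category to the hint text proposed in this issue.
# Returns 1 (and prints nothing) for categories without a hint.
hint_for() {
  case "$1" in
    network)
      echo "It appears there's a DNS or firewall issue preventing the agent from connecting to the server. Learn more about network configuration." ;;
    no_tool)
      echo "Required download tool not found. Please install either curl or wget." ;;
    unsupported)
      echo "This environment may be unsupported. Review supported OS and architectures." ;;
    *)
      return 1 ;;
  esac
}
```

A script could use this as `tool=$(find_download_tool) || { hint_for no_tool >&2; exit 1; }`. Writing the hint to stderr means it at least shows up in the workspace logs, even in cases where nothing can be surfaced in the UI because the agent never started.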
Proposed Implementation
Footnotes
1. This may not be possible currently, as we do not have any way to expose these logs to the UI without the agent running.