Coder web terminal and apps not loading on big #12136
Comments
For debugging purposes: HTTP 500 on
@spikecurtis suggested a similar root cause to https://github.com/coder/customers/issues/488
@spikecurtis Isn't this our failure? Or very close, maybe?
@MrPeacockNLB it appears to be. Digging in.
…ailnet (#12140) I think this will resolve #12136, but let's get a proper test at the system level before closing. Before this change, we only registered the node callback at start of day for the server tailnet. If the coordinator changes, as we know happens when we are licensed for the PGCoordinator, we close the connection to the old coordinator and open a new one to the new coordinator. The callback is designed to direct the updates to the new coordinator, but there is nothing that specifically triggers it to fire after we connect to the new coordinator. If we have STUN, then periodic re-STUNs will generally get it to fire eventually, but without STUN we could go indefinitely without a callback. This PR changes the servertailnet to re-register the callback each time we reconnect to the coordinator. Registering a callback (even if it's the same callback) triggers an immediate call with our node information, so the new coordinator will have it.
…ailnet (#12140) (#12150) I think this will resolve #12136, but let's get a proper test at the system level before closing. Before this change, we only registered the node callback at start of day for the server tailnet. If the coordinator changes, as we know happens when we are licensed for the PGCoordinator, we close the connection to the old coordinator and open a new one to the new coordinator. The callback is designed to direct the updates to the new coordinator, but there is nothing that specifically triggers it to fire after we connect to the new coordinator. If we have STUN, then periodic re-STUNs will generally get it to fire eventually, but without STUN we could go indefinitely without a callback. This PR changes the servertailnet to re-register the callback each time we reconnect to the coordinator. Registering a callback (even if it's the same callback) triggers an immediate call with our node information, so the new coordinator will have it.
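The behaviour described in the commit message can be modeled with a small, language-neutral sketch. This is not Coder's Go code: `Conn`, `Coordinator`, `ServerTailnet`, and every method name here are hypothetical stand-ins, chosen only to show why re-registering the callback on reconnect gets node information to the new coordinator.

```python
from typing import Callable, Dict, List, Optional

NodeCallback = Callable[[Dict], None]


class Conn:
    """Hypothetical stand-in for a tailnet connection holding our node info."""

    def __init__(self, node: Dict) -> None:
        self._node = node
        self._callback: Optional[NodeCallback] = None

    def set_node_callback(self, callback: NodeCallback) -> None:
        # Registering a callback (even the same one again) fires it
        # immediately with the current node information.
        self._callback = callback
        callback(dict(self._node))


class Coordinator:
    """Hypothetical coordinator that records the node updates it receives."""

    def __init__(self) -> None:
        self.nodes: List[Dict] = []

    def receive(self, node: Dict) -> None:
        self.nodes.append(node)


class ServerTailnet:
    """Hypothetical model of the server tailnet's coordinator handling."""

    def __init__(self, conn: Conn) -> None:
        self.conn = conn
        self.coordinator: Optional[Coordinator] = None

    def _send_node(self, node: Dict) -> None:
        # The callback always forwards to whichever coordinator is current.
        if self.coordinator is not None:
            self.coordinator.receive(node)

    def start(self, coordinator: Coordinator) -> None:
        # "Start of day": connect and register the callback once.
        self.coordinator = coordinator
        self.conn.set_node_callback(self._send_node)

    def reconnect(self, coordinator: Coordinator, reregister: bool) -> None:
        # Switch to the new coordinator. Without re-registering, nothing
        # triggers the callback, so the new coordinator hears nothing
        # until some unrelated node change happens.
        self.coordinator = coordinator
        if reregister:
            self.conn.set_node_callback(self._send_node)


if __name__ == "__main__":
    tailnet = ServerTailnet(Conn({"id": 1}))
    old, new = Coordinator(), Coordinator()
    tailnet.start(old)
    tailnet.reconnect(new, reregister=False)
    print("before fix:", new.nodes)
    tailnet.reconnect(new, reregister=True)
    print("after fix:", new.nodes)
```

In this model, the `reregister=False` path corresponds to the behaviour before the change, and `reregister=True` to the behaviour the PR describes.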
Hey folks. I'm still seeing the
The terminal returns
When I try to open VS Code Desktop, this is what I see in the logs:
Also, I can't SSH with the coder CLI either:
In the Coder container I see these lines:
Here is my docker compose:
services:
  coder:
    image: ghcr.io/coder/coder:${CODER_VERSION:-latest}
    container_name: coder
    ports:
      - "7080:7080"
    environment:
      CODER_PG_CONNECTION_URL: "postgresql://${POSTGRES_USER:-username}:${POSTGRES_PASSWORD:-password}@postgresql/${POSTGRES_DB:-coder}?sslmode=disable"
      CODER_HTTP_ADDRESS: "0.0.0.0:7080"
      CODER_ACCESS_URL: "https://coder.myhost.com"
      CODER_WILDCARD_ACCESS_URL: "*.coder.myhost.com"
    group_add:
      - "983"
    volumes:
      - config:/home/coder/.config
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      postgresql:
        condition: service_healthy
  postgresql:
    image: "postgres:16"
    container_name: coder-postgres
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-username}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-password}
      POSTGRES_DB: ${POSTGRES_DB:-coder}
    volumes:
      - postgresql:/var/lib/postgresql/data
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "pg_isready -U ${POSTGRES_USER:-username} -d ${POSTGRES_DB:-coder}",
        ]
      interval: 5s
      timeout: 5s
      retries: 5
volumes:
  config: {}
  postgresql: {}
@lbeier what Coder version are you running? Can you gather logs from the Coder agent (
Hey 👋! Not sure if any of the info below is relevant, but I decided to share it anyway. Let me know if you think I could provide more data. From the Coder UI, this is the version:
Let me know if these are wrong or if you need further info. I'm not entirely sure where to gather the logs for the agent. I've followed this tutorial and based my compose file on this example. It spins up two containers: the coder-server and Postgres. I'm not sure where I should look for the agent information. I just created a new workspace based on the Container template; maybe it contains some useful information. Workspace logs:
Interestingly, I see a spinner next to the open ports in the workspace:
When opening the terminal in
When opening a new Code Server on
The Coder server logs show some issues with the agent:
Here are the containers I have that are related to Coder:
Here are the logs from the workspace container:
Your problem is that the workspace is unable to connect to the DERP service that Coder server is running.
That's unrelated to the root cause of this GitHub issue, so I suggest you open a new issue if you want to continue the discussion.
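As a first check for this kind of connectivity problem, a minimal sketch like the one below can be run from inside the workspace container. It assumes the workspace reaches the Coder server (and the DERP service it runs) through the host in `CODER_ACCESS_URL`; the helper names are hypothetical. A successful TCP connection only rules out basic DNS and network failures; it does not prove that DERP itself is working.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def endpoint_from_access_url(access_url: str) -> Tuple[str, int]:
    """Derive (host, port) from an access URL such as CODER_ACCESS_URL."""
    parts = urlsplit(access_url)
    if parts.hostname is None:
        raise ValueError(f"no host in access URL: {access_url!r}")
    # Fall back to the scheme's default port when none is given.
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Value taken from the compose file above; replace with your own.
    host, port = endpoint_from_access_url("https://coder.myhost.com")
    print(f"{host}:{port} reachable:", can_connect(host, port))
```

If this reports the host as unreachable from the workspace container but reachable from your browser, the access URL is probably not resolvable or routable from inside the Docker network.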
On big.cdr.dev we’re currently unable to use web terminal or workspace apps. This has most likely been ongoing since v2.8.0 at the very least, but as that release had some other issues, we didn’t look too closely. That is, until yesterday when we tried re-running scale tests as we believed v2.8.2 to have fixed the issue.
Demo: try opening a web terminal here: https://big.cdr.dev/@scaletest-0o0t921W-1/scaletest-Jeui7qL1-1
(If the workspace isn’t running, feel free to start it.)
In Safari I can see
WebSocket failed: socket errored
after a while. In Brave it just stalls. I can see we're sending data, but not receiving a single reply. SSH seems to work fine, however.
Slack thread