

Coder web terminal and apps not loading on big #12136



Closed
mafredri opened this issue Feb 14, 2024 · 7 comments · Fixed by #12140

Comments

@mafredri
Member

On big.cdr.dev we’re currently unable to use the web terminal or workspace apps. This has most likely been going on since at least v2.8.0, but because that release had other issues, we didn’t look too closely. That is, until yesterday, when we re-ran the scale tests because we believed v2.8.2 had fixed the issue.

Demo: try opening a web terminal here: https://big.cdr.dev/@scaletest-0o0t921W-1/scaletest-Jeui7qL1-1

(If the workspace isn’t running, feel free to start it.)

In Safari, I see "WebSocket failed: socket errored" after a while. In Brave, it just stalls: I can see we're sending data, but we never receive a single reply.

SSH seems to work fine, however.

Slack thread

@cdr-bot cdr-bot bot added the bug label Feb 14, 2024
@mtojek
Member

mtojek commented Feb 14, 2024

For debugging purposes:

HTTP 500 on /listening-ports:

{
    "message": "Internal error dialing workspace agent.",
    "detail": "agent is unreachable"
}

@spikecurtis suggested a root cause similar to the one in https://github.com/coder/customers/issues/488
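For anyone scripting against the API while debugging, the error body above can be decoded into a small struct. This is a minimal Go sketch; the apiError type and both helper names are hypothetical and only mirror the two JSON fields seen in this thread, not Coder's own SDK types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors the two fields visible in the /listening-ports error body.
// Hypothetical type; Coder's SDK may model errors differently.
type apiError struct {
	Message string `json:"message"`
	Detail  string `json:"detail"`
}

// parseAPIError decodes an error body like the one returned with HTTP 500.
func parseAPIError(body []byte) (apiError, error) {
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return apiError{}, fmt.Errorf("decode error body: %w", err)
	}
	return e, nil
}

// agentUnreachable reports whether the detail field matches the exact
// string seen in this thread.
func agentUnreachable(e apiError) bool {
	return e.Detail == "agent is unreachable"
}

func main() {
	body := []byte(`{"message": "Internal error dialing workspace agent.", "detail": "agent is unreachable"}`)
	e, err := parseAPIError(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s (%s) unreachable=%v\n", e.Message, e.Detail, agentUnreachable(e))
}
```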

@MrPeacockNLB
Contributor

MrPeacockNLB commented Feb 14, 2024

@spikecurtis Isn't this our failure? Or something very close to it?

@spikecurtis
Contributor

@MrPeacockNLB It appears to be. Digging in.

spikecurtis added a commit that referenced this issue Feb 14, 2024
…ailnet (#12140)

I think this will resolve #12136, but let's get a proper test at the system level before closing.

Before this change, we only registered the node callback for the server tailnet once, at start of day. If the coordinator changes, as we know happens when we are licensed for the PGCoordinator, we close the connection to the old coordinator and open a new one to the new coordinator.

The callback is designed to direct updates to the new coordinator, but nothing specifically triggers it to fire after we connect to the new coordinator.

If we have STUN, periodic re-STUNs will generally get it to fire eventually, but without STUN we could go indefinitely without a callback.

This PR changes the servertailnet to re-register the callback each time we reconnect to the coordinator. Registering a callback (even the same callback) triggers an immediate call with our node information, so the new coordinator will have it.
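The fix hinges on one property stated in the commit message: registering a node callback fires it immediately with the current node. The Go sketch below models that with hypothetical, simplified Conn, Coordinator and serverTailnet types (not Coder's real tailnet package) to show why re-registering on every reconnect gets the node to a new coordinator, while registering only at start of day leaves it waiting for something like a periodic re-STUN.

```go
package main

import (
	"fmt"
	"sync"
)

// Node stands in for the node info (keys, endpoints, DERP region). Hypothetical.
type Node struct {
	ID   string
	DERP int
}

// Conn models only the property the fix relies on: setting a callback,
// even the same one again, immediately fires it with the current node.
type Conn struct {
	mu   sync.Mutex
	node Node
	cb   func(Node)
}

func (c *Conn) SetNodeCallback(cb func(Node)) {
	c.mu.Lock()
	c.cb = cb
	n := c.node
	c.mu.Unlock()
	cb(n) // immediate call with our current node information
}

// Coordinator records the node updates it has received.
type Coordinator struct{ nodes []Node }

func (co *Coordinator) Update(n Node) { co.nodes = append(co.nodes, n) }

// serverTailnet routes node updates to whichever coordinator is current.
type serverTailnet struct {
	conn    *Conn
	mu      sync.Mutex
	current *Coordinator
}

func (s *serverTailnet) nodeCallback(n Node) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.current != nil {
		s.current.Update(n)
	}
}

// reconnect switches coordinators; reRegister toggles the behavior of the fix.
func (s *serverTailnet) reconnect(co *Coordinator, reRegister bool) {
	s.mu.Lock()
	s.current = co
	s.mu.Unlock()
	if reRegister {
		s.conn.SetNodeCallback(s.nodeCallback)
	}
}

// run returns how many node updates the second coordinator received.
func run(reRegister bool) int {
	st := &serverTailnet{conn: &Conn{node: Node{ID: "coderd", DERP: 999}}}
	first, second := &Coordinator{}, &Coordinator{}
	st.reconnect(first, true)        // start of day: callback registered once
	st.reconnect(second, reRegister) // coordinator changes, e.g. to PGCoordinator
	return len(second.nodes)
}

func main() {
	fmt.Printf("without re-register: %d, with re-register: %d\n", run(false), run(true))
}
```

Running it prints "without re-register: 0, with re-register: 1": without re-registration, the second coordinator never learns about the node until something else triggers the callback.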
spikecurtis added a commit that referenced this issue Feb 15, 2024
…ailnet (#12140)

spikecurtis added a commit that referenced this issue Feb 15, 2024
…ailnet (#12140) (#12150)

@lbeier

lbeier commented Nov 9, 2024

Hey folks. I'm still seeing this: the web terminal fails with "WebSocket failed: socket errored", and I can't open code-server.

When I try to open VS Code Desktop, this is what I see in the logs:

[04:07:40.832] stderr> 2024-11-09 03:07:40.831 [debu]  net.wgengine: wg: [v2] [FRhHu] - Sending handshake initiation
[04:07:40.842] stderr> 2024-11-09 03:07:40.841 [debu]  net.wgengine: [unexpected] magicsock: derp-999 does not know about peer [FRhHu], removing route
[04:07:41.901] stderr> 2024-11-09 03:07:41.901 [debu]  net.wgengine: ping(fd7a:115c:....): sending TSMP ping to [FRhHu]  ...
[04:07:45.842] stderr> 2024-11-09 03:07:45.842 [debu]  net.wgengine: wg: [v2] [FRhHu] - Handshake did not complete after 5 seconds, retrying (try 2)
[04:07:45.843] stderr> 2024-11-09 03:07:45.842 [debu]  net.wgengine: wg: [v2] [FRhHu] - Sending handshake initiation
[04:07:45.854] stderr> 2024-11-09 03:07:45.853 [debu]  net.wgengine: [unexpected] magicsock: derp-999 does not know about peer [FRhHu], removing route
[04:07:51.118] stderr> 2024-11-09 03:07:51.118 [debu]  net.wgengine: wg: [v2] [FRhHu] - Handshake did not complete after 5 seconds, retrying (try 3)
[04:07:51.118] stderr> 2024-11-09 03:07:51.118 [debu]  net.wgengine: wg: [v2] [FRhHu] - Sending handshake initiation
[04:07:51.125] stderr> 2024-11-09 03:07:51.125 [debu]  net.wgengine: [unexpected] magicsock: derp-999 does not know about peer [FRhHu], removing route
[04:07:55.204] stderr> 2024-11-09 03:07:55.203 [debu]  net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1001stun0 from <MY IP>:51289 (9001e6...) in 81.090125ms
[04:07:55.204] stderr> 2024-11-09 03:07:55.203 [debu]  net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1004stun0 from <MY IP>:51289 (fec13) in 80.992208ms
[04:07:55.204] stderr> 2024-11-09 03:07:55.203 [debu]  net.wgengine: netcheck: netcheck.runProbe: got STUN response for 1000stun0 from <MY IP>:51289 (875da74) in 80.962917ms
[04:07:55.205] stderr> 2024-11-09 03:07:55.203 [debu]  net.wgengine: netcheck: [v1] measuring ICMP latency of coder (999): no address for node 999b
[04:07:55.243] stderr> 2024-11-09 03:07:55.242 [debu]  net.wgengine: netcheck: [v1] report: udp=true v6=false v6os=true mapvarydest=false hair= portmap= v4a=<MY IP>:51289 derp=999 derpdist=999v4:7ms
[04:07:56.198] stderr> 2024-11-09 03:07:56.197 [debu]  net.wgengine: wg: [v2] [FRhHu] - Handshake did not complete after 5 seconds, retrying (try 4)
[04:07:56.198] stderr> 2024-11-09 03:07:56.197 [debu]  net.wgengine: wg: [v2] [FRhHu] - Sending handshake initiation
[04:07:56.248] stderr> 2024-11-09 03:07:56.247 [debu]  net.wgengine: [unexpected] magicsock: derp-999 does not know about peer [FRhHu], removing route
[04:08:01.339] stderr> 2024-11-09 03:08:01.338 [debu]  net.wgengine: wg: [v2] [FRhHu] - Handshake did not complete after 5 seconds, retrying (try 5)
[04:08:01.339] stderr> 2024-11-09 03:08:01.338 [debu]  net.wgengine: wg: [v2] [FRhHu] - Sending handshake initiation
[04:08:01.344] stderr> 2024-11-09 03:08:01.343 [debu]  net.wgengine: [unexpected] magicsock: derp-999 does not know about peer [FRhHu], removing route
[04:08:05.518] stderr> 2024-11-09 03:08:05.517 [debu]  net.wgengine: ping(fd7a:115c:a1e0:4091:994b:ec38:883f:96ff): sending TSMP ping to [FRhHu]  ...
[04:08:06.536] stderr> 2024-11-09 03:08:06.535 [debu]  net.wgengine: wg: [v2] [FRhHu] - Handshake did not complete after 5 seconds, retrying (try 2)
[04:08:06.536] stderr> 2024-11-09 03:08:06.535 [debu]  net.wgengine: wg: [v2] [FRhHu] - Sending handshake initiation
[04:08:06.543] stderr> 2024-11-09 03:08:06.542 [debu]  net.wgengine: [unexpected] magicsock: derp-999 does not know about peer [FRhHu], removing route

I also can't SSH in using the coder CLI:

❯ ssh -v coder.Test
OpenSSH_9.8p1, LibreSSL 3.3.6
debug1: Reading configuration data /Users/$USER/.ssh/config
debug1: /Users/$USER/.ssh/config line 1: Applying options for *
debug1: /Users/$USER/.ssh/config line 29: Applying options for coder.Test
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Reading configuration data /etc/ssh/ssh_config.d/100-macos.conf
debug1: /etc/ssh/ssh_config.d/100-macos.conf line 1: Applying options for *
debug1: Reading configuration data /etc/ssh/crypto.conf
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug1: Executing proxy command: exec /opt/homebrew/bin/coder --global-config "/Users/$USER/Library/Application Support/coderv2" ssh --stdio Test
debug1: identity file /Users/$USER/.ssh/id_rsa type -1
debug1: identity file /Users/$USER/.ssh/id_rsa-cert type -1
debug1: identity file /Users/$USER/.ssh/id_ecdsa type -1
debug1: identity file /Users/$USER/.ssh/id_ecdsa-cert type -1
debug1: identity file /Users/$USER/.ssh/id_ecdsa_sk type -1
debug1: identity file /Users/$USER/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /Users/$USER/.ssh/id_ed25519 type -1
debug1: identity file /Users/$USER/.ssh/id_ed25519-cert type -1
debug1: identity file /Users/$USER/.ssh/id_ed25519_sk type -1
debug1: identity file /Users/$USER/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /Users/$USER/.ssh/id_xmss type -1
debug1: identity file /Users/$USER/.ssh/id_xmss-cert type -1
debug1: identity file /Users/$USER/.ssh/id_dsa type -1
debug1: identity file /Users/$USER/.ssh/id_dsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_9.8

In the Coder container I see these lines:

2024-11-09 03:06:15.481 [info]  coderd.workspace_usage_tracker: updated workspaces last_used_at  count=1  now="2024-11-09T03:06:15.467771291Z"
2024-11-09 03:08:05.920 [warn]  coderd: GET  host=coder.myhost.com  path=/api/v2/workspaceagents/273e5ead-990e-4091-994b-ec38883f96ff/listening-ports  proto=HTTP/1.1  remote_addr=192.168.192.1  start="2024-11-09T03:07:35.901053099Z"  took=30.019251552s  status_code=500  latency_ms=30019  response_body="{\"message\":\"Internal error dialing workspace agent.\",\"detail\":\"agent is unreachable\"}\n"  request_id=21104f84-0071-44bc-a2de-1eceb3fecc62
2024-11-09 03:10:33.642 [warn]  coderd: GET  host=coder.myhost.com  path=/api/v2/workspaceagents/273e5ead-990e-4091-994b-ec38883f96ff/listening-ports  proto=HTTP/1.1  remote_addr=192.168.192.1  start="2024-11-09T03:10:03.625803561Z"  took=30.016745141s  status_code=500  latency_ms=30016  response_body="{\"message\":\"Internal error dialing workspace agent.\",\"detail\":\"agent is unreachable\"}\n"  request_id=fbaf1c3e-bf9e-40e7-b4dc-0b4434975100

Here is my Docker Compose file:

services:
  coder:
    image: ghcr.io/coder/coder:${CODER_VERSION:-latest}
    container_name: coder
    ports:
      - "7080:7080"
    environment:
      CODER_PG_CONNECTION_URL: "postgresql://${POSTGRES_USER:-username}:${POSTGRES_PASSWORD:-password}@postgresql/${POSTGRES_DB:-coder}?sslmode=disable"
      CODER_HTTP_ADDRESS: "0.0.0.0:7080"
      CODER_ACCESS_URL: "https://coder.myhost.com"
      CODER_WILDCARD_ACCESS_URL: "*.coder.myhost.com"
    group_add:
      - "983"
    volumes:
      - config:/home/coder/.config
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      postgresql:
        condition: service_healthy

  postgresql:
    image: "postgres:16"
    container_name: coder-postgres
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-username}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-password}
      POSTGRES_DB: ${POSTGRES_DB:-coder}
    volumes:
      - postgresql:/var/lib/postgresql/data
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "pg_isready -U ${POSTGRES_USER:-username} -d ${POSTGRES_DB:-coder}",
        ]
      interval: 5s
      timeout: 5s
      retries: 5

volumes:
  config: {}
  postgresql: {}
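The compose file relies on ${VAR:-default} interpolation, for example in CODER_PG_CONNECTION_URL, where the default applies when the variable is unset or empty. As a rough illustration of that rule, here is a Go sketch built on os.Expand; expandCompose is a hypothetical helper that only handles ${VAR} and ${VAR:-default}, not the full Compose syntax, and it is not how Compose itself is implemented.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// expandCompose approximates the ${VAR:-default} rule: use the default
// when VAR is unset or empty. Only ${VAR} and ${VAR:-default} are handled.
func expandCompose(s string, lookup func(string) (string, bool)) string {
	return os.Expand(s, func(expr string) string {
		// os.Expand passes everything between the braces, e.g. "POSTGRES_DB:-coder".
		name, def, hasDefault := strings.Cut(expr, ":-")
		val, ok := lookup(name)
		if hasDefault && (!ok || val == "") {
			return def
		}
		return val
	})
}

func main() {
	tmpl := "postgresql://${POSTGRES_USER:-username}:${POSTGRES_PASSWORD:-password}@postgresql/${POSTGRES_DB:-coder}?sslmode=disable"
	// POSTGRES_USER is set, POSTGRES_PASSWORD is set but empty, POSTGRES_DB is unset.
	env := map[string]string{"POSTGRES_USER": "coder", "POSTGRES_PASSWORD": ""}
	lookup := func(k string) (string, bool) {
		v, ok := env[k]
		return v, ok
	}
	fmt.Println(expandCompose(tmpl, lookup))
}
```

With the sample environment it prints postgresql://coder:password@postgresql/coder?sslmode=disable: the empty password and the missing database name both fall back to their defaults.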

@spikecurtis
Contributor

@lbeier what Coder version are you running? Can you gather logs from the Coder agent (/tmp/coder-agent.log on Linux) in the workspace?

@lbeier

lbeier commented Nov 15, 2024

@lbeier what Coder version are you running? Can you gather logs from the Coder agent (/tmp/coder-agent.log on Linux) in the workspace?

Hey 👋 !

Not sure if any of the info below is relevant, but decided to share it anyway. Let me know if you think I could provide more data.

From Coder UI, this is the version:

v2.17.0+5a6d23a
2973d798-828d-42ab-b37c-df32d74310e3

Let me know if these are wrong or if you need more info.

I'm not entirely sure where to find the agent logs. I followed this tutorial and based my compose file on this example. It spins up two containers, the Coder server and Postgres, so I'm not sure where I should look for agent information.

I just created a new workspace based on the Container template; maybe it contains some useful information.

Workspace logs:

+ mkdir -p /tmp/code-server
+ mkdir -p /tmp/code-server/lib /tmp/code-server/bin
+ tar -C /tmp/code-server/lib -xzf ~/.cache/code-server/code-server-4.95.2-linux-amd64.tar.gz
+ mv -f /tmp/code-server/lib/code-server-4.95.2-linux-amd64 /tmp/code-server/lib/code-server-4.95.2
+ ln -fs /tmp/code-server/lib/code-server-4.95.2/bin/code-server /tmp/code-server/bin/code-server
Standalone release has been installed into /tmp/code-server/lib/code-server-4.95.2
Extend your path to use code-server:
  PATH="/tmp/code-server/bin:$PATH"
Then run with:
  code-server
Deploy code-server for your team with Coder: https://github.com/coder/coder

Interestingly, I see a spinner next to the open ports in the workspace.

When opening the terminal at https://coder.<hostname>/@<user>/Test.main/terminal?reconnect=<uuid>, I don't see any errors in the requests.

When opening a new code-server at https://coder.<hostname>/@<user>/Test.main/apps/code-server/, I do see an error.

The Coder server logs show errors reaching the agent:

2024-11-15 13:07:46.974 [warn]  coderd: GET  host=coder.<MY_HOSTNAME>  path=/api/v2/workspaceagents/698d503d-ffbc-4121-8329-6f0a6e68528f/listening-ports  proto=HTTP/1.1  remote_addr=172.27.0.1  start="2024-11-15T13:07:16.922657308Z"  took=30.052267789s  status_code=500  latency_ms=30052  response_body="{\"message\":\"Internal error dialing workspace agent.\",\"detail\":\"agent is unreachable\"}\n"  request_id=43b6668d-9d96-4c7a-b680-e45ab4eebf73
2024-11-15 13:08:27.102 [warn]  coderd: GET  host=coder.<MY_HOSTNAME>  path=/api/v2/workspaceagents/698d503d-ffbc-4121-8329-6f0a6e68528f/listening-ports  proto=HTTP/1.1  remote_addr=172.27.0.1  start="2024-11-15T13:07:57.082535364Z"  took=30.019928041s  status_code=500  latency_ms=30019  response_body="{\"message\":\"Internal error dialing workspace agent.\",\"detail\":\"agent is unreachable\"}\n"  request_id=736c01bc-2d58-4175-a2a5-34f54acfc691

Here are the containers I have that are related to Coder:

6bb095d124d8   ghcr.io/coder/coder:v2.17.0                     "/opt/coder server"      4 minutes ago   Up 4 minutes             0.0.0.0:7080->7080/tcp, :::7080->7080/tcp   coder
18090f4c4349   postgres:16                                     "docker-entrypoint.s…"   4 minutes ago   Up 4 minutes (healthy)   0.0.0.0:5432->5432/tcp, :::5432->5432/tcp   coder-postgres
dc6737b8dc5b   coder-bae47c28-c1e6-4abb-b6fe-1b0ae97dccbe      "sh -c '#!/usr/bin/e…"   9 minutes ago   Up 9 minutes                                                         coder-test

Here are the logs from the workspace container:

2024-11-15 13:13:22.721 [debu]  net.tailnet.net.wgengine: magicsock: [warning] failed to force-set UDP write buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2024-11-15 13:13:22.721 [debu]  net.tailnet.net.wgengine: magicsock: [warning] failed to force-set UDP read buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2024-11-15 13:13:22.721 [debu]  net.tailnet.net.wgengine: magicsock: [warning] failed to force-set UDP write buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2024-11-15 13:13:22.721 [debu]  net.tailnet.net.wgengine: Rebind; defIf="eth0", ips=[172.17.0.2/16]
2024-11-15 13:13:23.736 [debu]  net.tailnet.net.wgengine: magicsock: [0xc000354a00] derp.Recv(derp-999): derphttp.Client.Recv connect to region 999 (coder): context deadline exceeded
2024-11-15 13:13:23.736 [debu]  net.tailnet.net.wgengine: magicsock: closing connection to derp-999 (rebind-no-localaddr), age 3s
2024-11-15 13:13:23.736 [debu]  net.tailnet.net.wgengine: magicsock: 0 active derp conns
2024-11-15 13:13:23.736 [debu]  net.tailnet.net.wgengine: magicsock: adding connection to derp-999 for home-keep-alive
2024-11-15 13:13:23.737 [debu]  net.tailnet.net.wgengine: magicsock: 1 active derp conns: derp-999=cr0s,wr0s
2024-11-15 13:13:23.737 [debu]  net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)
2024-11-15 13:13:23.737 [debu]  net.tailnet: wireguard status  status="&{AsOf:2024-11-15 13:13:23.737240638 +0000 UTC m=+729.301780831 Peers:[{TxBytes:0 RxBytes:0 LastHandshake:1970-01-01 00:00:00 +0000 UTC NodeKey:nodekey:655513d2c508c7f5f768bf4853030cc7c00c41b380f0af78dc238d457dec7d16}] LocalAddrs:[{Addr:172.17.0.2:36539 Type:local}] DERPs:1}"  error=<nil>
2024-11-15 13:13:25.238 [debu]  net.tailnet.net.wgengine: magicsock: [0xc000258a00] derp.Recv(derp-999): derphttp.Client.Recv connect to region 999 (coder): context deadline exceeded
2024-11-15 13:13:25.238 [debu]  net.tailnet.net.wgengine: derp-999: [v1] backoff: 12 msec
2024-11-15 13:13:25.252 [debu]  net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1002stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1000stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1003stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: [v1] report: udp=false v4=false icmpv4=false v6=false v6os=true mapvarydest= hair= portmap= derp=0
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1000stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1003stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1001stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1001stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1004stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1000stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1004stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1001stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1002stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1002stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1003stun0" has no address
2024-11-15 13:13:25.737 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1004stun0" has no address
2024-11-15 13:13:25.738 [debu]  net.tailnet.net.wgengine: magicsock: last netcheck reported send error. Rebinding.
2024-11-15 13:13:25.738 [debu]  net.tailnet.net.wgengine: magicsock: [warning] failed to force-set UDP read buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2024-11-15 13:13:25.738 [debu]  net.tailnet.net.wgengine: magicsock: [warning] failed to force-set UDP write buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2024-11-15 13:13:25.738 [debu]  net.tailnet.net.wgengine: magicsock: [warning] failed to force-set UDP read buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2024-11-15 13:13:25.738 [debu]  net.tailnet.net.wgengine: magicsock: [warning] failed to force-set UDP write buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2024-11-15 13:13:25.739 [debu]  net.tailnet.net.wgengine: Rebind; defIf="eth0", ips=[172.17.0.2/16]
2024-11-15 13:13:26.753 [debu]  net.tailnet.net.wgengine: magicsock: [0xc000258a00] derp.Recv(derp-999): derphttp.Client.Recv connect to region 999 (coder): context deadline exceeded
2024-11-15 13:13:26.753 [debu]  net.tailnet.net.wgengine: magicsock: closing connection to derp-999 (rebind-no-localaddr), age 3s
2024-11-15 13:13:26.753 [debu]  net.tailnet.net.wgengine: magicsock: 0 active derp conns
2024-11-15 13:13:26.753 [debu]  net.tailnet.net.wgengine: magicsock: adding connection to derp-999 for home-keep-alive
2024-11-15 13:13:26.754 [debu]  net.tailnet.net.wgengine: magicsock: 1 active derp conns: derp-999=cr0s,wr0s
2024-11-15 13:13:26.754 [debu]  net.tailnet: wireguard status  status="&{AsOf:2024-11-15 13:13:26.754285028 +0000 UTC m=+732.318825220 Peers:[{TxBytes:0 RxBytes:0 LastHandshake:1970-01-01 00:00:00 +0000 UTC NodeKey:nodekey:655513d2c508c7f5f768bf4853030cc7c00c41b380f0af78dc238d457dec7d16}] LocalAddrs:[{Addr:172.17.0.2:36539 Type:local}] DERPs:1}"  error=<nil>
2024-11-15 13:13:26.754 [debu]  net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)
2024-11-15 13:13:26.889 [debu]  apphealth: workspace app healthy  id=66480c41-3496-40f5-a451-881f5f5b8a3b  slug=code-server
2024-11-15 13:13:28.254 [debu]  net.tailnet.net.wgengine: magicsock: [0xc000354b40] derp.Recv(derp-999): derphttp.Client.Recv connect to region 999 (coder): context deadline exceeded
2024-11-15 13:13:28.255 [debu]  net.tailnet.net.wgengine: derp-999: [v1] backoff: 9 msec
2024-11-15 13:13:28.265 [debu]  net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1001stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1002stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1002stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: [v1] report: udp=false v4=false icmpv4=false v6=false v6os=true mapvarydest= hair= portmap= derp=0
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1003stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1002stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1003stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1004stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1000stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1001stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1004stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1003stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1004stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1000stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1000stun0" has no address
2024-11-15 13:13:28.755 [debu]  net.tailnet.net.wgengine: netcheck: netcheck.runProbe: named node "1001stun0" has no address
2024-11-15 13:13:28.756 [debu]  net.tailnet.net.wgengine: magicsock: last netcheck reported send error. Rebinding.
2024-11-15 13:13:28.756 [debu]  net.tailnet.net.wgengine: magicsock: [warning] failed to force-set UDP read buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2024-11-15 13:13:28.756 [debu]  net.tailnet.net.wgengine: magicsock: [warning] failed to force-set UDP write buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2024-11-15 13:13:28.756 [debu]  net.tailnet.net.wgengine: magicsock: [warning] failed to force-set UDP read buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2024-11-15 13:13:28.756 [debu]  net.tailnet.net.wgengine: magicsock: [warning] failed to force-set UDP write buffer size to 7340032: operation not permitted; using kernel default values (impacts throughput only)
2024-11-15 13:13:28.756 [debu]  net.tailnet.net.wgengine: Rebind; defIf="eth0", ips=[172.17.0.2/16]
2024-11-15 13:13:29.766 [debu]  net.tailnet.net.wgengine: magicsock: [0xc000354b40] derp.Recv(derp-999): derphttp.Client.Recv connect to region 999 (coder): context deadline exceeded
2024-11-15 13:13:29.766 [debu]  net.tailnet.net.wgengine: magicsock: closing connection to derp-999 (rebind-no-localaddr), age 3s
2024-11-15 13:13:29.766 [debu]  net.tailnet.net.wgengine: magicsock: 0 active derp conns
2024-11-15 13:13:29.766 [debu]  net.tailnet.net.wgengine: magicsock: adding connection to derp-999 for home-keep-alive
2024-11-15 13:13:29.766 [debu]  net.tailnet.net.wgengine: magicsock: 1 active derp conns: derp-999=cr0s,wr0s
2024-11-15 13:13:29.767 [debu]  net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)
2024-11-15 13:13:29.767 [debu]  net.tailnet: wireguard status  status="&{AsOf:2024-11-15 13:13:29.767540586 +0000 UTC m=+735.332080779 Peers:[{TxBytes:0 RxBytes:0 LastHandshake:1970-01-01 00:00:00 +0000 UTC NodeKey:nodekey:655513d2c508c7f5f768bf4853030cc7c00c41b380f0af78dc238d457dec7d16}] LocalAddrs:[{Addr:172.17.0.2:36539 Type:local}] DERPs:1}"  error=<nil>
2024-11-15 13:13:31.268 [debu]  net.tailnet.net.wgengine: magicsock: [0xc0001a08c0] derp.Recv(derp-999): derphttp.Client.Recv connect to region 999 (coder): context deadline exceeded
2024-11-15 13:13:31.268 [debu]  net.tailnet.net.wgengine: derp-999: [v1] backoff: 5 msec
2024-11-15 13:13:31.273 [debu]  net.tailnet.net.wgengine: derphttp.Client.Recv: connecting to derp-999 (coder)

@spikecurtis
Contributor

Your problem is that the workspace is unable to connect to the DERP service that the Coder server is running:

2024-11-15 13:13:29.766 [debu]  net.tailnet.net.wgengine: magicsock: [0xc000354b40] derp.Recv(derp-999): derphttp.Client.Recv connect to region 999 (coder): context deadline exceeded
2024-11-15 13:13:29.766 [debu]  net.tailnet.net.wgengine: magicsock: closing connection to derp-999 (rebind-no-localaddr), age 3s

That's unrelated to the root cause of this GitHub issue, so I suggest you open a new issue if you want to continue discussion.
