fix: close server pty connections on client disconnect #15201

Merged
merged 2 commits into from
Oct 24, 2024

Conversation

f0ssel
Contributor

@f0ssel f0ssel commented Oct 23, 2024

Closes #15174

We originally noticed that last_used_at kept ticking upwards even after clients disconnected. After narrowing it down to the web terminal, I did some digging and found that agentssh.Bicopy was never exiting, even after the client disconnected. This meant our websocket connections on the server were held open forever; each one continued to count as an open connection to the workspace and bumped last_used_at as a result.

https://github.com/coder/coder/blob/f0ssel/last_used_at_inc/coderd/workspaceapps/proxy.go#L702

This is due to a combination of our use of agentssh.Bicopy as the primary reader of the websocket and the lack of a timeout on our websocket.Ping attempt.

I've added a timeout, equal to the loop interval, to the ping in httpapi.HeartbeatClose. It can now catch the newly failing websocket.Ping call and cancel the entire request context, which leads to agentssh.Bicopy finally exiting and releasing the connection.
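To illustrate the idea, here is a minimal sketch of a heartbeat loop whose ping is bounded by the loop interval. It is not the actual httpapi.HeartbeatClose code: the `pinger` interface, `heartbeatClose`, `pingWithTimeout`, `deadPeer`, and `demoDeadPeer` are hypothetical names that stand in for the websocket connection and the real helper, and the intervals are arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// pinger is a hypothetical stand-in for the websocket connection; the
// heartbeat only needs a context-aware Ping.
type pinger interface {
	Ping(ctx context.Context) error
}

// pingWithTimeout bounds a single ping so it cannot block forever when the
// peer has gone away and no pong will ever arrive.
func pingWithTimeout(ctx context.Context, conn pinger, timeout time.Duration) error {
	pingCtx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	return conn.Ping(pingCtx)
}

// heartbeatClose pings conn once per interval. If a ping fails (including by
// hitting its deadline), it calls exit, which is assumed to cancel the
// request context so that any copy loop tied to that context can return.
func heartbeatClose(ctx context.Context, exit func(), conn pinger, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
		if err := pingWithTimeout(ctx, conn, interval); err != nil {
			exit()
			return
		}
	}
}

// deadPeer simulates a disconnected client: Ping only returns once its
// context is done, like a ping that never receives a pong.
type deadPeer struct{}

func (deadPeer) Ping(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// demoDeadPeer reports whether the heartbeat cancelled the request context
// for a dead peer within one second.
func demoDeadPeer() bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go heartbeatClose(ctx, cancel, deadPeer{}, 10*time.Millisecond)
	select {
	case <-ctx.Done():
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("request context cancelled:", demoDeadPeer())
}
```

Without the deadline in `pingWithTimeout`, the ping against `deadPeer` would block until the request context ended on its own, which is the hang described above.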

But what makes last_used_at get bumped forever?

Great question. I wasn't sure at first, since I was logging the inputs and outputs and clearly saw stats from the terminal session stop coming through on disconnect. When I logged the contents of the workspaceapps.StatsCollector over time, I noticed that the stats from the hung websocket connections were never getting cleaned up. After some digging I found that for a stat to be cleared out of the stats collector, a stat with a non-zero ended_at time value must be published for it. The /pty route sends this final stat in a defer block in the handler, but since the handler never returned, we never got that stat, so the entry kept refreshing forever.
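The interaction between the deferred final stat and the collector can be sketched with a toy model. This is not the workspaceapps.StatsCollector implementation: `sessionStat`, `collector`, `handle`, and `activeDuringAndAfter` are hypothetical names, and the only behavior modeled is "an entry is removed only when a report with a non-zero EndedAt arrives".

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sessionStat is a hypothetical, simplified stat report for one session.
type sessionStat struct {
	SessionID string
	EndedAt   time.Time // zero while the session is still open
}

// collector keeps a session until it sees a report with a non-zero EndedAt.
type collector struct {
	mu   sync.Mutex
	open map[string]sessionStat
}

func newCollector() *collector {
	return &collector{open: make(map[string]sessionStat)}
}

// report records an open session, or removes it once it has ended.
func (c *collector) report(s sessionStat) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !s.EndedAt.IsZero() {
		delete(c.open, s.SessionID)
		return
	}
	c.open[s.SessionID] = s
}

// activeSessions returns how many sessions still count as in use.
func (c *collector) activeSessions() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.open)
}

// handle mimics a handler that publishes its final stat in a defer. The
// final stat is only sent if serve returns.
func handle(c *collector, id string, serve func()) {
	c.report(sessionStat{SessionID: id})
	defer func() {
		c.report(sessionStat{SessionID: id, EndedAt: time.Now()})
	}()
	serve()
}

// activeDuringAndAfter returns the active session count while the handler is
// stuck in serve, and again after serve is released and the handler returns.
func activeDuringAndAfter() (during, after int) {
	c := newCollector()
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		handle(c, "pty-1", func() {
			close(started)
			<-release // stands in for a copy loop that never exits
		})
	}()
	<-started
	during = c.activeSessions()
	close(release)
	<-done
	after = c.activeSessions()
	return during, after
}

func main() {
	during, after := activeDuringAndAfter()
	fmt.Println("active while handler is stuck:", during)
	fmt.Println("active after handler returns:", after)
}
```

In this model the session stays active for as long as `serve` blocks, and it is only cleared once the handler returns and the deferred report runs.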

I think there are some improvements we can make to make this safer in the future, when we take a look at how to finally merge agent session stats and workspace stats together.

@f0ssel f0ssel requested a review from johnstcn October 23, 2024 18:55
Member

@johnstcn johnstcn left a comment

Given the intricacies of this issue, is it possible to add a test that reproduces the original issue in the description and validates the fix?

Apart from that LGTM.

@f0ssel
Contributor Author

f0ssel commented Oct 24, 2024

@johnstcn I'm working on a test for this, but it's pretty messy to get a good one injected into the right places. Mind if I add it in a follow-up, since it'll be a bit meatier of a change?

@f0ssel f0ssel merged commit 81e99be into main Oct 24, 2024
35 checks passed
@f0ssel f0ssel deleted the f0ssel/last_used_at_inc branch October 24, 2024 19:12
@github-actions github-actions bot locked and limited conversation to collaborators Oct 24, 2024
Development

Successfully merging this pull request may close these issues.

last_used_at continues to increment with workspace-usage experiment
2 participants