Bug: Goroutine leak in coderd.(*api).workspaceAgentTurn
#1508
Comments
Absolutely awesome bug find!
@mafredri what is the potential impact of this bug?
Very interested in how you found this. I'm sure there was more to it than
@tjcran For the codepath I analyzed, I estimate ~0.25MB of memory leaked per SSH connection. For active, long-running servers this could mean significant memory consumption that is unrecoverable without a restart of the coder server. There could be other potential issues I haven't explored, like file descriptor exhaustion on the host. Edit: There's also a similar (but smaller) memory increase on the workspace running
@ketang Not terribly exciting, I'm afraid. I was simply curious as to what was causing slight CPU usage while
@mafredri this is significant enough that I'd like to include it in Community MVP.
There seems to be a goroutine leak in coderd.(*api).workspaceAgentTurn.
This could be seen as two bugs: the leak itself, and reliance on (*http.Request).Context() after Hijack in more than one place.
Steps to Reproduce
1. coder ssh dev
2. ctrl+d
3. go tool pprof -http=:8080 http://localhost:6060/debug/pprof/goroutine
The leak is in part due to reliance on the http.Request context and use of websockets. The underlying websocket library hijacks the connection (Hijack from http.Hijacker, called on the http.ResponseWriter), after which the request context is no longer canceled when the client disconnects.
This happens here:
coder/coderd/workspaceagents.go
Line 304 in 668a671
And the following contexts will not cancel until the http handler completes:
coder/coderd/workspaceagents.go
Line 316 in 668a671
coder/coderd/workspaceagents.go
Line 320 in 668a671
We must avoid using r.Context() after hijack, unless we are using it with the expectation that the http handler will exit (at which point the context will complete).
I'm unfamiliar with the pion/turn package, but another factor could be how it handles connection closure. Perhaps closure does not propagate as we expect, since we're not calling wsConn.Close() due to context reliance?
Similar reliance on request context after hijack is done elsewhere; we should rethink all of them. Example:
coder/coderd/workspaceagents.go
Line 155 in 668a671
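One possible direction for avoiding r.Context() after hijack is to give the hijacked connection its own context and cancel it as soon as the connection itself reports that the peer is gone. The sketch below is an assumption about what such a fix could look like, not the change made in coder: serveHijacked and runDemo are hypothetical names, net.Pipe stands in for the websocket connection, and the worker goroutines stand in for whatever background work is tied to the connection.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"sync"
)

// serveHijacked stands in for handler code running after a hijack. Its
// context derives from context.Background(), not r.Context(), and is
// canceled when reading from conn ends. It returns how many workers exited.
func serveHijacked(conn net.Conn, workers int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		stopped int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-ctx.Done() // hypothetical work bound to the connection lifetime
			mu.Lock()
			stopped++
			mu.Unlock()
		}()
	}

	// Drain the connection. The copy ends once the peer closes its side.
	_, _ = io.Copy(io.Discard, conn)

	// The connection is the source of truth: cancel, close explicitly,
	// and wait for every worker to exit so that nothing leaks.
	cancel()
	_ = conn.Close()
	wg.Wait()
	return stopped
}

// runDemo connects a client and a server over an in-memory pipe, has the
// client hang up, and returns the number of workers that stopped.
func runDemo(workers int) int {
	server, client := net.Pipe()
	done := make(chan int, 1)
	go func() { done <- serveHijacked(server, workers) }()
	_ = client.Close()
	return <-done
}

func main() {
	fmt.Println("workers stopped:", runDemo(3))
}
```

The design choice here is that cancellation follows the connection rather than the handler: the read loop notices the disconnect, and the explicit cancel and Close calls propagate it to everything else.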