-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Description
There seems to be a goroutine leak in coderd.(*api).workspaceAgentTurn.
This could be seen as two bugs:
- Goroutine leak
- Use of
(*http.Request).Context()afterHijackin more than one place
Steps to Reproduce
- Enable pprof for coder server
coder ssh devctlr+d- Goto 1
- Check pprof (
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/goroutine)
The leak is in part due to reliance on the http.Request context and use of websockets. The underlying websocket library calls (*http.Request).Hijack which disables context propagation.
This happens here:
coder/coderd/workspaceagents.go
Line 304 in 668a671
| wsConn, err := websocket.Accept(rw, r, &websocket.AcceptOptions{ |
And the following contexts will not cancel until the http handler completes:
coder/coderd/workspaceagents.go
Line 316 in 668a671
| netConn := websocket.NetConn(r.Context(), wsConn, websocket.MessageBinary) |
coder/coderd/workspaceagents.go
Line 320 in 668a671
| case <-r.Context().Done(): |
We must avoid using r.Context() after hijack, unless we are using it with the expectation that the http handler will exit (at which point the context will complete).
I'm unfamiliar with the pion/turn package, but another factor could be wrt how it handles connection closure, perhaps it does not propagate as we expect since we're not calling wsConn.Close() due to context reliance?
Similar reliance on request context after hijack is done elsewhere, we should rethink all of them. Example:
coder/coderd/workspaceagents.go
Line 155 in 668a671
| resource, err := api.Database.GetWorkspaceResourceByID(r.Context(), workspaceAgent.ResourceID) |