Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Fix websocket/yamux/drpc goleak failure on Windows #317

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bryphe-coder opened this issue Feb 17, 2022 · 5 comments · Fixed by #329 or #337
Closed

Fix websocket/yamux/drpc goleak failure on Windows #317

bryphe-coder opened this issue Feb 17, 2022 · 5 comments · Fixed by #329 or #337
Labels
api Area: HTTP API
Milestone

Comments

@bryphe-coder
Copy link
Contributor

bryphe-coder commented Feb 17, 2022

There is failure that occurs somewhat regularly in CI - the test pass, but goleak reports a failure.

Full Failure
coverage: 74.1% of statements
goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 88026 in state select, with nhooyr.io/websocket.(*Conn).writeFrame.func1 on top of the stack:
goroutine 88026 [select]:
nhooyr.io/websocket.(*Conn).writeFrame.func1()
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/write.go:276 +0x129
nhooyr.io/websocket.(*Conn).writeFrame(0xc0025b6000, {0x181dee0, 0xc000873580}, 0x1, 0x0, 0x2, {0xc000a6b000, 0x4, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/write.go:318 +0xb48
nhooyr.io/websocket.(*Conn).write(0xc0025b6000, {0x181dee0, 0xc000873580}, 0x1, {0xc000a6b000, 0x4, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/write.go:129 +0x2d2
nhooyr.io/websocket.(*Conn).Write(0xc0001391a0, {0x181dee0, 0xc000873580}, 0xc00021b268, {0xc000a6b000, 0x4, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/write.go:42 +0x8e
nhooyr.io/websocket.(*netConn).Write(0xc000139080, {0xc000a6b000, 0x4, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/netconn.go:84 +0xb0
github.com/hashicorp/yamux.(*Session).send(0xc001718f20)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/session.go:[44](https://github.com/coder/coder/runs/5239756778?check_suite_focus=true#step:7:44)3 +0x234
created by github.com/hashicorp/yamux.newSession
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/session.go:114 +0x8d1

 Goroutine 88025 in state select, with nhooyr.io/websocket.(*mu).lock on top of the stack:
goroutine 88025 [select]:
nhooyr.io/websocket.(*mu).lock(0xc0041b4fe0, {0x181df50, 0xc003405b60})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/conn_notjs.go:238 +0x17f
nhooyr.io/websocket.(*Conn).writeFrame(0xc0025b6000, {0x181df50, 0xc003405b60}, 0x1, 0x0, 0x8, {0xc001508855, 0xb, 0xb})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/write.go:2[45](https://github.com/coder/coder/runs/5239756778?check_suite_focus=true#step:7:45) +0xd2
nhooyr.io/websocket.(*Conn).writeControl(0xc0025b6158, {0x181df18, 0xc0000a0198}, 0xa8cf4c, {0xc001508855, 0xb, 0xb})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/write.go:236 +0x105
nhooyr.io/websocket.(*Conn).writeClose(0xc0025b6000, 0x3f0, {0xc001508827, 0x9})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/close_notjs.go:76 +0x21c
nhooyr.io/websocket.(*Conn).handleControl(0xc0025b6000, {0x181dee0, 0xc0008735c0}, {0x1, 0x0, 0x0, 0x0, 0x8, 0xb, 0x1, ...})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/read.go:295 +0x4f4
nhooyr.io/websocket.(*Conn).readLoop(0xc0025b6000, {0x181dee0, 0xc0008735c0})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/read.go:163 +0x206
nhooyr.io/websocket.(*Conn).reader(0xc0025b6000, {0x181dee0, 0xc0008735c0})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/read.go:315 +0x245
nhooyr.io/websocket.(*Conn).Reader(...)
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/read.go:30
nhooyr.io/websocket.(*netConn).Read(0xc000139080, {0xc00093e000, 0x1000, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/netconn.go:100 +0x167
bufio.(*Reader).Read(0xc0001390e0, {0xc00144c9e4, 0xc, 0xc})
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/bufio/bufio.go:227 +0x4db
io.ReadAtLeast({0x180ae00, 0xc0001390e0}, {0xc00144c9e4, 0xc, 0xc}, 0xc)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/io/io.go:328 +0xde
io.ReadFull(...)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/io/io.go:347
github.com/hashicorp/yamux.(*Session).recvLoop(0xc001718f20)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/session.go:483 +0x136
github.com/hashicorp/yamux.(*Session).recv(0x0)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/session.go:[46](https://github.com/coder/coder/runs/5239756778?check_suite_focus=true#step:7:46)2 +0x2f
created by github.com/hashicorp/yamux.newSession
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/session.go:113 +0x865

 Goroutine 88707 in state select, with storj.io/drpc/drpcmanager.(*Manager).manageStream on top of the stack:
goroutine 88707 [select]:
storj.io/drpc/drpcmanager.(*Manager).manageStream(0xc001576b60, {0x181df88, 0xc001dc5950}, 0xc0009a2d80)
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcmanager/manager.go:289 +0x1b7
storj.io/drpc/drpcmanager.(*Manager).manageStreams(0xc001576b60)
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcmanager/manager.go:276 +0xfa
created by storj.io/drpc/drpcmanager.NewWithOptions
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcmanager/manager.go:104 +0x7d1

 Goroutine 88027 in state select, with github.com/hashicorp/yamux.(*Session).keepalive on top of the stack:
goroutine 88027 [select]:
github.com/hashicorp/yamux.(*Session).keepalive(0xc001718f20)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/session.go:338 +0x129
created by github.com/hashicorp/yamux.newSession
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/session.go:116 +0x95a

 Goroutine 88024 in state select, with nhooyr.io/websocket.(*Conn).timeoutLoop on top of the stack:
goroutine 88024 [select]:
nhooyr.io/websocket.(*Conn).timeoutLoop(0xc0025b6000)
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/conn_notjs.go:153 +0x23d
created by nhooyr.io/websocket.newConn
	C:/Users/runneradmin/go/pkg/mod/nhooyr.io/[email protected]/conn_notjs.go:114 +0xc65

 Goroutine 88028 in state chan receive, with storj.io/drpc/drpcserver.(*Server).Serve.func1 on top of the stack:
goroutine 88028 [chan receive]:
storj.io/drpc/drpcserver.(*Server).Serve.func1({0x181e5e0, 0xc001777890})
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcserver/server.go:80 +0x51
storj.io/drpc/drpcctx.(*Tracker).track(0xc001777890, 0xc0014c3620)
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcctx/transport.go:52 +0x3e
created by storj.io/drpc/drpcctx.(*Tracker).Run
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcctx/transport.go:[47](https://github.com/coder/coder/runs/5239756778?check_suite_focus=true#step:7:47) +0xf1

 Goroutine 88688 in state select, with github.com/hashicorp/yamux.(*Session).waitForSendErr on top of the stack:
goroutine 88688 [select]:
github.com/hashicorp/yamux.(*Session).waitForSendErr(0xc001718f20, {0xc001cb5fd4, 0xc, 0xc}, {0xc000a6b000, 0x4, 0x1000}, 0xc0034051a0)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/session.go:385 +0x532
github.com/hashicorp/yamux.(*Stream).write(0xc001b4c410, {0xc000a6b000, 0x4, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/stream.go:202 +0x474
github.com/hashicorp/yamux.(*Stream).Write(0xc001b4c410, {0xc000a6b000, 0x4, 0x1000})
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/stream.go:158 +0x18d
storj.io/drpc/drpcwire.(*Writer).Flush(0xc0018f9d40)
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcwire/writer.go:85 +0xba
storj.io/drpc/drpcstream.(*Stream).rawFlushLocked(0xc0009a2d80)
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcstream/stream.go:352 +0x176
storj.io/drpc/drpcstream.(*Stream).MsgSend(0xc0009a2d80, {0x15950c0, 0xc0018f9dc0}, {0x18177e0, 0x1dc8e68})
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcstream/stream.go:404 +0x2b2
storj.io/drpc/drpcmux.(*Mux).HandleRPC(0xc000a41918, {0x1823f38, 0xc0009a2d80}, {0xc0005b6420, 0x29})
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcmux/handle_rpc.go:38 +0x57f
storj.io/drpc/drpcserver.(*Server).handleRPC(0xc000139200, 0x181df88, {0xc0005b6420, 0x29})
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcserver/server.go:122 +0x72
storj.io/drpc/drpcserver.(*Server).ServeOne(0xc000139200, {0x181e5e0, 0xc001777890}, {0x16245952598, 0xc001b4c410})
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcserver/server.go:66 +0x2db
storj.io/drpc/drpcserver.(*Server).Serve.func2({0x181e5e0, 0xc001777890})
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcserver/server.go:112 +0x77
storj.io/drpc/drpcctx.(*Tracker).track(0xc001777890, 0xc00004fac0)
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcctx/transport.go:52 +0x3e
created by storj.io/drpc/drpcctx.(*Tracker).Run
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcctx/transport.go:47 +0xf1

 Goroutine 88706 in state select, with github.com/hashicorp/yamux.(*Stream).Read on top of the stack:
goroutine 88706 [select]:
github.com/hashicorp/yamux.(*Stream).Read(0xc001b4c410, {0xc000114000, 0x1000, 0x0})
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/stream.go:141 +0x4bc
storj.io/drpc/drpcwire.(*Reader).ReadPacketUsing(0xc000081d60, {0xc0005b63f0, 0x0, 0x30})
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcwire/reader.go:68 +0x8f6
storj.io/drpc/drpcmanager.(*Manager).manageReader(0xc001576b60)
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcmanager/manager.go:199 +0x185
created by storj.io/drpc/drpcmanager.NewWithOptions
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcmanager/manager.go:103 +0x765

 Goroutine 88022 in state select, with github.com/hashicorp/yamux.(*Session).AcceptStream on top of the stack:
goroutine 88022 [select]:
github.com/hashicorp/yamux.(*Session).AcceptStream(0xc001718f20)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/session.go:237 +0x109
github.com/hashicorp/yamux.(*Session).Accept(0xc001777890)
	C:/Users/runneradmin/go/pkg/mod/github.com/hashicorp/[email protected]/session.go:227 +0x31
storj.io/drpc/drpcserver.(*Server).Serve(0xc000139200, {0x181df88, 0xc001777680}, {0x18182e0, 0xc001718f20})
	C:/Users/runneradmin/go/pkg/mod/storj.io/[email protected]/drpcserver/server.go:85 +0x326
github.com/coder/coder/coderd.(*api).provisionerDaemonsServe(0xc000561d00, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	D:/a/coder/coder/coderd/provisionerdaemons.go:100 +0x1185
net/http.HandlerFunc.ServeHTTP(0xc0025e5d00, {0x18186d0, 0xc001f090a0}, 0xc00144c98a)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2047 +0x4e
github.com/go-chi/chi/v5.(*Mux).routeHTTP(0xc001c94240, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:442 +0x498
net/http.HandlerFunc.ServeHTTP(0xc0025e5cf0, {0x18186d0, 0xc001f090a0}, 0xa720dc)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2047 +0x4e
github.com/go-chi/chi/v5.(*Mux).ServeHTTP(0xc001c94240, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:71 +0x937
github.com/go-chi/chi/v5.(*Mux).Mount.func1({0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:314 +0x516
net/http.HandlerFunc.ServeHTTP(0xc0000989c0, {0x18186d0, 0xc001f090a0}, 0xc0033efca0)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2047 +0x4e
github.com/go-chi/chi/v5.(*Mux).routeHTTP(0xc00077a2a0, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:442 +0x498
net/http.HandlerFunc.ServeHTTP(0xc0025e5610, {0x18186d0, 0xc001f090a0}, 0xa720dc)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2047 +0x4e
github.com/go-chi/chi/v5.(*Mux).ServeHTTP(0xc00077a2a0, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:71 +0x937
github.com/go-chi/chi/v5.(*Mux).Mount.func1({0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:314 +0x516
net/http.HandlerFunc.ServeHTTP(0xc0000989e0, {0x18186d0, 0xc001f090a0}, 0xc00045b174)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2047 +0x4e
github.com/go-chi/chi/v5.(*Mux).routeHTTP(0xc00077a240, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:442 +0x498
net/http.HandlerFunc.ServeHTTP(0xc0025e5d30, {0x18186d0, 0xc001f090a0}, 0x1d51b60)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2047 +0x4e
github.com/go-chi/chi/v5.(*Mux).ServeHTTP(0xc00077a240, {0x18186d0, 0xc001f090a0}, 0xc0007d8400)
	C:/Users/runneradmin/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:88 +0x8b7
net/http.serverHandler.ServeHTTP({0xc0017775c0}, {0x18186d0, 0xc001f090a0}, 0xc0007d8400)
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:2879 +0x89b
net/http.(*conn).serve(0xc0008337c0, {0x181df88, 0xc001c8df50})
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:1930 +0x12e5
created by net/http.(*Server).Serve
	C:/hostedtoolcache/windows/go/1.17.6/x64/src/net/http/server.go:3034 +0x8[48](https://github.com/coder/coder/runs/5239756778?check_suite_focus=true#step:7:48)

Link to example manifestation: https://github.com/coder/coder/runs/5239756778?check_suite_focus=true#step:7:44

I suspect we're failing to close a connection or server - it seems like this is stemming from

(*api).provisionerDaemonsServe(0xc000561d00, {0x18186d0, 0xc001f090a0}, 0xc0007d8500)

and an issue where a connection is staying open between the provisioner <-> coderd host.

Interestingly, I've only observed this particular failure on Windows runs.

@kylecarbs
Copy link
Member

Pretty sure it's this:

https://github.com/nhooyr/websocket/blob/e4c3b0f8168d619c279822ab882b8f15717041af/close_notjs.go#L76-L80

It's using a context.Background when writing the close message, which seems incorrect. I've pinged nhooyr.

@kylecarbs
Copy link
Member

Just confirmed with nhooyr that's it! Working on a fix now.

kylecarbs added a commit that referenced this issue Feb 18, 2022
Closes #317. We depended on the context canceling the yamux connection,
but this isn't a sync operation. Explicitly calling close ensures the
handler waits for yamux to complete before exit.
@kylecarbs
Copy link
Member

Turns out that is a leak but isn't the cause of our problem. Not synchronously closing yamux caused this leak.

kylecarbs added a commit that referenced this issue Feb 18, 2022
Closes #317. The httptest server cancels the context after the connection
is closed, but if a connection takes a long time to close, the request
would never end. This applies a context to the entire listener that cancels
on test cleanup.

After discussion with @bryphe-coder, reducing the parallel limit on
Windows is likely to reduce failures as well.
kylecarbs added a commit that referenced this issue Feb 18, 2022
Closes #317. The httptest server cancels the context after the connection
is closed, but if a connection takes a long time to close, the request
would never end. This applies a context to the entire listener that cancels
on test cleanup.

After discussion with @bryphe-coder, reducing the parallel limit on
Windows is likely to reduce failures as well.
kylecarbs added a commit that referenced this issue Feb 18, 2022
Closes #317. The httptest server cancels the context after the connection
is closed, but if a connection takes a long time to close, the request
would never end. This applies a context to the entire listener that cancels
on test cleanup.

After discussion with @bryphe-coder, reducing the parallel limit on
Windows is likely to reduce failures as well.
kylecarbs added a commit that referenced this issue Feb 19, 2022
Closes #317. The httptest server cancels the context after the connection
is closed, but if a connection takes a long time to close, the request
would never end. This applies a context to the entire listener that cancels
on test cleanup.

After discussion with @bryphe-coder, reducing the parallel limit on
Windows is likely to reduce failures as well.
kylecarbs added a commit that referenced this issue Feb 19, 2022
* fix: Leaking yamux session after HTTP handler is closed

Closes #317. The httptest server cancels the context after the connection
is closed, but if a connection takes a long time to close, the request
would never end. This applies a context to the entire listener that cancels
on test cleanup.

After discussion with @bryphe-coder, reducing the parallel limit on
Windows is likely to reduce failures as well.

* Switch to windows-2022 to improve decompression

* Invalidate cache on matrix OS
kylecarbs added a commit that referenced this issue Feb 19, 2022
* feat: Add workspace agent for SSH

This adds the initial agent that supports TTY
and execution over SSH. It functions across MacOS,
Windows, and Linux.

This does not handle the coderd interaction yet,
but does setup a simple path forward.

* Fix pty tests on Windows

* Fix log race

* Lock around dial error to fix log output

* Fix context return early

* fix: Leaking yamux session after HTTP handler is closed

Closes #317. We depended on the context canceling the yamux connection,
but this isn't a sync operation. Explicitly calling close ensures the
handler waits for yamux to complete before exit.

* Lock around close return

* Force failure with log

* Fix failed handler

* Upgrade dep

* Fix defer inside loops

* Fix context cancel for HTTP requests

* Fix resize
@kylecarbs kylecarbs reopened this Feb 20, 2022
@kylecarbs
Copy link
Member

Fixing the yamux portion helped, but nhooyr/websocket still has it's leak.

https://github.com/coder/coder/runs/5265928205?check_suite_focus=true#step:7:44

@kylecarbs
Copy link
Member

coder/websocket#330

@misskniss misskniss added this to the V2 Alpha milestone May 3, 2022
@misskniss misskniss added the api Area: HTTP API label Jun 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment