Fix race sending error report before tcp connection closed #142
I seem to have hit the problem described in #90.

I run `stayrtr` in a few locations globally, and some clients have a high RTT to my RTR server instances. As a result, error reports from `stayrtr` often do not arrive before the TCP connection is closed. Those reports are unfortunately needed, because most of my clients are BIRD 2.x attempting to use RTR version 2, which `stayrtr` does not support at this time. A side effect of this problem is that, because of the way the code is currently implemented, I end up with a goroutine leak tied to `SendRawPDU()`. This is fairly easy to simulate using `tc-netem` on either the client or the server side. I have PCAPs and pprof output if any evidence is needed.

This change introduces an errgroup to manage the client read and write loops. Whenever anything calls `Disconnect()` on the client, we cancel the context. The errgroup ensures that both the read and write loops are cleanly stopped before the TCP connection is closed. I have been running a canary of this for a while, and it seems to have resolved the problem for me.
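
For readers who want the shape of the idea without opening the diff, here is a minimal sketch of the shutdown ordering described above. It is not the code in this PR: the `client` type, `newClient`, `Send`, and `demo` are hypothetical names, and it uses `sync.WaitGroup` plus `context` from the standard library in place of `golang.org/x/sync/errgroup` so that it runs without extra modules. It also assumes the read loop is unblocked with a read deadline and that pending writes are bounded with a one-second write deadline; the PR may do this differently.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"sync"
	"time"
)

// client is a hypothetical stand-in for an RTR client session.
type client struct {
	conn   net.Conn
	sendCh chan []byte
	ctx    context.Context
	cancel context.CancelFunc
	wg     sync.WaitGroup
	once   sync.Once
}

func newClient(conn net.Conn) *client {
	ctx, cancel := context.WithCancel(context.Background())
	c := &client{conn: conn, sendCh: make(chan []byte, 8), ctx: ctx, cancel: cancel}
	c.wg.Add(2)
	go c.readLoop()
	go c.writeLoop()
	return c
}

// readLoop discards input until Read fails (deadline hit or peer closed).
func (c *client) readLoop() {
	defer c.wg.Done()
	buf := make([]byte, 256)
	for {
		if _, err := c.conn.Read(buf); err != nil {
			return
		}
	}
}

// writeLoop sends queued PDUs; after cancellation it flushes what is
// already queued (for example an error report) and then exits.
func (c *client) writeLoop() {
	defer c.wg.Done()
	for {
		select {
		case pdu := <-c.sendCh:
			if _, err := c.conn.Write(pdu); err != nil {
				return
			}
		case <-c.ctx.Done():
			for {
				select {
				case pdu := <-c.sendCh:
					if _, err := c.conn.Write(pdu); err != nil {
						return
					}
				default:
					return
				}
			}
		}
	}
}

// Send queues a PDU. It gives up instead of blocking once the context is
// cancelled, so callers cannot leak on a dead session.
func (c *client) Send(pdu []byte) bool {
	if c.ctx.Err() != nil {
		return false
	}
	select {
	case c.sendCh <- pdu:
		return true
	case <-c.ctx.Done():
		return false
	}
}

// Disconnect cancels the context, waits for both loops, then closes.
func (c *client) Disconnect() {
	c.once.Do(func() {
		c.cancel()
		c.conn.SetWriteDeadline(time.Now().Add(time.Second)) // bound the flush
		c.conn.SetReadDeadline(time.Now())                   // unblock pending Read
		c.wg.Wait()
		c.conn.Close()
	})
}

// demo queues one PDU, disconnects, and returns what the peer received.
func demo() string {
	server, peer := net.Pipe()
	received := make(chan string, 1)
	go func() {
		data, _ := io.ReadAll(peer) // returns once the server side closes
		received <- string(data)
	}()
	c := newClient(server)
	c.Send([]byte("error-report"))
	c.Disconnect()
	return <-received
}

func main() {
	fmt.Printf("peer received: %q\n", demo())
}
```

The point of the ordering in `Disconnect()` is that the connection is closed only after the write loop has had a chance to flush, so a queued error report is not lost to an early close, and `Send` cannot block forever on a session that is shutting down.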