testutils: fix GoroutineAssistant and httputils #1608
I have several issues with GoroutineAssistant's Fatalf:
- The s channel should have a buffer (say of two elements), so sending a message there is not blocked.
- It should not call Done() on WaitGroup - it is done in every goroutine as a deferred action. The problem was that this deferred action might not be executed, because of being blocked by sending an error to a blocking channel.
- The return directives were removed after calls to Fatalf in goroutines, so it should imply that Fatalf does not return (it should call runtime.Goexit()).
Perhaps we need to start thinking about it as a gexpect helper, and use Close during shutdown.
Two of three issues I wrote before are still not addressed.
- This function is not supposed to return - it should call runtime.Goexit() as the last thing in its body. We assume this function does not return in other places (we removed return clauses).
- I think that a.s channel should be non-blocking (as in s: make(chan error, 10)).
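To make the two requests concrete, here is a minimal sketch. It assumes a hypothetical `assistant` type, not the actual GoroutineAssistant from testutils; the field names and the helper `runFailingWorker` are made up for illustration. It shows that `runtime.Goexit()` stops the calling goroutine while still running its deferred `Done()`, and that a buffered channel keeps the send from blocking.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// assistant is a hypothetical stand-in for GoroutineAssistant.
type assistant struct {
	s  chan error
	wg sync.WaitGroup
}

// Fatalf reports the error and terminates the calling goroutine.
// runtime.Goexit runs the goroutine's deferred calls before exiting,
// so a deferred wg.Done() still fires and Fatalf never returns.
func (a *assistant) Fatalf(format string, args ...interface{}) {
	a.s <- fmt.Errorf(format, args...)
	runtime.Goexit()
}

// runFailingWorker starts one goroutine that fails immediately. It reports
// whether code after the Fatalf call was reached, and the error that was sent.
func runFailingWorker() (reachedAfter bool, err error) {
	a := &assistant{s: make(chan error, 10)} // fixed buffer, as suggested in the review
	a.wg.Add(1)
	go func() {
		defer a.wg.Done()
		a.Fatalf("worker failed: %d", 42)
		reachedAfter = true // never executed: Fatalf does not return
	}()
	a.wg.Wait() // returns because the deferred Done ran during Goexit
	return reachedAfter, <-a.s
}

func main() {
	reached, err := runFailingWorker()
	fmt.Println(reached, err)
}
```

Because the goroutine exits inside Fatalf, callers do not need a return statement after it.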
Thanks for the reminder. I added the call to runtime.Goexit(). I'm not convinced we should make the channel buffered; supporting a dynamic channel size would make the code more complex.
I would just make it fixed at 10 and that's it. For now Go does not allow growing a channel's buffer. We also don't use many goroutines on a single assistant (2 at most), so it should be enough for the foreseeable future.
Why exactly would you rather have a buffered channel?
I'd rather have it thought out with an unbuffered channel, to take advantage of its synchronizing nature. This will force us to think about the written test a little more and improve the quality. Buffered channels lead to fire-and-forget runs, and we'd need another mechanism to synchronize/stop failed tests (goroutines).
Sending to this channel is usually the last action goroutine does explicitly (that is - not counting the deferred actions).
I wanted to have a buffered channel just to let the goroutine die quickly, without waiting for the receiving side of the channel, so that deferred actions can be executed immediately.
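The argument for buffering can be sketched as follows. This is an illustration under assumptions, not code from the PR: the function `failAndCleanup` and the event strings are hypothetical. The goroutine sends its error into a buffered channel, returns at once, and its deferred cleanup runs before anyone has received the error.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// failAndCleanup records the order in which the deferred cleanup and the
// receive on the error channel happen.
func failAndCleanup() []string {
	var (
		events []string
		wg     sync.WaitGroup
	)
	errs := make(chan error, 1) // buffered: the send below cannot block
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Deferred calls run last-in first-out, so this cleanup runs before Done.
		defer func() { events = append(events, "cleanup") }()
		errs <- errors.New("boom")
	}()
	// Nobody is receiving yet. With an unbuffered channel the goroutine would
	// block on the send, the deferred Done would never run, and Wait would hang.
	wg.Wait()
	events = append(events, "received: "+(<-errs).Error())
	return events
}

func main() {
	fmt.Println(failAndCleanup())
}
```

The trade-off raised in the thread still applies: with a buffer, the sender no longer synchronizes with the receiver, so stopping the remaining goroutines needs a separate mechanism.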
tests/rkt_net_test.go
Outdated
That will deadlock if spawning fails - it will try to send an error over a blocking channel that nobody is listening on yet.
Right, we're not supposed to use ga outside of the goroutines, so SpawnOrFail is not supposed to be used here.
I tend to agree with krnowak that ultimately this should live in the context, but I think this is OK as a stopgap for now.
@krnowak PTAL.
tests/rkt_tests.go
Outdated
The same thing has to be done also in reset function.
Or maybe put this into runGC? But that might be confusing.
Is it not assumed that ctx.cleanup() is called before ctx.reset()? Does it make sense to run reset() without cleanup()? If not, we should have a variable for tracking the context's state and run cleanup() if only reset() is invoked.
The RegisterChild is good for now. Maybe in the follow-up PR we could modify the spawnOrFail function to take the context and register the child immediately. Also, is the commit "tests/net: one call to ga.Add(1) per goroutine" needed? I suppose we can drop it.
Use a single channel for shutting down GoroutineAssistant, of type error instead of string; in this way, Done() and Fatalf() are serialized and can't race. Fixes #1590
Calling log.Fatal would circumvent proper test shutdown/error propagation. All callers are already checking for the error appropriately.
After a GoroutineAssistant is initialised during tests, all operations that might end in a testing.Fatal should be serialized through it.
The test ACI server closes both Msg and Stop channels on Close(). The Stop channel never receives any messages - it is only used to listen for service shutdown. The channel listening is done in serverHandler, which is run in its own goroutine. The Stop channel was used shortly before, but I wrongly removed it during review process. Because of that, the function never returned and the goroutine was unnecessarily kept alive.
Otherwise there are issues when cleaning up the rkt context.
Since the tests that use the inspect binary rely on stdout parsing, printing this too early will cause a race condition between the serve and test code. Simply moving the print to a point after the Listen() call prevents that.
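The ordering fix can be sketched like this. The server below is a hypothetical stand-in, not the inspect binary; the function names and the "serving on" message are assumptions. The readiness line is printed only after `net.Listen` has bound the socket, so a test that parses the line can connect immediately.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net"
	"net/http"
	"strings"
)

// startServer binds the socket first and only then announces readiness on out.
func startServer(out io.Writer) (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	// Printing before Listen would let a reader of this line race the bind.
	fmt.Fprintf(out, "serving on %s\n", ln.Addr())
	go http.Serve(ln, http.NotFoundHandler())
	return ln, nil
}

// announceAndDial plays the role of the test: it parses the announced
// address from the server output and connects to it right away.
func announceAndDial() (string, error) {
	var out bytes.Buffer
	ln, err := startServer(&out)
	if err != nil {
		return "", err
	}
	defer ln.Close()
	addr := strings.TrimPrefix(strings.TrimSpace(out.String()), "serving on ")
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	conn.Close()
	return addr, nil
}

func main() {
	addr, err := announceAndDial()
	fmt.Println(addr != "", err)
}
```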
* don't branch since Fatalf will not return
* exit the calling goroutine in Fatalf
When tests spawn children and then cause a failure, they don't have the chance to wait for the children to complete. These tests are supposed to use the ctx.RegisterChild function for every child so that ctx.cleanup() will be able to handle child shutdown.
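A rough sketch of the registration idea, under stated assumptions: `testContext` is a hypothetical stand-in for the rkt test context, the children are plain `exec.Cmd` values rather than gexpect subprocesses, and the example assumes a Unix-like system with `sleep` on the PATH. Errors from Kill and Wait are logged instead of being dropped silently.

```go
package main

import (
	"log"
	"os/exec"
)

// testContext is a hypothetical stand-in for the rkt test context.
type testContext struct {
	children []*exec.Cmd
}

// RegisterChild records a started child so cleanup can stop it later.
func (ctx *testContext) RegisterChild(c *exec.Cmd) {
	ctx.children = append(ctx.children, c)
}

// cleanup kills and reaps every registered child, logging any error, and
// returns the number of children that were reaped successfully.
func (ctx *testContext) cleanup() int {
	reaped := 0
	for _, c := range ctx.children {
		if err := c.Process.Kill(); err != nil {
			log.Printf("killing child %d: %v", c.Process.Pid, err)
		}
		if _, err := c.Process.Wait(); err != nil {
			log.Printf("waiting for child %d: %v", c.Process.Pid, err)
			continue
		}
		reaped++
	}
	ctx.children = nil
	return reaped
}

// spawnAndCleanup starts a long-running child, registers it, and relies on
// cleanup to shut it down, as a failing test would.
func spawnAndCleanup() (int, error) {
	ctx := &testContext{}
	cmd := exec.Command("sleep", "60") // assumption: sleep exists on this system
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	ctx.RegisterChild(cmd)
	return ctx.cleanup(), nil
}

func main() {
	n, err := spawnAndCleanup()
	log.Println(n, err)
}
```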
LFAD if green.
tests/rkt_tests.go
Outdated
What about the errors from these two?
We could pass it on and then handle (ignore) it in the invoking function, would you prefer that?
I suggest doing a nominal check and just logging for information purposes. I'm just trying to future-proof us against the kind of murky mad situation that got us here in the first place.
If you feel strongly against that, let's at least move to the better practice/style of explicitly ignoring the error:
_, _ = child.Cmd.Process.Wait()
I'm fine with logging the error. If it was just me, we'd be logging a lot more.
This PR is dragging on and on and is gathering more and more changes. I'd like to merge ASAP, but first some of my notes/questions:
But can we please merge it already? It fixes those pesky failures in networking tests and we are good. I'd prefer to address the other issues (no leftover goroutines, agree on logging solution, proper rkt registration for all tests and so on) in separate, uh, issues. GitHub issues.
SGTM
testutils: fix GoroutineAssistant and httputils
Closes #1595.
Fixes #1590.