newlog: get rid of Fatal's in vector.go #5292
func createVectorSockets(sockPath string, backoffTime time.Duration) *net.UnixListener {
	for {
		// Create unix socket
		os.Remove(sockPath) // Remove any existing socket
Suppose everything succeeds up to the os.Chmod() call and that call fails: the loop will continue (returning here), and at this point both unixAddr and unixListener are valid. Shouldn't you close unixListener before removing the socket here?
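A minimal sketch of how the loop could release the listener before retrying. Everything here is an assumption for illustration, not the PR's code: `listenWithRetry` is a hypothetical stand-in for `createVectorSockets`, the `chmod` parameter is injected only so a failure can be simulated, and the `0o700` mode and log messages are made up.

```go
package main

import (
	"errors"
	"log"
	"net"
	"os"
	"path/filepath"
	"time"
)

// listenWithRetry is a hypothetical stand-in for createVectorSockets.
// It retries with a fixed backoff until the socket is listening and chmod'ed.
// chmod is injected only so a failure can be simulated; real code would
// call os.Chmod directly.
func listenWithRetry(sockPath string, backoff time.Duration,
	chmod func(string, os.FileMode) error) *net.UnixListener {
	for {
		// Remove a stale socket; a missing file is the expected case.
		if err := os.Remove(sockPath); err != nil && !errors.Is(err, os.ErrNotExist) {
			log.Printf("remove %s: %v", sockPath, err)
		}
		addr, err := net.ResolveUnixAddr("unix", sockPath)
		if err != nil {
			log.Printf("resolve %s: %v", sockPath, err)
			time.Sleep(backoff)
			continue
		}
		ln, err := net.ListenUnix("unix", addr)
		if err != nil {
			log.Printf("listen %s: %v", sockPath, err)
			time.Sleep(backoff)
			continue
		}
		if err := chmod(sockPath, 0o700); err != nil { // mode is illustrative
			log.Printf("chmod %s: %v", sockPath, err)
			// Close the half-initialised listener before retrying so the
			// file descriptor is not leaked on every failed attempt.
			ln.Close()
			time.Sleep(backoff)
			continue
		}
		return ln
	}
}

// demoChmodFailure fails the first chmod on purpose and reports how many
// attempts were needed before a listener was returned.
func demoChmodFailure() (int, error) {
	dir, err := os.MkdirTemp("", "vec")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir) // registered first, so it runs after Close below

	attempts := 0
	flaky := func(path string, mode os.FileMode) error {
		attempts++
		if attempts == 1 {
			return errors.New("simulated chmod failure")
		}
		return os.Chmod(path, mode)
	}
	ln := listenWithRetry(filepath.Join(dir, "v.sock"), time.Millisecond, flaky)
	defer ln.Close()
	return attempts, nil
}
```

In `demoChmodFailure` the first attempt fails at chmod, the listener is closed, and the second attempt succeeds. Closing a `*net.UnixListener` created by `net.ListenUnix` also unlinks the socket file by default, so the `os.Remove` at the top of the loop mostly matters for sockets left behind by an earlier process.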
	g.Expect(err).To(gomega.BeNil())

	// Wait a bit to let the function succeed
	time.Sleep(2 * backoffPeriod)
this may make the test a bit flaky ...
ok, I'll make them communicate through a channel
	time.Sleep(2 * backoffPeriod)

	// verify that the listener was created
	g.Expect(unixListener).ToNot(gomega.BeNil(), "createVectorSockets should succeed after directory creation")
isn't this in a race condition with the go func lambda above?
ok, I'll make them communicate through a channel
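For illustration, a channel-based handoff of this kind might look like the sketch below. `startAsync` and `waitFor` are hypothetical helper names, not functions from the PR, and the generic signatures assume Go 1.18 or later.

```go
package main

import (
	"errors"
	"time"
)

// startAsync (hypothetical helper) runs fn in a goroutine and returns a
// channel that delivers its result exactly once.
func startAsync[T any](fn func() T) <-chan T {
	done := make(chan T, 1) // buffered so the goroutine can finish even if nobody waits
	go func() { done <- fn() }()
	return done
}

// waitFor (hypothetical helper) blocks until a value arrives on done or the
// timeout expires. The channel receive synchronises with the goroutine's
// send, so the returned value is read without a data race.
func waitFor[T any](done <-chan T, timeout time.Duration) (T, error) {
	select {
	case v := <-done:
		return v, nil
	case <-time.After(timeout):
		var zero T
		return zero, errors.New("timed out waiting for result")
	}
}
```

A test could call `startAsync` with a closure around `createVectorSockets`, create the missing directory, and then call `waitFor` with a generous timeout instead of sleeping for `2 * backoffPeriod`. The timeout then only bounds the failure case; a passing run returns as soon as the listener is ready.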
func createVectorSockets(sockPath string, backoffTime time.Duration) *net.UnixListener {
	for {
		// Create unix socket
		os.Remove(sockPath) // Remove any existing socket
Nitpick, since this was already there in the old code:
It would be nice to check the error from os.Remove and, if it is not ENOENT, log a warning or error.
Thank you for adding a test!
	}
	unixListener := createVectorSockets(sockPath, 10*time.Second)
	defer unixListener.Close()
	defer os.Remove(sockPath)
Oops, I've just noticed it. The order of the defers is strange. First, it will delete the socket, then it will attempt to close the connection (deferred tasks are handled in LIFO order).
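The LIFO behaviour can be seen in a small sketch. `cleanupOrder` is a hypothetical function that only records labels; it stands in for the real `unixListener.Close()` and `os.Remove(sockPath)` calls. To have the listener closed before the socket file is removed, the removal has to be deferred first.

```go
package main

// cleanupOrder (hypothetical) records the order in which deferred calls run.
// The named result lets the deferred functions append after the return
// statement has executed.
func cleanupOrder() (order []string) {
	// Registered first, so it runs last.
	defer func() { order = append(order, "remove socket file") }()
	// Registered last, so it runs first.
	defer func() { order = append(order, "close listener") }()
	return order
}
```

Calling `cleanupOrder()` yields "close listener" followed by "remove socket file", which is the reverse of the order in which the two defers were written.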
Instead of fataling out when we fail to create the socket listener, we retry forever with a backoff. This is important because if newlogd exits, the watchdog will reboot the whole system, which is not what we want for a transient failure like a missing directory.

Signed-off-by: Paul Gaiduk <[email protected]>
Force-pushed from aecd811 to 64c17a9.
		if err := os.Remove(sockPath); errors.Is(err, os.ErrNotExist) {
			// Socket doesn't exist, this is expected
		} else if err != nil {
Suggested change:
- if err := os.Remove(sockPath); errors.Is(err, os.ErrNotExist) {
- // Socket doesn't exist, this is expected
- } else if err != nil {
+ if err := os.Remove(sockPath); err != nil && !errors.Is(err, os.ErrNotExist) {
yeah, I thought for a minute about this and then I thought my version is more verbose and explicit
okay, I am also fine with this if you prefer it.
Description
This addresses a promise from #5008 (comment).
Instead of fataling out when we fail to create the socket listener, we retry forever with a backoff. This is important because if newlogd exits, the watchdog will reboot the whole system, which is not what we want for a transient failure like a missing directory.
PR dependencies
None
How to test and validate this PR
No validation needed, the change is covered by unit and Eden tests.
Changelog notes
N/A, since this is just homework from #5008.
PR Backports
No need to backport since this code was added in EVE 15.
Checklist