[1.20] utils/RunUnderSystemdScope: fix wrt channel deadlock #6126
+12
−6
As seen in [1], the coreos/go-systemd/dbus package sometimes deadlocks:
jobComplete is stuck trying to send the job result string to the channel
while holding the jobListener lock, while startJob (called by
StartTransientUnit) waits for the same lock.
Alas, it is not clear why the channel is not being read, nor was I able
to reproduce it locally.
Make the job result channel buffered, so jobListener won't block on
channel send and thus StartTransientUnit won't be stuck either.
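
To illustrate the reasoning, here is a minimal, self-contained sketch of the
blocking pattern. It is not the go-systemd code: the `listener` type, its
`complete` method and the `completesWithoutReader` helper are hypothetical
stand-ins that only mimic "send the result while holding the lock". It shows
that with an unbuffered channel and no reader the send never returns (so the
lock is never released), while a buffer of size 1 lets the sender finish.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// listener is a hypothetical stand-in for the dbus job listener:
// a lock guarding a map from job name to its result channel.
type listener struct {
	mu   sync.Mutex
	jobs map[string]chan<- string
}

// complete delivers the job result while holding the lock.
func (l *listener) complete(job, result string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if ch, ok := l.jobs[job]; ok {
		ch <- result // blocks forever if ch is unbuffered and nobody reads
		delete(l.jobs, job)
	}
}

// completesWithoutReader reports whether complete returns (and so releases
// the lock) within a short timeout when nothing reads from the channel.
func completesWithoutReader(bufSize int) bool {
	ch := make(chan string, bufSize)
	l := &listener{jobs: map[string]chan<- string{"job1": ch}}

	done := make(chan struct{})
	go func() {
		l.complete("job1", "done")
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(100 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("unbuffered:", completesWithoutReader(0)) // false: sender is stuck holding the lock
	fmt.Println("buffered:", completesWithoutReader(1))   // true: the send completes immediately
}
```

A buffer of one is enough here because each job produces a single result;
the sketch does not explain why the reader is missing, which remains unknown.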
While at it,
- move the error wrapping out of mgr.RetryOnDisconnect function,
  and use fmt.Errorf with %w instead of obsoleted errors.Wrap;
- improve error messages, printing the systemd unit name (so we can
  check it in systemd log);
- do check the job result string -- in case it is not "done",
  return an error back to the caller, which should help avoid other
  issues down the line.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2082344
Cherry picked from commit 343bcdd / #5914
What type of PR is this?
/kind bug
What this PR does / why we need it:
See above.
Which issue(s) this PR fixes:
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2095389
Special notes for your reviewer:
None
Does this PR introduce a user-facing change?