Merged
18 changes: 12 additions & 6 deletions utils/utils.go
@@ -69,26 +69,32 @@ func RunUnderSystemdScope(mgr *dbusmgr.DbusConnManager, pid int, slice, unitName
     if slice != "" {
         properties = append(properties, systemdDbus.PropSlice(slice))
     }
-    ch := make(chan string)
+    // Make a buffered channel so that the sender (go-systemd's jobComplete)
+    // won't be blocked on channel send while holding the jobListener lock
+    // (RHBZ#2082344).
+    ch := make(chan string, 1)
     if err := mgr.RetryOnDisconnect(func(c *systemdDbus.Conn) error {
         _, err = c.StartTransientUnit(unitName, "replace", properties, ch)
-        return errors.Wrap(err, "start transient unit")
-    }); err != nil {
         return err
+    }); err != nil {
+        return fmt.Errorf("start transient unit %q: %w", unitName, err)
     }

-    // Block until job is started
+    // Wait for the job status.
     select {
-    case <-ch:
+    case s := <-ch:
         close(ch)
+        if s != "done" {
+            return fmt.Errorf("error moving conmon with pid %d to systemd unit %s: got %s", pid, unitName, s)
+        }
     case <-time.After(time.Minute * 6):
Member

I find myself wondering if we still need this case. Ideally, the buffered channel would fix the deadlock long term. From the sounds of it, having the request time out just caused a less obvious deadlock.

Collaborator Author

I believe we do. The buffered channel fixes the deadlock, but if we haven't received a reply from systemd we should still say so and return an error.

         // This case is a work around to catch situations where the dbus library sends the
         // request but it unexpectedly disappears. We set the timeout very high to make sure
         // we wait as long as possible to catch situations where dbus is overwhelmed.
         // We also don't use the native context cancelling behavior of the dbus library,
         // because experience has shown that it does not help.
         // TODO: Find cause of the request being dropped in the dbus library and fix it.
-        return errors.Errorf("timed out moving conmon with pid %d to cgroup", pid)
+        return errors.Errorf("timed out moving conmon with pid %d to systemd unit %s", pid, unitName)
     }

     return nil