@haircommander
Member

We now use a timeout that unblocks acquiring the sandbox mutex
in a container creation vs. sandbox stop race.

Signed-off-by: Sascha Grunert [email protected]
Signed-off-by: Peter Hunt [email protected]

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

none

@openshift-ci openshift-ci bot added labels Jan 10, 2022:
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • dco-signoff: yes: Indicates the PR's author has DCO signed all their commits.
@openshift-ci openshift-ci bot requested review from sboeuf and vrothberg January 10, 2022 15:32
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 10, 2022
@codecov

codecov bot commented Jan 10, 2022

Codecov Report

Merging #5535 (fd2e2ae) into main (9b7f5ae) will decrease coverage by 0.01%.
The diff coverage is 28.57%.

@@            Coverage Diff             @@
##             main    #5535      +/-   ##
==========================================
- Coverage   43.11%   43.10%   -0.02%     
==========================================
  Files         121      121              
  Lines       12148    12150       +2     
==========================================
- Hits         5238     5237       -1     
- Misses       6407     6410       +3     
  Partials      503      503              

@TomSweeneyRedHat
Contributor

Change LGTM, but 6 minutes seems a bit long to wait.
Couple of tests still aren't hip.

Member

@mrunalp mrunalp left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 10, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jan 10, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [haircommander,mrunalp]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Contributor

openshift-ci bot commented Jan 10, 2022

@haircommander: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/openshift-jenkins/integration_crun_cgroupv2 | fd2e2ae | link | false | /test integration_cgroupv2 |
| ci/openshift-jenkins/e2e_crun_cgroupv2 | fd2e2ae | link | false | /test e2e_cgroupv2 |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit a8a5653 into cri-o:main Jan 11, 2022
@haircommander
Member Author

/cherry-pick release-1.23

@openshift-cherrypick-robot

@haircommander: new pull request created: #5540

@haircommander
Member Author

/cherry-pick release-1.22

@openshift-cherrypick-robot

@haircommander: new pull request created: #5545

Comment on lines 74 to +93
     if err := mgr.RetryOnDisconnect(func(c *systemdDbus.Conn) error {
         _, err = c.StartTransientUnitContext(ctx, unitName, "replace", properties, ch)
-        return err
+        return errors.Wrap(err, "start transient unit")
     }); err != nil {
         return err
     }

     // Block until job is started
-    <-ch
-    close(ch)
+    select {
+    case <-ch:
+        close(ch)
+    case <-time.After(time.Minute * 6):
+        // This case is a work around to catch situations where the dbus library sends the
+        // request but it unexpectedly disappears. We set the timeout very high to make sure
+        // we wait as long as possible to catch situations where dbus is overwhelmed.
+        // We also don't use the native context cancelling behavior of the dbus library,
+        // because experience has shown that it does not help.
+        // TODO: Find cause of the request being dropped in the dbus library and fix it.
+        return errors.Errorf("timed out moving conmon with pid %d to cgroup", pid)
+    }
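
For readers unfamiliar with the pattern, the timeout branch above is Go's usual "select on a result channel or a timer" idiom. The following is a minimal standalone sketch, not the cri-o code: `waitForJob` and `errJobTimeout` are hypothetical names, and the channel is buffered here (an assumption of this sketch) so that a late sender would not block after the waiter has given up.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errJobTimeout is a hypothetical sentinel error used only in this sketch.
var errJobTimeout = errors.New("timed out waiting for job result")

// waitForJob blocks until a result arrives on ch or the timeout elapses,
// whichever happens first.
func waitForJob(ch <-chan string, timeout time.Duration) (string, error) {
	timer := time.NewTimer(timeout)
	// Stop the timer on return so it is released promptly when the
	// result wins the race.
	defer timer.Stop()

	select {
	case res := <-ch:
		return res, nil
	case <-timer.C:
		return "", errJobTimeout
	}
}

func main() {
	// A job that completes well within the timeout.
	done := make(chan string, 1)
	go func() {
		time.Sleep(10 * time.Millisecond)
		done <- "done"
	}()
	res, err := waitForJob(done, time.Second)
	fmt.Println(res, err)

	// A job whose result never arrives, standing in for a dropped request.
	silent := make(chan string, 1)
	_, err = waitForJob(silent, 20*time.Millisecond)
	fmt.Println(err)
}
```

The sketch uses `time.NewTimer` with a deferred `Stop` rather than `time.After`; either works for a single wait, and the choice here is only about releasing the timer early.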
Member

Two ideas coming into my mind:

  • What if we reduce the timeout to 1 minute and retry multiple times with an exponential backoff?
  • What if we check for the error "Message recipient disconnected from message bus without replying" and restart dbus in that case?
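
The first idea (a shorter per-attempt timeout, retried with exponential backoff) could look roughly like the sketch below. It is an illustration only: `retryWithBackoff`, `startFunc`, and `errAttemptTimeout` are hypothetical names, the durations are placeholders, and the sketch assumes that re-issuing the request is safe, which this thread does not establish for the transient unit call.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errAttemptTimeout is a hypothetical sentinel error used only in this sketch.
var errAttemptTimeout = errors.New("attempt timed out")

// startFunc is a hypothetical hook that issues one request and returns the
// channel on which its result will be delivered.
type startFunc func() (<-chan string, error)

// retryWithBackoff runs up to `attempts` attempts. Each attempt waits at most
// `timeout` for a result; the pause between attempts starts at `backoff` and
// doubles every time.
func retryWithBackoff(start startFunc, attempts int, timeout, backoff time.Duration) (string, error) {
	if attempts < 1 {
		return "", errors.New("attempts must be at least 1")
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if i > 0 {
			time.Sleep(backoff)
			backoff *= 2
		}
		ch, err := start()
		if err != nil {
			lastErr = err
			continue
		}
		timer := time.NewTimer(timeout)
		select {
		case res := <-ch:
			timer.Stop()
			return res, nil
		case <-timer.C:
			lastErr = errAttemptTimeout
		}
	}
	return "", fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}

func main() {
	calls := 0
	// Simulated request: the first two attempts never answer, the third does.
	res, err := retryWithBackoff(func() (<-chan string, error) {
		calls++
		ch := make(chan string, 1)
		if calls == 3 {
			ch <- "done"
		}
		return ch, nil
	}, 5, 50*time.Millisecond, 10*time.Millisecond)
	fmt.Println(res, err, calls)
}
```

With a 1 minute timeout and a handful of attempts, the worst-case wait would be comparable to the single 6 minute timeout, so the trade-off is mainly about recovering sooner when a retry succeeds.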

Member Author

how would we be able to get the error in this case?

@haircommander
Member Author

/cherry-pick release-1.20

@openshift-cherrypick-robot

@haircommander: #5535 failed to apply on top of branch "release-1.20":

Applying: Use timeout for conmon cgroup move
Using index info to reconstruct a base tree...
M	utils/utils.go
Falling back to patching base and 3-way merge...
Auto-merging utils/utils.go
CONFLICT (content): Merge conflict in utils/utils.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Use timeout for conmon cgroup move
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
