@haircommander haircommander commented Oct 12, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

If we create the network before we have an infra container, but fail to fully create a sandbox,
we attempt to clean up the network. Calling networkStop() causes CRI-O to place a file in the
sandbox's infra container's directory, which lets us restore the fact that the network had been stopped.

The problem is that we don't have an infra container directory yet, so the call segfaults.

Instead, check whether the sandbox has finished creating before attempting to create the file. If it hasn't, there will be
no sandbox to restore, so we don't need the temp file.

Another option would be to wire it so that the sandbox has access to infraContainer.Dir() without actually having an infra container.
That requires another item in libsandbox.New(), which I find cumbersome. Further, I think the sandbox creation code is itching for a refactor,
which can include that fix if we find it desirable. In the meantime, this workaround is sufficient.

This PR reverts #4244, restoring the change that #4244 had reverted, and also fixes the problem described in #4240 (comment).

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

none

@openshift-ci-robot openshift-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. labels Oct 12, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 12, 2020
// cleaning up a failed sandbox creation.
// We don't need to create the file, as there will be no
// sandbox to restore
if !s.created {

nit: this should probably come before the infra := s.InfraContainer() line.

Member Author

fixed

@kolyshkin
Collaborator

It would be nice to amend the first commit message with an explanation of why this was reverted. It's nice to be able to read git log without referring to PRs.

@codecov

codecov bot commented Oct 12, 2020

Codecov Report

Merging #4258 (fecf1a1) into master (dbda682) will decrease coverage by 0.01%.
The diff coverage is 16.66%.

@@            Coverage Diff             @@
##           master    #4258      +/-   ##
==========================================
- Coverage   38.59%   38.57%   -0.02%     
==========================================
  Files         111      111              
  Lines        8893     8894       +1     
==========================================
- Hits         3432     3431       -1     
- Misses       5077     5079       +2     
  Partials      384      384              

This reverts commit ef07f71.
Commit ef07f71 (henceforth called revert-1) was a revert of 83169c5.

revert-1 was needed because cri-o could segfault due to an assumption about ordering.
If a RunPodSandbox request failed after the sandbox was created, but before the infra container was created,
the cleanup function for the network, networkStop, would attempt to write a file to the infra container's Dir().
Since the infra container doesn't exist yet, that infraContainer.Dir() call would segfault.

This revert (revert-2) is the first in a two-commit series, the second of which will fix that segfault,
thus allowing us to revert revert-1.
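
The failure mode described in this commit message can be illustrated with a small sketch. The types and the `infraDirPanics` helper are hypothetical and only assume that `Dir()` reads a field through its receiver, which is what makes a nil infra container fatal.

```go
package main

import "fmt"

// container is a hypothetical stand-in for the infra container.
type container struct{ dir string }

// Dir reads a field through the receiver, so a nil receiver panics
// with a nil pointer dereference.
func (c *container) Dir() string { return c.dir }

// sandbox is a hypothetical sandbox whose infra container may be nil.
type sandbox struct{ infra *container }

func (s *sandbox) InfraContainer() *container { return s.infra }

// infraDirPanics reports whether asking the sandbox for its infra
// container's directory panics.
func infraDirPanics(s *sandbox) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	_ = s.InfraContainer().Dir()
	return false
}

func main() {
	// Sandbox exists, infra container not created yet.
	fmt.Println(infraDirPanics(&sandbox{})) // prints true
	// Fully created sandbox.
	fmt.Println(infraDirPanics(&sandbox{infra: &container{dir: "/tmp/infra"}})) // prints false
}
```

In a real daemon nothing recovers the panic, so the process crashes instead of returning an error to the caller.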

Signed-off-by: Peter Hunt <[email protected]>
If we create the network before we have an infra container, but fail to fully create a sandbox,
we attempt to clean up the network. Calling networkStop() causes CRI-O to place a file in the
sandbox's infra container's directory, which lets us restore the fact that the network had been stopped.

The problem is that we don't have an infra container directory yet, so the call segfaults.

Instead, check whether the sandbox has finished creating before attempting to create the file. If it hasn't, there will be
no sandbox to restore, so we don't need the temp file.

Another option would be to wire it so that the sandbox has access to infraContainer.Dir() without actually having an infra container.
That requires another item in libsandbox.New(), which I find cumbersome. Further, I think the sandbox creation code is itching for a refactor,
which can include that fix if we find it desirable. In the meantime, this workaround is sufficient.

Signed-off-by: Peter Hunt <[email protected]>
@haircommander haircommander force-pushed the fix-network-start-no-infra branch from 87a3747 to fecf1a1 Compare October 13, 2020 14:51
@fidencio
Contributor

LGTM

@haircommander
Member Author

/retest

Collaborator

@kolyshkin kolyshkin left a comment

LGTM

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, kolyshkin, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [haircommander,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fidencio
Contributor

/lgtm
Thanks, @haircommander!

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 12, 2020
@fidencio
Contributor

/retest

@openshift-merge-robot openshift-merge-robot merged commit 6c5af83 into cri-o:master Nov 19, 2020
@openshift-merge-robot
Contributor

@haircommander: The following test failed, say /retest to rerun all failed tests:

Test name                        Commit   Details  Rerun command
ci/openshift-jenkins/e2e_fedora  fecf1a1  link     /test e2e_fedora

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
