@haircommander
Member

What type of PR is this?

/kind design

What this PR does / why we need it:

Conmon is not really needed for exec sync, and it causes unnecessary overhead. Dropping it also removes an unkillable child from the mix,
allowing cri-o to keep track of its children better.

Signed-off-by: Peter Hunt [email protected]

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

ExecSync requests now don't use conmon, instead calling the runtime directly, which reduces overhead.

@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/design Categorizes issue or PR as related to design. labels May 25, 2021
@openshift-ci openshift-ci bot requested review from giuseppe and sameo May 25, 2021 18:23
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 25, 2021
@codecov

codecov bot commented May 25, 2021

Codecov Report

Merging #4943 (53a06fb) into master (0a81d8f) will increase coverage by 0.44%.
The diff coverage is 0.00%.

❗ Current head 53a06fb differs from pull request most recent head eebef46. Consider uploading reports for the commit eebef46 to get more accurate results

@@            Coverage Diff             @@
##           master    #4943      +/-   ##
==========================================
+ Coverage   42.94%   43.38%   +0.44%     
==========================================
  Files         107      107              
  Lines        9933     9825     -108     
==========================================
- Hits         4266     4263       -3     
+ Misses       5217     5112     -105     
  Partials      450      450              

Stdout: stdout,
Stderr: stderr,
ExitCode: exitCode,
}, waitErr
Member

This block doesn't match the existing code block and that may break container restarts.

Member

Please check if the kubelet will tolerate this change.

Member Author

There's something awry with the code. I am working on getting critest passing, and that will likely involve changing this block.

Member Author

I just spent nearly a whole day looking for this:

        // gather exit code from waitErr
        exitCode := int32(0)
        if waitErr != nil {
-               if exitError, ok := err.(*exec.ExitError); ok {
+               if exitError, ok := waitErr.(*exec.ExitError); ok {
                        exitCode = int32(exitError.ExitCode())
                }
        }

😢

Member

Oh, nice catch

@haircommander haircommander force-pushed the conmonless-exec branch 2 times, most recently from 6e02a4b to 98e83f9 Compare May 26, 2021 14:42
@haircommander
Member Author

/retest

@haircommander haircommander force-pushed the conmonless-exec branch 6 times, most recently from ac2d146 to 9a6598b Compare May 26, 2021 19:55
@haircommander haircommander changed the title oci: don't use conmon for exec sync oci: do not use conmon for exec sync May 26, 2021
@haircommander
Member Author

/test integration_fedora

@haircommander
Member Author

haircommander commented May 26, 2021

goodness this is a rabbit hole
We needed to handle the case where the command times out but the container process is still running. Since the container process isn't a child of the runtime process, killing the process behind the exec.Cmd isn't enough; instead we find the process through the pidFile and kill that (which is what conmon does).

I think this passes critest now, hard to tell because I keep getting rate limited. no idea what's up with the openshift jenkins tests either...

edit: openshift jenkins didn't seem to like the ' in the title 🙃

@saschagrunert
Member

/override ci/prow/e2e-agnostic
/override ci/prow/e2e-gcp

@openshift-ci
Contributor

openshift-ci bot commented May 27, 2021

@saschagrunert: Overrode contexts on behalf of saschagrunert: ci/prow/e2e-agnostic, ci/prow/e2e-gcp


Member

@saschagrunert saschagrunert left a comment

Just two nits, otherwise LGTM

Stderr: stderrBuf,
ExitCode: -1,
Err: err,
log.Errorf(ctx, "failed to get pid (%d) or pgid (%d) from file %s: %v", ctrPid, ctrPgid, pidFile, err)
Member

Suggested change
- log.Errorf(ctx, "failed to get pid (%d) or pgid (%d) from file %s: %v", ctrPid, ctrPgid, pidFile, err)
+ log.Errorf(ctx, "Failed to get pid (%d) or pgid (%d) from file %s: %v", ctrPid, ctrPgid, pidFile, err)
  return

Member Author

fixed!

log.Errorf(ctx, "Failed to kill process after timeout: %v", err)
}
}

Member

Suggested change

Member Author

fixed!

@haircommander
Member Author

/retest

Member

@saschagrunert saschagrunert left a comment

/lgtm
/hold
@mrunalp @haircommander feel free to lift the hold when ready.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2021
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 27, 2021
@openshift-ci
Contributor

openshift-ci bot commented May 27, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, saschagrunert

Details Needs approval from an approver in each of these files:
  • OWNERS [haircommander,saschagrunert]


haircommander referenced this pull request in ffromani/cri-o May 27, 2021
For ultra-low-latency workloads, it is helpful to guarantee
that the container entry point, and any other foreign processes
run in the container (kubectl exec...), are always scheduled
on a fixed and predictable CPU in the container cpuset (e.g. not on a
random one).

This patch implements this optional behaviour depending on
container annotations.

Signed-off-by: Francesco Romani <[email protected]>
@haircommander
Member Author

/retest

@openshift-ci
Contributor

openshift-ci bot commented May 27, 2021

@haircommander: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/openshift-jenkins/e2e_crun_cgroupv2 eebef46 link /test e2e_cgroupv2


@mrunalp
Member

mrunalp commented May 30, 2021

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 30, 2021
@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

7 similar comments

@openshift-merge-robot openshift-merge-robot merged commit 8fcce26 into cri-o:master May 30, 2021
@saschagrunert
Member

/cherry-pick release-1.20

@openshift-cherrypick-robot

@saschagrunert: #4943 failed to apply on top of branch "release-1.20":

Applying: oci: do not use conmon for exec sync
Using index info to reconstruct a base tree...
M	internal/oci/runtime_oci.go
Falling back to patching base and 3-way merge...
Auto-merging internal/oci/runtime_oci.go
CONFLICT (content): Merge conflict in internal/oci/runtime_oci.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 oci: do not use conmon for exec sync
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


@saschagrunert
Member

Backported manually in #4953 and #4954

@haircommander
Member Author

/cherry-pick release-1.21

@openshift-cherrypick-robot

@haircommander: new pull request created: #4962



Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/design Categorizes issue or PR as related to design. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes.


6 participants