
Conversation

@haircommander
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

It turns out that starting a new inotify watcher for each exec probe was really inefficient. Replace it with a single watcher shared by all execs, which makes the RSS added per exec much smaller (roughly the overhead of exec.Cmd, which is unavoidable).
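
For readers skimming the thread, here is a minimal sketch of the single-watcher idea, not the code from this PR. It assumes a hypothetical `execWatcher` type and models filesystem events as plain path strings on a channel so that it stays stdlib-only; a real implementation would feed it from an inotify-backed library. One loop serves every exec, and each exec only registers the file name it is waiting for.

```go
package main

import (
	"path/filepath"
	"sync"
)

// execWatcher is a hypothetical stand-in for one shared directory watcher.
// It maps a file base name to the channel of the exec waiting on that file.
type execWatcher struct {
	mu      sync.Mutex
	waiters map[string]chan struct{}
}

func newExecWatcher() *execWatcher {
	return &execWatcher{waiters: make(map[string]chan struct{})}
}

// register returns a channel that is closed once a file called name shows up.
func (w *execWatcher) register(name string) <-chan struct{} {
	w.mu.Lock()
	defer w.mu.Unlock()
	ch := make(chan struct{})
	w.waiters[name] = ch
	return ch
}

// notify wakes the exec waiting on name, if any, and forgets the entry.
func (w *execWatcher) notify(name string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if ch, ok := w.waiters[name]; ok {
		close(ch)
		delete(w.waiters, name)
	}
}

// run is the single loop shared by all execs. It returns when the events
// channel is closed or when done fires.
func (w *execWatcher) run(events <-chan string, done <-chan struct{}) {
	for {
		select {
		case path, ok := <-events:
			if !ok {
				return
			}
			w.notify(filepath.Base(path))
		case <-done:
			return
		}
	}
}
```

With this shape, each exec costs one map entry and one channel rather than a watcher of its own.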

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix a performance regression with exec probes.

@openshift-ci bot added the release-note, kind/bug, and dco-signoff: yes labels Jul 29, 2021
@openshift-ci bot requested review from fidencio and sameo July 29, 2021 18:11
@openshift-ci bot added the approved label Jul 29, 2021
@haircommander force-pushed the exec-rss branch 4 times, most recently from 22c7d13 to 1440701 July 29, 2021 18:31
func New(c *config.Config) *Runtime {
func New(c *config.Config) (*Runtime, error) {
execNotifyDir := filepath.Join(c.ContainerAttachSocketDir, "exec-pid-dir")
if err := os.MkdirAll(execNotifyDir, 0o755); err != nil {

G301: Expect directory permissions to be 0750 or less
(at-me in a reply with help or ignore)

@haircommander force-pushed the exec-rss branch 3 times, most recently from b2def49 to 657d53c July 29, 2021 20:58
@codecov

codecov bot commented Jul 29, 2021

Codecov Report

Merging #5136 (35dc3b6) into master (6ac8cee) will increase coverage by 0.15%.
The diff coverage is 70.49%.

❗ Current head 35dc3b6 differs from pull request most recent head 81ceaf1. Consider uploading reports for the commit 81ceaf1 to get more accurate results

@@            Coverage Diff             @@
##           master    #5136      +/-   ##
==========================================
+ Coverage   43.93%   44.08%   +0.15%     
==========================================
  Files         110      111       +1     
  Lines       11453    11484      +31     
==========================================
+ Hits         5032     5063      +31     
+ Misses       5944     5939       -5     
- Partials      477      482       +5     

@haircommander
Member Author

/test e2e-gcp

@TomSweeneyRedHat
Contributor

Just curious why you're creating notify in CRI-O rather than fixing upstream?
Otherwise, LGTM once the last unhappy test gets happy.

@haircommander
Member Author

> Just curious why you're creating notify in CRI-O rather than fixing upstream?
> Otherwise, LGTM once the last unhappy test gets happy.

I'm not sure which upstream you're talking about. I would really like to get a timeout feature into runc, but there is a lot of nuance to the behavior here ("if an exec takes more than x seconds, SIGKILL it" is not a very generic requirement). So we do the next best thing: our best in CRI-O.

@TomSweeneyRedHat
Contributor

Maybe I misunderstood this part, but it looked like you dropped github.com/rjeczalik/notify and replaced it with your own spin on notify.

return nil, err
}
go func() {
defer watcher.Close()
Member


This could be put after the err check on NewWatcher.

Member Author

@haircommander Aug 3, 2021


I left it here because we want to close it after we receive from <-done, not when the function exits.
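
The distinction can be sketched with a stand-in watcher. This is an illustration only: `fakeWatcher` and `startWatching` are hypothetical names, and the real watcher type is not shown in this excerpt. Because the `defer` sits inside the goroutine, `Close` runs when the goroutine returns (after `done` fires), not when `startWatching` returns.

```go
package main

import "io"

// fakeWatcher is a hypothetical stand-in for a filesystem watcher.
// Its closed channel lets a caller observe when Close has run.
type fakeWatcher struct{ closed chan struct{} }

func newFakeWatcher() *fakeWatcher {
	return &fakeWatcher{closed: make(chan struct{})}
}

func (f *fakeWatcher) Close() error {
	close(f.closed)
	return nil
}

// startWatching returns immediately; the watcher stays open until done fires.
func startWatching(w io.Closer, done <-chan struct{}) {
	go func() {
		// Deferred in the goroutine, so it is tied to the goroutine's
		// lifetime rather than to startWatching's.
		defer w.Close()
		<-done
	}()
}
```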

defer close(eiCh)
defer notify.Stop(eiCh)
for {
select {
Member


Compared with the container exit monitor, we are missing a case for watcher.Errors.
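
A sketch of the loop shape being suggested, assuming a watcher that exposes one channel for events and one for errors. The names `watchLoop` and `errHypothetical` are illustrative and not taken from the PR, and events are modeled as plain strings to keep the example stdlib-only.

```go
package main

import "errors"

// errHypothetical is a placeholder error used only to exercise the loop.
var errHypothetical = errors.New("hypothetical watcher error")

// watchLoop drains events and errors until done fires or a channel closes.
// Without the errs case, errors reported by the watcher would go unread.
func watchLoop(
	events <-chan string,
	errs <-chan error,
	done <-chan struct{},
	onEvent func(string),
	onError func(error),
) {
	for {
		select {
		case ev, ok := <-events:
			if !ok {
				return
			}
			onEvent(ev)
		case err, ok := <-errs:
			if !ok {
				return
			}
			onError(err)
		case <-done:
			return
		}
	}
}
```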

Member

@saschagrunert left a comment


Just a nit, otherwise LGTM

@haircommander
Member Author

> Maybe I misunderstood this part, but it looked like you dropped github.com/rjeczalik/notify and replaced it with your own spin on notify.

Ah, not quite. I introduced a second library in the initial commit because it seemed to work better. I reverted that addition in this PR.

to prevent excessive rss from being used in execs

Signed-off-by: Peter Hunt <[email protected]>
@haircommander
Member Author

/retest

@mrunalp
Member

mrunalp commented Aug 3, 2021

/lgtm

@openshift-ci bot added the lgtm label Aug 3, 2021
@openshift-bot

/retest-required

Please review the full test history for this PR and help us cut down flakes.

Member

@saschagrunert left a comment


/lgtm
/retest

@openshift-ci
Contributor

openshift-ci bot commented Aug 4, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [haircommander,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Aug 4, 2021

@haircommander: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/openshift-jenkins/e2e_crun_cgroupv2 | 81ceaf1 | link | /test e2e_cgroupv2 |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes.

5 participants