Conversation

@sairameshv
Member

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR implements the following enhancement related to the Evented PLEG

Which issue(s) this PR fixes:

Fixes kubernetes/enhancements#3386

Special notes for your reviewer:

Does this PR introduce a user-facing change?

This PR contains a new config option to enable "evented-pleg" while starting/running the crio binary. The same has been updated wherever required.

Added a new boolean configuration flag "--evented-pleg" (default: "false") to enable the evented PLEG mechanism in CRI-O. Setting the environment variable "EVENTED_PLEG" to "true" also enables it.
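
As an illustration of the flag-plus-environment-variable behaviour described in the release note, here is a minimal sketch using only the Go standard library. The helper name `eventedPLEGEnabled` is hypothetical, and the precedence (an explicit flag overrides the environment variable) is an assumption; the PR's actual option parsing is not shown in this thread.

```go
package main

import (
	"flag"
	"strconv"
)

// eventedPLEGEnabled resolves the setting from command-line arguments and
// the environment. Hypothetical helper, not CRI-O code.
// Assumed precedence: explicit flag, then EVENTED_PLEG, then the default (false).
func eventedPLEGEnabled(args []string, getenv func(string) string) (bool, error) {
	// The environment variable supplies the flag's default value,
	// so a flag given on the command line can still override it.
	def := false
	if v := getenv("EVENTED_PLEG"); v != "" {
		parsed, err := strconv.ParseBool(v)
		if err != nil {
			return false, err
		}
		def = parsed
	}

	fs := flag.NewFlagSet("crio", flag.ContinueOnError)
	enabled := fs.Bool("evented-pleg", def, "enable the evented PLEG mechanism")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *enabled, nil
}
```

A caller would typically invoke it as `eventedPLEGEnabled(os.Args[1:], os.Getenv)`; passing the lookup function in keeps the helper easy to test.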

@sairameshv sairameshv requested a review from mrunalp as a code owner November 29, 2022 08:57
@openshift-ci openshift-ci bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Nov 29, 2022
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 29, 2022
@openshift-ci
Contributor

openshift-ci bot commented Nov 29, 2022

Hi @sairameshv. Thanks for your PR.

I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 29, 2022
@openshift-ci openshift-ci bot requested review from klihub and wgahnagl November 29, 2022 08:58
@sairameshv sairameshv changed the title Events pod status Support for Evented PLEG Nov 29, 2022
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 29, 2022
@sairameshv
Member Author

Comment on lines 8 to 15

if s.server.Config().EventedPLEG {
    for containerEvent := range s.server.ContainerEventsChan {
        if err := ces.Send(&containerEvent); err != nil {
            return err
        }
    }
}
Contributor

@klihub klihub Nov 29, 2022

Is this really how this should work? Maybe I misread the code, but it looks to me like you are:

  • using s.server.ContainerEventsChan to buffer at most 1000 events at a time
  • sending all pending/'buffered' events from the channel to the stream when you get a GetContainerEvents() call
  • never sending any subsequent events to the stream (even if the client sticks around)

I might be totally wrong here, but my impression of the intention of this new CRI API was that when a client makes a GetContainerEvents() call, it should keep receiving container events over the stream until the client itself goes away (or until the server decides to stop sending events and close the stream). And that there might be several concurrent clients listening for container events... now you only really support one.

Contributor

Again IIUC, the behavior of this implementation is that when a client calls GetContainerEvents(), it gets all the events queued in ContainerEventsChan and then receives nothing afterwards. Also, if more than 1000 events have been generated by that time, the client gets the oldest 1000 and anything newer is already lost.

So shouldn't this instead work somehow so that:

  • in GetContainerEvents() you simply store the new client stream along with any other existing ones
  • you set up an 'event pump' goroutine which sits in a loop reading events from ContainerEventsChan and multiplexes them to all client streams
  • in the event pump you close/remove any client stream on which sending an event fails (client gone, or too slow to read)
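
The three steps above can be sketched roughly as follows. This is a simplified illustration, not the code that was merged: `Event`, `Broadcaster`, `Subscribe`, `Unsubscribe` and `Pump` are hypothetical names, and each client stream is modelled as a buffered Go channel rather than a gRPC stream.

```go
package main

import "sync"

// Event stands in for the CRI container event message (hypothetical type).
type Event struct {
	ContainerID string
	Type        string
}

// Broadcaster fans events out to every subscribed client.
type Broadcaster struct {
	mu      sync.Mutex
	clients map[chan Event]struct{}
}

func NewBroadcaster() *Broadcaster {
	return &Broadcaster{clients: make(map[chan Event]struct{})}
}

// Subscribe registers a new client and returns its event channel.
func (b *Broadcaster) Subscribe(buffer int) chan Event {
	ch := make(chan Event, buffer)
	b.mu.Lock()
	b.clients[ch] = struct{}{}
	b.mu.Unlock()
	return ch
}

// Unsubscribe removes a client and closes its channel. It is a no-op if the
// pump has already dropped the client.
func (b *Broadcaster) Unsubscribe(ch chan Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, ok := b.clients[ch]; ok {
		delete(b.clients, ch)
		close(ch)
	}
}

// Pump reads events from source until it is closed and forwards each one to
// all clients. A client whose buffer is full is treated as too slow and dropped.
func (b *Broadcaster) Pump(source <-chan Event) {
	for ev := range source {
		b.mu.Lock()
		for ch := range b.clients {
			select {
			case ch <- ev:
			default:
				delete(b.clients, ch)
				close(ch)
			}
		}
		b.mu.Unlock()
	}
	// The source is closed: release the remaining clients.
	b.mu.Lock()
	for ch := range b.clients {
		delete(b.clients, ch)
		close(ch)
	}
	b.mu.Unlock()
}
```

In a real handler, each GetContainerEvents() call would subscribe, forward events from its channel to the client stream until a send fails or the channel is closed, and then unsubscribe. `Pump` would run in its own goroutine for the lifetime of the server.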

Member

+1 to a multiplexed approach

Member Author

@klihub, @haircommander Thanks for the suggestions, let me check this out and get back.

@haircommander
Member

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 29, 2022
if s.config.EventedPLEG {
    err := s.generateCRIEvent(ctx, c, types.ContainerEventType_CONTAINER_DELETED_EVENT)
    if err != nil {
        log.Errorf(ctx, "Unable to generate event %s for container %s due to err %s", types.ContainerEventType_CONTAINER_DELETED_EVENT, c.ID(), err)
Member

TODO item: should we emit a metric for events that failed to generate?

Member

Yes, that's a great idea! Not per container but just having a counter collecting failures.
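
A single process-wide failure counter of the kind suggested here could look roughly like the sketch below. It uses only `sync/atomic` rather than a metrics library, and `FailureCounter`, `ErrEventNotGenerated` and `generateWithMetric` are hypothetical names introduced for illustration; how CRI-O would actually expose such a metric is not covered in this thread.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// ErrEventNotGenerated is a placeholder error for this sketch.
var ErrEventNotGenerated = errors.New("event not generated")

// FailureCounter counts events that could not be generated, across all
// containers. It is safe for concurrent use.
type FailureCounter struct {
	n uint64
}

// Inc records one failed event.
func (c *FailureCounter) Inc() { atomic.AddUint64(&c.n, 1) }

// Value returns the number of failures recorded so far.
func (c *FailureCounter) Value() uint64 { return atomic.LoadUint64(&c.n) }

// generateWithMetric runs generate and bumps the counter when it fails.
// The error is still returned so the caller can log it.
func generateWithMetric(c *FailureCounter, generate func() error) error {
	err := generate()
	if err != nil {
		c.Inc()
	}
	return err
}
```

Wrapping the call site this way keeps the counter independent of any particular container, which matches the "not per container" suggestion.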

pb "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func (s *service) GetContainerEvents(req *pb.GetEventsRequest, ces pb.RuntimeService_GetContainerEventsServer) error {
Member

a nit about code structure: we have a convention that functions in this folder call directly into a function in the server (see the other RPCs for examples)

Member Author

Addressed

@codecov

codecov bot commented Nov 29, 2022

Codecov Report

Merging #6404 (1b8b284) into main (cade249) will decrease coverage by 0.22%.
The diff coverage is 11.92%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6404      +/-   ##
==========================================
- Coverage   42.74%   42.51%   -0.23%     
==========================================
  Files         126      126              
  Lines       14834    14942     +108     
==========================================
+ Hits         6341     6353      +12     
- Misses       7806     7899      +93     
- Partials      687      690       +3     

docs/crio.8.md Outdated

**--enable-tracing**: Enable OpenTelemetry trace data exporting.

**--evented-pleg**: Enable CRI-O to generate the container pod-level events in order to optimize the performance of the Pod Lifecycle Event Generator (PLEG) module in Kubelet.
Contributor

What do you think of --emitted-events? @haircommander, thoughts on the name?

Seems weird to tie this directly with the pleg.

Member

I get what you mean, but I'm not as worried about it, as kubelet is the only client we care about. I think it would be most clear as --emit-pod-and-container-events but that's too verbose. I think pleg describes what they're for

Member

I think we should follow the --enable-… pattern for features to be used. Let me throw in a third opinion, how about: --enable-pod-events?

Contributor

I like --enable-pod-events.

Member

+1

@haircommander
Member

One thing worth noting: as it stands, all clients share the same event source, so if we add a crictl events call, kubelet will miss the events that call consumes. I don't think it's vital to address now, but it is something we may want to fix in the future.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 21, 2022
server/server.go Outdated
PodSandboxStatus: sandboxStatuses, ContainersStatuses: containerStatuses}:
log.Debugf(ctx, "Container event %s generated for %s", eventType, container.ID())
default:
log.Errorf(ctx, "generateCRIEvent: failed to generate event %s for container %s", eventType, container.ID())
Member

nit: s/generateCRIEvent/GenerateCRIEvent to avoid the log-capitalization GitHub Action test failure.
Likewise for the other places.
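
For context, the `select` with a `default` branch in the snippet under review is Go's non-blocking send: if the buffered channel is full, the `default` branch runs and the event is dropped instead of blocking the caller. A minimal sketch of that pattern, with a hypothetical `tryEmit` helper and plain strings standing in for the event type:

```go
package main

// tryEmit attempts a non-blocking send on a buffered channel. It reports
// false when the buffer is full, in which case the event is dropped.
// Hypothetical helper for illustration only.
func tryEmit(events chan<- string, ev string) bool {
	select {
	case events <- ev:
		return true
	default:
		return false
	}
}
```

With a channel created as `make(chan string, 2)` and no reader, the first two calls succeed and the third returns false, which is the case where the reviewed code logs an error.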

@haircommander
Member

@sairameshv there's a bunch of lint, shfmt, vendor and docs issues to address before this can merge (cc @harche )

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 22, 2022
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 22, 2022
@haircommander
Member

/retest

sairameshv and others added 2 commits December 22, 2022 13:46
1. Emit CRI events during the sandbox and container lifecycle
2. Enable a config option `enable-pod-events` to switch on/off the Evented PLEG
3. Add relevant test cases to validate the functionality of the evented pleg

Signed-off-by: Sai Ramesh Vanka <[email protected]>
Co-authored-by: Harshal Patil <[email protected]>
@haircommander
Member

/retest
/override ci/prow/ci-rhel-integration

@openshift-ci
Contributor

openshift-ci bot commented Dec 22, 2022

@haircommander: Overrode contexts on behalf of haircommander: ci/prow/ci-rhel-integration

Details

In response to this:

/retest
/override ci/prow/ci-rhel-integration


@haircommander
Member

/test ci-cgroupv2-e2e-crun

@haircommander
Member

/test kata-containers

@openshift-ci-robot

@sairameshv: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/kata-jenkins | 1b8b284 | link | true | /test kata-containers |
Details


@haircommander
Member

/override ci/prow/ci-cgroupv2-e2e-crun
/override ci/prow/ci-e2e-conmonrs
/override ci/prow/ci-rhel-integration
/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 22, 2022
@openshift-ci
Contributor

openshift-ci bot commented Dec 22, 2022

@haircommander: Overrode contexts on behalf of haircommander: ci/prow/ci-cgroupv2-e2e-crun, ci/prow/ci-e2e-conmonrs, ci/prow/ci-rhel-integration

Details

In response to this:

/override ci/prow/ci-cgroupv2-e2e-crun
/override ci/prow/ci-e2e-conmonrs
/override ci/prow/ci-rhel-integration
/lgtm
/approve


@openshift-ci
Contributor

openshift-ci bot commented Dec 22, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, sairameshv

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 22, 2022
@haircommander
Member

/override ci/kata-jenkins

@openshift-ci
Contributor

openshift-ci bot commented Dec 22, 2022

@haircommander: Overrode contexts on behalf of haircommander: ci/kata-jenkins

Details

In response to this:

/override ci/kata-jenkins


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes.

10 participants