Support for Evented PLEG #6404
Hi @sairameshv. Thanks for your PR. I'm waiting for a cri-o member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 9e24302 to 07ccf37.
```go
if s.server.Config().EventedPLEG {
	for containerEvent := range s.server.ContainerEventsChan {
		if err := ces.Send(&containerEvent); err != nil {
			return err
		}
	}
}
```
Is this really how this should work? Maybe I misread the code, but to me it looks like you are:
- using `s.server.ContainerEventsChan` to buffer at most 1000 events at a time
- sending all pending/'buffered' events from the channel to the stream when you get a `GetContainerEvents()` call
- never sending any subsequent events to the stream (even if the client sticks around)

I might be totally wrong here, but my impression of the intention of this new CRI API was that when a client makes a `GetContainerEvents()` call, it should keep receiving container events over the stream until the client itself goes away (or until the server decides to stop sending events and close the stream), and that there might be several concurrent clients listening for container events. Right now you only really support one.
Again IIUC, the exhibited behavior of this implementation will be that when a client calls `GetContainerEvents()`, it will get all the queued events from `ContainerEventsChan` and then receive nothing ever after. Also, if more than 1000 events have been emitted/generated by that time, the client will get the oldest 1000, with anything newer already lost.
So shouldn't this instead work somehow like this:
- in `GetContainerEvents()` you simply store the new client stream along with any other existing ones
- you set up an 'event pump' goroutine which sits in a loop reading events from `ContainerEventsChan` and multiplexes them to all client streams
- in the event pump you close/remove any client stream on which sending an event fails (client gone, or too slow to read)
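A minimal sketch of the multiplexed "event pump" suggested above, assuming simplified stand-in types: `ContainerEvent`, `EventSender`, `SenderFunc`, and `EventBroker` are hypothetical names for illustration, not CRI-O or cri-api types. Each `GetContainerEvents()` call would register its own buffered queue; a single pump goroutine copies every event from the shared source channel into each queue, and a subscriber is removed when its queue is full (client too slow) or its `Send` fails (client gone).

```go
package main

import "sync"

// ContainerEvent is a hypothetical stand-in for the CRI event message
// (the real type lives in k8s.io/cri-api).
type ContainerEvent struct {
	ContainerID string
	EventType   string
}

// EventSender mirrors the Send method of a server-side gRPC stream
// (assumed shape, for illustration only).
type EventSender interface {
	Send(*ContainerEvent) error
}

// SenderFunc adapts a plain function to EventSender.
type SenderFunc func(*ContainerEvent) error

func (f SenderFunc) Send(e *ContainerEvent) error { return f(e) }

// EventBroker fans events from one source channel out to many subscribers.
type EventBroker struct {
	mu   sync.Mutex
	subs map[chan *ContainerEvent]struct{}
}

func NewEventBroker() *EventBroker {
	return &EventBroker{subs: make(map[chan *ContainerEvent]struct{})}
}

// Subscribe registers a new client queue with room for bufSize events.
func (b *EventBroker) Subscribe(bufSize int) chan *ContainerEvent {
	ch := make(chan *ContainerEvent, bufSize)
	b.mu.Lock()
	b.subs[ch] = struct{}{}
	b.mu.Unlock()
	return ch
}

// unsubscribe removes and closes ch exactly once, whoever gets there first.
func (b *EventBroker) unsubscribe(ch chan *ContainerEvent) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, ok := b.subs[ch]; ok {
		delete(b.subs, ch)
		close(ch)
	}
}

// Pump is the single "event pump" goroutine: it copies every event from src
// to every subscriber queue. A full queue means the client is too slow, so
// that subscriber is dropped instead of blocking everyone else.
func (b *EventBroker) Pump(src <-chan *ContainerEvent) {
	for ev := range src {
		b.mu.Lock()
		for ch := range b.subs {
			select {
			case ch <- ev:
			default:
				delete(b.subs, ch) // deleting during range is safe in Go
				close(ch)
			}
		}
		b.mu.Unlock()
	}
	// Source closed: end all remaining subscriptions.
	b.mu.Lock()
	for ch := range b.subs {
		delete(b.subs, ch)
		close(ch)
	}
	b.mu.Unlock()
}

// Serve forwards queued events to one client stream until the queue is
// closed by the broker or Send fails (client gone).
func (b *EventBroker) Serve(stream EventSender, ch chan *ContainerEvent) error {
	for ev := range ch {
		if err := stream.Send(ev); err != nil {
			b.unsubscribe(ch)
			return err
		}
	}
	return nil
}
```

In an RPC handler this might look like `ch := broker.Subscribe(64); return broker.Serve(stream, ch)`. Giving each client its own buffered queue, rather than calling `Send` from the pump directly, is a design choice: a blocked `Send` on one stream then cannot stall delivery to the others.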
+1 to a multiplexed approach
@klihub, @haircommander Thanks for the suggestions, let me check this out and get back.
/ok-to-test
server/container_stop.go
```go
if s.config.EventedPLEG {
	err := s.generateCRIEvent(ctx, c, types.ContainerEventType_CONTAINER_DELETED_EVENT)
	if err != nil {
		log.Errorf(ctx, "Unable to generate event %s for container %s due to err %s", types.ContainerEventType_CONTAINER_DELETED_EVENT, c.ID(), err)
```
TODO item: should we emit a metric for events that failed to be generated?
Yes, that's a great idea! Not per container but just having a counter collecting failures.
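A minimal sketch of the failure counter, assuming a hypothetical `emitEvent` helper that queues events with a non-blocking send (the same `select`/`default` shape as the `server/server.go` excerpt in this PR) and counts every dropped event in a single counter rather than per container. CRI-O would more plausibly expose this as a Prometheus counter; a stdlib `sync/atomic` counter stands in here so the example stays self-contained.

```go
package main

import (
	"log"
	"sync/atomic"
)

// ContainerEvent is a hypothetical stand-in for the CRI event message.
type ContainerEvent struct {
	ContainerID string
	EventType   string
}

// eventsDropped counts events that could not be queued, across all
// containers. atomic.Uint64 requires Go 1.19 or newer.
var eventsDropped atomic.Uint64

// emitEvent tries to queue ev without blocking the caller (for example the
// container stop path). If the channel is full, the event is dropped, the
// failure counter is incremented, and false is returned.
func emitEvent(events chan<- *ContainerEvent, ev *ContainerEvent) bool {
	select {
	case events <- ev:
		return true
	default:
		eventsDropped.Add(1)
		log.Printf("failed to generate event %s for container %s", ev.EventType, ev.ContainerID)
		return false
	}
}
```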
```go
	pb "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func (s *service) GetContainerEvents(req *pb.GetEventsRequest, ces pb.RuntimeService_GetContainerEventsServer) error {
```
A nit about code structure: we have a convention that the functions in this folder call directly into a function in the server (see the other RPCs for examples).
Addressed
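To illustrate the convention the reviewer describes, a minimal sketch with hypothetical, simplified types: `service`, `Server`, `EventSender`, and `SenderFunc` here are illustrative stand-ins, not CRI-O's actual signatures (the real method takes a `pb.RuntimeService_GetContainerEventsServer`). The RPC method on `service` holds no logic and simply delegates to a method on the server.

```go
package main

// Hypothetical, simplified stand-ins for the cri-api types.
type ContainerEvent struct{ ContainerID string }

type EventSender interface {
	Send(*ContainerEvent) error
}

// SenderFunc adapts a plain function to EventSender.
type SenderFunc func(*ContainerEvent) error

func (f SenderFunc) Send(e *ContainerEvent) error { return f(e) }

// Server owns the real state and logic (placeholder logic here).
type Server struct {
	pending []*ContainerEvent
}

// GetContainerEvents on the server does the actual work.
func (s *Server) GetContainerEvents(stream EventSender) error {
	for _, ev := range s.pending {
		if err := stream.Send(ev); err != nil {
			return err
		}
	}
	return nil
}

// service is the thin gRPC-facing layer: its RPC method does nothing
// except call straight into the server.
type service struct {
	server *Server
}

func (s *service) GetContainerEvents(stream EventSender) error {
	return s.server.GetContainerEvents(stream)
}
```

Keeping the RPC layer this thin means the server-side logic can be exercised directly without going through gRPC.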
Codecov Report
```diff
@@            Coverage Diff             @@
##             main    #6404      +/-   ##
==========================================
- Coverage   42.74%   42.51%   -0.23%
==========================================
  Files         126      126
  Lines       14834    14942     +108
==========================================
+ Hits         6341     6353      +12
- Misses       7806     7899      +93
- Partials      687      690       +3
```
docs/crio.8.md
```markdown
**--enable-tracing**: Enable OpenTelemetry trace data exporting.

**--evented-pleg**: Enable CRI-O to generate the container pod-level events in order to optimize the performance of the Pod Lifecycle Event Generator (PLEG) module in Kubelet.
```
What do you think of `--emitted-events`? @haircommander, thoughts on the name?
It seems weird to tie this directly to the PLEG.
I get what you mean, but I'm not as worried about it, as the kubelet is the only client we care about. I think it would be clearest as `--emit-pod-and-container-events`, but that's too verbose. I think "PLEG" describes what they're for.
I think we should follow the `--enable-…` pattern for opt-in features. Let me throw in a third opinion: how about `--enable-pod-events`?
I like --enable-pod-events.
+1
Something worth noting: as it is now, all clients share the same events source, so if we add a
Force-pushed from 2e19aa8 to 3434af9.
server/server.go
```go
	PodSandboxStatus: sandboxStatuses, ContainersStatuses: containerStatuses}:
	log.Debugf(ctx, "Container event %s generated for %s", eventType, container.ID())
default:
	log.Errorf(ctx, "generateCRIEvent: failed to generate event %s for container %s", eventType, container.ID())
```
nit: s/generateCRIEvent/GenerateCRIEvent to avoid the log-capitalization GitHub Actions test failure.
Likewise for the other places.
@sairameshv, there are a bunch of lint, shfmt, vendor, and docs issues to address before this can merge (cc @harche).
Force-pushed from 3434af9 to 7e85fec.
Force-pushed from 7e85fec to 8618319.
/retest
Signed-off-by: Sai Ramesh Vanka <[email protected]>
1. Emit CRI events during the sandbox and container lifecycle
2. Enable a config option `enable-pod-events` to switch the Evented PLEG on/off
3. Add relevant test cases to validate the functionality of the Evented PLEG

Signed-off-by: Sai Ramesh Vanka <[email protected]>
Co-authored-by: Harshal Patil <[email protected]>
Force-pushed from 8618319 to 1b8b284.
/retest
@haircommander: Overrode contexts on behalf of haircommander: ci/prow/ci-rhel-integration
/test ci-cgroupv2-e2e-crun
/test kata-containers
@sairameshv: The following test failed, say
/override ci/prow/ci-cgroupv2-e2e-crun
@haircommander: Overrode contexts on behalf of haircommander: ci/prow/ci-cgroupv2-e2e-crun, ci/prow/ci-e2e-conmonrs, ci/prow/ci-rhel-integration
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: haircommander, sairameshv
/override ci/kata-jenkins
@haircommander: Overrode contexts on behalf of haircommander: ci/kata-jenkins
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR implements the Evented PLEG enhancement tracked in kubernetes/enhancements#3386.
Which issue(s) this PR fixes:
Fixes kubernetes/enhancements#3386
Special notes for your reviewer:
Does this PR introduce a user-facing change?
This PR adds a new config option, `enable-pod-events`, to enable the Evented PLEG when starting/running the crio binary. It has been updated wherever required.