Deflake TestDevicePluginReRegistrationProbeMode: Devices of previous registered should be removed #96048
Rather than a hard wait that makes every test that calls this take five seconds longer, can we PollImmediate on the dial attempt until it succeeds (with a backstop timeout that returns the dial error encountered)?
Thank you for the review. Fixed!
Branch updated from 79c0e7b to 44d4b52.
Branch updated from eac5682 to a6d906e.
@sjenning and I agreed on a 1-second retry interval. Let me know if you want anything different.
/approve
/triage accepted
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: rphillips, sjenning
1s is fine; I left a comment about how to handle the dial failure case.
Branch updated from a6d906e to e8897dc.
Oops...
There is a race between the server coming up and the subsequent dial on its socket. Fix the race with a PollImmediate retry.
Branch updated from e8897dc to 4fdfbc7.
/test pull-kubernetes-e2e-gce-ubuntu-containerd
@liggitt, could you re-review? Thank you!
/lgtm
What type of PR is this?
/kind flake
What this PR does / why we need it:
There is a race between the gRPC server coming up and the subsequent dial on its unix socket. This fix retries the dial within the stub, giving the server time to start so that the dial succeeds.
Which issue(s) this PR fixes:
Fixes #94547
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
/cc @sjenning @liggitt