reduce flakiness of UpdateContainerStatus #3867
Branch updated from f0e2276 to 135cfc1.
saschagrunert left a comment:
Just one nit, otherwise LGTM
internal/oci/container.go
```go
pid := c.state.Pid
process, err := os.FindProcess(pid)
```
Suggested change:
```diff
-pid := c.state.Pid
-process, err := os.FindProcess(pid)
+process, err := os.FindProcess(c.state.Pid)
```
Because the function does not hold the lock, I am worried about the PID changing underneath it. Saving the value first means the function works from a single read of the PID.
Though I see now that we don't use this variable anywhere else. Taking the change as suggested.
fixed
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: haircommander, saschagrunert.
We have seen cases where $runtime state calls fail spuriously but succeed later. This is not great, but we shouldn't incorrectly label pods when it happens. We now retry state calls up to three times if we determine the container is still running (by calling kill on its PID).
Signed-off-by: Peter Hunt <[email protected]>
Having exec sync update the state each time is a bit excessive. In addition to running an extra exec, it creates the potential for runc state to flake, which can cause the container to go down. Instead, we should just check whether the PID is running, and proceed if so.
Signed-off-by: Peter Hunt <[email protected]>
Branch updated from 135cfc1 to eebb530.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #3867      +/-   ##
==========================================
- Coverage   40.54%   40.51%   -0.03%
==========================================
  Files         109      109
  Lines        8798     8819      +21
==========================================
+ Hits         3567     3573       +6
- Misses       4913     4927      +14
- Partials      318      319       +1
```
mrunalp left a comment:
We should also store the process start time in the container state and use it when matching the process, to rule out any PID-reuse scenarios.
I'd like to do that as a follow-up, so that the implementation is used for all accesses of `c.state.Pid`.
/retest

/hold
The approach is changing.

Closing in favor of #3868.
What type of PR is this?
/kind bug
What this PR does / why we need it:
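This PR does two things:
- retries runtime state calls up to three times while the container's PID is still running, so a spurious failure does not mislabel the pod;
- has exec sync check whether the container's PID is running instead of updating the state on every call.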
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?