fix(controller): Ignore 404/409 error responses #376
Force-pushed from 1c24b53 to d672fc9.
Codecov Report
@@            Coverage Diff             @@
##           master     #376      +/-   ##
==========================================
+ Coverage   46.52%   49.72%   +3.20%
==========================================
  Files          49       55       +6
  Lines        3196     3306     +110
==========================================
+ Hits         1487     1644     +157
+ Misses       1461     1420      -41
+ Partials      248      242       -6
Hi @mjsmith1028, I will take a look tomorrow. Thanks for the PR.
I checked, and it seems that deletion of the parent object is handled, but in a slightly different manner. If you encountered this issue, could you post logs or a stack trace?
Hi @grzesuav, I still haven't had time to fully dig into this. But I played around a little today, and I think I see what is happening in at least one case.
The reason I started seeing this is that I bumped the go-client-qps/burst up to 150/300 and the number of workers to 50, so we are processing workloads very fast. In this case, the object is deleted so fast that it can occasionally trigger this scenario. I added a
I'll try to get more concrete details on this. But, at least in my local branch, I have added checks for 404 on deletes, 409 on updates, and 403 on creates. I'll update the PR with some more details later this week.
@mjsmith1028 yes, it can be related to the fact that client-go uses |
to be more exact, |
Force-pushed from d672fc9 to 1583cab.
Force-pushed from 7b3f255 to 10abb3a.
Signed-off-by: Mike Smith <[email protected]>
Force-pushed from 10abb3a to a9968b0.
I just have doubts about whether we can safely swallow the error in those two places. Could you comment?
Hi @grzesuav, I added some comments on why I think it should be safe. One thing to note is that my metacontroller setup has 50 workers, 150 client-go qps, and 300 client-go-burst. So it is processing sync requests pretty fast, and the cluster is pretty active with other work outside of my controller, which may put some stress on the K8s control plane and cause occasional slowness in K8s API responses. I think that is why these edge cases showed up. Please let me know if the logic in my comments seems correct. Thanks!
🎉 This PR is included in version 2.0.15 🎉
The release is available on GitHub release
Your semantic-release bot 📦🚀
The goal of this PR is to avoid returning an error under the following conditions, so that we don't requeue and retry K8s API calls that will fail.