Description
I recently started running into an issue where my controller, which watches CRDs, fails with a 410 from the watch on the list function.
After digging into it a bit, I noticed that the resource versions I was getting back at the beginning were not in order. Then, after some time, the watch would fail with a 410 error code.
After poking around a bit, it sounds like out-of-order events are possibly expected?
In the following example, I have three instances of the resource, let's call them "A", "B" and "C". In order, their resource versions are 835745, 797550 and 746039, which is the order the list returns them in. Of interest, I think, is that they appear in alphabetical order despite not being created in that order.
Here are some logs from my application highlighting the issue. I have "solved" the problem by extracting the "new" resource version from the error message, then updating the watcher's internal resource version to that value.
2018-08-23 18:55:18,166 controller Starting watch with resource version:
2018-08-23 18:55:18,179 controller event ADDED for serverlists
2018-08-23 18:55:18,180 controller new version for serverlists: 835745
2018-08-23 18:55:18,181 controller event ADDED for serverlists
2018-08-23 18:55:18,181 controller new version for serverlists: 797550
2018-08-23 18:55:18,227 controller event ADDED for serverlists
2018-08-23 18:55:18,227 controller new version for serverlists: 746039
2018-08-23 18:57:03,055 controller event ERROR for serverlists
2018-08-23 18:57:03,055 controller Updating resource version to 826302 due to 'too old' error: {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 746039 (826302)', 'reason': 'Gone', 'code': 410}.
2018-08-23 18:57:03,055 controller Was: 746039
2018-08-23 18:57:03,070 controller event MODIFIED for serverlists
2018-08-23 18:57:03,070 controller new version for serverlists: 826303
2018-08-23 18:57:03,075 controller event MODIFIED for serverlists
2018-08-23 18:57:03,075 controller new version for serverlists: 826504
2018-08-23 18:57:03,078 controller event MODIFIED for serverlists
2018-08-23 18:57:03,079 controller new version for serverlists: 826888
2018-08-23 18:57:03,085 controller event MODIFIED for serverlists
2018-08-23 18:57:03,085 controller new version for serverlists: 827030
2018-08-23 18:57:03,089 controller event MODIFIED for serverlists
2018-08-23 18:57:03,090 controller new version for serverlists: 827214
2018-08-23 18:57:03,098 controller event MODIFIED for serverlists
2018-08-23 18:57:03,098 controller new version for serverlists: 827577
2018-08-23 18:57:03,102 controller event MODIFIED for serverlists
2018-08-23 18:57:03,102 controller new version for serverlists: 833563
2018-08-23 18:57:03,105 controller event MODIFIED for serverlists
2018-08-23 18:57:03,105 controller new version for serverlists: 833579
2018-08-23 18:57:03,118 controller event MODIFIED for serverlists
2018-08-23 18:57:03,118 controller new version for serverlists: 833745
2018-08-23 18:57:03,120 controller event MODIFIED for serverlists
2018-08-23 18:57:03,120 controller new version for serverlists: 835329
2018-08-23 18:57:03,121 controller event MODIFIED for serverlists
2018-08-23 18:57:03,122 controller new version for serverlists: 835474
2018-08-23 18:57:03,124 controller event MODIFIED for serverlists
2018-08-23 18:57:03,126 controller new version for serverlists: 835587
2018-08-23 18:57:03,131 controller event MODIFIED for serverlists
2018-08-23 18:57:03,132 controller new version for serverlists: 835679
2018-08-23 18:57:03,136 controller event MODIFIED for serverlists
2018-08-23 18:57:03,137 controller new version for serverlists: 835745
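For concreteness, here is roughly what the workaround described above looks like with the Python client's watch. The group/version/namespace/plural values are placeholders for my CRD, and parsing the 410 message with a regex to restart the stream from the newer version is explicitly a hack, not an endorsed API:

```python
import re

from kubernetes import client, config, watch

# Placeholder CRD coordinates -- substitute your own group/version/plural.
GROUP = "example.com"
VERSION = "v1"
NAMESPACE = "default"
PLURAL = "serverlists"

config.load_kube_config()
api = client.CustomObjectsApi()
w = watch.Watch()

resource_version = ""
while True:
    stream = w.stream(
        api.list_namespaced_custom_object,
        GROUP, VERSION, NAMESPACE, PLURAL,
        resource_version=resource_version,
    )
    for event in stream:
        obj = event["object"]
        if event["type"] == "ERROR" and obj.get("code") == 410:
            # Message looks like: "too old resource version: 746039 (826302)".
            # Pull out the newer version in parentheses and restart from there.
            match = re.search(r"\((\d+)\)", obj.get("message", ""))
            if match:
                resource_version = match.group(1)
            break
        # Track the newest version we've seen so the next (re)connect
        # does not fall behind again.
        resource_version = obj["metadata"]["resourceVersion"]
        # ... handle ADDED / MODIFIED / DELETED events here ...
```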
kubernetes-client/python-base#64 recently fixed the problem of the watcher not updating its internal resourceVersion for CRDs. However, should it be taking the max of the old and new resourceVersion to ensure things don't get messed up?
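One wrinkle with "take the max" is that resourceVersion is formally an opaque string, so comparing versions numerically is a heuristic that happens to work on etcd-backed clusters rather than something the API contract guarantees. A rough sketch of the comparison I mean:

```python
def newer_resource_version(current, candidate):
    """Return whichever resourceVersion looks newer.

    resourceVersion is an opaque string per the API contract; treating it
    as an integer is a heuristic that works on etcd-backed clusters but is
    not guaranteed in general.
    """
    try:
        return candidate if int(candidate) > int(current or 0) else current
    except (TypeError, ValueError):
        return candidate
```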
Ultimately, there are a few things that are sub-par about the current state of affairs:
1. I don't like having to mess around with the internals of the watcher. It should handle details like resources arriving out of order internally. Without touching its resource version, I get stuck in an infinite loop of errors.
2. The events that come "later" are a replay of all the edits I have made to the various resources that the API server still knows about. That is kind of nasty, since I really just want to see the most up-to-date version plus any changes from the point I started the watch (see the list-then-watch sketch below).

Regarding 1, I think that's definitely a bug in the watcher. Regarding 2, is this an unreasonable expectation -- that is, should I be handling cases like this in my application?
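For what it's worth, the usual way to get "current state plus changes from now on" is to list first and then start the watch from the list's own resourceVersion, so older edits are never replayed. A minimal sketch, reusing the placeholder CRD coordinates from the example above (handle_current_state and handle_change are hypothetical handlers):

```python
# List once to get the current objects and the cluster's "now" marker.
initial = api.list_namespaced_custom_object(GROUP, VERSION, NAMESPACE, PLURAL)
for item in initial["items"]:
    handle_current_state(item)  # hypothetical: process existing objects

# Watching from the list's resourceVersion skips the replay of older edits.
list_rv = initial["metadata"]["resourceVersion"]
for event in w.stream(
    api.list_namespaced_custom_object,
    GROUP, VERSION, NAMESPACE, PLURAL,
    resource_version=list_rv,
):
    handle_change(event)  # hypothetical: process ADDED/MODIFIED/DELETED
```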
I'm running my application on Ubuntu 18.04.
Python version: 3.6.5
Python Kubernetes client version: 7.0.0
Kubernetes server version: 1.11.1
Thanks,
Kyle