fix: keep the node status when a running node is bypassed #517
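Let's discuss the selfmon design next week. For now, please test our scenario carefully: check how bypass=true && available=true affects the containers on that node, and run etcdctl watch on the status key to verify that the data being put is correct.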
jschwinger233 left a comment
Fifteen characters
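I added logging to every conditional branch in BindStatus that results in an etcd PUT, which should make future debugging easier. I also fixed BindStatus not handling TTL changes. For example, with the value unchanged, if the TTL was 10 and a new request came in with ttl=5, the final TTL stayed at 10. Similarly, ttl=10 could not become ttl=0, and ttl=0 could not become ttl=10. The impact is small, since it only goes wrong when set-status is called manually, but I fixed it while I was at it.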
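To make the TTL case concrete, here is a minimal Go sketch of the kind of decision described above: a PUT is issued when the key is missing, when the value changed, or when the requested TTL differs from the stored one. The names `statusEntry` and `decidePut` are hypothetical and are not taken from this PR; the real BindStatus works with etcd leases, which this sketch does not model.

```go
package main

import "log"

// statusEntry is a hypothetical stand-in for a stored status record.
// TTL is in seconds; 0 is used here to mean "no TTL".
type statusEntry struct {
	Value string
	TTL   int64
}

// decidePut reports whether a PUT is needed and the reason, so that
// every branch that leads to a PUT can be logged.
func decidePut(existing *statusEntry, want statusEntry) (bool, string) {
	switch {
	case existing == nil:
		return true, "key missing"
	case existing.Value != want.Value:
		return true, "value changed"
	case existing.TTL != want.TTL:
		// This is the case the old logic skipped: same value, new TTL.
		return true, "ttl changed"
	default:
		return false, "unchanged"
	}
}

func main() {
	current := &statusEntry{Value: "up", TTL: 10}
	requests := []statusEntry{
		{Value: "up", TTL: 10},
		{Value: "up", TTL: 5},
		{Value: "up", TTL: 0},
	}
	for _, want := range requests {
		put, reason := decidePut(current, want)
		log.Printf("want ttl=%d: put=%v (%s)", want.TTL, put, reason)
	}
}
```

With a comparison like this, a request that only changes the TTL (10 to 5, 10 to 0, or 0 to 10) is no longer treated as a no-op.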
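One remaining issue is that BindStatusWithoutTTL does not check whether the entity exists. I am not sure whether adding that check would have odd side effects, so I did not dare to add it...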
a04f905 to b1d9043
b1d9043 to a39d2d0
a39d2d0 to ec5daf4
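Scenario: a node has been set to bypass, but we only want it excluded from subsequent deployments; the workloads on the node should keep serving normally.
Problem: after the agent starts, it updates the node status. selfmon receives the change and calls SetNode, intending to set the node to UP. But because the node is bypassed, n.IsDown() is triggered and the node status is deleted again. selfmon then receives another change notification, calls SetNode to set the node to DOWN, and marks all workloads on it as not running.
Fix: change the condition so that the node status is deleted only when the node is not bypassed and is down.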
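The condition change can be sketched in Go as follows. This is an illustration under stated assumptions, not the code from this PR: the `Node` struct is a trimmed-down hypothetical, `IsDown` is assumed to report true for a bypassed node (as the problem description implies), and `shouldRemoveStatusOld` and `shouldRemoveStatus` are hypothetical helper names.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down stand-in for the real node type.
type Node struct {
	Name      string
	Available bool
	Bypass    bool
}

// IsDown is assumed to treat a bypassed node as down, which matches the
// behaviour described in the problem statement.
func (n Node) IsDown() bool {
	return n.Bypass || !n.Available
}

// shouldRemoveStatusOld models the old check: any "down" node loses its
// status key, including a healthy node that is merely bypassed.
func shouldRemoveStatusOld(n Node) bool {
	return n.IsDown()
}

// shouldRemoveStatus models the fixed check: the status key is removed
// only when the node is not bypassed and is down.
func shouldRemoveStatus(n Node) bool {
	return !n.Bypass && n.IsDown()
}

func main() {
	nodes := []Node{
		{Name: "healthy", Available: true},
		{Name: "bypassed-but-running", Available: true, Bypass: true},
		{Name: "really-down", Available: false},
	}
	for _, n := range nodes {
		fmt.Printf("%s: old=%v new=%v\n",
			n.Name, shouldRemoveStatusOld(n), shouldRemoveStatus(n))
	}
}
```

Under these assumptions, the bypassed but running node keeps its status key with the new check, so selfmon does not see a deletion and does not mark its workloads as not running.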