Report metrics from custom-plugin-monitor#315
Conversation
|
Hi @xueweiz. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/assign @wangzhen127 |
|
/cc @andyxning |
|
/ok-to-test |
|
My bad. I'll fix the test. |
|
@Random-Liu Hi Lantao, sorry for the delay. I just fixed the tests. Could you help take a look? Thanks! |
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Generate counter and gauge at the end of the function based on the final events and condition.
The logic is too complex, the code makes it even harder to maintain.
There was a problem hiding this comment.
And let's do the same for system log monitor.
There was a problem hiding this comment.
Ah this is a very good idea! Thanks for noticing that1
Done.
| event.Reason, err) | ||
| } | ||
| } | ||
| for _, condition := range metricChangingConditions { |
There was a problem hiding this comment.
Can't we just go through condition list, and set problem gauge for all conditions correspondingly?
That seems more reliable, basically a periodic gauge sync. Will that cause any performance issue?
There was a problem hiding this comment.
Oh just realized that in custom-plugin-monitor, the generateStatus() is only called periodically. Then I think it's fine. Changing the code.
(in system-log-monitor, the generateStatus() is called for each matched Rule, which is much more frequent. So I'd prefer to keep the changedConditions logic in there.
There was a problem hiding this comment.
SG. The logic in system-log-monitor is simple, so I'm also fine with adding smarter logic there. :)
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Random-Liu, xueweiz The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Add a new field
metricsReporting(trueby default) in thecustom-plugin-monitorconfig file.By setting the
metricsReportingfield totrue,custom-plugin-monitorwill report metrics when temporary/permanent problems happen.The behavior was specified in the NPD metrics design doc here.
For example, when I make the custom plugin script to report problem (by manually editing the script to force it return error), I will see:
This PR maintains backward compatibility on existing
custom-plugin-monitorconfigs.This PR is part of #284:
Most of the heavy lifting has been done in #300 .
I tried to update the README for custom-plugin-monitor, but seems that's not necessary:
rules,conditions) were omitted.Since the
metricsReportingflag is now a common config flag used by bothsystem-log-monitorandcustom-plugin-monitor, I don't think it's necessary to add it to the custom-plugin-monitor design doc.