Add alerts when VM is in unhealthy status #15227
Hey @sradco - I've reviewed your changes - here's some feedback:
- The first alert’s PromQL only enforces 10m of VM unhealthy state without checking for 5m of missing VMI as described; consider adding a time-based absence condition on kubevirt_vmi_info or adjusting the `For` period accordingly.
- Both alert expressions are quite complex; factoring repeated sub-expressions into recording rules could improve readability and maintainability.
- The test YAML introduces a lot of repeated series definitions—consider abstracting or reusing common blocks to reduce duplication and keep the tests concise.
PR adds 2 alerts:
1. VM that is in error status or in transitional status for more than 10 minutes, but does not have a VMI for more than 5 minutes.
2. VM that is in error status or in transitional status and has a VMI for more than 5 minutes.
Signed-off-by: Shirly Radco <[email protected]>
@machadovilaca @avlitman please review this PR
/retest-required
@avlitman @machadovilaca please review again
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: avlitman
The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/lgtm
Required labels detected, running phase 2 presubmits:
/retest-required
✋🧢 /hold
Dear @sradco
🔎 Please check that the changes you committed are fine and that there are no infrastructure issues present!
Checklist:
💬 How we calculate the number of retests: The number of retest comments are the number of
👌 After all issues have been resolved, you can remove the hold on this pull request by commenting
🙇 Thank you, your friendly referee automation, on behalf of the @sig-buildsystem and the KubeVirt community!
@enp0s3 Hi, this PR has lgtm and approve. How can we progress it?
/unhold
/retest-required
✋🧢 /hold
Dear @sradco
🔎 Please check that the changes you committed are fine and that there are no infrastructure issues present!
Checklist:
💬 How we calculate the number of retests: The number of retest comments are the number of
👌 After all issues have been resolved, you can remove the hold on this pull request by commenting
🙇 Thank you, your friendly referee automation, on behalf of the @sig-buildsystem and the KubeVirt community!
@brianmcarey @dhiller Can you please check why this is blocked?
/unhold
What this PR does
Before this PR:
No indication in alerts when a VM is stuck in an unhealthy status.
After this PR:
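PR adds 2 alerts:
1. VM that is in error status or in transitional status for more than 10 minutes, but does not have a VMI for more than 5 minutes.
2. VM that is in error status or in transitional status and has a VMI for more than 5 minutes.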
Signed-off-by: Shirly Radco [email protected]
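To make the intended semantics of the two alerts concrete, here is a minimal Python sketch of the conditions as they are worded in this PR: a VM unhealthy for more than 10 minutes with no VMI for more than 5 minutes, and a VM unhealthy with a VMI for more than 5 minutes. It is not the PromQL from the PR. The `VMObservation` type, its fields, and the returned labels are hypothetical names chosen for illustration, and the reading of the second alert (both the unhealthy status and the VMI have lasted more than 5 minutes) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the PR description; the constant names are illustrative.
UNHEALTHY_WITHOUT_VMI_SECONDS = 10 * 60
VMI_MISSING_SECONDS = 5 * 60
UNHEALTHY_WITH_VMI_SECONDS = 5 * 60


@dataclass
class VMObservation:
    """Hypothetical snapshot of one VM; this is not a KubeVirt type."""
    unhealthy_seconds: float  # time in error or transitional status (0 if healthy)
    vmi_present: bool         # whether a VMI currently exists for the VM
    vmi_state_seconds: float  # how long vmi_present has held its current value


def matching_alert(obs: VMObservation) -> Optional[str]:
    """Return a hypothetical label for the alert that would match, or None."""
    if obs.unhealthy_seconds <= 0:
        return None  # healthy VMs never match
    if not obs.vmi_present:
        # Alert 1: both durations must be exceeded, not only the 10 minutes.
        if (obs.unhealthy_seconds > UNHEALTHY_WITHOUT_VMI_SECONDS
                and obs.vmi_state_seconds > VMI_MISSING_SECONDS):
            return "unhealthy_without_vmi"
        return None
    # Alert 2 (assumed reading): unhealthy and VMI present, each for > 5 minutes.
    if (obs.unhealthy_seconds > UNHEALTHY_WITH_VMI_SECONDS
            and obs.vmi_state_seconds > UNHEALTHY_WITH_VMI_SECONDS):
        return "unhealthy_with_vmi"
    return None
```

For example, `matching_alert(VMObservation(11 * 60, False, 4 * 60))` returns `None` because the VMI has been missing for only 4 minutes, which is the case the review comment says the first expression should also account for.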
References
Why we need it and why it was done in this way
The following tradeoffs were made:
The following alternatives were considered:
Links to places where the discussion took place:
Special notes for your reviewer
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note